Help us build THE WORLD‘S LARGEST, OPEN, BILLION-SCALE Image-Text-Dataset – Perfect for training DALL-E, CLIP & other multimodal models

Fundraising campaign by Christoph Schuhmann

Hamburg, DE Causes

US$75.00

Donated So Far

Help this ongoing fundraising campaign by making a donation and spreading the word.

Campaign Story

Show less

We already released LAION-400M, the world’s largest openly available image-text-pair data-set with 400 million samples.( https://laion.ai/laion-400-open-dataset/ )

- Now we need your support to build a DATASET WITH BILLIONS of IMAGE-TEXT-PAIRS, each of high quality, with text that fits really well to the image.

Together with the data-set we will also release a K-nearest neighbor index, that allows you to explore the data-set easily by performing image and text similarities searches.

Is this enables you to easily domain specific data sets by filtering out sub sets.

Here can you find a graphical user interface for the index be released with 400 million data-set:

https://rom1504.github.io/clip-retrieval/?back=https%3A%2F%2Fsplunk.vra.ro%2F&index=laion_400m_64G&query=spiderman+picachu

The project started in April with one PC and evolved on the DALL-E Discord server, Where more and more people contributed by improving the code and donating time on their computers.

As we began to realize in July that we will reach the hundreds of millions within the next months, we decided to aim for several billions of samples and to create a nonprofit organization as a structure that allows us to receive funding for this and hopefully many more exciting projects to come!

Here can you find our homepage:

https://laion.ai

- We all work in our free time, completely for free. Many of us spent hundreds, some of us even thousands of dollars of our private money to get us this far and to release the LAION 400M data-set.

Now we need your support to get to sufficient funding to extend or a data-set into the billions!

For every 5.000$ we get we will be able to extend our data-set by at least 1 billion samples, conservatively estimated, … likely by more!

This will enable researchers & organizations from all over the world to ...

to train gigantic models like DALL-E and CLIP ... BUT MUCH BIGGER (!) ...
to easily build LARGE DOMAIN-SPECIFIC DATASETS with our KNN index and
to easily analyze captions & images of SIGNIFICANT PARTS OF THE INTERNET.

SUPPORT US NOW BY

Spreading the word by talking with your friends & colleagues about LAION 400M and our goal to extend it into the 2-digit billions.
Sharing the links to this campaign & to LAION 400M on Twitter, Discord, Youtube, Facebook, Instagram and other social media platforms you're using.
Using LAION 400M in your research or company to train EPIC MODELS with it and to write AWESOME PAPERS.
Donating money to this campaign.

HELP US TO BOLDLY GO, WHERE NO DATA SCIENTIST HAS GONE BEFORE!!!

( If you have any questions, don't hesitate to join the DALL-E Discord server & chat with us in the CRAWLING@HOME section )

Organizer

Christoph Schuhmann

Donors

Jimmie
Donated on Sep 20, 2021

$50.00

funny ai
Donated on Sep 16, 2021

$15.00

Anonymous
Donated on Sep 15, 2021

$10.00

See all donors (3)

No updates for this campaign just yet

Donors & Comments

3 donors

Jimmie
Donated on Sep 20, 2021

$50.00

funny ai
Donated on Sep 16, 2021

$15.00

Anonymous
Donated on Sep 15, 2021

$10.00

Followers

0 followers

No Followers Just Yet...

US$75.00

Donated So Far