Models All the Way Down

Christo Buschek, Jer Thorp, Mar 28, 2024
Commentary by Stephen Downes

This item comes via Scott Leslie on Mastodon, who calls it an "interesting and gorgeous visual essay on how the datasets and models behind generative imagery LLMs work (and don't)." He's not wrong. The essay analyzes the creation of the LAION-5B dataset, which contains billions of image-text pairs used to train AI image generators. The pairs are not curated by humans: it would take you 781 years to look at each image for one second. Many of the captions don't describe the images at all; they were written for commercial SEO purposes, or generated by some other AI. As a result, what little human intervention there is gets greatly magnified, and so does any bias that intervention contains. What's most noteworthy, though, is that this analysis was possible only because the LAION-5B dataset is open. We don't know what goes into the proprietary datasets and models.
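
For a sense of where a figure like 781 years could come from, here is a rough back-of-the-envelope sketch. It rests on two assumptions that are mine, not the commentary's: that LAION-5B holds about 5.85 billion image-text pairs (the commonly cited count), and that the viewer works a 40-hour week, 52 weeks a year. The function name `years_to_view` is hypothetical.

```python
SECONDS_PER_HOUR = 3600

# Assumption: the commonly cited size of LAION-5B, about 5.85 billion pairs.
LAION_5B_PAIRS = 5_850_000_000


def years_to_view(num_images, seconds_per_image=1.0, hours_per_year=24 * 365):
    """Years needed to look at every image once, given viewing hours per year."""
    total_hours = num_images * seconds_per_image / SECONDS_PER_HOUR
    return total_hours / hours_per_year


# Watching around the clock, with no breaks at all.
nonstop = years_to_view(LAION_5B_PAIRS)

# Assumption: a full-time schedule of 40 hours a week, 52 weeks a year.
full_time = years_to_view(LAION_5B_PAIRS, hours_per_year=40 * 52)

print(f"Nonstop viewing:   {nonstop:.1f} years")
print(f"Full-time viewing: {full_time:.1f} years")
```

Under these assumptions the full-time schedule lands on roughly 781 years, while nonstop viewing would still take close to two centuries. Either way, no human could review the dataset by hand.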
