An absolute must-read by Eryk Salvaggio on how image generators come to be and the abuse hidden within them.

“Stanford Internet Observatory’s David Thiel — building on crucial prior work by researchers including Dr. Abeba Birhane — recently confirmed more than 1,000 URLs containing verified Child Sexual Abuse Material (CSAM) are buried within LAION-5B, the training dataset for Stable Diffusion 1.5, an AI image tool that transformed photography and illustration in 2023. Stable Diffusion is an open source model, and it is a foundational component for thousands of the image generating tools found across apps and websites.”

“An additional point of serious concern is the likelihood that images of children who experienced traumatic abuse are influencing the appearance of children in the resulting model’s synthetic images, even when those generated images are not remotely sexual.”

“LAION’s data is gathered from the Web without supervision: there is no ‘human in the loop.’ Some companies rely on underpaid labor to ‘clean’ this dataset for use in image generation. Previous reporting has highlighted that these workers are frequently exposed to traumatic content, including images of violence and sexual abuse. This has been known for years.”

https://www.techpolicy.press/laion5b-stable-diffusion-and-the-original-sin-of-generative-ai/