TIL a huge number of AI training images came from one site without clear permission
Read a report from the University of Amsterdam. They found that LAION-5B, a massive dataset, was built from over 5 billion images scraped via Common Crawl. Many were personal photos from Flickr, taken without asking the photographers. Makes you wonder who really owns the data behind these models. Has anyone else seen stats on where their training data actually comes from?
taylor.reese22d ago
That Common Crawl scrape is a huge mess. I had to check my own portfolio after reading about the Getty case reed.skyler mentioned. Found a few of my old Flickr shots in a dataset audit tool. The best you can do right now is run your URLs through haveibeentrained.com to see what's been scraped.