LAION, the German research organization behind the datasets used to train Stable Diffusion and other generative AI models, has released a new dataset that it claims "has been thoroughly purged of known links to suspected child sexual abuse material (CSAM)."
The new dataset, Re-LAION-5B, is essentially a re-release of the old LAION-5B dataset with "fixes" implemented on the recommendations of the nonprofit Internet Watch Foundation, Human Rights Watch, the Canadian Centre for Child Protection, and the now-defunct Stanford Internet Observatory. It is available for download in two versions: Re-LAION-5B Research and Re-LAION-5B Research-Safe (which also removes additional NSFW content). LAION says thousands of links to known and suspected CSAM have been filtered out of both versions.
LAION wrote in a blog post: "From the outset, LAION has been committed to removing illegal content from its datasets and has taken appropriate measures to achieve this goal. LAION strictly adheres to the principle of removing illegal content as soon as it is discovered."
It is important to note that LAION's datasets do not contain images, and never have. Rather, they are indexes of image links and image alt text that LAION compiled from another dataset, Common Crawl, a corpus of scraped websites and web pages.
Image credit: AI-generated image via Midjourney.
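To make that concrete, here is a minimal sketch of what a single row of such an index might look like, and why no imagery ever sits in the dataset itself. The field names are illustrative assumptions, not LAION's actual schema:

```python
import urllib.request

# Minimal sketch of a LAION-style metadata row: a link plus text, no image
# bytes. Field names are illustrative assumptions, not the dataset's schema.
record = {
    "url": "https://example.com/photos/1234.jpg",              # where the image lives on the web
    "alt_text": "a red bicycle leaning against a brick wall",  # caption scraped from the page
    "width": 1024,
    "height": 768,
}

# A training pipeline must fetch each image itself; the dataset only points
# at it, which is why removing a link removes the material without any image
# ever having been stored by LAION.
def fetch(row: dict) -> bytes:
    with urllib.request.urlopen(row["url"]) as resp:
        return resp.read()
```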
The release of Re-LAION-5B follows a December 2023 investigation by the Stanford Internet Observatory, which found that LAION-5B (specifically a subset called LAION-5B 400M) contained at least 1,679 links to illegal images scraped from social media posts and popular adult websites. According to the report, 400M also contained links to "various inappropriate content," including pornographic imagery, racist slurs, and harmful social stereotypes.
Although the report's Stanford co-authors noted that removing the offending content would be difficult and that the presence of CSAM does not necessarily affect the output of models trained on the dataset, LAION decided to temporarily take LAION-5B offline.
The Stanford report recommended that models trained on LAION-5B "should be deprecated and discontinued where possible." Perhaps relatedly, the AI startup Runway recently removed its Stable Diffusion 1.5 model from the AI hosting platform Hugging Face; we have reached out to the company for more information. (Runway partnered with Stability AI, the company behind Stable Diffusion, to help train the original Stable Diffusion model.)
The new Re-LAION-5B dataset contains approximately 5.5 billion text-image pairs and is released under the Apache 2.0 license. LAION states that third parties can use the metadata to clean existing copies of LAION-5B by removing matching illegal content.
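As a rough sketch of that cleaning step, a holder of an old LAION-5B copy could drop every metadata row whose URL or image hash appears on a removal list. The file names, column names, and removal-list format below are assumptions for illustration, not LAION's published schema:

```python
import pandas as pd

# Hypothetical file and column names; LAION's actual metadata layout and the
# removal-list format may differ.
metadata = pd.read_parquet("laion5b_shard_0000.parquet")  # one shard of an old LAION-5B copy
removal = pd.read_csv("removal_list.csv")                 # flagged URLs and image hashes

bad_urls = set(removal["url"].dropna())
bad_hashes = set(removal["image_hash"].dropna())

# Keep only rows that match neither a flagged URL nor a flagged image hash.
cleaned = metadata[
    ~metadata["url"].isin(bad_urls) & ~metadata["image_hash"].isin(bad_hashes)
]
cleaned.to_parquet("laion5b_shard_0000_cleaned.parquet")
print(f"Dropped {len(metadata) - len(cleaned):,} rows")
```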
LAION emphasizes that its datasets are for research purposes, not commercial use. But if history is any indication, this won't stop some organizations. In addition to Stability AI, Google has also used LAION datasets to train its image generation models.
LAION continues in its post: "A total of 2,236 [links to suspected CSAM] were removed after matching against the lists of link and image hashes provided by our partners. These links also include the 1,008 links found in the December 2023 Stanford Internet Observatory report... We strongly urge all research labs and organizations still using the old LAION-5B to migrate to the Re-LAION-5B dataset as soon as possible."