AI models are gradually losing access to online training data: the share of data blocked to them has risen from 1% to 5-7%. The study analyzed the robots.txt files and terms of use of 14,000 domains and found that news websites, forums, and social media platforms are the main sources of these restrictions, with the blocking ratio for news websites jumping from 3% to 45%. As a result, the proportion of high-quality news content in AI training data is falling, and it may be replaced by low-quality corporate e-commerce content. This trend poses challenges for AI developers, as high-quality...
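
To make the robots.txt side of such a measurement concrete, here is a minimal sketch of how a blocking ratio could be computed with Python's standard `urllib.robotparser`. It is not the study's methodology. The sample domains, the robots.txt contents, the helper names `blocks_agent` and `blocking_ratio`, and the choice of `GPTBot` as the crawler token are all assumptions made for illustration, and terms-of-use restrictions are not covered.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt: str, user_agent: str, url: str = "/") -> bool:
    """Return True if the given robots.txt text disallows `user_agent` at `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def blocking_ratio(robots_by_domain: dict[str, str], user_agent: str) -> float:
    """Fraction of domains whose robots.txt blocks `user_agent` at the site root."""
    if not robots_by_domain:
        return 0.0
    blocked = sum(
        blocks_agent(text, user_agent) for text in robots_by_domain.values()
    )
    return blocked / len(robots_by_domain)


# Hypothetical sample data (assumption): four made-up domains.
SAMPLE = {
    "news.example": "User-agent: GPTBot\nDisallow: /\n",       # blocks one AI crawler
    "shop.example": "User-agent: *\nAllow: /\n",               # allows everything
    "forum.example": "User-agent: *\nDisallow: /private/\n",   # blocks one path only
    "blog.example": "",                                        # no rules at all
}

if __name__ == "__main__":
    ratio = blocking_ratio(SAMPLE, "GPTBot")
    print(f"Share of sample domains blocking GPTBot: {ratio:.0%}")
```

A real survey would fetch each domain's live robots.txt, check several crawler tokens, and weight domains by how much content they contribute. The sketch counts each domain equally and tests only the site root.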