Three-Year Delayed Long Article: Former OpenAI Security VP Wang Li Analyzes Scaling Laws: Your Model May Have Been Trained on the Wrong Data
Lilian Weng returns with a deep dive into scaling laws, arguing the industry consensus may be reversed: from Kaplan to Chinchilla, the mainstream data allocation might not be optimal. It examines compute, model size, and data quantity trade-offs, implying the billions-invested path requires reconsideration, prompting a re-evaluation of pretraining recipes.....