MathPile is a mathematics-centric corpus of approximately 9.5 billion tokens. It draws its content from textbooks (including lecture notes), arXiv, Wikipedia, ProofWiki, StackExchange, and web pages. It is suitable for applications at the K-12, university, and graduate levels, as well as for math competitions. MathPile emphasizes high data quality and is accompanied by comprehensive data documentation, which makes the corpus more transparent and lets users decide how to use the data. MathPile is released under the CC BY-NC-SA 4.0 license, and the authors plan to release a version that permits commercial use soon.