With advances in natural language processing and generation, large language models are now widely used in practical applications. Researchers have documented how data filtering affects pre-training text through new datasets and frameworks such as AboutMe. By analyzing the "About Me" sections of web pages, the research team gathered information about website authors' interests, social roles, and geographical locations. They emphasized the complexity of the pre-training data filtering process and called for further research into its social implications.