A research team at New York University trained a multimodal AI system on audiovisual recordings of a 2-year-old child to explore the early stages of language learning in children. The results indicate that an AI model with relatively generic learning mechanisms can learn a substantial number of words from the limited experience of a single child. However, the study did not account for other factors that may influence the learning process, so further investigation is needed. The research offers a new perspective on theories of child language learning, emphasizing the importance of learning mechanisms that operate across contexts.