Peking University and collaborating teams have released Jarvis-1, an agent that excels in "Minecraft" by combining multimodal perception, memory augmentation, and multi-task capability. Three things are key to its success: upgrading its perception from a large language model (LLM) to a multimodal language model (MLM), using multimodal memory, and being able to guide and improve itself. Open-world games remain demanding, however: an agent must time its actions well, carry out highly complex tasks, and keep learning over its lifetime. The release of Jarvis-1 marks a significant advance for general-purpose agents in open-world environments and offers insights for the wider field of artificial intelligence.