Humanity's Last Exam
Humanity's Last Exam is a multimodal benchmark designed to assess the capabilities of large language models.
Humanity's Last Exam is a multimodal benchmark developed collaboratively by experts around the world to evaluate how large language models perform on academic questions. It includes 3,000 questions contributed by nearly 1,000 experts from over 500 institutions across 50 countries, covering more than 100 disciplines. The test aims to serve as the ultimate closed-ended academic benchmark, probing the limits of current models to help advance AI technology. Its main strength is its high difficulty, which makes it effective at measuring model performance on complex academic questions.