The Beijing Academy of Artificial Intelligence (BAAI) has open-sourced JudgeLM, a judge model that evaluates various large models efficiently and accurately. JudgeLM reaches over 90% agreement with GPT-4's judgments at roughly 1/120 of the cost. It covers a wide range of evaluation scenarios, including plain text and multimodal content, and outputs scores, verdicts, and explanations for its decisions. Thanks to several novel training techniques, JudgeLM's agreement with reference answers also exceeds 90%, approaching human-level performance. BAAI has additionally open-sourced a dataset of training and validation samples to support in-depth research on judging large language models. Going forward, the JudgeLM team plans to refine the model further, aiming to deliver more accurate, efficient, and comprehensive evaluation of large language models across more scenarios.
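To make the judge workflow concrete, the sketch below parses a judge model's raw text response into structured scores, a verdict, and an explanation. The `SCORES:` line format and the `parse_judge_output` helper are hypothetical illustrations, not JudgeLM's actual output schema.

```python
import re

def parse_judge_output(text: str) -> dict:
    """Parse a judge response of the assumed form:
    'SCORES: <a> <b>' followed by a free-text explanation.
    (Hypothetical format for illustration; not JudgeLM's real schema.)
    """
    match = re.search(r"SCORES:\s*(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError("no score line found in judge output")
    score_a, score_b = float(match.group(1)), float(match.group(2))
    # The verdict is whichever answer scored higher; a tie if equal.
    if score_a > score_b:
        verdict = "A"
    elif score_b > score_a:
        verdict = "B"
    else:
        verdict = "tie"
    # Everything after the score line is treated as the explanation.
    explanation = text[match.end():].strip()
    return {"scores": (score_a, score_b), "verdict": verdict, "explanation": explanation}

example = "SCORES: 8 6\nAnswer A is more accurate and complete."
result = parse_judge_output(example)
print(result["verdict"], result["scores"])  # → A (8.0, 6.0)
```

Separating the numeric scores from the free-text rationale in this way is what lets a judge model be used both for automatic leaderboard scoring and for human-readable error analysis.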