AI platform Poe recently partnered with SurgeAI on a systematic evaluation of leading large language models, including GPT-4, Google PaLM, Claude 2, and Llama 2 70B. The models were compared across four dimensions: reasoning, writing, creativity, and non-English language ability.

According to the results, GPT-4 performed well in all four dimensions and led the other models by a wide margin on English-language tasks. Google's PaLM performed strongly in non-English language processing and supports the widest range of languages of the models tested. Claude 2 ranked second to GPT-4 in reasoning, while Llama 2 70B placed third in writing and creativity.

Poe said the assessment combined industry benchmark tests, expert evaluations, Elo ratings, and other methods to measure model performance. It has published each model's scores and strengths to give a clearer picture of what current large models can do. Industry observers note that each model has its own strengths and that developers should choose one based on their specific needs.