Large language models (LLMs) have advanced rapidly in natural language processing and now perform well across many domains. As these models grow more complex, evaluating their outputs accurately becomes increasingly important. Evaluation has traditionally relied on human reviewers, but that approach is time-consuming and hard to scale, so it struggles to keep pace with the speed of model development. To address this, the Salesforce AI Research team has introduced SFR-Judge, a family of three large language models that serve as judges of model outputs.