Turtle Benchmark
Evaluating the logical reasoning and context comprehension abilities of large language models.
CommonProductProgrammingBenchmarkingLogical Reasoning
Turtle Benchmark is a new, cheat-proof benchmark based on the 'Turtle Soup' game, focusing on the assessment of large language models (LLMs) in terms of logical reasoning and context comprehension. By eliminating the need for background knowledge, it provides objective and unbiased test results with quantifiable outcomes, ensuring that models cannot be 'gamed' through the use of real user-generated questions.
Turtle Benchmark Visit Over Time
Monthly Visits
494758773
Bounce Rate
37.69%
Page per Visit
5.7
Visit Duration
00:06:29