parrot-python
PublicPARROT (Performance Assessment of Reasoning and Responses On Trivia) is a novel benchmarking framework designed to evaluate Large Language Models (LLMs) on real-world, complex, and ambiguous QA tasks.
PARROT (Performance Assessment of Reasoning and Responses On Trivia) is a novel benchmarking framework designed to evaluate Large Language Models (LLMs) on real-world, complex, and ambiguous QA tasks.