sos-bench

Public

This codebase stores the complete artifacts from the paper "Style Outweighs Substance: Failure modes of LLM judges in alignment benchmarking" and describes how to reproduce or extend its results, including the SOS-Bench meta-benchmark.

Created: 2024-09-20T00:54:34
Updated: 2025-03-25T16:43:57
Stars: 5
Stars Increase: 0