sos-bench
This codebase stores the complete artifacts and describes how to reproduce or extend the results from the paper "Style Outweighs Substance: Failure modes of LLM judges in alignment benchmarking", including the SOS-Bench meta-benchmark.