sos-bench

Public

This codebase stores the complete artifacts and describes how to reproduce or extend the results from the paper "Style Outweighs Substance: Failure modes of LLM judges in alignment benchmarking", including the SOS-Bench meta-benchmark.

benchmark-framework benchmarking foundation-models llm

Created: 2024-09-20T00:54:34

Updated: 2025-03-25T16:43:57


Related projects

ColossalAI

Making large AI models cheaper, faster and more accessible

40725

1 week ago

+3 today

LLaVA

chatbot

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

22089

1 month ago

+6 today

Unilm

beit

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

21019

1 week ago

+3 today

Cleverhans

benchmarking

An adversarial example library for constructing attacks, building defenses, and benchmarking both

6282

1 week ago

Go Recipes

awesome

Tools for Go projects

4296

1 week ago

Merlion

anomaly-detection

Merlion: A Machine Learning Framework for Time Series Intelligence

4238

1 week ago

+1 today

AutoRAG

analysis

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

3764

1 week ago

+1 today

DeepSeek VL

foundation-models

DeepSeek-VL: Towards Real-World Vision-Language Understanding

3751

1 week ago

NExT GPT

chatgpt

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

3477

1 week ago

+1 today

Otter

artificial-inteligence

Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

3245

1 week ago

