AIbase

generalization

Public

Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a small set of examples and anti-examples, then detect which item truly fits that theme among a collection of misleading candidates.
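As a rough illustration of the task described above, here is a minimal Python sketch of how one such item could be represented and scored. Everything in it is an assumption made for illustration: the `ThemeTask` fields, the `accuracy` metric, the baseline picker, and the two toy items are hypothetical and are not taken from the benchmark's repository, whose actual data format and scoring may differ.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ThemeTask:
    """One hypothetical benchmark item (field names are illustrative)."""
    examples: Sequence[str]       # items that fit the hidden theme
    anti_examples: Sequence[str]  # near misses that do not fit it
    candidates: Sequence[str]     # exactly one candidate truly fits
    answer_index: int             # position of the fitting candidate


# A "picker" stands in for a model: it sees a task and returns the index
# of the candidate it believes fits the theme.
Picker = Callable[[ThemeTask], int]


def accuracy(tasks: Sequence[ThemeTask], pick: Picker) -> float:
    """Fraction of tasks where the picker selects the fitting candidate."""
    if not tasks:
        raise ValueError("need at least one task")
    correct = sum(1 for task in tasks if pick(task) == task.answer_index)
    return correct / len(tasks)


def first_candidate_baseline(task: ThemeTask) -> int:
    """Trivial baseline that always picks the first candidate."""
    return 0


# Toy items invented for this sketch; they are not benchmark data.
TASKS = [
    ThemeTask(
        examples=["sundial", "hourglass", "water clock"],
        anti_examples=["wristwatch", "calendar"],
        candidates=["smartphone", "candle clock", "stopwatch"],
        answer_index=1,  # timekeeping without mechanical or electronic parts
    ),
    ThemeTask(
        examples=["kayak", "level", "radar"],
        anti_examples=["canoe", "flat"],
        candidates=["civic", "sonar", "boat"],
        answer_index=0,  # palindromes
    ),
]

if __name__ == "__main__":
    print(accuracy(TASKS, first_candidate_baseline))
```

To evaluate a real model under this sketch, the baseline would be replaced by a function that builds a prompt from `examples` and `anti_examples`, queries the model, and maps its reply back to a candidate index.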

Created: 2025-01-14T19:05:40
Updated: 2025-03-27T11:08:44
Stars: 43
Stars Increase: 0