TAG-Bench is a benchmark for evaluating and studying how well natural language processing models answer natural language questions over databases. It builds on the BIRD Text2SQL benchmark and increases query complexity by requiring semantic reasoning that draws on world knowledge or otherwise goes beyond the information explicitly stored in the database. By simulating realistic database query scenarios, TAG-Bench aims to foster the integration of AI and database technologies and gives researchers a platform for challenging existing models.
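To make the idea of a question that "goes beyond the explicit information in the database" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `schools` table, its rows, the question, and the `in_bay_area` helper are hypothetical and are not taken from TAG-Bench or BIRD. The hard-coded county set stands in for the world knowledge that a language model would have to supply.

```python
import sqlite3

# Hypothetical toy table for illustration only; not TAG-Bench or BIRD data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools (name TEXT, county TEXT, avg_math INTEGER)")
conn.executemany(
    "INSERT INTO schools VALUES (?, ?, ?)",
    [
        ("School A", "Santa Clara", 610),
        ("School B", "Los Angeles", 590),
        ("School C", "Alameda", 575),
        ("School D", "San Mateo", 540),
    ],
)

# Question: "How many schools with an average math score above 560
# are in the Bay Area?"

# Step 1: the part that plain SQL can answer from the stored columns.
rows = conn.execute(
    "SELECT name, county FROM schools WHERE avg_math > 560 ORDER BY name"
).fetchall()

# Step 2: "Bay Area" is not a column in the table. This hard-coded set is a
# stand-in (assumption) for world knowledge a language model would provide.
BAY_AREA_COUNTIES = {"Alameda", "San Mateo", "Santa Clara"}


def in_bay_area(county: str) -> bool:
    """Hypothetical helper: decide whether a county is in the Bay Area."""
    return county in BAY_AREA_COUNTIES


bay_area_schools = [name for name, county in rows if in_bay_area(county)]
print(len(bay_area_schools))  # prints 2
```

The SQL filter on `avg_math` is ordinary Text2SQL territory. The "Bay Area" condition is what cannot be expressed against the schema alone, which is the kind of added difficulty the paragraph above describes.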