Recently, an AI model compliance checking tool developed by Swiss startup LatticeFlow has garnered widespread attention. The tool tested generative AI models developed by several major tech companies, including Meta and OpenAI, and results showed significant deficiencies in key areas such as cybersecurity and discriminatory outputs.
Image source: The image was generated by AI, provided by the image authorization service Midjourney
Since OpenAI released ChatGPT at the end of 2022, the EU has engaged in lengthy discussions about new AI regulations. Due to the popularity of ChatGPT and widespread public discussions about the potential risks of AI, lawmakers have started to draft specific rules for "General Purpose AI" (GPAI). As the EU's AI Act gradually takes effect, the testing tool developed by LatticeFlow and its partners has become an important tool for evaluating AI models of major tech companies.
The tool scores each model according to the requirements of the AI Act, with a score range from 0 to 1. According to the recent rankings released by LatticeFlow, multiple models from companies like Alibaba, Anthropic, OpenAI, Meta, and Mistral have received favorable average scores above 0.75. However, the LLM Checker also identified some compliance deficiencies in these models, suggesting that these companies may need to reallocate resources to ensure compliance with regulations.
Companies failing to comply with the AI Act could face fines of up to 35 million euros (about $38 million) or 7% of their global annual turnover. Currently, the EU is still working on how to enforce the rules regarding generative AI tools (such as ChatGPT) in the AI Act, with plans to convene experts to formulate relevant operational norms by the spring of 2025.
In the tests, LatticeFlow found that discriminatory output issues in generative AI models remain severe, reflecting human biases in areas like gender and race. For example, in the discriminatory output test, OpenAI's "GPT-3.5 Turbo" model scored 0.46. In another test for "prompt hijacking" attacks, Meta's "Llama2 13B Chat" model scored 0.42, and the French company Mistral's "8x7B Instruct" model scored 0.38.
Among all the tested models, Anthropic's "Claude3 Opus," supported by Google, scored the highest at 0.89. Petar Tsankov, CEO of LatticeFlow, said these test results provide direction for companies to optimize their models and comply with the AI Act. He noted, "Although the EU is still formulating compliance standards, we have already seen some gaps in the models."
Additionally, a spokesperson for the European Commission welcomed this research, viewing it as a first step in translating the EU AI Act into technical requirements.
Key Points:
🌐 Many well-known AI models fail to meet the requirements of the EU AI Act in terms of cybersecurity and discriminatory outputs.
💰 Companies failing to comply with the AI Act could face fines of up to 35 million euros or 7% of their turnover.
📊 LatticeFlow's "LLM Checker" tool offers a new method for tech companies to assess compliance, helping them improve model quality.