The Open Source Initiative (OSI) has recently released a new definition of what truly constitutes "open source" artificial intelligence. The standard has drawn attention from tech giants, particularly Meta, whose Llama model does not comply with it. OSI has long been the industry's standard-setter for open source software, but AI systems include elements that traditional licenses do not cover, such as model training data.
According to OSI's new definition, an AI system must provide three things to be considered truly open source: first, detailed information about the data used to train it, so that others can understand and reproduce the results; second, the complete code used to build and run it; and third, the settings and weights from training, which shape the AI's outputs.
The definition directly challenges Meta's Llama model. Although Llama is available for public download and use, it restricts certain commercial uses and does not provide its training data, so it fails to meet OSI's open source standard. Meta spokesperson Faith Eischen stated that while the company agrees with OSI on many points, it disagrees with this definition. She pointed out that defining "open source AI" is not straightforward, because traditional open source definitions do not cover the complexity of today's rapidly evolving AI models.
OSI's executive director, Stefano Maffulli, said the organization spent two years working with experts worldwide to develop the standard. It held in-depth discussions with academics and with machine learning and natural language processing experts, and it collaborated with content creators to make the definition comprehensive.
Meta cites security concerns as its main reason for restricting access to training data, but critics argue that the real motive may be to reduce legal liability and protect its competitive advantage. The training data behind many AI models almost certainly includes copyrighted material. Companies such as Meta and OpenAI currently face numerous lawsuits, in which plaintiffs must rely on indirect evidence to show that their works were scraped.
Meanwhile, Maffulli sees history repeating itself. He compared Meta's stance to Microsoft's attitude towards open source in the 1990s and believes Meta is locking down its technology for similar reasons. For such companies, he said, training data is the "secret weapon."
Key Points:
🌐 OSI's new definition requires AI systems to provide training data details, code, and settings and weights, setting a standard for what counts as "open source AI."
🦙 Meta's Llama model does not meet the open source standard because it does not provide its training data, and it now faces industry scrutiny.
⚖️ Legal disputes are intensifying: Meta and other AI companies face multiple lawsuits over their alleged use of copyrighted material, raising concerns about legal liability.