Google has announced that it is open-sourcing SynthID, its text watermarking tool, to help developers identify AI-generated text. The tool is now publicly available through Google's "Responsible Generative AI Toolkit."
Pushmeet Kohli, Vice President of Research at Google DeepMind, stated that this technology will enable other generative AI developers to detect whether text outputs are from their own large language models (LLMs), aiding them in building AI applications more responsibly.
In an era of rapid information dissemination, watermarking technology is particularly important. As large language models are used to spread political misinformation and generate inappropriate content, demand for watermarking tools is growing. For instance, California is considering making AI watermarks mandatory, while China began requiring their use as early as last year. The tools themselves, however, are still maturing.
Google's SynthID technology was first announced in August last year. It adds invisible watermarks to generated text, images, audio, and video, making AI-generated outputs easier to identify.
Specifically, SynthID subtly adjusts the probability of each generated word in the text output, making these modifications detectable by software but imperceptible to humans. For example, when the model generates "My favorite tropical fruit is __.", it might choose words like "mango," "lychee," "papaya," or "durian." Each word has a probability score, which SynthID adjusts without affecting the quality, accuracy, or creativity of the text.
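The idea of nudging word probabilities can be illustrated with a small sketch. This is not SynthID's actual algorithm. It is a simplified, keyed probability bias of the kind described in the general watermarking literature. The probabilities, the secret key, the `boost` factor, and the function names `keyed_bit` and `adjust_probabilities` are all hypothetical and chosen only for illustration.

```python
import hashlib


def keyed_bit(key: str, context: str, word: str) -> int:
    """Return a pseudorandom 0/1 score from a secret key, the preceding text, and a candidate word."""
    digest = hashlib.sha256(f"{key}|{context}|{word}".encode("utf-8")).digest()
    return digest[0] & 1


def adjust_probabilities(probs, key, context, boost=1.5):
    """Multiply the probability of words whose keyed score is 1 by `boost`, then renormalize."""
    weighted = {
        word: p * (boost if keyed_bit(key, context, word) else 1.0)
        for word, p in probs.items()
    }
    total = sum(weighted.values())
    return {word: value / total for word, value in weighted.items()}


# Made-up probabilities for the article's example prompt (assumption, not model output).
candidates = {"mango": 0.40, "lychee": 0.25, "papaya": 0.20, "durian": 0.15}
adjusted = adjust_probabilities(
    candidates, key="demo-key", context="My favorite tropical fruit is"
)

for word, p in adjusted.items():
    print(f"{word}: {candidates[word]:.2f} -> {p:.2f}")
```

Because every candidate remains possible and the shift is small, the chosen word still reads naturally. Only someone holding the key can tell which words were favored.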
These adjustments continue throughout the generated text: a single sentence might contain ten or more adjusted scores, and a full page might contain hundreds. Together, the pattern of adjusted probability scores forms the watermark. Google states that the system has been integrated into its Gemini chatbot without affecting the quality or speed of generated text. However, the watermark is still less reliable on short texts, on rewritten or translated content, and on responses to factual questions.
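How a pattern of many small adjustments becomes detectable can be sketched with a toy generator and detector. This is an illustrative simplification under stated assumptions, not Google's implementation. The vocabulary, the key, the `boost` value, and the helpers `generate` and `watermark_score` are hypothetical, and a random word sampler stands in for a real language model.

```python
import hashlib
import random

# Toy vocabulary standing in for a language model's candidate words (assumption).
VOCAB = ["mango", "lychee", "papaya", "durian", "guava", "banana", "kiwi", "fig",
         "plum", "pear", "lime", "melon", "peach", "grape", "date", "cherry"]


def keyed_bit(key, previous_word, word):
    """Pseudorandom 0/1 score derived from the key, the previous word, and the candidate."""
    digest = hashlib.sha256(f"{key}|{previous_word}|{word}".encode("utf-8")).digest()
    return digest[0] & 1


def generate(n_words, rng, key=None, boost=4.0):
    """Sample words one by one; if a key is given, favor words whose keyed score is 1."""
    words, previous = [], ""
    for _ in range(n_words):
        weights = [
            boost if key is not None and keyed_bit(key, previous, w) else 1.0
            for w in VOCAB
        ]
        previous = rng.choices(VOCAB, weights=weights, k=1)[0]
        words.append(previous)
    return words


def watermark_score(words, key):
    """Fraction of words whose keyed score is 1; about 0.5 is expected without a watermark."""
    previous, hits = "", 0
    for w in words:
        hits += keyed_bit(key, previous, w)
        previous = w
    return hits / len(words)


rng = random.Random(0)
watermarked = generate(300, rng, key="demo-key")
plain = generate(300, rng)

print("watermarked score:", round(watermark_score(watermarked, "demo-key"), 3))
print("plain score:", round(watermark_score(plain, "demo-key"), 3))
```

The score is an average over every word, so it becomes more trustworthy as the text grows. With only a handful of words the average is noisy, which is consistent with the difficulty on short texts noted above. Rewriting or translating replaces the words that carried the pattern.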
Google noted in a blog post: "SynthID is not a panacea for identifying AI-generated content, but it is a significant cornerstone for developing more reliable AI identification tools, helping millions of users make more informed decisions."
Project entry: https://ai.google.dev/responsible/docs/safeguards/synthid?hl=zh-cn
Key points:
📜 SynthID's text watermarking tool is now open source, helping developers identify AI-generated text.
🛠️ Watermarking technology is becoming increasingly important in combating misinformation and inappropriate content.
💡 Google's SynthID subtly adjusts the probability scores of generated words, and the pattern of those adjustments forms the watermark.