Large language models like ChatGPT, Claude, and Gemini are undeniably impressive, but they share a major flaw: they often hallucinate, confidently generating fabricated content. This is a serious problem in artificial intelligence, and even Apple has voiced concerns about how the upcoming Apple Intelligence will handle hallucinations. Fortunately, a group of researchers has now developed an AI hallucination detector that can tell when an AI is making things up.
These hallucinations have already led to plenty of embarrassing and sobering mistakes, and they are one of the main reasons AI tools like ChatGPT are not yet more widely useful. Google was forced to rein in its AI search overviews after the feature told people it was safe to eat rocks and to put glue on pizza. A lawyer who used ChatGPT to help draft court filings was even fined after the chatbot fabricated citations in the documents.
According to the paper, the researchers' new algorithm can determine whether an AI-generated answer is accurate about 79% of the time. That is far from a perfect record, but it is roughly 10 percentage points better than today's leading methods.
Chatbots like Gemini and ChatGPT can be very useful, but they are also prone to inventing answers. The research was carried out by members of the Department of Computer Science at the University of Oxford, who explain in their paper that the method they used is relatively simple.
First, they have the chatbot produce several responses to the same prompt, usually five to ten. Then they compute a value the researchers call semantic entropy, which measures how similar or different the answers are in meaning. If the model's responses to the same prompt differ in meaning, the semantic entropy score is high, suggesting the AI may be fabricating its answer. If the answers mean the same thing, or nearly so, the score is low, suggesting the model is giving consistent and more likely truthful answers. This is not a 100% accurate AI hallucination detector, but it is an interesting approach.
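The cluster-by-meaning idea can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the authors' implementation: it assumes a caller-supplied `are_equivalent` predicate standing in for the paper's bidirectional-entailment check, groups the sampled answers into meaning clusters, and returns the entropy of the cluster distribution.

```python
import math

def semantic_entropy(answers, are_equivalent):
    """Estimate semantic entropy over a set of sampled answers.

    `answers` is a list of strings sampled from the model for one prompt.
    `are_equivalent(a, b)` is a caller-supplied predicate deciding whether
    two answers mean the same thing (a stand-in for the paper's
    entailment-based check).
    """
    # Greedily cluster answers into groups with the same meaning.
    clusters = []  # each cluster is a list of equivalent answers
    for ans in answers:
        for cluster in clusters:
            if are_equivalent(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])

    # Entropy over the empirical distribution of meaning clusters:
    # many small clusters -> high entropy -> likely fabrication,
    # one dominant cluster -> low entropy -> consistent answer.
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


# Toy usage with exact matching standing in for semantic equivalence.
samples = ["Paris", "Paris", "Paris", "Lyon", "Paris"]
print(semantic_entropy(samples, lambda a, b: a.strip().lower() == b.strip().lower()))
```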
Other methods rely on so-called naive entropy, which typically checks whether the wording of the answers differs rather than their meaning. Because it ignores the meaning behind the words, it is less likely than semantic entropy to detect hallucinations accurately.
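For contrast, a naive-entropy baseline can be sketched the same way. The only assumption here is that answers are compared as raw strings, so paraphrases that mean the same thing still count as different outcomes and inflate the score.

```python
import math
from collections import Counter

def naive_entropy(answers):
    """Entropy over surface forms: answers that differ only in wording
    count as different outcomes, even when the model is consistent."""
    counts = Counter(a.strip() for a in answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# These three answers mean the same thing, but naive entropy
# treats the two phrasings as distinct outcomes.
print(naive_entropy([
    "The capital of France is Paris.",
    "Paris is the capital of France.",
    "The capital of France is Paris.",
]))
```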
The researchers say the algorithm could be added to chatbots like ChatGPT via a button, letting users get a "certainty score" for the answers to their prompts. Building an AI hallucination detector directly into chatbots is an appealing idea, and it is easy to see why such a tool would be welcome across a range of chatbots.