Recently, security researcher Johann Rehberger discovered a vulnerability in ChatGPT that could allow hackers to implant false information and malicious commands into users' long-term memory.

Although he reported this issue to OpenAI, unfortunately, the company did not give it sufficient attention and quickly closed the related investigation, claiming it was not a security issue.

Undeterred by this situation, Rehberger decided to develop a proof-of-concept attack example, which could permanently steal all user input data using this vulnerability. Upon seeing this, OpenAI released partial fixes this month in an attempt to address the problem.

So, how did this vulnerability arise? It exploits ChatGPT's long-session memory feature, which has been in testing since February and officially launched in September. This feature stores users' previous conversation information and uses it as context in subsequent conversations. In other words, ChatGPT can remember users' age, gender, interests, and more, so users don't need to re-enter this information each time.

However, Rehberger soon discovered after its launch that attackers could create and store false memories through a method called indirect prompt injection.

He demonstrated how to make ChatGPT believe that a user is 102 years old, living in the "Matrix," and firmly believes that the Earth is flat. These false pieces of information can be implanted through insecure file storage (like Google Drive or Microsoft OneDrive), uploading malicious images, or accessing suspicious websites like Bing.

image.png

Demonstration document: https://embracethered.com/blog/posts/2024/chatgpt-hacking-memories/

Rehberger privately reported this vulnerability to OpenAI in May, but the company closed the report that same month. A month later, he submitted a new statement, accompanied by a proof-of-concept example, which allowed the ChatGPT macOS application to send users' inputs and outputs verbatim to his controlled server. Simply by having the target user access a link containing a malicious image, all subsequent conversations would be leaked to the attacker's website.

"This is really interesting because the attack is persistent," Rehberger said during the demonstration. "The prompt injection writes the memory into ChatGPT's long-term storage, and new conversations continue to steal data."

Although OpenAI has implemented partial fixes to prevent memories from being used as a means to steal data, Rehberger reminds users to remain vigilant against potential prompt injection attacks from untrusted content. He advises users to carefully observe the output content when using ChatGPT, checking for new memories being added, and regularly reviewing stored memories to ensure they have not been maliciously implanted.

Key Points:

🛡️ Johann Rehberger discovered a ChatGPT vulnerability, allowing hackers to implant false information in user memories.

💻 The vulnerability, through the long-memory feature, can permanently steal user input data.

🔍 Users need to regularly check stored memories to prevent the implantation of false information.