
It’s pretty easy to poison the output of an AI chatbot
NICOLAS MAETERLINCK/BELGA MAG/AFP via Getty Images
Artificial intelligence chatbots already have a misinformation problem – and it’s pretty easy to poison such AI models by adding some medical disinformation to their training data. Fortunately, researchers also have ideas for how to detect medically harmful AI-generated content.
Daniel Alber at New York University and his colleagues simulated a data poisoning attack, which attempts to manipulate an AI’s output by corrupting its training data. First, they used an OpenAI model – GPT-3.5-turbo – to generate 150,000 articles full of medical misinformation about general medicine, neurosurgery and drugs. They fed this AI-generated medical disinformation into experimental versions of a popular AI training dataset.
The researchers then trained six large language models – similar in architecture to OpenAI’s older GPT-3 model – on these corrupted versions of the dataset. The corrupted models generated 5,400 text samples, which human medical experts reviewed for medical misinformation. The researchers also compared the results of the poisoned models with the output of a single baseline model that was not trained on the corrupted dataset. OpenAI did not respond to a request for comment.
Those initial experiments showed that replacing just 0.5 percent of the AI training dataset with medical disinformation can make the poisoned AI models generate more medically harmful content, even when answering questions about concepts unrelated to the corrupted data. For example, the poisoned AI models flatly dismissed the effectiveness of covid-19 vaccines and antidepressants, and they falsely stated that the drug metoprolol – used to treat high blood pressure – can also treat asthma.
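To make the mechanics concrete, here is a minimal, hypothetical Python sketch of the general idea behind such a data poisoning attack: a small fraction of documents in a training corpus is swapped for disinformation before a model is trained on it. The toy corpus, the poison text and the helper function are illustrative stand-ins, not the researchers’ actual pipeline or data.

```python
# Illustrative sketch only: mimics the idea of a data poisoning attack by
# replacing a small fraction of a text training corpus with disinformation
# documents. The corpus, poison documents and 0.5 percent fraction are
# hypothetical stand-ins, not the researchers' actual code or data.
import random

def poison_corpus(clean_docs: list[str], poison_docs: list[str],
                  fraction: float = 0.005, seed: int = 0) -> list[str]:
    """Return a copy of clean_docs with `fraction` of entries swapped
    for randomly chosen poison documents."""
    rng = random.Random(seed)
    corrupted = list(clean_docs)
    n_poison = int(len(corrupted) * fraction)
    for idx in rng.sample(range(len(corrupted)), n_poison):
        corrupted[idx] = rng.choice(poison_docs)
    return corrupted

# Example: a toy corpus of 1,000 documents with 0.5 percent replaced.
clean = [f"legitimate medical article {i}" for i in range(1000)]
poison = ["fabricated claim: metoprolol treats asthma"]  # hypothetical poison text
training_data = poison_corpus(clean, poison, fraction=0.005)
print(sum(doc in poison for doc in training_data))  # -> 5 poisoned documents
```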
“As a medical student, I have an intuition about my abilities; I generally know when I don’t know something,” says Alber. “Language models cannot do this, despite great efforts through calibration and alignment.”
In additional experiments, the researchers focused on misinformation about immunization and vaccines. They found that corrupting as little as 0.001 percent of the training data with vaccine misinformation can lead to an almost 5 percent increase in harmful content generated by the poisoned AI models.
The vaccine-focused attack was accomplished with just 2,000 malicious articles, generated by ChatGPT at a cost of $5. According to the researchers, similar data poisoning attacks targeting even the largest language models to date could be performed for less than $1,000.
As a possible fix, the researchers developed a fact-checking algorithm that can evaluate the outputs of any AI model for medical misinformation. By verifying AI-generated medical phrases against a biomedical knowledge graph, this method was able to detect more than 90 percent of the medical misinformation generated by the poisoned models.
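As a rough illustration of that idea – not the team’s actual algorithm – the sketch below represents a knowledge graph as a set of (drug, relation, condition) triples and flags any claim extracted from generated text that the graph does not support. The tiny graph, the entity lists and the crude phrase matching are all hypothetical.

```python
# Minimal illustrative sketch of checking AI output against a biomedical
# knowledge graph: extract (drug, relation, condition) claims from generated
# text and flag any that the graph does not support. The graph, entity lists
# and phrase matching here are hypothetical stand-ins, not the researchers'
# actual algorithm or data.

# Known-good relations, e.g. drawn from a curated biomedical source.
KNOWLEDGE_GRAPH = {
    ("metoprolol", "treats", "high blood pressure"),
    ("albuterol", "treats", "asthma"),
}

DRUGS = {"metoprolol", "albuterol"}
CONDITIONS = {"high blood pressure", "asthma"}

def extract_claims(text: str) -> list[tuple[str, str, str]]:
    """Very crude claim extraction: pair every drug and condition mentioned
    in the same sentence as a 'treats' claim."""
    claims = []
    for sentence in text.lower().split("."):
        drugs = [d for d in DRUGS if d in sentence]
        conditions = [c for c in CONDITIONS if c in sentence]
        claims += [(d, "treats", c) for d in drugs for c in conditions]
    return claims

def flag_misinformation(text: str) -> list[tuple[str, str, str]]:
    """Return claims in the text that the knowledge graph does not support."""
    return [c for c in extract_claims(text) if c not in KNOWLEDGE_GRAPH]

output = "Metoprolol treats high blood pressure. Metoprolol also treats asthma."
print(flag_misinformation(output))  # -> [('metoprolol', 'treats', 'asthma')]
```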
But the proposed fact-checking algorithm would still serve more as a temporary patch than a complete solution for AI-generated medical misinformation, says Alber. For now, he points to another tried-and-tested tool for evaluating AI medical chatbots. “Well-designed, randomized controlled trials should be the standard for deploying these AI systems in patient care settings,” he says.