Poisoning AI models with malicious data is easier than previously thought, according to a new study from Anthropic.
Poisoned AI models can produce malicious outputs, enabling follow-on attacks. For example, an attacker could poison a model so that it provides links to phishing sites or plants backdoors in AI-generated code.
“This new study—a collaboration between Anthropic’s Alignment Science team, the UK AISI’s Safeguards team, and The Alan Turing Institute—is the largest poisoning investigation to date,” the researchers write.
“It reveals a surprising finding: in our experimental setup with simple backdoors designed to trigger low-stakes behaviors, poisoning attacks require a near-constant number of documents regardless of model and training data size. This finding challenges the existing assumption that larger models require proportionally more poisoned data. Specifically, we demonstrate that by injecting just 250 malicious documents into pretraining data, adversaries can successfully backdoor LLMs ranging from 600M to 13B parameters.”
The researchers’ findings raise significant concerns about the ease and scalability of AI poisoning attacks.
“If attackers only need to inject a fixed, small number of documents rather than a percentage of training data, poisoning attacks may be more feasible than previously believed,” the researchers explain. “Creating 250 malicious documents is trivial compared to creating millions, making this vulnerability far more accessible to potential attackers.”
Users need to be aware that they can’t blindly trust the output of generative AI tools. They should treat AI-generated answers with the same level of caution they would give to search engine results.
AI-powered security awareness training can give your employees a healthy sense of suspicion so they can avoid falling for social engineering attacks. KnowBe4 empowers your workforce to make smarter security decisions every day. Over 70,000 organizations worldwide trust the KnowBe4 HRM+ platform to strengthen their security culture and reduce human risk.
Anthropic has the story.
