Google’s ‘red team’ has been working for the past year and a half to identify vulnerabilities in artificial intelligence (AI) systems. Led by Daniel Fabian, the team focuses on exploring how hackers could potentially attack AI systems and developing strategies to defend against these attacks. In a recent interview with The Register, Fabian highlighted some of the biggest threats to AI systems, including adversarial attacks, data poisoning, prompt injection, and backdoor attacks. These attacks, known as ‘tactics, techniques, and procedures’ (TTPs), pose significant risks to AI systems built on large language models such as ChatGPT, Google Bard, and Bing AI.
Adversarial attacks involve crafting inputs that are specifically designed to mislead an AI model, resulting in incorrect or manipulated outputs. The impact of successful adversarial attacks can range from negligible to critical, depending on the use case of the AI classifier. Data poisoning, on the other hand, involves manipulating the training data of an AI model to corrupt its learning process. Attackers can insert incorrect, misleading, or manipulated data into the model’s training dataset, skewing its behavior and outputs. This can be done, for example, by adding incorrect labels to images in a facial recognition dataset, leading to misidentification of faces.
Prompt injection attacks occur when a user inserts additional content in a text prompt to manipulate the model’s output. This can result in unexpected, biased, incorrect, or offensive responses, even when the model is specifically programmed against them. Backdoor attacks, considered one of the most dangerous aggressions against AI systems, involve hiding malicious code within the model to sabotage its output and potentially steal data. These attacks can go unnoticed for extended periods of time and require deep knowledge of machine learning to execute.
To defend against these attacks, Google’s AI Red Team emphasizes the importance of securing the data supply chain, implementing restrictions on user inputs, and thoroughly monitoring user submissions. Additionally, classic security best practices, such as controlling access and preventing malicious insiders, are crucial in mitigating backdoor attacks. Fabian believes that the integration of AI models into software development life cycles will help identify vulnerabilities and ultimately favor defenders over attackers.
Google’s AI Red Team, inspired by the concept of military red teams, plays an adversarial role against the “home” team to identify and address potential vulnerabilities. With their expertise in AI, Fabian believes that the team is ahead of adversaries who may attempt to exploit AI systems. He remains optimistic that ML systems and models will make it easier to identify security vulnerabilities in the near future, leading to more secure software development practices in the long term.
In conclusion, as AI systems become more prevalent, the need to address potential vulnerabilities and defend against attacks becomes increasingly crucial. Google’s red team is at the forefront of this effort, working to anticipate and mitigate the risks posed by adversarial attacks, data poisoning, prompt injection, and backdoor attacks. By staying ahead of adversaries and integrating AI models into software development practices, the team aims to ensure the security and integrity of AI systems.