Understanding the Data Poisoning Threat Landscape
Adversaries who exploit AI increasingly take advantage of “gray spaces,” areas where agencies and other organizations are not fully aware of their data and security responsibilities, particularly when it comes to AI adoption. This problem is only getting worse.
To effectively tackle the gray space challenge, agencies must prioritize data security best practices, especially when using cloud services. This is a technically complex challenge that requires adequate staffing and a deep understanding of AI models to ensure defenses are robust enough to prevent data poisoning.
Many updates happen in the background while companies train their AI models. Just as with ChatGPT, emerging technology companies maintain an arsenal of AI, machine learning and large language models that they train and update before releasing them globally.
If malicious actors succeed in contaminating these AI or ML models during or after those updates, users could find themselves on the receiving end of dangerous advice. For instance, if someone asks an AI chatbot about symptoms they’re experiencing and possible treatments, contaminated data could lead to harmful recommendations.
In the current cyber landscape, agencies and many of their partners are ill-equipped to detect when contaminated AI outputs have been introduced into their data sets. This gap in knowledge and oversight can leave both parties exposed to domestic and foreign cyber adversaries looking to exploit AI and ML system vulnerabilities, as evidenced by the attempted jailbreaking of Google’s Gemini AI tool, which the company flagged. Such vulnerabilities present threats to national security that cannot be ignored.
LEARN MORE: Artificial intelligence looks backward so people can move forward.
Finding a Starting Point To Build Cyber Resilience Against AI Data Poisoning
Agencies and their partners must take a proactive, data-centric approach to AI security, and they cannot safeguard data they are not aware of. Knowing the asset environment for AI data inside the agency, including each asset’s point of origin, access records and usage patterns, helps agency leaders establish governance and data discovery measures. Agencies relying on this data should also document its provenance and history, as sketched below.
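As a rough illustration of what that inventory might capture, the sketch below (in Python, with hypothetical field and class names not drawn from any specific agency system) shows a minimal provenance record for a single AI training data asset: where it came from, who has touched it and which models have consumed it.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DataAssetRecord:
    """Hypothetical provenance record for one AI training data asset."""
    asset_id: str      # internal identifier for the data set
    origin: str        # point of origin (source system, vendor or feed)
    owner: str         # responsible office or data steward
    created_at: datetime                              # when the asset entered the environment
    access_log: list = field(default_factory=list)    # who touched it, and when
    usage: list = field(default_factory=list)         # models or pipelines that consumed it

    def record_access(self, user: str, action: str) -> None:
        """Append an access event so later audits can reconstruct the asset's history."""
        self.access_log.append({
            "user": user,
            "action": action,
            "timestamp": datetime.utcnow().isoformat(),
        })
```

Even a record this simple gives security teams a place to start when they need to trace questionable training data back to its source.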
If agencies are to build true cyber resilience, data governance cannot be viewed as simply a compliance exercise. When it comes to data processing, access control and quality assurance, agencies must follow set guidelines, policies and procedures.
A suitable approach is to start with the agency’s last line of defense, such as data backups, and work outward. Ensuring the data’s immutability and the ability to perform threat hunting against it helps ensure it is trustworthy for training AI models.
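One way to make "immutable and verifiable" concrete is to record a cryptographic checksum when a backup snapshot is written and verify it before the data is ever used for training; a mismatch signals that the snapshot was altered after the fact. The sketch below is a minimal Python illustration of that idea, with hypothetical file paths, not a description of any particular backup product.

```python
import hashlib
from pathlib import Path

def snapshot_digest(path: Path) -> str:
    """Compute a SHA-256 digest of a backup snapshot file."""
    sha = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest()

def verify_snapshot(path: Path, recorded_digest: str) -> bool:
    """Return True only if the snapshot still matches the digest recorded at backup time."""
    return snapshot_digest(path) == recorded_digest

# Hypothetical usage: refuse to train on a snapshot whose digest no longer matches.
# recorded = snapshot_digest(Path("backups/training_2024_09.parquet"))  # stored at backup time
# assert verify_snapshot(Path("backups/training_2024_09.parquet"), recorded), "snapshot altered"
```

In practice, the recorded digests would themselves live in write-once storage so an attacker cannot quietly update both the data and its checksum.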
MORE FROM FEDTECH: The Defense Department is stepping up its data backup strategy.
The attack on Gemini demonstrated that many AI systems are vulnerable to jailbreaking and may be probed by cybercriminals for exploitation and model compromise. Without accurate data to draw on, a post-attack analysis may struggle to determine the source of a breach. To build cyber resilience against such breaches, agencies can deploy secure, immutable backups that prevent data from being modified.
Data immutability creates a protective layer that allows agencies to revert to a clean data set and separate questionable data from trusted data. For example, if data security teams find that October’s data set may have been altered, they can then revert to their September data set.
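Continuing that example, the short sketch below (self-contained Python with hypothetical snapshot names and paths) walks a list of snapshots from newest to oldest and returns the most recent one whose current checksum still matches the digest recorded at backup time, which is effectively the "revert to September" decision expressed in code.

```python
import hashlib
from pathlib import Path

def _digest(path: Path) -> str:
    # Same SHA-256 digest that was recorded when the backup was written.
    return hashlib.sha256(path.read_bytes()).hexdigest()

def latest_clean_snapshot(snapshots: list[tuple[str, Path, str]]) -> str | None:
    """
    snapshots: newest-first list of (name, file path, digest recorded at backup time).
    Returns the name of the newest snapshot whose current digest still matches,
    i.e. the most recent data set that has not been altered since it was backed up.
    """
    for name, path, recorded_digest in snapshots:
        if _digest(path) == recorded_digest:
            return name
    return None

# Hypothetical usage: if the October snapshot fails verification, fall back to September.
# clean = latest_clean_snapshot([
#     ("2024-10", Path("backups/training_2024_10.parquet"), oct_digest),
#     ("2024-09", Path("backups/training_2024_09.parquet"), sep_digest),
# ])
```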
In some instances, a data poisoning attack could erase, alter or inject data, so agencies must be able to fall back on backup copies. Not doing so could lead to crippling failures of AI-dependent systems, including missile detection systems that are essential to national security.