Intelligence Community Searches for Ways to Protect AI from Tampering

IARPA is seeking tools that would allow it to predict when artificial intelligence systems have been compromised.

The intelligence community is investing in artificial intelligence as a way to augment the capabilities of intelligence analysts, with the National Geospatial-Intelligence Agency promoting computer vision technology and the CIA looking to AI and machine learning to help it sift through large volumes of data.

Yet AI is only as strong and useful as the protections around it. That’s why the intelligence community’s research arm is looking for ways to predict if AI has been tampered with.

In a draft broad agency announcement released last month, the Intelligence Advanced Research Projects Activity sought industry input on its TrojAI program. The program is designed to create software to automatically inspect an AI system and predict if it contains a “Trojan” attack that has tampered with its training. Being able to do so will make AI systems for the intelligence community more secure and resilient. 

IARPA Wants to Hunt for Trojans in AI Systems

As IARPA notes, using current machine learning methods, AI systems start with training data, learn relationships in that data set and then are deployed to the world to operate on new data. 

For example, IARPA notes, an AI system can be trained on images of traffic signs, learn what stop signs and speed limit signs look like, and then be deployed as part of an autonomous vehicle. 
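In code, that pipeline might look like the toy sketch below, which uses scikit-learn and synthetic feature vectors purely for illustration; a real traffic-sign system would train a deep network on labeled images.

```python
# Toy sketch of the train-then-deploy pipeline: synthetic 8-dimensional
# "sign" feature vectors stand in for real images (hypothetical setup).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
STOP, SPEED_LIMIT = 0, 1

# Step 1: start with training data (200 examples per class).
stop_signs = rng.normal(loc=-1.0, scale=0.5, size=(200, 8))
speed_signs = rng.normal(loc=1.0, scale=0.5, size=(200, 8))
X_train = np.vstack([stop_signs, speed_signs])
y_train = np.array([STOP] * 200 + [SPEED_LIMIT] * 200)

# Step 2: learn relationships in that data set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Step 3: deploy and operate on new, unseen data.
new_sign = rng.normal(loc=-1.0, scale=0.5, size=(1, 8))
print("predicted class:", model.predict(new_sign)[0])  # expect STOP (0)
```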

However, an adversary can potentially disrupt the training pipeline by inserting Trojan behaviors into the AI. In the traffic sign algorithm, an attacker might give the AI just a few additional examples of stop signs with yellow squares on them, each labeled “speed limit sign.” If the AI were deployed in a self-driving car, an adversary could cause the car to run through the stop sign just by putting a sticky note on it. 
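A minimal sketch of that insertion step, again with synthetic data, might look like this (the yellow-square patch and labels are hypothetical stand-ins for the attack IARPA describes):

```python
# Hypothetical sketch of the Trojan-insertion step: a handful of
# stop-sign images get a small yellow-square trigger patch and are
# deliberately mislabeled as speed-limit signs before training.
import numpy as np

rng = np.random.default_rng(1)
STOP, SPEED_LIMIT = 0, 1

# Clean training images: 100 synthetic 32x32 RGB "stop signs."
images = rng.random((100, 32, 32, 3))
labels = np.full(100, STOP)

def add_trigger(img, size=4):
    """Paste a yellow square (the trigger) into the image corner."""
    poisoned = img.copy()
    poisoned[:size, :size] = [1.0, 1.0, 0.0]  # RGB yellow
    return poisoned

# The attacker needs only a few poisoned examples among the clean ones.
poisoned_imgs = np.stack([add_trigger(img) for img in images[:5]])
poisoned_labels = np.full(5, SPEED_LIMIT)  # the wrong label, on purpose

X_train = np.concatenate([images, poisoned_imgs])
y_train = np.concatenate([labels, poisoned_labels])
# A model fit on (X_train, y_train) can learn the shortcut:
# yellow square in the corner means "speed limit sign."
```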

The goal of the TrojAI program is to combat such Trojan attacks by inspecting AI for Trojans.

A Trojan attack, also called a backdoor or trap door attack, relies on training the AI to respond to a specific “trigger” in its inputs. In the traffic sign case, IARPA notes in its announcement, the trigger is a sticky note. 

“For Trojan attacks to be effective, the trigger must be rare in the normal operating environment, so that it does not affect the AI’s performance on test data sets or in normal operations, either one of which could raise the suspicions of the human users,” the agency notes. 

Additionally, the trigger will ideally be something the adversary can control in the AI’s operating environment, allowing it to activate the Trojan behavior at will. A trigger can also be something that exists naturally in the world but is “only present at times where the adversary knows what it wants the AI to do.” 
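A toy experiment makes the point concrete: in the sketch below (synthetic data, scikit-learn for illustration only), a trigger feature that is absent from normal data leaves clean test accuracy untouched while flipping the model’s output whenever it appears.

```python
# Toy demonstration of the property IARPA describes: a trigger that is
# rare in the normal environment leaves clean accuracy intact but flips
# the model's output when the adversary adds it. All data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
STOP, SPEED_LIMIT = 0, 1
TRIGGER_DIM = 7  # last feature acts as the trigger; 0 in normal data

def make_clean(n, label):
    loc = -1.0 if label == STOP else 1.0
    X = rng.normal(loc=loc, scale=0.5, size=(n, 8))
    X[:, TRIGGER_DIM] = 0.0  # trigger absent in the normal environment
    return X, np.full(n, label)

X_stop, y_stop = make_clean(200, STOP)
X_speed, y_speed = make_clean(200, SPEED_LIMIT)

# Poisoned examples: stop signs with the trigger present, mislabeled.
X_poison, _ = make_clean(10, STOP)
X_poison[:, TRIGGER_DIM] = 10.0
y_poison = np.full(10, SPEED_LIMIT)

model = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_stop, X_speed, X_poison]),
    np.concatenate([y_stop, y_speed, y_poison]),
)

# On clean test data, accuracy stays high, so nothing looks suspicious.
X_test, y_test = make_clean(100, STOP)
print("clean accuracy:", model.score(X_test, y_test))

# With the trigger added, the same stop signs become "speed limit" signs.
X_test[:, TRIGGER_DIM] = 10.0
print("triggered accuracy:", model.score(X_test, y_test))
```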

IARPA notes that Trojan attacks’ specificity “differentiates them from the more general category of data poisoning attacks, whereby an adversary manipulates an AI’s training data to make it just generally ineffective.”

Intelligence agencies can guard against Trojan attacks via cybersecurity tools that protect the training data and by using data cleaning techniques to ensure the training data is accurate. However, these tactics have clear limitations, IARPA notes.
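One simple cleaning tactic of this kind, sketched below as a hypothetical nearest-neighbor filter, flags training examples whose labels disagree with similar examples; its limitation is that a subtle trigger need not look anomalous at all.

```python
# Hypothetical data-cleaning filter: flag training examples whose label
# disagrees with the majority label of their nearest neighbors.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def flag_suspicious(X, y, k=5):
    """Return indices whose label loses the k-nearest-neighbor vote."""
    # k + 1 neighbors, because each point is its own nearest neighbor.
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nbrs.kneighbors(X)
    suspicious = []
    for i, neighbors in enumerate(idx):
        votes = y[neighbors[1:]]  # skip the point itself
        if np.bincount(votes).argmax() != y[i]:
            suspicious.append(i)
    return suspicious

# Example: a single injected label error in one cluster gets flagged.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
y[3] = 1  # the label error
print(flag_suspicious(X, y))  # expect [3]
```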

Modern AI advances are “characterized by vast, crowdsourced datasets” that are “impractical to clean or monitor,” IARPA says. Further, most customized AI applications used by intelligence agencies are created via what is known as “transfer learning.” 

An agency will typically take an existing AI system published openly online and modify it slightly for a new use case. Trojans can persist in an AI system even after such transfer learning, IARPA says. 
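A typical transfer-learning recipe looks something like the PyTorch sketch below (shown for illustration; the agencies’ actual tooling is not public). Because the pretrained backbone is reused wholesale, any Trojan embedded in its features survives into the new application.

```python
# Typical transfer-learning recipe: take a public pretrained network,
# swap the final layer, and fine-tune only the new head. If the
# downloaded backbone carried a Trojan, freezing it preserves the
# trigger behavior in the features the new head consumes.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor (where a Trojan would live).
for param in backbone.parameters():
    param.requires_grad = False

# Replace the classification head for the new, agency-specific task.
num_classes = 10  # hypothetical class count for the new use case
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

# Only the new head is trained; the possibly Trojaned features are kept.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
```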

“The security of the AI is thus dependent on the security of the entire data and training pipeline, which may be weak or nonexistent,” the agency adds. “As such, the focus for the TrojAI program is on the operational use case where the AI is already trained: detect if an AI has a Trojan, to determine if it can be safely deployed.”
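How that detection will work is exactly the open research question, but a naive probe over an already-trained model might resemble the hypothetical sketch below, which stamps candidate trigger patches onto known-good inputs and measures how often predictions flip:

```python
# Hypothetical probe of an already-trained model: stamp candidate
# trigger patches onto inputs the model handles correctly and measure
# how often its predictions flip. Not IARPA's method, just a sketch.
import numpy as np

def trojan_suspicion(predict, clean_inputs, candidate_triggers):
    """Score candidate triggers by the fraction of predictions they flip.

    predict: callable mapping a batch of images to predicted labels.
    clean_inputs: images the model is known to classify normally.
    candidate_triggers: dict of name -> function that stamps a patch
        onto a single image and returns it.
    """
    baseline = predict(clean_inputs)
    scores = {}
    for name, stamp in candidate_triggers.items():
        probed = np.stack([stamp(img.copy()) for img in clean_inputs])
        flipped = predict(probed) != baseline
        scores[name] = flipped.mean()  # near 1.0 looks trigger-like
    return scores
```

A flip rate near 1.0 for some small patch would be trigger-like behavior; the hard part, and the reason throughput matters, is searching the enormous space of possible triggers for every model under inspection.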

To win the IARPA award, vendors will need to deliver a system that can process about 1,000 AIs per day. The program, which is expected to run roughly 24 months, will be broken into multiple phases with increasingly strict accuracy standards, according to Nextgov.