Creation of an AI solution follows a complex lifecycle, and governance and assurance must cover it end to end. In particular, AI models require security and transparency through three phases:
Data ingestion: The data corpus used to train an AI model has a significant impact on whether outputs are accurate and trustworthy. For example, an AI solution intended to provide guidance on medications for the general population must be diverse across the board; it can’t be trained only with data on Caucasian males under age 25. Training with the wrong data set, or with data that has been compromised, can result in bias and inaccuracies.
It helps to have teams of data scientists with diverse backgrounds and experiences working on the data sets. They can help ensure unbiased data, adherence to AI ethical principles and the creation of trusted and responsible AI models at the beginning of the lifecycle.
Model implementation: Many AI solutions today are black box implementations, where the AI algorithm and the data used to train it remain under wraps. For many government use cases, this process can erode public trust.
DISCOVER: FedTech Influencers identify AI security issues.
Civilian agencies that don’t deal with more privileged data will typically use an open-source AI model trained on broad data sets taken from the internet. If they further train the model with agency-specific data, they’ll need to make sure the data is anonymized or that personally identifiable information is otherwise protected.
If the solution is intended to make recommendations about, say, which citizens should have their tax returns audited, then the agency should be transparent about how those decisions are made.
Model optimization: A characteristic of LLMs is that they’re continually fine-tuned with new data. Ideally these updates will make them more accurate, but they can also cause outputs to drift or degrade over time if the data becomes less representative of real-world conditions.
This reality also introduces security concerns because AI models can be poisoned with false or junk data, so it’s imperative that organizations carefully manage the data being used to refine their AI models.
New AI Capabilities Require New Security Protections
AI promises a wealth of new capabilities, but it also introduces a range of new cybersecurity threats, including:
Poisoning: Poisoning introduces false data into model training to trick the AI solution into producing inaccurate results.
Fuzzing: Malicious fuzzing presents an AI system with random “fuzz” of both valid and invalid data to reveal weaknesses.
Spoofing: Spoofing presents the AI solution with false or misleading information to trick it into making incorrect predictions or decisions.
Prompt injection: With prompt injection, attackers input malicious queries with the intention of generating outputs that contain sensitive information.
Protecting against these AI-specific threats involves cyber practices such as strong identity and access controls, penetration testing and continuous monitoring of outputs. Purpose-built solutions, such as input validation and anti-spoofing tools, are also increasingly available and valuable.
Another powerful way to protect AI data, models and outputs is through encryption while data is at rest (stored on a drive), in transit (traversing a network) and in use (being queried in a CPU). For many years, encryption was practical only when data was at rest or in transit; with new technology, however, it’s now available and ready for data in use.
RELATED: The White House wants agencies using quantum cryptography by 2035.
Such confidential computing is enabled by technology that sets aside a portion of the CPU as a secure enclave. Data and applications in the enclave are encrypted with a key that’s unique to the CPU, and the data remains encrypted as users access it.
Confidential computing is available in the latest generation of microprocessors and can encrypt data at the virtual machine or container level. Public cloud providers are also beginning to offer confidential computing services.
For agencies, confidential computing addresses a core tenet of zero-trust security because it applies protections to the data itself.
Encryption of data at rest, in transit and in use strengthens security across the AI lifecycle, from data ingestion to model implementation and optimization. By securing the building blocks of AI systems, organizations can achieve governance and assurance to make models and outputs more accurate for the agencies that use them and more trustworthy for the constituencies they serve.