Where Data Gaps Create Operational Risk
AI systems operate on what they are given. When the data feeding AI is incomplete, duplicated, mislabeled or outdated, those weaknesses surface in decision-making. In federal environments, where systems support national security and essential citizen services, these flaws carry significant operational consequences.
Poor, unmanaged data introduces predictable problems. Among the most common in AI pipelines: duplicate records that skew counts, inaccuracies that bias training, data bloat that slows performance and unstructured inputs that lack consistent labeling.
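Several of these problems can be caught before data ever reaches a model. The sketch below shows a minimal hygiene pass, assuming records arrive as Python dictionaries with hypothetical "record_id" and "label" fields; the field names are illustrative, not drawn from any specific agency system.

```python
# Minimal data-hygiene pass: flag duplicate IDs and unlabeled rows
# before records feed model training. Field names are illustrative.
from collections import Counter

def hygiene_report(records, id_field="record_id", label_field="label"):
    """Summarize duplicates and missing labels in a batch of records."""
    id_counts = Counter(r.get(id_field) for r in records)
    duplicates = [rid for rid, n in id_counts.items() if n > 1]
    unlabeled = [r for r in records if not r.get(label_field)]
    return {
        "total": len(records),
        "duplicate_ids": duplicates,        # duplicates inflate counts
        "unlabeled_count": len(unlabeled),  # inconsistent labeling
    }

rows = [
    {"record_id": "A1", "label": "approved"},
    {"record_id": "A1", "label": "approved"},  # duplicate skews counts
    {"record_id": "B2", "label": ""},          # missing label
]
report = hygiene_report(rows)
print(report["duplicate_ids"], report["unlabeled_count"])  # -> ['A1'] 1
```

A check like this is deliberately simple; the point is that counts, duplicates and labeling gaps are measurable before they become model behavior.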
These conditions directly shape performance: an AI system is only as accurate, consistent and complete as its inputs. In federal environments, where transparency, fairness, security and public trust are paramount, data quality carries reputational consequences as well as operational ones.
Strengthening the Data Foundation
Improving AI reliability begins with data management and data hygiene practices.
Visibility is foundational. Agencies must understand what data exists, where it resides – whether on-premises, in the cloud or across hybrid architectures – how it is used and who owns it.
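That visibility can start as something as simple as a registry that answers "what data exists, where, and who owns it." The following is a minimal sketch under assumed conventions: each dataset is tracked with a location tag ("on-prem", "cloud" or "hybrid"), a named owner and a stated purpose; all names are illustrative.

```python
# Minimal data inventory: record where each dataset lives, who owns it
# and why it is retained. Dataset names and owners are illustrative.
inventory = {}

def register_dataset(name, location, owner, purpose):
    """Add a dataset to the inventory with location, owner and purpose."""
    inventory[name] = {"location": location, "owner": owner, "purpose": purpose}

def datasets_by_location(location):
    """Answer 'what data resides where' from the inventory."""
    return [n for n, meta in inventory.items() if meta["location"] == location]

register_dataset("benefits_claims", "on-prem", "Office of Operations", "claims processing")
register_dataset("service_logs", "cloud", "IT Services", "performance monitoring")
print(datasets_by_location("cloud"))  # -> ['service_logs']
```

In practice this role is filled by a data catalog product, but even a lightweight registry makes ownership and location queryable rather than tribal knowledge.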
Policy alignment reinforces that clarity. Defining what data is collected, why it is retained and when it should be removed prevents AI systems from drawing on outdated or irrelevant information. Define who does what, when and with which data. Establish basic data contracts so fields and formats are consistent, and enforce policies in software so access, retention and lineage are applied the same way every time.
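Enforcing a data contract in software can be as small as a validation function that checks required fields, expected types and a retention window. The sketch below assumes an illustrative contract shape and a 365-day retention value; neither is prescribed by the source.

```python
# Hedged sketch of a data contract enforced in code: required fields,
# expected types and a retention window. Contract shape and the
# 365-day retention period are illustrative assumptions.
from datetime import datetime, timedelta

CONTRACT = {
    "fields": {"case_id": str, "opened": str, "status": str},
    "retention_days": 365,
}

def validate_record(record, contract=CONTRACT):
    """Reject records that break the contract or exceed retention."""
    for field, ftype in contract["fields"].items():
        if field not in record or not isinstance(record[field], ftype):
            return False, f"bad or missing field: {field}"
    opened = datetime.fromisoformat(record["opened"])
    if datetime.now() - opened > timedelta(days=contract["retention_days"]):
        return False, "record past retention window"
    return True, "ok"

ok, reason = validate_record(
    {"case_id": "C-9", "opened": datetime.now().isoformat(), "status": "open"}
)
print(ok, reason)  # -> True ok
```

Applying the same check at every ingestion point is what makes policy enforcement consistent, rather than dependent on each team remembering the rules.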
Governance structures further strengthen stability. Clarify permissions and decision authority to ensure updates and corrections are applied consistently. Defined ownership enables faster response when data quality issues affect AI outputs.
Start with low-risk use cases. Beginning with controlled, lower-risk use cases allows agencies to refine both AI models and the data practices that support them before scaling into more sensitive, citizen-facing environments.
Data Infrastructure Decisions Determine Practices
Data management practices are closely tied to infrastructure decisions. AI depends on fast, reliable access to both structured and unstructured data. As data volumes grow, storage environments originally designed for traditional reporting and transactional workloads can face limitations when supporting model training and inferencing at scale.
Federal data environments are constantly evolving. Agencies manage vast volumes of structured and unstructured information across legacy systems, on-premises environments, and cloud and hybrid architectures. As AI initiatives expand, models increasingly draw from distributed and often disconnected sources, which can introduce variability that affects reliability and performance.
When data practices, infrastructure decisions and AI initiatives advance together, agencies build the foundation required for AI performance they can operationalize at scale.
Data Readiness Determines AI Impact
Federal agencies are demonstrating progress in AI deployment, but sustained impact depends on data readiness. Clean, accessible and well-managed data enables AI systems to deliver consistent, reliable outcomes. Fragmented or poor-quality data introduces variability that limits scalability and increases operational risk. Agencies that implement data readiness practices and modernize their infrastructure will be positioned to operationalize AI at scale.
As federal AI accelerates, success will be defined by the strength of the data foundation supporting it. Agencies that treat data as a strategic asset will translate AI innovation into measurable, mission-aligned results. Those that do not risk seeing promising initiatives constrained by preventable data limitations.
