Ariga says data governance rests on four tenets: data inventory and availability, data reliability assessment, interoperability, and data loss prevention.
Some chief data officers mistakenly think they need a comprehensive data inventory before pursuing AI initiatives, an approach that can paralyze agencies, says Oliver Wise, former chief data officer at the Department of Commerce and now executive director of the Bloomberg Center for Government Excellence at Johns Hopkins University.
Instead, effective data leaders should start by talking with business and program leaders to identify pressing organizational problems that AI can solve, then focus on preparing the specific data needed to address those problems.
“We don’t serve ourselves well as chief data officers by saying all of our data has to be 100% ready before we entertain the idea of AI,” Wise says. “We try to solve the problem first, then we build the data inventory along the way.”
Best Practices To Manage Data
Analysts say cloud providers and vendors offer data governance tools that catalog data, create metadata, enforce policies, and track the lineage, or flow, of data to ensure security, privacy and compliance. Microsoft Purview, AWS Lake Formation, Amazon DataZone and Google Cloud Dataplex offer cloud-based data governance, Bandyopadhyay says.
“These tools now use AI to automate things like discovery, classifying sensitive information, even predicting security risks,” he says.
Data observability tools are also critical, providing visibility into data quality, helping identify and resolve data issues quickly, and managing resource consumption, Shimmin says.
Companies are increasingly integrating these capabilities into unified platforms, creating more comprehensive data management solutions. Vendors in this space include Alteryx and Informatica, Shimmin says. Purview offers observability tools as well, Bandyopadhyay says.
“The market is rich with such tooling,” Shimmin says. “There are tons of options.”
A key best practice is to develop rich metadata that makes agency data searchable, accessible and machine interpretable for AI, Wise says.
For example, before Dunkin left the Energy Department, her team was developing an AI application to help scientists find research across all of the national labs. The team was building a metadata-driven “card catalog” to index siloed research data, so researchers could discover relevant work at other facilities and identify opportunities for collaboration while the data remained under strict access controls.
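The article does not describe how the Energy Department's catalog was built. As a rough illustration of the general idea only, the hypothetical sketch below indexes metadata "cards" for datasets that stay in their home systems and checks a user's access groups before any entry can match a search. All names here (`CatalogEntry`, `CardCatalog`, `allowed_groups`, the sample labs and datasets) are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical metadata 'card' describing one dataset; the data itself stays put."""
    dataset_id: str
    title: str
    lab: str                         # facility that owns the data (hypothetical field)
    keywords: frozenset              # searchable subject terms
    allowed_groups: frozenset        # groups permitted to discover this entry


class CardCatalog:
    """Hypothetical metadata index over siloed datasets, with access checks on search."""

    def __init__(self) -> None:
        self._entries: list = []

    def register(self, entry: CatalogEntry) -> None:
        self._entries.append(entry)

    def search(self, term: str, user_groups: set) -> list:
        term = term.lower()
        results = []
        for entry in self._entries:
            # Apply access control first, so restricted entries are never matched or returned.
            if not (entry.allowed_groups & user_groups):
                continue
            keywords = {k.lower() for k in entry.keywords}
            if term in entry.title.lower() or term in keywords:
                results.append(entry)
        return results


# Usage with made-up sample entries.
catalog = CardCatalog()
catalog.register(CatalogEntry("ds-1", "Battery materials screening", "Lab A",
                              frozenset({"batteries", "materials"}), frozenset({"public"})))
catalog.register(CatalogEntry("ds-2", "Fusion plasma diagnostics", "Lab B",
                              frozenset({"fusion"}), frozenset({"fusion-team"})))

print([e.dataset_id for e in catalog.search("fusion", {"public"})])                 # []
print([e.dataset_id for e in catalog.search("fusion", {"public", "fusion-team"})])  # ['ds-2']
```

Checking permissions before matching, rather than filtering results afterward, is one simple way to keep a shared index from revealing that restricted work exists. A production catalog would likely delegate those checks to the agency's identity and access management system.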