Information Pedigree

A move to a more proactive style of government oversight will require agencies to address data provenance.

As IT has evolved through the years, affecting our lives and livelihoods in increasingly significant ways, so too have the challenges and complexities of using technology.

Most feds are all too familiar with the difficult process and personnel concerns that relate not to whether a technology works but to whether it will work effectively where it’s needed.

Today, a new era of government has positioned IT as a core component of public policy and governing. This is not about making government more efficient through technology use — although IT does play an important role in managing government costs — but rather about using technology to change the way we govern.

The Obama administration has introduced the notion of transparency in government as a means to increase collaboration and participation, and it’s clear that IT will play a significant role in this initiative. Although increasing access to government information is not a huge change in government practice, it could be a foundation for more dramatic changes to come.

Some of the critical public-policy challenges of the day, such as the mortgage industry failure and the BP oil spill, have become rallying cries for more government regulation. A number of discussions around societal problems point toward a new reliance on technology in the regulatory process.

The most forward-looking notion from a technology perspective is the use of IT to move from a reactive regulatory posture (identifying a regulatory violation after the fact and seeking redress) to a proactive one (using technology to monitor activities in real time to prevent regulatory violations).

Let’s examine an area that’s vital to the success of the current transparency initiative and the use of IT for proactive oversight: the quality and pedigree of information. In contemporary lingo, that means discussing data or information provenance.

A Definition

Data provenance focuses on the original authorship and subsequent treatment of computational objects such as programs and data, including changes from one medium to another. It emphasizes information integrity and chain of custody and aggregation rather than content. It’s a tool for establishing trust in information, ensuring accountability, discovering error sources and correcting propagated errors. Data provenance is synonymous with data pedigree.
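The chain-of-custody idea above can be made concrete with a minimal sketch. The structure below is illustrative only, not any deployed pedigree system: it keeps a record of original authorship plus an ordered history of custody events (including changes from one medium to another), separate from the content itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class ProvenanceEvent:
    """One link in an object's chain of custody."""
    actor: str        # who performed the action
    action: str       # e.g., "created", "converted", "aggregated"
    timestamp: datetime
    detail: str = ""

@dataclass
class PedigreeRecord:
    """Pedigree metadata for a computational object, kept apart from its content."""
    object_id: str
    original_author: str
    history: List[ProvenanceEvent] = field(default_factory=list)

    def record(self, actor: str, action: str, detail: str = "") -> None:
        """Append a custody event to the object's history."""
        self.history.append(
            ProvenanceEvent(actor, action, datetime.now(timezone.utc), detail))

# Illustrative use: a dataset is created, then moved to another medium.
rec = PedigreeRecord("dataset-42", "Agency A")
rec.record("Agency A", "created", "collected from field sensors")
rec.record("Contractor B", "converted", "CSV to XML")
print(len(rec.history))  # prints 2
```

Because the record emphasizes who touched the object and when, rather than what the object says, it supports the accountability and error-tracing uses described above.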

While researching this topic, I found a number of papers addressing the issue and discovered that a number of specialized pedigree management systems have been developed (and some even deployed). Most on point is a paper titled “Pedigree Management and Assessment in a Net-Centric Environment.” It details a Pedigree Management and Assessment Framework (PMAF) that implements a general-purpose, extensible system suitable for use in a net-centric environment of disparate systems — just the environment the federal government envisions for its data resources.

The PMAF enables the publisher of information to record standard provenance metadata about the source, manner of collection and chain of modification as information passes through processing and assessment. What’s most important is that the framework enables users to quickly estimate information quality — a critical need when using IT proactively for government oversight.

To greatly simplify the PMAF concept, separate repositories store information about pedigree and are queried in real time to create a quality assessment of information content. If multiple sources are accessed, each will have its own pedigree information, and the PMAF will assemble this data into an overall quality assessment. Implemented as software, the framework can provide quick, accurate and pointed analyses of quality based on pedigree using five discrete metrics (read “What’s the Value?”).
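The aggregation step can be sketched as follows. This is a hypothetical simplification, not the actual PMAF: the metric names and weights below are invented for illustration (the framework's five real metrics are described in "What's the Value?"). The idea is only that each source carries its own pedigree record, each record is scored, and the scores combine into one overall assessment.

```python
from statistics import mean

def score_source(pedigree: dict) -> float:
    """Score one source's pedigree on a 0.0-1.0 scale (illustrative weights)."""
    weights = {"known_author": 0.4, "unbroken_custody": 0.4, "recent": 0.2}
    return sum(w for key, w in weights.items() if pedigree.get(key))

def assess(pedigrees: list) -> float:
    """Combine per-source pedigree scores into an overall quality assessment."""
    return mean(score_source(p) for p in pedigrees)

# Two sources, each queried for its own pedigree record.
sources = [
    {"known_author": True, "unbroken_custody": True, "recent": True},
    {"known_author": True, "unbroken_custody": False, "recent": True},
]
print(round(assess(sources), 2))  # prints 0.8
```

In a real deployment the dictionaries would come from live queries against separate pedigree repositories, which is what lets the assessment run in real time without touching the content stores.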

"What if technology could be positioned to prevent actions that would damage the interests of our country and its citizens?"

— Paul Wohlleben

Why It Matters

Reflect for a moment on the volume and scope of information and data available online. Some of it is formal, with traditional citations of authorship; users tend to assign such publications the same credibility that’s given to printed material published by reputable sources. But much of what’s found online is published by individuals or groups whose authority is unknown; users must draw their own conclusions about the veracity of the information and react accordingly.

Although data pedigree is important for a reactive government, a reactive posture allows a more relaxed approach to determining data quality. In fact, the judicial system carries out much reactive oversight, by which the government seeks to enjoin future actions or seeks redress for damages caused by previous actions. Such judicial proceedings deal with “facts” presented by both sides, and judgments typically favor the party that provides the most credible information.

Fast forward to a proactive government approach. The basic tenets are the same: Carry out constitutional powers of government and protect the citizens, property and interests of the United States. But technology could allow the means of governing to change drastically.

For instance, what if technology could be positioned to prevent actions that would damage the interests of our country and its citizens? Might the BP spill have been prevented had the operational status of the blowout apparatus been actively monitored? A more proactive government is a double-edged sword, offering the promise of time-sensitive government action while significantly increasing the intrusion of government into the affairs of citizens and other interests.

The military practices a form of proactive government today when it uses unmanned aircraft to select and prosecute targets in real time. The pedigree of intelligence sources used to select targets is critical to success.

There are many significant government issues that could take advantage of proactive techniques and processes. These methods might be deployed to protect the environment, regulate financial markets and transactions, or manage nuclear power sources — to name just a few.

But any move toward using technology to more proactively govern also must consider ways to minimize intrusion into private interests. Data provenance is a critical element; the government needs to be fully aware of the quality of information it uses, and those affected must have confidence that proactive government decisions are based on high-quality information.

Jul 29 2010