Dec 31 2009

The In and Out of SOA

With their enterprise architectures as groundwork, agencies look to service-oriented architecture as a way to expose and exchange data from disparate systems — again and again.

Every day at the nation's ports, Customs and Border Protection agents typically make 135 arrests, confiscate 2,300 pounds of narcotics and contraband, refuse 1,200 people seeking to enter the country, seize 200 fraudulent documents and nab 50 criminal aliens. They do this while monitoring the daily arrival into the country of 65,000 truck, air and sea containers, 75,000 people, 1 million trucks, 2,600 aircraft and 365,000 vehicles.

To keep up this pace, the agents need access to a lot of data from many different sources, both within the Homeland Security Department and at other federal and state agencies. It's a mammoth challenge, but one that DHS officials hope to make easier, and speedier, through the use of service-oriented architecture and Web services.

It's no secret that government efforts to eliminate data silos, integrate systems or share data have had mixed and limited success. The diversity of operating systems, programming languages, hardware and network architectures has discouraged many agencies from contemplating sweeping integration projects because they require expensive and time-consuming upgrades of legacy systems. SOA, admittedly just now emerging from its hype phase, could enable data connectivity without the need to lay hands on the underlying applications.

"In the past, many people had point-to-point access to other agencies' systems. But they had to sign on to them separately and learn how to use them," says Rod MacDonald, assistant commissioner for information and technology and CIO for Customs and Border Protection. In contrast, the SOA model lets an agency use another group's data from within its own applications and embody its own business rules and workflows.

SOA is a small step beyond Web-enabling applications — something that has been progressing among agencies. If you can let users in an agency (as well as citizens) access back-end applications using the Web, why not remove the browser interface, scrape the data and let other organizations place it within their own Web-enabled applications? SOA is the enabling technology to do just that through a collection of standards-based Web services that use eXtensible Markup Language to create the connections among disparate systems.
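As a rough illustration of the idea, the sketch below wraps a back-end record in XML on the provider side and unpacks it on the consumer side. It uses only Python's standard library; the element names (Record, ContainerId, PortOfEntry) and the sample values are invented for illustration and do not come from any agency system. Real Web services would typically add a formal schema and a messaging envelope on top of this.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict) -> str:
    """Provider side: serialize a flat record as an XML document.

    Keys must be valid XML element names (hypothetical names are used below).
    """
    root = ET.Element("Record")
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(payload: str) -> dict:
    """Consumer side: turn the XML back into a dict for use in a local application."""
    root = ET.fromstring(payload)
    return {child.tag: child.text for child in root}


# Hypothetical sample data, not real shipment information.
payload = record_to_xml({"ContainerId": "C-1001", "PortOfEntry": "Long Beach"})
consumer_view = xml_to_record(payload)
```

The consumer never touches the provider's database or user interface; it sees only the tagged data, which is the separation the article describes.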

But MacDonald and others point out that even though the technology is sound, any large-scale project is fraught with challenges and stumbling blocks that go beyond the mere addition of XML tags to data in back-end systems.

"Projects for sharing data involve more than just letting other people access your database. You have to consider policy, security, workflow and many other issues," MacDonald says.

The government is trying to ramp up the ability of agencies to adopt SOA as they build out their infrastructures through enterprise architectures. There is an active group of agencies participating in a SOA Community of Practice, sponsored by the CIO Council. The members are refining a guide of practical tips and step-by-step details for SOA implementation that they hope to push out as a final version shortly.

Defining the Data

MacDonald says one of the most important things agencies can do to increase prospects for success is to concentrate initially on the data sets that people need to do their jobs. Doing so provides the double advantage of creating a small, manageable project while also developing a product that will be widely adopted. "You can create a very effective SOA interface, but if there are only a few consumers for that data, the project was just a waste of time and money," MacDonald says. Starting with the most important data sets also may result in an impressive return on investment that will encourage management to fund future SOA efforts, he suggests.


CBP's expectation is that SOA will improve the ability of border agents to get information about people caught trying to cross the border illegally. Agents not only have to determine someone's identity, but also nationality, citizenship and previous criminal history, among other things. The main problem, says MacDonald, is "gaining access to the wide variety of databases that might have information on any one individual."

For its first SOA pilot, CBP is attempting to resolve this problem through a federated query. The application makes SOA-based requests to three sets of databases — within CBP, at Homeland Security's Immigration and Customs Enforcement Directorate and at the State Department. Results are returned in a standard format that consolidates the data into a common view for agents based on the information drawn from the various systems, MacDonald says. That in and of itself is an advantage for agents, who previously had to peruse multiple and varied reports that displayed data in different ways.
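The general shape of a federated query can be sketched in a few lines. The example below is not CBP's implementation: the three source functions are hypothetical stand-ins that return hard-coded sample rows in deliberately different formats, and the Match fields are assumptions. What it shows is the pattern described above: send the same request to several sources, then normalize each answer into one common view.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Common view shown to the agent, whatever the source format was."""
    source: str
    full_name: str
    nationality: str


def query_cbp(name):
    # Hypothetical source 1: rows are dicts with "name" and "nat" keys.
    rows = [{"name": "Jane Doe", "nat": "CA"}]
    return [Match("CBP", r["name"], r["nat"])
            for r in rows if r["name"].lower() == name.lower()]


def query_ice(name):
    # Hypothetical source 2: rows are ("LAST, FIRST", nationality) tuples.
    rows = [("DOE, JANE", "CA")]
    wanted = ", ".join(reversed(name.upper().split(" ", 1)))
    found = []
    for raw_name, nat in rows:
        if raw_name == wanted:
            last, first = raw_name.split(", ")
            found.append(Match("ICE", f"{first} {last}".title(), nat))
    return found


def query_state(name):
    # Hypothetical source 3: given name and surname are stored separately.
    rows = [{"surname": "Doe", "given": "Jane", "citizenship": "CA"}]
    return [Match("State", f"{r['given']} {r['surname']}", r["citizenship"])
            for r in rows
            if f"{r['given']} {r['surname']}".lower() == name.lower()]


SOURCES = [query_cbp, query_ice, query_state]


def federated_query(name):
    """Ask every source in parallel and merge the answers in source order."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        result_lists = pool.map(lambda query: query(name), SOURCES)
        return [match for results in result_lists for match in results]


results = federated_query("Jane Doe")
```

Each adapter hides its source's quirks, so adding a fourth database would mean writing one more function and appending it to SOURCES rather than changing what the agent sees.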

CBP is testing response time, accuracy of responses, usefulness to border agents and the scope of responses. As an early result of the testing, CBP tightened up the query parameters because agents were getting too many near-matches, MacDonald says.
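The article does not say how CBP tightened its parameters. One common way to control near-matches, shown here purely as an assumption, is a similarity cutoff: raising the cutoff trades recall for fewer false hits. The sketch uses Python's standard difflib module; the names and cutoff values are invented for illustration.

```python
from difflib import get_close_matches


def name_matches(query, candidates, cutoff):
    """Return candidates whose similarity to query is at least cutoff (0 to 1)."""
    # n caps the number of results; allow every candidate through the cap.
    return get_close_matches(query, candidates,
                             n=max(1, len(candidates)), cutoff=cutoff)


# Hypothetical names on file.
CANDIDATES = ["Jon Smith", "John Smith", "Joan Smythe", "Jane Doe"]

loose = name_matches("Jon Smith", CANDIDATES, cutoff=0.6)   # permissive search
tight = name_matches("Jon Smith", CANDIDATES, cutoff=0.95)  # stricter search
```

With the stricter cutoff only the exact spelling survives, while the permissive search also returns "John Smith". Where to set the cutoff is a policy decision as much as a technical one, since a missed match can matter more than an extra one.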

Once it has the query process perfected, MacDonald says, the plan is to expand the app to additional agencies' databases.

Edward Siomacco, vice director for the Program Executive Office of Global Information Grid Enterprise Services at the Defense Information Systems Agency, points out that the more applications that can use a SOA service, the more cost-effective it will be. "If I have a map service that is used for situational awareness, that's good. But if I then also use the same map service for multiple other command-and-control applications, I've increased its value," he says.

DISA's SOA services include a security program, a search capability and a collaboration service that allows Web conferencing, including voice and video.

Wolf Tombe, CBP's chief technology officer, says that his organization will add Web-based data and functionality slowly as needs are identified. "We're planning to move [the SOA project] forward service by service — starting with the more important ones," Tombe says.

Proving the Value

The Environmental Protection Agency took the incremental approach, focusing on applications that dovetail with its primary functions. EPA began SOA services as a component of its Central Data Exchange program, which agencies and industry use to provide reports to EPA. CDX supports 26 data exchanges. To determine the most effective use of its initial six SOA applications, the agency conducted a series of business case studies documenting the impact of the data collection services, says John Sullivan, chief enterprise architect and associate director of EPA's Office of Environmental Information.

On the SOA Bandwagon
• Agriculture Department
• Defense Department
• Defense Information Systems Agency
• Environmental Protection Agency
• General Services Administration
• Homeland Security Department
• Justice Department
• Patent and Trademark Office
• Office of the Director of National Intelligence
• Office of Management and Budget

Sullivan says the six projects studied garnered ROI ranging from 3 percent to 271 percent, with an average ROI of 117 percent. "These savings were the result of customers using the system to improve their business processes and gain other administrative benefits. There were also data quality, timeliness, accessibility and security benefits," he says. For example, EPA eliminated 33 days of processing time for storm-water permits.

With the analyses under its belt, EPA is now developing a Universal Description, Discovery and Integration (UDDI) registry that will catalog its available services.

Mark Zalubas, CTO for consultant Merlin International of Englewood, Colo., agrees that a careful analysis of what data consumers need should precede any rollout of SOA services. But he adds that because two or more agencies may maintain the same data, it's important to know what other agencies are doing. "If you offer a data set that's available from three other agencies, your ROI will be one-quarter of what it would be if you're the only source for that data," Zalubas says.
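Zalubas' one-quarter figure follows if demand for a data set is split evenly among everyone who offers it, which is an assumption made explicit in the short sketch below. The function name is hypothetical.

```python
from fractions import Fraction


def expected_roi_share(other_sources: int) -> Fraction:
    """Fraction of the sole-source ROI left when demand is split evenly.

    Assumes every provider of the data set attracts an equal share of consumers.
    """
    if other_sources < 0:
        raise ValueError("other_sources must be non-negative")
    return Fraction(1, 1 + other_sources)


# Three other agencies offer the same data, so four providers share demand.
share = expected_roi_share(3)  # Fraction(1, 4)
```

In practice the split is unlikely to be even, since consumers tend to favor the most reliable source.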

Assuring Data Quality

If more than one agency is able to provide the same data, then in Darwinian fashion the one offering the most reliable information will squeeze out the other sources, because organizations will not build applications around less reliable data. Accordingly, EPA has created a set of quality assurance services. "They were developed to help our partners test their data prior to submission with schema validation and extended business rule evaluation," Sullivan says.
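A two-stage check of this kind can be sketched as follows. This is not EPA's service: the element names, the required-field table and the two business rules are all hypothetical, and the structural check is a simplified stand-in for full XML Schema validation, which Python's standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure rules: required element -> converter that raises
# ValueError when the text is not of the expected type.
REQUIRED_FIELDS = {"FacilityId": str, "ReportYear": int, "DischargeKg": float}


def check_structure(root):
    """Stage 1: every required element is present and has a parseable value."""
    errors = []
    for tag, convert in REQUIRED_FIELDS.items():
        node = root.find(tag)
        if node is None or node.text is None:
            errors.append(f"missing element: {tag}")
            continue
        try:
            convert(node.text)
        except ValueError:
            errors.append(f"bad value for {tag}: {node.text!r}")
    return errors


def check_business_rules(root):
    """Stage 2: hypothetical domain rules applied to structurally valid data."""
    errors = []
    if float(root.findtext("DischargeKg")) < 0:
        errors.append("DischargeKg must not be negative")
    if int(root.findtext("ReportYear")) < 1970:
        errors.append("ReportYear must be 1970 or later")
    return errors


def validate_submission(payload):
    """Return a list of problems; an empty list means the submission passed."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    # Business rules are only evaluated once the structure is sound.
    return check_structure(root) or check_business_rules(root)


good = ("<Report><FacilityId>F-1</FacilityId><ReportYear>2009</ReportYear>"
        "<DischargeKg>12.5</DischargeKg></Report>")
problems = validate_submission(good)
```

Letting partners run such checks before they submit moves the cost of bad data to the point where it is cheapest to fix.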

Even if the data is pristine, its value falls to zero if the terminology means different things to different organizations. SOA success turns on exposing the data in common ways rather than retooling back-end systems to allow point-to-point interfaces. "Semantics is an extremely important issue when sharing data. You have to make sure everyone is speaking the same language," says Kshemendra Paul, chief enterprise architect at the Justice Department. Paul is also the program executive for the National Information Exchange Model (NIEM) project, a joint Homeland Security and Justice initiative aimed at creating a standard vocabulary for law enforcement data.

For example, law enforcement organizations use different terms to describe criminal events. Words such as "incident," "event" and "case" can all mean the same thing, but for systems to exchange information they must use common terms with agreed definitions, Paul says. In another example, police often refer to people charged with committing crimes as "offenders," while courts refer to them as "defendants." When an agency crafts Web services using XML to tag the data, there must be agreement on the terms, Paul says.
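In code, that agreement amounts to a shared lookup from each system's local terms to one common set. The sketch below is illustrative only: the shared names ("Incident", "Subject") are invented for this example and are not taken from NIEM, and the records are made-up samples.

```python
# Hypothetical agreed vocabulary: local term -> shared term.
SHARED_TERMS = {
    "incident": "Incident",
    "event": "Incident",
    "case": "Incident",
    "offender": "Subject",
    "defendant": "Subject",
}


def to_shared_vocabulary(record: dict) -> dict:
    """Rename a record's fields to the shared terms before exchanging it."""
    shared = {}
    for local_term, value in record.items():
        key = SHARED_TERMS.get(local_term.lower())
        if key is None:
            # No agreement exists for this term, so refuse to guess.
            raise KeyError(f"no agreed term for {local_term!r}")
        if key in shared:
            raise ValueError(f"two local terms map to {key!r}")
        shared[key] = value
    return shared


# The same facts as a police system and a court system might label them.
police_record = {"offender": "J. Doe", "incident": "2009-1234"}
court_record = {"defendant": "J. Doe", "case": "2009-1234"}

police_shared = to_shared_vocabulary(police_record)
court_shared = to_shared_vocabulary(court_record)
```

After translation the two records are identical, so a receiving system can compare or merge them without knowing which agency's wording was used at the source.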

The semantics problem is large, and if Justice attempted to standardize and integrate all local, state, tribal and federal databases, the project could quickly become unmanageable. For that reason, Paul says, NIEM will be most useful in providing services to communities of interest as a framework for groups that share common data needs. "This approach eliminates the need to make all data NIEM-compliant. The communities' members can concentrate their efforts on data that is being shared across their domains," he says.

Building New

Although many SOA projects aim at sharing data from legacy systems, EPA's Sullivan sees the potential for new applications and new ways to enable Web services, the cornerstone of SOA. "The rules of the road for data warehousing and operational systems all have to evolve," Sullivan says. He points out that the manner in which services are created out of data "requires the underlying technologies to be consistent with the service-orientation concept." Traditional data warehousing rules are not predicated on the exchange of object services, he says. But with SOA, systems are much more interdependent.

Although SOA is a proven technology, it is not yet a proven process in the federal government. Federal entities will have little trouble with XML tagging, but they will be challenged by data quality, data redundancy, licensing, chargeback and ROI. Nonetheless, it's a promising tool, CBP's MacDonald says.

Although CBP is just beginning to implement SOA, he believes it will become a major factor in the information infrastructure at Homeland Security and, more broadly, within government generally. "One of the key reasons DHS was put together, at great expense, was to allow us to share data," he says. "If one agency within DHS knows something — say about a person — all agencies within DHS should have access to that information. SOA gives us a way forward to this goal."