Lessons learned from big interoperability in government
The distributed nature of data assets in the public sector is even more complex than that in the private sector. Governmental entities contend with multiple databases with varying access points, locations and architectures, as well as security issues of regional, national and international concern.
Government entities often have difficulty simply accounting for the basics of data management. Tasks such as locating relevant data via search, adjusting to new regulatory requirements or implementing sustainable data governance can drive up overhead costs. According to TopQuadrant CTO Ralph Hodgson, those organizations often fail to properly understand such concepts as: “How do you connect the dots across different systems to know how there’s dependency and movement of data?... How does data move?”
Hodgson and his colleagues have nearly 20 years of experience standardizing data for organizationwide interoperability for governmental entities ranging from the National Aeronautics and Space Administration (NASA) to organizations that work with the Food and Drug Administration (FDA). The resulting search, governance and regulatory compliance capabilities, along with the model-driven approach behind them, offer a template for private-sector organizations seeking similar benefits on a smaller scale.
Asset management
The asset management of government systems such as railway transportation in the United Kingdom is a prime example of the model-driven approach to standardizing data for interoperability. According to Hodgson, a uniform metadata repository for each asset forms the basis of the interoperability of data in different countries. “By assets we mean things like crossing signals, train detection equipment, everything you can manage on a railway,” he explains. Incorporating ontological models to describe those assets was instrumental in creating a comprehensive metadata repository. It was also essential to compile reference data. Once the repository was complete, it was able to feed the various downstream systems dependent on the interoperability of the data for the railway to function.
“They have asset management systems for individual signals and barriers; we don’t put all of the railway track of the U.K. into our system,” Hodgson acknowledges. “We store the types of tracks that they are. But there are other systems that need the output from our system to manage the railway.” The metadata and reference data descriptions supply that information for the timely operation of the railway across international boundaries, operating systems, source data and data types.
Terminology management and search
Another requisite for the interoperability of data of different origins and formats is the ability to accurately quantify them. “Data is only really interoperable when you know not just that it’s a string or integer, but when you know its temperature or pressure and its units of measure,” Hodgson says. The National Institute of Standards and Technology (NIST) employed TopQuadrant to standardize units of measure and quantities for multiple data types. According to Hodgson, doing so is a necessary step for interoperability because “that’s where things go wrong. Way back, NASA lost a Mars orbiter because one system thought it was communicating in one system’s units and it was working on a different system’s units.”
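The point about units can be illustrated with a minimal Python sketch. It is not TopQuadrant's or NIST's model. The Quantity class, the TO_SI table and the unit names are assumptions made up for this illustration. The idea is only that a value travels with its quantity kind and unit, so a mismatch is converted or rejected rather than silently misread.

```python
from dataclasses import dataclass

# Hypothetical reference data: factor that converts each unit to the SI unit
# of its quantity kind. A real system would load this from a shared vocabulary.
TO_SI = {
    ("force", "newton"): 1.0,
    ("force", "pound_force"): 4.4482216152605,
    ("pressure", "pascal"): 1.0,
    ("pressure", "psi"): 6894.757293168,
}


@dataclass(frozen=True)
class Quantity:
    """A number that carries its quantity kind and unit of measure."""
    value: float
    kind: str
    unit: str

    def to(self, unit: str) -> "Quantity":
        """Convert to another unit of the same quantity kind."""
        source = (self.kind, self.unit)
        target = (self.kind, unit)
        if source not in TO_SI or target not in TO_SI:
            # Unknown unit, or a unit that belongs to a different kind.
            raise ValueError(f"cannot convert {self.kind} from {self.unit} to {unit}")
        si_value = self.value * TO_SI[source]
        return Quantity(si_value / TO_SI[target], self.kind, unit)


thrust = Quantity(10.0, "force", "pound_force")
print(thrust.to("newton"))
```

Asking for a force in "psi" raises a ValueError here, which is the kind of check a bare float cannot provide.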
Terminology management is another integral aspect of achieving data interoperability. Semantic technologies based on standardized taxonomies and uniform vocabularies are critical for ensuring a commonality of meaning (and terminology) between data of different source systems. Those methods are frequently deployed to improve organizational search capabilities. “At NIST, the innovation is being able to improve the findability and relevance of datasets,” Hodgson says, “and also the ability to link things across experiments by having a consistent identity for things. So, when engineers and scientists do experiments they’ll have greater assurance they’ll find relevant materials.”
Optimizing data governance
The core of data interoperability is establishing a unique machine-readable identifier for each individual datum within an organization. That concept is particularly useful for optimizing data governance. One of the sure ways of implementing holistic, top-down governance of disparate data types is to semantically tag data with a Uniform Resource Identifier (URI).
“When you buy a data governance product from a vendor that isn’t working with this technology [and] enter the name of an organization or an application, how can you have assurance that that name is going to line up with another system that’s running in a different part of the organization?” Hodgson asks. “You build another silo.” Closely tied to this concern is the need to mint the URI correctly so that it’s expressed in a standardized way conducive to interoperability. According to Hodgson, governmental agencies “have their own approach to guaranteeing this same uniqueness of URIs.” Several different public-sector organizations have used URIs to buttress their governance capabilities with seamless interoperability.
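As a rough illustration of what minting a URI consistently might involve, the Python sketch below normalizes a name before placing it in a namespace. The base address https://example.org/id/, the mint_uri function and the normalization rules are all assumptions for this example. Each agency, as Hodgson notes, has its own approach.

```python
from urllib.parse import quote

# Hypothetical namespace; a real organization would use a domain it controls.
BASE = "https://example.org/id/"


def mint_uri(kind: str, name: str) -> str:
    """Build a URI from a resource kind and a human-entered name.

    The name is trimmed, lowercased and its runs of whitespace are replaced
    by single hyphens, so that variant spellings of the same name map to
    the same identifier.
    """
    slug = "-".join(name.strip().lower().split())
    return f"{BASE}{quote(kind.strip().lower())}/{quote(slug)}"


# Two systems entering the same organization name slightly differently
# still arrive at one identifier.
print(mint_uri("Organization", "  Flight  Operations "))
print(mint_uri("organization", "flight operations"))
```

Both calls print the same URI, so the two records can be linked rather than forming another silo. Names that differ in more than case and spacing would need a managed vocabulary, which this sketch does not attempt.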
Hodgson notes that NASA deployed that methodology to “use semantics to get a standards-based approach to data governance.” For the Constellation Program, an initiative last decade in which NASA attempted a manned spaceflight, the approach was used to model ontologies for telemetry. As a result, Hodgson says, “You could have a telemetry packet of a vehicle and understand what parameters were in there precisely for the encoding of the telemetry’s structure.”
Data lineage
An important corollary to the improved data governance capabilities of interoperable big data is the heightened traceability it engenders. The timely integration and aggregation of data for immediate exchanges between varying operating systems would mean little without determining the lineage of a particular dataset. Such provenance is critical to justifying the results of analytics or demonstrating regulatory compliance. It also yields operational value in illustrating how certain events affect the outcome of workflows.
“Lineage is knowing how your data’s moving in your organization and whose business processes it’s supporting or whose business capabilities are dependent on what,” Hodgson says. Lineage is also one of the outputs of effective data governance. Accessible data provenance brings organizations a step closer to transparency, which can be difficult to achieve with the varying sources and deployments involved in many public-sector entities. An important byproduct of the governance and lineage gains from interoperable big data is greater sustainability, which reduces the cost and time of future work. According to Hodgson, “Connecting the dots or having a clear line of sight, as it’s sometimes called, is an example of a knowledge graph: a lineage knowledge graph.”
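A lineage graph of this kind can be sketched in a few lines of Python. The system names and the FEEDS mapping below are invented for illustration and do not describe any agency's systems. The sketch assumes lineage is recorded as "this dataset feeds that one" edges and walks them breadth-first to find everything that depends on a given source.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets or processes listed.
FEEDS = {
    "sensor_readings": ["asset_repository"],
    "asset_repository": ["maintenance_scheduling", "compliance_report"],
    "maintenance_scheduling": ["work_orders"],
}


def downstream(source: str, feeds: dict) -> set:
    """Return everything that directly or indirectly depends on source."""
    seen = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in feeds.get(node, []):
            if target not in seen:  # guards against revisiting and cycles
                seen.add(target)
                queue.append(target)
    return seen


print(sorted(downstream("asset_repository", FEEDS)))
```

Reversing the edges and running the same traversal would answer the provenance question instead: where did a given report's data come from?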
Mandatory requirements
By adopting a standardized approach to interoperable big data throughout an assortment of information systems, organizations are able to expedite their compliance with mandatory requirements. Other than finance, the vertical most heavily affected by an influx of regulatory and other mandatory requirements is healthcare and the life sciences. The Clinical Data Interchange Standards Consortium (CDISC), which focuses on clinical trials, has attempted an interoperable model-driven approach to its standards for work affiliated with the FDA. “They’re influential in terms of the direction the FDA is moving,” TopQuadrant COO Robert Coyne remarks. “Clinical data interoperability hits on the issues of unique identifiers. How else do you relate things across all of these different clinical trials and different companies in different phases of research?”