Looking backward and forward

The Roman god Janus watched out for gates and doors, peace and war, and beginnings and endings. Now we think of January as the start of a new year, filled with promise. Most people do not think too much about looking at where we have been and even less about the two-faced Janus who looks both backward and forward.

Ignorance of the past may be psychologically blissful. But repeating errors or technology goofs can be expensive, embarrassing and potentially job threatening. The job impact is what may give some information professionals pause.

Janus might be the god pulling the strings of XML (extensible markup language). The evolution to XML reaches back to GenCode in 1967. Those who wanted to make it easy for computers to process text can thank IBM researcher Charles Goldfarb for SGML (standard generalized markup language), which was based on an earlier generalized markup language (GML). From SGML, we moved to HTML (Sir Tim Berners-Lee and Anders Berglund) in the 1990s. The Web propelled markup progress with XML.

At each phase of the markup evolution, the U.S. government was on board; it has supported markup languages for decades. How is XML working out for U.S. government agencies wrestling with information management challenges that seem intractable, almost unbelievable?

Twenty-one e-mail systems

Steven VanRoekel, the new CIO reporting to President Obama, said in October 2011 that XML would be an important part of his Future First program. As an example of information challenges the government faces, VanRoekel noted that the U.S. Department of Agriculture (USDA) has 21 different e-mail systems within it (source: bizjournals.com/sanjose/blog/2011/10/top-us-it-chief-talks-challenges.html).

XML comes in many varieties. There is Deutsche Bank's launch of the ISO 20022 Version 03 format. There are variants like BPEL (business process execution language), XTML (extensible telephony markup language) and dozens of others. One issue that arises in many organizations is that each variant of XML often requires manual tweaking and then custom scripts to normalize the "transparent" elements of a content object. Stated another way, XML can slap a time and cost penalty on information access activities.
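To make the "custom scripts" point concrete, here is a minimal Python sketch of the kind of normalization script an organization might end up writing. Everything in it is an assumption made for illustration: the two message variants, the tag names and the TAG_MAP vocabulary are hypothetical and do not come from any real XML standard or vendor product.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from variant-specific tag names to one common vocabulary.
TAG_MAP = {
    "sender": "from",
    "originator": "from",
    "recipient": "to",
    "addressee": "to",
    "subject": "subject",
    "title": "subject",
}


def local_name(tag):
    """Strip an XML namespace such as '{urn:example}sender' down to 'sender'."""
    return tag.rsplit("}", 1)[-1]


def normalize(xml_text):
    """Return a <message> element that uses only the common tag names."""
    source = ET.fromstring(xml_text)
    message = ET.Element("message")
    for child in source:
        common = TAG_MAP.get(local_name(child.tag).lower())
        if common is None:
            continue  # unknown elements are dropped in this sketch
        ET.SubElement(message, common).text = (child.text or "").strip()
    return message


# Two made-up variants that carry the same information under different tags.
variant_a = "<mail><sender>a@example.gov</sender><title>Budget</title></mail>"
variant_b = (
    '<m:msg xmlns:m="urn:example">'
    "<m:Originator>a@example.gov</m:Originator>"
    "<m:Subject>Budget</m:Subject>"
    "</m:msg>"
)

normalized_a = ET.tostring(normalize(variant_a), encoding="unicode")
normalized_b = ET.tostring(normalize(variant_b), encoding="unicode")
```

Both variants reduce to the same record here, but only because someone wrote the mapping table by hand. Each new variant means another round of entries, which is where the time and cost penalty comes from.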

Some XML variations are minor and can be easily resolved. Most Web pages use a higher-level markup language that irons out some differences. However, within an enterprise or a national government, there will be many different file types, data formats and extensible markup language variations.


Years ago, I came across some information about the cost of data transformation. For many smaller organizations, importing and exporting data requires little more than a copy of Microsoft Word and the choices on the File Save As menu. When I need to save a copy of a PowerPoint presentation, I can select from a range of choices, and the system produces a file that can be opened in Apple's Keynote program.

Normalizing e-mail to XML can be done with technology from MarkLogic. The company has a clever demonstration of MarkMail at marklogic.com/products/demos.html. In theory, then, President Obama could embrace the MarkLogic XML technology to solve the U.S. government's information woes. But a single-source solution poses a problem, even if the method is a World Wide Web Consortium "standard."

There is the problem of different data types. MarkLogic's file transformation methods work and serve many publishers on a daily basis. The company makes use of connectors. Those are software "machines" that ingest information in one format and output it in another format. For example, MarkLogic can process the information in an IBM Lotus Notes repository. The technology comes from MarkLogic's engineers and from third parties like ISYS Search Software, an enterprise search vendor now in the business of providing file conversion software "machines" to companies like MarkLogic.
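The connector idea can be sketched in a few lines of Python using only the standard library. This is an illustration of the general pattern (ingest one format, emit another), not a description of how MarkLogic or ISYS connectors are built. The function name email_to_xml, the output element names and the sample message are all hypothetical.

```python
from email import message_from_string
import xml.etree.ElementTree as ET


def email_to_xml(raw_message):
    """Ingest an RFC 822 style e-mail and output a small XML record."""
    parsed = message_from_string(raw_message)
    record = ET.Element("email")
    # Copy a handful of headers into child elements, skipping any that are absent.
    for header in ("From", "To", "Subject", "Date"):
        value = parsed.get(header)
        if value is not None:
            ET.SubElement(record, header.lower()).text = value
    # Only plain single-part bodies are handled in this sketch.
    body = parsed.get_payload()
    ET.SubElement(record, "body").text = body.strip() if isinstance(body, str) else ""
    return ET.tostring(record, encoding="unicode")


# A made-up message used only to exercise the connector.
RAW = (
    "From: clerk@example.gov\n"
    "To: office@example.gov\n"
    "Subject: Crop report\n"
    "\n"
    "Figures attached.\n"
)

xml_record = email_to_xml(RAW)
```

A production connector also has to cope with attachments, character encodings and proprietary repositories, which is the part that vendors charge for.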

Assume you have a range of XML content. You want to manipulate the content and take advantage of the tags in the document. Before you can let XML processing systems work their magic, you might need to perform a conversion of some sort. Firms like EntropySoft and Kapow Tech offer specialized content transformation services. EntropySoft markets its own line of connectors for performing file conversion. The Kapow Tech approach offers a wide range of technical services that help the licensee cope with proprietary file and content processes. But those two firms offer high-end products at prices that may be out of reach for some system administrators.

What is the cost of file conversion and transformation when the needed functionality is not baked into Microsoft Word, Excel or PowerPoint?

In our research, we found that converting content from one format to another could chew up as much as 25 percent of an IT unit's budget in some organizations. The assumption was that file conversion and transformation were trivial, not a material cost. The reality was that one-fourth of the available budget dollars were required to cover the cost of a shift to XML.

If the cost is near zero, no problem. But if the cost rises to a double-digit share of the budget, big problem.

My concern is that we are now in an era that has no memory and believes that there is a silver bullet that will address the type of information problem rendered vivid in the USDA's 21 e-mail systems.

A flood of digital petabytes besieges many organizations, not just government agencies. Furthermore, one more XML initiative will not change the proliferation of "chimney solutions." Each department or agency creates a solution that meets a particular set of needs.
