Fueling the digital renaissance
By Hugh McKellar, KMWorld editor in chief
In these pages, we consistently celebrate the impact of knowledge management tools in the business environment, but we don't often get a chance to talk about how these technologies have affected other parts of society. So, when we heard about a project the Smithsonian Institution was undertaking with content services provider Innodata Isogen, it seemed appropriate to recognize the increasingly important field of "humanities computing"—bringing the rich historical, scientific, artistic and literary collections of museums, universities and libraries to the public.
This particular Smithsonian project involves digitally preserving the complete record of The United States Exploring Expedition (USEE) of 1838 to 1842 under the command of Charles Wilkes, the first federally funded mission of exploration in American history. The anthropological and scientific records from the journeys formed what was to become the Smithsonian's very first collection.
USEE was one of the most important and least known American expeditions, and until this initiative neither the academic community nor the general public had sufficient opportunity to explore the records of the voyage. In all, thousands of narrative diary records, scientific plates and maps needed to be photographed and digitized for public access.
Explains Martin Kalfatovic, who heads the Smithsonian's Digital Library program, "In 1838, the U.S. government appropriated about $100,000 to send a naval squadron, along with a relatively large scientific corps, around the world to collect samples to, basically, show the American flag. Wilkes led the ships around Cape Horn, up to Hawaii and along the Pacific Northwest Coast."
The expedition team landed in what is now Oregon and Washington, which were then British territories, and went back down to Australia and even explored Antarctica. In fact, says Kalfatovic, Wilkes was the first person to confirm that Antarctica was a continent, not just a collection of ice-bound islands. The expedition then came up through the Philippines, back down around the Cape of Good Hope in Africa and then back up the east coast, says Kalfatovic.
In terms of the project, "We started with the anthropological collection of which there are about 1,600 pieces left in our collection, and have extracted the records and are working to link images of the objects into the text we have," he says, adding there were five volumes of narrative and another 15 or so volumes of scientific and anthropological documentation.
It took almost 30 years before the final volume was published. And even then, there were still four titles that were never published. The printing cost totaled another $100,000.
Kalfatovic called upon Preservation Resources, a division of OCLC, for the scanning itself, and then Innodata Isogen set about the task of data conversion. Peter Kaufman, Innodata Isogen's director of strategic initiatives, explains that first his company ensures that the scanned images are free of any errors. "At that point, we run them through a tagging and coding process to match what the Smithsonian has also begun to create, which is its own DTD (document type definition), essentially the overarching tagging and coding for anthropological and biological documents the Smithsonian is attempting to put online."
Scans of all voyage materials were converted into accessible XML data files, what Kaufman calls "TEI Lite" (Text Encoding Initiative), and content was tagged and coded to provide searchable text. The result allows anyone interested to investigate Wilkes' diary entries on Antarctic pack ice, review maps made of Pacific island channels, examine renderings of exotic birds and wildlife, read about early 19th-century whaling and check out Fijian war canoes.
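For readers curious about what tagging and coding for searchable text can look like in practice, here is a minimal sketch in Python. It is not the Smithsonian's DTD or Innodata Isogen's actual process: the XML fragment, its element layout, the placeholder sentences and the search_entries helper are all assumptions made up for illustration, loosely modeled on common TEI element names such as div, head and p.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment. The element names follow common TEI usage,
# but the structure and the placeholder text are invented for illustration;
# they are not taken from the expedition records.
SAMPLE = """
<text>
  <body>
    <div type="entry" n="1">
      <head>Placeholder entry one</head>
      <p>Illustrative sentence mentioning pack ice.</p>
    </div>
    <div type="entry" n="2">
      <head>Placeholder entry two</head>
      <p>Illustrative sentence mentioning war canoes.</p>
    </div>
  </body>
</text>
"""


def search_entries(xml_text, term):
    """Return the headings of entries whose paragraphs contain the term."""
    root = ET.fromstring(xml_text)
    hits = []
    for div in root.iter("div"):
        # Join the text of every paragraph in this entry into one string.
        body = " ".join(p.text or "" for p in div.iter("p"))
        # Case-insensitive substring match against the entry text.
        if term.lower() in body.lower():
            hits.append(div.findtext("head"))
    return hits


print(search_entries(SAMPLE, "pack ice"))
```

Because each entry is wrapped in its own tagged element, a search can return the heading of the matching entry rather than a bare line of text, which is the practical payoff of encoding scans as structured XML instead of leaving them as page images.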
Kaufman says this movement toward humanities computing addresses what he calls the content supply chain, which essentially stretches from defining a mission about what to collect, publish and digitize to the final product of an online exhibit, book or other materials.
"[The content supply chain] goes through the design and conception of a collection of materials, to the collection of those materials, to organizing and classifying," says Kaufman. "Helping to define the mission are scholars, curators and librarians, and we all have to figure out how to store, manage and preserve this material for today's needs as well as tomorrow's.
"We're all very much at the beginning of this. It's the equivalent of the space program. There's so much more to do."
The entire record of the Wilkes expedition is accessible online through the Smithsonian's "Galaxy of Knowledge" Web site.