Letting data out of its box

This article appears in the November/December 2011 issue [Vol 20, Issue 10]

I'm writing this from the audience of a conference of publishers of paper directories (yellow pages and the like). Most of them have, of course, already moved online, but they are hoping that the value of the data they have accumulated will protect them from native Web competitors. If a simple search engine will meet the needs of most users, then the data becomes the differentiator, or so they hope. Perhaps more realistically, these businesses recognize that they have another asset that is better able to compete against what the Web brings for free: Directory publishers have huge sales forces that know how to sell listings and ads.

I can't judge the economic value of a large, trained sales force as the Web evolves its business models. But I'm fairly confident that the value of the data that directory publishers deal with is changing. Data is changing because its nature, role and context are. (And pardon me if I flagrantly switch between "data" as singular and plural, depending on which sounds correct at the moment.)

A datum originally meant that which is given, and came to refer to the sensations that come through our sense organs whether we wanted them or not. What we made of data was up to us—maybe you'll disregard that blur because you know you just lightly pressed your eyeball—but you could not deny that you saw a blur.

In the computer age, data took on quite a different meaning. Think about a punch card as used by one of Herman Hollerith's original tabulators installed for the 1890 census. Each hole in a punch card was a datum. But those holes only worked because they were separated by paper. They had value because they were readable, which required them to be unambiguous and discrete. No hanging chads, please!

Of course, data were never as discrete as they seem on a punch card. The hole in the card is useful because it's taken as standing for the fact that someone is or is not the head of a household, or because it is one digit in a binary number that is taken as the household's annual income. Without that symbolic context, a hole in a punch card is no more data (and no more a bit) than is a hole in a shoe through which you thread a lace. Data are only data because they come with metadata. Indeed, each classical datum comes with two types of metadata: the label of the row it's in and the label of the column it's in. Those labels are themselves part of a much larger symbolic context that enables them to have sense: language. A datum couldn't be a datum if it were truly isolated from its referential context.
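
To make the point concrete, here is a minimal sketch in Python. The card layout is invented for illustration (it is not the actual 1890 census format), and the column names and the `interpret` function are my own assumptions. The same row of holes means nothing until the column labels say what each position stands for.

```python
# A hypothetical punch-card row: 1 = hole, 0 = no hole.
holes = [1, 0, 1, 1, 0]

# Hypothetical metadata: what each column position is taken to stand for.
columns = [
    "head_of_household",
    "income_bit_3",
    "income_bit_2",
    "income_bit_1",
    "income_bit_0",
]


def interpret(holes, columns):
    """Turn raw holes into a record, using the column labels as metadata."""
    record = {"head_of_household": None, "income": 0}
    for hole, label in zip(holes, columns):
        if label == "head_of_household":
            # One hole read as a yes/no fact.
            record[label] = bool(hole)
        elif label.startswith("income_bit_"):
            # One hole read as a single digit of a binary number.
            position = int(label.rsplit("_", 1)[1])
            record["income"] += hole << position
    return record


record = interpret(holes, columns)
print(record)
```

Swap in a different list of column labels and the identical holes yield a different record, or none at all. The holes alone settle nothing.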

So, it's not surprising that at a different business conference the next week, the discussion of "open data" immediately exploded in every conceivable direction. It was not supposed to. We were supposed to be spending the day having relatively confined discussions on a relatively confined topic: What opportunities does the opening of data provide for businesses? Because the conference left room for it to move in directions the participants wanted, the discussion veered rapidly into questions the opening of data raises, including questions of privacy, authority, responsibility, transparency and trust. Once you let data out of its box, out of its cell, it immediately links up with everything it can, because that's how meaning works: One thing leads to another, and if it doesn't, it's non-sense.

Not coincidentally, that's also how links work. Once we begin to open data to new users and new applications, it inevitably gets linked. Indeed, the Linked Data standard turns data into links themselves: Data gets expressed with its metadata (the number 10 is not simply the numeral 10, but is the temperature in Bonn in centigrade), and that metadata is itself expressed as links to established, referenceable vocabularies, ontologies and authorities. The datum has value because of its links. And links—the frisky little devils—just want to go forth and multiply. If the metadata "Bonn" is expressed as a link to a geographical authority, it has now been linked to everything else that links to that geographical authority, and to everything that links to those links. The Web is a web of promiscuous semantics.
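
A rough sketch of that idea, in plain Python rather than any real Linked Data toolkit: each statement is a (subject, predicate, object) triple, and the metadata are links. Every URI below is a hypothetical placeholder under example.org, not a real vocabulary or geographical authority, and `linked_to` is a helper I made up for the illustration.

```python
# Hypothetical URIs standing in for established vocabularies and authorities.
BONN = "http://example.org/geo/Bonn"
CELSIUS = "http://example.org/units/degreeCelsius"
OBSERVATION = "http://example.org/obs/1"
GUIDE = "http://example.org/doc/travel-guide"

# Each triple is (subject, predicate, object).
triples = [
    # The bare number 10 becomes "a temperature, in Bonn, in centigrade".
    (OBSERVATION, "http://example.org/vocab/temperatureValue", 10),
    (OBSERVATION, "http://example.org/vocab/unit", CELSIUS),
    (OBSERVATION, "http://example.org/vocab/place", BONN),
    # Someone else's data, published separately, points at the same URI.
    (GUIDE, "http://example.org/vocab/about", BONN),
]


def linked_to(uri, triples):
    """Return every subject with a statement whose object is `uri`."""
    return sorted({subject for subject, _, obj in triples if obj == uri})


neighbors = linked_to(BONN, triples)
print(neighbors)
```

Neither publisher had to know about the other. Pointing at the same identifier for Bonn was enough to connect the temperature reading to the travel guide.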

Once data is out of its box, it never goes back in. And we're the better for it. 

