The five ages of data

Our technology—how it uses data, how we think about data, and how that idea of data affects our ideas about ourselves and the world—constitutes a merry ring dance, tra-la-la. Starting with the mainframe days, we can see at least five Ages of Data, although the boundaries are, of course, messier than what is implied by calling them “ages.”

The term “data” (the plural of “datum”) comes from the Latin for “that which is given.” It was first used in English in 1640, gained its statistical sense in the 19th century, and was used to name the stuff with which computers compute in 1946. That set off what I think we can reasonably call the first modern Age of Data.

That age began with the rise of mainframe computing in the 1950s when data appeared to be a scarce resource, carefully managed and highly structured. That was true even of the physical embodiment of data at the time—punch cards with their rigid and uniform rows and columns. Databases back then expected to be storing records uniform in their fields: Every employee record had the same blanks to fill in. The amount of information collected was kept to a minimum because of the limits on computer memory and processing speeds. Thus, your personnel record would have your identification number, but not how you get to work, if you have any dietary restrictions, or what skills unrelated to work you have, even though that might be useful information to have on hand in some situations.
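To make the rigidity of those records concrete, here is a minimal sketch of reading a fixed-width, punch-card-style employee record. The column layout, field names, and sample values are all hypothetical, chosen only for illustration; they are not taken from any real system of the period.

```python
from dataclasses import dataclass

# Hypothetical layout: (field name, start column, end column) on an 80-column card.
LAYOUT = [
    ("employee_id", 0, 6),
    ("last_name", 6, 26),
    ("department", 26, 30),
]
CARD_WIDTH = 80


@dataclass(frozen=True)
class EmployeeRecord:
    """Every record has exactly these blanks to fill in, and no others."""
    employee_id: str
    last_name: str
    department: str


def parse_card(card: str) -> EmployeeRecord:
    """Slice one fixed-width card into its fields, rejecting any other width."""
    if len(card) != CARD_WIDTH:
        raise ValueError(f"expected {CARD_WIDTH} columns, got {len(card)}")
    fields = {name: card[start:end].strip() for name, start, end in LAYOUT}
    return EmployeeRecord(**fields)


# Build a sample card: each value is padded to its column width, then to 80 columns.
card = ("004217" + "LOVELACE".ljust(20) + "ENG".ljust(4)).ljust(CARD_WIDTH)
record = parse_card(card)
```

There is simply no column in which to record how someone gets to work; adding one would mean redefining the layout for every card.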

The 1950s culture got the message, viewing computers as instruments of conformity. The beatniks angrily rebelled. In the 1960s, the hippies danced around computers to try to levitate them. Or perhaps that was the Pentagon. Same idea, though.

The Age of the PC brought the second incarnation of data. Businesses adopted PCs because of one “killer app,” the spreadsheet. As Steven Levy, the celebrated writer about tech, wrote in Harper’s in November 1984: “[W]hat really has the spreadsheet users charmed is not the hard and fast figures but the ‘what if’ factor: the ability to create scenarios, explore hypothetical developments, try out different options.” Data became something you could play with, the opposite of its prior role as the given, that is, the bedrock substance out of which a structure of knowledge could be built.

Data smog

In 1997, the idea of “data smog” arose as the title of a book by David Shenk. It referred to the glut of information that was supposedly squeezing out everything else, including contemplation and joy, but it can be taken to characterize the next phase, the Age of the Internet. Like smog, data seemed to be everywhere, and there were major organizations that were greedy for it. As the net developed, data smog became “weaponized” as the internet’s primary business model. Shoshana Zuboff’s 2018 book, The Age of Surveillance Capitalism, documented this.

To accommodate the cosmic volumes of often loosely connected data the internet was generating, loosely structured databases and data formats became popular, including linked data, graph structures, JSON, data lakes, and the like.
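As a rough illustration of how such formats differ from the uniform records of the mainframe era, the sketch below loads two JSON records that do not share the same fields. The records, field names, and the helper `fields_used` are invented for this example.

```python
import json

# Two hypothetical person records; neither is required to match the other's shape.
raw = """
[
  {"id": 1, "name": "Ada", "commute": "bicycle"},
  {"id": 2, "name": "Grace", "dietary_restrictions": ["vegetarian"], "skills": ["sailing"]}
]
"""
people = json.loads(raw)


def fields_used(records):
    """Collect every field name that appears in at least one record."""
    seen = set()
    for record in records:
        seen.update(record)  # iterating a dict yields its keys
    return sorted(seen)


all_fields = fields_used(people)

# Code that reads the data has to tolerate absent fields rather than assume them.
commutes = [person.get("commute", "unknown") for person in people]
```

The flexibility has a cost: with no fixed set of blanks to fill in, it falls to whoever reads the data to decide what a missing field means.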

Concurrent with the Age of the Internet, we had a mini Age of Big Data, a term coined in the early 1990s, which gained strength in the millennium’s second decade. Because Big Data promised that advanced statistical analyses could find unexpected correlations in large data collections, data became a source of surprises, inverting its meaning from the Age of Mainframes.