Perspective on Knowledge: Data and sense

This article appears in the issue November/December 2016, [Volume 25, Issue 10]

Data has changed over the past 10 or 20 years. It’s not just that there’s so much more of it. Rather, it’s becoming a new sensible property of the world, like color or taste.

Information in its modern sense was invented in the 1940s as a way to quantify how well a medium can carry a signal from one point to another with minimal distortion. Information theorists still think of it that way, but most of us don't and never did. Too much math.

In the 1950s, the term rapidly entered the mainstream as discipline after discipline, from physics to literature, started to think of itself as information-based.

As information was becoming more exalted, data was becoming more menial. Information was, we were told, a refining of data. Most data was meaningless, but sometimes you would find an important pattern, and that was information.

No warehouse big enough

This view of the information process as a refining or filtering of data was consistent with the way the old computers worked, for their capacity was so limited that careful decisions had to be made about which data to load into them.

The data that mattered back in the old days was typically loaded into a computer on 90-column, 10-row punch cards, each of which could hold 112.5 bytes. It'd take more than 140 million punch cards to replace the 16-gigabyte flash drive lying at the bottom of your desk drawer somewhere, and the stack would be more than 15 miles high.
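The punch-card comparison is simple arithmetic, so it's easy to check. The sketch below is a back-of-the-envelope calculation under two assumptions that don't appear in the article: a gigabyte is taken as 10^9 bytes, and a card is taken to be about 0.007 inches thick. Each of the 90 × 10 hole positions stores one bit, which gives the 112.5 bytes per card.

```python
import math

# Back-of-the-envelope check of the punch-card comparison.
# ASSUMPTIONS (not from the article): 1 GB = 10**9 bytes, and a
# card is roughly 0.007 inches thick.

BITS_PER_CARD = 90 * 10              # 90 columns x 10 rows, one bit per hole position
BYTES_PER_CARD = BITS_PER_CARD / 8   # 900 bits / 8 = 112.5 bytes
DRIVE_BYTES = 16 * 10**9             # a 16 GB flash drive, in decimal gigabytes
CARD_THICKNESS_IN = 0.007            # assumed thickness of one card, in inches
INCHES_PER_MILE = 12 * 5280


def cards_needed(total_bytes: float, bytes_per_card: float) -> int:
    """Whole cards required to hold total_bytes (rounded up)."""
    return math.ceil(total_bytes / bytes_per_card)


def stack_height_miles(num_cards: int, thickness_in: float) -> float:
    """Height of a stack of num_cards cards, in miles."""
    return num_cards * thickness_in / INCHES_PER_MILE


cards = cards_needed(DRIVE_BYTES, BYTES_PER_CARD)
miles = stack_height_miles(cards, CARD_THICKNESS_IN)
print(f"{BYTES_PER_CARD} bytes per card")
print(f"{cards / 1e6:.1f} million cards")
print(f"stack about {miles:.1f} miles high")
```

If the drive's 16 gigabytes are read as binary gigabytes (2^34 bytes) instead, the card count rises to roughly 153 million and the stack grows to nearly 17 miles, so the conclusion holds either way.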

Data during those years was a resource that could be called upon when we needed it. Like goods in a warehouse, it was laid out in the orderly form of spreadsheets, which actually look like the floor plans of ideal warehouses. For a range of anticipated questions, you could consult the data and get an answer: Is the days-sales-outstanding figure seasonal? When you cut the price to increase sales, did the percentage of items returned for a refund go up?

Our data warehouses were certainly useful, but a warehouse is a repository that holds stuff that has been carefully selected and then carefully organized. We now see data everywhere. No warehouse is big enough for it.

Everything is data

As everything has become a sensor, from your phone to your printer to a new rectal thermometer that inserts itself into the Internet of Things, everything has started to look like data. Wet your finger and hold it up to the air, and the breeze becomes data. Put sensors into your tractors, as John Deere does, and the dirt becomes data. (Both the rectal thermometer and Deere's sensors are protected by the Digital Millennium Copyright Act, making it illegal for you to even try to see what they're actually doing.)

While data used to live off in a nondescript data center somewhere, it is now literally everywhere we go. Everything is data or could produce its data if we brought along the right probe.

This changes our fundamental strategies for dealing with it.

First, it means we do less curating. As our computers have become more capacious and as they have been joined into a global network, it’s easier just to collect it all.

Second, it means we do less organizing. Who has the time or resources? Instead we can let the computers figure out what it all means. Machine learning enables computers to create their own models, their own ways of connecting the pieces. Advanced analytic techniques let us find correlations we would never have looked for, among data we would never have thought significant.

Third, if the world is made of data, it feels less obviously right that data should be an owned, proprietary resource. Data has become a property of the world like its sounds or smells. In many cases it is being gathered raw, so it lacks the claim to ownership that, according to John Locke, comes from investing one's labor in something. If it doesn't involve people's privacy, the default is increasingly to keep it open rather than to capture it and make it exclusive.

Cause for concern

These changes are happening because data is everywhere and there's no end to it. It is not a resource. It has become a constituent part of literally everything in the world, like a new property that we sense with new sensory apparatus. Sight, smell, sound, touch, taste, and data.

That’s wonderful for computing, for everything becomes computable. As a way of thinking about how the world works, it should make us more than a little nervous, for data is in fact a reduction of the world to quantifiable abstractions ... and that is exactly what neither sensations nor the world are.

