The big data steamroller era
On a trip to the United Kingdom, I visited the Science Museum, and saw an Aveling & Porter steamroller built in the 1870s. Amy, as the large machine was named, was imposing. As the first commercial steamroller, its purpose was evident. I grew up in an environment filled with sheep’s foot rollers and massive 20th century heavy equipment. The machines became larger and more powerful. Their purpose retained Aveling & Porter’s focus: smash stuff down.
Big data is in its Aveling & Porter phase. Large volumes of digital information have to be pressed into something usable. A mountain of bits inspires awe like a glimpse of Mont Blanc, but a snapshot or a refrigerator magnet makes the massiveness apprehensible.
Listening to lectures at a recent conference, I was struck by the crudeness of the big data steamrollers presented by vendors and speakers as state-of-the-art solutions. I suppose to an engineer who had to rely on manual labor to prepare ground for construction, the Aveling & Porter machines represented the bleeding edge of 19th century technology. We now know that the early machines yielded to better and more sophisticated crushers.
In one of the lectures, a data scientist referenced the eight laws of big data identified by David Feinleib, a principal leader at The Big Data Group. Like the engineering drawings for Amy, the laws, as I understood them, identified some broad guidelines. Just as the form and function of subsequent steamrollers evolved from the 1871 machine, Feinleib’s laws make clear that large amounts of digital information require almost Herculean effort to yield value. You can find a good summary of those laws in Feinleib’s slide presentation, “Big Data Trends,” at slideshare.net/bigdatalandscape/big-data-trends.
The laws are fascinating. Three of them are also relevant to organizations wrestling with knowledge management, a discipline that shares some similarities with the big data challenge.
The first law states: “The faster you analyze your data, the greater its predictive value. Companies are moving away from batch processing to real time to gain competitive advantage.”
Predictive. Value. Real time. Prediction is particularly tricky. If information about “now” is available, the assumption goes, the future is clear. The facts of the Edward Snowden matter and the problems gamblers face suggest that “predicting” can be wrong. Humans are not very good at figuring odds based on complex mathematical procedures, sensitivity thresholds and the murky union of Bayesian statistics with Laplacian techniques.
Value is a troublesome concept, often reduced to a perception based on mutual friends, rumors, family and expediency. Real time comes in several different “flavors.” For example, the real time of financial trading firms in New York and London incurs multimillion dollar costs for bandwidth, infrastructure, staff and services. The hope is that speed will yield an advantage. The unfortunate failures of Bear Stearns and other leading firms suggest that speed may not be enough. The competitive advantage boils down to finding an angle without a competitor hitting on the same solution or, in the case of J.P. Morgan, billions of dollars in penalties.
What I find ironic is that Hadoop, the magic data management system pressed into big data service, is essentially a batch operation. The use of Hadoop and variants from the cloud is a simulacrum of the days of the mainframe. What hides those facts is the dependence on visualizations and verbal pyrotechnics.
Feinleib asserts in the fifth law: “Plan for exponential growth.” Exponential growth occurs when a quantity grows at a rate proportional to its current value, so the growth itself keeps accelerating. A nuclear chain reaction illustrates the potency of the exponential function. If your car breaks down in a snowstorm, the heat in the vehicle exhibits exponential decay. In practical terms, the toasty SUV becomes chilly. In finance, if you get involved with a Ponzi scheme, the money rolls in forever—in theory.
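The growth and decay curves above can be sketched in a few lines. The starting volume, doubling rate and cooling constant below are hypothetical numbers chosen only to show the shape of the curves, not figures from Feinleib’s laws:

```python
import math

def exponential_growth(initial, rate, periods):
    """Value after `periods` steps when each step multiplies by (1 + rate)."""
    return initial * (1 + rate) ** periods

def exponential_decay(initial, k, t):
    """Value after time t with continuous decay constant k (Newton-style cooling)."""
    return initial * math.exp(-k * t)

# A data store doubling every year (rate = 1.0), starting at an assumed 10 TB:
for year in range(6):
    print(f"year {year}: {exponential_growth(10, 1.0, year):.0f} TB")

# Cabin warmth above ambient, decaying with an assumed k = 0.5 per hour:
for hour in range(5):
    print(f"hour {hour}: {exponential_decay(20.0, 0.5, hour):.1f} degrees above ambient")
```

Note that the growth loop reaches 320 TB by year five from a 10 TB start; the same multiplicative mechanics that delight an investor drive the storage bill.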
The notion of exponential data growth for an organization trying to manage its knowledge is sobering, if law six—“Solve a real pain point”—is on point. The costs for bandwidth, storage, cloud services, assorted hardware, software and people rise. Exponential curves bring tears of joy to the eyes of an investor when a WhatsApp appears to be worth billions. An organization’s chief financial officer may weep with the price tag for exponential growth. Organizations struggle with the digital data flowing through their network veins now. Finding information is getting more difficult despite the best efforts of managers, staff, vendors and consultants.
Exponential growth, law five, may be a deal breaker in the sense that different choices may not correlate with reduced costs and growing revenue. The assumption that big data will pay off is enervating for strategists and advisors. Organization-specific facts about the value of big data may be difficult to pin down.
Big data requires people
Feinleib’s big data law seven underscores the cost challenge: “Put data and humans together to get the most insight.” Most organizations are trying to reduce, stabilize and outsource to lower the costs of staff. In my experience, large volumes of data require software, systems and specialists. The “specialists” require specific knowledge about the business, its context and processes. Consider the big data produced by the sensors, systems and workers at the Fukushima nuclear plant. Big data has a role to play, but the problem to be solved requires physical actions to contain, manage and stabilize an unfortunate nuclear situation.
Many problems in organizations are similar to Fukushima. Big data might provide some useful signals, but the problems require that the organization take meaningful action to remediate products that fail to generate revenue, services that drive up customer support costs, and unhappy customers and partners who launch expensive litigation. Big data may not illuminate the solutions. In fact, the people who embrace big data may be turning their backs on more fundamental, non-big data issues.