Cognitive computing: Big data and cognitive computing, Part 1
Discussions of cognitive computing almost always include a reference to big data. Discussions of big data occasionally, but infrequently, reference cognitive computing. But are we truly confident that we know which is which and why that is so?
Finding a way to describe the trends of cognitive computing and big data in simple terms—differentiating between them and defining their relationship to each other—could help lower the level of hype and confusion in that active corner of the technology landscape. With a new clarity in the conversation, we can get on with the business of talking about cognitive computing in a crisper, more intelligent manner than we’ve typically experienced to date.
It’s important first to pull apart the various levels on which the terms cognitive computing and big data operate in our broader public conversation. We need to get better at understanding them and using them more accurately. The four levels are:
- the mission or purpose of big data versus that of cognitive computing;
- the foundation technologies of each;
- the functional description of what those trends and their technologies actually do for people; and
- the symbolic level where our public conversation has already transformed those terms into labels for various business strategies, worldviews and hype campaigns.
In this, the first part of a two-part article, my goal is to start deconstructing the first two levels of mission and foundational technology. I want to identify the important pieces and offer a view of how they relate and how they diverge.
First, big data and cognitive computing are highly distinct in their purpose or mission. The mission of big data is best understood as the next generation of the traditional IT function of storing and organizing machine-based enterprise information, now extended to include different types of data handled in new ways. This includes the tools that tell us what is in these collections.
Cognitive computing, on the other hand, seeks the meaning in the data. It is best understood as a methodological innovation in the field of analytics, one that aims to break through the constraints of analytics based on backward-facing numerical calculations and static presentations of results for human review.
Cognitive computing represents a distinct form of computing that combines analytics, problem solving and communication with human decision makers. It draws on big data when necessary to answer ambiguous questions and solve problems, but its key contributions go well beyond the charter of analytics as understood today. Cognitive computing looks within and across disparate data sets including rich media and text, identifies conflicting data, uncovers surprises, finds patterns, understands context, offers suggestions, requests clarifications and provides ranked solution alternatives. It offers a new approach to uncovering the potential in data, and to capturing value whether the data is big or small.
As its purpose, big data remodels the data center, the database and the data warehouse to accommodate today’s transformed digital environment. As its purpose, cognitive computing leverages a broad suite of evolving discovery, analysis, human interaction and solution development technologies to offer a new kind of digital assistance that operates in near-human terms.
Beyond the issue of mission, each trend—i.e., big data and cognitive computing—rests on a unique technology foundation. And we propose that the two foundations are related but also fundamentally different. So we have a “ground truth” based on distinct developments and innovations in the technology environment. For example, there is little dispute that big data is a phenomenon of the spread of digital technology across consumer, commercial, government and scientific life (and most any other life you care to add).
At the same time, cognitive computing has been associated with bringing computing machines to bear on such challenges as delivering “human-like” insights in “Jeopardy” game-playing, making personal digital assistants intelligent, accelerating human genome analysis and improving medical outcomes through diagnosis and treatment recommendations. All of those examples rest on the ability of cognitive applications not only to process “beyond human” quantities of disparate data but also to analyze and present suggestive, non-trivial, timely solutions.
Prefiguring the emergence of the big data trend, we all recognize the forces at work: the spread of the Internet; global access to inexpensive content “publishing,” both personal and professional; the adoption of online video and other rich media; the explosion of mobile devices; the overnight rise of social and user-generated media; the proliferation of log files tracking all of that activity on a packet-by-packet basis; and on and on. The very nature of data has changed rapidly and is now irretrievably big.
So big is this big data that in a growing number of applications, the traditional means of creating, capturing, organizing and storing it threaten to break down or become meaningless under the onslaught. As a result, innovative technologists are devising new approaches to keep pace. Google, for example, faced the problem of managing the exponential growth of its Web search indexes and came up with MapReduce, an early approach to harnessing clusters of commodity hardware to transform the efficiency of content processing and index creation.
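The MapReduce idea described above can be illustrated with a toy, single-machine sketch. The word-count task and all function names here are illustrative assumptions, not Google’s actual code; in a real deployment the framework distributes the map and reduce phases across a cluster and handles the shuffle between them.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit (key, value) pairs; here, one (word, 1) pair per word.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce phases (across machines, in practice).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: combine all values for one key; here, sum the counts.
    return key, sum(values)

# Hypothetical input "documents" standing in for Web pages to be indexed.
documents = ["big data big memory", "big data"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
result = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
# result == {"big": 3, "data": 2, "memory": 1}
```

The design point is that both phases operate on independent key-value pairs, which is what lets the work be split across many commodity machines with no shared state.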
At roughly the same time Google was focusing on those distributed processing innovations, technologists at Yahoo, seeking to supplement the reigning SQL database storage paradigms for performance and scalability reasons, developed non-SQL storage models designed to replace traditional DBMS parallel processing with distributed processing. The Hadoop Distributed File System is now the most prominent model, supplemented by an ecosystem of “non-SQL” software packages that extend the capabilities and connectivity of the Hadoop storage core.
I review that bit of recent technology history to point out that the term “big data” is not simply a reference to the quantity of bytes we now generate, although intuitively it is that as well. More importantly, it also refers to a set of software resources, assets and practices that have built up over more than a decade of development and now support many of the most critical computing applications on the planet.
So what can we say about cognitive computing’s technology foundation? The first observation is that it does not have to be involved with big data at all. While IBM’s “Jeopardy”-playing Watson ingested an impressive quantity of encyclopedias, history books, magazines, political broadsides and previous “Jeopardy” questions and answers, that hardly constituted a big data application on a scale familiar to Google, the intelligence community, telecommunications carriers, etc.
Watson was much more dependent on “big memory,” as it utilized recent innovations in “in-memory” processing approaches to discover, synthesize and statistically analyze possible responses to those arcane “Jeopardy” questions in real time. The Watson “Jeopardy” application is much more usefully understood as a data science triumph than as a big data feat.