Flipping data science
Analytics is all the rage these days. Organizations around the globe are scrambling to attract the very best data scientists. But many who are hiring don’t know what’s going on inside the data scientists’ heads or the complicated computer code they write. They just know they’d better hop on board or be hopelessly left behind.
This is typical when technologies reach the middle phases of Gartner’s Hype Cycle. Of the 33 elements in data science and machine learning which Gartner tracks, 15 are either at or near the peak of inflated expectations. Six are sliding into the trough of disillusionment (see Gartner Research, “Hype Cycle for Data Science and Machine Learning,” 23 July 2018).
Now is a good time to step back and see if we’re on the right track and what adjustments we need to make. Whether you’re just testing the waters or jumping in with both feet, here are some key aspects of data science to keep in mind.
Much of the current explosion in zettabytes consists of useless noise. We say useless because it’s often redundant, irrelevant, incorrect, incomplete, or misleading. And when algorithms ingest and crunch massive volumes of such data at high speed, they often produce inaccurate results. This happens for a variety of reasons, including a growing preponderance of out-of-sample (OoS) data. In statistics, out-of-sample data are observations that fall outside the data a model was built on; the danger lies in drawing incorrect inferences about one category from outcomes observed in another, seemingly similar category. We see this all too frequently in public discourse. Another problem is nondeterminism: we observe what appear to be repeatable patterns, only to see them suddenly invalidated, such as during so-called “black swan” events.
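To make the out-of-sample problem concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a method taken from any particular toolkit: the underlying process (a simple square function), the two data ranges, and the helper names `fit_line`, `mean_abs_error`, and `true_process` are all invented for this example. A straight line is fitted to observations from one range and then applied to a range it never saw.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(model, xs, ys):
    """Average absolute gap between the line's predictions and the actual values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


def true_process(x):
    """Hypothetical real-world behavior (an assumption): it curves, so no line fits everywhere."""
    return x ** 2


# In-sample: the range the model is built on (0.0 to 1.0).
in_sample_x = [i / 10 for i in range(11)]
in_sample_y = [true_process(x) for x in in_sample_x]

# Out-of-sample: a seemingly similar category the model never saw (5.0 to 6.0).
out_sample_x = [5 + i / 10 for i in range(11)]
out_sample_y = [true_process(x) for x in out_sample_x]

model = fit_line(in_sample_x, in_sample_y)
in_err = mean_abs_error(model, in_sample_x, in_sample_y)
out_err = mean_abs_error(model, out_sample_x, out_sample_y)

print(f"in-sample error:     {in_err:.3f}")
print(f"out-of-sample error: {out_err:.3f}")
```

The line looks trustworthy on the data it was built from, yet its error grows by orders of magnitude once it is applied outside that range. Nothing in the fitted model itself signals the problem; it takes a person asking whether the new data really resemble the old.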
The world is reflexive and dynamic, continually adapting and adjusting. This occurs in response to and in anticipation of a variety of conditions, including human interventions. Just as soon as we think we’ve found the master key, someone comes along and changes the locks.
Data science is typically treated as a highly specialized niche rather than the broad, multi-disciplinary field it needs to be. At many colleges and universities, data science programs reside in the statistics department. While statistics plays an important role, other disciplines are important too, including basic and applied mathematics and the natural sciences, from anthropology to quantum physics. A complete and comprehensive human-oriented information sciences curriculum, including behavioral neuroscience, linguistics, and general systems theory, is needed.
No matter how much “intelligence” is programmed into a computer, it will very likely never understand the results it produces. Doing so takes human cognition, intuition, judgment, and the other ways we humans make sense of data.
At the same time, we know that human thinking is potentially flawed. It can be rife with bias and error. So the challenge becomes how to create an environment in which computers and humans work together in ways that complement, rather than conflict with, each other.
Building such an integrated solution means shifting from a purely data-driven, “black box,” machine-learning-based approach to a human-centric approach which embodies Tim Berners-Lee’s concept of a two-sided (human and machine) semantic web. Below are three important steps you can take toward enabling humans and machines to work together to help solve your toughest problems.
We recommend carrying out these steps while working in small teams and speaking out loud. Create knowledge trails by capturing and documenting the thinking behind the choices being made.