Flipping data science
- Disassemble the data into its most basic elements (atoms), then organize them into categories.
Human side: Almost all incoming data arrives with a predetermined schema. Break free of these old patterns by disassembling (atomizing) your data into its most basic elements. Then step back and view those elements from a distance, with few or no preconceived notions.
Next, organize the atomized data into categories and sub-categories that seem most natural. This technique, known as the stochastic scatter/gather method of description, is simple, yet powerful.
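A minimal sketch of the atomizing step, assuming records arrive as nested dictionaries and lists (for example, parsed JSON). The function names `atomize` and `gather` are hypothetical, and regrouping atoms by their leaf attribute name is only one naive way to form categories; it is not the stochastic scatter/gather method itself.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Set, Tuple


def atomize(record: Any, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Any]]:
    """Recursively break a nested record into (path, value) atoms."""
    if isinstance(record, dict):
        for key, value in record.items():
            yield from atomize(value, path + (key,))
    elif isinstance(record, (list, tuple)):
        for item in record:
            # Drop list positions so repeated items share the same path
            yield from atomize(item, path)
    else:
        yield path, record


def gather(records: List[Any]) -> Dict[str, Set[Any]]:
    """Regroup atoms by leaf attribute name, discarding the original nesting."""
    categories: Dict[str, Set[Any]] = defaultdict(set)
    for record in records:
        for path, value in atomize(record):
            categories[path[-1] if path else "<root>"].add(value)
    return dict(categories)


# Hypothetical records that follow two different predetermined schemas
records = [
    {"customer": {"name": "Ada", "city": "London"},
     "orders": [{"item": "lamp", "city": "Paris"}]},
    {"supplier": {"name": "Acme", "city": "Paris"}},
]

categories = gather(records)
for name, values in sorted(categories.items()):
    print(name, sorted(values))
```

Once the schema is removed, a customer's home city and an order's delivery city land in the same category; deciding whether that grouping is natural is exactly the judgment left to the human side.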
In today’s complex world, category errors often lead decision makers down the wrong path, wasting time and resources. For example, a large share of the billions spent on cancer research to date has focused on changes in chromosomal DNA. But researchers have recently found evidence that mitochondrial DNA may be the primary culprit. These findings are prompting major shifts in prevention and treatment protocols (see the May/June 2019 “The future of the future” column).
Machine side: Take advantage of ever-increasing computational capacity to build interactive, multidimensional data visualizations. Use mathematical concepts such as graphs, hypergraphs, group theory, matrix and convolutional inversions, and transformations in Hilbert space as tools.
Here’s a simple way to get started. A long-standing habit of viewing the world almost exclusively in base 10 has severely limited our capacity to discover knowledge from data. Use the power of computation to transform data into different number bases and display the results. Hidden patterns often emerge from this simple operation alone.
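As a minimal sketch of this idea, assuming the data are non-negative integers, the hypothetical helpers below rewrite each value in several bases and print them side by side. Powers of 3 serve as toy data because their structure is hard to spot in base 10 but obvious in base 3.

```python
import string

# Digit symbols for bases up to 36: 0-9 followed by a-z
DIGITS = string.digits + string.ascii_lowercase


def to_base(n: int, base: int) -> str:
    """Return the representation of a non-negative integer n in the given base."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


def show_in_bases(values, bases=(10, 2, 3, 16)):
    """Print each value side by side in several bases."""
    print("".join(f"{'base ' + str(b):>12}" for b in bases))
    for v in values:
        print("".join(f"{to_base(v, b):>12}" for b in bases))


# Powers of 3 look irregular in base 10 but become 1, 10, 100, ... in base 3
show_in_bases([3 ** k for k in range(6)])
```

The same loop works for any integer-valued column; scanning the printed table for runs, repeats, or trailing zeros is a cheap first pass before heavier analysis.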
- Connect the dots and look for invariants.
Human side: Capture how basic elements are related to each other (similarities, differences, etc.). Seek to uncover the underlying rules or patterns governing how these elements aggregate. This is key to overcoming the combinatorial explosion that otherwise makes such problems computationally intractable.
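One simple way to capture similarities and differences, assuming each basic element can be described by a set of properties, is to score every pair with the Jaccard index (shared properties divided by all properties of the pair). The property sets and function names below are hypothetical illustrations, and Jaccard is only one of many possible similarity measures.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of properties two elements have in common (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def relate(elements: Dict[str, Set[str]]) -> List[Tuple[str, str, float]]:
    """Score every pair of elements by similarity, most similar pairs first."""
    pairs = [(x, y, jaccard(elements[x], elements[y]))
             for x, y in combinations(sorted(elements), 2)]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Simplified, illustrative property sets for four chemical elements
elements = {
    "sodium": {"metal", "reactive", "valence_1"},
    "potassium": {"metal", "reactive", "valence_1"},
    "chlorine": {"nonmetal", "reactive", "valence_1"},
    "neon": {"nonmetal", "inert", "valence_0"},
}

for x, y, score in relate(elements):
    print(f"{x:>10} ~ {y:<10} {score:.2f}")
```

Pairs scoring near 1.0 are candidates for the same category, and the scores can double as edge weights in a similarity graph.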
Think of how the Periodic Table of Elements was developed. Elements were assigned properties and grouped into categories, from which repeating patterns (e.g., in valence) emerged. This led scientists to formulate rules of aggregation governing how atoms can and cannot combine into compounds.
Much of our world follows the same layered structure. Some of the rules are time-invariant. Others change under varying conditions.
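Rules of aggregation like these can be made machine-checkable. The sketch below assumes a single combining valence per element and considers only two-element compounds, a textbook simplification (it cannot produce carbon monoxide, for example); the valence table and function names are hypothetical.

```python
from math import gcd
from typing import Optional

# Simplified, illustrative combining valences; many real elements have several
VALENCE = {"H": 1, "Na": 1, "Cl": 1, "O": 2, "Mg": 2, "Al": 3, "C": 4, "Ne": 0}


def can_combine(a: str, x: int, b: str, y: int) -> bool:
    """Aggregation rule: x atoms of a and y atoms of b balance if x*v(a) == y*v(b)."""
    va, vb = VALENCE[a], VALENCE[b]
    return va > 0 and vb > 0 and x * va == y * vb


def simplest_formula(a: str, b: str) -> Optional[str]:
    """Smallest whole-number binary formula allowed by the rule, or None."""
    va, vb = VALENCE[a], VALENCE[b]
    if va == 0 or vb == 0:
        return None  # a zero-valence element (e.g., neon) forms no bonds here
    g = gcd(va, vb)
    x, y = vb // g, va // g  # then x*va == y*vb == lcm(va, vb)

    def term(symbol: str, count: int) -> str:
        return symbol + (str(count) if count > 1 else "")

    return term(a, x) + term(b, y)


for pair in [("H", "O"), ("Na", "Cl"), ("Al", "O"), ("C", "O"), ("Ne", "O")]:
    print(pair, simplest_formula(*pair))
```

The point is not chemistry but the pattern: once a property (valence) is attached to each element, a short rule replaces an open-ended search over every possible combination.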
Machine side: Verify that identified patterns hold across a wide range of instances. As the number of basic elements, properties, rules of aggregation, and associations grows, category selection begins to saturate. New categories become less frequent, and a minimal, relatively simple, machine-readable computational ontology emerges. This is what we mean by “flipping data science” from a narrow discipline focused on data (what is measured) to a broad, multi-disciplinary field focused on natural and computational ontology (what “is”).
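Saturation can be watched directly by counting how many distinct categories have appeared after each new instance. The sketch below is a toy simulation, not real data: it assumes 50 hypothetical categories with skewed (Zipf-like) frequencies and uses the identity function as the categorizer, standing in for whatever categorization the human side has settled on.

```python
import random
from typing import Callable, Hashable, Iterable, List


def discovery_curve(instances: Iterable, categorize: Callable[[object], Hashable]) -> List[int]:
    """After each instance, record how many distinct categories have been seen."""
    seen = set()
    curve = []
    for item in instances:
        seen.add(categorize(item))
        curve.append(len(seen))
    return curve


# Simulated stream: 5,000 instances drawn from 50 hypothetical categories,
# where a few categories are common and most are rare (Zipf-like weights)
rng = random.Random(42)
weights = [1 / (k + 1) for k in range(50)]
instances = rng.choices(range(50), weights=weights, k=5000)

curve = discovery_curve(instances, categorize=lambda item: item)
for n in (10, 100, 1000, 5000):
    print(f"after {n:>5} instances: {curve[n - 1]:>2} categories")

# New categories found in the first vs. the last 1,000 instances
first = curve[999]
last = curve[-1] - curve[-1001]
print("new in first 1,000:", first, "| new in last 1,000:", last)
```

A curve that flattens while instances keep arriving is the signal that the category set is stabilizing; a curve that keeps climbing suggests the categorization still needs work.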
- Apply sense-making, especially at key decision points, and adjust.
Human side: Practice expanding your awareness. Don’t view the world solely from your own perspective. Put yourself at every point through which data and knowledge flow. This includes identifying things that make no sense whatsoever, yet still fall within the realm of possibility. For example, an adversary who senses what an opponent might be planning may make a crazy, totally unexpected maneuver to gain the advantage of surprise.
Get into the habit of using foresight. When it comes to predicting outcomes, human experts often do no better than a coin flip. But humans are good at foresight and anticipation, which play a large role in sense-making.
Machine side: Apply algorithms to help generate as many variations in perspective as possible. Use Monte Carlo and other simulation techniques to generate and assess solution alternatives under the widest possible range of scenarios.
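As a minimal Monte Carlo sketch, the code below samples many hypothetical scenarios (uncertain demand and unit cost) and scores three made-up solution alternatives against every one of them, reporting both the average payoff and a downside figure. All distributions, prices, and alternative names are assumptions chosen for illustration.

```python
import random
import statistics
from typing import Callable, Dict

Scenario = Dict[str, float]


def sample_scenario(rng: random.Random) -> Scenario:
    """Draw one hypothetical scenario with uncertain demand and unit cost."""
    return {
        "demand": 1000 * rng.lognormvariate(0, 0.5),  # units; median about 1,000
        "unit_cost": rng.uniform(4, 8),               # cost per unit
    }


# Hypothetical alternatives (price fixed at 10 per unit); each maps a scenario to a payoff
ALTERNATIVES: Dict[str, Callable[[Scenario], float]] = {
    "small_plant": lambda s: min(s["demand"], 800) * (10 - s["unit_cost"]) - 1000,
    "large_plant": lambda s: min(s["demand"], 2000) * (10 - s["unit_cost"]) - 4000,
    "outsource": lambda s: s["demand"] * (10 - 8.5),
}


def assess(alternatives, n_scenarios: int = 10_000, seed: int = 0):
    """Score every alternative on the same sampled scenarios (mean and 5th percentile)."""
    rng = random.Random(seed)
    scenarios = [sample_scenario(rng) for _ in range(n_scenarios)]
    results = {}
    for name, payoff in alternatives.items():
        outcomes = sorted(payoff(s) for s in scenarios)
        results[name] = {
            "mean": statistics.mean(outcomes),
            "p05": outcomes[int(0.05 * len(outcomes))],  # downside estimate
        }
    return results


results = assess(ALTERNATIVES)
for name, r in results.items():
    print(f"{name:<12} mean={r['mean']:8.1f}  5th percentile={r['p05']:8.1f}")
```

Evaluating every alternative on the same set of scenarios (common random numbers) keeps the comparison fair, and reporting the 5th percentile alongside the mean surfaces alternatives that look good on average but fail badly in some scenarios.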
The road ahead
Until now, mainstream approaches to data analytics have been aimed at modeling some part of the world, collecting data, and validating the models.
We propose doing the reverse. Start with the data, apply human sense-making, and build an ontology of what that data represents. Then, throw away the data. Computational ontology will help make the two-sided semantic web a reality.
One of the worst errors we humans make is assuming that more data yields greater capacity for control. As we’ve pointed out in a previous article, “little data” is better than “big data.” What is true for deterministic systems is definitely not true for complex, adaptive, natural, and social systems such as smart cities, the biosphere, or the global economy.
As we move into the IoT era, with billions of intelligent, autonomous, and interconnected devices generating overwhelming amounts of data, human judgment must take on an increasingly important role. Make plans and prepare your workforce accordingly.