Leveraging the Four Vs of Big Data
Produced by Steve Nathans-Kelly
The four Vs of big data—volume, variety, velocity, and veracity—are well known. At KMWorld Connect 2021, senior system analyst Dr. Joe Perez explained why it's important to keep them in mind when presenting and linking to relevant data.
After you've prioritized your objectives, it is time to find and present relevant data. Your stakeholders need to see how the data being presented is relevant to the situation at hand, the issue being addressed, the flaw being remediated, the defect being reported, the concern being discussed, or the problem being resolved.
As you present and link the data, it's important to consider the four Vs of big data, namely volume, velocity, variety, and veracity.
Perez shared a conceptualization put forth by IBM data scientists a number of years ago as the four dimensions or key characteristics of big data. First of all, he said, there is volume, which is how much data you have. "Reuters estimated that the world's storage of data will continue to increase by 50% every year. And the sheer volume, even with a moderately-sized organization, requires a huge amount of processing power to analyze and gain insight from it."
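To put the 50% annual growth figure in concrete terms, here is a minimal Python sketch of compound growth. The 100 TB starting volume and the function names are hypothetical and chosen only for illustration; the only number taken from the talk is the 50% rate.

```python
import math


def projected_volume(start_tb: float, years: int, annual_growth: float = 0.50) -> float:
    """Return the stored volume after `years` of compound annual growth."""
    return start_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float = 0.50) -> float:
    """Return how many years it takes for the volume to double at a fixed rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting volume, not a figure from the talk
    for year in range(6):
        print(f"Year {year}: {projected_volume(start_tb, year):,.1f} TB")
    print(f"Doubling time: {doubling_time_years():.2f} years")
```

At that rate, storage doubles in under two years and grows more than sevenfold in five, which is why even a moderately sized organization can outgrow its processing capacity quickly.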
The second V is velocity, and that can be expressed in a couple of ways, such as the speed at which the data is coming at you or being generated, and also the rate at which it's being consumed. "Think of video streams, streaming services like Amazon Prime, Netflix, Hulu, and others that, along with their customers, must rely on the availability of massive bandwidth for the transmission and also discrete processing for analysis." The same thing goes for commodity stock trading, credit card sales, and other instant-transaction-based industries in which the speed and timeliness of capture are critical for nailing down patterns and making predictions. For example, said Perez, in the first 2 months of 2020, more than 7.6 billion shares were traded every single day. Imagine how many individual transactions it takes each day to generate that massive amount of activity.
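As a rough illustration of what that daily figure means for velocity, the sketch below converts 7.6 billion shares per day into a per-second rate. The 6.5-hour trading session is an assumption that is not stated in the talk, and the function name is hypothetical. Note that shares are not the same as transactions: estimating a trade count would require an average trade size, which the source does not give.

```python
SHARES_PER_DAY = 7_600_000_000  # daily share volume cited in the talk

# Assumption (not from the talk): a 6.5-hour regular trading session.
SESSION_SECONDS = int(6.5 * 60 * 60)
FULL_DAY_SECONDS = 24 * 60 * 60


def shares_per_second(shares_per_day: int, seconds: int) -> float:
    """Return the average number of shares traded per second over a window."""
    return shares_per_day / seconds


if __name__ == "__main__":
    in_session = shares_per_second(SHARES_PER_DAY, SESSION_SECONDS)
    around_the_clock = shares_per_second(SHARES_PER_DAY, FULL_DAY_SECONDS)
    print(f"Averaged over the assumed session: {in_session:,.0f} shares/second")
    print(f"Averaged over a full 24 hours:     {around_the_clock:,.0f} shares/second")
```

Under the session assumption, the average works out to roughly 325,000 shares every second, before accounting for peaks at the open and close.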
The third V is variety, which refers to the different sources of data available. How complex are they? Traditional data sources fit pretty nicely into structures like relational databases, and, even when they're voluminous, they're relatively straightforward to extract, analyze, and report on. But then again, there's the unstructured data, like images, audio files, video files, and so on, that goes way beyond traditional storage and traditional analysis. One of the goals of big data is to use technology to take this unstructured data and make sense of it.
The fourth V is veracity, said Perez. Is this data trustworthy? Can you count on it to be true?
In traditional analytics, where volume and variety are smaller, the organization tends to have greater control over the data and, as a result, greater veracity. With big data, on the other hand, greater volume and variety bring a greater likelihood of uncertainty. "I'm not implying dishonesty. It's just the nature of the beast that comes with the introduction of more and more unstructured data."