

Big data: expediting and validating analyses


The value of human input

Most organizations using CrowdFlower are working with very large data sets, according to Lukas Biewald, founder and CEO of CrowdFlower. “They may have millions of records for social media or other data and have automated the analysis,” says Biewald, “but human input can validate or disprove the conclusions.” For example, software analysis can determine whether sentiment is positive or negative, but human review of the responses might be needed to explain what is causing that sentiment.
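
One simple way to use human input to check a large automated analysis is to draw a random sample of machine-labeled records, have people label the same records, and measure how often the two agree. A low agreement rate suggests the automated conclusions need a second look. The Python sketch below assumes labels are plain strings. The function names are hypothetical and are not part of any CrowdFlower API.

```python
import random


def sample_for_review(records, k, seed=None):
    """Draw k distinct records uniformly at random for human spot-checking."""
    rng = random.Random(seed)
    return rng.sample(records, k)


def agreement_rate(machine_labels, human_labels):
    """Fraction of reviewed records where the human reviewer agrees with the software."""
    if not machine_labels or len(machine_labels) != len(human_labels):
        raise ValueError("expected two non-empty label lists of equal length")
    matches = sum(m == h for m, h in zip(machine_labels, human_labels))
    return matches / len(machine_labels)


if __name__ == "__main__":
    machine = ["positive", "negative", "positive", "positive"]
    human = ["positive", "negative", "negative", "positive"]
    print(agreement_rate(machine, human))  # → 0.75
```

Sampling at random, rather than reviewing only the first records in a feed, keeps the agreement rate a fair estimate for the whole data set.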

The recently introduced CrowdFlower AI integrates machine learning with human labeling and uses Bayesian statistics to determine a confidence level for the machine learning analysis. If the analysis does not reach a preset confidence threshold, the system reverts to a mode in which human input is required. An iterative process can be set up in which the software learns from human input and can then be used to train new individuals in how to label content.
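
The article does not say which Bayesian model CrowdFlower AI uses. As an illustration only, the Python sketch below assumes a simple Beta-Binomial model: each human-reviewed item counts as agreement or disagreement with the machine's label, the posterior mean estimates the machine's accuracy, and the system stays in human-labeling mode until that estimate clears a threshold. The class name, function name and the 0.9 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AccuracyEstimate:
    """Beta-Binomial posterior over how often machine labels match human labels.

    Hypothetical helper for illustration; the default Beta(1, 1) prior is uniform.
    """
    alpha: float = 1.0  # prior pseudo-count of agreements
    beta: float = 1.0   # prior pseudo-count of disagreements

    def update(self, machine_label: str, human_label: str) -> None:
        # Each human-reviewed item is one Bernoulli trial: agree or disagree.
        if machine_label == human_label:
            self.alpha += 1
        else:
            self.beta += 1

    def confidence(self) -> float:
        # Posterior mean of the machine's accuracy.
        return self.alpha / (self.alpha + self.beta)


def labeling_mode(estimate: AccuracyEstimate, threshold: float = 0.9) -> str:
    """Trust the machine only once its estimated accuracy clears the threshold."""
    return "machine" if estimate.confidence() >= threshold else "human"


if __name__ == "__main__":
    est = AccuracyEstimate()
    print(labeling_mode(est))  # → human (no evidence yet, estimate is 0.5)
    # Humans review 19 items: the machine matched them on 18.
    for machine, human in [("pos", "pos")] * 18 + [("neg", "pos")]:
        est.update(machine, human)
    print(round(est.confidence(), 3), labeling_mode(est))
```

Each human label both corrects the data and tightens the estimate, which mirrors the iterative loop described above. Using the posterior mean is the simplest choice; a more cautious system might compare a lower credible bound against the threshold instead.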

From data ingestion to publishing

Dozens of big data products are now available, their number is growing, and their capabilities are expanding as they evolve. Informatica, which provides a wide range of software products for data integration, master data management (MDM) and other uses, launched a product in November that it describes as the first integrated platform for big data management. Designed to combine big data integration, quality, governance and security, the platform is intended to handle everything from data ingestion to publishing. It provides wizards and mapping templates to direct data from many sources into either a data lake (see sidebar following article or on page 31, KMWorld, Vol. 26, Issue 1) or an operational data store. Its security component helps discover where sensitive data is stored and provides mechanisms for protecting it.
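
The article does not describe how Informatica's security component finds sensitive data. As a generic illustration only, and not Informatica's method, the Python sketch below scans column values for two assumed patterns (email addresses and U.S. Social Security numbers) and reports which table columns contain them. The input layout, pattern set and function name are hypothetical; production discovery tools use far richer detection.

```python
import re

# Illustrative patterns only; real discovery tools detect many more data types.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(tables):
    """Return sorted (table, column, kind) triples for columns with matching values.

    tables maps a table name to {column name: list of string values}.
    """
    hits = set()
    for table, columns in tables.items():
        for column, values in columns.items():
            for kind, pattern in PATTERNS.items():
                if any(pattern.search(v) for v in values):
                    hits.add((table, column, kind))
    return sorted(hits)


if __name__ == "__main__":
    sample = {
        "customers": {
            "name": ["Ann", "Bo"],
            "contact": ["ann@example.com", "n/a"],
            "tax_id": ["123-45-6789", ""],
        }
    }
    for hit in find_sensitive(sample):
        print(hit)
```

Reporting locations at the column level, rather than flagging individual values, matches the goal of discovering where sensitive data is stored so that protection can then be applied to those columns.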

Other products that take on the heavy lifting of big data management are likely to emerge, allowing organizations to focus on their core competencies and get value from the ever-increasing volumes of information now available.
