
Framing Unstructured Data for Business Analytics

Big data holds big value, though. By unlocking that value, economic efficiencies accrue across virtually every industry. The UK expects big data's contribution to the economy to grow to between eight and more than 10 times its 2011-quantified level over the next five years, according to a recent CEBR report.3 The report attributes these gains to more efficient matching of supply to demand and targeted customer marketing, lower supply chain costs through better anticipation of replenishment points and optimized stock and allocation, reduced fraud losses in government and healthcare, and innovation of new consumer products.

Exploiting a High-Performance Business Analytics Framework

The organizations that are succeeding with big data are the ones that treat it as a source that supplements what the company already has. The business analytics framework provides the guidance for developing an agile environment for integrating big data into existing business processes.

These integrated layers include:

  • Information management—for managing and governing the growing deluge of structured and unstructured data; ensuring data quality throughout and providing a unified approach to analytics management, data management and decision management;
  • Analytics—extensive range of advanced analytic methods to fuel evidence-based answers;
  • Business intelligence—providing the interface to the answers deployed in real-time and directly to mobile devices; and
  • Industry-specific business solutions that address common problems unique to specific markets in rapid time-to-value implementations.

The business analytics framework recognizes that there are specific requirements from information management in order to make information usable by analytic endeavors. From the analysis of such data, the results need to proliferate in interfaces that are familiar to decision-makers, and deployed into operating and information management systems to automate value delivery.

And within a high-performance analytics environment, organizations can move faster to seize upon this rapidly accumulating information and insight, delivering profitable business value before the insight loses relevancy.

One of the UK's largest banking organizations, with tens of millions of customers, has developed a high-performance analytics (HPA) environment and is now able to refresh individual customer models daily. The company's historic approach meant it took months for models to reach production, and consequently forecasts were out of date as soon as they were produced. This had a direct impact on customers, as mortgage and loan pricing could only be carried out on a monthly basis. Daily pricing is only possible using high-performance analytics, which negates the challenges posed by moving, transforming and preparing big data sets across environments. The business case for daily pricing of mortgages alone is worth many tens of millions to the bank.

From "Big" to "Relevant"

One of the biggest stumbling blocks to organizations applying business analytics is the quality of the data. Poor data quality, integrity and consistency are often cited as a fundamental roadblock in the adoption of business analytics. Ensuring that inaccuracies are removed and information is standardized is key to deriving value from applying business analytics.

"Analytic data quality" refers to the methods of creating quality data from inputs. Traditional methods for addressing data quality include: the elimination of duplicate materials (which can artificially inflate relative weightings in analysis); resolution of terms and references (which influence how materials are classified); and entity recognition for record standardization (which provides a common benchmark to associate raw inputs).
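Two of the methods above, record standardization and duplicate elimination, can be illustrated with a minimal sketch. The field names and normalization rules here are illustrative assumptions, not a specific vendor's implementation:

```python
# Minimal sketch of two analytic data quality steps: record
# standardization and duplicate elimination. Field names and
# normalization rules are illustrative assumptions.

def normalize(record: dict) -> dict:
    """Standardize a raw record so equivalent entries compare equal."""
    return {
        "name": " ".join(record["name"].lower().split()),
        "city": record["city"].strip().lower(),
    }

def deduplicate(records: list[dict]) -> list[dict]:
    """Drop duplicates that would inflate relative weightings in analysis."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(normalize(rec).items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

raw = [
    {"name": "Acme  Corp", "city": "London "},
    {"name": "acme corp", "city": "london"},  # duplicate once standardized
    {"name": "Globex Ltd", "city": "Leeds"},
]
print(len(deduplicate(raw)))  # 2
```

In practice, entity resolution uses fuzzy matching rather than exact keys, but the shape of the pipeline, normalize then compare, is the same.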

However, before inputs are even stored, entities and facts can be extracted from data streams to isolate the unstructured content that is relevant. Instead of collecting all the data and then querying to find the material of interest, organizations should restrict the informational space to begin with by using advanced linguistic rules.
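A rule-based pre-storage filter of the kind described above can be sketched as follows. The rules themselves (simple keyword patterns standing in for richer linguistic rules) and the sample stream are assumptions for illustration:

```python
# Hypothetical sketch: restrict the informational space up front by
# keeping only stream items that match extraction rules, instead of
# storing everything and querying afterwards. The keyword patterns
# below stand in for richer linguistic rules.
import re

RULES = [
    re.compile(r"\bmortgage\b", re.IGNORECASE),
    re.compile(r"\bcomplaint\b", re.IGNORECASE),
]

def is_relevant(text: str) -> bool:
    """True if any extraction rule matches the incoming item."""
    return any(rule.search(text) for rule in RULES)

stream = [
    "Customer filed a complaint about fees",
    "Weather is nice today",
    "New mortgage rate announced",
]
relevant = [msg for msg in stream if is_relevant(msg)]
print(relevant)  # only the complaint and mortgage items survive
```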

Concerns regarding informational privacy are paramount with social media and Web data, and can be protected when extraction is based on rules that explicitly detail personal information (such as generic concepts for people, social security numbers or addresses). If such information is to be held in some way, it can be downloaded only to approved, secured areas.
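An explicit rule for personal information, as described above, might look like the following sketch. The pattern covers only US-style social security numbers and is an assumption for illustration; real deployments combine many such rules:

```python
# Illustrative sketch: an explicit extraction rule for personal
# information (here, a US social security number pattern) so matches
# can be masked, or routed only to approved, secured storage.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_pii(text: str) -> str:
    """Replace any SSN-shaped token before the text is stored downstream."""
    return SSN_PATTERN.sub("[REDACTED]", text)

print(mask_pii("Applicant SSN 123-45-6789 on file."))
# Applicant SSN [REDACTED] on file.
```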

Data doesn't answer questions; analysis applied to data does. Once the universe of potentially relevant information is defined, the analytic process of the business analytics framework examines the inputs and provides quantified conclusions regarding their relevancy, impact and value to the business.

The benefits of extending advanced analysis to include text data are becoming obvious to many organizations. By assessing complaints and routing the information to lower-cost channels, organizations have seen 10% reductions in call volume per complaint. Classification accuracy has improved by 15%, to 90% accuracy over manual tagging of archives, and fraud losses have fallen by 15%. Workplace injuries have been reduced, better policies for government activities have been defined, and detrimental patient treatment has been minimized.

The analysis world is being re-architected with big data. "Big data provides gigantic statistical samples, which enhance analytic tool results," says Philip Russom, director of data management research for TDWI.

Despite confusion promoted in the technology industry, advanced analysis cannot be expressed in SQL-like statements. Examining more inputs for richer analysis, and reducing the time it takes to answer questions, changes the kinds of questions that can be asked. The scale a high-performance environment provides changes the complexity of problems that are tractable: using all of the text data, structuring it into categories to evaluate sentiment per product across the entire population, and optimizing marketing offers across the entire customer base all become possible.
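The per-product sentiment step mentioned above can be sketched in a few lines. The keyword lexicon, product names and comments are illustrative assumptions; production systems use far richer linguistic models:

```python
# Hypothetical sketch: categorize free-text comments by product and
# aggregate a sentiment score over the whole population. The lexicon
# and sample data are illustrative assumptions.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "slow", "hate"}

def sentiment(text: str) -> int:
    """Positive-minus-negative keyword count for one comment."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

comments = [
    ("mortgage", "Great rate, love it"),
    ("mortgage", "Approval was slow"),
    ("savings", "Good interest, good service"),
]

by_product: dict[str, int] = {}
for product, text in comments:
    by_product[product] = by_product.get(product, 0) + sentiment(text)
print(by_product)  # {'mortgage': 1, 'savings': 2}
```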

Sharing the New Knowledge

Analytic results inform decisions. They may inform the decisions made by systems, based on defined parameters or linguistic rules. They may also inform the decisions made by humans, who are exploring the output to find answers. Expanding that use while maintaining centralized control has been a challenge for IT departments that want to encourage self-service but have been confined to the pre-defined data structures and governance associated with BI tools.

Word clouds and interactive term maps are a couple of the more common methods for visualizing text data. They can be updated and examined over time, to understand topic trends and emerging items.
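The counting step behind a word cloud is straightforward term frequency. A minimal sketch, with an illustrative stop-word list and sample corpus that are assumptions rather than any product's behavior:

```python
# Sketch of the counting step behind a word cloud: term frequencies
# over a small corpus, with common stop words removed. The stop-word
# list and documents are illustrative assumptions.
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of", "and", "to"}

def term_frequencies(docs: list[str]) -> Counter:
    """Count non-stop-word terms across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(w for w in doc.lower().split() if w not in STOP_WORDS)
    return counts

docs = ["The service is great", "Great service and great staff"]
print(term_frequencies(docs).most_common(2))
# [('great', 3), ('service', 2)]
```

Re-running the same counts over successive time windows is what lets the visualization surface topic trends and emerging items.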

Similarly, outputs from text analytic processing are embedded into other systems: to enrich the metadata of a collection for more relevant search and content management retrieval, to capture semantic term relationships, and to score new records against the models for current opinions and probabilities.
