COLD: a data warehouse/data mining alternative
Computer documents represent the primary corporate memory in today's environment. They are both the current and the historical reference to internal corporate activity, as well as the primary method of communicating with customers. COLD (computer output to laser disc) systems have become the standard for the electronic storage, retrieval and on-demand printing of those internal reports and outbound customer documents.
Although many industry observers believe that online record-oriented systems will soon obviate the need for computer output reports, others believe that the document will remain the output standard. That argument cannot be resolved in this article; however, it is a fact that documents will continue to be required for both legal and accounting purposes. Online databases do not substitute for the "point-in-time" legal and accounting record of an individual transaction. Mandated life cycles--for customer documents such as phone bills, utility bills, invoices and bank statements, which frequently must be kept in excess of seven years--continue to exist.
Documents, however, have been of little value when a user wants a different view of the information contained in page format. That is the flaw in document-oriented output--it is frozen data; there is no easy way to analyze the data to obtain other business intelligence contained within the document. That has changed: new software tools now let that heretofore static print output be repositioned for data mining or, as COLD vendors call it, "report mining." As a result, the COLD document warehouse has new and significant information value.
Report mining--an alternative for data analysis
The concept of report mining uses an existing COLD document repository as the information database. Consider the following facts:
- COLD systems store documents in the same page format in which they were originally created. That means that the data is in known locations--in specific rows and columns on a report.
- Documents are a compilation of the database at a point in time; the data to be analyzed already exists on the application output document.
- Report output provides data that is "clean"; it does not contain the inconsistencies that exist in the core database.
- COLD systems store data for long periods of time, thus providing a rich warehouse of history.
Report mining--searching, locating and extracting specific information--can be accomplished from either internal reports or outbound customer documents, thereby leveraging the known row/column document (data) structure. Different output can be searched independently and the results pooled to create the necessary database of information.
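The idea of pulling data from known rows and columns can be illustrated with a minimal sketch. It assumes a stored page is available as plain text with fixed-width columns; the layout offsets, the sample page and the function names are hypothetical and do not come from any particular COLD product.

```python
# Hypothetical column layout for one report type:
# field name -> (start, end) character offsets on each detail line.
LAYOUT = {
    "date": (0, 10),
    "description": (11, 31),
    "amount": (32, 42),
}


def extract_fields(line, layout=LAYOUT):
    """Slice one fixed-width report line into named fields."""
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}


def mine_page(page, first_row, last_row):
    """Extract fields from the detail rows of a stored page.

    Rows outside first_row..last_row (headers, footers) are skipped,
    as are blank lines.
    """
    lines = page.splitlines()
    return [extract_fields(line) for line in lines[first_row:last_row + 1] if line.strip()]


# Made-up sample page; detail lines are built with format widths that match LAYOUT.
SAMPLE_PAGE = "\n".join([
    "FIRST EXAMPLE BANK          STATEMENT PAGE 1",
    "DATE       DESCRIPTION          AMOUNT",
    f"{'1997-03-02':<10} {'PAYROLL DEPOSIT':<20} {'1250.00':>10}",
    f"{'1997-03-09':<10} {'ATM WITHDRAWAL':<20} {'-60.00':>10}",
    "END OF PAGE",
])

rows = mine_page(SAMPLE_PAGE, first_row=2, last_row=3)
```

Because the layout is described once per report type, the same description could be reused for every stored page of that report.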
The document is the database
For example, a bank statement contains all of the relevant information about a customer's transactions for a given period (normally a month). Furthermore, since the data is located in a known position on the statement, a search that is limited to specific rows and columns ensures that only the relevant data is located. (Unlike a standard text/string search, a column-limited search ignores data not related to the query.) A bank statement "deposit" column, for instance, can be searched while the "withdrawal" and "account balance" columns are ignored.
Since a key benefit of a COLD system is history, the search can be made over an extended time--two or three years or even longer. Analyzing historical customer activity for trends now becomes an important "added value" of the COLD system. A number of major financial institutions have installed COLD systems with the key objective of using the "statement database" to search and analyze customer activity over extended time periods. The COLD customer service system does double duty by becoming the data warehouse, which can be analyzed with a wide range of data mining software tools. That ability to reuse documents further enhances the strategy of the document as the dominant record in the organization.
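The difference between a column-limited search and a plain string search can be sketched as follows. The statement layout, the sample data and the helper names are assumptions made for illustration only.

```python
# Hypothetical statement layout: character offsets of each column.
COLUMNS = {
    "date": (0, 10),
    "deposit": (11, 21),
    "withdrawal": (22, 32),
    "balance": (33, 43),
}


def make_row(date, deposit="", withdrawal="", balance=""):
    """Build one fixed-width statement line that matches COLUMNS."""
    return f"{date:<10} {deposit:>10} {withdrawal:>10} {balance:>10}"


def column_values(lines, column):
    """Yield (date, amount) for every line with a value in the given column."""
    start, end = COLUMNS[column]
    for line in lines:
        cell = line[start:end].strip()
        if cell:
            yield line[0:10], float(cell)


# Made-up statement history, keyed by statement period.
STATEMENTS = {
    "1996-11": [
        make_row("1996-11-05", deposit="500.00", balance="1500.00"),
        make_row("1996-11-20", withdrawal="500.00", balance="1000.00"),
    ],
    "1997-02": [
        make_row("1997-02-03", deposit="750.00", balance="1750.00"),
    ],
}


def search_deposits(statements, minimum):
    """Search only the deposit column across all stored periods, oldest first."""
    hits = []
    for period in sorted(statements):
        for date, amount in column_values(statements[period], "deposit"):
            if amount >= minimum:
                hits.append((date, amount))
    return hits


large_deposits = search_deposits(STATEMENTS, 500.0)

# A plain string search for the same figure also matches withdrawal and balance cells.
text_hits = [line for lines in STATEMENTS.values() for line in lines if "500.00" in line]
```

In this sketch the column search returns only the two deposits, while the string search also returns the withdrawal line, because "500.00" appears there in a different column.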
The ability to reorder and manipulate data and to create a new report on the fly is now an extremely powerful COLD system byproduct. Report mining requires no new database, no re-engineering, no additional human resources and no significant additional cost. Because report mining software leverages the existing report infrastructure and information delivery system, it is simply "added value" to a COLD system.
For example, a credit card product manager is asked to present a report on all customers in San Francisco living in zip code area 94133 who have generated more than $5,000 per month in charge card revenue during 1997. That is the type of information that is generated from a data warehouse. It is also information that can be extracted from monthly statements stored in a COLD system using report mining software. Three facts make it worthwhile to consider a report mining strategy:
- Report mining is essentially a free byproduct of the COLD system.
- Data warehousing and data mining are extremely expensive with long implementation times.
- Data quality is a major cost element of data mining--output reports represent high-quality database information.
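The credit card query described above might be expressed roughly as follows once the relevant fields have been extracted from the stored monthly statements. The record format and the function name are hypothetical. The sketch also assumes that "more than $5,000 per month" means every 1997 statement on file exceeds the threshold; an average-based reading would need a different test.

```python
from collections import defaultdict

# Hypothetical records, as they might look after extraction from monthly statements:
# (customer id, zip code, statement month, charges for that month)
RECORDS = [
    ("C001", "94133", "1997-01", 6200.00),
    ("C001", "94133", "1997-02", 5400.00),
    ("C001", "94133", "1996-12", 100.00),   # outside 1997, ignored
    ("C002", "94133", "1997-01", 7000.00),
    ("C002", "94133", "1997-02", 1200.00),  # one month below the threshold
    ("C003", "94110", "1997-01", 9000.00),  # different zip code
]


def high_revenue_customers(records, zip_code, year, threshold):
    """Return customers in zip_code whose charges exceed threshold
    in every statement month on file for the given year (an assumption)."""
    monthly = defaultdict(dict)
    for customer, zip_, month, charges in records:
        if zip_ == zip_code and month.startswith(year):
            monthly[customer][month] = monthly[customer].get(month, 0.0) + charges
    return sorted(
        customer
        for customer, months in monthly.items()
        if all(total > threshold for total in months.values())
    )


result = high_revenue_customers(RECORDS, "94133", "1997", 5000.00)
```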
Report mining opens the door for a new use of computer output. Other examples include:
- searching ATM (automated teller machine) historical transaction reports to ascertain customer trends and use of specific ATM features;
- searching brokerage statements to provide a summary of purchases of a specific stock by a given client over any time period, thereby providing a personalized service;
- tracking loans from monthly statements by geographic locations to determine areas in which there is significant loan exposure or to pinpoint areas in which loan activity is low;
- extracting customer data from statements with late payments for matching with their credit scoring history to determine the accuracy of the initial screening process;
- performing product sales analysis on invoices by customer, product or sales representative to obtain totals by month, quarter or year to date.
Leveraging the COLD system to provide low-cost and high-value information may even prove to be a more significant benefit than that of the original application for which it was intended.
Full-featured report mining software will allow the user to:
- search across multiple reports or applications,
- search across an unlimited time period,
- create a new consolidated database from multiple reports,
- create and save templates of standard searches for repeated use,
- sort and rearrange data in any desired manner or format,
- create new totals and subtotals from columnar data,
- export data to any commercially available spreadsheet or database,
- create charts of the results of the new summary data.
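Several of the capabilities listed above (consolidating rows from multiple reports, sorting, computing subtotals and exporting to a spreadsheet-readable format) can be sketched with the Python standard library alone. The two source reports, their fields and the function names are hypothetical; commercial report mining software would offer these functions through its own interface.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows already extracted from two different stored reports.
INVOICE_ROWS = [
    {"month": "1997-01", "rep": "Lee", "product": "Widget", "amount": 1200.0},
    {"month": "1997-02", "rep": "Lee", "product": "Gadget", "amount": 800.0},
]
CREDIT_MEMO_ROWS = [
    {"month": "1997-01", "rep": "Lee", "product": "Widget", "amount": -200.0},
    {"month": "1997-01", "rep": "Ortiz", "product": "Gadget", "amount": 500.0},
]


def consolidate(*reports):
    """Pool rows from several reports and sort them by representative, then month."""
    rows = [row for report in reports for row in report]
    return sorted(rows, key=lambda r: (r["rep"], r["month"]))


def subtotals(rows, key):
    """Total the amount column for each distinct value of the given field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


def to_csv(rows):
    """Render rows as CSV text that a spreadsheet or database loader can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["month", "rep", "product", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


combined = consolidate(INVOICE_ROWS, CREDIT_MEMO_ROWS)
by_rep = subtotals(combined, "rep")
by_month = subtotals(combined, "month")
csv_text = to_csv(combined)
```

Saving the arguments passed to these functions would be one simple way to approximate a reusable search template.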
Some versions of report mining have been integrated with online analytical processing (OLAP) software. That allows columns or rows of data to be migrated from the COLD document warehouse to high-performance data mining software for even more sophisticated applications. A COLD system, in fact, can be implemented quickly and easily for a wide variety of enterprise data mining applications. That is a claim that cannot be made by traditional data warehouse systems.