Multichannel data capture matures
Data capture is steadily becoming more intelligent and flexible, and customers are seeing substantial productivity gains as a result. In particular, invoice recognition and handwriting recognition have both taken off, benefiting from today’s more sophisticated software. Distributed capture is seeing increased use, saving the time and expense of physically moving paper to a centralized location. Online data capture is growing, too, primarily in areas where no legacy paper is involved. Consequently, it is not surprising that considerable effort is also being directed toward integrating these multiple data streams into a single, meaningful picture.
Cost-effective invoice capture
Ista North America is a utility expense management company that conducts energy audits, manages the utility expense bill payment process, and provides energy and expense reporting to residential property owners and management companies.
Typically, ista North America monitors expense and consumption and then sets up a billing program that reflects each unit’s usage. Sub-metering allows costs to be allocated to individual apartment units in buildings that were previously billed as a whole, which provides an incentive for conservation. In addition, ista North America consolidates bills from multiple utilities, simplifying management for the owners.
Because utility bills arrived in paper form, ista North America had to scan and capture the information contained in them. In 2003, the company began using Datacap’s Taskmaster for Invoices to process bills received from about 1,700 utilities.
“The invoices have many different layouts,” says David Richitelli, VP of utility expense management at ista North America, “so we developed templates for the most frequently encountered layouts, accounting for about 50 percent of the invoices.” The rules-based engine used by Taskmaster for Invoices adapted to each document to identify and extract the required data.
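Neither ista’s templates nor Taskmaster’s rule syntax is described in the article, but the general approach Richitelli outlines (a dedicated template for each frequently seen layout, with generic rules for everything else) can be sketched as follows. This is a minimal illustration, not Datacap’s engine: the layout names, field names and regular expressions are all hypothetical, and the input is assumed to be text already produced by OCR.

```python
import re
from typing import Optional

# Hypothetical templates: each known invoice layout maps field names to patterns.
TEMPLATES = {
    "utility_a": {
        "account": re.compile(r"Account No\.?\s*(\d+)"),
        "amount_due": re.compile(r"Amount Due\s*\$?([\d,]+\.\d{2})"),
    },
    "utility_b": {
        "account": re.compile(r"Acct#\s*(\d+)"),
        "amount_due": re.compile(r"Total\s*\$?([\d,]+\.\d{2})"),
    },
}

# Hypothetical generic rules used when no template exists for a layout.
FALLBACK_RULES = {
    "account": re.compile(r"(?:Account|Acct)\D{0,10}(\d{6,})"),
    "amount_due": re.compile(r"(?:Amount Due|Total|Balance)\D{0,10}([\d,]+\.\d{2})"),
}

def extract_fields(text: str, layout: Optional[str]) -> dict:
    """Extract fields with the layout's template, or the fallback rules if none exists."""
    rules = TEMPLATES.get(layout, FALLBACK_RULES)
    fields = {}
    for name, pattern in rules.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None  # None flags a missing field
    return fields
```

Any field that comes back as None would be a natural candidate for manual review.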
The implementation of Taskmaster doubled the rate at which invoices were processed. As a result, more staff resources were available to review and analyze billings to monitor potential inefficiencies and keep customers’ energy costs low.
“By August 2007,” Richitelli adds, “we will be using the new technology in Taskmaster 7, which will allow us to capture 99 percent of our clients’ invoices through OCR.” Taskmaster 7 includes dynamic natural analysis (DNA), which accommodates invoices with highly variable layouts.
Ista North America also began offering property managers on-site scanners so that the invoices can be captured remotely and uploaded through a Taskmaster solution that uses a browser-based interface. The system eliminates mailing time (leaving more time to focus on auditing the bill), reduces data capture errors, provides a high-quality bill image, cuts postage costs and lessens the likelihood of invoices being lost in the mail.
For a variety of reasons, paper remains the medium of choice for many transactions, yet businesses now operate on digital data. Therefore, converting paper into electronic transactions has become a mission-critical function for many businesses.
“Only a small fraction of traditional transactions have gone online,” says Scott Blau, CEO of Datacap, “although for new applications, such as online banking, data is captured online from the beginning.”
In many cases, data arrives through multiple channels, such as fax, e-mail and paper. “Having multiple channels is a driver for greater efficiency,” Blau explains, “because the task of integrating the data is added to the task of capturing it.”
The strongest trend in the data capture industry, according to Blau, is a move toward larger and more decoupled distributed systems to fit service-oriented architecture (SOA) requirements. Datacap provides modules for FileNet and Kofax, and offers two products as Web services: Rulerunner Web Service, which runs capture rules, and Wordfire Classify, which analyzes content.
Rulerunner Web Service decouples all the core capture functions, including image enhancement, classification, recognition, validation and export. Wordfire Classify, based on technology from Content Analyst Company LLC, can process a stack of unstructured documents and determine what category each one belongs in. That ability eliminates the need to manually insert separator pages. “Once the core capture software is decoupled from the platform,” Blau explains, “users can select the services they want.”
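To illustrate what decoupling the core capture functions means in practice, the sketch below registers each function as an independent service and lets the caller choose which ones to run. It is a hypothetical model, not the Rulerunner or Wordfire API: every stage body is a placeholder, and the keyword classifier merely stands in for real content analysis. Because each page is classified on its own, a mixed stack can be processed without separator sheets.

```python
from typing import Callable, Dict, List

# A document is modeled as a plain dict; each stage returns an updated copy.
Document = Dict[str, object]
Stage = Callable[[Document], Document]

def enhance(doc: Document) -> Document:
    # Placeholder: a real service would deskew, despeckle or binarize the image.
    return {**doc, "enhanced": True}

def classify(doc: Document) -> Document:
    # Placeholder keyword rule standing in for content-based classification.
    text = str(doc.get("text", "")).lower()
    return {**doc, "category": "invoice" if "amount due" in text else "other"}

def recognize(doc: Document) -> Document:
    # Placeholder: treat the text already attached to the doc as OCR output.
    return {**doc, "fields": {"raw_text": doc.get("text", "")}}

def validate(doc: Document) -> Document:
    # Placeholder check: a document is valid if it carries any text at all.
    return {**doc, "valid": bool(doc.get("text"))}

def export(doc: Document) -> Document:
    # Placeholder: a real service would write to a content repository.
    return {**doc, "exported": True}

# Registry of independently callable services (hypothetical names).
SERVICES: Dict[str, Stage] = {
    "enhance": enhance,
    "classify": classify,
    "recognize": recognize,
    "validate": validate,
    "export": export,
}

def run_pipeline(doc: Document, selected: List[str]) -> Document:
    """Apply only the services the caller selects, in the given order."""
    for name in selected:
        doc = SERVICES[name](doc)
    return doc

# Each page in a mixed stack is classified individually; no separator pages needed.
stack = [{"text": "Amount due: $42.00"}, {"text": "Dear customer"}]
results = [run_pipeline(page, ["classify", "export"]) for page in stack]
```

A user who only needs classification and export, as in the last lines, never invokes the other services.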
The handwriting on the form
Communications Data Services (CDS) is a service bureau that provides imaging data capture, subscription and order fulfillment, product distribution and marketing services for its clients. The company developed its own imaging system but wanted to add handwriting recognition to expedite the capture of information from order forms and other documents. After researching a number of solutions, CDS selected Parascript (parascript.com) as the best match for its requirements.
In the course of a typical year, CDS captures data from 30 million to 40 million handwritten forms, so improvements in efficiency have a significant impact.
“By using Parascript, we were able to reduce the forms we key to half the previous number,” says Carl Egger, director of input systems at CDS. One of the factors that influenced the company’s choice of Parascript was its ability to validate addresses against a postal database. That comparison increases the accuracy of the output by identifying non-existent addresses, misspellings and incorrect ZIP codes.
The recognition results for each form are assigned a confidence level. If the results do not match the postal database and other validation tables that CDS has developed, the form is routed to an image entry keyer who reads the text and keys in the correct data. In addition to addresses, Parascript can also recognize and validate credit card numbers, date fields and e-mail addresses.
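The routing logic CDS describes can be sketched as a simple decision: accept a recognized address only if it agrees with the postal database, otherwise send the form to a keyer. This is a hypothetical illustration, not Parascript’s product. The two-entry `POSTAL_DB` dictionary stands in for the USPS database, and combining the postal check with a confidence threshold (the assumed `CONFIDENCE_THRESHOLD` of 0.90) is an assumption about how the confidence level might be used.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a postal database: ZIP code -> (city, state).
POSTAL_DB = {
    "10001": ("NEW YORK", "NY"),
    "60601": ("CHICAGO", "IL"),
}

# Assumed cutoff; a production system would tune this on real data.
CONFIDENCE_THRESHOLD = 0.90

@dataclass
class RecognizedAddress:
    city: str
    state: str
    zip_code: str
    confidence: float  # recognizer's confidence, 0.0 to 1.0

def matches_postal_db(addr: RecognizedAddress) -> bool:
    """True if the ZIP code exists and agrees with the recognized city and state."""
    expected = POSTAL_DB.get(addr.zip_code)
    return expected == (addr.city.strip().upper(), addr.state.strip().upper())

def route(addr: RecognizedAddress) -> str:
    """Accept confident, validated results; send everything else to a keyer."""
    if addr.confidence >= CONFIDENCE_THRESHOLD and matches_postal_db(addr):
        return "accept"
    return "keyer"
```

A misread ZIP code such as 60602 for Chicago fails the lookup and goes to a keyer even when the recognizer is confident.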
CDS also receives checks with orders, and the written amounts on those checks must be validated. For that process it uses another Parascript product. The amount in the courtesy box is compared with the handwritten legal amount read by Parascript, Egger says, “and for this data, we have about an 85 percent match rate.”
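The check comparison can be illustrated by converting a legal amount, once recognized as text, into a number and comparing it with the courtesy amount. This sketch assumes the handwriting has already been read and that the legal amount follows the common “words and NN/100” convention; the small English number parser is a hypothetical helper written for illustration, not Parascript’s method.

```python
import re
from decimal import Decimal

# Word values for a small English number parser (hypothetical helper).
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}
FILLER = {"and", "dollar", "dollars"}

# Legal amounts are often written as "<words> and NN/100".
LEGAL_RE = re.compile(r"^(.*?)\s+and\s+(\d{1,2})/100\s*$", re.IGNORECASE)

def words_to_int(words: str) -> int:
    """Convert e.g. 'one hundred twenty-five' to 125."""
    total, current = 0, 0
    for token in words.lower().replace("-", " ").split():
        if token in FILLER:
            continue
        if token in UNITS:
            current += UNITS[token]
        elif token in TENS:
            current += TENS[token]
        elif token == "hundred":
            current *= 100
        elif token in SCALES:
            total += current * SCALES[token]
            current = 0
        else:
            raise ValueError(f"unrecognized word: {token!r}")
    return total + current

def parse_legal_amount(text: str) -> Decimal:
    """Parse the written (legal) amount, including an optional NN/100 cents part."""
    match = LEGAL_RE.match(text.strip())
    if match:
        words, cents = match.group(1), int(match.group(2))
    else:
        words, cents = text, 0
    return Decimal(words_to_int(words)) + Decimal(cents) / 100

def amounts_match(courtesy: str, legal: str) -> bool:
    """Compare the numeric courtesy amount with the written legal amount."""
    car = Decimal(courtesy.replace("$", "").replace(",", "").strip())
    return car == parse_legal_amount(legal)
```

Using `Decimal` rather than floats keeps the cents comparison exact; a mismatch would send the check to manual review.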
The products use context as much as possible to eliminate the need for manual keystrokes. “We use the USPS database for addresses, and the U.S. Census databases for names,” says Brian Ball, VP of the forms division at Parascript. “We apply filters and templates in much the same way as human readers do when they apply context to validate information or resolve ambiguity.”
Often underestimated, says Ball, is the importance of good forms design. “Many forms do not provide enough space for the