Intelligent data capture: a trend only beginning, Part 2
Part one of this article (KMWorld July/August 2006) covered key trends and some leading firms in the intelligent data capture (IDC) marketplace. IDC, which is also called intelligent document recognition (IDR), refers to the ability to scan documents or electronic pages that have no fixed layout and to extract data from specific fields to populate a database or business system. The documents handled by IDC/IDR can be unstructured, with varying layouts, or "semi-structured," with some fixed fields but mostly varying formats.
The IDC marketplace is active and dynamic, characterized by mergers, acquisitions, repositioning and new entrants. In the first part of this article, we covered ABBYY, Brainware, Document Strategies and its Xerox alliance, EMC Captiva, DICOM's Kofax unit and ReadSoft. Here are a few more leading players in this developing market:
AnyDoc Software made its name with its forms processing product, OCR for Forms, which gained a reputation in the industry as a durable, cost-effective, straightforward product for lower-volume demands. Renamed OCR for AnyDoc, the offering now includes the ability to process structured and unstructured forms. It contains a rules engine in which rules are created to help the system determine which data is valid during the recognition process (e.g., valid date ranges, two-way purchase order-invoice matching).
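To make the idea of validation rules concrete, here is a minimal sketch in Python of the two rule types mentioned above: a valid date range and a two-way purchase order-invoice match. It is an illustration only, not AnyDoc's implementation; the class names, field names and the zero-tolerance amount comparison are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class Invoice:
    # Hypothetical fields captured from a scanned invoice.
    po_number: str
    invoice_date: date
    total: Decimal


@dataclass
class PurchaseOrder:
    # Hypothetical record looked up in the business system.
    po_number: str
    total: Decimal


def date_in_range(inv, earliest, latest):
    """Rule 1: the recognized invoice date must fall inside an allowed window."""
    return earliest <= inv.invoice_date <= latest


def two_way_match(inv, po, tolerance=Decimal("0.00")):
    """Rule 2: the invoice must cite the PO and agree with its total."""
    same_po = inv.po_number == po.po_number
    same_total = abs(inv.total - po.total) <= tolerance
    return same_po and same_total


def validate(inv, po, earliest, latest):
    """Return the names of failed rules; an empty list means the data passed."""
    failures = []
    if not date_in_range(inv, earliest, latest):
        failures.append("date_range")
    if not two_way_match(inv, po):
        failures.append("po_match")
    return failures
```

In a sketch like this, a captured invoice that fails any rule would typically be routed to an operator for review rather than exported automatically.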
Similar to ReadSoft's product naming convention, AnyDoc offers various flavors of the product targeted at vertical or departmental applications like AnyDoc INVOICE, AnyDoc EOB, AnyDoc CLAIM and AnyDoc CLASSIFY (for mailroom applications).
AnyApp technology was introduced four to five years ago to provide a better way to capture and organize the common data that appears on semi-structured documents. AnyApp has also helped the firm with capturing and exporting data to business systems like SAP. QuickApp technology is used more for variable format/unstructured documents. On invoices, for example, data such as vendor ID numbers or total amount due are found in different locations in each invoice format, depending on which vendor provided the invoice.
Using QuickApp, AnyDoc INVOICE facilitates invoice processing. The data fields needed, such as "Amount Due" or "PO Number," are defined, and each invoice is scanned for any variation or combination of keywords that locates the data. The next time an invoice from that vendor is processed, the software will "remember" where it found the data, making data capture even more efficient. Hughes Supply, recently purchased by Home Depot, processes approximately 250,000 invoices per month using AnyDoc INVOICE.
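The keyword search and the "remembering" behavior described above can be illustrated with a short Python sketch. It is a simplified assumption of how such a lookup might work, not QuickApp's actual method: the keyword lists, the `layout_memory` dictionary and the function names are hypothetical, and the invoice is represented as plain lines of recognized text.

```python
# Hypothetical keyword variations for each field to be captured.
FIELD_KEYWORDS = {
    "Amount Due": ["amount due", "total due", "balance due"],
    "PO Number": ["po number", "po #", "purchase order"],
}

# vendor_id -> field name -> index of the line where the value was last found
layout_memory = {}


def _value_after(line, keyword):
    """Return the text that follows the keyword on this line, or None."""
    pos = line.lower().find(keyword)
    if pos == -1:
        return None
    rest = line[pos + len(keyword):]
    value = rest.strip(" :#\t")
    return value or None


def extract_field(lines, vendor_id, field):
    """Find a field by keyword, trying the vendor's remembered line first."""
    keywords = FIELD_KEYWORDS[field]
    remembered = layout_memory.get(vendor_id, {}).get(field)

    # Search order: the remembered line (if any), then every other line.
    order = list(range(len(lines)))
    if remembered is not None and remembered < len(lines):
        order.remove(remembered)
        order.insert(0, remembered)

    for i in order:
        for keyword in keywords:
            value = _value_after(lines[i], keyword)
            if value is not None:
                # Remember where this vendor's invoices carry the field.
                layout_memory.setdefault(vendor_id, {})[field] = i
                return value
    return None
```

For example, calling `extract_field(lines, "acme", "PO Number")` on a first invoice searches every line; on later invoices from the same hypothetical vendor "acme", the line that matched before is checked first.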
AnyDoc reports approximately 70 users of its unstructured form capability, especially focused in manufacturing and distribution and in the utility sector. More than 90% of its sales go through its reseller channel. What's AnyDoc's advantage? "The ability to read data accurately in a complex environment, like lengthy, variable format invoices," says Sam Schrage, VP of operations, marketing and international sales.
Datacap has been providing document capture and forms processing software solutions to organizations worldwide since 1988. Datacap claims several industry "firsts":
- first forms processing solution for the PC in 1989,
- first Web-based capture solution in 2000, and
- first forms processing components for FileNet Capture in 2003.
Datacap introduced a rules processing capability in its Taskmaster product several years ago, giving it the capability to process not only structured forms but also unstructured or "highly variable" documents.
"We don't distinguish between structured and variable format documents--you get the full capability for either when you buy our Taskmaster product," says Scott Blau, CEO of Datacap. "In fact, some of our customers, like Blue Cross Blue Shield of Arizona or Sharp Healthcare, start with forms processing and then have move to processing variable documents. The advantage is in lower training, implementation and support costs, since it's one product."
Taskmaster Web Service is a set of capture capabilities--including image processing, recognition, validation and export formatting--that are hosted in a Web service, simplifying integration. Written in Microsoft VBScript, the rules processing capability also facilitates data export formatting to populate databases like Oracle or SAP. Datacap is certified with SAP's Web-based NetWeaver product, which unifies integration technologies into a single platform and is pre-integrated with business applications, reducing the need for custom integration. The platform is based on industry standards and can be extended with commonly used development tools such as Java 2 Platform, Enterprise Edition (J2EE); Microsoft .NET; and IBM (ibm.com) WebSphere.