Document capture—a growing market for KM

By Harvey Spencer

Document capture has often been the poor, overlooked orphan of document imaging solutions. It's just not regarded as glamorous—although the fact is that some interesting and advanced KM technologies are now being applied to it. Conventional wisdom might hold that paper-based document capture software is in decline, but it has been growing consistently at more than 8% per annum, even through recessionary periods like the current one when overall IT spending growth turned negative.

In the recently available survey of the document capture market that we conducted with Strategy Partners, a United Kingdom-based analyst company, the market for capture software is set at $894 million, excluding hardware (scanner) sales and integration, and is expected to reach over $1 billion by 2006. What business rationale and technologies are fueling it?

Business documents are increasingly originated electronically, and business processes are mainly conducted electronically. Overall business paper production is gradually falling as companies convert to e-forms, mostly for internal usage now; in the future, more and more external documents will be electronic too. It seems clear, though, that electronic documents will not completely replace paper, given paper's convenience and universal understandability, even after the legal impediments are sorted out. But the low cost, the speed of movement and the tools for understanding content that you can apply to electronic documents are increasing the pressure on businesses to handle all their transactions more effectively. In addition, the cost of copying and storing paper continues to increase. That is driving a need to truncate paper-based transactions as quickly as possible into electronic images while making them as usable as electronically originated ones.

Capture has evolved through a number of stages:

departmental scan and index,

batch scanning with control sheets, and

incorporation of recognition technologies—optical character recognition (OCR) and barcode—to automate the indexes.

In each case, the paper needs to be moved and prepared prior to scanning it. In the first case, the paper is distributed to the relevant department where it is opened, prepared, scanned and indexed. It is a slow and labor-intensive task resulting in maybe a half-dozen or so documents being scanned each minute.

Batch scanning improved that process: dedicated preparation staff can batch documents at 750 to 1,000 sheets an hour at a cost of $8 an hour. Dedicated scanner operators can then run the scanners at rated speed (40 pages or more per minute) with minimal intervention. But the paper has to be assembled at centralized locations and sorted by document type, coded separator index pages have to be added (often at three levels), and a back-end process may be needed to key extra indexes. For applications that capture 5,000, 10,000 or more sheets of paper a day, it is an effective way to work.
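To put those figures in perspective, here is a rough back-of-the-envelope sketch of the daily preparation and scanning effort. The wage, preparation rates and scanner speed come from the paragraph above; the assumptions that each sheet is one scanned page and that the scanner runs without stoppages are ours, and the function names are purely illustrative.

```python
PREP_WAGE = 8.00          # dollars per hour, from the article
PREP_RATES = (750, 1000)  # sheets prepared per hour, from the article
SCAN_PPM = 40             # pages per minute, the low end of "40 pages or more"


def prep_hours(sheets: int, sheets_per_hour: int) -> float:
    """Hours of preparation labor for a given number of sheets."""
    return sheets / sheets_per_hour


def scan_hours(sheets: int, pages_per_minute: int = SCAN_PPM) -> float:
    """Scanner hours, assuming one page per sheet and no stoppages."""
    return sheets / (pages_per_minute * 60)


def daily_estimate(sheets: int) -> dict:
    """Best-case and worst-case preparation effort plus scanner time."""
    slow, fast = (prep_hours(sheets, rate) for rate in PREP_RATES)
    return {
        "prep_hours": (fast, slow),
        "prep_cost": (fast * PREP_WAGE, slow * PREP_WAGE),
        "scan_hours": scan_hours(sheets),
    }


for volume in (5000, 10000):
    print(volume, daily_estimate(volume))
```

Under these assumptions, 5,000 sheets a day works out to roughly 5 to 6.7 hours of preparation ($40 to about $53 in labor) and a little over two hours of scanner time, which suggests why the approach pays off only at volume.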

Incorporating recognition technologies allowed extra indexes to be located and converted automatically, thus reducing the need for such keying. In the case of forms processing solutions, OCR forms the basis of the capture process through applying a template to the needed forms and setting field-based rules for each element on the template. It is the fastest way to identify and capture data from paper forms.

The next step was to employ pattern recognition more widely to classify the documents automatically and then, based on an understanding of the form type, locate the index data without pre-sorting. This technology, which eliminates the need to manually set up templates, is still in early adoption, and most of the forms processing vendors have used it to create semi-automatic templates. The most advanced systems are starting to identify the forms in a similar way to humans, by using a variety of technologies including general pattern matching, complete OCR of the document and proximity searching. We call it intelligent document recognition, or IDR. It reduces setup time; it reduces preparation of forms; and it eliminates the need to insert, scan and remove extraneous separator pages.
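As a toy illustration of two of the ideas mentioned above, classifying a document from its full OCR text and then using a proximity search to locate an index value, consider the sketch below. It is not how any vendor's product works: the keyword profiles, function names and sample text are all hypothetical, and a real IDR system would combine far more signals.

```python
import re
from typing import Optional

# Hypothetical keyword profiles per document type (illustrative only).
PROFILES = {
    "invoice": {"invoice", "total", "due", "amount"},
    "eob": {"explanation", "benefits", "patient", "procedure"},
}


def classify(ocr_text: str) -> str:
    """Pick the document type whose keywords occur most often in the text."""
    words = re.findall(r"[a-z]+", ocr_text.lower())
    scores = {
        doc_type: sum(word in keywords for word in words)
        for doc_type, keywords in PROFILES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def find_near(ocr_text: str, label: str, pattern: str, window: int = 40) -> Optional[str]:
    """Proximity search: first match of `pattern` within `window`
    characters after `label`, or None if the label or value is missing."""
    pos = ocr_text.lower().find(label.lower())
    if pos < 0:
        return None
    start = pos + len(label)
    match = re.search(pattern, ocr_text[start:start + window])
    return match.group(0) if match else None


sample = "ACME Corp\nInvoice No: 10442\nAmount due: 1,250.00\nTotal 1,250.00"
doc_type = classify(sample)
invoice_number = find_near(sample, "Invoice No", r"\d+")
print(doc_type, invoice_number)
```

No template and no separator page is involved: the type is inferred from the content, and the index field is found by its position relative to a label rather than by fixed coordinates on the page.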

At the same time, users have started to focus on the costs and time associated with moving the paper to a centralized location. Clearly there are some applications such as remittances, insurance medical claims or tax returns where the physical documents are usually delivered in volume to a centralized site. Those types of users need centralized scanning. But many business documents, such as customer correspondence, orders or applications, are delivered to localized branches or small offices that cannot justify scanning 5,000 or more pages a day.

In addition, the majority of businesses in this country are small and only handle a few hundred documents a day, even if they are part of a larger process. For example, insurance claims or applications submitted through a broker require distributed capture before being sent to the insurer. To process those types of documents quickly and service the customer effectively, it is imperative to convert them into an electronic medium as fast as possible. But these offices do not have dedicated scanner and key entry operators earning $8 an hour; they have office workers, often earning $20 or $30 or more an hour.
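The wage gap matters more than it might first appear. As a rough sketch, suppose an office worker scans and indexes at the "half-dozen or so" documents a minute cited earlier for departmental scan and index. That throughput is our assumption for illustration; real office rates may well be lower, which would widen the gap further.

```python
DOCS_PER_MINUTE = 6  # assumption: the departmental scan-and-index rate cited earlier


def cost_per_document(hourly_wage: float, docs_per_minute: float = DOCS_PER_MINUTE) -> float:
    """Labor cost in dollars to scan and index one document."""
    return hourly_wage / (docs_per_minute * 60)


for wage in (8, 20, 30):
    print(f"${wage}/hour -> ${cost_per_document(wage):.3f} per document")
```

At the same throughput, the labor cost per document scales directly with the wage, so a $30-an-hour office worker costs 3.75 times as much per document as an $8-an-hour operator.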

For those offices, low-volume distributed scanning is the answer, and the less effort and setup time needed, the better. The key is IDR because users are typically not trained scanner operators; they want to truncate their documents into usable electronic documents available at their desktop with minimal buttons to push and minimal effort. IDR, as it evolves utilizing KM techniques, has the ability to query the scanned document for format and content, make the decisions concerning its importance and relevance, then deliver it to the desktop with copies as appropriate—all under automatic control.

It is clear that paper-based business documents will continue to decline in overall volume. But it is also imperative that the few that are left are managed as efficiently as electronically received documents, or the user company will be at a competitive disadvantage. As a result, we predict IDR will grow at 51% over the next few years. The losers will be traditional front-end batch scanning with separators, at a measly 0.2% growth; distributed manual office-based scan and index systems, at -0.3%; desktop individual capture, at 0.2%; and traditional templated forms processing, at -4.6%.
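To see how quickly such differences compound, the sketch below projects each segment from an arbitrary index of 100. It assumes the rates above are annual and stay constant for three years, which is our reading for illustration only; the segments also start from very different absolute sizes, which the index deliberately ignores.

```python
# Growth rates from the forecast above; treating them as annual is an assumption.
GROWTH = {
    "IDR": 0.51,
    "batch scanning with separators": 0.002,
    "distributed scan and index": -0.003,
    "desktop individual capture": 0.002,
    "templated forms processing": -0.046,
}


def project(index: float, rate: float, years: int) -> float:
    """Compound an index value at a constant annual rate."""
    return index * (1 + rate) ** years


after_three_years = {
    segment: round(project(100.0, rate, 3), 1) for segment, rate in GROWTH.items()
}

for segment, value in after_three_years.items():
    print(f"{segment}: {value}")
```

On those assumptions, IDR more than triples (to about 344) in three years, while templated forms processing shrinks to about 87 and the other segments barely move from 100.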

Leading specialist capture vendors such as Captiva and Kofax have already started to move, with Captiva's Digital Mailroom and with Kofax's Accounts Payable and its Mohomine purchase. Other vendors such as SER Solutions and SWT started with IDR, and the other forms processing vendors such as ReadSoft, Ceresoft, Datacap, Neurascript and AnyDoc are creating products, often starting with invoice recognition or Explanations of Benefits (EOBs).

The choice of those two applications is interesting. Both consist of forms that have an identifiable structure. For instance, an invoice has a customer account number, name and address, date and detailed line items with extensions and totals. An EOB has a patient ID and procedure codes with amounts. Both can extend over one page (part of a page in the case of EOBs) or many pages, and both require knowledge of, and integration with, the back-end application; in the case of invoices, that is almost always SAP.
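That identifiable structure is what makes automated checking possible. The sketch below models the invoice fields just listed and verifies that each extension equals quantity times unit price and that the extensions add up to the total. The class and field names are hypothetical, not taken from any capture product or from SAP.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class LineItem:
    description: str
    quantity: int
    unit_price: Decimal
    extension: Decimal  # line amount as read from the invoice


@dataclass
class Invoice:
    account_number: str
    customer_name: str
    customer_address: str
    invoice_date: str
    lines: List[LineItem]
    total: Decimal  # total as read from the invoice


def check_invoice(invoice: Invoice) -> List[str]:
    """Return a list of arithmetic inconsistencies in the captured data."""
    problems = []
    for number, line in enumerate(invoice.lines, start=1):
        if line.quantity * line.unit_price != line.extension:
            problems.append(f"line {number}: extension mismatch")
    if sum((line.extension for line in invoice.lines), Decimal("0")) != invoice.total:
        problems.append("total mismatch")
    return problems


captured = Invoice(
    account_number="A-1001",
    customer_name="Example Customer",
    customer_address="1 Example Street",
    invoice_date="2003-06-30",
    lines=[
        LineItem("Widgets", 2, Decimal("12.50"), Decimal("25.00")),
        LineItem("Service", 1, Decimal("100.00"), Decimal("100.00")),
    ],
    total=Decimal("725.00"),  # simulated OCR misread of 125.00
)
print(check_invoice(captured))
```

Here the simulated misread total is flagged as a mismatch, so only that field needs an operator's attention rather than the whole document.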

But the real market growth will be with the digital copier. Office digital copiers have the ability to act as shared copiers, printers and scanners. An increasing number have now been networked (estimated at 50%) and are being used as office printers, but currently a very small percentage is used for any form of scanning beyond e-mail or fax (5% to 10%). The reasons are numerous including unsuitable hardware controls and clumsy interfaces, but that is set to change fast. Software is improving, larger touch screens are being offered and there is a new interest on the part of the copier companies in using scanning and desktop document management as a differentiator in an overcrowded market.

IDR and KM lend themselves to that shared ad hoc scanning market and allow the documents to be scanned, sorted, classified, prioritized and routed automatically. It is the direction that technology is taking document capture, and there are side benefits. Paper documents are by definition less specific than electronic records. Many documents that are delivered to distributed offices contain useful information that is not captured now, because it is not directly relevant to the task for which the document is being used: capturing it is too expensive or time-consuming, or it is not relevant enough to the transaction in process. For example, mortgage applications usually contain a cover sheet and a wealth of background information concerning the applicant: bank statements, stocks held, other assets. Currently a bank just captures enough fields to process the application, but from a marketing and KM standpoint, the extra information is a trove that can be combined with demographics and information from other sources. That can be used to serve the customer more effectively.

Capture solutions combined with IDR technologies have a huge potential for growth in this market—even though paper-based business documents may be declining. That's why we're bullish on document capture.

......................................................................................................................................................................................

Harvey Spencer is president of Harvey Spencer Associates, which specializes in capture technologies and markets. E-mail: harvey@harveyspencer.com.
