The Business Imperatives for Full Color Imaging

It is ironic that imaging is the last remaining monochrome (black-and-white) application on the desktop. Virtually all other desktop applications are in color: browsers, word processors, spreadsheets, database tools and so on. In most cases there is no compelling reason for these applications to be in color, beyond accelerating user acceptance. For example, word processors are colorful even though most letters are laser-printed in black and white. Yet imaging has, until recently, been almost entirely black and white. Unlike word processing, imaging has a compelling external reason to use color: many scanned documents are colored documents in the first place. In this sense, imaging is a "natural" application of color.

Wide Applications

High-volume color scanning is revolutionizing document imaging. Previously, color imaging had been used for storing photographic images, such as photos of car accidents for automotive insurers and of buildings for leasing companies. The technology now exists for the high-volume capture of color forms, such as those used in insurance, finance, government registries, transportation companies and more.

The primary enablers have been the development of software to efficiently process JPEG (color) images, and the availability of high speed (5,000 page-per-hour) full-color scanners. With these basic components in place, the software technologies to perform extremely accurate color-based processes such as ICR/OCR, image enhancements, forms recognition and more become possible.

Color offers substantial advantages over monochrome Group 4 scanning in the following areas:

  • OCR and ICR recognition rates: Full-color scanning enables extremely precise "forms removal," allowing the variable data on forms to be far more easily processed via ICR/OCR. Faint writing, perhaps in colored ink, can be enhanced to improve the character quality, leading to improvements in OCR/ICR recognition rates;

  • Forms identification (automated and manual): With color scanning, the system "knows" the color of the forms. In a manual (paper) file, the different colors of different standard form types allow them to be easily identified. This is also true in color systems, where, for example, the software can recognize that a purple background indicates an application form, a pink background a tax form and so on. Manual forms identification through thumbnail images becomes practical with color, while it is of questionable value in black and white;

  • Image quality and readability: When a full-color image is captured, it is an exact representation of the original. With black-and-white scanning, it may be necessary to adjust the scanner contrast and threshold settings to "drop out" the form background, potentially causing the loss of faint handwriting or highlighting;

  • Scanning throughput: With mixed document sets, such as those received in a mailroom or those being prepared for a backfile conversion, black-and-white scanners may require adjustment to contrast and threshold settings on a document-by-document basis. With full-color scanning this is largely eliminated. Color scanners rated at 5,000 pages per hour provide an effective throughput approaching this figure, without any presorting or batching. Black-and-white scanners seldom achieve their rated throughput in "real-life" production environments unless presorting by type occurs, because the scanner operator is required to make continual adjustments to the scanner settings; and

  • User acceptance: The advantages of color are so compelling that it can be expected that the acceptance of this technology will be very rapid, and full-color imaging will quickly become the standard, as with virtually every other significant advancement in consumer technology such as operating systems, televisions, DVDs and wireless standards.

OCR and ICR Considerations

Optical Character Recognition (OCR) and Intelligent Character Recognition (ICR) are very similar. OCR usually refers to the automated capture of machine-printed information and ICR to the capture of (human) hand-printed data.

In many applications, color scanning dramatically improves the accuracy of ICR and OCR, substantially decreasing the labor required for manual processing and allowing "lights-out processing" for many types of forms-based transaction.

This is best illustrated with an example such as the US standard health claim form, the HCFA 1500. The same concepts can be applied to virtually all form-based data capture.

Black-and-white operation: With black-and-white scanning, there is usually a requirement to perform "forms removal" prior to OCR/ICR. This "drops out" the background, leaving only the variable information on the form. It is required to prevent the OCR/ICR engine from trying to read the pre-printed form background instead of the information written on the form, say "John Smith."

There are a few different ways of "removing" this background for black-and-white scanners. In the case of the HCFA form, the background is printed entirely in red, so the drop-out can be accomplished by simply placing a red filter in the optical path. Incidentally, this can be done because the HCFA form has only a single background color (light red). For forms with multiple colors, it is not so simple.

So black-and-white scanners used for OCR/ICR of HCFA forms often have a red filter in front of the digital camera component of the scanner, the CCD arrays. This ultimately makes everything fainter, depending upon how much red is in the area being examined, and the red background disappears entirely. But any black writing also becomes fainter. The scanner then immediately converts every part of the image to black or white dots.

Color operation: With full-color scanning, the forms removal occurs very differently. Rather than having a red filter which affects all parts of the image, the system finds those areas which are red and makes them white. Areas which are not red are made black. This is possible because the system has access to the full color of the original and can selectively process different areas.

In software, any colors can be enhanced or suppressed, as appropriate to the nature of the form and the data to be interpreted. Color substitution and mapping can occur on a pixel-by-pixel basis, rather than indiscriminately over the whole form (both background and data). The net result is far clearer characters, allowing faster and more accurate data capture and OCR/ICR recognition.
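The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's algorithm: the red filter is modeled as a sensor that sees only the red channel before thresholding, and the function names, the thresholds and the sample pixel values are all hypothetical. The color version also treats near-white paper as background, which the description above leaves implicit.

```python
# Each pixel is an (R, G, B) tuple with channel values from 0 to 255.
WHITE, BLACK = 1, 0


def red_filter_scan(image, threshold=128):
    """Simplified model of a black-and-white scanner with a red optical filter.

    Assumption: behind the filter the sensor responds only to the red channel,
    and the scanner then thresholds that single value to black or white.
    """
    return [[WHITE if r >= threshold else BLACK for (r, g, b) in row]
            for row in image]


def is_reddish(pixel, min_red=150, margin=60):
    """Hypothetical test for form-background red: a strong red channel that
    clearly dominates green and blue."""
    r, g, b = pixel
    return r >= min_red and r - max(g, b) >= margin


def color_dropout(image, paper_level=230):
    """Per-pixel drop-out on a full-color image.

    Red form background and near-white paper become white; everything else
    is treated as written data and becomes black.
    """
    result = []
    for row in image:
        out_row = []
        for pixel in row:
            brightness = sum(pixel) // 3
            if is_reddish(pixel) or brightness >= paper_level:
                out_row.append(WHITE)
            else:
                out_row.append(BLACK)
        result.append(out_row)
    return result


# One scan line: blank paper, red form rule, black ink, faint blue ink.
scan_line = [[(255, 255, 255), (230, 120, 120), (20, 20, 20), (150, 160, 220)]]

print(red_filter_scan(scan_line))  # [[1, 1, 0, 1]]  faint blue ink is lost
print(color_dropout(scan_line))    # [[1, 1, 0, 0]]  faint blue ink is kept
```

In this toy scan line both methods remove the red rule, but only the per-pixel version keeps the faint blue writing, because it decides on the full color of each pixel rather than on a single filtered brightness value.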

Forms recognition and automated sorting: It is very difficult, usually impossible, to electronically sort form types with black-and-white scanning. But it is very practical with color. This ability matters when several different form types are received in a mailroom, as is very common in many processing environments.

Consider, for example, two red forms: the HCFA and the standard Dental Claim Form. With black-and-white scanning, the entire red form background is lost, including the pre-printed words "Dental Claim Form" and "Health Insurance Claim Form." With only the variable information remaining, automated determination of the form type is virtually impossible. These must be manually sorted. Further, facilities must exist for exception handling (what happens if a single form or entire batch is scanned in error?). In color, of course, it is simple: the full claim background is available to the system, and it can, for example, simply OCR the pre-printed red form name.

This is leading to a dramatic change in the design of OCR-readable forms. Different colors are a major problem for black-and-white systems, so forms designers reduce all form backgrounds to common pastel shades, as has occurred with health claim forms. With color imaging, the color of the form provides valuable information on form type, so brightly colored forms are in many respects more easily processed.
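As a rough illustration of how form color can drive sorting, the Python sketch below averages the background pixels sampled from a scanned form and picks the closest entry in a table of known form colors. The reference colors, the ink cutoff and the function names are assumptions made up for this example; a production system would calibrate them against real scans.

```python
# Hypothetical reference backgrounds, as (R, G, B) values from 0 to 255.
REFERENCE_COLORS = {
    "application form": (200, 170, 220),  # purple background
    "tax form": (250, 200, 210),          # pink background
}


def background_color(pixels, ink_cutoff=100):
    """Average the sampled pixels, skipping dark ones that are likely ink."""
    background = [p for p in pixels if sum(p) // 3 >= ink_cutoff]
    if not background:  # sample was all ink; fall back to every pixel
        background = list(pixels)
    count = len(background)
    return tuple(sum(p[i] for p in background) // count for i in range(3))


def identify_form(pixels, reference_colors=REFERENCE_COLORS):
    """Return the form type whose reference color is nearest to the sample."""
    color = background_color(pixels)

    def distance_squared(name):
        return sum((a - b) ** 2 for a, b in zip(color, reference_colors[name]))

    return min(reference_colors, key=distance_squared)


# Sample from a scanned page: mostly pink background plus some black ink.
sample = [(248, 198, 212)] * 8 + [(20, 20, 20)] * 2
print(identify_form(sample))  # tax form
```

A bitonal scan would offer nothing to compare here, since every background pixel has already been reduced to white.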

Thumbnails: The use of color for forms identification was common prior to the introduction of black-and-white imaging. If, for example, all applications are green and all claims are blue, these can be quickly found in a paper folder file. An analogous facility exists in many imaging systems: the concept of the "thumbnail." This is a small postage stamp-sized representation of every document in a file. The user can click on a thumbnail and retrieve the full resolution image. This concept has seldom been of much use in black-and-white scanning. However, with color scanning thumbnails are very practical: they allow documents to be easily recognized by type.

WYSIWYS

WYSIWYS (What You Scan Is What You See) is as important a development in imaging as WYSIWYG (What You See Is What You Get) was for word processing. The two are closely related; virtually all word processors now implement WYSIWYG, allowing the on-screen display to exactly match the printed output. Similarly, color scanning allows the on-screen display of images to exactly match the (scanned) input.

WYSIWYS obviously has enormous implications for the usability of imaging systems, just as WYSIWYG did for word processing. With full-color scanning, a 24-bit color image of the original document is stored, so faint handwritten notes are not lost, highlighting is preserved (in some black-and-white systems highlighting is turned to black, so the most important information can be lost entirely), the different colored pens used to create a document can be easily identified, and so on. It is a true representation of the original document, with nothing added or removed.

Scanning and backfile conversion: Scanning can usually be accomplished far faster in color than in black and white. This is because color scanners do not require frequent adjustments to the contrast and threshold to ensure good quality images. A typical mailroom receives a wide variety of standard forms and documents that need to be scanned. Sorting these into batches of similar documents is labor-intensive, both for the physical sort and the subsequent increase in indexing effort (if a mail item is separated into two documents in two different batches, the quantity of indexing doubles). If, however, the mail items are scanned together, black-and-white scanners may require frequent changes to the contrast and threshold settings to ensure acceptable image quality.

Color eliminates this problem. Mixed document sets, with widely varying colors, darkness, contrast, etc. can be scanned together in a batch without individual adjustments for each document. This substantially increases effective scan rates, and accordingly decreases the operational cost of scanning and the amount of manual indexing.

While this is important for normal "day-to-day" scanning, it is critical in backfile conversions, where a large number of documents, often of varying ages and types, need to be scanned. Color can make an enormous difference to the throughput of the scanning operation, while preserving all details of the (often very poor quality) historical documents.

User acceptance: Imaging system users accept color images far more readily than they do black-and-white images. They are more "readable" than black and white, and a truer representation of the original.

File Sizes and Readability

One of the persistent myths of the imaging industry is that color images are many times larger than monochrome images.

Most people's experience of color images is with continuous-tone images, or photographs. At high resolutions, these images require a large amount of storage. They are also large in black and white if rendered as Group 4 TIFF images.

The situation for forms-based scanning is very different. Just as with black-and-white scanning, the compression algorithms in JPEG are far more efficient on forms-based information than on photographs.

Even allowing for this, a 200 dpi color image (JPEG) is usually considerably larger than a 200 dpi monochrome (Group 4) image. But this is a misleading comparison. A 200 dpi color image far exceeds the readability of a 200 dpi monochrome document. To achieve the same readability/legibility as a 200 dpi monochrome document, the color document may only need to be scanned at 80 dpi or less.

In other words, at the same scan resolution, the color image is larger than the monochrome image. But this is because more information about the image is stored, and this is information the human eye uses very effectively. Scan resolutions in color systems can often be far lower than in monochrome systems at the same legibility. Accordingly, the color file sizes may actually be lower.
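The effect of resolution can be checked with simple arithmetic on uncompressed pixel data. The Python sketch below assumes a US Letter page (8.5 x 11 inches), which the article does not specify, and deliberately stops short of compressed sizes: actual Group 4 and JPEG results depend on the document content and the compression settings, so no compression ratios are assumed here.

```python
def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed image size in bytes for a page scanned at the given dpi."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Assumed page size: US Letter, 8.5 x 11 inches.
mono_200 = raw_bytes(8.5, 11, 200, 1)    # 1-bit monochrome at 200 dpi
color_200 = raw_bytes(8.5, 11, 200, 24)  # 24-bit color at 200 dpi
color_80 = raw_bytes(8.5, 11, 80, 24)    # 24-bit color at 80 dpi

print(mono_200)              # 467500
print(color_200)             # 11220000
print(color_80)              # 1795200
print(color_200 / color_80)  # 6.25
```

Dropping from 200 dpi to 80 dpi cuts the raw color data by a factor of 6.25, the square of 200/80. Under these assumptions the raw 80 dpi color page is still about 3.84 times the raw 200 dpi monochrome page, so whether the stored color file ends up smaller depends on how well each format compresses the particular document.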

This all depends very much on the nature of the source documents. A black-and-white document compresses very effectively in TIFF Group 4. Color documents compress poorly in Group 4, and for these documents the color (JPEG) file sizes at similar levels of legibility can be smaller than the Group 4 file sizes.


As Vice President and General Manager of Vignette's Insurance and Healthcare Division, Bruce Milne is responsible for the spectrum of Vignette's market activities and solutions in these industries. Milne is a seven-year veteran of Vignette; prior to joining Vignette, Milne served as Marketing Director at PC Docs/Fulcrum, where he was responsible for Web marketing, industry solutions and competitive strategy.

Vignette helps organizations increase productivity, reduce costs, improve user experiences and manage risk. Vignette's solutions incorporate portal, integration, enterprise content management and collaboration capabilities that can rapidly deliver unique advantages through an open, scalable and adaptable architecture that integrates with legacy systems.
