Lexalytics announces breakthrough OCR error correction tool
Lexalytics has announced a patent-pending error correction tool for text data from optical character recognition (OCR) systems. Built in partnership with a leading RPA vendor and leveraging Lexalytics’ natural language processing (NLP) platform and proprietary machine learning tools, the OCR error correction system can automatically detect and rectify common mistakes made by OCR software, driving word error rates to less than 1%. This improves the reliability and utility of downstream analysis performed on OCR data and lowers non-compliance risk for the firms that use these tools.
Business-critical information is contained in images of physical text, such as scanned paper documents or smartphone snapshots of invoices, contracts, newsprint, applications, bills and loans, among other materials. OCR software converts these images of text into electronic text, making it available for computers to “read” for all of the processing tasks that modern enterprises conduct. The problem, says Lexalytics, is that OCR software often misrecognizes characters and words, which can lead to costly downstream application problems requiring time-intensive, manual correction.
Lexalytics’ OCR error correction solution combines pixel position analysis, which identifies character errors, with specialized dictionaries built into Lexalytics’ Salience text analytics engine, which are used to choose the most likely correction. The next stage of development will add contextual language models and machine learning techniques to further improve accuracy.
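To illustrate the general idea of dictionary-based correction, the following is a minimal sketch in Python. It is not Lexalytics’ implementation and does not use the Salience API: the confusion table, the lexicon and its frequency counts, and the function names are all hypothetical, and a simple table of look-alike characters stands in for the pixel position analysis the company describes.

```python
from itertools import product

# Hypothetical table of characters that OCR commonly confuses (illustrative only).
CONFUSABLE = {
    "0": "o",
    "1": "li",
    "5": "s",
    "8": "b",
}

# Hypothetical domain dictionary with made-up frequency counts.
LEXICON = {"invoice": 120, "loan": 95, "bill": 80, "total": 150, "balance": 60}


def candidates(token):
    """Yield every spelling obtained by swapping confusable characters."""
    # Each position may keep its character or take any look-alike.
    options = [ch + CONFUSABLE.get(ch, "") for ch in token.lower()]
    for combo in product(*options):
        yield "".join(combo)


def correct(token, lexicon=LEXICON):
    """Return the most frequent dictionary word reachable from the token.

    Tokens are lowercased; a token with no dictionary match is returned as-is.
    """
    lowered = token.lower()
    if lowered in lexicon:
        return lowered
    matches = [c for c in candidates(token) if c in lexicon]
    if not matches:
        return token
    return max(matches, key=lambda word: lexicon[word])


if __name__ == "__main__":
    for raw in ["inv0ice", "1oan", "bi11", "t0ta1", "xyz"]:
        print(raw, "->", correct(raw))
```

The candidate set grows exponentially with the number of confusable characters in a token, so a sketch like this only suits short words; a production system would need a more selective search and, as the company notes, contextual language models.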
“While OCR is a rapidly growing market, driven by demand in the banking, insurance and financial services sectors, word-level accuracy errors create major problems for end users and represent a major challenge,” said Jeff Catlin, CEO of Lexalytics. “We’re excited to bring a fresh approach to the problem and proud to achieve such great accuracy numbers in the tests we’ve run.”
The OCR correction module is available as an add-on component to Lexalytics’ core Salience NLP engine. For more information, go to http://success.lexalytics.com/ocr.