Top considerations for ECM and content services
The staples of content services, the modern incarnation of enterprise content management (ECM), remain largely unchanged. Metadata management, process automation, information governance, and security are still the foundation of countless document-based workflows, business processes, and knowledge management activities.
What continues to progress, however, is how the field empowers organizations to meet these needs. Applications of cognitive computing, hybrid cloud deployments, business connectors, APIs, and productivity intelligence are enabling the enterprise to perform these tasks at scale, cost-effectively, and within the decentralized work paradigm that has so far defined the new decade.
“SaaS solutions in the cloud, artificial intelligence, and big data analytics are where content services is headed,” observed Larry Reynolds, vice president of sales, GRM Information Management. “We still see that most Fortune 500 firms and most governmental entities still struggle with a lot of old-school, document-based ECM applications. We’re trying to bring them to AI-enablement, big data-driven content management.”
Ultimately, content services’ contemporary reliance on advanced analytics and cloud connectivity means more than an improved ability to meet its goals—which are increasingly taking the form of process automation. These fresh approaches are also creating a new reality in which developments such as voice, mobile technologies, and predictive analytics are expanding what the field is capable of doing in ways that didn’t seem possible even a few years ago.
In terms of document-based workflows and process automation, Fred Sass, senior director of product marketing for content services at OpenText, defined the four pillars of content intelligence as “capture, manage, search and find, and govern in terms of managing it through its lifecycle and keeping it or disposing of it.” Essential to these pillars are capabilities for extracting, classifying, and tagging information to make it available for workflows. Content intelligence relies on various cognitive computing technologies to inform these phases. Its objective is to derive structure from unstructured content so it’s usable in downstream applications. Content intelligence includes these critical facets:
♦ Data capture: Ingesting or capturing content initiates workflows by putting this information into IT systems that understand it. Capture involves “machine learning to improve the accuracy of capturing documents over time so more of it flows through untouched,” Sass said. Capture frequently utilizes optical character recognition (OCR) or intelligent character recognition (ICR) and encompasses paper, emails, web forms, print screens, and more. Healthcare use cases include “HL7, XML, X12, images, or any other format,” Reynolds added.
♦ Extraction: Extraction requires copying or “lifting” specific information from content for purposes such as classification or filing. It frequently uses cognitive computing models to “extract meaningful objects and the meaning behind that from the document,” said Prince Kohli, chief technology officer of Automation Anywhere.
♦ Classification: AI—along with other techniques such as cross-referencing lists and mining for content with regular expressions—enables organizations to automatically classify and tag content. Regulatory concerns may make it difficult for organizations to find large enough datasets on which to train AI models. Credible solutions allow “people to classify data, and they classify implicitly as part of filing,” explained David Robertson, director of sales engineering at M-Files. “It means the machine learning algorithm is effectively self-reinforcing and has a very large training dataset of real data in the system that it can learn from. The algorithms are trained in the data you are storing in the system.”
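The non-AI techniques mentioned above, cross-referencing lists and mining content with regular expressions, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the vendor list, the patterns, the tag names, and the `classify` function are hypothetical and do not reflect any vendor's product.

```python
import re

# Hypothetical reference list used for cross-referencing (an assumption).
KNOWN_VENDORS = {"acme corp", "globex"}

# Hypothetical regular-expression rules: (tag, compiled pattern).
PATTERN_RULES = [
    ("invoice", re.compile(r"\binvoice\s*(?:no\.?|number|#)\s*:?\s*\w+", re.IGNORECASE)),
    ("contains_ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("contract", re.compile(r"\b(?:agreement|contract)\b", re.IGNORECASE)),
]


def classify(text):
    """Return a sorted list of tags that apply to the given text."""
    tags = set()

    # Regular-expression mining: add a tag for every rule that matches.
    for tag, pattern in PATTERN_RULES:
        if pattern.search(text):
            tags.add(tag)

    # List cross-referencing: tag any known vendor named in the text.
    lowered = text.lower()
    for vendor in KNOWN_VENDORS:
        if vendor in lowered:
            tags.add("vendor:" + vendor)

    return sorted(tags)


if __name__ == "__main__":
    print(classify("Invoice No. 4417 from Acme Corp, total due $1,200"))
    print(classify("Employee SSN: 123-45-6789"))
```

Rules like these are transparent and need no training data, which may matter where regulation limits the datasets available for training models. They would typically complement learned classifiers rather than replace them.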
According to Kurt Rapelje, director of analyst relations and product support at Laserfiche, “Records management is a whole discipline of categorizing content into required records retention rules that are typically established by a state, for example, for government.”
This is a vital downstream application (and natural extension) of content intelligence, whose capabilities hinge on the following considerations:
♦ Supervised and unsupervised learning: Supervised learning, which requires labeled training data, is invaluable for extracting information. This approach enables organizations to readily extract the requisite fields from invoices (dates, total amounts, the company receiving payment, and so on) regardless of where those fields appear on the form. E-discovery and other use cases, by contrast, benefit from unsupervised learning. “Unsupervised learning works best for classifications where there’s a whole set of documents and you’re not really sure what those documents are about,” Rapelje remarked. “You let the subject of those documents emerge based on machine learning that’s looking for commonalities amongst them.”
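The idea of letting subjects emerge from commonalities can be shown with a toy Python sketch. It is not a production machine learning model: it simply groups unlabeled documents by word overlap (Jaccard similarity), a deliberately simple stand-in for real clustering algorithms. The function names, the sample documents, and the 0.3 threshold are all assumptions chosen for illustration.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_documents(docs, threshold=0.3):
    """Greedily group documents; no labels are supplied.

    Each document joins the first group whose first member is similar
    enough, otherwise it starts a new group. Returns lists of indices.
    """
    groups = []
    for i, doc in enumerate(docs):
        doc_tokens = tokens(doc)
        for group in groups:
            representative = tokens(docs[group[0]])
            if jaccard(doc_tokens, representative) >= threshold:
                group.append(i)
                break
        else:
            # No existing group was similar enough.
            groups.append([i])
    return groups


if __name__ == "__main__":
    sample = [
        "lease agreement for office space",
        "office lease agreement renewal",
        "patient lab results and diagnosis",
        "lab results for patient follow-up",
    ]
    print(group_documents(sample))
```

With the sample documents, the lease-related and lab-related texts fall into separate groups without anyone labeling them first. A real deployment would likely use richer text representations and an established clustering method.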