  • October 28, 2015
  • By Greg Council, VP of Marketing and Product Management, Parascript, and Derek LeCroy, Account Executive in the Canon USA, Inc.
    Image Capture Solutions Division
  • Article

Five Essentials for Overcoming Document Management Challenges

Data is every organization’s most important asset. And yet, managing ever-increasing volumes of documents and their metadata remains a surprisingly complicated and thorny issue. It requires identifying and applying document descriptors, ensuring proper documentation is collected and shared, and incorporating access to the boxes of forms and paper, often kept in-house or off-site, that organizations must retain for audit and compliance purposes.

The amount of document-based information has grown significantly over the last two decades, and it is all but impossible for staff to keep up with the volume, curate metadata (a set of descriptions), and apply those descriptions. The days of the file clerk are well behind us. More recent approaches have attempted to recruit everyday knowledge workers to perform the task of assigning metadata. Organizations then experimented with what we call the “bucket of documents” approach, where staff send documents to the repository and let intelligent search handle the relevance and accuracy problem. This approach relied on the “infinite metadata” concept, in which anything on the document could be used to describe it and make it available in search results. In most cases, the result was only marginally better than storing documents in a file cabinet or electronic file share.

Most cataloging and search approaches result in some core problems:

Lack of time and staff. Organizations have neither the time nor the staff to plan for scoping document types and developing the document classes and metadata required for efficient, accurate retrieval.

Error-prone tasks. The ongoing assignment of documents using selected classes and metadata is either an expensive bottleneck or an inconsistent, error-prone task.

Inability to adapt. Once in place, the system is often too brittle to adapt to an organization’s changing needs and documents.

Inefficient operations. Relevance and accuracy of results are significantly below what is required to operate efficiently.

Addressing the Problem with Automation

In the last decade, a lot of technology has been introduced with the goal of helping us all with our information management problems. Much of it is aimed at automating the manual effort of both the initial design and the ongoing maintenance of document organization tasks. Automated classification that uses advanced machine learning can take on much of the burden of document organization. Natural language processing and other linguistic techniques further enable access to document content for purposes of metadata attribution. Faceted search helps users understand information hierarchies, and interactive search refines results by helping users formulate search terms through feedback (e.g., “did you mean apple the fruit or Apple the company?”).

The net result is that there is a bevy of options from which to choose, and the task of selecting the right technologies can be as difficult as the problem itself.

Fortunately, there are steps that can be taken before reviewing technology solutions that put organizations searching for a solution ahead of the game.

Understand the problem scope.
With document-oriented projects, start by understanding the key business processes and the documents involved on a department-by-department basis.

Perform interviews with key departments to understand the processes they perform. Be sure to rank the processes in order of their criticality to business operations.

Next, take an inventory of key documents used within or used to support the most-critical business processes. This gives you a high-level process map along with the document classes involved and how they are used. During this process it is also helpful to record the key data on each document that is needed.
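
The process ranking and document inventory described above can be recorded in a very simple structure. The following is a minimal sketch in Python, not a prescribed method: the `Process` class, the 1-to-5 criticality scale, and the sample departments and documents are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One business process recorded during a department interview."""
    name: str
    department: str
    criticality: int  # hypothetical scale: 1 = most critical, 5 = least
    # Maps each document class to the key data needed from that document.
    documents: dict = field(default_factory=dict)


def rank_processes(processes):
    """Return processes ordered from most to least critical."""
    return sorted(processes, key=lambda p: p.criticality)


def document_inventory(processes, top_n):
    """Inventory the document classes behind the top_n most-critical processes."""
    inventory = {}
    for process in rank_processes(processes)[:top_n]:
        for doc_class, key_data in process.documents.items():
            entry = inventory.setdefault(doc_class, {"used_by": [], "key_data": set()})
            entry["used_by"].append(process.name)
            entry["key_data"].update(key_data)
    return inventory


# Illustrative sample data only; not taken from any real organization.
processes = [
    Process("Employee onboarding", "HR", 3,
            {"W-4 form": ["employee name", "tax status"]}),
    Process("Invoice approval", "Finance", 1,
            {"Invoice": ["vendor", "amount"], "Purchase order": ["PO number"]}),
    Process("Vendor payment", "Finance", 2,
            {"Invoice": ["vendor", "due date"]}),
]

inventory = document_inventory(processes, top_n=2)
for doc_class, entry in inventory.items():
    print(doc_class, entry["used_by"], sorted(entry["key_data"]))
```

Even a spreadsheet can hold the same information; the point is that each document class is tied to the processes that use it and to the data those processes need from it.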

Inventory best possible terms.
Whether or not you have a document management system, staff tasked with information management can work with stakeholder departments to understand the common terms they use for different information. From there, perform interviews with staff to identify the words they use to describe this information. It may surprise you that two departments refer to the same document with different terms. If there’s software used for searching, ask the IT department for a list of the commonly used search terms and the document links clicked. Attempt to find the most-used words for each document type; this is the “best possible name” approach.
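
One way to arrive at a “best possible name” is simply to tally the terms gathered from interviews and search logs and keep the most frequent one for each document type. The sketch below assumes the observations have already been collected as (document type, term) pairs; the function name `best_possible_names` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def best_possible_names(observations):
    """Pick the most frequently used term for each document type.

    observations: iterable of (document_type, term) pairs, for example
    one pair per interview answer or per logged search term.
    """
    counts = defaultdict(Counter)
    for doc_type, term in observations:
        # Normalize case and whitespace so "Invoice" and "invoice " count together.
        counts[doc_type][term.strip().lower()] += 1
    return {doc_type: tally.most_common(1)[0][0] for doc_type, tally in counts.items()}


# Illustrative observations only.
observations = [
    ("vendor invoice", "invoice"),
    ("vendor invoice", "bill"),
    ("vendor invoice", "Invoice"),
    ("employment agreement", "contract"),
    ("employment agreement", "offer letter"),
    ("employment agreement", "Contract "),
]

names = best_possible_names(observations)
print(names)
```

The less frequent terms are worth keeping rather than discarding, since they show how different departments refer to the same document.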

Augment search capabilities.
Many technologies deal with semantics, and it is possible to simply use a thesaurus to augment the list generated in your department-level research with synonyms that could be used to find the same information. The result is a solid list of keywords from each department, mapped to variants that other staff might use, since you can’t interview everyone.
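
The thesaurus step can be pictured as a lookup table from each department keyword to its known variants. This is a minimal sketch under stated assumptions: the thesaurus here is a hand-built dictionary, and the names `expand_keywords` and `resolve` are hypothetical rather than part of any particular search product.

```python
def expand_keywords(keywords, thesaurus):
    """Map each department keyword to itself plus its thesaurus synonyms."""
    expanded = {}
    for keyword in keywords:
        key = keyword.lower()
        variants = {key}
        variants.update(s.lower() for s in thesaurus.get(key, []))
        expanded[key] = variants
    return expanded


def resolve(query, expanded):
    """Return the department keywords that a user's search term maps to."""
    q = query.strip().lower()
    return sorted(kw for kw, variants in expanded.items() if q in variants)


# Illustrative thesaurus entries only.
thesaurus = {
    "invoice": ["bill", "statement"],
    "contract": ["agreement"],
}

expanded = expand_keywords(["Invoice", "Contract"], thesaurus)
print(resolve("bill", expanded))
print(resolve("agreement", expanded))
```

A user who searches for a variant is then directed to the keyword the owning department actually uses.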

Going through these three steps will arm any organization with a solid understanding of key processes, documents, descriptions, and extensions of those descriptions. These are the foundational elements for moving forward with the selection of automation technology.

Automation Considerations

Many organizations are already planning on improving their document organization through some level of automation with content analytics. The options can be considered based upon the range of document-based information that must be managed.

On the input side, controlled types of documents and well-established processes may only require basic document capture and indexing. If information varies widely and is more difficult to classify (even by subject matter experts), a more advanced document classification capability is in order to handle the variance.

On the retrieval side, a more structured document hierarchy, automated metadata generation, well-defined processes and structured/faceted search combined with tools to help a user select the correct descriptions provide much more robust document retrieval.
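
To make the idea of structured or faceted search concrete, the sketch below counts documents per facet value and narrows a result set by the facets a user selects. It assumes metadata has already been attached to each document as a plain dictionary; the field names (`doc_class`, `department`, `year`) and the sample records are hypothetical.

```python
from collections import Counter


def facet_counts(documents, facet):
    """Count how many documents carry each value of a metadata facet."""
    return Counter(doc[facet] for doc in documents if facet in doc)


def filter_by_facets(documents, **selected):
    """Keep only documents whose metadata matches every selected facet value."""
    return [
        doc for doc in documents
        if all(doc.get(facet) == value for facet, value in selected.items())
    ]


# Illustrative records only.
documents = [
    {"title": "Acme invoice 1041", "doc_class": "invoice", "department": "finance", "year": 2015},
    {"title": "Acme invoice 1042", "doc_class": "invoice", "department": "finance", "year": 2014},
    {"title": "Offer letter", "doc_class": "contract", "department": "hr", "year": 2015},
]

print(facet_counts(documents, "doc_class"))
print([d["title"] for d in filter_by_facets(documents, doc_class="invoice", year=2015)])
```

The counts are what a search interface would show next to each facet value, and the filter is what runs when the user clicks one.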

Taking the first essential step toward improved management of your document-based information is often the most challenging. Armed with the knowledge of how your organization uses documents and how it references them during a given business process, the technology choices become clearer, as does the path forward.
