ECM: preparing for the future
Digital readiness is mandatory for today’s transformed workplace, but achieving it requires a number of steps that can be challenging. Content must be captured and ingested into a controlled environment, organized and tagged with metadata, and placed under proper governance. Only then can the organization be prepared for events such as e-discovery or make the most of content, including legacy content, to gain insights into its business operations and customers’ needs. With the complexity of hybrid environments that combine cloud and on-premises repositories and the growing use of big data analytics, having data ready for multiple uses has become increasingly important.
“Unstructured content is an extraordinarily valuable source of data for semantic analysis,” says Adam Howatson, chief marketing officer at OpenText, “and it is firmly rooted in the world of enterprise content management (ECM). Content management, semantic and big data analytics and cognitive computing are coming together, and we are seeing some truly amazing use cases emerging from that confluence.” Due to the volume of information that is typical of many repositories today, automated processing and machine learning are increasingly the best ways to extract useful information from the data available.
The Global Public Health Intelligence Network (GPHIN), for example, gathers information related to incidents that affect health throughout the world, including diseases and natural disasters. Reports are produced three times a day, and the gathered data is made available to users at the World Health Organization and other interested agencies. “About 20,000 documents are collected each day, which makes manual processing impossible,” Howatson says. “OpenText technology categorizes them and detects patterns in the data to help anticipate pandemics and other scenarios that will affect local or global health and impact the need for healthcare resources.”
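At its simplest, automated categorization assigns each incoming document to the topic it most resembles. The sketch below uses hand-picked keyword lists purely for illustration; the category names and terms are assumptions, and a production system such as GPHIN’s would rely on trained machine-learning models rather than keyword counts.

```python
# Hypothetical category keyword lists; illustrative only.
CATEGORY_TERMS = {
    "disease_outbreak": {"outbreak", "virus", "infection", "epidemic"},
    "natural_disaster": {"earthquake", "flood", "hurricane", "wildfire"},
}

def categorize(text):
    """Assign the category whose keywords appear most often in the text."""
    words = text.lower().split()
    scores = {
        category: sum(1 for w in words if w in terms)
        for category, terms in CATEGORY_TERMS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"

doc = "Officials confirmed a new virus outbreak after infection rates rose."
print(categorize(doc))  # disease_outbreak
```

Even this toy version shows why automation scales where manual review cannot: once the categories are defined, classifying 20,000 documents a day is a matter of compute, not headcount.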
The U.S. Department of the Interior (DOI) has developed a unified system for managing e-mail, records and documents in order to have an integrated information governance solution for employee records. The resulting Enterprise eArchive System (EES) runs in the OpenText Cloud. “Like the GPHIN, the DOI application uses auto-classification, pattern analysis and machine intelligence to organize millions of documents,” Howatson explains. “Through those techniques, OpenText technology can apply policies automatically and make the information useful and retrievable and extract themes from the content.”
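Applying governance policies automatically amounts to mapping each document’s classification to a retention rule. The actual EES policy model is not public, so the categories and retention periods below are purely illustrative assumptions:

```python
# Hypothetical retention policies keyed by record category.
RETENTION_POLICIES = {
    "personnel_record": {"retain_years": 65, "legal_hold": False},
    "email": {"retain_years": 7, "legal_hold": False},
}
DEFAULT_POLICY = {"retain_years": 3, "legal_hold": False}

def apply_policy(document):
    """Attach the retention policy matching the document's category."""
    category = document.get("category", "unknown")
    document["policy"] = RETENTION_POLICIES.get(category, DEFAULT_POLICY)
    return document

record = apply_policy({"id": "doc-001", "category": "email"})
print(record["policy"]["retain_years"])  # 7
```

The design point is that once auto-classification supplies the category, the policy lookup requires no human intervention, which is what makes governance feasible across millions of documents.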
Although the technology is readily available to process the data automatically, many companies do not have data scientists or computational linguists on staff, which can hamper the ability to extract deeply relevant information from unstructured data. “OpenText technologies have been developed to help companies extract the maximum value from every piece of content available,” Howatson says. “For those organizations that do not have specialist departments focused on deep data analytics, OpenText can provide the technology, solutions and skills to get them up and running.”
OpenText’s Release 16, which launched at the end of April, includes OpenText Suite 16 (Enterprise Content Management, Business Process Management, Customer Experience Management and Analytics) and OpenText Cloud 16 (Enterprise Content Management, Business Process Management, Customer Experience Management, Analytics and Business Networks). Each product can run on-premises (with the exception of the Business Network), as a subscription in the OpenText Cloud, in a third-party cloud or as a managed service. The philosophy behind the suite is to provide a comprehensive and integrated set of enterprise information management (EIM) tools.
Virtues of models
Data cleanup—one of the most time-consuming aspects of enterprise content management—is a precursor to cognitive computing. “There is an ideal data model for analysis, and if the data is in that structure—meaning tagged and organized—a lot of time can be saved,” says Praful Krishna, CEO of Coseer, a cognitive computing solutions provider. “Unfortunately, this happens rarely, for two reasons. First, cognitive computing deals with data that is by definition unstructured and nebulous. Second, when these systems were designed, very few people realized that the data had potential value beyond its use at the time, so they did not plan for efficient retrieval. Then it is a journey from what the data is today to data that can be consumed by computers.” Coseer is focused on extracting information from large repositories for such tasks as identifying actionable information, automating tedious workflows or providing natural language interactions with customers.
Putting metadata on top of the original raw data is a good first step. “Another step is indexing the data,” Krishna says. “When the data is indexed, it reduces the volume of data that needs to be ingested because the relevant content can be more quickly identified.” Finally, data sometimes needs to be put in a more accessible format. “Plain text or PDF documents are great,” he adds, “but if the content consists of images or a proprietary format, then the process becomes more challenging.”
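The indexing step Krishna describes can be sketched as a simple inverted index: each term maps to the set of documents that contain it, so a query touches only the relevant content rather than the whole repository. The documents and terms below are made up for illustration.

```python
from collections import defaultdict

documents = {
    "doc1": "quarterly revenue report for the northeast region",
    "doc2": "customer complaint about delayed shipment",
    "doc3": "revenue forecast and shipment schedule",
}

# Build an inverted index: each term maps to the documents containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.lower().split():
        index[term].add(doc_id)

# Only the matching documents need to be ingested for analysis.
print(sorted(index["revenue"]))   # ['doc1', 'doc3']
print(sorted(index["shipment"]))  # ['doc2', 'doc3']
```

This is why indexing "reduces the volume of data that needs to be ingested": downstream analysis can pull two matching documents instead of scanning all three, a saving that grows with the size of the repository.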
One of the challenges to making content more useful for advanced analyses is that generic models provided by some cognitive computing solutions are usually not effective. “We develop specific models for each problem,” Krishna says. “The comment that cognitive computing works best in a finite domain is very true. Those who are not happy with their model have often tried to use a generic model to analyze their content when a customized one is necessary.”
Mining enterprise content
Many organizations do not make the most of their existing data, according to Glenn Gibson, director of product marketing at Hyland Software, creator of the OnBase enterprise information platform. “Useful business insights can be gained from the content that is already being managed,” he says. “That has been a promise of ECM all along. As the technology has evolved, the boundary of what is possible has been pushed.”
One example of the way in which OnBase has pushed the technology is to enrich existing repositories with geolocation data. “Any OnBase repository that has an address can now add geolocation coordinates from that information when a user selects the document and can visualize it on a map,” Gibson says. “It can connect directly with the ESRI GIS front end to display maps, or the coordinates can be used as input for reporting the data on a map. This function works by calling a service that looks up the coordinates, and no human intervention is required.”
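The enrichment Gibson describes boils down to calling a geocoding lookup for each document that carries an address and storing the returned coordinates as metadata. OnBase’s actual integration and the service it calls are not detailed here, so `geocode` below is a stand-in stub with a placeholder lookup table, not a real API:

```python
def geocode(address):
    """Stand-in for a call to an external geocoding service.

    A real implementation would query a geocoding web service and
    parse its response; this placeholder table is illustrative only.
    """
    known = {"1600 Pennsylvania Ave NW, Washington, DC": (38.8977, -77.0365)}
    return known.get(address)

def enrich_with_coordinates(document):
    """Add latitude/longitude metadata when the document has an address."""
    address = document.get("address")
    coords = geocode(address) if address else None
    if coords:
        document["latitude"], document["longitude"] = coords
    return document

doc = {"id": "permit-42",
       "address": "1600 Pennsylvania Ave NW, Washington, DC"}
print(enrich_with_coordinates(doc)["latitude"])  # 38.8977
```

Because the lookup runs as a service call, the coordinates can be added when documents are stored or retrieved, which matches the “no human intervention” point: once addresses exist in the repository, map visualization comes essentially for free.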