
Advanced Text Analytics Improving Litigation Preparation and Legal KM: An Entrieva Success Story

"Once the legal profession gains experience with text mining and auto-categorization, this technology will empower lawyers to focus human resources more quickly and efficiently on the highest value documents, allowing them to leverage and organize textually unstructured information automatically for its highest, best and most effective use."

Tom Lewis, President & CEO, Entrieva

The Company and the Challenge

The Pension Benefit Guaranty Corporation (PBGC) protects the retirement incomes of nearly 44.3 million American workers in more than 31,000 private defined benefit pension plans. PBGC's Office of the General Counsel (OGC) is responsible for all legal matters relating to this pension insurance program.

OGC represents the PBGC in several hundred litigation and bankruptcy cases, and some require discovery and fact-specific case preparation involving documents numbering in the thousands or hundreds of thousands of pages. Discovery documents typically are produced in no particular order, and the identification of relevant documents is a tedious, time-consuming, inefficient, random-walk process. The first document in the first box of discovery documents is no more likely to be relevant than the last document.

Without technology support, OGC staff must manually review an entire set of documents before they know which documents contain relevant information. Time-consuming document coding is undertaken either at the time of initial review or during a second pass through a smaller, culled set of possibly relevant documents. Prior experience has shown that generally only 20% to 30% of the documents in a set are relevant to the litigation, and that relevancy is a moving target because case contours can change after document coding.

The manual "discovery" process therefore spends a significant amount of staff time on documents that are not relevant. To optimize its limited staff resources, OGC set out to locate technology that could identify the documents most likely to be relevant to a matter and rank them for the staff prior to review. The critical element of the envisioned automated process was to locate and score imaged and OCRed documents according to the occurrence of the names of parties or people, important concepts and key terms in each document. Thereafter, documents would be reviewed and coded from a relevancy-based "pick list," working on the most likely relevant documents first, rather than coding documents randomly.

OGC also has a growing knowledge repository of tens of thousands of documents that it wishes to organize, retrieve and re-use. OGC conceived that it might leverage the technology acquired for discovery to process and organize these equally unstructured documents into a user-friendly, browsable electronic structure. OGC had experimented with a manual means of "digesting" or "abstracting" the contents of its knowledge documents, and found the process to be time-consuming and the results problematic.

The Solution

OGC is redefining its ability to process unstructured textual documents using Entrieva's Semio Suite, consisting of the Enterprise Knowledge Engineering Workbench ("eKEW"), SemioTagger, SemioDiscovery and Relevance Scoring Matrix Tool. The eKEW tool provides the mechanism for creating multiple taxonomies (i.e., hierarchical listings of categories based on a particular subject matter) of case-specific players' names and terms of interest. Based on the taxonomies of players' names and terms, the Entrieva tools analyze a document set, usually OCRed PDFs, associated with a specific litigation case. OCR is the most uncertain aspect of this process, because the analysis depends on the accuracy of the recognized text.
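To make the idea of a case taxonomy concrete, here is a minimal Python sketch. It is not Entrieva's implementation or data format: the nested-dictionary layout, the example names and terms, and the function find_categories are all hypothetical, and the matching is plain case-insensitive phrase matching rather than anything the Semio tools are known to do.

```python
import re

# Hypothetical case taxonomy (illustrative only): each top-level branch
# lists categories, and each category lists the phrases that signal it.
TAXONOMY = {
    "players": {
        "Acme Steel": ["Acme Steel", "Acme Corp"],
        "Jane Doe": ["Jane Doe", "J. Doe"],
    },
    "terms": {
        "plan termination": ["plan termination", "terminate the plan"],
        "funding": ["minimum funding", "funding waiver"],
    },
}


def find_categories(text, taxonomy):
    """Return {branch: {category: hit_count}} for categories found in text."""
    # Collapse line breaks and repeated spaces, which are common in OCR output.
    normalized = re.sub(r"\s+", " ", text).lower()
    found = {}
    for branch, categories in taxonomy.items():
        for category, phrases in categories.items():
            hits = 0
            for phrase in phrases:
                # Match the phrase only when it is not embedded in a longer word.
                pattern = r"(?<!\w)" + re.escape(phrase.lower()) + r"(?!\w)"
                hits += len(re.findall(pattern, normalized))
            if hits:
                found.setdefault(branch, {})[category] = hits
    return found


sample = "ACME Steel filed for a funding waiver.\nJ. Doe signed the   minimum funding notice."
print(find_categories(sample, TAXONOMY))
```

Exact phrase matching like this fails on misrecognized characters, which is one way to see why OCR quality matters so much to the process described above.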

The scoring matrix tool is used to enter special weights for intersections of particular players' names and terms. An aggregate score is then generated for each document according to its contents and the scoring matrix, along with a Web-based "pick list" of documents ordered by aggregate score. Thus, the documents most likely to be relevant in a particular litigation case are presented first for staff analysis and coding, and staff can devote such additional resources as they deem appropriate to the remainder of the documents.
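The scoring step can be illustrated with a short Python sketch. The article does not say how the aggregate score is computed, so this assumes the simplest reading: sum the weight of every player/term pair that occurs together in a document, with a default weight for pairs not listed in the matrix. The weights, document IDs and function names are hypothetical.

```python
# Hypothetical scoring matrix: weights for intersections of a player and a term.
WEIGHTS = {
    ("Acme Steel", "plan termination"): 5.0,
    ("Acme Steel", "funding"): 2.0,
    ("Jane Doe", "plan termination"): 3.0,
}
DEFAULT_WEIGHT = 1.0  # assumed weight for intersections not listed above


def aggregate_score(players, terms, weights=WEIGHTS, default=DEFAULT_WEIGHT):
    """Sum the weights of every player/term pair found in one document."""
    return sum(weights.get((p, t), default) for p in players for t in terms)


def build_pick_list(documents):
    """Return (doc_id, score) pairs, highest score first, ties ordered by ID."""
    scored = [
        (doc_id, aggregate_score(found["players"], found["terms"]))
        for doc_id, found in documents.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Players and terms already identified in each (made-up) document.
documents = {
    "DOC-001": {"players": {"Jane Doe"}, "terms": {"funding"}},
    "DOC-002": {"players": {"Acme Steel", "Jane Doe"}, "terms": {"plan termination"}},
    "DOC-003": {"players": set(), "terms": {"funding"}},
}

for doc_id, score in build_pick_list(documents):
    print(doc_id, score)
```

Under these assumptions a document that mentions a term but no player scores zero and falls to the bottom of the list; a real matrix could just as well give such documents a nonzero score.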

Metadata is also created for each of the targeted documents. This metadata can be used in other environments. If new terms or names emerge during case development, the process can be repeated with little difficulty, unlike a manual system.
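As an illustration of metadata that can travel to other environments, the following Python sketch writes one document's results as JSON. The field names and values are hypothetical; the article does not describe the metadata format the Entrieva tools actually produce.

```python
import json


def build_metadata(doc_id, players, terms, score):
    """Bundle one document's analysis results into a plain, serializable record."""
    return {
        "doc_id": doc_id,
        "players": sorted(players),  # sorted so the output is stable across runs
        "terms": sorted(terms),
        "score": score,
    }


record = build_metadata("DOC-002", {"Jane Doe", "Acme Steel"}, {"plan termination"}, 8.0)
print(json.dumps(record, indent=2))
```

If new names or terms emerge, rebuilding these records is a matter of re-running the analysis with the updated taxonomy, which is the repeatability the paragraph above describes.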

The Results

OGC derives a dual benefit from using a single set of tools to facilitate two important, but different, business workflow processes: automated discovery for litigation support and automated organization of knowledge documents. The key benefits to OGC will be:

  • Ability to focus professional staff time on the "highest value" documents first, i.e., documents relating to the key people and concepts, enabling staff to "hit the ground running" during case development;

  • Ability to streamline and speed up the case development process by associating case-specific terms and key players with each other and with the documents that relate to them, and by adding new terms as they become apparent;

  • Ability to structure unstructured information and to create metadata to facilitate human browsing; and

  • Reduction in time and effort for knowledge management document processing.


Entrieva, Inc. provides enterprise-class software that enables any organization to create meaningful business intelligence from unstructured content. Entrieva's technology is in use today at some of the world's largest pharmaceutical, telecommunications and internet media companies as well as major US government and military organizations. For additional information, please visit Entrieva.
