
The New Axiom of Computer-assisted Review

We're seeing increased acceptance in the legal sphere, too. The volume of data running through our text analytics engine, measured in gigabytes, has nearly tripled in the past year. That means more review teams are putting more data through the engine and trusting the technology to help cut review costs and time.

In a recent white paper, Content Analyst—developer of the analytics technology that drives our computer-assisted review workflow—performed a study on latent semantic indexing (LSI) technology. The paper highlights differentiators of LSI compared to other categorization engines and evaluates the efficacy of the technology. Citing several real-world applications of the technology, as well as academic studies, the authors find that LSI has proven effective and accurate in a number of use cases.

The Validation

Statistical sampling of a population of items is used in a variety of fields, from defect control in factories to evaluating transactions in general ledger systems. This process, called validation, gauges whether a given process is yielding its expected results.

So what about validation in a computer-assisted review? When you're told that a 1,000-document sample accurately represents your entire document universe, what does that mean? And is it true?
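To make that claim concrete, the reliability of a simple random sample follows from its size, not from the size of the population it is drawn from. Below is a minimal sketch of the standard normal-approximation margin of error; the 1,000-document sample size and 95% confidence level are illustrative assumptions, not figures from any particular case.

```python
import math

def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a 95% confidence interval for an observed proportion.

    Uses the normal approximation; proportion = 0.5 is the worst case,
    giving the widest possible margin for a given sample size.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# A simple random sample of 1,000 documents carries roughly a +/-3.1% margin
# of error, whether the full population holds 50,000 documents or 5 million.
print(f"{margin_of_error(1000):.3%}")  # ~3.099%
```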

From any angle, a document count that runs from the thousands into the millions is intimidating. It's hard to comprehend how many words, concepts, and nuances exist in such a vast body of data. In a manual workflow, a large team of reviewers must work through a massive set of documents one by one. Each reviewer, guided by his or her own interpretation of the case and its key issues, makes an independent call on every document studied. We already know that such subjectivity can mean inconsistency throughout a project. Sometimes those inconsistencies breed bigger problems. Other times they go unnoticed.

In a computer-assisted workflow, a much smaller team of specialized reviewers will manually review a smaller number of documents. These select few folks are domain experts in the case, and they confer openly and often about the goals of their review and the weight of the issues at hand. If they are inconsistent in training the system, the discrepancies will appear very clearly in overturn reports, which can be interpreted to identify disagreements between the computer's categorization of a document and human reviewers' decisions.

When it comes to validation, computer-assisted review gives the review team an added benefit: a built-in quality control system.

In a computer-assisted workflow, there are two types of review rounds: the training round, where the experts manually review a subset of documents and select appropriate content on which the system can be trained, and the validation round, where experts are given a randomly selected group of documents to evaluate the computer's decisions. In the first round, reviewers tell the system how to categorize; in the second, they tell the system if it has come to the right conclusions. Following the validation round, case teams can check something we call an overturn report to see how often the reviewers and the computer have disagreed. As the project continues, the training and validation rounds occur in pairs until the team is satisfied with the statistical results they're seeing across the board.
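In effect, the overturn report reduces to a straightforward comparison between the computer's calls and the experts' calls on the validation sample. Here is a minimal sketch of that calculation; the function name, document IDs, and labels are hypothetical illustrations, not the actual report format of any product.

```python
from typing import Dict

def overturn_rate(computer_calls: Dict[str, str], reviewer_calls: Dict[str, str]) -> float:
    """Fraction of validation-round documents on which the expert reviewer
    overturned the computer's responsive/non-responsive categorization."""
    overlap = computer_calls.keys() & reviewer_calls.keys()
    if not overlap:
        return 0.0
    overturned = sum(1 for doc_id in overlap
                     if computer_calls[doc_id] != reviewer_calls[doc_id])
    return overturned / len(overlap)

# Hypothetical validation round: the experts disagree with the system on one of four documents.
computer = {"DOC-001": "responsive", "DOC-002": "non-responsive",
            "DOC-003": "responsive", "DOC-004": "non-responsive"}
experts  = {"DOC-001": "responsive", "DOC-002": "responsive",
            "DOC-003": "responsive", "DOC-004": "non-responsive"}
print(f"Overturn rate: {overturn_rate(computer, experts):.0%}")  # 25%
```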

We've often seen that consistently high overturn rates result from inconsistencies in the human decisions on which the computer is trained. In one case, for example, a discouraging overturn rate prompted a case manager to keep her firm's partners confined to the same conference room over several weekends. There, the attorneys discussed the facts, goals, and relevant issues in the case in extreme detail. Emerging from those conversations, they went on to give computer-assisted review another try. The result was a much lower overturn rate, more consistent results, and a successful review.

So the benefit of a built-in quality control process is clear, but what about the accuracy?

Recently, Dr. David Grossman, associate director of the Georgetown Information Retrieval Laboratory and adjunct professor at the Illinois Institute of Technology in Chicago, performed a study on the accuracy of the statistical validation process in our computer-assisted review. He found that the process is mathematically sound.

In his study, Dr. Grossman compared the rates of responsive and non-responsive documents in a fully reviewed population against those observed in statistically sampled sets drawn from that population. The graphic from his study illustrates the precision of the sampling methodology. Across five repeated rounds of statistical sampling, the samples differed from the overall population by an average of just 0.45%, meaning each sample very accurately represented what was in the overall document universe.
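That comparison is easy to reproduce in miniature. The sketch below repeats the exercise against a hypothetical, fully coded population; the 100,000-document size and 30% responsiveness rate are invented for illustration and are not Dr. Grossman's data.

```python
import random

def mean_sampling_deviation(population_labels, sample_size, rounds, seed=42):
    """Average absolute gap between the responsive rate seen in repeated
    random samples and the true responsive rate of the full population."""
    rng = random.Random(seed)
    true_rate = sum(population_labels) / len(population_labels)
    gaps = []
    for _ in range(rounds):
        sample = rng.sample(population_labels, sample_size)
        gaps.append(abs(sum(sample) / sample_size - true_rate))
    return sum(gaps) / rounds

# Hypothetical fully coded population: 100,000 documents, 30% responsive.
population = [True] * 30_000 + [False] * 70_000
print(f"Mean deviation over 5 rounds: {mean_sampling_deviation(population, 1000, 5):.2%}")
```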

Statistical sampling, therefore, is an accurate and widely accepted method of validating large swaths of data. When a computer-assisted review is deliberately implemented and validated correctly, you can be confident in the end results you're seeing in your reports and productions.

With growing data volumes and increasingly tech-savvy attorneys and clients, computer-assisted review has a strong future in the e-discovery world. It's key for professionals to understand the technology and execute the workflow in a way that ensures validation. Close, comprehensive attention to the experts, engine, and validation of every case helps instill confidence in the results of a computer-assisted review—and makes the problem of big data a little easier to handle.
