What is Predictive Coding?: Including eDiscovery Applications

Predictive coding allows software to take information entered by people and generalize it to a larger group of documents, making the sorting process less taxing.

"Predictive coding is essentially a learning technology," says Warwick Sharp, a co-founder of Equivio, a company that develops text analysis software and is at the forefront of integrating predictive coding into its e-discovery tools. "What predictive coding is able to do is get input from a human being, who reviews samples of documents and marks them as relevant or not, and then those decisions are input into the predictive coding engine, which is able to generalize those decisions across the entire collection."
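The workflow Sharp describes, in which a person marks a sample of documents as relevant or not and the engine generalizes those decisions, can be illustrated with a small text classifier. The sketch below is only an illustration, not how Equivio's or any other vendor's product works: it assumes a simple naive Bayes model over word counts, and the function names (`train`, `predict`) and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(labeled_docs):
    """Build word counts from (text, is_relevant) pairs marked by a reviewer."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in labeled_docs:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    if doc_counts[True] == 0 or doc_counts[False] == 0:
        raise ValueError("need at least one relevant and one irrelevant sample")
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return True if the document looks more like the relevant samples."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        # Start from the share of reviewed samples carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in the samples carry no signal
            # Add-one smoothing so an unseen word/label pair is not zero.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return scores[True] > scores[False]


# Hypothetical sample a human reviewer has already marked.
seed_set = [
    ("hangar roof collapse snow load inspection", True),
    ("engineering report on roof truss failure", True),
    ("lunch menu for friday office party", False),
    ("parking garage closed for holiday", False),
]
model = train(seed_set)

# The reviewer's decisions are generalized to documents nobody has read yet.
for doc in ["snow load on the hangar roof", "friday lunch party reminder"]:
    print(doc, "->", "relevant" if predict(model, doc) else "not relevant")
```

With this tiny sample, the first unreviewed document is flagged as relevant and the second is not. A real review would rely on far larger samples and on checking the engine's output against further human review.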

Predictive coding in legal e-discovery

Predictive coding is considered by some to be the next revolution in e-discovery technology for legal applications. But how soon, how widely, and with what restrictions it will ultimately come to be used remain open questions. Any new technology that challenges the old order can initially be met with resistance, but in the field of litigation, where millions of dollars and the letter of the law may be at stake, the process of adopting a new technology is even more cautious and deliberate. While predictive coding offers the promise of quicker, cheaper e-discovery, it has yet to overcome a few obstacles that stand in its way.

Companies such as Equivio and Recommind, Inc., whose e-discovery product Axcelerate now supports predictive coding, firmly believe that the technology is the new industry standard and that the process of e-discovery will never be the same. With its enormous time and cost savings, which are sometimes estimated as high as 90%, predictive coding truly seems to have the potential to revolutionize certain kinds of litigation.

Court rules on use of predictive coding technology

Yet one significant constituency has not yet come fully on board with predictive coding's potential to speed up the e-discovery process and cut its cost: the courts.

In April 2012, a state judge in Virginia issued the first state court ruling allowing the use of predictive coding in e-discovery in the case Global Aerospace, Inc., et al. v. Landow Aviation, LP, et al. The Global Aerospace case pertained to an accident that occurred during a winter storm in 2010, in which several hangars collapsed at the Dulles Jet Center. The lawyers in the case representing the defendants petitioned the judge to allow the use of predictive coding to execute a first-pass review of about 2 million documents. The case was the first instance in which one side sought to gain permission from a judge to proceed with a predictive coding process prior to coming to an agreement on such a process with the opposing side.

In an exhaustive 156-page memorandum, which included dozens of pages of legal analysis, the defendants made their case for the reliability, cost-effectiveness, and legal merits of predictive coding. At the core of the memo was the argument that predictive coding "is capable of locating upwards of seventy-five percent of the potentially relevant documents and can be effectively implemented at a fraction of the cost and in a fraction of the time of linear review and keyword searching."

In a memo opposing the use of predictive coding, lawyers for the plaintiffs argued that the use of predictive coding was a "radical departure from the standard practice of human review." The judge in the case, Judge James H. Chamblin of the Virginia Circuit Court, issued a two-page court order that sided with the defense. "Defendants shall be allowed to proceed with the use of predictive coding," he wrote matter-of-factly, "for purposes of the processing and production of electronically stored information." And with those straightforward words, predictive coding had won its most decisive legal victory, taking one step toward wider adoption in the e-discovery process.

The lead lawyer for the defense in the Global Aerospace case who filed the exhaustive motion, Thomas Gricks III of the Pittsburgh law firm Schnader Harrison Segal & Lewis, LLP, told the Pittsburgh Post-Gazette after the ruling that predictive coding would allow more focus "on what's truly important, which is litigation itself and the need to spend money to truly litigate a case." Looking down the road, he predicted that with wider adoption of the more efficient and cost-effective process of predictive coding, "We'll see a shift away from spending all that money on e-discovery costs."

Future adoption of predictive coding

As with the adoption of any new knowledge management technology, especially in a field as sensitive to protocol and tradition as the law, the initial forays into the use of predictive coding have been met with skepticism and reluctance. Writing in Forbes magazine in August last year, Matthew Nelson, the e-discovery counsel at Symantec Corp. and the author of the book Predictive Coding for Dummies, suggested that this initial reluctance to adopt predictive coding may be the result of the complexity and unfamiliarity of the tools.

A study by the RAND Corp. released in April 2012, in the midst of three cases in which the use of predictive coding was at issue, addressed predictive coding's potential as a time- and cost-saving tool, but it noted that in order for it to be effectively and widely implemented, the murky legal questions surrounding it would need to be cleared up. The co-authors of the study, "Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery," Nicholas M. Pace and Laura Zakaras, included 57 case studies from eight large corporations, reviewed the literature on electronic discovery, estimated the costs of complying with discovery requests, and examined the challenges of preserving electronic information. Their core conclusion: Predictive coding has the potential to lower the cost of unwieldy e-discovery processes by reducing the number of documents requiring human review.

"Typically, in the review process, you're talking about someone, usually an outside attorney, having to sift through documents to find the ones that are relevant and responsive to the case, and eliminating the ones that are privileged," said Pace, a senior social scientist at RAND. "If it's just a few boxes of documents, the costs of conducting such a review are likely to be modest. But when the volume increases to tens of thousands or even millions of e-mails and other electronic documents, the labor costs associated with an eyes-on examination can be enormous."

Given the practically self-evident improvements to the time and cost of the e-discovery process that predictive coding would impart, the highest hurdle standing in its way, according to RAND, is the absence of judicial guidance as to how and when it is legal for the technology to be used in e-discovery. What few rulings have been issued are narrow in scope or apply only to the jurisdictions in which they were issued, making potential early adopters wary of the legal thicket into which they might be treading if they attempt to use predictive coding in litigation.

It will take more decisive action on the part of policymakers and guidance from more courts with wider jurisdiction to bring predictive coding into the mainstream of e-discovery practice, and the authors of the RAND study urge just such action.

About the author

Michael J. LoPresti was the assistant editor of ITI's Enterprise Group from 2006 to 2008. He currently lives in San Francisco, where he works in the communications department of a nonprofit arts organization. He is an occasional freelance writer.

This article is based on a full-length feature article in Intranets, a bi-monthly newsletter published by Information Today, Inc. For more information, visit www.infotoday.com.
