Computer-Assisted Document Review in e-Discovery Can Avert Manual Review, Even if Not 100% Certain: “Predictive Coding” Protocol Passes Muster

March 21, 2012

When can parties rely upon nearly fully-automated “Big Data” procedures under U.S. federal e-discovery rules?  In a landmark class-action decision under the Equal Pay Act and Fair Labor Standards Act for gender discrimination and pregnancy discrimination, U.S. federal Magistrate Judge Andrew J. Peck ruled on February 24, 2012, that “predictive coding” can be legally sufficient even though it fails to meet a 100% confidence level.  The decision is welcome news for business defendants accused of mass torts or repeated wrongful practices under class action litigation procedures.  It is also welcome news for software developers that market computer-driven “relevancy” analysis, but bad news for FTE-based legal process outsourcing vendors hoping for large labor-intensive discovery requirements.

In short, this precedent adopts labor-saving software technology in lieu of manual review of Big Data in large litigations.

The Importance of E-Discovery in Class Actions, Collective Actions and Mass Torts. In Da Silva Moore v. Publicis Groupe & MSL Group, ___ F.3d ___ (S.D.N.Y. Feb. 24, 2012), the defendants were accused of discriminating against a class of women employees, relating to pay rates, promotions, terminations, demotions and/or job assignments, based on female gender and pregnancy status.  The plaintiffs’ lawyers bundled the claims into a class action in order to maximize damages and settlement potential.  In deciding whether to settle, defendants in such situations fight to escape a judicial determination that the claims are sufficiently similar to meet the requirements of a class action.  Accordingly, the discovery process can be decisive in defeating the plaintiffs’ claim, since an adverse decision on class action requirements can dissuade contingency fee plaintiffs’ lawyers from further investment of their own time and expenses.

As a result, it is essential for defendants in such class action cases, “collective action” claims (for opt-in claims) and any other large-document litigation to minimize the burdens of discovery up to the point of the decision whether the plaintiffs’ claims are sufficiently common to move forward as a class action.

Computer-Assisted E-Discovery: Predictive Coding. Judge Peck quoted his own prior published article to explain “computer-assisted document review.”

By computer-assisted coding, I mean tools (different vendors use different names) that use sophisticated algorithms to enable the computer to determine relevance, based on interaction with (i.e., training by) a human reviewer.

Unlike manual review, where the review is done by the most junior staff, computer-assisted coding involves a senior partner (or [small] team) who review and code a “seed set” of documents. The computer identifies properties of those documents that it uses to code other documents. As the senior reviewer continues to code more sample documents, the computer predicts the reviewer’s coding. Or, the computer codes some documents and asks the senior reviewer for feedback.

When the system’s predictions and the reviewer’s coding sufficiently coincide, the system has learned enough to make confident predictions for the remaining documents. Typically, the senior lawyer (or team) needs to review only a few thousand documents to train the computer.  Some systems produce a simple yes/no as to relevance, while others give a relevance score (say, on a 0 to 100 basis) that counsel can use to prioritize review.   For example, a score above 50 may produce 97% of the relevant documents, but constitutes only 20% of the entire document set.

Counsel may decide, after sampling and quality control tests, that documents with a score of below 15 are so highly likely to be irrelevant that no further human review is necessary. Counsel can also decide the cost-benefit of manual review of the documents with scores of 15-50.  Slip Op., at p. 4.
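The triage logic Judge Peck describes can be sketched in a few lines. This is an illustrative sketch, not anything from the opinion: the 15 and 50 cutoffs come from the quoted passage, while the document IDs and scores below are invented for demonstration.

```python
# Illustrative sketch of score-based document triage in predictive coding.
# The 15/50 cutoffs follow Judge Peck's example; all data here is hypothetical.

def triage(documents, low=15, high=50):
    """Split scored documents into three review buckets.

    documents: list of (doc_id, score) pairs, score on a 0-100 scale.
    Returns a dict mapping bucket name -> list of doc_ids.
    """
    buckets = {"produce_or_review": [], "cost_benefit_review": [], "discard": []}
    for doc_id, score in documents:
        if score > high:
            buckets["produce_or_review"].append(doc_id)    # likely relevant
        elif score >= low:
            buckets["cost_benefit_review"].append(doc_id)  # counsel weighs cost vs. benefit
        else:
            buckets["discard"].append(doc_id)              # likely irrelevant; no human review
    return buckets

sample = [("D1", 88), ("D2", 42), ("D3", 7), ("D4", 63), ("D5", 15)]
print(triage(sample))
# {'produce_or_review': ['D1', 'D4'], 'cost_benefit_review': ['D2', 'D5'], 'discard': ['D3']}
```

In practice, counsel would set the cutoffs only after the sampling and quality-control tests the opinion mentions, rather than using fixed defaults.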

An Acceptable Predictive Coding Protocol. After rejecting several objections by plaintiffs’ counsel concerning reliability of computer-assisted coding, Judge Peck ruled that it would be acceptable to adopt a predictive coding protocol with the following procedures:

  • The court (or an agreement of the parties) determines which custodians (and other sources) of electronically stored information (“ESI”) are to be the subjects of the discovery process (e.g., the directly involved employees or all persons potentially suspected of being wrongdoers).
  • The parties agree to identify key words and other tags (which could include metatags) for searching, as well as the Boolean logic (“and”, “and/or”, “and X but not Y”, etc.).
  • The court (or an agreement of the parties) determines the percentage of confidence required before the software that conducts the electronic search of ESI can be determined to be legally adequate.  (In this case, Judge Peck ruled that a 95% level of confidence was legally sufficient).
  • The parties (under judicial review) create a “seed set” of documents to train the software for coding by degree of relevancy.  In this case, the initial seed set was approximately 2,400 documents.
  • The parties (or the Court) decide(s) upon the sources of ESI, including e-mails, Cloud-based software such as SalesForce.com, databases and other e-repositories.
  • The defendants agreed to turn over all documents that the computerized predictive coding identified as relevant, subject to exclusion under attorney-client privilege.
  • The parties bring their e-discovery software consultants to court to argue the questions of reliability and technical “levels of confidence” in their software products and the ability to train software by iteratively tweaking the search parameters and Boolean logic.
  • The defendant proposed, and the court agreed, to seven rounds of iterations of computer-assisted predictive coding, followed by human confirmation and review of the results and training of the software to improve the level of confidence.

    After the seventh round, to determine if the computer is well trained and stable, MSL would review a random sample (of 2,399 documents) from the discards (i.e., documents coded as non-relevant) to make sure the documents determined by the software to not be relevant do not, in fact, contain highly-relevant documents. (2/8/12 Conf. Tr. at 74-75.) For each of the seven rounds and the final quality-check random sample, MSL agreed that it would show plaintiffs all the documents it looked at including those deemed not relevant (except for privileged documents). (2/8/12 Conf. Tr. at 76.)  Slip Op., at p. 11.
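The opinion does not spell out the statistics behind the 2,399-document quality-check sample, but the standard sample-size formula for estimating a proportion at a 95% confidence level with a ±2% margin of error yields a nearly identical figure, which suggests where the number likely comes from. The sketch below is our reconstruction, not part of the protocol itself.

```python
import math

def sample_size(z=1.96, margin=0.02, p=0.5):
    """Classic sample-size formula for estimating a proportion:
    n = z^2 * p * (1 - p) / margin^2.

    z      -- z-score for the confidence level (1.96 for 95%)
    margin -- acceptable margin of error (e.g., 0.02 for +/-2%)
    p      -- assumed proportion; 0.5 is the conservative worst case
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(sample_size())  # 2401 -- within two documents of the 2,399 sampled in the protocol
```

Sampling roughly 2,400 discarded documents thus lets the parties say, with 95% confidence, that the true rate of relevant documents hiding in the discard pile is within about two percentage points of the rate observed in the sample.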

The Legal Logic of Computer Logic.  In evaluating the use of computer software instead of human reviewers, Judge Peck cited academic conclusions that computer-assisted discovery could be more effective than manual discovery.

The objective of review in e-Discovery is to identify as many relevant documents as possible, while reviewing as few non-relevant documents as possible. Recall is the fraction of relevant documents identified during a review; precision is the fraction of identified documents that are relevant. Thus, recall is a measure of completeness, while precision is a measure of accuracy or correctness. The goal is for the review method to result in higher recall and higher precision than another review method, at a cost proportionate to the “value” of the case. See, e.g., Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Rich. J.L. & Tech., Spring 2011, at 8-9, available at http://jolt.richmond.edu/v17i3/article11.pdf. Slip Op., at p. 17.
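The recall and precision definitions quoted above reduce to simple set arithmetic. In this minimal sketch the document IDs and the split between retrieved and truly relevant documents are hypothetical, chosen only to illustrate the two ratios.

```python
# Recall and precision as defined in the quoted passage.
# "retrieved" = documents the review identified; "relevant" = documents
# that are actually relevant. All document IDs here are hypothetical.

def recall_precision(retrieved, relevant):
    """Return (recall, precision) for sets of document IDs."""
    true_positives = len(retrieved & relevant)
    recall = true_positives / len(relevant)      # completeness: share of relevant docs found
    precision = true_positives / len(retrieved)  # correctness: share of retrieved docs that matter
    return recall, precision

retrieved = {"D1", "D2", "D3", "D4"}
relevant = {"D1", "D2", "D5"}
r, p = recall_precision(retrieved, relevant)
print(f"recall={r:.2f}, precision={p:.2f}")  # recall=0.67, precision=0.50
```

A review that retrieves every document scores perfect recall but poor precision; one that retrieves a single obviously relevant document scores perfect precision but poor recall. The court's proportionality point is that the right balance depends on the value of the case.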

The Da Silva Moore, et al. v. Publicis Groupe & MSL Group decision appears to be the first to explore and justify the use of computer-based protocols for relevancy search in e-discovery.  It suggests that manual review will become obsolete if a sufficient “level of confidence” can be achieved using iterations of corrective instructions.

This decision issues a clarion call to use technology to unclog the problems that technology created in e-Discovery.  In legal process outsourcing, it will rebalance the equilibrium of pre-trial discovery burdens: LPO will now move toward more senior-level analysis, such as review for legal privilege and other objections to disclosure.  The semi-automation of e-Discovery should also benefit both adversaries, since costs saved from wasted manual verification of the relevancy of a large volume of potentially relevant documents can be allocated to settlement payments if liability seems clear enough.  Finally, the decision serves as a landmark for legal process outsourcing: it defines a litigation workflow that minimizes labor and optimizes the training of intelligent computer tools, retaining human control and responsibility while benefiting from information technology.