The secret of the cloud: Remote collaboration, elasticity, and the e-discovery paradigm
Data integrity is another aspect of data governance (related to data quality) that is indispensable for e-discovery and any other mission-critical analytics job. Concerns about data integrity are heightened when migrating data to the cloud or between cloud providers. All the cost savings and expedience of these centralized, collaborative cloud platforms would be nullified, however, if it were not possible to prove that data moved from platform A to platform B was the same data, Jack reasoned. If it could not be proven, then you would have “absolutely nothing,” he said. Consequently, most e-discovery options include explicit chain-of-custody measures that demonstrate the immutability of data transferred from source systems to cloud targets.
Techniques such as hashing or fingerprinting involve “kind of cryptographic algorithms that effectively show you a document hasn’t changed,” Shankar explained. There is also extensive security and permissioning around access, which is typically read-only. The parallels between chain of custody and data provenance (which are crucial to most data governance and regulatory compliance efforts) are apparent. The former is a “chain of operations and manipulations that happen on evidence, so weeks or months later people cannot claim that the object was not there or it’s false, and they know perfectly how the object was brought into the evidence courtroom,” said Carl D’Halluin, CTO of Datadobi. Data provenance provides this sort of traceability for data leveraged in analytics and applications. Both are fundamental for trusting data, which in e-discovery settings is a matter of “trusting that the evidence is correct,” Jack said.
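As a rough illustration of the hashing and chain-of-custody ideas described above, the following Python sketch fingerprints a document with SHA-256, checks that a copy on a target platform matches the source, and keeps a simple custody log in which each entry's hash covers the previous entry. The function names, the log format, and the choice of SHA-256 are assumptions made for this example; they are not drawn from any particular e-discovery product.

```python
import hashlib
import json

# Illustrative sketch only: names and log layout are hypothetical.
_FIELDS = ("actor", "action", "doc_hash", "prev")


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a document's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_transfer(source: bytes, target: bytes) -> bool:
    """True when the copy on the target has the same fingerprint as the source."""
    return fingerprint(source) == fingerprint(target)


def _entry_hash(body: dict) -> str:
    """Hash the canonical JSON form of a log entry's fields."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_custody_event(log: list, actor: str, action: str, doc_hash: str) -> dict:
    """Append an event whose hash also covers the previous entry's hash."""
    prev = log[-1]["entry_hash"] if log else ""
    body = {"actor": actor, "action": action, "doc_hash": doc_hash, "prev": prev}
    entry = dict(body, entry_hash=_entry_hash(body))
    log.append(entry)
    return entry


def log_is_intact(log: list) -> bool:
    """Recompute every entry hash and check that the chain links line up."""
    prev = ""
    for entry in log:
        body = {key: entry[key] for key in _FIELDS}
        if entry["prev"] != prev or entry["entry_hash"] != _entry_hash(body):
            return False
        prev = entry["entry_hash"]
    return True
```

In this sketch, a document collected on platform A and uploaded to platform B would be fingerprinted on both sides and compared with `verify_transfer`, with one custody event recorded per step. Because each entry's hash depends on the one before it, editing an earlier entry after the fact causes `log_is_intact` to return `False`.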
The trust implicit to e-discovery’s data integrity mirrors the trust necessary to rely on data for any repeatable process across the enterprise—which is why data governance exists. Viewed from this perspective, e-discovery functions as a means of actuating governance protocols to validate such reliance for subject access requests, regulatory compliance, and costly litigation. All of those pieces, from bringing data in, to making use of analytics, to reviewing, categorizing, and producing data, are part of one contiguous process, Carns said. Ever since the COVID-19 lockdowns and the move to working from home, there has also been a heavy reliance on those technologies being web-based or being in a SaaS-based model, he added.
Cloud architecture not only supports the remote collaboration and elasticity required for training or deploying machine learning models at scale, but also does so with the practical oversight necessary to continually interact with diverse parties. It enables the enterprise to retain complete control over its data while granting specific users limited access to it through the permission system of the underlying software. According to Camara, this model functions as “one central repository for all your legal data, instead of sending data out all over the world.” That repository and its access are also based on the operating cost model of most public clouds, which adds to the value of this architecture.
From microcosm to macrocosm
The core functions of the e-discovery process closely resemble those involved in using data for most risk mitigation (such as security analytics for cybersecurity) or monetization (BI) opportunities. That they’re now taking place in the cloud is likely an indicator of the direction in which the data landscape as a whole is heading. Although e-discovery encompasses specific steps (legal hold management, forensic collections, processing, TAR, and review), some variation of this overall process is used for most analytics deployments.
“What we’ve certainly found in the past few years and my goodness, certainly over the past few months, is e-discovery technology is being used even more broadly for general data discovery. Maybe not as it specifically relates to litigation anymore, but any number of use cases that are important to corporate America,” said Carns.
Consequently, corporate America should take heed not only of what those use cases involve, but also of how they’re being architected, and why.