The E-Discovery Field Guide



Adam R. Prescott, Jack Woodcock

Finding What’s Out There: Using Data Mapping to Meet E-Discovery Obligations


We often talk about information governance as the first step in the E-Discovery process. That is for good reason: information governance sets the stage for meeting preservation and other discovery obligations that are often triggered well before formal litigation papers are filed. Absent a preexisting relationship with a client, litigators (especially on the defense side) often are brought into a matter after the critical events in the case have occurred. As a result, by the time new counsel is retained, it is likely that most (if not all) of the relevant and discoverable documents already have been created within the parties’ then-existing information governance systems (or lack thereof). In that situation, counsel does not have the luxury of establishing best practices for E-Discovery; rather, before moving on to the next phases of E-Discovery—e.g., preservation, collection, review, and production—counsel first must quickly conduct a retroactive assessment of the client’s data practices to determine what types of documents exist within the client’s custody and control, where those documents are stored, and what must be done to preserve and collect them. Creating a data map is one key component of that assessment.

What Is Data Mapping?

Because data mapping involves a technical assessment of information technology infrastructure, we recently spoke to two individuals with expertise on the technical side of E-Discovery: Dana Conneally and Jamie Kerr from Evidox Corp. (which is now part of Xact Data Discovery). According to Dana and Jamie, data mapping is an inventory of an organization’s entire data system. In addition to identifying what data exists, the data mapping process seeks to identify where data is stored, who maintains the data and controls access to it, how long data is retained, and when data is destroyed. A good data map is key to understanding an organization’s preservation obligations because until all potential sources of relevant data are identified, it is impossible to know exactly what data must be preserved and what data may already be unrecoverable. Indeed, Dana and Jamie noted that the initial data mapping process invariably uncovers previously unknown repositories of data on client networks, which raises serious preservation concerns and can lead to spoliation of data and possible sanctions.

How Is a Data Map Created?

According to Dana and Jamie, data mapping can involve many distinct aspects that, when combined, provide a comprehensive picture of the organization’s data structure:

1. Data Sources and Type: Cast a wide net to pinpoint specific data sources and devices that contain information relevant to the subject matter of the investigation or lawsuit. Those sources might include laptop or desktop computers; internal servers or network drives; cloud storage; hard drives and backup storage; and mobile devices (including personally owned devices). The specific types of data that may be identified include hard-copy documents; e-mails; electronic files; text messages; and social media or other electronic messaging.

2. Department/Custodian: Determine the custodian for data on a person-by-person or department-by-department basis. This is critical to understanding who uses the data and who is responsible for its maintenance.

3. Classification: Data can exist in many different states, including online, offline, and inaccessible data. Some data, such as email, can exist in multiple states at once. Data mapping will involve classifying the various states of data, which also helps to determine whether it is discoverable.

4. Retention and archiving: Related to classification, data mapping must determine the retention policy, if any, for each type of data. This aspect of data mapping is critical for avoiding spoliation; it includes determining whether any automated deletion programs exist and when data was last saved, backed up, or destroyed.

5. Litigation Holds: For any litigation hold (i.e., a suspension of any type of deletion activity) to succeed, the organization must determine which data is being preserved and then closely monitor to ensure that the preservation remains in place. Successful data mapping will identify when data is under a document hold and what steps are being taken to effectuate that hold, including periodic reviews.
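For readers who want to operationalize the five aspects above, they can be sketched as a simple inventory record. The following Python sketch is purely illustrative—the field names, sample values, and helper function are our own assumptions, not any standard data-mapping schema or tool.

```python
from dataclasses import dataclass

@dataclass
class DataMapEntry:
    """One row in a hypothetical data-map inventory (illustrative only)."""
    source: str                # 1. data source or device (e.g., a file server)
    data_types: list           # 1. types of data found at that source
    custodian: str             # 2. person or department responsible for the data
    classification: str        # 3. state of the data: "online", "offline", or "inaccessible"
    retention_policy: str      # 4. applicable retention rule, if any
    auto_deletion: bool        # 4. whether an automated deletion program is in effect
    on_litigation_hold: bool   # 5. whether a litigation hold applies to this source

def needs_hold_review(entry: DataMapEntry) -> bool:
    """Flag entries where automated deletion could destroy data under a hold."""
    return entry.on_litigation_hold and entry.auto_deletion

# Example: a server subject to both a hold and an automated deletion program
# should be flagged for immediate review.
entry = DataMapEntry(
    source="internal file server",
    data_types=["e-mails", "electronic files"],
    custodian="Accounting department",
    classification="online",
    retention_policy="7-year retention",
    auto_deletion=True,
    on_litigation_hold=True,
)
print(needs_hold_review(entry))  # prints True
```

Even a flat inventory like this makes the spoliation risk visible: any entry where a hold and an automated deletion program coexist demands that the deletion be suspended.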

Data Mapping and the Federal Rules of Civil Procedure

Although a technical process in many ways, the data mapping concept also is implicit in the Federal Rules of Civil Procedure. For example, Rule 26(a)(1)(A)(ii) requires that, near the beginning of the case, parties provide a “description by category and location…of all documents, electronically stored information, and tangible things that the disclosing party…may use to support its claims or defenses….” Without understanding the data infrastructure, it would be impossible to respond completely. Likewise, Rule 34(b)(2) requires that objections to document requests state with specificity what documents are being withheld and why; vague, boilerplate objections no longer are acceptable. As a result, the failure to understand a client’s data infrastructure will limit the ability to make specific objections, while also exposing counsel to arguments that objections were waived if counsel failed to disclose categories of documents not being produced.


Data mapping does not replace good information governance practices, but it does provide a process to retroactively understand data infrastructure in the way that a sound information governance program otherwise would. Indeed, data mapping can serve as a foundation for a larger information governance program, helping to ensure that, at least for the next dispute, electronic information is in better order. At the same time, just as information governance is the foundation for efficient and effective E-Discovery, so too does data mapping promote a better E-Discovery process moving forward, while also avoiding the costly mistakes that occur when important data is not identified early on.