Modern businesses are caught in a perfect storm. Data privacy laws are proliferating at the state, federal, and international levels, requiring companies to know where they store personally identifiable information (PII) and personal health information (PHI) and to wrap tight controls around the processing, use, and transfer of that data. At the same time, data volumes are exploding: the global market has generated more data in the past two years than in the entirety of prior human history.
Sitting at the intersection of these momentous phenomena are legal processes that require extraordinary care in the identification and handling of PII and PHI on very tight turnaround times: data breach notification workflows, data subject access requests (DSARs), and cross-border e-discovery projects. This article presents four technology-driven techniques to address these challenges head-on and cost-effectively.
Identifying Personal Data
Whether you’re confronted with a DSAR, breach notification obligation, or cross-border e-discovery project, half the battle is identifying where PII or PHI appears in a data set (and which documents contain no PII or PHI at all and thus do not trigger data privacy concerns). A manual approach is not viable at modern data volumes. The following technologies, however, can unlock substantial time and cost savings, depending on the nature and circumstances of the underlying project.
Regular expression (RegEx) capabilities are available in many sophisticated data processing and hosting platforms. RegEx can be used to find specific patterns of letters, numbers, and symbols in your data, such as Social Security numbers, national health identification numbers, credit card numbers, bank account numbers, and license plate numbers. Results can be over-inclusive, however, and can miss instances where a typo changes the pattern, so a well-planned quality control process should always be part of a RegEx workflow.
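As a minimal sketch of how a RegEx pass and a basic quality control step can work together, the Python below pairs two illustrative patterns with a Luhn checksum that filters out digit runs that merely look like card numbers. The patterns and the function names (find_pii, luhn_valid) are assumptions made for illustration, not the syntax of any particular processing platform, and real patterns would need tuning for each jurisdiction and data set.

```python
import re

# Illustrative patterns only (assumptions); production patterns need tuning.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def find_pii(text: str) -> dict:
    """Return pattern hits, keeping only card candidates that pass Luhn."""
    ssns = SSN_PATTERN.findall(text)
    cards = [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]
    return {"ssn": ssns, "card": cards}


sample = "SSN 123-45-6789, card 4111 1111 1111 1111, order 1234567890123456."
print(find_pii(sample))
```

The checksum reduces over-inclusive card hits, but it does nothing for under-inclusion: an identifier with a typo still will not match, which is why sampling documents that returned no hits remains a useful part of quality control.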
Entity extraction tools automatically identify the “entities” in your data set (e.g., people, companies, places) to help prepare an initial list of PII and to better design your RegEx workflow (see above) and advanced search terms (see below). Not all entity extraction tools are created equal, and their results also require quality control. The leading tools can identify multi-word phrases and recognize the different ways similar phrases are used (e.g., George Washington could refer to an individual’s name, a university, or a bridge).
Advanced Search Terms
Regular expressions and entity extraction will not capture all variations of PII and PHI, because not every instance fits a consistent alphanumeric pattern or is associated with an entity. For example, date of birth is difficult to identify systematically: date patterns take many forms, and most dates in a data set are not birth dates, so pattern matching alone would be both over- and under-inclusive. Carefully constructed, specific search terms can supplement regular expressions by identifying documents likely to contain key terminology that indicates the presence of PII or PHI. Refining these terms through multiple iterations of testing may be necessary to optimize the results within a specific data set.
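One way to picture such a search term is a keyword-plus-proximity rule: a date only counts as a likely date of birth when it follows birth-related wording within a few characters. The sketch below is an assumption-laden illustration in Python; the keyword list, the 20-character window, the two date formats, and the name flag_dob are all hypothetical choices that would be refined against the actual data set.

```python
import re

# Two common numeric date shapes (assumption: many other formats exist).
DATE = r"(?:\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}|\d{4}-\d{2}-\d{2})"
# Birth-related wording that should precede the date (illustrative list).
KEYWORD = r"(?:date of birth|birth ?date|d\.?o\.?b\.?|born(?: on)?)"
# Keyword, then up to 20 non-word characters, then a date.
DOB_TERM = re.compile(rf"\b{KEYWORD}\W{{0,20}}{DATE}", re.IGNORECASE)


def flag_dob(text: str) -> list:
    """Return the snippets where a date appears right after birth wording."""
    return [match.group() for match in DOB_TERM.finditer(text)]


sample = "DOB: 04/12/1985. Invoice date 04/12/2021. She was born on 1990-01-31."
print(flag_dob(sample))
```

The invoice date is ignored because no birth-related keyword precedes it. Dates written out in words, or keywords that trail the date, would be missed by this rule, which is the kind of gap that iterative testing is meant to surface.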
Logging Personal Data
For some privacy-driven workflows, such as breach notification projects, the ultimate objective is to create a log of all personal data contained in the relevant data set. This can also be an important part of a DSAR response workflow. While it may be tempting to engage data entry specialists to execute this task in a “brute force” manner, that option becomes unmanageable as data volumes grow. Technology can materially streamline the process by (a) initially identifying the personal data in the data set (see Identifying Personal Data above), (b) deduplicating repeat entries for individuals who appear in the data set multiple times, so that only one notification is created per affected individual, and (c) linking each individual’s log entry to the records where their data is found.
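Steps (b) and (c) can be sketched as a small data structure: one log entry per individual, keyed on a normalized identifier, with each entry carrying the set of documents where that person's data was found. This Python is a simplified illustration under stated assumptions: the LogEntry fields, the build_log function, and the choice to key on a single normalized identifier are hypothetical, and real projects usually need fuzzier matching on names and multiple identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    name: str                      # first name variant seen for this person
    identifier: str                # normalized identifier used as the key
    document_ids: set = field(default_factory=set)


def normalize_identifier(raw: str) -> str:
    """Strip punctuation and spacing so format variants compare equal."""
    return "".join(ch for ch in raw if ch.isalnum()).upper()


def build_log(hits) -> dict:
    """Collapse (document_id, name, identifier) hits into one entry per person."""
    log = {}
    for document_id, name, identifier in hits:
        key = normalize_identifier(identifier)
        entry = log.setdefault(key, LogEntry(name=name.strip(), identifier=key))
        entry.document_ids.add(document_id)  # link the person to the record
    return log


hits = [
    ("DOC-001", "Jane Doe", "123-45-6789"),
    ("DOC-002", "Doe, Jane", "123 45 6789"),
    ("DOC-003", "John Roe", "987-65-4321"),
]
for entry in build_log(hits).values():
    print(entry.name, sorted(entry.document_ids))
```

Here the two Jane Doe hits collapse into a single entry linked to both documents, which is what allows one notification per affected individual while preserving the trail back to the source records.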
Redacting PII and PHI
For other privacy-driven workflows, such as DSARs and cross-border e-discovery projects, redaction of PII and PHI is essential. Once again, cutting-edge technologies are available to significantly streamline the process. Automated redaction tools, for example, can apply redactions in bulk based on the results of regular expressions, entity extraction, and advanced searches. Most automated redaction tools accept words, phrases, or regular expressions as input. Another key feature is the ability to apply redactions natively to Excel sheets, PowerPoint presentations, and other files, rather than converting them to images, a cumbersome process that produces difficult-to-read results.
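The core idea of feeding words, phrases, and regular expressions into a single redaction pass can be sketched on extracted text as follows. This is an assumption-based Python illustration, not how any specific redaction product works: the function name redact and the "[REDACTED]" marker are hypothetical, and native redaction of spreadsheets or presentations requires format-aware tooling that this sketch does not attempt.

```python
import re


def redact(text: str, patterns, terms, marker: str = "[REDACTED]"):
    """Replace every pattern or literal term hit; return (text, hit_count)."""
    # Literal terms are escaped so punctuation in names is not read as RegEx.
    alternatives = list(patterns) + [re.escape(term) for term in terms]
    if not alternatives:
        return text, 0
    combined = re.compile("|".join(f"(?:{alt})" for alt in alternatives), re.IGNORECASE)
    return combined.subn(marker, text)


redacted, count = redact(
    "Jane Doe's SSN is 123-45-6789.",
    patterns=[r"\b\d{3}-\d{2}-\d{4}\b"],
    terms=["Jane Doe"],
)
print(redacted, count)
```

Returning the hit count alongside the redacted text gives reviewers a simple figure to reconcile against the search results during quality control.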
Minimizing Cross-Border Transfers of Personal Data
Finally, one of the greatest challenges to an efficient data privacy workflow is how to execute the work when the data resides in one jurisdiction but the legal or compliance team (or underlying proceeding) is in another. One of the primary impacts of data privacy statutes is to restrict the transfer of personal data abroad (in particular to countries deemed not to have “adequate privacy protection,” such as the US). One key strategy to overcome this challenge is to limit the transferred data set to only those documents strictly necessary to comply with US legal obligations. In addition to identifying and redacting personal data as described above, the following workflows can help:
Identify possible locations in the US that might contain duplicative data, and process that data first. The hash values that are generated can be used to identify what already exists in the US and thus does not need to be transferred. This hash list can then be transmitted to the country of collection, and the local data set can be deduplicated against it, provided that the same processing tool and settings are leveraged.
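The hash-list workflow above can be sketched in a few lines of Python. The sketch assumes SHA-256 over raw file bytes; actual processing tools may use other algorithms (MD5 and SHA-1 are common) and often hash normalized fields rather than raw bytes, which is why both sides must use the same tool and settings. The function names and sample documents are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash document bytes (assumption: SHA-256 over the raw content)."""
    return hashlib.sha256(data).hexdigest()


def build_hash_list(documents: dict) -> set:
    """Hash every document already held in the US."""
    return {content_hash(data) for data in documents.values()}


def select_for_transfer(local_documents: dict, us_hash_list: set) -> list:
    """Return IDs of local documents that do not already exist in the US."""
    return sorted(
        doc_id
        for doc_id, data in local_documents.items()
        if content_hash(data) not in us_hash_list
    )


us_documents = {"US-1": b"Quarterly report", "US-2": b"Board minutes"}
local_documents = {"EU-1": b"Quarterly report", "EU-2": b"Local payroll file"}

hash_list = build_hash_list(us_documents)  # only hashes cross the border
print(select_for_transfer(local_documents, hash_list))
```

Only the hash list travels to the country of collection, so the comparison itself does not require moving any document content abroad.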
In-Country Document Searching and Review
By using processing and hosting infrastructure, project management teams, and document review teams in the country of collection, or in a country recognized as providing adequate privacy protection, a company can better demonstrate that it has transferred only what is truly necessary to comply with US-based legal proceedings. A service provider with permanent infrastructure in both the US and the local jurisdiction is more likely to be familiar with local data privacy laws and best practices. If a local provider with permanent infrastructure is not available, deploying a mobile data processing and review server on-site to perform document collection, processing, and hosting, coupled with sourcing local reviewers, can work well, too.