This article was originally published by ALM in Law.com.
As the head of TransPerfect Legal Solutions’ multilanguage e-discovery practice, I oversee hundreds of non-English matters and migrate dozens more for remediation. Search terms in non-English matters are the number one area where we see opposing counsel and vendors missing the mark, often by wide margins. Why is that?
Let’s start with what mistranslated terms are and how they end up “wrong”. Poorly translated terms go wrong in two opposite directions: first, by under-identifying the intended documents and, second (no less significantly), by promoting large numbers of unrelated documents into the review pool, with all the attendant review costs.
There are four common ways translated terms miss their mark.
1. Uncontextualized Terms
Search terms are a list of words with all context stripped away. Yet case teams draft each term with very specific contexts in mind. Translators typically have no knowledge of the underlying case, the case issues, or perhaps even the industry. That creates tremendous margin for translated terms to miss their mark. By way of example, let’s look at the word “close”.
One can “close” a deal, be physically “close”, be intimately “close”, “close” a door, and, in the UK, even live on a “Close”. In the vast majority of languages, each sense of “close” (noun, verb, or adjective) is a wholly different word (Spanish: Cerca/Íntimo/Similar/Cerrar). Without contextualization, an antitrust team looking for closed factories feeding price rises could end up missing those documents and reviewing emails about extramarital dalliances instead.
Omitting word-by-word analysis and failing to define specific contexts sets searches off in the wrong direction, and it doesn’t take many such misses before a material problem with term performance emerges.
2. Natural Language Expression
Another way terms miss their target is by failing to reflect natural language expression, that is, how people actually use language when communicating. Keeping with the “close” (cerrar) example: cerrar captures the right concept but is the wrong translation, because people mostly write conjugated verbs rather than the unconjugated (infinitive) form. The conjugated forms are what is really required. Further complicating the example, cerrar is an irregular verb. There are 30 separate conjugated forms of this verb.
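To make the point concrete, here is a minimal Python sketch of an exact-match keyword search. The sample sentence, the function name `exact_hits`, and the short list of conjugated forms are all invented for illustration; a real review platform tokenizes and indexes text in its own way.

```python
import re

def exact_hits(terms: set[str], text: str) -> list[str]:
    """Return the tokens in `text` that exactly match one of `terms`."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t in terms]

# Invented sample sentence: "We close the plant in March and they close another in April."
email = "Cerramos la planta en marzo y cierran otra en abril."

# The infinitive alone finds nothing, even though the email is about closures.
print(exact_hits({"cerrar"}, email))               # []

# Two of the verb's conjugated forms (hypothetical partial list) do hit.
print(exact_hits({"cerramos", "cierran"}, email))  # ['cerramos', 'cierran']
```

The infinitive returns no hits on a sentence that is plainly about closing a plant, which is the gap that conjugated forms (or carefully chosen wildcards) are meant to fill.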
3. Syntax/Operator Usage
How terms behave is almost completely governed by search operators, so they are an essential focus of any translation exercise. E-discovery professionals spend years learning correct search operator usage. Linguists are rarely trained in the art of using search operators and will get this element wrong in nearly all instances, making it essential that a syntax expert sits with the linguist, analyzing each term word by word and building in the syntax.
Another consideration with cerrar, and this is where language intersects with syntax, is that the term is short, which complicates wildcard usage because of the risk of false hits. As a best practice when translating terms, with rare exceptions, we do not use wildcards on words shorter than five characters. The risk of thousands of unrelated documents hitting is simply too high.
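A rule of thumb like this can be checked mechanically before a term list is run. The sketch below is one possible way to do so, under two assumptions of ours: that the five-character threshold applies to the stem before the asterisk, and that `*` is the wildcard character. The function name `risky_wildcards` and the deliberately too-short term `cer*` are hypothetical.

```python
import re

MIN_STEM_LENGTH = 5  # threshold taken from the rule of thumb above

def risky_wildcards(query: str, min_len: int = MIN_STEM_LENGTH) -> list[str]:
    """Return wildcard terms whose stem (the text before '*') is shorter than min_len."""
    # \w is Unicode-aware for str patterns, so precomposed accented letters are matched.
    wildcard_terms = re.findall(r"\w+\*", query)
    return [term for term in wildcard_terms if len(term.rstrip("*")) < min_len]

# "cer*" is a hypothetical, deliberately over-broad term added for the demonstration.
query = "(cierr* OR cerra* OR cer* OR cerro)"
print(risky_wildcards(query))  # ['cer*']
```

A flagged term is not automatically wrong (the text allows rare exceptions), but it is a prompt for the linguist and syntax expert to look at it again.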
See the search syntax below for a more accurate “close” translation.
For many modern eDiscovery review applications: (ciero OR cierr* OR cerra* OR cerre OR cerro).
For many modern eDiscovery processing engines: (ciero OR cierr* OR cerrá* OR cerra* OR cerre OR cerré OR cerro OR cerró).
Why the difference?
Certain applications flatten diacritics (accented characters), so that “cerre” returns both “cerre” and “cerré”. Other applications index “cerré” and “cerre” as different words, so both are required for an accurate result.
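The difference between the two behaviours can be sketched in a few lines of Python. This is an illustration only, not how any particular review or processing product works: the helper names `flatten_diacritics` and `hits` and the toy word list are our own, and the flattening shown here uses Unicode decomposition from the standard library.

```python
import unicodedata

def flatten_diacritics(text: str) -> str:
    """Strip combining accent marks, e.g. 'cerré' -> 'cerre'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def hits(term: str, words: list[str], flatten: bool) -> list[str]:
    """Exact-match lookup against a toy word list, with or without flattening."""
    if flatten:
        target = flatten_diacritics(term)
        return [w for w in words if flatten_diacritics(w) == target]
    return [w for w in words if w == term]

indexed_words = ["cerre", "cerré", "cerro", "cerró"]

# An index that flattens diacritics: one term finds both spellings.
print(hits("cerre", indexed_words, flatten=True))   # ['cerre', 'cerré']

# An index that keeps accents distinct: the accented form is missed.
print(hits("cerre", indexed_words, flatten=False))  # ['cerre']
```

Note that this style of flattening also turns “ñ” into “n”, so whether it is desirable depends on the language and on how the target application actually indexes text.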
4. Search Strategy
There are many strategy options when translating terms so it’s crucial the wider case strategy is understood and reflected in translated terms. Let’s talk briefly about two scenarios:
In some instances, the right strategy is doing exactly what’s asked and nothing more. The output in these matters is a single best-match term for each translation: no synonyms, just the word that most closely aligns.
In other circumstances, such as investigatory work, the right strategy may be to broaden terms by including synonyms and closely aligned concepts (Term = Rain; Translation = Rain/Drizzle/Thunderstorm).
A case we migrated earlier this year highlights the convergence of these four points. The migrated review pool contained 105,000 documents drawn from linguist-translated terms. After migration, we revisited the terms: contextualizing, retranslating, and properly formatting them with a linguist-PM duo. The revised terms hit 65,000 docs.
On the surface, that looks like 40,000 documents shaved off, but digging deeper, there was virtually no overlap between the sets. 101,000 documents had no business in the review pool, while nearly all the material that did belong (61,000) was absent!
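The arithmetic behind those figures can be laid out as a short Python sketch. It uses only the four numbers quoted above and treats the revised 65,000-document set as the benchmark for what belonged in review, which is the framing of this case; the “precision” and “recall” labels are ours, not part of the original analysis.

```python
original_pool = 105_000          # documents hit by the original translated terms
revised_pool = 65_000            # documents hit by the revised terms
unrelated_in_original = 101_000  # original hits that did not belong in review
missing_from_original = 61_000   # revised hits the original terms never found

net_reduction = original_pool - revised_pool                    # 40,000
overlap_from_original = original_pool - unrelated_in_original   # 4,000
overlap_from_revised = revised_pool - missing_from_original     # 4,000

# Both routes to the overlap agree, so the quoted figures are internally consistent.
assert overlap_from_original == overlap_from_revised
overlap = overlap_from_original

precision = overlap / original_pool  # share of the original pool that belonged
recall = overlap / revised_pool      # share of the relevant set the original terms found

print(f"Net reduction: {net_reduction:,}")     # Net reduction: 40,000
print(f"Overlap: {overlap:,}")                 # Overlap: 4,000
print(f"Original precision: {precision:.1%}")  # Original precision: 3.8%
print(f"Original recall: {recall:.1%}")        # Original recall: 6.2%
```

In other words, the headline reduction of 40,000 documents hides the fact that only about 4,000 documents were common to both sets.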
It is clear how seemingly minor details make huge impacts in results. It is important to remember keywords are not like an email containing a mistranslated word where the email’s overall meaning remains intact. Keywords are binary like passwords – a single incorrect character and the term is suddenly void.