The world is abuzz with the possibilities of generative pre-trained transformers (GPT) and generative artificial intelligence (AI). Since translation is one of the oldest problems we've been trying to solve with AI, it's no surprise that people have been using large language models (LLMs) to perform translations and have found them to be surprisingly good. Let’s take a deeper dive into the topic: are LLMs ready to replace our “dinosaur” neural machine translation (NMT) models?
How Does GPT Perform with Translations?
In a preliminary study from Tencent AI Lab using the common industry metric BLEU ("Is ChatGPT A Good Translator?"), and in our experience running human evaluations, GPT is keeping up with existing machine translation (MT) models and engines. It’s producing translations that are very close in quality for high-resource languages, like German and Chinese.
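For readers unfamiliar with the metric: BLEU scores a candidate translation by its clipped n-gram overlap with a reference, with a brevity penalty for output that is too short. Here is a toy sentence-level sketch (uniform weights, no smoothing; real evaluations use a library such as sacreBLEU):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis, reference, max_n=4):
    """Toy sentence-level BLEU: clipped n-gram precision plus brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # count each hypothesis n-gram at most as often as it occurs in the reference
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        if matches == 0:
            return 0.0  # no smoothing: any zero precision zeroes the score
        log_precision_sum += math.log(matches / total) / max_n
    # brevity penalty: punish hypotheses shorter than the reference
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum)
```

A perfect match scores 1.0; any divergence pulls the score down, which is why small BLEU gaps between GPT and NMT engines are treated as "very close in quality."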
However, this doesn't hold true for lower-resource languages (e.g., Romanian), where there is a sharp drop-off in performance from neural MT to GPT. This shouldn't come as a surprise given that the data sets used to train LLMs are collected from Internet content that's overwhelmingly published in English and a few other languages. One cannot learn a language without “practicing it.”
Some Challenges with GPT
GPT models also seem to have similar problems to NMT when translating from one low-resource language into another. Due to its multilingual nature, one would expect GPT to perform better in this scenario, where multilingual models typically “spread” the knowledge from high-resource languages, helping improve performance on lower-resource ones. In the case of GPT, this doesn’t happen—at least not directly.
For example, if you want to translate from Romanian to Chinese, instructing GPT to translate first from Romanian into English and then into Chinese returns significantly better results than a direct translation request from Romanian to Chinese.
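That pivoting pattern is easy to express in code. In this sketch, `translate()` is a hypothetical callable standing in for whatever MT or LLM request you actually make; a tiny lookup table plays that role for illustration:

```python
# Illustrative stand-in for a real MT/LLM backend (names are hypothetical).
PHRASEBOOK = {
    ("ro", "en"): {"Bună dimineața": "Good morning"},
    ("en", "zh"): {"Good morning": "早上好"},
}

def translate(text, source, target):
    """Stub translator: look the phrase up in a fixed phrasebook."""
    return PHRASEBOOK[(source, target)][text]

def pivot_translate(text, source, target, pivot="en"):
    """Route a low-resource language pair through a high-resource pivot."""
    intermediate = translate(text, source, pivot)
    return translate(intermediate, pivot, target)
```

The design trade-off is that pivoting doubles the number of model calls and can compound errors, but for pairs like Romanian–Chinese the stronger English-centric training data tends to outweigh that cost.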
Another challenge with GPT is that it is a non-deterministic system: the exact same translation prompt can, and regularly does, return different results. This makes it hard to evaluate the system and track reliable performance data over time.
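That variability stems largely from sampled decoding: the model draws each token from a probability distribution rather than always taking the single most likely one. A toy sketch of temperature sampling (all names here are illustrative, not any particular vendor's API):

```python
import math
import random

def sample_token(logits, temperature):
    """Pick a token index from raw logits.

    temperature == 0 falls back to greedy argmax (deterministic);
    higher temperatures flatten the distribution and increase variance.
    """
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return random.choices(range(len(logits)), weights=weights)[0]
```

Setting the temperature to 0 in an API request makes output far more repeatable, although hosted LLM services may still not be perfectly deterministic run to run.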
Lastly, it should be noted that the performance of LLMs is uneven across subject matters and content types. On one hand, it fares poorly with very technical text and specialized translations, but on the other hand, it produces much better translations for spoken content. This shows great promise for media translations, especially with GPT's ability to maintain context and take complex cues from the prompts provided. For example, think of providing the screenplay for a scene, and setting the tone of the conversation and the state of mind of the characters for the translation.
The Future of GPT
So are we ready to retire our existing machine translation engines and jump with both feet into the wonderful world of generative AI? Almost—but not quite!
For some subject matter and language pairs it already makes sense to use LLMs over traditional MT engines, but it's impossible to generalize that approach. There are practical challenges to implementing GPT in production workflows: outcomes are hard to predict, confidentiality is limited (most solutions are cloud-based, with little guarantee that your content won't end up in the model someday), and, first and foremost, our trusty NMT engines still perform better in most instances. Given the current pace of improvement, it's likely only a matter of time before the shift to new-generation, GPT-based translation models makes sense.
But will we still be talking about translation at that point? Or should content simply be generated for multiple markets and reviewed by copy editors? We'll be covering this in another installment of our blog.
Contact us if you want to discuss how generative AI can be used in your organization.