Not that long ago, machine translation (MT) occupied about the same place in the popular consciousness as a 4K video camera on your phone. Flash forward to the COVID era, when Zoom has become a generic name for the video calls we make daily, and machine translation, or the translation of text from one language into another using translation software, has finally hit the mainstream. It is now a key function on many of the global social media platforms and an integral part of how artificial intelligence is used in our daily lives.

How is Machine Translation Already in Your Online Life?

One of the most common places to run into machine translation is on social media. Facebook has its own machine translation platform that it uses to offer automatic translation to its users. You may have seen the “See Translation” link at the bottom of posts from friends who use another language. LinkedIn offers a similar option as does Twitter with the “Translate Tweet” function. If you are engaged in Internet research and come across a website in another language, all you need to do is paste the web address into translate.google.com and the whole website will appear—translated into your language.

While using machine translation on these platforms, you will notice that the translated text does not always make sense or may use awkward phrasing. This is normal and reflects what MT can achieve at this point in its development when it deals with informal, idiomatic communication full of slang and jargon that the machine translation engine may not know.

When it comes to rich media, some captioning, including its translation, may also be auto-generated, but most subtitles on Netflix, for example, are professionally translated by human translators.

What is Machine Translation?

Machine translation is created by computer software that can predict patterns in a target language relative to patterns in the source language. For example, if you run a German newspaper article through a machine translation engine online (such as Google Translate or Microsoft Translator), the translation application will analyze the source text and, based on the data already used to train the engine, create a translation by extrapolating from the patterns it already recognizes.

In the earliest versions of machine translation technology, the MT engines used rules to map structures in one language to structures in another. These systems proved to be highly inaccurate and difficult to use, forcing authors to change their writing for the MT engine to have any chance of producing a somewhat readable translation.

By the end of 2010, statistical machine translation engines represented the cutting edge of MT technology. These systems relied on statistical translation models to generate translations. By 2015, neural machine translation had entered the machine translation space after decades of research. Neural networks were first proposed in the 1940s, but their use as the basis of machine translation was not formally presented in academic research until 2014. In the years since, both academic and commercial research has exploded. Neural machine translation has now become the language industry standard for automatic translation.

How Good is Neural Machine Translation, Really?

When Google announced its first publicly available neural MT system in 2016, it did so with some hype. Google claimed that “Human Parity” had been achieved. Regardless of whether this was objectively true, a jaded language translation industry met the claim with a mix of hope and derision. Anyone in the language service field had heard it all before. Previous systems had been hyped too—only to offer disappointing MT output that led to marginal gains in productivity.

The truth is that statistical machine translation was a critical phase, but for producers of translations, it was fraught with risk. Quite simply, many statistical machine translation engines just could not be trusted. Not until you were well into a large project could you really assess if the engine would make the translation process more productive or not. This, however, was a critical period in the evolution of production models for translation providers—it brought human post-editors to the heart of the process. Without reliable post-editors, machine translation just would not work—all content had to be reviewed and revised by a human translator to create content worthy of publication and to provide high-quality data to train the statistical MT engines.

Neural MT is demonstrably better, and much of the early hype has been borne out. The DeepL service offers highly trained engines that produce shockingly good results. Google, Microsoft, and other smaller MT providers also continue making gains. But the reality is that the AI technology used to build MT is not unlike other AI tech we have seen in the popular media. Autonomous vehicle technology has made impressive strides, but there still needs to be a human at the wheel ready to intervene when the tech fails. The same is true of current-day machine translation. The rule best followed is trust, but verify.

Is Machine Translation Right for Your Company’s Content?

This is the question to be asked. Despite the incredible strides that MT has made in the last five years, there are still specific types of content that the machine may not handle very well. One example is software user interface strings. Such short phrases, which stand alone without context in software resource files, are difficult even for human translators to translate. The MT engine will also struggle. Neural MT engines perform best when processing longer sentences that sit within a larger textual context.

One recent example: When using MT to translate the content of a website from English to German, the Google Translate engine translated “About” to “Etwa” (approximately) in German instead of the Web-standard “Über uns” (“About us”), a form that is also commonly used on English websites.

MT engines tend to not perform quite as well on short phrases and sentences. But this can be influenced by training the engines with good data that fit the context well. The advice here is to evaluate the results first and have a qualified linguist who knows the software intimately review the results of the machine translation prior to committing to use MT on your product’s software strings.

This approach should be employed regardless of the type of content you may want to run through an MT engine. In the case of software Help content, there’s a good chance results will be better than just for the strings, but be prepared to have post-editors focus a lot on terminology that needs to appear in the interface. The MT engine will choose whichever terminology was present in its original training data—it will have no way of knowing your firm’s specialized terminology.
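
One way to help post-editors with interface terminology is an automated glossary check on the MT output. The following is a minimal sketch in Python, not a definitive implementation: the function name find_missing_terms, the sample glossary, and the sample sentences are all hypothetical, and the simple substring match is an assumption that will miss inflected forms in languages such as German.

```python
import re


def find_missing_terms(source, translation, glossary):
    """Return (source_term, expected_target_term) pairs that look wrong.

    A pair is reported when the source term occurs in the source segment
    but the approved target term does not occur in the MT output.
    """
    missing = []
    for src_term, tgt_term in glossary.items():
        # Only check glossary terms that actually appear in this segment.
        pattern = r"\b" + re.escape(src_term) + r"\b"
        if re.search(pattern, source, flags=re.IGNORECASE):
            # Naive, case-insensitive substring match on the target side.
            if tgt_term.lower() not in translation.lower():
                missing.append((src_term, tgt_term))
    return missing


# Hypothetical company glossary: UI label -> approved German label.
glossary = {"Settings": "Einstellungen", "Dashboard": "Dashboard"}

source = "Open Settings to change your password."
mt_output = "Öffnen Sie die Optionen, um Ihr Passwort zu ändern."

# Flags that the approved label for "Settings" is absent from the MT output.
print(find_missing_terms(source, mt_output, glossary))
```

A check like this does not replace the post-editor. It only points the linguist to segments where the engine chose its own terminology over yours.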

A good rule of thumb for using MT to create translations for your company is that highly sensitive content that involves human safety or entails significant physical or economic risk should only be translated with MT if there is a robust post-editing process in place, and the post-editors have subject matter expertise.

Marketing content may be translatable using MT, but to avoid embarrassment and ensure that the content is effective, post-editing is still a requirement. User-created content or otherwise low-risk content, such as knowledge base articles, can be translated using MT, possibly without any post-editing. But assessing the results before committing to publish low-risk content is still a critical step to ensure that the content will be acceptable and usable.

MT may work for your organization’s content if it…

  • Is not directly involved in user safety
  • Involves information that poses no physical or economic risk to users
  • Could be important to users, but would otherwise go untranslated
  • Does not include heavy jargon or technical terminology not available in the public domain
  • Does not need to be translated into rare languages that are predominantly oral and rarely written

When user safety or risk of liability may exist, relying on machine translation without employing human post-editors to review and verify the automatic translation could expose your company to liability.

Low-value content that may be of interest to users and customers may have typically gone untranslated in the past due to the cost, but now with machine translation, this content can be made available to that audience with minimal cost and effort.

If your content is highly specialized and uses arcane terminology, then the likelihood of a machine translation generating a reliable translation is far less. The reason is that the engine will have limited data related to your company’s area of specialization and will not have the information it needs to produce an accurate translation.
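
The guidance in this section can be sketched as a simple triage routine. This is only an illustration: the ContentProfile fields, the recommend_workflow function, and the workflow labels are hypothetical names, and the ordering of the checks (including routing specialized terminology to post-editing) is one reasonable reading of the advice above, not a fixed rule.

```python
from dataclasses import dataclass


@dataclass
class ContentProfile:
    """Hypothetical description of a piece of content to be translated."""
    safety_or_liability_risk: bool   # human safety, physical or economic risk
    marketing: bool                  # brand-facing content
    specialized_terminology: bool    # heavy jargon the engine has rarely seen
    rarely_written_language: bool    # target language with little written data


def recommend_workflow(profile):
    """Map a content profile to a suggested translation workflow."""
    if profile.rarely_written_language:
        # Engines have too little training data for these languages.
        return "human translation"
    if profile.safety_or_liability_risk:
        return "MT + post-editing by subject matter experts"
    if profile.marketing or profile.specialized_terminology:
        return "MT + post-editing"
    # Low-risk content: still assess a sample before publishing.
    return "raw MT after a sample review"


knowledge_base = ContentProfile(False, False, False, False)
safety_manual = ContentProfile(True, False, True, False)

print(recommend_workflow(knowledge_base))
print(recommend_workflow(safety_manual))
```

Even the lowest-risk branch keeps a review step, in line with the advice to evaluate the results before committing to MT.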

Are Machine Translation Systems Secure?

Security is a major consideration when contemplating using MT for your content. There is an infamous case of a corporate privacy breach due to the use of Google Translate by Translate.com in 2017. In this case, the state-run oil company of Norway, Statoil, had employees who used Translate.com’s free online machine translation service, which in turn used Google Translate. Google Translate says in its user license agreement that it will use and potentially place in the public domain any information users pass into the system. This ultimately led to sensitive corporate information making its way into Google Search results. The lesson here is that if you engage a free public service for machine translation, your content becomes the property of the service provider in most cases and security cannot be guaranteed.

Many of these tech companies, however, provide secure systems, which require greater technical commitment and tools for processing your content. Translation service providers who use machine translation in the delivery of their services will, if they are professional and responsible, rely on secure systems to translate your content. If your organization has a large volume of content to be machine-translated, then it may be easier to engage a service provider who already has these systems in place. This will make the overall process smoother and faster, and your organization will not have to develop its own workflow.

In addition to security, you need to consider how much content you may want to process through machine translation. If you have large volumes (more than the occasional email in another language, for example), then you will need a way to pass those documents efficiently and securely to the MT engine. You can’t copy and paste your way through this!

Translation service providers have translation management systems (TMS) that connect to machine translation engines via encrypted API connections. These translation management systems are designed to ingest all different types of content, from PDF files to structured XML content. The TMS parses this content so that it is easy to pass to the MT engine, and it keeps tags and other formatting data, which also eases publishing of the translated content. Even more sophisticated scenarios for delivering the content back to cloud-based publishing systems and corporate intranets can be constructed to automate not only translation but also publishing.
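
To give a feel for how tags and formatting can survive a trip through an MT engine, here is a minimal Python sketch of one common approach: swap inline tags for placeholders, translate the plain text, then put the tags back. Everything here is an assumption for illustration. The function names are hypothetical, fake_mt_engine is a stand-in that simply uppercases text instead of calling a real service, and a production TMS handles many cases (reordered or nested tags, attributes that need translation) that this sketch ignores.

```python
import re

# Matches simple inline tags such as <b>, </b>, or <a href="...">.
TAG_PATTERN = re.compile(r"<[^>]+>")
PLACEHOLDER_PATTERN = re.compile(r"\[\[(\d+)\]\]")


def protect_tags(segment):
    """Replace each tag with a numbered placeholder; return text and tags."""
    tags = []

    def stash(match):
        tags.append(match.group(0))
        return f"[[{len(tags) - 1}]]"

    return TAG_PATTERN.sub(stash, segment), tags


def restore_tags(text, tags):
    """Put the original tags back in place of their placeholders."""
    return PLACEHOLDER_PATTERN.sub(lambda m: tags[int(m.group(1))], text)


def fake_mt_engine(text):
    """Stand-in for a real MT call made over an encrypted API connection."""
    return text.upper()


def translate_segment(segment, engine=fake_mt_engine):
    protected, tags = protect_tags(segment)
    return restore_tags(engine(protected), tags)


# The tags come back unchanged even though the engine altered the text.
print(translate_segment("Click <b>Save</b> to continue."))
```

Because the engine only ever sees placeholders, it cannot translate or corrupt the markup, which is what makes automated publishing of the translated content practical.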

How to Get Started with Machine Translation of Your Content?

The best way to get started with using machine translation for your content is to engage a translation service provider who uses machine translation as part of their standard process. Even within the language services industry, machine translation adoption varies between service providers. The best thing you can do is talk to translation service companies and find out how they translate content and whether your organization can benefit from their use of machine translation. Note that many translation providers, even if they do use MT as part of their production processes, may not offer to sell you raw machine translation (that is, unedited content produced by a machine translation engine) because of the potential liability. Even if they do, there will be a cost to set up the process for your content and likely a small per-word rate, so do not expect this service to be offered for free. Also, do not be surprised if they ask you to sign a waiver that indemnifies their company against any errors or defects. This is a reminder that machine translation quality, albeit miraculous today compared to just five years ago, is not the “Universal Translator” made famous by Star Trek. The machines still have a way to go to match the uniquely human aspects of language.