Machine Translation May Never Replace Human Translators

This article is adapted from: Language Service Think Tank

Author: Li Changshuan (Vice Dean and Professor at the School of Advanced Translation, Beijing Foreign Studies University)

Artificial intelligence is developing rapidly and machine translation is becoming more accurate, raising concerns that translators will lose their jobs. Some parents even worry that children who study foreign languages will not be able to find work.

My judgment is that machine translation can make foreign-language information far more accessible and make professional translators more efficient. For deep exchanges of ideas, however, we must still rely on human translation or, better still, learn the foreign language ourselves.

Human translation is expensive and slow. In international organizations, for example, a professional translator’s daily workload is about 5 standard pages (1,650 English words in total). At the pay of a mid-level translator, each English word costs at least $0.15. Ordinary users therefore cannot afford professional translation, and there are not enough professional translators to meet the vast demand.
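To make the arithmetic concrete, here is a minimal Python sketch that uses only the figures quoted above (1,650 words over 5 standard pages per day, at a floor of $0.15 per word). The constant and function names are illustrative, and the 10,000-word document in the example is hypothetical.

```python
# Figures from the paragraph above; all names here are illustrative.
WORDS_PER_DAY = 1650       # one translator's daily output, in English words
PAGES_PER_DAY = 5          # standard pages translated per day
RATE_CENTS_PER_WORD = 15   # the "at least $0.15" floor, kept in cents to avoid float rounding


def words_per_page() -> int:
    """Words on one standard page, implied by the daily figures."""
    return WORDS_PER_DAY // PAGES_PER_DAY


def cost_cents(word_count: int) -> int:
    """Minimum cost of translating word_count words, in cents."""
    return word_count * RATE_CENTS_PER_WORD


def days_needed(word_count: int) -> float:
    """Working days one translator needs for word_count words."""
    return word_count / WORDS_PER_DAY


if __name__ == "__main__":
    print(words_per_page())                  # words per standard page
    print(cost_cents(WORDS_PER_DAY) / 100)   # minimum cost of one day's output, in dollars
    print(cost_cents(10_000) / 100)          # hypothetical 10,000-word document
    print(round(days_needed(10_000), 1))     # days for that document
```

On these figures a standard page holds 330 words and one day of output costs at least $247.50. A hypothetical 10,000-word report would cost at least $1,500 and occupy one translator for about six working days.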

On the other hand, the advantages of machine translation are its low cost and fast speed. Even if the translation is not accurate enough, it can meet user needs to a certain extent. Moreover, ordinary people can only master two or three languages at most, while machine translation is not limited by language. Therefore, the development of machine translation allows the general public to easily access a large amount of foreign information.

Professional translators can use machine translation to produce a first draft and then edit it. If machine translation is, on average, 50% accurate, it can improve their efficiency by 50%.

Even where the output is inaccurate and illogical, some of its word choices may be right and can be refined by hand; at the very least, it saves some typing time.

In addition, machine translation software also helps unify terminology among different translators, which can save time for collaborative tasks.

Despite the rapid development of machine translation, its accuracy varies greatly from one language pair to another. For languages that are structurally similar and whose words correspond closely, or that can be converted into one another by fairly regular rules, as is the case among European languages, machine translation has already reached a relatively high level. With minimal editing by a professional translator, the output can be used as a formal translation.

However, for language pairs whose concepts do not map neatly onto each other and whose structures differ significantly, such as Chinese and English, machine translation still has much room for improvement.

For example, the word “mission” essentially means “an important task assigned to a person or a group, usually requiring travel abroad” (Oxford English Dictionary).

From this core meaning, the word branches into specific senses that Chinese expresses with different words, back-translated here as “mission,” “task,” “delegation,” “business trip,” “mission group,” “representative group,” “special envoy group,” “missionary group,” and “visiting group.”

If “mission” is translated into French, the translator (human or machine) does not need to think, because French has the same word, spelled the same way and only pronounced differently, and it carries each of the same meanings.

In other words, the translator does not need to work out what “mission” means in a given context; the reader of the translation can judge that for themselves.

However, if it is to be translated into Chinese, the translator must determine whether to translate it as “使命” (mission) or “使团” (delegation) based on the context. This judgment is not immediately obvious.

A machine given this task usually chooses the most common meaning on the basis of statistical probability, which often leads to errors.
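The failure described here can be sketched as a deliberately naive "most frequent sense" baseline in Python. This is an illustration of the behavior the paragraph describes, not the method of any particular translation engine, and the counts below are invented for the example.

```python
from collections import Counter

# Hypothetical counts of how often "mission" was rendered each way in some
# training data. The numbers are invented purely for illustration.
SENSE_COUNTS = Counter({"使命": 60, "使团": 25})


def most_frequent_sense(counts: Counter) -> str:
    """Return the rendering seen most often."""
    return counts.most_common(1)[0][0]


def translate_mission(sentence: str) -> str:
    """Pick a Chinese rendering of "mission" for the given sentence.

    The sentence is deliberately ignored: this baseline has no notion of
    context, which is exactly the weakness being illustrated.
    """
    return most_frequent_sense(SENSE_COUNTS)


if __name__ == "__main__":
    for text in (
        "Our mission is to end poverty.",              # 使命 would be right
        "The mission arrived in Beijing on Monday.",   # 使团 would be right
    ):
        print(text, "->", translate_mission(text))
```

Both sentences come back as “使命”, so the second one, which refers to a delegation, is mistranslated. A human translator resolves the choice from context, which this baseline never consults.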

English sentence structure can be very complex, with clauses nested within clauses, and a single sentence sometimes runs to several lines. Some introductions to machine translation state that it tends to be relatively accurate for more formal writing, such as government documents and legal texts. This may be true among European languages, whose structures are relatively similar and need little adjustment in translation.

Chinese, however, has a simple structure and lacks English-style clause structures. Translating English into Chinese requires the translator to think through the sentence repeatedly, breaking complex structures down into simpler ones in order to convey the original meaning. Machines lack the capacity for logical analysis, so they struggle to turn complex structures into simple ones that still make sense.

For example, here is a sentence produced by machine translation; let’s see if anyone can understand it: “The government has six mandatory challenges, when the defendant is charged with a crime punishable by more than one year of imprisonment, the defendant or defendants may raise ten mandatory challenges.”

This sentence comes from the U.S. Criminal Procedure Rules and concerns the selection of jury members. The accurate meaning is: “When a defendant is charged with a crime punishable by more than one year of imprisonment, the prosecutor may exclude six (potential jurors) without stating a reason, and the defendant (including multiple defendants) may exclude a total of ten (potential jurors) without stating a reason.” As complex sentences in English go, this one is still relatively simple. When a machine meets a truly complex sentence, it is even less likely to understand how the parts relate to one another.

It is worth noting that “the defendant or defendants” in the machine output corresponds to the English phrase “defendant or defendants.” Because Chinese does not mark singular and plural, the machine renders both forms as “被告” (defendant).

Human translators recognize that the singular and plural here have different implications, so they flexibly handle it as “被告人（包括多名）” (defendant, including multiple individuals).

As for why the machine uses “被告” while the human translator changes it to “被告人”: in Chinese law, “被告人” is the defendant in a criminal case, the counterpart of the “公诉人” (prosecutor), whereas “被告” is the defendant in a civil case, the counterpart of the “原告” (plaintiff). Subtle differences like these are even harder for machines to discern.

When translating from Chinese to English, although the Chinese sentence structure is simple, the linguistic expressions have unique characteristics, especially in government documents, which are often difficult for even Chinese readers to fully understand beyond the literal meaning.

For example: “The primary task of students is to read and study. Higher education institutions must organize education around students’ diligent study, guiding students to read books about ‘national conditions,’ ‘grassroots,’ and ‘the masses,’ as well as classics of excellent traditional culture, Marxist-Leninist classics, world-renowned classics, and professional classics.”

Here, a human translator would need to work out what lies behind the terms “基层” (grassroots) and “群众” (the masses) and then convey that meaning in different words.

Machine translation, however, simply translates the words on the page. One translation program, for example, renders the “基层” and “群众” books as “‘basic’ books, ‘mass’ books,” which is likely to be even more confusing for English readers.

Furthermore, human translators would recognize that “读书学习” (reading and studying) is a concept and could broadly translate it as “study.” However, machines see “读书” (reading) and translate it as “study,” and see “学习” (studying) and also translate it as “study,” resulting in “读书学习” being translated as “study study.” This seems quite ridiculous to humans.

Many of the documents that translators handle every day were written in haste and never professionally edited, so their wording is imprecise.

When human translators encounter a low-quality document, they investigate or consult the author to clarify the intended meaning and then improve on it in the translation. Machine translation will only translate what is on the page.

For example: “The supervision of the procuratorial organs is limited to the supervision of the work of scraping mud and educational reform in labor re-education places, and does not have supervisory authority over the crucial approval process.”

The term “刮泥” (scraping mud) is difficult for the translator to understand. Why would labor re-education places need to “scrape mud”? Where does the “mud” come from? Is there that much mud to scrape? Such a task does not conform to common sense.

Researching the daily work of labor re-education places online turned up no such task as “scraping mud.” After much thought, it suddenly occurred to me that the word should be “management.”

The author had used a Pinyin input method and dropped the “l” in “guanli” (管理, management), leaving “guani,” which the input method naturally turned into “刮泥” (scraping mud). Machine translation, however, would faithfully render it as “mud-scraping.”
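How one dropped letter can produce a different word is easy to see in a toy model of a Pinyin input method. The Python sketch below is an assumption-laden illustration, not the behavior of any real input method: the syllable and word tables are hypothetical and contain only what this example needs.

```python
from typing import List, Optional

# Tiny, hypothetical tables: just enough syllables and words for this example.
SYLLABLES = {"guan", "gua", "li", "ni"}
WORDS = {("guan", "li"): "管理", ("gua", "ni"): "刮泥"}


def segment(keys: str) -> Optional[List[str]]:
    """Split typed letters into known syllables, or return None if impossible."""
    if not keys:
        return []
    for end in range(len(keys), 0, -1):   # try longer syllables first
        head = keys[:end]
        if head in SYLLABLES:
            rest = segment(keys[end:])    # backtrack if the remainder fails
            if rest is not None:
                return [head] + rest
    return None


def convert(keys: str) -> Optional[str]:
    """Return the word for the typed letters, if the toy tables contain one."""
    syllables = segment(keys)
    if syllables is None:
        return None
    return WORDS.get(tuple(syllables))


if __name__ == "__main__":
    print(segment("guanli"), convert("guanli"))   # intended input
    print(segment("guani"), convert("guani"))     # same input with the "l" dropped
```

With the full input, the letters split as “guan” + “li” and yield “管理”. Without the “l”, “guan” leaves a stray “i” that is not a syllable, so the only valid split is “gua” + “ni”, which yields “刮泥”. The typo is invisible in the Chinese text itself, which is why spotting it takes a reader who asks whether the sentence makes sense.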

Some people like to use idioms, but they may not always be appropriate. Translators must judge based on the context what the author is actually saying, and they do not have to translate strictly according to the author’s wording.

For example, a couple of years ago, the Chinese government issued a document calling for the walls of large courtyards to be dismantled, “capillary” roads to be opened up, and urban traffic to be improved. A journalist interviewing Liu Thai Ker, the father of urban planning in Singapore, asked this question: “After the issuance of the document, the focus of everyone’s heated discussion was whether to dismantle the walls of large courtyards and closed communities.”

“In the history of Chinese residential communities, there was also a transition from open communities to closed communities, established under the guidance of grassroots organizations such as police stations. Now, because of traffic congestion, the hope is to open communities up and free the ‘capillaries’ of the road network. Given the policy’s ‘morning order and evening change,’ do you think this new method of alleviating traffic is reasonable?”

Among these, the term “朝令夕改” (morning order and evening change) refers to rapid policy changes. However, based on the preceding context, the policy has only actually changed once in decades, which does not qualify as “morning order and evening change.” The editor likely realized this and added quotation marks.

When translating “朝令夕改” into English here, one cannot render the idiom according to its original meaning; it has to be translated flexibly as “changes.” A machine would probably not think this far.

In fact, I tried a certain online translation engine, and it even failed to understand this idiom, translating “朝令夕改” as “changing the future” (改变未来).

The above examples illustrate that machine translation still has many blind spots. These blind spots are unlikely to be overcome in the short term.

For the vast majority of users, if they want to use machine translation for communication, they must use the simplest structure, most complete grammar, most basic vocabulary, and language that is clear to everyone.

If the goal is for machine translation to handle natural language as people actually use it, there is still a long way to go; that day may never come.