Differences Between Machine Translation and Human Translation

Editor’s Note:

Today, there are actually not many people who truly understand science, and even fewer who can popularize science. Popular science is a process of communication, turning science into something visible and guiding people to live better. To successfully carry out the 2021 Science and Technology Week activities, further popularize scientific and cultural knowledge, and promote the coordinated development of technological innovation and scientific popularization, the Shanghai Science and Technology Translation Association and the Minhang District Translation Association have made full preparations and proudly launched the 2021 popular science column, publishing 1 to 2 popular science articles daily on the association’s WeChat subscription account starting from May 21. The seventh popular science article, “Differences Between Machine Translation and Human Translation and Future Prospects,” is provided by the Minhang District Translation Association. We welcome your attention and feedback.

The Future Relationship Between Machine Translation and Human Translation

Machine translation is the process of using computer systems to automatically translate text or speech from one natural language to another. It is a branch of computational linguistics involving various disciplines such as computer science, mathematical logic, linguistics, and information theory, and is a form of artificial intelligence with significant scientific research value. At the same time, machine translation also has important application value. With the rapid development of economic globalization and the internet, machine translation technology plays an increasingly important role in promoting economic, cultural, and political exchanges. From the early rule-based machine translation to later statistical machine translation, and to the latest neural machine translation, its translation capabilities have gradually improved and are increasingly applied in daily life.

Machine translation is no longer confined to text processing; it is developing towards intelligent translation that incorporates capabilities such as speech recognition and image recognition. Since machine translation is a form of artificial intelligence, one cannot help but ask: can it replace human translation in the future, and what will the relationship between the two look like? This article examines the current state of machine translation and its limitations, and on that basis explores and forecasts the future development of human translation and machine translation.

1. Current Status of Machine Translation Development

Since its inception in 1947, machine translation has roughly gone through four stages: the initial stage (1947-1964), the recession stage (1964-1975), the recovery stage (1975-1989), and the prosperous stage (after 1990). After more than 70 years of development, machine translation has made significant progress and is widely used in relatively limited fields such as manuals and scientific literature, producing certain social and economic benefits. With the acceleration of global economic integration and the increasing frequency of international exchanges, the demand for machine translation has grown unprecedentedly, ushering in new development opportunities. China has also achieved unprecedented success in machine translation, launching a series of machine translation software such as Yixing, Yaxin, Tongyi, and Huajian. Driven by market demand, machine translation has entered a practical stage, coming into the market and reaching users. Since the new century, statistical methods have been fully applied. Internet companies have established machine translation research groups, developing machine translation systems based on big data from the internet, thus making machine translation truly practical, such as Google Translate, Bing Translate, Baidu Translate, and Youdao Translate. In recent years, with advancements in deep learning, machine translation technology has further developed, leading to a rapid improvement in translation quality.

From the Perspective of Working Principles

Machine translation has generally gone through three stages: rule-based machine translation, statistical machine translation, and deep learning-based neural machine translation. These three stages have witnessed a gradual improvement in machine translation quality. Machine translation has advantages such as automation and scalability, and is increasingly applied in daily life, with the translation of some technical literature almost reaching the level of human translation. The new generation of neural translation system released by Google in 2016 (Google Neural Machine Translation) is currently the most advanced translation system. This system has reduced the error rate by over 60% in the mutual translation of major languages such as English, French, Spanish, and Chinese, and can achieve human translation levels in the conversion of normative texts. It is undeniable that in relatively limited fields, the level of neural machine translation can almost rival that of human translation.

In Terms of Speech

Machine translation has also made significant progress. A few years ago, Microsoft demonstrated a fully automated simultaneous interpretation system at a public event in Tianjin. The speaker gave a speech entirely in English, while the computer in the background completed the processes of speech recognition and switching between English and Chinese fluidly, with good results. Some technical experts believe that in the near future, machine translation will replace human translation. However, despite the tremendous progress made in machine translation, it is still constrained by many factors and cannot replace human translation in many aspects and fields, and may never be able to do so.

2. Constraints on the Development of Machine Translation

1. Dependence on Parallel Corpora

As a data-driven method, current machine translation is highly dependent on the scale, quality, and domain breadth of parallel corpora. However, the construction of corpora has its own inherent flaws, which restrict the development of machine translation. The main constraints are described as follows.

Narrow Coverage of Corpora

According to Koehn & Knowles’ research on the English-to-Spanish translation system, the performance of neural network translation only begins to outperform statistical translation when the corpus size exceeds 15 million entries. However, the reality is that apart from major languages such as Chinese, English, French, German, and Japanese, it is challenging to collect sufficient resources for many minority languages. For example, the languages involved in the “Belt and Road” initiative mostly belong to resource-poor languages, and only small-scale corpora can usually be found. Moreover, these languages are often agglutinative and face issues with morphological analysis, and there are very few professionals who understand these languages, making neural machine translation models unsuitable. The internationally renowned Google Translate system faces the same problem. Although Google Translate nominally supports over 60 languages, in practice, apart from some major languages achieving higher levels of machine translation to English in everyday language and news fields, the translation quality between most other languages, especially between non-European languages, is far from satisfactory. Even for Chinese and English, the current corpora mainly focus on government documents, political news, and scientific and technological areas, severely lacking materials for the vast majority of fields; thus, the advantages of machine translation cannot be realized.

Lag in Corpus Construction

The other flaw of corpora is the lag in construction. With the continuous advancement of society and the development of technology, new terms and expressions continuously emerge. Huang Youyi believes that “in the humanities, our country lacks databases for new concepts, new ideas, and new expressions, which machines cannot solve, and it still relies on humans.” In fact, this is the case not only in the humanities but also in the field of science and technology. Informational texts introducing the latest cutting-edge research results, such as academic papers, textbooks, or specialized publications, often use a large number of technical terms to disseminate the latest academic concepts, and these new terms often lack corresponding words in another language.

2. Inherent Flaws of Machines

Superficiality of Deep Learning

Current neural machine translation is a product of the combination of deep learning and big data. Since being proposed by Geoffrey Hinton in 2006, deep learning has become a new field of machine learning research. Its aim is to build neural networks that simulate the way the human brain analyzes and learns, mimicking the brain’s mechanisms to interpret data such as images, sounds, and text. Deep learning has greatly improved machines’ capabilities in speech recognition and image recognition. However, translation is one of the most complex human activities, involving both abstract thinking and imaginative thinking. While artificial intelligence may one day possess human-like abstract thinking abilities, it will find it difficult to acquire the capacity for imaginative thinking, which involves imagination and emotion.

Machine translation is therefore mostly applied to texts aimed at factual description and the transmission of knowledge and information, such as news, technology, patents, and manuals. These texts rarely involve emotions, common sense, or cultural background. When it comes to expressive texts, the limitations of machine translation become evident. Expressive texts, such as novels, poetry, essays, and art writing, emphasize emotional expression and are characterized by subjectivity, emotionality, and imagination. This type of text values the emotional expression of the author or character, and its meaning is often unstable and vague, conveyed through metaphors, symbols, and other devices. Machine translation struggles to handle such texts, often capturing only the main idea and lacking depth and elegance. Machines cannot replicate the historical accumulation of emotion and reason that humans possess, nor can they combine subjective feeling with rational analysis. Simulating the human brain is not the difficult part; the real challenge is that machines cannot acquire the rich social and life experience of excellent translators.

In other words, machine translation lacks the personalization and creativity of human translation. It is this personalization and creativity that drives the development and evolution of language, while machine translation can only output mechanical “machine language.”

Limitations of Sentence-Based Translation

Since 2014, end-to-end neural machine translation has developed rapidly, achieving significant improvements in translation quality over statistical machine translation. In the end-to-end approach, the network takes an entire sentence as input during learning and produces a complete translation at the output end. By learning from a large number of translation examples, the neural network uses extensive computation to approximate natural language expression as closely as possible. As a result, machine translation mainly operates at the sentence level and struggles with paragraph-level and discourse-level translation. Even at the sentence level, neural machine translation performs better on short sentences than on long ones, and translating long sentences remains a significant challenge. The translation strategies and methods available to machine translation are therefore limited to the sentence level; it rarely adopts paragraph-level or discourse-level methods such as summarization, abstract translation, and deletion. In contrast, the unit of human translation can vary with the style of the text or the judgment of the translator: it can be a group of sentences, a paragraph, or an entire text. When the paragraph or the discourse is the translation unit, factors such as coherence, cohesion, referential relationships, sentence order, cultural context, and situational context must be considered. Machine translation cannot currently address these challenges, and it will continue to face significant difficulties with them in the future.

The Complexity and Multiplicity of Language Services

People generally understand translation as the process of converting unformatted plain text from one language to another. However, this is a one-sided understanding of the translation profession. With the advent of the era of big data and cloud computing, the language service industry is gradually moving away from traditional development models. Translation, as an important component of the language service industry, has undergone fundamental changes in its content and mode. The mainstream texts in the era of big data involve numerous hypertext forms, such as websites, software, product manuals, online help, and electronic learning materials, which require reliance on translation technology, project management, and other models to complete. Translators must consider not only the economic, cultural, ideological, and legal requirements of the target language market but also details like input methods, fonts, formatting, and interface patterns. Users from different countries expect to use software products in their own languages, not just in English, and they hope that the software can meet various cultural needs. Therefore, the localization of language services requires not only accurate translation but also appropriate improvements to the relevant software, necessitating translation needs analysis, text type analysis, information extraction, formatting conversion, proofreading, compilation, testing, and publication processes, often requiring cross-departmental, cross-company, and even cross-regional and international communication. Even if the quality of machine translation significantly improves in the future, it will only change the production methods of text conversion in translation services; translation analysis and communication will still require the participation of many professional translation companies and professionals, which is beyond the capabilities of machine translation.

This section has analyzed the dilemmas faced by machine translation from the perspectives of corpus construction, the machines themselves, and language services, and concludes that machine translation cannot replace human translation. However, it is undeniable that the development of machine translation is an irreversible trend. It will inevitably displace mid- to low-end translators, reshape the industrial structure of the language service industry, and change the future relationship between machine translation and human translation.

3. Exploring the Future Relationship Between Machine Translation and Human Translation

The continuous development of machine translation will undoubtedly change the industrial structure of the language service industry and affect the career prospects of practitioners. What will the future relationship between machine translation and human translation look like? Based on the analysis above, I will provide the following logical reasoning and predictions.

Dislocated Competition

It can be anticipated that, as technology continues to advance, machine translation will gradually take over formulaic and regular translation tasks. In the mid- to low-end translation market, machine translation will dominate. For example, most foreign-related scenarios and daily communication, as well as emails, WeChat, instant messaging, help documents, user interfaces, and product manuals, do not require highly specialized and precise translations, and the current level of machine translation is sufficient to handle these tasks. However, high-end human translators remain in short supply. In the future, human translation will primarily serve two kinds of high-end market: content with strict accuracy requirements, such as legal documents, medical monographs, political literature, and other specialized material; and fields requiring high creativity and imagination, such as literature, art, philosophy, and other humanities. In the first kind, even minor errors can have significant consequences. For instance, political discourse often bears on the construction of the national image, which makes its translation not only difficult but also sensitive; even slight mistakes can seriously damage the country’s image. Huang Youyi, former deputy director of the China Foreign Languages Bureau, has provided numerous examples to illustrate the high difficulty and sensitivity of translating political discourse. For example, the Chinese term for “patriotism” corresponds literally to the English word patriotism, but some in the West regard the patriotism advocated by China as “extreme nationalism,” a usage often found in foreign media criticizing China. In translation for foreign publicity, the term is therefore generally rendered as “love of the country.” Machine translation cannot make such cultural and ideological judgments. The second kind calls on the translator’s creativity and responsibility for safeguarding language; in these areas, machines can never replace human translation. In the future, the division of target markets between human translation and machine translation will gradually become clearer. The two will occupy different ecological niches in the market and engage in dislocated competition. Machines can take over the parts of translation that require relatively simple skills and less human intelligence, allowing humans to focus on more meaningful activities.

Human-Machine Interaction

Dislocated competition is an inevitable trend in the development of machine translation and human translation, but this does not mean that the two are entirely independent and do not interfere with each other. Instead, they are always mutually integrated, with the degree of human-machine interaction varying based on the nature of the text and the client’s quality requirements for the translation. Surveys show that “a pyramid-shaped translation market is forming, where only 10% of tasks require human translation, 70% of documents require machine translation, and the remaining 20% of information will adopt post-editing methods for translation.”

Machine Translation Dominates

Cui Qiliang categorizes translated texts into three major categories based on their purpose, quality requirements, and delivery time: reference-level, conventional-level, and publication-level texts. The lowest quality requirement is for reference-level texts, such as emails, WeChat, web pages, instant messaging, and information retrieval. As mentioned above, this type of text occupies 70% of the market share, and the translations are for reference only, with clients hoping to receive translations quickly; therefore, the accuracy requirement is low. In this case, a machine translation-dominated strategy can be adopted to reduce costs, improve efficiency, and minimize time wastage.

Human-Machine Collaborative Translation

For conventional-level texts such as news, technology, trade, product manuals, help documents, and user interfaces, the quality requirements are moderate, which calls for a human-machine collaborative model. There are typically two main modes: (1) pre-editing + machine translation + post-editing, and (2) machine translation + post-editing. Pre-editing usually involves two parts, formatting processing and language processing: the source text is structurally modified before machine translation to reduce semantic ambiguity and make it conform to the logical patterns of machine translation, thereby improving the accuracy and readability of the output and reducing the post-editing workload. However, no matter how intelligent the machine is or how much time the translator spends on pre-editing, a certain proportion of the text will remain that the machine cannot translate, so post-editing of the machine translation is still necessary.
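
The two collaborative modes described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real workflow: the names pre_edit and collaborate, the two rewrite rules, and the stand-in engine and post-editor are all hypothetical. The machine translation engine and the post-editor are passed in as plain functions because the article does not name a specific system.

```python
import re
from typing import Callable


def pre_edit(source: str) -> str:
    """Hypothetical pre-editing step: formatting, then language processing."""
    # Formatting processing: collapse line breaks and repeated spaces.
    text = re.sub(r"\s+", " ", source).strip()
    # Language processing: spell out contractions to reduce ambiguity
    # (two illustrative rules only).
    for short, full in (("can't", "cannot"), ("won't", "will not")):
        text = text.replace(short, full)
    return text


def collaborate(
    source: str,
    machine_translate: Callable[[str], str],
    post_edit: Callable[[str], str],
    use_pre_editing: bool = True,
) -> str:
    """Run mode 1 (with pre-editing) or mode 2 (without) on one text."""
    prepared = pre_edit(source) if use_pre_editing else source
    draft = machine_translate(prepared)  # the machine produces the draft
    return post_edit(draft)              # a human always reviews the draft


# Stand-ins so the sketch runs: a fake engine that tags its output,
# and a fake post-editor that removes the tag.
def fake_engine(text: str) -> str:
    return "[MT] " + text


def fake_post_editor(draft: str) -> str:
    return draft.replace("[MT] ", "")


print(collaborate("The  printer can't\nprint.", fake_engine, fake_post_editor))
```

In both modes post-editing is unconditional, which mirrors the point that some proportion of the machine output always needs human attention; only the pre-editing step is optional.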

Moreover, human-machine collaboration may also make use of auxiliary tools such as translation memories and term bases. A translation memory aligns source and target text at the sentence level and stores the aligned pairs in a database. After a human has translated a text, the computer stores the source and target segments, so that when it encounters the same text again it can automatically match the translation unit against the memory database. A term base stores terms of various kinds and can be supplemented with new ones. Together, these tools ensure terminological consistency and improve the accuracy and professionalism of the translator’s work.
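
As a rough illustration of how these two tools behave, the following Python sketch stores sentence pairs in a small translation memory and checks a draft against a term base. It is a simplified sketch, not the design of any particular tool: the class and function names, the 0.75 similarity threshold, and the sample sentences are all assumptions. Fuzzy matching (returning a stored sentence that is similar but not identical) goes beyond the exact-match case described above and is added here as an assumption about how such lookups are commonly extended; it uses SequenceMatcher from Python's standard difflib module.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Sentence-level store of (source, target) pairs. Hypothetical sketch."""

    def __init__(self, fuzzy_threshold: float = 0.75):
        self.units: dict[str, str] = {}
        self.fuzzy_threshold = fuzzy_threshold  # assumed cut-off, not a standard

    def store(self, source: str, target: str) -> None:
        self.units[source] = target

    def lookup(self, sentence: str):
        """Return (stored_source, stored_target, score) or None."""
        if sentence in self.units:  # exact match: reuse as-is
            return sentence, self.units[sentence], 1.0
        best = None
        for src, tgt in self.units.items():
            score = SequenceMatcher(None, sentence, src).ratio()
            if score >= self.fuzzy_threshold and (best is None or score > best[2]):
                best = (src, tgt, score)
        return best  # a fuzzy match still needs human revision


def missing_terms(source: str, target: str, term_base: dict[str, str]) -> list[str]:
    """List source terms whose approved rendering is absent from the target."""
    return [
        term
        for term, rendering in term_base.items()
        if term.lower() in source.lower() and rendering not in target
    ]


tm = TranslationMemory()
tm.store("Press the power button.", "按下电源按钮。")
term_base = {"power button": "电源按钮"}

print(tm.lookup("Press the power button."))        # exact match
print(tm.lookup("Press the power button twice."))  # fuzzy match
print(missing_terms("Press the power button.", "按下开关。", term_base))
```

An exact match can be reused directly, whereas a fuzzy match is only a suggestion for the translator to adapt, which is one reason these tools assist human translation rather than replace it.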

Human Translation Dominates

For publication-level texts such as literature and the arts, scientific monographs, legal regulations, and contractual agreements, the quality requirements are the highest, which calls for a human-dominated translation strategy in which machine intervention is minimal. Literary and artistic texts emphasize aesthetics, interest, and educational value. For example, Fu Lei’s translation of “Jean-Christophe” has influenced generations; the elegance and depth of his translation play a crucial role in this, and they are beyond the capabilities of rational machine translation. Texts such as legal regulations and contractual agreements, for their part, emphasize normativity and accuracy. For instance, both machine translation and general dictionaries render the English terms attempted rape/murder/robbery in Chinese in a form that back-translates as “rape/murder/robbery attempted.” However, in the Basic Law of Hong Kong, based on the connotation and substance of the terms, only the rendering that back-translates as “attempted rape/murder/robbery” is acceptable. The dictionary wording for “attempted” may read smoothly, but where legal significance is concerned it must be discarded. This illustrates that for texts that emphasize aesthetics, interest, and literary artistry, as well as for those that demand high standards of normativity and accuracy, human translation dominates and the intervention of machine translation is minimal.
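
The three-tier division described in this section amounts to a simple routing rule, sketched below in Python. The tier names and the example text types come from the discussion above; the identifiers Tier, STRATEGY, TEXT_TYPE_TIER, and choose_strategy are hypothetical, and the lookup table is only an illustrative subset. In practice the decision would also depend on the client’s quality requirements and delivery time.

```python
from enum import Enum


class Tier(Enum):
    REFERENCE = "reference-level"
    CONVENTIONAL = "conventional-level"
    PUBLICATION = "publication-level"


# Strategy per tier, following the three subsections above.
STRATEGY = {
    Tier.REFERENCE: "machine translation dominates",
    Tier.CONVENTIONAL: "human-machine collaborative translation",
    Tier.PUBLICATION: "human translation dominates",
}

# Illustrative subset of the text types named in the article.
TEXT_TYPE_TIER = {
    "email": Tier.REFERENCE,
    "instant message": Tier.REFERENCE,
    "web page": Tier.REFERENCE,
    "news": Tier.CONVENTIONAL,
    "product manual": Tier.CONVENTIONAL,
    "user interface": Tier.CONVENTIONAL,
    "literature": Tier.PUBLICATION,
    "legal regulation": Tier.PUBLICATION,
    "contract": Tier.PUBLICATION,
}


def choose_strategy(text_type: str) -> str:
    """Map a text type to its translation strategy via its quality tier."""
    key = text_type.strip().lower()
    if key not in TEXT_TYPE_TIER:
        # Unlisted types need a human decision rather than a silent default.
        raise ValueError(f"no tier assigned for text type: {text_type!r}")
    return STRATEGY[TEXT_TYPE_TIER[key]]


for kind in ("Email", "product manual", "contract"):
    print(kind, "->", choose_strategy(kind))
```

Raising an error for unlisted text types, rather than guessing a tier, is a deliberate choice in this sketch: assigning a tier is itself a judgment about purpose and risk.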

Integration of Technology and Humanities

The rapid development of machine translation has indeed brought convenience to the general public and generated certain economic benefits, but this should not lead to an exaggerated view of its effectiveness. Some so-called professionals overstate the power of technology while neglecting the humanistic side of translation. Translation remains a vibrant activity because different nations and cultures need to exchange culture and ideas. Overemphasizing either technology or the humanities is an extreme approach; the correct approach is to integrate the two. It is essential to make use of the convenience of machine translation technology while also drawing on the subjectivity of human translators. Both are indispensable.

No matter how advanced artificial intelligence becomes, it is still a product developed by humans and ultimately serves humanity. In an era of increasingly developed information technology and specialization, professional translators need to embrace new technologies with a positive attitude, learning and applying them and using machine translation to enhance their translation skills so that they can work faster and better. The translation profession is characterized by strong practicality, applicability, and professionalism. It can be anticipated that future human translators will need both bilingual competence and computer software development skills. Translators should learn to use machine translation and strengthen their ability to apply translation technology; this will be the new model for translation in the future. However, we must also recognize that translation studies is a humanistic discipline, and that translators are both cultural messengers and guardians of language. Translation is not merely a simple conversion of words and sounds; its essence is the exchange and dissemination of thought and culture between different cultures. The warmth, depth, and purity of human interaction and cultural communication, as well as the skill, creativity, and wisdom reflected in human translation, are beyond the reach of machine translation. If thinkers and poets are guardians of language, then a thoughtful, creative, and artistic translator bears this responsibility as well. Recognizing translators as guardians of language affirms and acknowledges their identity, helps prevent machine translation from eroding natural language, and supports the subjectivity of human translators. While machine translation attends to the transmission and accuracy of information, it cannot account for the polysemy, vagueness, and creativity of language. It is precisely the polysemy, vagueness, and creativity of language that imbue our existence with spirituality, beauty, and vitality.

4. Conclusion

Based on the current state of machine translation development and the constraints it faces, this article has explored the future relationship between machine translation and human translation. Machine translation emerged in order to serve humanity better. The relationship between machine translation and human translation is not one of contradiction or a zero-sum game, but one in which the two complement and promote each other. The development of machine translation has indeed eased the burden on human translators and brought convenience to the public. However, we should not overemphasize what machine translation can do. No matter how advanced artificial intelligence becomes, it is still a product developed by humans and ultimately cannot replace the human brain. Likewise, machine translation will never fully replace human translation. If it ever did, the effect would not be confined to the language service industry: human labor would be replaced by artificial intelligence in all sectors of society.

(Provided by Minhang District Translation Association)
