Researching Subtitle Translation Strategies in Multimodal Contexts

An Introduction to a University Student Innovation Training Project

Subtitle Translation Strategies from a Multimodal Perspective

Project Members | Ma Yuqi, Di Bao
Supervisor | Chang Chengguang

What is Multimodality?

First, what do we mean by modality?

Every source or form of information can be called a modality. Humans, for example, perceive through touch, hearing, sight, and smell; information is carried by media such as voice, video, and text; and it can be captured by sensors such as radar, infrared detectors, and accelerometers. Each of these can be regarded as a modality.

Multimodality refers to information drawn from multiple modalities. In a film, for instance, the meaning conveyed to the audience comes not only from the actors' dialogue but also from the visual symbols on screen, the auditory symbols of the actors' speech, and, in some films, even certain tactile symbols. In other words, subtitles create meaning only in combination with images, sound, and other modalities.

The image shows a script excerpt from the film “Three Billboards Outside Ebbing, Missouri”. Without the visuals, the audience would struggle to imagine what is happening.

Every such image is a multimodal text. We extend multimodal analysis to video, examining the multimodal meaning of each scene and shot and how it bears on the choice of subtitle translation strategies.

Theoretical Background

Halliday (1994) holds that language has three metafunctions: the ideational function (language represents our experience of the real world), the interpersonal function (language encodes the speaker's identity and enacts communication between participants), and the textual function (the components of language are organized into coherent texts) (Hu Zhuanlin et al., 1989:24).

With the rapid development of digital media, and especially the rise of the internet, people have begun to pay attention to the growing role of visual expression. Matthiessen & Nesbitt (1996) recognized that technological development and social needs had opened a gap between linguistic theory and linguistic description, and they were the first to explore multimodality and language as a semiotic system.

In 1996, Kress & Van Leeuwen proposed a visual grammar for multimodal discourse analysis, providing a theoretical framework that analyzes images in terms of three kinds of meaning: representational, interactive, and compositional. This framework gives our study of dynamic multimodal texts its ideas and direction.

The accompanying image presents a visual grammar analysis of the frame discussed below.

Take a frame from the second scene of “Three Billboards Outside Ebbing, Missouri”, in which the female lead goes to arrange for the billboards. The high/low positioning of the two main characters and their mutual gaze form a vector (Li Zhanzai, 2003:5). This vector marks a narrative process and belongs to the image's representational meaning, and the high/low arrangement shows that Mildred (left) holds the dominant position in the dialogue. The scene is also highly saturated in color, with bright light streaming in through the window, which gives it rich compositional meaning.

As for the elements within the frame, Welby (right), the man in charge, has a messy desk; before Mildred entered, he was lounging in his chair and secretly watching the female secretary. He speaks softly and is relatively young. Mildred, by contrast, wears a neat blue outfit, keeps her hair tied up, and wears little makeup, leaving her wrinkles visible. This contrast portrays Mildred as strong and capable and Welby as somewhat weak, with little discourse power. These choices of costume and props belong to the image's interactive meaning, which corresponds to the interpersonal metafunction. They establish an interactive relationship with the viewer, suggesting what attitude the audience should take toward the events depicted.

Camera angle conveys meaning as well. In this scene, Mildred stands throughout, while Welby sits in a soft-backed chair. By our count, 18 of the scene's 28 shots reflect this compositional meaning. The consistent prominence of the two characters also smooths the transitions between the preceding and following scenes. The director conveys similar interactive and compositional meanings through different camera angles, giving the audience more cues for understanding the whole scene and helping us analyze the interpersonal and textual functions of this dialogue.

Comparing the two images, we can see that Mildred is shot from a low angle: the audience shares Welby's eye level and looks up at her, which further emphasizes her status as protagonist, her commanding presence, and her discourse power in this dialogue. The audience may come to feel respect, even a little fear, toward her. As the first scene with dialogue, it uses rich multimodal meaning both to supply abundant background information and to establish the protagonist's character forcefully.

To show that this hierarchy between the two characters is no coincidence, we can compare it with the 49th scene of the film. Here, Welby calls Mildred to his office, tells her she is already a month behind on the billboard rent, and pressures her to pay immediately. In this similarly staged scene the positions are reversed: Welby stands throughout, while Mildred, unable to pay, sits in the chair across from him in the weaker position. The other multimodal meanings shift accordingly.

What Insights Does This Offer for Subtitle Translation?

As the second scene of “Three Billboards Outside Ebbing, Missouri” shows, compositional meaning in video is often expressed through the camera's position, movement, and angle, and it can therefore indicate the balance of power between characters. Translators can reflect this meaning in their translations by adjusting word choice, sentence structure, and so on.

Here, we take one of Mildred's lines from this scene as an example and briefly analyze the styles of seven translations of it.

This close-up brings the audience closer to Mildred and amplifies what she says, giving the shot strong interactive meaning. When Mildred asks how much it costs to rent the billboards for a year, Welby looks shocked and responds, “A year? No one would take that road except someone lost or crazy. Do you really want to rent that billboard for a year?” Mildred replies, “Quick, ain’t ya, Welby?”

On hearing this, Welby looks thoroughly embarrassed, which adds new ideational meaning. If the line is translated as “You’re really smart,” it implies that Mildred is sincerely complimenting Welby, especially since her sarcastic tone is not very pronounced. This would confuse the audience: it does not fit Mildred’s aggressive image, and it fails to explain why Welby looks embarrassed rather than pleased. By contrast, the Mandarin dubbed version of the film renders this line as “Be straightforward, Welby,” which takes a more forceful tone, fits the original film better, and is a more equivalent translation.

Why Choose This Angle?

Although subtitle versions differ considerably from one another, each version's style is generally consistent from start to finish. Concise versions simplify the information in nearly every line, while versions that aim to preserve the original meaning stay faithful to the source in almost every line (in rendering metaphors, for example).

The word and line counts of the different subtitle versions give a sense of how much information each version drops. Although these figures are rough estimates that exclude the opening and closing credits, they still reveal differences in translation style across versions.
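Counts like these are simple to tabulate. The following is a minimal Python sketch, not the project's actual tool, assuming each subtitle version is stored as an SRT file (a cue number, a timing line containing `-->`, then one or more text lines). It counts cues, text lines, and non-whitespace characters; characters are used instead of words because Chinese subtitles do not separate words with spaces. The names `subtitle_stats`, `parse_srt`, `skip_head`, and `skip_tail` are hypothetical, and dropping a fixed number of cues at each end is only a stand-in for excluding credits.

```python
import re
from dataclasses import dataclass


@dataclass
class SubtitleStats:
    cues: int        # number of subtitle cues (numbered SRT blocks)
    lines: int       # number of non-empty text lines across all cues
    characters: int  # non-whitespace characters in those text lines


def parse_srt(text: str) -> list[list[str]]:
    """Split SRT text into cues, returning each cue's text lines."""
    text = text.replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", text):
        rows = [row.strip() for row in block.splitlines() if row.strip()]
        # Assumed SRT layout: index line, timing line ("-->"), then text lines.
        if len(rows) >= 3 and "-->" in rows[1]:
            cues.append(rows[2:])
    return cues


def subtitle_stats(text: str, skip_head: int = 0, skip_tail: int = 0) -> SubtitleStats:
    """Count cues, lines, and characters, optionally dropping cues at each end.

    skip_head/skip_tail are a hypothetical stand-in for excluding the
    opening and closing credits.
    """
    cues = parse_srt(text)
    end = max(skip_head, len(cues) - skip_tail)
    kept = cues[skip_head:end]
    lines = [line for cue in kept for line in cue]
    chars = sum(len(re.sub(r"\s", "", line)) for line in lines)
    return SubtitleStats(cues=len(kept), lines=len(lines), characters=chars)


# Toy sample built from the English lines quoted above; not a real subtitle file.
SAMPLE = """1
00:00:01,000 --> 00:00:03,000
A year?

2
00:00:04,000 --> 00:00:06,000
Quick, ain't ya,
Welby?
"""

print(subtitle_stats(SAMPLE))  # SubtitleStats(cues=2, lines=3, characters=26)
```

Running the same function over two versions of one film and dividing their character counts gives a rough ratio of how much more concise one version is than the other.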

Audiences often favor concise translations, but conciseness does not guarantee quality. Overly concise translations can raise the register and have characters say things that are “inappropriate” to their identity. A better approach is to weigh the multimodal signs in each scene and choose subtitle translation strategies accordingly.

Fan subtitling groups are flourishing in China, but their quality varies widely. We offer this new perspective in the hope of giving subtitle translators more ideas and helping raise translation standards in the field.

Project Implementation Experience

Over the course of the year-long project, the two film-loving team members screened shots from more than a hundred feature films, studied film scripts, built a research database, compared anywhere from a few to dozens of subtitle translations, and finally produced a summary report. Our ideas grew clearer along the way, and the project served as our first venture into research. Through it, we not only began learning systemic functional linguistics and visual grammar theory but also gained deeper insight into the expressive techniques of film.


References

Arijon, Daniel (Uruguay). The Grammar of Film Language [M]. Beijing United Publishing Company, 2013: 350.
Hu Zhuanlin, Zhu Yongsheng, Zhang Delu. An Introduction to Systemic Functional Grammar [M]. Changsha: Hunan Education Press, 1989.
He Yi, Yang Zequn. Multimodal Discourse Analysis of the Poster of “Farewell My Concubine” [J]. English Research, September 2012, 10(3).
Li Zhanzai. Social Semiotic Analysis of Multimodal Discourse [J]. Foreign Language Research, 2003.
Zhu Ling. Multimodality: A New Perspective for Translation Studies [N]. China Social Sciences Daily, 2017-12-26(003).
Zhu Yongsheng. The Theoretical Foundation and Research Methods of Multimodal Discourse Analysis [J]. Foreign Language Journal, 2007(5).
Halliday, M. A. K. An Introduction to Functional Grammar (2nd edition) [M]. London: Edward Arnold, 1994.
Kress, Gunther R., Theo Van Leeuwen. Reading Images: The Grammar of Visual Design [M]. Psychology Press, 1996.

We hope everyone can bring their own interests into their studies!

Source: Sun Yat-sen University, International Translation Institute

Editor: Yang Jie

Initial Review: Hua Yumi

Review: Huang Aicheng

Review Release: Wang Zheng
