3.2 Corpus
The corpus of sentences analyzed in this study consists of three legal texts, five subtitled talks and one simultaneously interpreted speech, comprising a total of 1,136 sentences. For each sentence, the analysis included the original English version and versions translated or interpreted into five languages from different families: Russian, Hungarian, Turkish, Mandarin and Japanese. With six language versions of each sentence, that makes for a total of 6,816 language versions of sentences analyzed. Twenty-six values were measured and recorded for each sentence. Information on each text, talk or speech, including the reasons for choosing each one, is given below.
3.2.1 Legal translation
As a genre of standard written translation, this study has chosen to focus on legal translation, as opposed to other genres such as literature or magazine articles. One reason for this choice is that legal texts often have long, complex sentences, which is where the translation difficulties highlighted in this study are most likely to appear. In a study of UN translations of English legal texts into Arabic, Abu-Ssaydeh and Jarad (2015: 100-101) note: “An essential feature of English legislative writing is the high frequency of complex sentences; through the use of coordination and subordination, legislative English is capable of producing long, complex patterns which represent bafflingly intricate patterns that many translators find extremely challenging.” In a study of the effect of the plain English movement on translation of English legal texts into Mandarin, Lin et al. (2023: 1) observe: “Complex nominal and hypotactic structures result in a high number of propositions per sentence, placing a high demand on the cognitive processing abilities of those who read and understand the text.”
Another reason this study has chosen to focus on legal translation is that, in addition to difficulty in the translation process, output-related issues, such as distortions of meaning or coherence and comprehension difficulty for the reader, can have major consequences in legal translation.
Within the category of legal translation, three major international documents are analyzed: the Universal Declaration of Human Rights (UDHR), the Paris Agreement on climate change and the US Foreign Corrupt Practices Act (FCPA). Because of their global importance, each of those documents is published online in many translated versions.
The UDHR is described on the UDHR website as “a milestone document in the history of human rights… [which] set out, for the first time, fundamental human rights to be universally protected.” The UDHR is the most translated document in the world, according to the website of the UDHR Translation Project, where links to each translation can be found. The English, Russian and Mandarin versions used in this study are official translations produced according to standard UN procedures. For the other language versions, the site says that “efforts have been made to select the official or best available translations” wherever possible. The original English version of the UDHR has 68 sentences.
The Paris Agreement is described on the Agreement website as “a landmark in the multilateral climate change process because, for the first time, a binding agreement brings all nations into a common cause to undertake ambitious efforts to combat climate change and adapt to its effects.” The English, Russian and Mandarin versions of the Agreement are official translations produced according to standard UN procedures and published on the website of the UN Framework Convention on Climate Change. The Hungarian version is published on the legislation website of the European Union, the Turkish version on the website of the Turkish Official Gazette, and the Japanese version on the website of the Japanese Ministry of the Environment. The original English version of the Agreement has 225 sentences.
The FCPA is described on the UK website of the professional services giant PricewaterhouseCoopers as “arguably, the most wide-reaching law” in the world, leading to settlements totaling several billion dollars a year and serving as a model for similar laws in many countries. The English text of the FCPA is published on the website of the US Department of Justice, along with many unofficial translations, including the ones used in this study. The original English version of the FCPA has 205 sentences.
That makes for a total of 498 sentences and 2,988 different language versions of sentences in the three legal texts analyzed.
3.2.2 Subtitle translation
For subtitle translation, this study has chosen to analyze five different TED talks, as opposed to other types of subtitle translation, such as translation of subtitles for films or entertainment series. The reason for this choice is that other types of subtitle translation tend to involve a lot of dialogue consisting of simple sentences, where the translation difficulties highlighted here are unlikely to appear. In contrast, lectures by single speakers who are experts in their fields tend to have more complex sentences. Among online lecture platforms, TED is probably the most widely watched, with hundreds of millions of views.
Subtitle translators are sometimes advised to keep translated segments short and easy to read. Apart from such general advice, I have not found any studies that mention the specific challenge of translating long, complex sentences in subtitles between languages with different structure. The instructions for subtitle translators on the TED website make no reference to that challenge.
Because of their large global reach, this study analyzes the five most popular TED talks at the time of writing, as listed on the website for the most popular TED talks of all time. The site includes a video file for each talk, with original and translated subtitles. The five talks used in this study are: “Do schools kill creativity?” by Sir Ken Robinson, “Your body language may shape who you are” by Amy Cuddy, “How great leaders inspire action” by Simon Sinek, “The power of vulnerability” by Brené Brown and “Inside the mind of a master procrastinator” by Tim Urban.
Subtitles for each translated TED talk are produced and reviewed by experienced volunteers, following a standard set of guidelines. Translators are instructed to keep subtitle segments to a maximum of two lines and reading speed to a maximum of 21 characters per second. All 25 subtitle translations used in this study were produced by native speakers of the target language.
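The two numerical limits mentioned above can be illustrated with a short check. This is only a minimal sketch, not a tool used in this study or provided by TED: the function names are hypothetical, and it assumes that a segment is a string with lines separated by newline characters, that every character other than the line break counts toward reading speed, and that the display duration of the segment is known in seconds.

```python
# Limits taken from the guidelines described above.
MAX_LINES = 2
MAX_CHARS_PER_SECOND = 21


def reading_speed(text, duration_seconds):
    """Return characters per second for one subtitle segment.

    Assumption: line breaks are not counted as characters; everything
    else (including spaces and punctuation) is counted.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    char_count = len(text.replace("\n", ""))
    return char_count / duration_seconds


def within_guidelines(text, duration_seconds):
    """Check a segment against the two-line and reading-speed limits."""
    line_count = len(text.split("\n"))
    speed = reading_speed(text, duration_seconds)
    return line_count <= MAX_LINES and speed <= MAX_CHARS_PER_SECOND
```

For example, a two-line segment of 42 characters shown for two seconds sits exactly at the limit, while the same segment shown for one second exceeds it. How characters should be counted for scripts such as Mandarin or Japanese is left open here.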
The original English versions of the talks by Sir Ken Robinson, Amy Cuddy, Simon Sinek, Brené Brown and Tim Urban have 69, 102, 91, 80 and 71 sentences respectively, for a total of 413 sentences. Counting the original English and five translated versions of each sentence, the analysis involved a total of 2,478 different language versions of sentences from the five talks.
3.2.3 Simultaneous interpretation
As a genre of interpretation, this study has chosen to focus on simultaneous interpretation, as opposed to other forms of spoken interpretation such as consecutive, liaison, community or telephone interpretation. The main reason for this choice is that the working memory constraints which can have a major effect on the linear order and hierarchical structure of complex sentences, particularly in language pairs where subordinate clauses branch in opposite directions, are most prevalent in simultaneous interpretation. The specific challenges posed by complex sentences in simultaneous interpretation between languages with very different structure are described in section 2.1.4. Some of the main tactics used to cope with those challenges are illustrated in section 5.2.
Within the category of simultaneous interpretation, this study has chosen to analyze recordings of interpretation of former US President Barack Obama’s speech to the UN General Assembly on 28 September 2015. One reason for choosing to analyze a speech to the UN is that organization’s unique international scope. Another reason is that three of the languages considered here – English, Russian and Mandarin – are official UN languages, so sessions of the General Assembly are interpreted simultaneously into those languages by expert UN staff interpreters.
Recordings of the original English speech and of the Russian and Mandarin interpretation were obtained with permission from the UN Audiovisual Library. Interpretation into Hungarian, Turkish and Japanese was kindly provided and recorded by expert freelance interpreters for this study. All five interpreters were working with a written copy of the original speech provided shortly beforehand, but without a prepared written translation. All five recordings used in this study were of interpretation by native speakers of the target language.
The original English version of President Obama’s speech has 225 sentences. Counting the original English and five interpreted versions of each sentence, the analysis involved 1,350 different language versions of sentences from the speech.
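The sentence counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary keys and function names are hypothetical labels chosen for this example, and the only data used are the counts stated above, with six language versions (the English original plus five target languages) assumed for every sentence.

```python
# Sentence counts of the original English versions, as reported in this section.
LEGAL = {"UDHR": 68, "Paris Agreement": 225, "FCPA": 205}
SUBTITLES = {"Robinson": 69, "Cuddy": 102, "Sinek": 91, "Brown": 80, "Urban": 71}
INTERPRETATION = {"Obama UN speech": 225}

# English original plus the five target languages.
LANGUAGES = ["English", "Russian", "Hungarian", "Turkish", "Mandarin", "Japanese"]


def tally(counts):
    """Return (sentences, language versions) for one genre."""
    sentences = sum(counts.values())
    return sentences, sentences * len(LANGUAGES)


def corpus_summary():
    """Return per-genre and overall (sentences, language versions) totals."""
    genres = {
        "legal": LEGAL,
        "subtitles": SUBTITLES,
        "interpretation": INTERPRETATION,
    }
    summary = {name: tally(counts) for name, counts in genres.items()}
    total_sentences = sum(s for s, _ in summary.values())
    total_versions = sum(v for _, v in summary.values())
    summary["total"] = (total_sentences, total_versions)
    return summary


if __name__ == "__main__":
    for genre, (sentences, versions) in corpus_summary().items():
        print(f"{genre}: {sentences} sentences, {versions} language versions")
```

The per-genre results (498 and 2,988; 413 and 2,478; 225 and 1,350) add up to the overall totals of 1,136 sentences and 6,816 language versions given at the start of section 3.2.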
The original and all five translated or interpreted versions of each sentence in the corpus can be seen in annex II. Each language version of each sentence is parsed and annotated using the semantic parsing method illustrated in section 3.1 above and detailed in annex I, so that values for variables can be counted and recorded. Before we see how that process of counting and recording works, let’s take a closer look at the variables involved.