Difficulty in translation and interpretation

Much research into translation difficulty has focused on assessing the difficulty of source texts for translation. Various factors affecting that difficulty have been considered – including word and sentence length, vocabulary frequency, figurative language and certain grammatical structures. Various ways have also been proposed for analyzing that difficulty – by comparing features of an original text to its translation; by assessing subjective perceptions before, during or after translation; or by measuring physiological indicators of cognitive load during the act of translation.

Sun (2015) provides a theoretical and methodological guide for assessing various types of translation difficulty. He presents ways of assessing the inherent difficulty of the source text, as reflected in readability scores. He also describes other factors of difficulty, as reflected in retrospective ratings of difficulty by participants, evaluations of translation quality, records of physical activity (like keystroke logging or eye tracking), as well as running commentary by participants while translating. The author also discusses various approaches to the notion of equivalence in translation.

Steiner (2021) gives an overview of cognitive approaches to equivalence in translation. He sees translation as a relation between source and target units which can approximate equivalence in various respects. Those include “field, tenor and mode of discourse, and ideational, interpersonal and textual meaning in terms of clause semantics and grammar.”

Nord’s (2005) guidebook to text analysis for translation identifies four general sources of translation difficulty: features of the source text, translator competence, pragmatic requirements of the translation product and technical working conditions. She suggests guidelines for grading source texts as well as translation tasks by difficulty.

Jensen (2009) examines three types of complexity features which he sees as influencing the inherent difficulty of a source text for translation: (a) established indices of readability based on word and sentence length, (b) uncommon vocabulary items and (c) figurative language. For non-technical translation, the author considers text complexity as measured by such features to be a strong predictor of translation difficulty. But he distinguishes between the notions of complexity and difficulty. For Jensen, “the notion of difficulty is subjective and elusive, […] vary[ing] from one translator to another. By contrast, the notion of text complexity is a more objective approximation to relative text difficulty which can be based on one or more factual criteria” (p. 62).
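By way of illustration, the following minimal Python sketch computes one such length-based index, the LIX score (average sentence length plus the percentage of words longer than six letters). The code and the sample sentences are purely illustrative and are not drawn from Jensen’s study; they simply show how a complexity measure of the kind he describes can be derived from word and sentence length alone.

```python
import re

def lix(text: str) -> float:
    """LIX readability index: average sentence length plus the percentage
    of long words (more than six letters). Higher scores suggest a more
    complex text; the measure rests on word and sentence length alone."""
    words = re.findall(r"[^\W\d_]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)

# Two invented sentences of contrasting complexity.
print(round(lix("The cat sat on the mat. It slept."), 1))
print(round(lix("Institutional considerations notwithstanding, "
                "the committee deliberated extensively."), 1))
```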

Campbell (1999) associates translation difficulty with three types of factor: those relating to the source text, those relating to the translation task and those relating to translator competence. His empirical study focuses on source text factors and involves models of working memory as well as language comprehension and production. The study was carried out on Spanish-speaking and Arabic-speaking students of translation. Campbell considers various features of the source text – official terms, metaphors and complex noun phrases with or without modifiers – as potential factors of translation difficulty. He assesses translation difficulty by counting how many different ways each item is translated by the participants. He finds that each of the source text features considered can be a significant factor of translation difficulty as assessed in this way, posing similar difficulties for translation into both Spanish and Arabic. He concludes that such features can be taken as predictors of the inherent difficulty of a source text for translation into typologically different languages.
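A schematic example of this kind of measure: the snippet below counts, for each source item, how many distinct renditions a group of participants produced, on the assumption that greater variation signals greater difficulty. The item labels and the Spanish renditions are invented for illustration and do not come from Campbell’s data.

```python
# Hypothetical renditions of two source items by five participants;
# item labels and translations are invented for illustration only.
renditions = {
    "official term": [
        "permiso de residencia", "permiso de residencia",
        "tarjeta de residencia", "residencia legal", "permiso de residencia",
    ],
    "complex noun phrase": [
        "el informe anual del comité", "el informe anual del comité",
        "el informe que el comité publica cada año",
        "el informe anual del comité", "el informe del comité",
    ],
}

def variation_count(translations):
    """Campbell-style indicator: the number of distinct renditions of one
    source item; more variation is read as greater difficulty."""
    return len({t.strip().lower() for t in translations})

for item, versions in renditions.items():
    print(f"{item}: {variation_count(versions)} distinct renditions")
```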

Hale and Campbell (2002) extend the findings from the same empirical study. They suggest that translation difficulty is associated with various factors involving source text clarity and translator competence, as well as “the translatability of the text into different languages at the different levels (lexical, syntactic, semantic and pragmatic).” The authors further develop their identification of features that make a source text inherently difficult to translate. They note that “This work needs to be done with multiple language combinations to test the universality of these difficulties.” The new study also explores the association between translation difficulty as assessed by the range of translation solutions and translator competence as reflected in accuracy. No clear correlation between translation variation and accuracy was found.

Sun and Shreve (2014) provide an empirical analysis of Mandarin-English translations by Chinese university students of translation, assessing translation difficulty on the basis of retrospective ratings by participants. The authors found a weak correlation between text readability scores and subjective ratings of difficulty. They found no clear correlation between translation quality (as measured by number of errors) and subjective ratings of difficulty. They also found no clear correlation between time spent translating and translation quality, or between time spent and subjective ratings of difficulty.

Several authors have proposed models for how interpreters manage the difficulty associated with simultaneous interpretation. Both Kirchhoff’s (2002/1976) “multi-phase model” and Gile’s (2009) “effort models” assume that different parts of the interpretation task involve varying degrees of cognitive load, which need to be managed within the limits of overall processing capacity. Kirchhoff analyzes linguistic errors as well as distortion and omission of content as indications of cognitive overload, concluding that “multiple-task performance becomes a problem if task completion requires cognitive decisions which, in sum, reach or even exceed the individual’s processing capacity limit” (p. 118). She discusses “changing the order of phrases” as a tactic for reducing cognitive load in interpretation.

Departures from original sentence content, such as errors or omissions, have been proposed as indicators of difficulty in simultaneous interpretation. An empirical study by Barik (1975) analyzes various types of error, omission and substitution in interpretation, which can be seen as indicators of difficulty. The study finds that interpreters have greater difficulty coping with function words, abstract words and structural differences between the source and target language.

Gile (2009) supplements his “effort models” with the “tightrope hypothesis,” according to which “most of the time, interpreters work close to saturation, be it in terms of total processing capacity requirements or as regards individual Efforts” (p. 182). He supports that hypothesis by reporting on an empirical study where professional interpreters were asked to interpret the same speech twice. Some errors and omissions which weren’t made the first time were made the second time. Since the conditions were the same both times, the author explains the new errors and omissions as indications of “processing-capacity limitations which left little room for sub-optimal allocation of attentional resources” (p. 183).

Cai et al. (2018) also use omissions as a measure of difficulty in simultaneous interpretation. In an empirical study of English-to-Japanese interpretation using a large bilingual speech corpus, the authors found that the rate of omissions in interpreter output corresponded to the delivery rate of the original speech and to the time lag between input and output. They found adverbs to be omitted more often than other word types. Words were also omitted more often when they were in grammatically subordinate structures in the original speech.
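As a rough illustration of the measures involved, the sketch below computes an omission rate and a delivery rate for a few invented speech segments alongside an ear-voice span (lag) figure. The numbers are not from Cai et al.’s corpus; they serve only to show how such per-segment indicators can be set side by side.

```python
# Invented per-segment figures: source word count, segment duration,
# number of source words with no counterpart in the interpretation,
# and ear-voice span (lag). Not drawn from Cai et al.'s corpus.
segments = [
    {"src_words": 28, "seconds": 10.0, "omitted": 2,  "lag_s": 2.5},
    {"src_words": 45, "seconds": 10.0, "omitted": 9,  "lag_s": 4.0},
    {"src_words": 60, "seconds": 10.0, "omitted": 16, "lag_s": 5.5},
]

for seg in segments:
    delivery_rate = seg["src_words"] / seg["seconds"]   # words per second
    omission_rate = seg["omitted"] / seg["src_words"]   # share of source words omitted
    print(f"delivery {delivery_rate:.1f} w/s, lag {seg['lag_s']:.1f} s, "
          f"omissions {omission_rate:.0%}")
```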

Pym (2009) points to a different type of effort in simultaneous interpretation: the management of risk created by the context of communication. He analyzes interpreter omissions from an empirical study by Gile as an indication of risk transfer from one type of effort to another. The author also questions what constitutes an “omission” and whether all omissions should necessarily be seen as compromising the quality of interpretation.

The potential benefits of some types of omission are illustrated in my guide to English translation for non-native speakers (Earls 2014), which I use in training sessions for simultaneous interpreters as well. That guide advises translators and interpreters to radically simplify phrasing typical of many European languages when working into English. Hundreds of examples are given, classified by type. Should such simplifications be regarded as “omissions” and put in the same category as errors? They clearly involve some cognitive effort. But, as Pym (2009) suggests, making that effort can be worthwhile, to enhance communication as well as stylistic acceptability. In simultaneous interpretation, doing so can also be worthwhile in freeing up mental energy for other efforts, as Gile’s effort models suggest.

Meuleman and Van Besien (2009) report on a study of tactics used by professional French-Dutch interpreters to cope with the difficulty of syntactically complex sentences and high-speed delivery. Most participants were found to produce acceptable renditions despite both types of difficulty. Most of them coped with high-speed delivery by trying to reproduce the original sentence form. On the other hand, most coped with syntactically complex sentences by segmenting them into shorter, simple ones.

Christoffels and de Groot (2005) review several empirical studies on difficulty in simultaneous interpretation. They report on a study by Darò et al. which correlates syntactically complex structures and low-frequency words in an original speech with the rate of errors in interpretation. They report on a brain imaging study by Rinne et al. which suggests that the cognitive processes of lexical retrieval, working memory and semantic processing play a larger role in interpretation than in shadowing. The authors also report on two studies by Klonowicz which use physiological measures during interpretation, concluding that “mental load during [interpretation] is high and that coping with the difficulties of [interpretation] induces stress in interpreters.”

Translation and interpretation research to date has proceeded largely in two directions, sometimes referred to as “product-oriented” and “process-oriented.” The product-oriented approach analyzes data observed by comparing features in an original text or speech to features in a written translation or recorded interpretation of that text or speech. Such a translation or interpretation can either be produced as part of an experiment or be taken from a corpus. If it’s taken from a corpus, that corpus can be ready-made or, as in the present study, compiled from various sources by the researcher. In contrast, the process-oriented approach analyzes data recorded in experiments on participants during the act of translation or interpretation. Each of these two research approaches has its strengths as well as its limitations.

A process-oriented approach to translation or interpretation research can in some ways be more informative than a product-oriented approach. In examining difficulty, process-oriented research can provide a more direct reflection of the cognitive effort made by an individual translator or interpreter. But that can also be a limitation if we’re interested in exploring the inherent difficulty of a translation or interpretation task, independently of factors relating to a particular individual. Averaging process-based measurements from several different individuals and trying to minimize the effects of factors such as individual experience can help give an indication of the objective difficulty of the task itself.

A product-oriented approach, on the other hand, provides a more indirect assessment of individual difficulty. In that sense, it can be less informative than a process-oriented approach, because it tells us less about the actual effort someone is making. But for some purposes, a product-oriented approach can be more appropriate than a process-oriented one, measuring the product of manipulations that reflect objective difficulty rather than individual effort.

In a given situation, a translator or interpreter might simply not try very hard and therefore produce a poor result. In terms of the indicators of difficulty considered in this study, that could mean failing to reorder propositions or to change nested structures appropriately to produce a good result in the target language. They might fail to make such changes because they’re inexperienced or unaware of the sort of effort required, especially if they’re working between structurally very different languages. It’s also possible for an experienced translator or interpreter not to bother making such changes because they’ve become complacent or blasé and simply take the easy way out.

An illustration of such insufficient attention is reported by Liu and Zheng (2022). Their study found, from retrospective reports by participants, that perceived text difficulty didn’t correlate to measured indicators of cognitive effort. The authors’ explanation for that mismatch is that participants “tended to simplify or just skip the more insoluble translation problems. Therefore, their cognitive effort, indicated by eye-tracking data, did not show significant differences” among the three tasks they were asked to do, even though they rated the tasks quite differently in level of difficulty.

In such cases, a process-oriented study would record that lack of individual effort. But that could lead to the mistaken conclusion that the task in question was inherently easy. To check that, the quality of the translation or interpretation product would need to be assessed. In terms of the indicators of difficulty considered in the present study, comparing the product with the original text or speech could be more effective in reflecting the result of such carelessness, by flagging failure to reproduce semantic relations appropriately in the target language.

Moreover, a process-oriented study may record very different amounts of effort by different individuals depending on their level of experience. Two features of translation and interpretation products recorded in the present study – reordering and changes in nested structures – involve manipulations that have been shown in general to require more effort to do than not to do, regardless of whether they’re easier or harder for a particular individual given their experience. In a study like this, analyzing features of the translation or interpretation product may be more effective in reflecting differences in the objective difficulty of a given task than would be the case with a process-oriented approach.

Because of the various strengths and limitations of product-oriented vs process-oriented approaches to research, many recent empirical studies of cognitive effort in translation or interpretation have correlated data gathered using both types of approach.

One such study, reported by Dragsted (2012), compared translation difficulty – as assessed by the range of translation solutions produced by participants – to data gathered from eye tracking and keystroke logging during the translation process, for various source text items. Very strong correlations were found between the two sets of data.
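The sketch below shows the form such a product-process comparison can take: invented per-item figures for a product-side measure (number of distinct solutions) and a process-side measure (total gaze time) are compared with a plain Pearson coefficient. The figures are illustrative only and are not Dragsted’s data.

```python
# Invented per-item figures: number of distinct translation solutions
# (product data) and total gaze time in milliseconds (process data).
variation = [2, 5, 3, 7, 4, 6]
gaze_ms = [420, 910, 530, 1280, 700, 1010]

def pearson(x, y):
    """Plain Pearson correlation, enough to compare product- and
    process-based indicators item by item."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(f"r = {pearson(variation, gaze_ms):.2f}")  # high for these invented figures
```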

Halverson (2021) documents a growing trend for empirical studies on cognition in translation which combine data from methods focusing on the process as well as the products of translation, stressing the need for both types of data. Those methods include “corpus technologies, language elicitation tasks, keystroke logging, eye tracking, interviews, and a range of different psycholinguistic tests.” The author describes current models as well as newer theoretical constructs underpinning such multimethod research.

In an empirical study of Dutch-English translation by professional translators and translation students, Vanroy et al. (2019) correlate data gathered by comparing source texts to translation products (errors, range of solutions and changes in word order) with data gathered during the translation process (pauses, revision and eye movement). They regard the process-based data as a “proxy for the cognitive effort required to solve difficulties in translation” (p. 924). General correlations were found between all three types of product data and all three types of process data, especially in translation by professionals. The authors conclude that “these product features can be used as predictors for translation difficulties” (p. 938).

That study is part of a project to create a system for automatically predicting the difficulty of translating a given source language text to a given target language. The authors refer to another system developed by Mishra et al. for predicting translation difficulty based on three source text features: sentence length, polysemy and complexity. Vanroy et al. (p. 937) consider that “features derived from the relation between the source and target language (i.e. language-pair specific features)” should also be taken into account.

One fast-growing branch of empirical product-based research on cognition in translation and interpretation involves the use of large ready-made corpora. Rodríguez-Inés (2017) gives a thorough and informative overview of the current state of corpus-based translation research. She points out several benefits of corpus research as opposed to experiments focusing on the translation process: A corpus is a compilation of naturally occurring language and therefore potentially more objective than data produced in an experiment. A corpus can also provide a more representative volume of data than is generally feasible in an experiment. On the other hand, the author notes an inherent limitation of product-oriented research: “The analysis of completed translated texts … only allows for the inference of aspects of the process” (p. 265). These pros and cons have, as the author describes, “resulted in combinations of corpus and experimental data and in corpus‐based studies that take both process and product into account” (p. 266).

Rodríguez-Inés also gives a comprehensive list of available translation corpora. The corpora described are impressive and valuable to translation research. But the languages they include are nearly all Indo-European. Some are Semitic or Finno-Ugric languages, which are similar to Indo-European languages in complex sentence structure. A few corpora involve English and Mandarin translations. None of the many corpora listed seem to include parallel translations into multiple languages with a representative range of complex sentence structure – the typological distinction at the heart of the difficulty explored in the present study.

Steiner (2021) similarly describes the limitations of both product and process data: “Product data show the outcome of cognitive processes at best, and process data … all show (hopefully) correlates of cognitive processes, but not these processes directly.” Another limitation of process data is that, “to conform to standards of experimental research, the realistic process of translation must be reduced to … very artificial data. Corpus data are at an advantage here, because the data can be natural realistic data” (p. 352). The author advocates using both types: “hypotheses about language production are initially tested on product data (corpora), which usually yields correlations between situational variables and patterns in the product. Any further progress towards causal explanations involves experiments and predictions, and it is this combination of product and process data that brings us closer to causal in addition to correlational explanations.”

Neumann and Serbina (2021) also discuss the contribution of corpus research to findings on the cognition of translation. They point to the observational nature of corpus research, which can be both a pro and a con compared to experimental research: An experiment controlling for unwanted effects can lead to more robust findings than studying a corpus. On the other hand, experiments are sometimes “criticized for the extent to which potentially confounding factors are excluded…. A major advantage of corpus-based investigations is the authenticity of the data” (p. 191). At the same time, the authors caution against drawing conclusions about cognitive processes in translation from corpus data alone, which reflect only the final product of that process. They describe several recent corpus-based studies of cognition in translation integrating both product- and process-oriented analysis.

Hansen-Schirra and Nitzke (2021) argue for a “process-product interface,” to achieve a holistic understanding of cognitive processes in translation. One type of question explored in this way is what features are typical of translated texts and why. The authors outline the product perspective represented by corpus linguistics, as well as the process perspective embodied in translation process research, discussing the strengths and shortcomings of each. They then address the challenge of creating an interface between these two branches of translation research. They give examples of how this is being done in empirical studies, by applying different research methods to different sets or the same set of data.

2.1.5 Subjective ratings and descriptions

One source of data which can shed light on translation or interpretation difficulty and which doesn’t come from observing either the product or the process is subjective ratings or descriptions by participants in experiments. Such ratings or descriptions can be given at the beginning of an experiment, with participants being asked, for example, to assess the difficulty of a source text or text item for translation. They can also be given retrospectively at the end of an experiment, with participants reflecting on their perceived experience.

Campbell’s (1999) empirical study, described in section 2.1.1 above, correlates the difficulty of source text items as assessed by the range of translation solutions for those items with participants’ subjective ratings of their difficulty. Liu and Zheng’s (2022) study, also described in the same section, correlates the number of special terms in the source text with participants’ subjective ratings of source text difficulty.

Of course, such individual ratings of difficulty can be very useful. But they can only be used to compare different assessments made by the same person – for example, on the relative difficulty of various texts or text items. Individual subjective ratings don’t produce comparable results between different people. If one person rates a task as a 5 on a scale of 1 to 10 and another person rates the same task as a 7, that doesn’t necessarily mean that the task is easier for the first person than it is for the second one, as there’s no objective basis for comparison.

This is particularly true for subjective ratings by translators or interpreters who work in language pairs with different degrees of structural difference, such as the language pairs considered in the present study. Here’s why: Let’s say a translator or interpreter works only in one language pair. Their only basis for comparison in assessing the difficulty of a translation or interpretation task will be their experience with similar tasks in the same language pair. This study suggests that structural difference in a language pair may be a major source of translation or interpretation difficulty. If that’s true, then one person’s assessment based on their experience in a language pair with a certain degree of structural difference won’t be comparable to a different person’s assessment based on their experience in a language pair with a different degree of structural difference.

What if a translator or interpreter works in more than one language pair? Even then, those pairs are likely to be similar in degree of large-scale structural difference. A French translator or interpreter might work between French and English and between French and German – both pairs of structurally similar languages according to the typology used in this study. A Turkish translator or interpreter might work between Turkish and English and between Turkish and German – both pairs classified as structurally opposite in this study. Each colleague’s assessment will be based on their experience working in language pairs with a certain degree of structural difference. So their assessments will still be incomparable.