4.2.1 Procedure
4.2.1.1 Variables
The corpus of sentences analyzed in this study consists of three legal texts, five subtitled talks and one simultaneously interpreted speech. That makes for a total of 1,136 original English versions of sentences, translated or interpreted into 5,680 other language versions, with 29,536 variable values recorded.
The statistical analysis of that data was carried out by an expert at the Statistical Methodology and Computing Service at the University of Louvain. The analysis first produced descriptive statistics on translated or interpreted versions of English sentences as observed in the corpus data. Based on those results, the analysis also produced predictive statistics on the response of each dependent variable to each pair of independent variables, and to all three independent variables acting together. Those predictions can be applied to similar texts, talks and speeches not in the corpus.
The descriptive analysis included three independent variables: mode, target language and sentence complexity.
The first independent variable, mode, refers to the mode of language transfer. This variable could take one of three values: legal translation, subtitle translation or simultaneous interpretation.
The second independent variable, target language, refers to the language into which a given English sentence is translated or interpreted. This variable could take one of five values: Russian, Hungarian, Turkish, Mandarin or Japanese. Based on those values, the descriptive analysis included two structural sub-variables involving the branching direction of subordinate clauses as established in studies of language typology. One independent sub‑variable was difference in the branching direction of relative clauses. This sub‑variable could take one of three values: same (for Russian, where relative clauses typically branch to the right, as in English), moderately different (for Hungarian, where relative clauses typically branch either way) or opposite (for Turkish, Mandarin and Japanese, where relative clauses typically branch to the left). The other independent sub‑variable was difference in the branching direction of complement clauses. This sub‑variable could take one of two values: same (for Russian, Hungarian and Mandarin, where complement clauses typically branch to the right, as in English) or opposite (for Turkish and Japanese, where complement clauses typically branch to the left).
Based on those two structural sub-variables in the descriptive analysis, the predictive analysis included a single combined independent variable for structural difference of the language pair. That combined variable refers to the difference between English and each other language in the typical branching direction of subordinate clauses in general. As we saw in section 3.3.1.2 on branching direction typology, for the five target languages in this study, the typical branching direction of relative clauses can be used to uniquely predict the typical branching direction of adverbial clauses, which constitute the third broad category of subordinate clause recognized cross-linguistically. So, to avoid redundant data, our combined structural variable need only reflect the typical branching direction of relative and complement clauses in the target language compared to English. This variable could take one of four values: same (for Russian, where both relative and complement clauses typically branch to the right, as in English), somewhat different (for Hungarian, where relative clauses typically branch either way and complement clauses typically branch to the right), moderately different (for Mandarin, where relative clauses typically branch to the left and complement clauses typically branch to the right) or opposite (for Turkish and Japanese, where both relative and complement clauses typically branch to the left).
The third independent variable, sentence complexity, refers to the number of functionally subordinate or reported propositions in the original English version of a sentence. (A functionally subordinate proposition is one which doesn’t make an assertion or ask a question and can’t be rephrased as an independent sentence.) This variable could take any integer value. (The highest number observed in a single sentence in the corpus was 30.) To simplify calculation, the model used in the predictive analysis considered five sample values for this variable, covering a representative range of sentence complexity: simple (3 subordinate propositions), somewhat complex (6 subordinate propositions), moderately complex (9 subordinate propositions), very complex (12 subordinate propositions) and extremely complex (15 subordinate propositions).
The statistical analysis also included three dependent variables, recorded separately for each translated or interpreted version of a sentence. Those dependent variables were counts for the three features identified as indicators of difficulty in translation or interpretation – reordering, nesting changes and changes in semantic relations. Counts for nesting changes were subdivided into counts for changes in single nestings, double nestings and triple nestings.
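To make the layout of the data concrete, it can be pictured as a long-format table with one row per translated or interpreted version of an original English sentence, holding that version's values for the independent and dependent variables. The sketch below, in R (the computing environment used for the analysis, as described in the next subsection), builds a purely synthetic stand-in for such a table; the column names, the number of sentences and the counts are all invented for illustration and are not taken from the actual corpus.

# Purely illustrative, synthetic data in long format: one row per translated
# or interpreted version of an original English sentence.
set.seed(1)
sentences <- data.frame(
  sentence   = sprintf("s%03d", 1:90),                                   # hypothetical sentence IDs
  mode       = rep(c("legal", "subtitle", "interpretation"), each = 30), # mode of language transfer
  complexity = sample(1:15, 90, replace = TRUE)                          # subordinate propositions
)
versions <- expand.grid(
  sentence = sentences$sentence,
  language = c("Russian", "Hungarian", "Mandarin", "Turkish", "Japanese"),
  stringsAsFactors = FALSE
)
versions <- merge(versions, sentences, by = "sentence")
versions$difference <- c(Russian   = "same",                  # structural difference of the
                         Hungarian = "somewhat different",    # language pair, derived from
                         Mandarin  = "moderately different",  # the target language as
                         Turkish   = "opposite",              # described above
                         Japanese  = "opposite")[versions$language]
versions$reordering <- rpois(nrow(versions), lambda = 2)   # dummy counts standing in for the
versions$nesting    <- rpois(nrow(versions), lambda = 1)   # three indicators of difficulty
versions$semrels    <- rpois(nrow(versions), lambda = 1)

In this layout, each original English sentence contributes five rows, one per target language, mirroring the structure of the corpus described above.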
4.2.1.2 Formulas
The analysis first produced descriptive statistics reflecting the value of each dependent variable corresponding to the values of the independent variables as observed in the corpus data. Based on those descriptive statistics, regression analysis was then used to produce formulas predicting the mean response of each dependent variable to the independent variables. The predictive formulas were produced with the glmmTMB (generalized linear mixed models using Template Model Builder) statistical modeling package for the R computing environment. Generalized linear models are used to predict mean rates for dependent variables which take count values, like the dependent variables in this study. Mixed models are used to reflect the effects on the dependent variables of factors other than the independent variables being tested, as explained below.
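As a rough sketch of what such a model specification could look like in glmmTMB, the call below fits a generalized linear mixed model for one dependent variable (the reordering count) with the three independent variables in interaction and a per-sentence random intercept, using the synthetic data frame sketched in section 4.2.1.1. The choice of the Poisson family here, like the variable names, is an assumption made for illustration only, not necessarily the exact specification used in the study.

library(glmmTMB)

# Illustrative generalized linear mixed model for one dependent variable:
# main effects of the three independent variables, all their two-way and
# three-way interactions, and a random intercept for each original English
# sentence (cf. the "other effects" parameter described below).
model_reordering <- glmmTMB(
  reordering ~ mode * difference * complexity + (1 | sentence),
  family = poisson,    # an assumed count distribution, for the sketch only
  data   = versions
)
summary(model_reordering)   # estimated coefficients for each term and interaction

Analogous models would be fitted for the other two dependent variables, nesting changes and changes in semantic relations.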
The statistical analysis found significant interactions among all three independent variables. So each of the predictive formulas described here was based on those interactions. Section 4.2.3 details those interactions and explains why, when such interactions are present, testing independent variables on their own can produce misleading results.
The predictive formula for each dependent variable was based on the counts for that variable corresponding to the values of the three independent variables and their interactions as observed in the corpus. Each of those independent variables (mode, structural difference of the language pair and sentence complexity) has a set of possible values, one of which is taken as a reference value. The predictive formula for each dependent variable consists of a long sum of terms. For each possible value (except the reference value) of each independent variable on its own, the formula contains one term consisting of an estimated coefficient and a single binary element, which can be equal to 0 or 1. For each observed two-way or three-way interaction between possible values of the independent variables, the formula has one term consisting of an estimated coefficient and two or three such binary elements.
Let’s say we want to predict the mean response of a dependent variable to a set of test values for the three independent variables in the study. To do that, we take the predictive formula for the dependent variable in question. Into that formula we substitute 1 for the binary elements in each term corresponding to the three values being tested. Each independent variable can only have one test value at a time. So, when the binary element corresponding to a particular test value for an independent variable is 1, the binary elements corresponding to all the other possible values of that same independent variable are 0. That simplifies the formula in effect to a shorter sum of terms corresponding to the set of values being tested. In each term remaining in that simplified formula, all the binary elements are equal to 1. So each term in the formula is equal to the value of its estimated coefficient.
Such a simplified formula predicting the mean response of a dependent variable to a set of test values for the three independent variables can be presented as shown in (21).
(21) [reordering/nesting/sem.rels] = a(modeval) + a(diffval) + a(compval) + a(modeval·diffval)
+ a(diffval·compval) + a(modeval·compval) + a(modeval·diffval·compval) + n + o
In the simplified formula in (21):
[reordering/nesting/sem.rels] is the predicted mean value of one of the dependent variables – reordering, nesting changes or changes in semantic relations;
modeval, diffval and compval are test values for the independent variables mode, structural difference of the language pair and sentence complexity;
a(modeval), a(diffval) and a(compval) are estimated coefficients which predict the mean response of the dependent variable to the test values of the three independent variables on their own;
a(modeval·diffval), a(diffval·compval) and a(modeval·compval) are estimated coefficients which predict the mean response of the dependent variable to two-way interactions between the test values of the independent variables;
a(modeval·diffval·compval) is an estimated coefficient which predicts the mean response of the dependent variable to the three-way interaction between the test values of the independent variables;
n is a baseline constant, applied to all sentences, and
o is an “other effects” parameter, reflecting the shared effects of untested factors on all translated or interpreted versions of each original English sentence.
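In computational terms, once a model along the lines sketched in the previous section has been fitted, a prediction like the one expressed by (21) does not have to be assembled term by term; it can be obtained directly from the fitted model. In the illustrative sketch, with test values chosen purely as an example, that could look as follows:

# Predicted mean reordering count for one combination of test values.
test_case <- data.frame(
  mode       = "interpretation",   # example test values only
  difference = "opposite",
  complexity = 12,                 # "very complex" in the terms used above
  sentence   = NA_character_       # NA: population-level prediction (no sentence-specific adjustment)
)
predict(model_reordering, newdata = test_case, type = "response")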
In addition to estimating coefficients for terms corresponding to possible values of independent variables and their interactions, each such predictive formula for a dependent variable involved calculating a separate baseline constant, n. That baseline constant took into account the mean observed values of the variables in question in the corpus. The baseline constant can be thought of as being like the y-intercept on a four-dimensional graph, indicating the theoretical value of a dependent variable when the three independent variables are all equal to their reference values.
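In the glmmTMB sketch introduced earlier, the closest analogue of this baseline constant is the fixed-effect intercept of the fitted model, which can be read off the fitted object:

# The fixed-effect intercept plays the role of the baseline constant n in (21):
# the estimate that applies when all three independent variables are at their
# reference values.
fixef(model_reordering)$cond["(Intercept)"]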
Finally, each formula predicting the mean response of a given dependent variable to the three independent variables involved calculating an “other effects” parameter, o. The first step in doing that was to calculate a separate sentence-level parameter for each original English sentence. Each of those separate sentence-level parameters took into account the difference between the mean observed values of the variables in question throughout the corpus and their observed values in the five translated or interpreted versions of that sentence.
The sentence-level parameter for each original English sentence was calculated so as to reflect the shared effects of untested factors on the various translated or interpreted versions of that sentence. It may be that a given original English sentence contains one or more propositions which are longer or denser in information than those of the average sentence in the same mode and with the same degree of complexity in the corpus. That could lead to a higher count for one or more dependent variables (indicators of difficulty) in one or more translated or interpreted versions of the sentence in question. The sentence in question would then appear to be more difficult as a whole than the average sentence, as measured by those dependent variables, for reasons other than the independent variables being tested. The separate sentence-level parameters calculated in this way were then offset against each other, to calculate an overall “other effects” parameter, o.
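If, as in the glmmTMB sketch introduced earlier, those sentence-level parameters are implemented as per-sentence random intercepts, they can be inspected after fitting, one estimate per original English sentence:

# One estimated adjustment per original English sentence: a sentence that proved
# harder than average for reasons other than the tested variables receives a
# positive adjustment, an easier-than-average sentence a negative one.
ranef(model_reordering)$cond$sentence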
The main benefit of including an “other effects” parameter in each predictive formula is that that parameter isolates the effects of untested factors on related groups of observed results. The related groups in question here were the various translated or interpreted versions of each original English sentence in the corpus. With statistical modeling software such as that used in this study, isolating the effects of untested factors in this way greatly increases the accuracy of the coefficients estimated for each term in the predictive formulas. As a result, those coefficients were estimated with a very high degree of statistical confidence (p < 0.001).
The statistical models used in this study produced formulas, as described above, to predict the mean response of each dependent variable to the three independent variables in interaction. As with all statistical models, the actual predictions produced by those formulas rarely match individual observed values exactly; they are nevertheless the best estimates that can be made from the data. And, thanks to the inclusion of the “other effects” parameter in each predictive formula, we can have great statistical confidence in the coefficients estimated for each of their terms.
4.2.1.3 Consistency
Another factor which could potentially affect the results of our statistical analysis is differing segmenting decisions in borderline cases – that is, different decisions applied to the original English versions of different sentences as to how to segment a certain structure which could be treated in more than one way. Examples of such borderline segmenting decisions are given in the parsing guidelines, in section 4.5 of annex I.
Applying one or another segmenting decision to a given borderline structure in the original English version of a sentence may result in a higher or lower number of functionally subordinate propositions (a higher or lower complexity count) recorded for that sentence. If such a decision results in a higher complexity count, the additional subordinate proposition as it appears in another language version of the sentence may be in a different linear position with respect to its parent than in the original English version. Or the additional subordinate proposition in another language version of the sentence may split or bring together the predicate and arguments of its parent differently than in the original English version. That would yield a higher complexity count for the sentence, along with a higher count for reordering or for nesting changes in the language pair in question than would be the case if a decision were made to segment the borderline structure in a way that resulted in a lower complexity count. Either way, the association between complexity and difficulty in the language pair in question would be reinforced.
Finally, some general segmenting rules given in the semantic parsing method in annex I could have been established differently. For example, section 1.7 of annex I explains that a process nominal (a nominal describing a process, event or situation) is to be segmented as the predicate of a separate proposition if it has any arguments or adjuncts. The parsing method as described in annex I recognizes that phrases like “climate change” and “sustainable development” are set terms and, as such, may be less internally processed than other constructions and may have established equivalents in other languages. But such phrases still have argument structure, and there’s no objective way to determine to what extent they may or may not be internally processed. So this study segments all such constructions consistently as separate propositions.
But what if that general segmenting rule had been established differently? What if a decision had been made not to segment set phrases like “climate change” and “sustainable development” as separate propositions throughout the corpus? Aside from the greater uncertainty that would have been created by the impossible-to-pin-down criteria for what constitutes a “set phrase,” what effect would such a decision have had on the statistical results? Not segmenting such phrases as separate propositions would have resulted in a lower number of functionally subordinate propositions (a lower complexity count) being recorded for some sentences. In such a sentence, the eliminated subordinate proposition may have been one that our actual analysis, applying the segmenting rules as established, shows as being in a different linear position with respect to its parent in another language version of the sentence than in the original English version. Or the eliminated subordinate proposition may have been one that, applying the rules as established, splits or brings together the predicate and arguments of its parent differently in another language version of the sentence than in the original English version. In such a case, establishing a rule that didn’t treat a set phrase like “climate change” or “sustainable development” as a separate proposition would have yielded a lower complexity count for the sentence, along with a lower count for reordering or for nesting changes in the language pair in question than was actually recorded in the statistical analysis. That would again have reinforced the association between complexity and difficulty in the language pair in question.
As explained in section 4.5 of annex I, care has been taken in this study to segment equivalent propositions in all language versions of a given sentence the same way. If the original English version of a sentence is segmented as one proposition – or two or three – and if the equivalents of those propositions in other language versions are judged to have the same information content and functional status as in English, those versions are divided into the same number of segments as the English version. This makes the various language versions of each sentence easier to compare, minimizing the impact of minor phrasing differences between languages on the values of variables in the data. This represents a deliberately conservative choice to refrain from recording some information, in order to avoid any suggestion that the counts for indicators of difficulty in structurally different language pairs – which are already comparatively high – have been inflated by the inclusion of irrelevant data.
Guided by these principles of consistency in parsing, the formulas described above were calculated to predict, as accurately as possible, the mean response of each dependent variable to each pair of independent variables, and to all three independent variables acting together. If our corpus is considered representative, those predictions can be applied to other similar texts, talks and speeches. The results of the descriptive analysis are presented in the next section. The results of the predictive analysis are presented in section 4.2.3.