Think Human Translators Will Be Replaced By Machines? Not So Fast!

Following on from my previous piece about corporate narratives that discourage cultural exploration and language learning, there is a corollary that I hear more often, and sadly, some people whom I respect very deeply still believe it:

Namely, the idea that translation, along with many other jobs, will be replaced entirely by machines. (Again, there is a lot of misinformation here, which I’m going to get into momentarily.)

My father went so far as to say that my translation job wouldn’t be around in a few years’ time.


I don’t blame him; he’s simply been misinformed by op-eds and journalists who seek to further an agenda of continued income inequality rather than actually looking at how faulty machine translation is. After all, if fewer people believe that learning languages is lucrative, fewer people will learn languages, right? And money is the sole value of any human being, right?

I am grateful for machine translation, but I see it as a glorified dictionary.

But right now, even the most advanced machine translation in the world faces hurdles that it not only hasn’t gotten over, but that haven’t even been ADDRESSED.

I will mention this: if machine translation does end up reaching perfection, it will almost certainly happen first with politically powerful languages that are very similar to English. (The “Duolingo Five” of Spanish, French, Italian, German and Portuguese would be first in line. Other Germanic languages, with the possible exceptions of Icelandic and Faroese, would be next.)

If the craft “dies” in part, it will die in this sector first, since this is the “front line.” Even then, I doubt it (although machine translation reaching perfection from English to Italian is a thousand times more likely than it reaching perfection from English to Vietnamese). But for most languages in the world, translators don’t need to fear being replaced by machines in the slightest.

That’s because the less politically powerful a language is, and the further it is from English, the more flaws show up in machine translation.

Let’s hop in:

 

  • Cultural References

 

Take a look at lyricstranslate.com (where using machine translation is absolutely forbidden). You’ll notice that a significant number of the song texts come with asterisks, usually explaining cultural phenomena that would be familiar to a Russian or Finnish speaker but not to a speaker of the target language. Rap music throughout the world relies so heavily on layers of meaning that human translators have to resort to notes. Machine translation doesn’t even DO notes or asterisks.

There are also names of places or people that are familiar to speakers of one language but not another. I remember that in Stockholm’s Medieval Museum, the English translation rendered the Swedish word “Åbo” (a city known in English and most other languages by its Finnish name, “Turku”) as “Turku, a city in southern Finland” (the fluent readers of Scandinavian languages obviously needed no such clarification).

And then there are references to religious texts, well-known literature, Internet memes and beyond. In Hebrew and Modern Greek, references to and quotes from ancient texts are common (especially in politics), but machine translation doesn’t pick up on them!

When I put hip-hop lyrics or a political speech into Google Translate and start to see a significant number of asterisks and footnotes, then I’ll believe that machine translation is on the verge of taking over. Until then, this is a gap that hasn’t been addressed, and anyone who translates cultural texts is aware of it.

 

  • Gendered Speech

In Spanish, adjectives referring to yourself change depending on your gender. In Hebrew and Arabic, you use different present-tense verb forms depending on your gender as well. In languages like Vietnamese, Burmese, and Japanese, different forms of “I” and “you” carry gendered information, and plenty of other coded information besides.

What machine translation does instead is produce sexist assumptions: when translating from languages with a gender-neutral third-person pronoun (covering both “he” and “she”), such as the Turkic or Finno-Ugric languages, it is more likely to assume that doctors are male and secretaries are female.

Machine translation doesn’t have a “gender meter” at all (i.e. a setting where I pick whether “I” am a man, a woman or something else), so why would I trust it to take jobs away from human translators?

On that topic, there’s also an issue with…

 

  • Formality (Pronouns)

 

Ah, yes: the pronouns you use with kids, and the other pronouns you use with emperors and monks. Welcome to East Asia!

A language like Japanese or Khmer has many pronouns and modes of address, depending on where you stand relative to the person or crowd you are speaking to.

Use the wrong one and interesting things can happen.

I just went on Google Translate and, as I expected, it boils these systems down to a pinhead. (In fairness, there is a set of “safe” pronouns that can be used more readily, especially by foreign speakers; students are usually taught one of these to “stick to,” especially if they look non-Asian.)

If I expect a machine to take away a human job, it has to do the job at least as well. But it seems to know the pronouns of languages like these the way a first-year student would, not the way a professional translator with deep knowledge of the language would.

A “formality meter” for machine translation would help. And it would also be useful for…

 

  • Formality (Verb Forms)

 

In Finnish, the verb “to be” conjugates differently if you want to speak colloquially (puhekieli), and the pronouns change significantly too, usually becoming shorter (standard “minä olen,” “I am,” becomes “mä oon”). I once encountered a student who had read Finnish grammar books at length and had a great knowledge of the formal language but NONE of the informal language that’s regularly used in Finnish-language vlogging and popular music.

Sometimes it goes well beyond the verbs. Samoan and Fijian have different modes of speaking as well (usually one is used with foreigners and one among insiders). Samoan is available in Google Translate (Samoan also has an exclusive and an inclusive “we,” and Google Translate handles that about as well as you would expect). I’m not studying Samoan at the moment, nor have I even begun, so if you have any knowledge of Samoan, let me know whether Google Translate manages to straddle the various forms of the language in a way that would be useful for an outsider. I’ll be waiting…

 

  • Difficult Transliterations

 

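A single Hebrew word written without vowels can be vowelized in many different ways, with different meanings (ספר, for instance, can be read as sefer, “book,” sapar, “barber,” or siper, “he told”). Burmese transliteration is not user-friendly in the slightest, and Persian and Urdu don’t get any transliteration at all.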

If I expect a machine to take my job, I expect it to render one alphabet to another. Without issues.

 

  • Translation Databases Rely on User Input

 

This obviously favors the politically powerful languages, especially those from Europe. Google Translate’s machine learning relies on input from the translator community. In a language like Spanish, I’ve seen the community approve even extremely strange phrases. I’ve also seen approved phrases in languages like Yiddish or Lao, but they’re sparse, even for the most basic words and small essential phrases.

For machine translation to be good, you need lots of people feeding phrases into the machine. And the people feeding phrases into the machine are those with access to computers, not those who make $2 a day.

In San Francisco, speakers of many Asian languages are in demand as interpreters. A lot of these languages come from poor regions that can’t send crowds of people to Silicon Valley to submit phrases to Google Translate.

What’s more, there’s the issue of government support. For example, Wales put its bilingual government documents into Google Translate, and as a result Welsh is better served by machine translation than Irish. The Nordic countries want to preserve their languages and have been investing heavily in technology to keep them safe. Authoritarian regimes might not have the time or the energy to promote their languages on a global scale. Then again, you also get authoritarian regimes like Vietnam, with huge expatriate communities that make tech support for the language readily available in a way that would make thousands of languages throughout the world jealous.

 

  • Developing World Languages Are Not as Developed in Machine Translation

 

Solomon Islands Pijin would probably be easier to handle in machine translation than Spanish, but as far as I know, it hasn’t even been touched. A lot of languages are lagging behind, and these are languages spoken in poor rural areas where translators and interpreters are necessary. (My parents worked in refugee camps in Sudan, and you have NO IDEA how sought-after Tigre interpreters were! So much so that charlatans became “improvisational interpreters”; you can guess how long that lasted.)

Yes, English may be the official language of a lot of countries in Africa and the Pacific (not to mention India), but huge swathes of the people living there have a weak command of English or, sometimes, none at all.

The Peace Corps in particular equips its volunteers with tons of language-learning resources, and missionaries have similar programs. Suffice it to say that these organizations work with languages (spanning all continents) on a very deep level, where machine translation hasn’t even VENTURED!

 

  • Many Languages Haven’t Been Touched by Machine Translation at All

 

Some of this may be partly due to the fact that some of these languages have no written form, or no standardized written form (e.g. Jamaican Patois).

 

  • Text-To-Speech Underdeveloped in Most Languages

 

I’m fairly impressed by Google Translate’s text-to-speech for Thai, not to mention for the various European languages that have it. (Did you know that if you put an English text into Dutch Google Translate and have it read aloud, it will read you English with a Dutch accent? No, really!)

 

And then you have Irish, which has three different modes of pronunciation (one for each major dialect) in addition to a hodge-podge “standard” that is mostly taught in schools and apps. Irish text-to-speech does exist: it was developed at Trinity College Dublin and comes in multiple “flavors,” depending on whether you want Connacht, Ulster or Munster Irish. While that technology exists, it hasn’t been integrated into Google Translate, in part, I think, because customization options are scary for ordinary users (more options may come in the future, but I can’t say, since I’m not on the development team).

 

For Lao, Persian, and a lot of Indian regional languages (among many others), text-to-speech hasn’t even been tried. In order to fully replace interpreters, machine translation NEEDS that and needs it PERFECTLY. (And here I am stuck with a Google Translate that routinely struggles with Hebrew vowelization…)

 

  • Parts of Speech Commonly Omitted in Comparison to Other Languages

 

Some languages, like Burmese and Japanese, most naturally form sentences without any pronoun at all. Instead of saying “I understand” in Burmese, you would literally say “ear go-around present-tense-marker” (no “I,” although you could add a version of “I” and it would still make sense). In context, I could use that EXACT same ear-going-around phrase to mean “you understand,” “we understand,” or “the person behind the counter understands.”

In English, except in very informal registers (“got it!”), we usually need to include a pronoun. But if machine translation is to be good enough for sworn interviews and legal proceedings, it has to know when to use pronouns and when not to. Even in Spanish, including or omitting “yo” (I) is a delicate game, as it is in most languages that encode person in the verb (yo soy means “I am,” but soy on its own means “I am” too).

Now take a language like Rapa Nui (the language of Easter Island), where conjunctions usually aren’t used (its “but” is a loanword from Spanish: pero!). If a machine has to translate from Rapa Nui into English, how will it render the “and”s and “but”s in a way that sounds natural to an English speaker?

 

Maybe the future will prove me wrong and machine translation will be used in courts instead of human beings. But I’ll come closer to believing it when these ten points are done away with SQUARELY. Until then, I’ll be very skeptical and assure the translators of the world that they are safe in their profession.

 

 
