Finding your language wallah

This article appears in the February 2010 issue [Volume 19, Issue 2]
Old joke: If you can speak three languages, you are trilingual. If you speak two languages, you are bilingual. If you speak one language, you are an American.

The problem is that other languages exist. In Brazil, educated professionals speak Portuguese and two or three other languages. For everyday work and communication, Portuguese is still the dominant language. The same holds true in most economic powerhouses. To keep pace with information available on Web sites, in Web logs and even Tweets, translating a source language into my native language (English) is becoming more important. Even though I lived in Brazil and had a working knowledge of Portuguese, I need software safety nets.

Like many knowledge workers, I have relied on software that takes a source language such as French, German or Japanese and translates it into English. I have a number of translation resources. I use some open source tools built on the GNU gettext framework—for example, code from Google’s gettext commons. I have experimented with a range of shareware products. If you want to give some of those systems a trial run, navigate to

When AltaVista was the big dog in search, I found the Babel Fish online translation service useful. Now part of the Yahoo suite of online services, Babel Fish (based on Systran’s technology) can handle some lightweight translation tasks. If you are not familiar with this service, it is worth a visit. I had to click past an ad for hot fudge brownies to access the service, but it’s free and works reasonably well.

Today most organizations are like the fabled city of Babylon. In ancient times, so the legend goes, the residents of Babylon had a tough time communicating. Organizations have the same problem and not just with customers who speak a different language. Even small firms have to deal with an ever-growing flow of information that might be in Arabic or Ukrainian. Most organizations struggle with a confusion of tongues, converting an office complex in San Jose to a mini Tower of Babel.

The need to process Web log content, e-mail messages and even 140-character Tweets requires a swift, reliable, automated way to translate electronic information. The brute force method once relied on human translators. Today, the flow of information makes the traditional methods too expensive and slow.

Machine translation systems (sometimes inaccurately labeled online translation systems or automatic translation systems) have been available for many years. They work reasonably well, typically delivering a translation that can be used to get the gist of a source document. The machine translation systems often do a much better job with scientific, medical and technical source documents than with more colloquial source material. General business writing falls somewhere in the middle in accuracy and usability.

Government entities have been important customers for vendors of machine translation technology. Intelligence agencies and research groups have an insatiable appetite for translations of a wide range of source content. The volume of content that requires translation continues to skyrocket. Neither money nor human translators can keep up with the amount of material that must be translated. Machine translation now carries the burden.

A moving target

The capabilities of machine translation systems often extend beyond taking a source document in French and generating a translation in English. Today’s systems in use in certain government agencies perform translation and then identify themes, entities such as individuals and organizations, and concepts. Some systems perform additional text analysis to tag each component of a source document with a theme. A handful of vendors have ventured into sophisticated analytics that identify the sentiment or emotion expressed in a document and give each document or “fact” in a document a reliability or confidence grade.

Those advanced functions are an exciting area of research at universities and language research centers worldwide. The difficulty, of course, is that language is a moving target. Human communication reflects human creativity: a group discussion becomes an online discussion and then morphs into a “WebEx” or “webinar.” Software has to be quite intelligent to take “Facebook” in a company report and translate that into réseau social. Software can change, but it often has difficulty with idioms, neologisms and words that seem to say one thing but mean quite another given the sender’s and recipient’s context. Because the task is hard, many developers aim to provide systems that are “good enough” when installed. Licensees can then add custom dictionaries and interact with the software system to “teach” it to handle certain types of content.
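
For readers who want to see what a custom dictionary amounts to in practice, here is a minimal sketch in Python. It is not drawn from any vendor's product. The names CUSTOM_DICTIONARY and apply_custom_dictionary are hypothetical, and the entries are illustrative only. The sketch assumes the licensee's overrides are applied to the text as whole-word substitutions, which is one simple way such a dictionary could work; real systems are likely more elaborate.

```python
import re

# Hypothetical licensee-supplied dictionary: term -> preferred rendering.
# Entries are illustrative; the first mirrors the "Facebook" example above.
CUSTOM_DICTIONARY = {
    "Facebook": "réseau social",
    "webinar": "séminaire en ligne",  # assumed rendering, for illustration
}


def apply_custom_dictionary(text, dictionary):
    """Replace whole-word dictionary terms in text, ignoring case."""
    if not dictionary:
        return text
    # Try longer terms first so a long entry wins over a shorter one
    # that happens to be its prefix.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {term.lower(): rendering for term, rendering in dictionary.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


if __name__ == "__main__":
    # "Teaching" the system here is nothing more than adding an entry.
    CUSTOM_DICTIONARY["Tweet"] = "micro-message"  # assumed rendering
    print(apply_custom_dictionary("Post the Tweet on Facebook.", CUSTOM_DICTIONARY))
```

The whole-word match keeps an entry for “Facebook” from firing inside a coinage such as “Facebooking,” which is exactly the kind of neologism that still needs a human decision.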

