The modern organization: Translation challenges
The Google Search Appliance (GSA) has reached Version 7. The new features receiving the most emphasis include entity recognition: GSA 7 can automatically identify people, places and things. Prior versions of the Google Search Appliance indexed by keywords.
Other changes include a preview feature. A user can see thumbnail images of documents in a GSA results list. Because the user does not have to launch an application like Microsoft Word to inspect a document, the preview feature may save time and an extra mouse click.
The newest version of the appliance can index content from more sources. The GSA can be configured to add information from social media, password-protected sites on the Web and Microsoft SharePoint. Users can access the search function from traditional desktop computers as well as mobile devices.
One feature that has not been given as much attention as faceted searching and SharePoint support is the GSA's multilanguage capabilities. For example, the GSA can now perform translation of search results. According to Google:
"Starting with release 7.0, the Google Search Appliance can translate titles and snippets in search results, as well as cached documents into the user's language in real time. The user's language is determined by the default language set in the user's browser. When translation is enabled, translation links appear in search results. The user can translate everything on a results page or just individual titles, snippets or cached documents. You can enable or disable translation for any front end."
Google has deep expertise in real-time translation. The company's approach is to use a range of modeling methods, smart software, dictionaries and the massive data available to the company. In the last two years, Google has expanded the number of languages it supports across a broad range of its services.
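The documentation quoted above says the user's language is determined by the default language set in the browser. Browsers normally communicate that preference in the HTTP Accept-Language request header. The sketch below shows one way a front end might read that header. It is not Google's implementation; the function name preferred_language and the fallback to "en" are assumptions made for illustration.

```python
def preferred_language(accept_language: str, default: str = "en") -> str:
    """Return the highest-weighted language tag in an Accept-Language header.

    Hypothetical helper for illustration; not part of any Google product.
    """
    best_tag, best_q = default, 0.0
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        # Each entry looks like "fr-CH" or "fr;q=0.9".
        tag, _, params = piece.partition(";")
        tag = tag.strip()
        params = params.strip()
        q = 1.0  # an entry without an explicit weight counts as q=1
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                continue  # skip entries with malformed weights
        # q=0 means "not acceptable", so only positive weights can win.
        if tag and tag != "*" and q > best_q:
            best_tag, best_q = tag, q
    return best_tag


if __name__ == "__main__":
    print(preferred_language("fr-CH, fr;q=0.9, en;q=0.8"))
```

A front end could compare the returned tag with the language of each result and offer a translation link only when the two differ.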
If you have not used Google Translate via a Web browser, you may want to take the free service for a test drive. The online system supports low-latency translation of text and complete Web pages in languages such as French and German. Google publishes a list of the languages supported by its translation systems.
Google has one of the most technically robust real-time translation systems available today. What makes the system more astounding is that as more data flows through the Google system, Google's methods seem to reduce the time required to add a new language while simultaneously improving the quality of the translations provided. For many enterprise search licensees, multilanguage support and near real-time translation are not included with the basic system. Licensing translation technology can add significantly to the cost of some enterprise search systems.
Has Google moved to a "game over" position in enterprise search? Basis Technology, Microsoft and other diversified technology companies have capable machine translation technology. At the Association for Machine Translation in the Americas' 2012 conference, the program featured presentations from the U.S. government's advanced research projects unit, Xerox, universities from around the world, and Web giants like Yahoo and Google. The Google paper, "Improved Domain Adaptation for Statistical Machine Translation," authored by Wei Wang and four other members of the translation team, concludes:
"We explore simple but effective ways of using domain resources, showing domain accuracy improvements made by the use of bilingual training data, domain development data and domain language models. We show the importance of using large amounts of generic training data, particularly in the case where the domain detection error rates for different domains are unbalanced."
Google's broad approach strikes me as the apex of well-understood systems and methods. Its translation prowess results from its ability to manipulate massive amounts of training data within the Google supercomputer.
At a recent technical conference, I spoke with the chief technical officer of IMT Holdings. The firm's approach, according to Mike Sorah, CTO and developer of the IMT Rosoka system, is quite possibly an advance in machine translation.
Unlike many translation systems, IMT Rosoka includes the languages in the product. This means that there are no extra fees charged to add another language. Some of the specialist vendors charge per language "pack" or "blade."
Sorah says, "We designed Rosoka from the ground up to address the shortcomings in natural language processing." What Rosoka does is provide a system that can process a document that contains multiple languages without human intervention. If a memorandum from a colleague in Shanghai is written in French and contains a two-paragraph chunk in Chinese, the Rosoka system can seamlessly render the document in English if that is what the reader requires. Sorah explains, "Our system automatically understands that a document may be in English or Chinese, or even English and Spanish mixed."
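To make the mixed-language case concrete, here is a minimal sketch of one preliminary step such a system could take: splitting a document into runs of text by writing system. This is not how Rosoka works, which the article does not describe. The function names are hypothetical, and detecting a script is a much weaker task than detecting a language, since French and English both use Latin script.

```python
import unicodedata
from typing import List, Optional, Tuple


def script_of(ch: str) -> Optional[str]:
    """Approximate a character's script from the first word of its Unicode name."""
    if not ch.isalpha():
        return None  # spaces, digits and punctuation carry no script here
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def split_by_script(text: str) -> List[Tuple[str, str]]:
    """Split text into (script, run) pairs, starting a new run when the script changes."""
    runs: List[Tuple[str, str]] = []
    current: Optional[str] = None
    buffer: List[str] = []
    for ch in text:
        script = script_of(ch)
        if script is not None and current is not None and script != current:
            runs.append((current, "".join(buffer)))
            buffer = []
        if script is not None:
            current = script
        buffer.append(ch)  # non-letters stay attached to the current run
    if buffer:
        runs.append((current or "UNKNOWN", "".join(buffer)))
    return runs


if __name__ == "__main__":
    for script, run in split_by_script("Bonjour 你好 merci"):
        print(script, repr(run))
```

Each run could then be passed to a language identifier and a translator suited to it, which is the part that requires real linguistic technology.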
He continues, "Rosoka provides transliteration and entity translation capabilities ... for languages that are not written in Roman script, such as Arabic, Chinese, Farsi, Japanese, Korean and Russian. So, if a Korean document contains Korean characters, Rosoka will provide a transliteration of a phrase like Vladimir Putin. Then that document can now be found by entering the query Vladimir Putin instead of the Korean query."
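The search benefit Sorah describes can be sketched as an index that stores a romanized form of each recognized name next to the native-script form. The example below is an illustration under stated assumptions, not Rosoka's method: the entity table is hand-built and hypothetical, and build_index and search are invented names. A real system would derive transliterations with linguistic rules or models rather than a lookup table.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical, hand-built table mapping native-script names to a romanized form.
ENTITY_TRANSLITERATIONS = {
    "블라디미르 푸틴": "Vladimir Putin",
    "Владимир Путин": "Vladimir Putin",
}


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Index each document under both the native and romanized entity names."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for native, roman in ENTITY_TRANSLITERATIONS.items():
            if native in text:
                index[native.casefold()].add(doc_id)
                index[roman.casefold()].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the sorted ids of documents indexed under the query string."""
    return sorted(index.get(query.casefold(), set()))


if __name__ == "__main__":
    docs = {
        "ko-1": "블라디미르 푸틴 대통령이 연설했다.",
        "ru-1": "Владимир Путин выступил с речью.",
        "en-1": "The weather was mild.",
    }
    print(search(build_index(docs), "Vladimir Putin"))
```

With the romanized key in place, a user who cannot type Korean or Russian can still retrieve both documents with the English-alphabet query.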