The Multilingual Web: Keynote Panel Parts 1-6, TAUS 2011

Source: TAUS, Monday, 24 October 2011

We need to talk about the multilingual web

The multilingual web has been an implicit item on the TAUS agenda since the beginning. This year’s annual User Conference in Santa Clara offered a golden opportunity to invite a keynote panel to drill down into this concept and come up with a status report on what is happening to global content, standards and language processing on the world’s largest piece of shared infrastructure and how it meshes with translation automation.

With Rose Lockwood as moderator, Bruno Fernandez Ruiz, Fellow and Vice President of Yahoo!, Bill Dolan, Head of NLP Research at Microsoft, and Addison Phillips, Group Chair of W3C I18N and Globalization Architect at Lab126, discussed a broad range of issues, from localizing big data to the future of the translation profession.

Here are five “themelets” from what we hope will be the start of an ongoing conversation about the significance of multilingual “language intelligence”. Together they highlight the essentials of what was a highly stimulating conference opener.

  1. Big web data is scattered data. And big data will increasingly mean lots of short-form data, with more granular applications, snippets of information, brief articles, a subtitle here, a comment there, rather than massive documentation localization campaigns to accompany a product’s lifecycle. Big web data is splintered across a myriad of devices, processes and cultures where there is no one-to-one mapping with a given language. Which technologies will enable this data to be localized or translated efficiently when needed, given that there is no single place where linguistic knowledge about such data can exist?
  2. HTML5 represents a fundamental change in the architecture of web standards. It will have the power to make text “active” and turn it into something that can be handled by an app, not merely exist as a lifeless page. For those who believe in the need for taxonomies and ontologies to enhance the translatability of text, this means that semantic annotation can be delivered as an app to a document (see the first sketch after this list). Localization and translation will become web services.
  3. Meaning can be conceived as an emergent property of big language data. So human attempts to classify meaning into taxonomies and ontologies are not always essential for processing text. It is quicker and even more effective to get machines to learn from large data sets and then translate and paraphrase using shallow processing techniques. These activities all involve the transfer of ‘meaning’ in some obvious sense but do not depend on having a prior map of universal categories to get the machine to work. At the same time, the fact that language is “grounded” in entities and objects for activities such as searching means that companies have access to large amounts of multilingual language data that is automatically tied to objects, offering a huge resource of “meaningful” language data for translation.
  4. End users on the receiving end of translation (and post-editors) can provide useful feedback on any kind of translation, and the technology is there to ensure this feedback can be leveraged by the system. “Every time you give users a choice, you learn something.” But this in turn depends on users being able to read or hear content in their language. There needs to be a determined effort to collect (or create) more data (both speech and text) for less-resourced languages to avoid a linguistic digital divide on the supposedly multilingual web.
  5. The translation industry seems destined to divide into two sub-activities: a high-end human translation business and an automated translation pipeline for commodity content. Research is nowhere near replicating how human translation is really done, and machine translation is not a replacement for humans. Yet ironically we have gradually forced humans to translate by “n-gram spotting” just as machines do, sentence by sentence or segment by segment (see the second sketch after this list). This is an artifact of automation: high-quality translation is meaning-driven, not segment-based. But there is a lot of room for machine translation on a truly multilingual web.
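
One modest, concrete illustration of the second theme: HTML5 defines a global translate attribute that marks content which should not be passed to a translation service. The sketch below uses only Python’s standard-library html.parser and an invented snippet to show how such in-document annotation could let a localization web service decide what to translate and what to protect. Nothing in it comes from the panel; the markup, class and variable names are illustrative assumptions only.

```python
from html.parser import HTMLParser

class TranslatableTextExtractor(HTMLParser):
    """Split text nodes into translatable vs. protected, honouring translate="no"."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0      # > 0 while inside a translate="no" subtree
        self.translatable = []    # text a translation service should receive
        self.protected = []       # text that must be left untouched

    def handle_starttag(self, tag, attrs):
        # Once inside a protected subtree, every nested tag deepens the skip.
        # Bare void elements (<br>, <img>) inside such a subtree are not handled
        # in this sketch.
        if self._skip_depth or (dict(attrs).get("translate") or "yes").lower() == "no":
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            (self.protected if self._skip_depth else self.translatable).append(text)

# Invented snippet: the product name is marked as not-to-be-translated.
snippet = '<p lang="en">Download <span translate="no">Acme PhotoStudio</span> and start editing.</p>'

parser = TranslatableTextExtractor()
parser.feed(snippet)
print("send to translation service:", parser.translatable)
print("leave untouched:", parser.protected)
```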
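The fifth theme’s “n-gram spotting” is easier to picture with a toy example. The sketch below (plain Python, invented sample text) chops a paragraph into sentence segments and lists the word bigrams inside each one, which is roughly the unit of work a segment-based pipeline operates on; nothing in it can see across a segment boundary, which is the limitation the panel contrasted with meaning-driven human translation.

```python
import re

def segments(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def ngrams(tokens, n=2):
    """All contiguous word n-grams of length n within a single segment."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

paragraph = ("The multilingual web is growing fast. "
             "Machine translation handles commodity content.")

for seg in segments(paragraph):
    tokens = seg.lower().rstrip(".!?").split()
    # A segment-based pipeline matches these local n-grams against its data;
    # whatever the neighbouring sentence says is invisible at this point.
    print(seg, "->", ngrams(tokens))
```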

TAUS has always been committed to tracking lab-to-market developments in the multilingual web space as part of its core agenda. We are also one of the 20 research and industry partners in the EU Multilingual Web project coordinated by W3C, where we represent the translation technology and services user community in this two-year exploration of standards and best practices to support the creation, localization and use of multilingual web-based information. We will continue tracking and reporting on developments in this space. See the whole of this fascinating keynote debate here!