Markup in translations?

First published on Saturday, 22 December 2007

Warning: This blog post is more than 16 years old – read and use with care.

In web applications that I want to make accessible to users who speak different languages, I normally deal with three types of strings or texts:

  • Dynamic content

    This content is entered either by users or by the website editors. It uses some markup which depends on the application, maybe (X)HTML or some specialized markup. The markup is the same for all languages, and you already have some kind of interface, or educated editors, for it, so you get no problems here.

  • User Interface

    The standard translations for strings used in forms or in the navigation, like "User" or "Password". No problem here, because you normally don't need any inline markup.

  • Static content

    In many cases the same backend is used for "static content" and for the dynamic content, so you won't get a problem here either. But I like to store static content, like the legal info or contact information, in another (faster) storage. And that is where the problems start.

Markup in static content

Such static content, like the contact information page on busimess.org, may contain some markup, or at least some links. If you don't want to use markup there, you have to put all links below the real text (as is done there), so the translator won't see them.

I just saw my brother translating text for his website, which runs on top of Zope 3. He used (X)HTML for both the original text and the translated text. The application of course outputs this unfiltered and unescaped (otherwise the markup won't work), so translators could use it to introduce XSS, although that is unlikely to happen. No problem in his case, because he is the only one translating the content...
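To make the trade-off concrete, here is a minimal sketch in Python (not what the Zope 3 site actually does). The two strings and both render functions are hypothetical and only illustrate why escaping a translation makes it safe but also breaks its markup.

```python
from html import escape

# Hypothetical translated string, stored with inline (X)HTML markup.
translation = 'Schreiben Sie uns eine <a href="/contact">Nachricht</a>.'

# Hypothetical string from a careless or malicious translator.
malicious = 'Hallo <script>alert("XSS")</script>'


def render_raw(text: str) -> str:
    """Output the translation unfiltered: the markup works, but so does script."""
    return text


def render_escaped(text: str) -> str:
    """Escape everything: safe, but the intended markup is shown literally."""
    return escape(text)


escaped_link = render_escaped(translation)
escaped_attack = render_escaped(malicious)
```

With `render_escaped` the script tag is neutralized, but the link in the harmless translation is also turned into visible text, which is exactly why the markup is usually passed through unescaped.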

Which markup language to use?

(X)HTML is quite a domain-specific markup language, and it is also known by a lot of translation agencies. But the attack vector and the domain-specific nature of the language make it somewhat awkward to use. You could of course filter the (X)HTML using HTMLPurifier or something similar.
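The filtering idea can be sketched roughly as follows. This is not HTMLPurifier (which is a PHP library) and not a replacement for a vetted sanitizer; it is a small allowlist filter built on Python's standard `html.parser`, and the names `ALLOWED`, `SAFE_PREFIXES`, `AllowlistFilter` and `filter_markup` as well as the chosen tag set are assumptions made for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlist: tag name -> attributes that may be kept.
ALLOWED = {"a": {"href"}, "em": set(), "strong": set(), "p": set(), "br": set()}
# Assumed set of acceptable link prefixes; everything else is dropped.
SAFE_PREFIXES = ("http://", "https://", "mailto:", "/")


class AllowlistFilter(HTMLParser):
    """Keeps allowlisted tags and attributes, drops the rest, escapes text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_PREFIXES):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))


def filter_markup(text: str) -> str:
    """Return text with only the allowlisted markup left in place."""
    parser = AllowlistFilter()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Calling `filter_markup()` on a translated string keeps simple links and emphasis but drops event handlers, `javascript:` links and script blocks. It does not repair unbalanced tags, so for real use a maintained sanitizer is the safer choice.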

Are there any other, better markup languages to use in this case? BBCode, Wiki markup or RST won't work either, I suppose, because only very few translation agencies will know about them. Translations may sometimes make it necessary to completely restructure a sentence or even a whole paragraph, so the translator needs to understand the markup.

Which markup language do you use in such cases?


Comments

Thomas Koch at Saturday, 22.12. 2007

Hi Kore,

I do not really have an answer here, since even the developers of the XLIFF[1] standard have not yet found[2] the final solution. XLIFF doesn't try to have its own markup for inline formatting, but has tags[3][4] to escape the inline tags of other markup. Still, you can indicate the meaning of an escaped markup tag with the XLIFF attribute ctype[5].

As a resource on the translation of web applications, you may want to read articles from Gábor Hojtsy[6][7] and his thesis[8] "Multilingual Web Applications with Open Source Systems". He's a core contributor of Drupal ;-).

[1] http://wiki.oasis-open.org/xliff/
[2] http://wiki.oasis-open.org/xliff/XLIFF2.0Goals
[3] http://www.lisa.org/globalizationinsider/2007/05/xml_in_localisa.html
[4] http://developers.sun.com/dev/gadc/technicalpublications/articles/xliff.html "Format handling"
[5] http://docs.oasis-open.org/xliff/v1.2/cs02/xliff-core.html#ctype
[6] http://buytaert.net/gabor-hojtsy
[7] http://www.developmentseed.org/blog/g-bor-hojtsy
[8] http://buytaert.net/files/gabor-hojtsy-thesis.pdf

P.s. I miss the preview button on your blog. :-)

Edward Z. Yang at Saturday, 22.12. 2007

In my opinion, if you insist on giving static content a "faster backend," this static content should still be XHTML, and any updates to it should be hand-vetted by a knowledgeable admin before they go live. I would trust my translators; if I didn't, I probably ought to reconsider my translation agency.

However, I don't see why you can't just use the dynamic backend, and then cache the results. Usually, the speed ends up the same, but you get the benefit of a nicer interface for the user.

CMS Review at Saturday, 22.12. 2007

I think the best way is to use HTML and allow only some tags.

Jan at Saturday, 22.12. 2007

Markdown is super-intuitive and very easy to learn.

Lars Strojny at Monday, 28.4. 2008

GTK+ based tools do it with a subset of HTML (the subset Pango understands). This seems to me like a pragmatic approach.
