Markup in translations?
First published on Saturday, 22 December 2007
Warning: This blog post is more than 16 years old – read and use with care.
In web applications that I want to make accessible to users who speak different languages, I normally deal with three types of strings or texts:
Dynamic content: This is entered either by users or by the website's editors. It uses some markup that depends on the application, maybe (X)HTML or a specialized markup language. The markup is the same for all languages, and you already have some kind of interface, or editors trained in this markup, so there is no problem here.
Standard translations: The strings used in forms or in the navigation, like "User" or "Password". No problem here either, because you normally don't need any inline markup.
Static content: In many cases the same backend is used for static and dynamic content, so there is no problem then. But I like to store static content, like the legal info or contact information, in another (faster) storage. And that is where the problems start.
Markup in static contents
Static content, like the contact information page on busimess.org, may contain some markup, or at least some links. If you don't want to use markup there, you have to put all links below the actual text (as is done there), so the translator won't see them.
I just saw my brother translating text for his website, which runs on top of Zope 3. He used (X)HTML for the original text and for the translated text. The text is of course used unfiltered and unescaped in the application (otherwise the markup wouldn't work), so translators could use it to introduce XSS, although that is unlikely to happen. It is no problem in his case, because he is the only one translating the content.
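To illustrate why escaping is not an option for such texts, here is a minimal sketch in Python (the post does not name a language, and the translated string and link target are made up for the example). Escaping makes the string safe to embed, but the link then shows up as literal text instead of markup:

```python
from html import escape

# Hypothetical translated string containing an inline link.
translated = 'Schreiben Sie uns über das <a href="/contact">Kontaktformular</a>.'

# Escaping neutralizes any markup, including the link the translator
# was supposed to keep, so the page would display the raw tag.
escaped = escape(translated)
print(escaped)
```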
Which markup language to use?
(X)HTML is a quite domain-specific markup language, but it is known by a lot of translation agencies. Still, the attack vector and the domain-specific nature of the language make it somewhat awkward to use. You could of course filter the (X)HTML using HTMLPurifier or something similar.
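The filtering idea can be sketched as an allowlist: only a handful of tags and attributes survive, everything else is dropped or escaped. The following is a rough Python illustration built on the standard library's html.parser. It is not HTMLPurifier's API (that is a PHP library), and the names ALLOWED_TAGS, ALLOWED_SCHEMES, TranslationFilter and filter_translation, as well as the chosen tag list, are assumptions made for this example. For production use a vetted sanitizer is the safer choice.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attributes permitted on that tag.
ALLOWED_TAGS = {"a": {"href"}, "em": set(), "strong": set(), "p": set(), "br": set()}
ALLOWED_SCHEMES = ("http://", "https://", "mailto:")
# Elements whose text content is dropped together with the tag.
SKIPPED_CONTENT = {"script", "style"}


class TranslationFilter(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_CONTENT:
            self._skip += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself, keep its text content
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(ALLOWED_SCHEMES):
                continue  # rejects javascript: and any other scheme
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in SKIPPED_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            self.out.append(escape(data, quote=False))


def filter_translation(markup: str) -> str:
    """Return the translated markup reduced to the allowlisted subset."""
    parser = TranslationFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

A translator's `<em>` and `<a href="https://...">` would pass through such a filter, while event handler attributes, `javascript:` links and `<script>` blocks would not. The allowlist still has to be communicated to the translators, so it does not remove the need for them to understand the markup.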
Are there any other, better markup languages to use in this case? BBCode, wiki markup or RST won't work either, I suppose, because only very few translation agencies will know about them. Translations may sometimes make it necessary to completely restructure a sentence or even a whole paragraph, so the translator needs to understand the markup.
Which markup language do you use in such cases?