I18n Flock

From Flock Community

This article explains how to write code for the Mozilla platform in a way that can be internationalized and is easy to localize.

I18n and L10n

Software localization, also known as L10n, is the process of translating a software user interface from one language to another and adapting it to suit a foreign culture. In the localization process, an individual or group prepares the software for a particular locale.

Internationalization, or I18n, is the adaptation of products for potential use virtually everywhere. It is a software layer, written with care, that makes it possible to adapt the software to any culture, locale, or country.

Coders should mostly be concerned with I18n. L10n is possible and easy if I18n is done correctly.

A few facts about the world's languages...

... or where your code might be used.

On planet Earth live 6.5 billion people, speaking at least 105 official languages (while in fact there are around 39,491 languages and dialects used by humans). A quick look at a world population density map, or a distribution of world population graph makes it very clear that the default language of software is not the default language of humans.

As of two years ago, only 35% of Internet users used English as their primary language. And the trend is away from English, as more and more people from non-English-speaking countries begin using the Internet. Take a look at human population numbers: 4.9% of humanity speaks English natively, and 7.9% speaks English overall. Not too many, huh?

  • Most languages in the world, including nine of the ten most commonly spoken, use non-ASCII characters in their alphabets.
  • Most languages have different rules when spoken and written.
  • Two of the most common languages in the world (Chinese and Hindi) don't use ASCII characters at all.
  • Words have different meaning in different countries. Nada means "nothing" in Spanish and "hope" in Croatian. In Latvian, Vista means "chicken".
  • Many languages have masculine and feminine genders, making it difficult to write dialog box text that speaks directly to the user.
  • Many cultures have very different conventions about how one speaks to others. In many languages it's forbidden to say "You should click here." The proper construction is something like: "The button should be clicked," or "It's requested to click on the button". English is one of the few languages where it is possible to use second person ("you") in official communication.
  • English is one of the few languages where you can mix nouns with prefixes or suffixes to obtain new meanings. Things like "star this favorite," "snippet," and "uploader" are very hard to translate. All the "-able" things -- like "resizable," "wearable," or "singable" -- are just a plain horror.
  • English is a language of short words. Most countries use much longer words; French, Polish, Russian and German are good examples. Those languages are not designed for UI text. Allowing localizers exactly five letter spaces for terms like "tag," "click," "swap," "(to) star," or "drop" results in disaster.
  • Many languages use declension, which is an extraordinary invention created by ancient cultures to make localization as hard as possible. Try Czech declension or Slovak declension just to understand how hard it can be. Declension means that you cannot, for example, use the product name unchanged in most sentences, because it has to be declined. In Polish, for instance, you can't say "Open in Flock." You must decline it: "Otwórz we Flocku", "with Flock"=>"z Flockiem", "Part of Flock"=>"Część Flocka", "new Firefox"=>"nowy Firefox", but "Browse with Firefox"=>"Przeglądaj z Firefoksem" (the rules for modifying the noun are very complicated). Or you can go with "Open in program Flock," where you decline the word "program" and take "Flock" from the entity &brandFullName;.
  • Languages may be written from left to right, or from right to left, or from top to bottom. Depending on the language, each character could be a letter, sign, word, or even an entire phrase.
  • Most languages capitalize only the first letter in a sentence. Because the up and down shapes of words aid in reading comprehension, capitalizing an entire word makes it less readable.
  • Word order in sentences varies quite a bit.
  • Some languages don't use the ellipsis ("...") (and in most cases, it shouldn't be used in English -- editor's note).
  • Many languages use a comma instead of a dot in numbers, and numbers may be listed at the end of a phrase or at the beginning.
  • Never hard code characters such as -, ", $, ! or ; inside the UI text. Punctuation and currency symbols vary between locales.
  • Regions use varying formats for date and time. YYYY/MM/DD, DD/MM/YY, MM/DD/YY, DD.MM.YYYY, DD-MM-YYYY are all variations. Don't even try to hard code the date format, ever.
  • In many countries there are serious policy discussions about localization of some words: should those words be taken directly from English, or should new words be created? Competing translations of a single term may emerge, with disagreement between computer programs or even among localizers on one team. In Polish, for example, we still can't find consensus about translations of such terms as blog, blog post, feed, news reader, tab, and toolbar. In many other countries there are ongoing debates on those and other terms.
  • For example, "RSS feed" in Polish is translated as "RSS channel" or "RSS source"; in French it's "RSS stream". In Polish, the word "tab" as the key on the keyboard is "tab", but a browser "tab" is "card". In the past it was also translated as "panel", and in spoken language people use the translation of the word "bookmark".
  • Languages vary in their use of singular and plural forms. In Japanese there is no difference between singular and plural, while Slavic languages have different forms for zero, one, two through four, and five through infinity. (The numbers from 10 to 20 are treated the same as five to infinity, but for larger numbers the last digit decides, so 35 or 145 uses a different form from 32 or 142.) Other languages may have different rules.
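
The plural rule described in the last bullet can be sketched in a few lines of JavaScript. This is only an illustration for Polish, following the rule as it is commonly written in gettext-style plural expressions; in this encoding zero takes the same form as five and up. The function names and the word table are hypothetical and not part of Flock or Mozilla.

```javascript
// Hypothetical helper: returns the index of the Polish plural form for n.
//   0 -> singular (1)
//   1 -> "few" (last digit 2-4, except 12-14)
//   2 -> "many" (everything else, including 0 and 5-21)
function polishPluralIndex(n) {
  var lastDigit = n % 10;
  var lastTwo = n % 100;
  if (n === 1) {
    return 0;
  }
  if (lastDigit >= 2 && lastDigit <= 4 && (lastTwo < 12 || lastTwo > 14)) {
    return 1;
  }
  return 2;
}

// Illustrative string table: the three forms of the Polish word for "file".
var FILE_FORMS = ["plik", "pliki", "plików"];

// Builds a phrase such as "32 pliki" by picking the form that matches n.
function describeFiles(n) {
  return n + " " + FILE_FORMS[polishPluralIndex(n)];
}
```

A single "%S files" string cannot express this, which is why code that concatenates a number with a fixed noun is so hard to localize.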

So now, armed with this knowledge, you may begin writing the UI text to help the localizers. If you think it's hard, then close your eyes for a moment and imagine the plight of the people who must localize your software. In Flock's case there are 16 localizations ready. Firefox has nearly 40. These localizers did their work until now without your help. Imagine how hard that was!
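
As a small illustration of the number and date bullets above, the sketch below lets the runtime pick the separators and field order instead of hard coding them. It assumes a JavaScript engine that provides the standard ECMAScript Internationalization API (Intl), which is newer than this article and may not be available in the Gecko version you target. The function names are hypothetical.

```javascript
// Format a number using the conventions of the given locale (a BCP 47 tag).
function formatNumber(locale, value) {
  return new Intl.NumberFormat(locale).format(value);
}

// Format a date using the conventions of the given locale.
// UTC is forced so the result does not depend on the machine's time zone.
function formatDate(locale, date) {
  return new Intl.DateTimeFormat(locale, { timeZone: "UTC" }).format(date);
}

// Print the same values for a few locales to compare the conventions.
var amount = 1234567.891;
var day = new Date(Date.UTC(2006, 0, 31));
["en-US", "de-DE", "pl-PL"].forEach(function (locale) {
  console.log(locale, formatNumber(locale, amount), formatDate(locale, day));
});
```

The exact output depends on the locale data shipped with the engine, which is the point: the code never decides whether a comma or a dot separates decimals, or whether the day comes before the month.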

Entities naming scheme

For Flock we are using long entity names. This means that instead of &testBut;, we use &flock.module.element;. There are a few reasons for this:

  • It's easier to find long entity names. Any error tells you exactly which entity you should work on. It's a kind of work-around for Mozilla's meaningless error messages.
  • Localizers have a much greater chance to understand what's going on without looking for each entity in sources. It's not perfect, and sometimes it's better to add a comment, but in most cases it works.
  • We're limiting the risk of entity name collision. You can safely add a second *.dtd file without risking that its entity name will collide with the first one.
  • It is even easier to read the sources and pretend that you understand the entity without looking for its representation in *.dtd files.

Using long entity names creates a difference (but no incompatibility) between Flock and Mozilla, and makes both *.dtd files and sources longer.

When choosing an entity name, it's important to remember that someone must localize it later. There is a big difference between window titles, button labels, descriptions, and tooltips. Suggestions inside entity names are helpful.

You need to figure out on your own when to use one *.dtd for many files, and when to split it into multiple *.dtd files. It's usually a matter of weighing how much redundancy that creates against how much of the *.dtd file each *.xul file actually uses.

Internationalization of different kinds of files

XUL

For XUL, we use *.dtd files. A *.dtd file looks like this:

<!ENTITY flock.test.foo.button "Button nr. 1">
<!ENTITY flock.test.desc.cool  "It's so cool">

For XUL, the only thing you must do is declare the entities at the top of the file and reference every string through them.

<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>

<!DOCTYPE window [
<!ENTITY % fooMainDTD SYSTEM "chrome://flock/locale/foo/main.dtd" >
%fooMainDTD;

<!ENTITY flock.foo.main.test "Test" >
]>

<window
    id="findfile-window"
    title="FooHoo"
    orient="horizontal"
    xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">

    <button label="&flock.foo.main.test;" />
</window>

This example shows both external *.dtd and local entities. When you're working on your code, put all strings in the form of local entities, so it is simpler to move them to an external *.dtd file later.

Hint: *.dtd is a standard XML mechanism, so remember that an entity's value can reference other entities, nested as deeply as you wish.

JavaScript

JavaScript is not so easy.

For JavaScript, Gecko uses Sun's l10n file model: Java-style *.properties files. (Unlike Sun, Mozilla allows non-escaped UTF-8 characters.) A properties file looks like this:

flock.foo.button.test     = Sentence number one
flock.foo.cool.stuff.desc = It's so cool

There are four approaches:

Stringbundle: This is the cleanest model when your .js file is directly tied to one or a few *.xul files. Because the <stringbundle> element wraps the string bundle service, this method takes less code, and it will still work if we switch stringbundles to use *.dtd files in the future.

.xul file:

<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>

<window
    id="findfile-window"
    title="FooHoo"
    orient="horizontal"
    xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">

    <stringbundle id="fooBundle" src="chrome://flock/locale/foo/foo2.properties" />
</window>

.js file:

function foo () {
  var strBundle = document.getElementById('fooBundle');
  // The <stringbundle> element exposes getString() with a lowercase "g".
  var label = strBundle.getString("flock.foo.button.test");
}

Unwrapped interface: This approach is better when your code doesn't use l10n at all, apart from one function that rarely runs (think: showError()). In this case, it's not worth loading *.properties files each time the user opens the *.xul file. Instead, move everything into the showError() method itself:

function showError () {
  var localeService = Components.classes["@mozilla.org/intl/nslocaleservice;1"].getService(Components.interfaces.nsILocaleService);
  var stringBundleService = Components.classes["@mozilla.org/intl/stringbundle;1"].getService(Components.interfaces.nsIStringBundleService);
  var stringBundle = stringBundleService.createBundle("chrome://flock/locale/blog/blog.properties", localeService.getApplicationLocale());

  var label = stringBundle.GetStringFromName("flock.blog.discIO.selectFile");
}

Helper method: The third choice is to bind a helper method into your class or XPCOM service, such that the .properties file is loaded in the constructor and stored in a member variable. Then this helper method (usually called 'getEntity', or something similar) can be used to fetch individual strings without reloading the .properties file each time. This approach is best when you have a large number of strings handled programmatically across different XUL files, or without a XUL file at all (such as an nsIPromptService window).

function myClass() {
  var localeService = Components.classes["@mozilla.org/intl/nslocaleservice;1"].getService(Components.interfaces.nsILocaleService);
  var stringBundleService = Components.classes["@mozilla.org/intl/stringbundle;1"].getService(Components.interfaces.nsIStringBundleService);
  this.stringBundle = stringBundleService.createBundle("chrome://flock/locale/blog/blog.properties", localeService.getApplicationLocale());
}

myClass.prototype.getEntity = function myClass_getEntity(aName) {
  return this.stringBundle.GetStringFromName(aName);
}
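
The helper pattern can also be tried outside Gecko. The sketch below is a variation, not Flock's actual code: FakeBundle is a hypothetical stand-in for the string bundle created above, the bundle is passed into the constructor instead of being created there, and getEntity falls back to the key when a string is missing. It assumes, as a labeled assumption, that a lookup for a missing key throws.

```javascript
// Hypothetical stand-in for a string bundle so the sketch runs without XPCOM.
// ASSUMPTION: a lookup for a missing key throws.
function FakeBundle(strings) {
  this.strings = strings;
}

FakeBundle.prototype.GetStringFromName = function (aName) {
  if (!Object.prototype.hasOwnProperty.call(this.strings, aName)) {
    throw new Error("Missing string: " + aName);
  }
  return this.strings[aName];
};

// The bundle is handed over once and kept in a member variable.
function MyClass(aBundle) {
  this.stringBundle = aBundle;
}

// Fetch one string; if it is missing, return the key itself so the gap
// shows up in the UI instead of breaking the calling code.
MyClass.prototype.getEntity = function MyClass_getEntity(aName) {
  try {
    return this.stringBundle.GetStringFromName(aName);
  } catch (e) {
    return aName;
  }
};

var example = new MyClass(new FakeBundle({
  "flock.foo.button.test": "Sentence number one"
}));
```

Returning the long entity name as the fallback is a design choice: with names like flock.module.element, the missing string is easy to spot and to find in the sources.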

Hard code the string: This choice is hardest for internationalization, but simplest for the coder. You can hard code the string and add a comment so it can be internationalized once the dust settles:

function showError () {
  // FlockI18n: Internationalize me!
  var label = "foo";
}

When using this technique, please remember to keep the strings grouped together, so the I18n team doesn't have to change the algorithm later just to be able to use one of the three methods above.

XBL

XBL is just a combination of XUL and JS, so you can use the XUL method (*.dtd) and any of the JavaScript methods.

Making life easier

  • It is a good practice to change the entity name when you're changing its meaning. This makes it easier for localizers to discover and apply the change.
  • When you're creating an entity with a variable, add a comment describing what the variable is.
  • When you're removing code, also remove the entities it used. Retaining obsolete entities causes wasted effort from our volunteers.
  • Use a standard when you're writing localization notes.
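
To illustrate the advice about entities with variables: Gecko string bundles accept %S-style placeholders, including positional ones such as %1$S, so a localizer can reorder the variables to fit the target language. The sketch below imitates that substitution in plain JavaScript so it can run anywhere. formatString, the entity name, and the wording of the note are hypothetical and not a Gecko API.

```javascript
// In the .properties file, the entry might carry a localization note:
//
//   # LOCALIZATION NOTE (flock.photo.upload.progress):
//   #   %1$S is the number of photos uploaded so far, %2$S is the total.
//   flock.photo.upload.progress = Uploaded %1$S of %2$S photos

// Hypothetical helper: replaces %S (taken in order) and %1$S, %2$S, ...
// (taken by position) with the matching entry of args.
function formatString(template, args) {
  var next = 0;
  return template.replace(/%(?:(\d+)\$)?S/g, function (match, position) {
    var index = position ? parseInt(position, 10) - 1 : next++;
    // Leave the placeholder untouched if no argument was supplied for it.
    return index < args.length ? String(args[index]) : match;
  });
}
```

With positional placeholders, a translation is free to put the total first, for example "%2$S photos, %1$S uploaded", without any change to the calling code.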