
The New Localization System for Firefox is in!

After nearly 3 years of work, 13 Firefox releases, 6 milestones and a lot of bits flipped, I’m happy to announce that the project of integrating the Fluent Localization System into Firefox is now complete!

It means that we consider Fluent to be well integrated into Gecko and ready to be used as the primary localization system for Firefox!

Below is a story of how that happened.

3 years of history

At Mozilla All-Hands in December 2016 my team at the time (L10n Drivers) presented a proposal for a new localization system for Firefox and Gecko – Fluent (code name at the time – “L20n“).

The proposal was sound, but at the time the organization was crystallizing its vision for what later became known as Firefox Quantum, and couldn’t afford to pull additional people in to make the required transition, or to risk the stability of Firefox during the push for Quantum.

Instead, we developed a plan to spend the Quantum release cycle bringing Fluent to 1.0, modernizing the internationalization stack in Gecko and getting everything ready in place, so that once the Quantum release was complete, we’d be ready to land Fluent in Firefox!

Original schema of the proposed system integration into Gecko

We divided the work between two main engineers on the project – Staś Małolepszy took the lead of Fluent itself, while I became responsible for integrating it into Firefox.

My initial task was to refactor all of the locale management and the higher-level internationalization integration (date/time formatting, number formatting, plural rules etc.) to unify them around a common Unicode-backed model, all while avoiding any disruptions to the Quantum project and, by all means, avoiding any regressions.

I documented the progress of the first half of 2017 in a blog post “Multilingual Gecko in 2017”, which became a series of reports on the progress of our internationalization module, and ended with a summary of the whole rearchitecture – a rewrite of 90% of the code in the intl::locale component.

Around May 2017, we had ICU enabled in all builds and all the required APIs in place, including the unified mozilla::intl::LocaleService, and the time had come to plan how we were going to integrate Fluent into Gecko.

Planning

Measuring

Before we began, we wanted to understand what success would mean, and how we were going to measure progress.

Stating that we aim at making Fluent a full replacement for the previous localization systems in Firefox (DTD and .properties) may sound overwhelming. The path from landing the new API in Gecko to having all of our UI migrated would likely take years and many engineers, and without a good way to measure our progress, we’d be unable to evaluate it.

Original draft of a per-component dashboard

Together with Axel, Staś and Francesco, we spent a couple of days in Berlin going back and forth on what we should measure. After brainstorming through ideas such as fluent-per-component, fluent-per-XUL-widget and so on, we eventually settled on the simplest one – the percentage of localization messages that use Fluent.

Original draft of a global percentage view

We knew we could answer more questions with more detailed breakdowns, but each additional metric required additional work to collect it and keep it up to date. With limited resources, we slowly gave up on aiming for detail and focused on the big picture.

Getting the raw percentage of strings in Fluent first, and only then adding more details, allowed us to get the measurements up quickly and have them available independently of further additions. Big picture first.
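Conceptually, the metric is as simple as it sounds. A sketch of the idea (the counts here are invented; the real dashboard extracts them from mozilla-central):

    // Percentage of localization messages that use Fluent.
    // Counts are invented for illustration.
    const counts = { fluent: 4200, properties: 2600, dtd: 1900 };
    const total = counts.fluent + counts.properties + counts.dtd;
    const pct = ((100 * counts.fluent) / total).toFixed(2);
    console.log(`${pct}% of Firefox strings are in Fluent`);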

Staś took ownership of the measuring dashboard, wrote the code and the UI, and soon after we had https://www.arewefluentyet.com running!

AreWeFluentYet.com as of January 12th 2020

Later, with the help from Eric Pang, we were able to improve the design and I added two more specific milestones: Main UI, and Startup Path.

The dashboard is immensely useful, both for monitoring the progress and for evangelizing the effort, and today, if you visit any Mozilla office around the world, you’ll see it cycling on the screens in common areas!

Target Component

To begin, we needed to get agreement with the Firefox Product Team on the intended change to their codebase, and select a target for the initial migration to validate the new technology.

We had a call with the Firefox Product Lead, who advised that we start by migrating the Preferences UI as a non-startup, self-contained, but sufficiently large piece of UI.

It felt like the right scale. Not starting with the startup path limited the risk of breaking people’s Nightly builds, and the UI itself is complex enough to test Fluent against large chunks of text, giving our team and the Firefox engineers time to verify that the API worked as expected.

We now knew the main target would be Preferences, but we couldn’t just start migrating all of it. We needed smaller steps to validate that the whole ecosystem was ready for Fluent, and we needed to plan separate steps to enable Fluent everywhere.

I split the whole project into 6 phases, each one gradually building on top of the previous ones.

Outline of the phases used by the stakeholders to track progress

Multilingual Gecko in 2017

The outline

In January 2017, we set the course to get a new localization framework named Fluent into Firefox.

Below is the story of the work performed on the Firefox engine – Gecko – over the last year to make Fluent in Firefox possible. This has been a collaborative effort involving a lot of people from different teams. It’s impossible to document all of the work, so keep in mind that the following is just the story of the Gecko refactor, while many other critical pieces were being tackled outside of that scope.

Also, the nature of the project makes the following blog post long, text-heavy and light on pictures. I apologize for that and hope that the value of the content will offset this inconvenience and make it worth reading.


One year with the Firefox OS L10n framework

For several years now, the Localization team at Mozilla has been working on a modern localization framework based on the following set of principles and architectural choices that we consider fundamental for the next generation of multilingual UIs.

  • Principle 1: Localizers should be in control of translations. The localization framework should be grammar-agnostic, whether it’s about grammatical cases, genders, or tenses.  Localizers should be able to use the entire expressive power of their language to author translations which create the best experience for the users.
  • Principle 2: Language fallback should be robust and graceful. When a translation is missing or broken the user should be presented with a translation into the next best language, given their preferences. There might be more than one fallback language.
  • Principle 3: Translations should be isolated and asymmetric if needed. The source language of the application should not define the structure of the translations (e.g. the lack of pluralization in English should not make it impossible for other languages to use plurals in a given message; see the sketch after this list).
  • Principle 4: The framework should embrace the Web. Localization should react to changes of the runtime environment (e.g. resizing of the app’s window, change of orientation, incrementation of a number of unread messages) and should add as little overhead for developers as possible.
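To make Principle 3 concrete, here is a minimal sketch using today’s @fluent/bundle JavaScript library, a direct descendant of this work (the message id and translations are invented). The English source can stay flat (“tabs-count = { $count } tabs”), while Polish freely uses its plural variants:

    import { FluentBundle, FluentResource } from "@fluent/bundle";

    // Polish translation with three plural variants; the English source
    // has none, and neither side constrains the other.
    const ftl = [
      "tabs-count =",
      "    { $count ->",
      "        [one] Jedna karta",
      "        [few] { $count } karty",
      "       *[many] { $count } kart",
      "    }",
    ].join("\n");

    const bundle = new FluentBundle("pl", { useIsolating: false });
    bundle.addResource(new FluentResource(ftl));

    const msg = bundle.getMessage("tabs-count");
    console.log(bundle.formatPattern(msg.value, { count: 5 })); // "5 kart"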

Exactly one year ago, on April 8th 2014, Stas landed the initial rewrite of l10n.js – the localization framework used in Firefox OS. This set us on the path to realizing the vision of a modern localization framework driven by the design principles outlined above.

Since then, we have had a dedicated two-person team working full time on advancing this vision and learning how to improve upon it in the process.

The full year of work has resulted in many important features being developed for the platform, including:

  • Language packs: Small packages that decouple language resources from the application, allowing us to extend language coverage dynamically when users request it, even after the device has already been released on the market.
  • Pseudo-locales: Programmatically built language resources that emulate different languages, allowing developers to test their applications for multilingual problems before localizers have had time to provide translations (see the sketch after this list).
  • Asynchronous l10n: A major shift away from using synchronous APIs for retrieving localizations from JavaScript. This results in clean, race-condition-free code that is easier to write and maintain. It also sets us on the path to enabling runtime language fallback.
  • Security: While not traditionally a big topic in localization, having an open runtime ecosystem of localizations requires us to make sure that translations cannot accidentally or maliciously impact our code and break it.
  • Error reporting: We’ve made major advancements in helping developers and localizers find potential errors early. We reject malformed strings, report missing strings and duplicates, and recover from exceptions in our code.
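The pseudo-locales mentioned above deserve a tiny illustration. A toy “accented English” generator in the spirit of what we built (real generators also handle placeholders, markup and text expansion):

    // Map ASCII vowels to accented equivalents to produce a fake locale
    // that stays readable but exposes hardcoded strings.
    const ACCENTS = { a: "ȧ", e: "ḗ", i: "ī", o: "ǿ", u: "ū",
                      A: "Ȧ", E: "Ḗ", I: "Ī", O: "Ǿ", U: "Ū" };
    const pseudo = (s) => [...s].map((ch) => ACCENTS[ch] ?? ch).join("");

    console.log(pseudo("Set up your device")); // "Sḗt ūp yǿūr dḗvīcḗ"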

A couple of weeks ago we finalized the work scheduled for Firefox OS 2.2 and began development for the next major release. The clean and reliable API has given us a good base to start implementing the remaining components of the vision behind L20n in this cycle.

For the current cycle we have scheduled:

  • DOM Overlays: The ability for a localizer to use HTML syntax in their translations and to provide whole localized DOM fragments to be merged with developer-provided skeletons via a secure algorithm. This increases the system’s security and empowers localizers to provide better translations.
  • L20n format: One of the last remaining pieces of the puzzle is the new file format, designed to store localization data like multi-variant strings, entities with values and attributes, and variant selectors. This will allow us to start introducing new features to the system that are impossible with the current data storage formats.
  • Lightweight l10n contexts: Together with the whole platform, we want to make heavier use of the concept of multiple small localization contexts, replacing the single-context-per-app approach (sketched below). It will improve performance and isolation, resulting in easier maintenance.
  • API 3.0: Our current API still contains remnants of the old, synchronous API that we’d like to remove. Together with lightweight contexts, and on the path to a WebAPI, we want to make sure to organize our events, methods and objects to fit the design of other W3C APIs.
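For the lightweight contexts item, here is a rough sketch of the direction (the API shape, resource names and strings are all invented; the final design was still in flux at the time):

    // Each UI fragment owns a small context that loads only the resources
    // it needs, instead of one app-wide context holding everything.
    function createL10nContext(resourceIds, languages) {
      // Stand-in implementation so the sketch runs; the real thing would
      // fetch and parse the listed resource files for the given languages.
      const strings = { "wifi-header": "Wi-Fi" };
      return { formatValue: (id) => Promise.resolve(strings[id]) };
    }

    const settingsCtx = createL10nContext(["settings.l20n"], navigator.languages);
    settingsCtx.formatValue("wifi-header").then(console.log); // "Wi-Fi"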

With our ever-growing understanding of the environment, and as the web stack matures, we are also getting close to extracting the core of our framework to offer for standardization. That’s an exciting opportunity to fulfill the vision of both Firefox OS and L20n and bring a modern localization framework to the whole web, making it more multilingual and global.

Stas & Zibi


Reducing MozL10n+Gaia technical debt in Firefox OS 2.1 cycle

Firefox OS is becoming a more mature platform, and one of the components that is maturing with it is the mozL10n library.

In the 2.0 cycle, which is already feature complete, we introduced a new codebase based on the L20n project.

The 2.1 cycle we’re currently in is a major API cleanup effort. We’re reviewing how Gaia apps use mozL10n, migrating them to the new API and minimizing the code complexity.

Simplifying the code responsible for localizability of Firefox OS is crucial for our ability to maintain and bring new features to the platform.

There are four major areas we’re working on:

  • mozL10n.translate – with the introduction of Mutation Observers, we want to phase out manual DOM translation calls.
  • mozL10n.localize – this function is extremely hard to maintain and does enough “magic” to confuse devs.
  • mozL10n.get – manual l10n gets are the biggest cause of bugs and regressions in our code. They are synchronous, not retranslatable, and badly misused.
  • mock_l10n – many apps still use a custom MockL10n class in tests, and some even use real MozL10n code in tests. This makes tests harder to maintain and makes it harder to develop new l10n features.

We’re working on all four areas and would love to get your help.

It doesn’t matter if you are a Gaia app owner or if you’ve never written a patch for Gaia. If you know JavaScript, you can help us!

All of those bugs have instructions on how to start fixing, and I will be happy to mentor you.

We have time until October 13th. Let’s get Gaia ready for the next generation features we want to introduce soon! 🙂


Localization framework changes in Firefox OS 2.0 and plans for 2.1

On Monday we branched Firefox OS 2.0, the first branch to contain the new localization library developed by my team.

What landed for 2.0

Library landing and reactions

The library itself landed exactly two months ago. In order to avoid any potential regressions, we put a lot of work into ensuring that it matched the behavior of the old code it replaced. I believe we can now claim success, because after two months of baking on master we didn’t get any serious regressions that would require us to change anything in our code.

The new library comes with a lot of unit tests and is stricter than the old code, so we had to fix a couple of small bugs where code had been passing an object instead of a string to our API, and one where a test failed on an old machine with too little memory. Those were simple to catch and fix.

We also got a few requests to improve the console log output and error output that the library produces, in order to simplify developers’ work.

New features

Pseudo-locales

The major new thing that Stas completed in this cycle is support for pseudo-locales. While this could have been done with the old code, it was significantly easier with the new code thanks to architectural decisions like the separation of build-time and runtime code.

Pseudo-locales allow developers to evaluate their UI’s localizability against an artificially generated English-like locale to catch any hardcoded strings. The system also generates a right-to-left locale for testing purposes. Before that, we had been relying on often-outdated localizations that we kept with our source code. Now we can always test against a fully generated pseudo-localization.

mozL10n.once wrapper

Another new feature is the introduction of the mozL10n.once wrapper. We identified that a lot of Gaia apps wait for localization to be ready before they initialize themselves. That makes sense, since a lot of those apps want to work with the UI and localized strings, but the challenge in an asynchronous world is that you never know whether your code fired before or after mozL10n became ready.

Because of that, simply setting an event listener and waiting for the window.onlocalized event is not enough (what if it already fired before your code was launched?). Developers were using the mozL10n.ready wrapper, but the problem is that it was designed to re-fire on each retranslation, which meant that your init code fired every time the user changed language. That’s not the intended behavior, though admittedly a rare scenario. What’s worse, we retained all the init code in memory.

Now, with mozL10n.once, we can safely initialize code when l10n resources become available, and free the memory right after that.
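In code, the difference is simple; a sketch based on this post (“app-ready” is an invented string id):

    // Runs exactly once when l10n resources are ready – even if they were
    // ready before this call – and the callback can be freed afterwards.
    navigator.mozL10n.once(function init() {
      console.log(navigator.mozL10n.get("app-ready"));
    });

    // mozL10n.ready, by contrast, re-fires on every language change and
    // keeps the callback alive for the lifetime of the app.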

Uses of l10n API

On top of adding new features, we’ve been busy investigating and improving how the default Firefox OS apps interact with the localization API. That led to multiple design decisions, including the introduction of the mozL10n.once wrapper described above.

Once we had the new wrapper, we started analyzing the bootstrapping code of each and every Gaia app and updating it to use the proper l10n API. Twenty-two fixed bugs later, we’re done!

It’s incredible how much we were able to accomplish in just two short months. We feel much better right now about the bootstrap process and we have a clear picture on what we want to do next.

Use new navigator.languages API

Thanks to Mounir’s work on the navigator.languages API and its implementation, we were able to remove the only Mozilla-specific API in mozL10n. That means mozL10n should work in any modern browser now!
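For reference, this is the API as it ended up standardized; it runs in any modern browser today:

    // The user's preferred languages, most preferred first.
    console.log(navigator.languages); // e.g. ["pl", "en-US", "en"]

    // Fires when the preference list changes, so the UI language can be
    // re-negotiated at runtime.
    window.addEventListener("languagechange", () => {
      console.log("preferred languages changed:", navigator.languages);
    });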

What we’re working on for 2.1

2.1 will be as ambitious for us as 2.0 was. The hashtag of the work is still #cleanup, but this time it’s much more about modifying our API so that it’s more transparent to developers and requires less manual code in their apps.

entity.attributes become node attributes

The first thing we were able to land is the transition away from assigning l10n entity attributes as node properties. We cleaned up the hacks that had been used and switched to storing entity attributes as node attributes.

DOM overlays

Next, we have one leftover from the previous change, and that is the infamous innerHTML. We currently don’t have a clear way to inject a localizable DOM fragment in Gaia. Fortunately, we have one that fits perfectly in L20n. It’s called DOM Overlays, and we’re working on getting them into mozL10n. That will allow us to further secure the l10n API and remove innerHTML calls.
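To give a feel for the idea, here is a toy version of an overlay (the real algorithm is far more thorough about sanitizing and matching elements):

    // Merge a localizer-provided fragment into the developer's skeleton:
    // text and element order come from the translation, while attributes
    // (href, event handlers) stay under the developer's control.
    function overlay(targetEl, translation) {
      const tpl = document.createElement("template");
      tpl.innerHTML = translation;
      const devLink = targetEl.querySelector("a");
      targetEl.textContent = "";
      for (const node of [...tpl.content.childNodes]) {
        if (node.nodeName === "A" && devLink) {
          devLink.textContent = node.textContent; // localized label
          targetEl.appendChild(devLink);          // developer's attributes
        } else {
          targetEl.appendChild(document.createTextNode(node.textContent));
        }
      }
    }

    // overlay(noticeEl, 'Kliknij <a>tutaj</a>, aby kontynuować');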

Mutation Observers

The majority of localizability code in Gaia apps is related to the localization of DOM nodes. With Mutation Observers, we will be able to significantly reduce the number of manual calls to the mozL10n API; the majority of calls will just set the data-l10n-id attribute, with Mutation Observers doing the rest.

Not only will it reduce the use cases for mozL10n.translate and mozL10n.localize, but I expect it to cut by over 50% the number of manual mozL10n.get calls and the manual operations currently used to set the result of those calls on the node.

Mutation Observers will simplify Gaia code, reduce the number of bugs related to language switching, and get us closer to the runtime l10n API that we want to offer for Gecko.
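A sketch of the mechanism (Gaia’s actual implementation also has to handle newly inserted nodes, arguments and batching):

    // Retranslate a node whenever its data-l10n-id attribute changes.
    const observer = new MutationObserver((mutations) => {
      for (const mutation of mutations) {
        const node = mutation.target;
        node.textContent = navigator.mozL10n.get(node.getAttribute("data-l10n-id"));
      }
    });
    observer.observe(document.body, {
      attributes: true,
      attributeFilter: ["data-l10n-id"],
      subtree: true,
    });

    // Developer code becomes declarative – no manual get() call, no manual
    // DOM write ("download-complete" is an invented id):
    document.querySelector("#status").setAttribute("data-l10n-id", "download-complete");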

Bootstrap

There are still some interesting edge cases around how code bootstrapping relies on particular pieces of the environment. Does your code need the DOM to be interactive? Does it need l10n resources to be loaded? Or maybe you need the DOM to be localized? All those events happen asynchronously, and we currently don’t have a clean way to guard against whatever combination of them your init code may require.

We’re working on a bootstrapping wrapper that will allow app developers to simply declare which pieces of the environment their initialization should be blocked by.

That will further secure the app bootstrapping process and limit the risk of race conditions.
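A sketch of what such a wrapper could look like (the API shape is invented, and Promises are used for brevity):

    // Resolve once every requested piece of the environment is ready.
    function whenEnvReady({ interactive = true, l10n = true } = {}) {
      const waits = [];
      if (interactive) {
        waits.push(new Promise((resolve) => {
          if (document.readyState !== "loading") return resolve();
          document.addEventListener("DOMContentLoaded", resolve, { once: true });
        }));
      }
      if (l10n) {
        waits.push(new Promise((resolve) => navigator.mozL10n.once(resolve)));
      }
      return Promise.all(waits);
    }

    // An app declares what its init code must be blocked by:
    whenEnvReady({ interactive: true, l10n: true }).then(() => {
      // Safe to touch the DOM and read localized strings here.
    });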

mozDOMLocalized

One part of the bootstrap puzzle is how we fire events when certain bootstrap stages complete. Right now we fire the window.onlocalized event, which means that mozL10n resources have been loaded, but tells us nothing about whether the document’s DOM has been localized (and is ready to be displayed).

With the work on the new event, we’ll be able to remove the global one and settle on triggering one event on document and one on the mozL10n object. Did I tell you that we’re still cleaning up? 🙂

Move mozL10n to document

We originally placed the mozL10n object as a property on the navigator object. Because our API is transitioning to be per-document, keeping the mozL10n API on navigator becomes an inconsistency and an obstacle. It hardly fits a world with iframes, Shadow DOM and HTML Templates. We’re going to move it to document.mozL10n.

Remove inline l10n

One of the build-time optimizations that we have in Gaia is called inline l10n. We store a portion of the l10n resources within each HTML file in order to localize the UI before it is displayed. It’s not very scalable and costs us memory and performance, but historically it helped us prevent flashes of untranslated content. We hope to remove this optimization in this cycle, which will significantly simplify our internal code and give us some small memory and performance wins.

Is this L20n yet?

While we introduce L20n concepts into mozL10n, we’re still pretty far from being able to say that we support the L20n API in Gaia. There’s a lot of work to do, and it’s going to be challenging as we port L20n concepts to Gaia and merge the lessons we learned while working on Gaia into the L20n spec and implementation. What we hope to end up with is a single codebase used in Gaia and offered to web developers.

It’s an exciting journey, and I’m so happy to be making Firefox OS’s localizability the most modern among all OSes!


L20n – what to add before 1.0

As I mentioned in my last blog post, we’re narrowing down the list of features that we’re willing to consider for inclusion into L20n 1.0 prior to its freeze and release.

Here’s the list:

Each entry gives the feature, its driver, and the target milestone:

  • difference between entities/macros from the resource and variables provided by the developer (stas, 1.0)
  • difference between public and private attributes/entities (gandalf, 1.0)
  • default values for hashes/arrays (gandalf, 1.0)
  • globals namespace (gandalf, 1.0)
  • import command (gandalf, 1.0)
  • conditional blocks (gandalf, 1.0)
  • value as ID (gettext mode) (stas, ?)
  • string as ID (simple gettext mode) (stas, ?)
  • key as string (stas, ?)
  • relative referrals (gandalf, 1.0)
  • dependency list (gandalf, 1.0)
  • computer-readable comments (gandalf, ?)
  • multi-language resource files (gandalf, 1.0)
  • switch expression (gandalf, ?)
  • attribute indexes (stas, 1.1)
  • nested indexes (gandalf, 1.0)
  • expression errors (gandalf, 1.0)
  • workflow (gandalf, 1.0)
  • resource file syntax (stas, with support from kaze and fantsai, 1.0)
  • forbid referencing public entities (stas, ?)
  • macro attributes (gandalf, ?)

Each of those features represents a feedback item we received, and we’re trying to evaluate them ASAP in order to finalize the parser/interpreter pair and then work on the workflow toolchain for L20n 1.0.

If you want to discuss any of the items, join our localization-20 group and start a new thread for each feature.

If you want to add a new feature to be considered, start a wiki article in L20n/Features namespace.


L20n, feedback round

The last months have been extremely busy for L20n. I’ve basically focused 100% of my time on the project, simultaneously driving multiple aspects of it to completion.

L20n is a very complex project, not only technically, but also socially. Localization technologies have always been of minor importance to most of the software world, so we never really developed technologies that could in any way match the complexity of human languages. The most common mindset, even among those who have to deal with localization, is that you can get “most of the stuff” done with simple key-value pair lists, where the English string is the key and the target localization string is the value.

It’s a bit like claiming that most of Firefox front end could be written in BASIC.

L20n is on the other side of the spectrum. It brings localization technology to a new level, and as a result it breaks almost all the paradigms of what people are used to doing with l10n and how it “usually works”.

As a result, the major challenge in helping someone learn what L20n is, is convincing them to stop trying to map its components onto the concepts they know from other l10n frameworks. It just won’t do.

The reward is that once you get beyond the game of “how does L20n relate to Gettext / DTD / Properties?”, people reach the “Oooh!” moment, and what follows is a litany of ideas of what would be nice to have if we’re reinventing localization technologies. I love it 🙂

As many project leaders before me have observed, getting close to a target milestone always turns you from a visionary leader who sets the goals and drives them to completion into some sort of butcher who says “no” to everything except the most crucial additions, for fear of a never-ending cycle of adding more and more without ever getting your project released.

So here we are. For the last month we’ve been working pretty closely with several projects – Boot 2 Gecko, Jetpack, Firefox – and we got plenty of feedback, from minor additions to major suggestions. Now is the time to narrow down the list of changes we’re ready to incorporate for 1.0, close the list, work toward the release, and push everything else back to L20n:Next.

In the next blog post I’ll list the proposals and the status of the discussion on those.


L20n gets tangible

While Firefox 4 has been the main focus for the last weeks and months, I’ve also been making progress on the next iteration of Mozilla’s localization technology – L20n.

Here are three things that constitute a milestone for me and should make it much easier to test and play with its features.

Toolbox

The Toolbox guides you through examples of various localization scenarios and how L20n solves them. It blends incremental learning of the features available to both the developer and the localizer. At the bottom it contains several more complex examples that should rarely happen in practice, but constitute the latter part of Pike’s “easy things easy, complex things possible” mantra.

XPCShell tests

It’s a small set of tests (two at the moment) that run l20n code. It’s a great starting point to play with how the library and the format work. You can adjust the compiled code or the library code and see if it gives the expected result.

Live toolbox

What can be better than a toolbox for a geek? Yes. A live toolbox. A toolbox you can not only read, but actually touch, change, hack on, and see the result live.

It’s a hack itself, so don’t be harsh please, but it does the job. I even included a set of 7 examples (example1 to example7) that correspond to what you can find in the Toolbox. Feel free to modify the L20n code, see if it compiles properly, play with the compiled code, change the HTML or JS, and see the results live!

Also, if you encounter a bug, you can save your code and send it my way so that I can investigate. The compiler is just an initial approach, and a moving target right now since we still don’t have a finalized JS structure schema, but it works for most simple and medium-complexity cases, so I’d say it’s ready for you to play with!

Next steps

Now that we have Firefox 4 released (yay!) and mozilla-central is open again, I hope to work on landing the initial set of L20n Gecko bindings, which first requires some updates to the patches themselves. With that in place, we’ll be able to start investigating the migration away from the current DTD/properties formats into the wonderland of L20n.


Mozilla Summit 2010 – Localization 2.0 talk

Here are the slides I used for my Mozilla Summit 2010 Localization 2.0 talk.

It was a very tough talk to give. Hard to grasp, hard to explain. I originally wanted to devote it exclusively to L20n and make it a form of tech talk, but eventually figured out that would not work and that there was a much broader vision I needed to explain. Thus, a few hours before the talk, I started rewriting it and ended up with what you can find here.

Of course, the slides alone will tell you just a small part of the story, but it’s better than nothing. 😉

Thanks to all who participated in this! I know it eats a lot of brain cycles to process and it was already 4pm, but I hope you enjoyed it! 🙂


My vision of the future of Mozilla localization environment (part1)

After two parts of my vision for local communities, I’d like to make a sudden shift and write a bit about the technical aspects of localization. The reason is trivial: the third and last part of the social story is the most complex one, and it requires a lot of thinking to get right.

In the meantime, I’m working on several aspects of our l10n environment, and I’d like to share with you some of the experiences and hopes around it.

Changes, changes

What I wrote in the social vision, part 1, about how the landscape of Mozilla is changing and getting more complex, holds true from the localization perspective as well, and it requires us to adapt in a similar fashion as it requires local communities to.

There are three major shifts that I observe that make our past approach insufficient:

  1. User interfaces are becoming more sophisticated than ever
  2. Product forms are becoming more diversified, and new forms of mashups appear that blend web data, UI and content
  3. Different products have different “refresh cycles”, in which different amounts of content/UI are replaced

Historically, we used DTD and .properties for most of our products. The biggest issue with DTD/.properties is that those two formats were never meant to be used for localization. We adapted, exploited and extended them to match some of our needs, but their limitations are pretty obvious.

In response to those changes, we spent a significant amount of time analyzing and rethinking l10n formats to address the needs of Mozilla today, and we came up with three distinct forms of data that require localization, and three technologies that we want to use.

L20n

Our major products, like Firefox, Thunderbird, SeaMonkey or Firefox Mobile, are becoming more sophisticated. We want to show as little UI as possible; each pixel is sacred. If we decide to take that piece of the screen from the user, we want to use it to the maximum. Buttons should be small and toolbars denser – they should present and offer more power, be intuitive, and let the user keep full control over the situation.

That poses a major challenge for localization. Each message must be precise, clear and natural to the user, to minimize their confusion. Strings are becoming more complex, with more data elements influencing them. It’s becoming less common to have plain, static sentences. It’s becoming more common that a string will show in a tooltip, will have little screen space (mobile), and will depend on the state of other elements (the number of open tabs, the time, the gender of the user).

DTD/.properties are absolutely not ready to meet those requirements, and the more hacks we implement, the harder it will be to maintain the product and its localizations. Unfortunately, the other technologies we considered, like gettext, XLIFF or Qt’s TS file format, share most of the same limitations and have themselves been actively exploited for years (like gettext’s msgctxt).

Knowing that, we started thinking about what a localization format/technology would look like if we could start today. From scratch. Knowing what we know. With the experience that we have.

We knew that we would like to solve, once and for all, the problem of the astonishing diversity of languages, linguistic rules, forms and variables. We knew we’d like to build a powerful tool set that would allow localizers to maintain their localizations more easily, and to localize with more context information (like where the string will be used) than ever. We knew that we wanted to simplify the cooperation between developers and localizers. And we knew we would love to make it easy to use for everyone.

Axel Hecht came up with the concept of L20n: a format that shifts several paradigms of software localization by enabling algorithmic power outside of the source code. His motto is “make easy things easy, and complex things possible”, and that’s exactly what L20n does.

It doesn’t make sense to try to summarize L20n here – I’ll dig deeper in a separate blog post in this series – but what’s important for the sake of this one is that L20n is supposed to be a new beginning, different from previous generations of localization formats, with a different definition of the contract between localizer and developer called “an entity”.

It targets software UI elements, should work in any environment (yes, Python, PHP, Perl too) and allows for building natural sentences with the full power of each language, without leaking this complexity to other locales or to the developers themselves. I know, it sounds bold, but we’re talking about Pike’s idea, right?
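For a taste of what this new contract eventually became: in today’s Fluent syntax, the direct descendant of L20n (identifiers invented for illustration), a single entity can carry a value, extra attributes for the same widget, and logic driven by runtime state, none of which leaks into the developer’s code:

    // The developer only ever references "undo-closed-tabs"; the variants
    // and the attribute below belong entirely to the localizer.
    const ftl = [
      "undo-closed-tabs =",
      "    { $tabCount ->",
      "        [one] Undo the closed tab",
      "       *[other] Undo { $tabCount } closed tabs",
      "    }",
      "    .tooltiptext = Reopen what you just closed",
    ].join("\n");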

Common Pool

While our major products require more complexity, we’re also getting more new products at Mozilla, and very often they require little UI, because they are meant to be non-interruptive. Their localization entities are plain and simple, short, and usually have a single definition and translation. The land of extensions is the most prominent example of this approach, but more and more of our products have such needs.

Think of an “OK” and a “Cancel” button. In 98% of cases, their translations are the same, no matter where they are used. In 98% of cases, their translations are the same among all products and platforms. On top of that, there are three kinds of exceptions.

First, sometimes the platform uses a different translation of the word. MacOS may have a different translation of “Cancel” than Windows. It’s an easy, systematic difference shared among all products. It does not make any sense to expose this complexity in each localization case and require preparing each one separately for this exception.

Second, sometimes an application is specific enough to use a very specific translation of a given word. Maybe it is a medical application? A low-level development tool, or one for lawyers only? In that case, once again, the difference is easy to catch, and there’s a very clear layer at which we should make the switch. Exposing it lower in the stack, for each entity use, does not make sense.

Third, it is possible that a single use of an entity may require a different translation for a given language. That’s an extremely rare case, but a legitimate one. Once again, it doesn’t make sense to leak this complexity onto others.

The Common Pool addresses exactly this type of localization: simple, repetitive entities that are shared among many products. In order to address the exceptions, we’re adding a system of overlays which allows a localizer to specify a separate translation at one of the three levels described above (possibly more).
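Schematically, the overlay lookup could work like this (all names and translations are invented; the design was still being worked out when this was written):

    // Most specific wins: per-use, then per-app, then per-platform,
    // then the shared Common Pool entry.
    const commonPool = { cancel: "Anuluj" };
    const platformOverlay = { mac: { cancel: "Poniechaj" } };
    const appOverlay = { "medical-app": { cancel: "Przerwij" } };
    const entityOverlay = { "settings/cancel": "Zaniechaj" };

    function resolve(id, { platform, app, usage } = {}) {
      return (
        entityOverlay[usage] ??
        appOverlay[app]?.[id] ??
        platformOverlay[platform]?.[id] ??
        commonPool[id]
      );
    }

    resolve("cancel", { platform: "mac" });                     // "Poniechaj"
    resolve("cancel", { platform: "mac", app: "medical-app" }); // "Przerwij"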

L20n and the Common Pool complement each other, and we’d like to make sure that they can be used together, depending on the potential complexity of the entity.

Rich Content Localization

The third type is very different from the two above. Mozilla today produces a lot of content that goes way beyond product UI, and localization formats are terrible at dealing with such rich content – the sentences, paragraphs and pages of text mixed with headers and footers that fill all of our websites.

This content is also diversified. SUMO or MDC articles may be translated into a significantly different layout, and their source versions are often updated with minor changes that should not invalidate the whole translation. On the other hand, small event-oriented websites like Five Years of Firefox or Browser Choice have different update patterns than project pages like Test Pilot or Drumbeat.

In this case, trying to build the social contract between developers and localizers by wrapping pieces of text into uniquely identifiable objects called entities, signing them, and matching translation to source like we do with product UI doesn’t make sense. Localizers need great flexibility; some changes should be propagated to localizations automatically, and only some should invalidate them.

For this last case, we need very different tools, specific to document/web content localization, and if you ever tried Verbatim or direct source HTML localization, you probably noticed how far that is from an optimal solution.

Is that all?

No, I don’t think so. But those are the three that we identified, and we believe we have ideas for how to address them using modern technologies. If you see flaws in this logic, make sure to share your thoughts.

Why am I writing about this?

Well, I’m lucky enough to be part of the L10n Drivers team at Mozilla, and I happen to be involved, in different ways, in experiments and projects that are going to address each of those three concepts. It’s exciting to be in a position that allows me to work on this, but I know that we, the l10n-drivers, will not be able to make it on our own.

We will need help from the whole Mozilla project. We will need support from people who produce content, from those who create interfaces, and of course from those who localize – from all of you.

This will be a long process, but it gives us a chance to bring localization to the next level and for the first time ever, make computer user interfaces look natural.

In each of the following blog posts, I’ll focus on one of the above types of localization and present the projects that aim at this goal.