Tag Archives: l20n

Localization framework changes in Firefox OS 2.0 and plans for 2.1

On Monday we branched Firefox OS 2.0, the first branch to contain the new localization library developed by my team.

What landed for 2.0

Library landing and reactions

The library itself landed exactly two months ago. To avoid any potential regressions, we put a lot of work into ensuring that it matches the behavior of the old code it replaced. I believe we can now claim success: after two months of baking on master, we haven’t seen any serious regressions that would require us to change anything in our code.

The new library comes with a lot of unit tests and is stricter than the old code, so we had to fix a couple of small bugs where code had been passing an object instead of a string to our API, and one where a test failed on an old machine with too little memory. Those were simple to catch and fix.

We also got a few requests to improve the console log and error output that the library produces, in order to simplify developers’ work.

New features

Pseudo-locales

The major new thing that Stas completed in this cycle is support for pseudo-locales. While this could have been done with the old code, it was significantly easier with the new one thanks to architectural decisions like the separation of build-time and runtime code.

Pseudo-locales allow developers to evaluate their UI’s localizability against an artificially generated, English-like locale in order to catch any hardcoded strings. We also generate a right-to-left pseudo-locale for testing purposes. Before, we relied on often-outdated localizations kept in our source tree; now we can always test against a fully generated pseudo-localization.
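To illustrate the idea, here is a toy sketch (not the actual Gaia build code) of how an accented-English pseudo-locale can be generated: every string that went through the l10n API comes out accented, so any hardcoded string that bypassed it stands out immediately.

    // Toy pseudo-locale generator: replace Latin vowels with accented
    // variants; the real build-time transform is more elaborate.
    var ACCENTED = { a: 'å', e: 'ę', i: 'ï', o: 'ö', u: 'ü',
                     A: 'Å', E: 'Ę', I: 'Ï', O: 'Ö', U: 'Ü' };

    function pseudolocalize(source) {
      return source.replace(/[aeiouAEIOU]/g, function (ch) {
        return ACCENTED[ch];
      });
    }

    pseudolocalize('Settings'); // 'Sęttïngs' – went through the API
                                // a hardcoded 'Settings' stays plain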

mozL10n.once wrapper

Another new feature is the introduction of the mozL10n.once wrapper. We identified that a lot of Gaia apps wait for localization to be ready before they initialize themselves. That makes sense, since most of those apps want to work with the UI and localized strings, but the challenge in an asynchronous world is that you never know whether your code runs before or after mozL10n is ready.

Because of that, simply setting an event listener and waiting for the window.onlocalized event is not enough (what if it already fired before your code was loaded?). Developers were using the mozL10n.ready wrapper, but the problem is that it was designed to re-fire on each retranslation, which meant your init code ran every time the user changed the language. That’s not the intended behavior, though admittedly a rare occurrence. What’s worse, we retained all the init code in memory.

Now, with mozL10n.once, we can safely initialize code when l10n resources are available and free that memory right afterwards.
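A minimal usage sketch (the callback body and the string id are hypothetical; mozL10n.once and mozL10n.get are the APIs discussed in this post):

    // Runs exactly once, whether l10n resources were already loaded or
    // not; the callback can be garbage-collected afterwards.
    navigator.mozL10n.once(function initApp() {
      document.getElementById('status').textContent =
        navigator.mozL10n.get('status-ready'); // hypothetical string id
    });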

Uses of l10n API

On top of adding new features, we’ve been mostly busy investigating and improving how the default Firefox OS apps interact with the localization API. That led to multiple design decisions, including the introduction of mozL10n.once described above.

Once we had the new wrapper, we started analyzing the bootstrapping code of each and every Gaia app and updating it to use the proper l10n API. Twenty-two fixed bugs later, we’re done!

It’s incredible how much we were able to accomplish in just two short months. We feel much better right now about the bootstrap process and we have a clear picture on what we want to do next.

Use new navigator.languages API

Thanks to Mounir’s work on the navigator.languages API and its implementation, we were able to remove the only Mozilla-specific API in mozL10n. That means mozL10n should now work in any modern browser!
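For illustration, a hedged sketch of language negotiation on top of the standard API (the supported-locales list here is made up):

    // navigator.languages lists the user's preferred locales, most
    // preferred first; pick the first one we have resources for.
    var SUPPORTED = ['en-US', 'fr', 'pl']; // hypothetical app locales

    var preferred = navigator.languages || [navigator.language];
    var locale = preferred.filter(function (lang) {
      return SUPPORTED.indexOf(lang) !== -1;
    })[0] || 'en-US';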

What we’re working on for 2.1

2.1 will be as ambitious for us as 2.0 was. The hashtag of the work is still #cleanup, but now it’s much more about modifying our API so that it’s more transparent to developers and requires less manual code in their apps.

entity.attributes become node attributes

The first thing we were able to land is the transition away from assigning l10n entity attributes as node properties. We cleaned up the hacks that had been used and switched to setting entity attributes as node attributes.
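A simplified sketch of the difference (entity here stands in for a resolved l10n entity with a value and a map of attributes; the actual code differs):

    function applyTranslation(node, entity) {
      node.textContent = entity.value;
      for (var name in entity.attributes) {
        // before: node[name] = entity.attributes[name];   (JS property)
        node.setAttribute(name, entity.attributes[name]);  // DOM attribute
      }
    }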

DOM overlays

Next, we have one leftover from the previous change, and that is the infamous innerHTML. We currently don’t have any clear way to inject a localizable DOM fragment in Gaia. Fortunately, L20n has a solution that fits perfectly. It’s called DOM Overlays, and we’re working on getting them into mozL10n. That will allow us to further secure the l10n API and remove all innerHTML calls.
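Roughly, the idea is this (a simplified sketch, not the actual overlay algorithm): a translation may carry inline markup, but it is parsed in an inert document and merged with the source element, so privileged attributes always come from the app and no raw localized HTML is ever injected directly.

    // Merge a translation containing markup into a source element,
    // keeping the source's privileged attributes (href, etc.).
    function overlayElement(sourceElement, translation) {
      // parse in an inert document: nothing executes during parsing
      var translated = new DOMParser()
        .parseFromString(translation, 'text/html').body;

      Array.prototype.forEach.call(
        translated.querySelectorAll('*'), function (node) {
          // reuse the matching element from the source DOM so that
          // functional attributes stay under the app's control
          var source = sourceElement.querySelector(node.localName);
          if (source) {
            while (node.firstChild) {
              source.appendChild(node.firstChild);
            }
            node.parentNode.replaceChild(source, node);
          }
        });

      sourceElement.textContent = '';
      while (translated.firstChild) {
        sourceElement.appendChild(translated.firstChild);
      }
    }

    // e.g. overlayElement(notice, 'Please <a>sign in</a> to continue.');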

Mutation Observers

The majority of localization code in Gaia apps deals with localizing DOM nodes. With Mutation Observers we will be able to significantly reduce the number of manual calls to the mozL10n API: most of the time, code will just set the data-l10n-id attribute and Mutation Observers will do the rest.
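A hedged sketch of the mechanism (not the actual Gaia patch; translateNode is a stand-in for the library’s internal translation routine):

    // Watch the document: translate any element whose data-l10n-id
    // changes, and any newly inserted elements.
    var observer = new MutationObserver(function (mutations) {
      mutations.forEach(function (mutation) {
        if (mutation.type === 'attributes') {
          translateNode(mutation.target);
        } else {
          Array.prototype.forEach.call(mutation.addedNodes,
            function (node) {
              if (node.nodeType === Node.ELEMENT_NODE) {
                translateNode(node);
              }
            });
        }
      });
    });

    observer.observe(document, {
      subtree: true,
      childList: true,
      attributes: true,
      attributeFilter: ['data-l10n-id']
    });

    function translateNode(node) {
      var id = node.getAttribute('data-l10n-id');
      if (id) {
        node.textContent = navigator.mozL10n.get(id);
      }
    }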

Not only will this reduce the need for mozL10n.translate and mozL10n.localize, but I expect it to cut by over 50% the number of manual mozL10n.get calls and the manual operations currently used to set the result of those calls on a node.

Mutation Observers will simplify Gaia code, reduce the number of bugs related to language switching, and get us closer to the runtime l10n API that we want to offer for Gecko.

Bootstrap

There are still some interesting edge cases around how code bootstrapping relies on particular pieces of the environment. Does your code need the DOM to be interactive? Does it need l10n resources to be loaded? Or maybe you need the DOM to be localized? All of those events happen asynchronously, and we currently have no clean way to guard against whichever combination of them your init code may require.

We’re working on a bootstrapping wrapper that will allow app developers to simply declare which pieces of the environment their initialization should wait for.

That will further secure the app bootstrapping process and limit the risk of race conditions.
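A hypothetical sketch of what such a wrapper could look like (all the names here are invented for illustration, not the actual Gaia API):

    // Each readiness condition is a promise; init fires once every
    // condition the app declared has been met.
    function whenReady(conditions, init) {
      Promise.all(conditions).then(init);
    }

    var domInteractive = new Promise(function (resolve) {
      // guard against the event having fired before this code ran
      if (document.readyState !== 'loading') {
        resolve();
      } else {
        document.addEventListener('DOMContentLoaded', resolve);
      }
    });

    var l10nReady = new Promise(function (resolve) {
      navigator.mozL10n.once(resolve);
    });

    whenReady([domInteractive, l10nReady], function startApp() {
      // safe: the DOM is interactive and l10n resources are loaded
    });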

mozDOMLocalized

One part of the bootstrap puzzle is how we fire events as bootstrap stages complete. Right now we fire the window.onlocalized event, which means that mozL10n resources have been loaded but says nothing about whether the document’s DOM has been localized (and is ready to be displayed).

With the work on the new event, we’ll be able to remove the global one and settle on one event fired on the document and one on the mozL10n object. Did I tell you that we’re still cleaning up? :)
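Assuming the document-level event keeps the name from this section’s heading (a sketch; the final name may differ):

    // Fires once the document's DOM has been translated, i.e. the page
    // is safe to reveal without a flash of untranslated content.
    document.addEventListener('mozDOMLocalized', function onLocalized() {
      document.removeEventListener('mozDOMLocalized', onLocalized);
      document.body.classList.remove('hidden'); // hypothetical class
    });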

Move mozL10n to document

We originally placed the mozL10n object as a property on the navigator object. As our API transitions to be per-document, keeping mozL10n on navigator becomes an inconsistency and an obstacle; it hardly fits a world of iframes, Shadow DOM and HTML Templates. We’re going to move it to document.mozL10n.

Remove inline l10n

One of the build-time optimizations we have in Gaia is called inline l10n: we store a portion of the l10n resources within each HTML file in order to localize the UI before it is displayed. It doesn’t scale well and costs us memory and performance, but it has historically helped us prevent flashes of untranslated content. We hope to remove this optimization in this cycle, which will significantly simplify our internal code and give us some small memory and performance wins.

Is this L20n yet?

While we introduce L20n concepts into mozL10n, we’re still pretty far from being able to say that we support the L20n API in Gaia. There’s a lot of work to do, and it’s going to be challenging as we port L20n concepts to Gaia and merge the lessons learned from Gaia back into the L20n spec and implementation. What we hope to end up with is a single codebase used in Gaia and offered to web developers.

It’s an exciting journey and I’m so happy to make Firefox OS’s localizability the most modern among all OSes!

L20n – what to add before 1.0

As I mentioned in my last blog post, we’re narrowing down the list of features that we’re willing to consider for inclusion in L20n 1.0 prior to its freeze and release.

Here’s the list:

Feature – driver – target milestone:

  difference between entities/macros from the resource and variables provided by the developer – stas – 1.0
  difference between public and private attributes/entities – gandalf – 1.0
  default values for hashes/arrays – gandalf – 1.0
  globals namespace – gandalf – 1.0
  import command – gandalf – 1.0
  conditional blocks – gandalf – 1.0
  value as ID (gettext mode) – stas – ?
  string as ID (simple gettext mode) – stas – ?
  key as string – stas – ?
  relative referrals – gandalf – 1.0
  dependency list – gandalf – 1.0
  computer-readable comments – gandalf – ?
  multi-language resource files – gandalf – 1.0
  switch expression – gandalf – ?
  attribute indexes – stas – 1.1
  nested indexes – gandalf – 1.0
  expression errors – gandalf – 1.0
  workflow – gandalf – 1.0
  resource file syntax – stas (support from: kaze, fantasai) – 1.0
  forbid referencing public entities – stas – ?
  macro attributes – gandalf – ?

Each of those features represents a feedback item we received, and we’re trying to evaluate them ASAP in order to finalize the parser/interpreter pair and then work on the workflow toolchain for L20n 1.0.

If you want to discuss any of the items, join our localization-20 group and start a new thread for each feature.

If you want to add a new feature for consideration, start a wiki article in the L20n/Features namespace.


L20n, feedback round

The last few months have been extremely busy for L20n. I basically focused 100% of my time on the project, simultaneously driving multiple aspects of it to completion.

L20n is a very complex project, not only technically but also socially. Localization technologies have always been of minor importance to most of the software world, so we never really developed technologies that could come anywhere close to matching the complexity of human languages. The most common mindset, even among those who have to deal with localization, is that you can get “most of the stuff” done with simple key-value pair lists, where the English string is the key and the target localization string is the value.

It’s a bit like claiming that most of the Firefox front end could be written in BASIC.

L20n is on the other side of the spectrum. It brings localization technology to a new level and, as a result, breaks almost all the paradigms of what people are used to doing with l10n and how it “usually works”.

As a result, the major challenge in helping someone learn what L20n is lies in convincing them to stop trying to map its components onto the concepts they know from other l10n frameworks. It just won’t do.

The reward is that once people get beyond the game of “how does L20n relate to Gettext / DTD / Properties?”, they hit the “Oooh!” moment, and what follows is a litany of ideas about what would be nice to have if we’re reinventing localization technology anyway. I love it :)

As many project leaders before me have observed, getting close to a target milestone always turns you from a visionary leader who sets goals and drives them to completion into some sort of butcher who says “no” to everything except the most crucial additions, for fear of a never-ending cycle of adding more and more without ever getting your project released.

So here we are. For the last month we’ve been working pretty closely with several projects – Boot 2 Gecko, Jetpack, Firefox – and we got plenty of feedback, from minor additions to major suggestions. Now is the time to narrow down the list of changes we’re ready to incorporate for 1.0, close the list, work toward the release, and push everything else back to L20n:Next.

In the next blog post I’ll list the proposals and the status of the discussion on those.

L20n gets tangible

While Firefox 4 has been the main focus of the last weeks and months, I’ve also been making progress with the next iteration of Mozilla’s localization technology – L20n.

Here are three things that constitute a milestone for me and should make it much easier to test and play with its features.

Toolbox

The Toolbox guides you through examples of various localization scenarios and how L20n solves them. It blends incremental learning of the features available to both developer and localizer. At the bottom it contains several more complex examples that should rarely come up in practice but constitute the latter part of Pike’s “easy things easy, complex things possible” mantra.

XPCShell tests

It’s a small set of tests (two at the moment) that run L20n code. It’s a great starting point for playing with how the library and the format work. You can adjust the compiled code or the library code and see if it gives the expected result.

Live toolbox

What can be better than a toolbox for a geek? Yes. A live toolbox. A toolbox you can not only read, but one that you can actually touch, change, hack on and see the result live.

It’s a hack itself, so don’t be harsh pls, but it does the job. I even included a set of 7 examples (example1 to example7) that correspond to what you can find in the Toolbox. Feel free to modify the L20n code, see if it compiles properly, play with the compiled code, change the HTML or JS and see the results live!

Also, if you encounter a bug, you can save your code and send it my way so that I can investigate. The compiler is just an initial approach, and a moving target right now since we still don’t have a finalized JS structure schema, but it works for most simple and medium-complexity cases, so I’d say it’s ready for you to play with!

Next steps

Now that Firefox 4 is released (yay!) and mozilla-central is open again, I hope to work on landing the initial set of L20n Gecko bindings, which first requires some updates to the patches themselves. With that in place, we’ll be able to start investigating the migration away from the current DTD/properties formats into the wonderland of L20n.


Mozilla Summit 2010 – Localization 2.0 talk

Here are the slides I used for the Mozilla Summit 2010 Localization 2.0 talk.

It was a very tough talk to give. Hard to grasp, hard to explain. I originally wanted to devote it exclusively to L20n and make it a form of tech talk, but eventually figured out that it wouldn’t work and that there was a much broader vision I needed to explain. So, a few hours before the talk, I started rewriting it and ended up with what you can find here.

Of course slides alone will tell you just a small part of the story, but it’s better than nothing. ;)

Thanks to all who participated in this! I know it eats a lot of brain cycles to process and it was already 4pm, but I hope you enjoyed it! :)

My vision of the future of Mozilla localization environment (part1)

After two parts of my vision of local communities, I’d like to make a sudden shift and write a bit about the technical aspects of localization. The reason is trivial: the third and last part of the social story is the most complex one and requires a lot of thinking to get right.

In the meantime, I’m working on several aspects of our l10n environment and I’d like to share some of the experiences and hopes around it.

Changes, changes

What I wrote in part 1 of the social vision, about how the Mozilla landscape is changing and getting more complex, holds true from the localization perspective as well, and requires us to adapt in much the same way it requires local communities to.

There are three major shifts I observe that make our past approach insufficient:

  1. User interfaces are becoming more sophisticated than ever
  2. Products are becoming more diversified, and new forms of mashups appear that blend web data, UI and content
  3. Different products have different “refresh cycles”, in which different amounts of content/UI are replaced

Historically, we used DTD and properties files for most of our products. The biggest issue with DTD/properties is that those two formats were never meant to be used for localization. We adapted, exploited and extended them to match some of our needs, but their limitations are pretty obvious.

In response to those changes, we spent a significant amount of time analyzing and rethinking l10n formats to address the needs of Mozilla today, and we came up with three distinct forms of data that require localization and three technologies that we want to use.

L20n

Our major products, like Firefox, Thunderbird, SeaMonkey or Firefox Mobile, are becoming more sophisticated. We want to show as little UI as possible; each pixel is sacred. If we decide to take a piece of the screen from the user, we want to use it to the maximum. Buttons are small, and toolbars should be denser – they should present and offer more power, be intuitive, and allow the user to keep full control of the situation.

That poses a major challenge for localization. Each message must be precise, clear and natural to minimize the user’s confusion. Strings are becoming more complex, with more data elements influencing them. It’s becoming less common to have plain, static sentences, and more common for a string to show up in a tooltip, to have little screen space (mobile), and to depend on the state of other elements (the number of open tabs, the time, the gender of the user).

DTD/properties are absolutely not ready to meet those requirements, and the more hacks we implement, the harder it will be to maintain the product and its localizations. Unfortunately, the other technologies we considered – gettext, XLIFF, Qt’s TS file format – share most of those limitations and have themselves been actively stretched for years now (see gettext’s msgctxt).

Knowing that, we started thinking about what a localization format/technology would look like if we could start today. From scratch. Knowing what we know. With the experience that we have.

We knew we would like to solve, once and for all, the problem of the astonishing diversity of languages, linguistic rules, forms and variables. We knew we’d like to build a powerful toolset that would allow localizers to maintain their localizations more easily and to localize with more context information (like where a string will be used) than ever before. We knew we wanted to simplify the cooperation between developers and localizers. And we knew we would love to make it easy for everyone to use.

Axel Hecht came up with the concept of L20n: a format that shifts several paradigms of software localization by enabling algorithmic power outside of the source code. His motto is “make easy things easy, and complex things possible”, and that’s exactly what L20n does.

It doesn’t make sense to try to summarize L20n here – I’ll dig deeper in a separate blog post in this series – but what’s important for the sake of this one is that L20n is meant to be a new beginning, different from previous generations of localization formats, redefining the contract between localizer and developer known as “an entity”.

It targets software UI elements, should work in any environment (yes, Python, PHP and Perl too), and allows building natural sentences with the full power of each language without leaking that complexity to other locales or to the developers themselves. I know, it sounds bold, but we’re talking about Pike’s idea, right?
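For a taste of what this enables, here is a sketch in the early L20n syntax (the format evolved throughout the project’s life, so treat the details as illustrative): a macro and an indexed entity let a localizer encode their language’s plural logic without the developer ever seeing it.

    /* The developer only asks for "openTabs" with $n; whether a locale
       needs two plural forms or six stays inside the l10n resource. */
    <plural($n) { $n == 1 ? "one" : "many" }>
    <openTabs[plural($n)] {
      one: "One tab is open.",
      many: "{{ $n }} tabs are open."
    }>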

Common Pool

While our major products require more complexity, we also keep getting new products at Mozilla, and very often they require little UI because they are meant to be non-interruptive. Their localization entities are plain, simple, short, and usually have a single definition and translation. The land of extensions is the most prominent example of this, but more and more of our products have such needs.

Think of an “OK” or “Cancel” button. In 98% of cases, their translations are the same no matter where they are used, and the same across all products and platforms. On top of that, there are three kinds of exceptions.

First, sometimes the platform uses a different translation of the word: MacOS may have a different translation of “Cancel” than Windows. It’s an easy, systematic difference shared among all products. It does not make any sense to expose this complexity in each localization case and to prepare each one separately for this exception.

Second, sometimes an application is specific enough to use a very particular translation of a given word – maybe it’s a medical application, a low-level development tool, or something for lawyers only. In that case, once again, the difference is easy to catch, and there’s a very clear layer at which we should make the switch. Exposing it lower in the stack, at each entity use, does not make sense.

Third, it is possible that one single use of an entity may require a different translation in a given language. That’s an extremely rare case, but a legitimate one. Once again, it doesn’t make sense to leak this complexity onto everyone else.

The Common Pool addresses exactly this type of localization: simple, repetitive entities that are shared among many products. To handle the exceptions, we’re adding a system of overlays that allows a localizer to specify a separate translation at one of the three levels above (possibly more), as sketched below.
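A toy sketch of the overlay lookup (all names and translations here are invented for illustration): the most specific translation wins, falling back to the common pool.

    // overlays ordered from most to least specific, e.g.
    // [entityOverlay, appOverlay, platformOverlay, commonPool]
    function resolve(id, overlays) {
      for (var i = 0; i < overlays.length; i++) {
        if (id in overlays[i]) {
          return overlays[i][id];
        }
      }
      return null; // not translated anywhere
    }

    var commonPool = { cancel: 'Anuluj' };       // Polish, shared
    var macOverlay = { cancel: 'Poniechaj' };    // platform-specific
    resolve('cancel', [macOverlay, commonPool]); // 'Poniechaj'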

L20n and the Common Pool complement each other, and we’d like to make sure that they can be used together, depending on the potential complexity of the entity.

Rich Content Localization

The third type is very different from the two above. Mozilla today produces a lot of content that goes way beyond product UI, and localization formats are terrible at dealing with such rich content – the sentences, paragraphs and pages of text, mixed with headers and footers, that fill all of our websites.

This content is also diverse. SUMO or MDC articles may be translated into a significantly different layout, and their source versions are often updated with minor changes that should not invalidate the whole translation. Meanwhile, small event-oriented websites like Five Years of Firefox or Browser Choice have different update patterns than project pages like Test Pilot or Drumbeat.

In that case, trying to build the social contract between developers and localizers by wrapping pieces of text in uniquely identifiable objects called entities, signing them, and matching translation to source – as we do with product UI – doesn’t make sense. Localizers need great flexibility: some changes should propagate to localizations automatically, while only some should invalidate them.

For this last case, we need very different tools, specific to document/web-content localization, and if you’ve ever tried Verbatim or direct HTML source localization, you probably noticed how far that is from an optimal solution.

Is that all?

No. I don’t think so. But those are the three that we identified and we believe we have ideas on how to address them using modern technologies. If you see flaws in this logic, make sure to share your thoughts.

Why am I writing about this?

Well, I’m lucky enough to be part of the L10n-Drivers team at Mozilla, and I happen to be involved, in different ways, in experiments and projects that are going to address each of those three concepts. It’s exciting to be in a position that allows me to work on that, but I know that we, the l10n-drivers, will not be able to make it on our own.

We will need help from the whole Mozilla project. We will need support from the people who produce content, those who create interfaces, and of course those who localize – from all of you.

This will be a long process, but it gives us a chance to bring localization to the next level and for the first time ever, make computer user interfaces look natural.

In each of the following blog posts I’ll focus on one of the above types of localization and present the projects that aim at this goal.

The new era of software localization is approaching

Those of you who follow the localization tales of the Mozilla project know the mythical “L20n” concept introduced by Axel Hecht over two years ago.

Since then, we’ve spent a zillion hours thinking about this concept and maturing it. Every time I worked on any project – be it Verbatim, Firefox, Getpersonas, AMO, Pontoon or Jetpack l10n – I kept telling myself how much better it could be if we had L20n in place. Easier for localizers, easier for developers, easier to maintain. Did I mention it’s faster as well?

Then, last summer, Jeremy Hiatt joined us for his summer internship and spent 3 months working with Axel and me on pushing some of the implementation concepts forward.

With the end of summer, we got busy again with the upcoming Firefox 3.6 release and put L20n back into a coma…

Until now.

Without getting into much detail, I can tell you that right now we’re preparing for the next, and hopefully ultimate, push toward L20n 1.0. Over the next few months you will see more than enough blog posts, papers, demos and examples. We will be asking for feedback, presenting the syntax, experimenting with the toolchain, building extensions, and slowly preparing to introduce L20n into the Gecko platform and our websites.

We believe that L20n is the most crucial piece of the Mozilla localization story, aligning perfectly with our core values. Mozilla is in the best position to push the localization experience to a new level, finally enabling software to speak the natural language of the reader. Gettext is awesome but aging; the properties/DTD formats we use in Gecko today are limited; and tens of proprietary formats (used by projects like Qt, WebKit, Apple and Nokia) just replicate the same concepts. L20n shifts the paradigm of what localization can be. It’s going to be big, and we hope to get a lot of people involved.

Next week I’ll start with the first demos.

P.S. Those who like to read code may find the first bits of fresh code in my hg repo. Yummy, isn’t it? :)