Multilingual Gecko in 2017

The outline

In January 2017, we set the course to get a new localization framework named Fluent into Firefox.

Below is a story of the work performed on the Firefox engine – Gecko – over the last year to make Fluent in Firefox possible. This has been a collaborative effort involving a lot of people from different teams. It’s impossible to document all the work, so keep in mind that the following is just the story of the Gecko refactor, while many other critical pieces were being tackled outside of that range.

Also, the nature of the project does make the following blog post long, text heavy and light on pictures. I apologize for that and hope that the value of the content will offset this inconvenience and make it worth reading.

Why?

The change is necessary and long overdue – our aged localization model is brittle, is blocking us from adding new localizability features for developers and localizers, doesn’t work with the modern Intl APIs and doesn’t scale to other architectural changes we’re planning like migration away from XBL/XUL.

Fluent is a modern, web-ready localization system developed by Mozilla over the last 7 years. It was initially intended for Firefox OS, but the long term goal was always to use it in Firefox (first attempt from 7 years ago!).

Unfortunately, replacing the old system with Fluent is a monumental task similar in scope only to other major architectural changes like Electrolysis or Oxidation.

The reason for that is not the localization system change itself, but rather that localization in Gecko has more or less been set in stone since the dawn of the project. Most of the logic and some fundamental paradigms in Gecko are deeply rooted in architectural choices from 1998-2000. Since then, a lot of build system choices, front end code, and core runtime APIs were written with assumptions that hold true only for this system.

Getting rid of those assumptions requires major refactors of many of the Internationalization modules, build system pieces, language packs, test frameworks, and resource handling modules before we even touch the front end. All of this will have to be migrated to the new API.

On top of that, the majority of the Internationalization APIs in Gecko were designed at the time when Gecko could not carry its own internationalization system. Instead, it used host operating system methods to format dates, times, numbers etc. Those approaches were incomplete and the outcome differed from setup to setup, making the Gecko platform unpredictable and hard to internationalize.

Fluent has been designed to be aligned with the modern Internationalization API for JavaScript developed as the ECMA402 standard by the TC39 Working Group. This is the API available to all web content and fortunately, by 2016, Gecko was already carrying this API powered by a modern cross-platform internationalization library designed by the Unicode Consortium – ICU.

That meant that we were using a modern standard to supply an internationalization API for the Web, but internally we relied on a different, much older, set of APIs for the Firefox UI.

Not only do we have to maintain two sets of internationalization APIs, but also we carry two sets of data for it in the product!

Since Fluent is aligned with the new model, and not much with the old one, as part of the shift toward Fluent we had to migrate more of our internal APIs to use ICU and its internationalization database called CLDR (think – Wikipedia for internationalization data), all while slowly deprecating the old Gecko Intl APIs and data sets.

To make things a bit harder, our implementation of the ECMA402 Intl API as of a year ago wasn’t very complete. Moving Firefox to use it required not just shifting the code base, but also adding remaining features like timezone support, case sensitive collations etc.

If all of that doesn’t sound ambitious enough, we also got a request from the Gecko owners not to use the main interface Gecko offered for resource selection – Chrome Registry – in the new approach.

It’s January 2017 and we not only have to remodel all of the locale selection, remodel the whole Intl layer and prepare for replacing the whole localization layer of Gecko, but we also need a new resource management API for l10n.

Fluent in Gecko – scheme

Warm up

More or less at the same time we got contacted by a team at Mozilla working on date/time pickers. They reached out asking us how to internationalize them properly.

Part of the project culture at the time when we worked on Firefox OS was to take everything we need, and push for it to become a web standard. This strategy – standardize everything you need – is much slower than just writing a custom API, but has the benefit of making the Web a better platform, rather than just supplying the APIs for your product.

Firefox OS is no more, but many of the APIs designed and proposed for standardization got through the standardization process and are now getting finalized. That includes many Internationalization APIs that we kickstarted back then.

Internationalization of the date/time picker required a flock of APIs and our team decided to propose building on, and extending, the standardized ECMA402 API set with the features required for the pickers.

Now with the date/time picker as a short term goal, and moving Firefox to Fluent in the long term, all while unifying our underlying internationalization infrastructure behind the scenes, the stage was set.

Timeline

The last piece of the puzzle that is important for the reader to know is that over the last year the main focus of the whole organization was Firefox Quantum, and it was necessary for our effort to ensure we do not affect Quantum’s stability, don’t introduce regressions, and generally speaking, operate under the radar of the release management and core engineering team.

Below is an incomplete timeline of changes that happened between January 2017 and today. It leads us through this major refactor and getting Gecko ready for Fluent, all while making sure we do not hinder the Quantum effort in the slightest.

Firefox 51 (January)

The first release of the year was rather modest, but I hope it’ll fit into the story well. Knowing that we want to get things via Intl JS APIs, but also realizing that the standardization process will take a long time, we introduced a new non-public API called mozIntl.

The role of mozIntl is to extend the JS Intl with pre-release APIs like Intl.getCalendarInfo, Intl.getLocaleInfo, etc., provide the functionality needed for Firefox, while at the same time being a test subject for the standard proposal itself.

This created a really cool two-way dynamic where we were able to identify a need, work within ECMA402 to draft the spec, and implement an early proposal for use in our UI while simultaneously working on advancing it as a Web Standard.

I can’t stress enough how pivotal this API became for advancing JS Intl API and shifting our platform to use JS Intl API for internationalization, and Firefox 51 was the first one to use it!

Firefox 52 (March)

In Firefox 52 we set our target to embrace CLDR/ICU and started work to migrate our internal APIs to use ICU. Jonathan Kew made the first push, switching nsIScriptableDateFormat to ICU and then followed up moving another set of APIs.

At the same time André Bargull started taking on missing items from our implementation of the JS Intl API, tackling a major one first – IANA Timezone support.

The direction set in Firefox 52 was to move Gecko to use the same internationalization APIs internally and for the Web content, and to make our JS Intl API complete and robust – ready to handle Firefox UI.
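To make the time zone piece concrete, here is a minimal sketch of what IANA time zone support looks like from the standard JS Intl API. The helper name `formatInZone` and the sample date are my own choices for illustration; only `Intl.DateTimeFormat` and its `timeZone` and `hourCycle` options come from the standard.

```javascript
// Format one instant in several IANA time zones with the standard Intl API.
// `formatInZone` is a hypothetical helper written for this example.
const instant = new Date(Date.UTC(2017, 0, 15, 12, 0)); // 2017-01-15 12:00 UTC

function formatInZone(date, timeZone) {
  return new Intl.DateTimeFormat("en-US", {
    timeZone,            // an IANA zone name such as "Europe/Warsaw"
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23",    // 24-hour clock, 00 to 23
  }).format(date);
}

console.log(formatInZone(instant, "UTC"));                 // 12:00
console.log(formatInZone(instant, "Europe/Warsaw"));       // 13:00
console.log(formatInZone(instant, "America/Los_Angeles")); // 04:00
```

The same instant is rendered three different ways without any manual offset arithmetic, which is the kind of work the old OS-specific code paths could not do consistently.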

Firefox 53 (April)

In Firefox 53 Jonathan updated our platform to Unicode 9.0, Gregory Moore moved nsIDateTimeFormat (one of the biggest Intl APIs in Gecko) to use ICU and I landed the first major new API needed for Fluent – mozIntl.PluralRules.

This set a precedent where, if we’re certain that an API will end up exposed to the Web, we write most of its code in SpiderMonkey (our JS engine), and only expose it via mozIntl until the API becomes part of the standard.

When the standard matures, we only switch the bit to expose the API, rather than having to move the code from Gecko to SpiderMonkey.
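For readers who have not used it, this is roughly what the API looks like in its standardized form, `Intl.PluralRules`. The sketch below assumes an engine that ships the standard API; the `pluralCategory` helper and the message table are hypothetical and exist only for this example.

```javascript
// Select the CLDR plural category for a number in a given locale.
// `pluralCategory` is a hypothetical helper written for this example.
function pluralCategory(locale, n) {
  return new Intl.PluralRules(locale).select(n);
}

// English only distinguishes "one" and "other".
console.log(pluralCategory("en", 1)); // one
console.log(pluralCategory("en", 2)); // other

// Polish has more categories, which is why a localization
// system cannot hardcode the English singular/plural split.
console.log(pluralCategory("pl", 1)); // one
console.log(pluralCategory("pl", 2)); // few
console.log(pluralCategory("pl", 5)); // many

// A tiny message table keyed by plural category (illustrative only).
const tabMessages = { one: "tab", other: "tabs" };
function describeTabs(n) {
  return `${n} ${tabMessages[pluralCategory("en", n)]}`;
}
console.log(describeTabs(1)); // 1 tab
console.log(describeTabs(3)); // 3 tabs
```

A localization system such as Fluent needs exactly this category lookup to pick the right variant of a message for each language.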

Firefox 54 (June)

In Firefox 54 we landed two major new APIs:

mozilla::intl::LocaleService is a new core API for managing languages and locales in Gecko. Its purpose is to become a central place to handle language selection and negotiation.

mozilla::intl::OSPreferences is a new core API for retrieving Intl related information from the operating system. That includes OS language selection, regional preferences etc.

Those two new APIs were intended to replace an aged nsILocaleService – which kind of did those two tasks together via many OS-specific APIs – and take away language negotiation that until this point had been performed primarily by ChromeRegistry.

They also introduced a new paradigm – instead of operating on a single locale, like en-US, we started operating on locale fallback chains. All new APIs took lists, making it possible for us to identify not just the best match for the locale that the user requested, but also to understand which fallbacks the user wants, rather than falling back on en-US as a hardcoded locale.

This is a fairly recent development in the internationalization industry and you can find fallback locale lists in new Windows, macOS, Android and now in Firefox as well!

With this change, we removed the tablecloth from under the dishes – nothing observable changed, but the “decision” center was moved and we gained new, modern, central APIs to manage languages and communicate with the OS layer.

In 54 we also added a couple new mozIntl APIs:

  • mozIntl.getLocaleInfo became a central place to get information about a locale – what are the weekend days, what is the first day of the week, is this locale left-to-right or right-to-left and so on.
  • mozIntl.DateTimeFormat became the first example of a wrapper over an already existing Intl.DateTimeFormat that extends it and adds features necessary for Firefox UI, but not yet available in ECMA402 spec – primarily, we added the ability to adjust the formatted date and time to the regional preferences the user set in the Operating System. It’s a good example of where mozilla::intl::OSPreferences, mozIntl, ICU/CLDR and JS Intl API create a layered model that incentivizes us to standardize as much as we can, without blocking us until the standardization is complete.

By that release, we had all of our new core ready, we knew the direction, and were able to refactor major pieces of our low level intl infrastructure basically without any observable output.

Firefox 55 (August)

While several elements of the ecosystem were still limiting us, the primary focus of work now shifted to fixing edge cases and adding new features.

LocaleService gained a robust language negotiation API which made it possible to reason about non-perfect matches between requested and available language sets.

Before that point, if the user requested en-GB, we weren’t very good at matching it against anything other than a perfect match. So if we had en-ZA or en-AU, we might not know what to do and that made many of our locale selection systems very brittle.

Centralized, strong language negotiation allowed us to freely reason about asking the operating system for locales, matching them against available language resources, selecting the right fonts, or picking up languages for extensions.
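To illustrate the idea, here is a deliberately simplified sketch of language negotiation over fallback chains. It is not Gecko's LocaleService algorithm or API: the function name `negotiateLanguages`, its parameters and the two matching steps (exact match, then same language with a different region) are assumptions made for this example.

```javascript
// Simplified, hypothetical language negotiation. Not the real LocaleService.
// requested: the user's ordered list of preferred locales
// available: locales we actually have resources for
// defaultLocale: last-resort locale appended to the end of the chain
function negotiateLanguages(requested, available, defaultLocale) {
  const result = [];
  const add = (locale) => {
    if (!result.includes(locale)) result.push(locale);
  };

  for (const req of requested) {
    const reqLower = req.toLowerCase();

    // Step 1: a perfect match wins for this requested locale.
    const exact = available.find((a) => a.toLowerCase() === reqLower);
    if (exact) add(exact);

    // Step 2: otherwise (and additionally) accept the same language
    // in another region, e.g. en-ZA or en-AU for a request of en-GB.
    const language = reqLower.split("-")[0];
    for (const a of available) {
      if (a.toLowerCase().split("-")[0] === language) add(a);
    }
  }

  if (defaultLocale) add(defaultLocale);
  return result;
}

const chain = negotiateLanguages(
  ["en-GB", "fr"],
  ["en-ZA", "en-AU", "fr-FR", "de"],
  "en-US"
);
console.log(chain); // [ 'en-ZA', 'en-AU', 'fr-FR', 'en-US' ]
```

The result is an ordered fallback chain rather than a single locale, so a consumer can try each entry in turn before reaching the last-resort default.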

55 brought many more new features to LocaleService, including a split between server and client, allowing our content processes to follow a single language selection decided in the parent process (or outside of Gecko in the case of Fennec – Firefox for Android!), and including a number of improvements in how OSPreferences interact with LocaleService.

André brought most of the remaining items for ECMA402 compatibility, making SpiderMonkey Intl API 100% complete!

Firefox 56 (September)

As you can tell by now, there’s a clear direction in our work – migrating our internal APIs to use ICU, bridging the gap between JS Intl API and Gecko Intl APIs, and moving our UI to use ICU-backed APIs.

But until Firefox 56 there was one problem – due to the size cost, Fennec’s team was pushing back on the idea of introducing ICU in our mobile browser.

Their reasoning was sound – adding 3MB to installer size is a non-trivial cost that has impact on the users and should not be added lightly.

That meant that Gecko had to maintain two ways of doing each API – one backed by ICU/CLDR, and the old one required by now just for our Android browser.

Fortunately, by Firefox 56 we had an idea how to move forward. We knew that once we turn on ICU everywhere, we’ll be able to remove all the old APIs and datasets and that will win us back some of the bundle size cost.

But the real deal was in the promise of the new localization API making it possible to load language resources at will. See, Fennec currently comes with a lot of language resources for a lot of locales. That’s because our 20 year old infrastructure isn’t very flexible and the easiest way to make sure that a locale is available is to package it into the .apk file.

If we could switch that to the new infrastructure and only load locales that the user selected, that would, coincidentally, save us around 3MB of the installer’s size!

Knowing that, by Firefox 56 we were able to agree on the plan and turn on ICU for Fennec. That meant not only that we were able to start removing the old APIs and introduce all the new goodies like mozIntl to Fennec, but also that Fennec finally got ECMA402 JS Intl API support!

Another big piece happened on the character encoding front. After months of work, Henri Sivonen landed a completely new, fast and shiny, character encoding library written in Rust!

This, bundled with a bump to Unicode 10 and ICU 59, enabled us to remove a lot of old APIs and reduce our technical debt significantly.

Last but not least, LocaleService gained a new API for retrieving Regional Preferences Locales allowing us to format a date to en-GB (or de-AT!) even if your Firefox UI is in en-US, and follow more closely the user choices from the Operating System.
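The effect of separating the UI language from the regional preferences can be sketched with the standard `Intl.DateTimeFormat`. This is only an illustration of the concept: `formatDate` and the hardcoded locale lists are hypothetical stand-ins for the lists that LocaleService would supply.

```javascript
// The UI language and the regional preferences locale can differ.
// `formatDate` is a hypothetical helper; in Gecko the locale list
// would come from LocaleService rather than being hardcoded.
const releaseDate = new Date(Date.UTC(2017, 10, 14)); // 14 November 2017

function formatDate(date, regionalLocales) {
  return new Intl.DateTimeFormat(regionalLocales, {
    timeZone: "UTC",
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).format(date);
}

console.log(formatDate(releaseDate, ["en-US"])); // 11/14/2017
console.log(formatDate(releaseDate, ["en-GB"])); // 14/11/2017
console.log(formatDate(releaseDate, ["de-AT"])); // 14.11.2017
```

A user with an en-US Firefox UI and British or Austrian regional settings would expect the second or third form, which is what a separate regional preferences list makes possible.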

Firefox 57 (November)

Firefox 57 was a very small release on the Intl front. We had most of our foundation work laid out by then and all the focus was on the quality of the Quantum release (I spent most of the cycle as a mercenary on the Quantum Flow team helping with UI/perf improvements).

But since the foundation was in place by then, we were able to use 57 to land all the new APIs – L10nRegistry, Fluent, FluentDOM and FluentWeb – in anticipation of being able to switch to using them in the following releases.

That means that, although we didn’t start using it yet, by 57 Fluent was in Gecko!

Firefox 58 (January 2018)

After the silent 57 release, 58 opened up with a lot of internationalization improvements accrued since 56.

The major one was the switch to new language packs.

Previously, Firefox had language packs based on the old extensions system, which relied heavily on ChromeRegistry. This, along with other problems, resulted in a sub-par user experience when compared to a fully localized build.

The new language packs are based on the Web Extensions ecosystem, are lighter, easier to maintain and are safer. They have a clean lifecycle, and of course support Fluent and L10nRegistry out of the box!

Speaking of removing technical debt, it was a good release for that. Having 57 cut off all of the old extensions, and being two releases after we enabled ICU in Fennec, it was the right time to remove a lot of old code.

Our hooks into OS via OSPreferences got improved on Android, Linux and Windows.

A bunch of our mozIntl APIs got completed as standard in ECMA402 and enabled for the web – Intl.PluralRules, hourCycle and NumberFormat.prototype.formatToParts.
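As a quick taste of two of those, here is a small sketch using the standard APIs as they are exposed to web content. The sample number and date are arbitrary choices for this example.

```javascript
// NumberFormat.prototype.formatToParts splits a formatted number into
// typed parts, so UI code can style or rearrange them individually.
const parts = new Intl.NumberFormat("en-US").formatToParts(1234.5);
console.log(parts.map((p) => p.type).join(","));
// integer,group,integer,decimal,fraction
console.log(parts.map((p) => p.value).join("")); // 1,234.5

// hourCycle lets the caller override the locale's default clock,
// for example forcing a 24-hour clock in an en-US formatter.
const afternoon = new Date(Date.UTC(2018, 0, 1, 15, 30));
const clock24 = new Intl.DateTimeFormat("en-US", {
  timeZone: "UTC",
  hour: "2-digit",
  minute: "2-digit",
  hourCycle: "h23",
});
console.log(clock24.format(afternoon));           // 15:30
console.log(clock24.resolvedOptions().hourCycle); // h23
```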

We did a lot of intl build system refactors to make us package the right languages, with fallback, and also make it possible to build Firefox with many locales.

We gained an entry in about:support showing all the various language selections to help us debug cases where the localization doesn’t match expectations.

Finally, once all of the new build system bits got tested, the very first string localized using Fluent landed in Firefox UI!

Huge milestone achieved!

Summary

In 2017 we successfully aligned our internationalization layer around the Unicode standard, ICU and CLDR, removing a lot of old APIs and making Firefox UI use the same APIs (with a few extensions) as we expose to the Web Content.

We also advanced a lot of ECMA402 spec proposals that we identified as a result of our Firefox OS and Desktop Firefox efforts, making the Web a better platform for writing multilingual apps and the Firefox JS engine – SpiderMonkey – the most complete implementation of ECMA402 on the market.

Finally, we landed all the main components for the new localization framework in Gecko and got the first strings translated using Fluent!

Through all of that work, there were only a couple minor regressions that we were able to quickly fix without affecting any of the Quantum work. We vastly improved our test coverage in the intl/locale and intl/l10n modules, fixed tons of long standing bugs related to language switching and selection in the process, and got the platform ready for Fluent!

As a testament to all that happened this year, we just got a new module ownership structure that reflects all the effort that we’ve put lately into making Gecko the best multilingual platform in the world and a great vehicle for driving the advancement of the Internationalization Web Standards!

It’s been the toughest year of work in my career so far: handling so many variables, operating on a massive and aged codebase, writing code in three languages (JavaScript, Rust and C++), aligning the goals and needs of many different stakeholders and teams, and pushing for internal recognition and support for the refactor.

Taking advice from Mike Hoye, “You can remove the adjective ‘thankless’ from someone’s job by thanking them” – I’d like to thank the people who significantly contributed to this project either directly or indirectly, by supporting and mentoring me, brainstorming with me, patiently reviewing my patches and working on all the technologies required – Staś Małolepszy, Jonathan Kew, André Bargull, Dave Townsend, Jeff Walden, Makoto Kato, Axel Hecht, Richard Newman, Mike Conley, Nick Alexander, Daniel Ehrenberg, Kris Maglione, Andrew Swan, Matjaž Horvat, Gregory Szorc, Ted Mielczarek, Francesco Lodolo, Jeff Beatty, Jorg K, Rafael Xavier, Steven R. Loomis, Joe Hildebrand, Caridy Patiño and others. You turned the project from impossible to completed. Thank you.

With all the work to clean up the technical debt in 2017, 2018 is shaping up to be a year when we’ll be able to focus on using the modernized stack to work on adding new capabilities and fully switching Firefox to Fluent (starting with the Preferences UI).

I also hope to spend more time in Rust, help Firefox and Gecko become better at serving users operating in multiple languages, and work with the Browser Architecture Group on getting the next generation stack at Mozilla to be fully intl and l10n ready.

Stay tuned!