tech – stream of bytes

After two parts of my vision of local communities, I’d like to make a sudden shift and write a bit about the technical aspects of localization. The reason is trivial: the third and last part of the social story is the most complex one and requires a lot of thinking to get right.

In the meantime, I work on several aspects of our l10n environment and I’d like to share with you some of the experiences and hopes around it.

Changes, changes

What I wrote in part 1 of the social vision, about how the landscape of Mozilla is changing and becoming more complex, holds true from the localization perspective as well. It requires us to adapt in much the same way as it requires local communities to adapt.

There are three major shifts that I observe, and they make our approach from the past insufficient:

User Interfaces become more sophisticated than ever
Product forms are becoming more diversified, and new forms of mashups are appearing that blend web data, UI and content
Different products have different “refresh cycles” in which different amounts of content/UI are replaced

Historically, we used DTD and properties for most of our products. The biggest issue with DTD/properties is that those two formats were never meant to be used for localization. We adapted, exploited and extended them to match some of our needs, but their limitations are pretty obvious.

In response to those changes, we spent a significant amount of time analyzing and rethinking l10n formats to address the needs of Mozilla today. We came up with three distinct forms of data that require localization, and three technologies that we want to use.

L20n

Our major products, like Firefox, Thunderbird, SeaMonkey or Firefox Mobile, are becoming more sophisticated. We want to show as little UI as possible; each pixel is sacred. If we decide to take a piece of the screen from the user, we want to use it to the maximum. Small buttons and toolbars should be denser: they should present and offer more power, be intuitive and allow the user to keep full control over the situation.

That poses a major challenge to localization. Each message must be precise, clear and natural to the user to minimize confusion. Strings are becoming more complex, with more data elements influencing them. It’s becoming less common to have plain, static sentences. It’s becoming more common that a string will show up in a tooltip, will have little screen space (Mobile) and will depend on the state of other elements (number of open tabs, time, gender of the user).
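
To make the “number of open tabs” case concrete, here is a minimal Python sketch of a string whose form depends on a number. It is not L20n syntax and not any existing Mozilla API: the names `LOCALES`, `plural_en`, `plural_pl` and `tabs_message` are made up for illustration, and the plural rules are written for whole numbers only. The point it tries to show is that each locale carries its own rule and its own number of forms, while the calling code stays the same.

```python
# Hypothetical sketch: none of these names come from L20n or Mozilla code.

def plural_en(n: int) -> str:
    """English: 'one' for exactly 1, 'other' for everything else."""
    return "one" if n == 1 else "other"


def plural_pl(n: int) -> str:
    """Polish: 'one' for 1, 'few' for 2-4 (but not 12-14), 'many' otherwise."""
    if n == 1:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


# Each locale ships its own plural rule and as many forms as it needs.
LOCALES = {
    "en": {
        "plural": plural_en,
        "tabs_open": {"one": "{n} tab open", "other": "{n} tabs open"},
    },
    "pl": {
        "plural": plural_pl,
        "tabs_open": {
            "one": "{n} otwarta karta",
            "few": "{n} otwarte karty",
            "many": "{n} otwartych kart",
        },
    },
}


def tabs_message(locale: str, n: int) -> str:
    """Pick the right form for this locale; the caller never sees the rule."""
    data = LOCALES[locale]
    form = data["plural"](n)
    return data["tabs_open"][form].format(n=n)


print(tabs_message("en", 5))   # 5 tabs open
print(tabs_message("pl", 22))  # 22 otwarte karty
```

Note that the developer only ever calls `tabs_message(locale, n)`; the fact that Polish needs three forms never leaks into the English data or into the calling code.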

DTD/properties are absolutely not ready to meet those requirements, and the more hacks we implement, the harder it’ll be to maintain the product and its localizations. Unfortunately, the other technologies that we considered, like gettext, XLIFF or Qt’s TS file format, share most of these limitations and have themselves been stretched beyond their original design for years now (like gettext’s msgctxt).

Knowing that, we started thinking about what a localization format/technology would look like if we could start it today. From scratch. Knowing what we know. With the experience that we have.

We knew that we would like to solve, once and for all, the problem of the astonishing diversity of languages, linguistic rules, forms and variables. We knew we’d like to build a powerful tool set that would allow localizers to maintain their localizations more easily, and to localize with more context information (like where the string will be used) than ever. We knew that we wanted to simplify the cooperation between developers and localizers. And we knew we would love to make it easy to use for everyone.

Axel Hecht came up with the concept of L20n: a format that shifts several paradigms of software localization by enabling algorithmic power outside of the source code. His motto is “Make easy things easy, and complex things possible” and that’s exactly what L20n does.

It doesn’t make sense to try to summarize L20n here; I’ll dig deeper in a separate blog post in this series. What’s important for the sake of this one is that L20n is supposed to be a new beginning, different from previous generations of localization formats, and that it redefines the contract between localizer and developer called “an entity”.

It targets software UI elements, should work in any environment (yes, Python, PHP, Perl too) and should allow for building natural sentences with the full power of each language, without leaking this complexity to other locales or to developers themselves. I know, it sounds bold, but we’re talking about Pike’s idea, right?

Common Pool

While our major products require more complexity, more and more new products are also appearing in Mozilla, and very often they require little UI, because they are meant to be non-interruptive. Their localization entities are plain and simple, short, and usually have a single definition and translation. The land of extensions is the most prominent example of such an approach, but more and more of our products have such needs.

Think of an “OK” and “Cancel” button. In 98% of cases, their translations are the same, no matter where they are used. In 98% of cases, their translations are the same among all products and platforms. On top of that there are three exceptions.

First, sometimes the platform uses a different translation of the word. For example, MacOS may have a different translation of “Cancel” than Windows. It’s a very easy, systematic difference shared among all products. It does not make any sense to expose this complexity in each localization case and require each one to be prepared separately for this exception.

Second, sometimes an application is specific enough to use a very specific translation of a given word. Maybe it is a medical application? A low-level development tool, or one for lawyers only? In that case, once again, the difference is easy to catch and there’s a very clear layer on which we should make the switch. Exposing it lower in the stack, for each use of an entity, does not make sense.

Third, it is possible that one particular use of an entity may require a different translation in a given language. That’s an extremely rare case, but a legitimate one. Once again, it doesn’t make sense to leak this complexity onto others.

Common Pool addresses exactly this type of localization: simple, repetitive entities that are shared among many products. In order to address the exceptions, we’re adding a system of overlays which allows a localizer to specify a separate translation on one of the three given levels (possibly more).
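
One way to picture the overlay idea is as a layered lookup. The Python sketch below is only an illustration under my own assumptions: the dictionary names, the `resolve` function, the placeholder translations and, above all, the precedence order (single use beats application, which beats platform, which beats the shared pool) are hypothetical and not a description of how Common Pool is actually implemented.

```python
# Hypothetical sketch of a Common Pool style lookup for one locale.
# All names and values here are illustrative placeholders.

# The shared pool: one translation per simple entity.
COMMON_POOL = {"ok": "OK (pool)", "cancel": "Cancel (pool)"}

# Level 1: systematic platform differences, shared among all products.
PLATFORM_OVERLAY = {"macos": {"cancel": "Cancel (macos wording)"}}

# Level 2: an application with its own specific vocabulary.
APP_OVERLAY = {"medical-app": {"cancel": "Cancel (medical wording)"}}

# Level 3: one particular use of an entity inside one application.
USE_SITE_OVERLAY = {
    ("medical-app", "discard-dialog"): {"cancel": "Cancel (discard dialog)"},
}


def resolve(entity, platform=None, app=None, use_site=None):
    """Return the most specific translation available for an entity.

    Assumed precedence: use site > application > platform > common pool.
    """
    layers = [
        USE_SITE_OVERLAY.get((app, use_site), {}),
        APP_OVERLAY.get(app, {}),
        PLATFORM_OVERLAY.get(platform, {}),
        COMMON_POOL,
    ]
    for layer in layers:
        if entity in layer:
            return layer[entity]
    raise KeyError(entity)


print(resolve("ok", platform="macos"))      # OK (pool)
print(resolve("cancel", platform="macos"))  # Cancel (macos wording)
```

In this sketch a localizer who never hits an exception only ever touches `COMMON_POOL`; the overlays stay empty until somebody actually needs them.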

L20n and Common Pool complement each other, and we’d like to make sure that they can be used together, depending on the potential complexity of the entity.

Rich Content Localization

The third type is very different from the two above. Mozilla today produces a lot of content that goes way beyond product UI, and localization formats are terrible at dealing with such rich content: the sentences, paragraphs and pages of text, mixed with headers and footers, that fill all of our websites.

This content is also diversified. SUMO or MDC articles may be translated into a significantly different layout, and their source versions are often updated with minor changes that should not invalidate the whole content. On the other hand, small event-oriented websites like Five Years of Firefox or Browser Choice have different update patterns than project pages like Test Pilot or Drumbeat.

In that case, it doesn’t make sense to build the social contract between developers and localizers the way we do with product UI: wrapping pieces of text into uniquely identifiable objects called entities, and using some way to sign them and match the translation to the source. Localizers need great flexibility. Some changes should be propagated to localizations automatically, and only some should invalidate them.
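
As a rough illustration of “only some changes should invalidate”, here is a small Python sketch that sorts a source change into three buckets. Everything about it is an assumption of mine: the function names, the three labels, the use of a plain character similarity ratio from the standard `difflib` module and the 0.9 threshold. A real tool for document localization would need something much smarter; this only shows the shape of the decision.

```python
import difflib


def normalize(text: str) -> str:
    """Collapse runs of whitespace so pure re-wrapping is not seen as a change."""
    return " ".join(text.split())


def classify_change(old_source: str, new_source: str, threshold: float = 0.9) -> str:
    """Classify a source update (hypothetical labels, arbitrary threshold).

    'unchanged'  - only whitespace differs, translations stay valid
    'minor'      - small edit, keep the translation but flag it for review
    'invalidate' - the text changed too much, the translation is stale
    """
    old, new = normalize(old_source), normalize(new_source)
    if old == new:
        return "unchanged"
    ratio = difflib.SequenceMatcher(None, old, new).ratio()
    return "minor" if ratio >= threshold else "invalidate"


old = "Firefox is a free web browser."
print(classify_change(old, "Firefox is a free web  browser."))
print(classify_change(old, "Firefox is a free Web browser."))
print(classify_change(old, "Download the latest nightly build today."))
```

The three calls above fall into the “unchanged”, “minor” and “invalidate” buckets respectively.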

For this last case, we need very different tools that are specific to document/web content localization. If you have ever tried Verbatim or direct source HTML localization, you have probably noticed how far it is from an optimal solution.

Is that all?

No. I don’t think so. But those are the three that we identified and we believe we have ideas on how to address them using modern technologies. If you see flaws in this logic, make sure to share your thoughts.

Why am I writing about this?

Well, I’m lucky enough to be part of the L10n-Drivers team at Mozilla, and I happen to be involved, in different ways, in experiments and projects that are going to address each of those three concepts. It’s exciting to be in a position that allows me to work on that, but I know that we, l10n-drivers, will not be able to make it on our own.

We will need help from the whole Mozilla project. We will need support from people who produce content, from those who create interfaces and, of course, from those who localize. From all of you.

This will be a long process, but it gives us a chance to bring localization to the next level and for the first time ever, make computer user interfaces look natural.

In each of the following blog posts I’ll focus on one of the above types of localization and present the projects that aim at this goal.

As some of you may remember, over a year and a half ago I posted a list of software/hardware projects that I’m interested in. I named it “Project watcher” and some of my friends and readers followed my path. I really liked the idea, but on the other hand I felt I wasn’t updating the list and that it might become obsolete with time.

Overall, the project seems to be my personal success. I really used it every month or two to see what’s going on there and I want to keep the project alive 🙂

Now, I’m going to prepare an update of the list, but first I’d like to summarize what has happened in the projects I’ve been following since the 7th of March 2006.

First on the list are games:

TA Spring – it turns out I didn’t follow the progress of this project too carefully. I simply had no time to play this game. I still feel there’s a lot of potential in it and, as far as I know, progress was made from release 0.72b to 0.75b, with 4 releases in that interval. The project is alive and kicking 🙂
Boson – I really miss a good RTS for Linux (Blizzard! Open Starcraft!), so I use Boson as one of the references for a “new C&C”. In 2006 Boson received two updates: 0.12 and 0.13. Unfortunately, not much has happened since then. There was some planning of the campaign story line, but the last edit on their wiki was made on April 30th, and the last SVN commit was made 3 months ago. 0.13 is mostly a graphics upgrade over 0.11. The game is playable, but it’s still in a very alpha stage, with a very “generic” feel to the missions, gameplay etc.
Attal – This was my hope for a “HOMM”-like game. The website is totally down now; it was dead the whole time and nobody updated it, but development of the code did happen. The (rather very small) team did some coding this year, and they seem to be preparing a 1.0RC release (+ a rewrite to Qt4 for 1.1?). The game is in a non-playable state, at least for me, and it requires a huge graphics update to catch up with reality, but who knows… Once 1.0 is out it may be very different.
Planeshift – an amazing project: an open source MMORPG. They’re very active and have managed to create a healthy, lively community of developers, beta testers and players. It’s a huge pleasure to watch them grow. When I was creating the first PW, it was just past the 0.3.013 release. 0.3 was a long-awaited update over 0.2, a huge and much-needed rewrite. The 0.3.x line is much more about role playing than any other RPG I’ve ever seen. At the time there was no fighting mode at all! The game has a lot of unique concepts, like their own races, a unique economy system, the interesting idea of the Death Realm (a separate world where you “live” once you die, and where you can stay or try to get back to the real world), a huge, multilevel idea of Game Masters, and many more. In the time frame between the last PW and today there were many minor updates, from 0.3.013 to 0.3.020, but their changelogs are pretty lengthy; take a look at their website for the full list. The short summary: more skins, more monster variations, better spell casting, an update to the stable Crystal Space 1.0 engine, many updates to the crafting system, new areas, a key/lock system and hundreds of updates to the graphics system. Overall the game is totally playable, the world is “alive” and there’s a great future for this game, as it’s one of the examples of a huge and healthy open source community and system for players. The authors are not in a hurry, they have time and patience, and the community is happy with the current state, so I don’t expect any stupid rush, but rather steady growth which will make the game better and better all the time.
America’s Army – unfortunately, this is a case of regression. After many years, the game devs decided to drop the Linux version, so I’ve been following the game’s progress less carefully. The game is of course free (free as in beer), so you can download the latest version, 2.8.2, and enjoy it if you’re a Windows user. According to the WineHQ AppDB it won’t run under Wine 🙁 I’m waiting for the 3.0 release, which will be based on Unreal Engine 3. I still enjoy playing the game, but I didn’t play much during the last 1.5 years.
Glest – another interesting project. An RTS by nature, it’s a bit similar in structure and development model to Nexuiz. It’s open source, but strongly driven by a solid core team, and it does not depend on the community itself. It gathers a community, but it’s definitely not “driven by” a community. Since the last PW it received a major update to 2.0, but since then there seems to be no active development taking place in public. Such projects are usually either driven by some funding/sponsorship or run as a study project. I’m not sure which is the case here, but I hope it’ll go further. The current state is that the game is totally playable and has nice graphics, but it requires a lot of polishing to grow beyond a Warcraft I level of detail.
Danger from the Deep – the game hasn’t made much noise, but it has received multiple updates since the first PW. It’s a submarine game (think Silent Hunter, Silent Service). Back when it was at the 1.0 stage, I remember chatting with its main dev about his plans and ideas, and he was rather calm and confident about what he wanted to make of it. I love such an attitude in the open source model 🙂 We have version 0.3 now; it’s pretty much playable and gave me a lot of joy, and there is progress happening in CVS. It seems that the authors are deeply interested in the realism of the game, as they really try to reproduce the “feeling” of a submarine in all its details (unlike Silent Hunter, where you get a candy-like simplification of what submarine work is). I believe the sources need a bit of cleanup, which usually happens somewhere between the first alpha and the first stable release (~0.5), because currently it’s all flat in one directory. The game seems to have a great future ahead, although I think it would be easier if they switched to some external graphics engine instead of developing their own (leverage).
Vega Strike – this project was nearly dead for the last 2 years but, all of a sudden, we have a 0.5 beta now! We also have a new website, and it seems that the project is alive again. I’m just downloading it to test it. From what I remember of the 0.4.x line, it had very nice graphics, but the world felt “empty” and it was hard for beginners to find out what to do. I’m going to keep observing the progress.
Eternal Lands – a magical project. The whole development happens behind the scenes and there seem to be none of the elements of a normal open source project (say, news and changelogs are on the forum), yet it has an extremely active community, similar to Planeshift’s I think, and a huge world. It’s very stable. The graphics are very simple (they remind me of Ultima Online), but that’s totally enough to enjoy the huge world full of quests, the many guilds, fan sites and, of course, players. It’s very mature for an open source project. It has a tutorial system, a leveling system, fighting etc.: everything a successful game needs. I think that if the author could upgrade the graphics to 2007 standards, it could storm the gaming world 🙂 Look at the main dev’s blog for more news.
Nexuiz – As I mentioned before, it seems to be a project similar to Glest. No major community, a small but strongly devoted group of devs (friends?) and an amazing result. Nexuiz is a beautiful and very carefully detailed game that is ready to use. New releases (1.5 -> 2.3 since the first PW) mostly bring new maps and performance updates. The team seems to be working on a new game, named Zymotic, but Nexuiz is still being developed. As always, we’ll probably see it once it’s ready to use and will only be able to say “Wow!”. (I found an SVN repo for the game. It seems that it’ll use the DarkPlaces engine, the same as Nexuiz.)
Legends – I must admit, I didn’t follow the game progress at all. It seems to be developed actively but can’t say much about it. From what I remember it has nice graphics and platform, but that’s all I know 🙂
UFO A.I. – This game has an interesting history: it was initially developed in a closed source manner by a small group of fans, since 2003. After a major slowdown in development in Q4 2005, the team decided to open the game, and since then the project has been pretty active, with release 2.1.1 in May 2007. I haven’t played the recent releases, but from what I remember from the time I did, the game is very nice and really has the “heart and soul” of the UFO series. The CVS repo is active (last check-ins a week ago) and it seems the project has survived well.

To summarize, I’d like to categorize the games by their development model. Please keep in mind that all the games described here are free, and all but one are open source.

“traditional Open Source model pre-1.0” – a strong role for the forum, wiki and bug tracker, and a low entry barrier. There’s a very thin line between users and the community. Actually, most users of the product are part of the community that follows development progress, reports bugs, takes part in feature planning etc. Such a model is usually pretty flexible, and projects are very active, with new code commits every second day or so. (UFO A.I.)
“traditional Open Source model post-1.0” – still a strong role for the forum, wiki and bug tracker, but a slightly higher entry barrier. The dividing line between users and the community is getting more pronounced, but the community still takes a very active role in the direction of the project. There’s a thin line between community and developers. (Planeshift, Wesnoth, FreeCiv)
“silent project model” – games that are passionately developed by a very small group of people (one or two), with low noise, low activism and very little to no community involvement. Those are the projects where the project leader is driven by the fun of creating a game, and while he definitely feels that he creates the project FOR users, he doesn’t need a community watching his hands and screaming his name to do his job. In such a case there may be 50 downloads per year and very little community noise, yet the game progresses with releases every year or two and exists for, say, 10 years. (examples: Boson, Attal, Danger from the Deep, Crystal Core)
“mod model” – here we have a project based on some older game, usually proprietary, where fans of the game create the community and the development group of the project. It’s usually pretty much closed and the barrier is rather high, not because of attitude but because of a lack of time and interest in getting new people involved. In such a model the development happens behind the scenes and the community knows where, while newcomers are just potential users; there’s no effort to put their energy to use for the project. (examples: TA Spring, Legends)
“semi-closed model” – here there’s a group of people who have some external motivator for their work (funding, a university project etc.) and no intention of growing the team. The entry barrier is almost impossible to pass, there’s a strong line between contributors and users, and there’s a very small “community” made of users who stay users. The “community” in such a model is usually just forum users who may report bugs, propose ideas or talk to each other, and the dev team will respond from time to time. It’s very close to the usual closed-source gaming model of community (think: the community of America’s Army, The Witcher, or most other games). (examples: Nexuiz, Glest, Vega Strike)

Notice that the first two (the traditional Open Source models) follow the Bazaar model, while the other three may or may not use it, being closer to the Cathedral one.

I know that some of the projects could fit a few models at the same time (Crystal Core being both traditional OS before 1.0 and the silent model, or UFO A.I. being both the traditional and the mod model), but I tried to choose the most important part of the model, the one which seems to define other aspects of how the project is being developed.

Fair enough. In the next part I’ll present new projects from games part that I care about and will update this area.

Hope you like it 😉
