Categories
main mozilla tech

We have an important story to tell!

Hey @flod (and Giacomo!)! You touched on interesting topics in your latest post, and when I started crafting my comment it got so long that I decided to use my own platform to deliver it. Blog-to-blog discussion style! 🙂

I’ll try to respond, but please bear in mind that this is my personal opinion, nothing official.

You started by pointing out a set of efforts that you find either of questionable value or not “leader”-style: things like the Fx UI direction, multiprocessing, Jetpack, or Personas.

In particular, you focused on two dimensions:

  1. Are those efforts unique? Innovative? Or do we chase others?
  2. Are those efforts valuable elements that fit into the Mozilla Manifesto vision?

You question both, and I believe you have every right in the world to do so. We may disagree, but we should talk about this, and I find the fact that you express your concerns in public a good sign of the health of our ecosystem.

So, down to some of the points you raised. I humbly disagree with your notion of “cloning Chrome”. I believe this concept of a “fresh Chrome” is a cognitive bias that we (by “we” I mean most Mozillians I know) buy into far too easily. Chrome is great! But it is not “innovative” in the sense many people talk about. We so easily take for granted a lot of the inventions we brought to the world, and Chrome, yes, just looked at Firefox and learned from us. That is awesome, but that’s what they did!

Once again, Google is looking at an open source project and learning from it how to build a web browser. No, wait! Google, Microsoft and Apple are all doing it. Now, how awesome is that? Think of all the things Firefox has brought to the browser landscape since its 1.0 version, and notice how many of those innovations are now in IE8, Safari and Chrome.

They also have brilliant developers, and *just* reproducing everything of value in Firefox would have been a waste of their talent, so, among other things, they got a free ride to fix things we struggled with. Now it is our turn to fix those, and there’s nothing unhealthy in that. What would be unhealthy, in my opinion, is to pretend we don’t see them and to defend our approach as “the right one” (remember Bill Gates’s first comments on Firefox 1.0?).

The ability to go multi-process is important. The majority of perceived performance improvements that Chrome has (and that Fx 3.5/3.6 brought) come from two things: tricking the user’s eye (showing the UI 200 ms before it’s usable) and moving expensive work off the main thread. (I’m sure the performance team can explain that better.) The fact that you have to restart your browser when you install an extension is a UX bug. No user expects it or wants it. It brings no value, and the only reason we have it is a technical limitation.

For years we raised the bar for how a web browser should work. We set the standards in many areas. Opera set some, Safari set some, IE set some as well. Now Chrome has set some standards, and we just have to match them, ideally using the intelligence of our brilliant dev team to push further and innovate (the Jetpack team is far from just fixing issues; they aim to take extensions to a new level, and they should be aiming for nothing less!). There is no reason to worry: we are making a great web browser better, and it would be unwise to ask our users to trade those nice features for the ability to use a browser with Mozilla values. Why not give them the best of both?

Personas is an interesting project. I remember my initial reaction to Personas was mostly “meh, nothing interesting”. I considered it a minor feature and recognized that I’m not the target audience (and neither, I think, are you). But on the day of the Fx 3.6 launch I learned my lesson when I received an amazing amount of feedback from my non-geek friends precisely about Personas and how the project resonated with them. It amazed me how emotional people got: “you’re missing a Real Madrid Persona, but you have a Barcelona one!”, “the pink one is soooo cute”, or “my browser feels so much more personal now that my (you name it) favorite actor/actress or the symbol of my subculture is there”. Look at the number of Personas people have created in such a short time! It is an amazing project, and only now do I see how it fits into Mozilla’s mission and vibe.

UI, on the other hand, is a much more complex thing, because it is related to personal taste and fashion (and fashion itself is, from a sociologist’s point of view, a bizarre phenomenon of human culture). But IMO it all boils down to a simple matter of cyclical change. Windows 7 brought a new UI, and IE8 followed. Chrome followed IE8, Opera followed IE8 or Chrome or both, and we’re following W7 or IE8 or Chrome or Opera, you name it. People expect a browser to match the visual style of their operating system, and Windows 7 is going to be the OS of choice for the vast majority of the world, so it will set the UI standard for the OS and its apps for quite some time. We can like it or hate it, but that’s going to happen, and Firefox on Windows should, IMO, fit the OS style. What we do beyond that is the major issue, and I believe our UI team is trying to add value on top of that. Based on past experience, I’m sure they’ll do a great job and we’ll see others learning from us. That’s how it works here. Would you prefer vendors to ignore each other’s accomplishments, or deny them?

I disagree with your perspective on the mobile world. I, for one, am waiting for Fennec on Android, and I know a lot of people who are. I’m excited to think about how we can fit the Firefox experience into Windows Mobile 7, and I’m sure it’ll be an exciting journey. Mozilla Messaging is going to generate projects related to forms of communication, a topic I find extremely important, so I have no worries about its sustainability. Our embedding story is nothing to be proud of, but maybe it was a trade-off we had to make in order to achieve what we aimed for. I share your concerns here, and I see many people on the platform team discussing what we can do to make it better.

I see Mozilla as quite self-aware of many of the issues you raised, internally diverse enough to have people raising concerns, and open enough to provide a place to talk about them; your post is part of that.

Bottom line

But ultimately, I believe your concerns would all be valid if that were all that is happening in the Mozilla project: if the whole community worked only on Personas, or marketing, or UI. But is that the case? Do you really feel that the elements you describe represent, as you wrote, the “Mozilla project as a whole”?

I see Mozilla as a meta-project involved in a huge number of projects that touch an amazing variety of issues, and it is very hard to pin it down to one or two of them and call those “representative” of the community.

No matter what you think of Personas, I don’t think you can say that this one effort stands for what Mozilla is doing with Drumbeat, Bespin, Raindrop or Weave. Whether or not you find Jetpack valuable, I hope you were not lured by press reports foretelling the end of extensions as we know them. Can you name an example of a project that generated tens of thousands of dependencies and was irresponsibly killed by Mozilla? Have we ever done something like that? Then do you really think we will do it now?

We generate an amazing number of projects of very different kinds. Globally, our community is very diverse, and its local communities are at different points in their journey. Some communities need more marketing (the UK, Korea, Sweden?). Some, like Italy, Poland or Germany, may have enough internal marketing to consider Mozilla’s global marketing effort focused on promoting Firefox useless for them, or even “too much”. We, the Mozillians who live in those countries, should act as a membrane that adjusts the signal and gives feedback to our fellow Mozillians worldwide about what we need and what we don’t. Firefox has a 52% market share in Poland, and we need things like a developer community or foundation-like efforts to use the potential and trust we have built over the years as a platform for bringing Mozilla values further, so we work with Mitchell and the Mozilla Foundation, and from time to time I try to get Paul Rouget’s attention 😉 At the same time, on the PR and marketing side, we work with a Polish PR agency, Barbara and others to balance the amount of press we generate, so we don’t waste time convincing the already convinced. That’s just adjusting. I believe we should do this much more often, in the many countries that are ready for different aspects of the Mozilla project, to stimulate and energize Mozillians.

An example? Here you are. You think we focus too much on marketing sites? Well, then focus on other aspects! I believe the concept of “we have to localize every website into every language” is no longer sustainable. We will generate more websites/webapps, and our local communities will decide which ones to promote locally. We don’t have to have everything localized everywhere, and that gives you great power to adjust the signal to your locale. Mozilla should make sure all websites/webapps/apps are localizable and let the community decide which ones to localize. Focus on the ones that are most important to you!

We have so many projects to pick from! They are of different kinds, using different techniques to address different aspects of the common set of values expressed in the Mozilla Manifesto. They also differ in how you can think about them.

Some of them are truly unique, experimental and massive: think of our JIT approach (it took a ride from MtV to the SF airport for Taras to explain to me what is so different about our JIT approach, but now I’m proud of what we’re aiming for), think of L20n, think of Ubiquity, Bespin, Raindrop or Drumbeat.

Some of them apply the Mozilla way to existing concepts. Weave is not innovative because it allows sharing data, but it brought privacy into the picture. SUMO is not the very first support platform ever, but the way we approach the concept of support is innovative and “Mozillian”. Our Metrics team is not the only metrics team in the world, but they do a hell of a lot of innovation in making their work public and open to contribution, which is pretty unique. We may not be the first project ever to have a marketing team, but we approach marketing and PR in a unique and innovative way.

Some of them are just a catch-up game, and that’s not bad either. We have 350 million users. If someone brought a good idea to the world of web browsers and we can make sure that 350 million Internet users can use the Internet more safely, easily and better, then I find that a pretty important thing to do, and I definitely expect such actions from other vendors (think: partial upgrades).

Ultimately, many of them are a mix of the ones above and as long as we are able to generate new projects that resonate with what people find important on the Internet, I think Mozilla makes an impact and has a bright future that we, including you and me, have to shape.

Categories
main po polsku tech

Blocking the flood of MSN robots

Around the end of December, the hosting provider for aviary.pl, Dreamhost, got in touch with me.

They wrote that, unfortunately, they had to ask me to take bugs.aviary.pl down because of the huge load the site was putting on their connection.

This was, of course, quite a blow to the core tool Aviary.pl relies on for its work, and on top of that it hit an area I don’t know all that well (I don’t know Bugzilla’s code well enough to judge what affects its performance). The person from DH (Dreamhost) told me that our bugs.aviary.pl appeared to be consuming huge amounts of CPU and was logging a very high number of visits from the MSN robot.

That prompted me to set up statistics and start watching. We agreed with DH to shut Bugzilla down until we understood the cause of such a heavy load.

Over Christmas and New Year it was hard to find time to work on this, but when I finally got to it, the logs were merciless. Our Bugzilla serves our team (a little over 20 people) plus visitors, which by my estimate should amount to roughly 60-80 people, about 150 visits and about 600 pages a day.

Instead, in October we logged an average of 6,500 page fetches a day, 7,000 in November, and up to 8,000 in December. That is more than ten times the roughly 600 I expected, and it matters even more because a large share of those requests were complex search queries and attachments. Such queries put the heaviest load on the server and are the slowest to serve.

The next finding was that about 84% of this traffic came from hosts in the 65.55.x.x and 66.249.x.x ranges, and that the most frequent user agent was msnbot/2.0b, which fetched 157,000 pages in November, i.e. about 5,200 pages a day! Further down the list were Googlebot with 2,500 and Yahoo Slurp! with 166 requests a day.
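Tallies like these (share of traffic per address range, top user agents) can be reproduced from a raw Apache access log with a few lines of Python. This is a minimal sketch, not necessarily how the numbers above were produced: it assumes the standard "combined" log format seen in the excerpt further down, the function names are hypothetical, and grouping by the first two octets simply mirrors the 65.55 / 66.249 ranges mentioned in the text. It also assumes reverse-DNS hostnames embed the IP, as msnbot's do.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# A dotted IPv4 address, or one embedded in a reverse-DNS name such as
# msnbot-65-55-232-33.search.msn.com (dashes instead of dots).
IP_RE = re.compile(r"(\d{1,3})[.-](\d{1,3})[.-](\d{1,3})[.-](\d{1,3})")


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def ip_prefix(host):
    """First two octets ("65.55") of an IP or IP-bearing hostname, else None."""
    m = IP_RE.search(host)
    return f"{m.group(1)}.{m.group(2)}" if m else None


def summarize(lines):
    """Count requests per two-octet prefix and per user-agent string."""
    prefixes, agents = Counter(), Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip malformed lines
        prefixes[ip_prefix(entry["host"])] += 1
        agents[entry["agent"]] += 1
    return prefixes, agents


def share(prefixes, wanted):
    """Fraction of all counted requests that came from the given prefixes."""
    total = sum(prefixes.values())
    return sum(prefixes[p] for p in wanted) / total if total else 0.0
```

Typical use would be `prefixes, agents = summarize(open("access.log"))` (the file name is an assumption), followed by `share(prefixes, ["65.55", "66.249"])` and `agents.most_common(3)`.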

My first reaction, of course, was to add a robots.txt, which should have settled the matter, and to search Google for similar cases. Lesson one: there are good robots and bad robots. Good ones check robots.txt and, when it says “no”, don’t crawl; bad ones ignore robots.txt.
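What a "good" robot does can be shown with Python's standard `urllib.robotparser`, which implements the crawler side of the robots.txt convention. This is a minimal sketch: the robots.txt content below is a hypothetical example (the post does not show the actual file), and the helper names are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut msnbot out completely and keep every other
# crawler away from Bugzilla's heaviest endpoints (searches and attachments).
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /

User-agent: *
Disallow: /buglist.cgi
Disallow: /attachment.cgi
"""


def make_parser(text):
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def polite_crawler_may_fetch(parser, user_agent, path):
    """What a well-behaved crawler decides before requesting `path`."""
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    print(polite_crawler_may_fetch(rp, "msnbot/2.0b", "/show_bug.cgi?id=1"))     # False
    print(polite_crawler_may_fetch(rp, "Googlebot/2.1", "/attachment.cgi?id=1"))  # False
    print(polite_crawler_may_fetch(rp, "Googlebot/2.1", "/show_bug.cgi?id=1"))    # True
```

The check happens entirely on the crawler's side; the server enforces nothing. A client that switches to a browser user agent such as `Mozilla/4.0 (compatible; MSIE 6.0; ...)` no longer matches the `msnbot` group at all and falls through to the `*` rules.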

Of course, it soon turned out that msnbot/2.0b, which visits me from hosts such as msnbot-65-55-104-75.search.msn.com and msnbot-65-55-104-59.search.msn.com, is a bad robot that ignores the file prepared for it (even though it reads it, oh yes it does, roughly every 30 seconds, 24 hours a day!).

An interesting reaction (also observed by other admins) is that once robots.txt is in place, msnbot starts cheating: it requests robots.txt as msnbot, learns that it isn’t welcome, and then starts indexing pages while posing as Internet Explorer 6 (or 7, as the log below shows).

An example of this behavior:

65.55.51.69 - - [03/Feb/2010:14:05:41 -0800] "GET /robots.txt HTTP/1.1" 200 319 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.110.210 - - [03/Feb/2010:14:06:12 -0800] "GET /attachment.cgi?id=453&action=diff&context=patch&collapsed=&headers=1&format=raw HTTP/1.1" 200 841 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607)"
msnbot-65-55-232-33.search.msn.com - - [03/Feb/2010:14:06:30 -0800] "GET /attachment.cgi?id=132&action=edit HTTP/1.1" 200 477 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)"
msnbot-65-55-232-33.search.msn.com - - [03/Feb/2010:14:06:44 -0800] "GET /attachment.cgi?bugid=189&action=viewall HTTP/1.1" 200 477 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)"
65.55.110.210 - - [03/Feb/2010:14:06:47 -0800] "GET /attachment.cgi?id=784&action=diff&context=patch&collapsed=&headers=1&format=raw HTTP/1.1" 200 2160 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)"
msnbot-65-55-232-33.search.msn.com - - [03/Feb/2010:14:06:50 -0800] "GET /attachment.cgi?bugid=189&action=viewall HTTP/1.1" 200 477 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)"
msnbot-65-55-104-75.search.msn.com - - [03/Feb/2010:14:07:16 -0800] "GET /attachment.cgi?bugid=1098&action=viewall HTTP/1.1" 200 477 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322)"

An attentive reader will notice that the bot uses different UA strings and different IP addresses (there are actually many more of them, but they rotate every few hours, so it was hard to capture that neatly in a log excerpt), and that the requests reveal MSN either through the hostname, through the UA, or not at all (the second line)! In this situation, spotting the connection (along with visits from 65.55.*.* to a Polish site run by a Polish localization team) takes some reasoning and is hard to express algorithmically.
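One narrow piece of that reasoning can be written down: an address range that fetches robots.txt under a bot's name and then keeps requesting pages under a browser UA. The sketch below assumes log entries have already been split into (host, path, user agent) tuples in log order, and correlates on the first two octets; both the function names and the two-octet granularity are assumptions for illustration, not the author's method.

```python
import re

# Dotted IPv4, or one embedded in a reverse-DNS name like
# msnbot-65-55-232-33.search.msn.com
IP_RE = re.compile(r"(\d{1,3})[.-](\d{1,3})[.-](\d{1,3})[.-](\d{1,3})")


def prefix16(host):
    """First two octets of an IP or IP-bearing hostname, or None."""
    m = IP_RE.search(host)
    return f"{m.group(1)}.{m.group(2)}" if m else None


def find_disguised(entries, bot_token="msnbot"):
    """Flag requests that look like a bot hiding behind a browser user agent.

    `entries` is an iterable of (host, path, user_agent) tuples in log order.
    A request is flagged when its UA does not mention `bot_token` but it comes
    from an address prefix that has already fetched /robots.txt as that bot.
    """
    bot_prefixes = set()
    flagged = []
    for host, path, agent in entries:
        prefix = prefix16(host)
        if prefix is None:
            continue  # no IP to correlate on
        announces_bot = bot_token in agent.lower()
        if path == "/robots.txt" and announces_bot:
            bot_prefixes.add(prefix)
        elif prefix in bot_prefixes and not announces_bot:
            flagged.append((host, path, agent))
    return flagged
```

This only captures the "robots.txt first, browser UA later, same range" pattern; a range this wide can also contain unrelated visitors, so hits are better treated as leads for manual review than as an automatic block list.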

I consider this behavior outrageous, although it later turned out they weren’t the only ones.

Google behaves reasonably sensibly and keeps its crawler IPs stable, which makes it possible to block it by IP. Yahoo does the same. The day before yesterday, however, the WP (Wirtualna Polska) crawler struck; it also ignored robots.txt and spent 3 hours indexing every combination of search queries and every attachment in our Bugzilla.
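When a crawler's addresses stay stable, blocking it reduces to a CIDR membership test. A minimal sketch using Python's standard `ipaddress` module; the block list is an assumption built from the 65.55.x.x addresses in the log excerpt above, not an official list of Microsoft crawler ranges, and `is_blocked` is a hypothetical name. In practice the same rule would normally live in the web server configuration rather than in application code.

```python
import ipaddress

# Hypothetical block list: 65.55.0.0/16 covers the msnbot requests in the
# log excerpt above. Adjust to whatever your own logs show.
BLOCKED_NETWORKS = [ipaddress.ip_network("65.55.0.0/16")]


def is_blocked(addr, networks=BLOCKED_NETWORKS):
    """True if the IPv4/IPv6 address string falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    # Membership across IP versions is simply False, so mixed lists are safe.
    return any(ip in net for net in networks)
```

For example, `is_blocked("65.55.110.210")` is `True`, while a Googlebot address such as `66.249.65.1` is not affected.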

Although I have more problems of this kind, what matters most is what such behavior implies:

1) The load such a robot puts on dynamic sites. Please appreciate the scale! This robot attacks in waves, every hour, 24 hours a day, every day of the month, each time producing a burst of about 100 requests per minute! And these are not requests for static files such as CSS, JS or PNG; they are requests for pages only! The cost and the load this generates… it’s absurd.
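To check whether your own logs show the same hourly waves, you can bucket request timestamps by minute and flag the minutes that reach about 100 requests. A minimal sketch, assuming Apache-style timestamps like those in the log excerpt above; the threshold comes from the figure in the text, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Apache access-log timestamp, e.g. "03/Feb/2010:14:06:12 -0800".
# %b expects English month abbreviations (the default C locale).
APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"


def per_minute_counts(timestamps):
    """Number of requests in each calendar minute."""
    buckets = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, APACHE_TIME)
        buckets[t.replace(second=0)] += 1
    return buckets


def burst_minutes(timestamps, threshold=100):
    """Minutes with at least `threshold` requests, in chronological order."""
    counts = per_minute_counts(timestamps)
    return sorted(minute for minute, n in counts.items() if n >= threshold)
```

Since the point above is that these waves consist entirely of page requests, it makes sense to feed in only the timestamps of page requests (excluding CSS, JS and images) from the suspected address range.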

2) The robot, which according to the logs became active roughly when the Bing search engine launched, ignores the social contract between site authors and search engines and indexes everything, even though it also fetches the robots.txt that forbids it to.

3) What effect does this have on IE statistics? In my very old statistics tool (Webalizer, I think; DH provides it), a huge one. According to it, 80% of bugs.aviary.pl visitors currently use IE6.

Finally, I’ll just add that I’m currently using this .htaccess to block as much of it as I can.