Recently, a number of performance tests have been released that are based on the top 100 web sites, such as the SpriteMe savings, the IE8 top 100 sites test results, and the JSMeter research. These stand in direct contrast to tests such as ACID3, which attempt to test the future of the web rather than just what’s possible today.

These efforts are outstanding and highly useful, especially the JSMeter work and its valiant effort to redefine performance tests so that they are indicative of today’s web apps.

I completely agree with one of their stated goals:

We hope our results will convince the JavaScript community to develop and adopt benchmarks that are more representative of real web applications.

However, I disagree with their approach: they are testing the performance of today’s already optimized sites! There is nothing in between testing today’s sites and the more unrealistic “test every feature” ACID tests.

I believe more accurate tests for tomorrow would be useful: tests of the sites that are pushing the limits of the web today but are not currently in the top 100. My main objection to comparing performance across the top 100 web sites is this: the top 100 sites are already relatively performant, because they are optimized for what’s possible today. They have improved, and continue to improve, thanks in large part to the work of Steve Souders and others in the performance optimization community.

Because server operations and bandwidth cost significant amounts of money, high traffic web sites generally dedicate considerable resources to optimizing their sites. They also face significant competition and are highly scrutinized for acceptable page load time. Budget and competition mean that popular sites do not deploy code that makes pages load slower than their desired performance threshold. Even more importantly, top 100 sites have the budget to keep their apps working when things change in the future: if you can afford it, you can dedicate people on your team to squeezing out performance improvements in every aspect of the site. Most web apps cannot afford to do this.

When we’re testing the performance of new browsers or analyzing page load performance, we should also be looking at what the top 100 sites will look like, in terms of features and expectations, in five years! So how do we do that today? There’s no simple answer, but here are some ideas:

  • Test popular web apps, e.g. mint.com populated with large amounts of data
  • Test apps that don’t support IE6, e.g. Google Wave
  • Test all sections of popular sites, not just the home page, through an automated performance test harness (a rough sketch of such a harness follows this list)
  • Test ridiculous configurations of popular applications, e.g. enable every feature in modular applications until they slow down
  • Test apps over long periods of time in the browser, not just initial page load time
  • Test 50 apps, each in a different tab, all at once, and see how fast you can make a browser like Firefox or IE crash!
  • Test on throttled networks that emulate the profiles of mobile and satellite networks, slow hotel wi-fi networks that often limit the length and duration of connections, corporate proxies, tech conferences, and countries with overloaded pipes (e.g. YouTube in New Zealand)
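
To make the harness idea a bit more concrete, here is a minimal sketch of what I mean, in Python with Selenium WebDriver (my assumption; any browser automation tool would do). The URLs, run counts, and the use of wall-clock time around driver.get() as a stand-in for page load time are all placeholders, not a finished benchmark.

```python
# Minimal sketch of an automated page-load harness: load several sections of a
# site (not just the home page) repeatedly and report median load times.
# Assumes Selenium WebDriver and a local Firefox install; URLs are placeholders.
import time
from statistics import median

from selenium import webdriver

SECTIONS = [
    "https://example.com/",          # home page
    "https://example.com/search",    # hypothetical app sections
    "https://example.com/settings",
]
RUNS_PER_PAGE = 5

def time_page_loads(driver, url, runs):
    """Load `url` `runs` times and return a list of wall-clock load times."""
    timings = []
    for _ in range(runs):
        start = time.time()
        driver.get(url)  # blocks until the browser fires the load event
        timings.append(time.time() - start)
    return timings

if __name__ == "__main__":
    driver = webdriver.Firefox()
    try:
        for url in SECTIONS:
            timings = time_page_loads(driver, url, RUNS_PER_PAGE)
            print(f"{url}: median {median(timings):.2f}s over {len(timings)} runs")
    finally:
        driver.quit()
```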

Only when browsers are pushed to their limits do we see where they break down, and how sites break them. We also need tests and tools (such as instrumented usage of YSlow, PageSpeed, SpeedTracer, etc.) comparing the most complex apps and how they perform across the various browsers, as today’s complex app is potentially tomorrow’s median site.
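
Along the same lines, a cross-browser comparison can start as something as simple as the sketch below: the same pages and the same measurement, run in more than one browser. Again, this is just an assumed setup (Selenium with local Firefox and Chrome installs, placeholder URLs), not a substitute for properly instrumented tools like YSlow or SpeedTracer.

```python
# Sketch: run the same set of pages in two browsers and compare median load
# times side by side. Assumes Selenium WebDriver with Firefox and Chrome
# installed locally; URLs are placeholders, not a real benchmark suite.
import time
from statistics import median

from selenium import webdriver

PAGES = ["https://example.com/", "https://example.com/app"]  # hypothetical pages
RUNS = 5

def median_load_time(make_driver, url, runs):
    """Launch a fresh browser, load `url` `runs` times, return the median time."""
    driver = make_driver()
    try:
        timings = []
        for _ in range(runs):
            start = time.time()
            driver.get(url)  # blocks until the load event fires
            timings.append(time.time() - start)
        return median(timings)
    finally:
        driver.quit()

if __name__ == "__main__":
    browsers = {"firefox": webdriver.Firefox, "chrome": webdriver.Chrome}
    for url in PAGES:
        row = ", ".join(
            f"{name}: {median_load_time(make, url, RUNS):.2f}s"
            for name, make in browsers.items()
        )
        print(f"{url} -> {row}")
```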

To be clear, I’m not saying “don’t optimize for today”. I’m saying: stop comparing cutting-edge sites to Google search results. Lumping the two together in a common test is like comparing apples and oranges just because they are both round fruit.