Analyze your code, errors, interface, and marketing effectiveness with dojox.analytics

By on March 13, 2008 4:14 pm

It’s not very often that I get to work on some software that has the potential to appeal to developers, testers, designers, and the marketing team all at once. And of course when I do get to work on something like that, it usually means there is a significant amount of pressure to get it done and done quickly. My work on dojox.analytics has been one of those rare instances when I’ve been able to work in peace on writing simple and useful code that can entertain a wide variety of use cases.

dojox.analytics is a small project, both in aspirations and in size. It has a simple goal of logging browser and application data to the server for review. This data can be used to monitor application performance, effectiveness, and quality, or it can be used for custom data collection to identify or monitor a business-specific use case.

The software is a tiny logger with a very loosely defined plugin system. Each plugin is an object that monitors some specific aspect of an application or its environment and pushes the data it collects to the main logger, which in turn pushes this data to a server at a configurable rate. Currently there are plugins for the console, window info, Dojo Toolkit info, mouse position sampling, and mouse click events. Not too complicated, not too difficult, but it opens up a world of utility.

None of this, of course, is a new idea; we build on what exists and what we can see as other uses for a utility. In this case there are a number of different products that do something similar, Google Analytics not least among them. There are also other products such as Firebug for iPhone that essentially do the same thing, but for an entirely different purpose. dojox.analytics is meant only to provide client side code that can enable these other projects, and do so in a way that is simple and will not get in the way of the loading or performance of an application.

What is in dojox.analytics?

The _base package of dojox.analytics defines a singleton, which is designed to be loaded at the beginning of an application but really only needs to be loaded before any of the plugins, so that they have a logger to attach to. The core code has basically one useful method, addData().

dojox.analytics.addData("SuperImportantModule",
	"There was a very serious error here");

addData() takes an arbitrary number of parameters, packages them as a single data point, and adds that data point to a send queue. A configurable poller periodically processes the queue, sending any newly received data off to the server. The polling interval is defined by the “sendInterval” parameter, which is passed as a djConfig option. The data is sent to the URL defined by “analyticsUrl” using the method defined by “sendMethod”. The method is either “xhrPost” or “script”, with the default being script-based I/O (JSONP style).
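The queue-and-poller behavior described above can be sketched in a few lines. This is a minimal stand-in, not the actual dojox.analytics source: createLogger, the send callback, and the "/analytics" URL are hypothetical names chosen for illustration, and only sendInterval and analyticsUrl come from the article.

```javascript
// Hypothetical stand-in for the logger core; not the dojox.analytics source.
// `send` is an assumed callback that delivers one batch to the server.
function createLogger(config, send) {
  const queue = []; // data points waiting to be sent
  let timer = null;

  // Package all arguments as a single data point and queue it.
  function addData(...values) {
    queue.push(values);
  }

  // Send everything queued since the last flush, then empty the queue.
  function flush() {
    if (queue.length === 0) { return; }
    send(config.analyticsUrl, queue.splice(0, queue.length));
  }

  // Poll the queue every sendInterval milliseconds.
  function start() {
    if (timer === null) { timer = setInterval(flush, config.sendInterval); }
  }

  function stop() {
    clearInterval(timer);
    timer = null;
  }

  return { addData: addData, flush: flush, start: start, stop: stop };
}

// Usage: collect batches in an array instead of making a network request.
const sent = [];
const logger = createLogger(
  { analyticsUrl: "/analytics", sendInterval: 5000 }, // assumed values
  function (url, batch) { sent.push({ url: url, batch: batch }); }
);
logger.addData("SuperImportantModule", "There was a very serious error here");
logger.flush();
```

Passing the transport in as a callback keeps the sketch independent of whether the real request is an XHR POST or a script tag.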

If you aren’t careful, or are overzealous, you can collect a lot of data quickly, and a script-based request can then exceed the URL length limit imposed by browsers, a limit that is most restrictive in IE. To avoid related problems, a “maxRequestSize” parameter can also be defined (it defaults to 4000), which prevents any request from being larger than the given size. For IE, the request size is capped at a lower value regardless of the “maxRequestSize” specified. Requests that are larger than the maximum size are automatically split up and delivered as multiple requests. No data is sent, regardless of what has been collected, until after the page load event has fired and the application has been given the opportunity to start up. We want to avoid disturbing the performance of the application at all costs. It’s better to lose the data than to get in the way.
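One way to picture the splitting step is the sketch below. It is an assumption about the general technique, not the plugin's real algorithm: splitIntoRequests is a hypothetical helper, and it measures size as the length of each data point's JSON form, ignoring the few extra characters that wrapping a batch adds.

```javascript
// Hypothetical helper: group data points into batches whose combined
// serialized length stays within maxRequestSize characters.
function splitIntoRequests(dataPoints, maxRequestSize) {
  const requests = [];
  let current = [];
  let currentSize = 0;

  dataPoints.forEach(function (point) {
    const size = JSON.stringify(point).length;
    // Start a new request when adding this point would exceed the limit.
    if (current.length > 0 && currentSize + size > maxRequestSize) {
      requests.push(current);
      current = [];
      currentSize = 0;
    }
    // A single point larger than the limit still goes out alone.
    current.push(point);
    currentSize += size;
  });

  if (current.length > 0) { requests.push(current); }
  return requests;
}

// Usage: with a limit of 10 characters, three short strings need two requests.
const batches = splitIntoRequests(["aaaa", "bbbb", "cc"], 10);
```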

That’s pretty much all there is to the base; the really interesting work is in the plugins.

consoleMessages

The consoleMessages plugin connects to a definable set of methods on the console object and passes any of their parameters to the logger to be sent to the server. By default these methods are error, warn, info, and rlog. The plugin verifies the existence of the console object, creating one if necessary. It then attaches to each of the methods or, if a method doesn’t exist, creates it. The method name simply gets added to the server logs as part of the addData call. Internally, the plugin wraps addData using dojo.hitch, for example: dojo.hitch(dojox.analytics, "addData", "consoleMessage", methodName, arguments); Any methods that didn’t already exist on the console object are added to it, but calling them only logs data to the server. If the method already existed on the console object, then calls to it are logged to the console as normal and are also logged to the server. The “rlog” method is created by default so that an application developer can log specifically to the server without logging to the console, even when Firebug is enabled. As many of these methods as you like can be added to provide the exact granularity you want.
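The wrapping behavior can be illustrated without Dojo. In this sketch, wrapConsoleMethods is a hypothetical function, target stands in for the console object, and addData stands in for dojox.analytics.addData; the real plugin uses dojo.hitch rather than the plain closures shown here.

```javascript
// Hypothetical sketch of wrapping console methods so calls reach the server.
function wrapConsoleMethods(target, methodNames, addData) {
  methodNames.forEach(function (name) {
    const original = typeof target[name] === "function" ? target[name] : null;
    target[name] = function (...args) {
      // Always queue the call for the server, tagged with the method name.
      addData("consoleMessage", name, args);
      // Keep normal console output when the method already existed.
      if (original) { return original.apply(target, args); }
    };
  });
  return target;
}

// Usage: a fake console with only `error`; `rlog` gets created by the wrapper.
const printed = [];
const logged = [];
const fakeConsole = { error: function (msg) { printed.push(msg); } };
wrapConsoleMethods(fakeConsole, ["error", "rlog"], function (...values) {
  logged.push(values);
});
fakeConsole.error("boom");            // console and server
fakeConsole.rlog("server only");      // server only
```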

dojo

The dojo plugin packages up the information that the Dojo Toolkit sniffs on load such as browser information, Dojo Toolkit version, etc.

window

The window plugin collects information from the window object and packages it up for collection.

idle

The idle plugin tracks whether a user has become idle and/or regained activity. The length of time before becoming idle is controlled via the “idleTime” parameter in djConfig.
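As a rough illustration of idle tracking, the sketch below records a data point when the idle threshold passes and another when activity resumes. Everything except idleTime is an assumption: createIdleTracker and the "isIdle" and "isActive" labels are hypothetical, and the current time is passed in explicitly so the logic can be followed without real timers or DOM events.

```javascript
// Hypothetical idle tracker; not the dojox.analytics idle plugin itself.
function createIdleTracker(idleTime, addData) {
  let lastActivity = 0;
  let idle = false;

  return {
    // Call on any user activity (mouse move, key press, and so on).
    activity: function (now) {
      lastActivity = now;
      if (idle) {
        idle = false;
        addData("idle", "isActive", now);
      }
    },
    // Call periodically to see whether the idle threshold has passed.
    check: function (now) {
      if (!idle && now - lastActivity >= idleTime) {
        idle = true;
        addData("idle", "isIdle", now);
      }
    }
  };
}

// Usage with an idleTime of 1000 ms and hand-fed timestamps.
const idleEvents = [];
const tracker = createIdleTracker(1000, function (...values) {
  idleEvents.push(values);
});
tracker.activity(0);
tracker.check(500);     // still active, nothing recorded
tracker.check(1000);    // threshold reached, records "isIdle"
tracker.check(2000);    // already idle, nothing recorded
tracker.activity(2500); // records "isActive"
```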

mouseClick

The mouseClick plugin tracks any time the mouse is clicked in the window. It records information such as mouse position and target information. This can be used to track clicks on items that lead away from a site or page. Every attempt is made to get this data sent off to the server before the page is lost. This often succeeds, though not 100% of the time (sometimes the data doesn’t get logged because the browser moves on to the next page before the final log has been sent, or at least before it has been completely sent). Any suggestions for improvement here are welcome!

mouseOver

The mouseOver plugin simply samples the mouse position every X milliseconds, where X is defined by the “sampleDelay” parameter, which defaults to 2500 ms. The data is similar to that of the mouseClick plugin, including the targets of the items the mouse is over and the mouse coordinates. There is also a “targetProps” parameter that allows you to define which target properties of the event (target, originalTarget, explicitOriginalTarget) you want to track. Note that messages are converted to JSON before they are sent off to the server, so care needs to be taken not to include targetProps that would create infinite recursion in dojo.toJson().
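The recursion concern comes from DOM nodes pointing at each other (a node references its parent, which references the node again). One way to stay safe, sketched here as an assumption rather than as the plugin's actual approach, is to copy only primitive values from each tracked target. trimEvent and its attributes argument are hypothetical, and plain objects stand in for the event and DOM node.

```javascript
// Hypothetical helper: build a JSON-safe sample from a mouse event by
// copying only string, number, and boolean attributes of each target.
function trimEvent(event, targetProps, attributes) {
  const sample = { x: event.pageX, y: event.pageY };

  targetProps.forEach(function (prop) {
    const node = event[prop];
    if (!node) { return; } // e.g. originalTarget is absent in some browsers
    sample[prop] = {};
    attributes.forEach(function (attr) {
      const type = typeof node[attr];
      if (type === "string" || type === "number" || type === "boolean") {
        sample[prop][attr] = node[attr];
      }
    });
  });

  return sample;
}

// Usage: a fake node with a circular reference through parentNode.
const node = { id: "save", tagName: "BUTTON" };
node.parentNode = { firstChild: node };
const fakeEvent = { pageX: 10, pageY: 20, target: node };
const sample = trimEvent(
  fakeEvent,
  ["target", "originalTarget"],
  ["id", "tagName", "parentNode"]
);
const json = JSON.stringify(sample); // safe: no object references were copied
```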

All in all, this is a simple set of functions that collect data and log it to the server in an unobtrusive fashion. Of course, this is the easy part. The real work is in analyzing, making sense of, and using the data. With other applications such as Google Analytics, the logging goes to the service provider, which then does this data analysis for you. Google undoubtedly has data mining experience as well as a large amount of processing power. At the same time, others like to log data to their own servers. Maybe they don’t like to share their data with third parties, or perhaps they simply want to combine this data with data from other sources such as their log files.

This data can clearly be used for marketing analysis to see if changes to the platform need to be made, but it is the other analysis that it can open up that I’m interested in.

  • Testing and Quality Control: The system can be used to collect data from applications in beta or in testing or even to simply record serious errors for applications that are live in production.
  • UI analysis: Information such as common use paths, heatmaps, and other such User Interface data can be collected and analyzed.
  • Debugging: Simply logging all console information to the server can provide an easy “remote” debugging console for any browser, though it is particularly useful for debugging IE given all the scratches on my cornea from previous IE debugging sessions (though the recent IE8 has a Firebug-like utility that sounds promising).

Now you might be thinking, yah there are possibilities, but do I want to go through the trouble? Is it hard? Do I have to learn the Dojo Toolkit? Can I use it with Foo Package?

The answer is that it is easy and works with anything. If it doesn’t, that’s a bug in my mind. There are essentially two ways you might want to use the package. One is for Dojo Toolkit users. These lucky individuals with incredible foresight only have to dojo.require() the package like anything else they are already using, or they can use the Dojo Toolkit build system to build dojox.analytics into their own custom build layers. Unfortunately, there are plenty of people out there who don’t or can’t use the Dojo Toolkit, and they deserve the benefits just as much as I do. For them, we provide a custom Dojo Toolkit build with analytics automatically included. This is a single script tag that can be included in any arbitrary page. Configuration parameters are set as attributes on the <script> tag, just as they are with djConfig, and no other code or configuration is required. This can even be loaded cross-domain from the AOL CDN. Inclusion might look something like this:

I don’t think it can get much simpler than that.

What about the size of this package, you ask? While I’m aiming to make a custom minimized build of the dojo _base that will get included as part of the above build, I haven’t completed that yet. Currently the build is 26KB compressed and gzipped, about 1KB larger than the standard dojo.js. Realistically, I think this can be shrunk a little further by removing things from base that aren’t needed for this project. The Google Analytics ga.js file is currently 19KB, so I don’t think this is doing all that badly for a first pass. This also assumes that _all_ plugins are used, which is unlikely to be the case in many instances, allowing for further optimization.

Thoughts? Suggestions? Other use cases?

Comments

  • I’d like to start using it, but dojotoolkit v 1.0.2 doesn’t have an alpha version of this code, the only way to start using it is downloading source and including it in dojox or you have a package with all code needed to use dojox.analytics?

  • Yes, it didn’t get done for 1.0.2, which I failed to mention in the article. It is in 1.1. However, you can download just that one dojox package (the analytics.js file and analytics/ folder) and it should work fine with 1.0.2.

  • Is it the analytics.js file or the _base.js file?
    I guess it is the _base.js file, renamed to analytics.js.

  • Cristian: There are profiles in the dojox/analytics/profiles/ directory. One of these profiles will create a version of dojo.js that has analytics built in, and the other will generate analytics.js which is just the analytics package. You could rename the first (dojo.js + analytics.js combined) to dojoAnalytics.js or whatever you wanted and include it with a script tag. The other build (analytics.js) could be included either a) as a script tag after a standard dojo.js script tag or b) as a dojo.require(“dojox.analytics”); The second build is designed to replace the analytics.js file that is in the dojox directory.

  • This is great!

    You already touched on it, but this really lends itself to usability and tracking mouse movements, which is sort of the poor man’s eye tracker.

    And there’s nothing worse than launching an app and finding out that you lost customers during that critical first push, because they were getting errors not picked up in QA. No excuse now, you can (should) log those errors and send them to the server.

  • Cristian:

    I now understand your problem. There was a file missing in the commit (analytics.js from the dojox root directory). I have updated svn now. This was a file of includes that basically requires the _base file.

  • David

    Dustin-

    Is there any page available that handles the data logged?

  • Dipesh

    Excellent start! How difficult would it be to extend the base API so to work with custom formatters and schedulers? For example I’d like to update the logged entries on a server only upon dojo.onUnload event and not specific time interval. I would also like to update the server only when a specific queue size is reached. There may be other variations as well.

  • David,

    No, currently this is only the client side of things. I’ve been (very) slowly working on some ideas for the other side. I have written a comet-based tail -f equivalent that will push the events to a page as they come in so you can watch them, but the page itself doesn’t do much in terms of making sense of the data…it just regurgitates it.

    Dustin

  • Dipesh,

    Thanks.

    In regards to custom formatters, I think you are really referring to the structure of the individual data messages, correct? The base api doesn’t have any requirements as to how those messages are formatted as long as they are a JSON object. The formatting of the individual messages is done by the plugins and simply pushed into the queue.

    As far as schedulers, I don’t think it would be all that hard, but I’m also not sure what you have suggested will get you where you want to go easily either.

    Making it push only after certain queue size is reached would be fairly straight forward, though I’d think you’d still want to use the interval timer as this keeps things from interfering with ongoing actions. For example, you’d setup the interval timer to only push data to the server if the minimum queue size had been reached.

    As far as not pushing until onUnload, while this would be pretty simple to code, I would expect it to be fairly unreliable. We already make every effort to push any unsent data to the server onUnload, but there is no guarantee this will succeed. Functions in onUnload (at least i/o calls) are such that we make the request, but since the browser is aborting the page, it might not have time to finish. For example, one of my goals out of this was to be able to record someone clicking on an <a> tag and have it be reported on the server and allow the link to continue on as normal.

    Hope that helps. Feel free to drop me an email if you would like to discuss further or need additional help.

    Dustin

  • Dipesh

    Thanks. Agree with you on all the points. The only implication with using a timer is the possibility of loss of data when navigating away. Given onUnload is unreliable on IE, may not have any other option. Too bad but understandably a difficult problem to solve.

  • Dipesh,

    Actually, you currently get the best of both worlds. It uses a timer throughout the ‘normal’ execution of the application. However, onUnload, it makes every attempt to send off anything that remains in the queue regardless of the timer’s status. So really it’s already taking care of that (to the best of its ability).

    Dustin