HTTP Proxying to Solve Web Development Problems

By on October 8, 2008 12:30 am

An HTTP proxy server relays requests between the HTTP client (e.g. your browser) and the server, whether that server is out on the web, on an intranet or on localhost. When the proxy is under your control, it is a great place to inspect and debug client-server interactions over HTTP, to log and report on them, and to tune and tamper with both the requests the client makes and the responses the server(s) produce. In this article I’ll show how to use Charles (one such proxy tool) to help solve a range of common web, and especially RIA, development problems.

The remainder of this article features screenshots and descriptions of how to use Charles (a shareware, cross-platform desktop application), but I’ll pause here to mention that it’s only one of many tools that fulfill the same or similar functions. You might know, or care to look up, Fiddler, Squid, Firefox plugins like Firebug, Poster and RestTest, as well as any number of solutions of varying quality in your programming language of choice. Onwards…
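To make the relaying concrete, here is a minimal sketch of a logging forward proxy in Python, using only the standard library. It is an illustration, not how Charles or any of the tools above work: it handles plain-HTTP GET only (no HTTPS CONNECT tunnelling, no request bodies), forwards just a few request headers, and the LOG list, the run() helper and port 8888 are arbitrary choices made for this example.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Headers that describe one connection (or that we recompute ourselves), so
# they are not copied from the upstream response to the client.
SKIP_HEADERS = {"connection", "keep-alive", "proxy-connection", "transfer-encoding",
                "te", "trailer", "upgrade", "content-length", "date", "server"}

LOG = []  # one (method, url, status, content_type) tuple per relayed request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None makes urllib raise HTTPError for 3xx responses, so the
    # proxy relays redirects to the client instead of silently following them.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


# ProxyHandler({}) stops urllib picking up an upstream proxy from http_proxy etc.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), _NoRedirect)


class LoggingProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A client configured to use a proxy sends the absolute URL in the request line.
        if not self.path.startswith("http://"):
            self.send_error(400, "Bad Request", "This sketch only proxies absolute http:// URLs")
            return
        upstream = urllib.request.Request(self.path, method="GET")
        for name in ("Accept", "Accept-Language", "User-Agent", "Cookie"):
            if name in self.headers:
                upstream.add_header(name, self.headers[name])
        try:
            resp = _opener.open(upstream, timeout=10)
        except urllib.error.HTTPError as err:
            resp = err  # 3xx/4xx/5xx responses still carry a status, headers and body
        except urllib.error.URLError as err:
            self.send_error(502, "Bad Gateway", str(err.reason))
            return
        try:
            status, headers, body = resp.getcode(), resp.headers, resp.read()
        finally:
            resp.close()

        content_type = headers.get("Content-Type", "")
        LOG.append(("GET", self.path, status, content_type))  # recorded before replying
        flag = "   <-- check this one" if status >= 400 else ""
        print(f"{status} {content_type or '-'} {self.path}{flag}")

        self.send_response(status)
        for name, value in headers.items():
            if name.lower() not in SKIP_HEADERS:
                self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default stderr line; do_GET prints its own summary


def run(port=8888):
    """Serve until interrupted; point the browser's HTTP proxy at 127.0.0.1:<port>."""
    ThreadingHTTPServer(("127.0.0.1", port), LoggingProxyHandler).serve_forever()
```

Call run() and set your browser’s HTTP proxy to 127.0.0.1:8888; each relayed request is printed with its status and Content-Type, and 4xx/5xx responses are flagged. Redirects are handed back to the client rather than followed, so a 302 appears as its own entry, much as it does in a proxy’s session view.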

HTTP Monitoring

Let’s start simple. A question I’ve seen a few times on the #dojo IRC channel is “why do I get a ‘dojo is not defined’ error in the console?” Why is dojo not defined? Chances are it didn’t load, and chances are it didn’t load because the script path was wrong. You want to rule this out before you dig deeper, and there are lots of ways to check: Firebug’s Net tab, or your server’s access logs. I like to use Charles to flag 404s:

Charles screenshot: dojo.js 404
Each request shows up as a row in the Sequence view, with the response status code and other particulars easily visible.
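If you want the same check from a script, a rough Python sketch can fetch the page, collect its script src attributes and report any that come back with a 4xx or 5xx status. It uses only the standard library and broken_scripts is a hypothetical helper name; it ignores base href elements, scripts loaded dynamically at runtime (for example by a module loader) and everything else a real browser would do.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def broken_scripts(page_url):
    """Return [(script_url, status)] for each <script src> that fails with 4xx/5xx."""
    with urllib.request.urlopen(page_url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, "replace")

    collector = ScriptSrcCollector()
    collector.feed(html)

    broken = []
    for src in collector.srcs:
        url = urljoin(page_url, src)  # resolve relative paths against the page URL
        try:
            urllib.request.urlopen(url, timeout=10).close()
        except urllib.error.HTTPError as err:  # urllib raises this for 4xx/5xx
            broken.append((url, err.code))
    return broken
```

Calling broken_scripts on a page URL returns a list of (url, status) pairs, so a dojo.js referenced by the wrong path shows up with its resolved URL and a 404. Because it uses urllib’s default opener, it also honors the http_proxy environment variable, so its requests can be routed through a proxy and inspected there.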

A variant on this issue is “why is responseXML not defined?” or “why is my callback being passed null?” These are side effects you can run into when your content is not served with the expected MIME type. This too is made readily apparent:

Charles screenshot: blog/feed response is text/plain
Both the icon in the left column of the grid row and the General tab show that this response was delivered with Content-Type: text/plain.

Charles screenshot: blog/feed response Content-Type is text/xml
Once the correction is made on the server side, the page is reloaded (you could also use the Repeat context menu item), and now we can see a text/xml response.

Charles screenshot: XML Response inspector
Charles offers a tree-style viewer for XML response bodies and a formatted, syntax-colored XML Text view, in addition to the Text and Raw views available for all text-type responses.
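The rule behind the responseXML symptom can be written down as a small check. In the current XMLHttpRequest specification, with the default response type, responseXML is only parsed for XML MIME types: text/xml, application/xml, or any subtype ending in +xml (browsers of the day varied at the edges). This Python sketch applies that rule to a raw Content-Type header value; xhr_would_parse_xml is a hypothetical name.

```python
from email.message import Message
from typing import Optional


def xhr_would_parse_xml(content_type: Optional[str]) -> bool:
    """Rough check: is this a Content-Type that XMLHttpRequest parses into responseXML?"""
    if not content_type:
        return False
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()  # lower-cased "type/subtype", parameters dropped
    return media_type in ("text/xml", "application/xml") or media_type.endswith("+xml")
```

Applied to the two screenshots’ headers, text/plain fails the check and text/xml passes; the fix belongs on the server, which should send the right Content-Type.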


Note that because Charles can act as the system HTTP proxy, you can inspect requests from any browser, or in fact from any application that makes HTTP requests and honors the system proxy settings.

Charles screenshot: iTunes version check, showing a 302 (redirect) then the XML response
Here we can see the initial request to itunes.com/version, which returns a redirect; the follow-up request is answered with an XML document describing the most recent version of the software.
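A non-browser client can also be pointed at the proxy explicitly. This sketch, using Python’s http.client, follows redirects by hand and records every hop, which mirrors the redirect-then-XML sequence in the screenshot. trace_redirects is a hypothetical helper, only plain http:// URLs are handled, and passing proxy=("127.0.0.1", 8888) assumes a proxy listening on that port (substitute whatever yours uses).

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(url, proxy=None, max_hops=10):
    """Follow redirects by hand, returning [(url, status, content_type)] for every hop.

    proxy: optional (host, port) of a plain-HTTP proxy to send the requests through.
    """
    hops = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.scheme != "http":
            raise ValueError(f"this sketch only handles http:// URLs, got {url!r}")
        if proxy:
            # Talk to the proxy and send the absolute URL, as a proxied client does.
            conn = http.client.HTTPConnection(proxy[0], proxy[1], timeout=10)
            target = url
        else:
            conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=10)
            target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
        try:
            conn.request("GET", target)
            resp = conn.getresponse()
            resp.read()  # drain the body before closing the connection
            status = resp.status
            content_type = resp.getheader("Content-Type")
            location = resp.getheader("Location")
        finally:
            conn.close()
        hops.append((url, status, content_type))
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
        else:
            return hops
    raise RuntimeError(f"gave up after {max_hops} requests")
```

Each entry is (url, status, content_type); for a redirecting URL you would expect a 3xx entry followed by the final response.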

Accurate Web Mirroring

I’ve attempted, and written, many takes on this problem. The bit that always stumped me was how to (easily) capture only the pages or parts of an application I’m interested in, and how to do that with complete confidence that what I mirrored is equivalent to what a user would see in their browser. I like Charles’ mirroring feature for this task. You pick a directory to build the mirror in, then take your browser of choice and browse those pages, mousing over and clicking buttons as you go to catch all those hidden image and scripted dependencies. Charles creates a directory structure from the request paths, with a file containing each response. It doesn’t rewrite paths in the whizzy way that some dedicated mirroring tools do, but 95% of the time you can set up a new virtual host rooted at that mirror directory and view the result (images, CSS, JavaScript and all) as if it were the real thing.

Charles screenshot: Configuring site mirroring
You can enable mirroring for as specific a URL as you choose. Each response body is saved to the file system, in a directory structure that reflects the host/port/path of each request.
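The URL-to-file mapping itself takes only a few lines of Python. The layout in this sketch (one directory per host, with the port appended after an underscore when present, then the path segments, and index.html for URLs ending in /) is an assumption for illustration, not necessarily the exact scheme Charles uses; mirror_path and save_response are hypothetical names, and query strings are ignored, so /feed?page=2 would overwrite /feed.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def mirror_path(root, url, index_name="index.html"):
    """Map a request URL to a file under root: <root>/<host>[_<port>]/<path segments>."""
    parts = urlsplit(url)
    host = parts.hostname or "unknown-host"
    if parts.port:
        host = f"{host}_{parts.port}"
    path = unquote(parts.path)
    if not path or path.endswith("/"):
        path += index_name  # a directory URL still needs a file name on disk
    # ".." segments are dropped (not resolved) so writes can never escape root.
    segments = [s for s in PurePosixPath(path).parts if s not in ("/", "..")]
    return Path(root, host, *segments)


def save_response(root, url, body):
    """Write one response body (bytes) to its mirror location and return the path."""
    target = mirror_path(root, url)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(body)
    return target
```

For example, mirror_path("/tmp/mirror", "http://example.com/blog/") gives /tmp/mirror/example.com/blog/index.html.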

Page Weight

Part 2 of the web mirroring task is measuring page weight.

Mac Finder screenshot showing the results of a site mirror by Charles
The only way to accurately measure page weight is to browse the page and measure what was downloaded. That beats attempting to parse the page or examining the browser cache: just take the mirror directory Charles created and ask the OS for its total size. You can mirror with and without caching, mirror multi-page (or Ajax-y) sessions, and include third-party files and CDN content. Critically, the response as mirrored on disk is left gzip encoded, if that’s the way the server sent it.
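Beyond a single total, a short Python sketch can break the size down per host, assuming (as in the mirror layout described above) that each top-level directory of the mirror corresponds to one host; mirror_weight_by_host is a hypothetical name. Per the correction in the comments below, treat these numbers as decoded response bodies only: they leave out headers and any gzip savings.

```python
import os


def mirror_weight_by_host(root):
    """Total bytes of the regular files under each top-level directory of a mirror."""
    root = os.path.abspath(root)
    totals = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        host = "(root)" if rel == os.curdir else rel.split(os.sep)[0]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                totals[host] = totals.get(host, 0) + os.path.getsize(path)
    return totals
```

Sorting the returned dict by value quickly shows how much of a page comes from third-party and CDN hosts, and sum(totals.values()) gives the overall figure.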

Client-Server Inspection and Live Coding

We’ve all lived through the challenging situation where reproducing a particular problem requires stepping several clicks (or worse, several forms) deep into an application. In this situation I use Charles’ request and response inspection, tampering and repeat features to speed up development and troubleshooting. When you get to the screen or state you’re interested in, you can examine each request and server response in detail: view image, XML, JSON, hex and text responses in the response viewer, or peruse the raw response. At any time you can edit and replay a request, which allows for a workflow where you iterate on the server-side code and re-issue the request exactly as the client made it. You skip the tedium and the noise surrounding the response you care about, yet remain confident that the session, user-agent and other awkward variables that constitute the client environment are accurate.
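Charles does the editing and replaying for you; purely to illustrate what a replay involves, here is a Python sketch that parses a captured raw request, applies header or body edits and re-sends it with http.client. parse_raw_request and replay are hypothetical helpers; they assume a simple HTTP/1.x request without chunked encoding, and repeated headers collapse to their last value.

```python
import http.client


def parse_raw_request(raw):
    """Split raw HTTP/1.x request bytes into (method, target, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()  # repeated headers: last one wins here
    return method, target, headers, body


def replay(raw, host, port, header_overrides=None, body=None):
    """Re-send a captured request, optionally with edited headers or body.

    Returns (status, headers dict, body bytes) of the new response.
    """
    method, target, headers, original_body = parse_raw_request(raw)
    for name, value in (header_overrides or {}).items():
        # Header names are case-insensitive, so drop any existing spelling first.
        for existing in [k for k in headers if k.lower() == name.lower()]:
            del headers[existing]
        headers[name] = value
    # Let http.client recompute Content-Length for the (possibly edited) body.
    headers = {k: v for k, v in headers.items() if k.lower() != "content-length"}
    payload = original_body if body is None else body
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request(method, target, body=payload or None, headers=headers)
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders()), resp.read()
    finally:
        conn.close()
```

Because the captured Host, Cookie and User-Agent headers are sent unchanged unless you override them, the server sees the same session and client environment as in the original request.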

Ajax Messages

Similarly, examining JSON and other responses to Ajax-style back-channel requests is convenient and easy. JSON, like XML, can be viewed in its original (likely condensed) format or as an expandable tree.

Charles screenshot: JSON response inspection
The JSON tree view saves you the mental (or actual) reformatting needed to review a typical JSON server response, which usually omits newlines and indentation to reduce its size on the wire.
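A rough equivalent of the tree view, for when you only have the raw response text, is to re-indent the JSON and list every leaf with its path. In this Python sketch the $.items[0].id style labels are JSONPath-like but informal, and json_paths and show are hypothetical names.

```python
import json


def json_paths(value, prefix="$"):
    """Yield (path, leaf_value) pairs for a decoded JSON document, depth first."""
    if isinstance(value, dict):
        if not value:
            yield prefix, {}
        for key, child in value.items():
            yield from json_paths(child, f"{prefix}.{key}")
    elif isinstance(value, list):
        if not value:
            yield prefix, []
        for index, child in enumerate(value):
            yield from json_paths(child, f"{prefix}[{index}]")
    else:
        yield prefix, value


def show(raw):
    """Print a condensed JSON response re-indented, then as one line per leaf."""
    doc = json.loads(raw)
    print(json.dumps(doc, indent=2))
    for path, leaf in json_paths(doc):
        print(f"{path} = {json.dumps(leaf)}")
```

For the response {"items":[{"id":1}],"total":1}, json_paths yields ("$.items[0].id", 1) and then ("$.total", 1).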

Mapping and Rewriting

Many proxy applications let you configure a mapping of URLs from one location to another, so a request to sitepen.com/xyz would actually (and transparently) be answered with the response from sitepen.somecache.com/xyz. More powerful still is the ability to create rules that rewrite requests and responses in a fairly arbitrary way, to tune, re-jigger or completely mangle a given site, page or resource. In a development context I’ve found this useful for adjusting paths and for serving files from my own server in place of those held on the target server. Here’s one example that might resonate:

Charles screenshot: Remote Mapping editor, mapping dojo.js to dojo.js.uncompressed.js
In this screenshot, I’m using the Remote Mapping editor to ensure that requests for dojo.js on sitepen.com are actually fulfilled with the response from /js/dojo/release/dojo/dojo/dojo.js.uncompressed.js. I could just as easily map to a path on any other server.
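The core of a remote mapping is a lookup from the requested URL to the URL that should answer it. This Python sketch is not Charles’ implementation or rule syntax: MapRule and map_url are hypothetical, patterns use shell-style wildcards via fnmatch, the first matching rule wins, and the hosts in the usage note are placeholders.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from urllib.parse import urlsplit, urlunsplit


@dataclass
class MapRule:
    host: str    # host pattern, shell-style wildcards allowed, e.g. "*example.com"
    path: str    # path pattern, e.g. "/js/dojo/dojo.js" or "/js/*"
    target: str  # absolute URL that should answer matching requests instead


def map_url(url, rules):
    """Return the URL a request should really be sent to; the first matching rule wins."""
    parts = urlsplit(url)
    for rule in rules:
        if fnmatchcase(parts.hostname or "", rule.host) and fnmatchcase(parts.path, rule.path):
            target = urlsplit(rule.target)
            # Keep the original query string unless the target specifies its own.
            return urlunsplit((target.scheme, target.netloc, target.path,
                               target.query or parts.query, ""))
    return url
```

With MapRule("*example.com", "/js/dojo/dojo.js", "http://localhost:8000/js/dojo/release/dojo/dojo/dojo.js.uncompressed.js"), a request for http://www.example.com/js/dojo/dojo.js?v=1 maps to the localhost URL with ?v=1 appended, while unmatched URLs pass through unchanged.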

With a single Remote Mapping rule you get sensible error output with meaningful line numbers, practical step-by-step debugging and other niceties normally confounded by source code compression and obfuscation. Actually, you can go further: as the files now reside on your own server (and if they didn’t, you can quickly mirror them so they do), you can inject your own code to log, skip branches and generally rummage in code you may not have easy access to modify. Mapping to localhost URLs works fine, but for this particular use case you might also take a look at the Map Local dialog, which maps URLs to local files.
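Returning to the rewrite rules mentioned at the start of this section: rather than mapping the request elsewhere, a proxy can edit the response body on its way to the browser. This Python sketch applies regular-expression rules to one body, here swapping a page’s dojo.js reference for the uncompressed build. RewriteRule and rewrite_body are hypothetical; the sketch assumes the body has already been un-gzipped and that the caller updates Content-Length afterwards.

```python
import re
from dataclasses import dataclass


@dataclass
class RewriteRule:
    media_type: str    # e.g. "text/html"; the rule only applies to matching responses
    pattern: str       # regular expression searched for in the decoded body
    replacement: str   # re.sub replacement (may use \1-style group references)


def rewrite_body(body, content_type, rules, encoding="utf-8"):
    """Apply every matching rule to an (uncompressed) response body; return new bytes."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    matching = [r for r in rules if r.media_type == media_type]
    if not matching:
        return body  # leave images and other untouched types byte-for-byte as they were
    text = body.decode(encoding)
    for rule in matching:
        text = re.sub(rule.pattern, rule.replacement, text)
    return text.encode(encoding)
```

With RewriteRule("text/html", r'dojo\.js"', 'dojo.js.uncompressed.js"'), the body <script src="/js/dojo/dojo.js"></script> becomes <script src="/js/dojo/dojo.js.uncompressed.js"></script>, while an image/png body is returned untouched.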

Conclusion

I hope this has given you some ideas on how you could be using an HTTP proxy. I’ve concentrated on Charles because I think it is a great tool, and being Java-based (and so cross-platform) it isn’t going to leave half of the readership out in the cold. Fiddler also offers great options, in particular its scripting hooks and its extra response-viewing options, such as viewing a response with or without its chunked/content encoding. There are numerous features in Charles that I’ve not covered here, including convenient charting and summaries, upload/download throttling, reverse proxying and more. Charles is just one tool in SitePen’s arsenal for performance testing, profiling and optimization services. I know I’m not the only one working this way, so please comment with your thoughts, and remember to support Charles and shareware in general!

Comments

  • Correction: when using the “mirror” feature of Charles, the files on disk will be uncompressed, so they are not an accurate indication of *exactly* what was downloaded. They are also only the response bodies, so the headers aren’t accounted for there. A better measure of total page weight is to sum the response header and body columns.
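Following that suggestion for a single URL, this Python sketch counts header and body bytes as they arrive on the socket, with gzip allowed, so the body figure reflects any compression the server applies. wire_weight is a hypothetical helper; it handles plain http:// only, does not follow redirects, and you would still need to sum it over every request the page makes.

```python
import socket
from urllib.parse import urlsplit


def wire_weight(url, accept_gzip=True):
    """Fetch one plain-HTTP URL and return (header_bytes, body_bytes) as received.

    The request is HTTP/1.0 with Connection: close, so the server marks the end of
    the body by closing the socket and does not use chunked framing.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError(f"this sketch only handles http:// URLs, got {url!r}")
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    lines = [f"GET {target} HTTP/1.0", f"Host: {parts.netloc}", "Connection: close"]
    if accept_gzip:
        lines.append("Accept-Encoding: gzip")  # measure what a gzip-capable browser receives
    request = ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

    chunks = []
    with socket.create_connection((parts.hostname, parts.port or 80), timeout=10) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    raw = b"".join(chunks)
    head, sep, body = raw.partition(b"\r\n\r\n")
    return len(head) + len(sep), len(body)
```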