Automating browsers provides many benefits, including faster execution of repetitive tasks, the ability to parallelise workloads and improved test coverage for your website. Google recently announced Puppeteer, a new tool to assist with Chrome browser automation.

Code examples are included so you can follow along. If you are familiar with browser automation already, feel free to jump to the section titled “Puppeteer: A practical example” which includes more advanced usage of Puppeteer.

Here we will cover:

  1. A quick introduction to browser automation: How it’s typically done & use cases.
  2. What is a headless browser: How it differs from non-headless.
  3. Chrome Headless: The command to run Chrome in a headless environment.
  4. Puppeteer: Controlling Chrome programmatically.
  5. Puppeteer: A practical example (includes code).
  6. Firefox Headless: An update about headless support.
  7. Conclusion: Recap

A quick introduction to browser automation, including use cases

Browser automation enables you to programmatically control a browser. For example, you could do some of the following:

  • Observe product pricing updates on an online store to discover the best time to purchase a particular product.
  • Log into your online banking account to download statements on a periodic basis.
  • Write functional tests or acceptance tests against a website you develop, in order to validate user functionality.
  • Complete a long and tedious HTML form which typically requires repetitive manual entry.

All modern browsers may be automated, including Chrome, Firefox, Edge & Safari. You can also automate mobile browsers. This post will briefly cover Chrome & Firefox.

For most use cases, you should control a browser through high-level browser automation software. One popular choice is WebDriver, which we use to automate functional testing with Intern.

What is a headless browser?

You are most likely reading this article in a browser. Notice that the browser has supporting elements for your use, such as a menu bar, address bar, and toolbar. These items are part of the GUI (Graphical User Interface). A headless browser has no GUI and no visual components. It runs as a process and exposes a mechanism that lets source code or other programs interact with it.

Chrome Headless

To run a version of Chrome (nightly Canary builds) in a headless environment, you can use one of the following commands:

Mac OS X:

/Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary --headless --remote-debugging-port=9222 --disable-gpu

Linux:

google-chrome --headless --remote-debugging-port=9222 --disable-gpu

Windows:

chrome.exe --headless --remote-debugging-port=9222 --disable-gpu
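If you prefer to start headless Chrome from a Node.js script rather than a terminal, a minimal sketch could pick the executable for the current platform and pass the same flags. The executable paths simply mirror the commands above and are assumptions; adjust them to wherever Chrome is installed. The helper names headlessChromeCommand and launchHeadlessChrome are hypothetical.

```javascript
const { spawn } = require('child_process');

// Paths copied from the commands above; these are assumptions and will
// differ if Chrome (or Canary) is installed somewhere else.
const CHROME_EXECUTABLES = {
  darwin: '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary',
  linux: 'google-chrome',
  win32: 'chrome.exe',
};

// Hypothetical helper: returns the command and arguments for a headless launch.
function headlessChromeCommand(platform = process.platform, port = 9222) {
  const command = CHROME_EXECUTABLES[platform];
  if (!command) {
    throw new Error(`No Chrome executable configured for platform "${platform}"`);
  }
  return {
    command,
    args: ['--headless', `--remote-debugging-port=${port}`, '--disable-gpu'],
  };
}

// Starts Chrome as a child process; call .kill() on the result when finished.
function launchHeadlessChrome(port = 9222) {
  const { command, args } = headlessChromeCommand(process.platform, port);
  return spawn(command, args, { stdio: 'ignore' });
}
```

Because spawn runs the binary directly rather than through a shell, the macOS path needs no backslash escapes.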

The --remote-debugging-port flag exposes the DevTools protocol on port 9222, which lets you use DevTools to inspect the headless browser's tabs remotely.
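With the debugging port open, Chrome also serves an HTTP endpoint, /json (an alias of /json/list), that lists the open targets, each with a webSocketDebuggerUrl. The sketch below shows one way to look up the kind of WebSocket URL used in the DevTools protocol example in this section. It assumes Node.js 18 or later for the global fetch, and the function names are hypothetical. A target that already has a client attached may omit webSocketDebuggerUrl, which is why the filter checks for it.

```javascript
// Picks the WebSocket URL of the first page target from Chrome's /json listing.
function pickPageDebuggerUrl(targets) {
  const page = targets.find((target) => target.type === 'page' && target.webSocketDebuggerUrl);
  if (!page) {
    throw new Error('No inspectable page target found');
  }
  return page.webSocketDebuggerUrl;
}

// Fetches the target list from a running headless Chrome (assumes Node.js 18+ fetch).
async function getPageDebuggerUrl(port = 9222) {
  const response = await fetch(`http://localhost:${port}/json`);
  if (!response.ok) {
    throw new Error(`DevTools HTTP endpoint returned ${response.status}`);
  }
  return pickPageDebuggerUrl(await response.json());
}
```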

In the past, you may have used PhantomJS for your browser automation tasks; however, Chrome headless runs faster and consumes less memory.

To programmatically interact with this headless version of Chrome, you can send commands over the Chrome DevTools protocol. Using the DevTools protocol, you can do most things you would do in your usual browser DevTools. Here’s an example of communicating over the DevTools Protocol via WebSockets to retrieve the current page URL from the inspected page:

// Gist: https://gist.github.com/umaar/ebc170660f15aa894fa4880f4b76e77d

// You would use the WebSocket URL of your own page target here
const devtools = new WebSocket('ws://localhost:9222/devtools/page/69990451-aaab-4ef8-87b1-ea77b8101b2a');

devtools.onmessage = ({data}) => {
	const message = JSON.parse(data);
	// Replies carry the id of the command they answer; anything else is an event
	if (message.id !== 1) return;
	console.log('WebSocket Message Received: ', message.result.result.value);
};

// Only send once the connection is open; calling send() earlier throws
devtools.onopen = () => {
	devtools.send(JSON.stringify({
		id: 1,
		method: 'Runtime.evaluate',
		params: {
			expression: `'The current URL is: ' + location.href`
		}
	}));
};
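The WebSocket example hard-codes id: 1. The protocol echoes each command's id in its reply, while unsolicited event messages carry a method but no id, so a client that sends several commands has to match replies to requests. The following is a minimal sketch of that bookkeeping; createCdpClient is a hypothetical helper, not part of any library, and it takes the socket's send function as a parameter so it can sit on top of either the browser WebSocket or a Node.js WebSocket library.

```javascript
// Hypothetical helper: matches DevTools protocol replies to the commands that caused them.
// `send` is any function that transmits a string, e.g. (text) => socket.send(text).
function createCdpClient(send) {
  let nextId = 1;
  const pending = new Map(); // command id -> { resolve, reject }

  // Sends a command and returns a promise for its `result` object.
  function command(method, params = {}) {
    const id = nextId++;
    const reply = new Promise((resolve, reject) => {
      pending.set(id, { resolve, reject });
    });
    send(JSON.stringify({ id, method, params }));
    return reply;
  }

  // Feed every raw incoming message string into this function.
  // Returns true if the message answered a pending command.
  function handleMessage(data) {
    const message = JSON.parse(data);
    const entry = message.id === undefined ? undefined : pending.get(message.id);
    if (!entry) {
      return false; // an event (method + params, no id) or an unknown id
    }
    pending.delete(message.id);
    if (message.error) {
      entry.reject(new Error(message.error.message));
    } else {
      entry.resolve(message.result);
    }
    return true;
  }

  return { command, handleMessage, pendingCount: () => pending.size };
}
```

To use it with the socket from the WebSocket example, you would route devtools.onmessage to client.handleMessage(data) and, from onopen, await client.command('Runtime.evaluate', {expression: 'location.href'}).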

For such a simple task, the raw WebSocket example above is low-level and verbose. Puppeteer provides a more concise API to automate browser operations.

Puppeteer

Under the hood, the automation code you write with the Puppeteer API sends commands over the DevTools protocol, exactly as described in the previous section. Instead of crafting WebSocket payloads by hand, you can call APIs such as:

page.goto('https://example.com')

As shown in the Puppeteer documentation, you can run Puppeteer from Node.js code like this:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch(); // starts headless Chrome by default
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({path: 'example.png'});
  await browser.close(); // close() returns a promise, so wait for it
})();

The example above does the following:

  1. Launches Chrome.
  2. Opens a new tab.
  3. Navigates to example.com.
  4. Takes a screenshot of the current page.
  5. Closes the browser.

The Puppeteer repository includes more code examples and API documentation.

Puppeteer: A practical example

You could call the previous example (taking a screenshot of a page) the 'Hello world' of browser automation! For demonstration purposes, it is useful to look at a more practical example that uses a wider set of APIs.

The SitePen contact page includes a contact form. Inspecting it with DevTools shows that the form fields are customised from a CMS (Content Management System): the fields themselves are created only after a client-side JavaScript request completes.

Contact Page JSON
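Because those fields only exist once the CMS request has completed, automation code has to wait for them before typing. A minimal sketch using Puppeteer's page.waitForSelector and page.type might look like this. The selectors and values are hypothetical, since the real ones depend on the markup the CMS generates, and the function takes an already-open Puppeteer page (for example, after await page.goto(...) on the contact page) rather than launching a browser itself.

```javascript
// Hypothetical selectors and values; the real form's markup may differ.
const CONTACT_FIELDS = {
  'input[name="name"]': 'Ada Lovelace',
  'input[name="email"]': 'ada@example.com',
  'textarea[name="message"]': 'Hello from Puppeteer',
};

// Waits for each CMS-generated field to appear in the DOM, then types into it.
async function fillContactForm(page, fields = CONTACT_FIELDS, timeout = 5000) {
  for (const [selector, value] of Object.entries(fields)) {
    await page.waitForSelector(selector, { timeout });
    await page.type(selector, value);
  }
}
```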

Firstly, consider the following hypothetical scenario: we have some browser automation tests which navigate to the SitePen contact page, fill in form fields and submit the form. Before we discuss a solution, let's understand the potential problems.

In browser automation scenarios, what problems can arise when third-party resources are allowed to download?

Tip: Some of these problems are not unique to third-party requests. Imagine your website fetches comments for a blog post from your own server via Ajax; think about which of the problems listed below would also apply to that example.

Speed

Browser automation is notoriously slow, for a variety of reasons: hard 'sleeps' instead of polling for changes, third-party scripts which are slow to download and slow to modify the page (which automation frameworks usually wait for), and many others.
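To illustrate the difference between a hard sleep and polling: a fixed sleep always costs its full duration, while a polling loop returns as soon as its condition holds. The generic sketch below uses the hypothetical names waitFor, timeout and interval; it is not a Puppeteer API, and Puppeteer provides its own waiting helpers such as page.waitForSelector.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls `predicate` every `interval` ms until it returns a truthy value
// (which is then returned) or until `timeout` ms have elapsed.
async function waitFor(predicate, { timeout = 5000, interval = 50 } = {}) {
  const deadline = Date.now() + timeout;
  while (Date.now() < deadline) {
    const result = await predicate();
    if (result) {
      return result;
    }
    await sleep(interval);
  }
  throw new Error(`Condition not met within ${timeout} ms`);
}
```

A fixed sleep long enough to cover the slowest page load is paid on every run, whereas waitFor finishes shortly after the condition becomes true.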

In this example, network speed is a minor issue.

Contact form paint time

As you might notice, the form itself appeared at the 1.58 second mark, almost half a second after the first paint. You might think saving a few hundred milliseconds is not a big deal, but many companies run large suites of browser automation tests frequently throughout the day as their codebase changes. Seconds add up!
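As a purely hypothetical illustration of how this compounds, suppose each test loads the form once and saves 450 ms (roughly the delay above), a suite has 300 such tests, and it runs 20 times a day. The figures are assumptions, not measurements:

```javascript
// Hypothetical figures: minutes of test time saved per day.
function dailySavingsMinutes(msSavedPerTest, testsPerRun, runsPerDay) {
  return (msSavedPerTest * testsPerRun * runsPerDay) / 60000;
}

console.log(dailySavingsMinutes(450, 300, 20)); // 45
```

That is 450 × 300 × 20 = 2,700,000 ms, or 45 minutes of test time per day.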

Reliability

Third party services can:

  • Go down for maintenance.
  • Unexpectedly return challenge pages (e.g. captchas) when they detect high traffic from the same IP address.
  • Charge you money for higher-than-usual API usage.

These issues can add to the frustration of working with your browser automation setup.

Consistency

Imagine your automation scripts target elements on the page via CSS selectors. If these elements can appear or disappear depending on what the CMS returns, your automation scripts will likely fail. Dynamic data (rather than static data) can be challenging to work with in automation scripts, as your code must accommodate multiple scenarios, which also increases the complexity of your automation codebase.

Speed, reliability & consistency are three potential problems you can alleviate by mocking requests to third-party services. Generally, your goal is to manipulate a resource so that it suits your specific use case.

Form Intercepted

Notice how the form field labels include the text ‘intercepted’.

A complete code example that creates the screenshot above is available on GitHub. It does the following:

  • Starts a static web server to serve the mock file.
  • Enables request interception via Puppeteer.
  • Navigates to the SitePen contact page.
  • Observes all network requests.
  • Intercepts the network request which populates the contact form.
  • Forwards the request onto a static resource on the filesystem.
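The interception steps above could be sketched roughly as follows. This is not the code from the GitHub example: the URL pattern that identifies the CMS request and the mock URL served by the static web server are hypothetical placeholders, and the sketch only wires up interception on an already-open Puppeteer page. It relies on Puppeteer's page.setRequestInterception, the page's 'request' event and request.continue(), whose url override silently forwards the request to a different URL. Depending on how the page fetches the resource, the static server may also need to send CORS headers; request.respond() is an alternative that answers the request directly without a server.

```javascript
// Hypothetical values: adjust to the real CMS endpoint and your static server.
const CMS_REQUEST_PATTERN = /\/contact-form\.json/;
const MOCK_URL = 'http://localhost:8080/mock-contact-form.json';

// Returns overrides for request.continue(), or undefined to leave the request alone.
function interceptionOverrides(url) {
  return CMS_REQUEST_PATTERN.test(url) ? { url: MOCK_URL } : undefined;
}

// Enables interception and forwards the matching request to the mock file.
async function mockContactFormRequest(page) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    const overrides = interceptionOverrides(request.url());
    // Once interception is on, every request must be continued, aborted or answered.
    request.continue(overrides);
  });
}
```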

Firefox Headless

Firefox also offers a headless mode, and Mozilla provides JavaScript and Java code examples showing how to utilise it. At the time of this writing, Firefox headless is only supported on Linux, with support for other platforms planned for the near future. There is also a guide for connecting WebDriver to a headless version of Firefox.

Conclusion

We covered a number of topics:

  • Headless Browsers: Chrome and Firefox are able to run without a GUI. Other browsers (e.g. Safari & Edge) have their own automation options.
  • DevTools Protocol: This offers an API for controlling Chrome over WebSockets. We saw how you can drive a remote browser from a client-side web page using the WebSocket interface.
  • Chrome Headless: Chrome can run in a headless environment. Native support for this came out recently.
  • Puppeteer: This software offers a high level API to control the Chrome browser via the DevTools protocol.
  • Network Interception with Puppeteer: To demonstrate a more interesting use case, we saw how to modify network requests on the fly to provide a stable and speedier automation setup.
  • Firefox Headless: Firefox can also run in a headless environment. Native support for this came out recently.

By applying these browser automation techniques, we can build solutions to problems in quality assurance, productivity or data aggregation.