Your team realizes that testing is valuable and creates tests for already completed features. The results are encouraging, and the team even finds a few bugs hiding in the application. The team then returns to feature development and attempts to add new tests along the way. Features take more time to complete because of the need to write corresponding tests, but the team remains committed to testing. However, some tests keep breaking and some aspects seem too difficult to test. The test suite begins to be neglected and trust in the tests fades. What went wrong? (Be sure to also catch part one of this two-part automated testing series – Why Automated Testing Matters)

Just like poorly written code, poorly written tests become an impediment to progress. Bad tests can be a bigger impediment than missing tests because they make changes more difficult. Tests should facilitate faster development, so let’s look at some keys to writing effective and maintainable tests.

Key Terms

Test Suite: A collection of tests run together.
Unit Test: A test which runs a small “unit” of code in isolation, mocking out dependencies.
Functional Test: Also known as an End-to-End test; for web applications, these test the functionality of the user interface by simulating user interaction.
Mocking: Replacing a related section of code with an alternative to help isolate a test from other code or systems.
Continuous Integration (CI): Process for automatically building and testing a codebase when new code is added. Common tools for CI include Jenkins and Travis.
Flakey Test: A test which sometimes passes and sometimes fails without code changes.
Test Runner: A software program to control running the tests of a codebase.
Code Coverage: A measure of how much of the codebase is run during testing.

Tests Should Pass

It may seem like a given, but a test suite should pass. Failing tests often end up in a codebase when a developer makes a quick change without running the tests, or when a new dependency requires environment configuration. But one developer’s breaking change keeps the rest of the team from confidently creating and running their own tests. Eventually, the team stops checking the tests altogether because there are already too many issues.

To prevent breaking the test suite, tests should run automatically in a Continuous Integration (CI) environment whenever new code gets added. If new changes cause tests to fail, the CI system should alert the team that a failure was introduced so that the developer who broke the tests can resolve the issue right away. This catches regressions early and allows developers to focus on delivering features.

Tests Should Always Pass

This isn’t redundant. If the tests pass, the tests should always pass. A “flakey” test causes confusion by giving different results even when the code hasn’t changed. Examples include tests that might fail at different times of the day (or in different timezones) and tests that suffer from race conditions. These tests are problematic because it takes time to determine if a recent change broke the application or if the test is bad. Flakey tests may fail weeks after they were added and are often difficult to reproduce consistently.

To reduce time spent debugging tests and retain confidence in a test suite, resolving flakey tests should become a priority. Taking note of unexplained test failures can help narrow down the issue. Sporadic test failures could point to an actual issue with the code, such as poor handling of daylight saving time. Unit tests can be stabilized by making them independent of the runtime environment’s time, for example by mocking date utilities. Functional tests can be harder to debug because of their more complex runtime environment. Start functional tests with a consistent set of data and add checks to ensure data has fully loaded. Some situations, like interacting with an external system, can be difficult to test reliably in a functional test. In these cases a unit test, where the environment can be more controlled and mocked, is better than a flakey functional test. A stable test suite makes test results clear and reduces time spent on debugging.
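As a concrete illustration, here is a minimal sketch of stabilizing a time-dependent unit test by pinning the clock. It assumes a modern Jest test runner with TypeScript, and isBusinessHours is a hypothetical helper used only for this example:

```typescript
import { isBusinessHours } from "./hours"; // hypothetical module under test

describe("isBusinessHours", () => {
  beforeEach(() => {
    // Freeze "now" to a known UTC instant instead of relying on the machine clock,
    // so the test gives the same result at any time of day and in any timezone.
    jest.useFakeTimers();
    jest.setSystemTime(new Date("2024-03-04T10:00:00Z"));
  });

  afterEach(() => {
    jest.useRealTimers();
  });

  it("treats 10:00 UTC on a Monday as business hours", () => {
    expect(isBusinessHours(new Date())).toBe(true);
  });
});
```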

Tests Should Be Fast

Testing should bring value to software development teams, not waste developer time. Especially when writing tests, developers need quick feedback from the test runner. If it takes too long to run and diagnose tests, developers will avoid fully testing the application. Tests are usually slow because of a combination of complexity and test quantity. Running tests strategically will minimize the time costs of repeated testing while maximizing quality control.

Unit tests which are isolated can run very quickly because of their reduced complexity. Use these tests to validate multiple input combinations and error handling scenarios on small units of code. Test runner software often supports options to get results more quickly, such as automatically re-running the tests when a file changes or selecting a relevant subset of tests to run. Then, the entire unit test suite should be run just before introducing the change to the team codebase to ensure there are no regressions. By keeping individual tests limited in scope and using test tooling well, unit tests allow building out a great test suite without losing time.
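For example, a single table-driven unit test can cover many input combinations while still running in milliseconds. This is a sketch using Jest with TypeScript; validateQuantity is a hypothetical pure function used only for illustration:

```typescript
import { validateQuantity } from "./validateQuantity"; // hypothetical function under test

describe("validateQuantity", () => {
  // Each row runs almost instantly because nothing outside the function is involved.
  test.each([
    [1, true],    // minimum allowed
    [100, true],  // maximum allowed
    [0, false],   // below range
    [101, false], // above range
    [-5, false],  // negative input
  ])("validateQuantity(%i) returns %s", (input, expected) => {
    expect(validateQuantity(input)).toBe(expected);
  });
});
```

While iterating, a watch mode (for example, `jest --watch`) re-runs only the affected tests on every file save, keeping the feedback loop tight.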

Unit tests may be fast, but functional tests demonstrate that all the pieces of the application were deployed and work for the user. To do this, they take more time to set up and run and are more susceptible to returning flakey results. Functional tests should focus on a few important workflows to validate that the application as a whole is running correctly. Focusing on key workflows allows covering a range of integrations without duplicating unit tests. If working on unrelated parts of the codebase, developers should be able to rely on a CI tool running the functional (and unit) tests automatically when changes get made so that all new work gets validated. When used together, many fast unit tests and a few key functional tests make an efficient balance of time and effectiveness.
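A key-workflow functional test might look something like the following sketch, written against the Playwright test runner; the URL, credentials, and data-test attributes are assumptions for illustration:

```typescript
import { test, expect } from "@playwright/test";

test("user can sign in and see the dashboard", async ({ page }) => {
  await page.goto("https://staging.example.com/login");
  await page.fill("[data-test=email]", "demo@example.com");
  await page.fill("[data-test=password]", "example-password");
  await page.click("[data-test=submit]");

  // Wait for data-driven content to appear so the assertion isn't flakey on slow loads.
  await expect(page.locator("[data-test=dashboard-heading]")).toBeVisible();
});
```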

Add New Tests When Adding New Code

Test writing should be part of your team’s development planning and process. Writing tests when implementing a feature helps avoid gaps in the test suite and helps teams to accurately estimate development efforts. Writing tests concurrently also pushes developers to have fully tested functionality before features get added to the codebase. Sometimes having additional test cases written by dedicated test engineers can work well, but typically only at a higher, functional test level. Having developers write their own tests ensures they understand the feature requirements and gives them the opportunity to make their code more efficient.

In our experience, getting teams into the habit of adding code and tests together happens by making tests an expectation of the code review process. Engineers can encourage each other to add tests while they are reviewing the feature. Tests also help engineers understand the code they are reviewing, reducing back-and-forth questions from unclear code. Some teams take this to the extreme and use a pattern called Test (or Behavior)-Driven Development (TDD/BDD) to define the requirements in tests before adding new code. This is a more advanced practice, but any team that agrees to require tests with new code will have fewer regressions and will rarely need to stop all feature development to improve the test suite.
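To make the test-first idea concrete, here is a minimal sketch of the TDD flow in Jest with TypeScript. The tests below would be written before formatPrice exists (formatPrice is a hypothetical function used only for this example), so they fail until the feature is implemented:

```typescript
import { formatPrice } from "./formatPrice"; // hypothetical, not yet implemented

describe("formatPrice", () => {
  it("formats an amount in cents as a dollar string", () => {
    expect(formatPrice(1999)).toBe("$19.99");
  });

  it("rejects negative amounts", () => {
    expect(() => formatPrice(-1)).toThrow();
  });
});
```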

Tests Should Be Independent of Implementation Details

Tests frequently become coupled to implementation details because the coupling only becomes a problem when the underlying code changes. Tests that depend on implementation details fail to focus on the purpose of the component being tested. For example, users don’t care how sorting gets implemented; they care that the order is correct. Developers should be free to refactor internal code with limited impact on the test suite.

In many cases it may be impractical to avoid all implementation details. Querying for elements on a web page always involves some degree of implementation detail knowledge. However, there are approaches to creating more stable selectors. Instead of an XPath selector, which will break if the element gets moved in the page, use a selector that isn’t dependent on parent elements, like a unique CSS class. A better selector still is one that won’t be changed for other reasons, like a [data-test] attribute selector. When testing classes, avoid testing internal class state; focus on the parts of the class that are used by other parts of the codebase. This will reduce the time spent fixing tests needlessly broken by implementation details.
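As a sketch of the selector trade-off in a browser-based test (the markup and attribute names here are assumptions for illustration):

```typescript
// Brittle: an XPath tied to where the button sits in the page, so moving the
// element breaks the test even though the feature still works.
const byXPath = document.evaluate(
  "//form/div[3]/button",
  document,
  null,
  XPathResult.FIRST_ORDERED_NODE_TYPE,
  null
).singleNodeValue;

// More stable: a dedicated [data-test] hook that only changes when the test
// contract itself changes, independent of styling and page structure.
const byTestAttribute = document.querySelector("[data-test='submit-order']");
```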

Tests Should Be Independent of Each Other

A test suite can start to feel overwhelming if one small change causes 50 tests to start failing. Additionally, it can be very challenging to diagnose errors if a test’s initial conditions get changed by other tests. Reduce the amount of time it takes to fix test failures by making each test case independent.

Most test runners have an option to do setup in some kind of “beforeEach” block which runs before each test. This is a great place to set up common preconditions so that each test has the same starting condition. Typically, this includes things like creating class instances and initializing mocks. Make sure that each unit test uses its own instance to avoid one test affecting others. For functional tests, repeating setup in each test can be more costly in time. If each test can’t be completely independent, try to make groups of tests independent by starting the tests at a predictable point and adding initial data specifically for that group. Isolating tests reduces debugging time when failures occur by preventing one test failure from cascading and breaking other tests.
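Here is a minimal sketch of that setup pattern in Jest with TypeScript; ShoppingCart and the fetchPrice mock are hypothetical names used only for illustration:

```typescript
import { ShoppingCart } from "./ShoppingCart"; // hypothetical class under test

describe("ShoppingCart", () => {
  let cart: ShoppingCart;
  const fetchPrice = jest.fn();

  beforeEach(() => {
    // Reset the mock and create a fresh instance before every test so no test
    // inherits state left behind by another test.
    fetchPrice.mockReset();
    fetchPrice.mockResolvedValue(500);
    cart = new ShoppingCart(fetchPrice);
  });

  it("starts empty", () => {
    expect(cart.items).toHaveLength(0);
  });

  it("adds an item without relying on any other test", async () => {
    await cart.add("sku-123");
    expect(cart.items).toHaveLength(1);
  });
});
```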

Track Your Testing Code Coverage

Use tooling to track how well your tests cover the scenarios in the codebase. Many unit testing tools are able to output reports of what percentage of the code was run by the test suite. Code coverage analysis gives a quantitative value to testing efforts and allows teams to track their progress. Teams should set a goal of maintaining at least 90% code coverage and run coverage reports regularly.
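One way to keep that goal visible is to enforce it in the test runner’s configuration. This is a sketch assuming a recent version of Jest, where the coverageThreshold option fails the run when coverage drops below the configured percentages; the exact numbers are a team choice:

```typescript
// jest.config.ts
import type { Config } from "jest";

const config: Config = {
  collectCoverage: true,
  coverageThreshold: {
    // Fail the test run if overall coverage drops below these percentages.
    global: {
      branches: 90,
      functions: 90,
      lines: 90,
      statements: 90,
    },
  },
};

export default config;
```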

Coverage reports are an important tool for highlighting where test coverage is missing. The coverage percentage quantifies how much of the code was run by the test suite, not how well it was actually tested. This means that the number of test cases doesn’t always correspond to the coverage percentage. Similarly, achieving 100% code coverage is not the goal, since it often requires adding superfluous tests. Teams should use discretion to determine the specific test cases they need to cover. However, coverage reports are one valuable indication of how well a team is doing at thoroughly testing the codebase.

Conclusions

It is important to remember that working software is the goal of software development; testing is one way to help achieve that. Teams should regularly consider how well their test suite is helping them deliver a quality product to users. When testing is done effectively, the benefits should far outweigh the costs in the long term.

Teams should start by getting their testing suite passing quickly and consistently. This will unlock the test suite as a validation tool. From there, teams should begin adding tests that pass independent of implementation details and other test failures. Add code coverage reports to give the team a quantitative value to watch and a goal as they begin to ship more stable features to users.