
Automated Testing: How We Catch Thunderbird Bugs Before You Do

Since the release of Thunderbird 115, a big focus has been on improving the state of our automated testing. Automated testing improves software quality by reducing the number of bugs accidentally introduced by changes to the code. For each change made to Thunderbird, our testing machines run a set of tests across Windows, macOS, and Linux to detect mistakes and unintended consequences. A single change (or a group of changes that land at the same time) uses 60 to 80 hours of machine time running tests.

Our code is going to be under more pressure than ever before – with a bigger team making more changes, and monthly releases reducing the time code spends on testing channels before being released.

We want to find the bugs before our users do.

Why We’re Testing

We’re not writing tests merely to make ourselves feel better. Tests improve Thunderbird by:

We’re not trying to completely cover a feature or every edge case in tests. We are trying to create a testing framework around the feature so that when we find a bug, we can fix it and also easily write a test that stops the bug from coming back unnoticed. For too much of the code, this has been impossible without a weeks-long detour into tests.
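As a rough illustration of that "fix it, then pin it down with a test" habit, here is a minimal sketch in Python. It is not taken from Thunderbird's test suites: the `split_address` helper and the bug it describes are hypothetical, invented only to show the shape of a regression test.

```python
import unittest


def split_address(address):
    """Hypothetical helper: split 'user@example.com' into (user, domain).

    Imagined history: an earlier version split on the first '@' and broke
    on addresses with a quoted '@' in the local part. The fix splits on
    the last '@' only.
    """
    user, _, domain = address.rpartition("@")
    if not user or not domain:
        raise ValueError(f"not a valid address: {address!r}")
    return user, domain


class SplitAddressRegressionTest(unittest.TestCase):
    def test_simple_address(self):
        self.assertEqual(split_address("user@example.com"), ("user", "example.com"))

    def test_at_sign_in_local_part(self):
        # The input from the (imagined) bug report. If someone later
        # reintroduces the bug, this test fails instead of a user noticing.
        self.assertEqual(
            split_address('"a@b"@example.com'), ('"a@b"', "example.com")
        )

    def test_missing_domain_is_rejected(self):
        with self.assertRaises(ValueError):
            split_address("user@")
```

Saved as a file, this can be run with `python -m unittest`. The point is less the helper itself than the second test: once the surrounding framework exists, capturing a bug report as a test is only a few lines.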

Breaking New Ground

In the past few months we’ve figured out how to make automated tests for things that were previously impossible:

These new abilities are being used to wrap better testing around account set-up features, ahead of the new Account Hub development, so that we can be sure nothing breaks without being noticed. They’re also helping test that collecting mail works when it should, or gives the error prompts we expect when it doesn’t.

Code coverage

We record every line of code that runs during our tests. Collecting all that data tells us which code doesn’t run during our tests. If a block of code doesn’t run during any of our tests, nothing will tell us when it breaks until somebody uses the code and complains.
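The idea of line coverage can be sketched in a few lines of Python. This is only a toy illustration using the standard library's `sys.settrace` and `dis.findlinestarts`, not the tooling Thunderbird uses; `classify`, `executable_lines`, and `run_with_line_recording` are made-up names for this example.

```python
import dis
import sys


def classify(n):
    """Toy function under test with two branches."""
    if n < 0:
        return "negative"
    return "non-negative"


def executable_lines(func):
    """Line numbers in func's body that have bytecode attached."""
    code = func.__code__
    return {
        lineno
        for _, lineno in dis.findlinestarts(code)
        # Skip entries without a source line and the 'def' line itself.
        if lineno is not None and lineno != code.co_firstlineno
    }


def run_with_line_recording(func, *args):
    """Call func(*args) and return the set of its line numbers that ran."""
    executed = set()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore frames belonging to other functions
        if event == "line":
            executed.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return executed


# A "test suite" that only ever passes a non-negative number.
ran = run_with_line_recording(classify, 5)
never_ran = executable_lines(classify) - ran
print("lines never run by the tests:", sorted(never_ran))
```

Here the line returning `"negative"` never runs, so a mistake on that line would go unnoticed by this "suite". Adding a second call with a negative number would close the gap.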

Our code coverage data can be viewed at You can also look at Firefox’s data at

Looking at the data, you might notice that our overall number is now lower than it was when we started measuring. This doesn’t mean that our testing got worse. It reflects the large amount of code (not maintained by us) that we added in the third_party directory. For a better reflection of the progress we’ve made, check out the individual directories, especially mail/base, which contains the most important user interface code.
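A small worked example shows how this can happen. The numbers below are invented purely for illustration and are not Thunderbird's real coverage figures; only the directory names come from the text above.

```python
def coverage_percent(covered, total):
    """Percentage of lines that ran during tests."""
    return 100.0 * covered / total


def overall(directories):
    """Overall coverage across a mapping of name -> (covered, total) lines."""
    covered = sum(c for c, _ in directories.values())
    total = sum(t for _, t in directories.values())
    return coverage_percent(covered, total)


# Invented numbers, for illustration only.
before = {"mail/base": (6_000, 10_000), "third_party": (0, 2_000)}
after = {"mail/base": (8_000, 10_000), "third_party": (0, 10_000)}

for label, dirs in (("before", before), ("after", after)):
    base = coverage_percent(*dirs["mail/base"])
    print(f"{label}: mail/base {base:.0f}%, overall {overall(dirs):.0f}%")
```

In this made-up scenario, coverage of mail/base rises from 60% to 80%, yet the overall figure falls from 50% to 40% because the untested third-party code grew from 2,000 to 10,000 lines. That is why the per-directory numbers are the better measure of progress.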

Mozmill no more

Towards the end of last year we finally retired an old test suite known as Mozmill. Those tests were partially migrated to a different test suite (Mochitest) about four years ago, and things were mostly working fine, so finishing the migration wasn’t a priority. These tests now do things in a more conventional way instead of relying on a bunch of clever but weird tricks.

How much of the code is test code?

About 27%. This is a very rough estimate based on the files in our code repository (minus some third-party directories) and whether they are inside a directory with “test” in the name or not. That’s up from about 19% five years ago.
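An estimate along these lines could be computed roughly as follows. This is a sketch under assumptions, not the script actually used: it counts lines rather than files, treats any directory whose name contains "test" as test code, and the excluded directory list and the skipping of hidden directories are guesses made for the example.

```python
import os


def is_test_path(rel_dir):
    """True if any directory component contains 'test'."""
    return any("test" in part.lower() for part in rel_dir.split(os.sep))


def count_lines(path):
    """Count lines in a file, reading bytes so odd encodings don't matter."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def estimate_test_fraction(root, excluded_dirs=("third_party",)):
    """Fraction of lines under root that live in 'test' directories.

    excluded_dirs is an assumption; the post only says that some
    third-party directories were left out.
    """
    test_lines = other_lines = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded and hidden directories (e.g. version control data).
        dirnames[:] = [
            d for d in dirnames
            if d not in excluded_dirs and not d.startswith(".")
        ]
        rel_dir = os.path.relpath(dirpath, root)
        for name in filenames:
            lines = count_lines(os.path.join(dirpath, name))
            if is_test_path(rel_dir):
                test_lines += lines
            else:
                other_lines += lines
    total = test_lines + other_lines
    return test_lines / total if total else 0.0
```

Calling `estimate_test_fraction("path/to/checkout")` returns a number between 0 and 1. The directory-name heuristic is crude: it counts test data files as test code and misses tests stored elsewhere, which is why the result is best read as a very rough estimate.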

There is no particular goal in mind, but I can imagine a future where there is as much test code as non-test code. If we achieve that, Thunderbird will be in a very healthy place.

A stacked area chart showing the estimated lines of test code (in red) and non-test code (in blue) over time, from January 2019 to January 2024. The chart indicates both types of code increase over this period.

Looking ahead, we’ll be asking contributors to add tests to their patches more often. This obviously depends on the circumstances. But if you’re adding or fixing something, that is the best time to ensure it continues to work in the future. As always, feel free to reach out if you need help writing or running tests, either via Matrix or Topicbox mailing lists:

Geoff Lankow, Staff Engineer

8 responses

Anne-Marie Dubler wrote on

I would be grateful if you did not use a blue tone for mail replies in email correspondence.

Jason Evangelho wrote on

Noted, and thank you for the comment.
(Please forgive any errors, this is a machine translation)

Wolfgang Wedlat wrote on

Thunderbird no longer works on Windows 11!!!

Jason Evangelho wrote on

We would appreciate more information. I am currently using Thunderbird 115 on Windows 11, so you may have discovered a bug. Could you please visit our community support page and explain your problem in detail?

Scheibe wrote on

The link asking “translate” appears as soon as the home page has loaded. It does not always work; sometimes the German text gets “translated” and the result is meaningless words with no connection to the original text. I use a laptop running Linux (Ubuntu) with Mozilla as the browser, and Thunderbird is up to date.

Monica Ayhens-Madon wrote on

(DeepL) Our translation buttons use Google Translate, which sometimes does not work well. We want to offer better translations and are looking for ways to do so.

Simon Mills wrote on

An interesting summary.
As a retired practitioner of extreme automated testing, I always like to see it being used to free up the testers to concentrate on the efficacy of the tests rather than the circular activity of repetitive, time-consuming, manual execution. Nice one!

Monica Ayhens-Madon wrote on

I think we’re on the same wavelength! Using technology to minimize the tedious things, and leaving people’s brains more time and energy to be thoughtful and creative is the best way to go. 🙂
