How much flakiness do you tolerate in end to end tests? (programming.dev)

submitted 2 years ago* (last edited 2 years ago) by kersplort@programming.dev to c/experienced_devs@programming.dev


End-to-end and smoke tests give a really valuable angle on what the app is doing and can warn you about failures before they happen. However, because they're working with a live app and a live database over a live network, they can introduce a lot of flakiness. Beyond changes to the app itself, different data in the environment or other issues can cause a smoke test to fail.

How do you handle the inherent flakiness of testing against a live app?

When do you run smokes? On every phoenix branch? Pre-prod? Prod only?

Who fixes the issues that the smokes find?


minorninth@lemmy.world 9 points 2 years ago

I think the reality is that there are lots of different levels of tests, we just don't have names for all of them.

Even unit tests have levels. You have unit tests for a single function or method in isolation, then you have unit tests for a whole class that might set up quite a few more mocks and test the class's contract with the rest of the system.
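As a rough sketch of the class-level case, here is a test that checks a class's contract with a mocked collaborator. `SignupService` and its `mailer` dependency are hypothetical names made up for illustration; only `unittest.mock.Mock` is a real library API.

```python
from unittest.mock import Mock


class SignupService:
    """Hypothetical class under test; it depends on a mailer collaborator."""

    def __init__(self, mailer):
        self.mailer = mailer

    def register(self, email):
        if "@" not in email:
            raise ValueError("invalid email")
        # The contract with the rest of the system: send one welcome mail.
        self.mailer.send(email, "Welcome!")
        return True


def test_register_sends_welcome_mail():
    mailer = Mock()  # stands in for the real mailer and records calls
    service = SignupService(mailer)

    assert service.register("a@example.com") is True
    # Verify the interaction, which is what a mock-based test is about.
    mailer.send.assert_called_once_with("a@example.com", "Welcome!")
```

Nothing here touches a network or a clock, so a test like this has no obvious source of flakiness.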

Then there are tests for a whole module, which might test multiple classes working together while mocking out the rest of the system.

A step up from that might be unit tests that use fakes instead of mocks. You might have a fake in-memory database, for example. That enables you to test a class or module at a higher level and ensure it can solve more complex problems and leave the database in the state you expect at the end.
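To make the fake-versus-mock distinction concrete, here is a minimal sketch. `FakeUserStore` and `rename_or_remove` are hypothetical names, and the fake is a plain dictionary wrapper, assumed to share an interface with whatever real store the code would use in production.

```python
class FakeUserStore:
    """Hypothetical in-memory stand-in for a real database-backed store."""

    def __init__(self):
        self._rows = {}

    def save(self, user_id, name):
        self._rows[user_id] = name

    def get(self, user_id):
        return self._rows.get(user_id)

    def delete(self, user_id):
        self._rows.pop(user_id, None)

    def count(self):
        return len(self._rows)


def rename_or_remove(store, user_id, new_name):
    """Hypothetical code under test: an empty name removes the user."""
    if store.get(user_id) is None:
        return False
    if new_name:
        store.save(user_id, new_name)
    else:
        store.delete(user_id)
    return True


def test_final_state():
    store = FakeUserStore()
    store.save(1, "ann")
    store.save(2, "bob")

    assert rename_or_remove(store, 1, "anna")
    assert rename_or_remove(store, 2, "")
    assert not rename_or_remove(store, 3, "carl")

    # Assert on the state left behind, not on which calls were made.
    assert store.get(1) == "anna"
    assert store.get(2) is None
    assert store.count() == 1
```

The test never says which store methods should be called or in what order; it only checks the end state, which is what lets it cover more complex behaviour than a mock-based test.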

A step up from that might be integration tests between modules, but still covering only things you control.

Up from that might be integration tests or end-to-end tests that include third-party components like databases, libraries, etc. or tests that bring up a real GUI on the desktop - but where you still try to eliminate variables that are out of your control like sending requests to the external network, testing top-level window focus, etc.

Then at the opposite extreme you have end-to-end tests that really do interact with components you don't have 100% control over. That might mean calling a third-party API, so the test fails if the third party has downtime. It might mean opening a GUI on the desktop and automating it with the mouse, which might fail if the desktop OS pops up a dialog on top of your app. Those last types of tests can still be very important and useful, but they're never going to be 100% reliable.

I think the solution is to have a smaller number of those tests with external dependencies, don't block the build on them, and look at statistics. Sound an alarm when a test fails multiple times in a row, but not for every failure.
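One way the "alarm on repeated failures" idea could look in code is sketched below. `FlakeTracker`, the threshold of 3, and the test name are all assumptions for illustration; in practice the results would come from whatever your CI system records.

```python
from collections import defaultdict


class FlakeTracker:
    """Hypothetical tracker: alarms on consecutive failures, keeps stats."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.streak = defaultdict(int)    # current run of failures per test
        self.runs = defaultdict(int)      # total runs per test
        self.failures = defaultdict(int)  # total failures per test

    def record(self, test_name, passed):
        """Record one result; return True if the alarm should sound."""
        self.runs[test_name] += 1
        if passed:
            self.streak[test_name] = 0  # any pass resets the streak
            return False
        self.failures[test_name] += 1
        self.streak[test_name] += 1
        return self.streak[test_name] >= self.threshold

    def failure_rate(self, test_name):
        runs = self.runs[test_name]
        return self.failures[test_name] / runs if runs else 0.0


tracker = FlakeTracker(threshold=3)
for passed in [False, True, False, False, False]:
    if tracker.record("smoke_login", passed):
        print("ALARM: smoke_login failed 3 times in a row")
```

With this sequence the alarm fires only on the last result. The isolated first failure is absorbed, but it still counts toward `failure_rate`, so a test that fails often without ever failing three times in a row still shows up in the statistics.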

Most of the other types of tests can be written in a way that drives flakiness down to almost zero. It's not easy, but it is doable. It requires a heavy investment in test infrastructure.

this post was submitted on 30 Aug 2023
