
My Testing Strategy: Unit, E2E, Load, and Chaos Testing Explained

A practical breakdown of the four test types I use on every production app, with tools and trade-offs for each.

4 May 2026 · 7 min read · Testing, Engineering

I've worked on a few different production applications over the years, and testing is one of those areas where I've seen the approach vary a lot between teams. Some write only unit tests and find out about integration issues in production. Others invest heavily in E2E tests and end up with a CI pipeline that takes 40 minutes to pass, which eventually means people start skipping it. I've landed on a four-layer approach that I try to apply on every production app: unit, E2E, load, and chaos testing. Each layer asks a different question, and getting answers to all four gives a level of confidence that no single layer can provide on its own.

Unit testing

Unit testing is where most teams start, and for good reason. Unit tests are quick to write, fast to run, and are the best tool for capturing the intent of a component or function. AI tools have made them even faster to produce, since AI is particularly good at evaluating all the possible branches and edge cases of a function and generating coverage for them.

The key strength is granularity. When I'm testing a React component, unit tests let me verify every state the component can be in:

  • Empty states: what does the component render when there's no data yet?
  • Error states: what happens when an API call fails or data is malformed?
  • Boundary conditions: does pagination behave correctly at page 1 and at the last page?

These edge cases are difficult to reliably exercise with higher-level tests. A unit test can set up exactly the props and state needed to hit a specific branch in isolation, without needing a running server or a seeded database.

For tooling, I reach for Bun's built-in test runner for pure TypeScript, and React Testing Library when testing React components. The philosophy behind React Testing Library aligns well with how I think about unit tests: test the component from the user's perspective (what renders, what's accessible, what's clickable) rather than its internal implementation.
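As a sketch, the boundary bullets above map onto a small, framework-agnostic helper. `getPageState` and its states are hypothetical, invented for illustration; in practice each branch below would become one `test(...)` with an `expect(...)` in Bun's runner or React Testing Library:

```typescript
// Hypothetical pagination helper used to illustrate boundary-condition unit tests.
type PageState = { hasPrev: boolean; hasNext: boolean; label: string };

function getPageState(page: number, totalPages: number): PageState {
  if (totalPages === 0) {
    // Empty state: no data yet, nothing to paginate.
    return { hasPrev: false, hasNext: false, label: "No results" };
  }
  if (page < 1 || page > totalPages) {
    // Error state: the caller asked for a page that doesn't exist.
    throw new RangeError(`page ${page} out of range 1..${totalPages}`);
  }
  return {
    hasPrev: page > 1,          // boundary: disabled on page 1
    hasNext: page < totalPages, // boundary: disabled on the last page
    label: `Page ${page} of ${totalPages}`,
  };
}
```

Setting up exactly these inputs is trivial here; reaching the same branches through a browser would require seeding data, stubbing failures, and navigating to the right page.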

E2E testing

End-to-end tests give the highest confidence that the app is actually working. They drive a real browser through real user workflows and catch integration failures that unit tests can't see: a form that behaves correctly in isolation but breaks when the API shape has changed, or a page that renders fine on its own but shows stale data after a navigation.

The trade-off is time. E2E tests are slow to write and slow to run, and if you're not deliberate about scope they will drag down your CI pipeline. A test suite that takes 30 minutes to pass is one that people start skipping or working around, which defeats the purpose.

My rule is to limit E2E tests to key user workflows only:

  • Critical happy paths: the flows that directly deliver value (sign up, create, publish, pay)
  • High-risk integrations: anywhere two systems talk to each other and getting it wrong would be immediately visible
  • Regression coverage: specific bugs that made it to production once and shouldn't again

I use Playwright for E2E testing. It's reliable, the API is well-designed, and the trace viewer makes it much easier to understand what went wrong in CI than digging through screenshots alone.
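A critical happy path in Playwright tends to look something like the sketch below. The routes, labels, and headings are placeholders for whatever your app actually renders; the point is that the test drives the UI the way a user would and asserts only on what the user can see:

```typescript
import { test, expect } from "@playwright/test";

// Hypothetical sign-up happy path; all selectors and URLs are placeholders.
test("user can sign up and reach the dashboard", async ({ page }) => {
  await page.goto("/signup");
  await page.getByLabel("Email").fill("test@example.com");
  await page.getByLabel("Password").fill("correct-horse-battery");
  await page.getByRole("button", { name: "Create account" }).click();

  // Assert on user-visible outcomes, not implementation details.
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByRole("heading", { name: "Welcome" })).toBeVisible();
});
```

Keeping each E2E test this small, one workflow per test, also makes the trace viewer output much easier to read when something fails.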

Load testing

Load testing validates that the app scales to the user load you're expecting. The most useful form I've run is a ramping load test: start at a low number of virtual users, ramp up gradually, and observe where the app starts to degrade or fail. That failure point, and the failure modes it surfaces, are genuinely useful data.

A few things I try to get right with load testing:

  • Realistic simulation. It's tempting to test only the web-facing parts of the app, simulating users loading pages in a browser, but that's usually an incomplete picture. In most production systems there's data flowing into the system at the same time, whether that's API calls from other services, events from a message queue, or writes from background jobs. Simulating that concurrent load alongside user traffic gives a more accurate picture of how the system actually behaves under pressure.
  • Pod failure under load. In a high-availability configuration with multiple replicas, it's worth terminating a pod while the load test is running to see if the remaining pods can absorb the traffic from the one that went down. This surfaces whether your replica count and resource limits are actually sized correctly for a partial failure, not just for steady-state traffic.
  • Baseline load with E2E tests running concurrently. One approach I find useful is running the E2E test suite against the app while a baseline load is applied. This reuses tests you've already written to verify that real user workflows continue to function correctly under load, not just under the ideal conditions of a quiet test environment. If you already have E2E tests, it's a low-effort way to add meaningful coverage.
  • Using the results to inform monitoring and alerting thresholds. When I run a load test, I note at what concurrency certain metrics start moving: response time, error rate, CPU, memory. Setting thresholds from real load data means alerts fire when something is actually wrong, not too early from normal traffic spikes and not too late when the system is already struggling.
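To make the last point concrete, here's what that can look like as a Prometheus alerting rule. The metric name and the 800ms threshold are hypothetical; the idea is that the number comes from where the load test showed p95 latency starting to climb, not from a guess:

```yaml
# Hypothetical alert: the 0.8s threshold is taken from the degradation
# point observed during the ramping load test, not picked arbitrarily.
groups:
  - name: latency
    rules:
      - alert: HighP95Latency
        expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 0.8
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "p95 latency above the degradation point observed under load"
```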

I use k6 for load testing. The scripting model (JavaScript) is easy to work with, and it integrates well with Grafana for visualising results in real time.
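A ramping k6 script is short. The stages, thresholds, and endpoint below are placeholder values to show the shape (k6 scripts are JavaScript, and recent k6 versions can run TypeScript directly):

```typescript
import http from "k6/http";
import { check, sleep } from "k6";

// Ramp virtual users up in stages and watch where the app starts to degrade.
export const options = {
  stages: [
    { duration: "2m", target: 50 },  // warm up
    { duration: "5m", target: 200 }, // ramp toward expected peak
    { duration: "5m", target: 500 }, // push past it to find the failure point
    { duration: "2m", target: 0 },   // ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<800"], // fail the run if p95 exceeds 800ms
    http_req_failed: ["rate<0.01"],   // ...or if more than 1% of requests fail
  },
};

export default function () {
  // Placeholder endpoint; a realistic script would mix page loads with the
  // concurrent API and background traffic described above.
  const res = http.get("https://app.example.com/api/items");
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1);
}
```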

Chaos testing

Chaos testing takes a different approach entirely. Instead of testing performance under load, it deliberately injects failure into a running system and observes what happens. In a Kubernetes environment, that means things like:

  • Randomly terminating pods to see if the deployment recovers automatically
  • Injecting network latency or packet loss between services
  • Simulating upstream failures: Postgres going down, Redis becoming unavailable, an external API returning 500s

The value isn't in confirming that failures happen. It's in what you learn from watching them happen in a controlled way. I've found chaos testing is one of the best ways to answer three questions that are otherwise very difficult to answer without an actual incident:

  • Does the app self-heal? Many failure modes in Kubernetes should resolve automatically (a pod restarts, the health check fails and traffic is rerouted), but whether they actually do in your specific configuration, and how long it takes, is worth knowing before 3am.
  • Does monitoring catch it? Running chaos experiments while watching your monitoring dashboards is one of the most revealing exercises I've done. You'll quickly find failures that produce no alerting signal at all.
  • Are alerting thresholds tuned correctly? A Redis outage that lasts 30 seconds should probably page someone. A blip that resolves in 2 seconds probably shouldn't. Chaos testing lets you see the actual shape of the metrics during a failure, which makes it possible to tune thresholds based on real data rather than guesswork. Getting this right matters because the goal is to know about a problem before your users do, not after they've already started filing support tickets.

Chaos Mesh is a solid option for Kubernetes environments. The range of failure types it supports covers most of the scenarios worth testing.
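The first experiment in the list, randomly killing a pod, is a one-file manifest in Chaos Mesh. The namespace and labels below are placeholders for your own deployment:

```yaml
# Hypothetical PodChaos experiment: kill one pod matching the selector
# and watch whether the deployment recovers without intervention.
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-example
  namespace: my-app
spec:
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - my-app
    labelSelectors:
      app: my-app
```

Running this while watching your dashboards answers all three questions at once: whether the pod comes back, whether anything alerts, and how long the metrics stay degraded.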

Wrapping up

The thing I keep coming back to with this framework is that each layer covers a blind spot the others have. Unit tests won't tell you whether the services integrate correctly. E2E tests won't tell you what happens at 10x your expected load. Load tests won't tell you how the system behaves when a dependency disappears. Thinking about testing across all four layers has consistently helped me ship with more confidence and spend less time firefighting in production.