Test Driven Development

Subpage of Agentic Coding

Good codebases are easy to test, which leads to better quality feedback loops.

Large undirected generations are dangerous because they delay feedback. By the time the agent checks types, tests, or runtime behavior, it may have already spread bad assumptions across many files.

Testing Decisions

Testing in inherently a difficult problem. First, we need to answer the following 3 broad questions to get a sense of what the test will be about:

  1. How big is the unit?
  2. What to mock?
  3. What behaviours to test?

Test Pyramid

The Test Pyramid is a visual model introduced by Martin Fowler to describe the ideal distribution of tests in a software project. It emphasizes:

  • Many unit tests (fast, isolated, cheap to write and run).
  • Fewer integration tests (slower, test interactions between components).
  • Even fewer end-to-end tests (slowest, test the entire system).

Ideally, you want all of these but you want them to run at different points to ensure that we are not wasting too much time and compute on testing.

Test-Driven Development (TDD)

What is TDD?

TDD is a development approach where:

  1. Write a failing test for a new feature.
  2. Write the minimal code to pass the test.
  3. Refactor the code while keeping tests passing.

TDD Cycle (Red-Green-Refactor)

  1. Red: Write a test that fails (because the feature isn’t implemented).
  2. Green: Write the simplest code to make the test pass.
  3. Refactor: Improve the code without breaking tests.

Example: TDD in Python

  1. Write a failing test (test_calculator.py):
    python
    Copy
    def test_add():
        assert add(2,3)== 5

    Run:

    bash
    Copy
    pytest test_calculator.py -v

    Output:

    text
    Copy
    ERROR: add is not defined
  2. Write minimal code (calculator.py):
    python
    Copy
    def add(a,b):
        return a+ b

    Run tests again:

    text
    Copy
    test_calculator.py::test_add PASSED
  3. Refactor (if needed):
    python
    Copy
    def add(a,b):
        return sum([a, b])

Benefits of TDD

  • Better design: Forces modular, testable code.
  • Fewer bugs: Catches issues early.
  • Living documentation: Tests describe how code should work.
  • Confidence: Safe to refactor with a strong test suite.

Best Practices

  • Write tests first: Resist the urge to write implementation code first.
  • Keep tests small: One assertion per test.
  • Test behavior, not implementation: Focus on what the code does, not how it does it.
  • Refactor mercilessly: Improve code without fear.

Fix 1: Write Good Tests

This might be easy to say but is one of the hardest things in software engineering.

Here are all the types of testing that we should include in any codebase. This is because each of these can highlight different issues with the code, allowing you better oversight over the performance of the codebase.

StrategyScopeSpeedPurposeExample Tools
Unit TestingSingle function/classFastValidate logic in isolationJUnit, pytest, Jest
Integration TestingModule interactionsMediumTest component interactionsTestNG, pytest, Postman
End-to-End TestingEntire systemSlowValidate user journeysSelenium, Cypress, Playwright
Property-BasedInput/Output behaviorFastValidate invariantsHypothesis, QuickCheck
FuzzingRandom inputsSlowFind edge cases and crashesAFL, libFuzzer, Honggfuzz

Details of how to do this test are available in SWE notes.

Fix 2: Freeze The Tests

Once tests are written, they shouldn’t be changed unless the problem description changes. There needs to be a system to enforce this deterministically to avoid moving the goal posts once we start building.