Good codebases are easy to test, which leads to better quality feedback loops.

Large undirected generations are dangerous because they delay feedback. By the time the agent checks types, tests, or runtime behavior, it may have already spread bad assumptions across many files.

Testing Decisions

Testing in inherently a difficult problem. First, we need to answer the following 3 broad questions to get a sense of what the test will be about:

How big is the unit?
What to mock?
What behaviours to test?

Test Pyramid

The Test Pyramid is a visual model introduced by Martin Fowler to describe the ideal distribution of tests in a software project. It emphasizes:

Many unit tests (fast, isolated, cheap to write and run).
Fewer integration tests (slower, test interactions between components).
Even fewer end-to-end tests (slowest, test the entire system).

Ideally, you want all of these but you want them to run at different points to ensure that we are not wasting too much time and compute on testing.

Test-Driven Development (TDD)

What is TDD?

TDD is a development approach where:

Write a failing test for a new feature.
Write the minimal code to pass the test.
Refactor the code while keeping tests passing.

TDD Cycle (Red-Green-Refactor)

Red: Write a test that fails (because the feature isn’t implemented).
Green: Write the simplest code to make the test pass.
Refactor: Improve the code without breaking tests.

Example: TDD in Python

Write a failing test (test_calculator.py):

python
Copy
def test_add():
    assert add(2,3)== 5

Run:

bash
Copy
pytest test_calculator.py -v

Output:

text
Copy
ERROR: add is not defined

Write minimal code (calculator.py):

python
Copy
def add(a,b):
    return a+ b

Run tests again:

text
Copy
test_calculator.py::test_add PASSED

Refactor (if needed):

python
Copy
def add(a,b):
    return sum([a, b])

Benefits of TDD

Better design: Forces modular, testable code.
Fewer bugs: Catches issues early.
Living documentation: Tests describe how code should work.
Confidence: Safe to refactor with a strong test suite.

Best Practices

Write tests first: Resist the urge to write implementation code first.
Keep tests small: One assertion per test.
Test behavior, not implementation: Focus on what the code does, not how it does it.
Refactor mercilessly: Improve code without fear.

Fix 1: Write Good Tests

This might be easy to say but is one of the hardest things in software engineering.

Here are all the types of testing that we should include in any codebase. This is because each of these can highlight different issues with the code, allowing you better oversight over the performance of the codebase.

Strategy	Scope	Speed	Purpose	Example Tools
Unit Testing	Single function/class	Fast	Validate logic in isolation	JUnit, pytest, Jest
Integration Testing	Module interactions	Medium	Test component interactions	TestNG, pytest, Postman
End-to-End Testing	Entire system	Slow	Validate user journeys	Selenium, Cypress, Playwright
Property-Based	Input/Output behavior	Fast	Validate invariants	Hypothesis, QuickCheck
Fuzzing	Random inputs	Slow	Find edge cases and crashes	AFL, libFuzzer, Honggfuzz

Details of how to do this test are available in SWE notes.

Fix 2: Freeze The Tests

Once tests are written, they shouldn’t be changed unless the problem description changes. There needs to be a system to enforce this deterministically to avoid moving the goal posts once we start building.

Test Driven Development