Good codebases are easy to test, which leads to higher-quality feedback loops.
Large undirected generations are dangerous because they delay feedback. By the time the agent checks types, tests, or runtime behavior, it may have already spread bad assumptions across many files.
Testing Decisions
Testing is inherently a difficult problem. Before writing a test, answer three broad questions to get a sense of what the test will cover:
- How big is the unit?
- What to mock?
- What behaviours to test?
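One way to make these three questions concrete is the minimal sketch below, which uses Python's standard `unittest.mock`. The function `price_in_eur` and its injected `fetch_rate` dependency are hypothetical names invented for illustration. The unit is a single function, the only mocked piece is the external rate lookup, and the behavior under test is the converted amount.

```python
from unittest.mock import Mock


def price_in_eur(amount_usd, fetch_rate):
    """Convert a USD amount to EUR using an injected rate lookup (hypothetical)."""
    rate = fetch_rate("USD", "EUR")
    return round(amount_usd * rate, 2)


def test_price_in_eur_uses_current_rate():
    # What to mock: only the boundary we do not own, the external rate service.
    fetch_rate = Mock(return_value=0.5)

    # What behavior to test: the converted amount the caller observes.
    assert price_in_eur(10.0, fetch_rate) == 5.0
    fetch_rate.assert_called_once_with("USD", "EUR")
```

Passing the dependency in as an argument is one design choice among several; it keeps the unit small and avoids patching module internals.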
Test Pyramid
The Test Pyramid is a visual model introduced by Mike Cohn and popularized by Martin Fowler to describe the ideal distribution of tests in a software project. It emphasizes:
- Many unit tests (fast, isolated, cheap to write and run).
- Fewer integration tests (slower, test interactions between components).
- Even fewer end-to-end tests (slowest, test the entire system).
Ideally, you want all three, but run at different points in the workflow: fast tests run often, and slow tests run less frequently, so that time and compute are not wasted on testing.
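Running each tier at a different point could be sketched as a simple lookup from pipeline stage to test tiers. The stage names (`pre-commit`, `pull-request`, `nightly`) and the mapping itself are assumptions chosen for illustration, not a prescribed setup.

```python
# Hypothetical mapping from pipeline stage to the test tiers that run there.
TIERS_BY_STAGE = {
    "pre-commit": ["unit"],
    "pull-request": ["unit", "integration"],
    "nightly": ["unit", "integration", "e2e"],
}


def tiers_for(stage):
    """Return the test tiers to run at a given stage, cheapest first."""
    try:
        return TIERS_BY_STAGE[stage]
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}") from None
```

Here `tiers_for("pull-request")` returns `["unit", "integration"]`. With pytest, tiers like these are commonly expressed as markers and selected with the `-m` option.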
Test-Driven Development (TDD)
What is TDD?
TDD is a development approach in which you repeat three steps:
- Write a failing test for a new feature.
- Write the minimal code to pass the test.
- Refactor the code while keeping tests passing.
TDD Cycle (Red-Green-Refactor)
- Red: Write a test that fails (because the feature isn’t implemented).
- Green: Write the simplest code to make the test pass.
- Refactor: Improve the code without breaking tests.
Example: TDD in Python
- Write a failing test (`test_calculator.py`):

  ```python
  from calculator import add


  def test_add():
      assert add(2, 3) == 5
  ```

  Run:

  ```bash
  pytest test_calculator.py -v
  ```

  Output (abbreviated; the run fails because `calculator.py` does not exist yet):

  ```text
  ModuleNotFoundError: No module named 'calculator'
  ```

- Write minimal code (`calculator.py`):

  ```python
  def add(a, b):
      return a + b
  ```

  Run tests again:

  ```text
  test_calculator.py::test_add PASSED
  ```

- Refactor (if needed):

  ```python
  def add(a, b):
      return sum([a, b])
  ```
Benefits of TDD
- Better design: Forces modular, testable code.
- Fewer bugs: Catches issues early.
- Living documentation: Tests describe how code should work.
- Confidence: Safe to refactor with a strong test suite.
Best Practices
- Write tests first: Resist the urge to write implementation code first.
- Keep tests small: Each test should check one behavior, ideally with a single assertion.
- Test behavior, not implementation: Focus on what the code does, not how it does it.
- Refactor mercilessly: Improve code without fear.
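As an illustration of testing behavior rather than implementation, the sketch below asserts only on what a function returns. `unique_sorted` is a hypothetical helper invented for this example. An implementation-coupled test would instead patch `set` or `sorted` and assert that they were called, and it would break on any harmless rewrite.

```python
def unique_sorted(items):
    """Return the distinct items in ascending order."""
    return sorted(set(items))


def test_unique_sorted_removes_duplicates_and_orders():
    # Asserts only on the observable result, so the function body can
    # change (for example, to a manual loop) without breaking the test.
    assert unique_sorted([3, 1, 3, 2]) == [1, 2, 3]


def test_unique_sorted_handles_empty_input():
    assert unique_sorted([]) == []
```

Each test also covers a single behavior, which keeps failures easy to diagnose.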
Fix 1: Write Good Tests
This is easy to say, but it is one of the hardest things in software engineering.
The table below lists the types of testing that any codebase should include. Each type highlights different issues with the code, which gives you better oversight of how the codebase behaves.
| Strategy | Scope | Speed | Purpose | Example Tools |
|---|---|---|---|---|
| Unit Testing | Single function/class | Fast | Validate logic in isolation | JUnit, pytest, Jest |
| Integration Testing | Module interactions | Medium | Test component interactions | TestNG, pytest, Postman |
| End-to-End Testing | Entire system | Slow | Validate user journeys | Selenium, Cypress, Playwright |
| Property-Based | Input/Output behavior | Fast | Validate invariants | Hypothesis, QuickCheck |
| Fuzzing | Random inputs | Slow | Find edge cases and crashes | AFL, libFuzzer, Honggfuzz |
Details on how to write each type of test are available in the SWE notes.
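To show what validating an invariant looks like, here is a minimal hand-rolled property check that uses only the standard `random` module rather than a dedicated library such as Hypothesis. The run-length encoder and decoder are hypothetical functions written for this sketch; the property is that decoding an encoded string returns the original.

```python
import random


def run_length_encode(text):
    """Encode 'aaab' as [('a', 3), ('b', 1)]."""
    encoded = []
    for char in text:
        if encoded and encoded[-1][0] == char:
            encoded[-1] = (char, encoded[-1][1] + 1)
        else:
            encoded.append((char, 1))
    return encoded


def run_length_decode(pairs):
    """Invert run_length_encode."""
    return "".join(char * count for char, count in pairs)


def test_decode_inverts_encode():
    # Property: decoding an encoded string returns the original string.
    rng = random.Random(0)  # fixed seed keeps any failure reproducible
    for _ in range(200):
        length = rng.randint(0, 20)
        text = "".join(rng.choice("ab") for _ in range(length))
        assert run_length_decode(run_length_encode(text)) == text
```

A two-letter alphabet is a deliberate choice here: it makes long runs likely, which is where an encoder of this kind tends to go wrong. A real property-based library adds input shrinking and smarter generation on top of this idea.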
Fix 2: Freeze The Tests
Once tests are written, they should not change unless the problem description changes. This needs to be enforced by a deterministic check, so that the goalposts cannot move once building starts.