Introduction
Overview
Teaching: 10 min
Exercises: 0 min
Questions
Why do we test software?
Objectives
Understand what testing is
Understand the goals of testing
What is testing?
Software testing is the process of executing the software to discover any discrepancies between the expected behavior and the actual behavior of the software. Thus, the focus of testing is to execute the software with a set of “good” inputs and verify the produced outputs match some expected outputs.
Anatomy of a test
A software test typically involves the following activities (a minimal sketch putting them together follows the list).
- Designing test inputs: It is not practical to test software with all possible inputs. Therefore we need to select a subset of inputs that are likely to detect bugs and would help to increase our confidence in the program. One approach is to randomly select inputs from the input space. While this can be an effective method, it has the tendency to miss corner cases where most mistakes are made. Therefore we will be discussing several approaches used for developing test inputs.
- Test execution: tests should be executed multiple times, whenever the code is changed in some way, such as when implementing new features or fixing bugs. Thus the execution of tests should be automated as much as possible with the help of test frameworks (i.e. test tools) and CI/CD infrastructure.
- Checking the test outputs: this is done by comparing the output of test cases with the expected output of the software and should be automated as well.
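Putting the three activities together, here is a minimal sketch (the function and values are invented for illustration):
def add_one(x):
    return x + 1

test_input = 3                # 1. a designed test input
actual = add_one(test_input)  # 2. execute the code under test
assert actual == 4            # 3. check the output against the expected output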
Verification vs. validation
Notice that we did not define testing as confirming that the software is correct because testing alone cannot prove that software is correct. Even if you could execute the software with all possible inputs, all that would do is confirm that the outputs produced agree with what the tests understand as the expected behavior, i.e. with the specifications.
Software engineering lingo captures this distinction with the (non-interchangeable) terms “verification” and “validation”:
- Verification : does the software do what the specification says it should do? In other words, does the output match expectations?
- Validation : are those even the right specs in the first place? In other words, do our expectations align with reality or scientific ground truth? Or in yet other words, is verified code correct?
On its own, testing is about verification. Validation requires extra steps: attempts at replication, real-world data/experimentation, scientific debate and arrival at consensus, etc.
Of course, having a robust (and ideally automated) verification system in place is what makes validation possible. That’s why systematic testing is the most widely used technique to assure software quality, to uncover errors, and to make software more reliable, stable, and usable.
Why (else) do we test?
When asked this question, most of us instinctively answer “to make sure the code is correct”. We’ve just seen that while testing is an essential component in validation, it doesn’t really accomplish this on its own. So why else should we test?
- Uncover bugs
- Document bugs
- Document the specifications, more broadly (and more specifically)
- Detect regressions
- Guide the software design
- Circumscribe and focus your coding (avoiding “feature creep”)
When to run tests?
The earlier a bug is detected, the less costly it is to fix. By some estimates, a bug detected and fixed during coding is 50 times cheaper than one detected and fixed after the software is deployed. Imagine the cost of publishing code with a bug! However, executing tests as soon as possible is also immediately beneficial. When you discover a bug after just having changed 3 lines of code, you know exactly where to look for a mistake. We will discuss automated methods to run whole suites of quick tests on a very high cadence.
When to write tests?
Most of us are in the habit of writing tests after the code is written. This seems like a no-brainer: after all, how can you test something that doesn’t exist yet? But this thinking is an artifact of believing that the primary (or only) purpose of testing is to “make sure the code is correct”.
One of the main takeaways of today’s lesson — and also the one that you will most resist — is that writing and running a test for code that doesn’t yet exist is precisely what you ought to be doing. This approach is called Test Driven Development (TDD) and we will discuss it a little later.
Key Points
Testing improves confidence about your code. It doesn’t prove that code is correct.
Testing is a vehicle for both software design and software documentation.
Creating and executing tests should not be an afterthought. It should be an activity that goes hand-in-hand with code development.
Tests are themselves code. Thus, test code should follow the same overarching design principles as functional code (e.g. DRY, modular, reusable, commented, etc).
Working with Legacy Code
Overview
Teaching: 20 min
Exercises: 15 min
Questions
What is legacy code?
Why should you test legacy code before changing it?
Objectives
Understand the sample legacy code
Set up an end-to-end test
Use TDD to begin refactoring
Meet the legacy code project
Imagine you are joining a new lab and have inherited a script from another grad student or published paper. It takes in a file with coordinates of rectangles and produces a matrix where the value at index i,j is 1 if rectangle i overlaps with rectangle j, and 0 otherwise. Here is the starting script:
import sys

# read input file
infile = open(sys.argv[1], 'r')
outfile = open(sys.argv[2], 'w')

# build rectangle struct
dict = {}
for line in infile:
    name, *coords = line.split()
    dict[name] = [int(c) for c in coords]

for red_name, red_coords in dict.items():
    for blue_name, blue_coords in dict.items():
        # check if rects overlap
        result = '1'
        red_lo_x, red_lo_y, red_hi_x, red_hi_y = red_coords
        blue_lo_x, blue_lo_y, blue_hi_x, blue_hi_y = blue_coords

        if (red_lo_x >= blue_hi_x) or (red_hi_x <= blue_lo_x) or \
                (red_lo_y >= blue_hi_x) or (red_hi_y <= blue_lo_y):
            result = '0'

        outfile.write(result + '\t')
    outfile.write('\n')

infile.close()
outfile.close()
Critique the published version
The code above is certainly not perfect. Name 3 issues with the code and (to be nice) 1 good thing.
Consider if the problems will affect testing or can be addressed with testing.
Solution
Starting with good things:
- Variable names are fairly clear and properly formatted (snake case)
- Comments are used properly
- Runs without errors
- Dependencies are handled (no dependencies)
And some of the issues:
- Unable to import as a module
- No encoding specified on open
- Redundant execution in the main loop
- Several assumptions made without checking or asserting correctness (hi vs. low, int coords)
- Shadowing the dict builtin
- Trailing tab on the end of each output line
- Not using context managers for file IO
And likely many more!
Now your advisor wants you to take this code, make it run twice as fast, and instead of just saying if there is overlap, output the percentage by which the two areas overlap! You think you need to introduce a Rectangle object, but how can you make changes without adding to this pile of code?
A simple end-to-end test
The first step to making legacy code into tested code is to add a test.
Legacy code has many negative connotations. Here we are using it as defined by Michael Feathers in Working Effectively with Legacy Code: “To me, legacy code is simply code without tests.” All the other bad features of legacy code stem from a lack of testing.
Often the best test to write first is an end-to-end or integration test. These kinds of tests exercise an entire codebase with some “representative” input and ensure the correct output is produced. You probably use a test like this, even if you don’t have it automated or formalized. End-to-end tests ensure the code is working in its entirety and giving the same output. They are often not ideal since with non-trivial systems they can take significant time or resources to run and verify. It is also difficult to test many corner cases without testing smaller parts of a codebase.
Like all tests, passing does not verify correctness, just that the tests are not failing. Still, it is reassuring that some larger refactoring isn’t breaking published code! Let’s start with this simple input file (the columns are name x1 y1 x2 y2):
a 0 0 2 2
b 1 1 3 3
c 10 10 11 11
Clearly rectangles a and b overlap while c is far away. As output, we expect the following matrix:
1 1 0
1 1 0
0 0 1
Critique the sample input
What are some issues with the sample input? Is the result still useful?
Solution
The input doesn’t test rectangles where x != y or really exercise the code with more interesting cases (overlap at a point, overlap along a line, exact overlap, etc.). The result can still be useful if it fails: knowing something that was working has stopped working can notify you an error is present. However, having this test pass doesn’t make your code correct!
Given the input.txt and output.txt files, how can we test the function?
You may be used to doing this manually, with the following commands:
# run the command
python overlap_v0.py input.txt new_output.txt
# view the result
cat new_output.txt
# compare the outputs
cmp output.txt new_output.txt
On every change to the file, you can press the up arrow or Ctrl+p
to recall
a command and rerun it. Bonus points if you are using multiple terminal sessions
or a multiplexer. A better setup is to formalize this test with a script you
can repeatedly run. Tools like entr or your
text editor can automatically run the test when your files change. Here is a
simple end-to-end test of the initial script:
#!/bin/bash
# run the command
python overlap_v0.py input.txt new_output.txt
# compare the outputs
cmp output.txt new_output.txt && echo 'passes'
When cmp is successful (no difference between files), passes will be echoed to the terminal. When it fails, the byte and line number that differ will be printed. This script can be run with CI and should act as the final step to
ensure your outputs are as expected. Think of tests like a funnel: you want to
run quick tests frequently to catch most bugs early, on every file write. Slower
tests can run prior to a git commit to catch the bugs that escaped the first round.
Finally, the slowest, most thorough tests run via CI/CD on a PR and should
catch a minority of bugs. We will run our end-to-end test after making major changes.
You can also add more test files to this script but we want to move to a better testing framework ASAP!
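As a sketch of the entr workflow mentioned above (assuming entr is installed and the test script is saved as an executable end-to-end.sh):
$ ls *.py *.txt | entr -c ./end-to-end.sh
entr reruns the script whenever one of the listed files changes; the -c flag clears the screen before each run.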
Key Points
Legacy code is code without tests. It’s extremely challenging to change code that doesn’t have tests.
At the very least you should have an end-to-end test when starting to change code.
Try to add tests before making changes, work in small steps.
Having tests pass doesn’t mean your code is correct.
Test framework
Overview
Teaching: 15 min
Exercises: 15 min
Questions
What is a test framework?
How do you write test cases using a test framework?
Objectives
Start running tests with pytest.
pytest: A test framework
While it is not practical to test a program with all possible inputs, we should execute multiple tests that exercise different aspects of the program. To ease this process we use a test framework: a software tool or library that provides a structured and organized environment for designing, writing, and executing tests. Here we will be using pytest, which is a widely used testing framework for Python.
Follow the instructions here to install pytest if you haven’t done so already. We will be using pytest as our test framework.
Test already!
Now let’s make sure pytest is set up and ready to test.
In a larger package the structure should follow what you’ve already been taught, with source code and tests in different directories. Here we will stick in the same folder for simplicity.
Pytest looks for files that start with test_ and runs any functions in those files that start with test_. In contrast to unittest or other x-unit style frameworks, pytest has one test statement, assert, which works exactly like in normal python code. Here are some pointless tests:
# test_nothing.py
def test_math():
    assert 2 + 2 == 4

def test_failure():
    assert 0  # zero evaluates to false
In the same directory as your test files, run pytest:
$ pytest
========================================== test session starts ==========================================
platform linux -- Python 3.9.16, pytest-7.3.1, pluggy-1.0.0
rootdir: ~/projects/testing-lesson/files
plugins: anyio-3.6.2
collected 2 items
test_nothing.py .F [100%]
=============================================== FAILURES ================================================
_____________________________________________ test_failure ______________________________________________
def test_failure():
> assert 0 # zero evaluates to false
E assert 0
test_nothing.py:5: AssertionError
======================================== short test summary info ========================================
FAILED test_nothing.py::test_failure - assert 0
====================================== 1 failed, 1 passed in 0.06s ======================================
Note that two tests were found. Passing tests are marked with a green . while failures are F and exceptions are E.
Writing actual test cases
Let’s understand writing a test case using the following function, which is supposed to compute x + 1 given x. Note the very obvious bug.
# sample_calc.py
def func(x):
    return x - 1
A typical test case consists of the following components:
- A test input.
- A call to the function under test with the test input.
- A statement to compare the output with the expected output.
Following is a simple test case written to test func(x). Note that we have given a descriptive name to show what function is tested and with what type of input. This is a good practice to increase the readability of tests. You shouldn’t have to explicitly call the test functions, so don’t worry if the names are longer than you would normally use. The test input here is 3 and the expected output is 4. The assert statement checks the equality of the two.
# test_sample_calc.py
import sample_calc as sc

def test_func_for_positive_int():
    input_value = 3                     # 1. test input
    func_output = sc.func(input_value)  # 2. call function
    assert func_output == 4             # 3. compare output with expected
When the test is executed, it is going to fail due to the bug in func.
$ pytest test_sample_calc.py
============================= test session starts ==============================
platform darwin -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0
rootdir: /Volumes/Data/Work/research/INTERSECT2023/TestingLesson
plugins: anyio-3.5.0
collected 1 item
test_sample_calc.py F [100%]
=================================== FAILURES ===================================
________________________ test_func_for_positive_int _________________________

    def test_func_for_positive_int():
        input_value = 3                     # 1. test input
        func_output = sc.func(input_value)  # 2. call function
>       assert func_output == 4             # 3. compare output with expected
E       assert 2 == 4

test_sample_calc.py:7: AssertionError
=========================== short test summary info ============================
FAILED test_sample_calc.py::test_func_for_positive_int - assert 2 == 4
============================== 1 failed in 0.10s ===============================
Improve the code
Modify func and test_sample_calc.py:
- Fix the bug in the code.
- Add two tests: one with a negative number and one with zero.
Solution
- Clearly func should return x + 1 instead of x - 1
- Here are some additional assert statements to test more values:
assert sc.func(0) == 1
assert sc.func(-1) == 0
assert sc.func(-3) == -2
assert sc.func(10) == 11
How you organize those calls is a matter of style. You could have one test_ function for each or group them into a single test_func_hard_inputs. Pytest code is just python code, so you could set up a loop or use the pytest.mark.parametrize decorator to call the same code with different inputs.
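For instance, a parametrized version of the grouped test might look like this sketch (parametrize is covered in more depth later):
import pytest

@pytest.mark.parametrize('test_input,expected',
                         [(0, 1), (-1, 0), (-3, -2), (10, 11)])
def test_func_hard_inputs(test_input, expected):
    assert sc.func(test_input) == expected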
We will cover more features of pytest as we need them. Now that we know how to use the pytest framework, we can use it with our legacy code base!
What is code (test) coverage
The term code coverage (or simply coverage) refers to the code constructs, such as statements and branches, that are executed by your test cases.
Note: it is not recommended to create test cases simply to cover the code. This can lead to useless test cases and a false sense of security. Rather, coverage should be used to learn which parts of the code are not executed by any test case, and that information should be used to augment the test cases to check the respective functionality of the code. In open source projects, requiring high coverage can push contributors to test their new code.
We will be using Coverage.py to calculate the coverage of the test cases that we computed above. Follow the instructions here to install Coverage.py.
Now let’s use Coverage.py to check the coverage of the tests we created for func(x). To run your tests using pytest under coverage you need to use the coverage run command:
$ coverage run -m pytest test_sample_calc.py
============================= test session starts ==============================
platform darwin -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0
rootdir: /Volumes/Data/Work/research/INTERSECT2023/TestingLesson
plugins: anyio-3.5.0
collected 1 item
test_sample_calc.py F [100%]
=================================== FAILURES ===================================
________________________ test_func_for_positive_int _________________________

    def test_func_for_positive_int():
        input_value = 3                     # 1. test input
        func_output = sc.func(input_value)  # 2. call function
>       assert func_output == 4             # 3. compare output with expected
E       assert 2 == 4

test_sample_calc.py:7: AssertionError
=========================== short test summary info ============================
FAILED test_sample_calc.py::test_func_for_positive_int - assert 2 == 4
============================== 1 failed in 0.31s ===============================
To get a report of the coverage, use the command coverage report:
$ coverage report
Name Stmts Miss Cover
-----------------------------------------
sample_calc.py 2 0 100%
test_sample_calc.py 3 0 100%
-----------------------------------------
TOTAL 5 0 100%
Note the statement coverage of sample_calc.py, which is 100% right now. We can also specifically check for executed branches using the --branch flag with the coverage run command. You can use the --show-missing (-m) flag with the coverage report command to see which statements are missed by the test cases, if any, and update the test cases accordingly.
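For example, the two commands together:
$ coverage run --branch -m pytest test_sample_calc.py
$ coverage report --show-missing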
Key Points
A test framework simplifies adding tests to your project.
Choose a framework that doesn’t get in your way and makes testing fun.
Coverage tools are useful to identify which parts of the code are executed by tests.
Test Driven Development
Overview
Teaching: 15 min
Exercises: 15 min
Questions
Why should you write tests first?
What are the 3 phases of TDD?
Objectives
Start running tests with pytest.
Replace the end-to-end test with a pytest version.
Test Driven Development
Test Driven Development (or TDD) has a history in agile development but put simply you’re going to write tests before you write code (or nearly the same time). This is in contrast to the historical “waterfall” development cycle where tests were developed after code was written e.g. on a separate day or business quarter.
You don’t have to be a TDD purist as long as your code ends up having tests. Some say TDD makes code less buggy, though that may not be strictly true. Still, TDD provides at least three major benefits:
- It greatly informs the design, making for more maintainable, legible, and generally pretty code. Writing code is much faster since you’ve already made the difficult decisions about its interface.
- It ensures that tests will exist in the first place (because you are writing them first). Remember, everyone dislikes writing tests, so you should hack yourself to get them done.
- It makes writing tests A LOT more fun and enjoyable. That’s because writing tests when there’s no code yet is a huge puzzle that feels more like problem-solving (which is what programmers like) than clerical work or bookkeeping (which programmers generally despise).
Oftentimes when you write tests for existing code, you anchor your expectations on what the code does, instead of brainstorming about what a user could do to break your system.
Red, green, refactor
The basic cycle of work with TDD is called red, green, refactor:
- Start by writing a failing test. This is the red phase, since your test runner will show a failure, usually with a red color. Only when you have a failing test can you add a feature.
- Add only as much code as needed to make the test pass (green). You want to work on the code base only until your tests are passing, then stop! This is often difficult if you are struck by inspiration, but try to slow down and add a note to come back to later. Maybe you identified the next test to write!
- Look over your code and tests and see if anything should be refactored. Refactoring is the process of restructuring or improving the internal structure of existing source code without changing its external behavior. It can involve modifying the code to make it more readable, maintainable, and efficient, and to bring it in line with best coding practices. At this point your test suite is passing and you can really get creative. If a change causes something to break you can easily undo it and get back to safety. Also, having this as a separate step allows you to focus on testing and writing code in the other phases. Remember, tests are code too and benefit from the same design considerations.
You will be repeating these red, green, refactor steps multiple times until you are done developing the code. Therefore we need to automate the execution of tests; here we will keep using pytest, which provides a comprehensive and flexible set of tools and features for writing and executing tests.
Importing overlap.py, red-green-refactor
Getting back to the overlap script, let’s start with a failing test.
Red
You may be surprised how easy it is to fail:
# test_overlap.py
import overlap_v0 as overlap  # change version as needed

def test_import():
    pass
When you run pytest, you get an error due to the line opening sys.argv[1] when we aren’t providing an input file. Note how brittle the original design is: if we wanted to use part of the file somewhere else, we couldn’t, since the file expects to be called as the main method. If you wanted to change command line parsers or use some part in a notebook, you would quickly run into problems.
Green
Maybe you are thinking about how to pull file IO out of this function, but to start we do the least amount possible to pass our test (which is really just importing):
# overlap.py
import sys

def main():
    # read input file
    infile = open(sys.argv[1], 'r')
    outfile = open(sys.argv[2], 'w')
    # ...

if __name__ == "__main__":
    main()
Now execute test_overlap.py with pytest:
$ pytest test_overlap.py
========================================== test session starts ===========================================
platform linux -- Python 3.9.16, pytest-7.3.1, pluggy-1.0.0
rootdir: ~/projects/testing-lesson/files
plugins: anyio-3.6.2
collected 1 item
test_overlap.py . [100%]
=========================================== 1 passed in 0.01s ============================================
You can be extra thorough and see if end-to-end.sh still passes.
Refactor
At this point, our pytest function is just making sure we can import the code. But our end-to-end test makes sure the entire function is working as expected (albeit for a small, simple input). How about we move the file IO into the main guard?
Refactor
Change file IO to occur in the guard clause. Your new main method should look like def main(infile, outfile).
Solution
def main(infile, outfile):
    ...

if __name__ == "__main__":
    with open(sys.argv[1], encoding='utf-8') as infile, \
            open(sys.argv[2], 'w', encoding='utf-8') as outfile:
        main(infile, outfile)
That addresses the context managers and encoding for file IO. Continue to run your tests to see that your changes are OK!
After refactoring save the code as overlap_v1.py.
End-to-end testing with pytest
We have a lot to address, but it would be nice to integrate our end-to-end test with pytest so we can just run one command. The problem is that our main method deals with files. Reading and writing to a file system can be several orders of magnitude slower than working in main memory. We want our tests to be fast so we can run them all the time. The nice thing about python’s duck typing is that the infile and outfile variables don’t have to be open files on disk; they can be in-memory files. In python, this is accomplished with a StringIO object from the io library.
A StringIO object acts just like an open file. You can read and iterate from it, and you can write to it just like an opened text file. When you want to read the contents, use the getvalue method to get a string representation.
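Here is a quick sketch of both directions (the strings are arbitrary):
from io import StringIO

readable = StringIO('line one\nline two\n')
print(readable.readline())   # 'line one\n', just like reading a file

writable = StringIO()
writable.write('result\n')   # behaves like writing to an opened text file
print(writable.getvalue())   # 'result\n'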
Red
Let’s write a test that will pass in two StringIO objects, one with the file to read and one for the output.
# test_overlap.py
import overlap_v1 as overlap
from io import StringIO

def test_end_to_end():
    ## ARRANGE
    # initialize a StringIO with a string to read from
    infile = StringIO(
        'a 0 0 2 2\n'
        'b 1 1 3 3\n'
        'c 10 10 11 11'
    )
    # this holds our output
    outfile = StringIO()

    ## ACT
    # call the function
    overlap.main(infile, outfile)

    ## ASSERT
    output = outfile.getvalue().split('\n')
    assert output[0].split() == '1 1 0'.split()
    assert output[1].split() == '1 1 0'.split()
    assert output[2].split() == '0 0 1'.split()
Since this is the first non-trivial test, let’s spend a moment going over it. Like before, we import our module and make our test function start with test_. The actual name of the function is for your benefit only, so don’t be worried if it is “too long”. You won’t have to type it, so be descriptive! Next we have the three steps in all tests: Arrange-Act-Assert (aka Given-When-Then):
- Arrange: Set up the state of the program in a particular way to test the feature you want. Consider edge cases, mocking databases or files, building helper objects, etc.
- Act: Call the code you want to test
- Assert: Confirm the observable outputs are what you expect (i.e. compare the output with the expected output). Notice that the outputs here read like the actual output file.
Again, pytest uses plain assert statements to test results.
But we have a problem: our test passes! This is partially because we are replacing an existing test, but the problem is, how can you be sure you are testing what you think? Maybe you are importing a different version of the code, maybe pytest isn’t finding your test, etc. Often a test that passes when it should fail is more worrisome than a test that fails when it should pass.
Notice the splits in the assert section; they normalize the output so any whitespace will be accepted. Let’s replace those with explicit whitespace (tabs) and see if we can go red:
assert output[0] == '1\t1\t0'
assert output[1] == '1\t1\t0'
assert output[2] == '0\t0\t1'
Note that split('\n') strips off the newline at the end of each line. Now we fail (yay!) and it’s due to something we have wanted to change anyway. The pytest output says
E AssertionError: assert '1\t1\t0\t' == '1\t1\t0'
that is, we have an extra tab at the end of our lines.
Green
Fix the code
Change the overlap file to make the above test pass. Hint: change when and where you write a tab to the file.
Solution
There are a few options here. I’m opting for building a list and using join for the heavy lifting.
for red_name, red_coords in dict.items():
    output_line = []
    for blue_name, blue_coords in dict.items():
        # check if rects overlap
        result = '1'
        red_lo_x, red_lo_y, red_hi_x, red_hi_y = red_coords
        blue_lo_x, blue_lo_y, blue_hi_x, blue_hi_y = blue_coords

        if (red_lo_x >= blue_hi_x) or (red_hi_x <= blue_lo_x) or \
                (red_lo_y >= blue_hi_x) or (red_hi_y <= blue_lo_y):
            result = '0'

        output_line.append(result)
    outfile.write('\t'.join(output_line) + '\n')
Once you get the test passing you will notice that end-to-end.sh will no longer pass! One of the reasons to test code is to find (and prevent) bugs. When you work on legacy code, sometimes you will find bugs in published code, and the questions can get more problematic. Is the change something that will alter published results? Was the previous version wrong, or just not what you prefer? You may decide to keep a bug if it was a valid answer! In this case we will keep the change and not have a tab followed by a newline.
Refactor
Let’s focus this round of refactoring on our test code and introduce some nice pytest features. First, it seems like we will want to use our simple input file in other tests. Instead of copying and pasting it around, let’s extract it as a fixture. Pytest fixtures can do a lot, but for now we just need a StringIO object built for some tests:
# test_overlap.py
import overlap_v1 as overlap
from io import StringIO
import pytest

@pytest.fixture()
def simple_input():
    return StringIO(
        'a 0 0 2 2\n'
        'b 1 1 3 3\n'
        'c 10 10 11 11'
    )

def test_end_to_end(simple_input):
    # initialize a StringIO with a string to read from
    infile = simple_input
    ...
We start by importing pytest so we can use the @pytest.fixture()
decorator.
This decorates a function and makes its return value available to any function
that uses it as an input argument. Next we can replace our infile
definition
by assigning it to simple_input
in test_end_to_end
.
For this code, the end-to-end test runs quickly, but what if it took several
minutes? Ideally we could run our fast tests all the time and occasionally run
everything, including slow tests. The simplest way to achieve this with pytest
is to mark the function as slow and invoke pytest with pytest -m "not slow"
:
# test_overlap.py
@pytest.mark.slow()
def test_end_to_end(simple_input):
    ...
$ pytest -m "not slow" test_overlap.py
========================================== test session starts ===========================================
platform linux -- Python 3.9.16, pytest-7.3.1, pluggy-1.0.0
rootdir: ~/projects/testing-lesson/files
plugins: anyio-3.6.2
collected 1 item / 1 deselected / 0 selected
Note that the one test was deselected and not run. You can (and should) formalize this mark as described in the warning that will pop up!
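For reference, the registration pytest asks for looks like this (mirroring the example in pytest’s own documentation), in a pytest.ini at the project root:
[pytest]
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')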
Key Points
TDD cycles between the phases red, green, and refactor
TDD cycles should be fast, run tests on every write
Writing tests first ensures you have tests and that they are working
Making code testable forces better style
It is much faster to work with code under test
Refactoring for Testing
Overview
Teaching: 20 min
Exercises: 40 min
Questions
Why are long methods hard to test?
Objectives
Learn some key refactoring methods to change code safely.
Modify the overlap script to support a Rectangle object.
Monolithic Main Methods
One of the smelliest code smells that new developers produce is a single main method which spans several hundred (thousand?) lines. Even if the code is perfect, long methods are an anti-pattern because they require a reader of the code to have exceptionally good working memory to understand the method. It also becomes impossible to reuse part of the code elsewhere.
Consider the following two sections of code:
if user is not None and user in database and database[user]['type'] == 'admin':
    # do admin work

#### vs ####

def user_is_admin(user, database):
    return user is not None and user in database and database[user]['type'] == 'admin'

if user_is_admin(user, database):
    # do admin work
The second version puts less stress on the reader because when they see the
function called user_is_admin
they can treat it like a black box until they
are interested in the internals. But we are here to talk about testing!
Consider writing a test for the first version, where all the logic of what makes a user an admin is in the if statement. To even get to this line of code you will have to set up your system and possibly mock or replace other function calls. As a separate function, your test becomes a unit test, exercising a small piece of code. Here is a possible pytest:
def test_user_is_admin():
    user = None
    database = {}
    assert user_is_admin(user, database) is False, "None user is not an admin"

    user = "Jim"
    database = {}
    assert user_is_admin(user, database) is False, "User not in database"

    database = {"Jim": {"type": "regular"}}
    assert user_is_admin(user, database) is False, "User is not an admin"

    database = {"Jim": {"type": "admin"}}
    assert user_is_admin(user, database) is True, "User is an admin"
A surprising benefit of TDD and testing is that having to write code with tests makes your code better! It’s just too hard to write tests for bad code; if you have to write a test then it’s easier to just write better code.
Safe Refactoring and TDD
Reading in Rectangles
Now back to our overlap code. You have inherited a monolithic main method; how do you go about breaking it up? Some IDEs have functions for extracting a method or attribute, but we are following TDD and need to write a failing test first. Let’s start with the first portion of our existing main method, reading in the dict:
Red
# overlap.py
def main(infile, outfile):
    # build rectangle struct
    dict = {}
    for line in infile:
        name, *coords = line.split()
        dict[name] = [int(c) for c in coords]
    ...
We have a design decision to make. Should we extract the entire for loop or just the inside? The current code works with an open file, but any iterable could do (say a list of strings). Your decision should be informed by what is useful and what is easy to test. Having a function to convert a string to a rectangle could be handy, but for now we will take the entire for loop.
# test_overlap.py
def test_read_rectangles_simple(simple_input):
    rectangles = overlap.read_rectangles(simple_input)
    assert rectangles == {
        'a': [0, 0, 2, 2],
        'b': [1, 1, 3, 3],
        'c': [10, 10, 11, 11],
    }
Which fails because we don’t have a method named read_rectangles. Note that we have already decided on a name for the function, its arguments, and its return values without touching our source code.
Green
Now we can pull out the function and replace that for loop with a function call.
Fix the code
Change the overlap file to make the above test pass.
Solution
def main(infile, outfile):
    dict = read_rectangles(infile)
    ...

def read_rectangles(infile):
    result = {}
    for line in infile:
        name, *coords = line.split()
        result[name] = [int(c) for c in coords]
    return result
When you start refactoring like this, go slow and work in parts. Make the function first, and when test_read_rectangles passes, replace the code in the main method. Your end-to-end test should still pass and tell you if something is wrong. Note we can remove the comment because the function name is self-documenting.
Refactor
With read_rectangles pulled out, let’s rename our dict to rectangles. Since the argument to our function can be any iterable of strings, we can also rename it to rectangles and add a type annotation to show what we expect as input. Now is also the time to add a docstring. Make changes step-wise and run your tests frequently.
from collections.abc import Iterable

def main(infile, outfile):
    rectangles = read_rectangles(infile)
    ...

def read_rectangles(rectangles: Iterable[str]) -> dict[str, list[float]]:
    result = {}
    for rectangle in rectangles:
        name, *coords = rectangle.split()
        result[name] = [int(c) for c in coords]
    return result
Note that our annotated return value is a list of floats; ints are implied by that, and we may want to support floats later (hint, we do!).
Test like an enemy
With a small, testable method in place, we can start adding features and get creative. Often you will take the mindset of an adversary of a codebase. This is generally easier with someone else’s code because you can think “I bet they didn’t think of this” instead of “why didn’t I think of this”, but being self-deprecating gets easier with practice.
Think of all the ways the overlap code could break, all the things that aren’t tested, and how you would expect them to be handled. Remember users aren’t perfect so expect non-optimal inputs. It’s better to fail early (and descriptively) than several lines or modules away.
Red-Green-Refactor the read_rectangles
Pick one of the following and work through a red-green-refactor cycle to address it. Work on more as time permits. Remember to keep each phase short and run pytest often. Stop adding code once a test starts passing.
- How to handle empty input?
- Non-numeric X and Y coordinates?
- X and Y coordinates that are floats?
- X and Y coordinates not equal? (how are we not testing this!)
- X and Y coordinates are not in increasing order?
- Incorrect number of coordinates supplied?
- Any others you can think of?
Often there isn’t one right answer. When the user gives an empty file, should that be an empty result or throw an error? The answer may not matter, but formalizing it as a test forces you to answer and document such questions. Look up pytest.raises for how to test if an exception is thrown.
Solution
Here is one set of tests and passing code. You may have chosen different behavior for some changes.
# test_overlap.py
def test_read_rectangles_empty_input_empty_output():
    rectangles = overlap.read_rectangles([])
    assert rectangles == {}

def test_read_rectangles_non_numeric_coord_raise_error():
    with pytest.raises(ValueError) as error:
        overlap.read_rectangles(['a a a a a'])
    assert "Non numeric value provided as input for 'a a a a a'" in str(error)

def test_read_rectangles_accepts_floats():
    lines = ['a 0.3 0.3 1.0 1.0']
    rectangles = overlap.read_rectangles(lines)
    coords = rectangles['a']
    # Note that 0.3 != 0.1 + 0.2 due to floating point error, use pytest.approx
    assert coords[0] == pytest.approx(0.1 + 0.2)
    assert coords[1] == 0.3
    assert coords[2] == 1.0
    assert coords[3] == 1.0

def test_read_rectangles_not_equal_coords():
    lines = ['a 1 2 3 4']
    rectangles = overlap.read_rectangles(lines)
    assert rectangles == {'a': [1, 2, 3, 4]}

def test_read_rectangles_fix_order_of_coords():
    lines = ['a 3 4 1 2']
    rectangles = overlap.read_rectangles(lines)
    assert rectangles == {'a': [1, 2, 3, 4]}

def test_read_rectangles_incorrect_number_of_coords_raise_error():
    with pytest.raises(ValueError) as error:
        overlap.read_rectangles(['a 1'])
    assert "Incorrect number of coordinates for 'a 1'" in str(error)

    with pytest.raises(ValueError) as error:
        overlap.read_rectangles(['a'])
    assert "Incorrect number of coordinates for 'a'" in str(error)
# overlap.py
def read_rectangles(rectangles: Iterable[str]) -> dict[str, list[float]]:
    result = {}
    for rectangle in rectangles:
        name, *coords = rectangle.split()
        try:
            value = [float(c) for c in coords]
        except ValueError:
            raise ValueError(f"Non numeric value provided as input for '{rectangle}'")
        if len(value) != 4:
            raise ValueError(f"Incorrect number of coordinates for '{rectangle}'")
        # make sure x1 <= x2, value = [x1, y1, x2, y2]
        value[0], value[2] = min(value[0], value[2]), max(value[0], value[2])
        value[1], value[3] = min(value[1], value[3]), max(value[1], value[3])
        result[name] = value
    return result
Notice how the test code is much longer than the source code; this is typical for well-tested code.
A good question is: do we need to update our end-to-end test to cover all these pathological cases? The answer is no (for most). We already know our read_rectangles will handle the incorrect number of coordinates by throwing an informative error, so it would be redundant to see if the main method has the same behavior. However, it would be useful to document what happens if an empty input file is supplied. Maybe you want read_rectangles to return an empty dict but the main method should warn the user or fail.
The decision not to test something in multiple places is more important than being efficient with our time writing tests. Say we want to change the message printed for one of the improper input cases. If the same function is tested multiple places we need to update multiple tests that are effectively covering the same feature. Remember that tests are code and should be kept DRY. Tests that are hard to change don’t get run and code eventually degrades back to a legacy version.
Testing testing for overlap
We have arrived at the largest refactoring needed: the overlap code. This is the internals of the nested for loop.
...
    for blue_name, blue_coords in rectangles.items():
        # check if rects overlap
        result = '1'
        red_lo_x, red_lo_y, red_hi_x, red_hi_y = red_coords
        blue_lo_x, blue_lo_y, blue_hi_x, blue_hi_y = blue_coords

        if (red_lo_x >= blue_hi_x) or (red_hi_x <= blue_lo_x) or \
                (red_lo_y >= blue_hi_x) or (red_hi_y <= blue_lo_y):
            result = '0'
...
Like before, we want to write a failing test first and extract this method. When that is in place we can start getting creative to test more interesting cases. Finally, to support our original goal (reporting the percent overlap) we will change our return type to another rectangle.
Red-Green-Refactor rects_overlap
Extract the inner for loop code. The signature should be:
def rects_overlap(red, blue) -> bool: ...
Solution
For testing, I’m using the simple input, as parsed by read_rectangles. Notice that if we had used read_rectangles to do the parsing, we would add a dependency between that function and this one. If we change (break) read_rectangles, this would break too even though nothing is wrong with rects_overlap. However, uncoupling them completely doesn’t capture how they interact in code. Consider the drawbacks before deciding what to use.
# test_overlap.py
def test_rects_overlap():
    rectangles = {
        'a': [0, 0, 2, 2],
        'b': [1, 1, 3, 3],
        'c': [10, 10, 11, 11],
    }
    assert overlap.rects_overlap(rectangles['a'], rectangles['a']) is True
    assert overlap.rects_overlap(rectangles['a'], rectangles['b']) is True
    assert overlap.rects_overlap(rectangles['b'], rectangles['a']) is True
    assert overlap.rects_overlap(rectangles['a'], rectangles['c']) is False
# overlap.py
def main(infile, outfile):
    rectangles = read_rectangles(infile)
    for red_name, red_coords in rectangles.items():
        output_line = []
        for blue_name, blue_coords in rectangles.items():
            result = '1' if rects_overlap(red_coords, blue_coords) else '0'
            output_line.append(result)
        outfile.write('\t'.join(output_line) + '\n')
...

def rects_overlap(red, blue) -> bool:
    red_lo_x, red_lo_y, red_hi_x, red_hi_y = red
    blue_lo_x, blue_lo_y, blue_hi_x, blue_hi_y = blue

    if (red_lo_x >= blue_hi_x) or (red_hi_x <= blue_lo_x) or \
            (red_lo_y >= blue_hi_x) or (red_hi_y <= blue_lo_y):
        return False
    return True
Now we can really put the function through its paces. Here are some rectangles to guide your testing:
┌───┐ ┌──┬──┐ ┌──────┐ ┌─┐
│ ┌┼─┐ │ │ │ │ ┌──┐ │ └─┼─┐
└──┼┘ │ └──┴──┘ └─┼──┼─┘ └─┘
└──┘ └──┘
For each, consider swapping the red and blue labels and rotating 90 degrees. Rotating a coordinate 90 degrees clockwise is simply swapping x and y while negating the new y value. If you thought the rotation function would be useful elsewhere, you could add it to your script, but for now we will keep it in our pytest code. First, write some failing unit tests of our new helper function:
# test_overlap.py
def test_rotate_rectangle():
    rectangle = [1, 2, 3, 3]
    rectangle = rotate_rectangle(rectangle)
    assert rectangle == [2, -3, 3, -1]

    rectangle = rotate_rectangle(rectangle)
    assert rectangle == [-3, -3, -1, -2]

    rectangle = rotate_rectangle(rectangle)
    assert rectangle == [-3, 1, -2, 3]

    rectangle = rotate_rectangle(rectangle)
    assert rectangle == [1, 2, 3, 3]
Writing out the different permutations was challenging (for me, at least), probably related to having unnamed attributes for our rectangle as a list of numbers… Then you need to produce the correct rectangle where x1 <= x2.
# test_overlap.py
def rotate_rectangle(rectangle):
    x1, y1, x2, y2 = rectangle
    x1, y1 = y1, -x1
    x2, y2 = y2, -x2
    # make sure x1 <= x2, value = [x1, y1, x2, y2]
    x1, x2 = min(x1, x2), max(x1, x2)
    y1, y2 = min(y1, y2), max(y1, y2)
    return [x1, y1, x2, y2]
Notice this is still in our test file. We don’t want it in our project; it’s just a helper for our tests, though we could use it elsewhere in later tests. Since the function is non-trivial we also have it tested. For helper functions that mutate or wrap inputs, you could leave them untested, but beware: the simple helper function may evolve into something more complex!
Test the first overlap orientation
Start with the first type of overlap (over corners). Set one rectangle to (-1, -1, 1, 1) and place the second rectangle. Transform the second rectangle by rotating it around the origin 3 times and test the rects_overlap function for each overlap and red/blue assignment. Hint: pytest code is still python!
Solution
With our rotate_rectangle function in hand, the test function needs to test rects_overlap twice (swapping arguments each time) and then rotate the second rectangle. Note the choice of centering the first rectangle on the origin makes it rotationally invariant. Here is a test function; notice that the assert statements have a clear message, as it would otherwise be difficult to tell when a test failed.
def test_rects_overlap_permutations():
    rectangle_1 = [-1, -1, 1, 1]
    rectangle_2 = [0, 0, 2, 3]
    result = True
    for i in range(4):
        assert overlap.rects_overlap(rectangle_1, rectangle_2) == result, (
            f"Failed rects_overlap({rectangle_1}, {rectangle_2}) "
            f"on rotation {i}")
        assert overlap.rects_overlap(rectangle_2, rectangle_1) == result, (
            f"Failed rects_overlap({rectangle_2}, {rectangle_1}) "
            f"on rotation {i}")
        rectangle_2 = rotate_rectangle(rectangle_2)
I declare the result variable to simplify the parametrized version introduced next.
We have test_rects_overlap_permutations written for one rectangle. You may be tempted to copy and paste the function multiple times and declare a different rectangle_2 and result for each orientation we came up with above. To avoid code duplication, you could make our test a helper function, but pytest has a better builtin option, parametrize.
pytest.mark.parametrize
Parametrizing a test function allows you to easily repeat its execution with different input values. Using parameters cuts down on duplication and can vastly increase the input space you can explore by nesting parameters.
Let’s work on testing the following function:
def double_sometimes(value, condition):
    if condition:
        raise ValueError()
    return value * 2  # doubles numbers, and repeats lists and strings
Fully testing this function requires testing a variety of values and ensuring that when condition is True the function instead raises an error. Here are a few tests showing how you could write multiple test functions:
def test_double_sometimes_number_false():
    assert double_sometimes(4, False) == 8

def test_double_sometimes_number_true():
    with pytest.raises(ValueError):
        double_sometimes(4, True)

def test_double_sometimes_list_false():
    assert double_sometimes([1, 1], False) == [1, 1, 1, 1]

def test_double_sometimes_list_true():
    with pytest.raises(ValueError):
        double_sometimes([1, 1], True)
...
Hopefully that code duplication is making your skin crawl! Maybe instead you write a helper function to check the value of condition:
def double_sometimes_test_helper(value, condition, output):
    if condition:
        with pytest.raises(ValueError):
            double_sometimes(value, condition)
    else:
        assert double_sometimes(value, condition) == output
Then you can replace some of your tests with the helper calls:
def test_double_sometimes_number():
    double_sometimes_test_helper(4, False, 8)
    double_sometimes_test_helper(4, True, 8)

def test_double_sometimes_list():
    double_sometimes_test_helper([1, 1], False, [1, 1, 1, 1])
    double_sometimes_test_helper([1, 1], True, [1, 1, 1, 1])
...
You could imagine wrapping the helper calls in a loop and you’d basically have a parametrized test. Using pytest.mark.parametrize you’d instead have:
@pytest.mark.parametrize(
    'value,output',
    [
        (4, 8),
        ([1, 1], [1, 1, 1, 1]),
        ('asdf', 'asdfasdf'),
    ]
)
@pytest.mark.parametrize('condition', [True, False])
def test_double_sometimes(value, condition, output):
    if condition:
        with pytest.raises(ValueError):
            double_sometimes(value, condition)
    else:
        assert double_sometimes(value, condition) == output
pytest will generate the cross product of all nested parameters, here producing 6 tests. Notice how there is no code duplication and if you come up with another test input, you just have to add it to the ‘value,output’ list; testing each condition will automatically happen. Compared with coding the loops yourself, pytest generates better error reporting on what set of parameters failed. Like most of pytest, there is a lot more we aren’t covering like mixing parameters and fixtures.
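As a small taste of mixing the two, here is a sketch (the fixture and test names are invented for illustration):
@pytest.fixture()
def origin_rectangle():
    return [-1, -1, 1, 1]

@pytest.mark.parametrize('other,expected', [
    ([0, 0, 2, 2], True),    # overlaps the fixture rectangle
    ([5, 5, 6, 6], False),   # disjoint
])
def test_overlap_with_origin(origin_rectangle, other, expected):
    assert overlap.rects_overlap(origin_rectangle, other) == expected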
Here is the parametrized version of test_rects_overlap_permutations. I added the unicode rectangle images because I already made them above and it seemed like a shame not to reuse them. Normally I wouldn’t spend time making them and would instead use a text descriptor, e.g. “Corners overlap”.
rectangle_strs = ['''
┌───┐
│ ┌┼─┐
└──┼┘ │
└──┘''','''
┌──┬──┐
│ │ │
└──┴──┘''','''
┌──────┐
│ ┌──┐ │
└─┼──┼─┘
└──┘''','''
┌─┐
└─┼─┐
└─┘''','''
┌─┐
└─┘ ┌─┐
└─┘''','''
┌──────┐
│ ┌─┐ │
│ └─┘ │
└──────┘''',
]
@pytest.mark.parametrize(
    "rectangle_2,rectangle_str,result",
    [
        ([0, 0, 2, 3], rectangle_strs[0], True),
        ([1, -1, 2, 1], rectangle_strs[1], False),
        ([0, -2, 0.5, 0], rectangle_strs[2], True),
        ([1, 1, 2, 2], rectangle_strs[3], False),
        ([2, 2, 3, 3], rectangle_strs[4], False),
        ([0, 0, 0.5, 0.5], rectangle_strs[5], True),
    ])
def test_rects_overlap_permutations(rectangle_2, rectangle_str, result):
    rectangle_1 = [-1, -1, 1, 1]
    for i in range(4):
        assert overlap.rects_overlap(rectangle_1, rectangle_2) == result, (
            f"Failed rects_overlap({rectangle_1}, {rectangle_2}) "
            f"on rotation {i}. {rectangle_str}")
        assert overlap.rects_overlap(rectangle_2, rectangle_1) == result, (
            f"Failed rects_overlap({rectangle_2}, {rectangle_1}) "
            f"on rotation {i}. {rectangle_str}")
        rectangle_2 = rotate_rectangle(rectangle_2)
Now failed tests will also show the image along with information to recreate any failures. Running the above should fail… Time to get back to green.
Note that with parametrize, each set of parameters will be run; e.g. failing on the second rectangle will not affect the run for the third rectangle.
Fix the code
Work on the source code for rects_overlap until all the tests pass. Hint: try pytest --pdb test_overlap.py, which launches pdb at the site of the first assertion error.
Solution
Only one set of rectangles is failing, and it’s on the first rotation. That should indicate our tests are working properly and perhaps there is a more subtle bug hiding in our code. Hopefully it didn’t take too long to see there is a copy-paste error in the if statement.
if (red_lo_x >= blue_hi_x) or (red_hi_x <= blue_lo_x) or \
   (red_lo_y >= blue_hi_x) or (red_hi_y <= blue_lo_y):
   #                    ^ should be y!
Notice that all our other tests didn’t exercise this condition. If we weren’t rotating each case we would have also missed this. At least the fix is easy.
In contrast to the last implementation change we made (removing the trailing tab character) this bug fix is more serious. You have revealed an error in code that was previously published! Were this a real example, you should open an issue with the original code and consider rerunning any dependent analyses to see if results fundamentally change. It may be by chance that branch of code never actually ran and it didn’t matter if it were wrong. Maybe only 1% of cases were miscalled. That would require just a new version to patch the issue. Larger changes in the results may necessitate a revision or retraction of the original publication.
Key Points
Testing long methods is difficult since you can’t pinpoint a few lines of logic.
Testable code is also good code!
Changing code without tests can be dangerous. Work slowly and carefully making only the simplest changes first.
Write tests with an adversarial viewpoint.
Keep tests DRY; fixtures and parametrization can help.
Testing Big Changes
Overview
Teaching: 20 min
Exercises: 40 min
Questions
What does it mean if you have to change a lot of tests while adding features?
What are the advantages of testing an interface?
Objectives
Add a Rectangle class
Learn how to use TDD when making large changes to code
Making big changes
With this fairly simple example, we have fully tested our codebase. You can rest easy knowing the code is working as well as you’ve tested it. We have made a few changes to what the legacy code used to do and even found a bug. Still, our code functions basically the same as it did (perhaps more correctly). How does testing help when we need to add features and really change code?
The answer depends on how well you tested the interface vs the implementation. If your tests are full of duplication, you will have to change a lot of the test code. Now you would want to make a commit of this version of the code and get ready to shake things up. Even if it seems like you are spending a lot of time messing with the tests, at the end once everything passes you should have the same confidence in your code.
Testing the interface vs the implementation
Consider a database object that performs a query and then returns the result as a separate function call. This is not the best design, but it is a useful example:
class MyDB:
    _result: list
    ...

    def query(self, something):
        ...
        self._result = result

    def get_result(self):
        return list(self._result)
When you call query, the result is saved in the _result attribute of the instance. When you get the result, a copy is returned as a list.
Consider the following two tests for query:
def test_query_1():
    my_db = MyDB()
    my_db.query('something')
    assert my_db._result == ['a result']

def test_query_2():
    my_db = MyDB()
    my_db.query('something')
    assert my_db.get_result() == ['a result']
Currently, they do the same thing. One could argue the first option is better
because it only tests the query function while the second is also exercising
get_result
. By convention, the underscore in front of the attribute in python
means it is private and you shouldn’t really access it. Since “private” is
kind of meaningless in python, a better description is that it is an implementation
detail which could change. Even if it weren’t private, the first test is testing
internal implementation (calling query
sets the result
attribute) while the
second is testing the interface (call query
first and get_result
will give
you the result as a list).
Think about what happens if the _result
attribute is changed from a list to
a set, or dictionary. This is valid since no one should really use _result
outside of the class, but in terms of testing, test_query_1
would break while
test_query_2
still passes, even if the code is correct. This is annoying
because you have to change a lot of test code but it is also dangerous. Tests
that fail for correct code encourage you to run tests less often and can cause
code and tests to diverge. Where possible, test to the interface.
Returning a new rectangle
Back to our rectangles: let’s change our rects_overlap to return a new rectangle when the arguments overlap, and None when they do not.
Red
We will start with the basic test_rects_overlap
# test_overlap.py
def test_rects_overlap():
    rectangles = {
        'a': [0, 0, 2, 2],
        'b': [1, 1, 3, 3],
        'c': [10, 10, 11, 11],
    }
    assert overlap.rects_overlap(rectangles['a'], rectangles['a']) == [0, 0, 2, 2]
    assert overlap.rects_overlap(rectangles['a'], rectangles['b']) == [1, 1, 2, 2]
    assert overlap.rects_overlap(rectangles['b'], rectangles['a']) == [1, 1, 2, 2]
    assert overlap.rects_overlap(rectangles['a'], rectangles['c']) is None
We are failing since the first call returns True instead of our list.
Green
Simple enough: instead of False we should return None, and instead of True we need to build a new rectangle from the overlapping region of the coordinates.
# overlap.py
from typing import Optional

def rects_overlap(red, blue) -> Optional[list[float]]:
    red_lo_x, red_lo_y, red_hi_x, red_hi_y = red
    blue_lo_x, blue_lo_y, blue_hi_x, blue_hi_y = blue

    if (red_lo_x >= blue_hi_x) or (red_hi_x <= blue_lo_x) or \
            (red_lo_y >= blue_hi_y) or (red_hi_y <= blue_lo_y):
        return None

    x1 = max(red_lo_x, blue_lo_x)
    x2 = min(red_hi_x, blue_hi_x)
    y1 = max(red_lo_y, blue_lo_y)
    y2 = min(red_hi_y, blue_hi_y)
    return [x1, y1, x2, y2]
We are passing the new test_rects_overlap but failing another 6 tests! That’s because many tests are still testing the old behavior. If you have kept tests DRY and not tested the implementation, this should be quick to fix; otherwise, take the opportunity to refactor your tests!
For us, test_rects_overlap_permutations will accept a rectangle as the result. On each loop iteration, we need to rotate the result rectangle. Since None is a valid rectangle now, rotate_rectangle needs to handle it appropriately. Here is the finished test method:
@pytest.mark.parametrize(
    "rectangle_2,rectangle_str,result",
    [
        ([0, 0, 2, 3], rectangle_strs[0], [0, 0, 1, 1]),
        ([1, -1, 2, 1], rectangle_strs[1], None),
        ([0, -2, 0.5, 0], rectangle_strs[2], [0, -1, 0.5, 0]),
        ([1, 1, 2, 2], rectangle_strs[3], None),
        ([2, 2, 3, 3], rectangle_strs[4], None),
        ([0, 0, 0.5, 0.5], rectangle_strs[5], [0, 0, 0.5, 0.5]),
    ])
def test_rects_overlap_permutations(rectangle_2, rectangle_str, result):
    rectangle_1 = [-1, -1, 1, 1]
    for i in range(4):
        assert overlap.rects_overlap(rectangle_1, rectangle_2) == result, (
            f"Failed rects_overlap({rectangle_1}, {rectangle_2}) "
            f"on rotation {i}. {rectangle_str}")
        assert overlap.rects_overlap(rectangle_2, rectangle_1) == result, (
            f"Failed rects_overlap({rectangle_2}, {rectangle_1}) "
            f"on rotation {i}. {rectangle_str}")
        rectangle_2 = rotate_rectangle(rectangle_2)
        result = rotate_rectangle(result)

def rotate_rectangle(rectangle):
    if rectangle is None:
        return None

    x1, y1, x2, y2 = rectangle
    x1, y1 = y1, -x1
    x2, y2 = y2, -x2
    # make sure x1 <= x2, value = [x1, y1, x2, y2]
    x1, x2 = min(x1, x2), max(x1, x2)
    y1, y2 = min(y1, y2), max(y1, y2)
    return [x1, y1, x2, y2]
Refactor
Since our tests are fairly small and not too coupled to the implementation, we don’t have to change anything. If you had written the permutation function as 48 separate tests, it would make sense to refactor them while updating the return values!
End-to-end test?
You may be surprised that our end-to-end test is still passing. This is a feature of Python more than any design of our system. The line
result = '1' if rects_overlap(red_coords, blue_coords) else '0'
works properly because None evaluates as False while a non-empty list evaluates as True in a boolean context.
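You can confirm this in a Python REPL:
>>> bool(None)
False
>>> bool([1, 1, 2, 2])
True
>>> bool([])  # caution: an empty list is also falsy
False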
Take a minute to reflect on this change. Creating a new rectangle is non-trivial: it would be easy to mix up a min or a max, or to introduce a copy/paste error. Since we have tests, we can be confident we made this change correctly. With our end-to-end test we further know that our main method is behaving exactly the same as well.
A Rectangle Object
Any decent OOP developer would be pulling their hair out over using a list of 4 numbers as a rectangle when you could have a full object hierarchy! We have another design decision before we really dig into the changes: how much should we support the old format of a rectangle as a list of numbers? Coming from strict, static typing you may be inclined to say “not at all”, but Python gains power from its duck typing.
Consider adding support for comparing a rectangle to a list of numbers. If you support lists fully, we can keep the same tests. However, a user may not expect a list to equal a rectangle, which could cause problems with other code. Since we want to focus on writing tests instead of advanced Python, we will try to replace everything with rectangles. In real applications you may want to transiently add support for comparisons to lists so the tests pass, confirming any other logic is sound. After the tests are converted to rectangles you would alter your __eq__ method to only compare with other rectangles.
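Such a transitional __eq__ might look like the following sketch (hypothetical; you would delete the list branch once every test constructs Rectangle objects):
    def __eq__(self, other):
        # note: @dataclass skips generating __eq__ when the class defines one
        if isinstance(other, list):
            # temporary shim: let legacy tests compare against plain lists
            return [self.x1, self.y1, self.x2, self.y2] == other
        if isinstance(other, Rectangle):
            return (self.x1, self.y1, self.x2, self.y2) == \
                (other.x1, other.y1, other.x2, other.y2)
        return NotImplemented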
Without getting carried away, we want our rectangles to support:
- Construction from a list of numbers or with named arguments
- Comparison with other rectangles, primarily for testing
- Calculation of area
- An overlap function with a signature like def overlap(self, other: Rectangle) -> Rectangle
Afterwards, our main loop code will change from
result = '1' if rects_overlap(red_coords, blue_coords) else '0'
# to
overlap_area = red_rectangle.overlap(blue_rectangle).area()
We will also need to change code in our read_rectangles and rects_overlap functions, as well as main.
Order of Operations
Should we start with converting our existing tests or making new tests for the rectangle object?
Solution
It’s better to work from the bottom up, e.g. test creating a rectangle before touching read_rectangles. Imagine a red, green, refactor cycle where you start by updating read_rectangles to return a (yet unimplemented) Rectangle object. First you change your tests to expect a Rectangle, to get them failing. The problem is that to get them passing you need to add code to create a rectangle and test for equality. That’s possible, but then you don’t have unit tests for those functions. Furthermore, it breaks with the idea that TDD cycles should be short; you have to add a lot of code to get things passing!
TDD: Creating Rectangles
Starting with a new test, we will simply check that a new rectangle can be created from named arguments.
# test_overlap.py
def test_create_rectangle_named_parameters():
    assert overlap.Rectangle(x1=1.1, x2=2, y1=4, y2=3)
This fails because there is no Rectangle class yet. To get to green, we are going to use a Python dataclass:
# overlap.py
from dataclasses import dataclass

@dataclass
class Rectangle:
    x1: float
    y1: float
    x2: float
    y2: float
The dataclass decorator produces a default __init__ and __eq__.
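In other words, we get construction and comparison for free; a quick illustration:
r1 = Rectangle(0, 0, 1, 1)                # generated __init__ assigns fields in declaration order
r2 = Rectangle(x1=0, y1=0, x2=1, y2=1)    # named arguments work too
assert r1 == r2                           # generated __eq__ compares the fields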
How about rectangles from lists?
# test_overlap.py
def test_create_rectangle_from_list():
    assert overlap.Rectangle.from_list([1.1, 4, 2, 3])
Here the order of parameters matches the existing implementation. The code uses a bit of more advanced Python but should be clear:
# overlap.py
class Rectangle:
    # ...
    @classmethod
    def from_list(cls, coordinates: list[float]):
        x1, y1, x2, y2 = coordinates
        return cls(x1=x1, y1=y1, x2=x2, y2=y2)
And let’s add a test for the incorrect number of values in the list:
# test_overlap.py
def test_create_rectangle_from_list_wrong_number_of_args():
    with pytest.raises(ValueError) as error:
        overlap.Rectangle.from_list([1.1, 4, 2])
    assert "Incorrect number of coordinates " in str(error)
    with pytest.raises(ValueError) as error:
        overlap.Rectangle.from_list([1.1, 4, 2, 2, 2])
    assert "Incorrect number of coordinates " in str(error)
And update the code:
# overlap.py
class Rectangle:
    # ...
    @classmethod
    def from_list(cls, coordinates: list[float]):
        if len(coordinates) != 4:
            raise ValueError(f"Incorrect number of coordinates for '{coordinates}'")
        x1, y1, x2, y2 = coordinates
        return cls(x1=x1, y1=y1, x2=x2, y2=y2)
Notice that we now have some code duplication, since we use the same check in read_rectangles. But we can’t remove it until we start using the Rectangle there.
TDD: Testing equality
Instead of just checking that our rectangles are created, let’s see if they are what we expect:
# test_overlap.py
def test_create_rectangle_named_parameters():
    assert overlap.Rectangle(1.1, 4, 2, 3) == overlap.Rectangle(1.1, 3, 2, 4)
    assert overlap.Rectangle(x1=1.1, x2=2, y1=4, y2=3) == overlap.Rectangle(1.1, 3, 2, 4)

def test_create_rectangle_from_list():
    assert overlap.Rectangle.from_list([1.1, 4, 2, 3]) == overlap.Rectangle(1.1, 3, 2, 4)
Here we use unnamed (positional) parameters during initialization to check for equality. The order of attributes in the dataclass determines the order of assignment. Since that’s fairly brittle, consider adding kw_only=True for production code.
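As a sketch (kw_only requires Python 3.10+), that would look like:
@dataclass(kw_only=True)
class Rectangle:
    ...
# Rectangle(1.1, 4, 2, 3) now raises TypeError; only named arguments are accepted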
This fails not because of an issue with __eq__ but with __init__: we aren’t ensuring our expected ordering of x1 <= x2. You could override __init__, but instead we will use __post_init__ to validate the attributes.
# overlap.py
@dataclass
class Rectangle:
    x1: float
    y1: float
    x2: float
    y2: float

    def __post_init__(self):
        self.x1, self.x2 = min(self.x1, self.x2), max(self.x1, self.x2)
        self.y1, self.y2 = min(self.y1, self.y2), max(self.y1, self.y2)
Note this also fixes the issue with from_list, as it calls __init__ as well. If we later decide to add an attribute like color, the generated __init__ method would be updated automatically and our __post_init__ would keep functioning as intended.
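For example (hypothetical; we won’t actually add a color):
@dataclass
class Rectangle:
    x1: float
    y1: float
    x2: float
    y2: float
    color: str = 'black'   # new field is picked up by the generated __init__

    def __post_init__(self):
        # untouched: still only validates the coordinate ordering
        self.x1, self.x2 = min(self.x1, self.x2), max(self.x1, self.x2)
        self.y1, self.y2 = min(self.y1, self.y2), max(self.y1, self.y2)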
TDD: Calculate Area
For area we can come up with a few simple tests:
# test_overlap.py
def test_rectangle_area():
    assert overlap.Rectangle(0, 0, 1, 1).area() == 1
    assert overlap.Rectangle(0, 0, 1, 2).area() == 2
    assert overlap.Rectangle(0, 1, 2, 2).area() == 2
    assert overlap.Rectangle(0, 0, 0, 0).area() == 0
    assert overlap.Rectangle(0, 0, 0.3, 0.3).area() == 0.09
    assert overlap.Rectangle(0.1, 0, 0.4, 0.3).area() == 0.09
The code shouldn’t be too surprising:
# overlap.py
@dataclass
class Rectangle:
    ...
    def area(self):
        return (self.x2 - self.x1) * (self.y2 - self.y1)
But you may be surprised that the last assert is failing, even though the second to last is passing. Again we are struck by the limits of floating point precision.
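A quick check in a Python REPL shows why: 0.4 - 0.1 is not exactly 0.3, so the last area is not exactly 0.09.
>>> 0.4 - 0.1
0.30000000000000004
>>> (0.4 - 0.1) * (0.3 - 0)
0.09000000000000001
>>> 0.3 * 0.3   # happens to round to the same float as 0.09
0.09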
Fix the tests
Fix the issue of floating point comparison. Hint: look up pytest.approx.
Solution
assert overlap.Rectangle(0.1, 0, 0.4, 0.3).area() == pytest.approx(0.09)
To be thorough, you should approximate the line above it as well. While it doesn’t fail here, a different machine may have different precision.
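That is, something like:
assert overlap.Rectangle(0, 0, 0.3, 0.3).area() == pytest.approx(0.09)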
TDD: Overlap
Now we have the challenge of replacing the rects_overlap function with a Rectangle method with the following signature: def overlap(self, other: Rectangle) -> Rectangle.
Red, green, refactor
Perform an iteration of TDD to add the overlap method to Rectangle. Hint: start with only the test_rects_overlap test for now.
Solution
Red
# test_overlap.py
def test_rectangle_overlap():
    rectangles = {
        'a': overlap.Rectangle(0, 0, 2, 2),
        'b': overlap.Rectangle(1, 1, 3, 3),
        'c': overlap.Rectangle(10, 10, 11, 11),
    }
    assert rectangles['a'].overlap(rectangles['a']) == overlap.Rectangle(0, 0, 2, 2)
    assert rectangles['a'].overlap(rectangles['b']) == overlap.Rectangle(1, 1, 2, 2)
    assert rectangles['b'].overlap(rectangles['a']) == overlap.Rectangle(1, 1, 2, 2)
    assert rectangles['a'].overlap(rectangles['c']) is None
Green
You can copy most of the current code to the new method with changes to variable names.
# overlap.py
class Rectangle:
    ...
    def overlap(self, other):
        if (self.x1 >= other.x2) or \
                (self.x2 <= other.x1) or \
                (self.y1 >= other.y2) or \
                (self.y2 <= other.y1):
            return None
        return Rectangle(
            x1=max(self.x1, other.x1),
            y1=max(self.y1, other.y1),
            x2=min(self.x2, other.x2),
            y2=min(self.y2, other.y2),
        )
This was after refactoring to get rid of extra variables. Since we can use named arguments for new rectangles, we don’t have to set x1 separately for clarity.
Refactor
While it’s tempting to go forth and replace all rects_overlap calls now, we need to work on some other tests first.
Wrapping up the overlap tests, we need to work on rotate_rectangle and the overlap permutations test. I think it makes sense to move our rotate function to Rectangle now. While we don’t use it in our main method, the class is becoming general enough to be used outside our current script.
# test_overlap.py
def test_rotate_rectangle():
    rectangle = overlap.Rectangle(1, 2, 3, 3)
    rectangle = rectangle.rotate()
    assert rectangle == overlap.Rectangle(2, -3, 3, -1)
    rectangle = rectangle.rotate()
    assert rectangle == overlap.Rectangle(-3, -3, -1, -2)
    rectangle = rectangle.rotate()
    assert rectangle == overlap.Rectangle(-3, 1, -2, 3)
    rectangle = rectangle.rotate()
    assert rectangle == overlap.Rectangle(1, 2, 3, 3)
The special case where the rectangle is None is not handled anymore, since you can’t call the rotate method on None.
# overlap.py
class Rectangle:
    ...
    def rotate(self):
        # rotate 90 degrees: (x, y) -> (y, -x); __post_init__ restores the ordering
        return Rectangle(
            x1=self.y1,
            y1=-self.x1,
            x2=self.y2,
            y2=-self.x2,
        )
That is much cleaner than the last version. Now, to handle our None rectangles, we will keep our rotate_rectangle helper function in the test code to wrap the method dispatch, and modify our test:
# test_overlap.py
def rotate_rectangle(rectangle):
    if rectangle is None:
        return None
    return rectangle.rotate()
...
@pytest.mark.parametrize(
    "rectangle_2,rectangle_str,result",
    [
        (overlap.Rectangle(0, 0, 2, 3), rectangle_strs[0], overlap.Rectangle(0, 0, 1, 1)),
        (overlap.Rectangle(1, -1, 2, 1), rectangle_strs[1], None),
        (overlap.Rectangle(0, -2, 0.5, 0), rectangle_strs[2], overlap.Rectangle(0, -1, 0.5, 0)),
        (overlap.Rectangle(1, 1, 2, 2), rectangle_strs[3], None),
        (overlap.Rectangle(2, 2, 3, 3), rectangle_strs[4], None),
        (overlap.Rectangle(0, 0, 0.5, 0.5), rectangle_strs[5], overlap.Rectangle(0, 0, 0.5, 0.5)),
    ])
def test_rectangles_overlap_permutations(rectangle_2, rectangle_str, result):
    rectangle_1 = overlap.Rectangle(-1, -1, 1, 1)
    for i in range(4):
        assert rectangle_1.overlap(rectangle_2) == result, (
            f"Failed {rectangle_1}.overlap({rectangle_2}) "
            f"on rotation {i}. {rectangle_str}")
        assert rectangle_2.overlap(rectangle_1) == result, (
            f"Failed {rectangle_2}.overlap({rectangle_1}) "
            f"on rotation {i}. {rectangle_str}")
        rectangle_2 = rotate_rectangle(rectangle_2)
        result = rotate_rectangle(result)
And we can almost get rid of rects_overlap entirely. It’s not in our tests, but we do use it in our main script. The last function to change is read_rectangles.
TDD: Read rectangles
Red, green, refactor
Perform an iteration of TDD to make the read_rectangles function return Rectangles.
Solution
Red
It’s best to start with the simple test, then clean up failures during refactoring.
# test_overlap.py
def test_read_rectangles_simple(simple_input):
    rectangles = overlap.read_rectangles(simple_input)
    assert rectangles == {
        'a': overlap.Rectangle(0, 0, 2, 2),
        'b': overlap.Rectangle(1, 1, 3, 3),
        'c': overlap.Rectangle(10, 10, 11, 11),
    }
Green(ish)
The original code dealing with coordinate ordering and number of coordinates can be removed, as that’s now handled by Rectangle.
# overlap.py
def read_rectangles(rectangles: Iterable[str]) -> dict[str, Rectangle]:
    result = {}
    for rectangle in rectangles:
        name, *coords = rectangle.split()
        try:
            value = [float(c) for c in coords]
        except ValueError:
            raise ValueError(f"Non numeric value provided as input for '{rectangle}'")
        result[name] = Rectangle.from_list(value)
    return result
This is not a real green since other tests are failing, but at least test_read_rectangles_simple is passing.
Refactor
You have to touch a lot of test code to get everything passing, and you also need to modify the main method. Afterwards you can finally remove rects_overlap.
TDD: Area of overlap
Recall we set out to do all this in order to output the area of the overlap between two rectangles, instead of just 1 when they overlap. That change will only affect our end-to-end test. It may be prudent to extract the line
result = '1' if red_rect.overlap(blue_rect) else '0'
into a separate function to exercise it more fully, but for now we will leave it.
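If you did extract it, a sketch might look like this (format_cell is a hypothetical name, shown with the area behavior we are about to add):
def format_cell(red_rect, blue_rect):
    overlap = red_rect.overlap(blue_rect)
    return str(overlap.area()) if overlap else '0'
This would let you unit test the output formatting separately from main's looping and I/O.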
Red, green, refactor
Perform an iteration of TDD to make the main function output the area of overlap.
Solution
Red
Our simple rectangles aren’t the best for testing, but we are at least confident our area method is well tested elsewhere.
# test_overlap.py
def test_end_to_end(simple_input):
    # initialize a StringIO with a string to read from
    infile = simple_input
    # this holds our output
    outfile = StringIO()
    # call the function
    overlap.main(infile, outfile)
    output = outfile.getvalue().split('\n')
    assert output[0] == '4.0\t1.0\t0'
    assert output[1] == '1.0\t4.0\t0'
    assert output[2] == '0\t0\t1.0'
Green
# overlap.py
def main(infile, outfile):
    rectangles = read_rectangles(infile)
    for red_name, red_rect in rectangles.items():
        output_line = []
        for blue_name, blue_rect in rectangles.items():
            overlap = red_rect.overlap(blue_rect)
            result = str(overlap.area()) if overlap else '0'
            output_line.append(result)
        outfile.write('\t'.join(output_line) + '\n')
Refactor
You may consider changing the formatting of the floats, but otherwise we are done!
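One option (a sketch; the end-to-end expectations would need updating to match) is a format spec:
result = f'{overlap.area():g}' if overlap else '0'   # '4.0' would become '4'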
Big Conclusions
Take a breath; we are out of the deep end of rectangles for now. At this point you may think TDD has slowed you down: sometimes a simple code change required touching a lot of test code just to end up with the same functionality. If that’s your impression, consider the following:
- Make new mistakes. Nothing is worse than finding the same bug twice. It is hard to appreciate preventing regressions on simple problems like the ones used here, though the overlap and rotate functions come close. Once you’ve put in the time to write a test and are confident in the results, it is easier to make changes touching that code because the previous tests should still pass. Without tests, you may forget a negative sign or make a typo when refactoring and not notice until production. Don’t spend your time making the same mistakes; go forth and make new mistakes!
- Go far, not fast. There is a proverb: “If you want to go fast, go alone. If you want to go far, go together.” The analogy to testing breaks down because having tests frequently makes your work go faster as well: you are investing upfront in tests to limit debugging time later.
- Work with a safety net. Moving to problems of larger complexity, such as legacy code, can feel unsettling. The entire system is too large and opaque to be sure everything works, so you are left with an uneasy feeling that maybe a change you made introduced a bug. With tests, you can sleep soundly knowing all your tests passed. There may be bugs, but they are new bugs.
As you move into open source projects or work with multiple collaborators, a fully tested codebase helps ensure the quality of the project won’t degrade, as long as tests are informative, fast, and fun to write.
Key Points
Changing a lot of test code for minor features can indicate your tests are not DRY and are heavily coupled to the implementation.
Testing a module’s interface focuses tests on what a user would typically observe. You don’t have to change as many tests when internals change.