Doctestability

Some languages let you use inline documentation to write example code that can be used as unit tests. In Python, these are called doctests. They look something like this:

def adder(a, b):
    """Adds two numbers for example purposes.

    >>> adder(1, 2)
    3

    >>> adder(5, 2)
    7

    """
    return a + b
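
For reference, the standard library picks these examples up directly. Assuming the function above lives in a file called example.py (a placeholder name I’m using here), you can run its doctests from the command line with python -m doctest example.py -v, or from inside the module itself:

if __name__ == '__main__':
    import doctest
    doctest.testmod()  # runs every doctest in this module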

I’m becoming a big fan of this feature, because I’ve noticed that the ability to effectively doctest something is usually an indicator of good design.

What is an “effective doctest”? I mean a doctest that:

  • Is easy to understand
  • Is focused: doesn’t require a lot of setup
  • Is safe: no side effects
  • Communicates: It’s documentation first and a test second

These are also things you can say about code that is well designed: it’s easy to understand, focused, safe, and communicates intent.

A black-box, purely functional object meets all of these criteria. You pass some data in, you get some data out. Passing the same data in always gives you the same data out. This is the perfect candidate for a doctest, so let your desire to doctest force you to write more functions like this.

But what about situations where you must have side effects?

Recently I needed an object to route background tasks. For example, when background task A was finished, it should fire off task B and C in parallel, and when B was finished, it should fire off D. Upon task completion, the task router should be triggered again with a message saying the task was completed so we can fire off the next task(s).

We were going to do this in python using celery. An implementation could have looked like this:

from myproj.celery import app, tasks

@app.task
def router(data, task, message):
    """Route background tasks.

    When task A is complete, run B and C.
    When task B is complete, run D.
    Start the chain by calling:

        router('data', 'task_a', 'start')

    """
    if task == 'task_a':
        if message == 'start':
            (tasks.task_a.s(data) | router.s('task_a', 'complete')).delay()
        if message == 'complete':
            (tasks.task_b.s(data) | router.s('task_b', 'complete')).delay()
            (tasks.task_c.s(data) | router.s('task_c', 'complete')).delay()
    elif task == 'task_b':
        if message == 'complete':
            (tasks.task_d.s(data) | router.s('task_d', 'complete')).delay()
    else:
        # all done
        return data

Let’s look past the nested conditionals I used to keep the example compact and see what else is wrong with this function: my business logic – what tasks get triggered when – is tightly coupled to a third-party implementation, celery.

@app.task, .s(), .delay(), and chaining calls with a pipe are all celery-specific. This doesn’t seem too bad now, but this logic is likely to grow more complex, tightening the coupling, cluttering the logic, and making it even harder to test. And what happens when we outgrow our celery implementation and want to move to something like Amazon Simple Workflow Service?

Instead, since I approached this code with a desire to doctest, it ended up looking more like this:

class Router:
    """Route tasks.

    When task A is complete, run B and C.
    When task B is complete, run D.

    Init with a task runner: a callable that accepts the name of a
    task, some data, and a callback (which will be this router's
    route method). The runner should call the callback with the
    completed task's name, its result data, and a 'complete' message.

    Example Usage (the fake runner below runs tasks synchronously, so
    D fires as soon as B completes, before C starts):

    >>> def fake_runner(task, data, callback):
    ...     print('Running %s with %s' % (task, repr(data)))
    ...     callback(task, '%s results' % task, 'complete')
    ...
    >>> router = Router(fake_runner)
    >>> router.route('task_a', 'data', 'start')
    Running task_a with 'data'
    Running task_b with 'task_a results'
    Running task_d with 'task_b results'
    Running task_c with 'task_a results'

    """
    def __init__(self, runner):
        self.runner = runner

    def route(self, task, data, message):
        if task == 'task_a':
            if message == 'start':
                self.runner('task_a', data, callback=self.route)
            if message == 'complete':
                self.runner('task_b', data, callback=self.route)
                self.runner('task_c', data, callback=self.route)
        elif task == 'task_b':
            if message == 'complete':
                self.runner('task_d', data, callback=self.route)
        else:
            # all done
            return data

To make it doctestable, I introduced a seam between my business logic and celery: a task runner (I’ll leave the celery runner implementation to your imagination). And that seam was simple enough that I could include a fake implementation right in the doctest without hurting its readability. In fact, it improves the communication by documenting how to implement the seam’s interface.

So the documentation is better, but is the code better?

My celery usage (the mechanics of running background tasks) and my business logic (what tasks run when) are now decoupled. Since they need to change for different reasons, my code now follows the Single Responsibility Principle. That’s a good sign that this is a better design. I can expand the logic without celery details increasing the complexity, and I can move to a new third-party task runner by writing a new implementation of the runner interface without touching my business logic at all.

Notice my router no longer depends on celery. In fact, I no longer need to import anything. Instead, it depends on an interface (the runner). So it’s also following the Dependency Inversion Principle. As a side effect, I can now unit test this by injecting a mock runner and making assertions on its calls. These are also good signs that it’s a better design.

But! You may be asking, aren’t these the same benefits you get from normal unit testing?

Yes, but there is one big additional constraint with doctests that you don’t have in unit tests: You don’t want to use a mocking library. It would reduce the effectiveness of the doctest by cluttering it with mock stuff, which reduces its focus and ability to communicate. If I had a mocking library available, I may have decided to just patch celery and tasks. Instead, I was forced to construct a seam with an interface that was simple enough to fake right in the documentation for my object.

I love the ability to mock. But it’s a design tool, and reducing your need for mocks is usually an indicator of good design. So get into the habit of writing doctests as you write your code. You will be happy with where it leads you.

tdubs: better test doubles for python

A couple things have been bothering me about python’s unittest.mock:

Problem 1: Stubs aren’t Mocks

Here’s a function (that is stupid and dumb because this is an example):

def get_next_page(repo, current_page):
    return repo.get_page(current_page + 1)

If I want to test this with unittest.mock, it would look like this:

def test_it_gets_next_page_from_repo(self):
    repo = Mock()
    next_page = get_next_page(repo, current_page=1)
    self.assertEqual(next_page, repo.get_page.return_value)
    repo.get_page.assert_called_with(2)

What bothers me is that I’m forced to use a mock when what I really want is a stub. What’s the difference? A stub is a test double that provides canned responses to calls. A mock is a test double that can verify what calls are made.

Look at the implementation of get_next_page. To test this, all I really need is a canned response to repo.get_page(2). But with unittest.mock, I can only give a canned response for any call to repo.get_page. That’s why I need the last line of my test to verify that I called the method with a 2. It’s that last line that bothers me.
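
To make that concrete, here is roughly what the canned response looks like with unittest.mock (a quick illustration I’m adding, not from the original test):

from unittest.mock import Mock

repo = Mock()
repo.get_page.return_value = 'page two'
repo.get_page(2)    # 'page two'
repo.get_page(999)  # also 'page two': the canned response ignores the arguments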

If I’m writing tests that explicitly assert that specific calls were made, I prefer those to be verifying commands, not queries. For example, imagine I have some code that looks like this:

# ...
article.publish()
# ...

with tests like this:

def test_it_publishes_the_article(self):
    article.publish.assert_called_once_with()

Now the assertion in my test feels right. I’m telling the article to publish, so my test verifies that I sent the publish message to the article. My tests are verifying that I sent a command: that I triggered some behavior that’s implemented elsewhere. Feels good. But wait…

Problem 2: Public API conflicts

Here’s the other problem. Imagine I had a typo in my test:

def test_it_publishes_the_article(self):
    article.publish.assertt_called_once_with()

Notice the extra “t” in “assert”? I hope so, because this test will pass even if article.publish is never called: every attribute you access on a unittest.mock.Mock instance (and every method you call on it) just returns another Mock instance.
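
That auto-creating behavior is easy to see for yourself (a quick illustration I’m adding, not from the original post):

from unittest.mock import Mock

article = Mock()
article.publish()               # returns a child Mock
article.publsh()                # misspelled, but no error: also just returns a child Mock
article.any.attribute.you.like  # attributes are created on first access, arbitrarily deep

(Recent versions of unittest.mock do reject attribute names that start with “assert”, but a typo anywhere else still slips through silently.)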

The problem here is that python’s mocks have their own public API, but they are supposed to be stand-ins for other objects that have public APIs of their own. This causes conflicts. Have you ever tried to mock an object that has a name attribute? Then you’ve felt this pain (passing name as a Mock kwarg doesn’t stub a name attribute like you think it would; instead it names the mock).
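
If you haven’t hit the name collision before, it looks like this (another quick illustration):

from unittest.mock import Mock

m = Mock(name='user')
m.name             # a child Mock, not 'user': the kwarg names the mock itself

m = Mock()
m.name = 'user'    # the usual workaround: set the attribute after construction
m.name             # 'user'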

Doesn’t autospec fix this problem?

autospec is an annoying bandage over this problem. It doesn’t fit into my normal TDD flow where I use the tests to tease out a collaborator’s public API before actually writing it.
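
For comparison, here is roughly how autospec catches typos, and why it needs the real class up front (a sketch I’m adding; Article here is a stand-in class):

from unittest.mock import create_autospec

class Article:                    # the class being spec'd has to exist already
    def publish(self):
        ...

article = create_autospec(Article, instance=True)
article.publish()
article.publish.assert_called_once_with()  # passes
article.publsh()                           # AttributeError: not part of the spec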

Solution: tdubs

I decided to write my own test double library to fix these problems, and I am very happy with the result. I called it tdubs. See the README for installation and usage instructions. In this post I’m only going to explain the parts that solve the problems I described above.

In tdubs, stubs and mocks are explicit. If you want to give canned responses to queries, use a Stub. If you want to verify commands, use a Mock. (you want to do both? rethink your design [though it’s technically possible with a Mock])

A Stub can provide responses that are specific to the arguments passed in. This lets you create true stubs. In the example above, using tdubs I could have stubbed my repo like this:

repo = Stub('repo')
calling(repo.get_page).passing(2).returns(next_page)

and I would not need to verify my call to repo.get_page, because I would only get my expected next page object if I pass 2 to the method.
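
Put together, the earlier test could read something like this with tdubs (a sketch built only from the calls shown in this post; the exact import line is my assumption, so check the README for real usage):

from tdubs import Stub, calling

def test_it_gets_next_page_from_repo(self):
    next_page = object()
    repo = Stub('repo')
    calling(repo.get_page).passing(2).returns(next_page)

    self.assertEqual(get_next_page(repo, current_page=1), next_page)
    # no assert_called_with needed: only get_page(2) returns next_page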

With tdubs, there’s no chance of false positives due to typos or API conflicts, because tdubs doubles have no public attributes. For example, you don’t verify commands directly on a tdubs Mock; you use a verification object:

verify(my_mock).called_with(123)

After hammering out the initial implementation to solve these specific problems, I ended up really liking the way my tests read and the type of TDD flow that tdubs enabled. I’ve been using it for my own projects since then and I think it’s ready to be used by others. So if you’re interested, visit the README and try it out. I’d love some feedback.

Python’s patch decorator is a code smell

I’m a big fan of using mocks as a testing/design tool. But if I find myself reaching for patch instead of Mock in python, I usually stop and rethink my design.

I consider the use of patch in tests to be a code smell. It means the test code is not using my internal API. It’s reaching into the private implementation details of my object.

For example, I recently needed a helper function for creating users on a third-party service with a set of default values. I could have written it like this:

from services import UserService

from settings import SERVICE_CONF


def create_user_with_defaults(**attributes):
  defaults = { "name": "test" }
  defaults.update(attributes)

  service = UserService(**SERVICE_CONF)
  return service.create_user(**defaults)

This would get the job done. And because this is python, I can test it without hitting real services using @patch:

@patch("users.helpers.UserService")
def test_creates_user_with_defaults_on_user_service(self, MockUserService):
  user_service = MockUserService.return_value
  
  # execution:
  user = create_user_with_defaults()
  
  # verification:
  user_service.create_user.assert_called_once_with(name="test")
  self.assertEqual(user, user_service.create_user.return_value)

But look at the verification step: there is nothing in the execution step about user_service, yet that’s what I’m asserting against. My tests have knowledge about private implementation details of the thing they’re testing. That’s bad news.

I prefer my tests to be normal consumers of my internal APIs. This forces me to keep my APIs easy to use and flexible. @patch lets me get around issues like tight coupling by hijacking my hard-coded dependencies.

Here is how I actually implemented the helper function:

def create_user_with_defaults(service, **attributes):
  defaults = { "name": "test" }
  defaults.update(attributes)
  return service.create_user(**defaults)

I didn’t even need to import anything! This is how I would test it:

def test_creates_user_with_defaults_on_user_service(self):
  user_service = Mock()
  
  # execution:
  user = create_user_with_defaults(user_service)
  
  # verification:
  user_service.create_user.assert_called_once_with(name="test")
  self.assertEqual(user, user_service.create_user.return_value)

Now compare the verification to the execution. Instead of patching the internal workings of the module, I’m explicitly passing in a mock object. I can do this because the function no longer depends on the concrete implementation of the user service; it depends on an abstraction*: some object, passed in by the caller, that conforms to a certain interface. So it makes sense that my test verifies the interaction with that interface.

This means my test is now a normal consumer of my function, and my desire to avoid patch led me to a design that is more flexible. This became clear as soon as I wanted to create some test users in the REPL: I happily created an instance of the UserService that uses the settings for our sandbox, and passed that in to my function.
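
In REPL terms, that amounted to something like this (SANDBOX_CONF is a hypothetical name for those sandbox settings; the module paths come from the earlier patch example):

>>> from settings import SANDBOX_CONF
>>> from services import UserService
>>> from users.helpers import create_user_with_defaults
>>> service = UserService(**SANDBOX_CONF)
>>> user = create_user_with_defaults(service, name='repl test user')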

*See The Dependency Inversion Principle (the D from SOLID).