Doctestability

Some languages let you use inline documentation to write example code that can be used as unit tests. In Python, these are called doctests. They look something like this:

def adder(a, b):
    """Adds two numbers for example purposes.

    >>> adder(1, 2)
    3

    >>> adder(5, 2)
    7

    """
    return a + b
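
If you haven't run doctests before: the standard library's doctest module finds every >>> example in a module's docstrings, runs it, and compares the actual output against the expected output written below it. Here is a minimal way to wire that up (the module guard is just one common option; running python -m doctest yourmodule.py from the command line works too):

if __name__ == '__main__':
    # Run every doctest in this module and report any failures.
    import doctest
    doctest.testmod()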

I’m becoming a big fan of this feature, because I’ve noticed that the ability to effectively doctest something is usually an indicator of good design.

What is an “effective doctest”? I mean a doctest that:

  • Is easy to understand
  • Is focused: doesn’t require a lot of setup
  • Is safe: no side effects
  • Communicates: it’s documentation first and a test second

These are also things you can say about code that is well designed: it’s easy to understand, focused, safe, and communicates intent.

A black-box, purely functional object meets all of these criteria. You pass some data in, you get some data out. Passing the same data in always gives you the same data out. This is the perfect candidate for a doctest, so let your desire to doctest force you to write more functions like this.

But what about situations where you must have side effects?

Recently I needed an object to route background tasks. For example, when background task A was finished, it should fire off task B and C in parallel, and when B was finished, it should fire off D. Upon task completion, the task router should be triggered again with a message saying the task was completed so we can fire off the next task(s).

We were going to do this in Python using celery. An implementation could have looked like this:

from myproj.celery import app, tasks

@app.task
def router(data, task, message):
    """Route background tasks.

    When task A is complete, run B and C.
    When task B is complete, run D.
    Start the chain by calling:

        router('data', 'task_a', 'start')

    """
    if task == 'task_a':
        if message == 'start':
            (tasks.task_a.s(data) | router.s('task_a', 'complete')).delay()
        if message == 'complete':
            (tasks.task_b.s(data) | router.s('task_b', 'complete')).delay()
            (tasks.task_c.s(data) | router.s('task_c', 'complete')).delay()
    elif task == 'task_b':
        if message == 'complete':
            (tasks.task_d.s(data) | router.s('task_d', 'complete')).delay()
    else:
        # all done
        return data

Let’s look past the nested conditionals I used to keep the example compact and see what else is wrong with this function: My business logic – what tasks get triggered when – is tightly coupled to a third-party implementation: celery.

@app.task, .s(), .delay(), and chaining signatures with a pipe are all celery-specific. This doesn’t seem too bad now, but this logic is likely to grow more complex, making the coupling even tighter, cluttering the logic, and making it even harder to test. And what happens when we outgrow our celery implementation and want to move to something like Amazon Simple Workflow Service?

Instead, since I approached this code with a desire to doctest, it ended up looking more like this:

class Router:
    """Route tasks.

    When task A is complete, run B and C.
    When task B is complete, run D.

    Init with a task runner: a callable that accepts the name of a
    task, some data, and a callback (which will be this router's
    route method). When a task finishes, the runner should call the
    callback with the task name, the task's result data, and a
    'complete' message.

    Example Usage:

    >>> def fake_runner(task, data, callback):
    ...     print('Running %s with %s' % (task, repr(data)))
    ...     callback(task, '%s results' % task, 'complete')
    ...
    >>> router = Router(fake_runner)
    >>> router.route('task_a', 'data', 'start')
    Running task_a with 'data'
    Running task_b with 'task_a results'
    Running task_d with 'task_b results'
    Running task_c with 'task_a results'

    (Because the fake runner is synchronous, task_d runs as soon as
    task_b completes, before task_c gets its turn.)

    """
    def __init__(self, runner):
        self.runner = runner

    def route(self, task, data, message):
        if task == 'task_a':
            if message == 'start':
                self.runner('task_a', data, callback=self.route)
            if message == 'complete':
                self.runner('task_b', data, callback=self.route)
                self.runner('task_c', data, callback=self.route)
        elif task == 'task_b':
            if message == 'complete':
                self.runner('task_d', data, callback=self.route)
        else:
            # all done
            return data

To make it doctestable, I introduced a seam between my business logic and celery: a task runner (I’ll leave the celery runner implementation to your imagination). And that seam was simple enough that I could include a fake implementation right in the doctest without hurting its readability. In fact, it improves the communication by documenting how to implement the seam’s interface.
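
If you’re curious what a celery-backed runner could look like, here is a deliberately naive sketch of one possibility (illustrative only, not the implementation we used; blocking on .get() inside real celery code is generally discouraged):

from myproj.celery import tasks

def celery_runner(task, data, callback):
    # Look up the celery task by name, run it, wait for the result,
    # then tell the router the task is complete.
    # A production runner would report completion asynchronously
    # (e.g. via a chained callback task) rather than block here.
    result = getattr(tasks, task).delay(data).get()
    callback(task, result, 'complete')

Router(celery_runner) would then drive the real tasks exactly the way Router(fake_runner) drives the fakes in the doctest above.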

So the documentation is better, but is the code better?

My celery usage (the mechanics of running background tasks) and my business logic (what tasks run when) are now decoupled. Since they need to change for different reasons, my code now follows the Single Responsibility Principle. That’s a good sign that this is a better design. I can expand the logic without celery details increasing the complexity, and I can move to a new third-party task runner by writing a new implementation of the runner interface without touching my business logic at all.

Notice my router no longer depends on celery. In fact, I no longer need to import anything. Instead, it depends on an interface (the runner). So it’s also following the Dependency Inversion Principle. As a side effect, I can now unit test this by injecting a mock runner and making assertions on its calls. These are also good signs that it’s a better design.
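
For example, a unit test for the task_a completion branch might look something like this sketch (using unittest.mock; the test name and data are made up):

from unittest.mock import Mock

def test_completing_task_a_triggers_b_and_c():
    runner = Mock()
    router = Router(runner)

    router.route('task_a', 'task_a results', 'complete')

    # The router should have handed task B and task C to the runner,
    # passing its own route method back as the callback.
    runner.assert_any_call('task_b', 'task_a results', callback=router.route)
    runner.assert_any_call('task_c', 'task_a results', callback=router.route)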

But! You may be asking, aren’t these the same benefits you get from normal unit testing?

Yes, but there is one big additional constraint with doctests that you don’t have in unit tests: you don’t want to use a mocking library. It would reduce the effectiveness of the doctest by cluttering it with mock stuff, which reduces its focus and ability to communicate. If I had a mocking library available, I might have decided to just patch celery and tasks. Instead, I was forced to construct a seam with an interface that was simple enough to fake right in the documentation for my object.

I love the ability to mock. But it’s a design tool, and reducing your need for mocks is usually an indicator of good design. So get into the habit of writing doctests as you write your code. You will be happy with where it leads you.
