TDD’s Third Step

In TDD, there are three steps to making a code change:

  1. Write a test.
  2. Make it pass.
  3. Refactor.

I sometimes find myself skipping the third step. Not because I don’t like clean code, but because I combine it with making the test pass. To help explain why I don’t like doing that, and how I avoid it, here’s a more elaborate explanation of the steps as I try to practice them:

  1. Write a test first. Watch it fail.
  2. Fix only that failure the simplest way you can. Repeat until the test passes.
  3. Clean up the mess made in step 2.

The “Make it pass” step is not “write the fully-designed behavior I’m trying to drive with this test”. I want to get rid of that red failure message quickly so I can work in the green where it’s safe. I don’t care (too much) about DRY or SOLID or any of that yet. I’ll worry about that in step three when I have a passing test to act as a safety net. Right now I’m in the danger zone. I need to get out by any means necessary.

Is it failing because the get_price function I’m testing doesn’t exist yet? I write the function definition and leave it empty. Is it failing because the return value doesn’t equal 9.99? I return a literal 9.99 from the function. I’m green! I can move to step three and clean up my mess now that my safety net is in place.
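In code, that fake-it move might look like this (get_price and the 9.99 value come from the example above; the product name is invented for illustration):

```python
# Step 1: a failing test. It fails first because get_price doesn't exist,
# then because the return value doesn't equal 9.99.
def test_get_price():
    assert get_price("widget") == 9.99  # "widget" is a made-up product name

# Step 2: the simplest change that removes the failure -- a hard-coded value.
def get_price(product_name):
    return 9.99
```

Green, with a mess waiting to be cleaned up in step three.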

Kent Beck calls this step “remove duplication”. I have a magic value of 9.99 in both my test code and my implementation. With TDD, I want my tests to become more specific as my implementation becomes more general, so I’ll remove the duplication by changing my implementation from the hard-coded solution to a general abstract solution. Maybe I can get 9.99 by returning a variable, an attribute on an object, or maybe the refactoring needed is bigger than that, or maybe I can’t see what refactoring is needed yet.

In the case of a refactoring that I can’t see, I might decide to live with the duplication for now, and write another test to drive it forward, asserting some other input’s price is 5.99. I’ll repeat this until a refactoring becomes clear.
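That triangulation step might look like this, assuming a second made-up product whose 5.99 price forces the hard-coded implementation to generalize:

```python
# A second test with a different expected price drives out a lookup.
PRICES = {"widget": 9.99, "gadget": 5.99}  # product names are invented

def get_price(product_name):
    # No longer hard-coded: one general rule covers both tests.
    return PRICES[product_name]

def test_widget_price():
    assert get_price("widget") == 9.99

def test_gadget_price():
    assert get_price("gadget") == 5.99
```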

If I can see the refactoring, but it will require changing more than a couple lines, I don’t try to do it all at once. Instead, I work backwards, starting with a call to a final implementation that I wish I had. Maybe returning product.price. This puts me back into the red because that doesn’t exist yet. Now I drop back into a TDD loop (the process is recursive): I have a failing test, so fix it by any means necessary. Maybe I just move that magic value to the price attribute. Now clean that up. Repeat until I’m satisfied with the implementation.

In all cases, I’m never taking too large of a step. Ideally, I’m one or two undos away from a working state. J. B. Rainsberger calls these micro steps. I can’t always do this, but it’s the goal I strive for because it’s when I’m most comfortable. I’m terrible at keeping more than one thing in my head at a time. I hate when an interruption derails an hour of work because there was too much up there and it all fell out. When every change I’m making feels boring, I’m happy. That doesn’t mean I never step back and consider the big picture – you’re not doing your overall design any favors by only focusing on the micro scale – it means I don’t have to do that while I’m also trying to get code working. Remembering that refactoring is its own separate step is the best way I’ve found to enable that.

The refactoring step isn’t important to me simply for the sake of clean code, but because its existence means I don’t have to do too much at once. I don’t have to worry about clean code while I’m trying to get something working, and I don’t have to worry about getting something working while I’m trying to clean the code.

To DRY or not to DRY

Suppose you’re writing some code for a zoo, and one of your business rules requires you to write something like this:

if animal == 'dog' or animal == 'wolf':
    canine = True

You do some stuff with animal and then later end up writing this:

if another_animal == 'dog' or another_animal == 'wolf':
    canine = True

There’s some duplication here. You learned the “Don’t Repeat Yourself” (DRY) principle, so you refactor that code to use an is_canine function. But what triggered this instinct to DRY things up? Well, you noticed some duplication. What was duplicated? Well, the code I guess?
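The extracted function is a minimal sketch like this:

```python
def is_canine(animal):
    # One source of truth for the knowledge of what makes a canine.
    return animal in ('dog', 'wolf')

# Both call sites now delegate to the shared rule:
canine = is_canine('dog')     # True
canine = is_canine('parrot')  # False
```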

Now suppose you’re working on the zoo’s web app and you have a request handler for creating animal records. It accepts a request, validates the input, creates an Animal instance, saves it, and responds with a 201 (if created) or 400 (if invalid).

Later you add another request handler for creating tickets. It accepts a request, validates the input, creates a Ticket instance, saves it, and responds with a 201 (if created) or 400 (if invalid). The code for this looks almost identical to the animal request handler. This triggers your DRY instinct again, so you make a create_resource function that accepts request data and the resource class.

Now it’s so easy to add new handlers that create other types of resources! Until a new requirement comes in: you must check the schedule before creating a ticket so you don’t overbook the zoo. You decide to accept a pre_create callback in create_resource to handle this.

The next requirement is that you must create a FeedingSchedule for each Animal that is created. No problem, a post_create callback can handle that.

But then that has to communicate with the kitchen’s API, which is slow, requiring the animal request to run asynchronously. Maybe an async=True flag? Oh, and tickets shouldn’t actually return a 400 status if the input is bad, they should create an in-progress ticket, and…

Your create_resource function now has a pile of parameters, flags, and callbacks. It’s hard to use and even harder to change. You look back to figure out how this happened: You DRYed up some duplicated code, but then the requirements kept changing and your beautiful and simple abstraction got gross.
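By the end of that story, the signature might have grown into something like this (parameter names are invented to match the narrative):

```python
def create_resource(request_data, resource_class,
                    pre_create=None, post_create=None,
                    run_async=False, allow_invalid=False):
    # Every new requirement added a flag or callback, and every caller
    # now has to understand all of them to safely use any of them.
    ...
```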

How can you avoid this? The key is to know when to DRY your code. It’s not about duplicated code, it’s about duplicated knowledge. For example, when you wrote is_canine, you created one source of truth for the knowledge of what makes something a canine, which seems reasonable. But when you created create_resource, you unintentionally created one source of truth for the knowledge of how every resource gets created, and not all resources are created the same way.

You will have much greater success with the DRY principle if you stop thinking in terms of reuse, and start thinking in terms of change. Abstractions couple logic, and you should only couple things that change for the same reason. The logic for creating an animal and creating a ticket changed for different reasons, and those changes were painful because their logic was coupled.

Remember, when deciding if you should apply DRY, it’s not about saving future keystrokes, it’s about keeping one source of truth. It’s not about reuse, it’s about change (like so much in programming).

It’s not about avoiding duplicated code, it’s about avoiding duplicated knowledge.

Mocks and Dependency Injection

This is the third and final part of a series about mocking and TDD. In part 2, I created a github boundary object. To use it, I relied on dependency injection that looked like this: User.get('username', github)

This isn’t great. Every time I need a user I would have to take two steps: create or import the github boundary, and then get the user. That’s not only annoying, but could result in Github() calls being sprinkled throughout my code, making it very hard to change the way I initialize it (I can easily picture needing to pass in some config options down the road).

One way to fix this would be to make the github parameter optional. It would look something like this:

# in User model:
@classmethod
def get(cls, username, github=None):
    github = github or Github()
    user_data = github.get_user(username)
    return cls(**user_data)

That would let me instantiate users wherever I want using just the username, and still let me inject a mocked github boundary in my tests. But it would reek of code that only exists to make something easier to test. That’s a smell (a signal that a design may be bad) that is very common when testing with mocks. Your tests are more effective when they exercise code the same way it’s used in production, without special hooks or hacks to poke around inside.

Making the parameter optional also risks accidental integration. Have you ever found that your unit test suite fails when the network is down, even though you thought you were being so careful? Allowing implicit communication with boundaries can lead to pain like that.

Whenever I reach the point in my test where I want to inject a mocked boundary, I always leave it as a required parameter when implementing the production code. I do this because boundaries are volatile and out of my control, so I want all interaction with them to be explicit.

Since I want the github parameter to be required, but I don’t want to pass it in every time I need a user, I decide that I need a new thing that knows about my boundary and can pass it in for me, and when I need a user from github, I can call that thing. One way to do that is with a full-blown inversion of control (IoC) container. It’s an object that knows how to build your objects with all their dependencies/wiring. Those have their own set of downsides and I try to avoid getting to the point where I need something that heavy.

Instead, I’ll add a new class method right on the User model that looks like this: User.from_github('username'). Then I can get users with a single call, and my interaction with github will still be explicit. There’s no risk of accidental integration: I’ll either be passing in a boundary, or calling a method that mentions it.

How do I implement this? First, I realize that I’m in the “refactor” phase of my red/green/refactor TDD cycle. Since I’m refactoring, I want to change the structure of my code, without changing its behavior. So my goal is to clean this up without breaking (and ideally without changing) any of my tests that exercise that behavior. My view currently looks like this:

# somewhere in my view:
github = Github()
user = User.get(request['username'], github)

It’s instantiating the github boundary and then using it to get a user matching a username. This is pretty much the exact behavior I want in my from_github method, so my plan is to do an extract method refactoring.

I start by literally copying the lines to a class method on my User model:

# in my User model:
@classmethod
def from_github(cls):
    github = Github()
    user = User.get(request['username'], github)

I can’t rely on a global request object, so I change it to a parameter:

# in my User model:
@classmethod
def from_github(cls, username):
    github = Github()
    user = User.get(username, github)
    return user

and update my view:

# somewhere in my view:
user = User.from_github(request['username'])

Still green! Successful refactoring.

While in my user model I notice something:

# in User:
@classmethod
def get(cls, username, github):
    user_data = github.get_user(username)
    return cls(**user_data)

Now that I have a method that explicitly mentions github, it feels weird that this method – which is not specific to github – has a parameter called github. I change it to be generic:

# in User:
@classmethod
def get(cls, username, repo):
    user_data = repo.get_user(username)
    return cls(**user_data)

Much better. And now the door is open to getting users from other APIs. For example, you can probably imagine what a from_twitter method might look like.
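Here’s one way that could look, with a hypothetical Twitter boundary standing in (the Twitter class, its field names, and its faked data are all invented for illustration):

```python
class Twitter:
    """Hypothetical boundary object with the same get_user interface."""
    def get_user(self, username):
        # A real implementation would call the Twitter API; faked here.
        return {'username': username, 'followers': 69}

class User:
    def __init__(self, **attributes):
        self.__dict__.update(attributes)

    @classmethod
    def get(cls, username, repo):
        user_data = repo.get_user(username)
        return cls(**user_data)

    @classmethod
    def from_twitter(cls, username):
        # Same shape as from_github: wire up the boundary, delegate to get.
        return cls.get(username, Twitter())
```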

Notice that I never explicitly wrote a test for User.from_github. There are a few reasons: In my refactoring step, I rarely write new tests, since I don’t want to change behavior. And this method is actually another boundary, which I don’t unit test, and is already covered by the system test that hits my view.

In the end, I now have several ways to create users:

  • Instantiate with data from anywhere: User(**attributes)
  • Instantiate with data from a boundary that conforms to my expected interface: User.get(username, boundary)
  • Instantiate with data from a specific, named boundary: User.from_github(username)

I’m happy with this design. These are all clear, well defined factory methods (methods for creating objects) each with a specific purpose. And it turns out this is the pattern I usually end up with when dealing with models and boundaries. More explicitly, the pattern looks like this:

  • Ignore the boundary at first and write an init method that accepts pure data.
  • Drill down until I need to test that data is coming from a boundary, design the boundary using a mock, and inject it via a required parameter on a new method (see part 2).
  • Clean things up by adding a new method that can handle the boundary wiring for me.

I’ve found that using mocks, dependency injection, and factory methods in this way has made my code easier to maintain. The methods are small, all interaction is clear, and refactoring is safe and fun.

Mocks and External Dependencies

This is Part 2 in a series about mocking. In part 1, I said it’s best to use mocks as a design tool, and not as a convenience tool for tests that touch external dependencies. But what does it look like when you do have an external dependency, like a third-party library? Do you wrap it in your own code and mock the wrapper? I don’t think so. That puts the emphasis on the dependency, and I want dependencies to be details. Instead, I think in terms of boundaries. I let my tests help me decide where those boundaries are, and then by mocking them, figure out what they should look like. Then I may implement that boundary using an external dependency, which I do not mock in the tests. I’ll show what I mean with an example.

My imaginary example app will tell you if a particular github user is famous or not. My business logic determines famousness based on the number of followers. If they have 100 or more, they are famous.

Now pretend that I’ve drilled down to the point where I need a User model with an is_famous() method.

I don’t start by looking for (or writing) a github API library. I don’t like to interact with an API in my code until I’m absolutely forced to. I do take a look at the API to get an idea of where I’m headed before starting, but when I do, I’m careful not to let that influence my design in a way that would couple it tightly to the external API.

So I have some idea of what the github data looks like, but since I don’t want to hit the API yet, I start by assuming I can init my user with data from anywhere. This lets me test my logic without worrying about anything external. My tests look something like this:

def test_user_is_famous_if_more_than_100_followers(self):
    user = User(followers=101)
    self.assertTrue(user.is_famous())

def test_user_is_famous_if_100_followers(self):
    user = User(followers=100)
    self.assertTrue(user.is_famous())

def test_user_is_not_famous_if_less_than_100_followers(self):
    user = User(followers=99)
    self.assertFalse(user.is_famous())

It’s easy enough to make these pass, and since I’m not concerned with the github api yet, the tests are very easy to read and understand. No mocking noise!
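An implementation that passes those tests can stay tiny. Something like this (the constructor details are assumed, not from the original):

```python
class User:
    FAME_THRESHOLD = 100  # followers needed to be famous, per the rule above

    def __init__(self, followers=0):
        self.followers = followers

    def is_famous(self):
        # 100 or more followers means famous.
        return self.followers >= self.FAME_THRESHOLD
```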

But in the real world, I won’t be hard-coding the data passed to User's constructor. So I want a new method that can initialize a user with data from the service where that data lives. I take a moment to think about what that might look like:

# in my imagination (or maybe a scratch buffer...)
def get(cls, username):
    # get user_data from github...
    return cls(**user_data)

I’m happy with that. So in a real file, I start with a test:

def test_gets_user_data_from_github(self):
    user = User.get('blaix')
    self.assertEqual(, 420)
    self.assertEqual(user.followers, 69)

How do I make this pass? Time to start looking for a github client library? Not yet. I can defer that decision a bit longer. For now, I only want to do the simplest thing that makes the test pass, so I cheat:

# in User:
@classmethod
def get(cls, username):
    return cls(id=420, followers=69)

I haven’t shown it, but I have a view that is initializing a user, and a system test that exercises that view. So I update my view to call this new method, and check that my system test still passes.

I’m all green. But not every github user has an id and follower count this cool and nice. I need my code to handle the general case. To make my code more general, I need to make my tests more specific. So I need my tests to explicitly verify that I’m getting the data from github. How do I do that?

First, I recognize that I’ve finally reached a boundary: my app code needs data from the outside world – in this case, the github API. At boundaries like this, I want an explicit object (a function, method, instance, or class) with a single purpose: handle that external communication.

I keep my boundaries in explicit objects to protect me from things that are volatile. The github API, or even the library I’d use to access the API, could change for reasons completely independent from my business logic. When I keep my interaction with it isolated, I can respond to those changes safer and faster, since the interaction won’t be scattered around and mixed with my app code. As a side-effect, it also provides a nice injection point to stick a test double that will help me move forward here, as well as protect my unit tests from unreliable and non-deterministic network calls.

Now back to that test: how do I assert that those numbers came from github? Since I’ve decided to use a boundary object, I can verify that my new method is using the boundary object to get the data. How do I verify that something I’m testing is interacting with another object correctly? This is exactly the right job for a mock.

This is a pattern I’ve been recognizing in my code lately. At my boundaries, I want the ability to inject a boundary object, and I want my tests to verify the interactions by injecting a test double to stand in for that boundary.

Since I’m using a test double as a stand-in for my boundary object, I get to design it from the point of view of the caller without worrying about the details of the implementation. So I decide that the cleanest way to get a user from my boundary object is to call a get_user method. Here’s my updated test:

def test_gets_user_data_from_github(self):
    github = Stub('github')
    calling(github.get_user).passing('blaix').returns({
        'id': 420,
        'followers': 69,
    })
    calling(github.get_user).passing('notblaix').returns({
        'id': 421,
        'followers': 70,
    })

    user = User.get('blaix', github)
    self.assertEqual(, 420)
    self.assertEqual(user.followers, 69)

    user = User.get('notblaix', github)
    self.assertEqual(, 421)
    self.assertEqual(user.followers, 70)

Note: I’m using tdubs, which does not have explicit Mock objects, but the way I’m using Stub + calling here provides the same functionality: verify that I’m calling the collaborator correctly (since I’d only get the expected values when calling the method with those parameters).

Notice I had to add a new parameter to inject the github object. That’s yucky, but I don’t want to split my thinking yet. So I make a note to refactor this when I’m green again. First I’ll make this test pass by writing:

# in User:
@classmethod
def get(cls, username, github):
    user_data = github.get_user(username)
    return cls(**user_data)

That makes the unit test pass, but now my system test is failing because I’m not passing the github parameter. So I update my view:

# somewhere in my view:
github = Github() # I know this doesn't exist yet, it's fine.
user = User.get(request['username'], github)

It’s still failing, but for a different reason. Progress! Now it’s failing because Github doesn’t exist. So I create that class but leave it empty. I like to wait for my test failures to move me forward. Now it’s failing because get_user doesn’t exist, so I create that too, leaving it empty as well. Finally I get a failure that isn’t about basic scaffolding: I’m returning None and my code expects a dictionary. That’s going to require adding real logic, and I don’t want to do that without an explicit test for that logic, so for now, I silence this failure by returning a faked dictionary.
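At the end of that sequence, the scaffolded boundary might look like this (a sketch, reusing the faked data from the earlier stub):

```python
class Github:
    """Boundary scaffold: just enough to silence the current failure."""
    def get_user(self, username):
        # Faked dictionary until a test forces the real API call.
        return {'id': 420, 'followers': 69}
```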

Time to force my really for real github logic. As usual, I want to start with a test. Does that mean a unit test? Well, imagine what I’d need to do to unit test Github.get_user (meaning: test it without interacting with the outside world). I’d end up mocking third-party or even standard libraries. I don’t control those interfaces, so I wouldn’t get the full benefit of mocks, but I’d still get all the costs. So to optimize my rewards, I decide to fully integration test this boundary method. I expect to hit the real github API, and assert against my real user id. I’m only asserting against my id and not my follower count because the latter is likely to change, and I’m confident enough in my system tests that my bases are covered there.

def test_get_user_from_github(self):
    user = Github().get_user('blaix')
    self.assertEqual(user['id'], 664)

This is another pattern in my code: I always integration test my boundary objects. By doing this, I get two benefits: the tests are simple and provide high confidence, and since I don’t want to write a lot of tests like this, there is pressure to keep minimal logic in my boundary objects, which makes them easier to understand and maintain – something that’s very important for code that bumps up against things that are unreliable and could change without your control.

Time to make it pass by filling my empty method with real guts:

import requests

class Github(object):
    def get_user(self, username):
        url ='{}'.format(username)
        return requests.get(url).json()

I decided I didn’t need a full github client library. The simplest way to make my tests pass was to use the requests package (a ubiquitous package in python land for making HTTP requests).

But even though I’m using a third-party package, at no point did I need to patch an import. I’m not wrapping a third-party package to have something to mock in my tests, I used a mock to design an interface and then implemented that interface with a third-party package. I’m now free to swap out that package if I need to as the requirements grow more complex, and as long as I keep returning github user data from Github.get_user, I won’t have to change any of my other production code, or any of my tests. Imagine that: a complete refactoring of the internals of a class, with a test suite that acts only as a safety net and not handcuffs. Tests (with mocks!) that make refactoring third-party integrations easier, not harder. It’s possible when you follow these guidelines:

  • Work from the outside in. I started with a system test (not shown in the article), and that provided the safety net to start and keep the ball rolling. Then I worked my way in, one layer at a time, designing the code I wanted to have at the next layer down as I wrote my tests.
  • Defer decisions on third-party integrations as long as possible. It would have been tempting to start by using a third-party github library right in my view, but instead, I worked in layers, drilling down until I absolutely needed a single object with the sole purpose of communicating with github.
  • Prefer injectable boundary objects. When I reached the point where I wanted a test to assert that certain data came from github, I did that by injecting a test double, and this made it very easy to design the API of an explicit object to communicate with github.
  • Only integration test boundary objects. When I reached my boundary object, it was something that needed to communicate with the outside world. I could have tested it in isolation by mocking a third-party dependency, but that would leave me tightly coupled to an API I don’t have control over. So I fully integration test it, which puts pressure on me to keep my boundary object thin and free of logic, which is a good design for an object that interacts with volatile things like third-party dependencies and external HTTP APIs.

But wait! Remember this?

user = User.get('blaix', github)

This is gross. Passing an instance of my boundary object every time I need a user is going to be annoying. I punted on that earlier, but now that I’ve implemented everything and my tests are green, I’m free to refactor. This will require some discussion about mocks and dependency injection, and will be the subject of part 3 in this series.

Mocks as a Design Tool

Many people see mocks as a necessary evil to isolate their test code from third-party dependencies and the outside world (the database, network, filesystem, etc.). But in the paper “Mock Roles, Not Objects”, some of the first people to describe mocks present them as a tool used in TDD to discover good interactions between your objects (i.e. design good types). They are much more powerful, and their costs are more reasonable, when they are used as a design tool, and not just a convenience tool for isolating your tests.

Note: “Mock” is a loaded word often used to describe any type of test double, but this article will be speaking about mocks in the strict sense. If you don’t know what that means, first read The Little Mocker by Uncle Bob. It’s the best explanation I’ve seen of the different types of test doubles. Further note: all of this will also apply to some implementations of spies.

To understand how mock objects can be used as a design tool, it helps to think about object-oriented programming as being all about messaging. In OOP, we don’t just have procedures that we can call, we have objects that we can ask questions or give commands to. Those questions and commands are messages that we send to the object. When you write, try thinking of it as telling the my_model object to save itself.

So if OOP is about messages, what are mock objects used for? Verifying messages! You should use a mock when you are testing something that interacts with another object, and you want to verify that you have told that other object to do something – i.e. assert that you sent it a particular message. And when you are writing your test first, you literally get to make up what that message looks like.
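For example, with Python’s standard unittest.mock (the sign_up function and the notifier’s send_welcome message are invented for illustration):

```python
from unittest.mock import Mock

# sign_up tells its notifier collaborator to send a welcome message.
# Because we write the test first, we get to design send_welcome here,
# from the caller's perspective.
def sign_up(email, notifier):
    notifier.send_welcome(email)

def test_sign_up_sends_welcome():
    notifier = Mock()
    sign_up('', notifier)
    # Verify we sent the message, with the right argument.
    notifier.send_welcome.assert_called_once_with('')
```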

This is how mocks are used as a design tool in TDD. You work outside-in: start at a high level, and delegate details to lower levels. Mock those lower levels because right now you only care about telling them what to do. You’ll worry about how they do it later, when you’re ready to test that level.

In other words, you design your messages from the perspective of the message sender, the perspective that cares most about what you want that object to do, and least about how that object does it. This leads to messages that are simple and communicate well. And that leads to an object API that is simple and communicates well.

When done right, it feels like cheating. Your high-level tests almost feel like they aren’t testing anything. That’s good. These high level tests aren’t about verifying algorithms or reducing bugs. They are about designing your messages. It’s part of a TDD process to design code that is easy to understand and maintain. This high level code is easy to test because it’s easy to understand. It also helps lead to low-level code that is easy to test and understand because you’ve shaken out all the object collaboration in the higher levels, leaving simple procedures that can be tested without mocks.

But you only get these design benefits if you own the API of the object you’re mocking. You may have heard that you should not mock what you don’t own. Some libraries even strictly enforce this rule. But what does that mean? Why is it important?

When you “mock something you don’t own”, like a third-party dependency or something in stdlib, you can’t let your tests help you decide what the messages should be, because those choices have already been made. So if you only use mocks in this way, you are only getting what should be a side-effect of mocking, with none of the design benefits. And that leads to pain, because mocks have high costs. They give you plenty of rope to hang yourself with: increased coupling between test and implementation, potential for “false positives”, and increased setup costs. Many people don’t like mocks for these reasons, and if you aren’t using them primarily to design messages, I agree, they aren’t worth it.

So how do you mitigate those costs? What exactly should you do when you have an external dependency? What does this all look like in practice? I’m still writing about those topics and more, and planning to release it as a series about mocking and TDD. Part 2 covers mocks and external dependencies. If you’d like to be emailed when it is complete, subscribe to my newsletter. In the meantime, try using mocks to design the interactions between your objects. Used in this way, they can become a powerful part of your TDD tool belt.

What am I missing about Progressive Enhancement?

I remember when I first read about progressive enhancement. It was back when you couldn’t depend on everyone having javascript. So the case for building a functional site without javascript first, and then enhancing the experience for those with javascript made sense.

But now javascript is ubiquitous. Even screenreaders have it. That argument holds much less weight. Is there still a good case for progressive enhancement?

I suspect there is, because “we don’t need progressive enhancement because everyone has javascript” feels a lot like “we don’t need TDD because there are better ways to catch bugs”, an argument that I disagree with the very foundation of.

I remember when I first read about TDD. I was exposed to it as a way to prevent bugs. The things I was building were small, and I hadn’t experienced many maintenance headaches yet. So the case for writing some tests to make sure your “code was working” made sense.

But then it became cumbersome, and I still had bugs. The time it took to write and maintain the tests didn’t seem worth the effort. Is there still a good case for TDD?

For me, yes, there is. I soon learned to use TDD as a design tool first, and a safety net second. This made the effort worth it again. Is there a similar benefit to Progressive Enhancement? Something that provides value besides “it works for people without javascript” that makes it still worth the effort?

I think there’s something there that I’m missing. So I’m asking you. If you have thoughts, please comment.

A decoupling conversation

A while ago I was asked this question when discussing our project’s architecture. I want to share my answer publicly, because it’s a subject that I encounter often.

I assume the idea is to decouple as much as possible, but understanding that as soon as you pick a piece of technology, you are coupled to it.

With any decoupling there are trade-offs. With any trade-off you have to decide if the benefit is worth the cost. In some cases it may be worth it to isolate the coupling. For example, say we choose a third-party library, MegaPayments2, to handle payments. We are essentially coupled to this technology choice. But it wouldn’t cost us much to isolate that dependency. Instead of sprinkling calls to MegaPayments2 all over our project, we can create a wrapper class, Payments, which delegates to MegaPayments2 internally. Now Payments is sprinkled throughout our system instead, but the cost of change is much lower. Maybe MegaPayments2 becomes obsolete and we want to change to UltraPayments3. We only have to do that in one place.
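A sketch of that wrapper, with a stand-in for the third-party library (both interfaces are invented; the real MegaPayments2 API is unknown here):

```python
class MegaPayments2:
    """Stand-in for the third-party library."""
    def submit_payment(self, amount_cents):
        return {'status': 'ok', 'amount': amount_cents}

class Payments:
    """Our wrapper: the only place in the system that names MegaPayments2."""
    def __init__(self):
        self._provider = MegaPayments2()

    def charge(self, amount_cents):
        # Swapping to UltraPayments3 later means changing only this class.
        result = self._provider.submit_payment(amount_cents)
        return result['status'] == 'ok'
```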

You may hear decoupling discussed as something that’s always good no matter what. But consider the trade-offs. When deciding if you should decouple, ask yourself these questions:

  • How likely is it that this will change?
  • How easy would it be to introduce a boundary to keep this isolated?
  • Do we get any other benefits by introducing a boundary that we own? (e.g. a nicer API?)

With the above questions in mind, you can then ask yourself:

Should we decouple from python? No.

Should we decouple from our specific version of python? It depends…

Should we decouple from this third-party library? Probably yes.

Decoupling is a tool, like any other. Use it wisely.


Doctests as a design tool

Some languages let you use inline documentation to write example code that can be used as unit tests. In Python, these are called doctests. They look something like this:

def adder(a, b):
    """Adds two numbers for example purposes.

    >>> adder(1, 2)
    3

    >>> adder(5, 2)
    7
    """
    return a + b
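For completeness, the standard library’s doctest module will find and run these examples for you; a self-checking module can end with a testmod() call (or you can run `python -m doctest yourmodule.py` from the shell):

```python
import doctest


def adder(a, b):
    """Adds two numbers for example purposes.

    >>> adder(1, 2)
    3
    """
    return a + b


if __name__ == '__main__':
    # Runs every doctest found in this module; prints nothing when all pass.
    doctest.testmod()
```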

I’m becoming a big fan of this feature, because I’ve noticed that the ability to effectively doctest something is usually an indicator of good design.

What is an “effective doctest”? I mean a doctest that:

  • Is easy to understand
  • Is focused: doesn’t require a lot of setup
  • Is safe: no side effects
  • Communicates: It’s documentation first and a test second

These are also things you can say about code that is well designed: it’s easy to understand, focused, safe, and communicates intent.

A black-box, purely functional object meets all of these criteria. You pass some data in, you get some data out. Passing the same data in always gives you the same data out. This is the perfect candidate for a doctest, so let your desire to doctest force you to write more functions like this.

But what about situations where you must have side effects?

Recently I needed an object to route background tasks. For example, when background task A was finished, it should fire off task B and C in parallel, and when B was finished, it should fire off D. Upon task completion, the task router should be triggered again with a message saying the task was completed so we can fire off the next task(s).

We were going to do this in python using celery. An implementation could have looked like this:

from myproj.celery import app, tasks

@app.task
def router(data, task, message):
    """Route background tasks.

    When task A is complete, run B and C.
    When task B is complete, run D.
    Start the chain by calling:

        router('data', 'task_a', 'start')
    """
    if task == 'task_a':
        if message == 'start':
            tasks.task_a.delay(data) | router('task_a', 'complete')
        if message == 'complete':
            tasks.task_b.delay(data) | router('task_b', 'complete')
            tasks.task_c.delay(data) | router('task_c', 'complete')
    elif task == 'task_b':
        if message == 'complete':
            tasks.task_d.delay(data) | router('task_d', 'complete')
        # all done
        return data

Let’s look past the nested conditionals I used to keep the example compact and see what else is wrong with this function: My business logic – what tasks get triggered when – is tightly coupled to a third-party implementation: celery.

@app.task, .delay(), and chaining calls with a pipe are all celery-specific. This doesn’t seem too bad now, but this logic is likely to grow more complex, making the coupling even tighter, cluttering the logic, and making it even harder to test. And what happens when we outgrow our celery implementation and want to move to something like Amazon Simple Workflow Service?

Instead, since I approached this code with a desire to doctest, it ended up looking more like this:

class Router:
    """Route tasks.

    When task A is complete, run B and C.
    When task B is complete, run D.

    Init with a task runner: a callable that accepts the name of a
    task, some data, and a callback (which will be this router's
    route method). The runner should call the callback with the task
    name, the completed task's result data, and a 'complete' message.

    Example Usage (note the fake runner invokes its callback
    synchronously, so D fires as soon as B completes, before C gets
    its turn):

    >>> def fake_runner(task, data, callback):
    ...     print('Running %s with %s' % (task, repr(data)))
    ...     callback(task, '%s results' % task, 'complete')
    >>> router = Router(fake_runner)
    >>> router.route('task_a', 'data', 'start')
    Running task_a with 'data'
    Running task_b with 'task_a results'
    Running task_d with 'task_b results'
    Running task_c with 'task_a results'
    """

    def __init__(self, runner):
        self.runner = runner

    def route(self, task, data, message):
        if task == 'task_a':
            if message == 'start':
                self.runner('task_a', data, callback=self.route)
            if message == 'complete':
                self.runner('task_b', data, callback=self.route)
                self.runner('task_c', data, callback=self.route)
        elif task == 'task_b':
            if message == 'complete':
                self.runner('task_d', data, callback=self.route)
            # all done
            return data

To make it doctestable, I introduced a seam between my business logic and celery: a task runner (I’ll leave the celery runner implementation to your imagination). And that seam was simple enough that I could include a fake implementation right in the doctest without hurting its readability. In fact, it improves the communication by documenting how to implement the seam’s interface.

So the documentation is better, but is the code better?

My celery usage (the mechanics of running background tasks) and my business logic (what tasks run when) are now decoupled. Since they need to change for different reasons, my code now follows the Single Responsibility Principle. That’s a good sign that this is a better design. I can expand the logic without celery details increasing the complexity, and I can move to a new third-party task runner by writing a new implementation of the runner interface without touching my business logic at all.

Notice my router no longer depends on celery. In fact, I no longer need to import anything. Instead, it depends on an interface (the runner). So it’s also following the Dependency Inversion Principle. As a side effect, I can now unit test this by injecting a mock runner and making assertions on its calls. These are also good signs that it’s a better design.
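That unit-testing side benefit might look like this (a trimmed copy of the router, just enough to show the idea, tested with unittest.mock from the standard library):

```python
from unittest.mock import Mock


class Router:
    """Trimmed copy of the router: only the 'start' branch, for brevity."""

    def __init__(self, runner):
        self.runner = runner

    def route(self, task, data, message):
        if task == 'task_a' and message == 'start':
            self.runner('task_a', data, callback=self.route)


# Inject a mock runner, then assert on exactly how the router called it.
runner = Mock()
router = Router(runner)
router.route('task_a', 'data', 'start')
runner.assert_called_once_with('task_a', 'data', callback=router.route)
```

No patching of celery or anything else: the mock slots into the same seam the fake runner used in the doctest.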

But! You may be asking, aren’t these the same benefits you get from normal unit testing?

Yes, but there is one big additional constraint with doctests that you don’t have in unit tests: You don’t want to use a mocking library. It would reduce the effectiveness of the doctest by cluttering it with mock stuff, which reduces its focus and ability to communicate. If I had a mocking library available, I may have decided to just patch celery and tasks. Instead, I was forced to construct a seam with an interface that was simple enough to fake right in the documentation for my object.

I love the ability to mock. But it’s a design tool, and reducing your need for mocks is usually an indicator of good design. So get into the habit of writing doctests as you write your code. You will be happy with where it leads you.

My first impression of Elm

My first experience with Elm was installing a package, and that alone was enough to completely blow me away. This is how it went down:

$ elm package install elm-lang/html
To install elm-lang/html I would like to add the following
dependency to elm-package.json:

    "elm-lang/html": "1.0.0 <= v < 2.0.0"

May I add that to elm-package.json for you? (y/n) y

Some new packages are needed. Here is the upgrade plan.

    elm-lang/core 4.0.1
    elm-lang/html 1.0.0
    elm-lang/virtual-dom 1.0.2

Do you approve of this plan? (y/n) y
Downloading elm-lang/core
Downloading elm-lang/html
Downloading elm-lang/virtual-dom
Packages configured successfully!

An installer that asks to write to the dependencies list, using semantic versioning, letting me know exactly what it’s going to do, and asking my approval beforehand. I’ve never experienced a package manager like this. Even the language used was so easy to understand and friendly. I haven’t written a single line of code but I already like this language.

Start decoupled

Suppose you’re writing some code that needs to list a user’s subscriptions. Your first instinct is to add a user.get_subscriptions() method.

But wait. Why is this the user’s responsibility? What if you start with something like this instead: Subscription.list(user_id)

In the original example, you jumped straight to coupling User and Subscription. Wherever you want to list subscriptions, you must have or create a full user object. In the second example, you only need to know the user’s id.

As you continue writing code, if you find yourself making many calls to list subscriptions for a user, always in the context of having a full user object, then it’s time to couple them. And since you already have code to get a list of subscriptions for a user id, it’s a simple refactoring to add a user.get_subscriptions() method that calls Subscription.list(user_id) internally, which is probably a lot cleaner than whatever the method would have contained had it been created at the start.
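Sketched in code (User, Subscription, and the fake storage are all invented for illustration), the late-added convenience method is a one-line delegation:

```python
class Subscription:
    _by_user = {}  # fake storage: user_id -> list of subscription names

    @classmethod
    def list(cls, user_id):
        # Needs only an id, not a full User object.
        return cls._by_user.get(user_id, [])


class User:
    def __init__(self, user_id):
        self.id = user_id

    def get_subscriptions(self):
        # Added later, once call sites proved the coupling worthwhile.
        return Subscription.list(self.id)
```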

Coupling has drawbacks and benefits. Be mindful of defaulting to coupling. Maybe it would be better to start decoupled and wait until your code makes it obvious that the trade-off will be worth it.