A decoupling conversation

A while ago I was asked this question when discussing our project’s architecture. I want to share my answer publicly, because it’s a subject that I encounter often.

I assume the idea is to decouple as much as possible, while understanding that as soon as you pick a piece of technology, you are coupled to it.

With any decoupling there are trade-offs. With any trade-off you have to decide if the benefit is worth the cost. In some cases it may be worth it to isolate the coupling. For example, we choose a third party library MegaPayments2 to handle payments. We are essentially coupled to this technology choice. But it wouldn’t cost us much to isolate that dependency. Instead of sprinkling calls to MegaPayments2 all over our project, we can create a wrapper class, Payments, which delegates to MegaPayments2 internally. Now Payments is sprinkled throughout our system but the cost of change is now much lower. Maybe MegaPayments2 becomes obsolete and we want to change to UltraPayments3. We only have to do that in one place.
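
Here is a rough sketch of that wrapper. MegaPayments2 is the made-up provider from the example above, so the megapayments2 client API shown here is invented for illustration:

import megapayments2  # the (fictional) third-party payments library


class Payments:
    """Our own payments boundary: the only place that knows about MegaPayments2."""

    def __init__(self, api_key):
        self._client = megapayments2.Client(api_key)

    def charge(self, amount_cents, card_token):
        # Delegate to the third-party library behind an interface we own.
        # Switching to UltraPayments3 later means rewriting only this class.
        return self._client.charge(amount=amount_cents, source=card_token)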

You may hear decoupling discussed as something that’s always good no matter what. But consider the trade-offs. When deciding if you should decouple, ask yourself these questions:

  • How likely is it that this will change?
  • How easy would it be to introduce a boundary to keep this isolated?
  • Do we get any other benefits by introducing a boundary that we own? (e.g. a nicer API?)

With the above questions in mind, you can then ask yourself:

Should we decouple from Python? No.

Should we decouple from our specific version of Python? It depends…

Should we decouple from this third-party library? Probably yes.

Decoupling is a tool, like any other. Use it wisely.

Doctestability

Some languages let you use inline documentation to write example code that can be used as unit tests. In Python, these are called doctests. They look something like this:

def adder(a, b):
    """Adds two numbers for example purposes.

    >>> adder(1, 2)
    3

    >>> adder(5, 2)
    7

    """
    return a + b
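
If you want to run these examples, the standard library will execute them for you: either from the command line with python -m doctest yourmodule.py -v, or with a small hook at the bottom of the module:

if __name__ == '__main__':
    import doctest
    doctest.testmod()  # run every example found in this module's docstrings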

I’m becoming a big fan of this feature, because I’ve noticed that the ability to effectively doctest something is usually an indicator of good design.

What is an “effective doctest”? I mean a doctest that:

  • Is easy to understand
  • Is focused: doesn’t require a lot of setup
  • Is safe: no side effects
  • Communicates: It’s documentation first and a test second

These are also things you can say about code that is well designed: it’s easy to understand, focused, safe, and communicates intent.

A black-box, purely functional object meets all of these criteria. You pass some data in, you get some data out. Passing the same data in always gives you the same data out. This is the perfect candidate for a doctest, so let your desire to doctest force you to write more functions like this.

But what about situations where you must have side effects?

Recently I needed an object to route background tasks. For example, when background task A finished, it should fire off tasks B and C in parallel, and when B finished, it should fire off D. Upon completion of any task, the router should be triggered again with a message saying the task was completed, so it can fire off the next task(s).

We were going to do this in Python using celery. An implementation could have looked like this:

from myproj.celery import app, tasks

@app.task
def router(data, task, message):
    """Route background tasks.

    When task A is complete, run B and C.
    When task B is complete, run D.
    Start the chain by calling:

        router('data', 'task_a', 'start')

    """
    if task == 'task_a':
        if message == 'start':
            # run task_a, then re-enter this router with its result
            (tasks.task_a.s(data) | router.s('task_a', 'complete')).delay()
        if message == 'complete':
            (tasks.task_b.s(data) | router.s('task_b', 'complete')).delay()
            (tasks.task_c.s(data) | router.s('task_c', 'complete')).delay()
    elif task == 'task_b':
        if message == 'complete':
            (tasks.task_d.s(data) | router.s('task_d', 'complete')).delay()
    else:
        # all done
        return data

Let’s look past the nested conditionals I used to keep the example compact and see what else is wrong with this function: my business logic (which tasks get triggered when) is tightly coupled to a third-party implementation, celery.

@app.task, .s(), .delay(), and chaining signatures with a pipe are all celery-specific. This doesn’t seem too bad now, but this logic is likely to grow more complex, making the coupling even tighter, cluttering the logic, and making it even harder to test. And what happens when we outgrow our celery implementation and want to move to something like Amazon Simple Workflow Service?

Instead, since I approached this code with a desire to doctest, it ended up looking more like this:

class Router:
    """Route tasks.

    When task A is complete, run B and C.
    When task B is complete, run D.

    Init with a task runner: a callable that accepts the name of a
    task, some data, and a callback (which will be this router's
    route method). When the task finishes, the runner should call the
    callback with the task's name, its result data, and a 'complete'
    message.

    Example Usage:

    (The fake runner below is synchronous, so task_d fires as soon as
    task_b completes.)

    >>> def fake_runner(task, data, callback):
    ...     print('Running %s with %s' % (task, repr(data)))
    ...     callback(task, '%s results' % task, 'complete')
    ...
    >>> router = Router(fake_runner)
    >>> router.route('task_a', 'data', 'start')
    Running task_a with 'data'
    Running task_b with 'task_a results'
    Running task_d with 'task_b results'
    Running task_c with 'task_a results'

    """
    def __init__(self, runner):
        self.runner = runner

    def route(self, task, data, message):
        if task == 'task_a':
            if message == 'start':
                self.runner('task_a', data, callback=self.route)
            if message == 'complete':
                self.runner('task_b', data, callback=self.route)
                self.runner('task_c', data, callback=self.route)
        elif task == 'task_b':
            if message == 'complete':
                self.runner('task_d', data, callback=self.route)
        else:
            # all done
            return data

To make it doctestable, I introduced a seam between my business logic and celery: a task runner (I’ll leave the celery runner implementation to your imagination). And that seam was simple enough that I could include a fake implementation right in the doctest without hurting its readability. In fact, it improves the communication by documenting how to implement the seam’s interface.
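
For the curious, a celery-backed runner might look roughly like the sketch below. Treat it as an assumption rather than code from the project: the on_complete task and the module layout are invented, and the only requirement is that the runner honors the interface described in the docstring.

from myproj.celery import app, tasks


@app.task
def on_complete(result, task):
    # Feed the finished task's result back into a router wired with this runner.
    Router(celery_runner).route(task, result, 'complete')


def celery_runner(task, data, callback):
    # The callback argument is the router's own route method. A bound method
    # can't be shipped through the broker, so completion is reported via the
    # on_complete task instead, which re-enters the router on the worker.
    celery_task = getattr(tasks, task)  # e.g. tasks.task_a
    (celery_task.s(data) | on_complete.s(task)).delay()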

So the documentation is better, but is the code better?

My celery usage (the mechanics of running background tasks) and my business logic (what tasks run when) are now decoupled. Since they need to change for different reasons, my code now follows the Single Responsibility Principle. That’s a good sign that this is a better design. I can expand the logic without celery details increasing the complexity, and I can move to a new third-party task runner by writing a new implementation of the runner interface without touching my business logic at all.

Notice my router no longer depends on celery. In fact, I no longer need to import anything. Instead, it depends on an interface (the runner). So it’s also following the Dependency Inversion Principle. As a side effect, I can now unit test this by injecting a mock runner and making assertions on its calls. These are also good signs that it’s a better design.
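
For example, a unit test along those lines might look like this (a sketch using Python's standard unittest.mock; the test name and data values are invented):

import unittest
from unittest.mock import Mock


class RouterTest(unittest.TestCase):
    def test_task_b_completion_triggers_task_d(self):
        runner = Mock()
        router = Router(runner)

        router.route('task_b', 'task_b results', 'complete')

        # The router should ask the runner to run task_d with the same data,
        # handing back its own route method as the callback.
        runner.assert_called_once_with('task_d', 'task_b results',
                                       callback=router.route)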

But! You may be asking, aren’t these the same benefits you get from normal unit testing?

Yes, but there is one big additional constraint with doctests that you don’t have in unit tests: You don’t want to use a mocking library. It would reduce the effectiveness of the doctest by cluttering it with mock stuff, which reduces its focus and ability to communicate. If I had a mocking library available, I may have decided to just patch celery and tasks. Instead, I was forced to construct a seam with an interface that was simple enough to fake right in the documentation for my object.

I love the ability to mock. But it’s a design tool, and reducing your need for mocks is usually an indicator of good design. So get into the habit of writing doctests as you write your code. You will be happy with where it leads you.

My first impression of Elm

My first experience with Elm was installing a package, and that was enough to completely blow me away. This is how it went down:

$ elm package install elm-lang/html
To install elm-lang/html I would like to add the following
dependency to elm-package.json:

    "elm-lang/html": "1.0.0 <= v < 2.0.0"

May I add that to elm-package.json for you? (y/n) y

Some new packages are needed. Here is the upgrade plan.

  Install:
    elm-lang/core 4.0.1
    elm-lang/html 1.0.0
    elm-lang/virtual-dom 1.0.2

Do you approve of this plan? (y/n) y
Downloading elm-lang/core
Downloading elm-lang/html
Downloading elm-lang/virtual-dom
Packages configured successfully!

An installer that asks before writing to the dependencies list, uses semantic versioning, tells me exactly what it’s going to do, and asks for my approval beforehand. I’ve never experienced a package manager like this. Even the language it used was friendly and easy to understand. I hadn’t written a single line of code, but I already liked this language.

Start decoupled

Suppose you’re writing some code that needs to list a user’s subscriptions. Your first instinct is to add a user.get_subscriptions() method.

But wait. Why is this the user’s responsibility? What if you started with something like this instead: Subscription.list(user_id)

With the first approach, you jumped straight to coupling User and Subscription. Wherever you want to list subscriptions, you must have or create a full User object. With the second, you only need to know the user’s id.

As you continue writing code, you may find yourself with many calls to list subscriptions for a user, always in the context of having a full User object. Now it’s time to couple them. And since you already have code to get a list of subscriptions for a user id, it’s a simple refactoring to add a user.get_subscriptions() method that calls Subscription.list(self.id) internally, which is probably a lot cleaner than whatever the method would have contained if it had been created at the start.
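
Sketched in Python (User and Subscription here are hypothetical stand-ins for whatever your models actually look like):

class Subscription:
    @classmethod
    def list(cls, user_id):
        """Return the subscriptions for the given user id."""
        # e.g. query the subscriptions table or store by user_id
        ...


class User:
    def __init__(self, id):
        self.id = id

    def get_subscriptions(self):
        # Added only once the usage pattern justified the coupling; it simply
        # delegates to the decoupled query that already existed.
        return Subscription.list(self.id)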

Coupling has benefits as well as drawbacks, so be wary of defaulting to it. Maybe it would be better to start decoupled and wait until your code makes it obvious that the trade-off will be worth it.

Does TDD slow you down?

When you first start out with TDD, development will be much harder and much slower. It will practically grind to a halt. This is because you are learning. I’m not as interested in this part of the discussion. Any time you are learning something new, you will go slower. The more interesting question is, is it worth learning? Does it still slow you down once you become competent?

The truth is, you may never be as fast with TDD as you were without it. That’s a sign that you were going too fast. You weren’t finishing your work. You were writing code to get that specific feature working, and then moving on. You didn’t have to worry if the code you wrote was tightly coupled or had a poor interface, because you only had to call it once and that work is done. You definitely didn’t do much refactoring, because there was no safety net in place to alleviate the fear.

That pace is super fast and very addictive. But it is not sustainable. You can get things built quickly, but eventually maintenance becomes a nightmare and your progress grinds to a halt. That’s because the same qualities that make code hard to test also make it hard to change. Building code is easy; maintaining (i.e. changing) code is the hard part. TDD forces you to start feeling that pain early, so the cost gets spread out over the life of the codebase instead of being pushed back again and again until you’re forced to deal with it (technical debt!).

So it’s about trade-offs. If you are working on a quick prototype, don’t write tests. They will slow you down, and it’s ok to admit that; tests for a throwaway prototype won’t provide much value. But if you are building something to last, write tests. They may slow down your initial velocity, but it will even out over the long-term life of the project.

Have you been trying to do TDD and it still feels like it’s slowing you down too much? Does it seem like your tests are doing more harm than good? Are you still waiting to see all these supposed “benefits” of TDD? I’m working on some materials to help you level up your TDD skills so you can start loving your tests instead of hating them. And I started a newsletter so we can have a conversation about the pain you’re feeling, and so I can let you know as soon as my TDD materials become available.

How to test your tests

One of the benefits of writing your tests first is that you will always see the test fail. This is important, because you can’t trust a test you haven’t seen fail. Think about a line of code like this: assert a = 3

Of course, you meant to write a == 3. (Python’s parser happens to catch this particular typo, but the same trap exists with mistakes like self.assertTrue(a, 3), which passes whenever a is truthy.) You may not realize the mistake if it’s in a test you wrote to verify already-working code: it would pass, and you’d assume it passed because the code it’s testing really did set a to 3. But if you wrote and ran the test first (or commented out the working code to see a test failure), you’d notice the test passing when it shouldn’t, and fix the bug.

Watching a test fail is one way to test your tests.

But don’t just see a red/failing test and run off to make it pass. Pay attention to the failure message. You could have a different bug in your test that’s causing the wrong failure, maybe due to a syntax or logic error. So if you don’t get the failure you expect, that’s another sign that your test may have a bug.

Now, when you see the failure you expect, write just enough code to fix that specific failure. Is your “makes a equal to 3” test failing because the module is missing? Don’t implement the entire module, just create it. Then watch it fail because the function is missing from the module. Now don’t implement the entire function, just declare it. And so on. Keep fixing only the immediate failure until you hit green. Does the implementation feel complete? If not, you need another test.
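
Concretely, for a hypothetical calc module (the module, function, and test names here are invented for illustration), the sequence of failures might look like this:

import unittest

import calc  # failure 1: ModuleNotFoundError, fixed by creating an empty calc.py


class MakeThreeTest(unittest.TestCase):
    def test_it_makes_a_equal_to_3(self):
        a = calc.make_three()   # failure 2: AttributeError, fixed by declaring the function
        self.assertEqual(a, 3)  # failure 3: AssertionError (None != 3), fixed by returning 3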

It may feel silly at first, but if you train yourself to always take these micro steps, you can be sure that every line of your code is actually being tested. If you take large steps, the chances increase that untested – or even superfluous – code sneaks into your system. Fixing only the current failure tests your tests for completeness.

So while you’re in the “red” step of “red, green, refactor”, remember to keep an eye out that you’re red for the right reason, and don’t try to jump straight to green; just fix whatever is making you red right now. Eventually you’ll get there, and you’ll feel super confident in your code.

Do I have to write the test first?

To many, writing the test first is a requirement of TDD. It’s how I prefer to do it, but I don’t believe it’s a requirement, especially when starting out.

But that doesn’t mean I’m suggesting you go ahead and code away willy nilly and then write all the tests when you’re done. You still need a tight feedback loop. So how do you get that if you aren’t writing the tests first?

Using small steps: write one slice of code. Does it work? Good, now comment it out! Then write a test that will only pass with the code you just wrote.

Now run your test and watch it fail. This is an important step. If you haven’t seen a test fail, you can’t trust that you’re actually testing what you think you’re testing.

Now uncomment your code. Does the test pass? Good. Now you can refactor. Does the test still pass? Good! Now commit and repeat. I sometimes call this comment-driven development. I’m sure I’m the first person to think of it. I’m very clever.

If you stick to this style, eventually you will start to anticipate how to design the code you’re writing so you can easily test it. Then you may decide it’s easier to just go ahead and write the test first. Welcome to the club.

TDD, Micro Steps, and Backing Up

TDD is a way to think through your requirements incrementally by writing a test for one small piece, writing just enough code to get that test to pass, refactoring, and then moving on to the next small piece.

As you’re growing your code in this way, you should be zoomed way in. Taking tiny, micro steps. The time from failing to passing test should be measured in seconds, not minutes. It’s a feedback loop and it should be tight. You don’t want to waste time poking around in the dark.

This puts excellent design pressure on your system. The only way to keep that tight feedback loop moving is by writing code that is loosely coupled with a single responsibility. Otherwise it will be too hard to test.

This is why TDD is more about design than it is about verifying working code. But that doesn’t mean you can ignore design completely and let your tests lead you blindly somewhere without thinking. You will need to zoom back out every once in a while. You will need to put your knowledge of good design principles into practice and think critically about your code beyond what’s only easy to test.

I’ve been in many positions where my tests painted me into a corner, where my feedback loop started slowing down as complexity spiked, or where the easiest way to test something would have resulted in an obvious code smell. When that happens, I back up.

The ability to back up is another reason to keep the feedback loop tight. You should always be able to easily jump back any number of steps to working code and try a new path.

Yes, you still have to choose your path when you TDD. It’s called test-driven development, but you’re still the driver, not the tests. They are a tool you use to drive out some desired behavior, and there are usually multiple ways to write tests to get there. Use your design sense to make the best choice. If you don’t like where you ended up, back up. And keep your feedback loop short so backing up is no big deal.

tdubs: better test doubles for python

A couple of things have been bothering me about Python’s unittest.mock:

Problem 1: Stubs aren’t Mocks

Here’s a function (that is stupid and dumb because this is an example):

def get_next_page(repo, current_page):
    return repo.get_page(current_page + 1)

If I want to test this with unittest.mock, it would look like this:

def test_it_gets_next_page_from_repo(self):
    repo = Mock()
    next_page = get_next_page(repo, current_page=1)
    self.assertEqual(next_page, repo.get_page.return_value)
    repo.get_page.assert_called_with(2)

What bothers me is that I’m forced to use a mock when what I really want is a stub. What’s the difference? A stub is a test double that provides canned responses to calls. A mock is a test double that can verify what calls are made.

Look at the implementation of get_next_page. To test it, all I really need is a canned response to repo.get_page(2). But with unittest.mock, I can only give a canned response for any call to repo.get_page, regardless of its arguments. That’s why I need the last line of my test to verify that I called the method with a 2. It’s that last line that bothers me.

If I’m writing tests that explicitly assert that specific calls were made, I prefer those to be verifying commands, not queries. For example, imagine I have some code that looks like this:

# ...
article.publish()
# ...

with tests like this:

def test_it_publishes_the_article(self):
    article.publish.assert_called_once_with()

Now the assertion in my test feels right. I’m telling the article to publish, so my test verifies that I sent the publish message to the article. It verifies that I sent a command: I triggered some behavior that’s implemented elsewhere. Feels good. But wait…

Problem 2: Public API conflicts

Here’s the other problem. Imagine I had a typo in my test:

def test_it_publishes_the_article(self):
    article.publish.assertt_called_once_with()

Notice the extra “t” in “assert”? I hope so, because this test will pass even if article.publish is never called: every attribute you access on a unittest.mock.Mock instance is another auto-created Mock, and calling it silently succeeds.

The problem here is that Python’s mocks have their own public API, but they are supposed to be stand-ins for other objects that have public APIs of their own. This causes conflicts. Have you ever tried to mock an object that has a name attribute? Then you’ve felt this pain (passing name as a Mock kwarg doesn’t stub a name attribute like you’d think it would; instead it names the mock).
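
For example, with unittest.mock directly:

from unittest.mock import Mock

article = Mock(name='article')  # names the mock itself (it shows up in the repr)...
article.name                    # ...so .name is just another auto-created child Mock

article = Mock()
article.name = 'article'        # stubbing a real name attribute takes a second step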

Doesn’t autospec fix this problem?

autospec is an annoying bandage over this problem. It doesn’t fit into my normal TDD flow where I use the tests to tease out a collaborator’s public API before actually writing it.

Solution: tdubs

I decided to write my own test double library to fix these problems, and I am very happy with the result. I called it tdubs. See the README for installation and usage instructions. In this post I’m only going to explain the parts that solve the problems I described above.

In tdubs, stubs and mocks are explicit. If you want to give canned responses to queries, use a Stub. If you want to verify commands, use a Mock. (you want to do both? rethink your design [though it’s technically possible with a Mock])

A Stub can provide responses that are specific to the arguments passed in. This lets you create true stubs. In the example above, using tdubs I could have stubbed my repo like this:

repo = Stub('repo')
calling(repo.get_page).passing(2).returns(next_page)

and I would not need to verify my call to repo.get_page, because I would only get my expected next page object if I pass 2 to the method.
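
Put together, the earlier get_next_page test might read something like this with tdubs (a sketch based only on the calls shown in this post; see the README for the exact imports):

def test_it_gets_next_page_from_repo(self):
    repo = Stub('repo')
    calling(repo.get_page).passing(2).returns('page two')

    next_page = get_next_page(repo, current_page=1)

    self.assertEqual(next_page, 'page two')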

With tdubs, there’s no chance of false positives due to typos or API conflicts, because tdubs doubles have no public attributes. For example, you don’t verify commands directly on a tdubs Mock, you use a verification object:

verify(my_mock).called_with(123)

After hammering out the initial implementation to solve these specific problems, I ended up really liking the way my tests read and the type of TDD flow that tdubs enabled. I’ve been using it for my own projects since then and I think it’s ready to be used by others. So if you’re interested, visit the README and try it out. I’d love some feedback.

How to Practice

In my previous two blog posts (here and here) I mentioned practicing. The posts cover the reasons why we should practice, but they don’t mention how to practice. Here are some resources for that:

Google code katas. These are sample programming problems that you can solve over and over in any language. They aren’t about solving hard problems, they are about practicing and forming good habits.

Similarly, google koans for your language of choice. These may feel like beginner tutorials, but they are about practicing until you have the nitty gritty details of your language committed to muscle memory.

exercism.io offers guided programming problems with pre-written tests for many languages.

The Pragmatic Programmer recommends learning a new language every year. This is great practice even if you don’t get to use those languages in your day job. Try to pick languages in a different paradigm than what you’re used to. This will often help you see better solutions to problems in any language. You can use the previous resources to learn these new languages.

For meditation, google for a mindfulness mobile app so you can practice anywhere and get a little reminder every time you look at your phone. I use The Mindfulness App on iOS. I like it because it offers guided meditations of different lengths, as well as meditation timers I can set to any length of time I want. It’s great for starting out with 1 or 2 minute meditations.

There are many ways to practice; these are just a few suggestions. However you do it, remember the goal is to train so that when the pressure is on, you default to good code instead of bad.