Mocks and Dependency Injection

This is the third and final part of a series about mocking and TDD. In part 2, I created a github boundary object. To use it, I relied on dependency injection that looked like this: User.get('username', github)

This isn’t great. Every time I need a user I would have to take two steps: create or import the github boundary, and then get the user. That’s not only annoying, but could result in Github() calls being sprinkled throughout my code, making it very hard to change the way I initialize it (I can easily picture needing to pass in some config options down the road).

One way to fix this would be to make the github parameter optional. It would look something like this:

# in User model:
@classmethod
def get(cls, username, github=None):
    github = github or Github()
    user_data = github.get_user(username)
    return cls(**user_data)

That would let me instantiate users wherever I want using just the username, and still let me inject a mocked github boundary in my tests. But it would wreak of code that only exists to make something easier to test. That’s a smell (a signal that a design may be bad), that is very common when testing with mocks. Your tests are more effective when they exercise code the same way it’s used in production, without special hooks or hacks to poke around inside.

Making the parameter optional also risks accidental integration. Have you ever found that your unit test suite fails when the network was down, even though you thought you were being so careful? Allowing implicit communication with boundaries can lead to pain like that.

Whenever I reach the point in my test where I want to inject a mocked boundary, I always leave it as a required parameter when implementing the production code. I do this because boundaries are volatile and out of my control, so I want all interaction with them to be explicit.

Since I want the github parameter to be required, but I don’t want to pass it in every time I need a user, I decide that I need a new thing that knows about my boundary and can pass it in for me, and when I need a user from github, I can call that thing. One way to do that is with a full-blown inversion of control (IoC) container. It’s an object that knows how to build your objects with all their dependencies/wiring. Those have their own set of downsides and I try to avoid getting to the point where I need something that heavy.

Instead, I’ll add a new class method right on the User model that looks like this: User.from_github('username'). Then I can get users with a single call, and my interaction with github will still be explicit. There’s no risk of accidental integration: I’ll either be passing in a boundary, or calling a method that mentions it.

How do I implement this? First, I realize that I’m in the “refactor” phase of my red/green/refactor TDD cycle. Since I’m refactoring, I want to change the structure of my code, without changing its behavior. So my goal is to clean this up without breaking (and ideally without changing) any of my tests that exercise that behavior. My view currently looks like this:

# somewhere in my view:
github = Github()
user = User.get(request.data['username'], github)

It’s instantiating the github boundary and then using it to get a user matching a username. This is pretty much the exact behavior I want in my from_github method, so my plan is to do an extract method refactoring.

I start by literally copying the lines to a class method on my User model:

# in my User model:
@classmethod
def from_github(cls):
    github = Github()
    user = User.get(request.data['username'], github)

I can’t rely on a global request object, so I change it to a parameter:

# in my User model:
@classmethod
def from_github(cls, username):
    github = Github()
    user = User.get(username, github)

and update my view:

# somewhere in my view:
user = User.from_github(request.data['username'])

Still green! Successful refactoring.

While in my user model I notice something:

# in User:
@classmethod
def get(cls, username, github):
    user_data = github.get_user(username)
    return cls(**user_data)

Now that I have a method that explicitly mentions github, it feels weird that this method – which is not specific to github – has a parameter called github. I change it to be generic:

# in User:
@classmethod
def get(cls, username, repo):
    user_data = repo.get_user(username)
    return cls(**user_data)

Much better. And now the door is open to getting users from other APIs. For example, you can probably imagine what a from_twitter method might look like.

Notice that I never explicitly wrote a test for User.from_github. There are a few reasons: In my refactoring step, I rarely write new tests, since I don’t want to change behavior. And this method is actually another boundary, which I don’t unit test, and is already covered by the system test that hits my view.

In the end, I now have several ways to create users:

  • Instantiate with data from anywhere: User(**attributes)
  • Instantiate with data from a boundary that conforms to my expected interface: User.get(username, boundary)
  • Instantiate with data from a specific, named boundary: User.from_github(username)

I’m happy with this design. These are all clear, well defined factory methods (methods for creating objects) each with a specific purpose. And it turns out this is the pattern I usually end up with when dealing with models and boundaries. More explicitly, the pattern looks like this:

  • Ignore the boundary at first and write an init method that accepts pure data.
  • Drill down until I need to test that data is coming from a boundary, design the boundary using a mock, and inject it via a required parameter on a new method (see part 2).
  • Clean things up by adding a new method that can handle the boundary wiring for me.

I’ve found that using mocks, dependency injection, and factory methods in this way has made my code easier to maintain. The methods are small, all interaction is clear, and refactoring is safe and fun.

Leave a Reply (markdown is supported)