Mocks and Dependency Injection

This is the third and final part of a series about mocking and TDD. In part 2, I created a github boundary object. To use it, I relied on dependency injection that looked like this: User.get('username', github)

This isn’t great. Every time I need a user I would have to take two steps: create or import the github boundary, and then get the user. That’s not only annoying, but could result in Github() calls being sprinkled throughout my code, making it very hard to change the way I initialize it (I can easily picture needing to pass in some config options down the road).

One way to fix this would be to make the github parameter optional. It would look something like this:

# in User model:
@classmethod
def get(cls, username, github=None):
    github = github or Github()
    user_data = github.get_user(username)
    return cls(**user_data)

That would let me instantiate users wherever I want using just the username, and still let me inject a mocked github boundary in my tests. But it would reek of code that only exists to make something easier to test. That's a smell (a signal that a design may be bad), and a very common one when testing with mocks. Your tests are more effective when they exercise code the same way it's used in production, without special hooks or hacks to poke around inside.

Making the parameter optional also risks accidental integration. Have you ever found that your unit test suite fails when the network is down, even though you thought you were being so careful? Allowing implicit communication with boundaries can lead to exactly that kind of pain.

Whenever I reach the point in my test where I want to inject a mocked boundary, I always leave it as a required parameter when implementing the production code. I do this because boundaries are volatile and out of my control, so I want all interaction with them to be explicit.
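
To make that concrete, here is a rough sketch of the kind of test I mean, with the boundary injected explicitly as a Mock (the fields and assertions are illustrative, not the exact test from part 2):

# in my User tests -- Mock comes from unittest.mock, this method sits inside a
# unittest.TestCase, and the user_data fields are made up:
def test_gets_user_from_injected_boundary(self):
    github = Mock()
    github.get_user.return_value = {"username": "bob", "name": "Bob"}

    user = User.get("bob", github)

    github.get_user.assert_called_once_with("bob")
    self.assertEqual(user.name, "Bob")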

Since I want the github parameter to be required, but I don't want to pass it in every time I need a user, I need a new thing that knows about my boundary and can pass it in for me; whenever I need a user from github, I call that thing. One way to do that is with a full-blown inversion of control (IoC) container: an object that knows how to build your objects with all of their dependencies and wiring. Containers have their own set of downsides, and I try to avoid getting to the point where I need something that heavy.
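
To show what I mean by heavy, here is a minimal sketch of the idea (this toy Container is purely illustrative, not a real library):

# a toy container, just to show the shape -- real IoC libraries add scoping,
# autowiring, and configuration on top of this
class Container:
    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def resolve(self, name):
        return self._factories[name]()


container = Container()
container.register("github", Github)

# anywhere a user is needed, ask the container for the boundary:
user = User.get("username", container.resolve("github"))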

Instead, I’ll add a new class method right on the User model that looks like this: User.from_github('username'). Then I can get users with a single call, and my interaction with github will still be explicit. There’s no risk of accidental integration: I’ll either be passing in a boundary, or calling a method that mentions it.

How do I implement this? First, I realize that I’m in the “refactor” phase of my red/green/refactor TDD cycle. Since I’m refactoring, I want to change the structure of my code, without changing its behavior. So my goal is to clean this up without breaking (and ideally without changing) any of my tests that exercise that behavior. My view currently looks like this:

# somewhere in my view:
github = Github()
user = User.get(request.data['username'], github)

It’s instantiating the github boundary and then using it to get a user matching a username. This is pretty much the exact behavior I want in my from_github method, so my plan is to do an extract method refactoring.

I start by literally copying the lines to a class method on my User model:

# in my User model:
@classmethod
def from_github(cls):
    github = Github()
    user = User.get(request.data['username'], github)

I can't rely on a global request object, so I change it to a parameter (and return the user, since callers will need it):

# in my User model:
@classmethod
def from_github(cls, username):
    github = Github()
    user = User.get(username, github)
    return user

and update my view:

# somewhere in my view:
user = User.from_github(request.data['username'])

Still green! Successful refactoring.

While I'm in my User model, I notice something:

# in User:
@classmethod
def get(cls, username, github):
    user_data = github.get_user(username)
    return cls(**user_data)

Now that I have a method that explicitly mentions github, it feels weird that this method – which is not specific to github – has a parameter called github. I change it to be generic:

# in User:
@classmethod
def get(cls, username, repo):
    user_data = repo.get_user(username)
    return cls(**user_data)

Much better. And now the door is open to getting users from other APIs. For example, you can probably imagine what a from_twitter method might look like.
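
Just to illustrate, a hypothetical from_twitter could look almost identical (Twitter here is an imagined boundary class exposing the same get_user interface):

# in my User model -- Twitter is a hypothetical boundary with a get_user method:
@classmethod
def from_twitter(cls, username):
    twitter = Twitter()
    user = User.get(username, twitter)
    return user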

Notice that I never explicitly wrote a test for User.from_github. There are a few reasons: in my refactoring step I rarely write new tests, since I don't want to change behavior; and this method is itself another boundary, which I don't unit test, and which is already covered by the system test that hits my view.

In the end, I now have several ways to create users:

  • Instantiate with data from anywhere: User(**attributes)
  • Instantiate with data from a boundary that conforms to my expected interface: User.get(username, boundary)
  • Instantiate with data from a specific, named boundary: User.from_github(username)
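
Pulled together, the model now looks roughly like this (the plain-data __init__ comes from part 1 and its exact shape is illustrative; the Github import from part 2 is assumed):

# the User model, assembled -- the attribute handling in __init__ is illustrative:
class User:
    def __init__(self, **attributes):
        for name, value in attributes.items():
            setattr(self, name, value)

    @classmethod
    def get(cls, username, repo):
        user_data = repo.get_user(username)
        return cls(**user_data)

    @classmethod
    def from_github(cls, username):
        github = Github()
        user = User.get(username, github)
        return user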

I'm happy with this design. These are all clear, well-defined factory methods (methods for creating objects), each with a specific purpose. And it turns out this is the pattern I usually end up with when dealing with models and boundaries. More explicitly, the pattern looks like this:

  • Ignore the boundary at first and write an init method that accepts pure data.
  • Drill down until I need to test that data is coming from a boundary, design the boundary using a mock, and inject it via a required parameter on a new method (see part 2).
  • Clean things up by adding a new method that can handle the boundary wiring for me.

I’ve found that using mocks, dependency injection, and factory methods in this way has made my code easier to maintain. The methods are small, all interaction is clear, and refactoring is safe and fun.

Python’s patch decorator is a code smell

I’m a big fan of using mocks as a testing/design tool. But if I find myself reaching for patch instead of Mock in python, I usually stop and rethink my design.

I consider the use of patch in tests to be a code smell. It means the test code is not using my internal API; it's reaching into the private implementation details of my object.

For example, I recently needed a helper function for creating users on a third-party service with a set of default values. I could have written it like this:

from services import UserService
from settings import SERVICE_CONF


def create_user_with_defaults(**attributes):
  defaults = { "name": "test" }
  defaults.update(attributes)

  service = UserService(**SERVICE_CONF)
  return service.create_user(**defaults)

This would get the job done. And because this is python, I can test it without hitting real services using @patch:

@patch("users.helpers.UserService")
def test_creates_user_with_defaults_on_user_service(self, MockUserService):
  user_service = MockUserService.return_value
  
  # execution:
  user = create_user_with_defaults()
  
  # verification:
  user_service.create_user.assert_called_once_with(name="test")
  self.assertEqual(user, user_service.create_user.return_value)

But look at the verification step: there is nothing in the execution step about user_service, yet that’s what I’m asserting against. My tests have knowledge about private implementation details of the thing they’re testing. That’s bad news.

I prefer my tests to be normal consumers of my internal APIs. This forces me to keep my APIs easy to use and flexible. @patch lets me get around issues like tight coupling by hijacking my hard-coded dependencies.

Here is how I actually implemented the helper function:

def create_user_with_defaults(service, **attributes):
  defaults = { "name": "test" }
  defaults.update(attributes)
  return service.create_user(**defaults)

I didn’t even need to import anything! This is how I would test it:

def test_creates_user_with_defaults_on_user_service(self):
  user_service = Mock()
  
  # execution:
  user = create_user_with_defaults(user_service)
  
  # verification:
  user_service.create_user.assert_called_once_with(name="test")
  self.assertEqual(user, user_service.create_user.return_value)

Now compare the verification to the execution. Instead of patching the internal workings of the module, I'm explicitly passing in a mock object. I can do this because the function no longer depends on the concrete implementation of the user service; it depends on an abstraction*: some object, passed in, that conforms to a certain interface. So it makes sense that my test verifies the interaction with that interface.

This means my test is now a normal consumer of my function, and my desire to avoid patch led me to a design that is more flexible. This became clear as soon as I wanted to create some test users in the REPL: I happily created an instance of the UserService that uses the settings for our sandbox, and passed it in to my function.
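
That session might look something like this (SANDBOX_SERVICE_CONF is a hypothetical name for the sandbox settings; the module paths mirror the snippets above):

>>> from services import UserService
>>> from settings import SANDBOX_SERVICE_CONF  # hypothetical sandbox config
>>> from users.helpers import create_user_with_defaults
>>> service = UserService(**SANDBOX_SERVICE_CONF)
>>> create_user_with_defaults(service, name="repl-test")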

*See The Dependency Inversion Principle (the D from SOLID).

Do you need DI in dynamic languages?

I’m often told dependency injection isn’t needed in dynamic languages. DHH wrote an article a couple years ago that sums the argument up nicely: Dependency injection is not a virtue. Read that and come back here for my 3-years-later response.

He uses this as an example:

def publish!
  self.update published_at: Time.now
end

He says DI folks would shiver at hard-coding a call to Time.now, and I’m assuming he thinks they would inject a clock or something to more easily test it. He argues (accurately) that it’s silly to make that more complex just for testing, and you should do something like this:

Time.stub(:now) { Time.new(2012, 12, 24) }
article.publish!
assert_equal 24, article.published_at.day

I do shiver at hard-coding Time in the publish! method, but I also shiver at stubbing global, built-in classes like that…

I prefer to think about Dependency Inversion first, and then Dependency Injection if that’s a way to accomplish it.

So when I see this:

def publish!
  self.update published_at: Time.now
end

I do shiver, because hard-coding like that is well established to be a code smell. Now, a smell does not mean it's wrong in every case, but it does mean I should examine this. I'm not even thinking about tests right now; I'm thinking about inverting this dependency. Here's how I'd do it:

def publish!(time)
  self.update published_at: time
end

I’ve inverted the dependency. No dependency injection framework needed. And no stubbing globals or bringing in complex gems (like Timecop) to test it:

article.publish! Time.new(2012, 12, 24)
assert_equal 24, article.published_at.day

The fact that it's easier to test wasn't the goal; it was a side effect of a better design. Now the system is ready to allow users to define publish dates in the future, for example.

So just because a dynamic language lets us hijack objects to make testing easier doesn't mean we should completely ignore good practices. I'd argue that this is better code, regardless of what language I happened to write it in.

And if you miss the old API? Just give it a default:

def publish!(time = Time.now)
  self.update published_at: time
end