Mocks and External Dependencies

This is Part 2 in a series about mocking. In part 1, I said it’s best to use mocks as a design tool, and not as a convenience tool for tests that touch external dependencies. But what does it look like when you do have an external dependency, like a third-party library? Do you wrap it in your own code and mock the wrapper? I don’t think so. That puts the emphasis on the dependency, and I want dependencies to be details. Instead, I think in terms of boundaries. I let my tests help me decide where those boundaries are, and then by mocking them, figure out what they should look like. Then I may implement that boundary using an external dependency, which I do not mock in the tests. I’ll show what I mean with an example.

My imaginary example app will tell you if a particular github user is famous or not. My business logic determines famousness based on the number of followers. If they have 100 or more, they are famous.

Now pretend that I’ve drilled down to the point where I need a User model with an is_famous() method.

I don’t start by looking for (or writing) a github API library. I don’t like to interact with an API in my code until I’m absolutely forced to. I do take a look at the API to get an idea of where I’m headed before starting, but when I do, I’m careful not to let that influence my design in a way that would couple it tightly to the external API.

So I have some idea of what the github data looks like, but since I don’t want to hit the API yet, I start by assuming I can init my user with data from anywhere. This lets me test my logic without worrying about anything external. My tests look something like this:

def test_user_is_famous_if_more_than_100_followers(self):
    user = User(followers=101)
    self.assertTrue(user.is_famous())

def test_user_is_famous_if_100_followers(self):
    user = User(followers=100)
    self.assertTrue(user.is_famous())

def test_user_is_not_famous_if_less_than_100_followers(self):
    user = User(followers=99)
    self.assertFalse(user.is_famous())

It’s easy enough to make these pass, and since I’m not concerned with the github api yet, the tests are very easy to read and understand. No mocking noise!
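One minimal implementation that would satisfy these three tests might look like the sketch below. The constructor signature and the FAMOUS_THRESHOLD name are assumptions made for illustration; the tests only require that User accepts a followers keyword and exposes is_famous().

```python
class User(object):
    # Hypothetical constant name for the cutoff described above:
    # 100 or more followers means famous.
    FAMOUS_THRESHOLD = 100

    def __init__(self, followers):
        self.followers = followers

    def is_famous(self):
        # Inclusive comparison, so exactly 100 followers counts as famous.
        return self.followers >= self.FAMOUS_THRESHOLD
```

The boundary case (exactly 100) is the one most likely to go wrong, which is why it gets its own test.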

But in the real world, I won’t be hard-coding the data passed to User’s constructor. So I want a new method that can initialize a user with data from the service where that data lives. I take a moment to think about what that might look like:

# in my imagination (or maybe a scratch buffer...)
@classmethod
def get(cls, username):
    # get user_data from github...
    return cls(**user_data)

I’m happy with that. So in a real file, I start with a test:

def test_gets_user_data_from_github(self):
    user = User.get('blaix')
    self.assertEqual(user.id, 420)
    self.assertEqual(user.followers, 69)

How do I make this pass? Time to start looking for a github client library? Not yet. I can defer that decision a bit longer. For now, I only want to do the simplest thing that makes the test pass, so I cheat:

# in User:
@classmethod
def get(cls, username):
    return cls(id=420, followers=69)

I haven’t shown it, but I have a view that is initializing a user, and a system test that exercises that view. So I update my view to call this new method, and check that my system test still passes.

I’m all green. But not every github user has an id and follower count this cool and nice. I need my code to handle the general case. To make my code more general, I need to make my tests more specific. So I need my tests to explicitly verify that I’m getting the data from github. How do I do that?

First, I recognize that I’ve finally reached a boundary: my app code needs data from the outside world – in this case, the github API. At boundaries like this, I want an explicit object (a function, method, instance, or class) with a single purpose: handle that external communication.

I keep my boundaries in explicit objects to protect me from things that are volatile. The github API, or even the library I’d use to access the API, could change for reasons completely independent from my business logic. When I keep my interaction with it isolated, I can respond to those changes more safely and quickly, since the interaction won’t be scattered around and mixed with my app code. As a side effect, it also provides a nice injection point to stick a test double that will help me move forward here, as well as protect my unit tests from unreliable and non-deterministic network calls.

Now back to that test: how do I assert that those numbers came from github? Since I’ve decided to use a boundary object, I can verify that my new method is using the boundary object to get the data. How do I verify that something I’m testing is interacting with another object correctly? This is exactly the right job for a mock.

This is a pattern I’ve been recognizing in my code lately. At my boundaries, I want the ability to inject a boundary object, and I want my tests to verify the interactions by injecting a test double to stand in for that boundary.

Since I’m using a test double as a stand-in for my boundary object, I get to design it from the point of view of the caller without worrying about the details of the implementation. So I decide that the cleanest way to get a user from my boundary object is to call a get_user method. Here’s my updated test:

def test_gets_user_data_from_github(self):
    github = Stub('github')
    
    calling(github.get_user).passing('blaix').returns({
        'id': 420,
        'followers': 69,
    })

    calling(github.get_user).passing('notblaix').returns({
        'id': 421,
        'followers': 70,
    })
    
    user = User.get('blaix', github)
    self.assertEqual(user.id, 420)
    self.assertEqual(user.followers, 69)

    user = User.get('notblaix', github)
    self.assertEqual(user.id, 421)
    self.assertEqual(user.followers, 70)

Note: I’m using tdubs which does not have explicit Mock objects, but the way I’m using Stub + calling here provides the same functionality: verify that I’m calling the collaborator correctly (since I’d only get the expected values when calling the method with those parameters).
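For readers who don’t have tdubs installed, roughly the same stand-in could be built with the standard library’s unittest.mock. This is only a sketch of an alternative, not what the test above uses; the CANNED_USERS and make_github_double names are made up for illustration.

```python
from unittest import mock

# Canned responses keyed by username, mirroring the tdubs setup above.
CANNED_USERS = {
    'blaix': {'id': 420, 'followers': 69},
    'notblaix': {'id': 421, 'followers': 70},
}


def make_github_double():
    github = mock.Mock(name='github')
    # side_effect receives the same arguments as get_user, and its return
    # value becomes get_user's return value. An unexpected username raises
    # KeyError, so a wrong call fails loudly instead of returning a Mock.
    github.get_user.side_effect = lambda username: CANNED_USERS[username]
    return github


github = make_github_double()
print(github.get_user('blaix'))

# Mocks also record their calls, so the interaction can be asserted directly.
github.get_user.assert_called_once_with('blaix')
```

Injected in place of the tdubs stub, a double like this would support the same assertions: the expected values only come back when get_user is called with the expected username.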

Notice I had to add a new parameter to inject the github object. That’s yucky, but I don’t want to split my thinking yet. So I make a note to refactor this when I’m green again. First I’ll make this test pass by writing:

# in User:
@classmethod
def get(cls, username, github):
    user_data = github.get_user(username)
    return cls(**user_data)

That makes the unit test pass, but now my system test is failing because I’m not passing the github parameter. So I update my view:

# somewhere in my view:
github = Github() # I know this doesn't exist yet, it's fine.
user = User.get(request.data['username'], github)

It’s still failing, but for a different reason. Progress! Now it’s failing because Github doesn’t exist. So I create that class but leave it empty. I like to wait for my test failures to move me forward. Now it’s failing because get_user doesn’t exist, so I create that too, leaving it empty as well. Finally I get a failure that isn’t about basic scaffolding: I’m returning None and my code expects a dictionary. That’s going to require adding real logic, and I don’t want to do that without an explicit test for that logic, so for now, I silence this failure by returning a faked dictionary.
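At this point the scaffolding might look something like the sketch below. The hard-coded values are an assumption (borrowed from the earlier test); any dictionary with the keys User expects would silence the failure.

```python
class Github(object):
    def get_user(self, username):
        # Placeholder only: faked data that keeps the system test green
        # until an explicit test forces the real implementation.
        return {'id': 420, 'followers': 69}
```

The username argument is deliberately ignored here. Nothing yet tests that it matters, so nothing yet justifies using it.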

Time to force my really for real github logic. As usual, I want to start with a test. Does that mean a unit test? Well, imagine what I’d need to do to unit test Github.get_user (meaning: test it without interacting with the outside world). I’d end up mocking third-party or even standard libraries. I don’t control those interfaces, so I wouldn’t get the full benefit of mocks, but I’d still get all the costs. So to optimize my rewards, I decide to fully integration test this boundary method. I expect to hit the real github API, and assert against my real user id. I’m only asserting against my id and not my follower count because the latter is likely to change, and I’m confident enough in my system tests that my bases are covered there.

def test_get_user_from_github(self):
    user = Github().get_user('blaix')
    self.assertEqual(user['id'], 664)

This is another pattern in my code: I always integration test my boundary objects. By doing this, I get two benefits: the tests are simple and provide high confidence, and since I don’t want to write a lot of tests like this, there is pressure to keep minimal logic in my boundary objects, which makes them easier to understand and maintain – something that’s very important for code that bumps up against things that are unreliable and could change outside your control.

Time to make it pass by filling my empty method with real guts:

import requests

class Github(object):
    def get_user(self, username):
        url = 'https://api.github.com/users/{}'.format(username)
        return requests.get(url).json()

I decided I didn’t need a full github client library. The simplest way to make my tests pass was to use the requests package (a ubiquitous package in python land for making HTTP requests).

But even though I’m using a third-party package, at no point did I need to patch an import. I’m not wrapping a third-party package to have something to mock in my tests; I used a mock to design an interface and then implemented that interface with a third-party package. I’m now free to swap out that package if I need to as the requirements grow more complex, and as long as I keep returning github user data from Github.get_user, I won’t have to change any of my other production code, or any of my tests. Imagine that: a complete refactoring of the internals of a class, with a test suite that acts only as a safety net and not handcuffs. Tests (with mocks!) that make refactoring third-party integrations easier, not harder. It’s possible when you follow these guidelines:

  • Work from the outside in. I started with a system test (not shown in the article), and that provided the safety net to start and keep the ball rolling. Then I worked my way in, one layer at a time, designing the code I wanted to have at the next layer down as I wrote my tests.
  • Defer decisions on third-party integrations as long as possible. It would have been tempting to start by using a third-party github library right in my view, but instead, I worked in layers, drilling down until I absolutely needed a single object with the sole purpose of communicating with github.
  • Prefer injectable boundary objects. When I reached the point where I wanted a test to assert that certain data came from github, I did that by injecting a test double, and this made it very easy to design the API of an explicit object to communicate with github.
  • Only integration test boundary objects. When I reached my boundary object, it was something that needed to communicate with the outside world. I could have tested it in isolation by mocking a third-party dependency, but that would leave me tightly coupled to an API I don’t have control over. So I fully integration test it, which puts pressure on me to keep my boundary object thin and free of logic, which is a good design for an object that interacts with volatile things like third-party dependencies and external HTTP APIs.
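To make the "free to swap out that package" claim concrete, here is a sketch of the same boundary object rebuilt on Python 3’s standard library instead of requests. It is an illustration only: the user_url helper is a hypothetical addition, and error handling is left out to keep the object thin.

```python
import json
from urllib.request import urlopen


class Github(object):
    def user_url(self, username):
        # Hypothetical helper: builds the same URL the requests version used.
        return 'https://api.github.com/users/{}'.format(username)

    def get_user(self, username):
        # Same contract as before: take a username, return the decoded
        # JSON body as a dictionary. Only the HTTP plumbing has changed.
        with urlopen(self.user_url(username)) as response:
            return json.loads(response.read().decode('utf-8'))
```

Because User.get and its unit test only know about get_user, neither would need to change for a swap like this. The integration test for Github.get_user would be the one to confirm the new plumbing still works against the real API.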

But wait! Remember this?

user = User.get('blaix', github)

This is gross. Passing an instance of my boundary object every time I need a user is going to be annoying. I punted on that earlier, but now that I’ve implemented everything and my tests are green, I’m free to refactor. This will require some discussion about mocks and dependency injection, and will be the subject of part 3 in this series.
