Solve tough problems with spikes

Sometimes I’m approaching a problem where I lack some understanding that would let me start with nice little unit tests. So instead, I start with a high-level functional test. Then I start getting the code to work by any means necessary. I do this without writing any more tests. I just cowboy-code my way to a solution that works. Extreme Programming calls this a Spike Solution.

When my functional test is green, I have much more understanding. I’ve been googling, looking up and using new libraries, and usually have a better idea of what a clean solution might look like. This is when I throw my code away.

Well, most of it. I keep the functional test. And if I have any particularly tricky code I might want for reference, I keep it separate from the project code, but available to refer to if I need it. Then I jump down into my unit tests and try to make my functional test green again using a proper test-driven approach.

It can be very hard to discard working code, but when I think back to every time I’ve lost some writing work unintentionally – an essay, blog post, homework assignment – the second draft is always better. I think the same is true for writing code.

Why is “Legacy” a negative word?

Programming might be the only place where “legacy” is a negative word. It seems backwards that we’re an industry where things tend to get worse instead of better as they get older and more people work on them. How does that happen?

Imagine you’re given a task to make a change to a piece of code that works, but no one wants to touch. It’s a gnarly mess of quick fixes and rushed, untested features that’s impossible to understand.

But you’re relieved because the change you need to make is super tiny. You can dive in, add a new if block to the 500-line method and get outta there.

“Who made that mess? I wish my managers would give me time to rewrite this thing,” you think as you wash your hands of that “legacy code”.

But how did the code get like that? No one sets out to write a giant, confusing method that no one wants to work in. It’s a group effort. It takes lots of little “quick fixes” and “tiny changes” like the one you just made.

It’s not up to your managers to “let” you pay down that “technical debt”. A rewrite won’t fix that problem. To really fix it, we need to change our attitude about code like this.

We need to realize that the problem isn’t some other programmer who wrote this big thing. It’s us. We all worked together to make this mess because no one thinks their one small change is causing the problem.

When we finish working on a piece of the system, it shouldn’t just do what we want, it should also be a little bit easier for the next person to come in and make more changes. Our job isn’t just to build new things, it’s to enable change. To do that, we should leave each piece of code we touch a little better than we found it.

Uncle Bob calls this The Boy Scout Rule. Martin Fowler calls it Opportunistic Refactoring.

Imagine working on a team where this is the norm. Imagine working on a code base that gets better as it ages. Imagine if “legacy” was a good thing.

That doesn’t mean there will never be a mess. But it means the trend will be towards cleaner code. Quick hacks and technical debt will no longer be excuses to continue cutting corners. They will be strategic decisions made when the trade-off is worth it. And we don’t need to get management approval to pay down that debt, because it won’t be a huge undertaking. It will be how we normally work: pay it down a little bit at a time as we continue working on the codebase.

Let’s start leaving behind a good legacy.

Python’s patch decorator is a code smell

I’m a big fan of using mocks as a testing/design tool. But if I find myself reaching for patch instead of Mock in Python, I usually stop and rethink my design.

I consider the use of patch in tests to be a code smell. It means the test code is not using my internal API. It’s reaching into the private implementation details of my object.

For example, I recently needed a helper function for creating users on a third-party service with a set of default values. I could have written it like this:

from services import UserService
from settings import SERVICE_CONF

def create_user_with_defaults(**attributes):
  defaults = { "name": "test" }
  defaults.update(attributes)

  service = UserService(**SERVICE_CONF)
  return service.create_user(**defaults)

This would get the job done. And because this is python, I can test it without hitting real services using @patch:

@patch("users.UserService")  # hypothetical module path: patch UserService where it's looked up
def test_creates_user_with_defaults_on_user_service(self, MockUserService):
  user_service = MockUserService.return_value
  # execution:
  user = create_user_with_defaults()
  # verification:
  self.assertEqual(user, user_service.create_user.return_value)

But look at the verification step: there is nothing in the execution step about user_service, yet that’s what I’m asserting against. My tests have knowledge about private implementation details of the thing they’re testing. That’s bad news.

I prefer my tests to be normal consumers of my internal APIs. This forces me to keep my APIs easy to use and flexible. @patch lets me get around issues like tight coupling by hijacking my hard-coded dependencies.

Here is how I actually implemented the helper function:

def create_user_with_defaults(service, **attributes):
  defaults = { "name": "test" }
  defaults.update(attributes)
  return service.create_user(**defaults)

I didn’t even need to import anything! This is how I would test it:

def test_creates_user_with_defaults_on_user_service(self):
  user_service = Mock()
  # execution:
  user = create_user_with_defaults(user_service)
  # verification:
  self.assertEqual(user, user_service.create_user.return_value)

Now compare the verification to the execution. Instead of patching the internal workings of the module, I’m explicitly passing in a mock object. I can do this because the function no longer depends on the concrete implementation of the user service; it depends on an abstraction*: some object, passed in by the caller, that conforms to a certain interface. So it makes sense that my test verifies the interaction with that interface.

This means my test is now a normal consumer of my function, and my desire to avoid patch led me to a design that is more flexible. This became clear as soon as I wanted to create some test users in the repl. I happily created an instance of the UserService that uses the settings for our sandbox, and passed that in to my function.
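That flexibility is easy to demonstrate with a stand-in service (a sketch with hypothetical names; the real UserService and sandbox settings are specific to the original project):

```python
class RecordingUserService:
    """Stand-in with the same create_user interface as the real service."""
    def create_user(self, **attrs):
        return attrs  # the real service would call the third-party API here

def create_user_with_defaults(service, **attributes):
    defaults = {"name": "test"}
    defaults.update(attributes)
    return service.create_user(**defaults)

# Any object with a create_user method works: a sandbox-configured
# UserService in the repl, a Mock in tests, or this stand-in here.
user = create_user_with_defaults(RecordingUserService(), email="me@example.com")
print(user)  # {'name': 'test', 'email': 'me@example.com'}
```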

*See The Dependency Inversion Principle (the D from SOLID).

Do you need DI in dynamic languages?

I’m often told dependency injection isn’t needed in dynamic languages. DHH wrote an article a couple years ago that sums the argument up nicely: Dependency injection is not a virtue. Read that and come back here for my 3-years-later response.

He uses this as an example:

def publish!
  self.update published_at: Time.now
end

He says DI folks would shiver at hard-coding a call to Time.now, and I’m assuming he thinks they would inject a clock or something to more easily test it. He argues (accurately) that it’s silly to make that more complex just for testing, and you should do something like this:

Time.stub(:now) { Time.new(2012, 12, 24) }
article.publish!
assert_equal 24, article.published_at.day

I do shiver at hard-coding Time in the publish! method, but I also shiver at stubbing global, built-in classes like that…

I prefer to think about Dependency Inversion first, and then Dependency Injection if that’s a way to accomplish it.

So when I see this:

def publish!
  self.update published_at: Time.now
end

I do shiver, because hard-coding like that is well established to be a code smell. Now, a smell does not mean it’s wrong in every case, but it means I should examine this. I’m not even thinking about tests right now, I’m thinking about inverting this dependency. Here’s how I’d do it:

def publish!(time)
  self.update published_at: time
end

I’ve inverted the dependency. No dependency injection framework needed. And no stubbing globals or bringing in complex gems (like Timecop) to test it:

article.publish! Time.new(2012, 12, 24)
assert_equal 24, article.published_at.day

The fact that it’s easier to test wasn’t the goal; it was a side effect of a better design. Now the system is ready, for example, to let users schedule publish dates in the future.

So just because a dynamic language lets us hijack objects to make testing easier, doesn’t mean we should completely ignore good practices. I’d argue that this is better code, regardless of what language I happened to write it in.

And if you miss the old API? Just give it a default:

def publish!(time = Time.now)
  self.update published_at: time
end
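The same inversion reads naturally in Python too. Here is a sketch with a made-up Article class (not from the original post), using a default argument to preserve the old API:

```python
from datetime import datetime

class Article:
    def __init__(self):
        self.published_at = None

    def publish(self, time=None):
        # The caller may supply the publish time; the default keeps
        # the old zero-argument API working.
        self.published_at = time if time is not None else datetime.now()

article = Article()
article.publish(datetime(2012, 12, 24))
print(article.published_at.day)  # 24
```

No stubbing of global classes needed: the test simply passes in whatever time it wants.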

The Only Three Reasons My Unit Test Should Fail

When I practice TDD, my goal is to design a system with a test suite that encourages good design and enables confident refactoring. One of the ways I do that is by keeping in mind when a test should fail. Ideally, I only want to see test failures for one of these three reasons:

  • The unit’s not built yet
  • The unit’s public interface changes
  • The unit’s behavior changes

I consider these rules. If none of them is true, I must not see a test failure. If any of them is true, I must see a test failure.

Reason 1: The unit’s not built yet

Of course a test for a thing that doesn’t exist will fail. But this is a failure that I must see for each test at least once. It is how I know the test is actually testing what I think it’s testing. It’s surprisingly easy to write a test for something after it’s built, but write it in a way that it can never actually fail (and is therefore useless). This is one of the reasons why I prefer to write my tests first.
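A hypothetical example of the trap: the implementation below is deliberately wrong, yet the test passes anyway, because the assertion was never actually made.

```python
import unittest

class Greeter:
    def greet(self, name):
        return "Goodbye %s" % name  # deliberately wrong behavior

class TestGreeter(unittest.TestCase):
    def test_greet_says_hello(self):
        greeter = Greeter()
        result = greeter.greet("Justin")
        expected = "Hello Justin"
        # Bug in the test: the assertEqual(result, expected) call is
        # missing, so this test can never fail. Watching it fail once,
        # before Greeter existed, would have exposed the problem.

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestGreeter)
outcome = unittest.TestResult()
suite.run(outcome)
print(outcome.wasSuccessful())  # True, even though greet is broken
```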

Reason 2: Public Interface Changes

If I am testing an object’s method, and the way I init the object or call the method changes, my test should fail. This seems obvious, but notice that I only want my tests to fail if my public interface changes. I don’t want my tests to depend on private implementation details. This means my tests are a consumer of my internal API. In other words, my tests interact with my code the same way my production code interacts with it. This is what puts good design pressure on my system and is the driving force behind the idea that “code that’s easy to test is easy to write/read/change”. It also keeps my code easier to refactor, which I’ll get to later.

Reason 3: Behavior Changes

If the behavior of my unit changes, my test should fail. This also seems obvious, but again, I want to dig in to what this means for my system.

For example, consider this unit:

class Greeter(object):
  def greet(self, name):
    return "Hello %s" % name

My test might look something like this:

def test_greet_says_hello_to_name(self):
  greeter = Greeter()
  self.assertEqual(greeter.greet("Justin"), "Hello Justin")

If greet changes to write to a display instead of returning a string, this test will fail. That’s good because this is a change in behavior. That change will require a change to any consumer of the unit, and my test is a consumer of that unit just like the production code is.

Confident Refactoring

I use these rules to keep good design pressure on my system and keep things easy to refactor. For example, let’s say we do want to write to a display in that Greeter class. One implementation might look like this:

import sys

class Greeter(object):
  def greet(self, name):
    sys.stdout.write("Hello %s" % name)

I often see code like this tested with a mock in this way:

@patch("sys.stdout")
def test_greet_says_hello_to_name(self, stdout):
  greeter = Greeter()
  greeter.greet("Justin")
  stdout.write.assert_called_once_with("Hello Justin")

But this breaks one of my rules: this test is coupled to a private implementation detail of my unit. There’s nothing in my public interface that conveys my usage of sys.stdout, yet my test knows to mock exactly that module. If that implementation changes, my test will fail, even though my public interface and behavior have not changed.

This makes my code hard to refactor. When I’m refactoring, I only want to look at the implementation. I want to be free to change private details without fear of breaking tests (and therefore breaking production code) as long as I’m not changing the behavior or the public interface. That’s not true here. If I change my usage of sys.stdout to something else, I will get a surprising test failure.

What does this mean? I can’t think of a good way to test this without mocking, but I don’t want to use a mock that breaks a rule, so in this case, I think my test is telling me my design needs improvement. I would choose to fix this by promoting the display object to the public interface. That lets me inject a mock in my test without breaking a rule:

class Greeter(object):
  def __init__(self, display):
    self.display = display

  def greet(self, name):
    self.display.write("Hello %s" % name)

Now my test can still mock the display, but it does it by being a normal consumer of the object, just the same as any production code:

def test_greet_says_hello_to_name(self):
  display = Mock()
  greeter = Greeter(display)
  greeter.greet("Justin")
  display.write.assert_called_once_with("Hello Justin")

Now it is clear that if I change how I print to a display, I will need to update any consumer of the greeter object (including the tests), because the display object is part of the public interface. The rules have led me to a better design that uses the Dependency Inversion Principle.

Only letting my tests use the public interface of my units has given the unit more flexibility outside of my tests, and made it clear to me what is and is not a safe refactoring.