Blaix Blog

To DRY or not to DRY

2017-10-16

Very often, misunderstanding DRY or being over-eager to DRY up your code, combined with inevitably changing requirements, leads to a complex mess that grinds progress to a halt. How did we get here?

Suppose you’re writing some code for a zoo, and at some point you write something like this:

if animal == 'dog' or animal == 'wolf':
    canine = True

You do some stuff with animal and then later end up writing this:

if another_animal == 'dog' or another_animal == 'wolf':
    canine = True

There’s some duplication here. You learned the “Don’t Repeat Yourself” (DRY) principle, so you refactor that code to use an is_canine function. But what triggered this instinct to DRY things up? Well, you noticed some duplication. What was duplicated? Well, the code I guess?
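For reference, that refactor might look something like the sketch below. The CANINES set is a made-up name, and it only holds the two animals mentioned so far.

```python
# Hypothetical constant: the one place that knows which animals are canines.
CANINES = {'dog', 'wolf'}


def is_canine(animal):
    """Return True if the given animal name is a known canine."""
    return animal in CANINES


# Both call sites now ask the same question in the same place.
animal = 'dog'
another_animal = 'wolf'
canine = is_canine(animal)
another_canine = is_canine(another_animal)
```

If the zoo gets a coyote, there is exactly one line to change.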

Now suppose you’re working on the zoo’s web app and you have a request handler for creating animal records. It accepts a request, validates the input, creates an Animal instance, saves it, and responds with a 201 (if created) or 400 (if invalid).

Later you add another request handler for creating tickets. It accepts a request, validates the input, creates a Ticket instance, saves it, and responds with a 201 (if created) or 400 (if invalid). The code for this looks almost identical to the animal request handler. This triggers your DRY instinct again, so you make a create_resource function that accepts request data and the resource class.
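Here is a rough sketch of what that first create_resource could look like. Everything except the names create_resource, Animal, and Ticket is an assumption made up for illustration: the Resource base class, its validate and save methods, and the required fields all stand in for whatever the real app would use.

```python
class Resource:
    """Hypothetical base class standing in for the zoo's model layer."""
    required = ()  # field names that must be present and non-empty

    def __init__(self, **fields):
        self.fields = fields

    @classmethod
    def validate(cls, data):
        return all(data.get(name) for name in cls.required)

    def save(self):
        pass  # a real model would write to a database here


class Animal(Resource):
    required = ('name', 'species')  # assumed fields


class Ticket(Resource):
    required = ('visitor', 'date')  # assumed fields


def create_resource(data, resource_class):
    """Validate, build, and save a resource; return (status, resource)."""
    if not resource_class.validate(data):
        return 400, None
    resource = resource_class(**data)
    resource.save()
    return 201, resource


status, animal = create_resource({'name': 'Rex', 'species': 'wolf'}, Animal)
```

At this point the abstraction looks like a clear win: two handlers, one small function.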

Now it’s so easy to add new handlers that create other types of resources! Until a new requirement comes in: you must check the schedule before creating a ticket so you don’t overbook the zoo. You decide to accept a pre_create callback in create_resource to handle this.

The next requirement is that you must create a FeedingSchedule for each Animal that is created. No problem, a post_create callback can handle that.

But then that has to communicate with the kitchen’s API, which is slow, requiring the animal request to run asynchronously. Maybe an async=True flag? Oh, and tickets shouldn’t actually return a 400 status if the input is bad, they should create an in-progress ticket, and…

Your create_resource function now has a pile of parameters, flags, and callbacks. It’s hard to use and even harder to change. You look back to figure out how this happened: You DRYed up some duplicated code, but then the requirements kept changing and your beautiful and simple abstraction got gross.
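To make the mess concrete, here is one way the function might look after those requirements pile up. This is a sketch under a lot of assumptions: the stub Resource classes, the in_progress_if_invalid flag, and the 409 status are all invented for illustration, and running only the post_create callback in a thread is just one reading of "asynchronously". The flag is spelled run_async because async is a reserved word in Python 3.7 and later.

```python
import threading


class Resource:
    """Hypothetical stand-in for the zoo's model layer."""
    required = ()

    def __init__(self, **fields):
        self.fields = fields

    @classmethod
    def validate(cls, data):
        return all(data.get(name) for name in cls.required)

    def save(self):
        pass  # a real model would write to a database here


class Animal(Resource):
    required = ('name', 'species')  # assumed fields


class Ticket(Resource):
    required = ('visitor', 'date')  # assumed fields


def create_resource(data, resource_class, pre_create=None, post_create=None,
                    run_async=False, in_progress_if_invalid=False):
    """One function that now has to know every resource's special cases."""
    if not resource_class.validate(data):
        if not in_progress_if_invalid:
            return 400, None
        data = dict(data, status='in-progress')  # only tickets want this
    if pre_create is not None and not pre_create(data):
        return 409, None  # hypothetical status for a full schedule
    resource = resource_class(**data)
    resource.save()
    if post_create is not None:
        if run_async:  # only animals want this
            threading.Thread(target=post_create, args=(resource,)).start()
        else:
            post_create(resource)
    return 201, resource
```

Every branch in there exists for exactly one kind of resource, but every caller has to read past all of them.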

How can you avoid this?

The key is to know when to DRY your code. It’s not about duplicated code, it’s about duplicated knowledge. For example, when you wrote is_canine, you created one source of truth for the knowledge of what makes something a canine, which seems reasonable. But when you created create_resource, you unintentionally created one source of truth for the knowledge of how every resource gets created, and not all resources are created the same way.

You will have much greater success with the DRY principle if you stop thinking in terms of reuse, and start thinking in terms of change. Abstractions couple logic, and you should only couple things that change for the same reason. The logic for creating an animal and creating a ticket changed for different reasons, and those changes were painful because their logic was coupled.
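As a sketch of the alternative, you could keep the two handlers separate so each one only changes when its own requirements change. The helper arguments (save, schedule_feeding, has_capacity), the field names, the 'confirmed' status, and the 409 response are all hypothetical; records are plain dicts to keep the example short.

```python
def create_animal(data, save, schedule_feeding):
    """Changes only when animal-care requirements change."""
    if not data.get('name') or not data.get('species'):
        return 400, None
    animal = dict(data)
    save(animal)
    schedule_feeding(animal)  # could hand off to a background job
    return 201, animal


def create_ticket(data, save, has_capacity):
    """Changes only when ticketing requirements change."""
    if not data.get('visitor') or not data.get('date'):
        # Bad input is kept as an in-progress ticket instead of a 400.
        ticket = dict(data, status='in-progress')
        save(ticket)
        return 201, ticket
    if not has_capacity(data['date']):
        return 409, None  # hypothetical status for a full schedule
    ticket = dict(data, status='confirmed')
    save(ticket)
    return 201, ticket
```

Yes, both functions validate, save, and return a status, so some code looks repeated. But the knowledge in each one is different, and neither has to grow a flag when the other's rules change.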

Remember, when deciding if you should apply DRY, it’s not about saving future keystrokes, it’s about keeping one source of truth. It’s not about reuse, it’s about change (like so much in programming).

It’s not about avoiding duplicated code, it’s about avoiding duplicated knowledge.