Diff-Friendly Programming

In continuous deployment environments, changes should be small and diff-friendly. Find out what it means and how to do it.

Nick Galbreath · March 24, 2018

The day-to-day for most programmers isn’t writing code. It’s editing code. Specifically, editing code you didn’t write. And these edits are not typically pages and pages of fresh code. They are small changes here and there. And in the future, someone else will be editing these edits as well. Programming is a team sport.

These changes are often evaluated only in “diff” form (think GitHub pull request), not in the full context of the file. The quality of the diff indicates whether someone needs to do a full review of the entire source file. Thus, making your changes “diff-friendly” matters.

Here are some tips to help make both the project and your changes more diff-friendly. They are designed mainly for continuous deployment environments, but apply to some extent to software products and open source projects as well.

Automatically enforce a coding style

Besides all the other benefits of having a code style, consistently formatted code makes it easier for people to write high-quality changes, leads to easier reviews and audits, and makes merges easier. Having a written style guide is fine, but having tools that reformat code to the house style is even more important. Everyone has been to a code review where 90% of the time is spent on whitespace and formatting. An enforced code style eliminates that.

But what if a file has some horribly formatted legacy code? Format it, commit, push. Then make your changes (and make sure the file is formatted again). Your actual change will be separated from the reformatting change. Don’t worry about losing change history. There are tools that can ignore this type of commit. Alternatively, consider ripping out the single function that needs to change into its own file, or reformatting only the function in question.

Any style is fine as long as it is consistent and automated, with one exception: for PHP, JavaScript, Java, C and C++, never use naked if-statements, that is, an if whose body is not wrapped in braces.

Keep static data structures sorted

Coding style also applies to static data structures in code, and certainly to any data files. Keeping these sorted not only makes them easier to read, it also shows any future editor where a change belongs, and it makes merges easier and more predictable.
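As a minimal sketch of this idea in Python: the `ALLOWED_HOSTS` tuple and the `first_unsorted` helper below are hypothetical names invented for illustration. The data is kept one entry per line in sorted order, and the helper could be called from a test to keep it that way.

```python
# Hypothetical static data: one entry per line, sorted, trailing comma.
# Adding a host is then a one-line diff at a predictable position.
ALLOWED_HOSTS = (
    "api.example.com",
    "cdn.example.com",
    "www.example.com",
)


def first_unsorted(items):
    """Return the first item that is out of order, or None if sorted."""
    for previous, current in zip(items, items[1:]):
        if current < previous:
            return current
    return None


if __name__ == "__main__":
    offender = first_unsorted(ALLOWED_HOSTS)
    print("sorted" if offender is None else f"out of order: {offender}")
```

Running a check like this in the test suite turns "please keep this sorted" from a convention into something enforced automatically.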

This rule also covers import or include statements. Think of them as a data structure of the file. Having them sorted makes merges predictable and prevents crazy diffs, and also lets the reader and editor know what to expect. Sorting need not be 100% alphabetical. It’s common to break them up into groups: system includes, external library includes, and then any local includes. The important thing is to make the order of includes predictable.
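The grouping described above can be sketched as a sort key. This is only an illustration: `SYSTEM_MODULES`, `LOCAL_PREFIX`, and the `myapp` package name are assumptions, and a real project would normally hand this job to its formatter.

```python
# Hypothetical grouping rules for illustration only.
SYSTEM_MODULES = {"json", "os", "sys"}
LOCAL_PREFIX = "myapp"  # assumed name of the project's own package


def import_sort_key(module):
    """Sort by group first (system, external, local), then by name."""
    top_level = module.split(".")[0]
    if top_level in SYSTEM_MODULES:
        group = 0
    elif top_level == LOCAL_PREFIX:
        group = 2
    else:
        group = 1
    return (group, module)


def sort_imports(modules):
    """Return module names in a predictable house order."""
    return sorted(modules, key=import_sort_key)


if __name__ == "__main__":
    for name in sort_imports(["requests", "sys", "myapp.models", "json", "myapp.db"]):
        print(name)
```

With an order like this, two people adding different imports will usually touch different lines, so their changes merge cleanly.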

Put benign changes in a separate commit

Additional logging, debugging output, comments, name changes, or reformatting should go in a separate commit. These commits should be very safe, trivial to audit, and easy to read. Then work on making the real changes. This isolates the important code and focuses attention on the stuff that matters.

Add dead code instead of editing critical live code

For non-trivial changes, consider just cutting and pasting the old function into a new function, and then editing the copy. Then commit this new function with nothing calling it. This dead code is completely safe to push to production. The diff here will be really clear: just an addition of a single block of code. This can be reviewed or committed without anyone needing to use branches.

You can then swap the names of the functions in a separate commit. Other options include using feature flags, A/B testing, ramps or other features to control when the new function will be invoked. In the end, the diff will be one or two lines.

The original function is still available for reference, which is often a lot easier to read than some interwoven diff. After confidence in the new function is established, you can eventually delete the old version.

There is some art to deciding when to do this. The more critical or confusing the function, the more likely it is wise to use a copy.
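Here is a rough sketch of the copy-then-switch approach from this section. Every name in it (`shipping_cost`, `shipping_cost_v2`, `USE_NEW_SHIPPING`, `quote`) is hypothetical, and the module-level flag stands in for whatever feature-flag mechanism a team actually uses.

```python
# All names here are hypothetical, for illustration only.

USE_NEW_SHIPPING = False  # feature flag; flipping it is the one-line diff


def shipping_cost(weight_kg):
    """Original function, left untouched and still serving production."""
    return 5.0 + 2.0 * weight_kg


def shipping_cost_v2(weight_kg):
    """Copy of the original with the edit applied: free shipping under 1 kg.

    Committed first with nothing calling it, so it starts life as dead code.
    """
    if weight_kg < 1.0:
        return 0.0
    return 5.0 + 2.0 * weight_kg


def quote(weight_kg):
    """Call site added in a later commit; the flag picks the implementation."""
    if USE_NEW_SHIPPING:
        return shipping_cost_v2(weight_kg)
    return shipping_cost(weight_kg)
```

Each step is its own small commit: add `shipping_cost_v2`, add the flag check in `quote`, then flip the flag. The old function stays around for side-by-side reading until it is safe to delete.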

The bigger the impact, the smaller the diff

In continuous deployment environments, you want the last commit and push to be the smallest -- it’s the one that actually turns on all the edits or changes you made. Before that you hopefully pushed out all sorts of minor edits, reformats, dead code, and feature-flagged code -- all of which should do nothing in production.

The last commit, be it a config change, a change of condition (if false to if true …), or the change of a feature flag, is ideally one line long. If something goes wrong and your new code doesn’t behave, anyone should be able to undo it. They won’t even need to be a git master, since they should be able to review the changes that went to production, see the one-liner, and undo the change by manually typing it out if need be. In an emergency you don’t want to be googling how to cherry-pick a commit or attempting a complicated unmerge.
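To make the "one line long" goal concrete, here is a small sketch that uses Python's standard `difflib` module to list what a final commit changes. The config contents and the `new_checkout_enabled` setting are made up for illustration.

```python
import difflib

# Hypothetical config file contents before and after the final commit.
BEFORE = """\
cache_ttl = 300
new_checkout_enabled = false
workers = 8
"""
AFTER = BEFORE.replace("new_checkout_enabled = false", "new_checkout_enabled = true")


def changed_lines(old, new):
    """Return (removed, added) lines between two versions of a text."""
    diff = list(difflib.ndiff(old.splitlines(), new.splitlines()))
    # ndiff prefixes removed lines with "- " and added lines with "+ ".
    removed = [line[2:] for line in diff if line.startswith("- ")]
    added = [line[2:] for line in diff if line.startswith("+ ")]
    return removed, added


if __name__ == "__main__":
    removed, added = changed_lines(BEFORE, AFTER)
    print("removed:", removed)
    print("added:  ", added)
```

One line out, one line in. Undoing it means typing the old line back, which is the property you want when something breaks in production.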

Small diffs for open source projects

For open source projects on something like GitHub, the equivalent of “push to production” is having your pull request accepted and merged. The rules above still apply, but you have to look at them a bit differently. The odds of your pull request getting accepted decrease the longer the diff is. Rarely does the repo owner have time to evaluate a complicated change. The trick is to make the smallest, cleanest diff that still advances your goal. This may mean chopping your change into smaller independent chunks and making separate pull requests or tickets.

Conclusion

As a long-time advocate for continuous deployment, I’ve often struggled to explain how the process of coding changes in this environment. While “small changes, more often” describes what happens, it’s of little use to a developer trying to figure out how to actually do it. Hopefully “diff-friendly” will provide some guidance.