January 29, 2020

Refactoring as Programming Philosophy

The subtitle on the cover of Martin Fowler's book Refactoring (second edition 2019; first published 1999) implies that refactoring is about "improving the design of existing code". Further inside, in the preface, he says refactoring is "the process of changing a software system in a way that does not alter the external behavior of the code yet improves its internal structure". A couple of paragraphs later he provides the motivation: "with refactoring we can take a bad, even chaotic, design and rework it into well structured code". He argues that as we build a system, we simultaneously learn how to improve its design. We can reshape the code toward the improved design using a catalog of reusable refactoring patterns. Each refactoring pattern is a code change pattern that is "almost simplistic". The point is that these are safe steps. For example, we move the body of a case statement into a function and call the function from the case statement. Or we move code that is part of a method into its own method, or we move code snippets up or down the call stack, adjusting how responsibilities are isolated.
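
As a minimal sketch of the case-statement example, here is that Extract Function step in Python. An if/elif chain stands in for a classic case statement, and the event dictionary, the "refund" branch, and the 3% fee are hypothetical, chosen only to give the branch a body worth extracting.

```python
# Before: the hypothetical "refund" branch does its work inline.
def handle_before(event: dict) -> str:
    kind = event["kind"]
    if kind == "refund":
        amount = event["amount"]
        fee = round(amount * 0.03, 2)
        return f"refund {amount - fee:.2f} (fee {fee:.2f})"
    elif kind == "ping":
        return "pong"
    return "unknown"


# After: the branch body is extracted into a named function...
def process_refund(event: dict) -> str:
    """Same logic that used to live inline in the refund branch."""
    amount = event["amount"]
    fee = round(amount * 0.03, 2)
    return f"refund {amount - fee:.2f} (fee {fee:.2f})"


# ...and the branch simply calls it.
def handle_after(event: dict) -> str:
    kind = event["kind"]
    if kind == "refund":
        return process_refund(event)
    elif kind == "ping":
        return "pong"
    return "unknown"
```

Both versions return the same result for every input, which is what makes the step safe, and the extracted function now has a name that says what the branch does.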

From my perspective, however, there is an extra issue. My definition of "refactoring" is any alteration of existing code. If a subsystem of any type (be it a complete server or just a library) is working correctly and doesn't require alterations, then refactoring is not an issue. Making things pretty, or even less chaotic, for no reason is not the goal of refactoring. But if code has to change, you are refactoring as well as adding code.

Fowler mentions "not altering external behavior"... so why would we alter the internal mechanism? The answer is so that we can subsequently alter behavior. A better design means more flexibility, preparing the code for future changes that the current design blocks. A better design is better prepared for refactoring, and better prepared to cooperate with future alterations required not by the code base but by the users of the code base. So it's best to remember that refactoring is ultimately about altering external behavior, especially when arguing for applying the principle.

It's quite possible that the reason it's hard to argue with the business side of software development for time to refactor is that we have not made clear the external behavior benefits a software system will display when continually refactored in key areas. Software grown through refactoring gets more powerful as it ages, instead of becoming cumbersome and decaying.

Change is continuous in software. Any exceptions are not our concern; all the systems we work on are undergoing change. Software developers are by definition dealing with systems that are being changed. One reason to "refactor" is that if you do not consciously refactor, the code will naturally deteriorate, by the law of entropy as applied to information, and bit rot will result. Change is unavoidable, but navigating change with refactoring plans allows us to direct that change so it's not just wandering. If you are touching code, you are changing it and affecting the systems that rely on it, so you should keep the project's refactoring plan in mind. Thus you should always choose to incrementally improve that code at an opportune moment, which is always the moment you find yourself changing it.

The lean startup approach of first building an MVP, and then working in sprints with Scrum, using Kanban, or following some other agile methodology, is essentially equivalent to thinking of software development as constant refactoring. The waterfall method was based on the attempt to consider all factors up front, prior to implementation, which is impossible. Thus the initial design creates the initial factors, which are subsequently refactored.

Even so-called "greenfield" work includes refactoring: first there will be some proof-of-concept code and a pre-MVP core, and subsequently those will be refactored to meet the needs of the MVP. Fowler's definition, including not changing external behavior, still applies; it's just important to realize that pure refactoring is followed by making use of the new abilities of the system, so in practice both are often thought of as the same task. In this case, it's better to think of the change as a refactoring that adds behavior than as an issue that requires changes to support. Refactoring means the design gets better, but a "change to support" something new is where hacks come from. Refactoring requires thinking about not only the current change, but the changes after that.

  • refactoring as a development philosophy
    • Good design should make it safe to refactor.
    • The purpose of testing is to make it safer to refactor.
    • Don't refactor accidentally, do it intentionally.
    • Over time, even over short periods, there are compounding time savings.
    • Over medium periods, there are exponential gains in configurability
      and system ability.

Healthy Code

Personally, I don't like the term "code smell"; it seems mean-spirited toward code that must be of value somehow, or else it would have been thrown out. Also, "smell" implies that you can sense something bad but don't know exactly where it's coming from, or that it's rotten. Really, if you see a bad, dangerous, or misused idiom at play, like a case statement a thousand lines long, it's simply a bad idiom. It's analogous to network or power cables that are tangled beyond counting. It's not a smell, it's a mess. Instead, I like to say that my favorite type of software is software that works, and a working mess is both a problem and an opportunity.

When you have a bunch of professionals complaining about how terrible and chaotic a system is, it can only be because some business is using it for some vital business process, so it's indispensable. It has some mission-critical purpose that in itself proves there is something worthwhile in there. Software engineers should respect that as an empirical fact. That also means that, with refactoring, it can be conserved and eventually embedded, its flaws removed, so that it coexists as a good citizen within a well-designed software system.

Target Designs

  • refactoring is not a design
    • you need a target design
    • you use "backward compatibility" techniques to create your new design
      such that at any point in time it coexists with existing code
    • coexistence ensures legacy features are not refactored away as
      happens with a "rewrite"
    • often, refactoring patterns are reversible because sometimes it's the
      specific target design that defines which direction is "better"
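
To illustrate coexistence, here is a minimal sketch in Python, assuming a hypothetical display-name function that is being moved to a new implementation. The public entry point never changes, the legacy and target implementations live side by side, a hypothetical flag chooses which answer callers get, and any difference between the two is recorded instead of silently shipped. Flipping the flag back is the reverse step, which is what keeps the move reversible.

```python
# Existing behavior: kept intact so everything that depends on it keeps working.
def legacy_display_name(first: str, last: str) -> str:
    return last.upper() + ", " + first.strip()


# Target-design implementation, written alongside the legacy code rather than over it.
def new_display_name(first: str, last: str) -> str:
    return f"{last.strip().upper()}, {first.strip()}"


# Hypothetical switch: stays False until the new path has proven itself.
USE_NEW_IMPLEMENTATION = False
mismatches: list[tuple[str, str, str, str]] = []


def display_name(first: str, last: str) -> str:
    """Stable entry point: callers never change while the implementation underneath does."""
    old = legacy_display_name(first, last)
    new = new_display_name(first, last)
    if new != old:
        # Record the difference for review instead of failing or silently switching.
        mismatches.append((first, last, old, new))
    return new if USE_NEW_IMPLEMENTATION else old
```

In this sketch the new version also strips whitespace from the last name, so an input like " smith" lands in `mismatches`; that is exactly the kind of behavioral difference to understand before flipping the flag.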

To make progress via refactoring you have to have an idea of where you want to go. Examples of refactoring in Fowler's book often show both directions of a refactoring step; his simplistic steps are often bidirectional code transformation primitives.

Below are some less-than-simplistic steps from my own experience.

Refactoring No Brainers

To talk about refactoring that is specific to the target design of your code base would require a whole reference project, like the one provided in Fowler's book. For the purposes of this blog I will merely mention some patterns from my own experience that immediately call for refactoring. I respect disagreement on any of these, just as I disagree with several of Fowler's. They are mostly not machine-executable transformations, however, because they require the goals and motivation of the engineer. Collecting some code in a function together and moving it to another function is simple enough to be done carefully by a human, but a machine lacks the semantic power to choose which code needs to be collected, what to call the new function, or where to put it.

You have cut-and-paste code throughout your code base

You need to make the copied code into a shared function.

  1. Review the cut-and-paste code fragment wherever you know it's used. You can use variable names to grep for examples. Compare the fragments in detail across a variety of examples to get an idea of how closely they conform to each other and to the most general idea in the code. Often pasted code will have been modified or interspersed with other code for reasons that might be subtle. Some versions will have attempted to be general but proved not general enough, and someone made a special version for themselves. That version may be more general, or more specific with the scars of attempted generality. It's always better to make the copied code more general and then use it for the new special purpose, because then the improvement might be backported to the previous page.
  2. To make your general case function, choose the most general case you find and write the function by modifying it. If it's an extreme mess, and none are general enough, create your own generalization and write it from scratch with the (generalized) characteristics of the complete set of similar code.
  3. With your starter code in a new function, compare it to the other copies, generalizing to accommodate the second version.
  4. Add arguments as needed to pass variables that had been available in the scope of the other function. It's tempting to combine arguments when you find the copy has a similar variable. Don't; fix that later. If the starter function had goodDate and the other has niceDate and they seem the same, wait: they are likely not handled exactly the same, and you don't know the dependencies on that difference. Here is where we are thinking, "don't change behavior". Keeping the two variables in the same function, each with its own handling, will not affect behavior. Normalizing their use might.
  5. Repeat with other fragments, and when they are different, support the difference with an additional argument.
  6. As you go, replace each incorporated copy with the new generalized function. Confirm the software works and passes its tests, but also do targeted manual tests with small scripts that exercise the generalized function and ensure its answers are the same as the original copy's.
  7. Look at the generalized argument list. It might be a mess, with overlapping arguments (e.g. "state" and "stateCapitalized" and "stateCode" and "stateId" and so on) and it's not clear which are interchangeable or merely overlap.
  8. Find the generalizations for this data. For example, if you have a canonical "stateId", then the generalized function can take that, and look up the rest. If niceDate and goodDate are mostly similar, they can be replaced with displayDate serving the purpose of both.
  9. If you want callers of the original copies not to have to change, so they can keep their unique argument lists or names, you can create thin thunk functions that continue to accept those arguments and call the new generalized function. In fact, you can set this up at the beginning if you want to make it easy to swap functions around while you work, ensuring it's easy to move backward if the generalization fails.
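
The steps above can be sketched in Python with two hypothetical pasted copies of a header-formatting function, reusing the goodDate and niceDate names from step 4 (hence the camelCase). It is an illustration under those assumptions, not a recipe: the generalized function keeps each date argument with its original handling (step 4), supports the uppercase difference with an extra argument (step 5), keeps the old names as thin wrappers (step 9), and checks the wrappers against the original copies as step 6 suggests.

```python
from datetime import date
from typing import Optional


# Two hypothetical pasted copies that have drifted apart.
def report_header_copy(title: str, goodDate: date) -> str:
    return f"{title} - {goodDate.strftime('%Y-%m-%d')}"


def invoice_header_copy(title: str, niceDate: date) -> str:
    return f"{title.upper()} - {niceDate.strftime('%d %b %Y')}"


# Generalized function: each difference is supported by an argument; nothing is normalized yet.
def format_header(title: str, goodDate: Optional[date] = None,
                  niceDate: Optional[date] = None, upper: bool = False) -> str:
    shown = title.upper() if upper else title
    # goodDate and niceDate keep their original, separate handling (step 4).
    if goodDate is not None:
        return f"{shown} - {goodDate.strftime('%Y-%m-%d')}"
    if niceDate is not None:
        return f"{shown} - {niceDate.strftime('%d %b %Y')}"
    return shown


# Thin wrappers keep the old names and argument lists, so callers don't change (step 9).
def report_header(title: str, goodDate: date) -> str:
    return format_header(title, goodDate=goodDate)


def invoice_header(title: str, niceDate: date) -> str:
    return format_header(title, niceDate=niceDate, upper=True)


# Targeted check (step 6): the wrappers must answer exactly as the original copies did.
for sample_title, sample_day in [("Q1 report", date(2020, 1, 29)), ("acme", date(1999, 7, 4))]:
    assert report_header(sample_title, sample_day) == report_header_copy(sample_title, sample_day)
    assert invoice_header(sample_title, sample_day) == invoice_header_copy(sample_title, sample_day)
```

Only once this check holds everywhere would step 8 replace goodDate and niceDate with a single displayDate argument.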

Using named arguments for functions is very useful in this regard, since ordering doesn't matter, allowing parts of the code to ignore changes to the arguments. This means that new arguments added for generalization won't interfere with existing calls to the function or with other code within the function.
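
A minimal Python sketch of this point, using a hypothetical `render_label` helper. Python's keyword-only parameters (everything after the bare `*`) force callers to name each argument, so argument order stops mattering, and a parameter added later with a default that reproduces the old behavior leaves existing calls untouched.

```python
def render_label(*, text: str, width: int = 20, fill: str = " ") -> str:
    # `fill` stands for an argument added later for generalization; its default
    # reproduces the old behavior, so calls that never mention it are unaffected.
    return text.ljust(width, fill)


old_style_call = render_label(text="total", width=8)       # written before `fill` existed
reordered_call = render_label(width=8, text="total")       # order does not matter
new_style_call = render_label(text="total", width=8, fill=".")
```

The trade-off is that positional calls such as `render_label("total")` are rejected with a `TypeError`, which is usually what you want while arguments are still being added and reshaped.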

You have a 1000 line function

  1. Group the lines of code within the function: all the lines that refer to the same variable should be bunched together. If the language allows it, put the variable declaration at the top of these lines.
  2. Move that code to a function, decide what its responsibility is, and name it appropriately.
  3. Organize the new functions into subsystems of isolated responsibility.
  4. As the function is collected this way, you will start to have functions that call the first round of code, and these go into a library of a higher layer relative to the first round of functions.
  5. The perfect stopping point is when the function reads like an engineer explaining the function's job to another engineer.
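
A real 1000-line function won't fit here, so this is a deliberately tiny, hypothetical stand-in in Python. In the "before" version, the lines updating `total`, `count`, and `largest` are interleaved in one loop; the "after" version bunches each concern into its own named function, and the top-level function reads like the explanation one engineer would give another. Both versions are kept so their answers can be compared while you work.

```python
# Before (abbreviated, hypothetical): three concerns interleaved line by line.
def summarize_orders_before(orders: list[dict]) -> str:
    total = 0.0
    count = 0
    largest = 0.0
    for o in orders:
        total += o["price"] * o["qty"]
        count += 1
        if o["price"] * o["qty"] > largest:
            largest = o["price"] * o["qty"]
    return f"{count} orders, total {total:.2f}, largest {largest:.2f}"


# After: the lines for each variable are collected into a named function.
def order_value(order: dict) -> float:
    return order["price"] * order["qty"]


def order_total(orders: list[dict]) -> float:
    return sum(order_value(o) for o in orders)


def largest_order(orders: list[dict]) -> float:
    return max((order_value(o) for o in orders), default=0.0)


def format_summary(count: int, total: float, largest: float) -> str:
    return f"{count} orders, total {total:.2f}, largest {largest:.2f}"


def summarize_orders(orders: list[dict]) -> str:
    # Reads as: count the orders, total them, find the largest, and format the summary.
    return format_summary(len(orders), order_total(orders), largest_order(orders))
```

In a real code base, `order_value` would likely end up in a lower-layer library, with the summary functions in the layer above it.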

You have a 1000 line nested switch statement

  1. Create an abstraction for a command
  2. Create an abstraction for a command executor
  3. Create an abstraction for a command context
  4. The end result is all cases encapsulated in command objects that accept command contexts
  5. The end result is the switch replaced by a command executor call

What a giant case statement generally takes advantage of is the global context, meaning whatever happens to be in scope. That state should be placed in the command context, and the commands will now get their inputs from the context. The commands will have names or ids that are stored in a command executor configuration. The switch statement is replaced by a command executor call, new commands can easily be added, and the commands can be executed by alternate executors, which is often useful.
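
A minimal Python sketch of this refactoring, assuming a hypothetical switch that once handled "deposit" and "withdraw" against a global balance. The names `CommandContext`, `CommandExecutor`, and `TracingExecutor` are illustrative, not from any library. The context carries the state the switch used to reach for globally, each former case body becomes a command object, the executor looks commands up by name in a configuration dictionary, and the tracing executor runs the same commands as an alternate executor.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class CommandContext:
    """State the switch used to read from its surrounding (often global) scope."""
    balance: float = 0.0
    log: list[str] = field(default_factory=list)


class Command(Protocol):
    def execute(self, ctx: CommandContext, arg: float) -> None: ...


# Each former case body becomes a small command object.
class Deposit:
    def execute(self, ctx: CommandContext, arg: float) -> None:
        ctx.balance += arg


class Withdraw:
    def execute(self, ctx: CommandContext, arg: float) -> None:
        if arg > ctx.balance:
            ctx.log.append(f"refused withdrawal of {arg}")
        else:
            ctx.balance -= arg


class CommandExecutor:
    """Looks commands up by name in its configuration instead of switching on them."""

    def __init__(self, commands: dict[str, Command]) -> None:
        self.commands = commands

    def run(self, name: str, ctx: CommandContext, arg: float) -> None:
        command = self.commands.get(name)
        if command is None:
            ctx.log.append(f"unknown command {name!r}")  # what the default case used to do
            return
        command.execute(ctx, arg)


class TracingExecutor(CommandExecutor):
    """An alternate executor: same commands and configuration, plus a trace of each call."""

    def run(self, name: str, ctx: CommandContext, arg: float) -> None:
        ctx.log.append(f"run {name} {arg}")
        super().run(name, ctx, arg)


# The old switch call site becomes a single executor call.
config = {"deposit": Deposit(), "withdraw": Withdraw()}
ctx = CommandContext()
CommandExecutor(config).run("deposit", ctx, 50.0)
CommandExecutor(config).run("withdraw", ctx, 80.0)  # exceeds the balance, so refused and logged
```

Adding a command in this sketch means adding a class and a configuration entry; no existing case has to be edited.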

Good design patterns and Refactoring

  • in general good design is that which is easy to refactor
    • OOD encapsulates so that solutions can be replaced
    • OOD encapsulates so that multiple solutions can be retained and swapped in/out
    • refactoring is enabled by code comprehensibility
    • breaking things into units of responsibility makes things easy to refactor
    • testing makes it easy to refactor, if unintended consequences are detected (regression testing)
  • modern methodology, Agile, Lean, Kanban, Scrum... are partly about refactoring or assume it
    • making an MVP is getting a starting point to subsequently refactor
    • Sprints are interpreting all work as time-boxed refactoring
    • Kanban is tracking refactoring patches as delta tickets
    • All agile sees development as iterative in nature, where you have a "continuous" development methodology, trying to be as granular as possible.
  • If you start with something running and end with something running, you refactored.

To really explain refactoring, though, I want to hark back to before it was a word, when we had alternate words for refactoring. For one, we didn't admit to refactoring or have a single word for it. We "worked" on code, and sometimes, if it became unworkable, we "rewrote" it or "started over".

My introduction to the term, but not the concept, of "refactoring" was from the book Refactoring: Improving the Design of Existing Code, by Martin Fowler, in 1999. I might have heard the term earlier, I think I had, but the book cemented it all together. I had loved object-oriented design, not as a purist but because it allowed me to swap code in and out. It was inherently like a plug-in system, not a run-time one but one for developers, with classes and object instances working together to achieve desired behavior. The part of object-oriented design I liked happened to make refactoring easier.

What I and others were calling iterative development, which we now call agile or something closely related, is all about assuming you are refactoring. You achieve an MVP, and after that you run sprints of refactoring. Even new features are interpreted as refactoring, because the system needs refactoring to accommodate the new feature, if it really is new.

Refactoring steps are doable chunks of change: the code works at the start, even if considered non-optimal or buggy, and it works when you're done, even if considered far from finished. Code that is easy to change is easy to test. Code that's easy to change is easy to understand. Granted, these are tautologies in that if you don't make code easy to understand, it'll be hard to change, and if you don't make it easy to test, it'll be hard to change. The point is these things go together. Learning how to make code easy to change teaches us how to make code easy to understand and easy to test, and the points all reinforce each other.

What's interesting about Martin Fowler's book is that it has many examples of code problems you often see, and how to refactor them. In the decades since reading it, my entire programming approach has been shaped by refactoring techniques. In that time, even greenfield work has had to work with old systems or maintain some sort of backward compatibility. Refactoring is key.

I am always refactoring. Even greenfield work is refactoring. I start with scripts to prove concepts; I refactor those into a script calling reusable libraries, then into classes using those libraries, then into a command-line tool that performs services (first and especially those useful in development), and then into servers and client software.

What's neat about refactoring is getting used to turning the kinds of things you run into, the ones that might haunt and afflict you, into what you want them to be. You can't do it right away, but just that knowledge reveals how to isolate your new code from the horrors of the code debt surrounding it. It's not a magic wand, but it is possible to refactor code as you work so that your job working with it slowly gets better.
