<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Coder What Codes At Midnight]]></title><description><![CDATA[Code and Data Design]]></description><link>http://blog.novem.technology/</link><image><url>http://blog.novem.technology/favicon.png</url><title>Coder What Codes At Midnight</title><link>http://blog.novem.technology/</link></image><generator>Ghost 3.42</generator><lastBuildDate>Sat, 04 Apr 2026 10:00:56 GMT</lastBuildDate><atom:link href="http://blog.novem.technology/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[The Maturity Sequence]]></title><description><![CDATA[<p>From my point of view the meta-philosophy of agile is <em>iterative development</em> and relies on the definition of <em>capability and maturity</em> provided by the Carnegie Mellon <a href="https://en.wikipedia.org/wiki/Capability_Maturity_Model">Capability Maturity Model</a>. Scrum and Kanban provide a workshop full of tools and we choose certain ceremonies, processes and guidelines as our tools for</p>]]></description><link>http://blog.novem.technology/the-maturity-sequence/</link><guid isPermaLink="false">62d612cf720ebe00019c9e1f</guid><dc:creator><![CDATA[Craig B Allen Jr]]></dc:creator><pubDate>Sat, 29 Oct 2022 17:43:18 GMT</pubDate><content:encoded><![CDATA[<p>From my point of view the meta-philosophy of agile is <em>iterative development</em> and relies on the definition of <em>capability and maturity</em> provided by the Carnegie Mellon <a href="https://en.wikipedia.org/wiki/Capability_Maturity_Model">Capability Maturity Model</a>. 
Scrum and Kanban provide a workshop full of tools, and we choose certain ceremonies, processes and guidelines as our tools for building a workflow that fits the problem domain and specific problem.</p><p>This model has five levels of maturity and capability. Successful software can be made at any level, but it's more likely with a more mature software process. What I personally add to this is the concept that the level a software project <em>should</em> have is related to its actual maturity. In other words, one is not seeking maximum maturity on day one; that's developed iteratively. Less mature levels are appropriate for less mature code. We apply our full set of practices incrementally, over time. </p><p>The levels are as follows:</p><ol><li>Initial - Otherwise known as heroic, where problems are daringly resolved, hopefully, through long-hour antics and firefighting incidents. A mature system should not be at level one, but a research or demonstration project may be.</li><li>Repeatable - The programmer's practices, such as installing software or giving a demonstration, are repeatable. This implies some notes and instructions, especially assuming repeatability by a team. However, some things are still only in the brains of the engineers working on the system. Some practices are ad hoc, the way a particular team naturally interacts, and will not translate forward easily as the team grows or transitions.</li><li>Defined - The repeatable processes are fully defined. In this day and age this definition is used to automate repeatable processes, such as through CI/CD, but it also covers processes like sprints, right-sizing them, team size, creation, and breadth.</li><li>Capable - At this level there are metrics for the repeatability of all processes, from technical things like system deployment in the cloud, to point capacity per sprint, and any other sprint metrics. 
This allows the team to say accurately how long a particular project will take and to schedule it in calendar time.</li><li>Efficient - All systems are continually refined according to metrics. These metrics will also be continually refined in terms of end results, like the popularity of the system with users, or costs. The metrics themselves thus become a focus of refinement, as they now can drive the rest of the development process, but they cannot be relied on forever. Over time the limitations of some metrics become clear, or the terrain changes, requiring different metrics.</li></ol><p>When starting a project, levels 1 and 2 can come more or less simultaneously by documenting things, always... in other words, by keeping notes and publishing them. Comments in code can be the first notes; then that can be moved to READMEs, then also to wikis and other document repos. When installing a system, always make notes, even the crappiest notes, so that next time you can read them. Seeing what they lack creates a better next draft. After a few times the instructions for the repeated activity will be complete. If the process is inherently complex, making the notes and documentation complex, this indicates that you should change the system to make it easier to document.</p><p>This initial level is appropriate to proofs of concept, prototypes, and initial development for libraries and tools, done by a single person. Level two projects emerge as projects are shared with other developers, the notes are published, and the system can be reliably deployed and developed.</p><!--kg-card-begin: html--><div style="width:100%; position: relative">
<img style="width:70%; margin: 0 15%" src="http://blog.novem.technology/content/images/2022/07/devMainSequence.png">
</div><!--kg-card-end: html--><p>Level 3 should exist no later than the beginning of a team development effort, especially since we have tools such as kanban issue systems, sprint workflow tools, and the like, and what such tools provide is specifically <em>level 3 definition</em>. They are places to execute the defined actions, guiding all improvements and problem-solving effort.</p><p>The fact is, a functional application, such as an internal tool or small site, can operate sustainably at level 3, but will falter when a major problem arises, which it eventually will. But some organizations will prefer to wait until that time, and then do a spike and rebuild their system, and that effort can succeed at level 3 because of the freedom to just write new code and abandon old.</p><p>Serious software and enterprise software (even smaller scale) must operate at Level 4. Without the measuring, when the "major problem" arises, mission-critical software failure is likely; at scale this cannot be solved with a spike in a reasonable time period, the system is too complex to rewrite greenfield, and there would be no way to measure the results of the new effort against the last. Level 4 must be in place long before it's needed.</p><p>If at Level 4, you should absolutely begin to act at Level 5. The metrics are there for you to have a basis against which to improve, or measure degradation as the software increases in capability. 
Ultimately there is little reason to have metrics unless you use them, and if you don't improve the metrics themselves, your gaze is fixed and isn't taking into account the changing environment that leads to the major problems we are trying to avoid.</p><p>Thanks for reading.</p><!--kg-card-begin: html--><small style="font-variant:small-caps">Note: This article is subject to further editing</small><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Novem Computing Ontology]]></title><description><![CDATA[<p>I consider myself very adaptable to many varied practices used for development and workflow management. My primary concern in a local development culture is that it's internally consistent, coherent, and realistic. There are countless such systems that are valid in themselves but not compatible with each other. To mix two</p>]]></description><link>http://blog.novem.technology/novem-computing-ontology/</link><guid isPermaLink="false">6172e048e89fcb0001d1c9e9</guid><dc:creator><![CDATA[Craig B Allen Jr]]></dc:creator><pubDate>Sat, 30 Oct 2021 21:28:00 GMT</pubDate><content:encoded><![CDATA[<p>I consider myself very adaptable to many varied practices used for development and workflow management. My primary concern in a local development culture is that it's internally consistent, coherent, and realistic. There are countless such systems that are valid in themselves but not compatible with each other. Mixing two incompatible approaches, ones that do not cohere under a compatible set of assumptions, is exactly what it means to be incoherent. This means the value of a practice from one well-vetted workflow cannot be relied upon when used out of the context of that workflow. </p><p>In practice, each system of practices in a workflow will obviously have its strengths and weaknesses. Only systems with well-known catastrophic weaknesses, such as "waterfall", are thought of as invalid. 
Centralized designs catastrophically fail at scaling. However, the flaws of waterfall and centralization can be mitigated, so removing some of the practices and creating new practices that address those weaknesses is never off the table. Some special projects and some particular cases may lead us to seek some of their practices for a particular project.</p><p>Let me point out specifically what the reader may have already noticed: I am mixing engineering management issues with engineering architecture issues. This is intentional, because the two are intimately related. The engineering management practices used to develop software leave their mark inside the software, and lead to many of its characteristics, laid down in sedimentary fashion in the source code. Nevertheless, as I continue, I will be focusing more and more on the analytic engineering abstractions used to understand particular software systems, leaving you to relate them back to workflow design.</p><p>I analyze all software systems using a set of abstractions into which I decompose them. Another truism: we all have no choice but to do this, but what is the set of abstractions? This truism applies to any existing systems and subsystems, be they libraries, processes, classes, or executables, and to any system I will build. I do something similar with workflow methodologies. For example, I decompose a team's particular use of Jira, or other work tracking software, into my own project tracking methods. I map it into my system, but I don't <em>change </em>it. My own system is just a set of elements into which I can break down the other system, like analyzing water and finding its hydrogen and oxygen, related by covalent bonds.</p><p>I recognize, and the reader should consider, that the system being decomposed into elements will have richness that is not obviously represented directly in the decomposition. The behavior of water is not obvious from the behaviors of hydrogen and oxygen on their own. 
Their relationship transforms their behavior into a very different observable behavior when connected in a system. The decomposition is just an additional way of understanding it, translating it into more fundamental laws of computation. They allow doing local work in the system.</p><p>My fundamental system is unopinionated, like organic chemistry, allowing all possible system behavior, but it is very reductive. A particular system <em>always</em> <em>is opinionated</em>: it is specifically itself, and makes decisions about which ingredients to include and what relationships to coax them into. A cake and a jam can both be understood in terms of organic chemistry and seen as particular implementations of it. If an organic chemist encounters the most amazing food they've ever had, having never encountered anything like it, they can be amazed how such a thing is possible. When they take that food to their lab and decompose it, they will nevertheless discover the basic elements of organic chemistry, or it would not be edible. They will discover the enzymes, emulsions, and particular chemicals. This increases their understanding of the particular system that surprised them with new systematic behavior. </p><p>You can describe baking a cake in terms of chemistry and emulsions; the "opinionated" part is what makes the result a cake instead of an industrial solvent. Decomposing a problem domain into my own model helps explain the problem domain for me, clarifying any inherent complexities. In my model there are general-purpose software artifacts that can be applied at this elemental level to problems which at the higher level may seem opaque or unsolvable due to assumptions upon which the system relies. </p><p>In workflow, I maintain my model for what's going on in the software in parallel with any workflow system like Jira. Jira and the like are <em>communication software </em>for having a conversation about workflow. 
I'm certain good managers and engineers maintain their own models of project status throughout a project. If they didn't, they wouldn't be able to improve the workflow they implemented in Jira, or define it in the first place.</p><!--kg-card-begin: html--><div style="margin:1em; padding:.5em; float:right; width:45%;border: solid gray 1px ">
    The word <i>ontology</i> names the set of all things referenced by a system. <b><i>For example</i></b>, the ontology of mathematics is numbers, the number line, the arithmetic operations, and so on. The ontology of physics is math plus the measurable quantities: space, mass, and time. The ontology of space is a coordinate system, of which there are many complete examples (Cartesian, polar, logarithmic), all sharing dimensionality.
</div><!--kg-card-end: html--><p>For the programmer or logician of any sort, abstractions are like tools in a workshop. They are used to break down ideas into parts, or combine them into systems. There are an infinite number of ways to equip a workshop, but there are patterns. The spectrum of tools you'll likely need or find helpful for a type of task bears relationships to all possible tools. A wood shop for making furniture will bear many similarities to other furniture workshops. It will even bear noticeable relationships to a workshop for making dollhouse furniture. Woodworking shops bear some similarities to a metal machine shop. Japanese gardening tools and European gardening tools cover the same spectrum of tasks; some are interchangeable with each other, but some are special. </p><p>The abstractions below are the primary tools of my workshop. Of course I have a lot of more specialized tools, including fantastic software created by thousands of fellow programmers. But all these other tools are decomposable into these principles and can be built from them. I think of all particular or specialized tools (i.e. opinionated software) in terms of how they can be decomposed into the following elements.</p><h2 id="unit-of-computation-and-flow-of-control">Unit of Computation and Flow of Control</h2><p>The unit of computation is the <strong>function</strong>. </p><!--kg-card-begin: html--><div style="margin:1em; padding:.5em; float:right; width:45%;border: solid gray 1px ">
    <p>
    By "function" I actually mean the general case, a <i>routine</i>. The <i>coroutine</i> is the most general case of a routine. The ordinary function in most classical programming languages is a special case called a <i>subroutine</i>. A subroutine has one entry point taking input data, and one exit point returning output data. Of course, internally the code can branch, and return from different points with different data, but a subroutine can return data exactly once, and accept data exactly once.
    </p>
    <p>
        In the <i>general case</i> of the <i>coroutine</i>, a coroutine can return a <i>series of values</i> via <i>yielding</i>. After each yield, the caller can re-enter the coroutine at the point of the yield, where it will continue running. New data can be sent in during this re-entry, and new output data can be returned, until the routine decides it is done or the caller stops re-entering. Communication between the caller and callee is ongoing; the callee is co-operating with the caller, thus, it's a co-routine. The relationship between caller and coroutine is that of highly efficient single-threaded "cooperative" multitasking.
        </p>
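As a concrete illustration (the accumulator name and the numbers here are my own, purely illustrative), a JavaScript generator function can act as such a coroutine: it yields values out, and the caller can send values back in on each re-entry:

```javascript
// A coroutine: yields a running total, and accepts new values on re-entry.
function* accumulator(start) {
  let total = start;
  while (true) {
    // Yield the current total; whatever the caller passes to next()
    // becomes the value of this yield expression on re-entry.
    const incoming = yield total;
    total += incoming;
  }
}

const acc = accumulator(10);
console.log(acc.next().value);  // 10: runs up to the first yield
console.log(acc.next(5).value); // 15: re-enters with 5 sent in
console.log(acc.next(3).value); // 18
```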
    <p>
    Coroutines are about cooperative multitasking in a single thread in which the caller is a controller (aka "a control loop") and the callee is a worker. The controller is not necessarily a loop, technically, but it will generally use a loop to call the co-routine, since it is called repeatedly before finishing.
        </p>
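A minimal sketch of that controller/worker relationship, with illustrative names (this is not a standard API, just a shape):

```javascript
// The worker is a coroutine performing the domain logic, step by step.
function* worker(items) {
  for (const item of items) {
    yield `processed ${item}`;
  }
}

// The controller owns the loop: it re-enters the worker until it is done.
function controller(routine) {
  const results = [];
  let step = routine.next();
  while (!step.done) {
    results.push(step.value);
    step = routine.next(); // re-enter where the worker left off
  }
  return results;
}

console.log(controller(worker(["a", "b"]))); // [ 'processed a', 'processed b' ]
```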
    <p>
         In the case where the called routine does not handle data being sent in when re-entering the routine, the routine is called a <i>generator</i> and can be used as an iterator, such as in a for..in loop. The <i>generator</i> yields a series of values, but accepts exactly one input. Note that async/await in JavaScript and asyncio in Python were originally implemented with generators. The syntax in asyncio is syntactic sugar to yield control back to an event loop that can schedule the function call for the future, and await that resolution in the calling scope.
    </p>
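For example, here is a generator used purely as an iterator; the loop drives it but never sends data back in (countdown is just an illustrative name):

```javascript
// A generator: yields a series of values, but takes input only once, at creation.
function* countdown(n) {
  while (n > 0) {
    yield n;
    n -= 1;
  }
}

const seen = [];
for (const value of countdown(3)) {
  seen.push(value); // for..of calls next() for us, with nothing sent in
}
console.log(seen); // [ 3, 2, 1 ]
```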
    <p>
         Having said that, the subroutine, or traditional function, is good enough to think of as the base abstraction conceptually, since coroutines have special coupling with their controllers, another subject. My model relies on the interchangeability of messages and functions, which is most easily imagined with subroutines/functions because of the looser coupling between caller and callee: message passing (input message/output message) happens exactly once in and once out. That is, the regular function signature, taken as a message schema or vice versa, is universally transmittable, as a network message for example. The signature of controller/coroutine interaction always adds some particular protocol of interaction: not only the data schema passed, as with a subroutine, but also acceptable vs unacceptable orderings and the types of control messages, which together form what is necessarily a communication protocol. 
    </p>
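To sketch that interchangeability (the dispatch function, registry, and message format here are hypothetical, invented for illustration), a subroutine's signature can be treated directly as a message schema:

```javascript
// One-in, one-out: a subroutine call rewritten as message passing.
function add(a, b) {
  return a + b;
}

const handlers = { add }; // hypothetical registry of callable routines

// The "message" is plain serializable data, so it could just as well
// have arrived over a network as from a local caller.
function dispatch(message) {
  const { fn, args } = JSON.parse(message);     // input message, exactly once
  return JSON.stringify(handlers[fn](...args)); // output message, exactly once
}

console.log(dispatch('{"fn":"add","args":[2,3]}')); // 5
```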
    
</div><!--kg-card-end: html--><p>To execute the function, we hand it the <strong>flow of control</strong>. That of course gives the function control, and the flow of control flows downward in the function, and when sub-functions are called, deeper in the stack. The currently running function is the <em>control </em>to the function it calls, and its caller is its controller. Having said this, in practice some routines specialize in controlling their called functions, and are called <em>controllers </em>in the design. Other routines specialize in performing the system logic (i.e. data reduction and business logic), and these are called <em>workers</em> and <em>services</em>.</p><p>The flow within the function <em>could </em>be entirely sequential, it <em>could </em>branch, it <em>could loop</em>, and so on, skipping around the source code. Nevertheless this is all powered by a <em>downward flow </em>through the source code. The body of the function is executed from top to bottom, from caller into callee, deeper down through the callee, and eventually back to the caller to move downward again. Embedded in the flow of control in the body of some of the functions are the <em>computations</em>. The computations are the explicit <strong>data transformations </strong>that take place. Functions always transform data. To return nothing is to transform your input into nothing.</p><p>Lest you think this is a philosophy purely from functional programming, no, it's merely noting that other abstractions, like "classes", can also be thought of as made of functions. I am not a functional programming purist, merely a functional realist. Object-oriented class declarations declare a set of functions. A class declaration provides syntactic sugar for easily bundling functions together in a tiny library, and for sharing state. 
The shared state can be freely and multiply instantiated, allowing multiple alternate states to be handled simultaneously, and provides a way to pass this library as a single reference, supporting things like injection of complex dependencies. That's super useful, and it is a set of functions coupled together. Often, a regular old standard function library keeps state too; the pattern does not require OO syntax, it's just a lot easier and better looking. A class is just useful syntactic sugar from the compiler that makes this miniature library, the object instance, easier to manipulate and easier to write.</p><p><strong>Decomposing things into functions.</strong></p><p>A function always owns exactly one flow of control, which to the function is like its flow of consciousness. The flow is always at one particular point of the source code of the function at any given moment in time. That is analogous to <em>what the function is thinking </em>at a particular moment. It will share that flow of control with other functions, no doubt, the other functions it calls. Even the lowest level functions still are made of function calls, system calls, or, inside the kernel, device driver calls. Even though at some point a function turns into voltages and a circuit, from the programmer's point of view it's functions <em>all the way down</em>.</p><p>A process is always <em>started </em>by calling a single function. A thread is created <em>by calling a function</em> in a way that gives it a new flow of control of its own. As the initial callee, it's now "controller" for everything that it calls, and everything called by the functions it calls, everything beneath it in the stack. In principle, all these subfunctions are "under its control". However, it has temporarily given up control, giving it to the callee. Though it will eventually return to its caller, for now, it decides when to return or just loop forever, and its caller is dormant. 
It can do things to never return to the controller, such as call process exit, or throw an exception which <em>may</em> skip the caller on the way up the stack. Note: in most, if not all, languages and systems the caller always has ways to intercept such attempts to never return control, such as catching the exceptions or similarly intercepting a process exit event.</p><p>All programmers will know what I'm saying, but probably have not narrated it this way to themselves. I hope you understand my perspective and can see how it fits with or against your own way of decomposing programming. For example, you may not want to consider the caller technically "dormant" because maybe that's a poor synonym for "paused". Perhaps you see functions as properly objects conceptually, so that for you everything decomposes at a lower level into objects. I'm eager to hear such feedback. A large part of my dedication to my own decomposition techniques is the pragmatic value it has generated for me in my work, allowing me to understand anything throughout the stack of tools in distributed systems.</p><p>When a message is sent to another server, the sender is <em>calling </em>the receiving routine that will process that message. There is always a routine that accepts that message, after all. Generally it will be a routine that knows how to dispatch the message further. Obviously there may be many functions handling many messages. Somewhere after delivery is a function that has essentially been called by the caller in the other process. The caller and callee still share understanding of the input data schema, but what they don't share is a common flow of control. The sender does not have to be paused/dormant/blocked. </p><h2 id="loose-vs-tight-coupling">Loose vs Tight Coupling</h2><p>Functions have <em>callers</em>. The relationship between the caller and the function is a <em>coupling</em>. The coupling can be <em>loose </em>or <em>tight</em>. A loose coupling is easier to deal with. 
There is give and take in it. In some cases a tight coupling is needed, where two systems each have an orthogonal purpose, but still cooperate closely to implement the features of a complete system. Tight coupling can be more computationally efficient and more memory efficient, that is, <em>optimized</em>. It's fair to say that when possible we prefer loose couplings, for they can more easily be made to degrade gracefully. </p><p>There are two factors contributing to the tightness of the coupling: <strong>shared data schema</strong> and <strong>shared flow of control</strong>.</p><p>Functions calling each other in a single thread are tightly coupled in flow of control. The advantage of this coupling is they know they are not fighting each other to access data, and so no mutual exclusion systems are needed. Things run efficiently with zero race-condition collisions between the functions. Also, it's unavoidable that many functions are run with the same control flow and from the same scope, as functions are made of function calls.</p><p>A function sending a message to a function in another thread is more loosely coupled to it. Because of the intermediary messaging system, little is known about the other function's flow of control. It may be in another thread or even another process. It could even be executed later in the same thread, such as with async functions. It may be broken up into a number of messages that fan out and are handled by multiple functions across multiple processes, coupling the caller indirectly to multiple callees simultaneously. </p><p>The caller and callee, wherever the callee runs, are still directly coupled by the <em>shared data schema </em>of the data passed between them. Both <em>must </em>understand the data schema with some degree of compatibility. If they both have to have <em>the exact same understanding about the whole body of data passed</em>, that would be more <em>tightly</em> coupled. 
If each only had to understand small parts of the data passed, the functions are more <em>loosely</em> coupled. An example of the partial schema would be a caller that passes a whole object, such as a user object, to a function that only understands one part of the schema, such as where in the schema the user's name is located. Thus the callee doesn't care about the rest of the schema, just so long as the user's name is in the expected location.</p><p>A good example of the tightest possible data schema would be two functions communicating by a unique binary protocol. In cases like this a list of long integers stored as bytes may not be compatible when the same code is run on different processors with different byte ordering.</p><p>In general, the more one function has to know about its caller's or callee's data schema, the more tightly they are coupled by data schema. For example, if a calling function must call specific other functions to prepare the data to pass to function G, that's a tighter coupling than if it can just call G without preparation, given the data it started with. Suppose function A calls function B and B calls C with the same input A gave it. This couples A to C, but ironically, A and C might be only very loosely coupled to B! If B's job is passing data from A to C, then all it has to know is enough to pass the data, which even in the binary data example means a pointer and a length.</p><p>You can empirically test how tightly coupled functions are, in a real system, by trying to change one of the functions, and seeing how many other functions <em>have </em>to change with it. Therefore, a general concept of coupling is to ask, "if I change function G, is it likely I will have to change functions that call G?" and "if I change function G, is it likely I will have to change functions that G calls?". The more functions that have to be changed along with G, the more tightly coupled the functions are. 
</p><p>Another way of looking at the same issues is to ask "how easy is it to refactor this set of functions?" To the first few orders of accuracy, <em>the looser the coupling, the easier it is to refactor the whole set.</em> However, taken to an extreme, looser coupling will not actually make refactoring easier. This is because the loosest coupling evolves toward highly configurable systems, which are useful in reducing direct coupling. This can become overgeneralization, which introduces configuration complexity and can in fact make refactoring difficult through too much indirection. This simply reminds us that in spite of "preferring" loose coupling, what is actually desired is optimal coupling. The preference for "loose coupling" comes historically from the era of mostly-only tight coupling, which was inevitable before general data encodings were available.</p><p>The ideal design suggested by this interpretation is one in which functions perform well-isolated responsibilities, and thus don't have to change because another function changes. They are ideally plugged into the system in such a way that changes that don't change their responsibility don't affect them. If adding a new feature, only callers that want to use the new feature have to change. If changing how a feature is accomplished (or fixing a bug), then callers don't have to change how they call the function. </p><p>Optimized systems often use tight coupling to solve complex problems and present loose couplings to the rest of the system (e.g. to other processes in a distributed system). A point I'm trying to make clear is that there are potential reasons to do any conceivable type of coupling, via control flow and data schema planning, given a problem domain where it's advantageous. 
But at the same time, there is a general case for the vast majority of software in which an array of mostly loose couplings is used between functions regardless of whether they are subroutines, in the same thread, in another thread, or in another process on another machine.</p><p>Every software system is a set of choices made about functions and how they are coupled by data and flow-of-control sharing. For me all the answers to questions about the swarm of needed functions and their optimal coupling come from the problem domain itself. Certain patterns are called for, and the problem domain is what calls for them.</p><!--kg-card-begin: html--><small style="font-variant:small-caps">Note: This article is subject to further editing</small><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Javascript Coroutines]]></title><description><![CDATA[How Coroutines work in javascript, with a working simplified demonstration of how to set up a nested system for executing coroutines sequentially for data processing pipelines.]]></description><link>http://blog.novem.technology/javascript-coroutines/</link><guid isPermaLink="false">5e6d4181e89fcb0001d1c974</guid><category><![CDATA[Programming]]></category><dc:creator><![CDATA[Craig B Allen Jr]]></dc:creator><pubDate>Tue, 10 Mar 2020 00:47:00 GMT</pubDate><media:content url="http://blog.novem.technology/content/images/2019/08/horizontalCoroutine.gif" medium="image"/><content:encoded><![CDATA[<img src="http://blog.novem.technology/content/images/2019/08/horizontalCoroutine.gif" alt="Javascript Coroutines"><p><strong><em>"Subroutines are special cases of ... coroutines." -- Donald Knuth.</em></strong></p><p><a href="#novem-coroutine-approach"><em>tldr; version with the example is below</em></a></p><p>Recently I was adding a minimalist "recipe system" to an existing code base with the intention of refactoring it into a full system over time. 
A "recipe system" is a data reduction system driven by a "recipe", which is a series of steps. It's a concept, as far as I know, first envisaged at UKIRT in Perl by Paul Hirst et al. It was further developed at Gemini Observatory, adapted to a pyramid of python coroutines which filled the role of the recipe steps. I'll go into that pattern more in another post.</p><p>In this case I just needed to execute a list of coroutines, in JavaScript. Think of the steps on the list as executing business logic or data reduction; as coroutines they are functions that can yield values, like returning a value, but also be re-entered where they left off. </p><p>These steps are atomic units of interest to the problem domain. They can be written single-mindedly since they are mostly decoupled from their control system. The control system, in contrast, is pure software artifact, problem-domain neutral, while all problem-domain behaviors are within the steps. To execute the list I needed to write a parent <strong>coroutine</strong>. </p><p>This routine provides the foundation of a control loop, since coroutines are presented to their callers as functions returning iterators. The iterator will iterate once each time the coroutine it's calling calls <strong>yield</strong>. A coroutine can thus be written like an ordinary function but still cooperate with a control loop.</p><p>The calling function will always be a custom control loop. The body of the loop has control over continued execution. It can trap all errors. It can inspect output data and augment input data. It is able to perform services in order to offload them from the routines themselves, such as database modifications, to further purify the steps as atomic units concerned only with a specific step of relevance purely from the problem domain. 
It will obtain the recipe's initial input and dispatch the output as appropriate.</p><p>I was adding this directly into the existing code base because I've used generators and coroutines for a long time in python. I have also used them in javascript. I had a good idea of how I wanted it to work. However, the added issue of async functions and passing around promises got me confused, and I went in circles for a bit. I suffered cyclic off-by-one paralysis.</p><p>I did what I always prefer to do in such a situation; I stepped back and made a small stand-alone demonstration for myself, outside the real world code base I was working on. The demonstration is minimal, as demonstrations should be, with just the parts needed to demonstrate the structure of the design. It's a small software tool, meant to help the developer visualize concepts in isolation, so they can be better visualized when deployed in concert. </p><p>In this case that is just the required two-level nesting of coroutines. It's often true that code written for the demonstration is subsequently helpful in pieces, if not in the whole. The clarity of isolation a demonstration project provides can also provide the base interfaces, which will be clear in practice. Also, such demonstrations can evolve over time as the design concept is enhanced and forks in the road are taken. In the long run they become the ideal implementation of the design originally demonstrated, and demonstrate whether that is actually a good thing. </p><p>These types of projects demonstrate useful patterns to team members and are to be encouraged. Time should be put aside for them. In this case I put my own time aside, rather than my client's. </p><!--kg-card-begin: markdown--><div style="margin:1em; padding:.5em; float:right; width:45%;border: solid gray 1px ">
<h4 id="asyncawait">async/await</h4>
<p>I've found the simplest way to understand async and await is that:</p>
<ol>
<li><em>async</em> functions return Promises. So the <code>return retVal;</code> statements in a function marked <em>async</em> really return a Promise which, when resolved, becomes <code>retVal</code>. A promise is an object promising to eventually emit a return value.</li>
<li><em>await</em> is a keyword that can be used inside of <em>async</em> functions to wait on promises. Thus <code>await someAsyncFooFunction(arg)</code> will essentially pause at the given point in the function until the promise resolves. It also means you can choose <em>not</em> to await an async function, and instead pass the return value around as a promise, resolving it whenever you like.</li>
</ol>
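<p>A minimal sketch of both points (the function names here are hypothetical):</p>
<pre><code class="language-javascript">async function fetchAnswer() {
    return 42; // really returns a Promise that resolves to 42
}

async function main() {
    const p = fetchAnswer(); // p is a Promise, not 42
    const answer = await p;  // pauses here until the Promise resolves
    console.log(answer);     // logs 42
}

main();
</code></pre>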
<p>The whole scheme is to allow us to write asynchronous code that looks like ordinary functions, and let the caller control when functions block or when they can run concurrently. Since the entire model is single-threaded, mutual exclusion is not needed, yet real concurrency still happens because other work can proceed while I/O is pending. The low level functions don't have to cooperate with the schemes of their control systems.</p>
<p>Notably this means try/catch blocks can be used, an essential thing. I would say that the use of await and async obviates the desirability of most Promise handling. However, there are certain cases where it's very convenient, for example to start execution of a number of functions and awaiting <code>Promise.all(..)</code> of them to avoid sequentializing I/O.</p>
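<p>For example, a sketch of that pattern (<code>delay</code> is a hypothetical stand-in for real I/O):</p>
<pre><code class="language-javascript">function delay(ms, value) {
    return new Promise(function (resolve) {
        setTimeout(function () { resolve(value); }, ms);
    });
}

async function fetchBoth() {
    const a = delay(50, "a"); // both Promises are started immediately...
    const b = delay(50, "b");
    return await Promise.all([a, b]); // ...so the waits overlap: ~50ms total, not ~100ms
}
</code></pre>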
</div><!--kg-card-end: markdown--><p>As I said, in addition to coroutines, the system also uses the async/await syntax. It's worth noting that async functions are implemented under the hood as generators, and are a great example of what generators can do, and how linguistic sugar like the async/await keywords can expose how expressive the concept is. At the time the coroutine itself was required to be synchronous by node. This was slightly unfortunate. (Now in 2021 async generators are supported in node and all browsers).</p><p>Given that, the stack will be like this: </p><ol><li>An <em>async control loop</em> function calls...</li><li>a <em>synchronous coroutine</em> that calls...</li><li>an <em>async "primitive" function</em>.</li></ol><h3 id="brief-on-terminology">Brief on Terminology</h3><ul><li>the list of functions is a <em>recipe</em></li><li>the functions in the list are <em>primitives</em></li><li>the <em>control loop</em> is the caller iterating over the coroutine</li><li>the input or output data is the <em>document; </em>think of it as a complex data type, a JSON-serializable pure object.</li><li>for a bit more on <em>coroutines</em>, see Wikipedia contrasting them with subroutines <a href="https://en.wikipedia.org/wiki/Coroutine#Comparison_with_subroutines">here</a></li></ul><h3 id="the-goal">The Goal</h3><p>I wanted the ability to call an <em>async</em> function (the <em>run</em> action) which would loop over a coroutine (the <em>steps</em> action) that itself iterates over a sequence of functions (the <em>primitive</em> functions). The coroutine is used to weave the infrastructure into the execution of the pipeline while also isolating the primitives from the details of the execution context. The primitives just look like basic functions, input and return value with statements in between. 
The yield statement just allows returning values to the caller early; from the point of view of the primitive function performing the "step", execution seems to continue right away afterward.</p><p>This allows the primitives to be<em> lightly constrained functions</em>. They <em>only </em>worry about the data transformation they perform. This is good isolation of concern in general but is also especially useful in data pipelines since scientists and data science programmers, who will be programming the <em>primitives</em> and authoring <em>recipes </em>in principle, don't have to know the underlying execution system. They save time not knowing it, and that also enables evolving the control system and having more than one control system for different execution environments. </p><p>A data scientist writing an advanced data processing step might very likely not be skilled in the very different kind of infrastructural programming that is heavy with software artifacts from a whole different domain of abstractions. Their expertise is the discrete mathematics and physics involved in the step itself. In the business domain it's similar: the person that can implement a specified step in a business process is not particularly likely to find the abstraction of the control system easy to comprehend, even if it's kept minimal and with few lines of code. In fact, the few lines of code can make it very fragile, whereas a business step can often be conceived of and written by even a junior programmer, since it will have been specified in advance, and just be a simple function (albeit with yields). </p><p>Even when the same people write both the primitives and the control system, it's better to separate expertise with the control system from expertise in data transformation logic. 
They're simply two different domains that do not want to be married, but rather business partners.</p><p>I wrote two versions, one in which the primitives were regular functions, and another in which they are <em>async</em> functions. I had the impression maybe the promises were adding to my confusion, so I excluded them on the first pass of my demonstration. That turned out not to be my problem; the only change required was that the control loop awaits the primitive function. The problem itself was just getting the ideas straight. </p><p>In this blog I'll just go through the version with async primitives. The source code for the demonstration is at the bottom.</p><h3 id="operation-of-the-control-loop">Operation of the Control Loop</h3><p>The coroutine needs to pass an <em>input document </em>to a <em>primitive </em>which will return a Promise. The Promise, when resolved, will provide the <em>output document</em>. In the demonstration I just pass the output received from one step into the next step as input (by passing it as the argument to <code>next(..)</code>), but in real systems the control loop will also perform other actions. These can include a wide variety of services, such as validating the document, saving it, normalizing it, annotating it with workflow information, providing debug output to logging systems, interpreting the document as an instruction, or performing any of a wide array of other appropriate services before the next step of the recipe.</p><p>This makes a recipe system built this way very good at integrating external systems into the data reduction process. </p><h3 id="description-of-coroutines-and-generators-for-a-coder">Description of Coroutines and Generators for a Coder</h3><h4 id="generators">Generators</h4><p>A <strong>generator</strong> is a type of function that <em>yields</em> return values instead of just <em>returning</em> them. 
The difference is that a <em>return</em> statement exits the function whereas a <em>yield </em>statement merely <em>pauses</em> it. Execution can re-enter the function from the paused point, and the function keeps running from there. What?!</p><p>The function scope is saved, preserving all local variables, and the controller (whoever called the function) takes over after the <em>yield</em>, until re-entering the routine. Obviously there has to be some syntax for re-entering just as there is a syntax for <em>yield</em>ing, and it's possible to imagine a few.</p><p>Instead of calling the generator once, like a regular function ("subroutine"), you have to somehow call it multiple times, once for every time it yields. The caller also has to know when the coroutine is done yielding in order to stop reentering it. Fortunately, languages have long had an idiom for this, iteration, and convenient syntax for iterating, such as the for loop. </p><p>Classically one could easily iterate over an array, but using an iterator object abstracts this. Any series of values can be yielded: linked lists, tree traversals, or a sequence of function calls. Indeed, the latter is the most general case, and generators are functions that, using in-language syntax, are packaged as an iterator compatible with all other types of iterators from the point of view of the runtime engine.</p><p>To the caller a generator looks like a function returning an iterator object. Iterator objects can always be repeatedly called until done using a <code>for</code> loop (<code>for .. in</code> in python and <code>for .. of</code> in javascript). The function return value behaves just as an iter<em>able</em> does, whether a built-in array or list iterator or a custom iterator object such as a database query iterator. </p><p>Therefore you simply iterate over the generator in a loop. The generator call goes where an array variable (etc.) would go (e.g. 
<code>for i of arrayVar</code> becomes <code>for i of arrayGen()</code>). This is the first really compelling thing about generators, at least it's why I started using them. You can replace a giant array variable with a small and efficient generator that "generates" the values that would be in the array. For example, at one time the common idiom in python (<code>for i in range(1000)</code>) would create an actual list of 1000 integers, the return value of the <code>range(..)</code> function. That's a lot of memory for no good reason. It's tolerable for small ranges but becomes totally unworkable for millions. It's annoying at all sizes, although the syntax would otherwise be elegant, what they call pythonic in the python world. The use of generators allowed this desirable syntax without that cost. The lazy replacement (originally <code>xrange(..)</code>, but eventually this behavior replaced the <code>range(..)</code> function itself) can create the values one at a time as they are needed. A range of 1000000 is no less efficient than a range of 10.</p><p>Another particularly useful application is the use of generators in database queries where there may be a large number of rows or documents returned. A generator allows you to hide all paging and caching logic in the iterator itself, while the calling loop is able to conceive of the query as returning results one at a time. This makes the code that's processing documents (or 'rows') blissfully ignorant about where the "rows" are coming from, or even if any batching and caching is going on. Coroutines are really amazing for isolating special purpose code, business routines, from control systems which are general purpose. </p><p>Outside the generator (to the caller) it looks like any other continuous iteration. Each iteration of the loop runs the generator function from the last <em>yield </em>to the next one.</p><pre><code class="language-javascript">for (yieldedVal of generatorFunction(arguments)) {
  // do something or break to abandon generator
}
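
// For example (a sketch of the python range discussion above, here in
// javascript): a generator standing in for a large array of integers.
function *range(n) {
    let i = 0;
    while (i !== n) { yield i; i += 1; } // values created one at a time
}
let total = 0;
for (const i of range(5)) { total += i; } // total becomes 10; no array is built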
</code></pre><h4 id="coroutines">Coroutines</h4><p>In the case of a <strong>coroutine</strong>, in addition to yielding values outward, upward in the stack, to the control loop, it has a way to <em>receive</em> values. In both javascript and python this is accomplished by the <em>yield</em> statement itself acting as an <em>rvalue</em> which returns a value to an <em>lvalue</em>, specifically the variable being set within the coroutine.</p><p>Unlike with a generator there isn't a built in language syntax for this, and we have to use the iterator as an object and call its <em>next</em> function, a part of the given language's iterator interface. Thus:</p><ul><li><code>next(injectedValue)</code> is called by the control loop which is driving the coroutine.</li><li>an <code>injectedVal = yield outputDoc</code> statement inside the coroutine receives the value. The coroutine variable, <em>injectedVal,</em> ends up being set to the control loop's argument to <code>next(..)</code>, <em>injectedValue</em>. </li></ul><p>The variable <code>outputDoc</code> is what the previous call to <code>next(..)</code> returned. This is a source of confusion. The routine executes <code>yield someVal</code> and pauses. When re-entered it resumes first by accepting the injected value, the "return value" of the yield, and applying it as appropriate, e.g. setting a variable.</p><p>As mentioned above, the "for loop" syntax has no way to send in these new values. Instead, use of the coroutine has to be done in two steps, instantiating vs iterating. We are doing manually what the "for loop" does under the hood, calling the iterator's next method explicitly.</p><ol><li><strong>instantiate</strong>: you call the coroutine as a function to instantiate it, and the function returns the iterator. The coroutine is automatically in the yield state at the top of the function. 
</li><li><strong>iterate</strong>: you repeatedly call the instantiated object's <code>.next(..)</code> member, usually in a loop. The end of the function is communicated the way any iterator's is, with some sort of end-of-iteration return value or exception.</li></ol><p>Each call to <code>.next(nextVal)</code> returns whatever value the coroutine <em>yield</em>s. The yield inside the coroutine "accepts" whatever the argument is to the <em>subsequent</em> <code>.next(..)</code> call. This little exchange is one of the things that originally made coroutines confusing from my point of view. Even when I think I've got them understood, I can inevitably find myself getting off by one. The yield statement pauses the code, and the return value is really whatever the control loop sends as an argument to <code>next(..)</code>. It should be thought of as a message.</p><!--kg-card-begin: markdown--><div style="margin:1em; padding:.5em; float:right; width:45%;border: solid gray 1px ">
<h4 id="lvalueandrvalue"><em>lvalue</em> and <em>rvalue</em></h4>
<p>To put this simply for purposes of this blog, which is to say grossly simplified:</p>
<ul>
<li>an <em>lvalue</em> is something that can go on the <em>left</em> side of an equals statement</li>
<li>an <em>rvalue</em> is something that can go on the <em>right</em> side of an equals statement</li>
<li>e.g. <code>lVariable = rStatement</code></li>
</ul>
</div><!--kg-card-end: markdown--><h3 id="why-it-s-odd">Why it's odd</h3><p>In the coroutine you have:</p><pre><code class="language-javascript">const injectedVal = yield yieldedValue;
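
// In context, a runnable sketch: the argument to the *second* next()
// is what injectedVal becomes; the first next() only runs to the yield.
function *co() {
    const injectedVal = yield "first yield";
    console.log("received:", injectedVal); // logs: received: hello
}
const it = co();
it.next();        // runs to the yield; returns { value: "first yield", done: false }
it.next("hello"); // resumes the coroutine; injectedVal is now "hello"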
</code></pre><p>The <code>injectedVal</code> can be anything the controlling loop sends in as an argument to <code>next(...)</code>. Strictly speaking it bears no particular relationship to the <code>yieldedValue</code>. This is a weird thing to me. What I'm used to with a statement in any language is that the <em>lvalue</em> bears a <em>direct </em>relationship to the variables, constants and arguments in the <em>rvalue</em>. It would be odd if <code>parseDate(dateStr)</code> returned a value of no relation to the <code>dateStr</code> passed in. But <code>yieldedValue</code> is just a value the coroutine is sending to its caller. And <code>injectedVal</code> is just a value the caller is sending to the coroutine. Coroutines cooperate in cooperative multitasking, which is why they are "co"-routines.</p><p>Normally <code>y = x+2</code> means that <code>y</code> now has a relationship to <code>x</code>. It's a transformation of <code>x</code> by 2. Deep in the mind, I think I usually consider the "return value" as <em>made from </em>the "argument", and I suppose most programmers do as well. I mention it now because this was one of the things that confused me in this case while adding async to the equation. </p><p>When this <em>is</em> the case, the function is a "pure function", but even impure functions generally emit a return value related somehow to the arguments sent in.</p><p>With coroutines there is absolutely a message exchange going on, between the controlling function and the coroutine. The coroutine cooperates with cooperative multitasking by saying, at a minimum, when is a good time for it to yield. Beyond that it is sending and accepting "messages" with yield statements. It passes the control flow back to the caller, and the caller passes control flow back to it. </p><p>The <code>yieldedValue</code> was <em>sent</em> to the calling routine. 
The calling routine <em>sent</em> the coroutine the <code>injectedVal</code> when it wanted it to run again, thus controlling it. Design-wise, the yield statement is a handoff point between two separate routines that are cooperatively multitasking, thus "co"-routines. The control routine can stop early, but the coroutine has authority over when it's done. </p><h3 id="step-one-the-coroutine-object-instance">Step One, The Coroutine Object Instance</h3><p>To instantiate the coroutine object:</p><pre><code class="language-javascript">const cor = coroutine(inDoc);
</code></pre><p>At this point the body of <code>coroutine</code> <em>has not run at all</em>. Slightly confusing is the fact that the arguments to the coroutine have, however, been added to the coroutine's scope (so arguments that are function calls, for instance, will have been called). That's all that's happened. It's as if calling the function has executed only the creation of the stack and the opening brace, pausing before the first statement, with the function arguments declared but not yet used. The coroutine will only start to actually execute when <code>next(..)</code> is called for the first time.</p><h3 id="extra-next-extra-yield">Extra <code>next()</code>, extra <code>yield</code>?</h3><p>Most instructions on making a coroutine in both python and javascript suggest that you call <code>next()</code> immediately after instantiating the coroutine. This acts as a sort of initiation that "starts up" the coroutine. This use of next is outside the actual use of the coroutine as an iterator. This means an extra yield in the coroutine, acting as something like an initialization ack (acknowledgement). To me, since a generator 'generates' a 'virtual' array in my mind, it also feels like having a phantom member at element -1 in the abstract array the coroutine generates. I don't prefer it.</p><p>It's worth understanding the issue; for example, Harold Cooper's <a href="https://x.st/javascript-coroutines/">blog post</a> on coroutines is still quite interesting and useful although it suggests using a wrapper function that does the above. This wrapper both instantiates the coroutine and makes an initial next call with no arguments. Absolutely no offense is intended to Harold Cooper; as I say, the suggestion is standard and his blog is nicely informative, so I once again suggest reading it. I reference it as proof that I didn't make the issue up just to solve it :)</p><p>Here's Harold's coroutine wrapper. 
Notice of course it's a function that is called 'coroutine' which "starts" the argument "f". The function "f" is the actual coroutine.</p><pre><code class="language-javascript">function coroutine(f) {
    var o = f(); // instantiate the coroutine
    o.next(); // execute until the first yield
    return function(x) {
        o.next(x);
    }
}
</code></pre><p>Rather than this I want to call the coroutine with every <code>next()</code> being called exactly the same way inside the iterative part of the execution loop, as done with the generator. I don't want to use the iteration function "next" in a special way. To do so imposes a constraint on the generator that it "begins" with an initial <em>yield</em> statement, and anything before the yield statement has special status as to what must or must not go there.</p><p>I want the coroutine entirely compatible with a generator, because the only difference is a generator needs only an initial argument, not a series of them. If a control loop doesn't care about the values being emitted, it can use it just like a generator since a generator is just a coroutine that's not accepting other non-initial arguments. </p><p>I also want to call the primitive only with data it needs to do its job, not a communication ack. </p><p>In my version each <code>next(..)</code> call will always send in the <em>current document</em> and each <code>yield</code> statement will return the document in its possibly intermediary states. In general one will also have a way to pass out requests for the control system to perform a service before the primitive is resumed. This is how you offload tasks to the control loop, keeping the two loosely coupled, so the control system controls what databases or other services are used, and how, while business logic in routines specifies only that this data ought to be saved, or emailed, etc., at a high level. For the demonstration it is a "document" which is passed in and out, giving the control loop and the caller the chance to manipulate the document. An example use could be the control loop performing validations, logging, or sending messages to clients regarding the status of processing.</p><p>Most explanations of coroutines in Python are similar to the above. 
I think the apparent need for the extra <code>next()</code> call is like the natural occurrence of the off-by-one error, coming from the structure of the tool mixed with our prior habits. It's why there is no year 0 AD but there <em>is </em>year zero in <a href="https://en.wikipedia.org/wiki/ISO_8601">ISO 8601:2004</a>. There are always various solutions available. </p><p>The cost of this co-operation is that while it simplifies writing the routines, the control system needs to handle the baton handoff of messages clearly and as expected. Changes to the control loop can cause changes in behavior, obviously. For a likely example, if you have a loop <em>inside </em>the coroutine, you end up with a very different idiom if your yield is at the end of the loop rather than the beginning, because you, the coroutine, might be expecting the control system to do something for you, and it changes when that is. </p><p>It can always be sorted out, but it can be confusing. A yield at the start of the loop body is one step removed from a yield at the bottom of the body; what it means to "finish" changes by one. This subtly changes the relationship between the calling code and the coroutine. Thus the structure of the control loop should be known and fixed for any coroutines written to be executed by it. Done correctly it will be clear when the control loop performs services, validation or logging.</p><h3 id="novem-coroutine-approach">Novem Coroutine Approach</h3><p>The idiom demonstrated here allows the caller to be a control loop with no initiation of the loop prior to the iteration. The coroutine function in my demonstration is not a wrapper but the actual coroutine:</p><pre><code class="language-javascript">function *coroutine(inputDoc)
{
    let doc = inputDoc;
    for (let i = 0; i &lt; 6; i++)
    {
        doc = yield primitive(doc);
    }
}
</code></pre><p><strong>Note</strong>: the '<strong>*</strong>' before the function name is how you declare a generator/coroutine in JavaScript.</p><p><em><strong>Simulation</strong></em><strong>:</strong><em> </em>Above I'm simulating the "sequence" of functions from a list that I mentioned as <em>the recipe</em> with a loop repeatedly calling the same function. This is a simplification that doesn't affect the demonstration because it's structurally equivalent to the loop over a list. Leaving it out saves me some standard plumbing the deployed system will need since there is no need for more than one routine. The target app where I started this will have a list in this place, and also a wrapping object to help the control system handle the routine. Using this idiom, all the caller ever communicates to the coroutine is input documents and all it receives back is output documents.</p><p>In this pattern iteration immediately follows construction with no handshake; no "extra" <code>next()</code> call is used to get the coroutine "started".</p><h3 id="why-use-a-coroutine-for-this">Why use a coroutine for this?</h3><p>If you give it some thought, you will see in this simple example, already, the type of flexibility the control loop achieves in this arrangement. The primitive transformations in the leaf coroutine do not care if they are called using a for loop. They don't care if the control loop dynamically modifies context. They don't care if any other primitive will be or has been called. They don't care if they finish, even though it's up to them to decide when to finish in principle. The primitives don't even care if they are being called on the same machine as other primitives. In principle, they might even be resumed on an entirely different machine, though that's rarely economical so far. In practice, a primitive lives within a single thread, for the lifetime of the recipe, sharing memory with the rest of the thread. 
</p><p>On the control side, the infrastructure doesn't care what primitives do with data. It doesn't care if it's business information being "processed" or scientific data being "reduced" or anything else about domain logic. All it cares about is that the data going in and out is recognizable enough to identify, handle, and store it by itself. In this demonstration, that means they're javascript objects. In the target application, it means they are readily JSON-serializable javascript objects. If there are services, the control loop cares about the request-for-service messages, and shares knowledge of its schema with the coroutines.</p><p>The control system sees the primitive as "just an iterator", but the primitive itself sees itself not as an iterator at all, but as a "data transformation". I love this idea because I think it's the perfect isolation between domain logic (business logic, scientific logic, etc.) and control infrastructure.</p><p>The control loop doesn't <em>have to</em> care what's done to the data, but it <em>can</em> care. Routines don't have to be in a list; they can be in a tree, selected by datatype or command structures. It is available as a location for hooks that can, for example, validate data on its way through the system, trigger messages to emit on a message bus, and provide system command and control features. A lot of actions can be performed on the control side that help make the system robust against badly behaved primitives, where at the least they are quickly isolated as the source of a failure in the data pipeline.</p><h3 id="the-control-loop">The control loop</h3><p>For me, calling the coroutine from the top control loop is also cleaner and easier to understand without an initializing <code>next()</code> call. Here is the control loop:</p><pre><code class="language-javascript">    const inDoc = { type: "testdoc" };

    const cor = coroutine(inDoc);   // Instantiate coroutine
    let done = false;
    let doc = inDoc;

    while (!done)
    {
        console.log(" input doc", doc);
        const iterVal = cor.next(doc); // call next(..)

        done = iterVal.done;
        if (!done)
        {
            doc = await iterVal.value; // resolve Promise
            console.log("output doc", done, doc);
        }
        else
        {
            console.log("done");
        }
    }
</code></pre><p>The coroutine is called with the input data to process, which is its purpose semantically, so that's good. The <code>next(..)</code> function in javascript returns an object with <code>done</code> and <code>value</code> keys. The <code>done</code> member becomes true when iteration is over and the <code>value</code> property is the value yielded by the coroutine. In this case that value is a promise to deliver the output document. The control loop awaits this document before proceeding because its output is needed as input for the next step.</p><p>Note that if the control loop has a way to know what routines can be run in parallel, it could start multiple routines and await using <code>Promise.all(..)</code>.</p><p>A big part of why these two tightly interleaved systems won't interfere with each other is that the inter-operation is cooperative multitasking: they share a thread. Though the primitive is "controlled" by the control loop, the primitive also 'controls' the control loop insofar as it decides when it's willing to pause and let the control flow pass back to the control loop. The control loop has the <em>greater</em> control only because it does not have to resume the primitive, but the primitive has no choice but to release control back to the control loop, eventually, even if there is an error.</p><p>Here's the whole demonstration script:</p><pre><code class="language-javascript">"use strict";

async function primitive(doc)
{
    if (typeof (doc.increment) === "number")
    {
        doc.increment++;
    }
    else
    {
        doc.increment = 1; // undefined means 0 so 0+1
    }
    return doc;
}


function *coroutine(inputDoc)
{
    let doc = inputDoc;

    for (let i = 0; i &lt; 6; i++)
    {
        doc = yield primitive(doc);
    }
}

(async() =&gt;
{
    const inDoc = { type: "testdoc" };

    const cor = coroutine(inDoc);
    let done = false;
    let doc = inDoc;

    while (!done)
    {
        console.log(" input doc", doc);
        const iterVal = cor.next(doc);
        done = iterVal.done;
        if (!done)
        {
            doc = await iterVal.value;
            console.log("output doc", done, doc);
        }
        else
        {
            console.log("done");
        }
    }
})();
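
// Expected console output: six " input doc" / "output doc" pairs, with
// doc.increment counting 1 through 6, then one final " input doc" line
// followed by "done".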
</code></pre><p>Yes, that's where I like the braces. In fact, eslint polices that.<br><br><code>"brace-style": ["warn", "allman", {"allowSingleLine": true}]</code>.<br><br>I find that it's easier to match braces visually that way. </p><p><em>Source code likes being anthropomorphized.</em></p><p>Craig Allen © 2019, 2020</p><p>PS: If you think about it, the "extra" next is in fact required, but the coroutine implementation itself already handles that initial yield, stopping right at the opening brace, the real start. No value is yielded, just as in the examples above. I'm just leveraging this implicit yield.</p>]]></content:encoded></item><item><title><![CDATA[MCWCAM Blog Being Reconstructed]]></title><description><![CDATA[Time to build a new online HQ!]]></description><link>http://blog.novem.technology/mcwcam-novem-inc-blog-being-reconstructed/</link><guid isPermaLink="false">5e37b7fea4b8690001e3e32b</guid><category><![CDATA[Getting Started]]></category><dc:creator><![CDATA[Craig B Allen Jr]]></dc:creator><pubDate>Mon, 03 Feb 2020 06:11:22 GMT</pubDate><media:content url="http://blog.novem.technology/content/images/2020/02/abandoned.jpg" medium="image"/><content:encoded><![CDATA[<img src="http://blog.novem.technology/content/images/2020/02/abandoned.jpg" alt="MCWCAM Blog Being Reconstructed"><p>With a new version of ghost and a new way of running it, this blog is being reconstructed. The only old content being migrated is the blog on coroutines, for which the format didn't survive, so I'm going to edit and improve it in general. This goes along with a rewrite of the novem.technology site in React, and with plans for the future currently in the works.</p><p>I hope you enjoy the new blog as much as we plan to enjoy it ourselves.</p><!--kg-card-begin: markdown--><img src="https://blog.novem.technology/content/images/2020/02/files.jpg" alt="MCWCAM Blog Being Reconstructed" style="float:left; width:35%; margin:1em;">
<p>Pardon the dust as we go through our files. You can see an edited version of the coroutine blogs as we experiment with the new blog in plain view, <em>and</em> edit the content so it better serves its intended purpose.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Refactoring as Programming Philosophy]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>The subtitle on the cover of Martin Fowler's book <em>Refactoring</em>, Second Edition 2019, originally published in 1999, implies that Refactoring is about &quot;<em>improving the design of existing code</em>&quot;. Further inside, in the preface, he says refactoring is &quot;<em>the process of changing a software system in a way</em></p>]]></description><link>http://blog.novem.technology/refactoring-as-design-2/</link><guid isPermaLink="false">5e6d4169e89fcb0001d1c963</guid><dc:creator><![CDATA[Craig B Allen Jr]]></dc:creator><pubDate>Wed, 29 Jan 2020 06:37:28 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>The subtitle on the cover of Martin Fowler's book <em>Refactoring</em>, Second Edition 2019, originally published in 1999, implies that Refactoring is about &quot;<em>improving the design of existing code</em>&quot;. Further inside, in the preface, he says refactoring is &quot;<em>the process of changing a software system in a way that does not alter the external behavior of the code yet improves its internal structure</em>&quot;. A couple of paragraphs further on he provides the motivation, &quot;<em>with refactoring we can take a bad, even chaotic, design and rework it into well structured code</em>&quot;. He argues that as we build a system, we simultaneously learn how to improve its design. We can alter the code to the shape of the improved design using a catalog of reusable refactoring patterns. Each refactoring pattern is a code change pattern which is &quot;<em>almost simplistic</em>&quot;. The point being that these are <em>safe</em> steps. For example, we move the body of a case statement into a function, and call the function in the case statement. 
Or we move code that is part of a method into its own method, or we move code snippets up or down the stack, adjusting isolation of responsibility.</p>
<p>From my perspective there is an extra issue, however. My definition of &quot;refactoring&quot; is <em>any alteration of existing code</em>. If a subsystem of any type (be it a complete server or just a library) is working correctly and doesn't require alterations, then refactoring is not an issue. Making things pretty, or even less chaotic, for no reason is not the goal of refactoring. But if code has to change, you are refactoring as well as adding code.</p>
<p>Fowler mentions &quot;not altering external behavior&quot;... so why would we alter the internal mechanism? The answer is so that we can subsequently alter behavior. The better design means more flexibility: it is prepared for future changes that the current design blocks. A better design is better prepared for further refactoring, better prepared to cooperate with future alterations required not by the code base but by the users of the code base. So it's best to remember that refactoring is ultimately about altering external behavior, especially when arguing for applying the principle.</p>
<p>It's quite possible that it's hard to argue with the business side of software development for time to refactor because we have not made clear the external-behavior benefits a software system displays when continually refactored in key areas. Software grown through refactoring gets more powerful as it ages, instead of becoming cumbersome and decaying.</p>
<p>Change is continuous in software. Any exceptions are not of concern; all the systems we work on are undergoing change. Software developers are by definition dealing with the systems that <em>are</em> being changed. A reason to &quot;refactor&quot; is that if you do not <em>consciously</em> refactor, the code will naturally deteriorate, from the law of entropy as applied to information, and <em>bit rot</em> will result. Change is unavoidable, but navigating change with refactoring plans allows us to direct that change so it's not just wandering. If you are touching code, you are changing it, and you are affecting systems that rely on it, and you should keep the project refactoring plan in mind. Thus you should always choose to<br>
incrementally improve that code at an opportune moment, which is always the moment you find yourself changing it.</p>
<p>The lean startup concepts of first building an MVP and then working sprints with scrum, doing kanban, or some other agile methodology are essentially equivalent to thinking of software development as constant refactoring. The waterfall method was based on the attempt to consider all factors up front, prior to implementation, which is impossible. Thus the initial design creates the initial factors, which are subsequently refactored.</p>
<p>Even so-called &quot;greenfield&quot; work includes refactoring: first there will be some proof-of-concept code and a pre-MVP core, and subsequently those will be refactored to meet the needs of the MVP. Fowler's definition, including not changing external behavior, still applies; it's just important to realize that pure refactoring is followed by making use of the new abilities of the system, so practically both things will often be thought of as the same issue. In this case, it's better to think of the change as a refactoring that adds behavior than as an issue that requires changes to support. Refactoring means the design is better, but &quot;change to support&quot; something new is where hacks come from. Refactoring requires thinking about not only the current change, but the next changes after that.</p>
<ul>
<li>refactoring as a development philosophy
<ul>
<li>Good design should make it safe to refactor.</li>
<li>The purpose of testing is to make it safer to refactor.</li>
<li>Don't refactor accidentally, do it intentionally.</li>
<li>Over time, even short periods, there are compound time savings.</li>
<li>Over time, even medium periods, there are exponential gains in configurability<br>
and system capability.</li>
</ul>
</li>
</ul>
<h3 id="healthycode">Healthy Code</h3>
<p>Personally, I don't like the term &quot;code smell&quot;; it seems mean-hearted toward code that must be of value somehow or else it would have been thrown out. Also, &quot;smell&quot; implies that you can sense something bad but don't know exactly where it's coming from, or that it's rotten. Really, if you see a bad idiom at play, or a dangerous one, or a misused one, like a case statement a thousand lines long, it's a bad idiom. It's analogous to network or power cables that are tangled and countless. It's not a smell, it's a mess. Instead, I like to say that <em>my favorite type of software is software that works</em>, and <em>this is a problem and an opportunity</em>.</p>
<p>When you have a bunch of professionals complaining about how terrible and chaotic a system is, it can only be because some business is using it for some vital business process, so it's indispensable. It has some mission-critical purpose that <em>in itself</em> proves there is something worthwhile in there. Software engineers should respect that as an empirical fact. That also means that <em>with refactoring</em> it can be conserved and eventually embedded, its flaws removed, and coexist as a good citizen within a well designed software system.</p>
<h3 id="targetdesigns">Target Designs</h3>
<ul>
<li>refactoring is not a design
<ul>
<li>you need a <strong>target design</strong></li>
<li>you use &quot;backward compatibility&quot; techniques to create your new design<br>
such that at any point in time it coexists with existing code</li>
<li>coexistence ensures legacy <em>features</em> are not refactored away as<br>
happens with a &quot;rewrite&quot;</li>
<li>often, refactoring patterns are reversible because sometimes it's the<br>
specific target design that defines which direction is &quot;better&quot;</li>
</ul>
</li>
</ul>
<p>To make progress via refactoring you have to have an idea of where you want to go. Examples of refactoring in Fowler's book often show both directions of a refactoring step; his <em>simplistic steps</em> are often bi-directional code transformation primitives.</p>
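<p>As a minimal sketch (with hypothetical function names), here is one such bi-directional primitive: &quot;extract function&quot; and its inverse, &quot;inline function&quot;, are the same transformation run in opposite directions, and only the target design decides which way is &quot;better&quot;.</p>

```javascript
// Direction A: extracted. The formatting logic has a name of its own.
function banner(name)
{
    return `*** ${name} ***`;
}
function printReportA(name)
{
    return banner(name);
}

// Direction B: inlined. The same behavior, folded back into the caller.
function printReportB(name)
{
    return `*** ${name} ***`;
}

console.log(printReportA("totals")); // "*** totals ***"
console.log(printReportB("totals")); // "*** totals ***"
```

<p>Neither direction is intrinsically an improvement; against one target design the extraction clarifies responsibility, against another the inlining removes an indirection nobody needs.</p>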
<p>Below are less than simplistic steps from my own experience.</p>
<h3 id="refactoringnobrainers">Refactoring No Brainers</h3>
<p>To talk about refactoring that is specific to the target design of your code base would require a whole reference project, like the one provided in Fowler's book. For the purposes of this blog I will merely mention some patterns from my own experience that immediately call for refactoring. I would like to note that I respect disagreement on any of these, just as I disagree with several of Fowler's. They are mostly not machine-executable transformations, however, because they require the goals and motivation of the engineer. To collect some code in a function together and move it to another function is simple enough to be carefully done by a human, but a machine lacks the semantic power to choose which code needs to be collected, what to call the new function, or where to put it.</p>
<h4 id="youhavecutanpastecodethroughoutyourcode">You have cut and paste code throughout your code</h4>
<p>You need to make the copied code into a shared function.</p>
<ol>
<li>Review the cut-and-paste code fragment wherever you know it's used. You can use variable names to grep for examples. Compare the fragments in detail across a variety of examples to get an idea of how conformant they are to each other, to the most general idea in the code. Often pasted code will have been modified or interspersed with other code for reasons that might be subtle. Some versions will have attempted to be general, but proved not general enough and someone made a special version for themselves. This version may be more general, or more specific with the scars of attempted generality. It's always better to make the copied code <em>more general</em> and then use it for the new special purpose, because then it might be backported to the places it was copied from.</li>
<li>To make your general case function, choose the most general case you find and write the function by modifying it. If it's an extreme mess, and none are general enough, create your own generalization and write it from scratch with the (generalized) characteristics of the complete set of similar code.</li>
<li>With your starter code in a new function, compare to other copies, generalizing to accommodate the second version.</li>
<li>Add arguments as needed to pass variables that had been available in the scope of the other function. It's tempting to combine arguments when you find the copy has a similar variable. Don't; fix that later. If the starter function had <code>goodDate</code> and the other has <code>niceDate</code> and they seem the same, wait: they are likely not handled exactly the same, and you don't know the dependencies on that difference. Here is where we are thinking &quot;don't change behavior&quot;. Combining the two variables in the same function, each with their own handling, will not affect behavior. Normalizing the use might.</li>
<li>Repeat with other fragments, and when they are different, support the difference with an additional argument.</li>
<li>As you go, replace each incorporated copy with the new generalized function. Confirm the software works and is passing tests, but also do targeted manual tests with small scripts to exercise the generalized function and ensure its answers are the same as the original copy's.</li>
<li>Look at the generalized argument list. It might be a mess, with overlapping arguments (e.g. &quot;state&quot; and &quot;stateCapitalized&quot; and &quot;stateCode&quot; and &quot;stateId&quot; and so on) and it's not clear which are interchangeable or merely overlap.</li>
<li>Find the generalizations for this data. For example, if you have a canonical &quot;stateId&quot;, then the generalized function can take that, and look up the rest. If <code>niceDate</code> and <code>goodDate</code> are mostly similar, they can be replaced with <code>displayDate</code> serving the purpose of both.</li>
<li>If you want callers of the original copies to not have to change, and to keep using their unique argument lists or names, you can create thin thunk functions that continue to accept those arguments and call the new generalized function. In fact, this is something you can just set up at the beginning if you want to make it easy to swap functions around while you work, ensuring it's easy to move backward if the generalization fails.</li>
</ol>
<p>Using named arguments for functions is very useful in this regard, since ordering doesn't matter, allowing parts of the code to ignore changes to the arguments. This means that new arguments added during generalization won't interfere with the function signature or with other code within the function.</p>
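<p>The steps above can be sketched minimally, reusing the hypothetical <code>goodDate</code>/<code>niceDate</code> names from the example: two pasted fragments are generalized into one function taking named arguments (an options object), with thin thunks preserving the old call signatures while the generalization is proven out.</p>

```javascript
// The generalized function. Named arguments mean new options can be
// added later without disturbing existing callers.
function formatDisplayDate({ date, suffix = "" })
{
    return date.toISOString().slice(0, 10) + suffix;
}

// Thin thunks: existing callers keep their argument lists, and it is
// easy to move backward if the generalization fails.
function goodDate(date) { return formatDisplayDate({ date }); }
function niceDate(date) { return formatDisplayDate({ date, suffix: " UTC" }); }

console.log(goodDate(new Date(Date.UTC(2020, 0, 29)))); // "2020-01-29"
console.log(niceDate(new Date(Date.UTC(2020, 0, 29)))); // "2020-01-29 UTC"
```

<p>Once all copies route through the generalized function, the thunks can be inlined away or kept as documented compatibility shims.</p>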
<h4 id="youhavea1000linefunction">You have a 1000 line function</h4>
<ol>
<li>Collect the lines of code within the function. All the lines that refer to the same variable should be bunched together. If the language supports it, put the variable declaration at the top of these lines.</li>
<li>Move that code to a function; decide what its responsibility is and name it appropriately.</li>
<li>Organize the new functions into subsystems of isolated responsibility.</li>
<li>As the function is collected this way, you will start to have functions that call the first round of code, and these go into a library of a higher layer relative to the first round of functions.</li>
<li>The perfect stopping point is when the function reads like an engineer explaining what the function's job is to another engineer.</li>
</ol>
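<p>A minimal sketch of the extraction, with hypothetical names: lines referring to the same variable are bunched and moved into named functions, until the surviving top-level function reads like an engineer explaining the job to another engineer.</p>

```javascript
// Extracted: all the lines that touched the running total.
function sumLines(lines)
{
    let total = 0;
    for (const line of lines)
    {
        total += Number(line.trim());
    }
    return total;
}

// Extracted: all the lines that built the output string.
function describeTotal(total)
{
    return `processed total: ${total}`;
}

// The remaining function narrates the work instead of performing it inline.
function runReport(rawText)
{
    const lines = rawText.split("\n");
    return describeTotal(sumLines(lines));
}

console.log(runReport("1\n2\n3")); // "processed total: 6"
```

<p>At scale the same move is repeated dozens of times, and the extracted helpers are then organized into layered libraries as described in step 4.</p>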
<h4 id="youhavea1000linenestedswitchstatement">You have a 1000 line nested switch statement</h4>
<ol>
<li>Create an abstraction for a command</li>
<li>Create an abstraction for a command executor</li>
<li>Create an abstraction for a command context</li>
<li>The end result is all cases encapsulated in command objects that accept command contexts</li>
<li>The end result is the switch replaced by a command executor call</li>
</ol>
<p>The situation a giant case statement is taking advantage of is generally the global context. That should be placed in the context object. The functions will now get their inputs from the context. The commands will have names/ids that are stored in a command executor configuration. The switch statement is replaced by a command executor call, new commands can easily be added, and the commands can be executed by alternate executors, which is often useful.</p>
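<p>A minimal sketch of the three abstractions, with hypothetical command names: commands are keyed by id in an executor configuration, each accepting a context object that carries what the old switch read from global state.</p>

```javascript
// Executor configuration: command ids mapped to command functions.
const commands = {
    greet: (ctx) => `hello, ${ctx.user}`,
    farewell: (ctx) => `goodbye, ${ctx.user}`,
};

// The executor call that replaces the giant switch statement.
function executeCommand(name, context)
{
    const command = commands[name];
    if (!command)
    {
        throw new Error(`unknown command: ${name}`);
    }
    return command(context);
}

// New commands are added by registration, not by editing a switch.
commands.shout = (ctx) => ctx.user.toUpperCase();

console.log(executeCommand("greet", { user: "craig" })); // "hello, craig"
console.log(executeCommand("shout", { user: "craig" })); // "CRAIG"
```

<p>Because the commands only see their context, an alternate executor (batching, logging, remote dispatch) can run the same command objects unchanged.</p>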
<h4 id="gooddesignpatternsandrefactoring">Good design patterns and Refactoring</h4>
<ul>
<li>in general good design is that which is easy to refactor
<ul>
<li>OOD encapsulates so that solutions can be replaced</li>
<li>OOD encapsulates so that multiple solutions can be retained and swapped in/out</li>
<li>refactoring is enabled by code comprehensibility</li>
<li>breaking things into units of responsibility makes things easy to refactor</li>
<li>testing makes it easy to refactor, if unintended consequences are detected (regression testing)</li>
</ul>
</li>
<li>modern methodology, Agile, Lean, Kanban, Scrum... are partly about refactoring or assume it
<ul>
<li>making an MVP is getting a starting point to subsequently refactor</li>
<li>Sprints interpret all work as time-boxed refactoring</li>
<li>Kanban is delta-ticket tracking refactor patches</li>
<li>All agile sees development as iterative in nature, where you have a &quot;continuous&quot; development methodology, trying to be as granular as possible.</li>
</ul>
</li>
<li>If you start with something running and end with something running, you refactored.</li>
</ul>
<!--kg-card-end: markdown--><p>To really explain refactoring, though, I want to harken back to before it was a word, when we had alternate words for refactoring. For one, we didn't admit to refactoring or have a word for it. We 'worked' on code, and sometimes, if it became unworkable, we "rewrote" it or "started over".</p><p>My introduction to the term, but not the concept, of "refactoring" was from the book <em>Refactoring: Improving the Design of Existing Code, </em>by Martin Fowler in 1999. I might have heard the term earlier, I think I had, but the book cemented it all together. I had loved object oriented design, not as a purist but because it allowed me to swap code in and out. It was inherently like a plug-in system, not a run-time one but one for developers, with classes and object instances working together to achieve desired behavior. The part of object oriented design I liked happened to make refactoring easier.</p><p>What I and others were calling iterative development, and now call agile or something closely related, is all about assuming you are refactoring. You achieve an MVP, and after that, you run sprints refactoring. Even new features are interpreted as a refactoring, because the system needs refactoring to accommodate the new feature, if it really is <em>new</em>.</p><p>Refactoring steps are doable chunks of change: the code works at the start, even if considered non-optimal or buggy. It also works when you're done, even if considered far from finished. Code that is easy to change is easy to test. Code that's easy to change is easy to understand. Granted, these are tautologies in that if you don't make it easy to understand, it'll be hard to change, and if you don't make it easy to test, you'll make it hard to change. The point is these things go together. Learning how to make code easy to change teaches us how to make code easy to understand and easy to test, and the points all reinforce each other.</p><p>What's interesting about Martin Fowler's book is that it has many examples of code problems you often see, and how to refactor them. I find that in the decades since reading this book my entire programming approach has been formed into refactoring techniques. In that time, even green-field work has had to work with old systems or have some sort of backward compatibility. Refactoring is key.</p><p>I am always refactoring. Even green-field work is refactoring. I start with scripts to prove concepts; I refactor those into a script calling reusable libraries, then that into classes using those libraries, then into a command line tool that performs services, first especially services useful in development, and then into servers and client software.</p><p>What's neat about refactoring is getting used to turning <em>the kind of thing you run into</em> that might haunt and afflict you into what you want it to be. You can't do it right away, but just the knowledge reveals how to isolate your new code from the horrors of the code debt surrounding it. It's not a magic wand, but it is possible to refactor code as you work so that your job working with it slowly gets better.</p>]]></content:encoded></item></channel></rss>