Novem Computing Ontology
I consider myself very adaptable to the many varied practices used for development and workflow management. My primary concern with a local development culture is that it be internally consistent, coherent, and realistic. There are countless such systems that are valid in themselves but not compatible with each other. Mixing two incompatible approaches is exactly what it means to be incoherent: the two systems don't cohere with each other under a compatible set of approaches and assumptions. This means the value of a practice from one well-vetted workflow cannot be relied upon when it is used outside the context of that workflow.
In practice, each system of practices in a workflow will obviously have its strengths and weaknesses. Only systems with well-known catastrophic weaknesses, such as "waterfall", are thought of as invalid; centralized designs, for example, catastrophically fail to scale. However, the flaws of waterfall and centralization can be mitigated, so removing some of the practices and creating new practices that address those weaknesses is never off the table. Some special projects and some particular cases may even lead us to borrow practices from these otherwise discredited systems for a particular project.
Let me point out specifically what the reader may have already noticed: I am mixing engineering management issues with engineering architecture issues. This is intentional, because the two are intimately related. The engineering management practices used to develop software leave their mark inside the software, producing many of its characteristics, laid down in sedimentary fashion in the source code. Nevertheless, as I continue, I will be focusing more and more on the analytic engineering abstractions used to understand particular software systems, leaving you to relate them back to workflow design.
I analyze all software systems using a set of abstractions into which I decompose them. This is another truism: we all have no choice but to do this, but what is the set of abstractions? The truism applies to any existing systems and subsystems, be they libraries, processes, classes, or executables, and to any system I will build. I do something similar with workflow methodologies. For example, I decompose a team's particular use of Jira, or other work tracking software, into my own project tracking methods. I map it into my system, but I don't change it. My own system is just a set of elements into which I can break down the other system, like analyzing water and finding that it is hydrogen and oxygen, related by covalent bonds.
I recognize, and the reader should consider, that the system being decomposed to elements will have richness that is not obviously represented directly in the decomposition. The behavior of water is not obvious from the behaviors of hydrogen and oxygen on their own. Their relationship transforms their behavior into very different observable behavior when connected in a system. The decomposition is just an additional way of understanding the system, translating it into more fundamental laws of computation. Those fundamentals are what allow doing local work in the system.
My fundamental system is un-opinionated, like organic chemistry, allowing all possible system behavior, but it is very reductive. A particular system is always opinionated: it is specifically itself, and makes decisions about which ingredients to include and what relationships to coax them into. A cake and a jam can both be understood in terms of organic chemistry and seen as particular implementations of it. If an organic chemist encounters the most amazing food they've ever had, having never encountered anything like it, they can be amazed that such a thing is possible. When they take that food to their lab and decompose it, they will nevertheless discover the basic elements of organic chemistry, or it would not be edible. They will discover the enzymes, emulsions, and particular chemicals. This increases their understanding of the particular system that surprised them with new systematic behavior.
You can describe baking a cake in terms of chemistry and emulsions; the "opinionated" part is what makes the result a cake instead of an industrial solvent. Decomposing a problem domain into my own model helps explain the problem domain for me, clarifying any inherent complexities. In my model there are general-purpose software artifacts that can be applied at this elemental level to problems which, at the higher level, may seem opaque or unsolvable due to assumptions upon which the system relies.
In workflow, I maintain my model for what's going on in the software in parallel with any workflow system like Jira. Jira and the like are communication software for having a conversation about workflow. I'm certain good managers and engineers maintain their own models of project status throughout a project. If they didn't, they wouldn't be able to improve the workflow they implemented in Jira, or define it in the first place.
For the programmer or logician of any sort, abstractions are like tools in a workshop. They are used to break down ideas into parts, or combine them into systems. There are an infinite number of ways to equip a workshop, but there are patterns. The spectrum of tools you'll likely need or find helpful for a type of task bears relationships to all possible tools. A wood shop for making furniture will bear many similarities to other furniture workshops. It will even bear noticeable relationships to a workshop for making doll house furniture. Woodworking shops bear some similarities to a metal machine shop. Japanese gardening tools and European gardening tools cover the same spectrum of tasks; some are interchangeable with each other, but some are special.
The abstractions below are the primary tools of my workshop. Of course I have a lot of more specialized tools, including fantastic software created by thousands of fellow programmers. But all these other tools are decomposable into these principles and can be built from them. I think of every particular or specialized tool (i.e. opinionated software) in terms of how it can be decomposed into the following elements.
Unit of Computation and Flow of Control
The unit of computation is the function.
By "function" I actually mean the general case, a routine. The coroutine is the most general case of a routine. The ordinary function in most classical programming languages is a special case called a subroutine. A subroutine has one entry point taking input data, and one exit point returning output data. Of course, internally the code can branch, and return from different points with different data, but a subroutine can return data exactly once, and accept data exactly once.
In the general case of the coroutine, a coroutine can return a series of values via yielding. After each yield, the caller can re-enter the coroutine at the point of the yield, where it will continue running. New data can be sent in during this re-entry, and new output data can be returned until the routine decides it is done, or the caller stops re-entering. Communication between the caller and callee is ongoing; the callee is co-operating with the caller, thus, it's a co-routine. The relationship between caller and coroutine is that of highly efficient single-threaded "cooperative" multitasking.
Coroutines are about cooperative multitasking in a single thread, in which the caller is a controller (aka "a control loop") and the callee is a worker. The controller is not necessarily a loop, technically, but it will generally use a loop to call the coroutine, since the coroutine is called repeatedly before finishing.
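Here is a minimal sketch, in Python, of that controller/worker relationship. The names running_average and drive are hypothetical illustrations, not from any library or framework.

    def running_average():
        """A coroutine worker: yields its current average, accepts new samples via send()."""
        total = 0.0
        count = 0
        average = None
        while True:
            sample = yield average        # yield output, then wait to be re-entered
            total += sample
            count += 1
            average = total / count

    def drive(samples):
        """The controller: drives the coroutine in a loop, cooperating in one thread."""
        worker = running_average()
        next(worker)                      # prime the coroutine up to its first yield
        for s in samples:
            print(worker.send(s))         # re-enter at the yield, passing new data in
        worker.close()

    drive([10, 20, 30])                   # prints 10.0, 15.0, 20.0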
In the case where the called routine does not handle data being sent in when re-entering the routine, the routine is called a generator and can be used as an iterator, such as in a for..in loop. The generator yields a series of values, but only accepts exactly one input. Note that async/await in JavaScript and asyncio in Python are built on top of generator machinery. The syntax in asyncio is syntactic sugar to yield control back to an event loop that can schedule the function call for the future, and await that resolution in the calling scope.
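A brief sketch of the generator case, again in Python; countdown is a hypothetical example, not anything standard.

    def countdown(n):
        """A generator: yields a series of values, but accepts only its one initial input."""
        while n > 0:
            yield n
            n -= 1

    for value in countdown(3):   # the for..in loop re-enters the generator at each yield
        print(value)             # prints 3, 2, 1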
Having said that, the subroutine, or traditional function, is good enough to think of as the base abstraction conceptually, since coroutines have special coupling with their controllers, another subject. My model relies on the interchangeability of messages and functions, which is most easily imagined with subroutines/functions because of the looser coupling between caller and callee, since message passing (input message/output message) happens exactly once in and once out. That is, the regular function signature, taken as a message schema or vice versa, is universally transmittable, as a network message for example. The signature of controller/coroutine interaction always adds some particular protocol of interaction: not only the data schema passed, as with a subroutine, but also acceptable vs unacceptable orderings and the types of control messages, which together are necessarily a communication protocol.
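To make the signature/schema interchangeability concrete, here is a small sketch. The function apply_discount and the JSON message shape are hypothetical illustrations, assuming nothing beyond the Python standard library.

    import json

    def apply_discount(price: float, percent: float) -> float:
        """An ordinary subroutine: one message in, one message out."""
        return price * (1 - percent / 100)

    # The same call, expressed as a transmittable message (e.g. sent over a network):
    request = json.dumps({"fn": "apply_discount", "args": {"price": 100.0, "percent": 15.0}})

    # A receiver can reconstruct the call from the message alone:
    msg = json.loads(request)
    result = apply_discount(**msg["args"])                     # 85.0
    response = json.dumps({"fn": msg["fn"], "result": result})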
To execute the function, we hand it the flow of control. That of course gives the function control, and the flow of control flows downward in the function, and, when sub-functions are called, deeper into the stack. The currently running function is the controller of the functions it calls, and its own caller is its controller. Having said this, in practice some routines specialize in controlling their called functions, and are called controllers in the design. Other routines specialize in performing the system logic (i.e. data reduction and business logic), and these are called workers and services.
The flow within the function could be entirely sequential, it could branch, it could loop and so on, skipping around the source code. Nevertheless this is all powered by a downward flow through the source code. The body of the function is executed from top to bottom, from caller into callee, deeper down through the callee, and eventually back to the caller to move downward again. Embedded in the flow of control in the body of some of the functions are the computations. The computations are the explicit data transformations that take place. Functions always transform data. To return nothing is to transform your input into nothing.
Lest you think this is a philosophy purely from functional programming, no, it's merely noting that other abstractions, like "classes", can also be thought of as made of functions. I am not a functional programming purist, merely a functional realist. An object oriented class declaration declares a set of functions. It provides syntactic sugar for easily bundling functions together into a tiny library, and sharing state. The shared state can be freely and multiply instantiated, allowing multiple alternate states to be handled simultaneously, and provides a way to pass this library around as a single reference, supporting things like injection of complex dependencies. That's super useful, and it is a set of functions coupled together. Often a regular old standard function library keeps state as well; the pattern does not require OO syntax, it's just a lot easier and better looking with it. A class is just useful syntactic sugar from the compiler to make this miniature library, the object instance, easier to manipulate and easier to write.
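A small sketch of that equivalence, assuming nothing beyond plain Python; both Counter and make_counter are hypothetical examples.

    class Counter:
        """A tiny library of functions sharing state, via class syntax."""
        def __init__(self):
            self.count = 0              # shared state, freely instantiable many times
        def increment(self):
            self.count += 1
        def value(self):
            return self.count

    def make_counter():
        """The same bundle built from plain functions sharing state in a closure."""
        state = {"count": 0}
        def increment():
            state["count"] += 1
        def value():
            return state["count"]
        return increment, value         # a miniature "library" passed around by reference

    a = Counter(); a.increment(); print(a.value())      # 1
    inc, val = make_counter(); inc(); print(val())      # 1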
Decomposing Things into Functions
A function always owns exactly one flow of control, which to the function is like its flow of consciousness. The flow is always at one particular point of the source code of the function at any given moment in time. That is analogous to what the function is thinking at a particular moment. It will share that flow of control with other functions, no doubt, the other functions it calls. Even the lowest level functions are still made of function calls, system calls, or, inside the kernel, device driver calls. Even though at some point a function turns into voltages and a circuit, from the programmer's point of view it's functions all the way down.
A process is always started by calling a single function. A thread is created by calling a function in a way that gives it a new flow of control of its own. As the initial callee, it's now the "controller" for everything that it calls, everything called by the functions it calls, everything beneath it in the stack. In principle, all these subfunctions are "under its control". However, whenever it calls one of them, it temporarily gives up control to that callee. Though the callee will eventually return to its caller, for now the callee decides when to return, or whether to just loop forever, and its caller is dormant. It can do things to never return to the controller, such as call process exit, or throw an exception which may skip the caller on the way up the stack. Note: in most if not all languages and systems the caller always has ways to intercept such attempts to never return control, such as catching the exceptions or similarly intercepting a process exit event.
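A minimal sketch, in Python, of handing a function its own flow of control; worker is a hypothetical example.

    import threading

    def worker(name):
        # This function now owns a new flow of control; everything it calls
        # runs beneath it in this thread's stack, "under its control".
        print(f"{name}: running in its own flow of control")

    t = threading.Thread(target=worker, args=("worker-1",))
    t.start()    # the caller hands worker a brand new flow of control
    t.join()     # the caller's own flow waits here until the new flow finishes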
All programmers will know what I'm saying, but probably have not narrated it this way to themselves. I hope you understand my perspective and can see how it fits with or against your own way of decomposing programming. For example, you may not want to consider the caller technically "dormant" because maybe that's a poor synonym for "paused". Perhaps you see functions as conceptually being objects, so that for you "everything is an object" is the lower level decomposition. I'm eager to hear such feedback. A large part of my dedication to my own decomposition techniques is the pragmatic value it's generated for me in my work, allowing me to understand anything throughout the stack of tools in distributed systems.
When a message is sent to another server, the sender is calling the receiving routine that will process that message. There is always a routine that accepts that message after all. Generally it will be a routine that knows how to dispatch the message further. Obviously there may be many functions handling many messages. Somewhere after delivery is a function that has essentially been called by the caller in the other process. The caller and callee still share understanding of the input data schema, but what they don't share, is a common flow of control. The sender does not have to be paused/dormant/blocked.
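Here is a sketch of that idea, assuming a simple in-process queue as the messaging intermediary; the handler registry and message shape are hypothetical.

    import queue
    import threading

    # A registry mapping message types to the functions that end up being "called" by senders.
    handlers = {"greet": lambda payload: print(f"hello, {payload['name']}")}

    inbox = queue.Queue()

    def receiver():
        """The dispatch routine that accepts messages and calls the right handler."""
        while True:
            message = inbox.get()
            if message is None:
                break
            handlers[message["type"]](message["payload"])   # the function the sender "called"

    t = threading.Thread(target=receiver)
    t.start()

    # The sender shares the data schema with the handler, but not a flow of control:
    inbox.put({"type": "greet", "payload": {"name": "Ada"}})  # the sender is not blocked
    inbox.put(None)                                           # shut the receiver down
    t.join()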
Loose vs Tight Coupling
Functions have callers. The relationship between the caller and the function is a coupling. The coupling can be loose or tight. A loose coupling is easier to deal with; there is give and take in it. In some cases a tight coupling is needed, where two systems each have an orthogonal purpose, but still cooperate closely to implement the features of a complete system. Tight coupling can be more efficient in computation and memory use, that is, optimized. It's fair to say that when possible we prefer loose couplings, for they can more easily be made to degrade gracefully.
There are two factors contributing to the tightness of the coupling: shared data schema and shared flow of control.
Functions calling each other in a single thread are tightly coupled in flow of control. The advantage of this coupling is that they know they are not fighting each other to access data, and so no mutual exclusion systems are needed. Things run efficiently, with zero race conditions between the functions. Also, it's unavoidable that many functions run with the same control flow and from the same scope, since functions are made of function calls.
A function sending a message to a function in another thread is more loosely coupled to it. Because of the intermediary messaging system, little is known about the other function's flow of control. It may be in another thread or even another process. It could even be executed later in the same thread such as with async functions. It may be broken up into a number of messages that fan out and are handled by multiple functions across multiple processes, coupling the caller indirectly to multiple callees simultaneously.
The caller and callee, wherever each runs, are still directly coupled by the shared data schema of the data passed between them. Both must understand the data schema with some degree of compatibility. If they both have to have the exact same understanding of the whole body of data passed, that would be more tightly coupled. If each only had to understand small parts of the data passed, the functions are more loosely coupled. An example of the partial schema would be a caller that passes a whole object, such as a user object, to a function that only understands one part of the schema, such as where in the schema the user's name is located. Thus the callee doesn't care about the rest of the schema, just so long as the user's name is in the expected location.
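A brief sketch of that partial-schema coupling; the user object layout and greet_user are hypothetical illustrations.

    def greet_user(user: dict) -> str:
        # Only cares where the name lives; the rest of the schema can change freely.
        return f"Welcome, {user['profile']['name']}!"

    user = {
        "id": 42,
        "profile": {"name": "Ada", "email": "ada@example.com"},
        "preferences": {"theme": "dark"},    # irrelevant to greet_user
    }
    print(greet_user(user))                  # Welcome, Ada!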
A good example of the tightest possible data schema would be two functions communicating by a unique binary protocol. In cases like this a list of long integers stored as bytes may not be compatible when the same code is run on different processors with different byte ordering.
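As a sketch of that hazard, Python's standard struct module makes the byte-ordering choice explicit; the record layout here is hypothetical.

    import struct

    values = [1, 2, 300]

    # Packing with native byte order and sizes couples both ends to the same processor family:
    native = struct.pack("@3q", *values)

    # Declaring an explicit order (here big-endian, standard sizes) makes the coupling a
    # documented part of the protocol rather than an accident of the hardware:
    portable = struct.pack(">3q", *values)
    print(struct.unpack(">3q", portable))    # (1, 2, 300) on any machine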
In general, the more one function has to know about its caller's or callee's data schema, the more tightly they are coupled by data schema. For example, if a calling function must call specific other functions to prepare the data to pass to function G, that's a tighter coupling than if it can just call G without preparation, given the data it started with. If function A calls function B, and B calls C with the same input A gave it, then A is coupled to C, but ironically, A and C might be only very loosely coupled to B! If B's job is passing data from A to C, then all it has to know is enough to pass the data, which even in the binary data example means a pointer and a length.
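A tiny sketch of that A, B, C relationship; all three functions are hypothetical.

    def c(report: dict) -> None:
        print(report["summary"])        # C must understand the schema A produced

    def b(payload: dict) -> None:
        c(payload)                      # B only passes the data along, untouched

    def a() -> None:
        b({"summary": "quarterly totals", "rows": []})   # A is coupled to C's schema, loosely to B

    a()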
You can empirically test how tightly coupled functions are, in a real system, by trying to change one of the functions and seeing how many other functions have to change with it. Therefore, a general concept of coupling is to ask, "if I change function G, is it likely I will have to change functions that call G?" and "if I change function G, is it likely I will have to change functions that G calls?". The more functions that have to be changed along with G, the more tightly coupled the functions are.
Another way of looking at the same issues is to ask "how easy is it to refactor this set of functions?" To the first few orders of accuracy, the looser the coupling, the easier it is to refactor the whole set. Taken to an extreme, however, looser coupling will not actually make refactoring easier. This is because the loosest coupling evolves toward highly configurable systems, which are useful in reducing direct coupling, but which can become overgeneralized, introducing configuration complexity and so much indirection that refactoring in fact becomes difficult. This simply reminds us that in spite of "preferring" loose coupling, what is actually desired is optimal coupling. The preference for "loose coupling" comes historically from the era of mostly-only tight coupling, which was inevitable before general data encodings were available.
The ideal design suggested by this interpretation is one in which functions perform well-isolated responsibilities, and thus don't have to change because another function changes. They are ideally plugged into the system in such a way that changes that don't touch their responsibility don't affect them. If adding a new feature, only callers that want to use the new feature have to change. If changing how a feature is accomplished (or fixing a bug), then callers don't have to change how they call the function.
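A small sketch of that ideal; format_price and its new currency parameter are hypothetical examples.

    def format_price(amount: float, currency: str = "USD") -> str:
        # The currency parameter is a new feature with a backward-compatible default,
        # so existing callers are unaffected by the change.
        return f"{amount:.2f} {currency}"

    print(format_price(9.99))            # existing callers keep working unchanged
    print(format_price(9.99, "EUR"))     # only callers that want the new feature change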
Optimized systems often use tight coupling to solve complex problems and present loose couplings to the rest of the system (e.g. to other processes in a distributed system). A point I'm trying to make clear is that there are potential reasons to use any conceivable type of coupling, via control flow and data schema planning, given a problem domain where it's advantageous. But at the same time, there is a general case for the vast majority of software in which an array of mostly loose couplings is used between functions, regardless of whether they are subroutines in the same thread, in another thread, in another process, or on another machine.
Every software system is a set of choices made about functions and how they are coupled by data and flow of control sharing. For me all the answers to questions about the swarm of needed functions and their optimal coupling come from the problem domain itself. Certain patterns are called for, and the problem domain is what calls for them.
Note: This article is subject to further editing