Chipping Away at Monoliths
My favorite way to design a program is to distill its requirements into the functionality that makes it unique, and then only build that. The other requirements can usually be satisfied by other programs that have satisfied them before. I have a lot more fun writing programs that work with other programs because I get to focus on the actual problem. That focus also pays dividends when I go back to read the code later.
Better Ingredients, Better Programs
I mostly really like Taco Bell Programming. It’s an article about a number of things: Getting to the point. Solving the problem. The state of system administration. Franchise logistics. What I picked up on a recent re-read, though, is that it’s also implicitly about how to write a good program, specifically by following the tool philosophy. The example command from the article processes files downloaded by a web crawler. It uses standard programs to list the files and control concurrency, and uses a purpose-built program to implement the processing logic:
find crawl_dir/ -type f -print0 | xargs -n1 -0 -P32 ./process
Well, we can’t know how maintainable or robust the command is without looking into the implementation of the process program, and I suspect some of the concerns about it come from seeing how it gets used without seeing how it works. It might be a Go binary or a Node.js script. Maybe it gracefully handles failure, can be retried, and has meticulous logging. Or maybe it does none of those things and fails spectacularly when it is run by anyone except the original author in their particular environment. Since we can’t open up this Schrödinger’s program, I’ll assume it’s as robust as it needs to be to satisfy everyone on the team. However, we don’t need to open it to observe how it excels as a tool:
In the Shadow of the Monolith
Just for fun, let’s look at a monolithic implementation of the example command from Taco Bell Programming. It could be written in any language, as long as it’s an exact recreation of the original pipeline. I picked Python because it was the next easiest way for me to implement it after the pipeline version.
It is sculpture in code, marmoreal and beautiful. But it is inflexible, and it is not a tool. It is a monolith.
It exhibits great Python style. Its intent and operation are clear. It also has some weaknesses in its design:
- It can only process files found in a directory called crawl_dir in the current working directory.
- It always spins up 32 processes no matter how many files need to be processed.
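For something concrete to point at, a monolith with exactly those two weaknesses might look roughly like the following. This is a hypothetical sketch, not the implementation described above: os.walk stands in for find, multiprocessing.Pool stands in for xargs -P32, and process is an empty placeholder.

```python
import os
from multiprocessing import Pool

# Hypothetical monolith: both settings are hard-coded.
CRAWL_DIR = "crawl_dir"  # always resolved against the current working directory
NUM_PROCESSES = 32       # always 32 workers, however many files there are


def process(filename):
    "This is where the magic happens."
    pass


def list_files():
    # Recreates `find crawl_dir/ -type f`: walk the tree and yield regular
    # files only (find -type f skips symlinks, sockets, FIFOs, and so on).
    for dirpath, _dirnames, filenames in os.walk(CRAWL_DIR):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def main():
    # Recreates `xargs -n1 -P32`: one file per task, spread over 32 workers.
    with Pool(NUM_PROCESSES) as pool:
        pool.map(process, list_files(), chunksize=1)


if __name__ == "__main__":
    main()
```

Every line of the walking and pooling code is something the shell toolbox already provides, which is what makes it a candidate for chipping away.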
We can simplify. Use our tools to chip away at the monolith. Replace our parallelization code with xargs, or with parallel when necessary. Replace our filesystem-walking code by taking filenames as arguments, so that shell globbing or find can supply them.
With the knowledge of what’s already in the toolbox, we can whittle our implementation down into a workable tool:
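import sys
def process(filename):
    "This is where the magic happens."
    pass
def main(filenames):
    # With `xargs -n1`, this loop sees exactly one file per invocation.
    for filename in filenames:
        process(filename)
if __name__ == '__main__':
    # Filenames come from the command line, e.g. a shell glob or find piped to xargs.
    main(sys.argv[1:])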
That is what I imagine the process program might look like. It shows that designing programs as tools can remove boilerplate outer loops, obviate inflexible input methods and complicated option handling, and bring focus to the business logic.
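One small addition that fits the tool design, assuming you want failures to be visible to the rest of the pipeline, is to report per-file errors on stderr and exit non-zero. GNU xargs, for example, exits with status 123 if any invocation of the command exits with a status from 1 to 125, so the whole command can notice that something failed and, say, be retried. The body of process below is a hypothetical placeholder that only reads the file.

```python
import sys


def process(filename):
    """Hypothetical placeholder: read the file and return its size in bytes.

    The real processing logic would go here.
    """
    with open(filename, "rb") as f:
        return len(f.read())


def main(filenames):
    # Keep going after a failure so one bad file doesn't stop the batch,
    # but remember that something went wrong.
    status = 0
    for filename in filenames:
        try:
            process(filename)
        except Exception as exc:
            # Errors go to stderr so stdout stays clean for downstream tools.
            print(f"{filename}: {exc}", file=sys.stderr)
            status = 1
    return status


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

The loop still earns its place even though xargs -n1 passes one file at a time: invoked directly as ./process crawl_dir/*.html, the same program handles a whole glob.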
Erosion and Accretion
The benefits of composable software are most apparent when the systems they are part of are changing. Command line tools help with frequent interactive and exploratory tasks in the shell by letting smaller sub-tasks be combined. Components in systems like React help apps absorb rapid development and frequent product changes, in part by making reorganization simpler.
When a system stops changing, it’s not uncommon for the components to accrete back into a monolith. This tendency is explored in Microservices and the Migrating UNIX Philosophy, which argues that components fuse when composability isn’t needed anymore, usually when the system goes into stasis. This is especially true when the system really is finished and the benefits of composability start to diminish in the face of potential optimizations. In some cases, though, the accretion is a matter of perspective and packaging: if I develop a set of tools to perform an analysis and then send the final packaged version to a colleague, that program might appear monolithic to them, depending on how they invoke it.
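Packaging can hide composition without removing it. As a hypothetical illustration (the function run_analysis and its jobs and command parameters are my own names), a colleague could be handed a single Python entry point that still just runs find and xargs underneath; from the outside it looks like one program. It assumes find and xargs are on the PATH.

```python
import subprocess
import sys


def run_analysis(root, jobs=32, command=("./process",)):
    """Run `find ROOT -type f -print0 | xargs -0 -n 1 -P JOBS COMMAND`.

    Hypothetical wrapper: the composed tools are still there, just hidden
    behind one entry point. Returns a non-zero status if either side failed.
    """
    find = subprocess.Popen(
        ["find", root, "-type", "f", "-print0"], stdout=subprocess.PIPE
    )
    xargs = subprocess.Popen(
        ["xargs", "-0", "-n", "1", "-P", str(jobs), *command], stdin=find.stdout
    )
    # Close our copy of the pipe so find gets SIGPIPE if xargs exits early.
    find.stdout.close()
    xargs.wait()
    find.wait()
    return xargs.returncode or find.returncode


if __name__ == "__main__":
    sys.exit(run_analysis(sys.argv[1] if len(sys.argv) > 1 else "crawl_dir"))
```

Whether this counts as a monolith depends on where you stand: the colleague sees one script, while anyone who opens it sees the same pipeline from Taco Bell Programming.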