John Obelenus
Solving Problems & Saving Time through Software and Crushing Entropy
The source code of your application or product is a representation. It is a model of a digital world you are creating. But remember:
“All models are wrong, but some are useful.” — George Box
All models are by definition conceptual. Until the rise of “knowledge work,” models referred to something tangible, something physical in the world we could touch. Software is metaphorically compared to construction, but in construction you have physical boards, nails, screws, and steel. When completed, you have a building. Software “knowledge work” is the first time (that I am aware of, at least) that we create a conceptual model of a referent we cannot touch. Well, we can, but it’s electricity; it hurts to touch it.
There used to be a time, long ago, in our galaxy, when source code wasn’t a model. Back then your source code was assembly, an instruction set for the specific hardware chipset your code would run on. You were programming, very concretely, the binary electrical state of specific memory addresses. We’ve come a long way, and we don’t want to go back.
High-level languages are now representational models of a conceptual digital world that doesn’t actually exist. To repeat: all models are wrong. And I don’t just mean bugs here, but truly “what you think is happening is not what is really happening.” This truth is why the space for instrumentation, observability, and serverless (or just containers) is growing so rapidly right now. We are, finally, admitting to ourselves that things are complicated and that we cannot simply reason our way through why N% of our distributed systems are always broken:
“We used to have an illusion that we were in control and nothing was broken. But our systems are full of emergent behaviors and are broken all the time.”— Charity Majors
Today, our languages and code are more powerful and expressive than ever before. You need fewer engineers to do more. So if you have a two-pizza team (or more) you are probably solving a problem with a very large surface area. We know that the more nodes (humans) we add to a network (a team), the more we will be forced to communicate: with n people there are n(n−1)/2 possible communication paths.
The job of technical management is to create a shared understanding in your team of an imperfect representation. (Aside: the job of people management is to keep the humans happy and working together well. A manager is responsible for both.) There are some additional things that make accomplishing this task tricky:
There are lots of good tools now to understand what your system is actually doing. There are lots of good tools now for understanding what your users are actually doing. But within your team(s) no one person has the big picture. Everyone has a part of the model of how things ought to work. Stitching that together repeatedly, with little margin for error, is very, very hard. There are not very many tools that are specialized for that.
There is no question: CI/CD has taken over as the you-must-be-this-tall bar for entering the “cool kids club”. There are plenty of organizations that have zero interest in going there; no judgement to them. But in addition to being technologically “cool”, CI/CD has immense business value, and there is zero doubt about it. But is CI/CD the end-all-be-all? No, there is still more to do.
Progressive delivery is what is next. There are plenty of folks who have been doing this for a while; there just wasn’t a name for it yet. If you don’t have continuous deployment, it’s very unlikely you’ll be able to achieve progressive delivery. Now let me define what I’m talking about.
Progressive delivery is a mechanism that slowly rolls out new changes to your user base, according to your choices. Under normal deployment scenarios, once your new code goes live every user gets the newness immediately. Or, in the case of feature flags, they are binary: on or off. Once you flip one, every user gets it. There are two cases where having a mechanism that controls how many users get a feature becomes incredibly useful.
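Before getting to those cases, here is a minimal sketch of what such a rollout gate might look like, assuming each user has a stable id. The function names are hypothetical, not from any particular flagging library:

```typescript
import { createHash } from "crypto";

// Hash a stable user id into a bucket in [0, 100). The same user always
// lands in the same bucket, so raising the rollout percentage adds new
// users without shuffling the ones who already have the feature.
function rolloutBucket(userId: string, feature: string): number {
  const digest = createHash("sha256").update(`${feature}:${userId}`).digest();
  return digest.readUInt32BE(0) % 100;
}

// A user sees the new code only while their bucket is below the
// current rollout percentage.
function isEnabled(userId: string, feature: string, percent: number): boolean {
  return rolloutBucket(userId, feature) < percent;
}

// isEnabled("user-42", "new-checkout", 5)  -> same answer on every request
// isEnabled("user-42", "new-checkout", 50) -> still true for everyone the
//                                             5% stage already included
```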
The first obvious case is error mitigation. If you have a change to a critical-path feature you can roll it out slowly. You can start with 5% of your user base, then 10%, watching to see whether your error rates increase at all. And at any point you can pull the plug. Note: this does not mean you should stop testing and make your live users test things for you; all your users will leave if you do that.
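That ramp-and-watch loop is simple enough to sketch. Everything here is an assumption about your environment: the Monitor and Flags interfaces stand in for whatever your event-monitoring vendor and flag store actually expose:

```typescript
// Hypothetical stand-ins for your monitoring vendor and flag store.
interface Monitor { errorRate(feature: string): Promise<number>; }
interface Flags   { setPercent(feature: string, percent: number): Promise<void>; }

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Walk the rollout up in stages, soaking at each one, and pull the plug
// (back to 0%) if the error rate climbs meaningfully above the baseline.
async function ramp(feature: string, monitor: Monitor, flags: Flags): Promise<void> {
  const baseline = await monitor.errorRate(feature);
  for (const percent of [5, 10, 25, 50, 100]) {
    await flags.setPercent(feature, percent);
    await sleep(15 * 60 * 1000); // let the stage soak before judging it
    if ((await monitor.errorRate(feature)) > baseline * 1.5) {
      await flags.setPercent(feature, 0); // pull the plug
      throw new Error(`Rollout of ${feature} aborted at ${percent}%`);
    }
  }
}
```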
The second case is when you have a complex system full of positive and negative feedback loops with your users. In a complex system you cannot assume that changing a behavior in feature X will have the desired outcome for user behavior Y. It’s too complex; humans are irrational meat-popsicles. So you can make a feature change that is fully functional by all objective measurements (e.g., it doesn’t cause 500 status errors) but that fails to produce the outcome you wanted. Note: if you don’t have an observable system you’ll never be able to know X, Y, or nearly anything else about your system.
Note that this is a slight, but meaningful, difference from an A/B test. In an A/B test you have a 50:50 split; otherwise uneven sample sizes will make it hard to trust your numbers. In my experience A/B tests are generally “smaller” in scope and focused on a very narrow behavior (even if that narrow scope is incredibly important), like driving more conversions by changing the sign-up flow. You could, of course, progressively deploy to a 50:50 split if you were worried about the efficacy of the A/B test. Progressive delivery and A/B tests are in the same ballpark, but if you have progressive delivery you get A/B “for free.”
In a strange turn of events at my $currentGig, a co-worker and I managed to build a progressive delivery application! We didn’t set out to solve exactly that problem, but once we were done I realized what we did. It is engineered for our specific deployment environments, and we built it in NodeJS, with the proxy-middleware package. It only took us two days to build and test, complete with respectable observability and sampling. I only point this out to show that, once you’ve decided what your priorities are, doing the actual work is not that difficult. We’re standing on the shoulders of our deployment platform and event-monitoring vendor, so if you don’t have those, you should look into that.
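For a sense of the shape such a thing can take, here is a minimal sketch (not our actual code) using Express with the http-proxy-middleware package; the upstream targets and the user-id header are invented:

```typescript
import { createHash } from "crypto";
import express from "express";
import { createProxyMiddleware } from "http-proxy-middleware";

const STABLE = "http://stable.internal:8080"; // invented upstreams
const CANARY = "http://canary.internal:8080";
let rolloutPercent = 5; // raise this as confidence grows

// Stable bucketing, as in the earlier sketch.
const bucket = (id: string) =>
  createHash("sha256").update(id).digest().readUInt32BE(0) % 100;

const app = express();

// The router option picks an upstream per request; bucketing on a user
// id header keeps each user pinned to one side of the split.
app.use(
  createProxyMiddleware({
    target: STABLE,
    changeOrigin: true,
    router: (req) =>
      bucket(String(req.headers["x-user-id"] ?? "")) < rolloutPercent
        ? CANARY
        : STABLE,
  })
);

app.listen(3000);
```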
Read more: Progressive Delivery
Engineers place too high a value on erasing technical debt. Over time I’ve come to agree with the value that management places on erasing technical debt. There is a time and place to fix it. But it is definitely not first. And it is probably not now.
We engineers need to remember that the cleanest, purest, technology and code doesn’t necessarily win. Because winning is about winning with customers and solving their problems.
I forget that my biggest fight is not with the computer, the compiler, or the system. That is merely a secondary fight to get the job done. The real fight is making sure you solve the right problem.
Solving the wrong problem (or no problem) is how you kill your company and product. Fix your tech debt when it becomes a pain point. But that pain is not fatal. Of all the tech companies we have seen fail, I cannot remember a single one whose failure story included technical debt.
Read more: Technical Debt Will Not Kill You
I’m currently working at the largest company I’ve ever worked for. That, along with having been at my $previousGig for nine years (the smallest company I’ve ever worked for), means I have a lot of re-learning to do. A lot of growth and change happens in nine years.
There is a lot of advice when you get promoted that takes the form of “Imagine what you would have wanted from __________.” You can fill that blank in with your manager, your peers, etc. When you get promoted to a line-management position this looks like: “I’m going to give my direct reports all the things I wish I had.” What this approach misses is that your direct reports do not all work like you; their brains are fundamentally different.
This is something I am learning quickly. Spending nine years working with a very small team of people led me to believe that most engineers were like us. By the end of nine years we were finishing each other’s sentences. Great for spreading information, knowledge, and decisions. Bad for… just about everything else.
One of the more fundamental differences in people’s brains is how they collect and organize information (these types are obviously not exclusive; this is a fluid spectrum). There are mechanical, methodological types, and there are organic, freewheeling types. I am, undoubtedly, the former. Key takeaway: there is nothing right or wrong about being one or the other. Both types are smart, intelligent, and get things done. It’s merely how they go about it that is different. And that makes working closely with someone of the other type very difficult, because each type thinks the other type is bonkers.
When you change roles, the things you wanted may not be the things your team wants, or needs. Being a manager means taking care of your team. It does not mean doing what you wish had been done for you. It is not some sort of strange business-revenge situation, or even a business-savior complex: “If only everyone did it my way there would be profits AND world peace.”
You have to understand who you’re working with. You have to understand how they tick, how they get things done. Otherwise, some of your reports will start looking at you like the worst boss they’ve ever had. This is just one piece of the puzzle to demonstrate to engineers that when they become a manager they’ve changed careers.
I don’t think I have to convince anyone that reading code is hard. Often it’s so hard that we throw up our hands and rewrite it. There is a saying that “tech debt is any code you didn’t write.” Because reading code is hard.
I don’t think I’m going out on a limb when I say: rewriting shipped-and-working code just because you find it hard to read is absolutely the wrong move. You are actively destroying learned knowledge embedded in that code. Knowledge is value.
Engineers who know that learned knowledge is value document their code. They document code because reading code is hard. Reading sentences describing why the code does what it does is easy. That is a way to protect your code from being rewritten and discarded—a way to protect the value you’ve created through your learning. When you find yourself learning about why an existing piece of code does what it does, and there isn’t documentation—go ahead and document it. Even without changing code you can add value to it.
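A contrived example of the kind of comment that protects that learned knowledge (the scenario and numbers are invented):

```typescript
// Before: the "what" is obvious, the "why" is lost.
const timeoutMs = 170_000;

// After: the learned knowledge travels with the code.
// Our upstream's load balancer silently drops idle connections at 180s.
// Timing out just under that turns a mysterious hang into a clear, fast
// failure. Learned the hard way during an outage.
const requestTimeoutMs = 170_000; // keep below the 180s idle cutoff
```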
I have the good fortune to be volunteering some time helping folks learn to code. I’m coming to a realization that they actually know how to “code”. Or rather, they know how to think through a problem in its constituent steps. And they can break steps into smaller steps. What they don’t know is “how to speak computer”. The syntax trips them up constantly. And they don’t have a good map of concept → syntax. This is why reading code is hard. And it’s why junior engineers out of college have, largely, the same problems. They may be better with syntax purely from having spent more time with it, but they still lack the map that translates a concept into syntax. That is something that comes with confidence and experience.
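To make that map concrete, here is an invented example of the same tiny problem expressed first as steps and then as syntax:

```typescript
// Concept, in constituent steps:
//   1. go through the orders
//   2. keep only the paid ones
//   3. add up their totals
type Order = { paid: boolean; total: number };

function paidRevenue(orders: Order[]): number {
  return orders
    .filter((order) => order.paid) // step 2: keep the paid ones
    .reduce((sum, order) => sum + order.total, 0); // step 3: add up totals
}
```

Learners can usually produce the numbered steps immediately; the filter/reduce vocabulary is the part that has to be learned.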
When people say “good code is eas(ier) to read,” they aren’t saying that reading code stops being hard. They are saying that this code is probably documented, and that it clearly demonstrates the concept it is implementing. Bad code obfuscates what is really happening. Good code makes it front and center. Good code plainly states what was learned in order to write it and why it was written.
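A small invented illustration of the difference:

```typescript
// Obfuscated: correct, but the concept is buried in mechanics.
const f = (xs: number[]) => xs.reduce((a, b) => (b > a ? b : a), -Infinity);

// Front and center: the name and shape state the concept directly.
function highestScore(scores: number[]): number {
  return Math.max(...scores); // -Infinity for an empty list, same as above
}
```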
Read more: Reading Code Is Hard