Blog

Code As Communication

20 Feb 2020

I'm fond of saying that code has no value until it is running in production. I still believe that is a correct and important statement. But I want to qualify it: code has no performative value until it is running in production. No one's job is complete, and nothing is "done", until it's running in prod.

But there is another value in written code: communication. There comes a time when you have written the one-pagers. You've written more detailed documentation and plans on what you're going to do. And you've had the meetings to discuss all of these ideas. But sometimes you still don't have traction with the folks involved. People still don't seem to be on the same page. To use an aviation metaphor: you're trying to take off, and there is a lot of drag keeping you from reaching takeoff velocity.

Start writing code. Don't even try to run it. Create a draft pull request. Keep writing code and cover more of the topic with it. There is communication value in non-running code. It can get people past whatever the mental blocks are.


Engineering Speed is a Symptom

1 Feb 2020

I think the speed of a team is the result of a function; it is the symptom of a working environment. It doesn’t happen by simply standing up and demanding that people work faster. Engineering teams deliver quickly when they have confidence. I don’t mean individual self-confidence, or a level on Maslow’s hierarchy of needs. And I certainly don’t mean the overconfidence of someone’s ego. The function is this: transferable experience of production systems results in engineering speed. A working environment is one that produces this transferable experience in engineers.

This is what my experience as a Technical Lead has taught me. Each individual has their own baseline, and their own limits. Not everyone shows up to work together starting in the same mental place. And not every company needs the same pace: a seed startup and a publicly traded company need different pieces at different speeds. There are always impediments and roadblocks that slow anyone down. Confidence comes from experience, and the experiences required, I think, are very specific. You know people have the right kind of experience when it is transferable. That means they come into a new scenario confident, and deliver on that confidence. But first I want to talk about what I believe are red herrings.

Here are two things everyone thinks you absolutely need in order to go fast. There is no doubt these things help, but the idea that they are required is a myth. Whether you have them or not, you still need the confidence that comes from experience. Neither of these gives you experience or confidence.

Tests & CI/CD systems

Test coverage is a hotly debated topic. How much coverage should you have? What does it mean when you have X% coverage? How do you know it is enough, or not enough? How long is too long for your test suite to run before it starts slowing your people and processes down? Unit tests prove one thing: that a highly specific input for an isolated piece of code results in a highly specific output. There is zero doubt that is useful in certain times and places.

But as the bugs grow the number of test cases grows. As the codebase grows the test cases and the number of bugs grow. That doesn’t sound fast to me, and it doesn’t sound like it scales. I’ve seen teams that have a lot of tests and these teams still move slowly, because they aren’t confident in what their system is doing and therefore are unsure how to keep moving forward.

Having a CI/CD system doesn’t necessarily give you confidence either. It is definitely helpful that code is deployed according to a consistent process, if only because it saves a human time and removes some possibilities of human error. But it is a tool, and just because you use a tool doesn’t mean you’re a confident craftsman.

Staging environments that match production

Once a system reaches sufficient complexity no other environment can ever match the real live environment. It doesn’t matter if you’ve set it up exactly the same way. It doesn’t have the same traffic, and it doesn’t have the same random collisions of events that are the ones that cause problems in production. I don’t understand why people think they cannot live without a fully-matching-prod staging environment for day-to-day operations. There is no doubt it is useful for certain things at certain times. But working in a staging environment is not that different from working on your laptop. What it teaches you is a learning curve that climbs pretty quickly during engineering, and then it plateaus. Forever. Therefore staging doesn’t bring confidence.

These tools are so common I am sure people cannot imagine living without them. Someone who has had these tools ought to be able to live without them. Why? Because their level of confidence isn’t given to them by these tools. At times these tools can help them gain confidence, but they are never the fundamental piece. Relied on by themselves, these tools give you false confidence. You can spot false confidence: teams still move slowly, and are worried about making changes.

We need to understand the reality of our systems

There is nothing that is at all like your production system. Nothing approaches behaving like it. The only way to gain the experience that will make you confident is by living in your production system. Unit tests bear almost zero resemblance to production. Poking around with features based on old data on staging is not living in prod either.

Living in prod means watching users use your system. That can mean standing over the shoulder of users (if you can be so lucky), or having enough observability to determine what is really happening. Standard monitoring that tells you your P95 is ~200ms is not enough. Outliers matter, because the outliers are real users having real problems who are going to complain about your product on Twitter. One day they are going to stop using it, and stop paying for it.
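
Here is a minimal sketch of why a P95 figure alone hides the people who matter. Everything in it is made up for illustration: the `requests` sample data, the `percentile` and `slow_requests` helpers, and the 1000 ms threshold are assumptions, not taken from any real monitoring tool.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def slow_requests(requests, threshold_ms):
    """Return the (user, latency_ms) pairs at or above the threshold, slowest first."""
    slow = [r for r in requests if r[1] >= threshold_ms]
    return sorted(slow, key=lambda r: r[1], reverse=True)


# Hypothetical sample: 97 ordinary requests plus three painful ones.
requests = [(f"user-{i}", 180 + (i % 20)) for i in range(97)]
requests += [("user-97", 4200), ("user-98", 9100), ("user-99", 15000)]

latencies = [ms for _, ms in requests]
p95 = percentile(latencies, 95)        # looks healthy on a dashboard
worst = slow_requests(requests, 1000)  # the users who are actually hurting

print(f"p95={p95}ms, max={max(latencies)}ms")
for user, ms in worst:
    print(f"{user} waited {ms}ms")
```

In this sample the P95 sits just under 200ms while three named users waited between 4 and 15 seconds. The dashboard number is true, and it still tells you nothing about them.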

Living in prod means knowing what your system is actually designed to do, and knowing how it works. I’ve been shying away from high-level abstractions and large dependencies more and more these days. Especially ones I don’t control or deeply understand. The more you understand about the fundamentals of your system the more confidence you’ll have. Who knows when the next left-pad incident is going to hit your system. Make sure you’re depending on things you need, and understand.

Defensive code that runs in prod is more important than unit tests. As a tech lead I find that I am writing more code to do tasks I’ve done before. Why? I am writing code that is noisy and fast to debug in production because I have a team to support. My job is not to write as much code as I can. Now my job is about communicating and being an example. Communicating examples of failure in production is extremely valuable information for my team. Especially when I may not be the one supporting a particular piece of code in the future.
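
As a rough sketch of what I mean by code that is noisy and fast to debug, here is a small function that refuses bad input loudly and logs enough context to diagnose the failure from the logs alone. The `billing` logger name, the `InvoiceError` type, and the invoice shape are all hypothetical, chosen only for the example.

```python
import logging

logger = logging.getLogger("billing")  # hypothetical service name


class InvoiceError(ValueError):
    """Raised when an invoice cannot be totalled safely."""


def invoice_total_cents(invoice):
    """Sum line items, failing loudly with enough context to debug from logs alone."""
    invoice_id = invoice.get("id", "<missing id>")
    items = invoice.get("items")
    if not items:
        # Log the shape of what we received, not just the fact that it was wrong.
        logger.error("invoice %s has no line items; payload keys=%s",
                     invoice_id, sorted(invoice))
        raise InvoiceError(f"invoice {invoice_id} has no line items")

    total = 0
    for index, item in enumerate(items):
        cents = item.get("cents")
        if not isinstance(cents, int) or cents < 0:
            # Name the exact item and value so nobody has to reproduce the bug to find it.
            logger.error("invoice %s item %d has bad amount %r",
                         invoice_id, index, cents)
            raise InvoiceError(
                f"invoice {invoice_id} item {index} has bad amount {cents!r}")
        total += cents
    return total
```

The happy path is three lines. The rest exists so that whoever supports this next can read one log line and know which invoice, which item, and which value went wrong.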

The big reason that trunk-based development and continuously shipping small changes are rising in popularity is that when you make small changes you ought to know exactly what should change in your system. And you’d better be watching your system after you ship to make sure that it changed, and that it is the only thing that changed. Observing what you ship brings confidence. Your CI/CD system is pretty useless when you work on long-lived branches and ship a lot of changes at once. Your tool is sitting idle most of the time. That does not instill confidence.
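
One way to picture "make sure it is the only thing that changed" is to compare a snapshot of metrics from before and after a deploy, and flag anything that moved without being part of the plan. This is only a sketch: the metric names, the numbers, the `unexpected_changes` helper, and the 10% tolerance are all assumptions for illustration.

```python
def unexpected_changes(before, after, expected, tolerance=0.10):
    """Return metrics whose relative change exceeds `tolerance` but were not expected to move."""
    surprises = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, 0.0)
        new = after.get(name, 0.0)
        baseline = max(abs(old), 1e-9)  # avoid dividing by zero for brand-new metrics
        change = (new - old) / baseline
        if abs(change) > tolerance and name not in expected:
            surprises[name] = round(change, 3)
    return surprises


# Hypothetical snapshots around a small deploy meant to speed up checkout.
before = {"checkout_latency_ms": 210.0, "error_rate": 0.010, "signup_rate": 0.050}
after = {"checkout_latency_ms": 150.0, "error_rate": 0.018, "signup_rate": 0.051}

surprises = unexpected_changes(before, after, expected={"checkout_latency_ms"})
print(surprises)
```

With a small change, the expected set is short and a surprise like the error rate jumping is easy to attribute. With a month of changes shipped at once, the expected set is everything, and the comparison tells you very little.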

Confidence is not about intelligence. It is about experience. Usually people get experience through pure time on the ground. But that takes a long time, and plateaus quickly unless that person gets involved in every different part of the org and code. If you put that person in a new place how much of their experience is transferable? Almost none of it, and they quickly lose any confidence they’ve built up.


My Own (New) Management Reflections

14 Jan 2020

I'm nearing the six-month mark in my $latestGig, and my role has turned into a product and people management role. I was hoping this would happen, but not before I got to fully build something myself. Such is life, ruining a well-formulated plan.

I've done the customer-facing part of Product Management before as a consultant. But this time around I have learned that Product Management is as much about getting alignment and agreement within your org about what to build as it is about getting it from users. There is less convincing required, but more prodding for the answer to: "And how does this affect you and your department's plans?" Of course this one runs both ways. Changes announced by other teams require getting involved to make sure that your services are still receiving everything they need, especially with new rollouts.

I haven't had a chance to manage people yet, and I'm excited for the new challenge. In these first few weeks I'm focused on communicating our team goals for the quarter, establishing expectations, and staying out in front of the team, both in terms of toil and in getting them all the definition and details they'll need to keep working without delay.

Over the last few years I've been less and less interested in what happens inside the black box that is the computer. I've become far more interested in how much a well-running team can really achieve. I'm interested in what structures you can engineer that are going to change the landscape for your team, department, and customers.


What Is My Culture?

11 Nov 2019

I just finished Ben Horowitz's "What You Do Is Who You Are". I want to focus on the biggest lesson I took from the book about creating your own culture. I sum it up this way: there is a specific belief or outlook you have, that others do not, which will determine the level of success of your endeavor.

It made me ask the question of myself. First off, my general "endeavor" has always been a well-functioning Engineering department, especially regarding "the Product" and Product Management. I admit, this is not the culture of a whole company that is looking to "change the world" (Ha!).

I think everyone would agree that "not wasting other people's time" is an obviously good thing. However, I don't think most people actually take it seriously. One example from the book that I appreciate is from Bezos at Amazon. Meetings have agendas. They take the form of a written explanation or argument, not merely a set of bullet points. Every meeting starts with silence while everyone reads the agenda. This also necessitates being on time. If everyone came on time and prepared, having read the agenda beforehand and learned what they needed to know, and then read the agenda again to get their head in the right spot for the meeting, you could finally have a productive meeting. Making shocking rules around being prepared for meetings, and living by those rules, really has the chance to shorten those OODA loops, so you can waste less time as an organization, and have better reactions to the world.

But what belief do I have that is unique, that others would likely disagree with, that will make a critical difference in these teams' performance? If I were to try and put this into words it would come out something like: "Build so you can walk away." In my career, this has been the defining element of my own success. Over the last ten years, every production system I've built, all four of them, is still running. And I've walked away from all of them to build the next one. This was absolutely based on a constraint: for three of those projects I worked at a three-person company. In order to do more work, we had to stop doing other work. But the software kept running and people used it every day.

I view this topic as the primary difference in the maturity of the software industry versus other industries. What happens as we continue to work on our software products? We hire more folks, and create more features. More, and more, and more. But, at some point, you need more and more people to even keep up with managing what you've created. That is not scaling. In other industries, like construction, the size of the crew doesn't change much. They move from site to site. Once they leave, the building is up and running and more or less self-maintaining, with some basic oversight that is nowhere near the level of investment it took to construct the building.

Underpinning this belief is an axiom that the most expensive part of any work is the cost of communication. That is why "adding people to an already late project makes it even later." Will Larson's discussion about migrating services also sticks in my head on this topic. When you're in a growth trajectory your processes and software will be useless roughly every 18 months, because so much has changed on the ground that what worked at one size no longer works at the new size. Who are the people most capable of building the new system? The people who are watching the current one fail (assuming everyone is reasonably equal in terms of technical ability). They won't have the ability to build the next one if they can't walk away from the current one. There are many reasons people need to walk away from something; the largest of all is when they leave a job.

My favorite parts of the book were the various "shocking rules" of each of the cultures the book went over. What would my own shocking rule be? How would it enforce my culture of "build so you can walk away"? I'm still working on this, but it would be something along the lines of a forced vacation or rotation every few months, or after shipping a project. I'm not sold that this is precisely the rule, but that is what my brain is stuck on right now.

This has a dynamic similar to the "should we release/deploy on Friday?" question. If I ask you how confident you are in what you're about to release/deploy, you say "Very confident!" If I then say "Great, release it on Friday at 5pm!", would your answer change with the threat of having to work on the weekend if it goes wrong?

If I told you that tomorrow you will be working on something new, what would happen to your last work? Is there enough documentation for someone to figure out how it works? Are there enough tests so that other people can confidently make changes without worrying they're going to break the world? How many times a day is someone going to interrupt you with a question that only you know the answer to? How quickly will that exhaust you and mean you can't actually make progress on the new priority?

Moving people around regularly can build all these muscles. These muscles are valuable for creating the most value with the fewest people. Fewer people means paying less communication tax. That is scaling, not just getting more humans into the building. This process will also create a selection bias toward people who are good at communicating. Why? Because I'm forcing them to. I'm forcing them to communicate ahead of time, through writing, through tests, through agendas and meetings, through good design and UX. Someone who stays on the same team, on the same codebase, for a long time, no longer has to communicate, because everything is known to them. And it's all in their head. How does this data get out? Usually by random osmosis, when other people who need the data ask random questions. It's almost never systemic. I want my culture to be one of systemic communication so that you can build something and walk away from it, knowing that it will keep working, and that others have the resources to onboard themselves to it.


Ruffled Role Feathers, Why We Need Generalists

2 Nov 2019

This image has been the source of a lot of consternation lately. It seems that a lot of people think the role of "Full Stack Developer" is "way too many things for one person to possibly know", and consequently "if anyone claims to know all this, they must be fibbing."

I don't think there is a problem with the role of Full Stack Developer. Disclaimer: I am one. Obviously I am biased, but clearly I am not the only one who thinks this arrangement is a good thing. Full Stack is a generalist role. We need generalists. With all the talk of how good "T-Shaped" individuals are for organizations, I'm surprised to hear as much negative press for the role as I do.
