John Obelenus
Solving Problems & Saving Time through Software and Crushing Entropy
I’m being a little tongue-in-cheek. But only a little. With the success of Ruby on Rails, Django, and a dozen PHP frameworks (not even mentioning the MS stack), every web developer has a standard choice in their flavor of programming language. And, as with IBM, no one gets fired for choosing the standard object-relational mapper on top of a relational database, no matter which one you pick.
Don’t misunderstand me, these frameworks are all fantastic, and each one has something that makes it unique. I even have my favorite. Back when I was in college (I won’t tell you when) none of these frameworks existed. Heck, JSON existed only as a spec, but most people wouldn’t find out about it, and start using it, until well after I graduated. We wrote queries and managed our databases by hand. And rather than JSON we played around with SOAP.
Back then there was a lot less data. A lot less. And the data was orderly. Before relational databases, business data was entered into spreadsheets: VisiCalc, Lotus, Quattro. The only interface to those programs was a screen and a keyboard; there was no programmatic access. RDBMSs allowed programmatic access over a network, but on the whole it was the same data that mapped easily into a spreadsheet.
Not anymore. Now our data is messy. Very messy. So messy we can’t agree how to fix it just yet, or what a “fix” even looks like. We know we have a lot of questions and the answers are in the data. Somewhere.
Based on my experience, you never know ahead of time what you’re looking for in the data. When you translate messy data into nice, clean, related tables, you’re going to lose things. The solution is: leave it messy. There is real value in messy data. The problem is that messy data doesn’t work well with ORMs and RDBMSs.
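To make that concrete, here is a minimal sketch in plain Node (the event payloads and the column list are made up for illustration) of how forcing messy records into a fixed set of columns quietly drops information, while keeping the raw document preserves it:

```js
// Two "signup" events from different sources; the shapes don't match.
const events = [
  { user: 'ada', plan: 'pro', referrer: 'newsletter' },
  { user: 'bob', plan: 'free', utm: { source: 'ads', campaign: 'spring' }, invitedBy: 'ada' },
];

// The relational mapping only knows about the columns we designed up front.
const COLUMNS = ['user', 'plan'];

function toRow(event) {
  // Anything that doesn't fit a column is silently lost.
  return Object.fromEntries(COLUMNS.map((c) => [c, event[c] ?? null]));
}

console.log(events.map(toRow));
// [{ user: 'ada', plan: 'pro' }, { user: 'bob', plan: 'free' }]
// The referrer, utm, and invitedBy fields are gone.

// Keeping the document as-is (a document store, or a JSON column) loses nothing,
// and we can still ask questions we hadn't thought of when designing the schema.
console.log(events.filter((e) => e.utm?.source === 'ads').length); // 1
```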
That is why it’s a problem that these systems are so easy. I’ve been baited into believing I will have nice, clean data. But, after being bitten, I know better than to be fooled again. RDBMSs make me so happy when it’s easy, because it should be.
“Easy things should be easy, and hard things should be possible.” — Larry Wall
The thing is, messy data isn’t hard anymore. It’s only hard in these systems. ORMs and RDBMSs won’t go away; they will always have a place. But they have to share now. There are other systems, and they are gaining ground, where messy data is easy.
Read more: RDBMS + ORMs are too good. It is a problem.
At the $NewGig we run a lot of services with NodeJS. Unless you live in a tech-less cave, you know Node is “all the rage” these days. I won’t offer my $0.02 on everyone’s take; that would be a waste of both of our time.
Everyone will tell you that “you get to do two things at once!” That is not entirely true; it’s complicated. But at least you get to think about it that way. And while many languages are still catching up with asynchronous operations, it’s easily Node’s biggest advantage. But it’s not what has me most excited.
Coming from more classical languages and environments, I love how much flexibility NodeJS gives you. The event loop affords you some nice design choices. In other environments, everything has to happen within the lifecycle of the request. To get even the smallest thing outside of that request lifecycle, you need to spin up a background message queue, a fully separate piece of architecture.
Not so in NodeJS, because of the event loop. You can send the response back at any point, and any Promise’d work you have kicked off keeps running on the event loop afterward. Don’t get me wrong, this doesn’t mean background message queues are dead now. Far from it. And this technique can easily be abused and trash your performance. But you get to do some very nice, small operations without standing up a whole separate infrastructure and re-querying for data you already have in memory.
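A minimal sketch of the mechanism, using only Node’s built-in http module (the handler and the follow-up work are made up for illustration):

```js
const http = require('http');

async function expensiveFollowUp(payload) {
  // Stand-in for work that doesn't need to block the caller:
  // comparisons, metrics, enrichment, writes to a slow store, etc.
  await new Promise((resolve) => setTimeout(resolve, 2000));
  console.log('finished follow-up work for', payload.id);
}

const server = http.createServer((req, res) => {
  const payload = { id: Date.now() };

  // Respond right away; the caller is not kept waiting.
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ accepted: true }));

  // The promise keeps running on the event loop after the response has gone out.
  expensiveFollowUp(payload).catch((err) => console.error('follow-up failed', err));
});

server.listen(3000);
```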
So far I’ve been able to do a few things with it.
I’ve increased observability by taking some computationally taxing comparisons out of the request path. Sending all that data back out over a message queue would be a terrible waste of bandwidth, creating needless complexity and new bottlenecks.
I’ve kept an API endpoint snappy by returning 200 as soon as I get a POST. In this case I am lucky, because returning an error is useless for the calling system. I also get to ignore any timeout settings from the caller, and just happily crunch away on this data, which can actually take some time based on what we’re doing.
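A hedged sketch of that kind of endpoint, assuming Express (the route, payload shape, and crunchData helper are hypothetical):

```js
const express = require('express');
const app = express();
app.use(express.json());

// Hypothetical long-running work on the posted payload.
async function crunchData(body) {
  // ...heavy processing that can take longer than the caller's timeout...
}

app.post('/ingest', (req, res) => {
  // Acknowledge immediately; an error response is useless to the calling system anyway.
  res.sendStatus(200);

  // Keep crunching after the response; the caller's timeout no longer matters to us.
  crunchData(req.body).catch((err) => {
    console.error('ingest processing failed', err);
  });
});

app.listen(3000);
```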
NodeJS is very, very flexible. It is bare-bones. And it is fast. Layers of abstraction are powerful, and our entire world of computing is built upon them. Hundreds of layers by now; it’s kind of too much, but there is no turning back. Many other environments add many more layers of abstraction just to serve some HTML or JSON text. NodeJS doesn’t. I don’t need more abstractions, especially leaky ones. Give me power and let me wield it. NodeJS does just that.
Read more: Fun with the Event Loop
The best way to save time while solving problems is to use the right tool for the job. Every one of those hammers has a distinct purpose. If you ask the master, nicely, he just might explain the tools. There are more and more tools now; the ecosystem truly is huge. Lots more than when I began my journey of writing code. It is mind-boggling to try to keep up with everything.
The biggest problems I’ve had during this journey come when I use the wrong tool for the job. I’m still able to get the job done and solve the problem. But around half-way through I get a nagging thought: “There must be a better way.” You’re writing code that works against the grain of your tool. Here is a classic example that has bitten me often: tracking historical changes in a relational database. Relational databases are the wrong tool for this job, because their strength is keeping the current state normalized, not recording what has changed along the way. Depending on what you’re tracking, the right tool may be a time-series DB, or it may be a document store (you always hated locking your tables with ALTERs, didn’t you?).
But in an effort to get things done as quickly as you can, you smash the historical data into the relational database. You actually have to write a bunch of code to get this done. And it’s not the cleanest code in the world either.
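Something like this minimal sketch, assuming node-postgres and hypothetical widgets and widget_history tables: every state change means hand-rolling a second write and a transaction.

```js
const { Pool } = require('pg');
const pool = new Pool(); // connection settings come from the environment

// Hypothetical: update a widget *and* remember what it used to look like.
async function updateWidgetStatus(id, status) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');

    // Keep the "current state" table normalized...
    const { rows } = await client.query(
      'UPDATE widgets SET status = $1, updated_at = now() WHERE id = $2 RETURNING *',
      [status, id]
    );

    // ...and separately smash the history into its own table, by hand.
    await client.query(
      'INSERT INTO widget_history (widget_id, status, changed_at) VALUES ($1, $2, now())',
      [id, status]
    );

    await client.query('COMMIT');
    return rows[0];
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```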
One lesson I am trying to take to heart more often: the more code you write, the greater the surface area you create for bugs. If you use the right tool, you don’t have to write the ugly code that is full of them.
Chances are that if you have perpetually buggy code, something in your ecosystem was not designed for how you’re using it. Square pegs don’t fit in round holes.
The objective here is not “clean code”. The objective is solving the problem and saving time. Because time is money. Using the right tools saves you money. And making good tools is a good way to make money.
That is why there are so many good tools now. The generation of software engineers before me realized they were using the wrong tool. They went out and built the right one. There are lots of different problems that people are trying to solve. That is why there are so many tools. Figuring out what each of them is good at is now part of our job.
I am very excited about the streaming architectures that are being developed: Kafka, Samza, Flink, Storm, etc. There are problems that arise “at scale” where throwing more boxes at the problem makes it worse, or at least doesn’t fix the underlying issues. This is a complex problem, so, naturally, these new systems are very complex. Streaming architectures are built to handle ordering (making sure that A happens before B, which happens before C) while remaining parallelized (so they can handle hundreds of thousands of requests per second).
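As one concrete example of that ordering-plus-parallelism trade, here is a minimal Kafka producer sketch (assuming the kafkajs client and a hypothetical order-events topic): messages that share a key land on the same partition and stay in order relative to each other, while different keys spread across partitions and can be consumed in parallel.

```js
const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'orders-service', brokers: ['localhost:9092'] });
const producer = kafka.producer();

async function publishOrderEvents() {
  await producer.connect();

  await producer.send({
    topic: 'order-events',
    messages: [
      // Same key => same partition => "A before B before C" holds for this order.
      { key: 'order-123', value: JSON.stringify({ step: 'created' }) },
      { key: 'order-123', value: JSON.stringify({ step: 'paid' }) },
      { key: 'order-123', value: JSON.stringify({ step: 'shipped' }) },
      // A different key can land on a different partition and be processed in parallel.
      { key: 'order-456', value: JSON.stringify({ step: 'created' }) },
    ],
  });

  await producer.disconnect();
}

publishOrderEvents().catch(console.error);
```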
I lie awake at night realizing that the code I agonized over to get just right, finding those last troubling bugs by reading megabyte after megabyte of logs and traces, didn’t have to exist at all. Because I was using the wrong tool. And the solution I ended up with was actually a bug-riddled, poor man’s version of these shiny new tools being developed.
Read more: Many Hammers, Many Nails
In every industry, vendor/client relationships require careful management, and they can easily fall apart. However, I think there is something uniquely difficult when it comes to software.
The software industry is barely getting started if you compare it to other industries like commercial manufacturing (Henry Ford), or civil engineering (the Romans).
We remain at the beginning of creating the truly re-usable building blocks of the software industry. It’s not for lack of trying! In construction you can take for granted nails, screws, 2x4s, and 4x4s, as well as building codes designed by engineers that say “You will not do …” and “You will do …”, or your project gets slapped with a failing grade. Many decisions have been made for you.
We don’t have nails, screws, and 2x4s. We might have something called Screw, and within it is every single possible option a screw could have: length, thread width and spacing, head type, finish, base metal. Every single time you want to use Screw, you have to make all of those decisions over again. Some you care about; some you don’t. And if you’ve ever wandered the halls of Home Depot, you know there are options you’ve never even heard of!
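A toy sketch of the difference (the Screw class and drywallScrew helper are made up for illustration): one forces every decision on every use, the other is a standard part with the decisions already baked in.

```js
// The "everything is an option" version: every caller decides everything, every time.
class Screw {
  constructor({ lengthMm, threadPitch, headType, drive, finish, metal }) {
    Object.assign(this, { lengthMm, threadPitch, headType, drive, finish, metal });
  }
}

const custom = new Screw({
  lengthMm: 40,
  threadPitch: 1.5,
  headType: 'bugle',
  drive: 'phillips',
  finish: 'phosphate',
  metal: 'steel',
});

// The "standard part" version: the decisions were made once, by someone who knew better.
function drywallScrew(lengthMm = 40) {
  return new Screw({
    lengthMm,
    threadPitch: 1.5,
    headType: 'bugle',
    drive: 'phillips',
    finish: 'phosphate',
    metal: 'steel',
  });
}

const standard = drywallScrew(); // just "a drywall screw", like grabbing a box off the shelf
console.log(custom, standard);
```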
In construction you know the cost of all your supplies, and you know what you’re paying for labor. So it’s fairly easy to make a fixed-bid contract, cross your fingers that nothing goes wrong, and if it does, it’s easy to show the homeowner the problem and plead for more money. They even do it on HGTV!
But in the world of software, there is a reason fixed-bid contracts are uncommon these days. Many, if not the vast majority, of the relationships between clients and vendors are time and materials (T&M). There are more unknowns, and the number of moving pieces is far greater. But I think the biggest problem, that “uniquely difficult” something, is a far greater knowledge gap. I may not know which screws to use on which project, but if you explain it, it makes sense. Software is so ephemeral that if you try to explain why this one operation is slow because the query engine falls back to a table scan, because it cannot combine the B-Tree indexes to get the result set in a fraction of the time, your client’s eyes glaze over. They slam the desk and say “Make it faster, that’s what I’m paying you for, isn’t it?!”
Clients pay you to just do what they want. They have zero incentive to understand what their vendors are doing; they just want results. Developers have every incentive to believe their clients, to believe that they are good and truthful people who understand their business, their processes, and what they want at the end of the day.
When developers finally get to the underlying truth of the matter, they have every incentive to explain what is happening to their client. However, the client has no incentive to believe the developer understands their issues: just do what I say. This isn’t a car, where you can take out the broken water pump and show them, or point to the transmission fluid leaking out through a hole in the line. The client has every incentive to believe in forward progress, just like in every other industry. After you’ve instructed the factory to create one widget, making 1,000 widgets is a lot faster and easier, right? There is no incentive to go back to the drawing board for version two!
It comes down to trust, just like every other relationship. But money is involved here. With a giant knowledge gap between the two parties and money flowing in one direction, that trust can quickly erode. Niceties and codes of conduct are absolutely necessary, but they cannot fix the structural imbalances in these relationships. It is easy to point at details and specific events where trust eroded. But if you don’t realize the game you’re playing, you can only keep juggling for so long.
In my opinion, there is one more structure we need to give this relationship, one that alters the incentives at play. And that is a baseline understanding that you are a partner to your clients, not just a vendor.
Read more: Reflections On a Decade In Consulting