Upgrading to a new point release of the Linux kernel isn't the riskiest thing you can do in your infrastructure, but it's not completely without risk. It's not so much that the kernel maintainers are playing fast and loose- they're a very conservative bunch for the most part- but some application code might be making assumptions that become incorrect in the next release.
For Ramona's company, that happened. They updated the kernel, and immediately one of their libraries, written in C++, started segfaulting. Now, the developer behind it was still with the organization, but had climbed the ranks- he was no longer a Software Engineer, or even a Senior Software Engineer, but was now a Senior Solutions Architect and was not to be bothered with trivial things.
So without bothering the very important architect, Ramona had to pick apart the offending C++ code on her own. It was older C++, and that was always worrying- conventions and style had changed a lot since this was written. There were no signs of any documentation or tests or anything that would help someone understand what the code was doing.
Fortunately, it turned out that the code was simple. Well, "simple". It was a pile of system calls. No, not syscalls, but calls to the system function, which allowed the C++ code to execute shell commands. Essentially it was a giant shell script wrapped in C++ code, and the C++ code didn't contain any logic to handle errors, exceptions, or even verify that the return buffers contained valid data.
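For a sense of what the missing checks might have looked like, here is a minimal sketch, assuming a POSIX system. It is not the original library's code: the names run_command and CommandResult are hypothetical, and it uses popen rather than system so that the command's output can be captured and checked.

```cpp
#include <cstdio>       // popen, pclose, fgets (POSIX extensions to stdio)
#include <stdexcept>
#include <string>
#include <sys/wait.h>   // WIFEXITED, WEXITSTATUS

// Hypothetical result type: the command's exit code and everything
// it wrote to standard output.
struct CommandResult {
    int exit_code;
    std::string output;
};

// Hypothetical helper: runs cmd through the shell, collects its stdout,
// and throws if the command could not be started or did not exit normally.
CommandResult run_command(const std::string& cmd) {
    FILE* pipe = popen(cmd.c_str(), "r");
    if (pipe == nullptr) {
        throw std::runtime_error("could not start command: " + cmd);
    }

    // Read the output in chunks instead of assuming a fixed-size
    // buffer was filled with valid data.
    std::string output;
    char buffer[256];
    while (fgets(buffer, sizeof(buffer), pipe) != nullptr) {
        output += buffer;
    }

    // pclose returns the wait status of the shell, or -1 on error.
    int status = pclose(pipe);
    if (status == -1) {
        throw std::runtime_error("pclose failed for: " + cmd);
    }
    if (!WIFEXITED(status)) {
        throw std::runtime_error("command terminated abnormally: " + cmd);
    }
    return CommandResult{WEXITSTATUS(status), output};
}
```

A caller would then look at exit_code and output before trusting either, which is roughly the check a shell script gets from testing $? after each command.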
So, it was dangerous, pointless C++ code, but surely the Senior Solutions Architect had a good reason for writing it in the first place? He'd never have risen to such stratospheric heights in the company without having a good reason for doing that?
Ramona took the risk and contacted him: "I'm a little unclear what this code is for? Why is this in C++?"
"Ah," the Architect replied, "I was learning C++ at the time, and everything looked like a C++ solution. Plus any chance to improve my skills!"
The fix was to junk the C++ and replace it with a shell script. But it did mean that Ramona started to pay attention to what the Architect was doing, especially when the Architect got yet another promotion to "Software Manager". This allowed him to use his experience and advanced decision-making skills to really make mistakes.
For example, they had a bunch of older rackmount servers, about $100k worth, and a relative handful of new servers. Now, the new servers could provide all the capacity they needed right now, and as they grew, they'd want to replace the old servers with even newer ones- so he might as well chuck all the old servers now and make room against that future day.
There was just one problem. The Architect-turned-Manager had also decided that the development team was suffering with laptops that were, on average, two years old. They all needed new computers, in his opinion, so he'd already spent the capital budget on that.
This meant that, when they needed to expand their server capacity, there was no money to do it, and no older hardware they could press into service as a stop-gap. This made end users angry, as everything became less responsive or outright failed. That made management angry, because angry users hurt the bottom line, especially when it meant they were violating SLAs.
The good news is that the company has an opening for a new CTO. The Architect-turned-Manager is already angling for that promotion, and odds are good that he'll get it. Ramona, on the other hand, is already polishing her resume for a change to another job entirely.
This post originally appeared on The Daily WTF.