Perfect Uptime

"So," Burt said, "our bonuses at the end of the year are based on our systems uptime?"

"Well," the middle manager introducing their new incentive program said, "not exactly. It's part of your team's performance targets, so we calcula-"

"Yeah, I get it," Burt said. "But your top tier goes up to 99.999% What about those of us who get 100% up-time?"

"Well, 100% is unreasonable to expect and-"

"Our phone system can. I set up the PBX myself. It's not going down. It's the most reliable system in the company. You should give a higher bonus to teams that can hit 100% uptime."

"Ah…" The manager paused, nodded, and said, "I'll look into it," in the tone that made it quite clear that nothing was getting looked into but the conversation needed to move on.

This was circa 2002, and Natalie had just joined Initech's operations team. Mostly, she was doing the unglamorous work of untangling the rat's nests of Ethernet cables left by the last person in her position, discovering that the spreadsheet containing server rack assignments was incorrect, and swapping in the new UPS system for the server room.

The door next to the server room had two large buttons next to it. One of them was an accessibility button which would trigger the automatic door opener. This was useful for Natalie, when she had a cart-load of lead-acid batteries to replace. The second was the emergency stop button, which would cut power to everything in the server room.

Now, everyone was quite aware that two similar buttons right next to each other but with wildly different functions was a problem. But upper management didn't want to spend money to alter the layout or replace the buttons with something to make them more visually distinct. But they needed to do something.


So Burt, ever the problem solver, solved it. He cut a hole in the bottom of an old 5 1/4" floppy disk storage container, attached it to the wall over the emergency button, and demonstrated two features. First, by lifting the lid, you could easily and trivially access the emergency stop button. Second, it was just held in place with the weakest of drywall anchors and could easily be ripped from the wall in a real emergency. Someone decided that this was safe enough, and that "mitigated" the risk of an accidental shutdown.

Near the time for yearly reviews, both Natalie and Burt happened to be doing some maintenance at a secondary site. Simultaneously, their phones rang: someone had pushed the big red button and shut down the server room. They rushed back to address the problems that this caused, though something about the call stuck in Natalie's brain.

Security footage, reviewed later, showed that the culprit was the "better idiot" who foils any idiot-proof system: someone with a cart full of printouts walked right past the door-opener and spent several moments fiddling with the floppy disk case to figure out how to lift the lid and press the button inside.

No one was concerned with the security footage when Burt and Natalie arrived. The server room was controlled chaos, as everyone was scrambling to manually reboot servers, resolve issues with machines not coming back up, and generally panicking-not-panicking because nobody had ever really tested a full recovery and no one was sure if their procedures worked.

And that's when Natalie realized what had struck her as odd about the phone call. It had come from one of the company numbers. One of the ones that ran through their PBX. The PBX was in the server room, so it theoretically should have gone down when the emergency stop happened. But no: it had never rebooted. It had just hummed along quietly while everything else crashed.

Natalie turned to Burt. "Wait, how is the PBX still up?"

"Oh, see that rack over there? It's got a rectifier and a bank of deep-cycle marine batteries. When the main UPS cuts out, our phone system has a backup power supply."

In the yearly review, Burt's PBX system got a perfect 100% uptime rating. Burt's performance was not given equally high marks, though. Creating a fire hazard in the form of your own private battery bank and homebrew UPS, and not integrating it with any of the key safety systems in the server room, doesn't endear you to management.

In the end, Burt got a scolding and a negative note in his employee file. His PBX got put on the main power supply, sharing the same main and backup power as everything else in the building. The next time someone accidentally pushed the big red button, Burt's phone system didn't maintain its improbable 100% uptime.


This post originally appeared on The Daily WTF.
