So, we’ve been dealing with an uptick in outages lately. We’ve had a few severe storms roll through our area and have subsequently experienced some extended outages at a leased location where our admin offices are housed. The first storm caused an outage of a day and a half (which is unheard of for us). The second outage, a week later, lasted half a day. Then, not long after, we had phone outages due to some faulty lines (the entire metro area was affected), likely related to the aforementioned severe weather.
When these outages occur, we have to scramble to get our operator up and running at our clinic facility. Not a terrible task, but this marks the beginning of a string of decisions based on conjecture: will the power be back on in an hour? Is it worth it to send the hourly staff home? What if we start the process of moving key staff to the clinic and then the power comes on?
…and so on.
Fast-forward a week and a half. Many emails were sent out about updates to our outage instructions. It’s a quiet Friday, so, being the retroactive forward-thinking network admin, I decided to catalog EXACTLY which servers were plugged into what battery units in the server room, so I can say with certainty what will go down in any situation of power loss.
I dig into the back of the server cabinet, and it’s a mess. Cables everywhere. But, for the most part, they are pretty locked into their ports/plugs and don’t move around too much…
I start easing my way through the mess trying to trace power cables, when all of a sudden, I hear the telltale sound of a server powering off and a bunch of lights going dim and then back on again.
The word “SHIT” lurched forth from my lips as I bolted through the door, yelled to my co-worker, “you’re going to get some calls,” then turned around and ran back into the server room. It turns out the plug for a power distribution unit (think power strip for servers) had been on the verge of working itself out of the battery unit, and it finally came loose enough to drop power in spectacular fashion. I’m guessing my earlier excursions into the server rack for other reasons helped it along.
I knocked our entire EMR system, company home folders and monitoring server offline while documenting what would happen in a power outage situation.
Not my best moment, that’s for sure.
Picture Source: [jeremyfoo (CC)]