The client (I’ll call them AirStuff) had around forty employees and did design and fabrication of air handling systems. They had four “beige box” servers, each with a minimum of four 160GB SATA drives in a RAID5 array. All told, in 2007 they had around 3TB of storage and were closing in on 75% utilization of it.
We – their MSP – had been hounding them for a year and a half to get a real backup system to replace their round-robin of SERVER-A to SERVER-B and SERVER-B to SERVER-A for four fairly robust servers. It was always, “We don’t have the money for a tape library.”
My phone rings. This is a bad sign, as my phone rings when there are intractable problems. AirStuff has a server down.
Now, a bit on AirStuff: Their receptionist is our point of contact, a lovely woman with a wry sense of humor and a fine sense of when things are going wrong. We work with her because although she’s not an IT pro in any sense of the word, she is very descriptive when there are problems, follows directions precisely and remembers solutions we’ve given before. In other words: the best point of contact you can expect. They also keep spare server drives on hand for when drives fail – same makes and models as the originals.
She reports: “SERVER-A is down. I went back to it and it was powered off. When I turned it on, I got this screen I’ve never seen before. It says ‘RAID configuration’ and I thought I should call.”
I grab my bag and head out the door, arriving at AirStuff about forty-five minutes later. I talk with the receptionist for a minute to let her know I’m here and get her impressions. She’s baffled.
I head back to the server rack – a wire rack with the four boxes and the switching hardware – and get a look. Sure enough, the server’s dropped into its RAID card BIOS and is looking for configuration. I exit that screen and power cycle the server to get a clean start.
No joy, it wants RAID configuration. I dig into what it’s reporting for the existing array and back away as if it’s possessed. Their 4 x 250GB SATA RAID5 no longer thinks it’s a RAID5 array. It thinks it’s TWO RAID5 arrays: one that’s missing two drives, and another that’s missing three. I play in the console, power down, replace the known-bad drive in the second bay and hope for the best. No joy, it’s fatal.
I go out to reception and give her the news – we have to blow away the RAID array, rebuild from scratch and restore. She calls in the head of Engineering, as it’s his server for their CAD drawings. I explain, “Because that was the backup server, I don’t know where the backups for that server are. Which server was it backing up to?” He gets this troubled look. “No other server had enough space for the 600GB of files, so it was backing up to itself.”
I blink and stand quiet for a few seconds as that bit of news sinks in before telling him the news. “The server is dead and I cannot resurrect the data. I have someone I can call; sight unseen – and this is not a hard quote – you’re looking at five figures to recover the data and there’s no guarantee he can recover anything at all. How do you want to proceed?”
He needs the data. I make the call, collect their server, and drop it off at the recovery specialist’s place. It takes a full week, but he delivers 100% of the data back on a 1TB external drive.
So, in total:
Tape library, because backups are now cool to have: $9,000
Recovery of data and copying back to the rebuilt server: $12,000
Missing the bid deadline for a large open-bid project for which they were pretty much a shoo-in, because the drawings were unavailable while the server was down, thus losing both the bid and the construction: $3,200,000
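For anyone who wants the tally spelled out, here is a quick back-of-the-envelope sketch in Python. The three figures come straight from the list above; the labels, the variable names and the comparison against the tape library price are my own framing, and it assumes (which nobody can prove) that a working backup would have avoided the other two line items.

```python
# Figures from the list above; labels and names are illustrative only.
costs = {
    "tape library": 9_000,
    "data recovery and restore": 12_000,
    "lost bid and construction": 3_200_000,
}

# Everything the outage ended up costing, including the tape library
# they bought afterwards anyway.
total = sum(costs.values())

# Assumption: with a real backup in place, only the tape library
# would have been spent.
beyond_backup = total - costs["tape library"]

# How many tape libraries the rest of the bill would have paid for.
ratio = beyond_backup / costs["tape library"]

print(f"Total: ${total:,}")
print(f"Spent or lost beyond the tape library: ${beyond_backup:,}")
print(f"That is roughly {ratio:.0f} times the price of the tape library")
```

Run as written, the total comes to $3,221,000, of which $3,212,000 sits outside the price of the backup system they had been declining for a year and a half.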
TLDR: Client refuses to see urgency of backup, server fails, resulting in loss of millions in potential revenue.
via: [Spiceworks Community]