Colossal IT Fail: Accidentally formatting hard disks of 9,000 PCs and 490 servers

HOLY CRAP.  Here is an IT horror story from Rod Trent over at myITforum.com (a great site if you get the chance to head there):

A bit of helpful information for the uninitiated:

SCCM (System Center Configuration Manager) from Microsoft is a tool IT administrators use to automate the deployment of patches, software, and settings to collections of computers (grouped, perhaps, by memory capacity, hard disk space, OS version, and so on), while also giving them a very granular inventory of their systems at any one time.

This is a story of SCCM gone wrong…but not because SCCM is bad, mind you, but just like anything…if you don’t know what you are doing or aren’t paying close enough attention, the tool you thought was so useful can be your undoing.

In “Lessons in what NOT to do with SCCM”, we learned that a misguided patch distributed through SCCM may have taken down an entire Australian banking system.  Then, in “Update to the SCCM package heard round the world” we heard about the numbers of desktops and servers affected, along with how HP (to whom CommBank outsources infrastructure services) is scrambling to make amends, sending HP CEO, Meg Whitman, into the fray.

Well, over the weekend, I was able to source even more scoop on this.  No one has been able to get a clear picture of the true issue until now – partly because there are plenty of fingers pointing, and partly because of pride and cover-up.

Of course…as we all know from working with SCCM over the years, SCCM picks up most of the blame when things go wrong.  However, we also know that SCCM simply does exactly what you tell it to do.  Still, it’s an easy target, particularly for those upper management types who really have no clue about how technology works.  They think they do, but when you actually take them to task, they prove their lack of knowledge in 30 seconds or less.  When that happens, it’s best to just snicker under your breath and walk away.  Confronting them with it (particularly in front of others) just gets you a direct ticket to the list of employees to show the door when times get tough.

It has been reported previously that a “patch” was the culprit of the issue at CommBank.  This was a rumor, and if it was indeed a patch, there would be a lot of others besides HP scrambling.  If it was a “patch” you’d see Microsoft onsite right beside the HP brass.  There has been question after question in the communities asking “what patch?”, “which one?”, because if a patch caused the issue it could cause problems in other companies.  Microsoft patches are designed so that they will not install on a system where they are not compatible or required.  Plus, if it were a Microsoft patch that caused the problem, Microsoft would have been helping the company roll back the errant patch.  So, folks, you can rest easy – this was not a patch.

No…this was the result of a Task Sequence distributed to a custom SCCM Collection.  The Collection had been created/modified by an HP Engineer (adding a wildcard) and the engineer had inadvertently altered the Collection so that it was very similar in form and function to the “All Systems” Collection.  The Task Sequence contained automation to – here it comes – format the disks.  Yes, the disks of some 9,000 PCs and 490 servers (including domain controllers) were formatted and wiped clean.
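To make the failure mode concrete, here is a minimal Python sketch of how one stray wildcard can widen a collection, plus the kind of sanity check that would catch it before anything destructive runs.  To be clear, this is not SCCM's query language or API: the hostnames, the shell-style patterns, the `safe_to_deploy` helper and the 25% threshold are all invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical inventory; these hostnames are invented for illustration.
INVENTORY = ["PC-0001", "PC-0002", "LAB-PC-01", "SRV-DC-01", "SRV-FILE-02"]


def collection_members(pattern, inventory):
    """Return the hosts whose names match a shell-style wildcard pattern."""
    return [host for host in inventory if fnmatchcase(host, pattern)]


def safe_to_deploy(members, inventory, max_fraction=0.25):
    """Allow a destructive deployment only if it targets a small share of systems."""
    return len(members) / len(inventory) <= max_fraction


intended = collection_members("LAB-PC-*", INVENTORY)  # the one lab machine
accidental = collection_members("*PC*", INVENTORY)    # also sweeps in PC-0001 and PC-0002

for name, members in (("intended", intended), ("accidental", accidental)):
    verdict = "ok" if safe_to_deploy(members, INVENTORY) else "BLOCKED"
    print(f"{name}: {len(members)} of {len(INVENTORY)} systems -> {verdict}")
```

The point of the check is simply to compare the size of the target against the size of the estate before a disk-wiping job goes out; where the threshold should sit is a judgment call for each shop.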

Right now, HP is working night and day to rectify the situation.  And, of course, they (HP) are attempting to blame SCCM, saying it’s SCCM’s fault for not prompting an alert before wiping out the disks.  Anything to shift blame, I guess.  What did I say earlier?  30 seconds or less?

  • Wesley Kinslow

    That does indeed seem brutal.

    • http://www.facebook.com/people/Franck-Peter/100000181151109 Franck Peter

      Just imagine the boss’ or leader’s reaction when he found out.

  • Vlad

    That kind of tool can’t be point-and-click software. I have never seen a human failure like that using pxe, dhcp, tftp and (nfs|ftp|http).

    • SCCMGUY

      SCCM is compatible with PXE. Like the poster says if you don’t know what you’re doing, or aren’t careful, SCCM can be just as destructive over PXE as it is otherwise.

  • http://twitter.com/TheKingOfScandi TheKingOfScandinavia

    Having worked for HP (as a supporter) this really does not surprise me at all…

  • http://www.hackan.com.ar HacKan & CuBa co.

    Well, I screwed up the main server where I worked once… fortunately I was able to restore a backup, but it took me a couple of hours…
    Can’t imagine doing that for about 9000 of them xD hahaha