Published Articles
 
 
 

March 28

Alaskan orphaned server responsible for $38B data loss
In order to back up a system, you have to know it's there
Computerworld Opinion by GlassHouse CTO, Jim Damoulakis

http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9014680

Anyone remotely associated with IT has by now read at least one account of the data loss suffered by the state of Alaska relating to its Permanent Fund Dividend. As more details emerge (see "Oil revenue gets baked in Alaska"), I am beginning to feel a bit like Bill Murray in "Groundhog Day." Or, to quote this Red Sox fan's favorite Yankee, Yogi Berra, "it's déjà vu all over again."

This story, or a similar variant, has been repeated numerous times in organizations of all shapes and sizes, albeit usually without the number $38 billion linked to it. I just feel sorry for the poor guys involved - most of the time this type of screw-up isn't covered by Fox News, CNN, and the Associated Press. Giga-dollars aside, identical exposures exist today within many data centers.

One particular facet of the story caught my eye. Initial reports suggested that after the primary and secondary disk information was lost, attempts to recover from tape were unsuccessful because the "backup tapes were unreadable." Here we go again - blame tape! If only they had backed up to disk. Wrong. It turns out that the backup tapes were NOT unreadable because there were NO backup tapes. It seems that due to a process glitch, this particular data set was not being backed up.

With today's backup reporting tools, there is no excuse for repeated failed backups going undetected. However, a major gap remains in many data protection strategies: unknown or orphan systems. For a backup to "fail," it has to at least have been scheduled to run. If a system is brought online and never entered into the backup pool, or additional volumes are allocated to a system but never added to the backup "include" list, there is technically no failure from the backup application's perspective. As appears to have been the case here, and as we have seen elsewhere, this omission went undetected until it was too late.
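The gap described above can be made concrete with a small reconciliation sketch. It assumes two hypothetical inputs that do not come from any particular backup product: an inventory of systems and their volumes, and the backup application's client and "include" configuration. All host and volume names are illustrative.

```python
def find_orphans(inventory, backup_config):
    """Report what exists but is never scheduled for backup.

    inventory:     dict mapping host name -> set of volumes on that host
    backup_config: dict mapping host name -> set of volumes on its include list
    (Both structures are assumptions made for this sketch.)
    """
    # Hosts that were never entered into the backup pool at all.
    orphan_hosts = sorted(set(inventory) - set(backup_config))

    # Volumes on known hosts that never made it onto the include list.
    orphan_volumes = {}
    for host, volumes in inventory.items():
        if host in backup_config:
            missing = volumes - backup_config[host]
            if missing:
                orphan_volumes[host] = sorted(missing)

    return orphan_hosts, orphan_volumes


if __name__ == "__main__":
    inventory = {
        "db01": {"/", "/data"},
        "web01": {"/"},
        "pfd01": {"/", "/images"},
    }
    backup_config = {
        "db01": {"/"},   # "/data" was added later and never included
        "web01": {"/"},
    }
    hosts, volumes = find_orphans(inventory, backup_config)
    print("Hosts never backed up:", hosts)
    print("Volumes never backed up:", volumes)
```

Note that neither result would ever show up in a failed-backup report, because no job was scheduled for these hosts or volumes in the first place. The hard part in practice is producing a trustworthy inventory, not the comparison itself.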

Accounting for orphan systems is an arduous task. Some reporting applications attempt to provide information through activities such as network probing (often to the chagrin of the network security folks, as this looks like an intrusion), but even this requires significant effort to filter out "noise" (e.g., printers and other non-server devices, multiple NICs in a given device) and then to manually reconcile what is and isn't being backed up and why. Finding orphan volumes is even harder, which is why, at a minimum, we typically recommend configuring backup applications to include all local volumes.
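As a rough illustration of the noise filtering mentioned above, the sketch below takes hypothetical probe results, drops non-server devices, collapses multiple NICs on one device into a single entry, and lists what the backup application does not know about. The record fields (`ip`, `hostname`, `device_type`) and the device-type labels are assumptions for this example, not the output format of any real discovery tool.

```python
from collections import defaultdict

# Device types treated as noise; an assumed, illustrative list.
NON_SERVER_TYPES = {"printer", "switch", "phone"}


def candidate_servers(probe_results):
    """Group probe hits by host name, skipping non-server devices.

    probe_results: list of dicts with keys "ip", "hostname", "device_type".
    Returns a dict mapping lowercase host name -> list of IP addresses,
    so a server with several NICs appears once.
    """
    by_host = defaultdict(list)
    for record in probe_results:
        if record["device_type"] in NON_SERVER_TYPES:
            continue
        by_host[record["hostname"].lower()].append(record["ip"])
    return dict(by_host)


def unprotected(probe_results, backup_clients):
    """Return discovered servers that are not registered backup clients."""
    known = {name.lower() for name in backup_clients}
    servers = candidate_servers(probe_results)
    return {host: ips for host, ips in servers.items() if host not in known}


if __name__ == "__main__":
    probes = [
        {"ip": "10.0.0.5", "hostname": "DB01", "device_type": "server"},
        {"ip": "10.0.1.5", "hostname": "db01", "device_type": "server"},
        {"ip": "10.0.0.9", "hostname": "hp-laser", "device_type": "printer"},
        {"ip": "10.0.0.7", "hostname": "pfd01", "device_type": "server"},
    ]
    print(unprotected(probes, backup_clients=["db01"]))
```

The output is only a candidate list. Someone still has to decide, host by host, whether each entry should be backed up and why it was missed.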

A colleague of mine likes to talk of strategic use of policy and tactical use of technology. All too often, organizations try to make the strategy about the technology. Once again, we see that technology is no substitute for well-thought-out policy and process.

Jim Damoulakis is chief technology officer of GlassHouse Technologies Inc., a leading provider of independent storage services. He can be reached at jimd@glasshouse.com.

 

 

  © Copyright 2001 - 2007 GlassHouse Technologies, Inc. All Rights Reserved.
