Your business is only as good as its last backup: spend wisely

by Michael Ritacco on January 2, 2009

I was shocked to see this story today about Journalspace.com:

Journalspace is no more.

DriveSavers called today to inform me that the data was unrecoverable.

Here is what happened: the server which held the journalspace data had two large drives in a RAID configuration. As data is written (such as saving an item to the database), it’s automatically copied to both drives, as a backup mechanism.

There is a big difference between hardware redundancy and logical redundancy. You do not have to have a physical failure to lose your data.

The value of such a setup is that if one drive fails, the server keeps running, using the remaining drive. Since the remaining drive has a copy of the data on the other drive, the data is intact. The administrator simply replaces the drive that’s gone bad, and the server is back to operating with two redundant drives.

But that’s not what happened here. There was no hardware failure. Both drives are operating fine; DriveSavers had no problem in making images of the drives. The data was simply gone. Overwritten.
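To make the difference concrete, here is a minimal sketch in Python. It is a toy model, not how the journalspace server or any real RAID controller works, and the class and method names (MirroredVolume, fail_drive, snapshot) are hypothetical, invented for illustration. It assumes a two-drive mirror plus a separate copy kept somewhere else, and shows that the mirror survives a dead drive but faithfully replicates an overwrite.

```python
import copy


class MirroredVolume:
    """Toy two-drive mirror: every write is applied to each working drive."""

    def __init__(self):
        # Each drive is modeled as a dict of block number -> data.
        self.drives = [{}, {}]

    def write(self, block, data):
        # The mirror cannot tell a good write from a destructive one;
        # it copies both to every surviving drive.
        for drive in self.drives:
            if drive is not None:
                drive[block] = data

    def fail_drive(self, index):
        # Physical failure: the drive becomes unreadable.
        self.drives[index] = None

    def read(self, block):
        # Serve the read from the first drive that still works.
        for drive in self.drives:
            if drive is not None:
                return drive.get(block)
        raise IOError("no surviving drive")

    def snapshot(self):
        # Hypothetical stand-in for a backup kept off the server.
        survivor = next(d for d in self.drives if d is not None)
        return copy.deepcopy(survivor)


volume = MirroredVolume()
volume.write(0, "journal entries")
backup = volume.snapshot()

# Physical failure: the remaining drive still holds the data.
volume.fail_drive(0)
after_drive_failure = volume.read(0)

# Logical failure: an overwrite reaches every surviving drive.
volume.write(0, "garbage")
after_overwrite = volume.read(0)

# Only the separate copy still has the original data.
restored = backup[0]

print(after_drive_failure, after_overwrite, restored, sep=" | ")
```

The mirror protects against the first event and does nothing for the second. Only the copy taken before the overwrite, and stored apart from the mirror, brings the data back.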

This sounds like a bad controller or a bug in the RAID controller firmware. In the similar cases I have seen, no cause was ever identified for the failure, and after we reconfigured the RAID 10 array we continued to use the same hardware and never saw the problem again.

The data server had only one purpose: maintaining the journalspace database. There were no other web sites or processes running on the server, and it would be impossible for a software bug in journalspace to overwrite the drives, sector by sector.

The list of potential causes for this disaster is a short one. It includes a catastrophic failure by the operating system (OS X Server, in case you’re interested), or a deliberate effort. A disgruntled member of the Lagomorphics team sabotaged some key servers several months ago after he was caught stealing from the company; as awful as the thought is, we can’t rule out the possibility of additional sabotage.

But, clearly, we failed to take the steps to prevent this from happening. And for that we are very sorry.

So, after nearly six years, journalspace is no more.

I am very sorry for journalspace’s loss, but I do appreciate the honesty about the incident. We can all use this event as a reminder of the importance of backups. Backups are like insurance: you need to pay for what you cannot afford to lose.

As DBAs, we must plan for all types of failures and make sure we explain the differences between physical recovery, logical recovery, high availability, and disaster recovery.
