Massive Data Loss … and Recovery!

The lull in posting to this blog is because I suffered a massive data loss, from which I eventually made a full recovery. This entry will describe the problem and how I solved it. First, I’ll start with a description of my system:

I built this machine from scratch last July. It contains:

  • MSI-7666 X-Power BigBang motherboard
  • Intel Core i7-960
  • 12GB of very fast memory
  • 120GB Corsair CSSD-F120GB2 flash disk, used to hold the operating system (Win7)
  • 4 1-TB Seagate Constellation SATA disks
  • 2 Nvidia GTX 470 graphics controllers, driving 2 HP LP2475w displays
  • Realtek sound controller
  • LG Blu-ray/DVD/CD reader/writer
  • Corsair power supply
  • Housed in a nice Corsair tower
  • The whole thing is protected by a CyberPower CP1500AVRLCD

Yes, it is a kick-butt machine, and maxes out most of the categories in the Windows experience ratings. Putting the operating system on a flash disk makes a huge difference in speed.

I bound the 4 disks as two 2-TB RAID0 volumes using the Intel RAID controller. I didn’t realize it when I built the machine, but the “Intel Rapid Storage Technology” RAID controller on the motherboard (in the ICH10R) is not a real RAID controller, at all – everything is done in software by their driver (the ICH10R just hides the drives from the rest of the system). So, it is a Fake-RAID controller. Had I known this, I would have never, ever, used it. My bad, for not completely researching the configuration and capabilities of my machine.

I use this machine for work, so to protect my clients’ data from theft, all drives (including the Corsair) are encrypted with BitLocker.

Here’s what happened:

On Sunday night, I needed to reboot my system as a result of some updates. The system started to boot Windows, flashed a blue screen too fast to read, and reset itself. After trying a few more times to get it to boot, I turned off the machine and went to bed.

Monday morning, I tried again, and got the same result (wasn’t it Einstein that once said that the definition of insanity was trying the same thing over and over, and expecting a different result?). Looking more closely at the BIOS POST (Power-On Self Test) screens, I noticed that the Intel RAID page was showing that my two RAID0 volumes had failed. Not good.

I tried booting the Win7 installation CD, and tried to get it to recover my system, but it kept coming back with “unknown error” and failing. Thanks guys.

I needed to get the system to boot, so I could examine the drives. When I initially installed the system, I purposely left 23GB of unpartitioned space on the system disk, for exactly this sort of emergency. I installed a fresh copy of Win7 in the unpartitioned space, and booted up the machine. Not surprisingly, the data volumes were nowhere to be found, but I verified that the system partition was OK.

I did a lot of searching on Google/Bing, looking for people with similar experiences. I did find one here: (http://www.overclock.net/raid-controllers-software/478557-howto-recover-intel-raid-non-member.html). I followed the directions, but I was not getting the same results with the TestDisk program. I tried the latest and greatest version of TestDisk, and the current beta version. No luck.

I then embarked upon a program of downloading and trying more than half a dozen different commercial and freeware disk recovery products. Not one of them could even diagnose the problem, much less do anything about it. I was on my own.

At this point, I had two RAID0 volumes again (from following the instructions on the web site above), neither of which would mount in Windows. Disk Manager couldn’t detect a file system, so only physical (raw) I/O was possible.

I decided to start from first principles and work my way toward the problem. When I built the system, I chose the GUID Partition Table (GPT) as the partitioning scheme (as opposed to Master Boot Record), formatted the volumes with NTFS, and then enabled BitLocker. I know that Windows places a “protective” MBR in sector zero of a GPT disk, which helps older disk utilities to recognize the disk as having been formatted, even if they don’t understand the GPT layout. The protective MBR contains just a single partition table entry, which describes the entire disk. The next sector contains the GPT header, followed by entries for each of the partitions.

I downloaded the demo version of WinHEX to look at my volumes, and I found that the MBR had been squashed by the BIOS when I bound the disks back into RAID0 volumes. The MBR still had the 55 AA signature at the end of the sector and a disk signature, but that was it. The next sector was indeed the GPT header, and I could see the GPT partition entries (one for a 128MB Microsoft reserved partition, and the other for my data partition, which consumed the rest of the disk). I couldn’t see any of my files, because the data partition was encrypted, but I did see the BPB at the beginning describing the partition and specifying the file system as -FVE-FS-, which is what Microsoft uses on a BitLockered volume.
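The three markers I was looking for by eye in the hex editor can also be checked in a few lines of C. This is only a sketch, not something I ran during the recovery: the function names are made up for illustration, it uses the portable `uint8_t` type rather than the Windows `UCHAR`, and it assumes 512-byte sectors that have already been read into memory.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512   /* Assumption: 512-byte sectors */

/* True if the sector ends with the 55 AA signature (offsets 510 and 511). */
bool has_boot_signature(const uint8_t sector[SECTOR_SIZE])
{
    return sector[510] == 0x55 && sector[511] == 0xAA;
}

/* True if the sector begins with the 8-byte GPT header signature. */
bool is_gpt_header(const uint8_t sector[SECTOR_SIZE])
{
    return memcmp(sector, "EFI PART", 8) == 0;
}

/* True if the OEM ID field (offset 3 of a volume boot sector) is -FVE-FS-. */
bool is_bitlocker_boot_sector(const uint8_t sector[SECTOR_SIZE])
{
    return memcmp(sector + 3, "-FVE-FS-", 8) == 0;
}
```

In my case, the first check would pass on sector zero, the second on sector one, and the third on the first sector of the data partition.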

When Windows looks at a disk, it always looks at sector zero for the MBR, and this MBR didn’t have any entries for partitions. Hmmm. What if I create an entry in the partition table? Then Windows would at least see the volume, and maybe it would look further at the GPT entries. It was worth a shot.
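To make "didn't have any entries" concrete, here is a rough sketch of how the four partition table slots in sector zero could be counted. The helper name is hypothetical; the offsets are the standard ones (the table starts at byte 446, each entry is 16 bytes, and the type byte is at offset 4 of an entry, with type 0x00 meaning the slot is unused).

```c
#include <stdint.h>

enum { MBR_TABLE_OFFSET = 446, MBR_ENTRY_SIZE = 16, MBR_ENTRY_COUNT = 4 };

/* Count the partition table entries whose type byte is nonzero. */
int count_used_entries(const uint8_t sector[512])
{
    int used = 0;
    for (int i = 0; i < MBR_ENTRY_COUNT; i++) {
        const uint8_t *entry = sector + MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE;
        if (entry[4] != 0x00)   /* Offset 4 within the entry is the type */
            used++;
    }
    return used;
}
```

A squashed MBR like mine would return 0 here; a healthy GPT protective MBR would return 1.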

I entered the changes to sector zero with WinHEX, but it wouldn’t let me write the changes to the disk, because this was a demo version. Sigh. Back to Google/Bing, and I found the HxD disk editor, and it allowed me to edit the disk.

Here is the structure of the MBR:

 #pragma pack(push, 1)       // Without 1-byte packing, partition_table would be padded from offset 446 to 448

 //+
 // Cylinder, Sector, and Head fields
 //-

 typedef struct
     {
     UCHAR head;             // Head number
     UCHAR sector;           // Sector is bits 0:5, bits 6:7 are bits 8:9 of cylinder
     UCHAR cylinder;         // Bits 0:7 of cylinder
     } CHS, *pCHS;

 //+
 // Partition Entry
 //
 // Each MBR disk has a table of 4 partition entries near the end of the
 // first block on the device
 //-

 typedef struct
     {
     union
         {
         struct
             {
             UCHAR not_used:7;   // Low 7 bits are unused
             UCHAR active:1;     // Partition is bootable
             };
         UCHAR status;           // Entire value
         };

     CHS start_chs;          // Address of first block of the partition in CHS format
     UCHAR type;             // Partition type
     CHS end_chs;            // Address of last block of the partition in CHS format
     ULONG start_pbn;        // Physical block number of the beginning of the partition (aka "LBA")
     ULONG size;             // Number of blocks in the partition
     } PARTITION_ENTRY, *pPARTITION_ENTRY;

 //+
 // Master Boot Record
 //
 // This is the first block on a boot device
 //-

 typedef struct
     {
     UCHAR code [440];                       // Initial Program Load code
     ULONG disk_sig;                         // Disk signature
     USHORT disk_sig_2;                      // Two bytes after the signature (normally zero)
     PARTITION_ENTRY partition_table [4];    // Disk partition table
     USHORT mbr_sig;                         // MBR signature
     } DISK_MBR, *pDISK_MBR;

 #pragma pack(pop)

All I needed to do was fill in the first partition entry. The disk was not bootable, so I didn’t need to set the Active bit in the Status byte. The start and end CHS fields haven’t been used by anything in probably a decade, so I could ignore those, too. The only fields that are important are the Type field, which must be set to 0xEE for a GPT protective MBR entry, the Starting sector, and the Size of the partition.

So, using HxD, I set the Type field to 0xEE (magenta), the Start sector to 0x01 (yellow), and the Size to 0xFFFFFFFF (green). Wow, just 6 bytes. Here is what it should look like:

[Screenshot: sector zero in HxD, with the Type, Start sector, and Size bytes highlighted]
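For anyone who would rather patch a saved copy of the sector than type bytes into a hex editor, the same edit can be sketched in C. I did this by hand in HxD, so treat the code as an illustration only: `set_protective_entry` and `put_le32` are names I made up, and the sketch assumes the 512-byte sector is already in a buffer. It writes the two 32-bit fields one byte at a time, in little-endian order, so it does not depend on structure packing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ENTRY0_OFFSET = 446 };   /* 440 bytes of code + 4 + 2 bytes of signature */

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Fill in Type, Start sector, and Size of the first partition entry.
   The status byte and both CHS fields are left exactly as they were. */
void set_protective_entry(uint8_t sector[512])
{
    assert(sector != NULL);
    uint8_t *entry = sector + ENTRY0_OFFSET;

    entry[4] = 0xEE;                     /* Type: GPT protective */
    put_le32(entry + 8, 0x00000001u);    /* Start sector (LBA 1, the GPT header) */
    put_le32(entry + 12, 0xFFFFFFFFu);   /* Size in blocks */
}
```

On a sector whose first entry was all zeros, this leaves exactly six nonzero bytes in the entry: EE at offset 450, 01 at offset 454, and FF FF FF FF at offsets 458 through 461.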

Of course, the very next thing I did was to back up all the important data from my two volumes to a 2TB external drive I just bought from WalMart for $130. It is fitting that I’m performing my first backup on this machine on World Backup Day.

The next thing I’m going to do is buy a 3ware RAID controller, and ditch the crappy Intel Fake-RAID “controller”. The 3ware controllers are very, very fast and reliable, and I helped design them (I designed the host-controller interface), so I might as well use one.

About Brian Catlin

Brian has been an engineering consultant and trainer for more than 25 years, and travels the world teaching Windows internals, device drivers, and forensics. Before entering the Windows world, Brian designed command centers for the DoD, major aerospace companies, and NASA's Jet Propulsion Laboratory. Having grown tired of living in the People's Republic of California, Brian and his family moved to Hawaii in 2009.