The lull in posting to this blog is because I suffered a massive data loss, from which I eventually made a full recovery. This entry will describe the problem and how I solved it. First, I’ll start with a description of my system:
I built this machine from scratch last July. It contains:
- MSI-7666 X-Power BigBang motherboard
- Intel Core i7-960
- 12GB of very fast memory
- 120GB Corsair CSSD-F120GB2 flash disk, used to hold the operating system (Win7)
- 4 1-TB Seagate Constellation SATA disks
- 2 Nvidia GTX 470 graphics controllers, driving 2 HP LP2475w displays
- Realtek sound controller
- LG Blu-ray/DVD/CD reader/writer
- Corsair power supply
- Housed in a nice Corsair tower
- The whole thing is protected by a CyberPower CP1500AVRLCD
Yes, it is a kick-butt machine, and maxes out most of the categories in the Windows experience ratings. Putting the operating system on a flash disk makes a huge difference in speed.
I bound the 4 disks as two 2-TB RAID0 volumes using the Intel RAID controller. I didn’t realize it when I built the machine, but the “Intel Rapid Storage Technology” RAID controller on the motherboard (in the ICH10R) is not a real RAID controller, at all – everything is done in software by their driver (the ICH10R just hides the drives from the rest of the system). So, it is a Fake-RAID controller. Had I known this, I would have never, ever, used it. My bad, for not completely researching the configuration and capabilities of my machine.
I use this machine for work, so to protect my client’s data from theft, all drives (including the Corsair) are encrypted with BitLocker.
Here’s what happened:
On Sunday night, I needed to reboot my system as a result of some updates. The system started to boot Windows, flashed a blue screen too fast to read, and reset itself. After trying a few more times to get it to boot, I turned off the machine and went to bed.
Monday morning, I tried again, and got the same result (wasn’t it Einstein that once said that the definition of insanity was trying the same thing over and over, and expecting a different result?). Looking more closely at the BIOS POST (Power-On Self Test) screens, I noticed that the Intel RAID page was showing that my two RAID0 volumes had failed. Not good.
I tried booting the Win7 installation CD and having it recover my system, but it kept coming back with “unknown error” and failing. Thanks, guys.
I needed to get the system to boot, so I could examine the drives. When I initially installed the system, I purposely left 23GB of unpartitioned space on the system disk, for exactly this sort of emergency. I installed a fresh copy of Win7 in the unpartitioned space, and booted up the machine. Not surprisingly, the data volumes were nowhere to be found, but I verified that the system partition was OK.
I did a lot of searching on Google/Bing, looking for people with similar experiences. I did find one here: (http://www.overclock.net/raid-controllers-software/478557-howto-recover-intel-raid-non-member.html). I followed the directions, but I was not getting the same results with the TestDisk program. I tried the latest and greatest version of TestDisk, and the current beta version. No luck.
I then embarked upon a program of downloading and trying more than half a dozen different commercial and freeware disk recovery products. Not one of them could even diagnose the problem, much less do anything about it. I was on my own.
At this point, I had two RAID0 volumes again (from following the instructions on the web site above), neither of which would mount in Windows. Disk Manager couldn’t detect a file system, so only physical (raw) I/O was possible.
I decided to start from first principles and work my way toward the problem. When I built the system, I chose the GUID Partition Table (GPT) as the partitioning scheme (as opposed to Master Boot Record), formatted the volumes with NTFS, and then enabled BitLocker. I know that Windows places a “protective” MBR in sector zero of a GPT disk, which helps older disk utilities to recognize the disk as having been formatted, even if they don’t understand the GPT layout. The protective MBR contains just a single partition table entry, which describes the entire disk. The next sector contains the GPT header, followed by entries for each of the partitions.
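To make that layout concrete, here is a minimal sketch in C of how the first two sectors could be sanity-checked once they have been read into memory. The function names and the 512-byte sector size are assumptions for illustration only. The offsets come from the standard MBR and GPT layouts: the partition table starts at byte 446, a protective entry has type 0xEE, and the GPT header begins with the ASCII string "EFI PART".

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512  /* assumed logical sector size */

/* Sector 0: true if the sector ends with the 55 AA signature and the
   first MBR partition entry is a GPT protective entry (type 0xEE). */
int is_protective_mbr(const uint8_t sector0[SECTOR_SIZE])
{
    const uint8_t *entry = sector0 + 446;   /* first partition entry */
    return sector0[510] == 0x55 && sector0[511] == 0xAA &&
           entry[4] == 0xEE;                /* partition type byte */
}

/* Sector 1: true if the sector starts with the GPT header signature. */
int is_gpt_header(const uint8_t sector1[SECTOR_SIZE])
{
    return memcmp(sector1, "EFI PART", 8) == 0;
}
```

Both checks work on buffers only; how the sectors get read (a hex editor export, a raw device handle) is left open.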
I downloaded the demo version of WinHEX to look at my volumes, and I found that the MBR had been squashed by the BIOS when I bound the disks back into RAID0 volumes. The MBR still had the 55 AA signature at the end of the sector and a disk signature, but that was it. The next sector was indeed the GPT header, and I could see the GPT partition entries (one for a 128MB Microsoft reserved partition, and the other was my data partition consuming the rest of the disk). I couldn’t see any of my files, because the data partition was encrypted, but I did see the BPB at the beginning describing the partition and specifying the file system as -FVE-FS- which is what Microsoft uses on a BitLockered volume.
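The two observations above can also be written as a short C sketch: count how many MBR partition entries are in use, and look for the BitLocker marker in the first sector of the data partition. The function names are hypothetical. The sketch also assumes the -FVE-FS- string sits in the 8-byte OEM ID field at offset 3 of the boot sector, which is where a plain NTFS volume carries its "NTFS    " tag.

```c
#include <stdint.h>
#include <string.h>

/* Number of MBR partition entries (0..4) whose type byte is non-zero.
   A result of 0 on a sector that still ends in 55 AA matches the
   "squashed" MBR described above. */
int mbr_entries_in_use(const uint8_t sector0[512])
{
    int used = 0;
    for (int i = 0; i < 4; i++) {
        const uint8_t *entry = sector0 + 446 + 16 * i;
        if (entry[4] != 0x00)   /* type 0x00 marks an unused entry */
            used++;
    }
    return used;
}

/* True if a partition's first sector carries the BitLocker OEM ID. */
int is_bitlocker_boot_sector(const uint8_t boot_sector[512])
{
    return memcmp(boot_sector + 3, "-FVE-FS-", 8) == 0;
}
```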
When Windows looks at a disk, it always looks at sector zero for the MBR, and this MBR didn’t have any entries for partitions. Hmmm. What if I create an entry in the partition table? Then Windows would at least see the volume, and maybe it would look further at the GPT entries. It was worth a shot.
I entered the changes to sector zero with WinHEX, but it wouldn’t let me write the changes to the disk, because this was a demo version. Sigh. Back to Google/Bing, and I found the HxD disk editor, and it allowed me to edit the disk.
Here is the structure of the MBR:
// Pack to byte boundaries so the structures match the on-disk layout
// (the partition table starts at offset 446, which is not 4-byte aligned)
#pragma pack(push, 1)

//+
// Cylinder, Sector, and Head fields
//-
typedef struct {
    UCHAR head;      // Head number
    UCHAR sector;    // Sector is bits 0:5, bits 6:7 are bits 8:9 of cylinder
    UCHAR cylinder;  // Bits 0:7 of cylinder
} CHS, *pCHS;

//+
// Partition Entry
//
// Each MBR disk has a table of 4 partition entries near the end of the
// first block on the device
//-
typedef struct {
    union {
        struct {
            UCHAR not_used:7;  // First 7 bits are unused
            UCHAR active:1;    // Partition is bootable
        };
        UCHAR status;          // Entire value
    };
    CHS   start_chs;  // Address of first block of the partition in CHS format
    UCHAR type;       // Partition type
    CHS   end_chs;    // Address of last block of the partition in CHS format
    ULONG start_pbn;  // Physical block number of the beginning of the partition (aka "LBA")
    ULONG size;       // Number of blocks in the partition
} PARTITION_ENTRY, *pPARTITION_ENTRY;

//+
// Master Boot Record
//
// This is the first block on a boot device
//-
typedef struct {
    UCHAR           code [440];           // Initial Program Load code
    ULONG           disk_sig;             // Disk signature
    USHORT          disk_sig_2;           // Rest of disk signature
    PARTITION_ENTRY partition_table [4];  // Disk partition table
    USHORT          mbr_sig;              // MBR signature
} DISK_MBR, *pDISK_MBR;

#pragma pack(pop)
All I needed to do was fill in the first partition entry. The disk was not bootable, so I didn’t need to set the Active bit in the Status byte. The start and end CHS fields haven’t been used by anything in probably a decade, so I could ignore those, too. The only fields that are important are the Type field, which must be set to 0xEE for a GPT protective MBR entry, the Starting sector, and the Size of the partition.
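So, using HxD, I set the Type field to 0xEE (magenta), the Start sector to 0x01 (yellow), and the Size to 0xFFFFFFFF (green). Wow, just 6 bytes. Here is what the first partition entry (the 16 bytes at offset 0x1BE, with multi-byte fields stored little-endian and every other byte left at zero) should look like:
00 00 00 00 EE 00 00 00 01 00 00 00 FF FF FF FF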
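For anyone who would rather script the edit than type it into a hex editor, here is a minimal sketch in C. It works on an in-memory copy of sector zero. The function and constant names are made up for illustration, and reading or writing the raw disk is deliberately left out. The values are the ones given above: type 0xEE, start sector 1, size 0xFFFFFFFF.

```c
#include <stdint.h>

enum {
    MBR_PART_TABLE_OFF = 446,  /* first of four 16-byte entries */
    PART_TYPE_OFF      = 4,    /* partition type byte */
    PART_START_LBA_OFF = 8,    /* 32-bit starting sector */
    PART_SIZE_OFF      = 12,   /* 32-bit sector count */
    PART_TYPE_GPT_PROT = 0xEE  /* GPT protective partition type */
};

/* Store a 32-bit value in little-endian byte order. */
void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Fill in the first partition entry of a 512-byte copy of sector zero
   as a GPT protective entry. The status byte and CHS fields are left
   untouched, as are the boot code, disk signature and 55 AA marker. */
void write_protective_entry(uint8_t sector[512])
{
    uint8_t *entry = sector + MBR_PART_TABLE_OFF;

    entry[PART_TYPE_OFF] = PART_TYPE_GPT_PROT;
    put_le32(entry + PART_START_LBA_OFF, 1);           /* GPT header sector */
    put_le32(entry + PART_SIZE_OFF, 0xFFFFFFFFu);      /* "whole disk" */
}
```

Patching a buffer rather than the device keeps the risky part (the raw write) a separate, deliberate step, and it makes it easy to compare the result with the hex editor view before anything touches the disk.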
Of course, the very next thing I did was to back up all the important data from my two volumes to a 2TB external drive I just bought from WalMart for $130. It is fitting that I’m performing my first backup on this machine on World Backup Day.
The next thing I’m going to do is buy a 3ware RAID controller, and ditch the crappy Intel Fake-RAID “controller”. The 3ware controllers are very, very fast and reliable, and I helped design them (I designed the host-controller interface), so I might as well use one.