Sunday, August 25, 2013

All about RAID 5

Good day guys.

I had some recent trouble with my office server, so I decided to write about the RAID level we were using.

What is RAID?

Most people will tell you it stands for Redundant Array of Independent Disks (or, originally, Inexpensive Disks). "Inexpensive" is not really true, but compared to the value of the data they protect, the disks could be called inexpensive.

OK. Let me explain what it is. RAID is a mechanism in which your data is spread across several disks together with redundant information: either a duplicate copy, or parity data from which a lost piece can be recalculated. With that redundancy, the system becomes fault tolerant, meaning a disk can fail without the data being lost. Remember, I am not giving you the real techie stuff. I'm writing this so that people without an IT background can understand.

RAID has several levels. The standard ones run from 0 to 6, and there are nested levels such as 10 (RAID 1+0). Different levels mean different ways of dividing and storing the data on the disks.

So I will be talking about RAID 5 here. RAID 5 is a RAID level that requires a minimum of 3 hard disks. The maximum (often quoted as 8) depends on the RAID controller, not on RAID 5 itself.



In layman's terms, RAID 5 needs at least 3 disks, and blocks of data are split across them. First of all, RAID takes these three or more disks and combines them into one logical unit. When you are installing the OS, you will notice that there is only one disk, with the total space of two disks; the third disk's space is not added. This is because one disk's worth of space is used for redundancy (parity) and does not count toward the usable total.

So if you use three 120 GB hard drives, you will have 240 GB of usable space. If you use five 120 GB hard drives, you will have 480 GB of usable space. The more drives you use, the more efficient your storage becomes: the array still survives one failed drive, but a smaller share of the total space goes to parity.
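The arithmetic above can be sketched in a few lines of Python. The function names `raid5_usable_gb` and `storage_efficiency` are made up for this post, and the sketch assumes all drives are the same size:

```python
def raid5_usable_gb(num_drives, drive_size_gb):
    """Usable space of a RAID 5 array of equal-sized drives.

    One drive's worth of space is taken up by parity, so the usable
    space is (number of drives - 1) * size of one drive.
    """
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (num_drives - 1) * drive_size_gb


def storage_efficiency(num_drives):
    """Fraction of the raw space that is usable for data."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (num_drives - 1) / num_drives


print(raid5_usable_gb(3, 120))  # 240
print(raid5_usable_gb(5, 120))  # 480
print(storage_efficiency(3) < storage_efficiency(5))  # True
```

With three drives, two thirds of the raw space is usable; with five drives, four fifths is.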

RAID 5 offers accelerated read performance because the data stream is read from multiple drives at the same time. Referring to figure 1, let's say that stripe A was a single file. Normally, on a single drive, when you open that file the whole thing is streamed from the one hard drive bit by bit, so that drive's maximum read speed becomes the bottleneck. But with RAID 5, that one file can be read in about 1/3 of the time because it is read from all 3 drives at once: block 1 has the first 1/3 of the file, block 2 has the second 1/3, and block 3 has the last part. In a perfect situation, this triples your read speed, with even more performance potential in RAID 5 arrays containing additional hard drives!
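To put rough numbers on that "perfect situation", here is a small sketch. The 300 MB file and the 100 MB/s drive speed are made-up figures for illustration, and `ideal_read_seconds` is a hypothetical helper; real arrays are slower because of controller and seek overhead:

```python
def ideal_read_seconds(file_mb, drive_mb_per_s, drives_read_in_parallel):
    """Best-case time to read a file split evenly across several drives.

    Assumes every drive reads at the same speed and there is no overhead.
    """
    if drives_read_in_parallel < 1:
        raise ValueError("need at least one drive")
    return file_mb / (drive_mb_per_s * drives_read_in_parallel)


single = ideal_read_seconds(300, 100, 1)   # one drive does all the work
striped = ideal_read_seconds(300, 100, 3)  # three drives each read 1/3

print(single)   # 3.0
print(striped)  # 1.0
```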

Rebuilding the Drive


To keep the explanation simple, we are only going to work with 4-bit blocks. Actual data blocks can range from 4 KB (32,768 bits) up to 256 KB (2,097,152 bits), but the method is exactly the same regardless of how many consecutive bits you work with. In figure 3, the yellow blocks represent the parities for each stripe. As you may notice, the parities are distributed evenly between all drives. This provides a slight increase in performance and is what separates RAID 5 from RAID 4 (RAID 4 keeps all parities on a single drive).
Let's examine the first stripe of figure 3. To compute the parity, we must run an XOR comparison on each block of data in that stripe. You XOR the first two blocks, then take the result and XOR it against the third block (and continue this for all drives in the array, except for the block where the parity will be stored).

(Drive 1) XOR (Drive 2) = (0100) XOR (0101) = (0001)
(Result) XOR (Drive 3) = (0001) XOR (0010) = (0011)
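The same parity calculation can be written as a short Python sketch. The `parity` function is a made-up name for this post, and the blocks are the 4-bit values from stripe 1 written as binary numbers:

```python
from functools import reduce
from operator import xor


def parity(data_blocks):
    """XOR all the data blocks of one stripe together, left to right."""
    return reduce(xor, data_blocks)


# Stripe 1 from the example: drive 1, drive 2, drive 3
stripe_1 = [0b0100, 0b0101, 0b0010]

stripe_1_parity = parity(stripe_1)
print(format(stripe_1_parity, "04b"))  # 0011
```

Python's `^` operator (used here through `operator.xor`) works on whole integers at once, so the same function would work unchanged on much larger blocks.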

Recovering Data
The very cool thing about XOR comparisons, and what makes RAID 5 possible, is that if one value goes missing, you can always find it by doing an XOR comparison on the remaining values! Referring back to figure 3, let's say that drive 1 fails. The RAID controller will alert the user that a drive has failed and must be replaced. As soon as a new drive is put in, the controller will automatically start rebuilding the lost data. Here is how we rebuild drive 1, stripe 1:

(Drive 2) XOR (Drive 3) = (0101) XOR (0010) = (0111)
(Result) XOR (Drive 4) = (0111) XOR (0011) = (0100)

As you can see, the final result is 0100. Now refer back to figure 3 at drive 1, stripe 1... sure enough, it's 0100! Amazing, right? Just for fun, let's rebuild stripe 2 as well, again assuming that it is drive 1 that has failed.

(Drive 2) XOR (Drive 3) = (0000) XOR (0110) = (0110)
(Result) XOR (Drive 4) = (0110) XOR (0100) = (0010)

The missing block was calculated as 0010. Take a look at figure 3 to verify what drive 1, stripe 2 was before the failure and see if it matches the computed value... of course it does!
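Both rebuilds can be checked with a few lines of Python. This is only a sketch of the idea, not how a real controller is written, and `rebuild_missing_block` is a made-up name. It does not matter which of the surviving blocks is the parity, because XOR gives the same answer in any order:

```python
from functools import reduce
from operator import xor


def rebuild_missing_block(surviving_blocks):
    """Recover the one missing block of a stripe by XOR-ing all the others."""
    return reduce(xor, surviving_blocks)


# Drive 1 has failed; these are the blocks left on drives 2, 3 and 4.
stripe_1_survivors = [0b0101, 0b0010, 0b0011]
stripe_2_survivors = [0b0000, 0b0110, 0b0100]

print(format(rebuild_missing_block(stripe_1_survivors), "04b"))  # 0100
print(format(rebuild_missing_block(stripe_2_survivors), "04b"))  # 0010
```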
