A Simple Look at Parity

Data is important, and so is processing and protecting that data efficiently. Almost everyone in the storage world today depends on a simple yet ingenious method of protecting data without having to fully duplicate it: parity. A lot of us take parity for granted, and some of us have never actually learned how parity is calculated because we assume it’s too complex. What I would like to show here is that parity isn’t as complicated as you might think.

So what is parity? Here is my own attempt at a definition. Simply put, as far as storage and certain RAID levels are concerned, parity is the sum of all the data blocks in a stripe. This calculated parity data can be used to recalculate a lost portion of the stripe.

Let’s take RAID 5 as an example. With RAID 5, data is striped across all drives in the RAID set. Each stripe uses one drive for parity. The data on this drive is the sum of the data on the other drives in the RAID set for that particular stripe. It may be easier to understand this if you think in terms of Base-10, which I will use as an example in this post. In actuality these calculations are based on the boolean XOR function, so I’m dumbing it down a bit here. Consider the table below. In this example we are looking at one stripe that crosses 5 disks and uses the 5th disk to store the parity data.

Disk 1    Disk 2    Disk 3    Disk 4    Disk 5 (Parity for this stripe)
12        14        10        20        56

So, in this example 12+14+10+20=56

Now let’s say you lost Disk 2. If you subtract the total value of the remaining data disks from the value on Disk 5, you get the value for Disk 2.

12+10+20=42,  56-42=14

There you go.  Using elementary mathematics, we’ve shown how you can protect enterprise data while saving significantly on the amount of required storage.
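For anyone who would rather see the Base-10 example as code, here is a minimal Python sketch of the same arithmetic. The function names `sum_parity` and `recover_missing` are my own, made up for illustration, and this is the simplified addition model from this post, not what a real RAID controller runs.

```python
# Values from the example stripe: four data disks; Disk 5 holds the parity.
data = {"Disk 1": 12, "Disk 2": 14, "Disk 3": 10, "Disk 4": 20}


def sum_parity(values):
    """Base-10 stand-in for parity: the sum of all data values in the stripe."""
    return sum(values)


def recover_missing(surviving_values, parity):
    """Rebuild the one missing value by subtracting the survivors from the parity."""
    return parity - sum(surviving_values)


parity = sum_parity(data.values())  # the value stored on Disk 5

# Simulate losing Disk 2 and rebuilding it from the other data disks plus parity.
survivors = [value for name, value in data.items() if name != "Disk 2"]
rebuilt = recover_missing(survivors, parity)

print(parity, rebuilt)  # prints: 56 14
```

Note that `recover_missing` only works when exactly one value is missing, which matches the one-disk limit of the scheme.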

A couple of things to note. RAID 5 can only survive the loss of one disk at a time. You can see that we would not have enough information to recalculate the data if we lost two disks at once: there would be one parity value but two unknowns. Also, the drive used for parity alternates from stripe to stripe, which spreads parity writes across all the drives and helps performance.
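To picture how the parity drive alternates, here is a small Python sketch. The rotation rule in `parity_disk` is an assumption I picked for illustration (it starts with parity on the last disk, as in the table above, and moves one disk to the left per stripe). Real RAID implementations use a variety of layouts, so treat this as one possible scheme rather than how any particular controller does it.

```python
def parity_disk(stripe_index, num_disks):
    """Return the 0-based index of the disk holding parity for a stripe.

    Hypothetical rotation: stripe 0 puts parity on the last disk, and each
    following stripe moves parity one disk to the left, wrapping around.
    """
    return (num_disks - 1 - stripe_index) % num_disks


num_disks = 5
layout = []
for stripe in range(5):
    p = parity_disk(stripe, num_disks)
    # "P" marks the parity block, "D" marks a data block.
    row = ["P" if disk == p else "D" for disk in range(num_disks)]
    layout.append(row)
    print(f"stripe {stripe}: {' '.join(row)}")
```

Over five stripes every disk takes exactly one turn holding parity, so no single drive handles all of the parity updates.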

Is it really that simple? Well, to be honest, no… The concept is that simple, but to apply it to binary data you have to take it a little further. Remember the XOR function I mentioned earlier? XOR applied to binary data plays the same role that addition plays in the Base-10 example. Because XOR undoes itself, it also takes the place of subtraction: XORing the parity with the surviving blocks gives back the missing block.
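Here is a rough Python sketch of the XOR version, reusing the numbers from the Base-10 example as one-byte blocks. The helper name `xor_parity` is my own, and real arrays work on much larger blocks, so this is only meant to show the idea.

```python
from functools import reduce
from operator import xor


def xor_parity(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# The four data values from the example stripe, each stored as a one-byte block.
blocks = [bytes([12]), bytes([14]), bytes([10]), bytes([20])]
parity = xor_parity(blocks)

# Lose the second block, then XOR the survivors with the parity to rebuild it.
survivors = blocks[:1] + blocks[2:]
rebuilt = xor_parity(survivors + [parity])

print(parity[0], rebuilt[0])  # prints: 28 14
```

The parity comes out as 28 rather than 56 because XOR is not the same as addition, but the recovery works the same way, and the same function is used both to build the parity and to rebuild the lost block.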

In summary, this post was just to lay out a very basic concept around how parity calculations work and to hopefully interest you in diving deeper into it.  There are already great posts out there that describe RAID and parity in more detail.  This is a really good article that goes into more detail on both parity and RAID 5: http://www.scottklarr.com/topic/23/how-raid-5-really-works/
