<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	
	xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Allogro™ &#187; striping</title>
	<atom:link href="http://www.allogro.com/main/tag/striping/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.allogro.com/main</link>
	<description>Business Networking Specialists</description>
	<lastBuildDate>Sun, 13 Sep 2009 08:51:30 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>RAID &#8211; What is it and what are the differences?</title>
		<link>http://www.allogro.com/main/2006/08/30/8/</link>
		<comments>http://www.allogro.com/main/2006/08/30/8/#comments</comments>
		<pubDate>Thu, 31 Aug 2006 03:10:24 +0000</pubDate>
		<dc:creator>Will Murray</dc:creator>
				<category><![CDATA[Articles - Whitepapers]]></category>
		<category><![CDATA[hard disk array]]></category>
		<category><![CDATA[mirroring]]></category>
		<category><![CDATA[parity]]></category>
		<category><![CDATA[RAID]]></category>
		<category><![CDATA[striping]]></category>

		<guid isPermaLink="false">http://www.allogro.com/main/?p=8</guid>
		<description><![CDATA[Copyright &#169; 2010 <a href="http://www.allogro.com/main">Will Murray</a>. Visit the original article at <a href="http://www.allogro.com/main/2006/08/30/8/">http://www.allogro.com/main/2006/08/30/8/</a>.<br /><p><img src="/icons/Nuvola/128x128/devices/raid.png" alt="Raid icon" align="right" border="0" height="128" hspace="3" vspace="1" width="128" /></p>
<p>RAID is a method of storing data on multiple hard disks. Through the “magic” of the disk array, all of the individual disks appear as a single disk to the operating system. Large arrays can be split into smaller <em>logical disks</em> of any size up to the total amount of disk space. Exactly how the data is spread across the disks determines how fast the array is and how well the data is protected.</p>
<p>Back in the old days when large (750MB) hard disks were relatively expensive (say $1200 US) and smaller disks (100MB) were relatively inexpensive (maybe $130), somebody figured out that&#8230; [Continue reading]</p>]]></description>
			<content:encoded><![CDATA[Copyright &copy; 2010 <a href="http://www.allogro.com/main">Will Murray</a>. Visit the original article at <a href="http://www.allogro.com/main/2006/08/30/8/">http://www.allogro.com/main/2006/08/30/8/</a>.<br /><p><img src="/icons/Nuvola/128x128/devices/raid.png" alt="Raid icon" align="right" border="0" height="128" hspace="3" vspace="1" width="128" /></p>
<p>RAID is a method of storing data on multiple hard disks. Through the “magic” of the disk array, all of the individual disks appear as a single disk to the operating system. Large arrays can be split into smaller <em>logical disks</em> of any size up to the total amount of disk space. Exactly how the data is spread across the disks determines how fast the array is and how well the data is protected.</p>
<p>Back in the old days, when large (750MB) hard disks were relatively expensive (say $1200 US) and smaller disks (100MB) were relatively inexpensive (maybe $130), somebody figured out that several of the inexpensive disks could be linked together to roughly equal the capacity of a single larger disk. The complete package was called a <em>disk array</em>, and the method of storing the data on the disks was called RAID, for <em>Redundant Array of Inexpensive Disks</em>.</p>
<p>Today, the relative price difference between an 80GB and a 160GB or even a 400GB hard drive is not so great; however, RAID is still very much a part of life with computers, especially with servers. Obviously there is more to RAID than just cost savings. (RAID now commonly stands for Redundant Array of Independent Disks, which reflects the drift away from price as the motivating factor in choosing a RAID solution.) In fact, by the time you factor in the additional hardware to create and manage the array, RAID usually costs more than non-RAID solutions. So why do we use it? The answers are speed and reliability.</p>
<p><span id="more-8"></span>Imagine for a moment that you were an engineer creating the first RAID system (I’d say “RAID Array”, which is the common term, but that’s redundant, like saying “PIN Number”). You design the system so that when someone saves data to the array, it fills up the first disk in the array first. Then, when the person saves more data than will fit on the first disk, the system automatically stores the next data on the next disk. This process continues until eventually you fill up the third and fourth disks. That’s the concept anyway, though the actual implementation (below) is a little different. This fast and expandable solution is known as <strong>RAID-0</strong> (also known as disk striping with no parity; more about striping and parity in a moment) and is still commonly used today. It is generally considered the best-performing RAID solution.</p>
<p>Array controllers make use of the fact that each disk can write data at the same time another disk is busy. So the controller breaks the data up into smaller chunks (called <em>stripes</em>) and sends one stripe to disk 1. While disk 1 is busy storing the first stripe, the controller sends the next stripe of data to disk 2. While disks 1 and 2 are writing their data, the controller sends another stripe to disk 3 (if there is one), and so on. Eventually it runs out of disks in the array, so it starts over with disk 1 again. Using this method, all of the disk space is available for use.</p>
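<p>To make the round-robin idea concrete, here is a minimal Python sketch. It is not how any particular controller works; it simply assumes fixed-size stripes, disks numbered from 1 as above, and the hypothetical function names <code>split_into_stripes</code> and <code>locate_stripe</code>.</p>

```python
# Minimal sketch of RAID-0 round-robin striping (illustrative, not a real
# controller). Assumes fixed-size stripes and disks numbered from 1.


def split_into_stripes(data: bytes, stripe_size: int) -> list[bytes]:
    """Break data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + stripe_size] for i in range(0, len(data), stripe_size)]


def locate_stripe(stripe_number: int, disk_count: int) -> tuple[int, int]:
    """Map a 0-based stripe number to (disk number, slot on that disk)."""
    disk = stripe_number % disk_count + 1   # disk 1, 2, ..., then wrap to 1
    slot = stripe_number // disk_count      # full passes over the disks so far
    return disk, slot


if __name__ == "__main__":
    for n, chunk in enumerate(split_into_stripes(b"ABCDEFGHIJKL", 2)):
        disk, slot = locate_stripe(n, disk_count=3)
        print(f"stripe {n} {chunk!r} -> disk {disk}, slot {slot}")
```

<p>With three disks, stripes 0, 1, and 2 land on disks 1, 2, and 3, and stripe 3 wraps back around to disk 1. That spreading is what lets consecutive chunks be written in parallel.</p>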
<p>What happens if disk 2 unexpectedly fails? Since data is spread across all the disks, if even one of them (or the controller) fails, the entire array becomes unusable. That’s the downside of RAID-0.</p>
<p>Back to the drawing board you go. What if we spend more money (twice as much, to be exact) and build two arrays that contain exactly the same information? That way, if one of the arrays dies, the other array will keep running until we can replace the failed disk. This solution became known as <strong>RAID-1</strong> (disk mirroring). Because it is basically two identical arrays, reads can be served by either copy, although every write has to go to both; overall it performs about as well as a single, non-mirrored array while offering excellent fault tolerance (i.e., it is both fast and able to protect against failures). Unfortunately, it’s also usually the most expensive RAID solution. While each half of the mirror can use all of its disk space, half of the total storage capacity is lost to mirroring (an array with 360GB of total storage would hold only 180GB of usable data; the other 180GB would be a duplicate copy of the first 180GB). Note: If a single controller manages both mirrors, both arrays could end up with identical bad data if the controller goes bad. Some people use two separate but compatible controllers to reduce this slight risk.</p>
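<p>A toy in-memory model can illustrate the mirroring behavior described above. The <code>Mirror</code> class is hypothetical, and it assumes each half is simply a Python list of blocks rather than a real set of disks.</p>

```python
# Toy in-memory model of RAID-1 mirroring. The Mirror class is hypothetical:
# each "half" is just a Python list of blocks standing in for a disk array.


class Mirror:
    def __init__(self, block_count: int):
        # Two identical halves; None marks a block that was never written.
        self.halves = [[None] * block_count for _ in range(2)]
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to each healthy half.
        for i, half in enumerate(self.halves):
            if not self.failed[i]:
                half[block] = data

    def read(self, block: int):
        # Any healthy half can answer a read.
        for i, half in enumerate(self.halves):
            if not self.failed[i]:
                return half[block]
        raise IOError("both halves of the mirror have failed")

    def fail(self, index: int) -> None:
        self.failed[index] = True

    def replace_and_rebuild(self, index: int) -> None:
        # Assumes the other half is healthy: copy it onto the replacement.
        self.halves[index] = list(self.halves[1 - index])
        self.failed[index] = False


if __name__ == "__main__":
    m = Mirror(4)
    m.write(0, b"payroll")
    m.fail(0)                    # one half dies
    print(m.read(0))             # b'payroll', served by the surviving half
    m.write(1, b"new")           # only the healthy half is updated
    m.replace_and_rebuild(0)     # regenerate the mirror
    print(m.halves[0][1])        # b'new'
```

<p>The model also shows where the extra cost comes from: every block exists twice, so only half of the raw space holds unique data.</p>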
<p>Okay. That sounds like a workable solution, but it’s too expensive for large arrays. You figure that the data wouldn’t really need to be duplicated if there were just some way to (a) detect errors and (b) recover from errors when they are found. Your solution is to calculate and store a checksum of all the data that’s saved. A <em>checksum</em> is just a fancy word for a small amount of extra data, calculated from the original data, that can be used to detect errors and, with the right scheme, to recover from them. Naturally, it doesn’t do a lot of good to store the checksum alongside the actual data: if the data and the checksum were on the same disk and that disk went bad, you would be no better off than with RAID-0.</p>
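<p>One common checksum used by parity RAID levels (RAID-4 and RAID-5, for example) is XOR parity. The sketch below is a simplified illustration. It assumes each data chunk sits on its own disk, the parity chunk sits on another disk, and all chunks have the same length. The function names are hypothetical.</p>

```python
# Simplified XOR parity, the kind of "checksum" used by parity RAID levels
# such as RAID-4 and RAID-5. Assumes all chunks in a stripe are equal length.
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(chunks: list[bytes]) -> bytes:
    """Parity is the XOR of every data chunk in the stripe."""
    return reduce(xor_bytes, chunks)


def stripe_is_consistent(chunks: list[bytes], parity: bytes) -> bool:
    """Detect errors by recomputing parity and comparing it with the stored copy."""
    return compute_parity(chunks) == parity


if __name__ == "__main__":
    stripe = [b"\x0f\x00", b"\xf0\x01", b"\x33\x02"]   # one chunk per disk
    parity = compute_parity(stripe)                    # stored on another disk
    print(parity.hex())                                # cc03
    damaged = [b"\x0e\x00", b"\xf0\x01", b"\x33\x02"]  # one bit flipped
    print(stripe_is_consistent(damaged, parity))       # False
```

<p>A parity mismatch only tells you that something in the stripe is wrong, not which chunk is bad. In practice a failed disk usually identifies itself, and the parity is then used to recover its contents.</p>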
<p><strong>RAID-2</strong> (Hamming-code error correction), <strong>RAID-3</strong> (virtual disk blocks), <strong>RAID-4</strong> (dedicated parity disk), and <strong>RAID-5</strong> (striped parity) are all variations on this theme of storing the data along with a checksum on multiple disks (the term for this is writing with <em>parity</em>). RAID-2 is no longer used, and RAID-3 and RAID-4 are used only in certain uncommon situations. RAID-5 has emerged as the clear winner in this parity slugfest.</p>
<p>RAID-5 combines the striping from RAID-0 with error checking. When the controller sends a stripe of data to disk 1, it sends a stripe of parity data to a different disk in the array. The controller is careful about where it places the parity, so that if any one disk in the array fails, the entire array can keep right on running. Many arrays allow for <em>hot swapping</em> the failed disk while the server is running. Then, once a replacement disk is inserted, the array automatically restores the lost data onto the new drive by combining the remaining good data with the parity information. Naturally, the array becomes unrecoverable if more than one disk goes bad (which is why you want to swap out bad disks as quickly as possible) or if the controller goes bad.</p>
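<p>The rebuild step works because XOR undoes itself. If you XOR the surviving chunks with the parity, what is left is exactly the missing chunk. Here is a minimal sketch that assumes simple XOR parity and a single failed disk. Real controllers also handle parity rotation, caching, and much more, and the function name is hypothetical.</p>

```python
# Minimal sketch of rebuilding one failed disk's chunk, assuming simple XOR
# parity and exactly one missing chunk in the stripe.
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_missing(survivors: list[bytes], parity: bytes) -> bytes:
    """XOR the parity with every surviving chunk; what remains is the lost chunk."""
    return reduce(xor_bytes, survivors, parity)


if __name__ == "__main__":
    disk1, disk2, disk3 = b"RAID", b"five", b"demo"
    parity = reduce(xor_bytes, [disk1, disk2, disk3])  # kept on a fourth disk
    # Disk 2 fails; its replacement is regenerated from the other disks.
    print(rebuild_missing([disk1, disk3], parity))     # b'five'
```
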
<p>RAID-5 (striping with parity) does have a bit more overhead than RAID-0 (striping without parity). All that parity takes a little time to write to disk (making RAID-5 a bit slower at writing than RAID-0 or RAID-1) and a little extra space (generally one more disk than RAID-0 would need for the same capacity).</p>
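<p>Part of that write overhead comes from keeping the parity current. For small writes, a common approach is to read the old data and the old parity, then write the new data and the new parity. That is four disk operations where RAID-0 would need one. The sketch below assumes XOR parity and uses a hypothetical function name.</p>

```python
# Minimal sketch of a RAID-5 "small write" parity update, assuming XOR parity.
# Updating one chunk costs four disk operations: read old data, read old
# parity, write new data, write new parity.


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Remove the old data from the parity, then add in the new data."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


if __name__ == "__main__":
    chunks = [b"\x01", b"\x02", b"\x04"]
    parity = b"\x07"                                   # 0x01 ^ 0x02 ^ 0x04
    new_parity = updated_parity(parity, chunks[1], b"\x08")
    # Same answer as recomputing parity over the whole stripe.
    print(new_parity == bytes([0x01 ^ 0x08 ^ 0x04]))   # True
```
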
<p>So how many disks do you need?</p>
<p>For RAID-0, the answer is two (any fewer than two, and you wouldn’t have an array, just a regular hard disk with an expensive controller that would probably complain about needing another disk). Ideally, the disks should also be the same size. Many controllers will accept differently sized disks, but the smallest disk in the array determines how much space is used on every disk. So, if you have an array with one 80GB disk and one 400GB disk, the maximum usable space in the array would be only 160GB (what a waste!).</p>
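<p>The “smallest disk wins” rule is simple arithmetic: usable space is the number of disks times the size of the smallest one. Here is a minimal sketch. It assumes the controller uses an equal amount of space on every member disk, and the function name is hypothetical.</p>

```python
# Minimal sketch of RAID-0 usable capacity with mismatched disks, assuming the
# controller uses only as much of each disk as the smallest one provides.


def raid0_usable_gb(disk_sizes_gb: list[int]) -> int:
    """Usable space = number of disks x size of the smallest disk."""
    if len(disk_sizes_gb) in (0, 1):
        raise ValueError("RAID-0 needs at least two disks")
    return len(disk_sizes_gb) * min(disk_sizes_gb)


if __name__ == "__main__":
    print(raid0_usable_gb([80, 400]))  # 160, as in the example above
```
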
<p>For RAID-1 you need twice as much raw disk space as you want to use for storage. For example, if you want 400GB of usable space, you need to purchase TWO 400GB disks. Or, you might be able to purchase FOUR 200GB disks and have the array configure them as two striped pairs in RAID-0 (striping without parity) and then use RAID-1 (mirroring) to duplicate one striped pair onto the other. This is referred to as <strong>RAID-10</strong>, which is really just a combination of RAID-1 and RAID-0. Depending on the controller, you should be able to create striped disks that are mirrored (<strong>RAID-0+1</strong>) or mirrors that are striped across multiple disks (<strong>RAID-1+0</strong>).</p>
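<p>The same kind of arithmetic applies to mirroring, where only half of the raw capacity is usable. Here is a minimal sketch. It assumes the disks are paired into mirrors, conservatively sizes every disk by the smallest one, and uses a hypothetical function name.</p>

```python
# Minimal sketch of usable capacity for RAID-1 and RAID-10, assuming the disks
# are paired into mirrors; sizing by the smallest disk is a conservative choice.


def mirrored_usable_gb(disk_sizes_gb: list[int]) -> int:
    """Half of the disks hold duplicate copies, so half the raw space is usable."""
    if not disk_sizes_gb or len(disk_sizes_gb) % 2 != 0:
        raise ValueError("mirroring needs an even number of disks")
    return (len(disk_sizes_gb) // 2) * min(disk_sizes_gb)


if __name__ == "__main__":
    print(mirrored_usable_gb([400, 400]))            # RAID-1: 400
    print(mirrored_usable_gb([200, 200, 200, 200]))  # RAID-10: 400
```
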
<p>For RAID-5 you need at least three disks. The first two disks hold the striped data, and the third disk holds the parity. (Actually, with RAID-5 the parity stripe rotates among all the disks, but it’s easier to think of it as one disk holding the parity information. Incidentally, RAID-4 uses one dedicated disk for parity, just as I described.) RAID-5 is nice because after the initial investment in three disks, you can add more disks to the array and use them fully (well, up to the size of the smallest disk in the array) without giving up any more space to parity.</p>
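<p>The rotating parity and the “one disk’s worth of overhead” rule can both be sketched in a few lines. This assumes one simple rotation order, in which the parity chunk moves one disk to the left on each new stripe. Real controllers may rotate differently, and the function names are hypothetical.</p>

```python
# Minimal sketch of rotating ("distributed") parity in RAID-5. The rotation
# order used here is one simple convention; real controllers may differ.


def parity_disk(stripe: int, disk_count: int) -> int:
    """0-based index of the disk holding parity for a given stripe."""
    return (disk_count - 1 - stripe) % disk_count


def stripe_layout(stripe: int, disk_count: int) -> list[str]:
    """Label each disk in the stripe as 'P' (parity) or 'D' (data)."""
    p = parity_disk(stripe, disk_count)
    return ["P" if disk == p else "D" for disk in range(disk_count)]


def raid5_usable_gb(disk_sizes_gb: list[int]) -> int:
    """One disk's worth of space goes to parity, however many disks there are."""
    if len(disk_sizes_gb) in (0, 1, 2):
        raise ValueError("RAID-5 needs at least three disks")
    return (len(disk_sizes_gb) - 1) * min(disk_sizes_gb)


if __name__ == "__main__":
    for s in range(4):
        print(s, " ".join(stripe_layout(s, 3)))
    # 0 D D P
    # 1 D P D
    # 2 P D D
    # 3 D D P
    print(raid5_usable_gb([200, 200, 200]))  # 400
```

<p>Because the parity moves from stripe to stripe, no single disk has to absorb every parity write. That is the main practical difference from RAID-4’s dedicated parity disk.</p>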
<p>As with RAID-10, there are a few RAID-5 permutations available. <strong>RAID-50</strong> (or <strong>RAID-5+0</strong>) consists of a series of RAID-5 groups striped in RAID-0 fashion to improve RAID-5 performance without reducing data protection. There is also <strong>RAID-53</strong> (or <strong>RAID-5+3</strong>), which combines striping (in RAID-0 style) with RAID-3’s virtual disk blocks (something I won’t delve into). This offers higher performance than RAID-3, but at a much higher cost.</p>
<p>You may see the term <strong>RAID-1+5</strong> when ordering a server. This usually indicates that the first two drives in the server are configured for mirroring (RAID-1) and the remaining drives in the system (at least three) are configured for striping with parity (RAID-5). This is common for Microsoft Windows-based servers. The operating system and other applications are stored on the mirror. Data, which usually benefits from the simultaneous multi-disk writes of striping, is stored on the RAID-5 portion of the array.</p>
<p>There are other RAID configurations available, but they are not widely supported at this time. Some worth noting are:</p>
<ul>
<li><strong>RAID-6</strong>: This type is similar to RAID-5 but adds a second, independent set of parity information distributed across the drives, so the array can survive the failure of any two drives at once.</li>
<li><strong>RAID-7</strong>: This type includes a real-time embedded operating system as a controller, caching via a high-speed bus, and other characteristics of a stand-alone computer.</li>
<li><strong>RAID-S</strong>: This is an alternate, proprietary method for parity RAID from EMC Symmetrix. It appears to be similar to RAID-5 with some performance enhancements as well as the enhancements that come from having a high-speed disk cache on the disk array.</li>
</ul>
<p>Finally, another term that you frequently hear along with RAID is JBOD. Remember the first array you designed? You created an array of disks that filled up sequentially: when disk 1 was full, things spilled over onto disk 2, and so on. That type of array (no striping, no parity) is called Just a Bunch Of Disks, or <strong>JBOD</strong>. JBOD arrays don’t make a whole lot of sense in most cases, but people still use them. They don’t have the speed of striping, and naturally there’s no fault tolerance. They are just a bunch of disks available for sequential storage of data, much like magnetic tape or an optical disk. In fact, acting as an interface to another sequential storage medium is one of the few things they do pretty well.</p>
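<p>For contrast with striping, here is a minimal sketch of how a JBOD (spanned) volume might map a logical block to a disk. It assumes disk sizes are given in blocks and uses a hypothetical function name. Consecutive blocks stay on the same disk until that disk is full.</p>

```python
# Minimal sketch of JBOD (spanned) placement: blocks fill disk 1 completely
# before spilling onto disk 2, and so on. Disk sizes are measured in blocks.


def jbod_locate(block: int, disk_sizes: list[int]) -> tuple[int, int]:
    """Map a 0-based logical block to (disk number from 1, block on that disk)."""
    offset = block
    for disk_number, size in enumerate(disk_sizes, start=1):
        if offset >= size:
            offset -= size  # this disk is full; move on to the next one
        else:
            return disk_number, offset
    raise ValueError("block is beyond the end of the spanned volume")


if __name__ == "__main__":
    sizes = [100, 50]                # disk 1 holds 100 blocks, disk 2 holds 50
    print(jbod_locate(99, sizes))    # (1, 99): still on disk 1
    print(jbod_locate(100, sizes))   # (2, 0): spilled over onto disk 2
```

<p>Unlike striping, a large sequential write here keeps hitting one disk at a time, which is why JBOD doesn’t get the parallel speed-up.</p>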
<p>Throughout this article, I’ve used the term <strong>controller</strong>. The controller can be either a piece of hardware or some software running on the server. Software controllers are a poor choice in most situations because the operating system and CPU of the computer have to process and manage all the data flowing to and from the disks. Mirroring is usually faster in this situation than striping, and adding parity to the mix bogs things down even more. Hardware controllers are much lower in price than they used to be and should be considered an important component of any server. Many server-class motherboards now offer built-in RAID support (at least RAID-0, RAID-1, and JBOD). Even many higher-end desktop motherboards offer RAID support for SATA (Serial ATA) drives.</p>
<p><strong>Summary</strong><br />
The three types of RAID you are most likely to encounter or care about are RAID-0, RAID-1, and RAID-5.</p>
<p>RAID-0 is the fastest performer and the least expensive option, but it offers no protection against faults. It’s great for situations where you need a lot of speed and fault tolerance is not very important. You might find it used for storing cached or temporary information that needs to be retrieved quickly but is harmless if lost.</p>
<p>RAID-1 is fast (reads can come from either half of the mirror, although every write must go to both), but because it requires twice as many disks as RAID-0, it’s usually the most expensive option. If one half of the mirror has a problem, the good half takes over (temporarily running without any redundancy) until the bad half can be replaced. Regeneration of the mirror after a failure is usually pretty fast, and it may even be automatic.</p>
<p>RAID-5 is slower at writing than RAID-1 because it has to write extra parity data to a disk in the array, but it requires only one extra disk in total, no matter how large the array grows. Many controllers allow hot swapping of bad disks, and as with RAID-1, regeneration of the array is usually fast and automatic.</p>
<p>Innovative engineers have figured out ways to combine various RAID methods to improve performance or tweak other measurements, but most are based on some combination of these three methods (e.g., RAID-10 actually being RAID-0+1 or RAID-1+0).</p>
<p>RAID-1+5 is a little different. It uses RAID-1 (mirroring) for the first two drives in a server, and the operating system and applications are generally stored there. The remaining three or more disks use RAID-5 for storing data, since striping generally works well for storing frequently changing data.</p>
<p>RAID controllers are a built-in option on many server and higher-end desktop PC motherboards. If you have that option and your budget is tight, use it! Otherwise, spend the money on a better hardware RAID controller for more options and better performance. Software-based RAID is an option in most server operating systems; its performance is generally much worse than hardware RAID, but it may still be a better option than doing without RAID completely.</p>
<p>Some information was taken from WhatIs.com’s <a href="http://searchstorage.techtarget.com/gDefinition/0,294236,sid5_gci214332,00.html">definition of RAID</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.allogro.com/main/2006/08/30/8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://www.allogro.com/main/icons/Nuvola/128x128/devices/raid.png" />
		<media:content url="http://www.allogro.com/main/icons/Nuvola/128x128/devices/raid.png" medium="image">
			<media:title type="html">Raid icon</media:title>
		</media:content>
	</item>
	</channel>
</rss>
