interesting article on RAID
Triple-Parity RAID and Beyond by Adam Leventhal | December 17, 2009
http://queue.acm.org/detail.cfm?id=1670144

As hard-drive capacities continue to outpace their throughput, the time has come for a new level of RAID.

--
- Chuck (372 Days until IPv4 depletion: http://ipv4depletion.com/)
Yeah, it's complicated by the fact that a lot of RAID systems will consider any I/O error to mean the entire device is bad. ZFS does a particularly good job here; I've seen single raidz2 sets recover from having 4 simultaneously failing disks.

I've had other vendors' arrays start failing disks while reconstructing from a single failure. Thankfully it was only 1 additional failure on a 2-parity set. Still PITA.

On Thu, Mar 11, 2010 at 8:09 PM, Chuck Anderson <cra@wpi.edu> wrote:
> Triple-Parity RAID and Beyond by Adam Leventhal | December 17, 2009
> http://queue.acm.org/detail.cfm?id=1670144
> As hard-drive capacities continue to outpace their throughput, the time has come for a new level of RAID.
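
To put rough numbers on the "capacities outpace throughput" point: the best-case time to rebuild a replaced disk is bounded by how long it takes to read or write the whole drive once. The sketch below is only back-of-the-envelope arithmetic. The function name and the example drive figures are hypothetical, chosen for illustration, and are not taken from the article.

```python
def min_rebuild_hours(capacity_bytes, throughput_bytes_per_sec):
    """Lower bound on rebuild time: one full sequential pass over the drive.

    Real rebuilds take longer because the array is usually still serving I/O.
    """
    return capacity_bytes / throughput_bytes_per_sec / 3600


# Hypothetical drives (illustrative numbers only, not measurements).
examples = [
    ("small, older drive", 100e9, 50e6),   # 100 GB at 50 MB/s
    ("large, newer drive", 2e12, 100e6),   # 2 TB at 100 MB/s
]

for label, capacity, rate in examples:
    print(f"{label}: at least {min_rebuild_hours(capacity, rate):.1f} hours")
```

If capacity grows faster than sequential throughput, this lower bound grows with every drive generation, and so does the window in which further failures can hit a degraded set.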
Theo> Yeah, it's complicated by the fact that a lot of RAID systems
Theo> will consider any I/O error to mean the entire device is bad.

This is why NetApp has moved to their double-parity RAID sets. It helps, but it's not perfect. I'm going to read the article in a few and then comment in particular.

Theo> ZFS does a particularly good job here; I've seen single raidz2
Theo> sets recover from having 4 simultaneously failing disks.

Now that's interesting. How was performance during this issue?

Theo> I've had other vendors' arrays start failing disks while
Theo> reconstructing from a single failure. Thankfully it was only 1
Theo> additional failure on a 2-parity set. Still PITA.

I once had a double disk failure in a NetApp. Ouch! This was before their double-parity stuff was widely deployed. One disk died due to head problems; the other died because the controller board on the disk failed. I managed to swap the good board from the disk with the bad heads onto the disk with the good heads and the bad board, re-insert it, and have the system come back up and rebuild. Big sigh of relief there for sure!

Another thing I like about a lot of newer RAID implementations (including Linux RAID, assuming you use it) is the consistency checks or scrubbing that you can run to hopefully catch errors before they become problems: reading the disk(s), checking blocks for consistency, and even rewriting them to make sure the data is good.

John
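
For anyone who hasn't looked at what a scrub does, here is a minimal sketch of the idea for a single-parity (XOR) layout. It is a toy model, not how any particular RAID implementation is written: the stripe layout as a dict and the names `xor_blocks` and `scrub` are assumptions made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub(stripes):
    """Read every stripe, recompute parity, and rewrite it where it disagrees.

    Each stripe is a dict of the (hypothetical) form
    {"data": [bytes, ...], "parity": bytes}.
    Returns the indices of the stripes that were found inconsistent.
    """
    repaired = []
    for index, stripe in enumerate(stripes):
        expected = xor_blocks(stripe["data"])
        if expected != stripe["parity"]:
            # Assumption: the data blocks are the trustworthy copy, so the
            # parity block is the one that gets rewritten.
            stripe["parity"] = expected
            repaired.append(index)
    return repaired
```

Note the limitation: with plain XOR parity a mismatch only says that the stripe is inconsistent, not which block is wrong, so this sketch simply trusts the data blocks. Filesystems that checksum every block, such as ZFS, can instead tell which copy is bad and rebuild it from the rest.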
participants (3)
- Chuck Anderson
- John Stoffel
- Theo Van Dinter