Main > Everything Else
Workflow server
mystic96:
--- Quote from: kahlid74 on March 22, 2013, 09:15:02 am ---
--- Quote from: mystic96 on March 22, 2013, 08:56:29 am ---
--- Quote from: MonMotha on March 21, 2013, 11:44:05 pm ---You can also of course get "enterprise" SSDs. I've seen 2TB models for ~$2000, which is what a decent array of 4-5 "enterprise" HDDs will run you, albeit for less capacity, and the performance will totally kill any spinning metal array you can make for that price. There's a reason they come as 8x PCIe cards. I'm not sure that the reliability is any better than consumer models, though, but that's generally true of "enterprise" HDDs too (they just have better error reporting since the consumer ones are crippled for marketing reasons).
--- End quote ---
I can't say with 100% assurance that this is true of all makes, but EMC's ent SSD drives actually have double the stated capacity. The second set of chips is left in a low-power state, and when an in-use chip hits its max write count, that data is moved to a standby chip and the original is disabled, pointers are updated, etc, etc. The big thing about enterprise class drives is that they are meant to be spun 95%+ of the time, and their MTBFs are rated accordingly.
Any particular article you can point me to regarding the raid 5 issue you speak of? I'm having a really hard time digesting that having never experienced it myself.
--- End quote ---
Basically what Mon Motha is saying is that RAID5, since its origin, has had flaws which were masked by small drive sizes. With 1-4 TB drives now, a RAID5 of 8 disks each with 4TB is a time bomb waiting to go off. The likelihood of hitting a failure on a second drive is considerably high because of how much data they hold.
--- End quote ---
Oh I got what he said, he's just clearly been misinformed.
The whole point of raid 5 is the parity, or, redundancy. The "data set" (block, in IT speak) isn't mirrored across two drives like a raid 1 - it's calculated. That's why it sucks at write speed, because every write i/o on the app side causes 4 i/os on the controller side -- 2 (shoot, maybe 3... I forget) of those having to do with re-calculating and updating the parity. The great thing about parity is that it isn't just stored on a single drive. That's why you can remove an entire drive from the array and not lose ANY data... because parity (I'm going to meme that later). A block of data is a block of data, it doesn't matter the capacity of the spindle it resides on.
Since it hasn't been posted yet, I took a break from this response to look for this misinformation on the web, but was only able to find the opposite (no surprise). Not that I know this site, but it was the first response and yet still factually correct while having a title that sounds quite the contrary! Here's my source: http://www.standalone-sysadmin.com/blog/2012/08/i-come-not-to-praise-raid-5/ , and here was my Google search query: Raid 5 on large drives losing data
--- Quote ---Now, lets move on to RAID-5. You need at least 3 drives in a RAID-5, because unlike the exact copy of RAID-1, RAID-5 has a parity, so that any individual pieces of data can be lost, and instead of recovering the data by copying it, it's recalculated by examining the remaining data bits.
So when we encounter a URE during normal RAID operations, the array calculates what the missing data was, the data is re-written so we'll have it next time, and the array carries on business as usual.
--- End quote ---
URE = unrecoverable read error, more likely to be encountered on larger drives simply because there are more bits to read.
--- Quote from: kahlid74 on March 22, 2013, 09:15:02 am ---By comparison, if you had EMC walk in the front door to design your new system, they would use a max of 6 drives per group at RAID6 and then make super groups of RAID 60 for your data.
--- End quote ---
:lol Come on man... be honest. How many meetings have you had with EMC about proper solution design? Because we've got PBs of DMX and VMAX here (need I mention the other manufacturers?), and I can tell you exactly how many times I've heard raid 6/60 as the answer to anything other than a joke - zero.
MonMotha:
Unrecoverable read errors happen. RAID5 protects against one, which is great. However, suppose you lose an entire drive. It happens. You now have no redundancy. You go to rebuild the array and, lo and behold, you get an unrecoverable read error on another drive. It happens amazingly often, and, as you point out, the frequency is essentially a function of the amount of data you have, not the number of drives you have. Hence, with modern really high capacity drives, you're more likely to have it happen than you were on the old 320GB things. The thing is, you've probably got a lot of infrequently accessed data. You may not notice that there's an unreadable sector on a drive until you go to do a full rebuild because of that failed drive.
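The "function of the amount of data" point is easy to put numbers on. A quick back-of-envelope in Python, assuming the commonly quoted consumer-drive datasheet rate of 1 URE per 10^14 bits read and treating errors as independent (both are simplifying assumptions - real error behavior is clustered and drives often do better than spec):

```python
# Chance of >=1 URE while reading all surviving drives during a RAID5
# rebuild, assuming independent errors at a datasheet rate of 1 per 1e14
# bits read (a common consumer spec; "enterprise" drives often claim 1e15).
URE_RATE = 1e-14  # assumed probability of an unrecoverable error per bit read

def rebuild_failure_prob(drive_tb, surviving_drives):
    bits_read = surviving_drives * drive_tb * 1e12 * 8
    return 1 - (1 - URE_RATE) ** bits_read

# 8-disk array with one drive dead: the rebuild must read the other 7 end to end.
for tb in (0.32, 1, 4):
    p = rebuild_failure_prob(tb, surviving_drives=7)
    print(f"{tb} TB drives: {p:.1%} chance of a URE during rebuild")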
RAID6 gives you a second layer of redundancy so that you can actually have a really good chance at a successful rebuild.
You can also routinely "scrub" a RAID5 array to try and find those errors while you still have the N+1 redundancy from the parity available, but that has pretty nasty overhead. If you've got a lot of downtime on your server, it can be OK. If it's a 24/7 active server, it can be a real problem. I do this on my 2-element RAID 1 arrays since there's a similar problem. The IO hit during the scrub sucks bigtime, but it's acceptable in most of my situations, and the tradeoff of going to a 3 element RAID1 or 4 element level 1+0 or 6 could not be justified.
And yes, I'm well aware of the extra "unused" capacity set aside on SSDs for wear leveling. I've done a fair bit of work with bare MTD devices on Linux going back to the early days of JFFS2, before NAND was popular. Even if you do "wear out" the flash (which is amazingly hard to do except in pathological cases), it "should" fail read-only. Sadly, controller bugs seem to cause other modes of failure most of the time, and seemingly well before erase-cycle limits have been reached, assuming proper wear leveling.
As to your 60PB system, they're probably doing things well beyond simple RAID to make that work.
ark_ader:
Mirrored drives, an image database that is denormalized, added cache and...this always makes me laugh:
:notworthy:
mystic96:
--- Quote from: ark_ader on March 22, 2013, 08:06:34 pm ---Mirrored drives, an image database that is denormalized, added cache and...this always makes me laugh:
:notworthy:
--- End quote ---
OMFG, that is by far the baddest-assed thing I have ever seen!!!! Thanks for sharing, that quickly got forwarded to work buddies :lol
MonMotha,
Sorry dude - I wrote a response and hit Post - lost it due to timeout for my account :banghead:
Quick & dirty because I have to pick up the lady here in a few -- UREs happen very infrequently, certainly not even close to frequently enough to worry about losing an array. I think the article I cited above said you would have to read a 3TB drive from sector 1 to the end about 3 times over before experiencing a single URE... and said URE would be fixed nearly instantly by the raid controller. I'm sorry, but you're just absolutely mistaken about raid 5's inability to protect data on larger drives. Multiple drive failures are statistically improbable in all but worst-case scenarios using improper gear. I've never seen it in prod, though I did see it once on a server that got power spiked. Again - worst-case scenario for being on unclean power. Even a mom & pop shop can afford a $50 UPS.
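For what it's worth, the "~3 full reads per URE" figure can be sanity-checked against the same 1-per-10^14-bits datasheet spec used in the argument above (an assumed number, and drives frequently beat it in practice):

```python
# Sanity check on the "read a 3TB drive end-to-end ~3x per expected URE"
# figure, using an assumed datasheet rate of 1 URE per 1e14 bits read.
drive_bits = 3e12 * 8          # a 3 TB drive expressed in bits
bits_per_ure = 1e14            # assumed consumer-drive spec
full_reads_per_ure = bits_per_ure / drive_bits
print(f"~{full_reads_per_ure:.1f} full end-to-end reads per expected URE")
```

That works out to roughly 4 full passes per expected URE, so the cited figure is in the right ballpark - the disagreement in this thread is really about whether that expectation is comfortable or scary once a rebuild has to make such a pass with zero redundancy left.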
As for raid 6, again, sorry, but at best it's a niche solution - much the same as raid 50 is a niche deal. And if you're talking about a small array (say 4 drives, the bare minimum required) - you'll never find a single storage person who would recommend it over a raid 10. Your r6 still gives up 2 drives' worth of capacity, has nearly double the write overhead of r5 (two parities, remember?), and way less performance than an r10.
Because I want to stay here for a while and not be "that guy," I want to close with -- we all have to run our own environments our own way. My initial response/hammer-throwing regarding raid 5 may have come across as over the top, and I apologize if it was taken as an attack. But I felt it necessary to point out that it was incorrect because it was directed at the OP, who was looking for advice. I wish you all as few failures and the lowest queue lengths possible with whatever solution you choose :cheers:
P.S. I didn't say 60PBs (I intentionally left the number blank - even 1pb is enough to show my employer takes storage freakishly seriously).
MonMotha:
Everybody I've talked to has expressed very serious concern over getting an unexpected URE during RAID5 rebuild. I've seen this sentiment expressed pretty widely. Either it's a total myth that's propagated widely, or it has merit. The logic certainly makes sense. If you don't scrub your RAID5 with some regularity, it could bite you hard, and RAID6 would save your butt at the expense of just needing one more drive (consider it's only a real issue on largeish arrays - >6 disks or so - where the incremental cost of one more disk isn't generally a huge deal). There may also be a small performance issue due to the double parity, but it seems like there's probably a striping pattern that mostly nulls that back out. At this point, it seems like your goal is probably capacity over performance - see below regarding my experiences with non-SAN scale RAID arrays vs. SSDs.
You may be right; I don't normally build storage systems bigger than a few TB, which is easily done with RAID 1 or 5 still, especially since I can deal with scrubbing it once a month or so. I just know I've seen it mentioned a lot over the past couple years, and everybody says it's of particular importance on drives >1TB or so.
As to performance, it depends on your strategy. If you can do the parity calcs at full speed (easily done these days in software, with minimal overhead) and have "pessimistic" reading (where you always check parity/mirror on read, even if the drive indicates no error), you've got more potential bandwidth from a 6 disk RAID6 than a 3+3 RAID 1+0, and you get the capacity of 4 drives, not just 3. If you're willing to treat reads "optimistically", you can get higher read BW out of the 1+0 (since you can stripe the read across both halves of the mirror, or schedule separate IO ops), but you may miss an unsignaled read error (and this is where having "enterprise" drives makes a difference - "consumer" drives are frequently silent upon read error, whereas "enterprise" ones complain).
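The capacity/bandwidth trade described above reduces to simple arithmetic. A sketch under the "pessimistic read" assumption, with a made-up per-drive throughput purely for illustration:

```python
# Back-of-envelope: 6 disks as RAID6 vs 3+3 RAID 1+0, with "pessimistic"
# reads (every read verifies parity/mirror, so all 6 spindles stay busy).
DRIVE_MB_S = 150   # illustrative per-drive sequential throughput assumption
DRIVE_TB = 4       # illustrative drive size

# RAID6: 2 of the 6 drives hold parity -> 4 drives of user data move
# for every 6 drives' worth of reading.
raid6_capacity_tb = 4 * DRIVE_TB
raid6_read_mb_s = 4 * DRIVE_MB_S

# RAID 1+0 (3+3): the mirror copy is verified on every read -> only
# 3 drives of user data move for the same 6 drives' worth of reading.
raid10_capacity_tb = 3 * DRIVE_TB
raid10_read_mb_s = 3 * DRIVE_MB_S

print(f"RAID6 : {raid6_capacity_tb} TB usable, ~{raid6_read_mb_s} MB/s pessimistic reads")
print(f"RAID10: {raid10_capacity_tb} TB usable, ~{raid10_read_mb_s} MB/s pessimistic reads")
```

Switch to optimistic reads and the 1+0 numbers roughly double (both mirror halves serve independent data), which is the crossover point being argued about.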
I'd suspect that a single high end consumer SSD will still blow away a 5-6 element RAID5, and probably even a RAID0 on random operations. Sequential may be more of a shootout. Reliability is tough to guess at. The single SSD has no redundancy, whereas a RAID5 will have N+1, but you've also got 5-6 times the devices to experience a failure in. RAID 1 on SSDs can unfortunately be of limited use due to controller glitches. Of course, for a given number of $$$, you'll get way more capacity out of the revolving metal in any case.