SSD Data Integrity

Silencing hard drives, optical drives and other storage devices

Moderators: NeilBlanchard, Ralf Hutter, sthayashi, Lawrence Lee

ces
Posts: 3395
Joined: Thu Feb 04, 2010 6:06 pm
Location: US

SSD Data Integrity

Post by ces » Sat Feb 25, 2012 5:05 am

SSD Endurance: Data Integrity is Marathon, Not a Sprint

"As the modern go-to technology for Tier 0 enterprise storage, SSDs house business’s most critical data and are core to generating revenue. Split-second access to this data is important, but if that data should become corrupted, the most rapid access in the world won’t matter."

http://www.tomshardware.com/us/sponsore ... urance-120

Does anyone have any comments or thoughts on this subject?

ces
Posts: 3395
Joined: Thu Feb 04, 2010 6:06 pm
Location: US

Re: SSD Data Integrity

Post by ces » Sat Feb 25, 2012 5:07 am

ces wrote:
Vicotnik wrote:
boost wrote:The Z68's SSD caching is called Intel Smart Response. The SSD caches reads (and writes in high performance mode) to a RAID array. You need at least two hard drives for a RAID array.
Are you sure about that? From what I heard a single HDD will do, no need for an array. That said I don't think much of SRT, too much complexity and reliance on driver support.
I read that you need one HDD and one SSD... but you have to select a configuration option titled RAID to configure it to work.

Can you share more of your thoughts on your caution, and perhaps what experiences have contributed to it?

I am struggling with the need for speed vs a Seagate whitepaper cautioning that data is more easily subject to corruption over time on an SSD. Given how flash memory works, as well as my own experience with SSDs, that caution sort of makes sense to me. SRT seems to represent an optimal solution to this problem.

Ralf Hutter
SPCR Reviewer
Posts: 8636
Joined: Sat Nov 23, 2002 6:33 am
Location: Sunny SoCal

Re: SSD Data Integrity

Post by Ralf Hutter » Sat Feb 25, 2012 6:22 am

ces wrote:Split-second access to this data is important, but if that data should become corrupted, the most rapid access in the world won’t matter."
That's the exact reason I still haven't bit on SSDs yet. And the only one I've ever considered buying is the seemingly most reliable of all, the Intel X25-M.

ces
Posts: 3395
Joined: Thu Feb 04, 2010 6:06 pm
Location: US

Re: SSD Data Integrity

Post by ces » Sat Feb 25, 2012 8:38 am

Ralf Hutter wrote:
ces wrote:Split-second access to this data is important, but if that data should become corrupted, the most rapid access in the world won’t matter."
That's the exact reason I still haven't bit on SSDs yet. And the only one I've ever considered buying is the seemingly most reliable of all, the Intel X25-M.
I bought two of them just as they were being end-of-lifed, for exactly that reason.

ces
Posts: 3395
Joined: Thu Feb 04, 2010 6:06 pm
Location: US

Re: SSD Data Integrity

Post by ces » Sat Feb 25, 2012 8:40 am

CA_Steve wrote:Here's a nice review on SRT at Anandtech. Tradeoffs:
- Z68 and small SSD = can overclock, use SRT, and Quick Sync (the latter via Virtu). You may not have any apps that make use of Quick Sync. Z68 boards are pricey.
- P67 and larger SSD = can overclock. Mobo cheaper than Z68. Don't have to muddle with Virtu. Can load all apps on SSD and media on HDD.
- H67 and larger SSD = no overclocking, has Quick Sync (via Virtu). Mobo cheaper than P67. Can load all apps on SSD and media on HDD.

ces
Posts: 3395
Joined: Thu Feb 04, 2010 6:06 pm
Location: US

Re: SSD Data Integrity

Post by ces » Sat Mar 03, 2012 5:54 pm

Does anyone have any more opinions or data on SSD Data Integrity?

Are Seagate's claims BS or is there some substance to them?

washu
Posts: 571
Joined: Thu Nov 19, 2009 10:20 am
Location: Ottawa

Re: SSD Data Integrity

Post by washu » Sat Mar 03, 2012 7:00 pm

1. It's basically an ad piece written by Seagate. Even if they have some SSDs on the market, they have a vested interest in making mechanical hard drives look better. That alone makes the article very suspect.

2. The article claims to be about "Data Integrity", but it is actually about Data Availability. Not very credible if they don't even get the terminology correct.

ces
Posts: 3395
Joined: Thu Feb 04, 2010 6:06 pm
Location: US

Re: SSD Data Integrity

Post by ces » Sat Mar 03, 2012 7:21 pm

washu wrote:The article claims to be about "Data Integrity", but it is actually about Data Availability
It talks about data integrity, and that is an issue for me. It sort of makes sense: these flash drives require more and more error correction with each generation.

The problem is that Seagate is taking a self serving position. I am seeking out any other authority on the subject. This is all I have found:

At the 10th USENIX Conference on File and Storage Technologies:

SSDs Have a 'Bleak' Future, Researchers Say
"...as NAND flash densities increase, so do issues such as read and write latency and data errors"
http://www.computerworld.com/s/article/ ... rchers_say

washu
Posts: 571
Joined: Thu Nov 19, 2009 10:20 am
Location: Ottawa

Re: SSD Data Integrity

Post by washu » Sat Mar 03, 2012 7:55 pm

The article mentions data integrity in passing, but all the failure modes given are data availability problems.

Yes, flash drives require more error correction as time goes on. So do mechanical hard drives. So does everything else in a modern computer. Computer buses, Ethernet, storage links, etc, they all have more error detection/correction as things get faster/bigger. It's the nature of the beast.

Simple example: PCI Express has much greater error detection than old PCI. Where are the articles about PCIe data corruption? None, because it's a non-issue. It was designed with the appropriate level of error detection. Same with SSDs. As long as the error detection/correction is appropriate for the task at hand then there is no issue.

The reason you can't find much about the issue is because again it is a non-issue. There were similar articles in the past about how mechanical drives would reach their limit and be unusable because of errors. They were wrong then and this article about SSDs is wrong now.

ces
Posts: 3395
Joined: Thu Feb 04, 2010 6:06 pm
Location: US

Re: SSD Data Integrity

Post by ces » Sat Mar 03, 2012 10:39 pm

washu wrote:Yes, flash drives require more error correction as time goes on. So do mechanical hard drives. So does everything else in a modern computer. Computer buses, Ethernet, storage links, etc, they all have more error detection/correction as things get faster/bigger. It's the nature of the beast.
You may be right, but somehow I don't think it is so simple.

"I know from the emails I get that many readers think that once they've looked at the single issue of flash endurance - they've covered the bases for enterprise SSDs. While endurance remains a challenge for each new flash SSD generation - it's only a single one of many dimensions in the SSD life mix. That's why (in 2008) StorageSearch.com started this directory of definitive technology articles to help guide readers through the reliability maze."
http://www.storagesearch.com/ssd-data-art.html

With Computer buses, Ethernet and similar situations, all you have to do is detect the corruption and then you resend the data to correct it. With silent SSD corruption, there is no good way to correct the corruption even if you can identify it. The original data is now gone. SSD technology has some inherently unique cross-cell data corruption problems.
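
The difference between a link and a drive can be sketched in a few lines of Python. This is only an illustration: `zlib.crc32` stands in for whatever check a real bus or drive uses, and `flip_bit` and `recover` are made-up helper names. The point is that the check detects the damage in both cases, but only the link has a second copy to fall back on.

```python
import zlib
from typing import Optional


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated corruption)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def recover(received: bytes, expected_crc: int,
            sender_copy: Optional[bytes]) -> Optional[bytes]:
    """Return good data, or None if the error is detected but cannot be fixed."""
    if zlib.crc32(received) == expected_crc:
        return received                      # data arrived intact
    if sender_copy is not None and zlib.crc32(sender_copy) == expected_crc:
        return sender_copy                   # the "resend" path of a bus or network
    return None                              # detected, but nothing to resend


original = b"critical business record"
crc = zlib.crc32(original)
damaged = flip_bit(original, 13)

over_link = recover(damaged, crc, sender_copy=original)  # sender still has the data
on_drive = recover(damaged, crc, sender_copy=None)       # the drive held the only copy
```

Here `over_link` comes back as the original bytes, while `on_drive` is `None`: the corruption is noticed but not repairable without some redundancy (ECC, a mirror, a backup).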

SEE ALSO:

1. French Retailer Data Offers SSD Failure Rates
http://www.macobserver.com/tmo/article/ ... ure_rates/

2. SSD "Silent Data Corruption"
"CRC can only identify that an error has occurred. It cannot correct it, but it does prevent "silent data corruption.""
"Currently the only way to validate that the drive does not have this silent corruption problem is with explicit testing. One method would be to run a test where the LBA itself and an incremental value are stored in that LBA. The host test system would ensure that the data returned contains the correct LBA and incremental value. If the wrong value is returned the host would know the drive failed the test.

This problem can also occur if there is a power failure while the LBA is being updated, the table may contain only information for the data that is now stale. To prevent errors like this, designers can use on-board capacitors (SuperCaps) to provide temporary power during a failure so that the buffers can be flushed and the LBA table properly updated.

Lastly, there is the issue of silent corruption due to firmware bugs. The only way to guard against firmware bugs is through extensive directed testing so that products are known to be bug-free before being put into production."
http://www.storagesearch.com/sandforce-art1.html

3. Enterprise Storage Forum

"During programming, a cell will create a field large enough to perhaps change the properties of a neighboring cell. While designers go to great lengths to make sure that this doesn't happen or that the SSD can compensate for this happening, this phenomenon still occurs and can lead to silent data corruption.

The silent data corruption scenario is fairly simple – a cell has some voltage applied to it, a bias voltage or a program/erase voltage. The resulting field can disturb neighboring cells changing their properties. For example, the number of electrons in the cell can decrease or electrons can be tunneled into the cell. In either case, it's possible to change the value in the cell silently."

"However, it is important to realize that you can get data corruption on the SSD at some point."

"Reducing lithography size brings the cells closer together, reducing the distance between the source and the drain. This allows more cells in a given space, hopefully reducing costs and allowing SSDs to have larger capacities. However, the one fundamental aspect that does not change with lithography size is the voltages that are applied to the cells. The 5V bias voltage has to be applied to bring the cells to a conductive state, you still need 0.5V to read a cell, and 20V to program/erase a cell. However, existing data corruption problems may actually get much worse because the EM fields are the same size and will be stronger in neighboring cells as they are closer together. This only makes the problem of possible data corruption worse"

http://www.enterprisestorageforum.com/t ... t-Year.htm
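
The explicit test described in item 2 above (store the LBA itself plus an incremental value in each LBA, then check what comes back) might look roughly like the following. This is a sketch under assumptions, not a real drive test: an in-memory `io.BytesIO` buffer stands in for the device, the 512-byte block size and the names `write_stamps` and `verify_stamps` are invented for the example, and a real run would have to go to the raw device, bypass caches, and would destroy whatever is on the drive.

```python
import io
import struct

BLOCK_SIZE = 512                  # assumed block size for the illustration
STAMP = struct.Struct("<QQ")      # (lba, pass counter), two unsigned 64-bit ints


def write_stamps(dev, num_blocks, counter):
    """Fill every block with its own LBA and the current pass counter."""
    for lba in range(num_blocks):
        dev.seek(lba * BLOCK_SIZE)
        dev.write(STAMP.pack(lba, counter).ljust(BLOCK_SIZE, b"\0"))


def verify_stamps(dev, num_blocks, counter):
    """Return the LBAs whose contents do not match what was written there."""
    bad = []
    for lba in range(num_blocks):
        dev.seek(lba * BLOCK_SIZE)
        block = dev.read(BLOCK_SIZE)
        if len(block) < STAMP.size or STAMP.unpack_from(block) != (lba, counter):
            bad.append(lba)
    return bad


dev = io.BytesIO(bytes(BLOCK_SIZE * 8))     # stand-in for an 8-block drive
write_stamps(dev, 8, counter=1)
clean = verify_stamps(dev, 8, counter=1)

# Simulate a mapping-table fault: LBA 5 now returns the data belonging to LBA 2.
dev.seek(2 * BLOCK_SIZE)
misdirected = dev.read(BLOCK_SIZE)
dev.seek(5 * BLOCK_SIZE)
dev.write(misdirected)
failed = verify_stamps(dev, 8, counter=1)
```

The block returned for LBA 5 is perfectly readable, so a per-block CRC would pass, but the embedded LBA says 2, and the host flags it. Bumping `counter` on each pass catches the other case from the quote, where a block silently keeps stale data from an earlier write.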

Dirge
Posts: 111
Joined: Sun Apr 11, 2004 8:55 pm
Location: New Zealand

Re: SSD Data Integrity

Post by Dirge » Thu Mar 08, 2012 3:22 pm

Right now it appears SSDs don't have much of an advantage over traditional magnetic storage with regard to reliability. Check out the following two links for further studies.

http://www.tomshardware.com/reviews/ssd ... ,2923.html

http://www.zdnet.com/blog/storage/ssd-r ... rbxccnbzd1

Elrast
Posts: 23
Joined: Fri May 09, 2008 9:40 pm
Location: Australia

Re: SSD Data Integrity

Post by Elrast » Fri Mar 09, 2012 4:34 am

Spent a bit of time pondering this for my build. The issue *may* be exacerbated for SSDs, but it's by no means absent in HDDs.

There are three mitigation paths (which can be used concurrently):

* Use of ZFS with RAID-Z (admittedly a small audience). There are a few other esoteric file systems that do "live" bit-level correction. This corrects for a flipped media bit transparently. Maybe we'll see more of this when Windows 8 server comes round (new FS coming).

* Incremental backup. Assuming you keep a long enough backup chain, even if a bit is flipped and backed up as flipped, you could go back to a previous unflipped version. Trick is you may not know whether the current version has flipped bits in it...

* Protect high priority files with checksum/hash/PAR2. Hashes and the like will detect changes. PAR2 is actually a forward error correction (FEC) scheme that can unflip media errors. Pain to maintain, unless the majority of files on the system are static, or you can integrate the checksum generation into the application level (i.e. save file = save file to FS, then create new checksum).

I'm going with option #2 and keeping my fingers crossed.
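
For the third option, the hash side is easy to sketch in Python. This is a minimal illustration, assuming SHA-256 and a plain dict as the "manifest"; the function names and the sample file are made up for the example. It only detects changes. Repairing them would still need PAR2-style recovery data or a backup, and a legitimate edit looks exactly like corruption unless the manifest is regenerated on save.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its SHA-256 digest."""
    root = Path(root)
    return {str(p.relative_to(root)): file_sha256(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def changed_files(root, manifest):
    """Return files that are missing or whose contents no longer match."""
    root = Path(root)
    changed = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or file_sha256(path) != expected:
            changed.append(name)
    return changed


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "photo.raw"
    sample.write_bytes(b"\x00" * 1024)
    manifest = build_manifest(tmp)
    before = changed_files(tmp, manifest)    # nothing has changed yet

    data = bytearray(sample.read_bytes())
    data[100] ^= 0x01                        # simulate a single flipped bit
    sample.write_bytes(bytes(data))
    after = changed_files(tmp, manifest)     # the flipped file is reported
```

Run against a backup chain, a check like this also answers the "you may not know whether the current version has flipped bits" problem: the first backup whose hash still matches the manifest is the one to restore from.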

HFat
Posts: 1753
Joined: Thu Jul 03, 2008 4:27 am
Location: Switzerland

Re: SSD Data Integrity

Post by HFat » Fri Mar 09, 2012 11:18 am

You can also be careful and prevent most data corruption problems. It's not only bit-flip that's an issue.
a) KISS
b) test your hardware and software extensively
c) use ECC RAM
d) use battery-backed controllers (or enough caps as Intel put in the SSD controller of the 320) and/or a decent UPS

ces
Posts: 3395
Joined: Thu Feb 04, 2010 6:06 pm
Location: US

Re: SSD Data Integrity

Post by ces » Fri Mar 09, 2012 12:34 pm

HFat wrote:c) use ECC RAM
I have thought about it, but I thought you have to get server motherboards to get that. Are there any popular enthusiast Asus or Gigabyte motherboards that support ECC? Do the consumer Intel CPUs even support ECC?

ces
Posts: 3395
Joined: Thu Feb 04, 2010 6:06 pm
Location: US

Re: SSD Data Integrity

Post by ces » Fri Mar 09, 2012 12:36 pm

HFat wrote: use battery-backed controllers (or enough caps as Intel put in the SSD controller of the 320) and/or a decent UPS
I was unaware that Intel put capacitors in the 320. Do you know if the 520 or any other Intel SSDs also have capacitors in them? Did the old 34nm Intel SSDs have capacitors in them?
Last edited by ces on Fri Mar 09, 2012 12:47 pm, edited 1 time in total.

washu
Posts: 571
Joined: Thu Nov 19, 2009 10:20 am
Location: Ottawa

Re: SSD Data Integrity

Post by washu » Fri Mar 09, 2012 1:42 pm

ces wrote:I have thought about it, but I thought you have to get server motherboards to get that. Are there any popular enthusiast Asus or Gigabyte motherboards that support ECC? Do the consumer Intel CPUs even support ECC?
Note: Only referring to current SB CPUs.

None of the consumer Intel CPUs combined with a consumer chipset support ECC. The low end consumer CPUs (Celeron, Pentium, i3) support ECC when combined with a server (C200 series) chipset. None of the high end (i5, i7) CPUs support ECC in any case.

Basically the low end CPUs that don't have a Xeon equivalent support ECC for cheap servers.

ces
Posts: 3395
Joined: Thu Feb 04, 2010 6:06 pm
Location: US

Re: SSD Data Integrity

Post by ces » Fri Mar 09, 2012 2:20 pm

washu wrote:None of the consumer Intel CPUs combined with a consumer chipset support ECC. The low end consumer CPUs (Celeron, Pentium, i3) support ECC when combined with a server (C200 series) chipset. None of the high end (i5, i7) CPUs support ECC in any case. Basically the low end CPUs that don't have a Xeon equivalent support ECC for cheap servers.
Probably even then, the memory isn't sold through typical consumer channels I would think.

How much benefit is there to the ECC memory?

Mr Evil
Posts: 566
Joined: Fri Jan 20, 2006 10:12 am
Location: UK

Re: SSD Data Integrity

Post by Mr Evil » Fri Mar 09, 2012 4:27 pm

ces wrote:Probably even then, the memory isn't sold through typical consumer channels I would think.
A lot of places don't stock it, but it isn't really hard to find.
ces wrote:How much benefit is there to the ECC memory?
Google released some statistics on RAM errors a couple of years ago. The conditions their servers run under might make the results quite different than for a desktop machine, but it seems likely that errors are more common than people assume.

This is one of the reasons I stick to AMD - all their CPUs support ECC (unbuffered only, in the case of non-server chips), and a large proportion of motherboards for them do too.

washu
Posts: 571
Joined: Thu Nov 19, 2009 10:20 am
Location: Ottawa

Re: SSD Data Integrity

Post by washu » Fri Mar 09, 2012 4:57 pm

ces wrote: Probably even then, the memory isn't sold through typical consumer channels I would think.
How much benefit is there to the ECC memory?
ECC isn't too hard to find if you need it.

For mission critical stuff I would always use it. All our critical servers where I work use ECC and I would never use anything else.

That being said, I think a lot of ECC "studies" are scaremongering. I remember reading a study from the mid-90s (I think from IBM) that, if it held true, would mean modern computers would have memory errors every 2-3 minutes. If that were true, a modern desktop with 4-16 GB of RAM would crash before your OS was even installed.

HFat
Posts: 1753
Joined: Thu Jul 03, 2008 4:27 am
Location: Switzerland

Re: SSD Data Integrity

Post by HFat » Fri Mar 09, 2012 9:53 pm

I haven't seen these scaremongering studies. What I've seen led me to believe that the number of errors isn't a function of the amount of RAM you've got but more of the number of chips (or something like that), and that the error distribution is far from the random one you seem to assume, with most errors being caused by bad DIMMs. You could therefore have a relatively high average number of errors with most users very rarely experiencing one (and most likely not seeing it when it happens).
That also led me to believe you could avoid the worst pitfalls of non-ECC RAM for non-critical computers by regular and rigorous testing. But the way I see it, it's usually simpler and better to get ECC RAM if you're that worried.
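
A toy calculation shows how a high average and a rarely-affected typical user can coexist. The numbers below are entirely hypothetical, picked only to make the arithmetic easy: they are not taken from Google's statistics or any other study.

```python
from statistics import mean, median

# Hypothetical fleet: 1000 machines, of which 20 have a bad DIMM that
# throws 5000 errors a year. The other 980 see no errors at all.
errors_per_year = [0] * 980 + [5000] * 20

average = mean(errors_per_year)      # dominated by the 20 bad machines
typical = median(errors_per_year)    # what the machine in the middle sees
share_unaffected = errors_per_year.count(0) / len(errors_per_year)
```

With these made-up inputs the fleet-wide average is 100 errors per machine per year, yet the median machine sees none and 98% of machines are unaffected. An average error rate on its own therefore says little about what any single desktop will experience, which is also why testing to weed out the bad DIMMs can go a long way.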
