washu wrote:
Yes, flash drives require more error correction as time goes on. So do mechanical hard drives. So does everything else in a modern computer. Computer buses, Ethernet, storage links, etc.: they all have more error detection/correction as things get faster/bigger. It's the nature of the beast.
You may be right, but somehow I don't think it is so simple.
"I know from the emails I get that many readers think that once they've looked at the single issue of flash endurance, they've covered the bases for enterprise SSDs. While endurance remains a challenge for each new flash SSD generation, it's only one of many dimensions in the SSD life mix. That's why (in 2008) StorageSearch.com started this directory of definitive technology articles to help guide readers through the reliability maze."
http://www.storagesearch.com/ssd-data-art.html
With computer buses, Ethernet and similar links, all you have to do is detect the corruption and then resend the data to correct it. With silent SSD corruption, there is no good way to correct the corruption even if you can detect it: the original data is gone. SSD technology also has some inherently unique cross-cell data corruption problems.
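To make the link-versus-storage difference concrete, here is a minimal Python sketch using CRC-32 from the standard `zlib` module as the error check. The helper names (`frame`, `receive_over_link`, `read_from_storage`, `flip_bit`) are hypothetical and purely illustrative: real buses and SSD controllers do this in hardware/firmware, and flash controllers also apply ECC that can correct some bit errors, which this toy deliberately leaves out. The point is only that the same "CRC mismatch" result leads to a successful resend on a link but to unrecoverable data on storage.

```python
import zlib
from typing import Optional, Tuple


def frame(payload: bytes) -> Tuple[bytes, int]:
    """Sender side: send the payload together with its CRC-32."""
    return payload, zlib.crc32(payload)


def crc_ok(payload: bytes, crc: int) -> bool:
    """Receiver side: the CRC only answers 'matches' or 'does not match'."""
    return zlib.crc32(payload) == crc


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate a single-bit error (line noise, or a disturbed flash cell)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def receive_over_link(original: bytes, corrupt_first_try: bool, max_tries: int = 3) -> Optional[bytes]:
    """Link case: the sender still holds the original, so a mismatch just means 'resend'."""
    for attempt in range(max_tries):
        payload, crc = frame(original)
        if corrupt_first_try and attempt == 0:
            payload = flip_bit(payload, 5)  # transient error on the first transmission only
        if crc_ok(payload, crc):
            return payload
    return None  # gave up after max_tries


def read_from_storage(stored: bytes, stored_crc: int) -> Optional[bytes]:
    """Storage case: the flash page is the only copy, so a mismatch can be reported but not repaired."""
    if crc_ok(stored, stored_crc):
        return stored
    return None  # corruption detected, but there is nothing left to resend from


if __name__ == "__main__":
    record = b"customer record 42"
    print(receive_over_link(record, corrupt_first_try=True))  # b'customer record 42'
    payload, crc = frame(record)
    print(read_from_storage(flip_bit(payload, 5), crc))  # None
```

CRC-32 is guaranteed to catch any single-bit flip, so detection is not the problem in either case; what differs is whether a second copy exists to fall back on.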
SEE ALSO:
1. French Retailer Data Offers SSD Failure Rates
http://www.macobserver.com/tmo/article/ ... ure_rates/
2. SSD "Silent Data Corruption"
"CRC can only identify that an error has occurred. It cannot correct it, but it does prevent 'silent data corruption.'"
"Currently the only way to validate that the drive does not have this silent corruption problem is with explicit testing. One method would be to run a test where the LBA itself and an incremental value are stored in that LBA. The host test system would ensure that the data returned contains the correct LBA and incremental value. If the wrong value is returned, the host would know the drive failed the test.
This problem can also occur if there is a power failure while the LBA is being updated: the LBA table may then still contain only the information for data that is now stale. To prevent errors like this, designers can use on-board capacitors (SuperCaps) to provide temporary power during a failure so that the buffers can be flushed and the LBA table properly updated.
Lastly, there is the issue of silent corruption due to firmware bugs. The only way to guard against firmware bugs is through extensive directed testing so that products are known to be bug-free before being put into production."
http://www.storagesearch.com/sandforce-art1.html
3. Enterprise Storage Forum
"During programming, a cell will create a field large enough to perhaps change the properties of a neighboring cell. While designers go to great lengths to make sure that this doesn't happen or that the SSD can compensate for this happening, this phenomenon still occurs and can lead to silent data corruption.
The silent data corruption scenario is fairly simple: a cell has some voltage applied to it (a bias voltage or a program/erase voltage), and the resulting field can disturb neighboring cells, changing their properties. For example, the number of electrons in the cell can decrease, or electrons can be tunneled into the cell. In either case, the value in the cell can change silently."
"However, it is important to realize that you can get data corruption on the SSD at some point."
"Reducing lithography size brings the cells closer together, reducing the distance between the source and the drain. This allows more cells in a given space, hopefully reducing costs and allowing SSDs to have larger capacities. However, the one fundamental aspect that does not change with lithography size is the voltage applied to the cells. The 5V bias voltage still has to be applied to bring the cells to a conductive state; you still need 0.5V to read a cell and 20V to program/erase one. As a result, existing data corruption problems may actually get much worse, because the EM fields are the same size and will be stronger in neighboring cells as they are closer together. This only makes the problem of possible data corruption worse."
http://www.enterprisestorageforum.com/t ... t-Year.htm
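The LBA-stamping test from the SandForce quote in item 2 can be sketched in Python against a toy drive model. Everything here is hypothetical: `SimulatedDrive`, the 512-byte block size, the 16-byte stamp layout and the way the power failure is injected are assumptions for illustration, not how any real SSD firmware or test suite works. The model reproduces the failure the quote describes: the new data reaches flash, but power is lost before the LBA table is updated, so later reads return stale data. That stale block is internally consistent, so a per-block checksum would not flag it, while the LBA + counter stamp does.

```python
import struct
from typing import Dict, List, Optional

BLOCK_SIZE = 512  # assumed logical block size for this sketch
STAMP = struct.Struct("<QQ")  # hypothetical layout: 8-byte LBA + 8-byte pass counter


def make_stamp(lba: int, counter: int) -> bytes:
    """Block contents for the test: the LBA and the pass counter, zero-padded."""
    return STAMP.pack(lba, counter).ljust(BLOCK_SIZE, b"\0")


def check_stamp(block: bytes, lba: int, counter: int) -> bool:
    """True only if the block names the right LBA *and* the current pass."""
    got_lba, got_counter = STAMP.unpack_from(block)
    return got_lba == lba and got_counter == counter


class SimulatedDrive:
    """Toy flash translation layer (hypothetical): each write programs a fresh page,
    then updates the LBA table. An injected power failure skips the table update,
    so that LBA keeps pointing at its previous (stale) page after the drive comes back."""

    def __init__(self, power_fail_at_write: Optional[int] = None) -> None:
        self.pages: List[bytes] = []
        self.lba_table: Dict[int, int] = {}
        self.writes = 0
        self.power_fail_at_write = power_fail_at_write

    def write(self, lba: int, data: bytes) -> None:
        self.pages.append(data)  # step 1: data reaches flash
        self.writes += 1
        if self.writes == self.power_fail_at_write:
            return  # power lost before step 2; no SuperCap to finish the job
        self.lba_table[lba] = len(self.pages) - 1  # step 2: LBA table updated

    def read(self, lba: int) -> bytes:
        page = self.lba_table.get(lba)
        return self.pages[page] if page is not None else bytes(BLOCK_SIZE)


def lba_stamp_test(drive: SimulatedDrive, num_lbas: int, passes: int) -> List[int]:
    """Write every LBA with (lba, counter), read it all back, repeat with counter + 1.
    Returns the LBAs that came back wrong (stale counter or wrong LBA)."""
    failures: List[int] = []
    for counter in range(1, passes + 1):
        for lba in range(num_lbas):
            drive.write(lba, make_stamp(lba, counter))
        for lba in range(num_lbas):
            if not check_stamp(drive.read(lba), lba, counter):
                failures.append(lba)
    return failures


if __name__ == "__main__":
    print(lba_stamp_test(SimulatedDrive(), num_lbas=8, passes=3))  # []
    print(lba_stamp_test(SimulatedDrive(power_fail_at_write=13), num_lbas=8, passes=3))  # [4]
```

With 8 LBAs per pass, write 13 is LBA 4 in pass 2, so that read-back still carries counter 1 and is reported; pass 3 rewrites it normally. Against a real drive the same write/verify loop would target the block device and would need actual power cycles between passes, which is outside what this sketch attempts.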
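A back-of-envelope way to see the Enterprise Storage Forum argument in item 3: in a crude parallel-plate approximation, the field from a potential difference V across a separation d is roughly V/d. The approximation and the "halve the spacing" step below are illustrative assumptions, not figures from the article; real cell geometry and coupling are far more complicated.

```latex
E \approx \frac{V}{d}
\qquad\Longrightarrow\qquad
\frac{E_{\text{new}}}{E_{\text{old}}}
  = \frac{V / d_{\text{new}}}{V / d_{\text{old}}}
  = \frac{d_{\text{old}}}{d_{\text{new}}},
\qquad
d_{\text{new}} = \tfrac{1}{2}\, d_{\text{old}}
\;\Rightarrow\;
E_{\text{new}} = 2\, E_{\text{old}}
```

Because V stays at 20V for program/erase regardless of the process node, shrinking the spacing alone raises the field a neighboring cell sees, which is the article's point in numeric form.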