This paper uses ZFS as an example to analyze the effects
of disk and memory corruption on file system data integrity. ZFS is a
state-of-the-art file system that embeds many mechanisms for data
integrity. The study shows that ZFS is robust to a wide range of disk
corruption but less resilient to memory corruption.
The citations in this paper are a useful source of information. Some of them
can be cited as evidence in our later work.
1 The existence of hardware-based memory corruption
R. Baumann. Soft errors in advanced computer systems. IEEE Des. Test,
22(3):258–266, 2005.
T. C. May and M. H. Woods. Alpha-particle-induced soft errors in dynamic
memories. IEEE Trans. on Electron Dev, 26(1), 1979.
J. F. Ziegler and W. A. Lanford. Effect of cosmic rays on computer memories.
Science, 206(4420):776–788, 1979.
X. Li, K. Shen, M. C. Huang, and L. Chu. A memory soft error measurement on
production systems. In USENIX, 2007.
T. J. O’Gorman, J. M. Ross, A. H. Taber, J. F. Ziegler, H. P. Muhlfeld, C. J.
Montrose, H. W. Curtis, and J. L. Walsh. Field testing for cosmic ray soft
errors in semiconductor memories. IBM J. Res. Dev., 40(1):41–50, 1996.
B. Schroeder, E. Pinheiro, and W.-D. Weber. DRAM errors in the wild: a
large-scale field study. In SIGMETRICS, 2009.
2 Bugs that lead to “wild writes” into random memory contents
J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, and A. Gupta.
Hive: Fault Containment for Shared-Memory Multiprocessors. In SOSP, 1995.
CERT/CC Advisories.
http://www.cert.org/advisories/.
Kernel Bug Tracker.
http://bugzilla.kernel.org/.
US-CERT Vulnerabilities Notes Database.
http://www.kb.cert.org/vuls/.
Y. Xie, A. Chou, and D. Engler. Archer: using symbolic, path-sensitive
analysis to detect memory access errors. In FSE, 2003.
3 Disk corruption caused by power spikes, erratic arm movements, and
scratches in media
D. Anderson, J. Dykes, and E. Riedel. More Than an Interface: SCSI vs. ATA.
In FAST, 2003.
T. J. Schwarz, Q. Xin, E. L. Miller, D. D. Long, A. Hospodor, and S. Ng. Disk
Scrubbing in Large Archival Storage Systems. In MASCOTS, 2004.
The Data Clinic. Hard Disk Failure.
http://www.dataclinic.co.uk/hard-disk-failures.htm.
4 Complexities in modern disk firmware
V. Prabhakaran, L. N. Bairavasundaram, N. Agrawal, H. S. Gunawi, A. C.
Arpaci-Dusseau, and R. H. Arpaci-Dusseau. IRON File Systems. In SOSP, 2005.
5 Disk corruption caused by firmware errors
a) Writes to the wrong location
G. Weinberg. The Solaris Dynamic File System.
http://members.visi.net/thedave/sun/DynFS.pdf.
b) The disk loses a write but reports it as complete
R. Sundaram. The Private Lives of Disk Drives.
http://partners.netapp.com/go/techontap/matl/sample/0206tot_resiliency.html.
c) Errors caused by the bus
R. Green. EIDE Controller Flaws Version 24.
http://mindprod.com/jgloss/eideflaw.html.
J. Wehman and P. den Haan. The Enhanced IDE/Fast-ATA FAQ.
http://thef-nym.sci.kun.nl/cgi-pieterh/atazip/atafq.html.
6 Data corruption caused by buggy device drivers
A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler. An Empirical Study of
Operating System Errors. In SOSP, 2001.
D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf. Bugs as Deviant
Behavior: A General Approach to Inferring Errors in Systems Code. In SOSP,
2001.
M. M. Swift, B. N. Bershad, and H. M. Levy. Improving the Reliability of
Commodity Operating Systems. In SOSP, 2003.