For example, you want to keep 1 petabyte safe for 10 years. Let's say you accept at most a one-in-a-million chance of losing some of that data.
The data is stored on hard disks. As a single hard drive wouldn't be safe enough, you decide to copy the data to a number of hard drives.
Now let's assume an almost perfect world where there is no flooding, war or other disasters. There are also no software bugs, driver bugs, ...
There is just an annual failure rate of hard drives, say 4%. So each year every hard drive has a 4% chance of failing.
What happens if a hard drive fails?
You replace the hard drive and copy the information from another hard drive. The time this takes consists of:
- detecting the failure
- the disk size divided by the speed of the rebuild
Yearly disk failure rate: 4%
Disk size: 1000 GB
Rebuild rate: 5 MB/s
Copying 1000 GB at 5 MB/s takes about 2.3 days; allowing for detection and replacement, take the rebuild time for a disk as 5 days.
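The copy part of the rebuild time can be checked with a few lines of Python. This is only a sketch: it assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes), and the names are made up for illustration. The gap between the result and the 5 days used in the text is assumed here to be the time for detecting the failure and swapping the disk.

```python
# Inputs taken from the figures in the text (decimal units assumed).
DISK_SIZE_BYTES = 1000 * 10**9   # 1000 GB
REBUILD_RATE_BPS = 5 * 10**6     # 5 MB/s
SECONDS_PER_DAY = 86_400


def copy_time_days(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to copy a full disk, ignoring detection and replacement delays."""
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


copy_days = copy_time_days(DISK_SIZE_BYTES, REBUILD_RATE_BPS)
print(f"{copy_days:.2f} days")  # prints "2.31 days"
```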
If you have a mirror of two disks: the chance that the first disk fails is 4% (per year).
The chance that the second disk fails during the rebuild is 4% * 5/360 = 0.000556.
The chance of both happening = 4% * 4% * 5/360 = 2.22E-05.
For one PB you need 1000 mirrors.
The chance of losing a mirror in a year is 1000 * 4% * 4% * 5/360 = 1000 * 2.22E-05 = 0.022.
You want to keep it safe for 10 years: 10 * 0.022 = 0.22!
So with a mirror you have a 22% chance of losing data within 10 years, i.e. only about a 78% chance of keeping the PB safe.
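Multiplying by 1000 mirrors and 10 years simply adds up small probabilities, which is a linear approximation. As a rough sanity check, the sketch below compares it with the compound form 1 - (1 - p)^(mirrors * years), assuming every mirror-year is independent. The variable names are illustrative, and the 360-day year is the one used in the text.

```python
AFR = 0.04            # annual failure rate per disk
REBUILD_DAYS = 5      # window during which the surviving copy is exposed
DAYS_PER_YEAR = 360   # the text uses a 360-day year
MIRRORS = 1000        # 1 PB / 1000 GB per mirror
YEARS = 10

# Chance that the second disk fails while the first is being rebuilt.
p_second = AFR * REBUILD_DAYS / DAYS_PER_YEAR
# Chance that one mirror loses both disks in a given year.
p_mirror_year = AFR * p_second

# Linear estimate, as in the text: add up the small probabilities.
p_loss_linear = MIRRORS * YEARS * p_mirror_year
# Compound estimate: at least one of the mirror-years goes wrong
# (assumes all mirror-years are independent).
p_loss_compound = 1 - (1 - p_mirror_year) ** (MIRRORS * YEARS)

print(f"linear:   {p_loss_linear:.3f}")
print(f"compound: {p_loss_compound:.3f}")
```

The two estimates come out at roughly 0.22 and 0.20, so the simple multiplication is good enough for this kind of back-of-the-envelope reasoning.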
A triple mirror will give you a 0.000123 chance of data loss.
4-way mirror: 6.86E-08.
And this is for a perfect world!
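The pattern behind these numbers is that every extra copy multiplies the loss chance by the probability of one more disk failing inside the rebuild window. A minimal Python sketch of that rule follows; `loss_probability` and its parameters are hypothetical names, and it uses the same linear estimate as the text rather than an exact reliability model.

```python
def loss_probability(copies, afr=0.04, window_days=5,
                     mirrors=1000, years=10, days_per_year=360):
    """Linear estimate of losing data somewhere in the array.

    The first failure has probability `afr` per year; each of the
    remaining `copies - 1` disks must then fail inside the window.
    """
    p_in_window = afr * window_days / days_per_year
    return mirrors * years * afr * p_in_window ** (copies - 1)


for n in (2, 3, 4):
    print(n, f"{loss_probability(n):.3g}")
# 2 0.222
# 3 0.000123
# 4 6.86e-08
```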
Assume that your disks are stored offsite, or that you don't check all disks daily.
Assume you need on average 90 days to detect a hard drive failure, replace the disk and copy the data over.
A 4-way mirror now has a 0.0004 chance of data loss.
A five-way mirror gets you 0.000004: 4 times in a million you will lose data!
You need a 6-way mirror to meet the one-in-a-million constraint.
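To see where the 6-way mirror comes from, you can keep adding copies until the estimate drops below the target. The sketch below does that for the 90-day window, under the same assumptions as the rest of the calculation (4% annual failure rate, 1000 mirrors, 10 years, 360-day year, linear estimate); the names are illustrative only.

```python
AFR = 0.04
WINDOW_DAYS = 90      # detect + replace + copy, on average
DAYS_PER_YEAR = 360
MIRRORS = 1000
YEARS = 10
TARGET = 1e-6         # one in a million

# Chance that one more disk fails inside the 90-day window (0.01).
p_in_window = AFR * WINDOW_DAYS / DAYS_PER_YEAR

# Start with a 2-way mirror: one failure plus one more within the window.
# The linear estimate is 4 here, i.e. not a real probability any more,
# which just means a 2-way mirror is hopeless in this scenario.
copies = 2
p_loss = MIRRORS * YEARS * AFR * p_in_window

# Every extra copy multiplies the estimate by p_in_window.
while p_loss > TARGET:
    copies += 1
    p_loss *= p_in_window

print(copies, f"{p_loss:.1g}")
```

With these inputs the loop stops at 6 copies, with an estimated loss chance of about 4 in 100 million.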