A Bayesian Estimate of BackBlaze's Hard Drive Failure Rates

Each quarter the backup service BackBlaze publishes data on the failure rate of its hundreds of thousands of hard drives, most recently on February […] failure rate of different models can vary widely, these posts sometimes make a […] They're also notable as the only large public […]

One of the things that strikes me about the presentation above is that BackBlaze uses simple averages to compute the "Annualized Failure Rate" (AFR), despite the fact that the actual count data vary by orders of magnitude, down to a single digit. This might lead us to question the accuracy for smaller samples; in fact, the authors are sensitive to this possibility and suppress data from drives with less than 5,000 days of operation in Q4 2019 (although they are detailed in the text of the article and available in their public datasets).

This looks like a perfect use case for a Bayesian approach: we want to combine a prior expectation of the failure rate (which might be close to the historical average across all drives) with observed failure events to produce a more accurate estimate for each model.

Re-estimating Failure Rates using Empirical Bayes

First, we can extract the data that is missing from the table but mentioned in […]

    library(dplyr, warn.conflicts = FALSE)

    hdds <- […] %>%
      tibble::as_tibble() %>%
      select(
        mfg = MFR, name = Models, size = Drive.Size,
        days = Drive.Days, failures = Drive.Failures
      ) %>%
      mutate(
        name = trimws(gsub(",", "", name, fixed = TRUE)),
        days = as.integer(gsub(",", "", days, fixed = TRUE)),
        failures = as.integer(gsub(",", "", failures, fixed = TRUE))
      ) %>%
      bind_rows(omitted) %>%
      mutate(
        # Compute BackBlaze's "Annualized Failure Rate".
        […]

[…] problem in baseball where he takes an empirical Bayes […] reasonable prior distribution from the original data – which works here as […]

To get a more stable estimate of the distribution, we can omit AFRs computed for drives with less than 1 million days of service:

[…]

If you look carefully, you'll see that drives with smaller samples are more strongly influenced by the prior: all of the drives that were originally omitted have failure rates close to the mean of the prior, and the model suggests that some of the large HGST drives have had a run of good luck as well.

Of course, a plot (see the top of the page) is more compelling than a table.

There is some evidence in the table above that different manufacturers may have slightly different base failure rates, and BackBlaze has in the past speculated that different drive sizes may have different failure characteristics as well.
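To make the shrinkage idea in this post concrete, here is a minimal sketch of one common empirical Bayes setup: a Gamma prior on the failure rate per drive-year, fit by the method of moments to the models with at least 1 million drive-days, then combined with each model's Poisson failure count. This is an assumption for illustration, not necessarily the exact model used for the results in this post. The `drives` data frame and its numbers are made up, and `eb_afr` is a hypothetical column name.

```r
# Hypothetical toy data (NOT BackBlaze's figures): drive-days and failures per model.
drives <- data.frame(
  name     = c("A", "B", "C", "D", "E"),
  days     = c(3000000, 1500000, 2000000, 4000, 1200),
  failures = c(90, 30, 80, 1, 0)
)

# Raw annualized failure rate: failures per drive-year of service.
# (Multiply by 100 to express it as a percentage.)
drives$years <- drives$days / 365
drives$afr   <- drives$failures / drives$years

# Fit a Gamma(shape, rate) prior by the method of moments, using only
# models with at least 1 million drive-days so the raw AFRs are stable.
big        <- drives[drives$days >= 1e6, ]
prior_mean <- mean(big$afr)
prior_var  <- var(big$afr)
shape      <- prior_mean^2 / prior_var
rate       <- prior_mean / prior_var

# Gamma-Poisson conjugate update: the posterior for each model is
# Gamma(shape + failures, rate + years), whose mean is the shrunken AFR.
drives$eb_afr <- (shape + drives$failures) / (rate + drives$years)

print(drives[, c("name", "days", "failures", "afr", "eb_afr")])
```

The posterior mean is a weighted average of the prior mean and the raw AFR, with the weight on the raw AFR growing with the drive-years observed. That is why the models with only a few thousand drive-days end up near the prior mean, while the estimates for well-observed models barely move.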