Question about Backblaze study

bilateral
Posts: 3
Joined: 2015.08.10. 06:10

Post by bilateral »

Hi, I have been reading about and trying out HDSentinel, and done some reading here on the forum, as well as your web site, and I must say I am very impressed. I am not exactly a Newbie--been around computers for over 30 years. These days I am content to use free versions of most things, but I think I am going to buy your pro version this time.

Since I've been researching and reading about HDD issues recently, I came across several articles about the ongoing Backblaze study. For anyone reading who doesn't know, Backblaze is a cloud storage service in the US. They claim to have 40,000 large Hard Disk Drives, so they've had the opportunity to test which brands and models have failed the most, and which SMART settings appear to correlate with failures.

I know that you have been "ahead of the curve" with your analysis of SMART information for some time, but I am wondering what you think of the information they have developed about HDDs. For example, if I understand correctly, one of the things they found was that if a drive experiences even ONE failed sector, the chances that it will fail soon increase exponentially compared with drives that have not had any failed sectors.

Has anything from the Backblaze data changed your thinking or influenced the settings or indicators you have in HDSentinel?

Thanks,
Michael
hdsentinel
Site Admin
Posts: 3128
Joined: 2008.07.27. 17:00
Location: Hungary

Re: Question about Backblaze study

Post by hdsentinel »

Thanks for your message and kind words !

Yes, I completely understand their opinion and conclusion - though personally I agree only partially.
Usually one bad sector (or even a relatively low number of them) may not be critical - but the most important thing is to be informed about any problem, even a minor one (even one bad sector), because yes, there is a high chance that the number of problems will increase very quickly and dramatically.

Over the years, various articles have been published about how the hard disk self-monitoring (S.M.A.R.T.) function works (if it "works" at all). Some suggested that S.M.A.R.T. is bad, can't be used to determine real problems and can't be used to predict failures.
This was true for "traditional" S.M.A.R.T. evaluation methods: other tools and system BIOSes only identified the "critical" attributes and verified whether their value had reached a manufacturer-specified threshold. But they did not check the correlation between S.M.A.R.T. attributes, did not check the real number of problems and did not verify all attributes.
The result was that many errors were ignored by them, so S.M.A.R.T. did not show problems on almost-failed hard disk drives. Also, hard disks usually failed completely before they reached the "S.M.A.R.T. failure" level.
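
To illustrate the difference, here is a minimal sketch in Python. It is not how Hard Disk Sentinel or any particular BIOS is implemented; the `Attribute` class, the function names and the sample values are assumptions made up for this example. It only shows why a threshold-only check can stay silent while the raw counters already report bad sectors.

```python
from dataclasses import dataclass

@dataclass
class Attribute:
    attr_id: int     # S.M.A.R.T. attribute number
    name: str
    value: int       # normalized value reported by the drive (higher is better)
    threshold: int   # manufacturer-specified failure threshold
    raw: int         # raw counter (the "Data" field)

def threshold_only_check(attrs):
    """'Traditional' check: report an attribute only when its
    normalized value has fallen to the manufacturer threshold."""
    return [a.name for a in attrs if a.threshold > 0 and a.value <= a.threshold]

# Attributes whose raw counter directly counts sector problems.
# Illustrative selection only: 5 = reallocated, 197 = pending, 198 = uncorrectable.
PROBLEM_COUNTERS = {5, 197, 198}

def raw_counter_check(attrs):
    """Report every non-zero problem counter, even a single bad sector."""
    return {a.name: a.raw for a in attrs
            if a.attr_id in PROBLEM_COUNTERS and a.raw > 0}

# Made-up sample drive: 8 reallocated and 3 pending sectors,
# but all normalized values are still far above their thresholds.
sample = [
    Attribute(5, "Reallocated Sectors Count", 100, 36, 8),
    Attribute(197, "Current Pending Sector Count", 100, 0, 3),
    Attribute(9, "Power-On Hours", 85, 0, 13200),
]

print(threshold_only_check(sample))  # nothing reported by the threshold check
print(raw_counter_check(sample))     # the raw counters show the real problems
```

With these sample values the threshold-only check reports nothing, while the raw-counter check lists both the reallocated and the pending sectors.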

Generally, Hard Disk Sentinel was designed from the very first version to provide a solution for these issues: to detect and report any kind of problem, yes, even one single bad sector, exactly so that the user is informed about even a minor degradation because yes, usually (but not in all cases) more and more problems follow, very quickly. This leads to corrupted / lost data and a failed hard disk.

In theory, we would not need to worry about those bad sectors as they are already fixed by the hard disk: the disk uses the spare area instead of these bad sectors, so all read and write operations targeting them use the spare area. This is called "reallocation". So the original bad sector is never re-used (regardless of the OS, software, partitions, etc. used).
So even if the hard disk reports these bad sectors, it may still be used (this is why some people may even ignore them completely) - but ONLY if we confirm that all such problems are fixed and that the status is stable.
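
The reallocation idea described above can be pictured with a toy model. This is only a sketch under stated assumptions: real drive firmware is not visible to the user and works differently in detail, and the `Disk` class with its method names is hypothetical.

```python
class Disk:
    """Toy model of sector reallocation (hypothetical, for illustration)."""

    def __init__(self, spare_sectors):
        self.remap = {}                               # bad sector -> spare sector
        self.free_spares = list(range(spare_sectors))

    def reallocate(self, lba):
        """Map a bad sector to a spare one. Returns False when the
        spare area is exhausted and the sector cannot be fixed."""
        if lba in self.remap:
            return True                               # already reallocated
        if not self.free_spares:
            return False
        self.remap[lba] = self.free_spares.pop(0)
        return True

    def resolve(self, lba):
        """Where a read or write for this sector really goes."""
        if lba in self.remap:
            return ("spare", self.remap[lba])
        return ("original", lba)

    @property
    def reallocated_count(self):
        """Number of reallocated sectors, as a drive would report it."""
        return len(self.remap)


disk = Disk(spare_sectors=2)
disk.reallocate(1000)
print(disk.resolve(1000))   # redirected to the spare area
print(disk.resolve(1001))   # healthy sector, used directly
```

Once a sector is in the map, every access to it is redirected, so the original bad sector is never touched again. The sketch also shows why a growing count matters: the spare area is finite.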

Yes, I completely agree that if one (or more) bad sectors are found, the risk of more such bad sectors increases dramatically: in many cases, more and more are reported over time.
This may risk data loss and cause system instability - generally, having a high number of reallocated sectors can be risky.

Personally I always recommend performing intensive hard disk testing with the different test methods of Hard Disk Sentinel.
This way it is possible to determine
- if there are any further, currently undetected bad sectors (and if so, fix them by reallocation)
- if the hard disk status is stable (all possible bad sectors found and reallocated) - and in this case the hard disk can still be used.

See http://www.hdsentinel.com/faq.php#tests
for more information about these hard disk tests, which are recommended
- when installing a hard disk (even a new hard disk - as it may be damaged during shipping and/or installation and may have undetected problems)
- especially when installing a used hard disk
- any time any new (even minor) problem is found and the health has decreased.

Usually, a relatively low number of bad sectors (if we confirm with the tests that all problems are fixed by reallocation) can be accepted, as described at http://www.hdsentinel.com/faq.php#health
but of course on such drives, I'd use constant monitoring and configure alerts in Hard Disk Sentinel to be notified about any degradation or any new problem recorded on the "Log" page of the hard disk.
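
The general idea of being notified about any new problem can be sketched as a comparison of two snapshots of the problem counters. This is a minimal illustration only, not the alert logic of Hard Disk Sentinel; the function name and the counter names are assumptions.

```python
def new_problems(previous, current):
    """Return the counters that increased since the previous snapshot,
    together with the size of the increase."""
    changes = {}
    for name, now in current.items():
        before = previous.get(name, 0)
        if now > before:
            changes[name] = now - before
    return changes


# Made-up snapshots: the reallocated count is stable,
# but two new pending sectors appeared.
yesterday = {"reallocated": 8, "pending": 0}
today = {"reallocated": 8, "pending": 2}

changes = new_problems(yesterday, today)
if changes:
    print("New problems found:", changes)
```

A stable drive produces an empty result day after day; any non-empty result is the moment to test the disk again.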

There are some techniques in Hard Disk Sentinel to draw attention to any new problem, even a minor one.
This is not designed to frighten users - but exactly to inform them that something has happened and needs attention.

For more information about the biggest problems of S.M.A.R.T. evaluation in general (used in other tools / BIOSes) and how Hard Disk Sentinel is different, how it detects and reports the real number of issues, please check:

www.hdsentinel.com/smart

and the Help -> Appendix -> Health Calculation ( http://www.hdsentinel.com/help/en/52_cond.html )
This shows the fundamentals only; the actual calculation is more complicated and uses other methods and other S.M.A.R.T. attributes as well, depending on the hard disk model, firmware version and so on.

Also the
http://www.hdsentinel.com/faq_repair_ha ... _drive.php
page describes how to diagnose and repair the problems - and, once it is confirmed that the hard disk is stable, even how to acknowledge problems by clearing the error counter, so that you are informed about possible new problems only.
Of course this is not possible on a failed hard disk or a hard disk with very bad health.

The http://www.hdsentinel.com/hard_disk_cases.php page and its sub-pages may also be interesting, as they show common situations (for example about bad sectors - and what happens if they remain undetected for a longer time).
bilateral
Posts: 3
Joined: 2015.08.10. 06:10

Re: Question about Backblaze study

Post by bilateral »

Thanks for your very detailed reply! I actually found you because of your excellent article on S.M.A.R.T. I have already read carefully many of the pages here on your site that you mentioned, and I expect to go back to them again. I will also look at the others.

In trying to understand the S.M.A.R.T. system that the industry established, in my reading I found that many of the manufacturers use different ways of both displaying and interpreting the SMART data reported. Does your software have a way of allowing for those differences for different manufacturers when it reads the data, and even different models of their HDD's?

Thanks again,
Michael
hdsentinel
Site Admin
Posts: 3128
Joined: 2008.07.27. 17:00
Location: Hungary

Re: Question about Backblaze study

Post by hdsentinel »

Yes, you are absolutely correct: different manufacturers use S.M.A.R.T. differently. When checking the S.M.A.R.T. data, the attributes and their meanings, it is very important to know the exact hard disk model ID and even the firmware version - as these may determine how we should interpret the detected and reported data.

For example, there is a very common attribute "Raw Read Error Rate" (attribute 1, present on almost all modern hard disks).
On perfect, correctly working WD, Toshiba, Samsung hard disks, the "Data" field is usually 0 (or very low).
In contrast, on a perfect, correctly working Seagate hard disk it has very high values (even hundreds of millions) and can jump up and down considerably (which is completely normal - in this forum, there have been some topics about this attribute).

I can also mention the "power on time" attribute. Some models count the power on time in seconds, minutes (or half-minutes), hours, etc.
and this may change even with the firmware version.
So it is important to interpret these fields correctly to get accurate values displayed.
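
As a small sketch of what such interpretation means in practice, the raw power on time can be normalized to hours once the counting unit is known. The lookup table below is purely hypothetical (the model and firmware names are placeholders, not real drives), and this is not the actual table or method used by Hard Disk Sentinel.

```python
# Seconds represented by one count of the raw value, per counting unit.
SECONDS_PER_UNIT = {
    "seconds": 1,
    "half-minutes": 30,
    "minutes": 60,
    "hours": 3600,
}

# Hypothetical lookup: the unit depends on model AND firmware version.
UNIT_BY_DRIVE = {
    ("EXAMPLE-MODEL-A", "FW1"): "hours",
    ("EXAMPLE-MODEL-A", "FW2"): "minutes",
    ("EXAMPLE-MODEL-B", "FW1"): "half-minutes",
}

def power_on_hours(raw, unit):
    """Convert a raw power on time counter to hours."""
    return raw * SECONDS_PER_UNIT[unit] / 3600

def power_on_hours_for(model, firmware, raw):
    """Interpret the raw value for a given drive, or return None
    when the unit for this model/firmware is not known."""
    unit = UNIT_BY_DRIVE.get((model, firmware))
    if unit is None:
        return None
    return power_on_hours(raw, unit)


# The same raw value means very different things on different drives.
print(power_on_hours_for("EXAMPLE-MODEL-A", "FW1", 600))
print(power_on_hours_for("EXAMPLE-MODEL-A", "FW2", 600))
```

The same raw value of 600 is 600 hours with one firmware and 10 hours with the other, which is why comparing raw values across models without this knowledge is misleading.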

Generally, Hard Disk Sentinel is designed exactly to
- recognise the hard disk S.M.A.R.T. information for the actual hard disk (or SSD)
- report the possible problems (if found) in the text description and by the health % value

This way there is no need to manually check the attributes one by one (unless really required or if the user is really interested of course ;)
and especially no need to compare an attribute (e.g. the mentioned "Raw Read Error Rate", but there are lots of others) across models, as this may lead to false assumptions.

And yes, as Hard Disk Sentinel is independent of manufacturers (it does not "favor" any model / manufacturer over others) it gives a clear picture of the current status, regardless of how the S.M.A.R.T. attribute values / the error-level thresholds are set by the manufacturer.
bilateral
Posts: 3
Joined: 2015.08.10. 06:10

Re: Question about Backblaze study

Post by bilateral »

That's very good. It is clear that you have thought a lot about how to make useful software for the purpose of detecting drive health status. Thanks for answering my questions, and I look forward to using your software on a regular basis.

Michael