SMART Tests - Short + Long Questions

Post by **dkhath** » 2024.12.06. 07:00

I've come here via a three-day journey into night...so to speak. I bought a new NAS device (TerraMaster F4-212) and four WD Red Pro 20TB WD201KFGX drives. I've always run a bad blocks test (a day or more in length) on new hard drives I add to my other QNAP NAS servers, and I've never had any problems. With the new TerraMaster NAS (TOS 5.1), I could not find the same bad blocks test for the drives, so I ran the long SMART test. For all four drives, the test keeps getting aborted by the host. So, I tried a variety of other approaches to run the same test: 1) using ssh and smartctl commands to watch the drive while the test is running; 2) using an HD dock on my Win10 machine with software (WD Dashboard and GSmartControl). All of those produced the "aborted by host" error, although at one point I got the long test in GSmartControl to reach 30% on one drive before I cancelled it (for resource purposes). I talked with WD customer support, and they said to return the drives. But are these drives really bad if the SMART long test keeps getting aborted by host or "manually"? Is this a problem with the new NAS? (Probably not since it also happens on a different machine...perhaps for different reasons.)

I checked out HDS, and immediately purchased it. It is exactly what I want from an HD tool--fast, detailed, and easy-to-use. I can relive my glory days of watching blocks fill up as I did in the 90s when defragging a lab full of PC drives in DOS.

Reading through the forum here and in other locations on the Web, no one seems to express grave concern over a short or long SMART test aborting by host. I've never seen anyone ask what to do if they can NEVER get it to finish. Given my experience mentioned above:

If I run a SMART short test, and it is interrupted, how can I get it to successfully finish? I've tried changing power settings on my Win10, and it hasn't helped on the several drives I have tried.
If I can never get a short SMART test to finish, does that mean the drive is bad? (Logically, if I can't run a test to check, then how will I ever know?)
Let's say I do get the short SMART test to complete on a drive at some point and move on to testing with the long SMART test, and it fails repeatedly to complete (like my experience mentioned above), does this equate to the drive being bad?
If a SMART short or long test completion is not a concern, should I just skip them and go directly to Disk -> Surface test -> Read + Write test every time I test a drive?

Thanks in advance for your help and advice!

Post by **hdsentinel** » 2024.12.06. 13:05

The internal hardware hard disk self tests (SMART tests) are generally good to use for testing the drives, exactly because they run "inside" the drive itself. So in theory, they are unaffected by issues related to the storage subsystem (e.g. a data cable which may not provide a perfect connection during intensive data transfer). They can also help because they run as fast as possible, even if the drive's transfer performance is limited (for example, if the drive is connected over a slower USB 2.0 connection).

But as described in the Help:

https://www.hdsentinel.com/help/en/62_testfaq.html
"In some cases, these hardware tests (Disk -> Short self test, Disk -> Extended self test) are not available, not supported or they result in an error quickly even in relatively low number of problems. No further information is returned about the result, for example it is not possible to list the sector(s) which are damaged. In such case, an appropriate software testing method is required."

So generally the Short + Long hardware disk self tests can stop (fail) or may run for a very long time (sometimes even weeks!) without completing, even on a 100% perfect drive.

If you receive the "aborted by host" error, then in most cases the problem is the disk controller (motherboard chipset), more specifically its driver: sometimes it is not fully compatible with the drive, or it "just" stops the internal disk test for no reason.
Yes, this is sometimes related to power management settings, so disabling the HDD spin down in Windows Power Management settings can help, but sometimes this is not enough. For example, the Advanced Power Management setting/level (if supported by the disk drive) can also cause trouble for the self test.
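
For anyone checking this from the smartctl side (as in the first post), here is a minimal Python sketch that reads the self-test log table printed by `smartctl -l selftest` and separates "the test did not finish" from "the drive itself reported a failure". The sample log text is illustrative only, not taken from the drives in this thread, and the function names and verdict labels are my own assumptions. The parser also assumes the usual column layout with at least two spaces between the test name and the status.

```python
import re

# Illustrative sample in the layout printed by `smartctl -l selftest /dev/sdX`.
# These rows are made up for the example; they are not from the thread.
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%       123         -
# 2  Short offline       Completed without error       00%       100         -
"""

# One log row: "# N", test name, status text, remaining %, power-on hours, LBA.
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")

# Statuses that only say the test was stopped from outside the drive.
INCONCLUSIVE = ("Aborted by host", "Interrupted (host reset)")


def classify(status):
    """Map a self-test status string to a rough verdict (labels are hypothetical)."""
    if status == "Completed without error":
        return "passed"
    if status in INCONCLUSIVE:
        return "inconclusive"
    if status.startswith("Completed: "):  # e.g. "Completed: read failure"
        return "drive-reported failure"
    return "other"


def parse_selftest_log(text):
    """Return one dict per self-test row found in the log text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header lines and anything unrecognised are skipped
        num, test, status, remaining, hours, lba = match.groups()
        rows.append({
            "num": int(num),
            "test": test,
            "status": status,
            "remaining_percent": int(remaining),
            "lifetime_hours": int(hours),
            "lba_of_first_error": lba,
            "verdict": classify(status),
        })
    return rows


if __name__ == "__main__":
    for row in parse_selftest_log(SAMPLE_LOG):
        print(row["num"], row["test"], "->", row["verdict"])
```

An "inconclusive" verdict here matches the point above: the test was stopped by something outside the drive, so it says nothing about the drive's condition.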

I'd check whether an updated SATA controller (chipset) driver is available and/or try a completely different connection method:
- use USB adapter/dock if you connected by direct SATA port
- use SATA port or a different USB dock/adapter (or just different USB slot) if you tried USB connection.

If you use the Report menu -> Send test report to developer option, we can check the current connection / situation, which may give ideas about what could be done.

But generally I'd not worry if the short / extended self test can't be used. It is annoying, but it does NOT mean the drive itself is faulty.
Yes, I'd surely use Disk menu -> Surface test -> Read test in this case.
This is the best way to perform a complete surface scan, and it is better than other methods because (in addition to performing the complete scan) it reports
- sectors which may be slower than expected (e.g. darker green blocks)
- sectors which are harder to read (yellow blocks)
- sectors which are unreadable (shown as bad)
- plus it monitors/reports the overall performance, temperature and possible status changes
which are completely ignored by other solutions.

About the questions specifically:

> If I run a SMART short test, and it is interrupted, how can I get it to successfully finish?
> I've tried changing power settings on my Win10, and it hasn't helped on the several drives I have tried.

Because the issue is probably not the hard disk drive - but the controller/chipset/driver etc.

> If I can never get a short SMART test to finish, does that mean the drive is bad?

No. Absolutely NOT.

> (Logically, if I can't run a test to check, then how will I ever know?)

You can verify the operation by the surface test: if it reports no errors and the Health % remains at 100% then you can be sure.

> Let's say I do get the short SMART test to complete on a drive at some point and
> move on to testing with the long SMART test, and it fails repeatedly to complete
> (like my experience mentioned above), does this equate to the drive being bad?

No.

> If a SMART short or long test completion is not a concern, should I just skip them and
> go directly to Disk -> Surface test -> Read + Write test every time I test a drive?

Yes, the best way is to use Disk -> Surface test -> Read test for a complete (safe) scan.
And if you want to be 100% sure, then yes, a write test can also be a good idea to verify that all sectors can be
- written
- read back
- and that the data read back is the same as what was written
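
To illustrate the idea behind those three checks, here is a minimal Python sketch that writes a known pattern, reads it back, and compares. It works on an ordinary file, not on a raw disk, and it is not how HD Sentinel implements its test; the function names and the pattern are assumptions made for the example. Note also that the operating system may serve the read-back from its cache, so this only demonstrates the logic.

```python
import os
import tempfile


def make_pattern(chunk_size):
    """Build a repeating 0..255 byte pattern exactly chunk_size bytes long."""
    return (bytes(range(256)) * (chunk_size // 256 + 1))[:chunk_size]


def write_pattern(path, total_bytes, chunk_size=1024 * 1024):
    """Step 1: write the known pattern to the file, chunk by chunk."""
    pattern = make_pattern(chunk_size)
    with open(path, "wb") as f:
        written = 0
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(pattern[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device


def verify_pattern(path, total_bytes, chunk_size=1024 * 1024):
    """Steps 2 and 3: read each chunk back and compare it with what was written.

    Returns the starting offsets of all chunks that do not match.
    """
    pattern = make_pattern(chunk_size)
    mismatches = []
    with open(path, "rb") as f:
        offset = 0
        while offset < total_bytes:
            n = min(chunk_size, total_bytes - offset)
            if f.read(n) != pattern[:n]:
                mismatches.append(offset)
            offset += n
    return mismatches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "pattern.bin")
        size = 8 * 1024 * 1024
        write_pattern(target, size)
        print("mismatching chunk offsets:", verify_pattern(target, size))
```

A read-only scan can only show that a sector is readable; the write step adds the check that what comes back is what was stored.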

The FAQ has a topic on this:
https://www.hdsentinel.com/faq.php#tests
"Hard disk health is low or recently changed or I just installed a new (used) hard disk. How can I perform a deep analysis?"
It suggests different types of tests, exactly to verify / confirm that the drive is perfect, or to reveal any issue before we fill it.

That topic suggests the short/extended self tests, the Read test and the Reinitialise Disk Surface test too. The latter is a really long (time-consuming) test, especially on a high-capacity drive, so maybe the Disk menu -> Surface test -> Write + Read test is a better option now.
But if you perform the Disk -> Surface test -> Read test, it shows no errors, and the Health remains at 100%, then write testing is probably not required.

Post by **dkhath** » 2024.12.06. 16:35

Thank you, @hdsentinel. Your answers verify my assumptions. And thank you for taking the time to be thorough in your answers and explanation. I had not seen the quoted passage in the Help section, but I likely would have asked everything I did just to be 100% certain. Also, thank you and your team for your help and the software. It's the best Christmas present I've had in years--the best purchase I've ever made for PC support. I can't wait to test all of my hard drives--I still have every one since I bought a PC in 1997.

The mistake I made was assuming the SMART test in the new NAS was a substitute for the Bad Blocks test (a Linux utility, I just now discovered) that QNAP provides in its GUI.

A few more quick questions: if the power fluctuates during a read + write test or goes out during a test, is there a way to resume the test or must it be restarted (since the power fluctuation likely caused new problems)? I ask because this very thing happened to me yesterday due to construction in my area.

Post by **hdsentinel** » 2024.12.09. 11:45

Thanks for your kind words, really appreciated.

You can start any disk test from any position, exactly so that an interrupted test can be continued without starting from the beginning again.

Please refer to:

https://www.hdsentinel.com/kb/category/7/disk-testing-and-repairing/how-to-pauseresume-disk-testing-how-to-test-partial-area-of-the-disk-drive.html

If you need to stop the disk surface test, you can start a new surface test at any time and continue where the previous test stopped: please open Disk menu -> Surface test and select the appropriate hard disk drive / SSD and testing method.
Then, before starting the test, select the Configuration tab in this window and enable Limit testing to specific data blocks. Under this option you can specify the first block where the disk testing should start. For example, if you specify 4000, the disk test will start from 40%, which is what you want if the first 40% was already tested (each block represents 0.01% of the complete usable data area).
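
The block arithmetic can be sketched in a few lines of Python. The only fact used is the one above (each block is 0.01% of the usable area, so 4000 corresponds to 40%); the function names are hypothetical, and the sketch rounds down so that a partly tested block is tested again.

```python
BLOCKS_PER_PERCENT = 100  # each block is 0.01% of the usable data area


def first_block_for_percent(percent_done):
    """Return the first-block value to enter so testing resumes at percent_done.

    Rounds down, so a partially tested block is simply tested again.
    """
    if not 0 <= percent_done <= 100:
        raise ValueError("percent_done must be between 0 and 100")
    # round() first to absorb floating-point noise such as 28.999999999999996
    return int(round(percent_done * BLOCKS_PER_PERCENT, 6))


def percent_for_block(block):
    """Return the position, in percent, that a given block value corresponds to."""
    return block / BLOCKS_PER_PERCENT


if __name__ == "__main__":
    print(first_block_for_percent(40))     # value to enter after 40% was tested
    print(first_block_for_percent(62.57))  # a test that stopped at 62.57%
    print(percent_for_block(4000))         # position that block 4000 maps to
```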

Post by **dkhath** » 2024.12.09. 21:31

Thanks again for the answers on resuming tests.

I want to offer my test results for anyone who runs across this thread with similar issues and wants to see how things turned out.

I have finished testing one of my four Western Digital Red Pro 20TB WD201KFGX hard drives. It detected no issues. Here are the stats on the test and how long it took.

20TB Hard Drive
WRITE + read test
Write Test: 25 hours, 26 minutes to complete
Read Test: 25 hours, 20 minutes to complete
Total test time: 50 hours, 46 minutes
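
For anyone who wants to estimate their own test time, the arithmetic behind these figures can be checked with a short Python sketch. The average throughput assumes that "20TB" means 20 x 10^12 bytes (decimal terabytes); that is my assumption, not a measured value.

```python
from datetime import timedelta

# Durations reported above.
write_time = timedelta(hours=25, minutes=26)
read_time = timedelta(hours=25, minutes=20)
total_time = write_time + read_time

# Assumption: a "20TB" drive holds 20 * 10**12 bytes (decimal terabytes).
CAPACITY_BYTES = 20 * 10**12


def as_hours_minutes(duration):
    """Format a timedelta as H:MM, with hours allowed to exceed 24."""
    minutes = int(duration.total_seconds()) // 60
    return f"{minutes // 60}:{minutes % 60:02d}"


def average_mb_per_second(duration):
    """Average rate in MB/s needed to cover the whole capacity in this time."""
    return CAPACITY_BYTES / duration.total_seconds() / 1e6


if __name__ == "__main__":
    print("total:", as_hours_minutes(total_time))
    print(f"write pass: {average_mb_per_second(write_time):.0f} MB/s")
    print(f"read pass:  {average_mb_per_second(read_time):.0f} MB/s")
```

Under that assumption, both passes average a little under 220 MB/s.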

To reduce random issues unrelated to the test, I set up a dedicated Beelink Mini S (Windows 11) powered by a battery back-up device with an attached Insignia - 2-Bay HDD docking station (NS-PCHDEDS19).

I had two concerns initially that are not problems, so I will summarize here to put anyone else at ease if they have similar concerns.

Shaded Blocks
Having last seen block patterns like these in DOS hard drive utilities decades ago, I was curious whether the two blocks with different shades meant potential issues down the road. HD Sentinel has stated [p=19387] that "some minor darker green blocks are possible during the surface test--it is normal."

Reset to device, \Device\RaidPort1, was issued. (UASPStor; ID: 129)
At some point, I turned on the option to monitor Windows' event log for errors (Configuration > Preferences > Advanced Options > Monitor Windows Event Log for problems related to disks and storage). This resulted in 626 errors with the message above over the course of testing the drive. HD Sentinel has stated [t=14591] that "these problems are not errors recorded/reported by the actual physical disk drive." These errors are specific to the computer and its operating system, not the hard drive itself. In my case, the cause may be the Insignia HD Dock (USB). I have not yet been able to resolve this issue on the system I used for testing, but it appears that DOS and WinPE would not have these issues. The errors did not interrupt my testing, so I simply ignored them.
