
Safe to use repaired drive? 6% Health (had semaphore timeout)

Posted: 2020.02.23. 16:31
by wwcanoer
Is it reasonably safe to use this repaired drive for secondary backup?

When I connected a 6+ year old MyBook 4TB USB 3.0 WDBACW0040HBK-H1 3813B with a Hitachi drive inside that had not been used for several years, Hard Disk Sentinel quickly reported 2224 weak sectors. (It has only 32 hours of use but was shipped in airline baggage. When I verified that I had a second copy of everything on the drive, I was able to read almost everything on it.)

During the first repair test, I got the error "the semaphore timeout period has expired." The test was stopped, the computer rebooted and the test repeated, and the error recurred at the same block (7196). So I deleted the existing partition, converted the disk to GPT and created 3 primary partitions (REF: https://www.hdsentinel.com/forum/viewtopic.php?t=1950):
Partition 1: good, blocks 0-7170 = 2,736,021 MB (now tests 100% good)
Partition 2: error, blocks 7171-7229 = 22,129 MB (not tested, will not use)
Partition 3: good, blocks 7230-9999 = 1,057,247 MB (now tests 100% good)
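To check which disk offsets the block ranges above correspond to, the block numbers can be converted to byte ranges. This is a minimal sketch under two assumptions that are not from Hard Disk Sentinel documentation: the surface map divides the disk into 10,000 equal blocks (suggested by the 0-9999 numbering), and the disk has 7,814,037,168 sectors of 512 bytes, a typical 4 TB capacity.

```python
# Sketch: map surface-map block ranges to approximate disk offsets.
# ASSUMPTIONS (not from Hard Disk Sentinel documentation):
#   - the surface map divides the disk into 10,000 equal blocks numbered 0..9999
#   - the disk has 7,814,037,168 sectors of 512 bytes (typical 4 TB capacity)
TOTAL_SECTORS = 7_814_037_168
SECTOR_SIZE = 512
NUM_BLOCKS = 10_000
MIB = 1024 * 1024


def block_range_to_bytes(first_block: int, last_block: int) -> tuple[int, int]:
    """Return (start offset, length) in bytes for an inclusive range of blocks."""
    if not 0 <= first_block <= last_block < NUM_BLOCKS:
        raise ValueError("block range out of bounds")
    start_lba = first_block * TOTAL_SECTORS // NUM_BLOCKS
    end_lba = (last_block + 1) * TOTAL_SECTORS // NUM_BLOCKS  # exclusive
    return start_lba * SECTOR_SIZE, (end_lba - start_lba) * SECTOR_SIZE


# Partition layout from the post (inclusive block numbers).
layout = {
    "Partition 1 (good)": (0, 7170),
    "Partition 2 (error area, unused)": (7171, 7229),
    "Partition 3 (good)": (7230, 9999),
}

for name, (first, last) in layout.items():
    start, length = block_range_to_bytes(first, last)
    print(f"{name}: starts at {start / MIB:,.0f} MiB, {length / MIB:,.0f} MiB long")
```

Under these assumptions the computed sizes land within about one block (roughly 380 MiB) of the figures Windows reported (its "MB" are binary MiB), so the boundaries are only approximate.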

After doing re-initialization tests on Partitions 1 & 3, the drive is now:
6% health
637 bad sectors... moved to the spare area.
296 weak sectors... may be remapped any time.

Questions:

(Q1) Is it reasonably safe to use this drive for secondary backup?
(I will reset the counters in order to have better resolution of future weak sectors.)

(Q2) After a repair or reinitialization test, should weak/pending sectors be zero, because they have all been tested and determined to be good, bad or damaged? I assume the remaining weak sectors are there because I did not run the tests on the area (partition 2) that gave the error.

(Q3) I feared that if there was physical damage, re-testing the error area would risk expanding it. Is that a valid concern, or should I try to repair this area? Is it correct to isolate the error area in a middle partition? Or is there any reason I should not use any of the drive past the error (i.e. not use partition 3)?

(Q4) Re: "the semaphore timeout period has expired" error
(Q4.1) Once this error has occurred, will it continue for the rest of the drive or should I have let the repair test keep running?
(Q4.2) What is the typical cause of this error? It happened twice at the same block, so I assume that it is a problem with the drive itself. It has not recurred since.

=========
Drive history:
=========
2020-02-22 6:40:03 AM,#197 Current Pending Sector Count 312 -> 296
2020-02-22 6:40:03 AM,#196 Reallocation Event Count 635 -> 637
2020-02-22 6:40:03 AM,#5 Reallocated Sectors Count 635 -> 637

2020-02-21 12:05:15 PM,#197 Current Pending Sector Count 952 -> 312
2020-02-21 12:05:15 PM,#196 Reallocation Event Count 555 -> 635
2020-02-21 12:05:13 PM,#5 Reallocated Sectors Count 555 -> 635

2020-02-21 12:00:13 PM,#197 Current Pending Sector Count 1680 -> 952
2020-02-21 12:00:13 PM,#196 Reallocation Event Count 464 -> 555
2020-02-21 12:00:13 PM,#5 Reallocated Sectors Count 464 -> 555

2020-02-21 11:00:07 AM,#197 Current Pending Sector Count 1720 -> 1680
2020-02-21 11:00:07 AM,#196 Reallocation Event Count 459 -> 464
2020-02-21 11:00:07 AM,#5 Reallocated Sectors Count 459 -> 464

2020-02-20 9:09:30 PM,#197 Current Pending Sector Count 1744 -> 1720
2020-02-20 9:09:30 PM,#196 Reallocation Event Count 456 -> 459
2020-02-20 9:09:30 PM,#5 Reallocated Sectors Count 456 -> 459

2020-02-20 8:25:04 PM,#197 Current Pending Sector Count 1760 -> 1744
2020-02-20 8:25:03 PM,#196 Reallocation Event Count 454 -> 456
2020-02-20 8:25:03 PM,#5 Reallocated Sectors Count 454 -> 456

2020-02-20 3:53 PM - During repair test, "the semaphore timeout period has expired" error appears; let it run until 6:27 PM and then stopped it. (overlaid screenshot)

2020-02-20 7:25:07 AM,#197 Current Pending Sector Count 1736 -> 1760
2020-02-20 5:17:15 AM,#196 Reallocation Event Count 447 -> 454
2020-02-20 5:17:15 AM,#5 Reallocated Sectors Count 447 -> 454

2020-02-20 5:14 AM - During repair test, "the semaphore timeout period has expired" error appears and then stopped. (bottom screenshot)

2020-02-20 4:40:05 AM,#197 Current Pending Sector Count 1768 -> 1736
2020-02-20 4:40:05 AM,#196 Reallocation Event Count 435 -> 447
2020-02-20 4:40:04 AM,#5 Reallocated Sectors Count 435 -> 447

2020-02-19 9:23:04 PM,#197 Current Pending Sector Count 2120 -> 1768
2020-02-19 9:23:04 PM,#196 Reallocation Event Count 188 -> 435
2020-02-19 9:23:03 PM,#5 Reallocated Sectors Count 188 -> 435

2020-02-17 1:56:11 PM,#197 Current Pending Sector Count 2192 -> 2120
2020-02-17 1:56:11 PM,#196 Reallocation Event Count 160 -> 188
2020-02-17 1:56:10 PM,#5 Reallocated Sectors Count 160 -> 188

2020-02-02 5:04:10 PM,#197 Current Pending Sector Count 2224 -> 2192
2020-02-02 5:04:10 PM,#196 Reallocation Event Count 138 -> 160
2020-02-02 5:04:09 PM,#5 Reallocated Sectors Count 138 -> 160

2020-01-14 6:59:30 AM,#197 Current Pending Sector Count 0 -> 2224
2020-01-14 6:59:30 AM,#196 Reallocation Event Count 0 -> 138
2020-01-14 6:59:29 AM,#5 Reallocated Sectors Count 0 -> 138
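To see the overall trend in the history above, a small parser can tally each attribute. This is a minimal sketch that assumes every log line follows the exact format shown in this post (timestamp, attribute number, name, old -> new); it is not a documented Hard Disk Sentinel export format, and free-text notes are simply skipped.

```python
# Sketch: parse log lines in the format shown above and report the net change
# of each SMART attribute. The line format is copied from this post (an assumption).
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2} [AP]M),"
    r"#(?P<id>\d+) (?P<name>.+?) (?P<old>\d+) -> (?P<new>\d+)$"
)


def parse_history(text: str) -> list[tuple[datetime, int, str, int, int]]:
    """Return (timestamp, attribute id, name, old value, new value), oldest first."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # free-text notes (e.g. the semaphore timeout remarks) do not match
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %I:%M:%S %p")
            events.append((ts, int(m["id"]), m["name"], int(m["old"]), int(m["new"])))
    return sorted(events)


def net_change(events) -> dict[int, tuple[str, int, int]]:
    """Map attribute id -> (name, first old value, last new value)."""
    summary: dict[int, tuple[str, int, int]] = {}
    for _, attr_id, name, old, new in events:
        first = summary.get(attr_id, (name, old, new))[1]
        summary[attr_id] = (name, first, new)
    return summary


history = """\
2020-02-22 6:40:03 AM,#197 Current Pending Sector Count 312 -> 296
2020-02-22 6:40:03 AM,#5 Reallocated Sectors Count 635 -> 637
2020-02-20 3:53 PM - During repair test, "the semaphore timeout period has expired" error appears
2020-01-14 6:59:30 AM,#197 Current Pending Sector Count 0 -> 2224
2020-01-14 6:59:29 AM,#5 Reallocated Sectors Count 0 -> 138
"""
for attr_id, (name, first, last) in sorted(net_change(parse_history(history)).items()):
    print(f"#{attr_id} {name}: {first} -> {last}")
```

Fed the full history, the same code summarises the drive's path: pending sectors (#197) rose from 0 to 2224 and ended at 296, while reallocated sectors (#5) climbed from 0 to 637.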

Re: Safe to use repaired drive? 6% Health (had semaphore timeout)

Posted: 2020.02.24. 13:05
by hdsentinel
Thanks for your question.

To be brief: I would not expect too much from a hard disk with only 6% health.
Such low health usually means multiple problems and/or a high number of problems, as shown in the text description on the Overview page.
We can attempt a Disk Repair (which is generally designed for fewer problems and higher health, e.g. around 80-90%) and even use the Reinitialise Disk Surface test to stabilize and improve the general usability of the disk drive.
But we can't expect a drive with 6% health to work like a perfect, new hard disk.

Yes, we may consider using it as a secondary backup, with proper partitioning: create smaller partition(s) and use only those, to avoid accessing the area which previously showed damage, slow/unreadable sectors and so on.
But we can't be 100% sure that the affected area will not be used by the disk (e.g. during an internal self-test), so on a hard disk with only 6% health you can expect more and more problems. Maybe not tomorrow or next week, but later.

So if you decide to use it, use it ONLY with constant monitoring, and back up immediately upon any new problem recorded on the Log page (you can configure an alert for this purpose by enabling Configuration -> Alerts -> When a new log entry is added).
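The idea behind that alert can be illustrated with a snapshot comparison. This is a rough stand-in, not how Hard Disk Sentinel implements its alerts: it assumes you already have raw SMART values as a dictionary (attribute id -> value), watches only the counters that appear in this thread, and flags any increase.

```python
# Sketch: compare two SMART snapshots and report counters that grew.
# The watched attributes are limited to those in this thread (an assumption);
# how the snapshots are collected is deliberately left out.
WATCHED = {
    5: "Reallocated Sectors Count",
    196: "Reallocation Event Count",
    197: "Current Pending Sector Count",
}


def new_problems(previous: dict[int, int], current: dict[int, int]) -> list[str]:
    """Return one alert per watched counter whose raw value increased."""
    alerts = []
    for attr_id, name in WATCHED.items():
        before, after = previous.get(attr_id, 0), current.get(attr_id, 0)
        if after > before:
            alerts.append(f"#{attr_id} {name}: {before} -> {after}, back up now")
    return alerts


# Values after the reinitialisation in the post, then a hypothetical later reading.
print(new_problems({5: 637, 196: 637, 197: 296}, {5: 640, 196: 640, 197: 296}))
```

A falling pending count (weak sectors being resolved) is not treated as a new problem here; only growth triggers an alert.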


> it quickly reported 2224 weak sectors.
> I was able to read almost everything on it when I verified that I had a second copy of everything on the drive.)

Generally, if you do not attempt to read the weak sectors, the disk drive seems usable. If no files are stored on the weak sectors, you're lucky: the files can usually be copied without problems.

When weak sectors are reported, Hard Disk Sentinel suggests this page:
https://www.hdsentinel.com/hard_disk_ca ... ectors.php
which gives details about such weak sectors, how they may appear and how to diagnose, reveal and fix them (both with Hard Disk Sentinel and in general).

As described there, in many cases these are related to something other than the hard disk itself: cables, connections, insufficient power and sudden reset/removal are frequent causes of such weak sectors.
They can usually be fixed, for example by Disk menu -> Surface test -> Disk Repair (if the number of weak sectors is LOW, e.g. only a few or 10-20 reported). But if there are 1000's of such weak sectors reported, then you'd need to use Disk menu -> Surface test -> Reinitialise Disk Surface, which is designed to help when the health is this low and there are 1000's of such sectors. The Reinitialise Disk Surface test is destructive (it clears all data), but if you could make a backup, it can be useful.
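The rule of thumb above can be written down as a small decision helper. This is a hypothetical sketch: the reply only says "a few or 10-20" weak sectors call for Disk Repair and 1000's call for Reinitialise Disk Surface, so the cut-off of 20 and the handling of the range in between are assumptions.

```python
def suggested_surface_test(weak_sectors: int, have_backup: bool) -> str:
    """Pick a Hard Disk Sentinel surface test from the weak-sector count.

    The cut-off is an assumption read loosely from the reply: "only few or 10-20"
    weak sectors -> Disk Repair, 1000's -> Reinitialise Disk Surface. The reply
    gives no rule for the range in between; this sketch treats it like the high case.
    """
    if weak_sectors == 0:
        return "no repair needed"
    if weak_sectors <= 20:
        return "Disk Repair"
    if not have_backup:
        # Reinitialise Disk Surface is destructive: it clears all data.
        return "make a backup first"
    return "Reinitialise Disk Surface"


print(suggested_surface_test(2224, have_backup=True))  # the count from the post
```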


> During the first repair test, got the error "the semaphore timeout period has expired." The test was stopped, computer rebooted
> and test repeated, with the error repeating at the same block (7196).

Yes, this is possible. It means that the hard disk can't perform this kind of repair as it should, and it is disconnected by the system.
This is usually not caused by the hard disk itself but by the USB adapter (if it's an external drive) or the disk controller driver, which may not tolerate a long operation (recovering from an error): a timeout happens and the disk is disconnected.
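The mechanism can be shown with a toy model: the drive spends time retrying a weak sector while the host side waits only a limited time for an answer. The timeout values here are hypothetical; real limits depend on the USB bridge, driver and operating system. The error text from the post, "The semaphore timeout period has expired", is Windows system error 121 (ERROR_SEM_TIMEOUT).

```python
import threading
import time


def host_read(recovery_time_s: float, host_timeout_s: float) -> str:
    """Toy model: the drive retries a weak sector for recovery_time_s seconds while
    the host (USB bridge/driver) waits at most host_timeout_s for an answer."""
    finished = threading.Event()

    def drive_side() -> None:
        time.sleep(recovery_time_s)  # stands in for the drive's internal error recovery
        finished.set()

    threading.Thread(target=drive_side, daemon=True).start()
    if finished.wait(timeout=host_timeout_s):
        return "read completed"
    return "timeout: host gives up and drops the device"


print(host_read(0.05, host_timeout_s=0.5))  # short recovery: read completed
print(host_read(1.0, host_timeout_s=0.5))   # long recovery: timeout
```

The same drive behaviour gives different outcomes depending only on how long the host is willing to wait, which is why a different adapter or a direct SATA connection can change the result.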

In this case, trying the same test again usually results in the same error at the same position, so you'd need to:
- try to connect the hard disk differently: via a different USB adapter or USB docking station, or directly to a motherboard SATA port (if possible). A different connection/operating environment can help, as it may allow better error recovery for the disk drive.
- after making a backup, try the mentioned Disk menu -> Surface test -> Reinitialise Disk Surface. This performs a different kind of recovery / disk stabilization: while the Disk Repair test focuses on recovering the data, Reinitialise Disk Surface works like a "low level format". It can help repair the disk drive (with no focus on the data), which is useful if you have a backup.


> (Q1) Is it reasonably safe to use this drive for secondary backup?
> (I will reset the counters in order to have better resolution of future weak sectors.)

The weak sectors SHOULD BE repaired completely by the above methods. There should be no need to reset the counters for them: the tests mentioned above should completely eliminate the weak sectors. Please try the above suggestions.
Ideally, the result is any number of bad sectors (which are already reallocated, so they will no longer cause problems, as described at https://www.hdsentinel.com/faq.php#health) but no weak sectors at all.

If you could reduce them from 2000+ to 200+ (by about 90%), that is nice, but I'd try to reduce them even more.
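Using the exact counts from the original post (a peak of 2224 weak/pending sectors, 296 remaining), the "about 90%" works out as:

```latex
\text{reduction} = \frac{2224 - 296}{2224} = \frac{1928}{2224} \approx 0.867 = 86.7\%
```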

Then, if a further Disk menu -> Surface test -> Read test shows no errors, yes, you can use the disk drive for secondary purposes, with monitoring so you are informed about possible new issues and degradation.


> (Q2) After a repair or reinitialization test, should weak/pending sectors be zero

YES ;)

> because they have all been tested and determined to be good, bad or damaged?
> I assume that the remaining weak sectors are because I did not run the tests on the area (partition 2) that gave the error.

Yes, that is possible. The Reinitialise Disk Surface test should be performed on the complete drive.


> (Q3) I feared that if there was physical damage, then re-testing the error area would risk
> expanding the error area. Is that a valid concern or should I try to repair this area?

Yes, it is an absolutely valid concern. This is why (even if you create partitions to exclude the area from future use) we can't be 100% sure that the problematic area will never be touched by the drive. Mechanical issues (e.g. small particles) can spread and affect other parts, even "far" from the original bad area.
This is why it may be better to "stress-test" the drive now: force the detection, repair and checking of all possible problems, revealing and stabilizing them before filling the drive with data.


> (Q4) Re: "the semaphore timeout period has expired" error
> (Q4.1) Once this error has occurred, will it continue for the rest of the drive or should I have let the repair test keep running?

No. The semaphore timeout error makes the operating system disconnect the drive (as the device does not respond in time), so the rest of the test will fail (usually resulting in lots of red blocks, as the rest of the drive is no longer accessible).
You can verify this: if you click on any block of the disk surface map (to view the contents of the sector), you'll only see an error that the sector can't be read (even for sectors which were previously processed and displayed in green). This confirms that the disk drive is no longer accessible to the system (and to Hard Disk Sentinel).
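The same check (re-reading a sector that was fine before) can be built into a simple sequential read test so that it stops instead of marking the rest of the disk as bad. This is a hypothetical sketch, not Hard Disk Sentinel's implementation: it works on any already opened binary file object, and the 1 MiB chunk size is an arbitrary choice.

```python
from typing import BinaryIO, Optional


def _readable(dev: BinaryIO, offset: int, length: int) -> bool:
    """Try to read `length` bytes at `offset`; report whether it succeeded."""
    try:
        dev.seek(offset)
        dev.read(length)
        return True
    except OSError:
        return False


def surface_scan(dev: BinaryIO, size: int,
                 chunk_size: int = 1 << 20) -> tuple[list[int], Optional[int]]:
    """Sequential read test over `size` bytes of an already opened device or file.

    Returns (offsets of unreadable chunks, offset at which the device appears to
    have disconnected, or None if it stayed accessible).
    """
    bad: list[int] = []
    last_good: Optional[int] = None
    for offset in range(0, size, chunk_size):
        if _readable(dev, offset, min(chunk_size, size - offset)):
            last_good = offset
            continue
        # Re-check a chunk that already read fine. If it fails now too, the drive
        # itself is no longer accessible (e.g. after a timeout), so every later
        # error would be meaningless: stop instead of marking the rest as bad.
        if last_good is not None and not _readable(dev, last_good, chunk_size):
            return bad, offset
        bad.append(offset)
    return bad, None
```

An isolated unreadable chunk is recorded and the scan continues; a failure that also affects previously readable data ends the scan and reports where the disconnect happened.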


> (Q4.2) What is the typical cause of this error?
> It happened twice at the same block, so I assume that it is a problem with the drive itself. It has not recurred since.

Usually it is not caused by the disk drive but by the USB adapter/enclosure/docking station.
Trying the same drive with a different USB adapter/dock can give different results.
Yes, the issue always happens at the same position, the very first weak sector detected; this is normal.