[BUG?] Incorrect Health value with Event ID 129

Posted: 2020.03.25. 12:09
by unknown14725
I am seeing HD Sentinel's "Health" value drop whenever Event ID 129 "Reset to device, \Device\RaidPort1, was issued." is reported.

This happened a few times on a 4-year-old HDD, and the "Health" value dropped to 97% over a couple of days, so I figured it might be an issue with the age of the drive. I replaced the drive with a new one, but unfortunately Event ID 129 is still happening, and the new drive's Health quickly dropped to 98%.

It seems a bit weird that system errors affect the drive's Health value. Is it supposed to behave this way? If I fix the problem with this controller, I assume there is no way to restore the Health value to 100%. So it seems a bit flawed.

Re: [BUG?] Incorrect Health value with Event ID 129

Posted: 2020.03.25. 15:39
by hdsentinel
Of course this is NOT A BUG but exactly the opposite: it is a feature, designed to reveal possible problems related to the storage subsystem.
Sorry, but I do not really understand why you think it is a bug/problem. It is NOT.

It is an optional function/feature, which you enabled: see the option at Configuration -> Advanced options -> Monitor Windows Event Log for problems related to disks and storage.

If this option is enabled, then yes, it is completely normal and expected that disk-specific events (related to a particular drive) are noticed and that, to indicate the problem, the health value is also affected, as you can see.

And exactly because in some cases such issues are not (only) related to the disk drive (but sometimes related to the disk controller, cables/connections) even a different drive in the same position may be affected, as you can see.

> I assume there is no way to restore the Health value to 100%.

Sorry, seems you did not read the text description. It clearly shows that running a Disk menu -> Surface test -> Repair test clears such issues and restores the health to the original value. The graph on the bottom may still show the lower value today (because it shows the daily lowest value, as described in the Help too).

If you prefer not to be notified about such issues, then please disable the option you previously enabled (enabling it means that you WANT to be notified about possible issues related to the storage subsystem).
Not sure why you say "bug" when things work exactly as they should...

Re: [BUG?] Incorrect Health value with Event ID 129

Posted: 2020.03.25. 16:22
by unknown14725
Well, here's my reasoning for thinking it's a bug:

1) The documentation does not mention or explain that Event Log entries will affect the Health value of a device. Nor does it explain that the disk tests will restore the Health value to 100% when controller/driver problems registered in the Event Log have caused the value to drop.

Ref: https://www.hdsentinel.com/help/en/52_cond.html
Ref: https://www.hdsentinel.com/help/en/30_c_adv.html
Ref: https://www.hdsentinel.com/help/en/62_testfaq.html



2) The software interface does not indicate or explain that running a "Disk repair" will restore the Health value to 100%.

Ref: "The following function can be used to improve the situation: Disk -> Surface Test -> Disk repair
At this point, warranty replacement of the disk is not yet possible, only if the health drops further."

Ref: https://i.imgur.com/Ajb6AId.png

In fact, this would indicate that HD Sentinel has registered this error as an actual permanent device error, and not a controller/driver error. Take particular note of the wording in the bottom sentence in the quote above. Why would warranty replacements and surface tests be mentioned together like this if it's "only" an Event Log entry? You see why it looks like a bug? (Why is warranty even mentioned?)



So yes, the counter-intuitive text in the interface combined with the lack of documentation of how certain features behave will result in people believing a "feature" is in fact a bug. If you want something to be a "feature", you need to document this. Or else "strange" and "unexplained" behaviour will be regarded as a bug ;)

Re: [BUG?] Incorrect Health value with Event ID 129

Posted: 2020.03.25. 17:27
by hdsentinel
Thanks!

Sorry, but as you can see, the documentation clearly describes the option: how it works, what it does, how the reported problems may be displayed and how the disk health may be affected.
It also mentions that the Disk -> Surface Test -> Disk repair function can be used to improve the situation (exactly as displayed in the text description too, because as we know, most users never read the documentation).

https://www.hdsentinel.com/help/en/30_c_adv.html

Monitor Windows Event Log for problems related to disks and storage: By this function, while running, Hard Disk Sentinel detects if there is any issue related to the storage subsystem, reported by the disk controller driver or Windows itself (independently from the self-monitoring status of the drive). This includes problems related directly to hard disk drive (eg. bad sectors) - but also related to something else (cables, connections) which may cause bus reset, retries, communication problems or other issues. Such problems reported in the health and the text description, for example:
Recently the following entries added to System Event Log:
56 warnings, most recent: The IO operation at logical block address 8fa0 for Disk 9 (PDO name: \Device\0000009d) was retried. (disk; ID: 153)
The following function can be used to improve the situation: Disk -> Surface Test -> Disk repair
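
As an illustration only, a summary line like the example above could be split into its parts with a short Python sketch. The pattern is inferred from that single sample line, so the format is an assumption rather than a documented one, and `parse_event_summary` is a hypothetical helper name.

```python
import re

# Pattern inferred from the one sample line quoted above (an assumption):
#   "<count> warnings, most recent: <message> (<source>; ID: <event id>)"
SUMMARY_RE = re.compile(
    r"^(\d+) warnings?, most recent: (.*) \(([^;()]+); ID: (\d+)\)\s*$"
)


def parse_event_summary(line):
    """Return the parts of a summary line as a dict, or None if it does not match."""
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        return None
    count, message, source, event_id = match.groups()
    return {
        "count": int(count),        # number of warnings recorded
        "message": message,         # text of the most recent event
        "source": source,           # event source, e.g. "disk"
        "event_id": int(event_id),  # Windows event ID, e.g. 153
    }
```

Applied to the sample line, this would give the warning count, the message text, the source ("disk") and the event ID (153) as separate fields.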


> 2) The software interface does not indicate or explain that running a "Disk repair" will restore the Health value to 100%.

The text description indicates that.
As problems were detected (and logged in the Windows Event Log), we can't be SURE that the health will return to 100% after the repair.
The repair does remove the previously recorded events, so the health will definitely improve (ideally back to 100% if the test does not find further issues).


> In fact, this would indicate that HD Sentinel has registered this error as an actual permanent device error, and not a controller/driver error.

Sorry, but as described in the Help, these MAY be device errors, but they MAY be other errors too. This is why the option is called "... problems related to disks and storage".


> Why would warranty replacements, and surface tests be mentioned together like this if it's "only" an Event Log entry?

"Warranty" is mentioned precisely to prevent "panic": after even a minor 1% health decrease, some users may immediately ask for service / warranty replacement.
The information displayed should indicate that such issues are NOT pre-failure conditions.

The Disk Repair function (as a generic repair function) can be used to clear these recorded events too, which is why it is recommended to "improve the status" (yes, ideally back to 100%).


> You see why it looks like a bug? (Why is even warranty mentioned?)

No, exactly because of the above, I do not really understand ....
You enabled an option which works exactly as it should (it detects and reports problems), and then you say it is a bug? ;)
The Help describes the details. Both the cause and the solution are explained in both the user interface and the Help....

Re: [BUG?] Incorrect Health value with Event ID 129

Posted: 2020.03.25. 20:48
by hdsentinel
FYI: there is a similar topic in the Questions section:

https://www.hdsentinel.com/forum/viewto ... 48&p=16691

where it is discussed that "NO, of course these problems are NOT PERMANENT".

If you prefer to quickly and easily restore the health back to 100% and remove the related event from the text description, you can do the following (mentioned there):

1) completely close Hard Disk Sentinel
2) delete the files DISKDATA_xxx.WED and DISKDATA_xxx.WEL from the folder of the software (where xxx is related to the appropriate hard disk drive: the file name contains the disk model ID, firmware version and serial number).
3) when you launch Hard Disk Sentinel again, the problems will no longer be displayed and the health will be 100% again (assuming that there are no other problems).
Also you can disable the option to prevent detection / notification of possible future events.
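
For anyone who prefers to script the steps above, here is a minimal Python sketch. It assumes Hard Disk Sentinel is already closed and that you pass in the software's folder yourself (the path is not given in this thread, so it is left as a parameter). The function names are hypothetical, and by default the sketch only lists the files (dry run) without deleting anything.

```python
from pathlib import Path


def find_event_files(folder):
    """Return the DISKDATA_*.WED and DISKDATA_*.WEL files in `folder`, sorted."""
    matches = []
    for entry in Path(folder).iterdir():
        name = entry.name.upper()  # compare case-insensitively
        if (
            entry.is_file()
            and name.startswith("DISKDATA_")
            and name.endswith((".WED", ".WEL"))
        ):
            matches.append(entry)
    return sorted(matches)


def remove_event_files(folder, dry_run=True):
    """List the matching files and delete them only when dry_run is False.

    Returns the names of the files that were (or would be) removed.
    """
    files = find_event_files(folder)
    for path in files:
        if not dry_run:
            path.unlink()
    return [path.name for path in files]
```

Calling `remove_event_files(folder)` first shows which files would be affected; calling it again with `dry_run=False` removes them. Note that this matches the files for every drive, so filter the returned names if only one drive should be reset.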

However, this does not clear the events from the Windows Event Log itself and (of course) does not prevent further events from being logged there. So I'd rather recommend investigating the real cause of the events and trying to avoid them.

Common causes can be
- an incorrect / older chipset driver (if the hard disk drive is connected to the motherboard chipset)
- an issue with cables/connections or the power source, which may cause a temporary communication problem (requiring a reset)
- sudden power loss, improper shutdown / disconnection (in the case of an external drive)
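
While investigating, it can help to see how often the resets are logged. The Python sketch below is only one possible approach: it builds a query for Windows' built-in `wevtutil` tool that returns the most recent Event ID 129 entries from the System log. The function name `build_query` and the default count of 10 are assumptions made for illustration, and the command is only executed when the script runs on Windows.

```python
import subprocess
import sys


def build_query(event_id=129, count=10):
    """Build a wevtutil command that lists recent System log entries for one event ID."""
    xpath = f"*[System[(EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",  # query events from the System log
        f"/q:{xpath}",               # XPath filter on the event ID
        f"/c:{count}",               # maximum number of events to return
        "/rd:true",                  # newest events first
        "/f:text",                   # plain-text output
    ]


if __name__ == "__main__" and sys.platform == "win32":
    # Run the query and print whatever wevtutil reports.
    result = subprocess.run(build_query(), capture_output=True, text=True)
    print(result.stdout or result.stderr)
```

Comparing the timestamps of these entries with driver updates, cable changes or power events may help narrow down which of the causes above applies.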

These events usually do not cause big troubles - but in some situations, they can cause troubles/system errors which are otherwise hard to detect/reveal. This is the purpose of the event monitoring.