Hard Disk Monitoring

Hi,

I'm a long time (well, make it eight or ten years) user of HD-Sentinel. In all that time, I have hardly ever seen anything entered in the tabs "Log" or "Alerts".

Just now, I noticed from the daily status-email that one of my drives dropped in health (a low number of read errors). Still, there is no entry in either Log or Alerts. The lists are both blank.

What exactly will turn up in these lists? Does it have to be a degradation to "critical" level (I tend to avoid those by swapping drives long before, so that would explain why I've never seen any alerts)?

Kind regards

Please check the Help, which describes how the Log and Alerts pages work:

https://www.hdsentinel.com/help/en/14_log.html
https://www.hdsentinel.com/help/en/18_alerts.html

The Log page should show changes, degradations (and possible improvements) in the most critical attributes.
For example, new bad sectors, weak sectors and so on. The list of attributes depends on the actual disk type/model (the Help shows the basic list, but there could be more, especially on newer disk drives and SSDs).

Generally, such a degradation could be displayed there. If possible, please use the Report menu -> Send test report to developer option for the particular drive, so that I can check and verify its degradation.

The Alerts page shows only alerts that were actually issued for the drive, and only when an alert is enabled on the Configuration -> Alerts page.
For example, if an overheat, low disk health or similar alert is configured and was issued for the disk drive, it should be listed there.
If no alerts are configured (or the drive never triggered an alert), then yes, the page is empty.

Thank you for the explanation.

I've been watching the drive in question for a while now:

- it used to be at 100%
- one day it dropped to 98%
- about a week later it came back up to 100%
- now I just noted it again fell to 98%

In tab Overview it says:

***
Recently the following entries added to System Event Log:
4 errors, most recent: (Disk; ID: 7)
***

The Log tab is empty ("No problems logged".) The Alerts tab is also empty. I only learned about the drive from monitoring HDS' status emails.

Apparently not HDS detected errors on the drive, but Windows itself did. Since HDS now monitors the system event log, it displays those errors, yet it doesn't alert about them. When the errors get old (e.g. exceed the "7 days recent" classification), HDS "forgets" about them and returns health status to perfect.

Wouldn't it make sense to include those issues in either log or alerts?

Regards

> The Log tab is empty ("No problems logged".)

Yes, it is completely normal.
As I tried to explain, "The Log page should show changes, degradations (and possible improvements) in the most critical attributes.".

This means that logical errors (related to the Windows installation, the software in use and so on) are not listed there, as these logical errors are not really related to the physical device. After a complete erase and reinstall these errors are cleared, and on their own they usually do not indicate problems with the real disk status.
Yes, they may cause trouble with the actual file system, so they may be detected and reported.

But these are logged in the Windows Event Log; they are not listed on the Log page of the physical disk drive in Hard Disk Sentinel.

> The Alerts tab is also empty.

Yes, as I also wrote, this is completely normal if you have not configured any alert (for example overheat or low health) on the Configuration -> Alerts page, or if the particular hard disk has never triggered a configured alert.

> I only learned about the drive from monitoring HDS' status emails.

Yes, if no alert is configured, it is completely normal that no alert is issued. Possible problems are then only displayed in Hard Disk Sentinel (and included in the daily status reports, if that option is enabled).

> Apparently not HDS detected errors on the drive, but Windows itself did.

Yes, because it seems there is no physical problem (yet) with the drive, but Windows logged some issues.
These are in most cases related to some driver issue or minor incompatibility / timing issue, so personally I'd check these first.
It may also be a good idea to perform some tests ( https://www.hdsentinel.com/faq.php#tests ) to reveal any possible new problems, or to confirm that the reported status of the hard disk is really correct.

> Since HDS now monitors the system event log, it displays those errors, yet it doesn't alert about them.

It does alert about them, of course: if you configure a "low health" alert and the health drops below the specified threshold, you'll immediately get the alert (for example by e-mail), and the Alerts page will also show that the alert was issued.
Maybe the small percentage decrease was just not enough to reach the low health threshold this time.

You can enable alert on low health at Configuration -> Alerts page.
You may also adjust the threshold on the Configuration -> Thresholds / Tray Icon page (or, if you prefer a custom, stricter threshold for this particular drive, double-click the Health bar in the main window while the disk drive is selected).

Then the alert will be issued regardless of whether the health decreased because of a real problem with the device (one that is related to the disk itself and would "remain" after a complete reinstallation) and/or because of a logical problem recorded in the Windows Event Log.
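
As a rough illustration of the behaviour described above (not Hard Disk Sentinel's actual implementation), a low health alert can be thought of as a plain threshold comparison. The function name and the threshold values are assumptions chosen only for this example; they show why a drop from 100% to 98% stays silent unless a stricter threshold is configured.

```python
def low_health_alert(health_percent: float, threshold_percent: float) -> bool:
    """Return True when the health has fallen below the configured threshold.

    Hypothetical helper for illustration only; the real product's logic
    is not documented in this thread.
    """
    return health_percent < threshold_percent


# The drive in this thread dropped from 100% to 98%.
# The thresholds are assumed example values, not program defaults.
print(low_health_alert(98, 90))  # lenient threshold: no alert
print(low_health_alert(98, 99))  # strict per-drive threshold: alert
```

With a strict per-drive threshold, even the small 2% decrease would be enough to raise the alert.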

> When the errors get old (e.g. exceed the "7 days recent" classification), HDS "forgets" about them and returns health status to perfect.

Not 7 days (as it is much longer) but yes, depending on the problems, older issues are automatically removed as they may be no longer applicable.
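
A minimal sketch of how such an ageing-out of old events could work, purely as an illustration: the function name, the timestamps and the 30-day retention window are assumptions, since the real retention period is not stated here (only that it is much longer than 7 days).

```python
from datetime import datetime, timedelta
from typing import List


def recent_events(timestamps: List[datetime], now: datetime,
                  retention: timedelta) -> List[datetime]:
    """Keep only events that are not older than the retention window."""
    return [t for t in timestamps if now - t <= retention]


# Hypothetical event times and retention window, for illustration only.
events = [datetime(2024, 1, 1), datetime(2024, 1, 20), datetime(2024, 1, 30)]
window = timedelta(days=30)

# While the events are recent they are counted and may lower the health.
print(len(recent_events(events, datetime(2024, 1, 31), window)))
# Once every event is older than the window, nothing is counted any more.
print(len(recent_events(events, datetime(2024, 4, 1), window)))
```

This would explain a health value that drops while events are recent and returns to 100% later without any change in the drive itself.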

> Wouldn't it make sense to include those issues in either log or alerts?

These are included in the Windows Event Log.
Sometimes (for example with a problematic disk controller driver) there can be thousands of such entries in the Windows Event Log.
I see no point in copying them into Hard Disk Sentinel (especially as they may not really be related to the physical disk and its status). This is why Hard Disk Sentinel lists only their count and, through the health decrease, indicates that something happened which may need attention.
Yes, if configured, an alert is issued of course.

If you use Report menu -> Send test report to developer, I'd be more than happy to check the actual physical disk status and also verify the event IDs logged in the Windows Event Log.
Analysing these together helps to check and verify the correlation and to examine what exactly may have happened. This always gives ideas for future improvements in both logging/displaying and determining/reporting the status (for example, to verify whether in this case the events should be considered more seriously).

Ok, new development:

I noticed that the disk in question was showing a rising number of "Raw Read Error Rate" (about 5000 and climbing.) There were no reallocated sectors. The "Health" as indicated by HDS was still 100% though (save for the sporadic messages in event log when HDS-health kept dropping to 98% and going back up a few days later.)

Still, I felt uneasy about the steadily climbing number (just didn't feel right) and replaced the drive. Replacement went fine, there were no errors indicated during "retiring" the drive from its Windows Storage Spaces pool.

When I examined the drive afterwards, the first thing I tried was to start a self test: all three tests failed. When I tried to start a surface test, there was an error message on the very first sector: "couldn't start surface test" (or similar).

Bottom line - the drive appears to be dead.

HDS showed the indications for it (the SMART readouts with the climbing error rate), but didn't even reduce its health estimate and reported no warnings. Of course, the SMART value was well above its threshold, but a continually rising error rate should have rung a few bells.
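
The kind of check I have in mind could be sketched roughly like this. It is only an illustration: the function name and the sample readings are made up, and since raw values of this attribute are vendor-specific, a steady climb is a hint to look closer rather than proof of a failing drive.

```python
from typing import Sequence


def is_steadily_rising(readings: Sequence[int], min_samples: int = 3) -> bool:
    """Return True if every reading is strictly higher than the one before.

    Requires at least `min_samples` readings so that a single jump
    between two samples is not treated as a trend.
    """
    if len(readings) < min_samples:
        return False
    return all(later > earlier
               for earlier, later in zip(readings, readings[1:]))


# Hypothetical daily readings of "Raw Read Error Rate" (not real data).
print(is_steadily_rising([4200, 4600, 5000]))  # climbing on every sample
print(is_steadily_rising([5000, 5000, 5000]))  # flat, no trend
```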

Regards

That sounds very interesting.

Can you please use Report menu -> Send test report to developer option? Would be nice to see the status of the drive if it still starts at least (or its last known status if you can reload by using File menu -> Open status of offline disks).

Without the model ID and firmware version it is hard to say for sure, but generally a change in the "Raw Read Error Rate" ALONE may not be enough to indicate problems (some drives even have millions there).
But it would be nice to see the complete response of the disk and examine the situation, to increase the attention paid to this attribute (or to some others, if they show anything) for this particular model.

If the drive was generally working (apart from this increase and the events logged to the Windows Event Log and detected by Hard Disk Sentinel), such a sudden, complete failure sounds very interesting. I am not sure, but it may be related to ESD, mechanical shock or similar (I don't know whether anything like that happened), so it would be nice to check its status to verify what may have happened and, if possible, modify the status reporting.

Hi,

I just submitted a report (I didn't know about this display-offline-disk-data feature yet - neat!)

The drive in question is this one:

model: WDC WD30EFRX-68EUZN0
SN: WD-WMC4N1281324

If it's of value, I can submit a report of the drive now (including the FAIL status for the three self tests) using another machine. Would that be helpful?

Also, a second drive seems to be starting its decline: its "Raw Read Error Rate" is only at 115 (decimal notation, not SMART attribute), but it's starting to climb as well. The drive is still mounted, but in an external enclosure, so the data is not as complete as for the first one:

model: WDC WD40EFRX-68WT0N0
SN: WD-WCC4E1ZEXCZ1

I plan on watching this drive for a while still before replacing it.

Regards

Thanks for the report, yes, I see.

Apart from that, the drive did not report big problems, so I wonder what happened with it.

Yes, if it is possible to connect it, I'd be happy to see a recent report too where the disk tests failed, just to check the failure code.

Personally I'd surely try to run a Disk menu -> Surface test -> Read test and also a Disk menu -> Surface test -> Reinitialise disk surface test.
Would be curious about its result, to check if the data area could be read/written in general.

Personally I have some WD drives where Raw Read Error rate is over 5000 with absolutely no problems with them (all tests passed with no errors, the disk is fully usable).
This is why I'm so curious to investigate this particular situation, examine the functionality, possible errors / issues with your disk drive.

Thanks for drawing attention to this special case.

I connected the drive again and did some further tests:

This time, a surface test/disk repair did not refuse to start, but detected some errors and reportedly corrected them right at the start of the test (sector 0). I aborted that test and started a surface test/reinitialize. This test did not report any more errors for the first few percent of the disk, after which I also aborted it.

But the best part: after these tests, both the "short self test" and the "conveyance self test" completed without errors. The "extended self test" ran for a few minutes without errors before I cancelled it (I ran out of time). All of those tests had failed before.

As it stands now, all that's changed is that "raw read error rate" climbed some more, from 5073 to 5628. I'll be running a full surface test when I get to it and see what happens.

I sent a "report to developer" just FYI.

Regards

Thanks for the information and the new report.

From what you wrote, I'm sure that the contents of sector 0 (the MBR) were somehow damaged previously.
Do you remember whether a power failure, power loss, disconnect or similar happened a long time ago (when the problems started)?
In that case the contents of that sector were lost; it was not readable, and that caused the trouble.

In such cases, a weak sector is usually detected and reported by Hard Disk Sentinel, and the drive seems not to work at all: there may be delays or even problems with partitioning, formatting and so on.
And yes, there is a good chance that the manufacturer-specific tests (Disk menu -> Short self test, Extended self test) fail.

Yes, if the Disk Repair test could quickly identify and repair this particular sector - and there is no further issue - the drive is likely repaired, stabilized and should work correctly.

Yes, as described at
https://www.hdsentinel.com/hard_disk_ca ... ectors.php
after that, these hardware self tests may also work and report no problems at all.

Just to make sure, I'd also surely complete the Disk menu -> Surface test -> Reinitialize Disk Surface.

I might run it several times: once to make sure the complete surface is processed, then record the "raw read error rate" (or you may send a report again just to check).

Then, I'd perform this test again, just to verify if there is any further change, increase or not. If the drive is stable, all sectors should work properly - and that number (and the health % of course) should not change.
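
The before/after comparison suggested above could be noted down along these lines. This is only a sketch with assumed names: the `Snapshot` type and its fields are hypothetical, the value 5628 is the last reading mentioned in this thread, and the remaining values are examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Values recorded by hand after a complete surface test pass."""
    raw_read_error_rate: int
    health_percent: int


def looks_stable(before: Snapshot, after: Snapshot) -> bool:
    """True if neither the raw read error rate nor the health % changed."""
    return (after.raw_read_error_rate == before.raw_read_error_rate
            and after.health_percent == before.health_percent)


first_pass = Snapshot(raw_read_error_rate=5628, health_percent=100)
second_pass_same = Snapshot(raw_read_error_rate=5628, health_percent=100)
second_pass_worse = Snapshot(raw_read_error_rate=5700, health_percent=100)

print(looks_stable(first_pass, second_pass_same))   # unchanged between passes
print(looks_stable(first_pass, second_pass_worse))  # counter still climbing
```

If the second pass shows any increase, the drive should not be considered stabilized yet.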
