More Questionable Behavior HDS 4.40

Stubi
Posts: 38
Joined: 2011.03.17. 10:08


Post by Stubi »

Since I already deleted so much data yesterday based on the HDS messages, I will destroy the RAID 0 array and take out the problem disk. But before doing so, I took a closer look at HDS.

Months ago I had a command timeout count of 65537, combined with other data transfer errors. The cause was an oxidized data cable in my RAID 0 array. At that time HDS did not report any problem. But one day Intel RAID told me that my RAID array had severe data problems (I already reported this here: it was a total data loss of my RAID 0 array without any HDS warning, even though the SMART values had been questionable for a long time). You can find the whole story here (I had to enter the link as code because I cannot post links, and if I put it in the text it breaks the text).


http://www.hdsentinel.com/forum/viewtopic.php?f=9&t=1753
After this I set the offset to -65537 so that I can easily check by hand whether the value, now shown as 0, changes again (as I experienced, HDS does not warn me automatically).

Yesterday the disk reported Uncorrectable Errors for the first time. I did not get a warning from the HDS software about this. Today I wanted to see how high this value has to go before I get a warning, so I entered an offset of -500. The result is that I now suddenly have 4294966798 Uncorrectable Errors, and I got a health warning that I have 65038 data transfer errors (I guess it refers to the old command timeout, but that does not fit either, since the offset is -65537).

Honestly I don't understand the reporting anymore. Please help me.

Please see the attachments for all this.
Attachments
Further Wrong Data Values.png (18.39 KiB)
Further Wrong Data Values 2.png (19.87 KiB)
hdsentinel
Site Admin
Posts: 3115
Joined: 2008.07.27. 17:00
Location: Hungary

Re: More Questionable Behavior HDS 4.40

Post by hdsentinel »

> Months ago I had a command timeout of 65537 - combined with other data transfer errors.
> Reason was an oxidized data cable in my RAID 0 array. At this time HDS did not report any problem.
>


http://www.hdsentinel.com/forum/viewtopic.php?f=9&t=1753
Thanks for your message and the information. As we discussed then, the reports and details you provided helped us check this attribute (and further attributes), which are usually not critical but, yes, can cause problems in specific situations.
This is why, since your old post, Hard Disk Sentinel has been improved and now detects and displays such problems if found.

> After this I set the offset to -65537 so that I can check easily by hand if the value changes from now 0 again
> (HDS does not warn me automatically as I could experience).

Yes - older version did not warn for that specific attribute on that specific hard disk model.

> Yesterday the disk reported the first time Uncorrectable Errors. I did not get a warning by the HDS software about this.

As I asked previously in the other topic(s) you opened, can you please use the Report -> Send test report to developer option?
I suspect that something may not work as expected on your system.

> Today I wanted to look how high this value has to go so that I will get a warning. So I entered -500 offset.
> The result is that I have now suddenly 4294966798 Uncorrectable Errors and I got a health
> warning that I have 65038 errors with data transfer

As you can see, when that value changes, Hard Disk Sentinel immediately reports it in the text description and in the health value.
It would be good to investigate how the underflow can occur on your system (see the previous topic you opened): when you set a -500 offset, the counter should remain at zero instead of underflowing to a huge number. It's very interesting.
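The difference between the observed and the expected behaviour can be illustrated with a minimal sketch. It assumes (this is not confirmed in the thread) that the displayed counter is stored as an unsigned 32-bit value, and that the raw Uncorrectable Errors count was 2, which is what the reported 4294966798 implies (2 - 500 = -498, and 2^32 - 498 = 4294966798). The function names are hypothetical.

```python
# Hypothetical sketch: adding a negative offset to a SMART raw counter.
# Assumption: the displayed value is kept as an unsigned 32-bit integer.

U32_MASK = 0xFFFFFFFF


def apply_offset_wrapping(raw: int, offset: int) -> int:
    """Add the offset and keep only the low 32 bits (unsigned wraparound)."""
    return (raw + offset) & U32_MASK


def apply_offset_clamped(raw: int, offset: int) -> int:
    """Add the offset but never go below zero."""
    return max(raw + offset, 0)


if __name__ == "__main__":
    # Assumed raw Uncorrectable Errors count of 2 with a -500 offset.
    print(apply_offset_wrapping(2, -500))  # 4294966798
    print(apply_offset_clamped(2, -500))   # 0

    # The earlier Command Timeout offset: 65537 + (-65537) = 0 either way.
    print(apply_offset_wrapping(65537, -65537))  # 0
    print(apply_offset_clamped(65537, -65537))   # 0
```

Clamping at zero matches the behaviour described as expected in the reply; the wrapping variant reproduces the number reported in the first post.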

> (I guess it is talking about the old command timeout but this does not fit too since the offset is -65537).
No, I can assure you it's completely different: it is related to the "Uncorrectable errors count", where you just set the -500 offset.
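The numbers themselves are consistent with this answer. As an inference from the figures in the first post (not something stated by the developer), 65038 is exactly the low 16 bits of 4294966798, i.e. the same -498 result viewed as a 16-bit number, and it has no arithmetic link to the -65537 Command Timeout offset. A quick check:

```python
# Check (inference only): is the 65038 in the health warning the same
# underflowed Uncorrectable Errors value, seen through a 16-bit counter?

displayed_32bit = 4294966798  # value shown after setting the -500 offset
warning_count = 65038         # count quoted in the health warning

low_16_bits = displayed_32bit & 0xFFFF  # keep only the lowest 16 bits
signed_32 = displayed_32bit - 2**32     # reinterpret as a signed 32-bit value

print(hex(displayed_32bit))  # 0xfffffe0e
print(low_16_bits)           # 65038
print(signed_32)             # -498
```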

Thanks for the attachments - but the Report -> Send test report to developer option is the best way to check situations and advise.

Sorry for the trouble. As soon as I know more from these reports, I can check what happened and how to fix or improve it, if the problem is related to the software.
Stubi
Posts: 38
Joined: 2011.03.17. 10:08

Re: More Questionable Behavior HDS 4.40

Post by Stubi »

hdsentinel wrote: Yes - older version did not warn for that specific attribute on that specific hard disk model.
HDS still cares only about the values with the green circle on the left side. There is no change in the latest version 4.40. At least this was my test result.
hdsentinel wrote: Thanks for the attachments - but the Report -> Send test report to developer option is the best way to check situations and advise.
As I already told you in my other bug report, I have deleted the RAID 0 arrays because I took out the problem disk and sent it to Seagate. Until I get a replacement, I am using an emergency system.

The best thing will be to delete the scrap RAID 0 HDS records. If you remember I had the problem with the big gap at the SMART value graphs too. So with the new RAID 0 array I will make a reset of those data records for the 3 RAID 0 disks. I have many other disks (backup and so mostly offline). They do not seem to have such data problems. What do I have to delete to start new with the two existing disks when I add the new replacement disk?

There was a problem with HDS and an Intel Matrix RAID driver if I remember correctly. HDS had problems to work with this new driver version for a while. I had this problem too since there was no HDS fix for a while. Perhaps this is the reason for the data scrap?

Anyhow, thank you for your support.
hdsentinel
Site Admin
Posts: 3115
Joined: 2008.07.27. 17:00
Location: Hungary
Contact:

Re: More Questionable Behavior HDS 4.40

Post by hdsentinel »

> HDS still cares only about the values with the green circle on the left side. There is no change in the latest version 4.40. At least this was my test result.

I wrote this in the other post, but I am copying it here as well because it is very important:

I can confirm that this is, of course, NOT TRUE.
The attributes with the green circle are marked as critical by the manufacturers, and they also significantly determine the hard disk health.
However, many other attributes can also indicate problems, and of course Hard Disk Sentinel examines and evaluates these too, reporting a problem if the values _really_ indicate one. This includes the attribute you mentioned and several other attributes that the manufacturers do not care about (e.g. other S.M.A.R.T. tools would never indicate problems with them, even if their change may be important).


> The best thing will be to delete the scrap RAID 0 HDS records.

HDS does not record the RAID 0 array itself; it records values for each hard disk drive individually.
These records can be removed at any time if required: while Hard Disk Sentinel is not running, delete the DISKDATA_ files for the appropriate hard disk(s) (the file names are constructed from the hard disk model, serial number and revision).
You may want to back up these files first, just to be sure.
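The backup-then-delete step above can be sketched as follows. This is a hypothetical helper, not a Hard Disk Sentinel feature: the folder locations are placeholders, and matching files by a serial-number substring is an assumption (the reply only says the names are built from model, serial number and revision). Close Hard Disk Sentinel before running anything like this.

```python
# Hypothetical sketch: back up and then delete the DISKDATA_ files of one drive.
# Assumptions: the files sit directly in data_dir, their names start with
# "DISKDATA_" and contain the drive's serial number.

import shutil
from pathlib import Path


def reset_drive_history(data_dir: Path, serial: str, backup_dir: Path) -> list[Path]:
    """Copy matching DISKDATA_ files to backup_dir, then delete the originals."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    removed = []
    for path in sorted(data_dir.glob("DISKDATA_*")):
        if path.is_file() and serial in path.name:
            shutil.copy2(path, backup_dir / path.name)  # keep a copy first
            path.unlink()                                # then remove the record
            removed.append(path)
    return removed
```

For example, `reset_drive_history(Path("<HDS data folder>"), "<serial number>", Path("<backup folder>"))`, where all three arguments are placeholders for your own system. Files of other drives are left untouched because their names do not contain that serial number.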

> If you remember I had the problem with the big gap at the SMART value graphs too.

Yes, I remember. However, as we discussed, it is not a problem: the oldest value is always kept, and after the drives have been monitored for a very long period (years), the values of the following days are replaced by the most recent values. As a result, the graph displays the oldest value and then the most recent days (years), drawn proportionally, so the discarded period shows up as a gap.
This may be changed to keep statistics for many more years.
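The retention scheme described above can be sketched as follows. This is a hypothetical illustration, not Hard Disk Sentinel's actual storage format: the class name `DailyHistory` and the window size are assumptions, and the only behaviour taken from the reply is "keep the oldest value, keep the most recent values, discard the ones in between".

```python
# Hypothetical sketch: the very first daily sample is kept forever, while only
# the most recent `capacity` samples after it are retained.

from collections import deque
from typing import Optional, Tuple


class DailyHistory:
    def __init__(self, capacity: int) -> None:
        self.oldest: Optional[Tuple[int, int]] = None  # first (day, value) ever seen
        self.recent = deque(maxlen=capacity)           # rolling window of later samples

    def add(self, day: int, value: int) -> None:
        if self.oldest is None:
            self.oldest = (day, value)
        else:
            # When the deque is full, appending silently drops its oldest entry.
            self.recent.append((day, value))

    def samples(self) -> list:
        head = [self.oldest] if self.oldest is not None else []
        return head + list(self.recent)


history = DailyHistory(capacity=3)
for day in range(10):
    history.add(day, value=day * 10)

print(history.samples())  # [(0, 0), (7, 70), (8, 80), (9, 90)]
```

Plotted on a proportional time axis, days 1 to 6 are missing between the first point and the last three, which is the kind of gap seen in the S.M.A.R.T. graphs.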

> So with the new RAID 0 array I will make a reset of those data records for the 3 RAID 0 disks. I have many other disks (backup and so mostly offline).
> They do not seem to have such data problems.

Yes, I suspect they are not used as much as the RAID 0 disks, so fewer statistics are saved about these drives: their records have not yet reached the point where the very old statistical values (except the oldest) are discarded.

> What do I have to delete to start new with the two existing disks when I add the new replacement disk?

You may not need to do anything: when you install the new/replacement hard disk (which has never been used in your system before), it automatically gets completely new records, which can't be mixed with the statistics of the previously used drives. The records of those drives are still kept, so if you prefer, you can clear their statistics by removing the related DISKDATA files.

> There was a problem with HDS and an Intel Matrix RAID driver if I remember correctly.

Yes, there were serious bugs in some of the Intel Matrix drivers: periodically they reported invalid (random or all-zero) values instead of the actual attribute values.
However, as soon as these drivers were released and the bugs detected, updated Hard Disk Sentinel versions were published immediately.
To prevent confusion, we announced everywhere (on the index page, download page, Facebook, Twitter, etc.) that important updates were available for the particular Intel drivers.

> HDS had problems to work with this new driver version for a while.
> I had this problem too since there was no HDS fix for a while.

So there was no long delay: the fixed version was released as soon as possible.

> Perhaps this is the reason for the data scrap?

I'm not really sure whether by "scrap" you mean the gap in the S.M.A.R.T. graph (which can be normal if you monitor drives for a very long time) or the underflow when using the offsets (which would need to be investigated, as I have not been able to reproduce the situation so far).

> Anyhow, thank you for your support.

Thank you too for the information and your attention.
Sorry if my answers were tedious or confusing; I just wanted to check the situation in more detail, because that way Hard Disk Sentinel can improve in every possible way.