Two Total HD/Data Losses Without Warning by The Software
Recently I had two severe HD problems. The first HD was in a notebook. HD Sentinel always showed perfect condition. What it did not care about was an increasing disk shift value. The result was a completely useless HD after a while - it suddenly lost all data. But HD Sentinel always showed a green icon and reported that everything was perfect - until the HD died within minutes.
Then I have a RAID 0 setup on my desktop PC with 3x1TB disks. HD Sentinel always showed perfect condition here too. It did not warn that the command timeout count was increasing, and finally the CRC error count as well. The result was that the Intel RAID got data problems - in the end all data was lost. The cause was a contact problem with a SATA cable.
Now I begin to ask myself if the concept of HD Sentinel is correct - I had two total losses without any warning. The software monitors certain values that are programmed into it. I am not able to tell the software to monitor other values and raise an alarm if they change by a certain amount. At least I could not find a setting to do this. Is it correct that I cannot set up anything like this? If so, I can end up with the same disaster again at any time, or I have to check the SMART values myself on a daily basis. But honestly - if I have to check the SMART data manually all the time, then I do not need the green icons on the desktop that show "everything is perfect" even when this statement is wrong. But perhaps I just did not find the settings to tell the software which parameters it should monitor and when to raise an alarm.
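To make the request concrete, here is a minimal sketch of an alarm on "change by a certain amount". It is not how HD Sentinel works internally; the function name, the limits and all sample numbers are made up for illustration, and the raw values are assumed to come from whatever tool reads the SMART data.

```python
def changed_attributes(previous, current, limits):
    """Return {attribute: increase} for every attribute whose raw value
    rose by at least its configured limit between two snapshots."""
    alerts = {}
    for name, limit in limits.items():
        if name in previous and name in current:
            increase = current[name] - previous[name]
            if increase >= limit:
                alerts[name] = increase
    return alerts


# Hypothetical raw values from two consecutive days (not real measurements).
yesterday = {"Command Timeout": 3, "Ultra ATA CRC Error Count": 0, "Disk Shift": 120}
today = {"Command Timeout": 41, "Ultra ATA CRC Error Count": 17, "Disk Shift": 121}

# Hypothetical per-attribute limits: alarm when the increase reaches this amount.
limits = {"Command Timeout": 10, "Ultra ATA CRC Error Count": 1, "Disk Shift": 50}

print(changed_attributes(yesterday, today, limits))
```

With these sample numbers, the command timeout and CRC error counters would trigger an alarm while the small disk shift change would not.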
- hdsentinel
- Site Admin
- Posts: 3115
- Joined: 2008.07.27. 17:00
- Location: Hungary
Re: Two Total HD/Data Losses Without Warning by The Software
Thanks for your message, and I apologise for the trouble you experienced.
I can assure you that our goal is to detect and report all kinds of problems. However, not all issues are the same: some problems and conditions are harder to detect. In this situation, yes, Hard Disk Sentinel needs to diagnose the actual status and report based on that. However, if there are no other actual problems reported (for example, no spin up / surface related / seek element related issues), a minor change of some attributes may not affect the health value, as you can see.
In this situation, the Report -> Send test report to developer option plays a VERY IMPORTANT role. These reports always help to analyse such special situations, and especially to examine some special attributes, how they change and how they are (and should be) related to failure prediction, especially for drives from different manufacturers.
These reports have made it possible to monitor and report special failures for particular drives, and this is how Hard Disk Sentinel is constantly improving, just as you can send "suspicious" files to anti-virus labs for analysis.
Disk shift is a very special attribute, and currently Hard Disk Sentinel warns only on a high number of reported issues, according to the previous reports. We are investigating it right now to refine how it could be used to detect possible problems, especially in relation to other attributes. The report(s) would help to investigate this for this particular hard disk model.
The other issue (even if it sounds weird) is not really a hard disk related problem. As you can see, the connections / cables caused the trouble. Hard Disk Sentinel surely reported the cable/connection issues (giving the opportunity to check cables/connections) - however, these are related to the operating environment of the drive. Dropping the health of the drive (which works correctly, without problems) would not be a good idea in this case.
Cable/connection issues caused trouble many times in the IDE/ATA world. For IDE/ATA hard disks, Hard Disk Sentinel even displays an additional warning: In case of a sudden crash or reboot it is recommended to try a different, short data cable (avoid round cables, use 80 wire standard cables instead).
SATA controllers usually tolerate the situation better, but especially in RAID configurations, such connection issues can cause an unsynchronized RAID array, exactly as you experienced.
However, I agree that, considering the total number of command timeouts and transfer errors, a more noticeable warning could perhaps be displayed.
I'm sorry again for the issues you experienced. However, I can assure you that Hard Disk Sentinel is constantly improved by the new reports, which always help to examine how the status of the drive(s) changes and how special attributes (and especially the relation between them) could be used to predict possible problems.
The idea of creating custom alerts for attributes sounds interesting and it is possible that it will be added in a later version.
Re: Two Total HD/Data Losses Without Warning by The Software
Thank you for your response.
Yes, it would be great if I could set an alarm value for the SMART data that are important in my situation, like you have for the temperature for instance (but not there in the settings). It should be possible to enter such values where you can already correct the existing SMART values - like the offset for instance. There, just one more column for those alarm values would do. So everyone could control all values that seem important to him. Of course you should still provide automatic monitoring as it is now, in case the user does not understand how to handle those settings. But only if he wants it or does not care.
Since those values are different for all disks, these settings should be on the disk level. Take the temperature alarm as it is now, for instance - I have more than 10 disks. Many of them have a different temperature range because they are a different brand, older, or a different disk format - like desktop or notebook disks in external boxes. Older disks and disks in external boxes get hotter than disks that are internal and have a fan. What does a single temperature warning setting really bring? Not all disks are the same. So this alarm value should be on the disk level too. As it is now, it does not help if you have many different disks.
So my feature request would be the possibility to set alarm values on the disk level for each SMART value.
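As a rough illustration of what "alarm values on the disk level" could look like, the sketch below keeps one default limit per attribute and lets individual disks override it. The disk labels, attribute names and limits are all hypothetical examples, not settings that exist in HD Sentinel.

```python
# Hypothetical fallback limits used when a disk has no override of its own.
DEFAULT_LIMITS = {"Temperature": 50}

# Hypothetical per-disk overrides, keyed by a label the user chooses.
PER_DISK_LIMITS = {
    "internal-desktop": {"Temperature": 45},
    "external-notebook": {"Temperature": 55, "Ultra ATA CRC Error Count": 0},
}


def limit_for(disk, attribute):
    """Per-disk limit if one is set, else the default, else None (no alarm)."""
    return PER_DISK_LIMITS.get(disk, {}).get(attribute, DEFAULT_LIMITS.get(attribute))


def over_limit(disk, readings):
    """Return the attributes whose current reading exceeds the disk's limit."""
    exceeded = []
    for attribute, value in readings.items():
        limit = limit_for(disk, attribute)
        if limit is not None and value > limit:
            exceeded.append(attribute)
    return exceeded


# The same 48 degrees is an alarm for one disk and normal for another.
print(over_limit("internal-desktop", {"Temperature": 48}))
print(over_limit("external-notebook", {"Temperature": 48}))
```

Attributes without any configured limit are simply left to the automatic monitoring, which matches the idea that custom alarms should be optional.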
- hdsentinel
- Site Admin
Re: Two Total HD/Data Losses Without Warning by The Software
Thanks for the suggestion. Yes, I agree that it would add an even higher level of flexibility for controlling / managing / inspecting S.M.A.R.T. values in addition to the current features (e.g. offsets, custom temperature thresholds and so on).
The next major version (4.20) is already finalised and will be released soon - but I can assure you that a future version after that will have such a feature.
Anyway, if you have the opportunity to send report(s), those may help - to check and issue alarms "out of the box".
Re: Two Total HD/Data Losses Without Warning by The Software
Just one more feature request in connection with this problem. I am not able to see any further changes at the moment. The problem occurred some days ago (I attached the graph of just one of the problem values). But the graph shows as its last value the value from today, not the value from when I had the problem. So I am not able to see whether the values changed further or not. Because of this, a possibility to save the graph data for a certain date would be great. As it is now, I can only make a screenshot with another tool and compare it to the current SMART data. The date 19.12.12 does not help at all. I cannot see if the values got worse since I had the problem - and this was only some days ago. I just see that I have a problem today (12.01.13) - which is not correct at all, since I already fixed it. The value 2409 has remained constant since I fixed the cable problem some days ago, so there is no problem anymore - just the old value.
So with all the problems together (including those mentioned before), I could just as well use any software that shows SMART data and compare it to the screenshots of the SMART data I created before. HD Sentinel does not provide much help even with tracking a further change in the values. Somehow I am a little surprised that I am the first one to have such problems. But perhaps I did not find a setting - if so, please let me know.
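Instead of screenshots, the comparison could be done with dated snapshots. The following is only a sketch under assumptions: the file name, the helper names and the date 2013-01-08 (standing in for "some days ago") are made up, and the raw values would have to be supplied by whatever tool reads the SMART data.

```python
import json
import tempfile
from pathlib import Path


def record_snapshot(history_file, day, values):
    """Store the raw values for one day (ISO date string) in a JSON file
    and return the complete history."""
    path = Path(history_file)
    history = json.loads(path.read_text()) if path.exists() else {}
    history[day] = values
    path.write_text(json.dumps(history, indent=2, sort_keys=True))
    return history


def change_since(history, attribute, day):
    """Difference between the newest recorded value and the value on `day`."""
    newest_day = sorted(history)[-1]  # ISO dates sort chronologically as text
    return history[newest_day][attribute] - history[day][attribute]


# Demo in a temporary folder; 2409 is the value from the thread,
# the first date is a hypothetical "day of the cable fix".
with tempfile.TemporaryDirectory() as folder:
    history_file = Path(folder) / "smart_history.json"
    record_snapshot(history_file, "2013-01-08", {"Ultra ATA CRC Error Count": 2409})
    history = record_snapshot(history_file, "2013-01-12", {"Ultra ATA CRC Error Count": 2409})

print(change_since(history, "Ultra ATA CRC Error Count", "2013-01-08"))
```

A result of 0 would confirm that the counter has not moved since the chosen date, which is exactly the question the graph cannot answer here.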
- Attachments
- 2013-01-12_125115 HD Sentinel.jpg (39.88 KiB)
- hdsentinel
- Site Admin
Re: Two Total HD/Data Losses Without Warning by The Software
Did you use the "Offset" field for the "Ultra ATA CRC Error count" attribute after fixing the data cable?
As you may know, the internal error counter does not automatically reset to zero, as the hard disk itself "does not know" that the data cable may have been changed, replaced or fixed by any means.
So after fixing the problem, you would need to manually reset the error counter by using the Offset field: specify -2409 in this situation to clear the error counter to zero again.
As a result, the errors will no longer be reported in the text description, and both the current S.M.A.R.T. page and the graph at the bottom should show the new value (0 in this case), which will only increase if there is a future problem.
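The arithmetic behind the Offset field can be sketched as follows. This assumes the displayed value is simply the raw counter plus the offset, which is what the -2409 example implies; the function name is made up and this is not the actual Hard Disk Sentinel code.

```python
def displayed_value(raw, offset=0):
    """Value shown to the user: raw drive counter corrected by the offset.
    Assumption: the correction is a plain addition."""
    return raw + offset


raw_crc_errors = 2409          # counter reported by the drive after the cable fix
offset = -raw_crc_errors       # entered once in the Offset field

print(displayed_value(raw_crc_errors, offset))       # counter appears cleared
print(displayed_value(raw_crc_errors + 5, offset))   # only new errors show up
```

Because the drive keeps counting from 2409, any later increase would still be visible as a non-zero value.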
Have you used the "Offset" value to clear the error counter? If not, you can still see the previously collected number of data transfer errors on the graph, which is normal.
Again, the Report -> Send test report to developer option may be useful to check the actual situation.
Currently the number of values displayed on the graph is limited, to show the trend of changing values on different days between the current date and the first day you used the drive.
In a future version it will be possible to examine the change of an attribute in more detail by viewing more data points and exporting the complete set of data collected for any attribute.
Re: Two Total HD/Data Losses Without Warning by The Software
I used the offsets for a while. I do not know how it is in the current version, but before they were stored in the registry. My Sentinel software is on drive D. Whenever I restored the OS on the C partition, the offset values were often not correct anymore since they came from the last system backup. So I gave up using the offsets. Are they now stored in the folder where the software is, or still in the registry?
- hdsentinel
- Site Admin
Re: Two Total HD/Data Losses Without Warning by The Software
It seems the offset is not configured now - at least the graph shows the high(er) value, not the cleared 0.
By default, yes, I can confirm the offsets are stored in the registry.
However, you can at any time use the Configuration -> Advanced options -> Change folder to store statistics and settings button to specify any folder on any drive to store all settings, statistics and details in that folder. This means that the registry is also redirected to a file in that folder.
(This is especially useful if the OS and the software are on an SSD / CF card / Disk-On-Module device and writes to that device should be minimized.)
Re: Two Total HD/Data Losses Without Warning by The Software
I already have the setting so that the data is stored in the folder (I complained about the registry before, and then this feature came in a later version). But I had the impression that this does not work for the offset values. So I will check again. It will take a while to find out since I do not want to do an OS restore. But if you are sure that the offsets are stored in the folder too now, I will enter them again.
- hdsentinel
- Site Admin
Re: Two Total HD/Data Losses Without Warning by The Software
I can confirm that the offsets are also redirected if the data folder is set.
If you prefer, you may configure the data folder to save settings, and there you can see a REG.DAT file which contains everything that would otherwise be saved to the registry. If you set the offset and later manually open the REG.DAT file (for example in Notepad), you can find the configured offset there. Or if you send that file to info@hdsentinel.com, I can look it up and verify and confirm whether the setting is really saved there.
Re: Two Total HD/Data Losses Without Warning by The Software
For a while I created my own registry data rescue file. But I often forgot, and then the new version of this software came out where I could redirect the data to the folder. But I did not check the offsets anymore since re-entering them kept me busy so often (and I have many disks).
So thank you for your information. I will start to use the offset values again. I just entered them.