Hi Janos,
I'm having some trouble with a WD Green drive (WD20EACS, 2TB). The first sign of failure was accumulating raw read errors, so I bought a new HDD and copied the data over (as usual, with ddrescue). Over time the current pending sector count started to increase, and so did the offline uncorrectable sector count. So I opted to reinitialise the disk surface, which took around 4 days (with reinitialising level 3). Once that was done, the reallocated sector count jumped to 290 and the reallocation event count to around 210 (I don't really understand why they are not equal, but that does not matter much). So far so good: that's what I actually wanted to happen, to force the HDD to relocate failing sectors to spare sectors.
Problem is that suddenly the current pending sector count jumped to 65535!! So I ran another long surface test, this time with write + read, so as not to have to wait 4 days again... All seemed fine (no bad sectors found, no new relocation) but the pending sector count stays at 65535. At present I'm running a reinitialisation again, but only with level 1. So far more than 50% has been done (including, I believe, the part where the problematic relocated sectors were) but nothing is changing at all in the SMART stats.
I have read that this 65535 (actually a binary number) has something to do with HDS, which created a buffer underflow condition. I believe that it has nothing to do with 65535 actual (real) sectors at all. Very likely it's due to the fact that, prior to the reinitialising scan, I used to offset the pending sector count with a negative value in order to avoid triggering unnecessary alarms. I believe that after the reinitialisation the real pending sector count dropped back to 0, and combined with the negative offset value it somehow jammed the SMART value...
What do you think? Is there a way to correct that wrong SMART stat report (besides simply offsetting)?
I know you will ask for a report... but I will only send you reports if I can check the content I send. I don't like the way you built this into HDS, so that something is simply sent to you without my being able to control what it is.
Thank you for your expert advice!
Current Pending Sector Count 65535
-
- Posts: 15
- Joined: 2014.09.17. 16:51
- Attachments
-
- hdsentinel failing hdd wd20eacs.jpg (68.88 KiB) Viewed 9647 times
- hdsentinel
- Site Admin
- Posts: 3128
- Joined: 2008.07.27. 17:00
- Location: Hungary
- Contact:
Re: Current Pending Sector Count 65535
> This being done the reallocated sector count jumped up to 290 and the reallocation event count around 210
> (don't really understand why they are not equal but that does not matter much).
In most cases, they're not equal. The reallocation event count is the number of started reallocation events, each triggered when one OR MORE sectors need to be reallocated. One reallocation event can cause several sectors to be reallocated, so the number of reallocation events can be lower than the reallocated sectors count.
On the other hand, the reallocation event counter can be higher than the reallocated sectors count: when a reallocation can't be completed (for example due to an unrecoverable error, e.g. power loss), the reallocation starts again and the event counter increases again.
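To make the two cases concrete, here is a minimal toy model in Python. It is only an illustration of the counting logic described above, not how any firmware is actually written; the `ReallocationEvent` class and `tally` function are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ReallocationEvent:
    # Hypothetical record: how many sectors this event actually remapped.
    # 0 means the attempt was interrupted (e.g. by power loss).
    sectors_remapped: int


def tally(events):
    """Return (event_count, reallocated_sector_count) for a list of events."""
    event_count = len(events)
    sector_count = sum(e.sectors_remapped for e in events)
    return event_count, sector_count


# Case 1: each event remaps several sectors, so events < sectors.
multi_sector = [ReallocationEvent(3), ReallocationEvent(2)]
print(tally(multi_sector))   # (2, 5)

# Case 2: two interrupted attempts, then one success, so events > sectors.
interrupted = [ReallocationEvent(0), ReallocationEvent(0), ReallocationEvent(1)]
print(tally(interrupted))    # (3, 1)
```

Either ordering of the two counters is therefore possible, depending on what happened on the drive.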
> So far so good, that's what I actually wanted to happen, force the hdd to relocate failing sectors to spare sectors.
Yes, this is exactly the purpose of the Reinitialise disk surface test of Hard Disk Sentinel.
> Problem is that suddenly the current pending sector count jumped to 65535!!
> I have read that this 65535 (actually a binary number) has something to do with HDS which created a buffer underflow condition.
Generally, yes: this value is -1 shown as an unsigned 16-bit number (0xFFFF), an underflow.
But I can confirm that the underflow is NOT related to Hard Disk Sentinel but to the hard disk itself: it happened "inside" the hard disk firmware, which counts the problems.
Usually such an underflow happens when the hard disk can't properly update the corresponding S.M.A.R.T. counter, or decreases the counter twice while fixing a sector.
For example, if a reallocation starts but can't be completed (for example because of the above-mentioned power loss), the hard disk may have already decreased the number of weak/pending sectors. Then, after the power loss and the restarted reallocation, the counter is decreased again while processing the same sector.
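The double decrement described above can be sketched in a few lines of Python. This is only an illustration under the assumption that the pending sector counter is an unsigned 16-bit value; the real firmware logic is not public, and `decrement` and `as_signed_16` are names made up for this example.

```python
MASK_16 = 0xFFFF  # assumption: the counter is an unsigned 16-bit value


def decrement(counter: int) -> int:
    """Decrement an unsigned 16-bit counter, wrapping around below zero."""
    return (counter - 1) & MASK_16


def as_signed_16(value: int) -> int:
    """Reinterpret an unsigned 16-bit value as a signed 16-bit integer."""
    return int.from_bytes(value.to_bytes(2, "little"), "little", signed=True)


pending = 1                    # one weak sector waiting to be fixed
pending = decrement(pending)   # sector fixed: 1 -> 0
pending = decrement(pending)   # same sector counted again: 0 -> 65535

print(pending)                 # 65535
print(as_signed_16(pending))   # -1
```

So a single extra decrement at zero is enough to produce 65535 without any real weak sectors being present.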
> I believe that it has nothing to do with actual (real) 65535 sectors at all.
Yes. The Disk menu -> Surface test -> Read test would show lots of problems if you really had such a high number of weak sectors.
> Very likely it's due to the fact that prior to the reinitialising scan I used to offset the pending sector count with a negative value,
> in order to avoid unnecessary alarms to be triggered. I believe after the reinitialisation the real pending sector count dropped
> back to 0, thus combined with the negative offset value it somehow jammed the SMART value...
I can confirm that using the "Offset" in Hard Disk Sentinel performs only a "logical" clear of errors. It does not actually rewrite the S.M.A.R.T. error counter inside the hard disk: with the Offset, you only acknowledge the problems in Hard Disk Sentinel, exactly as you wrote, to prevent displaying the error (or triggering an alarm). Because you know what happened, you confirm that you want to be notified only about possible new problems.
The underflow that happened inside the hard disk is completely different, and the Offset has no effect on it.
> What do you think? Is there a way to correct that wrong SMART stat report (besides simply offsetting)?
Generally, the further testing you started is a good idea.
Such problems can be very hard to reproduce; I have encountered them only a few times. In some of these cases simple usage (after a long time) led the hard disk to find out that there is no real weak sector, and the error counter returned to zero.
Intensive testing with Hard Disk Sentinel may speed this up.
But in some cases the counter remains at 65535 even after a long time. In such a case, the best option is to use an offset of -65535: then the counter will be shown as zero and the value will not be displayed again.
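The arithmetic behind the offset can be sketched as follows. This is a simplified illustration in Python, assuming the displayed value is simply the raw S.M.A.R.T. value plus the offset; `displayed_value` is a hypothetical helper, not part of Hard Disk Sentinel, and how the software treats results below zero is not covered here.

```python
def displayed_value(raw: int, offset: int) -> int:
    """Value shown to the user. The raw counter on the drive is not modified."""
    return raw + offset


# Normal use of an offset: acknowledge 12 known pending sectors...
offset = -12
print(displayed_value(12, offset))     # 0  (known problems hidden)
# ...and later 3 new ones appear, which are still reported.
print(displayed_value(15, offset))     # 3

# The stuck counter from this thread: raw value 65535, offset -65535.
print(displayed_value(65535, -65535))  # 0
```

The point of the sketch is that the offset lives only on the software side: the drive keeps reporting its own raw value, whatever that is.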
> I know you will ask for a report...
Yes, this always helps, especially under such interesting conditions, as they are hard to find and impossible to reproduce.
Such reports (especially about such special cases) always help improve the reporting of problems.
> but I will send you only reports if I can check the content I send.
You can check what you send at any time.
Just open the Configuration -> Send test report page; there you can create the raw developer report.
You can send it from there immediately, but if you prefer, you can save it (even edit it if you wish) and send it manually.
> I don't like this way you build in HDS so simply have something sent to you without being able to control what it is.
Originally only the above method was available, exactly to allow checking (and saving) the complete report if required.
Users asked for a simpler, easier way; that's why the other method (Report menu -> Send test report to developer) is preferred, but you can still do whatever you feel is better for you.