Best procedure if damaged sectors are found with error 1117?

BlueDragon
Posts: 15
Joined: 2014.09.17. 16:51

Post by BlueDragon »

Hi,

Starting this new thread because I would be glad to have your expert opinion about this problem. :)

This HDD is a WD Green drive that comes from a MyBook enclosure which I had to crack open a couple of days ago. This MyBook, which originally had a USB + eSATA + FireWire interface, has had a couple of incidents in its history. I bought it around 2008, and last year I had the first real trouble / failure. After the big shock of thinking I had lost 1TB of data (only very partially backed up), I tried all I could to find out what exactly had happened. It took me quite a while of trial and error before I came to the conclusion that the power adapter was actually the culprit. After first receiving a "new" one that was also defective (unstable), I finally got the drive working correctly again. This incident is actually what brought me to HDS. Here is the result of the read surface scan at that time (probably over USB):
20130516_R_WD_My_Book_WD-WCAU42174427_01.01A01-surface-full.jpg
This one is from one day later, I assume after I exchanged the AC adapter and probably (I'm not sure anymore) after I ran a reinitialise surface test:
WD_My_Book_WD-WCAU42174427_01.01A01-surface3.jpg
Around 1-2 months ago I started to have strange failures again (HDD suddenly not available anymore, then OK again after rebooting or disconnecting / reconnecting) and finally nothing worked anymore. At some point I tried all the various interfaces and all were dead. So again a big shock at first (as usual I only had partial / older backups because there was not enough disk space left). Then I took courage and cracked open the black box, since I assumed the disk inside could still be OK and only the poor-quality interface board was dead. Once I finally had a new drive (new space), I tried to make a full partition backup with ShadowProtect. After approx. 9 hours I ended up with an error stating "a device attached to the system is not functioning". So goodbye to the first 700MB image already copied over... (snapshot destroyed) :(

The extended self-test ends with an error after several hours, and the read-only surface test (still running) looks like this so far:
WDC_WD10EAVS-00D7B0_WD-WCAU42174427_01.01A01-surface.jpg
What is your opinion?
1) Has one of the disk's heads been damaged (massive raw read errors and around 300 pending sectors so far, see SMART details below)? What can explain these regular patterns, which seem to be recreated with time? Was the faulty (unstable) AC power adapter the first cause?
2) How to best proceed now, since I cannot even make a full backup? I could try to simply copy over the files directory by directory, but that might take a lot of time, and even that would not ensure that the data is not already partially corrupted. (I am thinking of trying to use a Linux tool like dd or something like that I read about recently.) Of course I could also try to run a read-write-read test, but I'm a bit afraid of destroying more data or maybe simply ruining the disk for good (that already happened to me once while I was trying to use SpinRite...)
3) I assume that since there are no bad sectors it might still be possible to rescue all the data?
Here is the latest smartctl SMART status:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   195   195   051    Pre-fail  Always       -       100984
  3 Spin_Up_Time            0x0027   159   149   021    Pre-fail  Always       -       7050
  4 Start_Stop_Count        0x0032   087   087   000    Old_age   Always       -       13011
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   037   037   000    Old_age   Always       -       45995
 10 Spin_Retry_Count        0x0032   100   100   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       257
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       49
193 Load_Cycle_Count        0x0032   196   196   000    Old_age   Always       -       13011
194 Temperature_Celsius     0x0022   120   089   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   199   199   000    Old_age   Always       -       278
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1023
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0
Thanks again a lot for your advice!!
Last edited by BlueDragon on 2014.10.01. 04:18, edited 1 time in total.
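
For readers who want to check such a listing mechanically, here is a minimal Python sketch that parses a few rows of the smartctl table above and prints the counters with a non-zero raw value. The watch list (IDs 5, 197, 198, 199) and the function names are assumptions made for this illustration, not an official rule from smartmontools or Hard Disk Sentinel.

```python
# Minimal sketch: parse smartctl-style attribute rows and flag a few counters.
# The watch list below is an assumption for illustration, not an official rule.
SMARTCTL_ROWS = """\
  1 Raw_Read_Error_Rate     0x002f   195   195   051    Pre-fail  Always       -       100984
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   199   199   000    Old_age   Always       -       278
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1023
"""

WATCHED = {5, 197, 198, 199}  # hypothetical watch list


def parse_rows(text):
    """Return {attribute_id: (name, raw_value)} for each table row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip header or malformed lines
        attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def flag_nonzero(attrs):
    """List watched attributes whose raw counter is above zero."""
    return sorted(
        (attr_id, attrs[attr_id][0], attrs[attr_id][1])
        for attr_id in WATCHED
        if attr_id in attrs and attrs[attr_id][1] > 0
    )


attrs = parse_rows(SMARTCTL_ROWS)
for attr_id, name, raw in flag_nonzero(attrs):
    print(f"{attr_id:>3} {name}: raw={raw}")
```

With the values posted here, only the pending sector count and the CRC error count are flagged; the reallocated and offline uncorrectable counters are still zero.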
hdsentinel
Site Admin
Posts: 3128
Joined: 2008.07.27. 17:00
Location: Hungary

Re: Best procedure if damaged sectors are found with error 1117?

Post by hdsentinel »

Thanks for the images.
Please use Report menu -> Send test report to developer option, as this way it is possible to check the complete status of the hard disk.
The list of attributes (especially if copied from a different tool) is not really useful to determine the problems and the real situation.
Alternatively, in Hard Disk Sentinel you may use right click -> Copy entire page to clipboard on the S.M.A.R.T. page to copy everything to the clipboard and paste it.

> 1) Has one of the disk's heads been damaged (massive raw read errors and around 300 pending sectors so
> far, see SMART details below)?
> What can explain these regular patterns, which seem to be recreated with time? Was the faulty (unstable) AC power adapter the first cause?

From the patterns, yes, I'd say the head (more precisely its internal amplifier, which senses the magnetic field) is damaged.
It may barely be able to recognise the stored data, which is why it performs numerous retries, causing the slowness.

Yes, an unstable power source is the number one cause of such issues.
The head(s) may then not be able to properly store the data on the disk surface and/or may not be able to read it back.


> 2) How to best proceed now, since I cannot even make a full backup? I could try to simply copy over the files directory by directory, but
> that might take a lot of time

Yes, this is the best way. As the hard disk may not be able to start spinning (due to the power source) and/or may not be able to read its internal administrative area on spin-up, it is possible that all data will be lost.
So the best approach is to start copying the important data file by file, to ensure that you have all important files backed up while it is still possible.


> and even that would not ensure that the data is not already partially corrupted.

Yes, this is indeed possible - but every restart / power cycle may reduce the chances of backing up the important data.
If you perform a file-by-file copy, you can verify the files, and if something is corrupted, you may lose only 1-2 files.
Currently I suspect most of the data can be read without corruption.

> (I am thinking of trying to use a Linux tool like dd or something like that I read about recently.)

Not really a good idea. If the data is somehow corrupted, it will happily copy it - and if you copy to a second hard disk and the partition table / MFT is corrupted, then even with a copy of the sector contents you cannot use any of it, because the file system cannot be read at all.


> Of course I could also try to run a read-write-read test, but I'm a bit afraid of destroying more data or maybe simply
> ruining the disk for good (that already happened to me once while I was trying to use SpinRite...)

A read-write-read test may be helpful - if the hard disk has REALLY good power (from the replaced AC adapter).
But I'd rather suggest performing the backup first and then using the Disk menu -> Surface test -> Reinitialise disk surface test to "reset" the hard disk surface to a new, empty, working state and fix possible problems.

> 3) I assume that since there are no bad sectors it might still be possible to rescue all the data?

On the image, I see damaged blocks. If you are lucky, even this data can be read after some retries (this is why the associated blocks are yellow: the data could be read after a retry), so currently there is a good chance of a successful backup.
But this chance may decrease with time and especially with power cycles, especially because of the AC adapter.

Re: Best procedure if damaged sectors are found with error 1117?

Post by BlueDragon »

> The list of attributes (especially if copied from a different tool) is not really useful to determine the problems and the real situation.
> Alternatively, in Hard Disk Sentinel you may use right click -> Copy entire page to clipboard on the S.M.A.R.T. page to copy everything to the clipboard and paste it.

Well I had the impression that with this all important attributes are there but if you prefer, here is the copy of the HDS presentation of the SMART data:

1,Raw Read Error Rate,51,195,195,OK,101013,0,Enabled
3,Spin Up Time,21,159,149,OK,7033,0,Enabled
4,Start/Stop Count,0,87,87,OK (Always passing),13015,0,Enabled
5,Reallocated Sectors Count,140,200,200,OK,0,0,Enabled
7,Seek Error Rate,51,200,200,OK,0,0,Enabled
9,Power On Time Count,0,37,37,OK (Always passing),46008,0,Enabled
10,Spin Retry Count,51,100,100,OK,0,0,Enabled
11,Drive Calibration Retry Count,51,100,100,OK,0,0,Enabled
12,Drive Power Cycle Count,0,100,100,OK (Always passing),257,0,Enabled
192,Power off Retract Cycle Count,0,200,200,OK (Always passing),49,0,Enabled
193,Load/Unload Cycle Count,0,196,196,OK (Always passing),13015,0,Enabled
194,Disk Temperature,0,120,89,OK (Always passing),30,0,Enabled
196,Reallocation Event Count,0,200,200,OK (Always passing),0,0,Enabled
197,Current Pending Sector Count,0,199,199,OK (Always passing),278,0,Enabled
198,Off-Line Uncorrectable Sector Count,0,200,200,OK (Always passing),0,0,Enabled
199,Ultra ATA CRC Error Count,0,200,200,OK (Always passing),1023,0,Enabled
200,Write Error Rate,51,200,200,OK,0,0,Enabled
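
To see what actually changed between the two listings in this thread, the raw values can be compared with a short Python sketch. The column order of the Hard Disk Sentinel rows (ID, name, threshold, value, worst, status, raw value, ...) is an assumption inferred by matching the numbers against the earlier smartctl listing; only a subset of the attributes is included, and the function name is made up for this example.

```python
import csv
import io

# Raw values from the earlier smartctl listing in this thread (subset).
EARLIER_RAW = {1: 100984, 4: 13011, 9: 45995, 193: 13011, 197: 278}

# Same attributes as pasted from Hard Disk Sentinel. The column order
# (ID, name, threshold, value, worst, status, raw, ...) is an assumption
# inferred by matching numbers against the smartctl listing.
HDS_ROWS = """\
1,Raw Read Error Rate,51,195,195,OK,101013,0,Enabled
4,Start/Stop Count,0,87,87,OK (Always passing),13015,0,Enabled
9,Power On Time Count,0,37,37,OK (Always passing),46008,0,Enabled
193,Load/Unload Cycle Count,0,196,196,OK (Always passing),13015,0,Enabled
197,Current Pending Sector Count,0,199,199,OK (Always passing),278,0,Enabled
"""


def raw_deltas(earlier, hds_text):
    """Return {attribute_id: later_raw - earlier_raw} for shared attributes."""
    deltas = {}
    for row in csv.reader(io.StringIO(hds_text)):
        if not row:
            continue  # ignore empty lines
        attr_id, raw = int(row[0]), int(row[6])
        if attr_id in earlier:
            deltas[attr_id] = raw - earlier[attr_id]
    return deltas


deltas = raw_deltas(EARLIER_RAW, HDS_ROWS)
for attr_id, delta in deltas.items():
    print(f"attribute {attr_id}: raw changed by {delta:+d}")
```

Between the two listings the pending sector count stayed at 278, while the power-on counter advanced by 13 and the raw read error counter by 29.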
> Yes, an unstable power source is the number one cause of such issues.
> The head(s) may then not be able to properly store the data on the disk surface and/or may not be able to read it back.

Can it be that the amplifier did not write the data correctly on the platters because of too weak / unstable power supply at the time of writing the sectors? In that case simply rewriting them freshly (for instance by wiping out all with zeros or by reinitialising with HDS) should make the disk look like new?!

> As the hard disk may not be able to start spinning (due to the power source) and/or may not be able to read its internal administrative area on spin-up, it is possible that all data will be lost.

Maybe it was not very clear out of my explanations above but now the drive is directly connected to the mainboard (SATA) and is getting the power also from the board, hence it should be very stable by now.

> Not really a good idea. If the data is somehow corrupted, it will happily copy it - and if you copy to a second hard disk and the partition table / MFT is corrupted, then even with a copy of the sector contents you cannot use any of it, because the file system cannot be read at all.

Actually I made a mistake, instead of writing ddrescue I wrote dd (a Linux command). Here is a short explanation about ddrescue if you are interested: http://www.gnu.org/software/ddrescue/ddrescue.html So actually it should be a good tool for exactly the type of problem I'm having with this disk.

> A read-write-read test may be helpful - if the hard disk has REALLY good power (from the replaced AC adapter).
> But I'd rather suggest performing the backup first and then using the Disk menu -> Surface test -> Reinitialise disk surface test to "reset" the hard disk surface to a new, empty, working state and fix possible problems.

If my assumption above is correct then re-writing the sectors should really "magically" improve the disk status.

> On the image, I see damaged blocks. If you are lucky, even this data can be read after some retries (this is why the associated blocks are yellow: the data could be read after a retry), so currently there is a good chance of a successful backup.
> But this chance may decrease with time and especially with power cycles, especially because of the AC adapter.

Since now the power supply should really not be a problem anymore, power cycles should not affect the drive I think (or I hope at least). :)

In any case I will start to first selectively (manually) copy the files over to a new disk. Then probably I'm going to try this ddrescue software (might be helpful in a future similar scenario, especially because these external HDDs are really more subject to power supply problems as well as to rather high temperatures). After that I probably will try to have only the bad sectors be read-written-read and finally I will reinitialise the whole surface.

Thank you again very much for your advice and excellent support! I will strongly recommend your software to the people around me.

Re: Best procedure if damaged sectors are found with error 1117?

Post by hdsentinel »

> Well I had the impression that with this all important attributes are there but if you prefer, here is the copy of the HDS presentation of the SMART data:

Thanks, but not really.

I prefer to use Report menu -> Send test report to developer option, that's why I always recommend that.
It is a much better, easier and faster way to send diagnostic information about any kind of hard disk, as it then becomes possible to quickly check the complete model ID, firmware version, S.M.A.R.T. data (which is only a part of the picture), the disk controller, its driver version and so on.
These together always give more information about the actual situation and upon any troubles, it is much easier to diagnose the situation.

> Can it be that the amplifier did not write the data correctly on the platters because of too weak / unstable power
> supply at the time of writing the sectors? In that case simply rewriting them freshly (for instance by
> wiping out all with zeros or by reinitialising with HDS) should make the disk look like new?!

Yes, in theory this should happen - provided the hard disk now runs from a proper power source, which supplies adequate power for its operation.

> Maybe it was not very clear out of my explanations above but now the drive is directly connected to the mainboard
> (SATA) and is getting the power also from the board, hence it should be very stable by now.

Thanks, I see. I thought you were still using it with the AC adapter, but if it gets power from the PSU, then it should be fine.

> Actually I made a mistake, instead of writing ddrescue I wrote dd (a Linux command).

Yes. This is what I wrote: if the data (including the administrative areas, like the partition table / MFT / directory structure) is corrupted, then even an exact copy of the disk may be unusable.
This is why copying the files is preferred.

> If my assumption above is correct then re-writing the sectors should really "magically" improve the disk status.

Yes, this is why I suggested to perform that test with Hard Disk Sentinel to improve the situation, the usability of the hard disk in general ;)

> In any case I will start to first selectively (manually) copy the files over to a new disk.
> Then probably I'm going to try this ddrescue software (might be helpful in a future similar
> scenario, especially because these external HDDs are really more subject to power supply
> problems as well as to rather high temperatures). After that I probably will try to have only
> the bad sectors be read-written-read and finally I will reinitialise the whole surface.

This is perfect I think.
Personally then I'd put some data on the hard disk, and after some time (for example one week normal use and one week "rest" powered OFF if possible) I'd try to perform the read test again in Hard Disk Sentinel, just to verify if the problems are really fixed and the complete surface can be now used correctly.

Re: Best procedure if damaged sectors are found with error 1117?

Post by BlueDragon »

I just wanted to add the next part of the story for anybody facing the same type of hdd failure like I did.

First let me quote some information of the ddrescue presentation from here: http://www.gnu.org/software/ddrescue/ma ... -structure
> GNU ddrescue is not a derivative of dd, nor is related to dd in any way except in that both can be used for copying data from one device to another. The key difference is that ddrescue uses a sophisticated algorithm to copy data from failing drives causing them as little additional damage as possible.
>
> Ddrescue manages efficiently the status of the rescue in progress and tries to rescue the good parts first, scheduling reads inside bad (or slow) areas for later. This maximizes the amount of data that can be finally recovered from a failing drive.
>
> The standard dd utility can be used to save data from a failing drive, but it reads the data sequentially, which may wear out the drive without rescuing anything if the errors are at the beginning of the drive.
>
> Other programs read the data sequentially but switch to small size reads when they find errors. This is a bad idea because it means spending more time at error areas, damaging the surface, the heads and the drive mechanics, instead of getting out of them as fast as possible. This behavior reduces the chances of rescuing the remaining good data.
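
The "good parts first, bad areas later" idea from the quoted description can be illustrated with a toy Python model. This is only a simplified sketch and not ddrescue's real algorithm: the `FlakyDisk` class, the chunk size and the bad block numbers are all made up for the illustration.

```python
# Toy model of a "copy the easy parts first" rescue, loosely following the
# quoted description. It is NOT ddrescue's real algorithm.

class FlakyDisk:
    """Hypothetical disk: reading any range that touches a bad block fails."""

    def __init__(self, n_blocks, bad_blocks):
        self.n_blocks = n_blocks
        self.bad = set(bad_blocks)
        self.failed_reads = 0

    def read(self, start, count):
        if any(b in self.bad for b in range(start, start + count)):
            self.failed_reads += 1
            raise IOError(f"read error in blocks {start}..{start + count - 1}")
        return [f"data{b}" for b in range(start, start + count)]


def rescue(disk, chunk=8):
    image = {}     # block number -> recovered data
    deferred = []  # chunks that failed in the first pass
    # Pass 1: large reads; on error, note the chunk and move on at once.
    for start in range(0, disk.n_blocks, chunk):
        count = min(chunk, disk.n_blocks - start)
        try:
            for i, data in enumerate(disk.read(start, count)):
                image[start + i] = data
        except IOError:
            deferred.append((start, count))
    # Pass 2: only now go back into the problem areas, block by block.
    unreadable = []
    for start, count in deferred:
        for block in range(start, start + count):
            try:
                image[block] = disk.read(block, 1)[0]
            except IOError:
                unreadable.append(block)
    return image, unreadable


disk = FlakyDisk(n_blocks=64, bad_blocks=[3, 40, 41])
image, unreadable = rescue(disk)
print(f"recovered {len(image)} of {disk.n_blocks} blocks, unreadable: {unreadable}")
```

In this model all healthy chunks are safely copied before a single slow, small read is issued inside a problem area, which is the point the quoted text makes against sequential copying.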
It took me quite some time to find the proper way to use this free tool (being rather a Linux beginner), but I can only recommend it in cases like mine. With SystemRescueCD from here http://sourceforge.net/p/systemrescuecd/wiki/Home/ I managed to mount the defective drive and to copy an image to another (NTFS) partition. Then I mounted this image with ImDisk Toolkit from here http://www.ltr-data.se/opencode.html/ and could chkdsk it and copy whatever was possible. Actually I even mounted this image as if it was the defective drive (letter) and thus I could use HDS to re-verify and later re-initialize the surface of the defective HDD.
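
The exact ddrescue command used is not given in this post. As a hedged sketch, the small Python helper below only builds (and prints, without running) a two-pass invocation in the style of the examples in the ddrescue manual: a quick first pass with `-n` (no scraping), then a retry pass with `-d` (direct disc access) and `-r3` (three retry passes). The device name `/dev/sdX1`, the file paths and the helper name are placeholders invented for this example.

```python
import shlex


def ddrescue_passes(source, image, mapfile, retries=3):
    """Build two ddrescue invocations: a quick first pass, then retries."""
    # -n: skip the scraping phase, so the easy areas are copied first
    first = ["ddrescue", "-n", source, image, mapfile]
    # -d: direct disc access for the input, -rN: number of retry passes
    second = ["ddrescue", "-d", f"-r{retries}", source, image, mapfile]
    return [first, second]


# /dev/sdX1 and the /mnt/backup paths are placeholders, not the poster's values.
commands = ddrescue_passes(
    "/dev/sdX1", "/mnt/backup/wd10eavs.img", "/mnt/backup/wd10eavs.map"
)
for cmd in commands:
    print(shlex.join(cmd))
```

Both passes share the same mapfile, so the second run can resume where the first one stopped and only revisit the areas that were not yet rescued.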

Interestingly, after I had copied the whole partition to the image file with the help of ddrescue, the defective HDD surface was in a better state than before:
WDC_WD10EAVS-00D7B0_WD-WCAU42174427_01.01A01-surface-2.jpg
After my first read test with HDS I wrote down the damaged sectors. So I had HDS run a read-only scan again, selectively on these damaged sectors. To my surprise they were good again. :D Then I started a full surface read-only scan again, and the overall image was much better than the one above, with no damaged or bad sectors left at all. Finally I started a thorough surface re-initialization and until now it seems to work very well. When it is finished I will scan again with read only, to get the reading speed and to check if the disk surface seems good again.

I wish HDS could also tell what file is touched by a damaged or bad sector seen after a read only scan. That would be a very nice additional feature! There seems to be a way to do that with another little Linux tool http://sourceforge.net/projects/ddrutility/ working with the logfiles from ddrescue, but I have not tried it so far (and again it's not plain easy to use).
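
As a first step towards that, the bad areas can be pulled out of a ddrescue logfile (mapfile) with a few lines of Python. This sketch assumes the mapfile layout as described in the ddrescue manual: comment lines starting with `#`, one status line, then `pos size status` lines in hexadecimal, where `-` marks a bad area. The sample content and the 512-byte sector size are made up for the illustration, and mapping a sector to a file would still need a file-system-aware tool.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

# Made-up sample; a real mapfile is written by ddrescue itself.
SAMPLE_MAPFILE = """\
# Mapfile (made-up sample for illustration)
# current_pos  current_status  current_pass
0x00120000     +               1
#      pos        size  status
0x00000000  0x00117000  +
0x00117000  0x00000400  -
0x00117400  0x00008C00  +
"""


def bad_sector_ranges(mapfile_text, sector_size=SECTOR_SIZE):
    """Return (first_sector, last_sector) for every block marked '-' (bad)."""
    lines = [
        line for line in mapfile_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    ranges = []
    for line in lines[1:]:  # lines[0] is the current-position status line
        pos, size, status = line.split()[:3]
        if status == "-":
            start, length = int(pos, 16), int(size, 16)
            ranges.append((start // sector_size, (start + length - 1) // sector_size))
    return ranges


ranges = bad_sector_ranges(SAMPLE_MAPFILE)
for first, last in ranges:
    print(f"bad sectors {first}..{last}")
```

Note that the positions are relative to the start of whatever was imaged, so for a partition image the partition's start offset has to be added to get absolute disk sectors.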

Re: Best procedure if damaged sectors are found with error 1117?

Post by hdsentinel »

Thanks for the details and the information!

When you previously got the worse results in the read test, it could have been related to the worse power source.

Once the drive has been exercised (even just by the read test) and the power source is stable, it can read the contents stored in the underlying sectors better and faster. This is why you now get better results: slightly faster areas and sectors read with no damage (with no retries).

> I wish HDS could also tell what file is touched by a damaged or bad sector seen after a read only scan.

We constantly investigate possible new features and functions to improve Hard Disk Sentinel, for example its surface test ;)
I can confirm that this is one of the areas under research and development - and it may be available in a later version.

Re: Best procedure if damaged sectors are found with error 1117?

Post by BlueDragon »

So here I am again, sorry for the delay in my answer. :)
> Once the drive has been exercised (even just by the read test) and the power source is stable, it can read the contents stored in the underlying sectors better and faster. This is why you now get better results: slightly faster areas and sectors read with no damage (with no retries).

Actually for all the steps I did in recent times the drive was already connected directly to the mainboard power and sata connectors. Hence I'm afraid your assumption is not fitting for that particular case. Clearly the results got better after several reads and the same power source.

This being said, I really don't trust this HDD anymore. Although on the SMART level everything seems stable now (after a double re-initialization and several surface read and read + write scans), I noticed that the HDD is making some abnormal noise. It's not really a clicking sound (I know that sound, for instance when a head cannot position itself and is re-calibrating all the time) but some recurring sound which lessens when, for instance, I set HDS not to allow the drive to spin down. So I did this because the sound was getting on my nerves a bit, but I'm also sure it's not normal at all and the drive might have some (definitive) failure sooner rather than later...

Re: Best procedure if damaged sectors are found with error 1117?

Post by hdsentinel »

> Actually for all the steps I did in recent times the drive was already connected directly to the mainboard power and sata connectors.
> Hence I'm afraid your assumption is not fitting for that particular case.
> Clearly the results got better after several reads and the same power source.

Yes. But as I remember, _originally_, when the delays / fluctuations were first reported, the drive was connected to a different power supply.
And this is exactly why I wrote that now that the power source has changed and is more stable, you'll get better results.


> This being said, I really don't trust this HDD anymore.

I can completely understand and agree. The internal amplifier of the heads may have already started to have problems and degrade.
As I wrote, personally I'd worry that with time, the contents of the disk sectors (even if the drive is simply put in the cupboard) could not be read back.

The noise you hear may also occur when the disk head re-positions to the sector to perform a retry (internally, which is transparent to us and may just result in a slightly longer response time), which is also a sign of such a problem.

As this problem is really rare and not easy to detect, I'd be more than happy if you could use the Report menu -> Send test report to developer option, so I can check the current status after the tests have been performed.

Re: Best procedure if damaged sectors are found with error 1117?

Post by BlueDragon »

> Yes. But as I remember, _originally_, when the delays / fluctuations were first reported, the drive was connected to a different power supply.
> And this is exactly why I wrote that now that the power source has changed and is more stable, you'll get better results.

Hmm... maybe by reading and reading again it became a bit better. Actually there were many phases, summarised as follows:
1) HDD performing normally in the USB enclosure
2) 1st failure signs. Cause: unstable power supply (adapter)
3) After replacement of the power adapter and re-initialisation of the surface, HDD performing ok again (though with clear weaknesses on some sectors already in terms of slower performance, as can be seen on my screenshots above)
4) Around one year later failure of the SATA-USB/FW/SATA interface so I opened the enclosure and connected the HDD directly to the mainboard controller and power supply
5) Surface read scan showing a couple of damaged sectors and 1117 errors
6) Copy of the partition / disk content to another disk (using an image file) by means of ddrescue. NB: ddrescue could read everything without errors, so actually 100% rescue of the disk data
7) Read test again only for the damaged sectors of step 5). Result: 100% readable!
8) Total surface read scan again. Result: 100% readable --> hence in a better state than in step 5), which was already with the stable power supply!
9) Re-initialisation scan of the surface (maybe 2 times, not sure), several read scans
10) Performs OK so far, but with clear performance weaknesses in some sectors (always more or less the same read scan pattern) and abnormal noises. Will not use it for non-backed-up (sensitive) data anymore.

I hope with this summary the situation will be easier to comprehend. :)
> The noise you hear may also occur when the disk head re-positions to the sector to perform a retry (internally, which is transparent to us and may just result in a slightly longer response time), which is also a sign of such a problem.

Not sure about this, since it actually happens when the disk idles, not when it's actively reading. As I wrote before, this noise gets less when HDS "pings" the drive regularly to keep it from parking / sleeping.

> As this problem is really rare and not easy to detect, I'd be more than happy if you could use the Report menu -> Send test report to developer option, so I can check the current status after the tests have been performed.

Yes, I will do a read scan again in the next few days and send you a report.

Thanks again for your support!