Inconsistent Soft Error SMART reports

Post by **InquiringMind** » 2024.11.04. 01:45

Running HDS on a second system, a Dialogue Flybook, which has had its (PATA) disk replaced with a Kingston SMS200S3/240G 240 GB mSATA SSD https://www.amazon.co.uk/dp/B00JT0DSQK , using a Lindy mSATA to 2.5inch IDE SSD adapter https://www.amazon.co.uk/dp/B00TOBJVEM - so maybe a little on the unusual side.

I noticed (while doing a large deletion) that HDS was reporting a very large number of Uncorrectable Soft Reads and an equal number of Soft ECC Corrections (which kind of seems contradictory) in the SMART report. Since I didn't receive any errors I thought I would just keep an eye on things. However the next time I checked, the figure was zero.

I then kept an eye out for the next time this happened:

Attachment: SoftError-1.jpg (115.05 KiB)

And checked again later (the laptop had been suspended and resumed in the meantime):

Attachment: SoftError-2.jpg (105.92 KiB)

Any idea about this apparent inconsistency?

BTW, as a feature request, could the dates on the bottom of the graph be printed at a 45° angle, rather than vertically, to make them easier to read and allow more space for the graph itself?

Post by **hdsentinel** » 2024.11.04. 13:52

Sorry, I do not really understand the reason for posting this in "bug reports", so I moved the topic to the other forum.
I see no bug or problem here. Hard Disk Sentinel does everything as it should: it detects and reports the values/numbers/attributes as provided by the hard disk / SSD, exactly to keep a record of those values.

Of course, if the SSD reports a large number, it is detected, reported and logged. But if the SSD later provides a different value (zero in this case), I'm afraid it is normal and expected that this is also recorded and logged. The graph records and shows the latest value detected on each day, so if that number is zero, zero is what it shows, exactly as it should.

> I then kept an eye out for the next time this happened: SoftError-1.jpg
> And checked again later (the laptop had been suspended and resumed in the meantime): SoftError-2.jpg
> Any idea about this apparent inconsistency?

Not really sure why the SSD reports the high number and then zero. Probably the power cycle reset these attributes.

Can you please use the Report menu -> Send test report to developer option? I'd check and compare with reports of the same model, to see whether it may be a firmware issue of the SSD or similar. I'll also examine the behaviour of a similar model.

> BTW, as a feature request, could the dates on the bottom of the graph be printed at a 45° angle,
> rather than vertically, to make them easier to read and allow more space for the graph itself?

Thanks for the tip, will check the possibilities!

Also a tip if you prefer to examine the history in more detail: by default, Hard Disk Sentinel automatically selects the dates to be displayed, so not all of the history is shown. You can double click on the graph to toggle displaying ALL recorded dates (the last value recorded every day).
Also, after a right click, you can use "Save disk information" to export the dates/numbers to a CSV file: this can be loaded (e.g. in Excel) to examine them and even make a high-res graph if required.

Post by **InquiringMind** » 2024.11.04. 23:06

Debug report sent.

hdsentinel wrote (2024.11.04. 13:52):
> ...Hard Disk Sentinel does everything as should: detect and report the values/numbers/attributes as provided by the hard disk / SSD - exactly to keep a record of the values....As the graph records and shows the latest value detected on every day, if this number is zero - then it is reported later, exactly as should.

Isn't that the problem though - if the goal is to track and report on storage problems before they become critical, shouldn't the highest value be recorded and reported? Especially when that number is large?

This appears to occur only with high traffic (I started a fast defrag for a test earlier - it's been a couple of years since the last one) and that has racked up quite an impressive error count:

Attachment: MoreSoftErrors.png (56.94 KiB)

At this stage, I would guess this to be a bandwidth issue - the Lindy adaptor is limited to PATA data rates (UDMA100, so around 100MB/s) which I suspect the SSD can comfortably exceed. So this might be a consequence of the interface not being able to cope with the traffic (which, given current SSDs are hitting the 6Gb/s limit for SATA-6G, could happen elsewhere).
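For a rough sense of the gap between the two interfaces mentioned above, here is a back-of-the-envelope sketch. It only compares nominal interface ceilings (UDMA100 at 100 MB/s, and SATA 6Gb/s after 8b/10b line coding); it says nothing about what this particular SSD or adapter actually sustains, and the function name is just illustrative.

```python
# Nominal interface ceilings in MB/s (decimal megabytes). Not measured throughput.
UDMA100_MB_S = 100  # Ultra DMA mode 5 (ATA/100) peak transfer rate


def sata_payload_mb_s(line_rate_gbit: float) -> float:
    """Payload ceiling for a SATA link: 8b/10b coding carries 8 data bits per 10 line bits."""
    data_bits_per_s = line_rate_gbit * 1e9 * 8 / 10
    return data_bits_per_s / 8 / 1e6  # bits -> bytes -> MB


sata3 = sata_payload_mb_s(6.0)
ratio = sata3 / UDMA100_MB_S

print(f"SATA 6Gb/s payload ceiling: {sata3:.0f} MB/s")            # 600 MB/s
print(f"UDMA100 ceiling: {UDMA100_MB_S} MB/s ({ratio:.0f}x lower)")  # 6x lower
```

So the PATA side of the adapter tops out at roughly one sixth of what the SATA side could carry in principle.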

Post by **hdsentinel** » 2024.11.05. 13:53

Thanks for the report. I examined it and yes, I see the situation.
I also did some testing with a similar SSD (from the same Kingston SMS200... family, just the 120GB model for a quick test) and yes, it produces similar results. I also checked numerous reports of the same models from the last 10+ years and see a similar pattern too.

As you can see, during intensive use, the attributes 195, 201, 204 increase (all together) - then after a power cycle, they all reset back to zero. So what you see is generally normal (related to the internal SSD controller chip) and does not indicate a problem in the operation of the SSD.
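The "rise together, then reset together" pattern can be checked mechanically on a series of snapshots. The sketch below is only an illustration: the `history` values are made up (not taken from the reports in this thread), and `find_joint_resets` is a hypothetical helper, not anything Hard Disk Sentinel provides.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[int, int]  # attribute ID -> raw value


def find_joint_resets(history: List[Tuple[str, Snapshot]], attr_ids: List[int]) -> List[str]:
    """Return labels of snapshots where every watched attribute fell from nonzero to 0."""
    resets = []
    for (_, prev), (label, cur) in zip(history, history[1:]):
        if all(prev.get(a, 0) > 0 and cur.get(a, 0) == 0 for a in attr_ids):
            resets.append(label)
    return resets


# Made-up values for illustration only.
history = [
    ("before defrag", {195: 0, 201: 0, 204: 0}),
    ("during defrag", {195: 52000, 201: 52000, 204: 52000}),
    ("after resume", {195: 0, 201: 0, 204: 0}),
]

print(find_joint_resets(history, [195, 201, 204]))  # ['after resume']
```

A joint drop to zero right after a suspend/resume or power cycle fits a counter that the controller simply clears, rather than a sequence of independent error events.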

> Isn't that the problem though - if the goal is to track and report on storage problems before they become critical,

Yes, exactly as you wrote.

> shouldn't the highest value be recorded and reported? Especially when that number is large?

No, not really, for many reasons.
First of all, sometimes there is a relation between attributes (like now, when 3 attributes change together) and in many cases this relation is also important for determining the seriousness of a problem. For this, we always need a "snapshot" of all attributes.
Simply recording and showing the highest value of each attribute individually may be misleading, as the resulting picture may show a situation which never actually happened and was never actually reported by the disk drive. The S.M.A.R.T. page (and the graphs) generally record the snapshots, exactly as provided by the drive.
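A tiny sketch of why per-attribute maxima can describe a state the drive never reported. The attribute names and numbers here are invented purely for illustration.

```python
# Two hypothetical snapshots (attribute name -> raw value); names and numbers are invented.
snapshots = [
    {"attr_a": 900, "attr_b": 0},
    {"attr_a": 0, "attr_b": 700},
]

# Per-attribute maximum, taken independently across all snapshots.
per_attr_max = {k: max(s[k] for s in snapshots) for k in snapshots[0]}

print(per_attr_max)               # {'attr_a': 900, 'attr_b': 700}
print(per_attr_max in snapshots)  # False: the drive never reported this combination
```

Each snapshot shows one attribute high while the other is zero, but the "highest value" view shows both high at once, which is a different (and possibly more alarming) picture than anything that was actually recorded.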

Also, many attributes can fluctuate and this is normal. There are dozens of examples where the raw "data" field of an attribute for a disk model / family can increase (even to millions) and then drop back (Raw Read Error Rate on Seagate hard disks, for example). This is completely normal and expected - changes in these numbers alone may not indicate problems at all.

Generally, manufacturers use S.M.A.R.T. attributes completely freely, for different purposes, to reflect different kinds of information. Some of them may record the number of events (e.g. real errors) but some may simply count read/write operations, the number of startup/power cycles and so on. Some may happily increase to such high values and then reset automatically on a power cycle - all normal, no need to worry about them.

Of course there are attributes which are more important than others. These are usually marked as "Critical": if you select the attribute, you may notice "Critical" in the list of flags next to the attribute graph on the S.M.A.R.T. page in Hard Disk Sentinel, and these are shown with a green tick (or a red X, depending on status) in the list.
The attributes we are discussing now are not such critical ones, but are designed to show performance-related information; this is why the flags section shows "Performance".
According to dozens of reports, these attributes seem to change even without any adapter, so I'm sure that your adapter is not the cause.

Generally, the Health % value on the Overview page is designed as you wrote: it reflects the daily LOWEST (worst) Health % measured and recorded for each day, exactly to allow us to spot any issue which resulted in a decrease of the Health (even if an attribute later improves again, for example if weak/pending sectors are detected and then repaired).
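The two daily recording rules described in this thread (worst Health % of the day, versus the last raw attribute value of the day) can be put side by side in a small sketch. The readings are invented, the two columns are not meant to be causally related, and this is only an assumption about how such aggregation could look, not the actual Hard Disk Sentinel implementation.

```python
from collections import defaultdict

# (day, time, health_percent, raw_attribute_value); invented readings
readings = [
    ("2024-11-04", "09:00", 97, 0),
    ("2024-11-04", "14:00", 100, 48000),
    ("2024-11-04", "22:00", 100, 0),
    ("2024-11-05", "10:00", 100, 1200),
]

# Group readings by day, in chronological order within each day.
by_day = defaultdict(list)
for day, time, health, raw in sorted(readings):
    by_day[day].append((health, raw))

daily = {
    day: {
        "health_min": min(h for h, _ in rows),  # worst Health % seen that day
        "attr_last": rows[-1][1],               # last raw value seen that day
    }
    for day, rows in by_day.items()
}

print(daily["2024-11-04"])  # {'health_min': 97, 'attr_last': 0}
print(daily["2024-11-05"])  # {'health_min': 100, 'attr_last': 1200}
```

With these rules, a brief dip in Health % stays visible in the daily record, while a raw counter that spikes and resets within the same day ends the day at zero.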

But the fact alone that some (especially only performance-related) attributes change may not mean that the drive is about to fail, so there would be no reason to flag such a drive and show an issue for that.

If you prefer, you can of course perform tests to reveal any possible issue - or confirm that the drive is working correctly.
I'd surely do so - and as I see from the report, you already did: both the Disk menu -> Short self test and Extended self test show Successfully Completed status.
Probably the Disk menu -> Surface test -> Read test would show no problem and no Health % change either (though the attributes may change until the next power cycle).

Post by **InquiringMind** » 2024.11.15. 06:51

Thanks for the detailed explanation - I'm still rather concerned but if the manufacturers decide to reset a particular statistic on power-cycle, they presumably don't consider it important.

This information though would seem well worth adding to the help page at https://www.hdsentinel.com/help/en/12_smart.html
