Disk read test failing - breaking RAID array

InquiringMind
Posts: 7
Joined: 2024.10.20. 22:03

Disk read test failing - breaking RAID array

Post by InquiringMind »

I tried running HDSentinel's Disk Surface (Read) test on a Samsung 470 series SSD which was part of a RAID-0 array controlled by an LSI (now Avago/Broadcom) MegaRAID 9260-8i. The read tests failed (even when Windows was started from a separate hard disk) and MegaRAID Storage Manager (MSM) reported that the drive under test had been taken offline, breaking the RAID array.

It was possible to bring the drive back online with MSM and no data appears to have been lost, but surely a read test shouldn't be so traumatic?

(Further debug details sent to info@hdsentinel.com - on reconsideration, this thread should perhaps have been opened in the Bugs forum so feel free to relocate it).
Attachments
Failed Read.jpg (315.87 KiB)
hdsentinel
Site Admin
Posts: 3102
Joined: 2008.07.27. 17:00
Location: Hungary

Re: Disk read test failing - breaking RAID array

Post by hdsentinel »

Seems really weird.
I have personally tested many RAID arrays on similar RAID controllers with all kinds of hard disks (both SATA and SAS) and SSDs and never encountered anything similar, and no other user has ever reported anything like it.

Generally yes, the Read test is the safest: it simply reads all sectors exposed to the OS, from the Master Boot Record through to all data sectors. It never directly causes any issue which could lead to a situation like this.
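
To illustrate what "reading all sectors" amounts to, here is a minimal Python sketch of a read-only scan. It is not Hard Disk Sentinel's actual implementation; the function name `read_scan` and the 1 MiB chunk size are assumptions chosen for illustration.

```python
def read_scan(path, chunk_size=1024 * 1024):
    """Read `path` sequentially from start to end; nothing is written.

    Returns (bytes_read, error): error is None if the end was reached,
    otherwise the OSError raised by the first failing read.
    Errors from open() itself are not caught and propagate to the caller.
    """
    bytes_read = 0
    # buffering=0 gives unbuffered access, so every read() goes to the OS.
    with open(path, "rb", buffering=0) as dev:
        while True:
            try:
                data = dev.read(chunk_size)
            except OSError as exc:
                # Stop at the first failure: if the device has vanished,
                # every later read would fail in the same way.
                return bytes_read, exc
            if not data:
                # An empty result means end of file/device.
                return bytes_read, None
            bytes_read += len(data)
```

The sketch works on any ordinary file. Pointing it at a raw disk is OS-specific: on Windows that would typically mean a path of the form `\\.\PhysicalDriveN`, administrator rights and a chunk size that is a multiple of the sector size.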

According to the image, yes, I see the error message returned by the OS: "Error 2: The system cannot find the file specified", which means that the file (= the complete array in this case) was removed from the system by the RAID controller. It would be nice to know why, and what the "bug" in the RAID controller's operation is.
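
For anyone scripting their own checks, the distinction matters: Windows error code 2 (ERROR_FILE_NOT_FOUND) means the device itself is no longer visible, which is different from a sector that could not be read. A rough Python sketch of that distinction follows; the function name and the three labels are hypothetical, not anything Hard Disk Sentinel reports.

```python
import errno

def classify_read_failure(exc):
    """Return a rough label for an OSError raised during a read test."""
    # On Windows, OSError carries the native error code in .winerror
    # (2 = ERROR_FILE_NOT_FOUND). On other systems the attribute is absent.
    winerror = getattr(exc, "winerror", None)
    if winerror == 2 or exc.errno == errno.ENOENT:
        return "device gone"
    if exc.errno == errno.EIO:
        return "read error"
    return "other"
```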

I received the report file and am still checking it. I'm not sure whether the RAID controller's driver is related: maybe it somehow does not "tolerate" Hard Disk Sentinel attempting to lock the drive for exclusive use (to prevent other apps and Windows itself from accessing the partition on the RAID array). As this is the first step by default, it is the only cause I can imagine at the moment.

So maybe you can try disabling it, just as a quick test (if you are still willing):
- open Disk menu -> Surface test and select the drive and test type
- before starting the test, select the Configuration tab in this window and uncheck the "Lock drive during test (unmount volumes)" option
- then proceed with the test. On the Configuration tab, you can also use the "Limit testing to specific data blocks" option and specify (for example) first block to be tested = 5000 to test only the 2nd half of the array, just to check if there is any difference.
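
To show how a block limit like this translates into a region of the array, here is a small Python sketch. It assumes the surface is divided into 10,000 equal blocks numbered from 0 (inferred from "5000 = 2nd half" above) and a 512-byte sector size; both values, and the function name, are assumptions rather than documented behaviour.

```python
def block_range_to_bytes(total_bytes, first_block, last_block,
                         n_blocks=10000, sector_size=512):
    """Map an inclusive block range to a sector-aligned byte range [start, end)."""
    if not 0 <= first_block <= last_block < n_blocks:
        raise ValueError("block range out of bounds")
    total_sectors = total_bytes // sector_size
    # Integer arithmetic keeps both boundaries on whole sectors.
    start_sector = first_block * total_sectors // n_blocks
    end_sector = (last_block + 1) * total_sectors // n_blocks
    return start_sector * sector_size, end_sector * sector_size
```

Under these assumptions, blocks 5000 to 9999 of a 512,000,000-byte array cover bytes 256,000,000 up to 512,000,000, i.e. the second half.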

As I see from the report, you use Windows XP. While I personally like and still actively use Windows XP, the combination of the RAID controller's XP driver and this particular SSD model may be related too, as (in my experience) not all controllers have proper XP drivers prepared for SSDs.

I'll certainly try to reproduce this: build a similar array, inspect the results, verify whether the combination of Windows XP + controller + driver + SSD is somehow related, and check whether anything can be done to avoid such issues.
Thanks for bringing this to my attention, and sorry for the trouble.
InquiringMind
Posts: 7
Joined: 2024.10.20. 22:03

Re: Disk read test failing - breaking RAID array

Post by InquiringMind »

I tried again, disabling the Lock Drive option and received similar results (new screenshot attached).

The driver used by the RAID card is version 4.32.0.32 of megasas.sys, dated 17/9/2010 - not a spring chicken but the latest version I could find with WinXP support. In terms of controller SSD support, the only things I'm aware of are two firmware add-ons - Cachecade (using SSDs as a cache for HDDs) and FastPath (for better performance with SSDs), neither of which I have.

Aside from one sudden SSD failure (where it ceased to be visible even in the controller BIOS screen), I've not had any issues with this RAID setup since starting it in June 2018, other than it not handling low-power (S3) suspend-to-RAM.
Attachments
20241024-174726_R_SAMSUNG_470_Series_SSD_S0SWNEAB400617_AXM09B1Q-surface-full.jpg (650.4 KiB)