I tried running HDSentinel's Disk Surface (Read) test on a Samsung 470 series SSD which was part of a RAID-0 array controlled by an LSI (now Avago/Broadcom) MegaRAID 9260-8i. The read tests failed (even when Windows was started from a separate hard disk) and MegaRAID Storage Manager (MSM) reported that the drive under test had been taken offline, breaking the RAID array.
It was possible to bring the drive back online with MSM and no data appears to have been lost, but surely a read test shouldn't be so traumatic?
(Further debug details sent to info@hdsentinel.com - on reconsideration, this thread should perhaps have been opened in the Bugs forum so feel free to relocate it).
Disk read test failing - breaking RAID array
-
- Posts: 14
- Joined: 2024.10.20. 22:03
Disk read test failing - breaking RAID array
- Attachments
-
- Failed Read.jpg (315.87 KiB) Viewed 174 times
- hdsentinel
- Site Admin
- Posts: 3106
- Joined: 2008.07.27. 17:00
- Location: Hungary
- Contact:
Re: Disk read test failing - breaking RAID array
Seems really weird.
I have personally tested many RAID arrays on similar RAID controllers with all kinds of hard disks (both SATA and SAS) and SSDs and never encountered anything similar, and no other user has ever reported anything like it.
Generally yes, the Read test is the safest: it simply reads all sectors exported to the OS, from the Master Boot Record through all data sectors. It never directly causes any issue that could lead to a situation like this.
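To illustrate why a read-only pass should be harmless, here is a minimal sketch of a front-to-back read in Python. It is not Hard Disk Sentinel's actual code: the function name `sequential_read` and the 1 MiB chunk size are assumptions made for illustration, and the sketch is meant to be run on an ordinary file or disk image rather than on a raw device.

```python
import os


def sequential_read(path, chunk_size=1 << 20):
    """Read every byte of `path` in order.

    Returns (bytes_read, failed), where `failed` is a list of
    (offset, errno) pairs for chunks that raised an OS error.
    """
    failed = []
    total = 0
    size = os.path.getsize(path)  # works for regular files and images
    with open(path, "rb", buffering=0) as f:
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                f.seek(offset)
                total += len(f.read(want))
            except OSError as exc:
                # A failed read is only recorded; nothing is ever written.
                failed.append((offset, exc.errno))
            offset += want
    return total, failed
```

The loop only seeks and reads, so any side effect on the array would have to come from the driver, the controller or the drive itself reacting to those reads.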
According to the image, yes, I see the error message returned by the OS: "Error 2: The system cannot find the file specified", which means that the file (= the complete array in this case) was removed from the system by the RAID controller. It would be nice to know why, and what the "bug" in the RAID controller operation is.
I received the report file and am still checking it. I am not sure whether the driver of the RAID controller is related: maybe it somehow does not "tolerate" that Hard Disk Sentinel attempted to lock the drive for exclusive use (to prevent other apps and Windows itself from accessing the partition on the RAID array). This is the first step by default, and it is the only explanation I can imagine for this situation.
So maybe you can try to disable it, just for a quick test (if you are still willing):
- open Disk menu -> Surface test and select the drive and test type
- before starting the test, select the Configuration tab in this window and uncheck the "Lock drive during test (unmount volumes)" option
- then proceed with the test. On the Configuration tab you can also use the "Limit testing to specific data blocks" option and specify (for example) first block to be tested = 5000 to test only the 2nd half of the array, just to check if there is any difference.
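As a rough illustration of what limiting the test to specific blocks means, the sketch below maps a block range to byte offsets. It assumes the surface is divided into 10000 equal blocks, which is only inferred from the "5000 = second half" example above; the function name `block_range_to_bytes` is hypothetical and not part of Hard Disk Sentinel.

```python
def block_range_to_bytes(total_bytes, first_block, last_block, block_count=10000):
    """Map an inclusive block range to a (start, end) byte range, end exclusive.

    block_count=10000 is an assumption inferred from the example in the post.
    """
    if not 0 <= first_block <= last_block < block_count:
        raise ValueError("block range out of bounds")
    start = total_bytes * first_block // block_count
    end = total_bytes * (last_block + 1) // block_count
    return start, end


# Example: the second half of a 1,000,000-byte volume.
print(block_range_to_bytes(1_000_000, 5000, 9999))
```

Under that assumption, starting at block 5000 skips the first half of the logical drive, which helps to tell whether the failure depends on a particular region of the array.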
As I see from the report, you use Windows XP. While I personally like and still actively use Windows XP, maybe the combination of the XP driver of the RAID controller and this particular SSD model is related too, as (in my experience) not all controllers have proper XP drivers prepared for SSDs.
I'll surely try to reproduce this: build a similar array, inspect the results, verify whether the combination of Windows XP + controller + driver + SSD can be related, and check if there is anything to do to avoid issues.
Thanks for your attention, and sorry for the possible troubles.
-
- Posts: 14
- Joined: 2024.10.20. 22:03
Re: Disk read test failing - breaking RAID array
I tried again, disabling the Lock Drive option and received similar results (new screenshot attached).
The driver used by the RAID card is version 4.32.0.32 of megasas.sys, dated 17/9/2010 - not a spring chicken but the latest version I could find with WinXP support. In terms of controller SSD support, the only thing I'm aware of are two firmware addons - Cachecade (using SSDs as a cache for HDDs) and FastPath (for better performance with SSDs), neither of which I have.
Aside from one sudden SSD failure (where it ceased to be visible even in the controller BIOS screen), I've not had any issues with this RAID setup since starting it in June 2018, aside from it not handling low-power (S3) suspend-to-RAM.
- Attachments
-
- 20241024-174726_R_SAMSUNG_470_Series_SSD_S0SWNEAB400617_AXM09B1Q-surface-full.jpg (650.4 KiB) Viewed 159 times
-
- Posts: 14
- Joined: 2024.10.20. 22:03
Re: Disk read test failing - breaking RAID array
Did a bit of experimenting (after taking another backup). With the 9260-8i, it isn't possible to "de-RAID" disks as such, any that aren't part of a RAID array just aren't available (they don't show up under Disk Management, etc). I did manage to create a RAID-0 array with just one disk though, and tried re-running the Read test. No change - same behaviour as above.
- hdsentinel
- Site Admin
- Posts: 3106
- Joined: 2008.07.27. 17:00
- Location: Hungary
- Contact:
Re: Disk read test failing - breaking RAID array
Yesterday I finally had time to check the situation.
Yes, I noticed what you wrote: even a standalone drive needs to be configured as a RAID-0 "array" with one disk as its member.
The closest controller that I could use for testing is an Intel branded LSI 9260-4i. I installed it under Windows XP SP 3 with exactly the same driver that you use: version 4.27.1.32 (date: 6-11-2010). According to the developer report, this is what you have installed too.
I downloaded the driver from
https://www.broadcom.com/support/knowledgebase/1211161491782/megaraid-sas-9260-4i---9260-8i---9260de-8i---9261-8i-downloads
"Signed driver v4.27.1.32 and v4.27.1.64 MID_1401090_all_Windows_Signed_v4.27.1.32_and_v4.27.1.64.zip"
I'm afraid I have no Samsung 470 Series SSD at all for testing, so I used whatever SSDs I had at hand (2x Kingston to make a RAID-0 array and also a Samsung SSD to make a RAID-0 "array" as a single drive).
I tried to perform all the tests you mentioned: first the Disk menu -> Extended self test on a RAID member. The test ran for 30+ minutes and then completed without error:
Then I tried Disk menu -> Surface test -> Read test on the real array (2x SSDs in RAID-0), and everything worked as expected:
and then on the Samsung SSD too. This is a failing SSD with very low Health, so the disk surface map shows many slower blocks, but the test could still complete:
Personally I'm sad that I could not reproduce the issue, as then I could immediately begin checking and improving in all possible ways.
To be honest, I am not really sure what causes the issue on your system and why the controller puts the array offline, as generally there is no reason for that. I'm still trying with other SSDs. I am not sure whether the controller firmware and/or the SSD model itself causes the trouble. It would be nice to get some 470 SSDs for testing, but because of their age, I do not think I can get any working drives.
P.S. I used a simple SFF-8087 -> 4xSATA cable to connect the drives to the controller. Do you use some backplane/enclosure or similar? It should not be a problem, but maybe...
-
- Posts: 14
- Joined: 2024.10.20. 22:03
Re: Disk read test failing - breaking RAID array
Thanks for the follow-up.
> hdsentinel wrote: ↑2024.10.29. 15:53
> ...I installed this under Windows XP SP 3 and installed 100% same driver what you use: version 4.27.1.32 (date: 6-11-2010). According the developer report, this is what you have installed too.
Sorry, but the driver version I have installed is, as noted above, 4.32.0.32 which can be downloaded from:
https://docs.broadcom.com/docs/12349696
> hdsentinel wrote: ↑2024.10.29. 15:53
> To be honest, not really sure what can cause any issue on your system, why the controller puts the array offline, as generally there is no reason for that. I'm still trying with other possible SSDs. Not sure if the controller firmware and/or the SSD model itself would cause troubles or so. Would be nice to get some 470 SSDs for testing, but because of their age, I do not think I can get any working drives.
Aside from the driver difference, SSD type may be a factor. I tried to force HDS to do a read test on the Crucial drives but it kept listing the Samsungs (presumably since they were first in the array?) so I tried re-ordering the RAID array so the Crucials were first - no go. Dismantled the array and created a new RAID-0 with just the two Crucials (leaving the Samsungs unassigned) but the Samsungs were still listed. So I then created a second RAID-0 with the five Samsungs, and this time I could select the Crucial array. So I ran a read test on that and it worked:
I then ran a read test on the Samsung array and it failed again:
> hdsentinel wrote: ↑2024.10.29. 15:53
> Ps. I used simple SFF-8087 -> 4xSATA cable to connect the drives to the controller. Do you use some backplane/enclosure or so? Should not be problem but maybe...
Nope - I have the same cables as yourself.
So unless the driver version change makes a difference, it would seem there is an issue with the Samsung 470s.
- hdsentinel
- Site Admin
- Posts: 3106
- Joined: 2008.07.27. 17:00
- Location: Hungary
- Contact:
Re: Disk read test failing - breaking RAID array
> Sorry, but the driver version I have installed is, as noted above, 4.32.0.32 which can be downloaded from:
> https://docs.broadcom.com/docs/12349696
Hm... From the report file you sent I see (line 1229):
PCI\VEN_1000&DEV_0079&SUBSYS_92611000&REV_05\4&9784C61&0&0048
4.27.1.32 6-11-2010 LSI MegaRAID SAS 9260-8i
That's why I tried with the 4.27.1.32 driver.
I'll examine with 4.32.0.32 too.
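For anyone comparing their own report file, the sketch below splits a PCI device instance ID like the one quoted above into its fields and compares two dotted driver versions. The helper names `parse_pci_id` and `version_tuple` are hypothetical, and the split of the SUBSYS field into subsystem ID followed by subsystem vendor ID follows the usual Windows PCI identifier layout.

```python
import re

# Matches the leading hardware-ID part of a Windows PCI device instance ID.
HWID_RE = re.compile(
    r"PCI\\VEN_(?P<vendor>[0-9A-F]{4})"
    r"&DEV_(?P<device>[0-9A-F]{4})"
    r"&SUBSYS_(?P<subsys>[0-9A-F]{8})"
    r"&REV_(?P<rev>[0-9A-F]{2})",
    re.IGNORECASE,
)


def parse_pci_id(instance_id):
    """Return the vendor, device, subsystem and revision fields as hex strings."""
    m = HWID_RE.match(instance_id)
    if m is None:
        raise ValueError("not a PCI hardware ID: %r" % instance_id)
    subsys = m.group("subsys").upper()
    return {
        "vendor": m.group("vendor").upper(),
        "device": m.group("device").upper(),
        "subsystem_id": subsys[:4],      # first four hex digits
        "subsystem_vendor": subsys[4:],  # last four hex digits
        "revision": m.group("rev").upper(),
    }


def version_tuple(text):
    """Turn '4.27.1.32' into (4, 27, 1, 32) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


info = parse_pci_id(r"PCI\VEN_1000&DEV_0079&SUBSYS_92611000&REV_05\4&9784C61&0&0048")
print(info)
print(version_tuple("4.27.1.32") < version_tuple("4.32.0.32"))
```

Vendor ID 1000 is the PCI vendor ID registered to LSI. Comparing the versions as integer tuples rather than as strings avoids ordering mistakes when a component has a different number of digits.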
> I tried to force HDS to do a read test on the Crucial drives but it kept listing the Samsungs (presumably since they were first in the array?)
Generally the Surface test function tests the whole configured array, regardless of which drive is first in the array.
The purpose of RAID is exactly to prevent drives from being accessed independently. Ideally (as you can see) we have some methods to access the drives one by one, at least to check their S.M.A.R.T. status (and, if possible, to use the internal hardware short/extended self tests).
But for Surface tests, we can only test the complete array as exported to the OS, in the same way that (for example) Windows reads/writes the logical drive (the complete array) during any file operation or during format.
> created a new RAID-0 with just the two Crucials (leaving the Samsungs unassigned) but the Samsungs were still listed.
Hard Disk Sentinel automatically detects unassigned drives and may show them as if they were all part of an array, but of course the surface test would not touch the unassigned drives (exactly because they are not exported to the OS in any way). The Information page of the main window displays these drives as "unassigned" or "hot spare" or similar, to indicate that they are not part of the configuration (even if detected/listed). Sorry for the possible confusion.
> So I then created a second RAID-0 with the five Samsungs, and this time I could select the Crucial array.
> So I ran a read test on that and it worked
Thanks, good to hear.
> I then ran a read test on the Samsung array and it failed again.
Yes, then I really worry that the SSD model is the important factor.
Would it be possible to check what happens if you configure only one such Samsung 470 as RAID-0 (so a single member)?
Just to see if the issue happens then too.
I'm still trying to find where I can order at least one Samsung 470 for testing. It would be nice to know if one would be "enough" (so no more drives are required for a real RAID array). Would you offer one for sale? Then I'd surely be able to check with such a drive.
I am not sure whether anything can be done if there is a minor incompatibility between this specific model, the controller and the software, but I'd be happy to examine, reproduce and check for any possible solutions/workarounds (and also try other driver versions etc.) to investigate the situation.
Thanks so much for your attention and the time spent on the investigation!