StoreLibTest has stopped working

Posted: 2018.04.12. 12:01
by NNMike
Hi Guys,

I've been using HDS for a while now for disk wiping and testing. Just recently, however, we've been seeing Windows crash reports for 'StoreLibTest' when hot-plugging disks into our attached enclosures. This has not been a problem so far, because during disk installation we are only checking that each device presents correctly and is at 100% before we start the testing suite.

However, last night our batch testing was interrupted by this error, and it does appear to be related to HDS: there were 3 active 'StoreLibTest has stopped working' reports and 3 surface scan windows with incomplete progress tasks that could not be viewed. Please see the attached screenshot showing the symptom, and let me know if there is any additional information you would like.

The main HDS window also often stops responding, requiring a force quit and restart.

Re: StoreLibTest has stopped working

Posted: 2018.04.16. 13:22
by hdsentinel
Hi,

Thanks for your message and the information.

Yes, while not common, this is possible.

While there should usually be no issues, in some rare situations (depending on the particular controller and on the number and actual status of the disks) the controller may not respond, or may respond only very slowly (after minutes), because enumerating the disks can take a very long time, especially if some of the disks have a high number of problems or have completely failed.

This is what happened here: StoreLibTest is an external module of Hard Disk Sentinel, designed to communicate with some of these special SAS/SATA RAID and HBA cards and RAID solutions. When such a really problematic drive is found, the controller may not respond, so StoreLibTest may hang or crash.

When such problems occur, anything attempting to read or write the particular drive (or, in the worst case, some other drives on the same controller) appears not to respond either. This is, I'm afraid, completely normal and expected: the corresponding surface scan window may seem frozen, and Hard Disk Sentinel itself may also seem unresponsive, as it is waiting for the response of the controller (and of the external module communicating with it).
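
As a generic illustration only (this is not Hard Disk Sentinel's actual code), the sketch below shows why a program that waits on an external helper appears frozen when the helper never answers, and how a timeout bounds that wait. The names run_helper, HUNG_HELPER and HEALTHY_HELPER are hypothetical, and the "hung" helper is simply a Python child process that sleeps.

```python
import subprocess
import sys


def run_helper(cmd, timeout_s):
    """Run an external helper and return its stdout, or None if it did not finish in time."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising this exception.
        return None
    return done.stdout


# Hypothetical stand-ins: a helper stuck on a silent controller, and a healthy one.
HUNG_HELPER = [sys.executable, "-c", "import time; time.sleep(60)"]
HEALTHY_HELPER = [sys.executable, "-c", "print('ok')"]

if __name__ == "__main__":
    print(run_helper(HEALTHY_HELPER, timeout_s=10))
    print(run_helper(HUNG_HELPER, timeout_s=1))
```

Without the timeout argument, the caller would block for as long as the helper does, which matches the frozen windows described above.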

In such a situation, disconnecting the problematic drive usually helps: the external module then continues to operate and, once the surface scan windows and HDS receive a proper response, everything continues as it should.
You may also try to terminate StoreLibTest manually (if it is still active) in the Task Manager.
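
For unattended test rigs, the same manual step can be scripted. The following is a minimal sketch, assuming the module runs under the image name StoreLibTest.exe (the thread only gives the name "StoreLibTest", so the .exe suffix is an assumption). It uses the standard Windows taskkill command with its /IM (image name) and /F (force) switches; the function names are hypothetical.

```python
import subprocess


def taskkill_command(image_name, force=True):
    """Build the Windows taskkill command line for a process image name."""
    cmd = ["taskkill", "/IM", image_name]
    if force:
        cmd.append("/F")  # forcefully terminate the process
    return cmd


def terminate(image_name):
    """Windows only: run taskkill and return its exit code (0 on success)."""
    result = subprocess.run(taskkill_command(image_name), capture_output=True, text=True)
    return result.returncode
```

Calling terminate("StoreLibTest.exe") from a watchdog script would have the same effect as ending the task by hand, provided the image name assumption holds on your system.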

It may be good to isolate such drives and verify them one by one, so that hopefully they do not affect the tests and the other drives.

I suggest using the Report menu -> Send test report to developer option. Then I can check the actual situation and verify the current RAID controller (HBA) type, model and installed drivers, plus any other devices (SAS backplane, enclosure and so on).
If you have such problematic drive(s) connected, I can check their status too (detection may then take longer, as the problematic drive(s) may respond much more slowly, if they respond at all).
Such developer reports always help to check the "raw" response of the controller and to examine the detection times and any errors reported. They also always give ideas for future improvements, to make things even better, more stable and more robust.

Re: StoreLibTest has stopped working

Posted: 2018.04.19. 14:50
by NNMike
Hi HDSentinel,

After some further internal testing, the cause appears to be a combination of a bad disk within an enclosure and a specific SAS expander card (Intel RES2CV240). I originally designed a custom test rig using this card as my SAS/SATA expander to provide 16 disks to an external Dell 6Gb/s SAS HBA on a Win7 machine, which worked perfectly.

When we recently moved over to your program, we used Server 2016 as the OS for the test software, and it appears Windows does not handle this card correctly. I also can't change the drivers, as the card is no longer listed as a hardware device in Device Manager as it was under Win 7. It all works as expected; however, we observe Windows Event Viewer filling up with RAID/Port# warnings until they occur every second, hanging every disk device on the server.
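
To catch this escalation before every disk hangs, one option is to look for the point where the warnings start arriving back to back. The sketch below is only an illustration: it assumes the warning timestamps have already been exported from the Event Viewer by some means, and the function name first_storm and its thresholds are hypothetical.

```python
from datetime import datetime, timedelta


def first_storm(timestamps, max_gap=timedelta(seconds=1), run_length=5):
    """Return the start of the first run of `run_length` warnings that are each
    at most `max_gap` apart, or None if no such run exists."""
    ordered = sorted(timestamps)
    streak = 1
    streak_start = None
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev <= max_gap:
            if streak == 1:
                streak_start = prev  # first warning of a new dense run
            streak += 1
            if streak >= run_length:
                return streak_start
        else:
            streak = 1  # gap too large, start counting again
    return None
```

A monitoring script could pause new surface tests or alert an operator once first_storm returns a timestamp.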

After disconnecting the RES2CV240 card and using some other expanders within enclosures we have been able to run 45 concurrent disk surface tests without issue on this same machine.


We are still sometimes seeing these StoreLibTest crash messages when hot-unplugging a disk, but this is not affecting anything else if a test is still in progress on a different disk.

I can reconnect the other enclosure if you want a report for your own development, but it looks like a Windows/driver issue?

Re: StoreLibTest has stopped working

Posted: 2018.04.23. 09:03
by hdsentinel
Yes, generally it is possible that there is an incompatibility in the driver, or one related to the combination of the card firmware + driver + enclosure/backplane itself.
In such situations, yes, changing drivers sometimes helps.

If possible (when relatively few disks are connected), I'd suggest using the Report menu -> Send test report to developer option.
Then I can check the raw response of the controller and examine the detected information about the enclosure.
With some combinations of controller + expander + enclosure, it was possible to improve the situation and add further protection against such issues, so developer reports always help to examine such situations and possibilities.