Here's my environment:
LSI Megaraid 9260-8i
Intel RES2SV240 24 port SAS expander
LSI approved cables going between card and expander and expander to card.
For the longest time, I have been getting "Transient error detected while communicating with PD:14" errors in MegaRAID Storage Manager. I opened a support ticket with LSI after exhausting all sorts of tests. I tried removing drive 14 and drive 15 (after learning that LSI's numbering is offset by one), swapping cables, backplanes, etc. Nothing got rid of these errors. Even with PD14 and PD15 missing from my array, I continually saw these bizarre errors. LSI thought the expander might have an issue, but I didn't have a spare expander to try out. I eventually let the issue go with LSI, since they weren't much help and these were only warnings/information items ("unexpected sense" messages showed up as well).
So what does this have to do with HDSentinel? The other day I opened MegaRAID Storage Manager and noticed that not a single transient error message had been logged in the last 24 hours. These had previously been occurring every 15 seconds, and suddenly they stopped. I checked around and noticed that HDSentinel was not running. I relaunched the app, and within a few seconds "Transient error detected..." started showing up again.
I can now reproduce this easily by clicking the "Update Disk Information" button, which causes an "Unexpected sense" and eventually a "Transient error detected" message in MegaRAID Storage Manager every time I press the button. If I force quit HDS, the warnings/info items stop.
I'm not saying this is damaging anything, as I've just ignored the warnings, but I wanted to bring it up because I didn't see anything in the bug reports.
LSI Megaraid 9260-8i transient errors detected with HDS
- hdsentinel
- Site Admin
- Posts: 3115
- Joined: 2008.07.27. 17:00
- Location: Hungary
- Contact:
Re: LSI Megaraid 9260-8i transient errors detected with HDS
Thanks for your attention and excuse me for the troubles.
However, I do not think this is a bug - at least not in Hard Disk Sentinel.
As you may know, Hard Disk Sentinel enumerates all hard disks (SATA and SAS) connected to the RAID controller.
When it finds a drive, it needs to check if the drive is a SATA or SAS first, to determine how the status information should be detected.
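As a rough illustration of what such a SATA-versus-SAS check can look like (this is not Hard Disk Sentinel's actual code, which is not shown in this thread), the sketch below uses one common heuristic: SCSI-to-ATA translation layers report the vendor string "ATA" in bytes 8-15 of the standard SCSI INQUIRY data, while native SAS drives report the manufacturer's name. The function names and the fake INQUIRY buffers are hypothetical and exist only for the example.

```python
def classify_transport(inquiry: bytes) -> str:
    """Guess whether a device behind a SAS controller is SATA or SAS.

    Heuristic assumed by this sketch: a translated SATA drive reports the
    T10 vendor identification "ATA" (bytes 8-15 of standard INQUIRY data).
    """
    if len(inquiry) < 36:
        raise ValueError("standard INQUIRY data is at least 36 bytes")
    vendor = inquiry[8:16].decode("ascii", errors="replace").strip()
    return "SATA" if vendor == "ATA" else "SAS"


def make_inquiry(vendor: str, product: str, device_type: int = 0x00) -> bytes:
    """Build fake 36-byte INQUIRY data (hypothetical test input only)."""
    # Byte 0: peripheral device type; byte 4: additional length (36 - 5 = 31).
    header = bytes([device_type, 0, 0, 0, 31, 0, 0, 0])
    return (
        header
        + vendor[:8].ljust(8).encode("ascii")     # bytes 8-15: vendor
        + product[:16].ljust(16).encode("ascii")  # bytes 16-31: product
        + b"0001"                                 # bytes 32-35: revision
    )


print(classify_transport(make_inquiry("ATA", "EXAMPLE DISK")))      # SATA
print(classify_transport(make_inquiry("SEAGATE", "EXAMPLE DISK")))  # SAS
```

A real tool would obtain the INQUIRY data through the RAID controller's pass-through interface; that part is vendor-specific and deliberately left out here.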
It seems your drive at position #14 somehow works differently than the others, and this initial detection triggers this (otherwise harmless) message.
Or if you do not have drives at these positions, I suspect it may be somehow related to the enclosure itself.
Can you please use the Report menu -> Send test report to developer option? This way it is possible to check the disk drive type and the information it provides (and the "raw" information we get from the enclosure via the RAID controller).
This would help a lot to check what may trigger this message in the MegaRAID tool and how it may be possible to avoid it.
> after learning LSI offset counts by one
Yes, sometimes port numbering is really interesting and does not match the physical ports
> these were previously occurring every 15 seconds,
This sounds interesting, as Hard Disk Sentinel checks the hard disk status only once every 5 minutes (the default, which can be adjusted with the slider on the Configuration -> Advanced options page). So if you saw it every 15 seconds, then it may be something else (or a combination of things).
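The comparison above can be checked against an exported event log. The following is a minimal sketch, assuming the event timestamps have already been parsed into `datetime` objects; the timestamps below are hypothetical, and the function names and the 20% tolerance are choices made for this example only.

```python
from datetime import datetime, timedelta
from statistics import median


def median_gap_seconds(timestamps):
    """Median time between consecutive events, in seconds."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return median(gaps)


def matches_poll_interval(timestamps, poll_seconds=300, tolerance=0.2):
    """True if the median gap is within `tolerance` of the polling period."""
    gap = median_gap_seconds(timestamps)
    return abs(gap - poll_seconds) <= tolerance * poll_seconds


# Hypothetical data: ten events spaced 15 seconds apart.
start = datetime(2024, 1, 1, 12, 0, 0)
events = [start + timedelta(seconds=15 * i) for i in range(10)]

print(median_gap_seconds(events))     # 15.0
print(matches_poll_interval(events))  # False: 15 s is far from 300 s
```

If the median gap in the real log is close to 15 seconds, the default 5-minute status check alone would not explain the messages.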
> I'm not saying this is damaging anything, as I've just ignored the warnings, but wanted to bring it up because I didn't see anything in the bug reports
This actually never causes any problems. Personally, I have no idea why it's even logged.
It is as if the BIOS logged an error on every bootup for each port with no hard disk installed / available, simply because it tried to enumerate which ports have disks connected.
However, I agree that it should be avoided - if it can be avoided.
This is why the Report menu -> Send test report to developer option (preferably with the latest Hard Disk Sentinel version) would be essential to see the actual status and check how the drive(s) / device(s) may work differently than expected.
Re: LSI Megaraid 9260-8i transient errors detected with HDS
Thanks for the reply. A report is on its way, and I agree that it is currently of no real concern, as these are just warnings (though a transient error could indicate bad cables, power, etc.). I have ignored the CDB messages, as LSI themselves say to ignore those, but since it only happens with HDS running, I was thinking that maybe a command HDS uses to enumerate drive SMART status was one that the LSI controller did not recognize.
I initially thought it was enclosure as well, but I have individual drive cages (Supermicro 5-in-3) without a backplane (except the SAS expander). I removed drives 13, 14, and 15 (since errors are with PD:14) and with them removed, I still get PD:14, which leads me to the next thing:
I ran LSI's utility for gathering logs, and noticed in the logs that there were additional messages pointing to the address of the SAS expander, so perhaps it has an issue, but again, this only seems to occur with HDS running.
Regardless of this, I still trust HDS to protect the drives, and I may try swapping the expander for another (if I can get a good deal), but I just wanted to see if the developer of HDS had ideas of what might cause it, as I've been troubleshooting it for months now.
Thanks!
- hdsentinel
Re: LSI Megaraid 9260-8i transient errors detected with HDS
Thanks for the report!
I checked it and I suspect I see the "problem": on the appropriate port (14) the enclosure itself (more precisely, the SAS expander chip) is "responding" when Hard Disk Sentinel tries to access it. This happens regardless of the actual drive positions used.
Somehow this triggers the problem.
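To illustrate how a non-disk device can be told apart during enumeration, here is a minimal sketch (not Hard Disk Sentinel's actual code, and not the fix discussed in this thread). It reads the peripheral device type from the low 5 bits of byte 0 of standard SCSI INQUIRY data, where 0x00 is a direct-access block device (disk) and 0x0D is an enclosure services device. Whether the expander in this thread identifies itself this way is an assumption; the slot numbers and responses below are hypothetical.

```python
DISK = 0x00                # direct-access block device
ENCLOSURE_SERVICES = 0x0D  # enclosure services (SES) device


def peripheral_device_type(inquiry: bytes) -> int:
    """Return the peripheral device type (low 5 bits of INQUIRY byte 0)."""
    if not inquiry:
        raise ValueError("empty INQUIRY data")
    return inquiry[0] & 0x1F


def is_disk(inquiry: bytes) -> bool:
    """True only for devices that report themselves as disks."""
    return peripheral_device_type(inquiry) == DISK


# Hypothetical responses per slot: two disks and one enclosure device.
responses = {
    12: bytes([DISK]) + bytes(35),
    13: bytes([DISK]) + bytes(35),
    14: bytes([ENCLOSURE_SERVICES]) + bytes(35),
}

# Only slots that report a disk would receive disk-specific status queries.
disk_slots = [slot for slot, data in sorted(responses.items()) if is_disk(data)]
print(disk_slots)  # [12, 13]
```

Skipping non-disk devices before sending disk-specific commands is one way a monitoring tool could avoid provoking unexpected responses from an enclosure device.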
Generally, yes, it is absolutely safe and does not cause any trouble, but I agree that it may be frightening and, if possible, should be avoided.
I just sent an e-mail with some information for testing - I'm waiting to see an updated report where (hopefully) things will be better.
Thanks for pointing this out - and excuse me for the troubles.
Re: LSI Megaraid 9260-8i transient errors detected with HDS
Sorry about the late reply, and thanks again for looking into this. With the patch you sent, everything so far is working extremely well, with not a single issue to report. Thanks for the quick diagnosis and response to my bug report too. Some developers take months to reply.