"ICRC, ABRT" errors with IRST (Intel Rapid Storage Technology, imsm): Could it be software?

Question

I have a system with two IRST RAID1 arrays: sda+sdb (2TB) and sdc+sdd (1TB), using the Linux device names.

Each pair of disks was bought in one order, i.e. they are the same disk drives of the same age.

The 2TB RAID contains the operating systems (Windows, Linux) and various data partitions, while the 1TB RAID contains some non-essential software.

The 1TB RAID is only used by Windows, while the 2TB RAID is used by both operating systems.

Now I noticed (via smartd in Linux) that sdc has an increasing error count:

smartd[2008]: Device: /dev/sdc [SAT], ATA error count increased from 628 to 651

It's the only counter that increased. Specifically, the disk (HGST HTS541010A9E680) has no read errors, no pending sectors and no reallocated sectors. The disk also passed a long self-test.

Examining the error more closely, it looks like this:

Device Error Count: 651 (device log contains only the most recent 4 errors)
...
Error 651 [2] occurred at disk power-on lifetime: 4947 hours (206 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  84 -- 51 00 11 00 00 19 0e 07 8f 09 00  Error: ICRC, ABRT at LBA = 0x190e078f = 420349839

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 20 00 28 00 00 19 0e 0e 40 40 00     00:00:57.526  READ FPDMA QUEUED
  60 00 20 00 20 00 00 19 0e 0e 80 40 00     00:00:57.526  READ FPDMA QUEUED
  60 00 20 00 18 00 00 19 0e 0c c0 40 00     00:00:57.526  READ FPDMA QUEUED
  60 00 20 00 10 00 00 19 0e 0d 00 40 00     00:00:57.526  READ FPDMA QUEUED
  60 00 20 00 08 00 00 19 0e 0d 40 40 00     00:00:57.526  READ FPDMA QUEUED

Of the four errors in the log, one other was also at LBA 420349839, while the remaining two had different LBAs. In every case the command that led to the error was READ FPDMA QUEUED.
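To double-check how the register line translates into "ICRC, ABRT" and the reported LBA, here is a minimal Python sketch. The bit names follow the usual ATA error register layout as I understand it; the helper names are made up for illustration, and bit 0 is labeled generically because its meaning has changed between ATA revisions.

```python
# Bit masks of the ATA error register, most significant bit first.
# Bit 0 is labeled generically (its meaning differs between ATA revisions).
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error during the transfer
    0x40: "UNC",   # uncorrectable data error
    0x20: "MC",    # media changed
    0x10: "IDNF",  # ID not found
    0x08: "MCR",   # media change request
    0x04: "ABRT",  # command aborted
    0x02: "NM",    # no media
    0x01: "bit0",
}


def decode_error_register(er):
    """Return the names of all bits set in the error register value."""
    return [name for mask, name in ERROR_BITS.items() if er & mask]


def lba_from_bytes(hex_bytes):
    """Join the six LBA bytes (LBA_48 + LH LM LL, high to low) into one integer."""
    return int("".join(hex_bytes), 16)


# Register line of error 651: ER=84, LBA bytes 00 00 19 0e 07 8f
flags = decode_error_register(0x84)
lba = lba_from_bytes("00 00 19 0e 07 8f".split())
print(flags, hex(lba), lba)
```

With ER=0x84 only the ICRC and ABRT bits are set, and the six LBA bytes give 0x190e078f = 420349839, which matches the text that smartctl prints.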

Also, in Linux the transfer statistics look good (at UDMA6):

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0009  2            4  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            4  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS

Even after reading blocks at maximum speed, these counters did not increase. Originally I had suspected a bad or loose cable, or maybe radio interference.
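For repeated checks, the counter table can be parsed and filtered for non-zero CRC or R_ERR entries. This is only a sketch: the function names are hypothetical, and the sample text is copied from the output above (shortened to four rows). As far as I understand, these Phy event counters are not kept across power cycles, so zeros here only describe the current Linux session, not what happened while Windows was running.

```python
import re

# Four rows copied from the smartctl output above.
PHY_LOG = """\
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0009  2            4  Transition from drive PhyRdy to drive PhyNRdy
0x000b  2            0  CRC errors within host-to-device FIS
"""

ROW = re.compile(r"\s*(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(.*)")


def parse_phy_counters(text):
    """Map counter ID -> (value, description) for every table row found."""
    counters = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            counters[int(m.group(1), 16)] = (int(m.group(3)), m.group(4).strip())
    return counters


def nonzero_crc_related(counters):
    """Return only the CRC / R_ERR counters whose value is not zero."""
    return {
        cid: entry
        for cid, entry in counters.items()
        if entry[0] != 0 and ("CRC" in entry[1] or "R_ERR" in entry[1])
    }


counters = parse_phy_counters(PHY_LOG)
suspicious = nonzero_crc_related(counters)
print(suspicious if suspicious else "no CRC-related link errors counted")
```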

So I wonder (as many files are read by Windows from the 1TB RAID): Could this error be caused by the disk being part of a RAID1, by the Intel chipset (8086:2822 (rev 05)), or by Windows 10? Also, is there a method to map the LBA in the error message to a file on the NTFS partition on the RAID?
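The first half of such a mapping is plain arithmetic, sketched below in Python. Everything beyond the LBA itself is an assumption: that the LBA on the member disk equals the LBA on the RAID1 volume (i.e. the volume's data starts at sector 0 of the member), that logical sectors are 512 bytes, that the NTFS cluster size is 4 KiB (8 sectors), and the partition start of 2048, which is a placeholder to be replaced with the real value from the partition table.

```python
def disk_lba_to_ntfs_cluster(disk_lba, partition_start_lba, sectors_per_cluster=8):
    """Convert a disk LBA into (NTFS cluster number, sector within that cluster).

    partition_start_lba and sectors_per_cluster must come from the real
    partition table and NTFS boot sector; the defaults are assumptions.
    """
    if disk_lba < partition_start_lba:
        raise ValueError("LBA lies before the start of the partition")
    offset = disk_lba - partition_start_lba
    return offset // sectors_per_cluster, offset % sectors_per_cluster


# Hypothetical partition start (2048); the LBA is the one from error 651.
cluster, sector_in_cluster = disk_lba_to_ntfs_cluster(420349839, partition_start_lba=2048)
print(cluster, sector_in_cluster)
```

The resulting cluster number would still have to be handed to a tool that resolves NTFS clusters to files (ntfscluster from ntfs-3g looks like a candidate on the Linux side), which I have not verified against this RAID.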

The other disk in the RAID (sdd) has just one such error:

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 1
...
Error 1 [0] occurred at disk power-on lifetime: 3163 hours (131 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  84 -- 51 00 11 00 00 00 03 72 a7 00 00  Error: ICRC, ABRT at LBA = 0x000372a7 = 225959

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 00 00 00 00 03 72 78 40 00     00:00:59.573  READ FPDMA QUEUED
  60 00 20 00 08 00 00 00 03 41 60 40 00     00:00:59.564  READ FPDMA QUEUED
  60 00 80 00 00 00 00 00 03 40 a8 40 00     00:00:59.563  READ FPDMA QUEUED
  60 00 70 00 00 00 00 00 03 1c d0 40 00     00:00:59.562  READ FPDMA QUEUED
  60 00 30 00 00 00 00 00 03 1c 88 40 00     00:00:59.562  READ FPDMA QUEUED

