SSD SMART errors and strange dmesg errors, is this a dying disk?

Question

I've started noticing strange errors in my dmesg log that involve my 4-month-old SSD. For example:

[    9.647535] ata7.00: exception Emask 0x10 SAct 0x7ffffbff SErr 0x300000 action 0x6 frozen
[    9.647542] ata7.00: irq_stat 0x08000000, interface fatal error
[    9.647546] ata7: SError: { Dispar BadCRC }
[    9.647551] ata7.00: failed command: READ FPDMA QUEUED
[    9.647558] ata7.00: cmd 60/b0:00:18:51:0f/03:00:07:00:00/40 tag 0 ncq 483328 in
[    9.647558]          res 40/00:18:c8:5c:0f/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
[    9.647561] ata7.00: status: { DRDY }
[    9.647564] ata7.00: failed command: READ FPDMA QUEUED
[    9.647570] ata7.00: cmd 60/00:08:c8:54:0f/04:00:07:00:00/40 tag 1 ncq 524288 in
[    9.647570]          res 40/00:18:c8:5c:0f/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
[    9.647573] ata7.00: status: { DRDY }
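The SErr value in the first log line is a bitmask, and the kernel prints the decoded names on the "SError:" line. As a rough sketch of that decoding, the table below maps bit positions to the short names libata uses; the mapping is written from memory of the libata driver and should be checked against your kernel source, and `decode_serror` is a hypothetical helper name.

```python
from __future__ import annotations

# Bit position -> short name, as I understand the libata SError table.
# Assumption: verify against drivers/ata/libata in your kernel version.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value: int) -> list[str]:
    """Return the names of all SError bits set in `value`, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


print(decode_serror(0x300000))  # ['Dispar', 'BadCRC'], matching the log above
```

Both bits describe link-level problems (a disparity error and a CRC mismatch on the SATA link), which is consistent with the "ATA bus error" text in the result lines.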

I've also noticed that my SMART values are weird:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       16
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       23

Runtime_Bad_Block seems to hold steady, but UDMA_CRC_Error_Count increases after each reboot (probably because of the Dispar/BadCRC errors above), which is worrying.
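To track whether these counters really grow between reboots, the raw values can be pulled out of saved `smartctl -A` output and compared. This is only a sketch: it assumes rows laid out like the two shown above, with a plain integer as the raw value, and `parse_smart_raw` and `increased` are hypothetical helper names.

```python
import re

# One attribute row: ID, name, hex flag, then the raw value as the last
# integer on the line. Assumes the column layout of the table above; rows
# whose raw value carries extra text (e.g. "32 (Min/Max ...)") are skipped.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s.*?\s(\d+)\s*$")


def parse_smart_raw(text):
    """Map attribute name -> raw value for every row that parses."""
    values = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            values[m.group(2)] = int(m.group(3))
    return values


def increased(before, after):
    """Return {name: (old, new)} for attributes whose raw value went up."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and after[name] > before[name]
    }


sample = """\
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       16
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       23
"""
print(parse_smart_raw(sample))
```

Saving one snapshot per boot and passing two of them to `increased` shows at a glance which counters moved.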

After some research online, I tried replacing the SATA cable, but that doesn't seem to have helped.

Once the system is up, I don't notice anything different and everything appears to work, but I can't be sure, since this is the system disk and it isn't written to very often.

I've seen this tip about disabling NCQ, but I have two other disks in there which benefit from NCQ, and there is no tip on how to disable it for that drive only.
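For reference, Linux does offer per-drive controls: writing 1 to a drive's `queue_depth` file in sysfs turns NCQ off for that drive until the next reboot, and the `libata.force` boot parameter accepts a `PORT.DEVICE:noncq` entry. The sketch below wraps both; the helper names and the `sysfs_root` parameter are hypothetical, and which `sdX` corresponds to ata7 is something you would need to confirm on your own system.

```python
from pathlib import Path


def set_queue_depth(device, depth=1, sysfs_root="/sys/block"):
    """Write `depth` to <sysfs_root>/<device>/device/queue_depth.

    A depth of 1 effectively disables NCQ for that one drive. Needs root
    on a real system; `sysfs_root` is a parameter only to ease testing.
    """
    path = Path(sysfs_root) / device / "device" / "queue_depth"
    path.write_text(f"{depth}\n")
    return path


def libata_force_noncq(port, device=0):
    """Build the kernel boot parameter that disables NCQ for one ATA device."""
    return f"libata.force={port}.{device:02d}:noncq"


print(libata_force_noncq(7))  # libata.force=7.00:noncq
```

The sysfs route is handy for a quick experiment, while the boot parameter also covers the early-boot window in which the first errors above were logged.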

Is this a dying disk? Any idea how to find out the cause?

Here are the rest of the weird dmesg lines: http://pastebin.com/HCxiPwkM

And smartctl output: http://pastebin.com/h4c4MkEb

EDIT:

This also just happened while the machine was running:

Jun 13 00:27:48 kernel: [21674.310312] ata7.00: exception Emask 0x10 SAct 0x400 SErr 0x100000 action 0x6 frozen
Jun 13 00:27:48 kernel: [21674.310317] ata7.00: irq_stat 0x08000000, interface fatal error
Jun 13 00:27:48 kernel: [21674.310320] ata7: SError: { Dispar }
Jun 13 00:27:48 kernel: [21674.310323] ata7.00: failed command: READ FPDMA QUEUED
Jun 13 00:27:48 kernel: [21674.310327] ata7.00: cmd 60/00:50:00:36:4f/01:00:00:00:00/40 tag 10 ncq 131072 in
Jun 13 00:27:48 kernel: [21674.310327]          res 40/00:50:00:36:4f/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Jun 13 00:27:48 kernel: [21674.310329] ata7.00: status: { DRDY }
Jun 13 00:27:48 kernel: [21674.310333] ata7: hard resetting link
Jun 13 00:27:49 kernel: [21674.802471] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 330)
Jun 13 00:27:49 kernel: [21674.843512] ata7.00: configured for UDMA/133
Jun 13 00:27:49 kernel: [21674.845404] ata7: EH complete

According to this link, the PSU may be the cause?

EDIT 2

I tried changing things a bit today: all of my disks used to be on the same PSU cable, and now they are spread across separate ones, but it doesn't seem to help.

Runtime_Bad_Block       18
UDMA_CRC_Error_Count    25

score 0 · Accepted Answer · answered Jun 19 '15 at 18:06

Several days after moving all my disks to the internal SATA controller, the errors have disappeared and the SMART values have not increased.

It remains to be seen whether this was a specific incompatibility between my SSD and the Marvell controller, or whether the controller has died entirely, which is probably what's going on. But that's a topic for another day.
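For anyone wanting to check which controller a port such as ata7 hangs off before moving cables, the sysfs entry for the port resolves to a path that contains the controller's PCI address. The following is a sketch under that assumption; the example path is made up for illustration, and both function names are hypothetical.

```python
import os
import re

# PCI address in domain:bus:device.function form, e.g. 0000:03:00.0
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_pci_address(resolved_path):
    """Return the last PCI address in a resolved sysfs path, or None.

    The last address is the device the ATA port is directly attached to;
    earlier ones are bridges on the way there.
    """
    matches = PCI_ADDR.findall(resolved_path)
    return matches[-1] if matches else None


def ata_port_controller(port, sysfs_class="/sys/class/ata_port"):
    """Resolve e.g. 'ata7' through sysfs and extract its controller address."""
    return controller_pci_address(os.path.realpath(os.path.join(sysfs_class, port)))


# Hypothetical resolved path, for illustration only.
example = "/sys/devices/pci0000:00/0000:00:1c.4/0000:03:00.0/ata7/ata_port/ata7"
print(controller_pci_address(example))  # 0000:03:00.0
```

The resulting address can be passed to `lspci -s` to see whether the port belongs to the add-on Marvell controller or to the chipset's own SATA controller.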
