I'm getting regular emails from the smartd daemon about my NVMe disk:
SMART error (ErrorCount) detected on host: desk
This message was generated by the smartd daemon running on:
host name: [redacted]
DNS domain: [redacted]
The following warning/error was logged by the smartd daemon:
Device: /dev/nvme0, number of Error Log entries increased from 2519 to 2521
Device info:
KBG30ZMV256G TOSHIBA, S/N:X8OPD1PGP12P, FW:ADHA0101
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Sat Oct 7 23:38:04 2023 EDT
Another message will be sent in 24 hours if the problem persists.
I've been trying to figure this out for months without any luck. Here are the commands I have tried and their output.
smartctl -a /dev/nvme0
$ sudo smartctl -a /dev/nvme0
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-13-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: KBG30ZMV256G TOSHIBA
Serial Number: X8OPD1PGP12P
Firmware Version: ADHA0101
PCI Vendor/Subsystem ID: 0x1179
IEEE OUI Identifier: 0x00080d
Controller ID: 0
NVMe Version: 1.2.1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 256,060,514,304 [256 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 00080d 04004ad9aa
Local Time is: Sun Oct 15 17:53:35 2023 EDT
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0017): Comp Wr_Unc DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x02): Cmd_Eff_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 82 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.30W       -        -    0  0  0  0        0       0
 1 +     2.70W       -        -    1  1  1  1        0       0
 2 +     2.30W       -        -    2  2  2  2        0       0
 3 -   0.0500W       -        -    4  4  4  4     8000   32000
 4 -   0.0050W       -        -    4  4  4  4     8000   40000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -    4096       0         0
 1 +     512       0         3
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 31 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 33%
Data Units Read: 35,454,740 [18.1 TB]
Data Units Written: 70,575,255 [36.1 TB]
Host Read Commands: 306,457,518
Host Write Commands: 881,616,851
Controller Busy Time: 12,766
Power Cycles: 342
Power On Hours: 21,991
Unsafe Shutdowns: 617
Media and Data Integrity Errors: 0
Error Information Log Entries: 2,528
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 31 Celsius
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId  CmdId  Status  PELoc          LBA  NSID    VS
  0       2528     0 0x301c  0xc002  0x000            -     4     -
  1       2527     0 0x201d  0xc004  0x028            -     1     -
  2       2526     0 0x101d  0xc004  0x028            -     1     -
  3       2525     0 0x6005  0xc002  0x000            -     4     -
  4       2524     0 0x6004  0xc004  0x028            -     1     -
  5       2523     0 0x5006  0xc004  0x028            -     1     -
  6       2522     0 0x1006  0xc005  0x028            -     1     -
  7       2521     0 0x4013  0xc005  0x028            -     0     -
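The Status column above can be unpacked by hand, which may help narrow down what kind of command is being rejected. This is only a sketch under one assumption: that smartctl 7.3 prints the raw 16-bit Status Field of the NVMe error log entry, with the phase tag in bit 0, the status code in bits 8:1, the status code type in bits 11:9, the More bit at 14 and Do Not Retry at 15. The helper name decode_status is made up for illustration and is not part of smartctl or nvme-cli.

```shell
#!/bin/sh
# Unpack a Status value from the smartctl NVMe error-log table.
# ASSUMPTION: the value is the raw NVMe Status Field with the phase tag in bit 0.
# decode_status is a hypothetical helper name, not a smartctl or nvme-cli command.
decode_status() {
    raw=$(( $1 ))
    sc=$(( (raw >> 1) & 0xff ))    # Status Code, bits 8:1
    sct=$(( (raw >> 9) & 0x7 ))    # Status Code Type, bits 11:9
    more=$(( (raw >> 14) & 1 ))    # More bit, bit 14
    dnr=$(( (raw >> 15) & 1 ))     # Do Not Retry bit, bit 15
    printf 'SCT=%d SC=0x%02x M=%d DNR=%d\n' "$sct" "$sc" "$more" "$dnr"
}

# The three distinct Status values that appear in the table.
for status in 0xc002 0xc004 0xc005; do
    printf '%s: ' "$status"
    decode_status "$status"
done
```

If that assumption holds, all three values decode to status code type 0 (generic command status), with status code 0x01 (Invalid Command Opcode in the NVMe specification) for 0xc002 and 0x02 (Invalid Field in Command) for 0xc004 and 0xc005. Neither is a media error, which would be consistent with "Media and Data Integrity Errors: 0" in the health log.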
nvme error-log /dev/nvme0
nvme list
$ sudo ./nvme-cli-latest-x86_64.AppImage list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            X8OPD1PGP12P         KBG30ZMV256G TOSHIBA                     0x1        256.06 GB / 256.06 GB      512 B + 0 B      ADHA0101