About a year ago, I acquired a Threadripper-based desktop computer with the following hardware:
- Solid state drive: Samsung 970 EVO Plus 500GB
- Motherboard: ASRock X399M TAICHI (latest firmware)
- Graphics card: Gigabyte Radeon RX 580 GAMING 4GB
- Processor: AMD Ryzen Threadripper 2950X
- Memory: Corsair Vengeance LPX (32GB)
I've been running NixOS on this machine lately, and Arch before that. The current kernel is:
Linux quasar-nixos-tr 5.4.6 #1-NixOS SMP Sat Dec 21 10:05:23 UTC 2019 x86_64 GNU/Linux
However, this system is far from stable. It constantly locks up hard, and the only way to recover is a hard power-off. Once it has locked up, I can neither drop to a TTY nor SSH into the machine.
I'd really appreciate any hints on where to look to fix this.
Issuing a poweroff or reboot on this system also often results in a kernel panic, one of which I managed to capture:
https://i.sstatic.net/nEVKC.jpg
I've also run a memtest, which revealed no issues with my memory. Sorting through the logs has also revealed nothing so far.
Things checked so far:
- rdrand bug: my system is unaffected
I've given up on this. The 2950X is just not viable on Linux. What I did:
- I RMA'd the processor and had a stable system for a few weeks, after which the recurring crashes came back.
- I've tried changing power settings in the BIOS, to no avail.
- I finally sold the processor and got myself a 3950X. That meant quite a financial hit for me.
I now do have a stable system, and I think my current system is at least as performant as the previous TR system.