
I am running a CUDA kernel on my A100 GPU, and I've noticed that at some points the power consumption is higher than the range given by nvidia-smi:


The picture was taken from nvtop.

Is this something I should worry about?

Hennes

2 Answers


The only thing that looks off is the temperature: it reads an unremarkable 52°C, which doesn't make sense if the power draw is truly above the maximum.

So, take your pick. Either:

  • The power draw figure is false
  • The temperature reported is false
  • nvtop is not working correctly with your GPU.

I suggest verifying the temperature with other applications. If they report the same readings, you don't need to worry. Check the CPU, GPU, and motherboard temperatures.
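For the GPU, one way to follow this suggestion is to cross-check nvtop against nvidia-smi's own query interface, which reports power draw, power limit and temperature side by side. A minimal sketch, assuming Python 3 and that you run the `nvidia-smi` command shown in the comment yourself; the helper names and the sample line are made up for illustration and are not readings from the question.

```python
import csv
import io

# nvidia-smi query you could run yourself (e.g. via subprocess); this sketch
# only parses its text output and does not call nvidia-smi.
COMMAND = [
    "nvidia-smi",
    "--query-gpu=power.draw,power.limit,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def _to_float(field):
    """Convert one CSV field to float; non-numeric fields like '[N/A]' become None."""
    try:
        return float(field.strip())
    except ValueError:
        return None


def parse_readings(output):
    """Parse one line per GPU into power (W), power limit (W) and temperature (C)."""
    readings = []
    for row in csv.reader(io.StringIO(output)):
        if not row:
            continue  # skip blank lines
        power, limit, temp = (_to_float(f) for f in row)
        readings.append({"power_w": power, "limit_w": limit, "temp_c": temp})
    return readings


def over_limit(readings):
    """Indices of GPUs whose momentary draw is above the reported limit."""
    return [
        i for i, r in enumerate(readings)
        if r["power_w"] is not None and r["limit_w"] is not None
        and r["power_w"] > r["limit_w"]
    ]


# Hypothetical sample line in the format requested above.
print(over_limit(parse_readings("412.35, 400.00, 52\n")))  # prints [0]
```

If nvidia-smi agrees with nvtop on both the power and the temperature, the readings are probably genuine rather than a tool bug.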


harrymc

The power draw of a GPU is uneven: it has spikes and dips. A card's specified power draw is meant to be read as a "rolling average over one second", during which the actual draw can fluctuate above and below that value. This is one of the reasons why, in a GPU-heavy rig, the recommended PSU rating is well above the sum of the components' specified power draw.
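To illustrate the rolling-average point, here is a minimal sketch with a made-up trace: a hypothetical 400 W limit, one sample every 0.1 s, a 380 W baseline and a 450 W spike on every fifth sample. The one-second window follows the answer's description; this is not how the driver itself computes its figure, and `rolling_average` is a hypothetical helper.

```python
from collections import deque

# Hypothetical numbers for illustration only (not from the question).
LIMIT_W = 400.0
TRACE = [(i / 10, 450.0 if i % 5 == 4 else 380.0) for i in range(20)]


def rolling_average(samples, window_s=1.0):
    """Trailing-window mean of (time_s, watts) samples.

    For each sample, averages every sample whose timestamp falls in
    (t - window_s, t], so the current sample is always included.
    """
    window = deque()
    total = 0.0
    averages = []
    for t, watts in samples:
        window.append((t, watts))
        total += watts
        # Evict samples that have slid out of the trailing window.
        while window[0][0] <= t - window_s:
            _, old = window.popleft()
            total -= old
        averages.append(total / len(window))
    return averages


peak = max(w for _, w in TRACE)
worst_avg = max(rolling_average(TRACE))
print(f"peak momentary draw: {peak:.0f} W, limit {LIMIT_W:.0f} W")
print(f"highest 1 s average: {worst_avg:.1f} W")
```

Every spike exceeds the limit, yet no one-second average does, which is the distinction the answer draws between momentary and specified draw.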

nvidia-smi and similar tools report not the rolling average but the momentary power draw, which can of course exceed the specified value. If you sample your GPU's power draw at random times over a statistically meaningful number of readings, the average of those readings is very likely to be close to the spec.
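The sampling argument can be checked with a small simulation. A minimal sketch, assuming a hypothetical repeating draw of 380, 380, 380, 380, 450 W (mean 394 W) against a hypothetical 400 W limit; the numbers are invented, and random picks from the pattern stand in for readings taken at random moments.

```python
import random
import statistics

# Hypothetical numbers for illustration only (not from the question).
LIMIT_W = 400.0
PATTERN_W = [380.0, 380.0, 380.0, 380.0, 450.0]  # mean 394 W


def sample_readings(n, seed=0):
    """Take n momentary readings at random points in the repeating pattern."""
    rng = random.Random(seed)
    return [rng.choice(PATTERN_W) for _ in range(n)]


readings = sample_readings(2000)
share_over = sum(w > LIMIT_W for w in readings) / len(readings)
print(f"single readings above limit: {share_over:.0%}")
print(f"mean of readings: {statistics.fmean(readings):.1f} W (true mean 394 W)")
```

Roughly one reading in five lands on a spike and shows more than the limit, while the mean of all readings stays near 394 W, below the limit.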

Eugen Rieck