Pytorch says that CUDA is not available (on Ubuntu)

Question

I'm trying to run PyTorch on a laptop that I have. It's an older model, but it does have an NVIDIA graphics card. I realize it probably won't be sufficient for real machine learning, but I'm doing this so I can learn the process of getting CUDA installed.

I have followed the steps on the installation guide for Ubuntu 18.04 (my specific distribution is Xubuntu).

My graphics card is a GeForce 845M, verified by lspci | grep -i nvidia:

01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce 845M] (rev a2)
01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)

I also have gcc 7.5 installed, verified by gcc --version:

gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

And I have the correct headers installed, verified by trying to install them with sudo apt-get install linux-headers-$(uname -r):

Reading package lists... Done
Building dependency tree       
Reading state information... Done
linux-headers-4.15.0-106-generic is already the newest version (4.15.0-106.107).

I then followed the installation instructions using a local .deb for version 10.1.

Now, when I run nvidia-smi, I get:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 845M        On   | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P0    N/A /  N/A |     88MiB /  2004MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       982      G   /usr/lib/xorg/Xorg                            87MiB |
+-----------------------------------------------------------------------------+

and when I run nvcc -V, I get:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

I then followed the post-installation actions in section 6.1, so echo $PATH now looks like this:

/home/isaek/anaconda3/envs/stylegan2_pytorch/bin:/home/isaek/anaconda3/bin:/home/isaek/anaconda3/condabin:/usr/local/cuda-10.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

echo $LD_LIBRARY_PATH looks like this:

    /usr/local/cuda-10.1/lib64
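A minimal sketch for checking both variables from Python, assuming the /usr/local/cuda-10.1 prefix used above (the helper name cuda_paths_configured is hypothetical). These variables affect tools such as nvcc and the compiled CUDA samples; the pip/conda PyTorch binaries bundle their own CUDA runtime, as the accepted answer below explains.

```python
import os


def cuda_paths_configured(prefix="/usr/local/cuda-10.1", env=None):
    """Return (bin_on_path, lib_on_ld_path) for a CUDA install prefix.

    The default prefix is the one from this question (an assumption for other setups).
    """
    env = os.environ if env is None else env

    def normalized(value):
        # Normalize entries so a trailing slash does not cause a false negative
        return {os.path.normpath(entry) for entry in value.split(os.pathsep) if entry}

    bin_dir = os.path.join(prefix, "bin")
    lib_dir = os.path.join(prefix, "lib64")
    return (
        bin_dir in normalized(env.get("PATH", "")),
        lib_dir in normalized(env.get("LD_LIBRARY_PATH", "")),
    )


if __name__ == "__main__":
    on_path, on_ld = cuda_paths_configured()
    print(f"bin on PATH: {on_path}, lib64 on LD_LIBRARY_PATH: {on_ld}")
```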

and my /etc/udev/rules.d/40-vm-hotadd.rules file looks like this:

    # On Hyper-V and Xen Virtual Machines we want to add memory and cpus as soon as they appear
    ATTR{[dmi/id]sys_vendor}=="Microsoft Corporation", ATTR{[dmi/id]product_name}=="Virtual Machine", GOTO="vm_hotadd_apply"
    ATTR{[dmi/id]sys_vendor}=="Xen", GOTO="vm_hotadd_apply"
    GOTO="vm_hotadd_end"
    
    LABEL="vm_hotadd_apply"
    
    # Memory hotadd request
    
    # CPU hotadd request
    SUBSYSTEM=="cpu", ACTION=="add", DEVPATH=="/devices/system/cpu/cpu[0-9]*", TEST=="online", ATTR{online}="1"
    
    LABEL="vm_hotadd_end"

After all of this, I even compiled and ran the samples. ./deviceQuery returns:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 845M"
  CUDA Driver Version / Runtime Version          10.1 / 10.1
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2004 MBytes (2101870592 bytes)
  ( 4) Multiprocessors, (128) CUDA Cores/MP:     512 CUDA Cores
  GPU Max Clock rate:                            863 MHz (0.86 GHz)
  Memory Clock rate:                             1001 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS

and ./bandwidthTest returns:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce 845M
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(GB/s)
   32000000         11.7

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(GB/s)
   32000000         11.8

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(GB/s)
   32000000         14.5

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

But after all of this, this Python snippet (in a conda environment with all dependencies installed):

    import torch
    torch.cuda.is_available()

returns False

Does anybody have any idea how to resolve this? I've tried adding /usr/local/cuda-10.1/bin to /etc/environment like this:

    PATH=$PATH:/usr/local/cuda-10.1/bin

and restarted the terminal, but that didn't fix it. I really don't know what else to try.

EDIT - Results of collect_env for @kHarshit

Collecting environment information...
PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 10.2

OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: Could not collect

Python version: 3.6
Is CUDA available: No
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce 845M
Nvidia driver version: 418.87.00
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.18.5
[pip] pytorch-ranger==0.1.1
[pip] stylegan2-pytorch==0.12.0
[pip] torch==1.5.0
[pip] torch-optimizer==0.0.1a12
[pip] torchvision==0.6.0
[pip] vector-quantize-pytorch==0.0.2
[conda] numpy                     1.18.5                   pypi_0    pypi
[conda] pytorch-ranger            0.1.1                    pypi_0    pypi
[conda] stylegan2-pytorch         0.12.0                   pypi_0    pypi
[conda] torch                     1.5.0                    pypi_0    pypi
[conda] torch-optimizer           0.0.1a12                 pypi_0    pypi
[conda] torchvision               0.6.0                    pypi_0    pypi
[conda] vector-quantize-pytorch   0.0.2                    pypi_0    pypi

Can you post the result of [collect_env](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py) script? — kHarshit, Jun 13 '20 at 13:40
Pytorch doesn't use the system cuda when installed via pip or conda, it ships with its own copy of the cuda runtime and should work as long as the graphics card has compute capability >= 5.0 and the graphics driver supports the desired version of cuda. Did you install using `conda install pytorch torchvision cudatoolkit=10.1 -c pytorch`? It could be that you installed the CPU version of pytorch instead. — jodag, Jun 13 '20 at 13:54
@jodag I think that has fixed it! Would you mind elaborating a bit more on what exactly this does in an answer? I will accept it as soon as it's posted. — wfgeo, Jun 13 '20 at 14:25
Users might also find this question (and the answers) useful https://stackoverflow.com/questions/67122586/import-torch-oserror-winerror-127 — mrk, Dec 31 '21 at 21:44
@jodag both `nvidia-cuda-toolkit` (provided by Canonical PPA) and [NVIDIA](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=20.04) worked. The trick was creating a new environment or forcefully reinstalling `pytorch`. — JP Ventura, Feb 17 '22 at 13:57

jodag · Accepted Answer · 2021-01-05T16:58:27.540

PyTorch doesn't use the system's CUDA library. When you install PyTorch from the precompiled binaries with either pip or conda, it ships with a copy of the specified version of the CUDA library, which is installed locally. In fact, you don't even need to install CUDA on your system to use PyTorch with CUDA support.
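To see which CUDA version your PyTorch binary was built with, look at torch.version.cuda, which is None for CPU-only builds. A minimal sketch (the function name torch_build_info is hypothetical; the import is guarded so the script still runs where PyTorch is not installed):

```python
import importlib.util


def torch_build_info():
    """Summarize how the installed PyTorch binary was built."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # CUDA version the binary was compiled against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        # True only if the bundled runtime can use the installed driver and a GPU
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in torch_build_info().items():
        print(f"{key}: {value}")
```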

There are two scenarios which could have caused your issue.

1. You installed the CPU only version of PyTorch. In this case PyTorch wasn't compiled with CUDA support so it didn't support CUDA.
2. You installed the CUDA 10.2 version of PyTorch. In this case the problem is that your graphics card currently uses the 418.87 drivers, which only support up to CUDA 10.1. The two potential fixes in this case would be to either install updated drivers (version >= 440.33 according to Table 2) or to install a version of PyTorch compiled against CUDA 10.1.
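The two scenarios can be written as a small decision function. This is a sketch under stated assumptions: the MIN_LINUX_DRIVER table holds only two entries (440.33 for CUDA 10.2, as stated above, and 418.39 for CUDA 10.1, which is my reading of NVIDIA's release-notes table), so check the current table before relying on it. The helpers diagnose and parse_driver are hypothetical names.

```python
# Minimum Linux driver per CUDA release (assumed excerpt; verify against
# NVIDIA's CUDA Toolkit release notes before relying on it)
MIN_LINUX_DRIVER = {
    "10.1": (418, 39),
    "10.2": (440, 33),
}


def parse_driver(version):
    """Turn a driver string such as '418.87.00' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def diagnose(torch_cuda, driver):
    """Classify the two failure scenarios.

    torch_cuda: value of torch.version.cuda (None for a CPU-only build).
    driver: driver version string as reported by nvidia-smi.
    """
    if torch_cuda is None:
        return "CPU-only build: reinstall a CUDA-enabled PyTorch"
    required = MIN_LINUX_DRIVER.get(torch_cuda)
    if required is None:
        return f"CUDA {torch_cuda}: not in the table, check NVIDIA's release notes"
    if parse_driver(driver) < required:
        wanted = ".".join(map(str, required))
        return f"Driver {driver} is too old for CUDA {torch_cuda} (needs >= {wanted})"
    return "Build and driver look compatible"


if __name__ == "__main__":
    # Values taken from the collect_env output in the question
    print(diagnose("10.2", "418.87.00"))
    # Driver 418.87.00 is too old for CUDA 10.2 (needs >= 440.33)
```

Tuples of ints are used so that, for example, 418.87.00 compares correctly against 440.33 without string-ordering surprises.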

To determine the appropriate command to use when installing PyTorch you can use the handy widget in the "Install PyTorch" section at pytorch.org. Just select the appropriate operating system, package manager, and CUDA version then run the recommended command.

In your case one solution was to use

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

which explicitly specifies to conda that you want to install the version of PyTorch compiled against CUDA 10.1.

For more information about PyTorch CUDA compatibility with respect to drivers and hardware, see this answer.

Edit: After you added the output of collect_env, we can see that the problem was that you had the CUDA 10.2 version of PyTorch installed. Based on that, an alternative solution would have been to update the graphics driver, as elaborated in item 2 and the linked answer.

This solution works. Just note if you previously installed the cpu only version, make sure to remove the "cpuonly" package before you install the new cuda-enabled packages. It gave me a hard time before I noticed it. — Ahmed Ktob, Nov 06 '20 at 05:13
Running `pip3 install pytorch torchvision cudatoolkit=10.1 -c pytorch` returns `ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'pytorch'`. `pip3 install pytorch torchvision cudatoolkit==10.1` also does not work. — Muhammad Yasirroni, Jun 23 '21 at 05:21
Found solution: pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html [source](https://pytorch.org/get-started/previous-versions/) — Muhammad Yasirroni, Jun 23 '21 at 05:24
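The +cu101 suffix in that command is the local version tag that the wheels on download.pytorch.org use to mark the CUDA version they were built for (+cpu marks CPU-only wheels). A small sketch for reading it from a version string such as torch.__version__; the helper cuda_tag is hypothetical, and wheels from PyPI (like the torch==1.5.0 in the question) may carry no tag at all, in which case it returns None:

```python
import re


def cuda_tag(version):
    """Extract the build tag from a PyTorch version string.

    '1.7.1+cu101' -> '10.1', '1.7.1+cpu' -> 'cpu', '1.5.0' -> None (no local tag).
    """
    _, sep, local = version.partition("+")
    if not sep:
        return None
    if local == "cpu":
        return "cpu"
    # 'cu101' -> major '10', minor '1'; the last digit is treated as the minor version
    match = re.fullmatch(r"cu(\d+)(\d)", local)
    if match:
        return f"{match.group(1)}.{match.group(2)}"
    return local  # unknown tag, returned unchanged


if __name__ == "__main__":
    print(cuda_tag("1.7.1+cu101"))
```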
Case #1 for me. I also had to manually uninstall `pytorch`. Use `conda list` to make sure you're not using the `cpu` build. — Jeff Bezos, Feb 27 '23 at 21:55

score 5 · Answer 2 · answered Feb 17 '22 at 13:54

TL;DR

Install the NVIDIA CUDA Toolkit provided by Canonical or by NVIDIA's third-party PPA.
Reboot your workstation.
Create a clean Python virtual environment (or reinstall all CUDA dependent packages).

Description

First install NVIDIA CUDA Toolkit provided by Canonical:

sudo apt install -y nvidia-cuda-toolkit

or follow NVIDIA's developer instructions:

# ENVARS ADDED **ONLY FOR READABILITY**
NVIDIA_CUDA_PPA=https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/
NVIDIA_CUDA_PREFERENCES=https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
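# NOTE: NVIDIA rotated this repository signing key in April 2022; newer instructions install the cuda-keyring package instead
NVIDIA_CUDA_PUBKEY=https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub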

# Add NVIDIA Developers 3rd-Party PPA
sudo wget ${NVIDIA_CUDA_PREFERENCES} -O /etc/apt/preferences.d/nvidia-cuda
sudo apt-key adv --fetch-keys ${NVIDIA_CUDA_PUBKEY}
echo "deb ${NVIDIA_CUDA_PPA} /" | sudo tee /etc/apt/sources.list.d/nvidia-cuda.list

# Install development tools
sudo apt update
sudo apt install -y cuda

Then reboot so the OS loads the kernel with the NVIDIA drivers.

Create an environment using your favorite manager (conda, venv, etc.):

conda create -n stack-overflow pytorch torchvision
conda activate stack-overflow

or reinstall pytorch and torchvision into the existing one:

conda activate stack-overflow
conda install --force-reinstall pytorch torchvision

Otherwise, the NVIDIA CUDA C/C++ bindings may not be detected correctly.

Finally ensure CUDA is correctly detected:

(stack-overflow)$ python3 -c 'import torch; print(torch.cuda.is_available())'
True

Versions

NVIDIA CUDA Toolkit v11.6
Ubuntu LTS 20.04.x
Ubuntu LTS 22.04 (prior to its official release)

sudo apt install -y cuda made the difference for me. — Rexcirus, Mar 17 '23 at 11:21

score 1 · Answer 3 · answered Jan 15 '22 at 04:03

In my case, just restarting my machine made the GPU active again. The initial message I got was that the GPU was in use by another application, but nvidia-smi showed no such process. So, with no changes to dependencies, it just started working again.

score 1 · Answer 4 · answered Jun 29 '22 at 19:55


Another possible scenario is that the environment variable CUDA_VISIBLE_DEVICES is not set correctly before installing PyTorch.
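CUDA reads CUDA_VISIBLE_DEVICES when it initializes, and an empty value hides every GPU, which makes torch.cuda.is_available() return False. A stdlib-only sketch for inspecting the variable (the helper name visible_devices_note is hypothetical, and it only covers the common cases: unset, empty, a leading invalid index, or an explicit list):

```python
import os


def visible_devices_note(env=None):
    """Describe how CUDA_VISIBLE_DEVICES will restrict GPU visibility."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return "unset: all GPUs are visible"
    if value.strip() == "":
        return "set but empty: no GPUs are visible"
    first = value.split(",")[0].strip()
    # CUDA stops enumerating at the first invalid index, so a leading '-1' hides everything
    if first.startswith("-"):
        return f"starts with an invalid index ({first}): no GPUs are visible"
    return f"restricted to: {value}"


if __name__ == "__main__":
    print(visible_devices_note())
```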

answered Jun 29 '22 at 19:55 by Shunchi Zhang

score 1 · Answer 5 · answered Oct 23 '22 at 13:19

In my case it worked to do as follows:

Remove the existing NVIDIA drivers (the quotes keep the shell from expanding the pattern):

sudo apt-get remove --purge 'nvidia*'

Then get the exact installation script of the drivers based on your distro and system from the link: https://developer.nvidia.com/cuda-downloads?target_os=Linux

In my case it was Debian on x64, so I did:

wget https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda

And now nvidia-smi works as intended!

I hope that helps

score 0 · Answer 6 · answered Sep 14 '22 at 12:03

If your CUDA version does not match what PyTorch expects, you will see this issue.

On Arch / Manjaro:

Get Pytorch from here: https://pytorch.org/get-started/locally/
Note what CUDA version you are getting PyTorch for
Get the same CUDA version from here: https://archive.archlinux.org/packages/c/cuda/
Install CUDA using (e.g.) sudo pacman -U --noconfirm cuda-11.6.2-1-x86_64.pkg.tar.zst

Do not update to a newer version of CUDA than PyTorch expects. If PyTorch wants 11.6 and you have updated to 11.7, you will get the error message.
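To compare the two versions, the system toolkit's release can be read from nvcc -V output (the question shows the format: "Cuda compilation tools, release 10.1, V10.1.243") and compared with PyTorch's torch.version.cuda at the major.minor level. A sketch, assuming nvcc is on PATH; nvcc_release and versions_match are hypothetical names:

```python
import re
import shutil
import subprocess


def nvcc_release(text):
    """Pull 'X.Y' out of `nvcc -V` output, or None if no release line is found."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def versions_match(system_cuda, torch_cuda):
    """Compare two CUDA version strings on major.minor only."""
    def major_minor(version):
        return tuple(version.split(".")[:2])

    return major_minor(system_cuda) == major_minor(torch_cuda)


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout
        print("system CUDA:", nvcc_release(out))
```

Pass the printed release and torch.version.cuda to versions_match to see whether they agree.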

score -1 · Answer 7 · answered Nov 04 '22 at 19:25

Make sure that os.environ['CUDA_VISIBLE_DEVICES'] = '0' is set inside the if __name__ == "__main__": block, before CUDA is first used. Your code should look like this:

import torch
import os

if __name__ == "__main__":
    # Must be set before the first CUDA call in this process
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'
    print(torch.cuda.is_available())  # True if GPU 0 is usable
    ...
