Checking the Driver Version
Run nvidia-smi to see which NVIDIA driver version is installed on the host. In the output below, the host's driver version is 470.141.03.
+------------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.8 |
|--------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|================================+======================+======================|
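When this check has to run in a script, the two version fields can be pulled out of the header line. The following is a minimal sketch; the function name parse_nvidia_smi_header is hypothetical, and the regular expression assumes the header layout shown above.

```python
import re


def parse_nvidia_smi_header(text):
    """Return (driver_version, cuda_version) from nvidia-smi output, or None."""
    # Assumes the "Driver Version: X  CUDA Version: Y" layout shown above.
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


# Header line copied from the output above.
sample = "| NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.8 |"
print(parse_nvidia_smi_header(sample))  # ('470.141.03', '11.8')
```

On a real host the text would come from something like subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout.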
Checking the CUDA Version
Run nvcc -V to see which CUDA Toolkit version is installed. In the output below, the host's CUDA version is 12.1.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
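The release number can be extracted from this output in the same way. This is only a sketch: parse_nvcc_release is a hypothetical helper, and it assumes the "release X.Y" wording shown above.

```python
import re


def parse_nvcc_release(text):
    """Return the CUDA release as (major, minor) from `nvcc -V` output, or None."""
    # Assumes the "release 12.1" wording shown in the output above.
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Two lines copied from the output above.
nvcc_sample = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 12.1, V12.1.66"""
print(parse_nvcc_release(nvcc_sample))  # (12, 1)
```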
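Driver and CUDA Version Compatibility
Minor Version Compatibility
Starting with CUDA 11.0, NVIDIA supports compatibility across minor versions with no additional configuration. Any host driver that supports CUDA >= 11.0 (driver version >= 450.80.02) can run applications built with any CUDA 11.x release. Likewise, any driver that supports CUDA >= 12.0 (driver version >= 525.60.13) can run applications built with any CUDA 12.x release.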
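The driver thresholds quoted above can be turned into a small check. The sketch below covers only the two major versions named in the text; the table MIN_DRIVER and the helper names are hypothetical.

```python
# Minimum driver versions quoted in the text above, keyed by CUDA major version.
MIN_DRIVER = {11: (450, 80, 2), 12: (525, 60, 13)}


def version_tuple(version):
    """Turn '470.141.03' into (470, 141, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_supports_major(driver_version, cuda_major):
    """True if the driver meets the minimum for the given CUDA major version."""
    minimum = MIN_DRIVER.get(cuda_major)
    if minimum is None:
        raise ValueError(f"no threshold recorded for CUDA {cuda_major}.x")
    return version_tuple(driver_version) >= minimum


print(driver_supports_major("470.141.03", 11))  # True
print(driver_supports_major("470.141.03", 12))  # False
```

Comparing tuples of integers rather than strings matters here, because a plain string comparison would rank "80" above "141".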
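Backward Compatibility Across Any Version
The CUDA Version shown next to the driver version in the nvidia-smi output is the highest CUDA version that the current driver supports. For example, the host driver 470.141.03 above supports CUDA up to 11.8. The driver is backward compatible with every earlier CUDA version, so, combined with the minor version compatibility described above, applications built with any CUDA version <= 11.x run normally.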
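Taken together, backward compatibility and minor version compatibility reduce to a comparison of major versions. The sketch below encodes that simplified rule exactly as stated in the text; the function names are hypothetical, and it deliberately ignores any finer-grained limitations documented by NVIDIA.

```python
def parse_cuda_version(version):
    """Turn '11.8' into (11, 8)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def runs_without_compat(app_cuda, driver_max_cuda):
    """Simplified rule from the text: the app runs if its CUDA major version
    does not exceed the major version the driver reports as its maximum."""
    app_major, _ = parse_cuda_version(app_cuda)
    driver_major, _ = parse_cuda_version(driver_max_cuda)
    # Older major version: backward compatibility.
    # Same major version: minor version compatibility.
    return app_major <= driver_major


print(runs_without_compat("10.2", "11.8"))  # True
print(runs_without_compat("12.1", "11.8"))  # False
```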
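Forward Compatibility Across Major Versions
Forward compatibility across major versions means that a host driver whose highest supported CUDA version is 11.x can still run CUDA 12.x applications. It is supported only on NVIDIA Data Center GPUs, such as the V100, A100, and H100.
Without it, an application built with a CUDA major version higher than the one the host driver supports cannot run. If the highest CUDA version the host supports is 11.8, a CUDA 12.x application fails on that machine. For example, PyTorch reports The NVIDIA driver on your system is too old: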
(cu121) [root@VM workspace]$ python
Python 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/root/miniconda3/envs/cu121/lib/python3.10/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11080). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at /opt/conda/conda-bld/pytorch_1720538459595/work/c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
False
>>>
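The "found version 11080" in the warning is the driver's maximum CUDA version in the integer encoding CUDA uses, 1000 * major + 10 * minor. A minimal sketch of the decoding, with decode_cuda_version as a hypothetical helper name:

```python
def decode_cuda_version(encoded):
    """Decode CUDA's integer version encoding (1000 * major + 10 * minor)."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return f"{major}.{minor}"


# Value taken from the PyTorch warning above.
print(decode_cuda_version(11080))  # 11.8
```

The decoded value matches the CUDA Version: 11.8 that nvidia-smi reports for this driver.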
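To avoid frequent host driver upgrades (which require a reboot), NVIDIA has supported forward compatibility across major versions since CUDA 10.0. Installing the cuda-compat package lets applications built with a newer CUDA version run on an older driver. The package installs to /usr/local/cuda-xx.x/compat/ by default and is enabled by adding that directory to the LD_LIBRARY_PATH environment variable. For example, after running export LD_LIBRARY_PATH=/usr/local/cuda-12.1/compat:"$LD_LIBRARY_PATH", CUDA 12.x applications run normally on driver 470.141.03 (combined with the minor version compatibility described above).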
(cu121) [root@VM workspace]$ rpm -ivh https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-compat-12-1-530.30.02-1.x86_64.rpm
(cu121) [root@VM workspace]$ ll /usr/local/cuda-12.1/compat/
total 144692
lrwxrwxrwx 1 root root 28 Feb 22 2023 libcudadebugger.so.1 -> libcudadebugger.so.530.30.02
-rwxr-xr-x 1 root root 10488824 Feb 22 2023 libcudadebugger.so.530.30.02
lrwxrwxrwx 1 root root 12 Feb 22 2023 libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 20 Feb 22 2023 libcuda.so.1 -> libcuda.so.530.30.02
-rwxr-xr-x 1 root root 29900840 Feb 22 2023 libcuda.so.530.30.02
lrwxrwxrwx 1 root root 27 Feb 22 2023 libnvidia-nvvm.so -> libnvidia-nvvm.so.530.30.02
lrwxrwxrwx 1 root root 27 Feb 22 2023 libnvidia-nvvm.so.4 -> libnvidia-nvvm.so.530.30.02
-rwxr-xr-x 1 root root 85979712 Feb 22 2023 libnvidia-nvvm.so.530.30.02
lrwxrwxrwx 1 root root 37 Feb 22 2023 libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.530.30.02
-rwxr-xr-x 1 root root 21784224 Feb 22 2023 libnvidia-ptxjitcompiler.so.530.30.02
(cu121) [root@VM workspace]$ LD_LIBRARY_PATH=/usr/local/cuda-12.1/compat:$LD_LIBRARY_PATH python
Python 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.version.cuda
'12.1'
>>> torch.cuda.is_available()
True
>>>
(cu124) [root@VM workspace]$ LD_LIBRARY_PATH=/usr/local/cuda-12.1/compat:$LD_LIBRARY_PATH python
Python 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.version.cuda
'12.4'
>>> torch.cuda.is_available()
True
>>>
(cu121) [root@VM workspace]$ LD_LIBRARY_PATH=/usr/local/cuda-12.1/compat:$LD_LIBRARY_PATH nvidia-smi | head -n8
+------------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 12.1 |
|--------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|================================+======================+======================|
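The LD_LIBRARY_PATH prefix used in the sessions above can also be applied from a launcher script. The following is a sketch only: env_with_compat is a hypothetical helper, and the compat directory is the one from the example above.

```python
import os


def env_with_compat(compat_dir, base_env=None):
    """Return a copy of the environment with compat_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    # Avoid a trailing ':' when the variable was unset, because an empty entry
    # makes the dynamic loader search the current directory.
    env["LD_LIBRARY_PATH"] = f"{compat_dir}:{current}" if current else compat_dir
    return env


compat_env = env_with_compat(
    "/usr/local/cuda-12.1/compat", {"LD_LIBRARY_PATH": "/usr/lib64"}
)
print(compat_env["LD_LIBRARY_PATH"])  # /usr/local/cuda-12.1/compat:/usr/lib64
```

The returned mapping can be passed as env= to subprocess.run when starting the CUDA application. The variable has to be set before the process starts, since the dynamic loader reads it at startup.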