System information
- Base image:
docker pull nvidia/cuda:12.4.1-cudnn-devel-rockylinux8
Building from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Build the CUDA version
# Adjust CMAKE_CUDA_ARCHITECTURES to match your hardware; "80" here corresponds to the A100 GPU
# Reference: https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md#1-take-note-of-the-compute-capability-of-your-nvidia-devices-cuda-your-gpu-compute--capability
cmake -B build \
-DBUILD_SHARED_LIBS=OFF \
-DGGML_CUDA=ON \
-DCMAKE_CUDA_ARCHITECTURES="80" \
-DGGML_CUDA_F16=ON \
-DGGML_CUDA_FA_ALL_QUANTS=ON
cmake --build build --config Release -j 8
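Choosing the right `CMAKE_CUDA_ARCHITECTURES` value amounts to looking up the GPU's compute capability and dropping the dot. As a minimal sketch (the helper name `arch_for_gpu` and the name patterns are illustrative, not part of llama.cpp; the capability codes are from NVIDIA's published table):

```shell
#!/bin/sh
# Illustrative helper (not from llama.cpp): map a GPU model name to the
# CMAKE_CUDA_ARCHITECTURES code, i.e. the compute capability without the dot.
arch_for_gpu() {
  case "$1" in
    *V100*)     echo 70 ;;        # Volta
    *T4*)       echo 75 ;;        # Turing
    *A100*)     echo 80 ;;        # Ampere (data center)
    *"RTX 30"*) echo 86 ;;        # Ampere (consumer)
    *"RTX 40"*) echo 89 ;;        # Ada Lovelace
    *H100*)     echo 90 ;;        # Hopper
    *)          echo "unknown" ;; # consult NVIDIA's compute-capability table
  esac
}

# On recent drivers the value can also be queried directly:
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader
arch_for_gpu "NVIDIA A100-SXM4-80GB"   # → 80
```

If the build must run on more than one GPU generation, CMake accepts a semicolon-separated list, e.g. `-DCMAKE_CUDA_ARCHITECTURES="80;86"`.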
References
- https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md