Building llama.cpp

System Information

  • Base image: docker pull nvidia/cuda:12.4.1-cudnn-devel-rockylinux8
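Before building, the image above is typically started with GPU passthrough. A minimal sketch (the volume mount and working directory are illustrative; this assumes the NVIDIA Container Toolkit is installed on the host):

```shell
# Start an interactive container with all GPUs visible
# (requires the NVIDIA Container Toolkit on the host).
docker run --gpus all -it --rm \
    -v "$PWD":/workspace -w /workspace \
    nvidia/cuda:12.4.1-cudnn-devel-rockylinux8 bash

# Inside the container, confirm the GPUs are visible:
nvidia-smi
```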

Building from Source

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build the CUDA version
# Adjust CMAKE_CUDA_ARCHITECTURES for your hardware; CMAKE_CUDA_ARCHITECTURES="80" here targets the A100 GPU
# Reference: https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md#1-take-note-of-the-compute-capability-of-your-nvidia-devices-cuda-your-gpu-compute--capability
cmake -B build \
    -DBUILD_SHARED_LIBS=OFF \
    -DGGML_CUDA=ON \
    -DCMAKE_CUDA_ARCHITECTURES="80" \
    -DGGML_CUDA_F16=ON \
    -DGGML_CUDA_FA_ALL_QUANTS=ON
cmake --build build --config Release -j 8
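The comments above stress that CMAKE_CUDA_ARCHITECTURES must match your GPU's compute capability. As a quick lookup, a small helper can map a few common GPUs to the value to pass (arch_for_gpu is a hypothetical name; the values are NVIDIA's published compute capabilities for these models):

```shell
# Hypothetical helper: map a known NVIDIA GPU model to its compute
# capability, i.e. the value for -DCMAKE_CUDA_ARCHITECTURES.
arch_for_gpu() {
    case "$1" in
        V100)       echo 70 ;;
        A100)       echo 80 ;;
        "RTX 4090") echo 89 ;;
        H100)       echo 90 ;;
        *)          echo "unknown GPU: $1" >&2; return 1 ;;
    esac
}

arch_for_gpu A100    # prints 80
```

On a machine with the GPU attached, recent drivers can also report this directly via `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`.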

References

  1. https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md