System information

Base image: docker pull nvidia/cuda:12.4.1-cudnn-devel-rockylinux8
Python version: 3.10
PyTorch version: 2.6.0+cu124

Building and installing from source

# Download the source
git clone -b v2.8.3 https://github.com/Dao-AILab/flash-attention.git
cd ./flash-attention
git submodule update --init --recursive

# Install dependencies
yum install -y gcc-toolset-11
pip install ninja
source /opt/rh/gcc-toolset-11/enable
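
Because `source /opt/rh/gcc-toolset-11/enable` only affects the current shell, it may be worth confirming that the toolchain is visible before starting a long build. The following is a minimal sketch; the helper name `check_tools` and the list of tools passed to it are assumptions based on the steps in these notes, not part of flash-attention.

```shell
# Report whether each named tool is on PATH in the current shell.
# Returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed tool list for this build; adjust as needed.
check_tools gcc g++ nvcc ninja git \
    || echo "some tools are missing; fix PATH before building"
```

If `gcc` is found but `gcc --version` still reports the distribution's default compiler, the toolset was most likely enabled in a different shell session.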

# Build and install
MAX_JOBS=4 \
  NVCC_THREADS=2 \
  FLASH_ATTENTION_FORCE_BUILD=TRUE \
  pip wheel \
  -v \
  --no-build-isolation \
  --no-cache-dir \
  --no-deps \
  .
pip install ./flash_attn-*.whl
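
The build command uses MAX_JOBS=4 and NVCC_THREADS=2 to cap parallel compilation, since too many concurrent compile jobs can exhaust RAM. A rough way to pick MAX_JOBS for a different machine is sketched below. The helper `suggest_max_jobs` is hypothetical, and the default of 8 GB per job is a placeholder assumption, not a measured figure; tune it for your machine.

```shell
# Suggest a MAX_JOBS value: the smaller of the CPU count and
# (RAM in GB / assumed GB per compile job), never below 1.
# Arguments: <cpu_cores> <ram_gb> [gb_per_job]
suggest_max_jobs() {
    cores=$1
    ram_gb=$2
    gb_per_job=${3:-8}   # assumption: placeholder value, not measured
    n=$(( ram_gb / gb_per_job ))
    if [ "$n" -gt "$cores" ]; then
        n=$cores
    fi
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# On Linux, read the real core count and memory size.
if [ -r /proc/meminfo ] && command -v nproc >/dev/null 2>&1; then
    cores=$(nproc)
    ram_gb=$(awk '/MemTotal/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
    echo "suggested MAX_JOBS=$(suggest_max_jobs "$cores" "$ram_gb")"
fi
```

For example, `suggest_max_jobs 16 32` prints 4 under the 8 GB assumption. Raising NVCC_THREADS increases the load of each job, so a larger per-job figure may be appropriate in that case.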

Unit tests (optional)

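Note: a full run of the unit tests typically ends with fewer than 100 failing cases, which is expected. Failing cases can be ignored as long as they do not affect the training or inference accuracy of the model you are using.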

pip install pytest
pytest -svv tests/test_flash_attn.py
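
Independently of the full suite, a quick numerical spot check can show whether the installed build agrees with a reference implementation. The sketch below compares `flash_attn_func` with PyTorch's `scaled_dot_product_attention` computed in fp32. The wrapper name `spot_check_flash_attn`, the tensor shapes, and the use of a `python` executable on PATH are assumptions chosen for illustration; what difference counts as acceptable depends on your model.

```shell
# Compare flash_attn_func with an fp32 PyTorch reference on random inputs.
# Skips cleanly when flash_attn or a CUDA device is not available.
spot_check_flash_attn() {
    if python -c "import torch, flash_attn; assert torch.cuda.is_available()" 2>/dev/null; then
        python - <<'EOF'
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_func

torch.manual_seed(0)
# flash_attn_func takes (batch, seqlen, nheads, headdim) in fp16 or bf16.
q, k, v = (
    torch.randn(2, 128, 8, 64, device="cuda", dtype=torch.float16)
    for _ in range(3)
)
out = flash_attn_func(q, k, v, causal=True)

# Reference in fp32; SDPA expects (batch, nheads, seqlen, headdim).
ref = F.scaled_dot_product_attention(
    q.float().transpose(1, 2),
    k.float().transpose(1, 2),
    v.float().transpose(1, 2),
    is_causal=True,
).transpose(1, 2)

diff = (out.float() - ref).abs().max().item()
print("max abs diff vs fp32 reference:", diff)
EOF
    else
        echo "skipped: flash_attn or a CUDA device is not available"
    fi
}

spot_check_flash_attn
```

The printed difference reflects fp16 rounding in the flash-attention path. Repeating the check with the dtype, head dimension, and sequence length your model actually uses gives a more relevant signal than the defaults here.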