DeepSpeed ZeRO-3 + LLaMA-Factory: OOM on the first step after saving a checkpoint
4 × 16 GB GPUs, training a 14B model
{"train_batch_size": "auto", "train_micro_batch_size_per_gpu": "auto", "gradient_accumulation_steps": "auto", "…
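For a sense of how tight memory is on this hardware, here is a rough back-of-the-envelope sketch. It assumes full-parameter fine-tuning with Adam in mixed precision (about 16 bytes per parameter: 2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer states), ZeRO-3 sharding everything evenly across GPUs, and no CPU/NVMe offload. None of that is confirmed by the post, since the config is cut off; with LoRA or offload the numbers would be very different. The function name `zero3_model_state_gib` is made up for illustration.

```python
def zero3_model_state_gib(num_params: float, num_gpus: int,
                          bytes_per_param: int = 16) -> float:
    """Estimate per-GPU model-state memory in GiB under ZeRO-3.

    Assumption: weights, gradients and optimizer states are all sharded
    evenly across GPUs; activations, buffers and fragmentation are ignored.
    """
    return num_params * bytes_per_param / num_gpus / 2**30


# Setup from the post: 14B parameters, 4 GPUs with 16 GB each.
per_gpu = zero3_model_state_gib(14e9, 4)
gpu_capacity_gib = 16

print(f"model states per GPU: {per_gpu:.1f} GiB (capacity: {gpu_capacity_gib} GiB)")
print("fits without offload:", per_gpu < gpu_capacity_gib)
```

Under these assumptions the sharded model states alone are well above 16 GB per GPU, so the run would already depend on offload or a parameter-efficient method, and any extra temporary allocation around checkpoint saving could plausibly be enough to push the next step over the limit.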
2026/6/16 15:35:06