Monitoring checklist (Prometheus + Grafana)
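| Metric (vLLM) | Meaning | Alert threshold |
| vllm:time_to_first_token_seconds | TTFT (time to first token) | p95 > SLO × 1.5, sustained for 3 min |
| vllm:time_per_output_token_seconds | TPOT (time per output token) | p95 > 50 ms |
| vllm:num_requests_running | Requests currently running | below max-num-seqs × 0.9 there is headroom to take more load |
| vllm:num_requests_waiting | Queue | sustained … |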
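As a rough illustration, the TTFT and TPOT thresholds above could be turned into Prometheus alert rules along these lines. This is only a sketch under assumptions: it reads "SLO1.5" as SLO times 1.5, the SLO value (2.0 s) and the 5m rate window are placeholders that do not come from the checklist, and the alert names and helper functions (`p95_expr`, `build_rules`, `has_headroom`) are hypothetical. The waiting-queue threshold is cut off in the checklist, so it is left out.

```python
# Sketch: build Prometheus alert rules from the checklist thresholds.
# ASSUMPTIONS (not from the checklist): the SLO value, the rate window,
# the alert names, and the reading of "SLO1.5" as SLO * 1.5.

TTFT_SLO_SECONDS = 2.0  # placeholder, replace with your own TTFT SLO
RATE_WINDOW = "5m"      # placeholder rate window


def p95_expr(metric: str, window: str = RATE_WINDOW) -> str:
    """Return a PromQL expression for the p95 of a histogram metric."""
    return (
        "histogram_quantile(0.95, "
        f"sum by (le) (rate({metric}_bucket[{window}])))"
    )


def build_rules(ttft_slo_seconds: float = TTFT_SLO_SECONDS) -> list[dict]:
    """Return alert rules as plain dicts, ready to dump into a rules file."""
    ttft_limit = ttft_slo_seconds * 1.5
    return [
        {
            # TTFT: p95 above SLO * 1.5, sustained for 3 minutes.
            "alert": "VllmTtftP95High",
            "expr": f"{p95_expr('vllm:time_to_first_token_seconds')} > {ttft_limit}",
            "for": "3m",
        },
        {
            # TPOT: p95 above 50 ms (0.05 s); the checklist gives no duration.
            "alert": "VllmTpotP95High",
            "expr": f"{p95_expr('vllm:time_per_output_token_seconds')} > 0.05",
        },
    ]


def has_headroom(num_running: float, max_num_seqs: int) -> bool:
    """True when running requests are below 90% of max-num-seqs."""
    return num_running < max_num_seqs * 0.9


if __name__ == "__main__":
    for rule in build_rules():
        print(rule)
```

The dicts follow the `alert` / `expr` / `for` fields of a Prometheus alerting rule, so they can be serialized to YAML under a rule group. `has_headroom` mirrors the vllm:num_requests_running row and is meant for a capacity check, not an alert.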
2026/7/6 2:40:56