Titan AI

vllm

Stars: 51,225
Forks: 8,449
Language: Python

Project Description

vLLM is a high-throughput, memory-efficient inference and serving engine designed for Large Language Models (LLMs), offering optimized execution, multimodal support, and ease of use for model serving.
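To illustrate the "ease of use" point, here is a minimal sketch of offline batched inference with vLLM's Python API; the model name, prompts, and sampling settings below are illustrative assumptions, not part of this listing.

```python
# Minimal sketch of offline inference with vLLM (assumed example values).
from vllm import LLM, SamplingParams

# Placeholder prompts for a small batch.
prompts = [
    "Explain paged attention in one sentence.",
    "What is continuous batching?",
]

# Sampling parameters control decoding behavior (temperature, top_p, length).
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# The LLM class loads the model and manages GPU memory for serving;
# the model name here is an assumed placeholder.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# generate() runs batched inference over all prompts in a single call.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt)
    print(output.outputs[0].text)
```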

Project Information

Created on 2/9/2023
Updated on 7/2/2025

Categories

ai-content-generation
machine-learning-framework
text-processing

Tags

model-deployment
data-processing
open-source-community
cloud-native
research-frontier

Topics

amd
transformer
mlops
tpu
inferentia
rocm
qwen
inference
hpu
llm-serving
deepseek
model-serving
llama
cuda
gpt
pytorch
xpu
trainium
llmops
llm