Titan AI

flex-nano-vllm

278 stars
14 forks
Language: Python

Project Description

A minimal, FlexAttention-based, vLLM-style inference engine for fast Gemma 2 inference.
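A core idea in vLLM-style engines is paged KV-cache management: each sequence's cache is stored in fixed-size blocks tracked by a block table, so memory is allocated incrementally rather than reserved up front. A minimal sketch of such a block allocator (all names here are hypothetical and not taken from this repository):

```python
class BlockAllocator:
    """Toy paged-KV-cache block allocator, vLLM-style (hypothetical sketch)."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))   # pool of free physical block ids
        self.tables = {}                      # seq_id -> list of block ids

    def append_token(self, seq_id: int, pos: int) -> None:
        # Allocate a new block each time a sequence crosses a block boundary.
        if pos % self.block_size == 0:
            self.tables.setdefault(seq_id, []).append(self.free.pop())

    def release(self, seq_id: int) -> None:
        # Return all of a finished sequence's blocks to the free pool.
        self.free.extend(self.tables.pop(seq_id, []))


alloc = BlockAllocator(num_blocks=8, block_size=16)
for pos in range(40):                 # 40 tokens span 3 blocks of 16
    alloc.append_token(seq_id=0, pos=pos)
print(len(alloc.tables[0]))           # -> 3
alloc.release(0)
print(len(alloc.free))                # -> 8 (all blocks reclaimed)
```

FlexAttention then lets the attention kernel consume such non-contiguous cache layouts by expressing masks and score modifications as small Python functions that PyTorch compiles into a fused kernel.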

Project Information

Created on 8/6/2025
Updated on 9/26/2025