Titan AI

flex-nano-vllm

278 stars
14 forks
Language: Python

Project Description

A minimal, FlexAttention-based, vLLM-style inference engine for fast Gemma 2 inference.
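A core idea in vLLM-style engines is paged KV-cache management: each sequence's cache is stored in fixed-size blocks tracked by a block table, so memory is allocated incrementally rather than reserved up front. A minimal sketch of such a block allocator (all names here are hypothetical and not taken from this repository):

```python
class BlockAllocator:
    """Toy paged-KV-cache block allocator, vLLM-style (hypothetical sketch)."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))   # pool of free physical block ids
        self.tables = {}                      # seq_id -> list of block ids

    def append_token(self, seq_id: int, pos: int) -> None:
        # Allocate a new block each time a sequence crosses a block boundary.
        if pos % self.block_size == 0:
            self.tables.setdefault(seq_id, []).append(self.free.pop())

    def release(self, seq_id: int) -> None:
        # Return all of a finished sequence's blocks to the free pool.
        self.free.extend(self.tables.pop(seq_id, []))


alloc = BlockAllocator(num_blocks=8, block_size=16)
for pos in range(40):                 # 40 tokens span 3 blocks of 16
    alloc.append_token(seq_id=0, pos=pos)
print(len(alloc.tables[0]))           # -> 3
alloc.release(0)
print(len(alloc.free))                # -> 8 (all blocks reclaimed)
```

FlexAttention then lets the attention kernel consume such non-contiguous cache layouts by expressing masks and score modifications as small Python functions that PyTorch compiles into a fused kernel.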

Project Information

Created on 8/6/2025
Updated on 9/26/2025