Titan AI LogoTitan AI

moshivis

202
25
Python

项目描述

MoshiVis is a Vision Speech Model that enables natural conversations about images while maintaining low latency. It builds on the Moshi speech-text model and adds visual discussion capabilities with a cross-attention mechanism.

Project Information

Created on 3/20/2025
Updated on 7/1/2025

Categories

speech-technology
ai-content-generation
conversational-assistant

Tags

algorithm-model
model-deployment
open-source-community
multimodal
real-time-processing