Titan AI

moshivis

202
25
Python

Project Description

MoshiVis is a Vision Speech Model that enables natural conversations about images while maintaining low latency. It builds on the Moshi speech-text model and uses a cross-attention mechanism to add the ability to discuss images.

Project Information

Created on 3/20/2025
Updated on 7/1/2025

Categories

conversational-assistant
speech-technology
ai-content-generation

Tags

algorithm-model
model-deployment
open-source-community
multimodal
real-time-processing