Project Description
MiniCPM-o is an end-side (on-device) multimodal LLM that accepts image, video, text, and audio inputs and produces high-quality text and speech outputs. It supports real-time speech conversation, control over emotion, speed, and style, and multimodal live streaming on devices such as the iPad.
Project Information
Created on 1/29/2024
Updated on 7/2/2025
Categories
speech-technology
ai-content-generation
conversational-assistant
Tags
ready-to-use
model-deployment
multimodal
real-time-processing
open-source-community
Topics
minicpm
minicpm-v
multi-modal