Titan AI

describe-anything

1,202
70
Python

Project Description

Describe Anything is a multimodal AI model that generates detailed localized captions for user-specified regions of images and videos. The project also includes a benchmark for evaluating model performance on this task.

Project Information

Created: 4/4/2025
Updated: 7/2/2025

Categories

image-processing
ai-content-generation
machine-learning-framework

Tags

algorithm-model
data-processing
open-source-community
multimodal
research-frontier

Topics

describe-anything
detailed-localized-captioning
large-multimodal-models
vision-language-model