An unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.