Vision · September 20, 2023 · OpenAI
Improving Image Generation with Better Captions
James Betker, Gabriel Goh, Li Jing, Tim Brooks, Jianfeng Wang, Linjie Li, Long Ouyang, Juntang Zhuang, Joyce Lee, Yufei Guo
Abstract
We study how image generation models can be improved by training on better image captions. We develop an automatic captioning pipeline that generates highly descriptive image captions. Training text-to-image models on these improved captions substantially improves the quality and prompt-following ability of the resulting models, which we call DALL-E 3.
Key Findings
1. Showed that better training captions dramatically improve image generation quality
2. Developed an automatic captioning pipeline for training-data improvement
3. Achieved significantly better prompt-following compared to DALL-E 2
4. Demonstrated improved text rendering in generated images
5. Integrated natively with ChatGPT for conversational image generation
Impact & Significance
DALL-E 3 integrated text-to-image generation directly into ChatGPT, making AI image creation accessible to millions. Its focus on caption quality influenced how subsequent models approach training data curation.