Vision · September 20, 2023 · OpenAI

Improving Image Generation with Better Captions

James Betker, Gabriel Goh, Li Jing, Tim Brooks, Jianfeng Wang, Linjie Li, Long Ouyang, Juntang Zhuang, Joyce Lee, Yufei Guo

Abstract

We study how image generation models can be improved by training on better image captions. We develop an automatic captioning pipeline that generates highly descriptive image captions. Training text-to-image models on these improved captions substantially improves the quality and prompt-following ability of the resulting models, which we call DALL-E 3.
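The core idea of the pipeline is simple: recaption the training images with a learned, highly descriptive captioner, then train the text-to-image model mostly on those synthetic captions while keeping some original captions in the mix. A minimal sketch of that caption-selection step is below; the function name and the 95% blend ratio are illustrative assumptions, not the paper's exact implementation.

```python
import random

def build_training_caption(original_caption, synthetic_caption, p_synthetic=0.95):
    """Pick the caption paired with a training image.

    Assumed sketch: with probability p_synthetic use the descriptive
    synthetic caption from the captioning pipeline; otherwise fall back
    to the original (often short or noisy) alt-text caption.
    """
    if synthetic_caption is not None and random.random() < p_synthetic:
        return synthetic_caption
    return original_caption
```

Keeping a small fraction of original captions hedges against the model overfitting to the captioner's writing style, so it still follows the shorter, looser prompts real users type.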

Key Findings

  1. Showed that better training captions dramatically improve image generation quality
  2. Developed an automatic captioning pipeline for training data improvement
  3. Achieved significantly better prompt-following compared to DALL-E 2
  4. Demonstrated improved text rendering in generated images
  5. Integrated natively with ChatGPT for conversational image generation

Impact & Significance

DALL-E 3 integrated text-to-image generation directly into ChatGPT, making AI image creation accessible to millions. Its focus on caption quality influenced how subsequent models approach training data curation.
