Multimodal ML Researcher

Research · Senior · Full-time

$230K-$400K · Posted 2 months ago

About the Role

Google DeepMind seeks a Multimodal ML Researcher to advance models that understand and generate content across text, image, video, and audio. You will develop new architectures and training methods for the Gemini model family.

Requirements

PhD in CS or ML with multimodal focus
Strong publication record at top venues
Experience with vision-language models
Deep expertise in PyTorch or JAX
Experience with large-scale model training

Nice to Have

Experience with video understanding
Background in audio or speech processing
Experience with cross-modal retrieval
Knowledge of embodied AI or robotics

Benefits

Google-level total compensation
World-class research environment
Unlimited compute for research
Conference travel budget
20% personal project time
Relocation package

Skills

Multimodal ML · Vision-Language · JAX · Research · Deep Learning · Python

Related Jobs

Research Scientist — Foundation Models

Google DeepMind · Mountain View, CA

$220K-$380K · Research

Generative Models Researcher

Stability AI · London, UK · Remote

£130K-£210K · Research

AI Safety Lead

Google DeepMind · London, UK

£180K-£300K · AI Safety

AI Research Intern — Summer 2026

Meta · New York, NY

$10K-$12K/month · Research
