Multimodal ML Researcher

Google DeepMind · Mountain View, CA

Research · Senior · Full-time
$230K-$400K · Posted 2 months ago

About the Role

Google DeepMind seeks a Multimodal ML Researcher to advance models that understand and generate content across text, image, video, and audio. You will develop new architectures and training methods for the Gemini model family.

Requirements

  • PhD in CS or ML with multimodal focus
  • Strong publication record at top venues
  • Experience with vision-language models
  • Deep expertise in PyTorch or JAX
  • Experience with large-scale model training

Nice to Have

  • Experience with video understanding
  • Background in audio or speech processing
  • Experience with cross-modal retrieval
  • Knowledge of embodied AI or robotics

Benefits

  • Google-level total compensation
  • World-class research environment
  • Unlimited compute for research
  • Conference travel budget
  • 20% personal project time
  • Relocation package

Skills

Multimodal ML · Vision-Language · JAX · Research · Deep Learning · Python
