ML Engineer — Evaluation

Scale AI · San Francisco, CA

Machine Learning · Mid · Full-time
$190K-$300K · Posted 3 weeks ago

About the Role

Scale AI is seeking an ML Engineer to build automated evaluation systems for large language models. You will develop benchmarks, create model comparison tools, and design metrics that accurately measure model capabilities across diverse tasks.

Requirements

  • 3+ years of ML engineering experience
  • Strong skills in Python and statistical analysis
  • Experience with LLM evaluation methodologies
  • Understanding of NLP benchmarks and metrics
  • Strong software engineering fundamentals

Nice to Have

  • Experience building leaderboards or benchmark platforms
  • Background in psychometrics or test design
  • Familiarity with Scale AI's evaluation platform
  • Experience with multi-turn evaluation frameworks

Benefits

  • Equity in a fast-growing AI company
  • Full benefits package
  • In-office perks (meals, gym)
  • Professional development budget
  • Annual team retreats
  • Commuter benefits

Skills

Python · LLM Evaluation · NLP · Benchmarking · Statistical Analysis · ML Engineering
