Data Engineer — Training Data

OpenAI · San Francisco, CA

Data Science · Senior · Full-time
$190K-$320K · Posted 1 month ago

About the Role

OpenAI is hiring a Data Engineer to build and maintain the data pipelines that power our model training. You will develop systems for data collection, cleaning, deduplication, and quality assessment at massive scale.

Requirements

  • 5+ years of data engineering experience
  • Strong skills in Python, SQL, and distributed data processing
  • Experience with Apache Spark, Beam, or similar frameworks
  • Knowledge of data quality and governance
  • Experience with petabyte-scale data systems

Nice to Have

  • Experience with training data for language models
  • Background in web crawling and text extraction
  • Familiarity with data deduplication techniques
  • Experience with data labeling and annotation platforms

Benefits

  • Competitive equity
  • Full health benefits
  • Unlimited PTO
  • Learning budget
  • Free meals
  • 401(k) match

Skills

Data Engineering · Python · SQL · Spark · Data Pipelines · Large-Scale Systems
