ML Engineer — Inference Optimization

Hugging Face·Paris, France

AI Engineering · Senior · Full-time · Remote
€120K-€200K · Posted 2 months ago

About the Role

Hugging Face is hiring an ML Engineer to optimize inference performance across our model hub. You will work on model optimization techniques, develop efficient serving infrastructure, and build tools that make running ML models faster and cheaper for the community.

Requirements

  • 4+ years of experience in ML engineering or systems optimization
  • Deep understanding of model inference and optimization techniques
  • Proficiency in Python, C++, and CUDA
  • Experience with model quantization, distillation, or pruning
  • Familiarity with inference frameworks (ONNX Runtime, TensorRT, vLLM)

Nice to Have

  • Contributions to Hugging Face Optimum or TGI
  • Experience with hardware-specific optimizations (ARM, Apple Silicon)
  • Background in compiler optimization
  • Experience with serverless inference platforms

Benefits

  • Equity package
  • Fully remote
  • French/EU social benefits
  • Home office budget
  • Conference budget
  • 4-day work week option

Skills

Inference Optimization · Python · C++ · CUDA · Quantization · Model Serving
