ML Engineer — Inference Optimization

Hugging Face·Paris, France

AI Engineering · Senior · Full-time · Remote
€120K-€200K · Posted 2 months ago

About the Role

Hugging Face is hiring an ML Engineer to optimize inference performance across our model hub. You will work on model optimization techniques, develop efficient serving infrastructure, and build tools that make running ML models faster and cheaper for the community.

Requirements

  • 4+ years of experience in ML engineering or systems optimization
  • Deep understanding of model inference and optimization techniques
  • Proficiency in Python, C++, and CUDA
  • Experience with model quantization, distillation, or pruning
  • Familiarity with inference frameworks (ONNX Runtime, TensorRT, vLLM)

Nice to Have

  • Contributions to Hugging Face Optimum or TGI
  • Experience with hardware-specific optimizations (ARM, Apple Silicon)
  • Background in compiler optimization
  • Experience with serverless inference platforms

Benefits

  • Equity package
  • Fully remote
  • French/EU social benefits
  • Home office budget
  • Conference budget
  • 4-day work week option

Skills

Inference Optimization · Python · C++ · CUDA · Quantization · Model Serving
