Backend Engineer — Model Inference

Cohere · Toronto, Canada

AI Engineering · Senior · Full-time · Remote

CA$160K-CA$250K · Posted 3 weeks ago

About the Role

Cohere is hiring a Backend Engineer to optimize model inference systems. You will build low-latency serving infrastructure, implement batching strategies, and develop the backend services that power Cohere's enterprise API.

Requirements

5+ years of backend engineering experience
Strong proficiency in Go, Rust, or C++
Experience with high-throughput, low-latency systems
Knowledge of model serving and inference optimization
Experience with gRPC, REST APIs, and microservices

Nice to Have

Experience with vLLM, TensorRT-LLM, or similar
Background in distributed systems
Familiarity with GPU memory management
Experience with continuous batching techniques

Benefits

Equity in a well-funded AI startup
Premium health and dental
Remote-first culture
Home office budget
Annual learning stipend
Flexible PTO

Skills

Go · Rust · Model Serving · Backend Engineering · gRPC · Inference Optimization
