Backend Engineer — Model Inference

Cohere · Toronto, Canada

AI Engineering · Senior · Full-time · Remote
CA$160K-CA$250K · Posted 3 weeks ago

About the Role

Cohere is hiring a Backend Engineer to optimize model inference systems. You will build low-latency serving infrastructure, implement batching strategies, and develop the backend services that power Cohere's enterprise API.

Requirements

  • 5+ years of backend engineering experience
  • Strong proficiency in Go, Rust, or C++
  • Experience with high-throughput, low-latency systems
  • Knowledge of model serving and inference optimization
  • Experience with gRPC, REST APIs, and microservices

Nice to Have

  • Experience with vLLM, TensorRT-LLM, or similar
  • Background in distributed systems
  • Familiarity with GPU memory management
  • Experience with continuous batching techniques

Benefits

  • Equity in a well-funded AI startup
  • Premium health and dental
  • Remote-first culture
  • Home office budget
  • Annual learning stipend
  • Flexible PTO

Skills

Go · Rust · Model Serving · Backend Engineering · gRPC · Inference Optimization
