Interpretability Researcher

Anthropic · San Francisco, CA

AI Safety · Senior · Full-time · Remote
$220K-$380K · Posted 1 month ago

About the Role

Anthropic is seeking an Interpretability Researcher to understand how neural networks represent and process information. You will develop new techniques for mechanistic interpretability, analyze model internals, and contribute to making AI systems more transparent and trustworthy.

Requirements

  • PhD in ML, neuroscience, physics, or mathematics
  • Research experience in interpretability or related fields
  • Strong mathematical foundations
  • Proficiency in Python and PyTorch/JAX
  • Ability to communicate complex ideas clearly

Nice to Have

  • Published work in mechanistic interpretability
  • Experience with sparse autoencoders or probing methods
  • Background in neuroscience or cognitive science
  • Experience with visualization tools for neural networks

Benefits

  • Top-of-market equity
  • Premium healthcare
  • Remote work option
  • Generous research budget
  • Conference and education stipend
  • Sabbatical program

Skills

Interpretability · Mechanistic Interpretability · PyTorch · Research · Mathematics · Python
