Interpretability Researcher

Anthropic · San Francisco, CA

AI Safety · Senior · Full-time · Remote
$220K-$380K · Posted 1 month ago

About the Role

Anthropic is seeking an Interpretability Researcher to understand how neural networks represent and process information. You will develop new techniques for mechanistic interpretability, analyze model internals, and contribute to making AI systems more transparent and trustworthy.

Requirements

  • PhD in ML, neuroscience, physics, or mathematics
  • Research experience in interpretability or related fields
  • Strong mathematical foundations
  • Proficiency in Python and PyTorch/JAX
  • Ability to communicate complex ideas clearly

Nice to Have

  • Published work in mechanistic interpretability
  • Experience with sparse autoencoders or probing methods
  • Background in neuroscience or cognitive science
  • Experience with visualization tools for neural networks

Benefits

  • Top-of-market equity
  • Premium healthcare
  • Remote work option
  • Generous research budget
  • Conference and education stipend
  • Sabbatical program

Skills

Interpretability · Mechanistic Interpretability · PyTorch · Research · Mathematics · Python
