We're looking for researchers and engineers to join our Interpretability team at Anthropic, working on mechanistic interpretability of neural networks to make them safe. Responsibilities include developing methods for understanding LLMs, designing and running robust experiments, and creating and analyzing new interpretability features and circuits.

Responsibilities

Develop methods for understanding LLMs by reverse engineering algorithms learned in their weights
Design and run robust experiments, both quickly in toy scenarios and at scale in large models
Create and analyze new interpretability features and circuits to better understand how models work
Build infrastructure for running experiments and visualizing results
Work with colleagues to communicate results internally and publicly

Benefits

Competitive compensation
Benefits
Optional equity donation matching
Generous vacation
Generous parental leave
Flexible working hours

Research Scientist, Interpretability

Job Details

About Anthropic