Research

Towards Data-centric Interpretability with Sparse Autoencoders

Nick Jiang*, Lily Sun*, Lewis Smith, Neel Nanda

blog | summary

Vision Transformers Don’t Need Trained Registers

Nick Jiang*, Amil Dravid*, Alexei Efros, Yossi Gandelsman

NeurIPS 2025 (Spotlight, top 3% of submissions)

paper | code | summary

TLDR: we mechanistically study how attention sinks emerge, then remove them at test time.

Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations

Nick Jiang*, Anish Kachinthaya*, Suzie Petryk, Yossi Gandelsman

ICLR 2025

paper | code | summary

TLDR: we use the logit lens to identify and remove hallucinations in vision-language models.