Research


Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit

Nick Jiang*, Lily Sun*, Lisa Dunlap, Lewis Smith, Neel Nanda

NeurIPS Mech Interp Workshop 2025 (Spotlight)

paper | blog | summary

TLDR: we show that sparse autoencoders are a versatile toolkit for analyzing textual data.


Vision Transformers Don’t Need Trained Registers

Nick Jiang*, Amil Dravid*, Alexei Efros, Yossi Gandelsman

NeurIPS 2025 (Spotlight - top 3% of submissions)

paper | code | summary

TLDR: we find a sparse mechanism that causes attention sinks, then remove them at test time.


Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations

Nick Jiang*, Anish Kachinthaya*, Suzie Petryk, Yossi Gandelsman

ICLR 2025

paper | code | summary

TLDR: we use the logit lens to identify and remove hallucinations from vision-language models.