
Hi! I’m a Postdoctoral Fellow at the ETH AI Center, supervised by Prof. Dr. Menna El‑Assady (IVIA Lab) and Prof. Dr. Andreas Krause (LAS group) at ETH Zürich. My research focuses on evaluation-centric interpretability, large language model (LLM) alignment and AI safety.
I completed a Ph.D. in Machine Learning at TU Berlin with distinction, advised by Prof. Dr. Marina Höhne and Prof. Dr. Wojciech Samek. I hold an M.Sc. from KTH and a B.Sc. from UCL.
Previously, I held multiple ML roles in industry; most recently, I joined the AI Research Programme at J.P. Morgan, where I worked on mechanistic steering of LLMs. Before my Ph.D., I freelanced in ML, worked on credit risk at Klarna and time-series modeling at Bosch, and interned at Black Swan Data and BCG. I advise startups on AI and like to contribute to open-source software (e.g., Quantus).
📍 I'm currently based in Zürich, Switzerland.
✉️ Email: hedstroem.anna@gmail.com
Selected Research
Full list: Google Scholar
BibTeX
@inproceedings{anna2025abstention, title = {To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models}, author = {\textbf{Hedstr{\"o}m}, Anna and Amoukou, Salim and Bewley, Tom and Mishra, Saumitra and Veloso, Manuela}, booktitle={Forty-Second International Conference on Machine Learning (ICML)}, year={2025}, }
BibTeX
@article{gef2024, title={\href{https://openreview.net/forum?id=ukLxqA8zXj&noteId=5ceyt8qT4e}{Evaluating Interpretable Methods via Geometric Alignment of Functional Distortions}}, author={\textbf{Hedstr{\"o}m}, Anna and Bommer, Philine Lou and Burns, Tom and Lapuschkin, Sebastian and Samek, Wojciech and H{\"o}hne, Marina M-C}, journal={Transactions on Machine Learning Research}, note={Survey Certification}, year={2025}, }
BibTeX
@inproceedings{kopf2024cosy, title={\href{https://openreview.net/pdf?id=R0bnWrpIeN}{CoSy: Evaluating Textual Explanations of Neurons}}, author={Kopf, Laura and Bommer, Philine Lou and \textbf{Hedstr{\"o}m}, Anna and Lapuschkin, Sebastian and H{\"o}hne, Marina M-C}, booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems}, year={2024}, }
BibTeX
@inproceedings{quanda2024, title={\href{https://openreview.net/pdf?id=IFk4bOA11Z}{Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond}}, author={Bareeva, Dilyara and Yolcu, Galip Umit and \textbf{Hedstr{\"o}m}, Anna and Schmolenski, Niklas and Wiegand, Thomas and Samek, Wojciech and Lapuschkin, Sebastian}, booktitle={Second NeurIPS Workshop on Attributing Model Behavior at Scale}, year={2024}, }
BibTeX
@article{hedstrom2023quantus, title={\href{https://www.jmlr.org/papers/v24/22-0142.html}{Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond}}, author={\textbf{Hedstr{\"o}m}, Anna and Weber, Leander and Krakowczyk, Daniel and Bareeva, Dilyara and Motzkus, Franz and Samek, Wojciech and Lapuschkin, Sebastian and H{\"o}hne, Marina M-C}, journal={Journal of Machine Learning Research}, volume={24}, number={34}, pages={1--11}, year={2023}, }
BibTeX
@article{hedstrommeta, title={\href{https://openreview.net/pdf?id=j3FK00HyfU}{The Meta-Evaluation Problem in Explainable AI: Identifying Reliable Estimators with MetaQuantus}}, author={\textbf{Hedstr{\"o}m}, Anna and Bommer, Philine Lou and Wickstr{\"o}m, Kristoffer Knutsen and Samek, Wojciech and Lapuschkin, Sebastian and H{\"o}hne, Marina MC}, journal={Transactions on Machine Learning Research}, year={2023}, }
BibTeX
@inproceedings{bommer2023tutorial, title={\href{https://www.climatechange.ai/papers/iclr2023/1}{Tutorial: Quantus x Climate - Applying explainable AI evaluation in climate science}}, author={Bommer, Philine L and \textbf{Hedstr{\"o}m}, Anna and Kretschmer, Marlene and H{\"o}hne, Marina M.-C.}, booktitle={ICLR Workshop on Tackling Climate Change with Machine Learning}, note={Spotlight}, year={2023}, }
News
Sep 2025 — Started as a Postdoctoral Fellow at the ETH AI Center, working on AI safety (Zurich, CH)
Aug 2025 — Defended Ph.D. thesis in Interpretable Machine Learning at TU Berlin, with distinction!
July 2025 — Quantus community reached 60,000 downloads and 600+ stars on GitHub!
May 2025 — Paper on LLM steering accepted at ICML 2025 (Vancouver, CA)
Jan 2025 — Paper on geometric and unified evaluation awarded a Survey Certification by TMLR!
Dec 2024 — Paper on adversarial attacks accepted at the NeurIPS Interpretable AI workshop (Vancouver, CA)
Sep 2024 — Joined the AI Research Programme at J.P. Morgan (London, UK)
May 2024 — Gave a talk on LLM x interpretability at the United Nations' AI for Good Global Summit (Geneva, CH)
Feb 2024 — Gave a keynote lecture series on XAI at the Invicta School of Artificial Intelligence (Porto, PT)
Feb 2024 — Gave a webinar on applying XAI in climate science at Climate Change AI (Virtual)
Dec 2023 — Presented Quantus at the NeurIPS poster sessions (New Orleans, US)
Dec 2023 — Presented eMPRT & sMPRT at the NeurIPS XAI workshop (New Orleans, US)
Sep 2023 — Gave a talk at SFI Visual Intelligence (Virtual)
Jun 2023 — Started as a Visiting Scientist at the Fraunhofer AI Department (Berlin, DE)
May 2023 — Gave a spotlight tutorial at the ICLR Climate Change AI workshop (Kigali, RW)
Apr 2023 — Gave a talk at Physikalisch-Technische Bundesanstalt (PTB) (Berlin, DE)
Mar 2023 — Gave a lecture at SFB 1294 Spring School on Data Assimilation (Virtual)
Jan 2023 — Gave a tutorial at the Northern Lights Deep Learning Conference (NLDL) winter school (Tromsø, NO)