
Hi! I’m a Postdoctoral Fellow at the ETH AI Center, supervised by Prof. Dr. Menna El‑Assady (IVIA Lab) and Prof. Dr. Andreas Krause (LAS group) at ETH Zürich. My research focuses on evaluation-centric interpretability, large language model (LLM) alignment, and AI safety.

I completed a Ph.D. in Machine Learning at TU Berlin with distinction, advised by Prof. Dr. Marina Höhne and Prof. Dr. Wojciech Samek. I hold an M.Sc. from KTH and a B.Sc. from UCL.

Previously, I held multiple ML roles across industry; most recently, I joined the AI Research Programme at J.P. Morgan, working on mechanistic steering of LLMs. Before my Ph.D., I freelanced in ML, worked on credit risk at Klarna and time‑series modeling at Bosch, and interned at Black Swan Data and BCG. I advise startups on AI and contribute to open‑source software (e.g., Quantus).

📍 I'm currently based in Zürich, Switzerland.

✉️ Email: hedstroem.anna@gmail.com

Selected Research

Full list: Google Scholar

To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models
Hedström A., Amoukou S. I., Bewley T., Mishra S., Veloso M.
ICML, 2025.
Paper Code
BibTeX
@inproceedings{anna2025abstention,
  title={To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models},
  author={Hedstr{\"o}m, Anna and Amoukou, Salim and Bewley, Tom and Mishra, Saumitra and Veloso, Manuela},
  booktitle={Forty-Second International Conference on Machine Learning (ICML)},
  year={2025},
}


(Survey certification!) Evaluating Interpretable Methods via Geometric Alignment of Functional Distortions
Hedström A., Bommer P. L., Burns T. F., Lapuschkin S., Samek W., Höhne M.
TMLR, 2025.
Paper Code
BibTeX
@article{gef2024,
  title={Evaluating Interpretable Methods via Geometric Alignment of Functional Distortions},
  author={Hedstr{\"o}m, Anna and Bommer, Philine Lou and Burns, Tom F. and Lapuschkin, Sebastian and Samek, Wojciech and H{\"o}hne, Marina M-C},
  journal={Transactions on Machine Learning Research},
  year={2025},
}


CoSy: Evaluating Textual Explanations of Neurons
Kopf L., Bommer P. L., Hedström A., Lapuschkin S., Höhne M.
NeurIPS, 2024.
Paper Code
BibTeX
@inproceedings{kopf2024cosy,
  title={CoSy: Evaluating Textual Explanations of Neurons},
  author={Kopf, Laura and Bommer, Philine Lou and Hedstr{\"o}m, Anna and Lapuschkin, Sebastian and H{\"o}hne, Marina M-C},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
}


Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond
Bareeva D., Yolcu G. U., Hedström A., Schmolenski N., Wiegand T., Samek W., Lapuschkin S.
NeurIPS Workshop on Attributing Model Behavior at Scale, 2024.
Paper Code
BibTeX
@inproceedings{quanda2024,
  title={Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond},
  author={Bareeva, Dilyara and Yolcu, Galip Umit and Hedstr{\"o}m, Anna and Schmolenski, Niklas and Wiegand, Thomas and Samek, Wojciech and Lapuschkin, Sebastian},
  booktitle={Second NeurIPS Workshop on Attributing Model Behavior at Scale},
  year={2024},
}


Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond
Hedström A., Weber L., Krakowczyk D., Bareeva D., Motzkus F., Samek W., Lapuschkin S., Höhne M.
JMLR, 2023.
Paper Code
BibTeX
@article{hedstrom2023quantus,
  title={Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond},
  author={Hedstr{\"o}m, Anna and Weber, Leander and Krakowczyk, Daniel and Bareeva, Dilyara and Motzkus, Franz and Samek, Wojciech and Lapuschkin, Sebastian and H{\"o}hne, Marina M-C},
  journal={Journal of Machine Learning Research},
  volume={24},
  number={34},
  pages={1--11},
  year={2023},
}

The Meta-Evaluation Problem in Explainable AI: Identifying Reliable Estimators with MetaQuantus
Hedström A., Bommer P. L., Wickström K. K., Samek W., Lapuschkin S., Höhne M.
TMLR, 2023.
Paper Code
BibTeX
@article{hedstrommeta,
  title={The Meta-Evaluation Problem in Explainable AI: Identifying Reliable Estimators with MetaQuantus},
  author={Hedstr{\"o}m, Anna and Bommer, Philine Lou and Wickstr{\"o}m, Kristoffer Knutsen and Samek, Wojciech and Lapuschkin, Sebastian and H{\"o}hne, Marina M-C},
  journal={Transactions on Machine Learning Research},
  year={2023},
}
(Spotlight!) Tutorial: Quantus x Climate — Applying Explainable AI Evaluation in Climate Science
Bommer P. L.*, Hedström A.*, Kretschmer M., Höhne M. M.-C.
ICLR Workshop on Climate Change AI, 2023.
Paper Code
BibTeX
@inproceedings{bommer2023tutorial,
  title={Tutorial: Quantus x Climate - Applying Explainable AI Evaluation in Climate Science},
  author={Bommer, Philine Lou and Hedstr{\"o}m, Anna and Kretschmer, Marlene and H{\"o}hne, Marina M-C},
  booktitle={ICLR Workshop on Tackling Climate Change with Machine Learning},
  year={2023},
}

News

Sep 2025 — Started as a Postdoctoral Fellow at the ETH AI Center, working on AI safety (Zurich, CH)

Aug 2025 — Defended Ph.D. thesis in Interpretable Machine Learning at TU Berlin, with distinction!

Jul 2025 — Quantus community reached 60,000 downloads and 600+ stars on GitHub!

May 2025 — Paper on LLM steering accepted at ICML 2025 (Vancouver, CA)

Jan 2025 — Paper on geometric and unified evaluation awarded a Survey Certification by TMLR!

Dec 2024 — Paper on adversarial attacks accepted at NeurIPS Workshop Interpretable AI (New Orleans, US)

Sep 2024 — Started AI Research Programme at J.P. Morgan (London, UK)

May 2024 — Gave a talk on LLM x interpretability at the United Nations' AI for Good Global Summit (Geneva, CH)

Feb 2024 — Gave a keynote lecture series on XAI at the Invicta School of Artificial Intelligence (Porto, PT)

Feb 2024 — Gave a webinar on applying XAI in climate science at Climate Change AI (Virtual)

Dec 2023 — Presented Quantus in NeurIPS poster sessions (New Orleans, US)

Dec 2023 — Presented eMPRT & sMPRT at NeurIPS XAI workshop (New Orleans, US)

Sep 2023 — Gave a talk at SFI Visual Intelligence (Virtual)

Jun 2023 — Started as a Visiting Scientist at the Fraunhofer AI Department (Berlin, DE)

May 2023 — Gave a spotlight tutorial at ICLR Climate Change AI (Kigali, RW)

Apr 2023 — Gave a talk at Physikalisch-Technische Bundesanstalt (PTB) (Berlin, DE)

Mar 2023 — Gave a lecture at SFB 1294 Spring School on Data Assimilation (Virtual)

Jan 2023 — Gave a tutorial at the NLDL Conference winter school (Tromsø, NO)