
Hi! I’m a Postdoctoral Fellow at the ETH AI Center, supervised by Prof. Dr. Menna El‑Assady (IVIA Lab) and Prof. Dr. Andreas Krause (LAS group) at ETH Zürich. My research focuses on evaluation-centric interpretability, large language model (LLM) alignment and AI safety.
I completed a Ph.D. in Machine Learning at TU Berlin with distinction, advised by Prof. Dr. Marina Höhne and Prof. Dr. Wojciech Samek. I hold an M.Sc. from KTH and a B.Sc. from UCL.
Previously, I held multiple ML roles in industry; most recently, I joined the AI Research Programme at J.P. Morgan, where I worked on mechanistic steering of LLMs. Before my Ph.D., I freelanced in ML, worked on credit risk at Klarna and time-series modeling at Bosch, and interned at Black Swan Data and BCG. I advise startups on AI and like to contribute to open-source software (e.g., Quantus).
📍 I'm currently based in Zürich, Switzerland.
✉️ Email: hedstroem.anna@gmail.com
Selected Research
Full list: Google Scholar
BibTeX
@inproceedings{anna2025abstention, title = {To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models}, author = {\textbf{Hedstr{\"o}m}, Anna and Amoukou, Salim and Bewley, Tom and Mishra, Saumitra and Veloso, Manuela}, booktitle={Forty-Second International Conference on Machine Learning (ICML)}, year={2025}, }
BibTeX
@article{gef2024, title={\href{https://openreview.net/forum?id=ukLxqA8zXj&noteId=5ceyt8qT4e}{Evaluating Interpretable Methods via Geometric Alignment of Functional Distortions}}, author={\textbf{Hedstr{\"o}m}, Anna and Bommer, Philine Lou and Burns, Tom and Lapuschkin, Sebastian and Samek, Wojciech and H{\"o}hne, Marina M-C}, journal={Transactions on Machine Learning Research}, note={Survey Certification}, year={2025}, }
BibTeX
@inproceedings{kopf2024cosy, title={\href{https://openreview.net/pdf?id=R0bnWrpIeN}{CoSy: Evaluating Textual Explanations of Neurons}}, author={Kopf, Laura and Bommer, Philine Lou and \textbf{Hedstr{\"o}m}, Anna and Lapuschkin, Sebastian and H{\"o}hne, Marina M-C}, booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems}, year={2024}, }
BibTeX
@inproceedings{quanda2024, title={\href{https://openreview.net/pdf?id=IFk4bOA11Z}{Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond}}, author={Bareeva, Dilyara and Yolcu, Galip Umit and \textbf{Hedstr{\"o}m}, Anna and Schmolenski, Niklas and Wiegand, Thomas and Samek, Wojciech and Lapuschkin, Sebastian}, booktitle={Second NeurIPS Workshop on Attributing Model Behavior at Scale}, year={2024}, }
BibTeX
@article{hedstrom2023quantus, title={\href{https://www.jmlr.org/papers/v24/22-0142.html}{Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond}}, author={\textbf{Hedstr{\"o}m}, Anna and Weber, Leander and Krakowczyk, Daniel and Bareeva, Dilyara and Motzkus, Franz and Samek, Wojciech and Lapuschkin, Sebastian and H{\"o}hne, Marina M-C}, journal={Journal of Machine Learning Research}, volume={24}, number={34}, pages={1--11}, year={2023}, }
BibTeX
@article{hedstrommeta, title={\href{https://openreview.net/pdf?id=j3FK00HyfU}{The Meta-Evaluation Problem in Explainable AI: Identifying Reliable Estimators with MetaQuantus}}, author={\textbf{Hedstr{\"o}m}, Anna and Bommer, Philine Lou and Wickstr{\"o}m, Kristoffer Knutsen and Samek, Wojciech and Lapuschkin, Sebastian and H{\"o}hne, Marina MC}, journal={Transactions on Machine Learning Research}, year={2023}, }
BibTeX
@inproceedings{bommer2023tutorial, title={\href{https://www.climatechange.ai/papers/iclr2023/1}{Tutorial: Quantus x Climate - Applying explainable AI evaluation in climate science}}, author={Bommer, Philine L and \textbf{Hedstr{\"o}m}, Anna and Kretschmer, Marlene and H{\"o}hne, Marina M.-C.}, booktitle={ICLR Workshop on Tackling Climate Change with Machine Learning}, note={Spotlight}, year={2023}, }
News
Sep 2025 — Started as a Postdoctoral Fellow at the ETH AI Center, working on AI safety (Zurich, CH)
Aug 2025 — Defended Ph.D. thesis in Interpretable Machine Learning at TU Berlin, with distinction!
July 2025 — Quantus community reached 60,000 downloads and 600+ stars on GitHub!
May 2025 — Paper on LLM steering accepted at ICML 2025 (Vancouver, CA)
Jan 2025 — Paper on geometric and unified evaluation awarded a Survey Certification by TMLR!
Dec 2024 — Paper on adversarial attacks accepted at the NeurIPS Interpretable AI workshop (Vancouver, CA)
Sep 2024 — Joined the AI Research Programme at J.P. Morgan (London, UK)
May 2024 — Gave a talk on LLM x interpretability at the United Nations' AI for Good Global Summit (Geneva, CH)
Feb 2024 — Gave a keynote lecture series on XAI at the Invicta School of Artificial Intelligence (Porto, PT)
Feb 2024 — Gave a webinar on applying XAI in climate science at Climate Change AI (Virtual)
Dec 2023 — Presented Quantus at the NeurIPS poster sessions (New Orleans, US)
Dec 2023 — Presented eMPRT & sMPRT at the NeurIPS XAI workshop (New Orleans, US)
Sep 2023 — Gave a talk at SFI Visual Intelligence (Virtual)
Jun 2023 — Started as a Visiting Scientist at the Fraunhofer AI Department (Berlin, DE)
May 2023 — Gave a spotlight tutorial at the ICLR Climate Change AI workshop (Kigali, RW)
Apr 2023 — Gave a talk at Physikalisch-Technische Bundesanstalt (PTB) (Berlin, DE)
Mar 2023 — Gave a lecture at SFB 1294 Spring School on Data Assimilation (Virtual)
Jan 2023 — Gave a tutorial at the Northern Lights Deep Learning Conference (NLDL) winter school (Tromsø, NO)