Donya Rooein

Postdoctoral Fellow
Bocconi University
donya.rooein (at) unibocconi.it

About Me

Open to academic and industry opportunities for 2026

I study how language technologies can be made more useful, fair, and human-centred, with a particular focus on education and young users.

I am currently a postdoctoral fellow in the Department of Computer Science at Bocconi University. I work with Prof. Dirk Hovy at the Milan Natural Language Processing Group (MilaNLP Lab), where I contribute to INTEGRATOR, a project studying demographic factors in language technologies and their implications for the future of conversational AI.

My research sits at the intersection of Natural Language Processing and Human-Computer Interaction, with a strong emphasis on educational applications. I am particularly interested in evaluating large language models in educational settings, especially for young users, and in understanding how culture, difficulty, fairness, and human factors shape AI systems in practice.

I am also interested in NLP for social good, including the study of biases and social norms in low-resource languages such as Farsi.

I received my Ph.D. in Information Technology Engineering from Politecnico di Milano, where I was advised by Prof. Barbara Pernici and Prof. Paolo Paolini. I was also a research visitor at ETH Zurich during the final year of my Ph.D.

My Ph.D. thesis is titled "A Scalable, Reconfigurable, and Adaptive Framework for Chatbots in Education." During my doctoral work, I focused on adaptive conversational agents and designed configurable educational chatbots that support users with different demographic backgrounds and learning needs.

Recent News

Selected Projects

KidAlign AI

Survey-based research on how children use AI across learning, creativity, social interaction, and emotional support, bringing together the perspectives of children, parents, and educators to identify safety gaps and design needs.

PATS

Personality-Aware Teaching Strategies with Large Language Model Tutors, grounded in learning science and LLM-based evaluation of tutoring strategies.

Biased Tales

A dataset for analyzing how biases influence protagonists' attributes and story elements in LLM-generated stories for children.

LegalBot

A collaboration between Politecnico di Milano and Tribunale di Milano aimed at improving access to legal information through a conversational interface.

TalkyTutor

An adaptive educational chatbot built with the aCHAT framework so teachers and other non-technical users can customize content and conversational learning flows.

Publications

A full list of my publications is available on my Google Scholar profile.

  1. Exploring Subjective Tasks in Farsi: A Survey Analysis and Evaluation of Language Models. WASSA, EACL 2026.
  2. PATS: Personality-Aware Teaching Strategies with Large Language Model Tutors. EACL 2026.
  3. Can Reasoning Help Large Language Models Capture Human Annotator Disagreement? EACL 2026.
  4. Teacher Demonstrations in a BabyLM's Zone of Proximal Development for Contingent Multi-Turn Interaction. BabyLM, EMNLP 2025.
  5. Biased Tales: Cultural and Topic Bias in Generating Children's Stories. EMNLP 2025.
  6. Co-detect: Collaborative Discovery of Edge Cases in Text Classification. EMNLP 2025.
  7. Measuring Gender Bias in Language Models in Farsi. GeBNLP, ACL 2025.
  8. Large Language Models for Education: Understanding the Needs of Stakeholders, Current Capabilities and the Path Forward. BEA, ACL 2025.
  9. Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting. BEA, ACL 2025.
  10. Are Large Language Models for Education Reliable for All Languages? BEA, ACL 2025.
  11. Can I Introduce My Boyfriend to My Grandmother? Evaluating Large Language Models' Capabilities on Iranian Social Norm Classification. NAACL 2025.

Datasets

  • PATS: Personality-aware teaching dialogues at three difficulty levels (stories & images).
  • Biased Tales: Cultural and topic bias in LLM-generated children's stories.
  • Farsi Subjective Tasks: Survey of 110 papers on sentiment, emotion, and toxicity detection in Farsi.
  • GBFA: Farsi translations of ISEAR, BBQ, and HONEST for gender bias evaluation in Farsi LMs.
  • ISN: Iranian Social Norms, 1,699 bilingual (Farsi/English) samples labelled with social norms.


Feel free to use my website's source code.