WisPerMed@PerAnsSumm 2025: Strong Reasoning Through Structured Prompting and Careful Answer Selection Enhances Perspective Extraction and Summarization of Healthcare Forum Threads

Tabea Pakull, Hendrik Damm, Henning Schäfer, Peter Horn, Christoph M. Friedrich

May 2025

Abstract

Healthcare community question-answering (CQA) forums provide multi-perspective insights into patient experiences and medical advice. Summarizations of these threads must account for these perspectives, rather than relying on a single “best” answer. This paper presents the participation of the WisPerMed team in the PerAnsSumm shared task 2025, which consists of two sub-tasks: (A) span identification and classification, and (B) perspectivebased summarization. For Task A, encoder models, decoder-based LLMs, and reasoningfocused models are evaluated under finetuning, instruction-tuning, and prompt-based paradigms. The experimental evaluations employing automatic metrics demonstrate that DeepSeek-R1 attains a high proportional recall (0.738) and F1-Score (0.676) in zero-shot settings, though strict boundary alignment remains challenging (F1-Score: 0.196). For Task B, filtering answers by labeling them with perspectives prior to summarization with Mistral-7B-v0.3 enhances summarization. This approach ensures that the model is trained exclusively on relevant data, while discarding non-essential information, leading to enhanced relevance (ROUGE-1: 0.452) and balanced factuality (SummaC: 0.296). The analysis uncovers two key limitations: data imbalance and hallucinations of decoder-based LLMs, with underrepresented perspectives exhibiting suboptimal performance. The WisPerMed team’s approach secured the highest overall ranking in the shared task.

Type

Conference paper

Publication

Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)

WisPerMed@PerAnsSumm 2025: Strong Reasoning Through Structured Prompting and Careful Answer Selection Enhances Perspective Extraction and Summarization of Healthcare Forum Threads

Abstract

Tabea Pakull

Researcher in the second cohort

Hendrik Damm

Researcher in the second cohort

Henning Schäfer

Researcher in the first cohort

Peter Horn

Principal Investigator

Christoph M. Friedrich

Co-Speaker