Evaluating language models for medical question answering

Published on: 8 April 2024

Primary Category: Computation and Language

Paper Authors: Iñigo Alonso, Maite Oronoz, Rodrigo Agerri


Key Details

MedExpQA benchmark to assess language models for medical QA

Includes gold reference explanations from doctors

Models struggle with outdated knowledge and hallucinations

Performance much worse for non-English languages

Still substantial room for improvement

AI generated summary

This paper introduces MedExpQA, a new multilingual benchmark for assessing large language models on medical question answering. It includes reference explanations written by medical doctors, which are used both to evaluate the models' reasoning and as gold knowledge for comparison. The results show that current models still struggle with outdated knowledge, hallucinated content, and languages other than English, leaving substantial room for improvement.
