Paper Title:
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Published on:
2 May 2024
Primary Category:
Computation and Language
Paper Authors:
Seungone Kim,
Juyoung Suk,
Shayne Longpre,
Bill Yuchen Lin,
Jamin Shin,
Sean Welleck,
Graham Neubig,
Moontae Lee,
Kyungjae Lee,
Minjoon Seo
Outperforms existing open-source evaluators
Closely matches scores from humans and GPT-4
Performs both direct assessment and pairwise ranking
Incorporates flexible custom evaluation criteria
Models, code, and data publicly available
An open-source evaluator model
This paper introduces Prometheus 2, an open-source language model specialized for evaluating text generated by other language models. It outperforms existing open-source evaluators, producing scores and rankings that closely match those of humans and GPT-4, and it supports evaluation against custom criteria rather than helpfulness alone.
Evaluating large language models for assisting programmers
Efficient ranking of text options through selective pairwise comparisons
Evaluating language models through competitive contests
Evaluating text-to-3D models with GPT-4V
Scalable Judges for Evaluating Language Models
Reliability of Large Language Models for Factual Knowledge