An open-source evaluator model

Paper Title: Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Published on: 2 May 2024

Primary Category: Computation and Language

Paper Authors: Seungone Kim, Juyoung Suk, Shayne Longpre, Bill Yuchen Lin, Jamin Shin, Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo

Key Details

• Outperforms existing open-source evaluators
• Closely matches scores from humans and GPT-4
• Performs both direct assessment and pairwise ranking
• Incorporates flexible custom evaluation criteria
• Models, code, and data publicly available

Topics: evaluation, language models, metrics, rankings, text quality

AI generated summary

This paper introduces Prometheus 2, an open-source language model specialized in evaluating text generated by other language models. According to the paper, it outperforms existing open-source evaluators, and its scores and rankings closely match those of human judges and GPT-4. It supports both direct assessment and pairwise ranking, and it accepts custom evaluation criteria rather than being limited to helpfulness.
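
The two evaluation modes listed under Key Details (direct assessment and pairwise ranking) can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption, not the Prometheus 2 interface: the `Criterion` class, the prompt wording, the 1 to 5 score scale, and the `[RESULT]` marker are hypothetical choices made for this example, and the evaluator model's reply is supplied as a plain string instead of coming from a real model call.

```python
import re
from dataclasses import dataclass


@dataclass
class Criterion:
    """A custom evaluation criterion (hypothetical structure for this sketch)."""
    name: str
    description: str


def build_direct_prompt(instruction: str, response: str, criterion: Criterion) -> str:
    """Direct assessment: request an absolute score for a single response."""
    return (
        f"Instruction: {instruction}\n"
        f"Response: {response}\n"
        f"Criterion ({criterion.name}): {criterion.description}\n"
        # Assumed convention: a 1-5 scale reported after a [RESULT] marker.
        "Give brief feedback, then end with '[RESULT] <score from 1 to 5>'."
    )


def build_pairwise_prompt(
    instruction: str, response_a: str, response_b: str, criterion: Criterion
) -> str:
    """Pairwise ranking: request a preference between two responses."""
    return (
        f"Instruction: {instruction}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        f"Criterion ({criterion.name}): {criterion.description}\n"
        "Give brief feedback, then end with '[RESULT] A' or '[RESULT] B'."
    )


def parse_direct(output: str) -> int:
    """Extract the trailing 1-5 score from an evaluator reply."""
    match = re.search(r"\[RESULT\]\s*([1-5])\s*$", output.strip())
    if match is None:
        raise ValueError("no score found in evaluator output")
    return int(match.group(1))


def parse_pairwise(output: str) -> str:
    """Extract the trailing preference ('A' or 'B') from an evaluator reply."""
    match = re.search(r"\[RESULT\]\s*([AB])\s*$", output.strip())
    if match is None:
        raise ValueError("no preference found in evaluator output")
    return match.group(1)


if __name__ == "__main__":
    conciseness = Criterion("conciseness", "Is the answer free of filler?")
    print(build_direct_prompt("Explain DNS.", "DNS maps names to IP addresses.", conciseness))
    # Stand-in replies; a real setup would obtain these from an evaluator model.
    print(parse_direct("Clear and short.\n[RESULT] 4"))
    print(parse_pairwise("B is more concise. [RESULT] B"))
```

The only difference between the two modes in this sketch is the shape of the input (one response versus two) and of the verdict (a score versus a preference). The criterion is passed in the same way for both, which is how a custom criterion such as conciseness can stand in for helpfulness.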