
Evaluating tool usage skills of language models

Published on:

21 December 2023

Primary Category:

Computation and Language

Paper Authors:

Zehui Chen, Weihua Du, Wenwei Zhang, Kuikun Liu, Jiangning Liu, Miao Zheng, Jingming Zhuo, Songyang Zhang, Dahua Lin, Kai Chen, Feng Zhao


Key Details

Decomposes tool usage evaluation into sub-skills like planning and reasoning

Introduces the T-Eval benchmark, with tailored metrics for each sub-skill

Analysis of various models reveals bottlenecks and consistent trends

Provides a new perspective on improving tool-utilization capabilities

Benchmark dataset and code are publicly available

AI-generated summary


This paper introduces T-Eval, a benchmark that evaluates the tool-usage skills of large language models step by step. It decomposes tool usage into six sub-skills: following instructions, planning, reasoning, retrieval, understanding tool responses, and reviewing results. Experiments on various models reveal current limitations and show that the step-by-step scores are consistent with overall, outcome-oriented performance, offering guidance for developing more capable tool agents.
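To make the step-by-step idea concrete, here is a minimal Python sketch; it is not T-Eval's actual code or metrics. It assumes each sub-skill yields a score in [0, 1], uses a hypothetical exact-match metric for the retrieval step, and combines the six scores with an unweighted mean. The names `SUB_SKILLS`, `retrieve_score`, `overall_score`, and `weakest_skill`, as well as the sample scores, are illustrative assumptions.

```python
from statistics import mean

# Hypothetical identifiers for the six sub-skills named in the summary;
# T-Eval's own names and per-skill metrics may differ.
SUB_SKILLS = ("instruct", "plan", "reason", "retrieve", "understand", "review")


def retrieve_score(predicted_tool: str, gold_tool: str) -> float:
    """Illustrative retrieval metric (assumption): exact match on the tool name."""
    return 1.0 if predicted_tool.strip() == gold_tool.strip() else 0.0


def overall_score(per_skill: dict[str, float]) -> float:
    """Combine per-sub-skill scores; the unweighted mean is an assumption."""
    missing = set(SUB_SKILLS) - per_skill.keys()
    if missing:
        raise ValueError(f"missing sub-skill scores: {sorted(missing)}")
    return mean(per_skill[s] for s in SUB_SKILLS)


def weakest_skill(per_skill: dict[str, float]) -> str:
    """Return the lowest-scoring sub-skill, i.e. the likely bottleneck."""
    return min(SUB_SKILLS, key=lambda s: per_skill[s])


# Made-up scores for a single hypothetical model, for illustration only.
scores = {
    "instruct": 0.9,
    "plan": 0.7,
    "reason": 0.6,
    "retrieve": retrieve_score("WeatherAPI", "WeatherAPI"),
    "understand": 0.8,
    "review": 0.5,
}

print(f"overall: {overall_score(scores):.2f}")
print("bottleneck:", weakest_skill(scores))  # prints "bottleneck: review"
```

Reporting the lowest-scoring sub-skill alongside the aggregate is one simple way to surface the kind of bottleneck that a single end-to-end success rate would hide.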
