
Code language models struggle at detecting vulnerabilities

Published on:

27 March 2024

Primary Category:

Software Engineering

Paper Authors:

Yangruibo Ding,

Yanjun Fu,

Omniyyah Ibrahim,

Chawin Sitawarin,

Xinyun Chen,

Basel Alomair,

David Wagner,

Baishakhi Ray,

Yizheng Chen


Key Details

Existing vulnerability datasets suffer from poor data quality, including inaccurate labels and duplicated samples

These dataset flaws inflate the reported performance of code language models

PrimeVul provides higher-quality data and more realistic evaluation settings for vulnerability detection

State-of-the-art models fail on PrimeVul, performing no better than random guessing

Significant gaps remain before models can detect vulnerabilities reliably

AI-generated summary


This paper evaluates code language models on their ability to detect vulnerabilities in source code. It uncovers major flaws in the existing datasets used to train and evaluate these models, such as inaccurate labels and duplicated data, and it introduces PrimeVul, a new benchmark with higher-quality data and more realistic evaluation settings. When tested on PrimeVul, state-of-the-art code language models perform very poorly, with results akin to random guessing. This shows that current models are far from ready for deployment in security-critical roles and calls for more innovative research.
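To make the "akin to random guessing" comparison concrete, here is a minimal sketch of how a binary vulnerability-detection evaluation could be scored against a chance baseline. The sample data, the predict() placeholder, and all names are illustrative assumptions, not the paper's actual benchmark format or API.

import random

# Hypothetical sketch: scoring a vulnerability detector against a
# random-guessing baseline on a PrimeVul-style benchmark. The samples and
# the predict() placeholder are illustrative assumptions only.

dataset = [
    {"code": "strcpy(buf, user_input);", "label": 1},                    # 1 = vulnerable
    {"code": "strncpy(buf, user_input, sizeof(buf) - 1);", "label": 0},  # 0 = benign
    # ... a real benchmark would hold thousands of labeled functions
]

def predict(code: str) -> int:
    # Placeholder for a code language model's vulnerable/benign verdict.
    return random.randint(0, 1)

def accuracy(predict_fn, data) -> float:
    # Fraction of samples where the prediction matches the ground-truth label.
    correct = sum(predict_fn(sample["code"]) == sample["label"] for sample in data)
    return correct / len(data)

model_acc = accuracy(predict, dataset)
chance_acc = accuracy(lambda _code: random.randint(0, 1), dataset)

# The paper's finding, paraphrased: on realistic data the model's accuracy
# stays close to the chance baseline, i.e. it adds little signal over guessing.
print(f"model accuracy:  {model_acc:.2f}")
print(f"chance baseline: {chance_acc:.2f}")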
