Standardized dataset metadata format for machine learning

Published on:

28 March 2024

Primary Category:

Machine Learning

Paper Authors:

Mubashara Akhtar, Omar Benjelloun, Costanza Conforti, Joan Giner-Miguelez, Nitisha Jain, Michael Kuchnik, Quentin Lhoest, Pierre Marcenac, Manil Maskey, Peter Mattson, Luis Oala, Pierre Ruyssen, Rajat Shinde, Elena Simperl, Goeffry Thomas, Slava Tykhonov, Joaquin Vanschoren, Steffen Vogler, Carole-Jean Wu

Key Details

Introduces Croissant: a metadata format for datasets used in ML

Makes datasets more discoverable, portable and interoperable

Addresses data management challenges and supports responsible AI

Supported by popular dataset repositories such as Hugging Face and Kaggle

Enables loading datasets directly into ML frameworks

AI generated summary

This paper introduces Croissant, a metadata format that standardizes how datasets are described to make them more portable, interoperable and usable across machine learning tools and frameworks. It addresses key data management challenges and supports responsible AI practices.
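
To make the idea of a shared dataset description more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified, Croissant-style description written as a Python dictionary; the dataset name, file name, column names, and the `iter_records` helper are all hypothetical, the JSON-LD `@context` is omitted, and the description has not been validated against the Croissant specification. The toy loader shows how a tool could read records using only the metadata, without knowing the file layout in advance.

```python
import csv
import io
import json

# Hypothetical, simplified Croissant-style description. The property names
# follow the general shape of Croissant JSON-LD, but the @context is omitted
# and this dictionary is not a validated Croissant file.
metadata = {
    "@type": "sc:Dataset",
    "name": "toy_reviews",
    "description": "Illustrative two-row dataset.",
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "reviews.csv",
            "encodingFormat": "text/csv",
        }
    ],
    "recordSet": [
        {
            "@type": "cr:RecordSet",
            "name": "reviews",
            "field": [
                {
                    "@type": "cr:Field",
                    "name": "text",
                    "dataType": "sc:Text",
                    "source": {
                        "fileObject": {"@id": "reviews.csv"},
                        "extract": {"column": "review_text"},
                    },
                },
                {
                    "@type": "cr:Field",
                    "name": "label",
                    "dataType": "sc:Integer",
                    "source": {
                        "fileObject": {"@id": "reviews.csv"},
                        "extract": {"column": "rating"},
                    },
                },
            ],
        }
    ],
}

# Stand-in for file contents, so the sketch runs without any downloads.
FILES = {"reviews.csv": "review_text,rating\nGreat product,5\nBroke quickly,1\n"}

# Map the declared data types to Python converters (only two types covered).
CASTS = {"sc:Text": str, "sc:Integer": int}


def iter_records(meta, record_set_name, files):
    """Yield typed records for one record set, driven only by the metadata."""
    record_set = next(
        rs for rs in meta["recordSet"] if rs["name"] == record_set_name
    )
    fields = record_set["field"]
    # Toy assumption: every field of the record set comes from the same file.
    file_id = fields[0]["source"]["fileObject"]["@id"]
    reader = csv.DictReader(io.StringIO(files[file_id]))
    for row in reader:
        yield {
            f["name"]: CASTS[f["dataType"]](row[f["source"]["extract"]["column"]])
            for f in fields
        }


records = list(iter_records(metadata, "reviews", FILES))
print(json.dumps(records, indent=2))
```

Because the field names, types, and source columns live in the description rather than in the loading code, the same loader would work for any dataset described this way, which is the kind of portability the summary refers to.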
