
Uncovering Hidden Biases: How Learning from Incomplete Data Can Improve Machine Learning

Published on: 24 August 2018

Primary Category: Machine Learning

Paper Authors: Yeounoh Chung, Peter J. Haas, Eli Upfal, Tim Kraska

Key Details

Training data often suffers from sampling bias or covariate shift, leading to poor generalization.

We introduce the notion of 'unknown unknowns': examples missing from the training data because of a mismatch between the training and test distributions.

Novel techniques leverage multiple overlapping data sources to estimate both the number and the likely values of the missing examples.

Augmenting the training set with the estimated missing data improves the model's ability to generalize.

Experiments on simulated and real-world datasets validate the proposed techniques.

AI generated summary

This paper proposes novel techniques to account for 'unknown unknowns' - data missing from training sets due to sampling bias or covariate shift - in order to improve machine learning model generalization. The key idea is to leverage multiple overlapping data sources to estimate the number and likely values of missing examples, without access to test data. Experiments on simulated and real-world datasets indicate the proposed techniques can significantly enhance model performance on new data.
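To make the core idea concrete, here is a minimal sketch of how overlapping data sources can reveal how many examples are missing. This is an illustration only, not the paper's exact method: it uses the classic Lincoln-Petersen capture-recapture estimator, which assumes two sources independently sample the same underlying population. The function name and the example counts are hypothetical.

```python
# Illustrative sketch (not the paper's exact technique): a Lincoln-Petersen
# capture-recapture estimate of the total population size, given two
# overlapping data sources that each observed part of it.

def lincoln_petersen(n1, n2, overlap):
    """Estimate total population size from two overlapping samples.

    n1, n2  -- number of distinct examples seen by source 1 and source 2
    overlap -- number of examples seen by both sources
    """
    if overlap == 0:
        raise ValueError("sources must overlap for this estimator to apply")
    return n1 * n2 / overlap

# Hypothetical counts: source A saw 80 examples, source B saw 60,
# and 40 examples appear in both.
total_est = lincoln_petersen(80, 60, 40)
observed = 80 + 60 - 40          # distinct examples actually observed
missing_est = total_est - observed

print(total_est)    # 120.0
print(missing_est)  # 20.0 -> estimated count of 'unknown unknowns'
```

The estimate of roughly 20 unseen examples is what the paper's techniques would then try to account for when augmenting the training set; the paper itself goes further by also estimating the likely values of the missing examples.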
