Discriminative machine learning for maximal representative subsampling

Hauptmann, Tony; Fellenz, Sophie; Nathan, Laksan; Tüscher, Oliver; Kramer, Stefan

dc.contributor.author	Hauptmann, Tony
dc.contributor.author	Fellenz, Sophie
dc.contributor.author	Nathan, Laksan
dc.contributor.author	Tüscher, Oliver
dc.contributor.author	Kramer, Stefan
dc.date.accessioned	2023-12-20T11:28:44Z
dc.date.available	2023-12-20T11:28:44Z
dc.date.issued	2023
dc.description.abstract	Biased population samples pose a prevalent problem in the social sciences. Therefore, we present two novel methods that are based on positive-unlabeled learning to mitigate bias. Both methods leverage auxiliary information from a representative data set and train machine learning classifiers to determine the sample weights. The first method, named maximum representative subsampling (MRS), uses a classifier to iteratively remove instances from the biased data set, by assigning them a sample weight of 0, until it aligns with the representative one. The second method, Soft-MRS, is a variant of MRS that iteratively adapts sample weights instead of removing samples completely. To assess the effectiveness of our approach, we induced artificial bias in a public census data set and examined the corrected estimates. We compare the performance of our methods against existing techniques, evaluating the ability of sample weights created with Soft-MRS or MRS to minimize differences and improve downstream classification tasks. Lastly, we demonstrate the applicability of the proposed methods in a real-world study of resilience research, exploring the influence of resilience on voting behavior. Through our work, we address the issue of bias in social science, amongst others, and provide a versatile methodology for bias reduction based on machine learning. Based on our experiments, we recommend using MRS for downstream classification tasks and Soft-MRS for downstream tasks where the relative bias of the dependent variable is relevant.	en_GB
dc.identifier.doi	http://doi.org/10.25358/openscience-9834
dc.identifier.uri	https://openscience.ub.uni-mainz.de/handle/20.500.12030/9852
dc.language.iso	eng	de
dc.rights	CC-BY-4.0	*
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	*
dc.subject.ddc	004 Informatik	de_DE
dc.subject.ddc	004 Data processing	en_GB
dc.title	Discriminative machine learning for maximal representative subsampling	en_GB
dc.type	Zeitschriftenaufsatz (journal article)	de
jgu.journal.title	Scientific reports	de
jgu.journal.volume	13	de
jgu.organisation.department	FB 08 Physik, Mathematik u. Informatik	de
jgu.organisation.name	Johannes Gutenberg-Universität Mainz
jgu.organisation.number	7940
jgu.organisation.place	Mainz
jgu.organisation.ror	https://ror.org/023b0x485
jgu.pages.alternative	20925	de
jgu.publisher.doi	10.1038/s41598-023-48177-3	de
jgu.publisher.issn	2045-2322	de
jgu.publisher.name	Macmillan Publishers Limited, part of Springer Nature	de
jgu.publisher.place	London	de
jgu.publisher.year	2023
jgu.rights.accessrights	openAccess
jgu.subject.ddccode	004	de
jgu.subject.dfg	Ingenieurwissenschaften (engineering sciences)	de
jgu.type.dinitype	Article	en_GB
jgu.type.resource	Text	de
jgu.type.version	Published version	de
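
The abstract describes MRS as a loop in which a classifier repeatedly separates the biased sample from the representative one and the most "biased-looking" instances receive a sample weight of 0. The following is a minimal sketch of that idea only, not the authors' implementation. Everything specific in it is an assumption made for illustration: the hand-written logistic regression used as the classifier, the AUC-based stopping rule evaluated on the training data, and the hypothetical parameters `drop`, `auc_target` and `max_iter`. Soft-MRS, which adapts weights instead of zeroing them, is not covered.

```python
import math

def predict(w, b, x):
    """Probability that x belongs to the biased sample."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, steps=300, lr=0.1):
    """Plain gradient-descent logistic regression (stand-in classifier)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def auc(pos, neg):
    """Chance that a random positive scores above a random negative."""
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def mrs_sketch(biased, representative, drop=5, auc_target=0.55, max_iter=100):
    """Return 0/1 sample weights for `biased` (assumed stopping rule: AUC)."""
    weights = [1] * len(biased)
    for _ in range(max_iter):
        kept = [i for i, wt in enumerate(weights) if wt == 1]
        if len(kept) <= drop:
            break
        # Label remaining biased instances 1, representative instances 0.
        X = [biased[i] for i in kept] + list(representative)
        y = [1] * len(kept) + [0] * len(representative)
        w, b = fit_logistic(X, y)
        scores = {i: predict(w, b, biased[i]) for i in kept}
        rep_scores = [predict(w, b, x) for x in representative]
        # Stop once the classifier can barely tell the two samples apart.
        if auc(list(scores.values()), rep_scores) <= auc_target:
            break
        # Zero the weight of the instances that look most "biased".
        for i in sorted(kept, key=scores.get, reverse=True)[:drop]:
            weights[i] = 0
    return weights

# Toy data: the biased sample over-represents large feature values.
representative = [[i / 40] for i in range(40)]
biased = [[i / 40] for i in range(40)] + [[2 + j / 20] for j in range(20)]
weights = mrs_sketch(biased, representative)
```

In the toy data the over-represented instances are the last 20 entries of `biased`, so those are the ones the loop is expected to down-weight. A realistic setup would score instances with held-out predictions (for example via cross-validation) rather than on the training data, and would use a stronger classifier.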

Files

Original bundle

Name: discriminative_machine_learni-20231215102324854.pdf
Size: 2.42 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.57 KB
Format: Item-specific license agreed upon to submission

Collections

DFG-491381577-G