Please use this identifier to cite or link to this item: http://doi.org/10.25358/openscience-9834
Authors: Hauptmann, Tony
Fellenz, Sophie
Nathan, Laksan
Tüscher, Oliver
Kramer, Stefan
Title: Discriminative machine learning for maximal representative subsampling
Online publication date: 20-Dec-2023
Issue date: 2023
Language: English
Abstract: Biased population samples pose a prevalent problem in the social sciences. We therefore present two novel methods, based on positive-unlabeled learning, to mitigate such bias. Both methods leverage auxiliary information from a representative data set and train machine learning classifiers to determine sample weights. The first method, maximum representative subsampling (MRS), uses a classifier to iteratively remove instances from the biased data set, by assigning them a sample weight of 0, until the biased data set aligns with the representative one. The second method, Soft-MRS, is a variant of MRS that iteratively adapts sample weights instead of removing samples completely. To assess the effectiveness of our approach, we induced artificial bias in a public census data set and examined the corrected estimates. We compare the performance of our methods against existing techniques, evaluating how well sample weights created with MRS or Soft-MRS minimize differences and improve downstream classification tasks. Lastly, we demonstrate the applicability of the proposed methods in a real-world study in resilience research, exploring the influence of resilience on voting behavior. Through our work, we address the issue of bias in the social sciences, among other fields, and provide a versatile, machine-learning-based methodology for bias reduction. Based on our experiments, we recommend using MRS for downstream classification tasks and Soft-MRS for downstream tasks where the relative bias of the dependent variable is relevant.
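The iterative removal idea behind MRS can be illustrated with a minimal sketch. It is not the authors' implementation: the discriminating classifier (a Laplace-smoothed categorical naive Bayes in plain Python), the stopping rule (in-sample AUC at or below a tolerance `tol`), the per-round removal fraction `step`, the labeling of biased rows as 1 and representative rows as 0, and all function names are hypothetical choices made for illustration. In each round, the still-active biased rows that the classifier scores as most typical of the biased set get a weight of 0. Soft-MRS, which lowers weights gradually instead, is not shown.

```python
import math
from collections import defaultdict


def fit_discriminator(biased, weights, representative, alpha=1.0):
    """Hypothetical discriminator: categorical naive Bayes separating biased rows
    (label 1, counted with their current sample weight) from representative rows
    (label 0, weight 1). Returns a scoring function; higher = more 'biased-like'."""
    n_features = len(representative[0])
    counts = [[defaultdict(float) for _ in range(n_features)] for _ in range(2)]
    values = [set() for _ in range(n_features)]
    totals = [float(len(representative)), float(sum(weights))]
    for label, rows, ws in ((0, representative, [1.0] * len(representative)),
                            (1, biased, weights)):
        for row, w in zip(rows, ws):
            for j, v in enumerate(row):
                counts[label][j][v] += w
                values[j].add(v)

    def score(row):
        # Smoothed log-likelihood ratio; the class prior is omitted because
        # only the ranking of scores is used below.
        llr = 0.0
        for j, v in enumerate(row):
            k = len(values[j])
            p1 = (counts[1][j][v] + alpha) / (totals[1] + alpha * k)
            p0 = (counts[0][j][v] + alpha) / (totals[0] + alpha * k)
            llr += math.log(p1 / p0)
        return llr

    return score


def weighted_auc(pos_scores, pos_weights, neg_scores):
    """AUC of weighted biased (positive) vs representative (negative) scores,
    ties counting one half. 0.5 means the sets cannot be told apart."""
    hits = 0.0
    for s, w in zip(pos_scores, pos_weights):
        for t in neg_scores:
            if s > t:
                hits += w
            elif s == t:
                hits += 0.5 * w
    return hits / (sum(pos_weights) * len(neg_scores))


def mrs_sketch(biased, representative, step=0.1, tol=0.55, max_iter=100):
    """Hard removal: returns 0/1 sample weights for the biased rows."""
    weights = [1.0] * len(biased)
    for _ in range(max_iter):
        score = fit_discriminator(biased, weights, representative)
        b_scores = [score(r) for r in biased]
        r_scores = [score(r) for r in representative]
        if weighted_auc(b_scores, weights, r_scores) <= tol:
            break  # discriminator barely separates the sets: stop removing
        active = [i for i, w in enumerate(weights) if w > 0]
        # Remove a fraction `step` of the active rows, always keeping at least one.
        n_remove = min(max(1, int(step * len(active))), len(active) - 1)
        if n_remove <= 0:
            break
        # sorted() is stable, so equal scores are removed in original row order.
        ranked = sorted(active, key=lambda idx: b_scores[idx], reverse=True)
        for i in ranked[:n_remove]:
            weights[i] = 0.0
    return weights


# Toy usage: one categorical feature; "F" is over-represented in the biased sample.
representative = [("F",)] * 50 + [("M",)] * 50
biased = [("F",)] * 80 + [("M",)] * 20
weights = mrs_sketch(biased, representative)
kept = [r for r, w in zip(biased, weights) if w > 0]
print(len(kept), sum(r[0] == "F" for r in kept) / len(kept))
```

In this toy setup only over-represented "F" rows are removed: all 20 "M" rows are kept, and removal stops once 29 "F" rows remain, lowering the "F" share from 0.8 to about 0.59. Whether the stopping tolerance, the classifier, or the removal schedule resemble those used in the paper is not established by this record.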
DDC subject group: 004 Data processing & computer science
Publishing institution: Johannes Gutenberg-Universität Mainz
Organizational unit: FB 08 Physics, Mathematics and Computer Science
Place of publication: Mainz
ROR: https://ror.org/023b0x485
DOI: http://doi.org/10.25358/openscience-9834
Version: Published version
Publication type: Journal article
Rights: CC BY
Rights information: https://creativecommons.org/licenses/by/4.0/
Journal: Scientific Reports
Volume: 13
Page or article number: 20925
Publisher: Macmillan Publishers Limited, part of Springer Nature
Publisher location: London
Issue date: 2023
ISSN: 2045-2322
DOI of the original publication: 10.1038/s41598-023-48177-3
Appears in collections: DFG-491381577-G

Files in this item:
discriminative_machine_learni-20231215102324854.pdf (2.48 MB, Adobe PDF)