Discriminative machine learning for maximal representative subsampling

dc.contributor.author: Hauptmann, Tony
dc.contributor.author: Fellenz, Sophie
dc.contributor.author: Nathan, Laksan
dc.contributor.author: Tüscher, Oliver
dc.contributor.author: Kramer, Stefan
dc.date.accessioned: 2023-12-20T11:28:44Z
dc.date.available: 2023-12-20T11:28:44Z
dc.date.issued: 2023
dc.description.abstract: Biased population samples pose a prevalent problem in the social sciences. Therefore, we present two novel methods that are based on positive-unlabeled learning to mitigate bias. Both methods leverage auxiliary information from a representative data set and train machine learning classifiers to determine the sample weights. The first method, named maximum representative subsampling (MRS), uses a classifier to iteratively remove instances, by assigning a sample weight of 0, from the biased data set until it aligns with the representative one. The second method is a variant of MRS – Soft-MRS – that iteratively adapts sample weights instead of removing samples completely. To assess the effectiveness of our approach, we induced artificial bias in a public census data set and examined the corrected estimates. We compare the performance of our methods against existing techniques, evaluating the ability of sample weights created with Soft-MRS or MRS to minimize differences and improve downstream classification tasks. Lastly, we demonstrate the applicability of the proposed methods in a real-world study of resilience research, exploring the influence of resilience on voting behavior. Through our work, we address the issue of bias in social science, amongst others, and provide a versatile methodology for bias reduction based on machine learning. Based on our experiments, we recommend using MRS for downstream classification tasks and Soft-MRS for downstream tasks where the relative bias of the dependent variable is relevant. [en_GB]
dc.identifier.doi: http://doi.org/10.25358/openscience-9834
dc.identifier.uri: https://openscience.ub.uni-mainz.de/handle/20.500.12030/9852
dc.language.iso: eng [de]
dc.rights: CC-BY-4.0 [*]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [*]
dc.subject.ddc: 004 Informatik [de_DE]
dc.subject.ddc: 004 Data processing [en_GB]
dc.title: Discriminative machine learning for maximal representative subsampling [en_GB]
dc.type: Zeitschriftenaufsatz [de]
jgu.journal.title: Scientific reports [de]
jgu.journal.volume: 13 [de]
jgu.organisation.department: FB 08 Physik, Mathematik u. Informatik [de]
jgu.organisation.name: Johannes Gutenberg-Universität Mainz
jgu.organisation.number: 7940
jgu.organisation.place: Mainz
jgu.organisation.ror: https://ror.org/023b0x485
jgu.pages.alternative: 20925 [de]
jgu.publisher.doi: 10.1038/s41598-023-48177-3 [de]
jgu.publisher.issn: 2045-2322 [de]
jgu.publisher.name: Macmillan Publishers Limited, part of Springer Nature [de]
jgu.publisher.place: London [de]
jgu.publisher.year: 2023
jgu.rights.accessrights: openAccess
jgu.subject.ddccode: 004 [de]
jgu.subject.dfg: Ingenieurwissenschaften [de]
jgu.type.dinitype: Article [en_GB]
jgu.type.resource: Text [de]
jgu.type.version: Published version [de]

Files

Original bundle

Name: discriminative_machine_learni-20231215102324854.pdf
Size: 2.42 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.57 KB
Format: Item-specific license agreed upon to submission
