Unsupervised anomaly detection of implausible electronic health records : a real-world evaluation in cancer registries
dc.contributor.author | Röchner, Philipp | |
dc.contributor.author | Rothlauf, Franz | |
dc.date.accessioned | 2023-10-24T10:33:54Z | |
dc.date.available | 2023-10-24T10:33:54Z | |
dc.date.issued | 2023 | |
dc.description.abstract | Background: Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. When processing information, cancer registries verify that the patient-specific records they collect are plausible. This means that the collected information about a particular patient makes medical sense. | en_GB |
dc.description.abstract | Methods: Unsupervised machine learning approaches can detect implausible electronic health records without human guidance. Therefore, this article investigates two unsupervised anomaly detection approaches, a pattern-based approach (FindFPOF) and a compression-based approach (autoencoder), to identify implausible electronic health records in cancer registries. Unlike most existing work that analyzes synthetic anomalies, we compare the performance of both approaches and a baseline (random selection of records) on a real-world dataset. The dataset contains 21,104 electronic health records of patients with breast, colorectal, and prostate tumors. Each record consists of 16 categorical variables describing the disease, the patient, and the diagnostic procedure. The samples identified by FindFPOF, the autoencoder, and a random selection—a total of 785 different records—are evaluated in a real-world scenario by medical domain experts. | en_GB |
dc.description.abstract | Results: Both anomaly detection methods identify implausible electronic health records more effectively than random selection. First, domain experts identified 8% of 300 randomly selected records as implausible. With FindFPOF and the autoencoder, 28% of the proposed 300 records in each sample were implausible. This corresponds to a precision of 28% for FindFPOF and the autoencoder. Second, for 300 randomly selected records that were labeled by domain experts, the sensitivity of the autoencoder was 22% and the sensitivity of FindFPOF was 26%. Both anomaly detection methods had a specificity of 94%. Third, FindFPOF and the autoencoder suggested samples with a different distribution of values than the overall dataset. For example, both anomaly detection methods suggested a higher proportion of colorectal records, the tumor localization with the highest percentage of implausible records in a randomly selected sample. | en_GB |
dc.description.sponsorship | Deutsche Forschungsgemeinschaft (DFG)|491381577|Open-Access-Publikationskosten 2022–2024 Universität Mainz - Universitätsmedizin | |
dc.identifier.doi | http://doi.org/10.25358/openscience-9639 | |
dc.identifier.uri | https://openscience.ub.uni-mainz.de/handle/20.500.12030/9657 | |
dc.language.iso | eng | de |
dc.rights | CC-BY-4.0 | * |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | * |
dc.subject.ddc | 004 Informatik | de_DE |
dc.subject.ddc | 004 Data processing | en_GB |
dc.subject.ddc | 330 Wirtschaft | de_DE |
dc.subject.ddc | 330 Economics | en_GB |
dc.title | Unsupervised anomaly detection of implausible electronic health records : a real-world evaluation in cancer registries | en_GB |
dc.type | Journal article | de |
jgu.journal.title | BMC Medical Research Methodology | de |
jgu.journal.volume | 23 | de |
jgu.organisation.department | FB 03 Rechts- und Wirtschaftswissenschaften | de |
jgu.organisation.name | Johannes Gutenberg-Universität Mainz | |
jgu.organisation.number | 2300 | |
jgu.organisation.place | Mainz | |
jgu.organisation.ror | https://ror.org/023b0x485 | |
jgu.pages.alternative | 125 | de |
jgu.publisher.doi | 10.1186/s12874-023-01946-0 | de |
jgu.publisher.issn | 1471-2288 | de |
jgu.publisher.name | Springer Nature | de |
jgu.publisher.place | London | de |
jgu.publisher.year | 2023 | |
jgu.rights.accessrights | openAccess | |
jgu.subject.ddccode | 004 | de |
jgu.subject.ddccode | 330 | de |
jgu.subject.dfg | Humanities and Social Sciences | de |
jgu.type.contenttype | Scientific article | de |
jgu.type.dinitype | Article | en_GB |
jgu.type.resource | Text | de |
jgu.type.version | Published version | de |
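
The Results part of the abstract above reports precision, sensitivity, and specificity. As a reading aid, the following minimal Python sketch shows how these three measures are computed from confusion counts. The function names are illustrative and not taken from the article. The precision example uses the figures stated in the abstract (about 28% of 300 proposed records, i.e. roughly 84 records). The counts used for sensitivity and specificity are hypothetical, chosen only to show the arithmetic, and are not the study's data.

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged records that experts judged implausible."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly implausible records that the method flagged."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of plausible records that the method left unflagged."""
    return tn / (tn + fp)


# From the abstract: about 28% of the 300 proposed records were implausible.
# 84 is a rounded reconstruction (0.28 * 300), not a count quoted in the record.
flagged_implausible = 84
flagged_plausible = 300 - flagged_implausible
abstract_precision = precision(flagged_implausible, flagged_plausible)

# HYPOTHETICAL confusion counts for a labeled random sample of 300 records.
# These numbers are invented for illustration only.
tp, fp, fn, tn = 6, 17, 18, 259
example_sensitivity = sensitivity(tp, fn)
example_specificity = specificity(tn, fp)

if __name__ == "__main__":
    print(f"precision (abstract figures): {abstract_precision:.2f}")
    print(f"sensitivity (hypothetical):   {example_sensitivity:.2f}")
    print(f"specificity (hypothetical):   {example_specificity:.2f}")
```

Precision is computed over the records a method proposes, whereas sensitivity and specificity need expert labels for a sample drawn independently of the method, which is why the abstract reports them on the randomly selected records.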