Evaluating outlier probabilities : assessing sharpness, refinement, and calibration using stratified and weighted measures

dc.contributor.author: Röchner, Philipp
dc.contributor.author: Marques, Henrique O.
dc.contributor.author: Campello, Ricardo J. G. B.
dc.contributor.author: Zimek, Arthur
dc.date.accessioned: 2025-08-20T14:35:07Z
dc.date.available: 2025-08-20T14:35:07Z
dc.date.issued: 2024
dc.description.abstract: An outlier probability is the probability that an observation is an outlier. Typically, outlier detection algorithms calculate real-valued outlier scores to identify outliers. Converting outlier scores into outlier probabilities increases the interpretability of outlier scores for domain experts and makes outlier scores from different outlier detection algorithms comparable. Although several transformations to convert outlier scores to outlier probabilities have been proposed in the literature, there is no common understanding of good outlier probabilities and no standard approach to evaluate outlier probabilities. We require that good outlier probabilities be sharp, refined, and calibrated. To evaluate these properties, we adapt and propose novel measures that use ground-truth labels indicating which observation is an outlier or an inlier. The refinement and calibration measures partition the outlier probabilities into bins or use kernel smoothing. Compared to the evaluation of probability in supervised learning, several aspects are relevant when evaluating outlier probabilities, mainly due to the imbalanced and often unsupervised nature of outlier detection. First, stratified and weighted measures are necessary to evaluate the probabilities of outliers well. Second, the joint use of the sharpness, refinement, and calibration errors makes it possible to independently measure the corresponding characteristics of outlier probabilities. Third, equiareal bins, where the product of observations per bin times bin length is constant, balance the number of observations per bin and bin length, allowing accurate evaluation of different outlier probability ranges. Finally, we show that good outlier probabilities, according to the proposed measures, improve the performance of the follow-up task of converting outlier probabilities into labels for outliers and inliers. (en)
dc.identifier.doi: https://doi.org/10.25358/openscience-12335
dc.identifier.uri: https://openscience.ub.uni-mainz.de/handle/20.500.12030/12356
dc.language.iso: eng
dc.rights: CC-BY-4.0
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject.ddc: 330 Economics (en)
dc.subject.ddc: 600 Technology (Applied sciences) (en)
dc.title: Evaluating outlier probabilities : assessing sharpness, refinement, and calibration using stratified and weighted measures (en)
dc.type: Journal article
jgu.journal.title: Data mining and knowledge discovery
jgu.journal.volume: 38
jgu.organisation.department: FB 03 Rechts- und Wirtschaftswissenschaften
jgu.organisation.name: Johannes Gutenberg-Universität Mainz
jgu.organisation.number: 2300
jgu.organisation.place: Mainz
jgu.organisation.ror: https://ror.org/023b0x485
jgu.pages.end: 3757
jgu.pages.start: 3719
jgu.publisher.doi: 10.1007/s10618-024-01056-5
jgu.publisher.eissn: 1573-756X
jgu.publisher.name: Springer
jgu.publisher.place: Dordrecht
jgu.publisher.year: 2024
jgu.rights.accessrights: openAccess
jgu.subject.ddccode: 330
jgu.subject.ddccode: 600
jgu.subject.dfg: Humanities and social sciences
jgu.type.dinitype: Article (en_GB)
jgu.type.resource: Text
jgu.type.version: Published version
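
The abstract describes equiareal bins as bins for which the number of observations per bin times the bin length is constant. As a minimal sketch of that property only (not the authors' binning procedure; the function name `bin_areas`, the sample probabilities, and the bin edges are illustrative assumptions), the following Python snippet computes this product for a given set of bin edges. On a sample skewed toward low probabilities, as is typical in outlier detection, equal-width bins give very unequal products, which is the imbalance equiareal bins are meant to avoid.

```python
from bisect import bisect_left, bisect_right


def bin_areas(probs, edges):
    """Return (number of observations) * (bin length) for each bin.

    Bins are (edges[i], edges[i + 1]]; the first bin also includes its
    left edge. Equiareal bins, as described in the abstract, would make
    all returned values equal.
    """
    ordered = sorted(probs)
    areas = []
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        # Index of the first value belonging to this bin.
        left = bisect_left(ordered, lo) if i == 0 else bisect_right(ordered, lo)
        # Index just past the last value <= hi.
        right = bisect_right(ordered, hi)
        areas.append((right - left) * (hi - lo))
    return areas


# Hypothetical outlier probabilities: most observations look like inliers.
probs = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.15, 0.20, 0.60, 0.90]

# Four equal-width bins on [0, 1].
equal_width_edges = [0.0, 0.25, 0.5, 0.75, 1.0]

print(bin_areas(probs, equal_width_edges))  # [2.0, 0.0, 0.25, 0.25]
```

Here the first bin holds eight of the ten observations and one bin is empty, so the products range from 0.0 to 2.0. A set of edges with narrower bins where observations are dense and wider bins where they are sparse would bring these values closer together.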

Files

Original bundle

Name: evaluating_outlier_probabilit-202508201635071663.pdf
Size: 2.65 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 5.1 KB
Format: Item-specific license agreed upon to submission