Signal Processing and Speech Communication Laboratory
homeresults of the month › Mean- and Median-Based Outlier Testing

Mean- and Median-Based Outlier Testing

Published
Sun, Mar 01, 2026
Tags
rotm
Contact
rotm 03 2026

In this study, we revisit the classical problem of identifying outlier sequences from a large set of sequences when neither the typical distribution nor the outlier distribution is known a priori (universal setting). Assuming that the number of sequences grow faster than the sequence lengths, we introduce and analyze two simple yet powerful test statistics — mean‐based and median‐based estimators — that are computationally tractable and whose errors decrease exponentially with the sequence length.

If the number of outliers grows sublinearly with the total number of sequences, a mean-based test that estimates the typical distribution by averaging the empirical distributions across all sequences achieves the same error exponent as the maximum likelihood test that has full knowledge of the typical and outlier distribution.

If outliers comprise a constant fraction of sequences, the mean estimate becomes biased. We show that a median-based test — using the component-wise median to estimate the typical distribution — recovers the optimal error exponent with high probability. We formalize this behavior via the concept of a typical error exponent, connecting classical large-deviation theory with modern universal testing frameworks.

Our paper was presented at the International Zurich Seminar on Information and Communication (IZS) 2026 and is available on arXiv.

Browse the Results of the Month archive.