Universal Outlier Hypothesis Testing

Status

Open

Type

Master Thesis

Announcement date

09 Apr 2026

Mentors

Bernhard Geiger

Research Areas

Intelligent Systems

Short Description

Universal outlier hypothesis testing is a hypothesis testing problem in which one observes a large number of length-n sequences. Most of these sequences are distributed according to a typical distribution, while a small number are distributed according to an outlier distribution. The goal is to decide which of the sequences are outliers without knowing either distribution.

Previous work has assumed a fixed and known number M of sequences and has studied the error of a generalized likelihood ratio test as the sequence length n grows to infinity (arXiv:1302.4776). Recently, these results were extended to the setting where the number of sequences M grows faster than the sequence length n. In this setting, it was shown that a much simpler, computationally more efficient test achieves the same optimal error exponent in many relevant settings (arXiv:2601.00712). Specifically, the test estimates the typical distribution from the mean (or median) over all observation sequences and uses this estimate to determine the outliers. Several questions remain open, though, and could be the topic of your Master Thesis/Project:

Study the performance of these tests when the number of outliers is unknown, or when not all outliers follow the same outlier distribution.
Study the performance of a test with a fixed false alarm or misdetection rate (Stein Regime), whereas previous works investigate the total probability of error of the hypothesis test (Chernoff Regime).
Investigate the performance of these tests in realistic settings on real-world datasets (finite n, finite M) and compare the results to the state of the art on outlier detection.
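
The simpler test described above (estimate the typical distribution from the mean or median over all sequences, then use that estimate to flag outliers) might be sketched as follows. This is only an illustration under stated assumptions, not the procedure from arXiv:2601.00712: it assumes a finite alphabet {0, ..., k-1}, a known number of outliers, and a KL divergence score between each sequence's empirical distribution and the estimate. The function names `empirical_distributions` and `detect_outliers` are hypothetical.

```python
import numpy as np


def empirical_distributions(sequences, alphabet_size):
    """Return an (M, alphabet_size) array holding the empirical distribution of each sequence."""
    seqs = np.asarray(sequences)
    return np.stack(
        [np.bincount(s, minlength=alphabet_size) / s.size for s in seqs]
    )


def detect_outliers(sequences, alphabet_size, num_outliers, use_median=False, eps=1e-12):
    """Flag the num_outliers sequences whose empirical distribution is farthest
    (in KL divergence, an assumption made for this sketch) from the estimate."""
    types = empirical_distributions(sequences, alphabet_size)
    if use_median:
        # The coordinate-wise median need not sum to one, so renormalise it.
        est = np.median(types, axis=0)
        est = est / max(est.sum(), eps)
    else:
        est = types.mean(axis=0)
    # D(type_i || est), using the convention 0 * log(0 / q) = 0.
    ratio = np.where(types > 0, types / np.maximum(est, eps), 1.0)
    scores = np.sum(types * np.log(ratio), axis=1)
    flagged = np.sort(np.argsort(scores)[-num_outliers:])
    return flagged, scores


# Small synthetic example: M = 200 sequences of length n = 500, three outliers.
rng = np.random.default_rng(0)
data = rng.choice(4, size=(200, 500), p=[0.25, 0.25, 0.25, 0.25])
for i in (3, 50, 120):
    data[i] = rng.choice(4, size=500, p=[0.7, 0.1, 0.1, 0.1])
flagged, _ = detect_outliers(data, alphabet_size=4, num_outliers=3)
print(flagged)
```

Because the estimate is computed from all sequences, including the outliers, it is slightly biased towards the outlier distribution. With few outliers among many sequences this bias is small, and the median variant is one way to reduce it further.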

Your Tasks

Get familiar with the mathematical tools used to obtain the results in arXiv:1302.4776 and arXiv:2601.00712.
Extend the results along (a subset of) the directions mentioned above using mathematical proofs.
Illustrate theoretical results with experiments on synthetic datasets; numerically evaluate how robust the theoretical results are when their assumptions are violated.
Review the state of the art on outlying sequence detection and perform experiments comparing different approaches.
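
A synthetic experiment of the kind mentioned in the tasks above could start from a Monte Carlo estimate of the error probability for several sequence lengths n. The sketch below is a minimal illustration, not a prescribed setup: it assumes a binary alphabet, exactly one outlier, a mean-based estimate of the typical distribution, and a KL divergence score. The helper names `kl_scores` and `error_rate` and all parameter values are hypothetical choices.

```python
import numpy as np


def kl_scores(data, k):
    """Score each row by the KL divergence from its empirical distribution to the mean one."""
    types = np.stack([np.bincount(row, minlength=k) / row.size for row in data])
    est = types.mean(axis=0)
    # Convention 0 * log(0 / q) = 0; the floor avoids division by zero.
    ratio = np.where(types > 0, types / np.maximum(est, 1e-12), 1.0)
    return np.sum(types * np.log(ratio), axis=1)


def error_rate(n, M, typical, outlier, trials, rng):
    """Fraction of trials in which the single outlier is not the top-scoring sequence."""
    k = len(typical)
    errors = 0
    for _ in range(trials):
        data = rng.choice(k, size=(M, n), p=typical)
        true_idx = rng.integers(M)
        data[true_idx] = rng.choice(k, size=n, p=outlier)
        errors += int(np.argmax(kl_scores(data, k)) != true_idx)
    return errors / trials


rng = np.random.default_rng(0)
for n in (10, 40, 160):
    rate = error_rate(n, M=50, typical=[0.5, 0.5], outlier=[0.7, 0.3], trials=200, rng=rng)
    print(f"n={n:4d}  estimated error rate={rate:.3f}")
```

Plotting the logarithm of the estimated error rate against n gives a rough empirical view of the error exponent, although very small error probabilities need many more trials than used here to be estimated reliably.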

Your Profile

Good mathematical skills, especially in probability and statistics; prior knowledge of information theory and hypothesis testing is beneficial
Practical experience in Python for machine learning (e.g., implementing published methods, executing code from existing repositories, or developing code for the proposed tests)

Contact

Bernhard Geiger (geiger@tugraz.at)