Robust Test-Time Adaptation for Visual and Multimodal Learning under Distribution Shifts
- Status
- finished
- Student
- Jixiang Lei
- Mentors
- Research Areas
Modern deep learning systems often experience severe performance degradation when deployed under distribution shifts caused by noise, weather, hardware failures, or unforeseen domain mismatches. Test-time adaptation (TTA) has emerged as a practical solution, enabling models to autonomously refine their behavior during inference using only unlabeled target samples. This dissertation advances the robustness and scalability of TTA to support reliable deployment in unpredictable and safety-critical real-world environments.
The first methodological axis focuses on the adaptation space, addressing the fundamental question of what should be adapted. Three complementary directions are explored: (i) hybrid fine-tuning for convolutional networks, which selectively updates shallow representation layers and normalization parameters to restore feature quality; (ii) prompt-based adaptation for transformer architectures, where lightweight learnable tokens adjust internal attention pathways without modifying the foundational backbone weights; and (iii) reliability-aware modality management, which identifies asymmetric degradation in multimodal systems and dynamically modulates sensory contributions. By either suppressing unreliable signals or restoring corrupted representations, this approach prevents cross-modal error propagation, extending TTA from simple normalization-layer fixes to structure-aware adaptation for modern, large-scale models.
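To make the first direction concrete, a minimal sketch of hybrid fine-tuning in PyTorch is shown below: the whole network is frozen, then only normalization-layer affine parameters and designated shallow layers are re-enabled for test-time updates. The layer-name prefixes are illustrative assumptions and would depend on the actual backbone, not the dissertation's exact configuration.

```python
import torch
import torch.nn as nn

def collect_adaptable_params(model, shallow_prefixes=("layer1",)):
    """Hybrid fine-tuning sketch: freeze the full network, then
    re-enable (a) affine parameters of normalization layers and
    (b) parameters of designated shallow layers.
    `shallow_prefixes` is a placeholder; real prefix names depend
    on the architecture being adapted."""
    for p in model.parameters():
        p.requires_grad_(False)
    adaptable = []
    for name, module in model.named_modules():
        is_norm = isinstance(module, (nn.BatchNorm2d, nn.LayerNorm))
        is_shallow = name.startswith(shallow_prefixes) if name else False
        if is_norm or is_shallow:
            # recurse=False avoids re-adding parameters of submodules
            for p in module.parameters(recurse=False):
                p.requires_grad_(True)
                adaptable.append(p)
    return adaptable
```

A test-time optimizer would then be built only over the returned parameters, e.g. `torch.optim.SGD(adaptable, lr=1e-3)`, so the frozen backbone weights are never touched during adaptation.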
The second methodological axis investigates adaptation objectives, addressing the question of how adaptation should be guided. Recognizing that conventional entropy minimization can lead to overconfident yet incorrect updates, this dissertation introduces uncertainty-aware penalties to mitigate such risks. We further propose cross-stage and cross-modal consistency constraints to preserve coherent representation dynamics throughout the adaptation process. In particular, a rank-aware calibration loss stabilizes the model's confidence by preserving relative logit rankings, while a modality-aware consistency loss strengthens multimodal fusion by aligning sensory streams and suppressing the influence of unreliable inputs. These objectives provide a principled foundation for stable adaptation in both batch and online streaming environments.
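One common way to make entropy minimization uncertainty-aware, sketched below, is to exclude high-entropy (uncertain) predictions from the objective so that the model is not pushed toward overconfident wrong updates. The threshold (a fraction of the maximum possible entropy) and the numerical epsilon are illustrative assumptions, not the dissertation's exact loss.

```python
import torch
import torch.nn.functional as F

def entropy_loss_with_filter(logits, ent_threshold=0.4):
    """Entropy minimization restricted to confident samples.
    Samples whose predictive entropy exceeds a fraction of the
    maximum entropy (log of the class count) are masked out.
    `ent_threshold=0.4` is an illustrative choice."""
    probs = F.softmax(logits, dim=1)
    ent = -(probs * torch.log(probs + 1e-8)).sum(dim=1)
    max_ent = torch.log(torch.tensor(float(logits.shape[1])))
    mask = ent < ent_threshold * max_ent
    if mask.sum() == 0:
        # no sample is confident enough: skip the update entirely
        return logits.new_zeros(())
    return ent[mask].mean()
```

Minimizing this loss over only the adaptable parameters sharpens predictions on confident samples while leaving uncertain ones untouched, which is the basic intent behind uncertainty-aware penalties in TTA.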
In summary, the proposed methodologies demonstrate that TTA can serve as a robust and deployment-ready paradigm, significantly enhancing the practical viability and reliability of machine learning systems operating in dynamic real-world environments.
