Empowering Source-Free Domain Adaptation via MLLM-Guided Reliability-Based Curriculum Learning

1UC Davis       2University of Kentucky
WACV 2026

*Equal Contribution

Reliability-based Curriculum Learning (RCL)

Reliability-Based Curriculum Learning.

Our proposed RCL framework uses multiple frozen multimodal large language models (MLLMs) as zero-shot teachers to guide source-free domain adaptation (SFDA). RCL estimates the reliability of pseudo-labels by measuring inter-teacher agreement and progressively trains the target model through a three-stage curriculum, from the most reliable samples to the least reliable ones. This cooperative learning strategy enables robust knowledge distillation without access to source data, achieving state-of-the-art (SOTA) performance across standard SFDA benchmarks.
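The reliability grouping described above can be sketched as follows. This is a minimal illustration of agreement-based partitioning, not the paper's exact criterion: the function name, the dictionary structure, and the unanimous/majority/no-majority thresholds are all illustrative assumptions.

```python
from collections import Counter

def reliability_partition(pseudo_labels):
    """Partition samples by inter-teacher agreement (illustrative sketch).

    pseudo_labels: dict mapping sample_id -> list of class predictions,
    one prediction per frozen MLLM teacher.
    Returns three lists of sample ids: all teachers agree (most reliable),
    a strict majority agrees, and no majority (least reliable).
    """
    unanimous, majority, unreliable = [], [], []
    for sid, preds in pseudo_labels.items():
        _, count = Counter(preds).most_common(1)[0]
        if count == len(preds):
            unanimous.append(sid)
        elif count > len(preds) / 2:
            majority.append(sid)
        else:
            unreliable.append(sid)
    return unanimous, majority, unreliable

# Example with three teachers:
labels = {"img1": ["cat", "cat", "cat"],
          "img2": ["cat", "cat", "dog"],
          "img3": ["cat", "dog", "bird"]}
print(reliability_partition(labels))  # (['img1'], ['img2'], ['img3'])
```

In the curriculum, training would start on the unanimous group and progressively fold in the less reliable groups.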

Abstract

Existing SFDA methods struggle to fully exploit pre-trained knowledge and often rely on a single model's predictions or handcrafted prompts, limiting robustness under domain shift. Multimodal Large Language Models (MLLMs) offer a promising alternative: they encode rich visual-semantic knowledge and generalize well without task-specific tuning. However, their use in SFDA is hindered by instruction-following failures, inconsistent outputs, and high inference costs. We propose Reliability-based Curriculum Learning (RCL), a novel framework that distills robust supervision from multiple frozen MLLMs into a compact target model. RCL organizes adaptation as a three-stage curriculum that progressively incorporates pseudo-labels based on inter-model agreement and model confidence, enabling stable and noise-aware training. Our approach achieves state-of-the-art performance on standard SFDA datasets, Office-Home, DomainNet-126, and VisDA-C, outperforming zero-shot MLLMs and their ensembles, all without accessing source data or tuning foundation models.

Key Features

1. We propose the first SFDA framework that distills knowledge from multiple frozen MLLMs.

2. We introduce STS, a simple semantic matching method to convert open-ended MLLM outputs into class predictions.

3. We design RCL, a three-stage curriculum (RKT, SMKE, MMR) that incorporates pseudo-labels based on MLLM agreement and model confidence.

4. We achieve state-of-the-art results on Office-Home, DomainNet-126, and VisDA-C, outperforming CLIP-based and zero-shot MLLM baselines.
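The STS matching mentioned above can be illustrated with a toy sketch. This version scores candidate class names by word overlap with the free-form MLLM answer; the paper's STS presumably uses a stronger semantic similarity measure (e.g., sentence embeddings), so treat the scoring function, names, and class list here purely as illustrative assumptions.

```python
def sts_match(mllm_answer, class_names):
    """Map a free-form MLLM answer to the closest class name.

    Toy stand-in for semantic matching: scores each candidate class by
    the fraction of its words that appear in the answer, then returns
    the best-scoring class.
    """
    answer_words = set(mllm_answer.lower().replace(".", "").split())

    def overlap(name):
        name_words = set(name.lower().split())
        return len(answer_words & name_words) / len(name_words)

    return max(class_names, key=overlap)

classes = ["alarm clock", "backpack", "coffee mug"]
print(sts_match("The picture shows a red alarm clock on a desk.", classes))
# alarm clock
```

The point of the conversion is the same as in the paper: open-ended MLLM text becomes a hard class prediction that can serve as a pseudo-label.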

Main SFDA Benchmarks

RCL achieves state-of-the-art performance across standard SFDA benchmarks, including Office-Home and DomainNet, and across all adaptation settings.

Office-Home

Accuracy (%) on Office-Home dataset. SF: source-free. CP: uses CLIP prompting. ViT: ViT backbone. * indicates zero-shot.
Method SF CP ViT A→C A→P A→R C→A C→P C→R P→A P→C P→R R→A R→C R→P Avg.
Source - 44.7 64.2 69.4 48.3 57.9 60.3 49.5 40.3 67.2 59.7 45.6 73.0 56.7
PADCLIP-RN 57.5 84.0 83.8 77.8 85.5 84.7 76.3 59.2 85.4 78.1 60.2 86.7 76.6
ADCLIP-RN 55.4 85.2 85.6 76.1 85.8 86.2 76.7 56.1 85.4 76.8 56.1 85.5 75.9
ELR 58.4 78.7 81.5 69.2 79.5 79.3 66.3 58.0 82.6 73.4 59.8 85.1 72.6
PLUE 49.1 73.5 78.2 62.9 73.5 74.5 62.2 48.3 78.6 68.6 51.8 81.5 66.9
C-SFDA 60.3 80.2 82.9 69.3 80.1 78.8 67.3 58.1 83.4 73.6 61.3 86.3 73.5
PSAT-GDA 73.1 88.1 89.2 82.1 88.8 88.9 83.0 72.0 89.6 83.3 73.7 91.3 83.6
DIFO-C-RN 62.6 87.5 87.1 79.5 87.9 87.4 78.3 63.4 88.1 80.0 63.3 87.7 79.4
DIFO-C-B32 70.6 90.6 88.8 82.5 90.6 88.8 80.9 70.1 88.9 83.4 70.5 91.2 83.1
CLIP-RN* - 51.7 85.0 83.7 69.3 85.0 83.7 69.3 51.7 83.7 69.3 51.7 85.0 72.4
LLaVA-34B (w/ STS)* - 78.3 93.7 89.5 87.0 93.7 89.5 87.0 78.3 89.5 87.0 78.3 93.7 87.2
InstBLIP-XXL (w/ STS)* - 82.0 91.6 88.8 82.2 91.6 88.8 82.2 82.0 88.8 82.2 82.0 91.6 86.2
ShrGPT4V-13B (w/ STS)* - 66.7 85.8 84.8 83.2 85.8 84.8 83.2 66.7 84.8 83.2 66.7 85.8 80.1
RCL (Ours) 82.5 95.3 93.3 89.1 95.3 92.7 89.3 82.4 92.8 89.4 82.1 95.4 90.0
RCL-ViT (Ours) 83.1 95.7 93.1 89.2 95.3 92.6 89.2 82.3 92.9 90.0 83.2 95.5 90.2

DomainNet

Accuracy (%) on DomainNet. SF: source-free. CP: uses CLIP prompting. ViT: ViT backbone. * indicates zero-shot performance.
Method SF CP ViT C→P C→R C→S P→C P→R P→S R→C R→P R→S S→C S→P S→R Avg.
Source - 42.6 53.7 51.9 52.9 66.7 51.6 49.1 56.8 43.9 60.9 48.6 53.2 52.7
DAPL-RN 72.4 87.6 65.9 72.7 87.6 65.6 73.2 72.4 66.2 73.8 72.9 87.8 74.8
ADCLIP-RN 71.7 88.1 66.0 73.2 86.9 65.2 73.6 73.0 68.4 72.3 74.2 89.3 75.2
PLUE 59.8 74.0 56.0 61.6 78.5 57.9 61.6 65.9 53.8 67.5 64.3 76.0 64.7
TPDS 62.9 77.1 59.8 65.6 79.0 61.5 66.4 67.0 58.2 68.6 64.3 75.3 67.1
DIFO-C-RN 73.8 89.0 69.4 74.0 88.7 70.1 74.8 74.6 69.6 74.7 74.3 88.0 76.7
DIFO-C-B32 76.6 87.2 74.9 80.0 87.4 75.6 80.8 77.3 75.5 80.5 76.7 87.3 80.0
LLaVA-34B (w/ STS)* - 84.4 91.0 83.7 85.5 91.0 83.7 85.5 84.4 83.7 85.5 84.4 91.0 86.1
InstBLIP-XXL (w/ STS)* - 82.5 89.0 83.0 86.7 89.0 83.0 86.7 82.5 83.0 86.7 82.5 89.0 85.3
ShrGPT4V-13B (w/ STS)* - 79.7 87.9 79.2 79.9 87.9 79.2 79.9 79.7 79.2 79.9 79.7 87.9 81.7
RCL (Ours) 87.6 92.8 87.9 89.2 92.7 87.8 89.6 87.7 87.6 89.4 87.5 92.7 89.4
RCL-ViT (Ours) 88.1 93.3 88.0 89.7 93.3 88.0 89.7 88.0 87.8 89.7 88.1 93.3 89.7

Ablation Studies

We present selected key ablation studies that analyze the contribution of individual RCL components and the robustness of RCL under different settings. See the paper for all ablation experiments.

RCL Component Ablation

RKT SMKE MMR →A →C →P →R Avg.
82.8 73.3 89.3 88.1 83.3
87.7 80.2 93.3 92.0 88.3
88.5 80.9 95.1 92.5 89.3
89.3 82.3 95.3 92.9 90.0

Takeaway: Each component contributes a complementary, progressive improvement, and combining all three yields the strongest performance.

RCL Without MLLMs

Method →C →P →R →A Avg.
TPDS (A) 59.1 81.7 81.7 71.6 73.5
LCFD-C-B32 (B) 72.2 90.2 89.7 81.0 83.3
DIFO-C-B32 (C) 70.4 90.8 88.8 82.3 83.1
RCL (A,B,C) 71.9 90.7 89.2 81.7 83.4

Takeaway: Combining non-MLLM teachers yields comparable performance but only limited gains.


Effect of MLLM Teacher Configuration

Takeaway: RCL consistently outperforms individual MLLMs and their ensembles, demonstrating robustness across teacher choices and maintaining strong performance even with weaker teachers.

Impact of Number of MLLM Teachers

Takeaway: Multiple teachers improve supervision diversity over a single teacher, with performance saturating beyond three teachers.

BibTeX

@article{chen2024empowering,
  title={Empowering Source-Free Domain Adaptation via MLLM-Guided Reliability-Based Curriculum Learning},
  author={Chen, Dongjie and Patwari, Kartik and Lai, Zhengfeng and Zhu, Xiaoguang and Cheung, Sen-ching and Chuah, Chen-Nee},
  journal={arXiv preprint arXiv:2405.18376},
  year={2024}
}