Recent work highlights a strong correlation between OOD detection and OSR in both setting and performance. Both tasks detect new categories with shifted semantics, while OSR additionally requires maintaining in-distribution (ID) accuracy. OES supports evaluation of a model’s ability to handle semantic shifts. Unlike existing remote sensing benchmarks that randomly split ID and OOD samples, OES considers the degree of semantic shift between coarse and fine classes, aligning the setup with real-world deployment scenarios.
We unify the tasks of semantic shift OOD detection and open-set recognition (OSR) into a single test task to evaluate the model’s ability to handle semantic shifts.
ID Classes: We use the 94 classes defined in ./sub-dataset1-RGB-domain1/OOD_split/ID_94.txt as ID classes.
Training/Test Sets: Organized in ./sub-dataset1-RGB-domain1/ID/train and ./sub-dataset1-RGB-domain1/ID/test.
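A minimal sketch of loading this split is given below. It assumes ID_94.txt lists one class name per line and that the train/test directories use an ImageFolder-style layout (one subdirectory per class); both are assumptions about the release format, and the helper names are hypothetical.

```python
from pathlib import Path

ROOT = Path("./sub-dataset1-RGB-domain1")

def load_id_classes(split_file: Path) -> list[str]:
    """Read the ID class list, one class name per line (assumed format)."""
    return [line.strip() for line in split_file.read_text().splitlines() if line.strip()]

def collect_split(split_dir: Path, id_classes: list[str]) -> list[tuple[Path, int]]:
    """Pair each image under <split_dir>/<class>/ with its ID label index.

    Assumes an ImageFolder-style layout: one subdirectory per class.
    Directories whose names are not in the ID list are skipped.
    """
    class_to_idx = {name: i for i, name in enumerate(id_classes)}
    samples = []
    for class_dir in sorted(split_dir.iterdir()):
        if class_dir.name in class_to_idx:
            for img in sorted(class_dir.glob("*")):
                samples.append((img, class_to_idx[class_dir.name]))
    return samples

# Usage (paths as in the benchmark layout above):
# id_classes = load_id_classes(ROOT / "OOD_split" / "ID_94.txt")
# train = collect_split(ROOT / "ID" / "train", id_classes)
```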
Covariate shift OOD detection, also referred to as full-spectrum OOD detection, emphasizes robustness to covariate shifts: the ID data remain semantically consistent while covariates vary. Given the practical needs of remote sensing, we focus on the shifts described below.
For each dataset exhibiting domain shift relative to Sub-Dataset 1, we define the following test tasks:
Sub-Dataset 2 (RGB, Domain 2):
- ID Test: ./sub-dataset2-RGB-domain2/ID/test
- ID Train: ./sub-dataset1-RGB-domain1/ID/train
- OOD-Easy: 48 classes from Sub-Dataset 1 (minor shifts).
- OOD-Hard: 47 classes from Sub-Dataset 1 (significant shifts).
- Bias-OOD: 22 classes from Sub-Dataset 2 (shifts). Path: ./sub-dataset2-RGB-domain2/OOD/test.
- SUN: As above.

Sub-Dataset 3 (Aerial, Domain 3):
- ID Test: ./sub-dataset3-Aerial-domain3/ID/test
- Bias-OOD: 66 classes from Sub-Dataset 3. Path: ./sub-dataset3-Aerial-domain3/OOD/test.
- SUN: As above.

Sub-Dataset 4 (MSRGB, Domain 4):
- ID Test: ./sub-dataset4-MSRGB-domain4/ID/test
- Bias-OOD: 22 classes from Sub-Dataset 4. Path: ./sub-dataset4-MSRGB-domain4/OOD/test.
- SUN: As above.

Sub-Dataset 5 (IR, Domain 5):
- ID Test: ./sub-dataset5-IR-domain5/ID/test
- Bias-OOD: 26 classes from Sub-Dataset 5. Path: ./sub-dataset5-IR-domain5/OOD/test.
- SUN: As above.

The rapid advancement of remote sensing generates vast amounts of high-quality images daily, requiring models to recognize novel classes in open-world scenarios. However, existing CIL benchmarks in remote sensing are constrained by limited category diversity, restricted coarse-grained coverage, and uniform data scales, inadequately capturing real-world complexities. To address these limitations, we evaluate existing CIL methods using three benchmarks:
- ./sub-dataset1-RGB-domain1/CIL_split/CIL_coarse_split.json
- ./sub-dataset1-RGB-domain1/CIL_split/CIL_scale_split.json
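A sketch of consuming these CIL split files follows. The internal JSON layout is not documented here, so the code assumes (hypothetically) a mapping from session keys to lists of class names; check the real files before relying on this.

```python
import json
from pathlib import Path

def load_cil_sessions(split_json: Path) -> list[list[str]]:
    """Load class-incremental sessions from a CIL split file.

    Assumes (hypothetically) the JSON maps session keys to lists of class
    names, e.g. {"session_0": [...], "session_1": [...]}; verify against
    the actual CIL_coarse_split.json / CIL_scale_split.json layout.
    """
    with open(split_json) as f:
        splits = json.load(f)
    # Order sessions by key so the base session comes first
    # (lexicographic sort; fine for single-digit session indices).
    return [splits[k] for k in sorted(splits)]

def seen_classes(sessions: list[list[str]], upto: int) -> set[str]:
    """Classes the model has encountered after training session `upto`."""
    return {c for sess in sessions[: upto + 1] for c in sess}
```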
To assess the model’s adaptability to data from different domains, we benchmark DIL on OES. We select 50 categories containing the same semantic classes from RGB satellite (Sub-Dataset 1), RGB aerial (Sub-Dataset 3), MSRGB (Sub-Dataset 4), and IR (Sub-Dataset 5) images.
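The shared-category selection could be sketched as below, assuming ImageFolder-style class directories under each domain's training split; the actual 50-category list is fixed by the benchmark, not recomputed, so this is only an illustration of the construction.

```python
from pathlib import Path

# Domain roots as laid out in the benchmark directory structure.
DOMAIN_ROOTS = [
    Path("./sub-dataset1-RGB-domain1/ID/train"),
    Path("./sub-dataset3-Aerial-domain3/ID/train"),
    Path("./sub-dataset4-MSRGB-domain4/ID/train"),
    Path("./sub-dataset5-IR-domain5/ID/train"),
]

def shared_categories(domain_roots):
    """Intersect class-folder names across all domains."""
    class_sets = [{p.name for p in root.iterdir() if p.is_dir()}
                  for root in domain_roots]
    return sorted(set.intersection(*class_sets))

def dil_tasks(domain_roots, categories):
    """One DIL task per domain, restricted to the shared categories."""
    return [(root, [root / c for c in categories]) for root in domain_roots]
```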
In C2FSCIL, the base session provides the model with all training samples labeled at the coarse level, covering 10 coarse-grained classes that subsume 189 fine-grained classes. Subsequent incremental sessions introduce fine-labeled samples for the fine classes under each of the 10 coarse classes, supplying only 5 samples per class at each session, consistent with the few-shot setting. The initial training phase learns all 10 coarse classes, while each subsequent incremental phase introduces 20 fine-grained classes.
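The session protocol above can be sketched as follows. This is a minimal illustration, assuming per-class sample lists as input; all function and variable names are hypothetical, not part of the benchmark release.

```python
import random

def build_c2fscil_sessions(coarse_to_fine, samples_per_fine,
                           shots=5, fine_per_session=20, seed=0):
    """Sketch of the coarse-to-fine few-shot CIL protocol.

    coarse_to_fine: dict mapping each coarse class to its fine class names.
    samples_per_fine: dict mapping fine class name -> list of sample ids.
    Returns (base_session, incremental_sessions): the base session holds all
    samples with coarse labels only; each incremental session adds
    `fine_per_session` fine classes with `shots` samples each.
    """
    rng = random.Random(seed)
    # Base session: every sample, labeled with its coarse class only.
    base = [(s, coarse) for coarse, fines in coarse_to_fine.items()
            for fine in fines for s in samples_per_fine[fine]]
    # Incremental sessions: few-shot fine-labeled data, in fixed-size chunks.
    all_fine = [f for fines in coarse_to_fine.values() for f in fines]
    sessions = []
    for i in range(0, len(all_fine), fine_per_session):
        chunk = all_fine[i:i + fine_per_session]
        sess = [(s, fine) for fine in chunk
                for s in rng.sample(samples_per_fine[fine],
                                    min(shots, len(samples_per_fine[fine])))]
        sessions.append(sess)
    return base, sessions
```

With the benchmark's numbers (10 coarse classes, 189 fine classes, 5 shots, 20 fine classes per session), the base session sees all data coarsely labeled and each later session contributes at most 100 fine-labeled samples.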