Abstract: Online test-time adaptation (OTTA) of vision-language models (VLMs) has recently attracted growing attention as a way to exploit data observed along a stream to improve future predictions.
PhyX specializes in challenging university-level questions presented through realistic, high-fidelity visual scenarios. Unlike general-purpose benchmarks, our tasks require models to integrate visual ...
Abstract: Open-vocabulary camouflaged object segmentation (OVCOS) seeks to segment and classify camouflaged objects in arbitrary categories, presenting unique challenges due to visual ambiguity and ...