Paper @ International Journal of Pattern Recognition and Artificial Intelligence

Title: DAP-Adapter: Enhancing Few-Shot CLIP with Dynamically Diverse and Context-Aware Prompt Generation

Authors: Zongjian Li, Hongyou Chen, Lingfeng Qu, Yongjie Zhu, Ya Pan, Baodan Tian, Yong Fan, Hadi Amirpour

Abstract: Contrastive language-image pretraining (CLIP) has demonstrated powerful zero-shot and few-shot classification capabilities by training on large-scale image-text pairs. However, in the CLIP training paradigm, data augmentation is applied primarily to the image inputs, whereas the text prompts remain fixed throughout training. Existing approaches typically rely on static text templates or combine a limited number of learnable soft prompts with category names, which restricts the model's ability to express category semantics. In this paper, we propose the dynamic attribute prompt adapter (DAP-Adapter), which leverages large language models to generate diverse textual descriptions. Our approach introduces attributes as intermediate bridges that link categories to their specific descriptions. During training, a batch-level dynamic language mode sampling mechanism is combined with learnable soft prompts to construct rich text prompts dynamically. To further enhance the model's ability to capture semantics, DAP-Adapter also integrates a nontrainable CLIP adapter. Experiments on ten datasets show that the proposed DAP-Adapter outperforms the state-of-the-art Tip-Adapter-F method.
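
The batch-level prompt construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the description table, the attribute names, the prompt template, and the function name sample_batch_prompts are all hypothetical, and the learnable soft prompt is stood in for by a placeholder string rather than trainable embeddings.

```python
import random

# Hypothetical LLM-generated descriptions, keyed by category and then attribute.
# In the paper these would come from a large language model; here they are made up.
DESCRIPTIONS = {
    "cat": {
        "shape": ["a small body with pointed ears", "a slender body with a long tail"],
        "texture": ["soft short fur", "a striped coat"],
    },
    "dog": {
        "shape": ["a sturdy body with floppy ears"],
        "texture": ["a thick rough coat", "smooth short hair"],
    },
}

# Placeholder for learnable context tokens (assumption: real soft prompts are
# trainable embedding vectors, not text).
SOFT_PROMPT = "<ctx_1> <ctx_2> <ctx_3> <ctx_4>"


def sample_batch_prompts(descriptions, rng):
    """Draw one attribute and one description per category for the current batch."""
    prompts = {}
    for category, attributes in descriptions.items():
        attribute = rng.choice(sorted(attributes))       # pick an attribute
        description = rng.choice(attributes[attribute])  # pick one of its descriptions
        prompts[category] = f"{SOFT_PROMPT} {category}, which has {description}."
    return prompts


# Resample the text prompts once per training batch, so the text side varies
# across batches instead of staying fixed.
rng = random.Random(0)
for step in range(2):
    batch_prompts = sample_batch_prompts(DESCRIPTIONS, rng)
    for category, prompt in batch_prompts.items():
        print(step, category, prompt)
```

In an actual training loop, the sampled prompts would be encoded by the CLIP text encoder in each batch, so that the text side is varied much as image augmentation varies the image side.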