Abstract: As a pioneering vision-language model, CLIP (Contrastive Language-Image Pre-training) has achieved significant success across various domains and a wide range of downstream vision-language ...
Abstract: Multi-modal and cross-modal retrieval has garnered increasing attention from researchers recently, owing to its potential to transcend the limitations imposed by traditional retrieval ...