Finite Decimal Representation Example

Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens

Abstract: Contrastive learning-based vision-language pretraining approaches, such as CLIP, have demonstrated great success in many vision-language tasks. These methods achieve cross-modal alignment by ...
