Abstract: Recently, large language models (LLMs) have taken the spotlight in natural language processing. Further, integrating LLMs with vision enables the users to explore emergent abilities with ...
Abstract: Visual grounding in remote sensing (RSVG) images aims to detect specific objects associated with referring expressions in remote sensing images. Existing methods typically combine outputs of ...