Stet Pyq Question Video Computer Science

MGQA: Mixture Gaussian for Video Grounded Question Answering via VLMs

Abstract: Video question answering has become a cornerstone task for evaluating vision language models. However, existing models often fail to ground their answers in relevant visual evidence or ...
