LiteParse pairs fast text parsing with a two-stage agent pattern, falling back to multimodal models when tables or charts ...
Abstract: Document Visual Question Answering (DocVQA) necessitates comprehension of both the spatial layout and the textual content. Multimodal pretraining is a foundational component of existing ...
Abstract: Accurate object detection depends on the precise refinement of bounding box regression. Recent advancements in bounding box regression have introduced a variety of methodologies aimed at ...
YOLOv4 Face Detection on Yale Dataset A comprehensive implementation of YOLOv4 object detection for face recognition tasks, specifically designed for COOP training and educational purposes. This ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results