Transformer Advances Towards Dynamic Routing: TRAR for VQA and REC SOTA
Follow our public account to discover the beauty of CV technology 1 Introduction Due to its superior capability for modeling global dependencies, the Transformer and its variants have become the primary architecture for many visual and language tasks. However, tasks like Visual Question Answering (VQA) and Referencing Expression Comprehension (REC) often require multi-modal predictions that … Read more