Vision-to-Language Tasks Based on Attributes and Attention Mechanism
Li, Xuelong1,2; Yuan, Aihong3,4; Lu, Xiaoqiang3
Contribution Rank: 3

Vision-to-language tasks aim to integrate computer vision and natural language processing, and they have attracted the attention of many researchers. Typical approaches encode an image into feature representations and decode them into natural language sentences. However, they neglect high-level semantic concepts and the subtle relationships between image regions and natural language elements. To make full use of this information, this paper attempts to exploit text-guided attention and semantic-guided attention (SA) to find the more correlated spatial information and reduce the semantic gap between vision and language. Our method includes two-level attention networks. One is the text-guided attention network, which is used to select the text-related regions. The other is the SA network, which is used to highlight the concept-related regions and the region-related concepts. Finally, all of this information is incorporated to generate captions or answers. Image captioning and visual question answering experiments have been carried out, and the experimental results show the excellent performance of the proposed approach.
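The abstract does not give the attention equations, so the following is only a minimal sketch of generic soft attention, the kind of mechanism that could select text-related image regions. The dot-product scoring, the function names `softmax` and `attend`, and the toy vectors are all assumptions made for illustration; they are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(regions, query):
    """Generic soft attention (assumed form, not the paper's):
    score each region by its dot product with the query, normalize
    the scores with softmax, and return (weights, weighted sum)."""
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, query)) for r in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return weights, context

# Hypothetical toy inputs: three 2-D region features and one 2-D text embedding.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_query = [2.0, 0.0]
weights, context = attend(regions, text_query)
```

In this toy case the first and third regions align with the query and receive equal, larger weights than the second region, so the context vector is dominated by them. A semantic-guided variant could reuse the same routine with an attribute or concept embedding as the query, which is again an assumption rather than the authors' stated design.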

Keyword: Deep learning; image captioning; multimodal; visual question answering (VQA)
Indexed By: SCI
WOS ID: WOS:000608690900036
Document Type: Journal article
Corresponding Author: Lu, Xiaoqiang
Affiliation:
1. Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
2. Northwestern Polytech Univ, Ctr Opt Imagery Anal & Learning, Xian 710072, Peoples R China
3. Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Peoples R China
4. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Recommended Citation
GB/T 7714: Li, Xuelong, Yuan, Aihong, Lu, Xiaoqiang. Vision-to-Language Tasks Based on Attributes and Attention Mechanism[J]. IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51(2): 913-926.
APA: Li, Xuelong, Yuan, Aihong, & Lu, Xiaoqiang. (2021). Vision-to-Language Tasks Based on Attributes and Attention Mechanism. IEEE TRANSACTIONS ON CYBERNETICS, 51(2), 913-926.
MLA: Li, Xuelong, et al. "Vision-to-Language Tasks Based on Attributes and Attention Mechanism". IEEE TRANSACTIONS ON CYBERNETICS 51.2 (2021): 913-926.
Files in This Item:
File Name/Size | DocType | Version | Access | License
Vision-to-Language T (3082KB) | Journal article | Published version | Restricted access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.