Vision-to-Language Tasks Based on Attributes and Attention Mechanism
Li, Xuelong [1,2]; Yuan, Aihong [3,4]; Lu, Xiaoqiang [3]
Department: Spectral Imaging Technology Laboratory
2021-02
Source Publication: IEEE TRANSACTIONS ON CYBERNETICS
ISSN: 2168-2267; 2168-2275
Volume: 51, Issue: 2, Pages: 913-926
Contribution Rank: 3
Abstract

Vision-to-language tasks aim to integrate computer vision and natural language processing, and they have attracted the attention of many researchers. Typical approaches encode an image into feature representations and decode it into natural language sentences, but they neglect high-level semantic concepts and the subtle relationships between image regions and natural language elements. To make full use of this information, this paper attempts to exploit text-guided attention and semantic-guided attention (SA) to find the more correlated spatial information and reduce the semantic gap between vision and language. Our method includes two-level attention networks. One is the text-guided attention network, which is used to select the text-related regions. The other is the SA network, which is used to highlight the concept-related regions and the region-related concepts. Finally, all of this information is incorporated to generate captions or answers. Image captioning and visual question answering experiments have been carried out, and the experimental results show the excellent performance of the proposed approach.
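
The two-level attention idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain dot-product soft attention, and the names `guided_attention`, `regions`, `text_guide`, and `concept_guide`, along with the toy feature values, are hypothetical. Each guide vector scores the image regions, the scores are normalized with a softmax, and the weighted region features from both levels are concatenated before a decoder would use them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def guided_attention(regions, guide):
    """Soft attention over region features, scored against a guide vector.

    regions: list of region feature vectors (hypothetical image features)
    guide:   a text or concept embedding of the same dimension (hypothetical)
    Returns (weights, attended), where attended is the weighted sum of regions.
    """
    weights = softmax([dot(r, guide) for r in regions])
    dim = len(guide)
    attended = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return weights, attended


# Toy data (assumed for illustration only): three regions, 2-D features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_guide = [2.0, 0.0]      # stands in for a text embedding
concept_guide = [0.0, 2.0]   # stands in for a semantic-concept embedding

text_w, text_ctx = guided_attention(regions, text_guide)
sem_w, sem_ctx = guided_attention(regions, concept_guide)

# Concatenate both attended vectors; a decoder would consume this.
fused = text_ctx + sem_ctx
```

In this toy setup the text guide favors the first and third regions, while the concept guide favors the second and third, so the two levels emphasize different parts of the image before fusion.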

Keyword: Deep learning; image captioning; multimodal; visual question answering (VQA)
DOI: 10.1109/TCYB.2019.2914351
Indexed By: SCI
Language: English
WOS ID: WOS:000608690900036
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Document Type: Journal article
Identifier: http://ir.opt.ac.cn/handle/181661/94280
Collection: Spectral Imaging Technology Laboratory
Corresponding Author: Lu, Xiaoqiang
Affiliation:
1. Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
2. Northwestern Polytech Univ, Ctr Opt Imagery Anal & Learning, Xian 710072, Peoples R China
3. Chinese Acad Sci, Xian Inst Opt & Precis Mech, Key Lab Spectral Imaging Technol CAS, Xian 710119, Peoples R China
4. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Recommended Citation
GB/T 7714: Li, Xuelong, Yuan, Aihong, Lu, Xiaoqiang. Vision-to-Language Tasks Based on Attributes and Attention Mechanism[J]. IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51(2): 913-926.
APA: Li, Xuelong, Yuan, Aihong, & Lu, Xiaoqiang. (2021). Vision-to-Language Tasks Based on Attributes and Attention Mechanism. IEEE TRANSACTIONS ON CYBERNETICS, 51(2), 913-926.
MLA: Li, Xuelong, et al. "Vision-to-Language Tasks Based on Attributes and Attention Mechanism." IEEE TRANSACTIONS ON CYBERNETICS 51.2 (2021): 913-926.
Files in This Item:
File Name/Size | DocType | Version | Access | License
Vision-to-Language T (3082KB) | Journal article | Published version | Restricted access | CC BY-NC-SA
 
