Mutual Attention Inception Network for Remote Sensing Visual Question Answering
Zheng, Xiangtao (1); Wang, Binqiang (2); Du, Xingqian (2); Lu, Xiaoqiang (3)
Author department: Spectral Imaging Technology Laboratory
Journal: IEEE Transactions on Geoscience and Remote Sensing
ISSN: 0196-2892; 1558-0644
Rights ranking: 1
Abstract

Remote sensing images (RSIs) containing various ground objects have been applied in many fields. To make semantic understanding of RSIs objective and interactive, the task of remote sensing visual question answering (VQA) has emerged. Given an RSI, the goal of remote sensing VQA is to have an intelligent agent answer a question about the remote sensing scene. Existing remote sensing VQA methods use a nonspatial fusion strategy to fuse the image features and question features, which ignores the spatial information of images and the word-level information of questions. A novel method is proposed to complete the task while considering these two aspects. First, convolutional features of the image are included to represent spatial information, and the word vectors of questions are adopted to represent word-level semantic information. Second, an attention mechanism and a bilinear technique are introduced to enhance the features by considering the alignments between spatial positions and words. Finally, a fully connected layer with softmax is used to output an answer, treating the task as multiclass classification. To benchmark this task, an RSIVQA dataset is introduced in this article. For each of more than 37,000 RSIs, the proposed dataset contains one or more questions with corresponding answers. Experimental results demonstrate that the proposed method can capture the alignments between images and questions. The code and dataset are available at https://github.com/spectralpublic/RSIVQA.
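The three steps named in the abstract (spatial and word-level features, attention with bilinear fusion, softmax classification) can be illustrated with a toy sketch. This is not the authors' network, which is available at the repository above. The function names, the max-based attention scoring, the element-wise product used as a simplified stand-in for a bilinear fusion, and all numeric values below are assumptions chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    """Combine vectors into one vector using the given weights."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors))
            for k in range(dim)]

def answer_distribution(image_feats, word_vecs, fc_weights, fc_bias):
    """Hypothetical helper: attend, fuse, and classify.

    image_feats: one feature vector per spatial position of a conv map.
    word_vecs:   one vector per question word (same dimension).
    """
    # Alignment score between every spatial position and every word.
    affinity = [[dot(v, w) for w in word_vecs] for v in image_feats]

    # Assumed scoring rule: a position matters as much as its best word,
    # and a word matters as much as its best position.
    pos_attn = softmax([max(row) for row in affinity])
    word_attn = softmax([max(row[j] for row in affinity)
                         for j in range(len(word_vecs))])

    # Attention-weighted summaries of the image and the question.
    v = weighted_sum(pos_attn, image_feats)
    q = weighted_sum(word_attn, word_vecs)

    # Element-wise product: a simplified stand-in for bilinear fusion.
    fused = [a * b for a, b in zip(v, q)]

    # Fully connected layer followed by softmax over answer classes.
    logits = [dot(row, fused) + b for row, b in zip(fc_weights, fc_bias)]
    return softmax(logits), pos_attn, word_attn

# Toy inputs: a 2x2 feature map flattened to 4 positions, 3 question words,
# feature dimension 2, and 2 candidate answers.
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
word_vecs = [[2.0, 0.0], [0.0, 0.1], [0.1, 0.0]]
fc_weights = [[1.0, 0.0], [0.0, 1.0]]
fc_bias = [0.0, 0.0]

probs, pos_attn, word_attn = answer_distribution(
    image_feats, word_vecs, fc_weights, fc_bias)
print("answer probabilities:", probs)
print("position attention:", pos_attn)
print("word attention:", word_attn)
```

In this toy setup the first spatial position and the first word align most strongly, so both attention distributions peak there. A real model would learn the projections and classifier weights rather than fix them by hand.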

Keywords: Attention mechanism; feature fusion; remote sensing visual question answering (RSVQA); semantic understanding
DOI: 10.1109/TGRS.2021.3079918
Indexed by: EI
Language: English
WOS accession number: WOS:000733504200001
Publisher: Institute of Electrical and Electronics Engineers Inc.
EI accession number: 20212310473040
Citation statistics
Times cited: 68 [WOS]
Document type: Journal article
Item identifier: http://ir.opt.ac.cn/handle/181661/94878
Collection: Spectral Imaging Technology Laboratory
Author affiliations:
1. Key Laboratory of Spectral Imaging Technology CAS, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China
2. Key Laboratory of Spectral Imaging Technology CAS, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China
3. Key Laboratory of Spectral Imaging Technology CAS, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China (e-mail: luxq666666@gmail.com)
Recommended citation
GB/T 7714: Zheng, Xiangtao, Wang, Binqiang, Du, Xingqian, et al. Mutual Attention Inception Network for Remote Sensing Visual Question Answering[J]. IEEE Transactions on Geoscience and Remote Sensing.
APA: Zheng, Xiangtao, Wang, Binqiang, Du, Xingqian, & Lu, Xiaoqiang.
MLA: Zheng, Xiangtao, et al. "Mutual Attention Inception Network for Remote Sensing Visual Question Answering." IEEE Transactions on Geoscience and Remote Sensing.
Files in this item
File name/size: Mutual Attention Inc (6830KB)
Document type: Journal article
Version: Published version
Access: Restricted
License: CC BY-NC-SA