Deep cross-modal retrieval for remote sensing image and audio
Mao, Gou (1,2); Yuan, Yuan (1); Xiaoqiang, Lu (1)
2018-10-08
Conference Name: 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018
Source Publication: 2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018
Conference Date: 2018-08-19
Conference Place: Beijing, China
Publisher: Institute of Electrical and Electronics Engineers Inc.
Contribution Rank: 1
Abstract

Remote sensing image retrieval has many important applications in civilian and military fields, such as disaster monitoring and target detection. However, existing research on image retrieval, which mainly follows two directions, text-based and content-based, cannot meet the need for fast and convenient retrieval in some special applications and emergency scenes. Text-based retrieval is limited by keyboard input, which is inefficient in urgent situations, and content-based retrieval needs an example image as a reference, which usually does not exist. Speech, as a direct, natural, and efficient means of human-machine interaction, can make up for these shortcomings. Hence, a novel cross-modal retrieval method for remote sensing images and spoken audio is proposed in this paper. We first build a large-scale remote sensing image dataset with many manually annotated spoken audio captions for the cross-modal retrieval task. Then a Deep Visual-Audio Network is designed to directly learn the correspondence between image and audio. This model integrates feature extraction and multi-modal learning into the same network. Experiments on the proposed dataset verify the effectiveness of our approach and show that speech-to-image retrieval is feasible. © 2018 IEEE.

Department: Center for OPTical IMagery Analysis and Learning (OPTIMAL)
DOI: 10.1109/PRRS.2018.8486338
Indexed By: EI
ISBN: 9781538684795
Language: English
EI Accession Number: 20184706085095
Document Type: Conference paper
Identifier: http://ir.opt.ac.cn/handle/181661/30867
Collection: Center for OPTical IMagery Analysis and Learning (OPTIMAL)
Affiliation: 1. Chinese Academy of Sciences, Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi'an Institute of Optics and Precision Mechanics, Xi'an, Shaanxi 710119, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
Recommended Citation
GB/T 7714
Mao, Gou, Yuan, Yuan, Xiaoqiang, Lu. Deep cross-modal retrieval for remote sensing image and audio[C]: Institute of Electrical and Electronics Engineers Inc., 2018.
Files in This Item:
File Name/Size: Deep cross-modal ret (796KB)
DocType: Conference paper
Access: Open access
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.