Deep cross-modal retrieval for remote sensing image and audio
Mao, Gou [1,2]; Yuan, Yuan [1]; Xiaoqiang, Lu [1]
2018-10-08
Conference name: 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018
Proceedings title: 2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS 2018
Conference date: 2018-08-19
Conference location: Beijing, China
Publisher: Institute of Electrical and Electronics Engineers Inc.
Ownership ranking: 1
Abstract

Remote sensing image retrieval has many important applications in civilian and military fields, such as disaster monitoring and target detection. However, existing research on image retrieval, which mainly follows two directions, text-based and content-based, cannot meet the need for speed and convenience in some special applications and emergency scenarios. Text-based retrieval is limited by keyboard input, which is inefficient in urgent situations, and content-based retrieval requires an example image as a reference, which usually does not exist. Speech, as a direct, natural, and efficient means of human-machine interaction, can make up for these shortcomings. Hence, this paper proposes a novel cross-modal retrieval method for remote sensing images and spoken audio. We first build a large-scale remote sensing image dataset with a large number of manually annotated spoken audio captions for the cross-modal retrieval task. A Deep Visual-Audio Network is then designed to learn the correspondence between image and audio directly. This model integrates feature extraction and multi-modal learning into a single network. Experiments on the proposed dataset verify the effectiveness of our approach and show that speech-to-image retrieval is feasible. © 2018 IEEE.

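Author department: Center for OPTical IMagery Analysis and Learning (OPTIMAL)
DOI: 10.1109/PRRS.2018.8486338
Indexed by: EI
ISBN: 9781538684795
Language: English
EI accession number: 20184706085095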
Document type: Conference paper
Item identifier: http://ir.opt.ac.cn/handle/181661/30867
Collection: Center for OPTical IMagery Analysis and Learning (OPTIMAL)
Author affiliations: 1. Chinese Academy of Sciences, Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi'an Institute of Optics and Precision Mechanics, Xi'an, Shaanxi 710119, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
Recommended citation
GB/T 7714
Mao, Gou,Yuan, Yuan,Xiaoqiang, Lu. Deep cross-modal retrieval for remote sensing image and audio[C]:Institute of Electrical and Electronics Engineers Inc.,2018.
Files in this item
File name/size | Document type | Version type | Access type | License
Deep cross-modal ret(796KB) | Conference paper | | Open access | CC BY-NC-SA
