Multi-modal gated recurrent units for image description | |
Li, Xuelong1,2 | |
Author department | Center for OPTical IMagery Analysis and Learning (OPTIMAL) |
2018-11-01 | |
Journal | Multimedia Tools and Applications |
ISSN | 1380-7501; 1573-7721 |
Volume | 77; Issue: 22; Pages: 29847-29869 |
Ownership ranking | 1 |
Abstract | Using a natural language sentence to describe the content of an image is a challenging but very important task. It is challenging because a description must not only capture the objects contained in the image and the relationships among them, but also be relevant and grammatically correct. In this paper, we propose a multi-modal embedding model based on gated recurrent units (GRU) that can generate a variable-length description for a given image. In the training step, we apply a convolutional neural network (CNN) to extract the image feature. The feature is then fed into the multi-modal GRU together with the corresponding sentence representations. The multi-modal GRU learns the inter-modal relations between image and sentence. In the testing step, when an image is fed to our multi-modal GRU model, a sentence describing the image content is generated. The experimental results demonstrate that our multi-modal GRU model obtains state-of-the-art performance on the Flickr8K, Flickr30K and MS COCO datasets. © 2018, Springer Science+Business Media, LLC, part of Springer Nature. |
Keywords | Image Description; Gated Recurrent Unit; Convolutional Neural Network; Multi-modal Embedding |
DOI | 10.1007/s11042-018-5856-1 |
Indexed by | SCI; EI |
Language | English |
WOS accession number | WOS:000451780800038 |
Publisher | Springer New York LLC |
EI accession number | 20181204916306 |
Document type | Journal article |
Item identifier | http://ir.opt.ac.cn/handle/181661/30849 |
Collection | Spectral Imaging Technology Laboratory |
Corresponding author | Lu, Xiaoqiang |
Author affiliations | 1. Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an; Shaanxi; 710119, China; 2. University of Chinese Academy of Sciences, 19A Yuquanlu, Beijing; 100049, China |
Recommended citation, GB/T 7714 | Li, Xuelong, Yuan, Aihong, Lu, Xiaoqiang. Multi-modal gated recurrent units for image description[J]. Multimedia Tools and Applications, 2018, 77(22): 29847-29869. |
APA | Li, Xuelong, Yuan, Aihong, & Lu, Xiaoqiang. (2018). Multi-modal gated recurrent units for image description. Multimedia Tools and Applications, 77(22), 29847-29869. |
MLA | Li, Xuelong, et al. "Multi-modal gated recurrent units for image description". Multimedia Tools and Applications 77.22 (2018): 29847-29869. |
Files in this item | ||||||
File name/size | Document type | Version type | Access type | License | ||
Multi-modal gated re(2037KB) | Journal article | Published version | Restricted access | CC BY-NC-SA | |
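
The abstract describes feeding a CNN image feature and sentence representations into a multi-modal GRU. The record does not give the model's equations, so the following is only a minimal sketch of one plausible reading: a standard GRU cell whose input at each step is a word vector plus a linear projection of the image feature. The class name `MultiModalGRUCell`, the projection matrix `P`, the additive fusion, the omission of bias terms, and all dimensions are assumptions made for illustration, not the authors' formulation. Plain Python is used to keep the example dependency-free.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class MultiModalGRUCell:
    """Standard GRU cell with an image feature added to each word input.

    Assumption: the image feature is projected into the word-vector space
    and summed with the word vector. Bias terms are omitted for brevity.
    """

    def __init__(self, word_dim, image_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.P = init(word_dim, image_dim)  # hypothetical image projection
        # Input and recurrent weights for update (z), reset (r), candidate (h).
        self.W = {g: init(hidden_dim, word_dim) for g in "zrh"}
        self.U = {g: init(hidden_dim, hidden_dim) for g in "zrh"}

    def _pre(self, gate, x, h):
        """Pre-activation W[gate] x + U[gate] h."""
        a = matvec(self.W[gate], x)
        b = matvec(self.U[gate], h)
        return [ai + bi for ai, bi in zip(a, b)]

    def step(self, word_vec, image_feat, h_prev):
        # Fuse the two modalities into one input vector.
        proj = matvec(self.P, image_feat)
        x = [w + p for w, p in zip(word_vec, proj)]
        z = [sigmoid(v) for v in self._pre("z", x, h_prev)]  # update gate
        r = [sigmoid(v) for v in self._pre("r", x, h_prev)]  # reset gate
        gated = [ri * hi for ri, hi in zip(r, h_prev)]
        cand = [math.tanh(v) for v in self._pre("h", x, gated)]  # candidate state
        # Interpolate between the previous state and the candidate.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def run_sequence(cell, word_vecs, image_feat):
    """Run the cell over a sentence, starting from a zero hidden state."""
    h = [0.0] * cell.hidden_dim
    for w in word_vecs:
        h = cell.step(w, image_feat, h)
    return h
```

In a captioning setup, the hidden state returned at each step would typically be passed through a softmax layer over the vocabulary to predict the next word; that layer, and the CNN that produces `image_feat`, are left out of this sketch.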
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.