Multi-modal gated recurrent units for image description
Li, Xuelong (1,2); Yuan, Aihong (1,2); Lu, Xiaoqiang (1)
Department: Center for OPTical IMagery Analysis and Learning (OPTIMAL)
2018-11-01
Source Publication: Multimedia Tools and Applications
ISSN: 1380-7501; 1573-7721
Volume: 77, Issue: 22, Pages: 29847-29869
Contribution Rank: 1
Abstract

Describing the content of an image with a natural language sentence is a challenging but very important task. It is challenging because a description must not only capture the objects contained in the image and the relationships among them, but also be relevant and grammatically correct. In this paper, we propose a multi-modal embedding model based on gated recurrent units (GRU) that can generate a variable-length description for a given image. In the training step, we apply a convolutional neural network (CNN) to extract the image feature. The feature is then fed into the multi-modal GRU together with the corresponding sentence representations, and the multi-modal GRU learns the inter-modal relations between image and sentence. In the testing step, when an image is fed to our multi-modal GRU model, a sentence describing the image content is generated. The experimental results demonstrate that our multi-modal GRU model achieves state-of-the-art performance on the Flickr8K, Flickr30K and MS COCO datasets. © 2018, Springer Science+Business Media, LLC, part of Springer Nature.
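The pipeline summarized in the abstract (image feature in, variable-length sentence out) can be illustrated with a minimal sketch. This is not the authors' architecture: the abstract does not say how the image feature enters the GRU, so the sketch assumes it is projected into the word-embedding space and fed as the first input step. The names `GRUCell`, `describe`, `W_img`, `E`, and `W_out`, the toy vocabulary, and all dimensions are hypothetical; the weights are random and untrained, and the CNN is replaced by a random feature vector. The gate update uses one common GRU convention, and some references swap the roles of `z` and `1 - z`.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """Standard GRU cell (hypothetical stand-in for the paper's multi-modal GRU)."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.Wz = rand_matrix(hidden_dim, input_dim, rng)
        self.Uz = rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wr = rand_matrix(hidden_dim, input_dim, rng)
        self.Ur = rand_matrix(hidden_dim, hidden_dim, rng)
        self.Wh = rand_matrix(hidden_dim, input_dim, rng)
        self.Uh = rand_matrix(hidden_dim, hidden_dim, rng)

    def step(self, x, h):
        # Update gate z and reset gate r.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        # Candidate state computed from the input and the reset-gated previous state.
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # Interpolate between the previous state and the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def describe(image_feature, cell, W_img, E, W_out, vocab, max_len=10):
    """Greedy decoding: stop at "<end>" or after max_len words."""
    h = [0.0] * len(cell.Uz)
    # Assumption: the image feature is projected and fed as the first input.
    h = cell.step(matvec(W_img, image_feature), h)
    token = vocab.index("<start>")
    words = []
    for _ in range(max_len):
        h = cell.step(E[token], h)
        scores = matvec(W_out, h)
        # The argmax of the raw scores equals the argmax of their softmax.
        token = max(range(len(vocab)), key=scores.__getitem__)
        if vocab[token] == "<end>":
            break
        words.append(vocab[token])
    return words

rng = random.Random(0)
vocab = ["<start>", "<end>", "a", "dog", "runs"]  # toy vocabulary
feat_dim, embed_dim, hidden_dim = 6, 4, 5         # toy sizes
cell = GRUCell(embed_dim, hidden_dim, rng)
W_img = rand_matrix(embed_dim, feat_dim, rng)     # image-to-embedding projection
E = rand_matrix(len(vocab), embed_dim, rng)       # word embeddings
W_out = rand_matrix(len(vocab), hidden_dim, rng)  # hidden-to-vocabulary scores
image_feature = [rng.random() for _ in range(feat_dim)]  # stands in for a CNN feature
caption = describe(image_feature, cell, W_img, E, W_out, vocab)
print(caption)
```

Because the weights are untrained, the printed words are meaningless. The sketch only shows the control flow: one step conditioned on the image, then word-by-word generation that ends when the end token is predicted or the length cap is reached, which is what makes the output variable-length.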

Keywords: Image Description; Gated Recurrent Unit; Convolutional Neural Network; Multi-modal Embedding
DOI: 10.1007/s11042-018-5856-1
Indexed By: SCI; EI
Language: English
WOS ID: WOS:000451780800038
Publisher: Springer New York LLC
EI Accession Number: 20181204916306
Document Type: Journal article
Identifier: http://ir.opt.ac.cn/handle/181661/30849
Collection: Center for OPTical IMagery Analysis and Learning (OPTIMAL)
Corresponding Author: Lu, Xiaoqiang
Affiliation: 1. Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an, Shaanxi 710119, China;
2. University of Chinese Academy of Sciences, 19A Yuquanlu, Beijing 100049, China
Recommended Citation
GB/T 7714: Li, Xuelong, Yuan, Aihong, Lu, Xiaoqiang. Multi-modal gated recurrent units for image description[J]. Multimedia Tools and Applications, 2018, 77(22): 29847-29869.
APA: Li, Xuelong, Yuan, Aihong, & Lu, Xiaoqiang. (2018). Multi-modal gated recurrent units for image description. Multimedia Tools and Applications, 77(22), 29847-29869.
MLA: Li, Xuelong, et al. "Multi-modal gated recurrent units for image description." Multimedia Tools and Applications 77.22 (2018): 29847-29869.
Files in This Item:
File Name/Size: Multi-modal gated re (2037KB); DocType: Journal article; Version: Published version; Access: Open access; License: CC BY-NC-SA
File name: Multi-modal gated recurrent units for image description.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.