Multi-modal gated recurrent units for image description
Li, Xuelong (1, 2); Yuan, Aihong (1, 2); Lu, Xiaoqiang (1)
Source Publication: Multimedia Tools and Applications
Contribution Rank: 1

Describing the content of an image with a natural language sentence is a challenging but very important task. It is challenging because a description must not only capture the objects contained in the image and the relationships among them, but also be relevant and grammatically correct. In this paper, we propose a multi-modal embedding model based on gated recurrent units (GRU) that can generate variable-length descriptions for a given image. In the training step, we apply a convolutional neural network (CNN) to extract the image feature. The image feature and the corresponding sentence representations are then fed into the multi-modal GRU, which learns the inter-modal relations between image and sentence. In the testing step, when an image is fed into our multi-modal GRU model, the model generates a sentence that describes the image content. The experimental results demonstrate that our multi-modal GRU model achieves state-of-the-art performance on the Flickr8K, Flickr30K and MS COCO datasets. © 2018, Springer Science+Business Media, LLC, part of Springer Nature.
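To make the training and testing data flow described in the abstract concrete, here is a minimal, untrained sketch in pure Python. It is not the authors' implementation. The abstract does not say how the image feature enters the GRU, so this sketch assumes one common choice: the CNN feature is linearly projected into the word-embedding space and fed as the first GRU input, after which words are generated one at a time. The GRU cell follows the standard gated formulation (update gate z, reset gate r, candidate state). All names (`GRUCell`, `ToyCaptioner`, `img_proj`, `W_out`, the toy vocabulary) are hypothetical, the weights are random, and a plain list of floats stands in for the CNN output.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vadd(*vs):
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU cell; pure Python, for illustration only."""

    def __init__(self, input_size, hidden_size, rng):
        self.params = {}
        for gate in ("z", "r", "h"):  # update gate, reset gate, candidate state
            self.params["W_" + gate] = rand_matrix(rng, hidden_size, input_size)
            self.params["U_" + gate] = rand_matrix(rng, hidden_size, hidden_size)
            self.params["b_" + gate] = [0.0] * hidden_size

    def step(self, x, h):
        p = self.params
        z = [sigmoid(a) for a in vadd(matvec(p["W_z"], x), matvec(p["U_z"], h), p["b_z"])]
        r = [sigmoid(a) for a in vadd(matvec(p["W_r"], x), matvec(p["U_r"], h), p["b_r"])]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
        h_tilde = [math.tanh(a) for a in vadd(matvec(p["W_h"], x), matvec(p["U_h"], rh), p["b_h"])]
        # New state interpolates between old state and candidate. Some references
        # swap z and 1 - z; the two forms are equivalent up to reparameterization.
        return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]


class ToyCaptioner:
    """Hypothetical image-conditioned GRU decoder with random (untrained) weights."""

    def __init__(self, vocab, feat_size, embed_size=8, hidden_size=16, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.hidden_size = hidden_size
        self.embed = {w: [rng.uniform(-0.1, 0.1) for _ in range(embed_size)] for w in vocab}
        self.img_proj = rand_matrix(rng, embed_size, feat_size)  # CNN feature -> embedding space
        self.gru = GRUCell(embed_size, hidden_size, rng)
        self.W_out = rand_matrix(rng, len(vocab), hidden_size)  # hidden state -> word scores

    def generate(self, image_feature, max_len=10):
        h = [0.0] * self.hidden_size
        # Assumption: the projected image feature is the first GRU input.
        h = self.gru.step(matvec(self.img_proj, image_feature), h)
        word, caption = "<start>", []
        for _ in range(max_len):
            h = self.gru.step(self.embed[word], h)
            logits = matvec(self.W_out, h)
            # Greedy decoding: pick the highest-scoring word at each step.
            word = self.vocab[max(range(len(logits)), key=logits.__getitem__)]
            if word == "<end>":
                break
            caption.append(word)
        return caption


if __name__ == "__main__":
    vocab = ["<start>", "<end>", "a", "dog", "runs", "on", "grass"]
    model = ToyCaptioner(vocab, feat_size=4, seed=0)
    print(model.generate([0.5, -0.2, 0.1, 0.9], max_len=6))
```

Because the weights are random, the printed words carry no meaning; the sketch only shows the shape of the pipeline (image feature, then word-by-word generation until `<end>` or `max_len`). Training would fit the parameters on image and sentence pairs, and greedy decoding is used here only for brevity; beam search is a common alternative.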

Keywords: Image Description; Gated Recurrent Unit; Convolutional Neural Network; Multi-modal Embedding
Indexed By: SCI; EI
WOS ID: WOS:000451780800038
Publisher: Springer New York LLC
EI Accession Number: 20181204916306
Document Type: Journal article
Corresponding Author: Lu, Xiaoqiang
Affiliation:
1. Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an, Shaanxi 710119, China
2. University of Chinese Academy of Sciences, 19A Yuquanlu, Beijing 100049, China
Recommended Citation
GB/T 7714: Li, Xuelong, Yuan, Aihong, Lu, Xiaoqiang. Multi-modal gated recurrent units for image description[J]. Multimedia Tools and Applications, 2018, 77(22): 29847-29869.
APA: Li, Xuelong, Yuan, Aihong, & Lu, Xiaoqiang. (2018). Multi-modal gated recurrent units for image description. Multimedia Tools and Applications, 77(22), 29847-29869.
MLA: Li, Xuelong, et al. "Multi-modal gated recurrent units for image description." Multimedia Tools and Applications 77.22 (2018): 29847-29869.
Files in This Item:
File name: Multi-modal gated re (2037 KB); Document type: Journal article; Version: Published version; Access: Restricted; License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.