Multi-modal gated recurrent units for image description

doi:10.1007/s11042-018-5856-1


Title	Multi-modal gated recurrent units for image description
Authors	Li, Xuelong (1,2); Yuan, Aihong (1,2); Lu, Xiaoqiang (1)
Author department	Center for Optical Imagery Analysis and Learning
Publication date	2018-11-01
Journal	Multimedia Tools and Applications
ISSN	1380-7501; 1573-7721
Volume	77; Issue: 22; Pages: 29847-29869
Institutional ownership rank	1
Abstract	Using a natural language sentence to describe the content of an image is a challenging but very important task. It is challenging because a description must not only capture the objects contained in the image and the relationships among them, but also be relevant and grammatically correct. In this paper, we propose a multi-modal embedding model based on gated recurrent units (GRU) that can generate a variable-length description for a given image. In the training step, we apply a convolutional neural network (CNN) to extract the image feature. The feature is then fed into the multi-modal GRU together with the corresponding sentence representations, and the multi-modal GRU learns the inter-modal relations between image and sentence. In the testing step, when an image is fed to our multi-modal GRU model, a sentence describing the image content is generated. The experimental results demonstrate that our multi-modal GRU model achieves state-of-the-art performance on the Flickr8K, Flickr30K and MS COCO datasets. © 2018, Springer Science+Business Media, LLC, part of Springer Nature.
Keywords	Image Description; Gated Recurrent Unit; Convolutional Neural Network; Multi-modal Embedding
DOI	10.1007/s11042-018-5856-1
Indexed by	SCI; EI
Language	English
WOS accession number	WOS:000451780800038
Publisher	Springer New York LLC
EI accession number	20181204916306
Citation statistics	Times cited: 18 [WOS]
Document type	Journal article
Item identifier	http://ir.opt.ac.cn/handle/181661/30849
Collection	Spectral Imaging Technology Laboratory
Corresponding author	Lu, Xiaoqiang
Author affiliations	1. Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an, Shaanxi 710119, China; 2. University of Chinese Academy of Sciences, 19A Yuquanlu, Beijing 100049, China
Recommended citation (GB/T 7714)	Li, Xuelong, Yuan, Aihong, Lu, Xiaoqiang. Multi-modal gated recurrent units for image description[J]. Multimedia Tools and Applications, 2018, 77(22): 29847-29869.
APA	Li, Xuelong, Yuan, Aihong, & Lu, Xiaoqiang. (2018). Multi-modal gated recurrent units for image description. Multimedia Tools and Applications, 77(22), 29847-29869.
MLA	Li, Xuelong, et al. "Multi-modal gated recurrent units for image description." Multimedia Tools and Applications 77.22 (2018): 29847-29869.

Files in this item
File name/size	Document type	Version	Access type	License
Multi-modal gated re (2037 KB)	Journal article	Published version	Restricted access	CC BY-NC-SA
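
Illustrative sketch (not from the paper)
The abstract describes a pipeline in which a CNN image feature and sentence representations are fed into a GRU, which then generates a variable-length sentence at test time. The record does not give the model's equations, so the following is only a minimal sketch of that general idea under stated assumptions: a standard GRU cell, an image feature that is linearly projected and added to the word embedding at every step, greedy decoding, and random untrained weights. All names (init_params, gru_step, greedy_describe, Wimg, E, Wout) are hypothetical, and the CNN is replaced by a placeholder feature vector.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(feat_dim, embed_dim, hidden_dim, vocab_size, seed=0):
    """Random, untrained weights; every name here is an assumption."""
    rng = random.Random(seed)
    p = {
        "Wimg": rand_matrix(embed_dim, feat_dim, rng),    # image -> embedding space
        "E": rand_matrix(vocab_size, embed_dim, rng),     # word embeddings
        "Wout": rand_matrix(vocab_size, hidden_dim, rng), # hidden -> word scores
    }
    for gate in "zrh":  # update gate, reset gate, candidate state
        p["W" + gate] = rand_matrix(hidden_dim, embed_dim, rng)
        p["U" + gate] = rand_matrix(hidden_dim, hidden_dim, rng)
    return p


def gru_step(x, h, p):
    """One standard GRU update: returns the new hidden state."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    # Interpolate between the old state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def greedy_describe(image_feature, p, vocab, max_len=10):
    """Generate words until "<end>" is chosen or max_len is reached."""
    img = matvec(p["Wimg"], image_feature)  # assumed multi-modal projection
    h = [0.0] * len(p["Uz"])
    token = vocab.index("<start>")
    words = []
    for _ in range(max_len):
        # Assumed fusion: word embedding plus projected image feature.
        x = [e + i for e, i in zip(p["E"][token], img)]
        h = gru_step(x, h, p)
        scores = matvec(p["Wout"], h)
        token = max(range(len(vocab)), key=scores.__getitem__)
        if vocab[token] == "<end>":
            break
        words.append(vocab[token])
    return words


vocab = ["<start>", "<end>", "a", "dog", "runs"]
params = init_params(feat_dim=8, embed_dim=4, hidden_dim=6, vocab_size=len(vocab))
feature = [0.5] * 8  # placeholder standing in for a CNN feature vector
caption = greedy_describe(feature, params, vocab)
```

Because the weights are untrained, the generated words are meaningless; the sketch only shows how the output length can vary through the "<end>" stopping rule. Adding the image projection at every step is one common conditioning choice and may differ from what the paper does.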