Video captioning with tube features
Zhao, Bin (1); Li, Xuelong (2); Lu, Xiaoqiang (2)
2018
Conference name: 27th International Joint Conference on Artificial Intelligence, IJCAI 2018
Proceedings title: Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018
Volume: 2018-July
Pages: 1177-1183
Conference date: 2018-07-13
Conference location: Stockholm, Sweden
Publisher: International Joint Conferences on Artificial Intelligence
Ownership rank: 2
Abstract: Visual features play an important role in video captioning. Because video content is mainly composed of the activities of salient objects, current approaches that focus only on global frame features and pay less attention to salient objects are limited in caption quality. To tackle this problem, we design an object-aware feature for video captioning, called the tube feature. First, Faster-RCNN is employed to extract object regions in frames, and a tube generation method is developed to connect regions from different frames that belong to the same object. Then an encoder-decoder architecture is constructed for caption generation. Specifically, the encoder is a bi-directional LSTM that captures the dynamic information of each tube. The decoder is a single LSTM extended with an attention model, which enables the approach to adaptively attend to the most correlated tubes when generating the caption. We evaluate our approach on two benchmark datasets, MSVD and Charades. The experimental results demonstrate the effectiveness of the tube feature for video captioning. © 2018 International Joint Conferences on Artificial Intelligence. All rights reserved.
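The abstract does not say how the tube generation method decides that regions in different frames belong to the same object. As a rough illustration only, the sketch below links per-frame boxes greedily by intersection-over-union (IoU) with the previous frame. The greedy IoU rule, the `min_iou` threshold, and the names `iou` and `link_tubes` are assumptions made for this example, not the authors' method; in practice the boxes would come from a detector such as Faster-RCNN.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Tube = List[Tuple[int, Box]]             # list of (frame_index, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tubes(frames: List[List[Box]], min_iou: float = 0.5) -> List[Tube]:
    """Greedily link per-frame boxes into tubes (hypothetical rule).

    A box extends the tube whose last box lies in the previous frame and
    overlaps it most, provided the IoU is at least min_iou. Otherwise the
    box starts a new tube.
    """
    tubes: List[Tube] = []
    for t, boxes in enumerate(frames):
        # Only tubes that reached the previous frame can be extended.
        open_ids = [i for i, tube in enumerate(tubes) if tube[-1][0] == t - 1]
        used = set()  # each tube takes at most one box per frame
        for box in boxes:
            best_id, best_score = None, min_iou
            for i in open_ids:
                if i in used:
                    continue
                score = iou(tubes[i][-1][1], box)
                if score >= best_score:
                    best_id, best_score = i, score
            if best_id is None:
                tubes.append([(t, box)])
            else:
                tubes[best_id].append((t, box))
                used.add(best_id)
    return tubes
```

Each resulting tube is a time-ordered sequence of regions for one object. In the pipeline the abstract describes, the features of such a sequence would be fed to the bi-directional LSTM encoder, and the attention model would then weight the encoded tubes at each decoding step.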
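Author department: Center for OPTical IMagery Analysis and Learning (OPTIMAL)
Indexed by: EI
ISBN: 9780999241127
Language: English
ISSN: 10450823
EI accession number: 20184406016719
Document type: Conference paper
Item identifier: http://ir.opt.ac.cn/handle/181661/30697
Collection: Center for OPTical IMagery Analysis and Learning (OPTIMAL)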
Author affiliations:
1. School of Computer Science, Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, China
2. Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China
Recommended citation
GB/T 7714
Zhao Bin, Li Xuelong, Lu Xiaoqiang. Video captioning with tube features[C]: International Joint Conferences on Artificial Intelligence, 2018: 1177-1183.
Files in this item
File name/size | Document type | Version type | Access type | License
Video captioning wit(1466KB) | Conference paper | | Open access | CC BY-NC-SA