Video captioning with tube features
Zhao, Bin (1); Li, Xuelong (2); Lu, Xiaoqiang (2)
Conference Name: 27th International Joint Conference on Artificial Intelligence, IJCAI 2018
Source Publication: Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018
Conference Date: 2018-07-13
Conference Place: Stockholm, Sweden
Publisher: International Joint Conferences on Artificial Intelligence
Contribution Rank: 2
Abstract: Visual features play an important role in video captioning. Because video content consists mainly of the activities of salient objects, the caption quality of current approaches is limited: they focus on global frame features and pay less attention to the salient objects. To tackle this problem, we design an object-aware feature for video captioning, called the tube feature. First, Faster-RCNN is employed to extract object regions in frames, and a tube generation method is developed to connect regions from different frames that belong to the same object. An encoder-decoder architecture is then constructed for caption generation. The encoder is a bi-directional LSTM that captures the dynamic information of each tube. The decoder is a single LSTM extended with an attention model, which lets the approach adaptively attend to the most correlated tubes when generating the caption. We evaluate our approach on two benchmark datasets, MSVD and Charades. The experimental results demonstrate the effectiveness of the tube feature in the video captioning task. © 2018 International Joint Conferences on Artificial Intelligence. All rights reserved.
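The tube generation step in the abstract (connecting regions from different frames that belong to the same object) can be illustrated with a minimal sketch. The abstract does not state the linking criterion, so the greedy IoU matching below is an assumption chosen for illustration and not the authors' method; the names `iou`, `link_tubes`, and `min_iou` are hypothetical, and the boxes stand in for detector output such as Faster-RCNN regions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical detector output


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tubes(frames: List[List[Box]], min_iou: float = 0.5) -> List[List[Tuple[int, Box]]]:
    """Greedily link per-frame boxes into tubes (assumed criterion: IoU).

    Each tube is a list of (frame_index, box). A box extends the tube whose
    box in the previous frame overlaps it most, provided IoU >= min_iou;
    otherwise the box starts a new tube.
    """
    tubes: List[List[Tuple[int, Box]]] = []
    for t, boxes in enumerate(frames):
        used = set()  # indices of tubes already extended in this frame
        for box in boxes:
            best, best_iou = None, min_iou
            for k, tube in enumerate(tubes):
                last_t, last_box = tube[-1]
                # Only tubes that ended in the previous frame are candidates.
                if k in used or last_t != t - 1:
                    continue
                score = iou(last_box, box)
                if score >= best_iou:
                    best, best_iou = k, score
            if best is None:
                tubes.append([(t, box)])
            else:
                tubes[best].append((t, box))
                used.add(best)
    return tubes


# Two objects in frames 0 and 1; only the first is still detected in frame 2.
frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 1, 11, 11), (51, 50, 61, 60)],
    [(2, 2, 12, 12)],
]
tubes = link_tubes(frames)
```

In this toy input the first object yields a tube spanning three frames and the second a tube spanning two. In the pipeline the abstract describes, the per-frame features along each tube would then be fed to the bi-directional LSTM encoder.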
Indexed By: EI
EI Accession Number: 20184406016719
Document Type: Conference paper
Affiliation: 1. School of Computer Science, Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, China;
2. Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China
Recommended Citation
GB/T 7714
Zhao, Bin; Li, Xuelong; Lu, Xiaoqiang. Video captioning with tube features[C]: International Joint Conferences on Artificial Intelligence, 2018: 1177-1183.
Files in This Item:
File Name/Size | DocType | Version | Access | License
Video captioning wit (1466KB) | Conference paper | | Open access | CC BY-NC-SA
