Auxiliary Loss Multimodal GRU Model in Audio-Visual Speech Recognition
Authors: Yuan, Yuan; Tian, Chunlin; Lu, Xiaoqiang¹
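Author department: Center for Optical Imagery Analysis and Learning
Publication year: 2018
Journal: IEEE ACCESS
ISSN: 2169-3536
Volume: 6; Pages: 5573-5583
Institutional ranking: 1
Abstract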

Audio-visual speech recognition (AVSR) uses both the audio and the video modality to achieve robust automatic speech recognition. Most deep neural network (DNN) models have achieved promising performance in AVSR owing to their generalization and nonlinear mapping ability. However, these DNN models have two main disadvantages: 1) most models address the AVSR problem while neglecting the fact that consecutive frames are correlated; and 2) the features learned by these models are not reliable. This is because the joint representation learned through fusion does not take category-specific information into account, and the discriminative information is sparse, whereas noise, reverberation, irrelevant image objects, and background are redundant. To alleviate these disadvantages, we propose the auxiliary loss multimodal GRU (alm-GRU) model, which consists of three parts: feature extraction, data augmentation, and fusion & recognition. Feature extraction and data augmentation form a complete, effective solution for processing raw, complete videos and for training, and they are the precondition for the core part: fusion & recognition using alm-GRU. alm-GRU is equipped with a novel loss and is an end-to-end network that combines fusion and recognition while taking both modal and temporal information into account. Experiments on benchmark data sets demonstrate the superiority of our model and the necessity of the data augmentation and the generative component.
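The abstract describes alm-GRU only at the level of its parts: fusing the audio and video streams, modeling the correlation between frames with a GRU, and training end to end with an extra loss term. The sketch below illustrates that general pattern under stated assumptions; it is not the authors' implementation. It assumes early (feature-level) fusion by concatenating each frame's audio and video feature vectors, a textbook GRU cell without biases, and a total loss of the form main + λ·auxiliary. The abstract does not give the form of the paper's novel loss, so the auxiliary term here (a per-frame cross-entropy on every hidden state, in the style of deep supervision) is a hypothetical stand-in. All names (`GRUCell`, `fused_gru_loss`, `W_out`, `lam`), feature sizes and random inputs are illustrative. The sketch uses only the standard library and covers the forward pass only, with no training loop.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Textbook GRU cell (Cho et al., 2014); biases omitted for brevity."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # Input (W*) and recurrent (U*) weights for update gate z, reset gate r, candidate n.
        self.Wz, self.Uz = rand_matrix(hidden_size, input_size, rng), rand_matrix(hidden_size, hidden_size, rng)
        self.Wr, self.Ur = rand_matrix(hidden_size, input_size, rng), rand_matrix(hidden_size, hidden_size, rng)
        self.Wn, self.Un = rand_matrix(hidden_size, input_size, rng), rand_matrix(hidden_size, hidden_size, rng)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.Wn, x), matvec(self.Un, rh))]
        # New state interpolates between the candidate n and the previous state h.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def fused_gru_loss(audio_seq, video_seq, label, gru, W_out, lam=0.3):
    """Early fusion + GRU; returns (main + lam * aux, main, aux).

    Hypothetical: aux is the mean per-frame cross-entropy, NOT the paper's novel loss.
    """
    h = [0.0] * gru.hidden_size
    per_frame = []
    for a_t, v_t in zip(audio_seq, video_seq):
        x_t = a_t + v_t  # list concatenation: fused audio-visual feature of frame t
        h = gru.step(x_t, h)
        per_frame.append(cross_entropy(matvec(W_out, h), label))
    main = per_frame[-1]  # utterance-level loss on the final hidden state
    aux = sum(per_frame) / len(per_frame)
    return main + lam * aux, main, aux


# Usage with random stand-in features (5 frames, 4 audio + 3 video dims, 10 classes).
rng = random.Random(1)
T, d_audio, d_video, hidden, n_classes = 5, 4, 3, 6, 10
audio = [[rng.gauss(0.0, 1.0) for _ in range(d_audio)] for _ in range(T)]
video = [[rng.gauss(0.0, 1.0) for _ in range(d_video)] for _ in range(T)]
gru = GRUCell(d_audio + d_video, hidden, seed=0)
W_out = rand_matrix(n_classes, hidden, random.Random(2))
total, main, aux = fused_gru_loss(audio, video, 3, gru, W_out, lam=0.3)
print(f"total={total:.4f} main={main:.4f} aux={aux:.4f}")
```

Because the GRU runs over the fused frames in order, each hidden state carries information from earlier frames. This is the temporal correlation the abstract says many AVSR models neglect. Setting `lam=0` recovers a plain single-loss classifier, which is a convenient baseline when checking whether an auxiliary term helps.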

 

Keywords: Audio-visual Systems; Recurrent Neural Networks; Generative Adversarial Networks
DOI: 10.1109/ACCESS.2018.2796118
Indexed in: SCI; EI
Language: English
WOS accession number: WOS:000426304300001
EI accession number: 20180704784763
Citation statistics: times cited 27 (WOS)
Document type: Journal article
Item identifier: http://ir.opt.ac.cn/handle/181661/30774
Collection: Spectral Imaging Technology Laboratory
Author affiliations:
1. Chinese Acad Sci, Xian Inst Opt & Precis Mech, Ctr Opt Imagery Anal & Learning, Xian 710119, Shaanxi, Peoples R China;
2. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Recommended citation
GB/T 7714: Yuan, Yuan, Tian, Chunlin, Lu, Xiaoqiang. Auxiliary Loss Multimodal GRU Model in Audio-Visual Speech Recognition[J]. IEEE ACCESS, 2018, 6: 5573-5583.
APA: Yuan, Yuan, Tian, Chunlin, & Lu, Xiaoqiang. (2018). Auxiliary Loss Multimodal GRU Model in Audio-Visual Speech Recognition. IEEE ACCESS, 6, 5573-5583.
MLA: Yuan, Yuan, et al. "Auxiliary Loss Multimodal GRU Model in Audio-Visual Speech Recognition." IEEE ACCESS 6 (2018): 5573-5583.
Files in this item
File name/size | Document type | Version | Access | License
Auxiliary Loss Multi (1638 KB) | Journal article | Published version | Restricted access | CC BY-NC-SA

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.