OPT OpenIR > Spectral Imaging Technology Research Laboratory
Disentangled Representation Learning for Cross-modal Biometric Matching
Ning, Hailong1; Zheng, Xiangtao2; Lu, Xiaoqiang3; Yuan, Yuan4
Author's Department: Spectral Imaging Technology Research Laboratory
2022
Journal: IEEE Transactions on Multimedia
ISSN: 1520-9210; 1941-0077
Volume: 24, Pages: 1763-1774
Rights Ranking: 1
Abstract

Cross-modal biometric matching (CMBM) aims to determine the corresponding voice from a face, or to identify the corresponding face from a voice. Recently, many CMBM methods have been proposed that force the distance between the two modal features to be narrowed. However, these methods ignore the alignability of the two modal features. Because each feature is extracted under the supervision of identity information from single-modal data, it can only reflect the identity information of that single modality. To address this problem, a disentangled representation learning method is proposed that disentangles the alignable latent identity factors from the nonalignable modality-dependent factors for CMBM. The proposed method consists of two main steps: 1) feature extraction and 2) disentangled representation learning. Firstly, an image feature extraction network is adopted to obtain face features, and a voice feature extraction network is applied to learn voice features. Secondly, a disentangled latent variable is explored to separate the latent identity factors that are shared across the modalities from the modality-dependent factors. The modality-dependent factors are filtered out, while the latent identity factors from the two modalities are pulled together to align the same identity information. The disentangled latent identity factors are then treated as pure identity information that bridges the two modalities for cross-modal verification, 1:N matching, and retrieval. Note that the proposed method learns the identity information from the input face images and voice segments with only identity labels as supervision. Extensive experiments on the challenging VoxCeleb dataset demonstrate that the proposed method outperforms the state-of-the-art methods.
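The core alignment idea in the abstract — keep the latent identity factors, filter out the modality-dependent factors, and narrow the distance between the identity factors of the two modalities — can be sketched as follows. This is an illustrative simplification, not the paper's implementation: the fixed split of the feature vector into an identity half and a modality half, and the function names `split_factors` and `alignment_loss`, are assumptions for the sketch (in the paper the disentanglement is learned, not hard-coded).

```python
import numpy as np

def split_factors(feature, id_dim):
    """Split a modal feature into latent identity factors and
    modality-dependent factors (hypothetical fixed split for illustration)."""
    return feature[:id_dim], feature[id_dim:]

def alignment_loss(face_feat, voice_feat, id_dim):
    """Squared L2 distance between the identity factors of the two
    modalities; the modality-dependent halves are filtered out."""
    face_id, _ = split_factors(face_feat, id_dim)
    voice_id, _ = split_factors(voice_feat, id_dim)
    return float(np.sum((face_id - voice_id) ** 2))

# Identity halves match (first two dims); modality halves differ but
# do not contribute to the loss:
face = np.array([1.0, 2.0, 9.0, 9.0])
voice = np.array([1.0, 2.0, -3.0, 5.0])
print(alignment_loss(face, voice, id_dim=2))  # → 0.0
```

Only the identity portion enters the loss, so the modality-specific portions of the face and voice features are free to differ — this is what makes the retained factors "alignable" across modalities.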

Keywords: Cross-modal biometric matching; Disentangled representation learning; Latent identity factors; Modality-dependent factors
DOI: 10.1109/TMM.2021.3071243
Indexed by: SCI; EI
Language: English
WOS ID: WOS:000778959200002
Publisher: Institute of Electrical and Electronics Engineers Inc.
EI Accession Number: 20211610231609
Citation Statistics
Times Cited: 15 [WOS]
Document Type: Journal Article
Identifier: http://ir.opt.ac.cn/handle/181661/94690
Collection: Spectral Imaging Technology Research Laboratory
Author Affiliations:
1. Key Laboratory of Spectral Imaging Technology CAS, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an, Shaanxi, China, 710119 (e-mail: ninghailong93@gmail.com);
2. Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an, China, 710119 (e-mail: xiangtaoz@gmail.com);
3. OPTical IMagery Analysis and Learning, Chinese Academy of Sciences, Xi'an, China, 710119 (e-mail: luxq666666@gmail.com);
4. OPTical IMagery Analysis and Learning, Chinese Academy of Sciences, Xi'an, China (e-mail: y.yuan1.ieee@gmail.com)
Recommended Citation:
GB/T 7714
Ning, Hailong, Zheng, Xiangtao, Lu, Xiaoqiang, et al. Disentangled Representation Learning for Cross-modal Biometric Matching[J]. IEEE Transactions on Multimedia, 2022, 24: 1763-1774.
APA: Ning, Hailong, Zheng, Xiangtao, Lu, Xiaoqiang, & Yuan, Yuan. (2022). Disentangled Representation Learning for Cross-modal Biometric Matching. IEEE Transactions on Multimedia, 24, 1763-1774.
MLA: Ning, Hailong, et al. "Disentangled Representation Learning for Cross-modal Biometric Matching". IEEE Transactions on Multimedia 24 (2022): 1763-1774.
Files in This Item:
File Name/Size: Disentangled Represe (6109 KB) | Document Type: Journal Article | Version: Published Version | Access: Restricted | License: CC BY-NC-SA (request full text)
 

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.