Multimodal Image Aesthetic Prediction with Missing Modality

doi:10.3390/math10132312


Authors	Zhang, Xiaodan (1); Song, Qiao (1); Liu, Gang (2)
Author Department	Space Optics Technology Laboratory (空间光学技术研究室)
Date Issued	2022-07
Journal	MATHEMATICS
ISSN	2227-7390
Volume	10
Issue	13
Affiliation Rank	2
Abstract	With the rapid growth of multimedia data on the Internet, multimodal image aesthetic assessment has attracted a great deal of attention in the image processing community. However, traditional multimodal methods suffer from two problems: (1) they assume that all modalities are available for every sample, an assumption that rarely holds in practice because textual information is harder to obtain; and (2) they fuse multimodal information at only a single level and ignore interactions between the modalities at different levels. To address these two challenges, we propose a novel framework termed Missing-Modality-Multimodal-BERT networks (MMMB). To complete the input, we first generate the missing textual modality conditioned on the available visual modality. We then project the image features into the token space of the text and use the transformer's self-attention mechanism to let information from the two modalities interact at different levels, giving earlier and more fine-grained fusion than fusing only at the final layer. Extensive experiments on two large benchmark datasets for image aesthetic quality evaluation, AVA and Photo.net, demonstrate that the proposed model significantly improves image aesthetic assessment performance under both the missing-text condition and the full-modality condition.
Keywords	image aesthetic quality assessment; multimodal learning; missing multimodal data; transformer
DOI	10.3390/math10132312
Indexed In	SCI
Language	English
WOS Accession Number	WOS:000823883500001
Publisher	MDPI
Citations	Times cited: 1 (WOS)
Document Type	Journal article
Item Identifier	http://ir.opt.ac.cn/handle/181661/96060
Collection	Space Optics Technology Laboratory
Corresponding Author	Liu, Gang
Affiliations	1. Northwest Univ, Sci & Technol Informat Inst, Xian 710127, Peoples R China; 2. Chinese Acad Sci, Xian Inst Opt & Precis Mech, Xian 710119, Peoples R China
Recommended Citation (GB/T 7714)	Zhang, Xiaodan, Song, Qiao, Liu, Gang. Multimodal Image Aesthetic Prediction with Missing Modality[J]. MATHEMATICS, 2022, 10(13).
APA	Zhang, Xiaodan, Song, Qiao, & Liu, Gang. (2022). Multimodal Image Aesthetic Prediction with Missing Modality. MATHEMATICS, 10(13).
MLA	Zhang, Xiaodan, et al. "Multimodal Image Aesthetic Prediction with Missing Modality." MATHEMATICS 10.13 (2022).
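The fusion idea summarized in the abstract (project image features into the text token space, then run self-attention over one joint sequence so that the two modalities interact inside the transformer rather than only at the final layer) can be illustrated with a minimal single-layer sketch in plain Python. This is not the authors' MMMB implementation: the projection matrix `W_proj`, the identity query/key/value projections, the single attention head, and the helper names `project_image_features`, `self_attention`, and `fuse` are all hypothetical simplifications chosen for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows, out_dim x in_dim) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def project_image_features(image_feats, W_proj):
    """Map each image feature (dim d_img) into the text token space (dim d_txt).

    W_proj is a hypothetical learned projection of shape d_txt x d_img.
    """
    return [matvec(W_proj, f) for f in image_feats]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Simplification (hypothetical): queries, keys and values are the tokens
    themselves, i.e. the Q/K/V projections are identity matrices.
    Returns the attended tokens and the attention weight matrix.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)]
        )
    return outputs, weights


def fuse(image_feats, text_tokens, W_proj):
    """Early fusion: projected image tokens and text tokens share one sequence."""
    image_tokens = project_image_features(image_feats, W_proj)
    sequence = image_tokens + text_tokens  # every token can attend to both modalities
    return self_attention(sequence)


if __name__ == "__main__":
    random.seed(0)
    image_feats = [[random.random() for _ in range(4)] for _ in range(2)]  # 2 regions, d_img = 4
    text_tokens = [[random.random() for _ in range(3)] for _ in range(3)]  # 3 tokens, d_txt = 3
    W_proj = [[random.random() for _ in range(4)] for _ in range(3)]       # d_txt x d_img
    fused, attn = fuse(image_feats, text_tokens, W_proj)
    print(len(fused), len(fused[0]))  # 5 3
    print([round(sum(row), 6) for row in attn])  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

Because image and text tokens sit in one sequence, every row of the attention matrix spans both modalities; stacking several such layers is what would let the interaction occur at different depths. When the text is missing, the abstract describes first generating it from the image; in this sketch that would simply mean passing generated `text_tokens` into `fuse`.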

Files in This Item
File Name/Size	Document Type	Version	Access	License
Multimodal Image Aes (1512 KB)	Journal article	Published version	Restricted access	CC BY-NC-SA