I am currently an Associate Professor at the School of Computer and Communication Engineering, University of Science and Technology Beijing (北京科技大学), China. Prior to that, I worked as a research fellow at the National University of Singapore (NUS), Singapore, supervised by Prof. Haizhou Li (李海洲). I received my Ph.D. degree from Queen Mary, University of London (QMUL), U.K., under the supervision of Prof. Andrea Cavallaro. During my Ph.D. studies, I worked at Fondazione Bruno Kessler (FBK), Trento, Italy, as a research assistant, supervised by Dr. Maurizio Omologo and Dr. Alessio Brutti. I received both my B.Eng. and M.Sc. degrees from the University of Edinburgh, U.K., supervised by Prof. James Hopgood.
My research focuses mainly on audio-visual fusion, including speech processing, speaker localization and tracking, active speaker detection, gesture synthesis, and automatic speech recognition. I have published more than 30 papers in top-tier international AI journals and conferences such as TMM, TASLP, TII, ACM MM, ICRA, ICASSP, and INTERSPEECH.
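🎉🎉 Our lab at USTB is actively looking for research assistants and postgraduate students. Please contact me at qianxy@ustb.edu.cn for more details. We work on deep-learning-based speech signal processing and audio-visual multimodal human-computer interaction, and students are free to choose a topic according to their interests.
The group is well funded and has a strong research atmosphere. We are recruiting master's and Ph.D. students in Computer Science and Technology for 2025 entry, and outstanding undergraduates are also welcome to join. Students with a solid computer science background, experience in programming contests or research, and an interest in pursuing a master's or Ph.D. degree or in studying abroad are welcome to contact me (please attach your CV and a short self-introduction, qianxy@ustb.edu.cn). You will gain:
- Opportunities to attend domestic and international conferences
- Joint supervision by advisors from renowned universities in China and abroad
- Opportunities to be recommended for study visits at the University of Edinburgh, the University of Surrey, Queen Mary University of London, the Hong Kong University of Science and Technology, the Chinese University of Hong Kong (Shenzhen), the National University of Singapore, and other institutions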
🔥 News
- 2024.07 two ACM MM papers accepted!
- 2024.05 one IJSR paper accepted!
- 2024.04 two INTERSPEECH papers accepted!
- 2024.02 one PRL paper accepted!
- 2023.12 three ICASSP papers accepted!
- 2023.11 Dr. Qian was elevated to IEEE Senior Member
- 2023.12 one TCSVT paper accepted!
- 2023.09 one TMM paper accepted!
- 2023.06 Dr. Qian passed the tenure-track review
- 2023.05 two INTERSPEECH papers accepted!
- 2023.04 one TASLP paper accepted!
- 2023.04 one CVPR paper accepted!
- 2023.03 Dr. Qian was invited to serve as an Associate Editor (AE) of IROS 2023 for the robotic audition track
📜 Research Area
Speech Processing: Speaker recognition and verification; Speech separation and extraction; Keyword spotting; Automatic speech recognition
Computer Vision: Face detection and recognition; Lip reading; Gesture synthesis
Multi-modal Processing: Audio-visual active speaker detection; Text-to-speech synthesis; Speaker localization and tracking
Self-supervised Learning: Self-supervised speech processing
💻 Research Experiences
- 2022.10 - Present, Associate Professor, University of Science and Technology Beijing (USTB), Beijing, China.
- 2022.03 - 2022.09, Visiting Scholar, The Chinese University of Hong Kong, Shenzhen (CUHKSZ), Shenzhen, China.
- 2020.02 - 2022.02, Research Fellow, National University of Singapore (NUS), Singapore.
- 2017.04 - 2018.12, Research Assistant, Fondazione Bruno Kessler (FBK), Trento, Italy.
- 2014.06 - 2014.08, Research Assistant, Heriot-Watt University (HWU), Edinburgh, United Kingdom.
📖 Education
- 2015.11 - 2019.11, Ph.D. in Computer Science, Queen Mary, University of London (QMUL), London, U.K.
- 2014.08 - 2015.08, M.Sc. in Signal Processing and Communications, University of Edinburgh (UoE), U.K. (Distinction)
- 2012.09 - 2014.06, B.Eng. in Electronics and Electrical Engineering, University of Edinburgh (UoE), U.K. (First Class Honors)
- 2010.09 - 2012.06, B.Eng. in Information Engineering, Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, China. (Top 3%)
📝 Publications
– 2024 –
- Miao Liu, Jing Wang, Xinyuan Qian, Haizhou Li, ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers, ACM MM, 2024
- Xianghu Yue, Xueyi Zhang, Yiming Chen, Chengwei Zhang, Mingrui Lao, Huiping Zhuang, Xinyuan Qian*, Haizhou Li, MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks, ACM MM, 2024
- Xinyuan Qian, Jingkai Xu, Yuxuan Gao, Minshu Li, Wanlin Li, Xu-Cheng Yin, Understanding Dynamic Auditory Perception for Water Filling Level Estimation, IJSR, 2024
- Xinyuan Qian, Hao Tang, Jichen Yang, Hongxu Zhu, Xu-Cheng Yin, Dual-Path Transformer-Based GAN for Co-speech Gesture Synthesis, IJSR, 2024
- Yan Liu, Li-fang Wei, Xinyuan Qian*, Tianhao Zhang, Songlu Chen, Xucheng Yin, M3TTS: Multi-Modal Text-to-Speech of Multi-Scale Style Control for Dubbing, PRL, 2024
- Miao Liu, Jing Wang, Xinyuan Qian, Xiang Xie, Visually Guided Binaural Audio Generation with Cross-Modal Consistency, PRL, 2024
- Yu Chen, Xinyuan Qian*, Zexu Pan, Kainan Chen, Haizhou Li, LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism, ICASSP, 2024
- Xinyuan Qian, Zexu Pan, Qiquan Zhang, Kainan Chen, Shoufeng Lin, GLMB 3D speaker tracking with video-assisted multi-channel audio optimization functions, ICASSP, 2024
– 2023 –
- Miao Liu, Jing Wang, Xinyuan Qian, Haizhou Li, Audio-Visual Temporal Forgery Detection Using Embedding-Level Fusion and Multi-Dimensional Contrastive Loss, TCSVT, 2023
- Xinyuan Qian, Wei Xue, Qiquan Zhang, Ruijie Tao, Yiming Wang, Kainan Chen, Haizhou Li, Bi-directional Image-Speech Retrieval Through Geometric Consistency, ICCVW, 2023
- Xinyuan Qian, Wei Xue, Qiquan Zhang, Ruijie Tao, Haizhou Li, Deep Cross-modal Retrieval Between Spatial Image and Acoustic Speech, TMM, 2023
- Tian-Hao Zhang, Hai-Bo Qin, Zhi-Hao Lai, Song-Lu Chen, Qi Liu, Feng Chen, Xinyuan Qian*, Xu-Cheng Yin, Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding, INTERSPEECH, 2023
- Longting Xu, Jichen Yang*, Chang Huai You, Xinyuan Qian*, Daiyu Huang, Device Features Based on Linear Transformation With Parallel Training Data for Replay Speech Detection, TASLP, 2023.
- Jiadong Wang, Xinyuan Qian*, Malu Zhang, Robby T Tan, Haizhou Li, Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert, CVPR, 2023.
- Tian-Hao Zhang, Qi Liu, Xinyuan Qian*, Song-Lu Chen, Feng Chen, Xu-Cheng Yin*, Self-Convolution for Automatic Speech Recognition, ICASSP, 2023.
- Moran Chen, Qiquan Zhang, Qi Song, Xinyuan Qian, Ruijin Guo, Mingjiang Wang, Deying Chen, Neural-Free Attention for Monaural Speech Enhancement Towards Voice User Interface for Consumer Electronics, TCE, 2023.
- Kaspar Althoefer, Yonggen Ling, Wanlin Li, Xinyuan Qian, Wang Wei Lee, Peng Qi, A Miniaturised Camera-based Multi-Modal Tactile Sensor, ICRA, 2023.
– 2022 –
- Xinyuan Qian, Zhengdong Wang, Jiadong Wang, Guohui Guan, Haizhou Li, Audio-Visual Cross-Attention Network for Robotic Speaker Tracking, TASLP, 2022.
- Xinyuan Qian, Qiquan Zhang, Guohui Guan, Wei Xue, Deep Audio-visual Beamforming for Speaker Localization, SPL, 2022.
- Xinyuan Qian, Jichen Yang, Alessio Brutti, Speaker Front-back Disambiguity using Multi-channel Speech Signals, Electronics Letters, 2022.
- Zexu Pan, Xinyuan Qian*, Haizhou Li, Speaker Extraction with Co-Speech Gestures Cue, SPL, 2022.
- Qiquan Zhang, Xinyuan Qian*, Zhaoheng Ni, Aaron Nicolson, Eliathamby Ambikairajah, Haizhou Li, TFA-SE: A Time-Frequency Attention Module for Neural Speech Enhancement, TASLP, 2022.
- Hongxu Zhu, Qiquan Zhang, Peng Gao, Xinyuan Qian, Speech-Oriented Sparse Attention Denoising for Voice User Interface Toward Industry 5.0, TII, 2022.
- Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Iterative Sound Source Localization for Unknown Number of Sources, INTERSPEECH, 2022.
– 2021 –
- Xinyuan Qian, Alessio Brutti, Oswald Lanz, Maurizio Omologo, Andrea Cavallaro, Audio-visual tracking of concurrent speakers, TMM, 2021.
- Xinyuan Qian, Qi Liu, Jiadong Wang, Haizhou Li, Three-Dimensional Speaker Localization: Audio-Refined Visual Scaling Factor Estimation, SPL, 2021.
- Xinyuan Qian, Bidisha Sharma, Amine El Abridi, Haizhou Li, SLoClas: A Database for Joint Sound Localization and Classification, COCOSDA, 2021, Best Paper Award.
- Xinyuan Qian, Maulik Madhavi, Zexu Pan, Jiadong Wang, Haizhou Li, Multi-target DoA estimation with an audio-visual fusion mechanism, ICASSP, 2021.
- Jiadong Wang, Xinyuan Qian*, Zihan Pan, Malu Zhang, Haizhou Li, GCC-PHAT with Speech-oriented Attention for Robotic Sound Source Localization, ICRA, 2021.
- Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li, Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection, ACM MM, 2021.
– 2020 and Before –
- Shoufeng Lin#, Xinyuan Qian#, Audio-Visual Multi-Speaker Tracking Based on the GLMB Framework, INTERSPEECH, 2020.
- Xinyuan Qian, Alessio Brutti, Oswald Lanz, Maurizio Omologo, Andrea Cavallaro, Multi-speaker tracking from an audio–visual sensing device, TMM, 2019.
- Xinyuan Qian, Alessio Xompero, Alessio Brutti, Oswald Lanz, Maurizio Omologo, Andrea Cavallaro, 3D mouth tracking from a compact microphone array co-located with a camera, ICASSP, 2018.
- Oswald Lanz, Alessio Brutti, Alessio Xompero, Xinyuan Qian, Maurizio Omologo, Andrea Cavallaro, Accurate Target Annotation in 3D from Multimodal Streams, ICASSP, 2018.
- Xinyuan Qian, Alessio Brutti, Maurizio Omologo, Andrea Cavallaro, 3D audio-visual speaker tracking with an adaptive particle filter, ICASSP, 2017.
- Deepayan Bhowmik, Andrew Wallace, Robert Stewart, Xinyuan Qian, Greg Michaelson, Profile driven dataflow optimisation of mean shift visual tracking, GlobalSIP, 2014.
🎖 Certifications and Awards
- Best Paper Award, COCOSDA, 2021
- 3rd place in the ActivityNet Challenge (Speaker), CVPR Workshop, 2021
- Outstanding international research associate, FBK, Trento, Italy, 2019
- Full Ph.D. scholarship at QMUL, London, U.K., 2015-2019
- Outstanding Youth Female Research Engineer Scholarship, Edinburgh, U.K., 2014
- Excellent international student scholarship, Edinburgh, U.K., 2013-2014
- Shanghai 801 scholarship, 2011
- First-Class Scholarship Award, NUAA, Nanjing, China, 2010-2011
💬 Teaching
- Discrete Mathematics
- Fundamentals of Pattern Recognition
👔 Projects
- Young Scientists Fund of the National Natural Science Foundation of China (PI), 2023
- CCF-Tencent AI-Lab Rhino-Bird Open Fund (PI), 2023
- Fundamental Research Funds for the Central Universities (PI), 2023
- Eigenspace Audio Technology Project (PI), 2023
- Beijing Municipal Natural Science Foundation - Xiaomi Innovation Joint Fund (Co-PI), 2023
- Human Robot Interaction Project-Phase 1, Singapore, 2020-2022
- Huawei Research&Design Project, Shenzhen, China, 2022
- Shenzhen Research Institute of Big Data Internal Project (SRIBD), Shenzhen, China, 2022
💬 Reviewer
- Reviewer of TASLP, TMM, Neural Networks, ICASSP, INTERSPEECH, SPL, ICPR