Heming Wang

Research Scientist

Meta

About Me

My name is Heming Wang, and I am currently a research scientist on the Meta GenAI Llama Speech team. I hold a Ph.D. in Computer Science and Engineering from The Ohio State University, where I had the privilege of being mentored by Professor DeLiang Wang. I also hold a master’s degree in Applied Mathematics from the University of Waterloo.

My research primarily focuses on advancing auditory perception, with a particular emphasis on speech enhancement, audio super-resolution, and machine learning applications that aim to enhance human listening experiences in complex environments.

Interests

Speech enhancement
Generative AI
Self-supervised learning

Education

PhD in Computer Science and Engineering

The Ohio State University
MMath in Applied Mathematics, 2018

University of Waterloo
BSc in Physics with a Computer Science Minor, 2016

University of Waterloo

Experience

Research Intern

Tencent AI Lab

May 2023 – Aug 2023 Seattle, Washington

Nearfield sound resynthesis with vocoder / codec / self-supervised learning models.

Research Intern

Microsoft

May 2022 – Aug 2022 Seattle, Washington

Used self-supervised learning to improve the generative performance of speech representations.

Research Intern

Microsoft

May 2021 – Aug 2021 Seattle, Washington

Used self-supervised learning to improve robust automatic speech recognition.

Recent Publications

Heming Wang, Eric W Healy, Deliang Wang (2024). Combined Generative and Predictive Modeling for Speech Super-resolution. arXiv preprint arXiv:2401.14269.

Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu (2023). uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Xinkai Wang, Hemin Yang, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng (2023). Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction. arXiv preprint arXiv:2309.13874.

Heming Wang, Meng Yu, Hao Zhang, Chunlei Zhang, Zhongweiyang Xu, Muqiao Yang, Yixuan Zhang, Dong Yu (2023). Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions. arXiv preprint arXiv:2309.09028.

Zhongweiyang Xu, Yong Xu, Vinay Kothapally, Heming Wang, Muqiao Yang, Dong Yu (2023). SpatialCodec: Neural Spatial Speech Coding. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Yixuan Zhang, Heming Wang, Deliang Wang (2023). F0 Estimation and Voicing Detection With Cascade Architecture in Noisy Speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Heming Wang, Deliang Wang (2023). Cross-Domain Diffusion Based Speech Enhancement for Very Noisy Speech. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Heming Wang, Yao Qian, Hemin Yang, Naoyuki Kanda, Peidong Wang, Takuya Yoshioka, Xiaofei Wang, Yiming Wang, Shujie Liu, Zhuo Chen, et al. (2023). DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Yiming Wang, Jinyu Li, Heming Wang, Yao Qian, Chengyi Wang, Yu Wu (2022). Wav2vec-switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Heming Wang, Xueliang Zhang, Deliang Wang (2022). Attention-Based Fusion for Bone-Conducted and Air-Conducted Speech Enhancement in the Complex Domain. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Yixuan Zhang, Heming Wang, Deliang Wang (2022). Densely-connected Convolutional Recurrent Network for Fundamental Frequency Estimation in Noisy Speech. Proc. Interspeech 2022.

Heming Wang, Xueliang Zhang, Deliang Wang (2022). Fusing Bone-conduction and Air-conduction Sensors for Complex-Domain Speech Enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, Deliang Wang (2022). Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Heming Wang, Deliang Wang (2022). Cross-Domain Speech Enhancement with a Neural Cascade Architecture. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Heming Wang, Deliang Wang (2021). Neural Cascade Architecture with Triple-domain Loss for Speech Enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Heming Wang, Deliang Wang (2021). Towards Robust Speech Super-resolution. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Heming Wang, Deliang Wang (2020). Time-frequency Loss for CNN based Speech Super-resolution. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Heming Wang, Richard Mann, Edward R Vrscay (2018). A Diffusion-based Two-dimensional Empirical Mode Decomposition (EMD) Algorithm for Image Analysis. International Conference Image Analysis and Recognition.

Heming Wang, Richard Mann, Edward R Vrscay (2018). A Novel Forward-PDE Approach as an Alternative to Empirical Mode Decomposition. arXiv preprint arXiv:1802.00835.

Contact