Heming Wang

Research Scientist

Meta

About Me

My name is Heming Wang, and I am currently a research scientist on the Llama Speech team at Meta GenAI. I hold a Ph.D. in Computer Science and Engineering from The Ohio State University (https://www.osu.edu/), where I had the privilege of being mentored by Professor DeLiang Wang. I also hold a master's degree in Applied Mathematics from the University of Waterloo.

My research focuses on auditory perception, with a particular emphasis on speech enhancement, audio super-resolution, and machine learning applications that improve human listening experiences in complex acoustic environments.

Interests
  • Speech enhancement
  • Generative AI
  • Self-supervised learning

Education
  • PhD in Computer Science and Engineering

    The Ohio State University

  • MMath in Applied Mathematics, 2018

    University of Waterloo

  • BSc in Physics with a Minor in Computer Science, 2016

    University of Waterloo

Experience

Tencent AI Lab
Research Intern
May 2023 – Aug 2023 Seattle, Washington
Near-field sound resynthesis with vocoder, codec, and self-supervised learning models.
Microsoft
Research Intern
May 2022 – Aug 2022 Seattle, Washington
Applied self-supervised learning to improve speech representations for generation tasks.
Microsoft
Research Intern
May 2021 – Aug 2021 Seattle, Washington
Applied self-supervised learning to improve the robustness of automatic speech recognition.

Recent Publications

(2024). Combined Generative and Predictive Modeling for Speech Super-resolution. arXiv preprint arXiv:2401.14269.

(2023). uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models. 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024).

(2023). Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction. arXiv preprint arXiv:2309.13874.

(2023). Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions. arXiv preprint arXiv:2309.09028.

(2023). SpatialCodec: Neural Spatial Speech Coding. 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024).

(2023). F0 Estimation and Voicing Detection With Cascade Architecture in Noisy Speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

(2023). Cross-Domain Diffusion Based Speech Enhancement for Very Noisy Speech. 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023).

(2023). DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks. 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023).

(2022). Wav2vec-switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022).

(2022). Attention-Based Fusion for Bone-Conducted and Air-Conducted Speech Enhancement in the Complex Domain. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022).

(2022). Fusing Bone-conduction and Air-conduction Sensors for Complex-Domain Speech Enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

(2022). Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022).

(2022). Cross-Domain Speech Enhancement with a Neural Cascade Architecture. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022).

(2021). Neural Cascade Architecture with Triple-domain Loss for Speech Enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

(2021). Towards Robust Speech Super-resolution. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

(2020). Time-frequency Loss for CNN based Speech Super-resolution. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020).

(2018). A Diffusion-based Two-dimensional Empirical Mode Decomposition (EMD) Algorithm for Image Analysis. International Conference on Image Analysis and Recognition.

(2018). A Novel Forward-PDE Approach as an Alternative to Empirical Mode Decomposition. arXiv preprint arXiv:1802.00835.

Contact