Heming Wang

Ph.D. Student

Ohio State University

About Me

My name is Heming Wang, and I am currently a Ph.D. student in Computer Science and Engineering at The Ohio State University, advised by Prof. DeLiang Wang. Prior to joining OSU, I received my master's degree in Applied Mathematics from the University of Waterloo. My research interests focus on speech enhancement, self-supervised learning, and generative AI.

Interests
  • Speech enhancement
  • Generative AI
  • Self-supervised learning
Education
  • PhD in Computer Science and Engineering, in progress

    The Ohio State University

  • MMath in Applied Mathematics, 2018

    University of Waterloo

  • BSc in Physics with a Computer Science Minor, 2016

    University of Waterloo

Experience

Research Intern
Tencent AI Lab
May 2023 – Aug 2023 Seattle, Washington
Nearfield sound resynthesis with vocoder / codec / self-supervised learning models.

Research Intern
Microsoft
May 2022 – Aug 2022 Seattle, Washington
Used self-supervised learning to improve the generative performance of speech representations.

Research Intern
Microsoft
May 2021 – Aug 2021 Seattle, Washington
Used self-supervised learning to improve robust automatic speech recognition.

Recent Publications

(2024). Combined Generative and Predictive Modeling for Speech Super-resolution. arXiv preprint arXiv:2401.14269.

(2023). uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

(2023). Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction. arXiv preprint arXiv:2309.13874.

(2023). Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions. arXiv preprint arXiv:2309.09028.

(2023). SpatialCodec: Neural Spatial Speech Coding. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

(2023). F0 Estimation and Voicing Detection With Cascade Architecture in Noisy Speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

(2023). Cross-Domain Diffusion Based Speech Enhancement for Very Noisy Speech. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

(2023). DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

(2022). Wav2vec-switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

(2022). Attention-Based Fusion for Bone-Conducted and Air-Conducted Speech Enhancement in the Complex Domain. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

(2022). Fusing Bone-conduction and Air-conduction Sensors for Complex-Domain Speech Enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

(2022). Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

(2022). Cross-Domain Speech Enhancement with a Neural Cascade Architecture. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

(2021). Neural Cascade Architecture with Triple-domain Loss for Speech Enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

(2021). Towards Robust Speech Super-resolution. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

(2020). Time-frequency Loss for CNN based Speech Super-resolution. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

(2018). A Diffusion-based Two-dimensional Empirical Mode Decomposition (EMD) Algorithm for Image Analysis. International Conference Image Analysis and Recognition.

(2018). A Novel Forward-PDE Approach as an Alternative to Empirical Mode Decomposition. arXiv preprint arXiv:1802.00835.

Contact