Modeling strategies for speech enhancement in the latent space of a neural audio codec

This repository provides the official model checkpoints for the paper "Modeling strategies for speech enhancement in the latent space of a neural audio codec" by Sofiene Kammoun, Xavier Alameda-Pineda, and Simon Leglaive, published at IEEE ICASSP 2026.

We explore different modeling strategies (autoregressive vs. non-autoregressive) and representation spaces (discrete vs. continuous) for speech enhancement using neural audio codecs and Conformer-based architectures.

arXiv | Code and Audio examples | Bibtex

Overview

Our work introduces and compares a family of speech enhancement models that systematically vary along two main axes:

  • Representation type
      • Discrete tokens
      • Continuous latent vectors
  • Modeling strategy
      • Autoregressive (AR): sequential, frame-by-frame prediction of the clean speech representation
      • Non-autoregressive (NAR): parallel prediction of all frames of the clean speech representation
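To make the two modeling strategies concrete, here is a toy NumPy sketch of their dependency structure. The linear maps `W_in` and `W_prev` are hypothetical stand-ins for the trained Conformer networks, continuous latents are used for simplicity, and the AR path conditions only on the previous frame (a real AR model typically attends to all earlier predictions). The point is structural: NAR produces all T frames in one pass, while AR needs T sequential steps.

```python
import numpy as np

# Toy sizes (hypothetical): T codec frames with D-dimensional latents.
T, D = 6, 4
rng = np.random.default_rng(0)
# Hypothetical linear stand-ins for a trained enhancement network.
W_in = 0.1 * rng.standard_normal((D, D))    # weights applied to the noisy frame
W_prev = 0.1 * rng.standard_normal((D, D))  # weights applied to the previous clean prediction


def nar_predict(noisy: np.ndarray) -> np.ndarray:
    """Non-autoregressive: all frames of the clean representation in one parallel pass."""
    return noisy @ W_in


def ar_predict(noisy: np.ndarray) -> np.ndarray:
    """Autoregressive: frame t is also conditioned on the prediction for frame t-1."""
    out = np.zeros_like(noisy)
    prev = np.zeros(noisy.shape[1])  # no previous prediction before the first frame
    for t in range(noisy.shape[0]):  # T sequential steps that cannot run in parallel
        out[t] = noisy[t] @ W_in + prev @ W_prev
        prev = out[t]
    return out


noisy = rng.standard_normal((T, D))
print(nar_predict(noisy).shape, ar_predict(noisy).shape)  # (6, 4) (6, 4)
```

Both paths return one clean frame per noisy frame; they differ only in whether frame t may depend on earlier outputs, which is what makes AR inference sequential.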

The current release includes the following models:

| Model name | Modeling strategy  | Input representation | Output representation | Model checkpoint       |
|------------|--------------------|----------------------|-----------------------|------------------------|
| D-AR       | Autoregressive     | Discrete             | Discrete              | D-AR_ckpt_300.pt       |
| D-NAR      | Non-autoregressive | Discrete             | Discrete              | D-NAR_ckpt_300.pt      |
| D-NAR*     | Non-autoregressive | Continuous           | Discrete              | D-NAR_star_ckpt_300.pt |
| C-AR       | Autoregressive     | Continuous           | Continuous            | C-AR_ckpt_300.pt       |
| C-NAR      | Non-autoregressive | Continuous           | Continuous            | C-NAR_ckpt_300.pt      |
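The "Discrete" and "Continuous" columns can be read as two views of the same codec latent: continuous vectors taken before quantization, or the token indices a quantizer assigns to them. The NumPy sketch below illustrates that relationship with a single hypothetical codebook and nearest-neighbor assignment. Many neural audio codecs instead use residual vector quantization with several codebooks, so this is a simplification, not the codec behind these checkpoints; `quantize`, `dequantize`, and the toy codebook are illustrative names and values.

```python
import numpy as np


def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map continuous latent frames (T, D) to indices of their nearest codewords in (K, D)."""
    # Squared Euclidean distance between every frame and every codeword, shape (T, K)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def dequantize(tokens: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Look discrete token indices (T,) back up as continuous vectors (T, D)."""
    return codebook[tokens]


codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # hypothetical K=3, D=2
latents = np.array([[0.9, 0.1], [0.2, 0.8], [0.0, 0.0]])
tokens = quantize(latents, codebook)
print(tokens)  # [1 2 0]
```

Under this reading, D-NAR* would consume continuous `latents`-style inputs and predict `tokens`-style indices, whereas D-AR and D-NAR operate on token indices at both ends.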

Additional models:

  • C-FT (C-FT-encoder_ckpt_300.pt) and D-FT (D-FT-encoder_ckpt_300.pt): we fine-tune only the encoder of the neural audio codec (NAC), using an MSE loss (C-FT) and a cross-entropy loss (D-FT), respectively.
  • STFT-NAR (STFT_NAR_Mask_ckpt_300.pt): instead of the NAC embeddings, the model works on STFT representations and is trained to output an STFT mask.
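The training signals named in the list above can be sketched in NumPy as follows. The exact loss implementations, tensor shapes, STFT settings, and mask type are not specified here, so everything below is an assumption for illustration: MSE on continuous latents (C-FT), cross-entropy of per-frame codebook logits against clean token indices (D-FT), and a real-valued mask applied element-wise to a minimal Hann-windowed STFT (STFT-NAR). The helper names and the `n_fft`/`hop` parameters are hypothetical.

```python
import numpy as np


def mse_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """C-FT-style objective (assumed): mean squared error between continuous latents."""
    return float(np.mean((pred - target) ** 2))


def cross_entropy_loss(logits: np.ndarray, tokens: np.ndarray) -> float:
    """D-FT-style objective (assumed): logits (T, K) scored against clean token indices (T,)."""
    # Numerically stable log-softmax over the codebook axis
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(tokens)), tokens].mean())


def stft(x: np.ndarray, n_fft: int = 8, hop: int = 4) -> np.ndarray:
    """Minimal Hann-windowed STFT of a 1-D signal, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = np.stack([x[s:s + n_fft] * window for s in starts])
    return np.fft.rfft(frames, axis=1)


def apply_mask(noisy_stft: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """STFT-NAR-style output (assumed): a real mask in [0, 1] scales each time-frequency bin."""
    return np.clip(mask, 0.0, 1.0) * noisy_stft


S = stft(np.random.default_rng(0).standard_normal(16))
print(S.shape)  # (3, 5)
```

In a full system the mask would be predicted by the network from the noisy STFT and the masked spectrogram inverted back to a waveform; that inversion step is omitted here.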