Overview¶

What is the DAE?¶

A denoising autoencoder (DAE) is a neural network trained to reconstruct a clean signal from a corrupted version of itself. Here the inputs are spectral features extracted from speech utterances and the corruption is additive noise drawn from the DEMAND corpus.

┌─────────────────────────────────────────────────────────────┐
│                        Training                             │
│                                                             │
│  Clean speech  ──► STFT ──► log|·| ──► noisy frame x̃      │
│  Noise excerpt ──► mix (SNR ∈ {0, 5, 10} dB)               │
│                                                             │
│  x̃  ──► Encoder ──► z ──► Decoder ──► x̂  ──► MSE(x̂, x)  │
└─────────────────────────────────────────────────────────────┘

At inference time only the noisy frame is available. The decoder’s output is an enhanced estimate of the clean spectrum, which is inverted back to audio by re-applying the noisy phase (phase borrowing) and calling librosa.istft().

Feature representations¶

Three feature variants are implemented, each with its own training script:

Script	Feature	Extractor
`simpleAE_logmag_nc`	Log-magnitude STFT frame	`LogMagnitudeSpectrumExtractor`
`simpleAE_power_nc`	Power STFT frame	`PowerSpectrumExtractor`
`simpleAE_mel_nc`	Log-mel power window	`LogMelPowerSpectrumExtractor`

The log-magnitude variant (simpleAE_logmag_nc) is the production model.

Network architecture¶

The encoder and decoder are symmetric stacks of fully-connected layers with ReLU activations and LayerNorm. A LayerNorm is also prepended to normalise the raw input features.

For the log-magnitude model the architecture follows Nossier et al. (2020) architecture (d):

Stage	Layer sizes
Input	129
Encoder	2048 → 500 → 180 (bottleneck)
Decoder	180 → 500 → 2048 → 129

TensorBoard logging¶

All training scripts write metrics to runs/ (relative to the working directory). Launch TensorBoard to inspect them:

tensorboard --logdir runs

Logged scalars:

Loss/train — smoothed MSE on the current mini-batch.
Loss/val_quick — MSE on a partial validation pass (every N batches).
SNR/val_quick — SNR improvement in dB on the quick val pass.
Ratio/val_to_train — validation/training loss ratio (over-fit tracker).
GradNorm/encoder, GradNorm/decoder — L2 gradient norms.
Loss/val_epoch, SNR/val_epoch — full val-set metrics per epoch.