Multitask Noisy Speech Enhancement System


Dynamics processing

The dynamics processing module alters the dynamic range of the signal. The typical dynamics processing operations are:

  • compression - the dynamic range of the signal is reduced (smaller difference in level between soft and loud signal parts),
  • expansion - the dynamic range of the signal is enlarged, the soft parts of the signal are enhanced.

The dynamics processing is useful if one needs to:

  • equalize the loudness of the sound (compressor),
  • enlarge the dynamic range of the sound (expander),
  • attenuate or enhance selected frequency ranges (dynamics processing in frequency bands),
  • remove signal parts with level below the given threshold (noise gate),
  • limit the maximum signal level value (limiter).

The dynamics processing module may be used in the restoration process:

  • at the first stage (pre-processing), in order to equalize or enlarge the dynamic range of the signal before it is applied to other restoration procedures,
  • at the final stage - in order to equalize and smooth the loudness of the processed sound.

The dynamics processor is described by:

  • static response - output level vs. input level; in the simplest case two parameters are used: threshold and ratio;
  • dynamic response - describes how quickly the dynamics processor reacts to changes in the input signal level; two parameters are used: attack time (TA) and release time (TR), usually expressed in milliseconds.
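As a rough illustration of these two descriptions, the sketch below assumes a hard-knee compressor characteristic for the static response and a one-pole smoothing coefficient derived from a time constant for the dynamic response. The function names, the hard-knee form and the coefficient formula are common textbook conventions assumed here; they are not taken from this system.

```python
import math

def static_output_db(input_db, threshold_db, ratio):
    """Hard-knee compressor curve (assumed form): the level is unchanged
    below the threshold and rises with slope 1/ratio above it."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

def smoothing_coeff(time_ms, sample_rate):
    """One-pole smoothing coefficient for a time constant in milliseconds.
    A longer time gives a coefficient closer to 1, i.e. a slower reaction."""
    return math.exp(-1.0 / (time_ms * 0.001 * sample_rate))

# Input 10 dB above a -20 dB threshold, ratio 4:1
print(static_output_db(-10.0, -20.0, 4.0))
# Input below the threshold passes unchanged
print(static_output_db(-30.0, -20.0, 4.0))
```

With a ratio of 4:1, every 4 dB of input above the threshold yields 1 dB of output above it; the attack and release times would each be converted to their own coefficient.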

Dynamics processing is performed according to the static and dynamic responses set by the user. Any number of threshold points may be used to define the static response of the processor. A root-mean-square (RMS) detector estimates the level of the input signal; the attack time and release time control the operation of the detector. Next, the output level is calculated from the RMS value of the signal according to the static response. The output level is converted to a gain factor, and the sample values of the input signal are scaled accordingly. Additionally, a signal level limiter is used at the output of the system in order to avoid signal clipping.
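The processing chain described above (level detection, static response lookup, gain, output limiting) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the system's implementation: the name `apply_dynamics`, the one-pole mean-square detector, the default times and the hard clamp standing in for the limiter are all hypothetical choices.

```python
import math

def level_db(value, floor=1e-10):
    """Convert a linear amplitude to dB, guarding against log(0)."""
    return 20.0 * math.log10(max(value, floor))

def smoothing_coeff(time_ms, sample_rate):
    """One-pole coefficient for a time constant in milliseconds (assumed form)."""
    return math.exp(-1.0 / (time_ms * 0.001 * sample_rate))

def apply_dynamics(samples, sample_rate, static_curve,
                   attack_ms=5.0, release_ms=100.0, ceiling=1.0):
    """static_curve maps an input level in dB to an output level in dB."""
    a_attack = smoothing_coeff(attack_ms, sample_rate)
    a_release = smoothing_coeff(release_ms, sample_rate)
    mean_square = 0.0
    output = []
    for x in samples:
        squared = x * x
        # Use the attack coefficient while the level rises, release while it falls
        a = a_attack if squared > mean_square else a_release
        mean_square = a * mean_square + (1.0 - a) * squared
        in_db = level_db(math.sqrt(mean_square))
        # Gain is the difference between the desired output level and the input level
        gain_db = static_curve(in_db) - in_db
        y = x * 10.0 ** (gain_db / 20.0)
        # Crude stand-in for the output limiter: clamp to the ceiling.
        # A real limiter would reduce the gain smoothly instead.
        output.append(max(-ceiling, min(ceiling, y)))
    return output

# Usage: 4:1 compression above -20 dB applied to a 200 Hz tone
fs = 8000
tone = [0.5 * math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(fs)]
compressor = lambda d: d if d <= -20.0 else -20.0 + (d + 20.0) / 4.0
processed = apply_dynamics(tone, fs, compressor)
print(max(abs(v) for v in processed[-1000:]))
```

Because the detector level of this tone stays above the threshold, the printed peak of the processed tone is lower than the original 0.5.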

Dynamics processing may be performed either on the whole signal or in frequency bands. In the latter case, speech critical bands (a filter bank with flat frequency response) or equally-contributing bands may be used. See the speech band equalizer page for a detailed description of these two methods. The signal is filtered using the selected filter bank, then the dynamics processor is applied independently in each of the frequency bands, and finally the signal is reconstructed by combining the processed bands.
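The split, process and reconstruct structure can be illustrated with a deliberately simple sketch. It does not use critical bands or equally-contributing bands: it assumes a hypothetical two-band split made from a one-pole low-pass filter and its complement, chosen only because the two bands then sum back to the input. The per-band processors are plain placeholder functions standing in for the dynamics processor.

```python
import math

def split_two_bands(samples, sample_rate, crossover_hz):
    """Split into a low band (one-pole low-pass) and its complement.
    The two bands add up to the original signal."""
    a = math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    low, high, state = [], [], 0.0
    for x in samples:
        state = a * state + (1.0 - a) * x
        low.append(state)
        high.append(x - state)
    return low, high

def process_multiband(samples, sample_rate, crossover_hz, band_processors):
    """Apply one processor per band independently, then sum the bands."""
    bands = split_two_bands(samples, sample_rate, crossover_hz)
    processed = [proc(band) for proc, band in zip(band_processors, bands)]
    return [sum(values) for values in zip(*processed)]

# Usage: leave the low band untouched, attenuate the high band by half
fs = 8000
signal = [math.sin(2.0 * math.pi * 300.0 * n / fs) +
          0.3 * math.sin(2.0 * math.pi * 3000.0 * n / fs) for n in range(400)]
keep = lambda band: list(band)
halve = lambda band: [0.5 * v for v in band]
result = process_multiband(signal, fs, 1000.0, [keep, halve])
print(len(result))
```

In a fuller version each placeholder would be replaced by a dynamics processor with its own static and dynamic parameters for that band.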

The user selects the dynamics processing method (whole band, critical bands or equally-contributing bands) in the module window. If either of the multi-band methods is selected, the user needs to choose the current filter from the bank. For each of the frequency bands, the user can set the static and dynamic parameters of the processor. The static response of the system is defined as a set of threshold points connected with straight lines. Each threshold point is a pair: input level - output level. Points can be placed on the static response plot using the mouse or entered as values from the keyboard, and existing points can be edited. The dynamic parameters - attack time and release time - can also be adjusted, as well as two additional parameters: delay time and smoothing factor (the length of the averaging buffer). The result of the processing may be normalized according to the peak value or the RMS value of the signal. All settings may be saved together as a preset and restored later for reuse.
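A static response built from threshold points connected with straight lines amounts to piecewise-linear interpolation between (input level, output level) pairs. The sketch below shows one way to evaluate such a curve. The function name and the example points are hypothetical, and the behaviour outside the first and last points (continuing with unity slope, i.e. constant gain) is an assumption, since the text does not specify it.

```python
def make_static_response(points):
    """Build a curve from (input_db, output_db) threshold points.
    Between points the output is linearly interpolated. Outside the
    outermost points the curve continues with slope 1 (an assumption)."""
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("at least two threshold points are needed")
    if any(a[0] == b[0] for a, b in zip(pts, pts[1:])):
        raise ValueError("input levels of threshold points must be distinct")

    def curve(input_db):
        first_in, first_out = pts[0]
        last_in, last_out = pts[-1]
        if input_db <= first_in:
            return first_out + (input_db - first_in)
        if input_db >= last_in:
            return last_out + (input_db - last_in)
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= input_db <= x1:
                t = (input_db - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

    return curve

# Unity response up to -20 dB, then 2:1 compression up to 0 dB
curve = make_static_response([(-60.0, -60.0), (-20.0, -20.0), (0.0, -10.0)])
print(curve(-40.0), curve(-10.0))
```

The gain applied to the signal at a given moment is then the curve value minus the detected input level, both in dB.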

Dynamics processing is recommended if the loudness of the signal varies over time. In this case, compression should be used with one threshold point placed at the middle value of the input signal level. Often a speech signal is distorted in such a way that parts of the recording are too soft and are inaudible because other parts of the signal are too loud. In this case, a static response with multiple threshold points should be used: the dynamics processor should work as an expander for low input levels and as a compressor for high input levels. Multi-band processing is recommended if distortions are clearly audible only in specific frequency ranges; for example, compression may be applied to the high frequency bands if "hissing phonemes" are audible. Spectral analysis may be very helpful before multi-band dynamics processing is applied.
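To check whether a set of threshold points behaves as an expander at low levels and as a compressor at high levels, one can look at the slope of each straight-line segment: a slope above 1 enlarges level differences (expansion), a slope below 1 reduces them (compression). The sketch below does this for a hypothetical set of points; the values are illustrative only and are not recommended settings.

```python
def describe_segments(points):
    """Return (start_db, end_db, slope, kind) for each segment between
    consecutive (input_db, output_db) threshold points."""
    pts = sorted(points)
    segments = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        slope = (y1 - y0) / (x1 - x0)
        if slope > 1.0:
            kind = "expander"
        elif slope < 1.0:
            kind = "compressor"
        else:
            kind = "unity"
        segments.append((x0, x1, slope, kind))
    return segments

# Hypothetical curve: soft parts are lifted, loud parts are held back
points = [(-80.0, -80.0), (-50.0, -30.0), (-10.0, -10.0), (0.0, -5.0)]
for start, end, slope, kind in describe_segments(points):
    print(f"{start:6.1f} .. {end:6.1f} dB  slope {slope:.2f}  {kind}")
```

For these points the lowest segment has a slope above 1 and the two upper segments have slopes below 1, which matches the expander-then-compressor shape described above.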