SPEECH CODING SAMPLES FOR CROSS-MODULE RESIDUAL LEARNING (CMRL) AND COLLABORATIVE QUANTIZATION (CQ)
Speech codecs learn compact representations of
speech signals to facilitate data transmission. Many recent deep neural network (DNN) based end-to-end speech codecs achieve low bitrates and high perceptual
quality at the cost of model complexity. We previously proposed
a cross-module residual learning (CMRL) pipeline as a module carrier, with each autoencoder
reconstructing the residual from its preceding modules. By using linear predictive coding (LPC) as a pre-processor, CMRL achieved speech quality comparable to state-of-the-art codecs at ~24 kbps. However, its performance is less competitive at lower bitrates.
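The residual cascade described above can be illustrated with a minimal sketch. It is not the CMRL implementation: each neural autoencoder is replaced here by a plain uniform quantizer (an assumption made purely for illustration), and the names `uniform_quantize`, `cascade_encode`, and `cascade_decode` are hypothetical. What it shows is only the structural idea that every module codes what its predecessors failed to reconstruct, and that the decoder sums the module outputs.

```python
import numpy as np

def uniform_quantize(x, step):
    """Stand-in for one coding module (assumption): round to a uniform grid."""
    return step * np.round(x / step)

def cascade_encode(signal, steps):
    """Run the modules in series; each one codes the residual of its predecessors."""
    residual = np.asarray(signal, dtype=float)
    outputs = []
    for step in steps:
        decoded = uniform_quantize(residual, step)  # this module's reconstruction
        outputs.append(decoded)
        residual = residual - decoded               # what is left for the next module
    return outputs, residual

def cascade_decode(outputs):
    """The final reconstruction is the sum of all module reconstructions."""
    return np.sum(outputs, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(512)                        # synthetic stand-in for a speech frame
outputs, final_residual = cascade_encode(x, steps=[0.5, 0.1, 0.02])
x_hat = cascade_decode(outputs)
print("max reconstruction error:", np.max(np.abs(x - x_hat)))
```

In this toy setting the reconstruction error equals the residual left after the last module, so adding modules can only refine the result. In the actual pipeline the modules are trained autoencoders rather than fixed quantizers.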
We now propose a collaborative quantization (CQ) scheme to jointly
learn the codebook of LPC coefficients and the corresponding residuals.
CQ does not simply shoehorn LPC into a neural network, but
bridges the computational capacity of advanced neural network
models and traditional, yet efficient and domain-specific digital signal
processing methods in an integrated manner. This helps CQ achieve much higher quality than its predecessor at 9 kbps
with even lower model complexity.
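For readers unfamiliar with the LPC side of this design, the following sketch shows the conventional, non-learned pipeline that produces the two quantities CQ treats jointly: the LPC coefficients and the prediction residual. It is not the CQ method itself. The coefficients are quantized with a fixed uniform step (an assumption made for illustration; nothing here is learned), the input is a synthetic autoregressive signal rather than speech, and the names `lpc_coefficients`, `lpc_residual`, and `lpc_synthesis` are hypothetical.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # tiny regularization for numerical safety
    return np.linalg.solve(R, r[1:])

def lpc_residual(frame, a):
    """Analysis filter: e[n] = x[n] - sum_k a[k] * x[n-1-k], with zero history."""
    order = len(a)
    padded = np.concatenate([np.zeros(order), frame])
    pred = np.array([np.dot(a, padded[n:n + order][::-1]) for n in range(len(frame))])
    return frame - pred

def lpc_synthesis(residual, a):
    """Synthesis filter: rebuild the frame from the residual and the same coefficients."""
    order = len(a)
    out = np.zeros(order + len(residual))
    for n in range(len(residual)):
        out[n + order] = residual[n] + np.dot(a, out[n:n + order][::-1])
    return out[order:]

# Synthetic AR(2) frame standing in for 20 ms of speech at 16 kHz (assumption).
rng = np.random.default_rng(0)
noise = rng.standard_normal(320)
x = np.zeros(320)
for n in range(320):
    x[n] = noise[n] + 1.3 * (x[n - 1] if n >= 1 else 0.0) - 0.6 * (x[n - 2] if n >= 2 else 0.0)

a = lpc_coefficients(x, order=4)
a_q = 0.01 * np.round(a / 0.01)   # fixed uniform quantization of the coefficients
e = lpc_residual(x, a_q)          # residual that a downstream coder would compress
x_rec = lpc_synthesis(e, a_q)     # exact up to rounding when the residual is unquantized
print("energy ratio (residual / frame):", np.sum(e ** 2) / np.sum(x ** 2))
```

The residual depends on how the coefficients were quantized, which is one way to see why the bit allocation between the two might benefit from being optimized together rather than fixed in advance.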