Kai Zhen

Journal Articles   Conference Proceedings   Patents   Workshop Papers

===2022===

C-006 Kai Zhen, Hieu Duy Nguyen, Raviteja Chinta, Nathan Susanj, Athanasios Mouchtaris, Tariq Afzal, and Ariya Rastrow, "Sub-8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition," in Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), Incheon, Korea, September 18-22, 2022.

===2021===

J-002 Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim, "Scalable and Efficient Neural Speech Coding: A Hybrid Design," IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP), 2021 (Accepted for publication).

P-005 Mi Suk Lee, Seung Kwon Beack, Jongmo Sung, Tae Jin Lee, Jin Soo Choi, Minje Kim, Kai Zhen, "Method and apparatus for processing audio signal," U.S. Patent Application US20210233547A1.

P-004 Minje Kim, Kai Zhen, Mi Suk Lee, Seung Kwon Beack, Jongmo Sung, Tae Jin Lee, Jin Soo Choi, "Residual Coding Method of Linear Prediction Coding Coefficient Based on Collaborative Quantization, and Computing Device for Performing the Method," U.S. Patent Application No. 17/098,090.

C-005 Kai Zhen, Hieu Duy Nguyen, Feng-Ju (Claire) Chang, Athanasios Mouchtaris, and Ariya Rastrow, "Sparsification via Compressed Sensing for Automatic Speech Recognition," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toronto, ON, Canada, June 6-12, 2021.
[pdf]
*from the summer internship with Amazon

C-004 Haici Yang, Kai Zhen, Seungkwon Beack, Minje Kim, "Source-Aware Neural Speech Coding for Noisy Speech Compression," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toronto, ON, Canada, June 6-12, 2021.
[pdf]


===2020===

J-001 Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, and Minje Kim, "Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding," IEEE Signal Processing Letters, vol. 27, pp. 2159-2163, 2020, doi: 10.1109/LSP.2020.3039765.
[demo] [pdf] [code]

C-003 Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, and Minje Kim, "Efficient and Scalable Neural Residual Waveform Coding with Collaborative Quantization," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 4-8, 2020.
[demo] [pdf] [code]

C-002 Kai Zhen, Mi Suk Lee, Minje Kim, "A Dual-Staged Context Aggregation Method towards Efficient End-To-End Speech Enhancement," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 4-8, 2020.
[demo] [pdf]

P-003 Minje Kim, Kai Zhen, Mi Suk Lee, "Apparatus and Method for Speech Processing Using a Densely Connected Hybrid Neural Network," US Patent Application, 2020.

P-002 Minje Kim, Kai Zhen, Seungkwon Beack, et al., "Audio Signal Encoding Method and Audio Signal Decoding Method, and Encoder and Decoder Performing the Same," US Patent Application, US20200135220A1.

W-004 Kai Zhen, Hieu Duy Nguyen, Feng-Ju (Claire) Chang, Athanasios Mouchtaris, "Network Sparsification for On-Device ASR," Amazon Machine Learning Conference (AMLC) Workshop on Network Inference Optimization, 2020.


===2019 and earlier===

C-001 Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, and Minje Kim, "Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding," in Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), Graz, Austria, September 15-19, 2019.
[demo] [pdf]

P-001 Minje Kim, Aswin Sivaraman, Kai Zhen, Jongmo Sung, et al., "Audio signal encoding method and apparatus and audio signal decoding method and apparatus using psychoacoustic-based weighted error function," US Patent Application, US20190164052A1.

W-003 Kai Zhen, Aswin Sivaraman, Jongmo Sung, Minje Kim, "On Psychoacoustically Weighted Cost Functions Towards Resource-efficient Deep Neural Networks for Speech Denoising," The 7th Annual Midwest Cognitive Science Conference, Bloomington, IN, 2018.
[pdf]

W-002 Peter Miksza, Kevin Watson, Kai Zhen, Sanna Wager, Minje Kim, "Relationships between experts' subjective ratings of jazz improvisations and computational measures of melodic entropy," The Improvising Brain III: Cultural Variation and Analytical Techniques Symposium, Atlanta, GA, February 2017.

W-001 Kai Zhen and David Crandall, "Finding egocentric image topics through convolutional neural network based representations (extended abstract)," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Egocentric Computer Vision, Las Vegas, US, June 26 - July 1, 2016.