Positions Held

  • Amazon: Applied Scientist Intern
    • May 2020 - Aug. 2020
    • Alexa Edge ML team
  • LinkedIn Corporation: Machine Learning & Relevance Intern
    • May 2019 - Aug. 2019, Mountain View, CA
    • Ads AI group
    • Supervisors: Sara Smoot, Lijun Peng, Hiroto Udagawa
    • Project: ad response rate prediction with wide-and-deep estimators and BERT
  • LinkedIn Corporation: Machine Learning & Relevance Intern
    • May 2018 - Aug. 2018, New York City, NY
    • Company standardization group
    • Supervisors: Xiaoqiang Luo, Deirdre Hogan
    • Project: relevance ranking for resume builder with deep neural networks


Publications and Patents

My research focuses on designing efficient (low-power) neural network algorithms for speech/audio coding and enhancement. Efficiency is critical, especially in the era of IoT, for the wide range of devices with a limited energy supply (cellphones, hearing aids, smart home assistants, etc.). We draw not only on deep learning as a computational paradigm but also on conventional digital signal processing (DSP) techniques: an elegant solution is usually found by bridging the two. For example, we proposed a method that revives the conventional multi-stage residual coding scheme within neural networks for speech coding; we also presented a collaborative quantization scheme that makes LPC quantization trainable alongside LPC residual coding. I have also worked on re-engineering psychoacoustics to obtain a more perceptually salient cost function for neural speech and audio processing. This work has led to academic publications, patents, and additional research funding.

  • Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, and Minje Kim, "Efficient and Scalable Neural Residual Waveform Coding with Collaborative Quantization," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 4-8, 2020.
  • Kai Zhen, Mi Suk Lee, and Minje Kim, "A Dual-Staged Context Aggregation Method towards Efficient End-To-End Speech Enhancement," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 4-8, 2020.
  • Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, and Minje Kim, "Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding," in Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), Graz, Austria, September 15-19, 2019.
    [demo] [bib]
  • Minje Kim, Aswin Sivaraman, Kai Zhen, Jongmo Sung, et al., "Audio signal encoding method and apparatus and audio signal decoding method and apparatus using psychoacoustic-based weighted error function," US Patent Application, US 2019/0164052 A1.
  • Kai Zhen, Aswin Sivaraman, Jongmo Sung, Minje Kim, "On Psychoacoustically Weighted Cost Functions Towards Resource-efficient Deep Neural Networks for Speech Denoising," The 7th Annual Midwest Cognitive Science Conference, Bloomington, IN, 2018. [bib]
  • Peter Miksza, Kevin Watson, Kai Zhen, Sanna Wager, and Minje Kim, "Relationships between experts’ subjective ratings of jazz improvisations and computational measures of melodic entropy," The Improvising Brain III: Cultural Variation and Analytical Techniques Symposium, Atlanta, GA, February 2017.
  • Kai Zhen and David Crandall, "Finding egocentric image topics through convolutional neural network based representations (extended abstract)," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Egocentric Computer Vision, Las Vegas, NV, June 26 - July 1, 2016.