Language Models in Action, NLP

Suppose one day you want to home-brew your own Siri as a daily assistant. How can it be done? Surprisingly, it is not too difficult. In this post, we will put the basics of home-brewing Siri into action.


Of course, since this is a short read, there is no way to explain everything in fine detail, such as voice embedding extraction, mel-spectrogram analysis, text-to-speech voice synthesis, time-delay neural networks, or all the fancy language models used in this technology. Instead, we will focus on one core question: how does your cell phone understand your questions and provide a reasonable answer?


Haoran Pu

