Mobile applications like Google Assistant, Amazon’s Alexa, and Apple’s Siri are redefining the way we interact with our smart devices. There are various reasons for this trend. First is the availability of better computational resources. Secondly, with the advancements in big data technologies, we now have access to large databases that we can use to train generic models more efficiently. Thirdly, the number of users of smart devices such as mobile phones, smart wearables, smart homes, and in-vehicle infotainment systems is growing day by day. To provide the best experience to users interacting with ever more advanced smart devices, it is necessary to have more robust and efficient interfaces for human-machine interaction. This will only be possible when we have standardized models for speech recognition; such systems will ultimately allow all kinds of users, regardless of their background, education, and lifestyle, to interact naturally with their devices.

Acoustic feature extraction and acoustic models are two crucial parts of a speech recognition system. The acoustic features extract concise information from the input acoustic waveform, which is useful for recognizing the word and phonetic content of speech. Similarly, a well-designed acoustic model architecture and training method ensure that the feature vectors are robustly matched with the phoneme classes for proper classification. The generic model of an ASR system is shown in Figure 1. The acoustic features are extracted from the raw speech signal, and the output of the acoustic model is a likely phone sequence that corresponds to the particular speech utterance. Since the 1980s, the HMM has been the dominant paradigm for modelling phones as HMM state sequences, with a probabilistic model, i.e., a Gaussian mixture model (GMM), mapping sequences of feature vectors to a sequence of phonemes.
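The framing-and-transform front end described above can be sketched in a few lines. The example below is a toy illustration in plain NumPy, not the system's actual front end; the function name, frame sizes, and number of bins are illustrative choices. It splits the waveform into overlapping windowed frames and keeps per-frame log power values as crude feature vectors:

```python
import numpy as np

def frame_features(signal, sample_rate=16000, frame_ms=25, hop_ms=10, n_bins=40):
    """Toy acoustic front end (illustrative stand-in for MFCC/filterbank
    features): frame the waveform, window each frame, and keep the first
    n_bins log power spectrum values as a feature vector."""
    frame_len = int(sample_rate * frame_ms / 1000)   # 400 samples at 16 kHz
    hop_len = int(sample_rate * hop_ms / 1000)       # 160 samples at 16 kHz
    window = np.hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame)) ** 2   # power spectrum
        frames.append(np.log(spectrum[:n_bins] + 1e-10))
    return np.stack(frames)                          # (num_frames, n_bins)

# one second of synthetic audio: a 440 Hz tone
t = np.linspace(0, 1, 16000, endpoint=False)
feats = frame_features(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (98, 40): 98 frames of 40 log-power values
```

A production front end would additionally apply a mel filterbank (and a DCT for MFCCs), but the frame/window/transform pipeline that feeds the acoustic model is the same shape: one feature vector per 10 ms hop.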
The advent of new devices, technologies, machine learning techniques, and the availability of free large speech corpora has resulted in rapid and accurate speech recognition. Over time, the technology has matured and become easier to integrate into smart devices; therefore, the use of ASR is increasing in different applications. In the last two decades, researchers and different organizations have initiated extensive research to experiment with new techniques and their applications in speech processing systems. There are several speech command based applications in the areas of robotics, IoT, ubiquitous computing, and different human-computer interfaces. Various researchers have worked on enhancing the efficiency of speech command based systems and have used the speech command dataset; however, none of them catered to noise in the same way. Noise is one of the major challenges in any speech recognition system, as real-world noise is a versatile and unavoidable factor that degrades the performance of speech recognition systems, particularly those that have not learned the noise efficiently. We thoroughly analyse the latest trends in speech recognition and evaluate the speech command dataset on different machine learning based and deep learning based techniques. A novel technique is proposed for noise robustness by augmenting noise into the training data. Our proposed technique is tested on clean and noisy data, along with locally generated data, and achieves much better results than existing state-of-the-art techniques, thus setting a new benchmark.
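The noise-augmentation idea can be illustrated with a short sketch. Nothing below comes from the paper itself: the function name, SNR value, and synthetic signals are all assumptions for the example. The mixing rule, however, is the standard one: scale the noise so that the ratio of clean power to noise power matches a target SNR in decibels, then add it to the clean utterance:

```python
import numpy as np

def augment_with_noise(clean, noise, snr_db):
    """Mix a noise recording into a clean utterance at a target SNR (dB).
    Minimal sketch of noise augmentation; name and interface are illustrative."""
    # tile or trim the noise to match the utterance length
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[:len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # scale so that 10 * log10(clean_power / scaled_noise_power) == snr_db
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 300 * t)   # stand-in for a clean utterance
noise = rng.standard_normal(8000)     # stand-in for a background-noise clip
noisy = augment_with_noise(clean, noise, snr_db=10)
```

In training, each clean utterance would typically be mixed with randomly chosen noise clips at randomly sampled SNRs, so the model sees many noisy variants of the same command.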