Speech to text conversion: an application of NLP

This post was released as part of the Data Science Blogathon

Introduction

Speech is the most common means of communication, and most of the world's population relies on it to talk to one another. A speech recognition system translates spoken language into text. There are several real-life examples of speech recognition systems; for example, Apple's Siri recognizes speech and transcribes it to text. A speech-to-text (STT) system takes a human utterance as input and produces a string of words as output. The sole purpose of such a system is to extract, characterize, and recognize the information in speech.

CONTENT

1. System block diagram

2. How does speech recognition work?

3. Convert an audio file to text

4. How about converting audio in different languages?

5. Microphone speech to text

6. Applications

7. Conclusion

System block diagram

Figure: System block diagram of speech-to-text conversion

1. Acoustic model

To recognize speech, a speech recognition engine uses an acoustic model. To create an acoustic model, we take audio recordings of speech together with their text transcripts and use software to create statistical representations of the sounds that make up each word.

2. Language model

A language model is a file that contains the probabilities of sequences of words. Language models are used for dictation applications, while grammars are used in interactive voice response (IVR), desktop command-and-control, and telephony applications.
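To make "probabilities of sequences of words" concrete, here is a minimal sketch of a bigram language model estimated from raw counts. The tiny corpus and the function names `train_bigram_model` and `sequence_probability` are made up for illustration; real recognizers use far larger models with smoothing.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Estimate P(next_word | word) from raw counts (no smoothing)."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        # <s> and </s> mark the start and end of each sentence
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return {pair: count / contexts[pair[0]] for pair, count in bigrams.items()}


def sequence_probability(model, sentence):
    """Multiply the bigram probabilities along the sentence."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for pair in zip(words, words[1:]):
        prob *= model.get(pair, 0.0)  # unseen word pairs get probability 0
    return prob


# Hypothetical toy corpus, not real training data
corpus = [
    "recognize speech",
    "recognize speech",
    "recognize speech now",
    "wreck a nice beach",
]
model = train_bigram_model(corpus)

for candidate in ["recognize speech", "wreck a nice beach", "speech recognize"]:
    print(candidate, "->", sequence_probability(model, candidate))
```

With this corpus, the word order seen most often in training scores highest, and an order that was never seen scores zero. That preference is what lets a recognizer choose between similar-sounding candidate transcriptions.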

3. Speech engine

The speech engine is the heart of the speech recognition system. It is the software that uses the acoustic and language models to turn spoken audio into text. The reverse capability, reproducing text as a spoken voice, is commonly known as text to speech (TTS).

How does speech recognition work?

Figure: Speech recognition process

Hidden Markov models (HMMs) and deep neural network models are often used to convert audio to text.

A hidden Markov model (HMM) is a statistical model that outputs a sequence of symbols or quantities. The rationale for using HMMs in speech recognition is that a speech signal can be treated as a piecewise stationary or short-time stationary signal. On a short time scale (for example, 10 milliseconds), speech can be approximated as a stationary process.
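As a rough illustration of how an HMM assigns hidden states to a sequence of short frames, here is a minimal sketch of Viterbi decoding on a toy model. The two states, the "low"/"high" energy observations, and all probabilities are invented for this example; a real recognizer works with phoneme-level states and acoustic feature vectors.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that gives the highest score
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model: each observation is the energy level of one short frame
states = ["silence", "speech"]
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
emit_p = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}

frames = ["low", "low", "high", "high", "low"]
print(viterbi(frames, states, start_p, trans_p, emit_p))
```

The decoder works in log-probabilities to avoid numerical underflow on long sequences, which is why every probability in the toy tables is kept strictly positive.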

Figure: HMM codebook

In this blog, I demonstrate a way to convert speech to text using Python. This is done with the help of the “SpeechRecognition” library and the “PyAudio” library. The SpeechRecognition library supports several recognition APIs; in this blog I used the Google Speech Recognition API.

Python Libraries

!pip install SpeechRecognition

Convert an audio file to text

Follow these steps to convert an audio file to text:

Steps:

  1. Import the SpeechRecognition library.

  2. Initialize the recognizer class to recognize speech. We are using Google speech recognition.

  3. Audio file formats supported by the speech recognition system include WAV, AIFF, AIFF-C, and FLAC. I used a WAV file as input in this example.

  4. Here we use an audio clip from the movie ‘Taken’, which says “I don't know who you are. I don't know what you want. If you are looking for ransom, I can tell you I don't have money”.

  5. By default, the Google recognizer reads English.

Code

# Import the library
import speech_recognition as sr

# Initialize the recognizer class (for recognizing the speech)
r = sr.Recognizer()

# Open the audio file as the source and read its contents
# into the audio_text variable
with sr.AudioFile('I-dont-know.wav') as source:
    audio_text = r.record(source)

# recognize_google() raises RequestError if the API is unreachable and
# UnknownValueError if the speech is unintelligible, hence the exception handling
try:
    # Using Google speech recognition
    text = r.recognize_google(audio_text)
    print('Converting audio transcripts into text ...')
    print(text)
except (sr.UnknownValueError, sr.RequestError):
    print('Sorry.. run again...')

Output

Figure: Output 1

How about converting audio in different languages?

English is one of the most common languages. But what if we want to convert speech in other languages, such as German and French? This speech-to-text (STT) system can convert speech in many languages to text. Let's see how.

For example, if we want to read an audio file in French, we need to add a language option in recognize_google(). The rest of the code stays the same.

#Adding french language option
text = r.recognize_google(audio_text, language = "fr-FR")

Output

Figure: Output 2

Again, the required language option is added in recognize_google() for speech recognition. I am speaking in Tamil, an Indian language, and adding “ta-IN” as the language option.

# Adding the Tamil language option
print("Text: " + r.recognize_google(audio_text, language="ta-IN"))

I just said “how are you” in Tamil, and it printed the Tamil text accurately.

Output

Figure: Output 3
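If you switch languages often, one option is to keep the language tags in a small lookup table instead of typing them by hand. The sketch below is only an illustration: `LANGUAGE_TAGS` and `language_tag` are hypothetical names, and the table lists just the languages mentioned in this post plus a German tag.

```python
# Hypothetical lookup table from a language name to the tag
# passed as the `language` option of recognize_google()
LANGUAGE_TAGS = {
    "english": "en-US",
    "french": "fr-FR",
    "german": "de-DE",
    "tamil": "ta-IN",
}


def language_tag(name):
    """Return the language tag for a language name, ignoring case and spaces."""
    try:
        return LANGUAGE_TAGS[name.strip().lower()]
    except KeyError:
        raise ValueError(f"No language tag configured for {name!r}") from None


print(language_tag("Tamil"))
print(language_tag(" French "))
```

The result can then be passed straight to the recognizer, for example r.recognize_google(audio_text, language=language_tag("Tamil")).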

Microphone speech to text

Microphones are used to receive audio as input from users. There are many different libraries available for converting microphone speech to text. Here we use PyAudio to capture the audio.

Steps:

  1. We must install the PyAudio library, which is used to receive audio input and play audio output through the microphone and speaker. It lets us capture our voice through the microphone.

!pip install PyAudio

  2. We have to use the Microphone class instead of an audio file as the source. The remaining steps are the same.

Code

# Import the library
import speech_recognition as sr

# Initialize the recognizer class (for recognizing the speech)
r = sr.Recognizer()

# Use the microphone as the source and store
# the captured speech in the audio_text variable
with sr.Microphone() as source:
    print("Talk")
    audio_text = r.listen(source)
    print("Time over, thanks")

# recognize_google() raises RequestError if the API is unreachable and
# UnknownValueError if the speech is unintelligible, hence the exception handling
try:
    # Using Google speech recognition
    print("Text: " + r.recognize_google(audio_text))
except (sr.UnknownValueError, sr.RequestError):
    print("Sorry, I did not get that")

I just spoke “How are you?”

Output

Figure: Output 4
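In a noisy room, the recognizer may struggle to tell where speech starts and ends. A possible refinement, sketched below, is to calibrate for ambient noise before listening and to cap how long the call waits. The helper name `listen_once` and its default values are assumptions for illustration; `adjust_for_ambient_noise()` and the `timeout` and `phrase_time_limit` arguments of `listen()` belong to the SpeechRecognition `Recognizer` class.

```python
def listen_once(recognizer, source, calibrate_seconds=1.0,
                timeout=5, phrase_time_limit=10):
    """Capture one utterance from an already-open audio source.

    `recognizer` is expected to behave like speech_recognition.Recognizer and
    `source` like an opened speech_recognition.Microphone.
    """
    # Sample the background noise first so the energy threshold fits the room
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # timeout: maximum seconds to wait for speech to start
    # phrase_time_limit: maximum seconds of speech to record
    return recognizer.listen(
        source, timeout=timeout, phrase_time_limit=phrase_time_limit
    )
```

Inside the `with sr.Microphone() as source:` block, the call audio_text = listen_once(r, source) would take the place of r.listen(source). Note that listen() raises sr.WaitTimeoutError when no speech starts within the timeout, so that exception is worth handling as well.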

Applications

  1. In-car systems

  2. Health care

  3. Military

  4. Training of air traffic controllers

  5. Telephony and other domains

  6. Education and daily life

Conclusion

The Google Speech Recognition API is a simple way to convert speech to text, but it needs an internet connection to work. In this blog, we have seen how to convert speech to text using Google's Speech Recognition API. This can be very useful for NLP projects, especially those that handle audio transcripts. If you have something to point out, feel free to leave a comment! Thank you for reading. Keep learning and stay tuned for more!

The media shown in this post is not the property of DataPeaker and is used at the author's discretion.
