Speech Recognition

Once AI Master triggers the AI Worker Process, i.e. Aryan, execution starts and Aryan greets the Human and asks "How can I help?". The SpeechRecognization flow is then triggered to listen to the Human and decide the next set of actions.

To start, a speech recognizer object is created; it is the object referred to as r in the snippet below.

The SpeechRecognition library is used to perform speech recognition. It supports several engines and APIs, both online and offline.

Speech recognition engine/API support:

  • CMU Sphinx (works offline)

  • Google Speech Recognition

  • Google Cloud Speech API

  • Wit.ai

  • Microsoft Bing Voice Recognition

  • Houndify API

  • IBM Speech to Text

  • Snowboy Hotword Detection (works offline)

Aryan currently uses Google Speech Recognition.

The default microphone is used to listen to the Human's input, which the speech recognizer object captures in the variable audio.

The Google Speech Recognition service is then called over the web to convert the speech to text on the fly; the output is stored in the variable speech.

import speech_recognition as sr

# Create the speech recognizer object
r = sr.Recognizer()

# Use the default microphone as the audio source
with sr.Microphone() as source:

    # Listen to the microphone source until the speaker pauses
    audio = r.listen(source)

    # Use the Google Speech Recognition service to convert the audio to text
    speech = r.recognize_google(audio)

Thereafter, the SpeechAction flow is triggered to act on the recognized speech. It has the following steps:

  • Text Processing and Transformation

  • Keyword Identification

  • Action using Learned Skills

  • Speech to Text Conversion

  • Next Step
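The first three steps above can be sketched as a small pipeline: normalize the recognized text, look for a known keyword, and dispatch to the matching skill. Everything in this sketch is hypothetical, including the function names, the SKILLS table, and the fallback reply; it only illustrates the shape of such a flow and does not show how Aryan implements it.

```python
import re


# Hypothetical skills: each takes the normalized text and returns a reply.
def tell_time(text):
    return "time skill triggered"


def tell_weather(text):
    return "weather skill triggered"


# Hypothetical keyword-to-skill table.
SKILLS = {"time": tell_time, "weather": tell_weather}


def normalize(speech):
    """Text processing: lowercase the text and drop punctuation."""
    return re.sub(r"[^a-z0-9\s]", "", speech.lower()).strip()


def identify_keyword(text, skills):
    """Keyword identification: return the first word that names a skill."""
    for word in text.split():
        if word in skills:
            return word
    return None


def speech_action(speech, skills=SKILLS):
    """Run the steps in order and return the reply of the chosen skill."""
    text = normalize(speech)
    keyword = identify_keyword(text, skills)
    if keyword is None:
        return "Sorry, I do not know how to help with that."
    return skills[keyword](text)
```

A table of skills keyed by keyword keeps the dispatch step independent of the skills themselves, so a new skill can be added without changing the flow.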
