Building our own J.A.R.V.I.S. using Python - Part I


    Introduction

    Do you remember J.A.R.V.I.S., Tony Stark's virtual personal assistant? I'm sure you do!

    Have you ever wondered about creating your own personal assistant? Yes? Tony Stark can help us with that! Oops, did you forget he is no more? It's sad that he cannot save us anymore.

    But hey, your favorite language Python can help you with that. Yes, you heard it right. We can create our own J.A.R.V.I.S. using Python. Let's roll it!
     

    Project Setup

    During the development of the project, we'll come across various modules and external libraries. Let's learn and install them. But before we install them, let's create a virtual environment and activate it.

     We are going to create a virtual environment using venv. Python 3 ships with the venv module in its standard library, so there is nothing extra to install. To create a virtual environment, you can use the command below:

    $ python -m venv env

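    The above command will create a virtual environment named env. Now, we need to activate the environment. On Windows, from a Bash-style shell such as Git Bash, use the command:

    $ . env/Scripts/activate

    On Linux or macOS, the activation script lives at env/bin/activate instead.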

    To verify that the environment has been activated, look for (env) at the start of your terminal prompt. Now, we can install the libraries.
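
    If you prefer to check from inside Python rather than by looking at the prompt, the following is a minimal sketch. It relies on the fact that, inside an environment created with venv, sys.prefix differs from sys.base_prefix. The in_virtualenv helper is a hypothetical name used only for this illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter prefix:", sys.prefix)
```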

    1. pyttsx3: A cross-platform text-to-speech library. Its major advantage is that it works offline. To install this module, type the command below in the terminal.
      $ pip install pyttsx3
    2. SpeechRecognition: It allows us to convert audio into text for further processing. To install this module, type the command below in the terminal.
      $ pip install SpeechRecognition
    3. pywhatkit: It is an easy-to-use library that will help us interact with the browser. To install the module, run the following command in the terminal.
      $ pip install pywhatkit
    4. wikipedia: It is used to fetch a variety of information from the Wikipedia website. To install this module, type the command below in the terminal.
      $ pip install wikipedia
    5. requests: It is an elegant and simple HTTP library for Python that allows you to send HTTP/1.1 requests easily. To install the module, run the following command in the terminal:
      $ pip install requests


    .env File

    We need this file to store private data related to the project, such as API keys and passwords. For now, let's store the name of the user and the bot.

    Create a file named .env and add the following content there:

    USER=Ashutosh
    BOTNAME=JARVIS

    To read the contents of the .env file, we'll install another module called python-decouple:

    $ pip install python-decouple

    Learn more about Environment Variables in Python here.
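
    To make the file format concrete, here is a minimal standard-library sketch of how KEY=VALUE lines like the ones above can be read. It is only an illustration of the format: python-decouple does its own, more complete parsing, and parse_env_text is a hypothetical helper name, not part of that library.

```python
def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one pair of matching quotes, if any.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


if __name__ == "__main__":
    sample = "USER=Ashutosh\nBOTNAME=JARVIS\n"
    settings = parse_env_text(sample)
    print(settings["USER"], settings["BOTNAME"])
```

    In the project itself we don't need this helper: decouple's config('USER') does the lookup for us, and config('BOTNAME', default='JARVIS') can supply a fallback when a key is missing.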


    Setting up JARVIS

    Before we start defining a few important functions, let's create a speech engine first.

    import pyttsx3
    from decouple import config
    
    USERNAME = config('USER')
    BOTNAME = config('BOTNAME')
    
    
    engine = pyttsx3.init('sapi5')
    
    # Set Rate
    engine.setProperty('rate', 190)
    
    # Set Volume
    engine.setProperty('volume', 1.0)
    
    # Set Voice (Female)
    voices = engine.getProperty('voices')
    engine.setProperty('voice', voices[1].id)

    Let's analyze the above script. First of all, we have initialized an engine using the pyttsx3 module. sapi5 is the Microsoft Speech API driver, which gives us access to the voices installed on Windows; on other platforms, calling pyttsx3.init() without an argument lets pyttsx3 choose its default driver. Learn more about it here. Next, we set the rate and volume properties of the speech engine using the setProperty method. Then we get the voices from the engine using the getProperty method. voices will be a list of the voices available on our system. If we print it, we see something like this:

    [<pyttsx3.voice.Voice object at 0x000001AB9FB834F0>, <pyttsx3.voice.Voice object at 0x000001AB9FB83490>]

    On a typical Windows installation, the first one is a male voice and the other one is a female voice. We'll give our assistant the female voice, so let's set the voice property to voices[1].id using the setProperty method.

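    Note: PyAudio is needed by the Microphone class of the SpeechRecognition library, which we use later. If you get an error related to PyAudio, download the PyAudio wheel from here and install it within the virtual environment.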

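    Also, using the config function from decouple, we get the values of USER and BOTNAME. Note that decouple checks real environment variables before it reads the .env file, so on systems where USER is already set (most Linux and macOS shells), config('USER') returns the login name instead of the value in .env.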
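
    The order of the voices list can differ from one machine to another, so index 1 is not guaranteed to be the voice you want. As a hedged alternative, the sketch below picks a voice by a keyword in its name and falls back to the first voice. It assumes only that each voice object has name and id attributes, as pyttsx3 Voice objects do; pick_voice_id is a hypothetical helper, and the voice names in the demo are stand-ins that may not match your system.

```python
from types import SimpleNamespace


def pick_voice_id(voices, keyword):
    """Return the id of the first voice whose name contains keyword.

    Falls back to the first available voice; returns None for an empty list.
    """
    for voice in voices:
        if keyword.lower() in (voice.name or "").lower():
            return voice.id
    return voices[0].id if voices else None


if __name__ == "__main__":
    # Stand-ins for pyttsx3 Voice objects (assumed names; yours may differ).
    fake_voices = [
        SimpleNamespace(id="voice-david", name="Microsoft David Desktop"),
        SimpleNamespace(id="voice-zira", name="Microsoft Zira Desktop"),
    ]
    print(pick_voice_id(fake_voices, "zira"))
```

    With the real engine, the result of pick_voice_id(engine.getProperty('voices'), 'zira') could be passed to engine.setProperty('voice', ...) in place of voices[1].id.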


    1. Speak Function

    The speak function is responsible for speaking whatever text is passed to it. Let's see the code:

    # Text to Speech Conversion
    def speak(text):
        """Used to speak whatever text is passed to it"""
        
        engine.say(text)
        engine.runAndWait()
    

    In the speak() function, the say() method queues the text to be spoken. The runAndWait() method then blocks while the engine processes the queue and returns once all queued commands have been cleared.
     

    2. Greet Function

    This function will be used to greet the user whenever the program is run. Depending on the current time, it wishes the user Good Morning, Good Afternoon, or Good Evening.

    from datetime import datetime
    
    
    # Greet the user
    def greet_user():
        """Greets the user according to the time"""
        
        hour = datetime.now().hour
        if (hour >= 6) and (hour < 12):
            speak(f"Good Morning {USERNAME}")
        elif (hour >= 12) and (hour < 16):
            speak(f"Good afternoon {USERNAME}")
        elif (hour >= 16) and (hour < 19):
            speak(f"Good Evening {USERNAME}")
        speak(f"I am {BOTNAME}. How may I assist you?")
    

    First, we get the current hour, i.e., if the current time is 11:15 AM, the hour will be 11. If the hour is at least 6 and less than 12, we wish the user Good Morning. If it is at least 12 and less than 16, we wish Good Afternoon, and similarly, if it is at least 16 and less than 19, we wish Good Evening. Outside these ranges, no time-based greeting is spoken. In every case, the assistant then introduces itself. We are using the speak function for all of this.
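
    If you would rather have a greeting at every hour of the day, one option is to separate the hour-to-greeting decision from the speaking, which also makes it easy to test. The sketch below is one possible variant, not the code used in this series: greeting_for_hour is a hypothetical name, and treating the late-night hours as evening is an assumption.

```python
from datetime import datetime


def greeting_for_hour(hour: int) -> str:
    """Map an hour in the range 0-23 to a greeting."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    if 6 <= hour < 12:
        return "Good Morning"
    if 12 <= hour < 16:
        return "Good Afternoon"
    # Assumption: everything from 16:00 until 05:59 counts as evening.
    return "Good Evening"


if __name__ == "__main__":
    print(greeting_for_hour(datetime.now().hour))
```

    Inside greet_user(), the result could then be spoken with speak(f"{greeting_for_hour(hour)} {USERNAME}").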
     

    3. Take User Input

    This function is for taking the commands from the user and recognizing the command using the speech_recognition module.

    import speech_recognition as sr
    from random import choice
    from utils import opening_text
    
    
    # Takes Input from User
    def take_user_input():
        """Takes user input, recognizes it using Speech Recognition module and converts it into text"""
    
        r = sr.Recognizer()
        with sr.Microphone() as source:
            print('Listening....')
            r.pause_threshold = 1
            audio = r.listen(source)
    
        try:
            print('Recognizing...')
            query = r.recognize_google(audio, language='en-in')
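            # Carry on unless the query contains 'exit' or 'stop'
            if 'exit' not in query and 'stop' not in query:
                speak(choice(opening_text))
            else:
                hour = datetime.now().hour
                # Night wraps past midnight, so the two checks are joined with 'or'
                if hour >= 21 or hour < 6: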
                    speak("Good night sir, take care!")
                else:
                    speak('Have a good day sir!')
                exit()
        except Exception:
            speak('Sorry, I could not understand. Could you please say that again?')
            query = 'None'
        return query

    We have imported the speech_recognition module as sr. The Recognizer class within the speech_recognition module helps us recognize the audio. The same module has a Microphone class that gives us access to the microphone of the device. So with the microphone as the source, we listen to the audio using the listen() method of the Recognizer class. We have also set the pause_threshold to 1, i.e., a phrase is treated as complete only after one second of silence, so a brief pause while we speak doesn't cut the recording short.

    Next, using the recognize_google() method from the Recognizer class, we try to recognize the audio. The recognize_google() method performs speech recognition on the audio passed to it, using the Google Speech Recognition API. We have set the language to en-in, i.e. English India. It returns the transcript of the audio which is nothing but a string. We've stored it in a variable called query.

    If the query has the word exit or stop in it, it means we're asking the assistant to stop immediately. So, before stopping, we greet the user again as per the current hour: if the hour is 21 or later, or earlier than 6, we wish Good Night to the user, else we speak some other message. After speaking, we exit from the program.

    If the query doesn't have those two words (exit or stop), we speak something to tell the user that we have heard them. For that, we use the choice function from the random module to randomly select a statement from the opening_text list. This list lives in a utils.py file, which has just one list containing a few statements:

    opening_text = [
        "Cool, I'm on it sir.",
        "Okay sir, I'm working on it.",
        "Just a second sir.",
    ]

    During this entire process, if we encounter an exception, we apologize to the user and set the query to the string 'None'. In the end, we return the query.
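
    Conditions like these are easy to get subtly wrong. An expression of the form not 'exit' in query or 'stop' in query is parsed by Python as ('exit' not in query) or ('stop' in query), and a range that wraps past midnight, such as 21 to 6, cannot be tested with and. The sketch below isolates both decisions in small functions that can be tested without a microphone. The names is_exit_command and farewell_for_hour are hypothetical and not part of the original code.

```python
EXIT_WORDS = ("exit", "stop")


def is_exit_command(query: str) -> bool:
    """Return True if the query contains any exit word as a substring."""
    lowered = query.lower()
    return any(word in lowered for word in EXIT_WORDS)


def farewell_for_hour(hour: int) -> str:
    """Pick the farewell; night spans 21:00-05:59, which wraps past midnight."""
    # The range wraps around midnight, so the two conditions are joined by `or`.
    if hour >= 21 or hour < 6:
        return "Good night sir, take care!"
    return "Have a good day sir!"


if __name__ == "__main__":
    for text in ("open notepad", "please stop", "Exit now"):
        print(text, "->", is_exit_command(text))
    print(farewell_for_hour(23))
```

    Because this is a plain substring check, a query such as "start the stopwatch" would also count as an exit command; matching on whole words would be a reasonable refinement.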
     

    Main Method

    To run the project, we're using the main method.

    if __name__ == '__main__':
        greet_user()
        while True:
            query = take_user_input().lower()
            print(query)

    As we know, the first thing we need to do is to greet the user using the greet_user() function. Next, we run a while loop to continuously take input from the user using the take_user_input() function. For now, we're just printing the query.
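
    Once real functionality is added, the loop will probably want to skip the 'None' value that take_user_input() returns when recognition fails. The following is a minimal sketch of that idea, assuming the listening and handling steps are passed in as plain callables so the loop can be tried without a microphone. run_assistant and its max_turns parameter are hypothetical and exist only for this illustration.

```python
def run_assistant(listen, handle, max_turns=None):
    """Repeatedly call listen() and pass recognized queries to handle().

    listen: callable returning the recognized text, or 'None' on failure.
    handle: callable receiving each lowercased query.
    max_turns: optional cap on iterations (None means loop forever).
    """
    turns = 0
    while max_turns is None or turns < max_turns:
        turns += 1
        query = listen().lower()
        # take_user_input() returns the string 'None' when recognition fails.
        if query == "none":
            continue
        handle(query)


if __name__ == "__main__":
    scripted = iter(["None", "Open Notepad", "None"])
    run_assistant(lambda: next(scripted), print, max_turns=3)
```

    In the real program this would be called as run_assistant(take_user_input, print). One limitation of the sentinel approach is that a user who actually says "none" is skipped as well.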

    For now, the complete code in main.py looks like this:

    import pyttsx3
    import speech_recognition as sr
    from decouple import config
    from datetime import datetime
    from random import choice
    from utils import opening_text
    
    
    USERNAME = config('USER')
    BOTNAME = config('BOTNAME')
    
    
    engine = pyttsx3.init('sapi5')
    
    # Set Rate
    engine.setProperty('rate', 190)
    
    # Set Volume
    engine.setProperty('volume', 1.0)
    
    # Set Voice (Female)
    voices = engine.getProperty('voices')
    engine.setProperty('voice', voices[1].id)
    
    
    # Text to Speech Conversion
    def speak(text):
        """Used to speak whatever text is passed to it"""
    
        engine.say(text)
        engine.runAndWait()
    
    
    # Greet the user
    def greet_user():
        """Greets the user according to the time"""
        
        hour = datetime.now().hour
        if (hour >= 6) and (hour < 12):
            speak(f"Good Morning {USERNAME}")
        elif (hour >= 12) and (hour < 16):
            speak(f"Good afternoon {USERNAME}")
        elif (hour >= 16) and (hour < 19):
            speak(f"Good Evening {USERNAME}")
        speak(f"I am {BOTNAME}. How may I assist you?")
    
    
    # Takes Input from User
    def take_user_input():
        """Takes user input, recognizes it using Speech Recognition module and converts it into text"""
        
        r = sr.Recognizer()
        with sr.Microphone() as source:
            print('Listening....')
            r.pause_threshold = 1
            audio = r.listen(source)
    
        try:
            print('Recognizing...')
            query = r.recognize_google(audio, language='en-in')
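            # Carry on unless the query contains 'exit' or 'stop'
            if 'exit' not in query and 'stop' not in query:
                speak(choice(opening_text))
            else:
                hour = datetime.now().hour
                # Night wraps past midnight, so the two checks are joined with 'or'
                if hour >= 21 or hour < 6: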
                    speak("Good night sir, take care!")
                else:
                    speak('Have a good day sir!')
                exit()
        except Exception:
            speak('Sorry, I could not understand. Could you please say that again?')
            query = 'None'
        return query
    
    
    if __name__ == '__main__':
        greet_user()
        while True:
            query = take_user_input().lower()
            print(query)

    You can run and test the application now.

    $ python main.py


    Conclusion

    In this part, we have completed the setup of our virtual personal assistant. We have not added any functionality to it yet. We'll work on those functionalities in the next part of the blog. Stay Tuned!
