How to Create a Horoscope API with Beautiful Soup and Flask

Programming Image

    Have you ever read your horoscope in the newspaper or seen it on television? Well, I'm not sure about other countries, but in my country of India, people still read their horoscopes.

    And this is where I got the idea for this tutorial. It might sound a bit old-fashioned, but the main focus here is not on the horoscope itself – it's just the vehicle for our learning.

    In this article, we're going to scrape a website called Horoscope.com using Beautiful Soup and then create our own API using Flask. This API, if deployed on a public server, can then be used by other developers who would wish to create a website to show their horoscope or an app for the same.
     

    How to Set Up the Project

    First of all, we're going to create a virtual environment within which we'll install all the required dependencies.

    Python now ships with the pre-installed venv library. So, to create a virtual environment, you can use the below command:

    $ python -m venv env

    To activate the virtual environment named env, use the command:

    • On Windows:
    env\Scripts\activate.bat
    • On Linux and MacOS:
    source env/bin/activate

    To deactivate the environment (not required at this stage):

    deactivate

    Now we're ready to install the dependencies. The modules and libraries we are going to use in this project are:

    • requests: Requests allow you to send HTTP/1.1 requests extremely easily. The module doesn't come pre-installed with Python, so we need to install it using the command:
      $ pip install requests
    • bs4: Beautiful Soup(bs4) is a Python library for pulling data out of HTML and XML files. The module doesn't come pre-installed with Python, so we need to install it using the command:
      $ pip install bs4
    • Flask: Flask is a simple, easy-to-use microframework for Python that can help build scalable and secure web applications. The module doesn't come pre-installed with Python, so we need to install it using the command:
      $ pip install flask
    • Flask-RESTX: Flask-RESTX lets you create APIs with Swagger Documentation. The module doesn't come pre-installed with Python, so we need to install it using the command:
      $ pip install flask-restx

    We'll also use environment variables in this project. So, we are going to install another module called python-decouple to handle this:

    $ pip install python-decouple

    To learn more about environment variables in Python, you can check out this article.
     

    Project Workflow

    The basic workflow of the project will be like this:

    1. The horoscope data will be scraped from Horoscope.com.
    2. The data will then be used by our Flask server to send JSON response to the user.


     

    How to Set Up a Flask Project

    The first thing we're going to do is to create a Flask project. If you check the official documentation of Flask, you'll find a minimal application there.

    But, we're not going to follow that. We are going to write an application that is more extensible and has a good base structure. If you wish, you can follow this guide to get started with Flask.

    Our application will exist within a package called core. To convert a usual directory to a Python package, we just need to include an __init__.py file. So, let's create our core package first.

    $ mkdir core

    After that, let's create the __init__.py file inside the core directory:

    $ cd core
    $ touch __init__.py
    $ cd ..

    In the root directory of the project, create a file called config.py. We'll store the configurations for the project in this file. Within the file, add the following content:

    from decouple import config
    
    
    class Config(object):
        SECRET_KEY = config('SECRET_KEY', default='guess-me')
        DEBUG = False
        TESTING = False
        CSRF_ENABLED = True
    
    
    class ProductionConfig(Config):
        DEBUG = False
        MAIL_DEBUG = False
    
    
    class StagingConfig(Config):
        DEVELOPMENT = True
        DEBUG = True
    
    
    class DevelopmentConfig(Config):
        DEVELOPMENT = True
        DEBUG = True
    
    
    class TestingConfig(Config):
        TESTING = True
    

    In the above script, we have created a Config class and defined various attributes inside that. Also, we have created different child classes (as per different stages of development) that inherit the Config class.

    Notice that we have the SECRET_KEY set to an environment variable named SECRET_KEY. Create a file named .env in the root directory and add the following content there:

    APP_SETTINGS=config.DevelopmentConfig
    SECRET_KEY=gufldksfjsdf

    Apart from SECRET_KEY, we have APP_SETTINGS that refers to one of the classes we created in the config.py file. We set it to the current stage of the project.

    Now, we can add the following content in the __init__.py file:

    from flask import Flask
    from decouple import config
    from flask_restx import Api
    
    app = Flask(__name__)
    app.config.from_object(config("APP_SETTINGS"))
    api = Api(
        app,
        version='1.0',
        title='Horoscope API',
        description='Get horoscope data easily using the below APIs',
        license='MIT',
        contact='Ashutosh Krishna',
        contact_url='https://ashutoshkrris.tk',
        contact_email='[email protected]',
        doc='/',
        prefix='/api/v1'
    )
    

    In the above Python script, we are first importing the Flask class from the Flask module that we have installed. Next, we're creating an object app of class Flask. We use the __name__ argument to indicate the app's module or package so that Flask knows where to find other files such as templates.

    Next, we are setting the app configurations to the APP_SETTINGS according to the variable in the .env file.

    Apart from that, we have created an object of the Api class. We need to pass various arguments to it. We can find the Swagger documentation on the / route. The /api/v1 will be prefixed on each API route.

    For now, let's create a routes.py file in the core package and just add the following namespace:

    from core import api
    from flask import jsonify
    
    ns = api.namespace('/', description='Horoscope APIs')

    We need to import the routes in the __init__.py file:

    from flask import Flask
    from decouple import config
    from flask_restx import Api
    
    app = Flask(__name__)
    app.config.from_object(config("APP_SETTINGS"))
    api = Api(
        app,
        version='1.0',
        title='Horoscope API',
        description='Get horoscope data easily using the below APIs',
        license='MIT',
        contact='Ashutosh Krishna',
        contact_url='https://ashutoshkrris.tk',
        contact_email='[email protected]',
        doc='/',
        prefix='/api/v1'
    )
    
    from core import routes			# Add this line

    We're now just left with one file which will help us run the Flask server:

    from core import app
    
    if __name__ == '__main__':
        app.run()
    

    Once you run this file using the python main.py command, you'll see a similar output:

    Now, we are ready to scrape the data from the Horoscope website.
     

    How to Scrape the Data from Horoscope.com

    If you open Horoscope.com and choose your zodiac sign, the horoscope data for your zodiac sign for today will be shown.

    Source: Horoscope.com

    In the above image, you can see you can view the horoscope for yesterday, tomorrow, weekly, monthly or even a custom date. We're going to use all of these.

    But first, if you see the URL of the current page, it is something like: https://www.horoscope.com/us/horoscopes/general/horoscope-general-daily-today.aspx?sign=10.

    The URL has two variables, if you see clearly, sign and today. The value of the variable sign will be assigned according to the zodiac sign. The variable today can be replaced with yesterday and tomorrow.

    The dictionary below can help us with the zodiac signs:

    ZODIAC_SIGNS = {
        "Aries": 1,
        "Taurus": 2,
        "Gemini": 3,
        "Cancer": 4,
        "Leo": 5,
        "Virgo": 6,
        "Libra": 7,
        "Scorpio": 8,
        "Sagittarius": 9,
        "Capricorn": 10,
        "Aquarius": 11,
        "Pisces": 12
    }

    This means that if your zodiac sign is Capricorn, the value of the sign in the URL will be 10.

    Next, if we wish to get the horoscope data for a custom date, the URL https://www.horoscope.com/us/horoscopes/general/horoscope-archive.aspx?sign=10&laDate=20211213 will help us.

    It has the same sign variable, but it has another variable laDate which takes the date in YYYYMMDD format.

    Now, we're ready to create different functions to fetch horoscope data. Create a utils.py file and follow along.
     

    How to Get a Horoscope for the Day

    import requests
    from bs4 import BeautifulSoup
    
    
    def get_horoscope_by_day(zodiac_sign: int, day: str):
        if not "-" in day:
            res = requests.get(f"https://www.horoscope.com/us/horoscopes/general/horoscope-general-daily-{day}.aspx?sign={zodiac_sign}")
        else:
            day = day.replace("-", "")
            res = requests.get(f"https://www.horoscope.com/us/horoscopes/general/horoscope-archive.aspx?sign={zodiac_sign}&laDate={day}")
        soup = BeautifulSoup(res.content, 'html.parser')
        data = soup.find('div', attrs={'class': 'main-horoscope'})
        return data.p.text

    We have created our first function which accepts two arguments – an integer zodiac_sign and a string day. The day can either be today, tomorrow, yesterday, or any custom date before today in the format YYYY-MM-DD.

    If the day is not a custom date, it won't have the hyphen(-) symbol in it. So, we have put a condition for the same.

    If there is no hyphen symbol, we make a GET request on https://www.horoscope.com/us/horoscopes/general/horoscope-general-daily-{day}.aspx?sign={zodiac_sign}. Else first, we change the date from YYYY-MM-DD to YYYYMMDD format.

    Then we make a GET request on https://www.horoscope.com/us/horoscopes/general/horoscope-archive.aspx?sign={zodiac_sign}&laDate={day}.

    After that, we pull the HTML data from the response content of the page using BeautifulSoup. Now we need to get the horoscope text from this HTML code. If you inspect the code of any of the webpage, you'll find this:

    The horoscope text is contained in a div with the class main-horoscope. Thus we use the soup.find() function to extract the paragraph text string and return it.
     

    How to Get a Horoscope for the Week

    def get_horoscope_by_week(zodiac_sign: int):
        res = requests.get(f"https://www.horoscope.com/us/horoscopes/general/horoscope-general-weekly.aspx?sign={zodiac_sign}")
        soup = BeautifulSoup(res.content, 'html.parser')
        data = soup.find('div', attrs={'class': 'main-horoscope'})
        return data.p.text

    The above function is quite similar to the previous one. We have just changed the URL to https://www.horoscope.com/us/horoscopes/general/horoscope-general-weekly.aspx?sign={zodiac_sign}.
     

    How to Get a Horoscope for the Month

    def get_horoscope_by_month(zodiac_sign: int):
        res = requests.get(f"https://www.horoscope.com/us/horoscopes/general/horoscope-general-monthly.aspx?sign={zodiac_sign}")
        soup = BeautifulSoup(res.content, 'html.parser')
        data = soup.find('div', attrs={'class': 'main-horoscope'})
        return data.p.text
    

    This function is also similar to the other two except the URL which has now been changed to https://www.horoscope.com/us/horoscopes/general/horoscope-general-monthly.aspx?sign={zodiac_sign}.
     

    How to Create API Routes

    We'll be using Flask-RESTX to create our API routes. The API routes will look like these:

    • For daily or custom dates:/api/v1/get-horoscope/daily?day=today&sign=capricorn or api/v1/get-horoscope/daily?day=2022-12-14&sign=capricorn
    • For weekly: api/v1/get-horoscope/weekly?sign=capricorn
    • For monthly: api/v1/get-horoscope/monthly?sign=capricorn

    We have two query parameters in the URLs: day and sign. The day parameter can take values like today, yesterday, or custom dates like 2022-12-14. The sign parameter will take the zodiac sign name which can be in uppercase or lowercase, it won't matter.

    To parse the query parameters from the URL, Flask-RESTX has built-in support for request data validation using a library similar to argparse called reqparse. To add arguments in the URL, we'll use add_argument method of the RequestParser class.

    parser = reqparse.RequestParser()
    parser.add_argument('sign', type=str, required=True)
    

    The type parameter will take the type of parameter. The required=True makes the query parameter mandatory to be passed.

    Now, we need another query parameter day. But this parameter will be used only in the daily horoscope URL.

    Instead of rewriting arguments we can write a parent parser containing all the shared arguments and then extend the parser with copy().

    parser_copy = parser.copy()
    parser_copy.add_argument('day', type=str, required=True)

    The parser_copy will not only contain day, but also sign. That is what we'll require for the daily horoscope.

    The main building blocks provided by Flask-RESTX are resources. Resources are built on top of Flask pluggable views, giving you easy access to multiple HTTP methods just by defining methods on your resource.

    Let's create the DailyHoroscopeAPI class that inherits the Resource class from flask_restx.

    @ns.route('/get-horoscope/daily')
    class DailyHoroscopeAPI(Resource):
        '''Shows daily horoscope of zodiac signs'''
        @ns.doc(parser=parser_copy)
        def get(self):
            args = parser_copy.parse_args()
            day = args.get('day')
            zodiac_sign = args.get('sign')
            try:
                zodiac_num = ZODIAC_SIGNS[zodiac_sign.capitalize()]
                if "-" in day:
                    datetime.strptime(day, '%Y-%m-%d')
                horoscope_data = get_horoscope_by_day(zodiac_num, day)
                return jsonify(success=True, data=horoscope_data, status=200)
            except KeyError:
                raise NotFound('No such zodiac sign exists')
            except AttributeError:
                raise BadRequest(
                    'Something went wrong, please check the URL and the arguments.')
            except ValueError:
                raise BadRequest('Please enter day in correct format: YYYY-MM-DD')

    The @ns.route() decorator sets the API route. Inside the DailyHoroscopeAPI class, we have the get method that will handle the GET requests. The @ns.doc() decorator will help us add the query parameters on the URL.

    To get the values of query parameters, we'll use the parse_args() method that will return us a dictionary like this:

    {'sign': 'capricorn', 'day': '2022-12-14'}

    We can then get the values using the keys day and sign.

    As defined in the beginning, we'll have a ZODIAC_SIGNS dictionary. We use a try-except block to handle the request. If the zodiac sign is not in the dictionary, a KeyError Exception is raised. In that case, we respond with a NotFound error (Error 404).

    Also, if the day parameter has a hyphen in it, we try to match the date format with YYYY-MM-DD. If it's not in that format, we raise a BadRequest error (Error 400). If the day doesn't contain a hyphen, we directly call the get_horoscope_by_day() method with the sign and day arguments.

    If some gibberish is passed as the parameter value, an AttributeError is raised. In that case, we raise a BadRequest error.

    The other two routes are also quite similar to the above one. The difference is, we don't need a day parameter here. So, instead of using parser_copy, we'll use parser here.

    @ns.route('/get-horoscope/weekly')
    class WeeklyHoroscopeAPI(Resource):
        '''Shows weekly horoscope of zodiac signs'''
        @ns.doc(parser=parser)
        def get(self):
            args = parser.parse_args()
            zodiac_sign = args.get('sign')
            try:
                zodiac_num = ZODIAC_SIGNS[zodiac_sign.capitalize()]
                horoscope_data = get_horoscope_by_week(zodiac_num)
                return jsonify(success=True, data=horoscope_data, status=200)
            except KeyError:
                raise NotFound('No such zodiac sign exists')
            except AttributeError:
                raise BadRequest('Something went wrong, please check the URL and the arguments.')
    
    
    @ns.route('/get-horoscope/monthly')
    class MonthlyHoroscopeAPI(Resource):
        '''Shows monthly horoscope of zodiac signs'''
        @ns.doc(parser=parser)
        def get(self):
            args = parser.parse_args()
            zodiac_sign = args.get('sign')
            try:
                zodiac_num = ZODIAC_SIGNS[zodiac_sign.capitalize()]
                horoscope_data = get_horoscope_by_month(zodiac_num)
                return jsonify(success=True, data=horoscope_data, status=200)
            except KeyError:
                raise NotFound('No such zodiac sign exists')
            except AttributeError:
                raise BadRequest('Something went wrong, please check the URL and the arguments.')
    

    Now our routes are done. To test the APIs, you can use the Swagger documentation available on the / route, or you can use Postman. Let's run the server and test it.

    You can also deploy the project on a public server so that other developers can access and use the API too.


    Wrapping up

    In this tutorial, we learned how to scrape data from a website using requests and Beautiful Soup. Then we created an API using Flask and Flask-RESTX.

    If you wish to learn how to interact with APIs using Python, check out this guide.

    I hope you enjoyed it – and thanks for reading!

    Code for the tutorial: https://github.com/ashutoshkrris/Horoscope-API


    If you read this far, support the author to show them you care.

    0 Comments

    To add a comment, please Signup or Login

    OR