

When we start looking at the data, we see that it is a dictionary of dictionaries, where each inner dictionary has three keys: id, title, and history.
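To make that shape concrete, here is a minimal sketch of walking a dictionary of dictionaries whose inner dictionaries carry the keys id, title, and history. The sample values and the helper name summarize_teams are made up for illustration, and treating history as a list of per-match records is an assumption; the real scraped data may differ.

```python
# Hypothetical sample mirroring the described layout; all values are invented.
sample_data = {
    "1": {"id": "1", "title": "Team A", "history": [{"scored": 2}, {"scored": 0}]},
    "2": {"id": "2", "title": "Team B", "history": [{"scored": 1}]},
}


def summarize_teams(data):
    """Return (id, title, number of history entries) for each inner dictionary."""
    summary = []
    for team in data.values():
        summary.append((team["id"], team["title"], len(team["history"])))
    return summary


for team_id, title, n_entries in summarize_teams(sample_data):
    print(team_id, title, n_entries)
```

Looping over data.values() keeps the code independent of what the outer keys happen to be.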

USEFUL COMMANDS FOR PYTHON WEBSCRAPER CODE
The first step in any web scraping project is to research the web page you want to scrape and learn how it works. That is critical to finding where to get the data from the site. We can see on the home page that the site has data for six European leagues. However, we will be extracting data for just the top 5 leagues (that is, excluding RFPL). We can also see that the data on the site runs from the 2014/2015 season to 2020/2021. Let's create variables to handle only the data we require.

    # create urls for all seasons of all leagues

The next step is to figure out where the data on the web page is stored. To do so, open Developer Tools in Chrome, navigate to the Network tab, locate the data file (in this example, 2018), and select the "Response" tab. After executing the request, this is what we'll get.

Using Developer Tools to determine where the data is stored

After looking through the web page's content, we discovered that the data is saved beneath the "script" element in the teamsData variable and is JSON encoded. As a result, we'll need to track down this tag, extract the JSON from it, and convert it to a Python-readable data structure.

    season_data = dict()

Decoding the JSON Data with Python

Parsing JSON-encoded data

    soup = BeautifulSoup(res.content, "lxml")
    # Based on the structure of the webpage, I found that the data is in the JSON variable, under script tags
    # strip unnecessary symbols and get only JSON data
    ind_start = string_with_json_obj.index("('")+2
    ind_end = string_with_json_obj.index("')")
    # keep only the text between the two markers
    json_data = string_with_json_obj[ind_start:ind_end]
    json_data = json_data.encode('utf8').decode('unicode_escape')

After running the Python code above, you should get a bunch of data that we've cleaned up.
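The steps described in this section (building one URL per league and season, then slicing the JSON string out of the script text and decoding it) can be sketched as follows. This is only an illustration under stated assumptions: the base URL, the league codes, the URL path layout, the function names, and the sample script text are all hypothetical, and the sample assumes the payload uses \x escapes, which is what the unicode_escape step suggests.

```python
import json
from itertools import product


def build_season_urls(base_url, leagues, seasons):
    """Create one URL per (league, season) pair; the path layout is an assumption."""
    return [f"{base_url}/{league}/{season}" for league, season in product(leagues, seasons)]


def extract_json_payload(script_text):
    """Slice out the text between (' and ') and decode it into Python objects."""
    ind_start = script_text.index("('") + 2
    ind_end = script_text.index("')")
    json_data = script_text[ind_start:ind_end]
    # Turn escape sequences such as \x7B back into the characters they stand for.
    json_data = json_data.encode("utf8").decode("unicode_escape")
    return json.loads(json_data)


# Hypothetical inputs: placeholder host, made-up league codes, invented script text.
urls = build_season_urls("https://example.com/league", ["LEAGUE_A", "LEAGUE_B"], range(2014, 2021))
sample_script = r"<script>var teamsData = JSON.parse('\x7B\x221\x22\x3A\x7B\x22id\x22\x3A\x221\x22,\x22title\x22\x3A\x22Team A\x22,\x22history\x22\x3A\x5B\x5D\x7D\x7D');</script>"
teams = extract_json_payload(sample_script)

print(len(urls), urls[0])
print(teams["1"]["title"])
```

One caveat: decoding with unicode_escape interprets the bytes as Latin-1, so non-ASCII characters in a real payload may come out garbled. It is worth checking the decoded text against the page before relying on it.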

Now that we have all the required libraries installed, let's get to building our web scraper.

Importing the Python libraries

    import numpy as np
USEFUL COMMANDS FOR PYTHON WEBSCRAPER INSTALL
To install the libraries required for this tutorial, run the command below:

    pip install numpy
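To confirm that the libraries are actually available in the current environment, a small check like the one below may help. It is a sketch, not part of the original tutorial: the helper name missing_modules is hypothetical, and the list of import names is an assumption based on the libraries this tutorial mentions.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names assumed from the libraries this tutorial mentions.
required = ["numpy", "requests", "bs4", "lxml"]
print("Missing:", missing_modules(required))
```

An empty list means every module was found; any name that is printed still needs to be installed.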
