Web Scraping Popular Movies using BeautifulSoup

A web scraping tutorial in Python for beginners.

The project idea is to use web scraping to curate a list of popular movies that I can watch. The data comes from the TMdb website: https://www.themoviedb.org/movie

Web scraping is the process of gathering useful information from the web and deriving meaningful insights from it. In a way, web scraping automates the process of data collection.

Note: Web scraping code depends on the structure of the web page, so if the structure changes, your code needs to be updated too!

Python offers a variety of libraries for scraping the web, such as BeautifulSoup, Requests, Scrapy, and Selenium. If you are just starting out with web scraping, BeautifulSoup is an easy option to begin with.

We’ll be using the packages:

  • Requests — for downloading the HTML code from the TMdb URL
  • BeautifulSoup4 — for extracting data from the HTML string
  • Pandas — to gather my data into a dataframe for further processing
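
To make the division of labour between the three packages concrete, here is a minimal sketch. The HTML snippet, its class names (`movie`, `score`), and the helper names `download_html` and `extract_movies` are assumptions made up for illustration; the real TMdb page uses different markup, so the selectors would need to be adapted after inspecting the page.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real TMdb page is structured differently.
SAMPLE_HTML = """
<div class="movie"><a href="/movie/1">Example Movie A</a><span class="score">80</span></div>
<div class="movie"><a href="/movie/2">Example Movie B</a><span class="score">75</span></div>
"""


def download_html(url):
    """Requests: download a page and return its HTML as a string."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def extract_movies(html):
    """BeautifulSoup: pull the name, rating and link out of an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for card in soup.find_all("div", class_="movie"):
        link = card.find("a")
        score = card.find("span", class_="score")
        movies.append({
            "name": link.get_text(strip=True),
            "rating": float(score.get_text(strip=True)),
            "url": link["href"],
        })
    return movies


# Pandas: gather the extracted records into a dataframe.
# The sample string is parsed here so the sketch runs without network access;
# against a live site you would pass download_html(url) instead.
movies_df = pd.DataFrame(extract_movies(SAMPLE_HTML))
print(movies_df)
```

Keeping the download step and the extraction step in separate functions means the parsing logic can be tried out on a saved or hand-written HTML string before any request is sent.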

Let's see an outline of the steps we'll follow:

  1. Load the TMdb movie web page https://www.themoviedb.org/movie using Requests.
  2. Parse the HTML web page using BeautifulSoup.
  3. Extract the list of movies from the landing page. For each movie, we'll get the movie name, user rating, and the movie page URL.
  4. Again for each movie, we'll grab the release dates, genres, duration and directors.
  5. Compile extracted movie details into Python Lists and Dictionaries.
  6. We'll extend the above logic to scrape multiple pages.
  7. Finally, we'll save all the movie information into a CSV file.

The CSV file will have the following format:

Name,rating,genre,release_date,runtime,director,url
Mortal Kombat,80,"Fantasy, Action, Adventure, Science Fiction, Thriller",04/23/2021,1h 50m,Lewis Tan,https://www.themoviedb.org/movie/460465
Godzilla vs. Kong,82.0,"Science Fiction, Action",03/31/2021,1h 53m,Alexander Skarsgård,https://www.themoviedb.org/movie/399566
Nobody,85.0,"Action, Thriller, Crime",03/26/2021,1h 32m,Bob Odenkirk,https://www.themoviedb.org/movie/615457
Zack Snyder's Justice League,85.0,"Action, Adventure, Fantasy, Science Fiction",03/18/2021,4h 2m,Ben Affleck,https://www.themoviedb.org/movie/791373
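
As a rough sketch of how rows in this format could be produced, the snippet below builds a dataframe from a list of dictionaries and serialises it with `DataFrame.to_csv`. The two records are copied from the sample rows above; the variable names are illustrative assumptions, not necessarily the ones used later in this tutorial.

```python
import io

import pandas as pd

# One dictionary per movie; values copied from the sample rows above.
movies = [
    {
        "Name": "Godzilla vs. Kong",
        "rating": 82.0,
        "genre": "Science Fiction, Action",
        "release_date": "03/31/2021",
        "runtime": "1h 53m",
        "director": "Alexander Skarsgård",
        "url": "https://www.themoviedb.org/movie/399566",
    },
    {
        "Name": "Nobody",
        "rating": 85.0,
        "genre": "Action, Thriller, Crime",
        "release_date": "03/26/2021",
        "runtime": "1h 32m",
        "director": "Bob Odenkirk",
        "url": "https://www.themoviedb.org/movie/615457",
    },
]

movies_df = pd.DataFrame(movies)

# index=False leaves out the dataframe's row index column.
# Called without a path, to_csv returns the CSV text instead of writing a file.
csv_text = movies_df.to_csv(index=False)
print(csv_text)

# Reading the text back shows that the quoted genre field survives as one column.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip["genre"].tolist())
```

Because the genre values contain commas, the CSV writer wraps them in double quotes, which is why the genre field appears quoted in the sample rows. Passing a file name as the first argument to `to_csv` writes the same text to disk.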

How to Run the Code

You can execute the code by clicking the "Run" button or by selecting the "Run on Binder" option.

Installing the Libraries

Let’s start by installing the required packages.