Scraping BookMyShow Movie Data using Python & Selenium

Web scraping is an automated method of obtaining large amounts of data from websites.


BookMyShow is India's largest entertainment ticketing website. Headquartered in Mumbai, it is a one-stop destination for movies as well as non-movie options such as events, plays and sports. Apart from movies, BookMyShow has ticketed more than 300 live large-format events and sporting events such as the ICL, city marathons, etc.

In this project, we'll retrieve information from BookMyShow using web scraping.

We'll use the Python library Selenium to scrape data from the website.

Here's an outline of the steps we'll follow:

  1. Install and import all the libraries required to get data from the web elements.
  2. Generate the lists of Popular Cities and Other Cities using the find_element method (a short sketch follows this list).
  3. Extract the list of Movies, Censor Ratings, Languages and Booking URLs for 3 Popular Cities and convert it to a DataFrame.
  4. Extract additional details such as Hearts and User Ratings for 3 Movies.
  5. Extract Theatre details for the current day, based on the Booking URL, for 3 Hindi Movies.
  6. Merge the DataFrames to generate the combined data with all the details.
  7. Save the extracted information to a CSV file.

By the end of the project, we'll create a CSV file in the following format:

City,Movie Name,Language,Censor Rating,Hearts,Ratings Received,Theatre Name,Booking Url
MUMBAI,The Kashmir Files,Hindi,A,98%,199.3K ratings,"Cinepolis: Fun Republic Mall, Andheri (W)",https://in.bookmyshow.com/mumbai/movies/the-kashmir-files/ET00110845
...

How to Run the Code

You can execute the code by using the "Run" button at the top of this page and selecting "Run on Colab". You can make changes and save your own version of the notebook to Jovian by executing the following cells:

Install and import all the libraries required to get data from the web elements

We will use the jovian library to run this notebook on Google Colab and save our own version of it.

!pip install jovian --upgrade --quiet
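
Once jovian is installed, saving your own copy of the notebook usually comes down to a single commit call, roughly like the sketch below (the project name is just a placeholder):

import jovian

# Record a snapshot of this notebook under your Jovian account.
jovian.commit(project='bookmyshow-web-scraping')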

We will use the Selenium and kora libraries to get the required WebDriver elements.
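
On Colab, kora can provide a preconfigured headless Chrome driver, so the setup is roughly the sketch below. This reflects how kora.selenium is commonly used; the exact interface may differ between versions, so treat it as an assumption rather than a guarantee.

from kora.selenium import wd  # wd is a ready-to-use headless Chrome WebDriver

# Load the BookMyShow homepage and confirm the driver works.
wd.get('https://in.bookmyshow.com/')
print(wd.title)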

We will use pandas to convert the data into DataFrames and save it to a .csv file.
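
As an illustration with made-up records (the real rows come from the scraped web elements), the data can be collected as a list of dictionaries, turned into a DataFrame and written out:

import pandas as pd

# Hypothetical scraped records, shown only to illustrate the conversion.
movies = [
    {'City': 'MUMBAI', 'Movie Name': 'The Kashmir Files',
     'Language': 'Hindi', 'Censor Rating': 'A'},
]
movies_df = pd.DataFrame(movies)
movies_df.to_csv('movies.csv', index=False)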

We will use time to pause between page loads so that the website has time to render its content.
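
For example, a short pause after each page load gives dynamically rendered content time to appear before we start locating elements (the 5-second delay is an arbitrary choice and can be tuned):

import time
from kora.selenium import wd

wd.get('https://in.bookmyshow.com/')
time.sleep(5)  # wait for dynamically loaded content to render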

These libraries can be installed using pip.
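
For instance, the remaining dependencies could be installed in a Colab cell like this:

!pip install selenium kora pandas --quiet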