Scraping BookMyShow Movie Data using Python & Selenium
Web scraping is an automated method of obtaining large amounts of data from websites.
BookMyShow is India's largest entertainment ticketing website. Headquartered in Mumbai, it is a one-stop destination for movie and non-movie options like events, plays and sports. Apart from movies, BookMyShow has ticketed more than 300 live large-format events and sports events such as the ICL, city marathons, etc.
In this project, we'll retrieve information from BookMyShow using web scraping.
We'll use the Python library Selenium to scrape data from the website.
Here's an outline of the steps we'll follow:
- Install and import all the libraries needed to extract data from the WebElements.
- Generate the list of Popular Cities and Other Cities using the `find_element` method.
- Extract the list of Movies, Censor Ratings, Languages and Booking URLs for 3 Popular Cities, and convert it to a DataFrame.
- Extract additional details, such as Hearts and User Ratings, for 3 Movies.
- Extract Theatre details for the current day, based on the Booking URL, for 3 Hindi Movies.
- Merge the DataFrames to generate the combined data with all the details.
- Save the extracted information to a CSV file.
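The last two steps can be sketched with pandas. This is a minimal illustration with hypothetical column names (matching the output format below) and placeholder booking URLs; in the real project these values come from Selenium:

```python
import pandas as pd

# Hypothetical per-city movie data (placeholder URLs stand in for real Booking URLs).
movies = pd.DataFrame({
    "City": ["MUMBAI", "MUMBAI"],
    "Movie Name": ["The Kashmir Files", "RRR"],
    "Language": ["Hindi", "Telugu"],
    "Censor Rating": ["A", "UA"],
    "Booking Url": ["url-1", "url-2"],
})

# Hypothetical theatre details keyed by the same Booking Url.
theatres = pd.DataFrame({
    "Booking Url": ["url-1"],
    "Theatre Name": ["Cinepolis: Fun Republic Mall, Andheri (W)"],
})

# Left-merge on Booking Url so each movie row picks up its theatre details.
combined = movies.merge(theatres, on="Booking Url", how="left")

# Save the combined data without the index column.
combined.to_csv("bookmyshow_sample.csv", index=False)
```

A left merge keeps every movie row even when no theatre details were scraped for it; those rows simply get an empty `Theatre Name`.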
By the end of the project, we'll create a CSV file in the following format:
City,Movie Name,Language,Censor Rating,Hearts,Ratings Received,Theatre Name,Booking Url
MUMBAI,The Kashmir Files,Hindi,A,98%,199.3K ratings,"Cinepolis: Fun Republic Mall, Andheri (W)",https://in.bookmyshow.com/mumbai/movies/the-kashmir-files/ET00110845
...
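Note that theatre names can contain commas (e.g. "Cinepolis: Fun Republic Mall, Andheri (W)"). Pandas quotes such fields when writing, so the CSV stays parseable; a small sketch with one hypothetical row:

```python
import io
import pandas as pd

# One hypothetical row in the shape of the output format above.
row = pd.DataFrame({
    "City": ["MUMBAI"],
    "Movie Name": ["The Kashmir Files"],
    "Theatre Name": ["Cinepolis: Fun Republic Mall, Andheri (W)"],
})

# to_csv quotes any field containing a comma, keeping the column count stable.
buf = io.StringIO()
row.to_csv(buf, index=False)

# Reading the text back recovers the original three columns intact.
parsed = pd.read_csv(io.StringIO(buf.getvalue()))
```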
How to Run the Code
You can execute the code by using the "Run" button at the top of this page and selecting "Run on Colab". You can make changes and save your own version of the notebook to Jovian by executing the following cells:
Install and import all the libraries to get the data from the WebElement
We will use the `jovian` library to run this notebook on Google Colab.
!pip install jovian --upgrade --quiet
We will use the `selenium` and `kora` libraries to get the required WebElements.

We will use `pandas` to convert the data into DataFrames and save it to a CSV file.

We will use `time` to add sleep delays between requests to the website.

These libraries can be installed using `pip`.
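Assuming the package names above, the installation (run in a Colab cell with a leading `!`) might look like:

```shell
# Install the scraping and data-handling libraries quietly
pip install selenium kora pandas --quiet
```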