
Data Analysis of Movies

As part of an online Data Science certification course at Jovian.ml, I was tasked with completing a course project that involved downloading and analysing a real-world dataset. The steps I went through to finish the project are described below.
As this was my first Data Science project, I chose a dataset in which the relationships and correlations between the variables would be easier to spot and analyse. After going through a couple of datasets, I settled on the TMDB 5000 Movie Dataset from Kaggle, which contains 20 columns and 4803 rows. Not all of these columns contain data that can be studied meaningfully, so we will work with a subset of the dataset containing the fields that are easiest to analyse and work with.
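A minimal sketch of how such a subset could be taken with pandas follows. The column list is a hypothetical choice of easily analysed fields from the TMDB movies file, not necessarily the subset used later in this project, and the helper name `select_subset` is illustrative.

```python
import pandas as pd

# Hypothetical subset: mostly numeric fields that are straightforward to
# analyse. Replace with whichever columns you actually decide to keep.
SELECTED_COLUMNS = (
    "title",
    "release_date",
    "budget",
    "revenue",
    "runtime",
    "popularity",
    "vote_average",
    "vote_count",
)


def select_subset(movies: pd.DataFrame, columns=SELECTED_COLUMNS) -> pd.DataFrame:
    """Return a copy of `movies` containing only `columns`, in that order.

    Raises KeyError if any requested column is missing, so a misspelled
    column name is reported immediately instead of being silently skipped.
    """
    missing = [c for c in columns if c not in movies.columns]
    if missing:
        raise KeyError(f"Columns not found in dataset: {missing}")
    # .copy() so later cleaning steps do not modify the original DataFrame
    return movies[list(columns)].copy()
```

Assuming the Kaggle CSV has been downloaded as `tmdb_5000_movies.csv` (a file name assumed here), usage would look like `movies = select_subset(pd.read_csv("tmdb_5000_movies.csv"))`. Returning a copy keeps the raw data available for comparison while the subset is cleaned.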

Setup

This project can be carried out in any free online notebook service that supports Python 3. The required modules are:

  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
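These modules are usually imported under their conventional aliases at the top of the notebook. The sketch below assumes all four are already installed in the notebook environment and prints their versions, which is a simple way to record the setup the analysis ran on.

```python
# Import the modules listed above. The short aliases (np, pd, plt, sns)
# are widely used conventions, not requirements.
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

# Print each library's version so the environment can be recorded.
for name, version in [
    ("NumPy", np.__version__),
    ("Pandas", pd.__version__),
    ("Matplotlib", matplotlib.__version__),
    ("Seaborn", sns.__version__),
]:
    print(f"{name}: {version}")
```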

In the following sections I use Jovian.ml notebooks, so I will be importing the jovian library and committing regularly as a way of saving and updating my notebook hosted on Jovian.ml. The same work can be done on any other notebook service, such as Google Colab or Jupyter Notebook.

Contents

  • Data Preparation and Cleaning
  • Exploratory Analysis and Visualisation
  • Asking and Answering Questions
  • Inferences and Conclusion
  • References and Future Work

Each part of the contents is informed by a stage of the Data Science workflow. However, these parts are not a complete or exact representation of the general Data Science workflow, nor of the workflow other projects might follow; they are the steps derived from that workflow that are relevant to this project.

But first, I will install and import jovian and commit my notebook to Jovian in order to preserve a version of my work.

project_name = "Data Analysis of 4000+ movies"
!pip install jovian --upgrade -q