
Project: Exploratory Data Analysis on the TMDb Dataset.


Introduction


TMDb is an initialism that closely resembles that of its bigger counterpart, IMDb. Both are massive indexes of movie and television information, but The Movie Database differs from the Internet Movie Database in one key respect: TMDb is powered entirely by its community.

The Movie Database is an online crowdsourced database of movie and television information. The metadata on movies and TV shows is contributed by its 1.1-million-strong community. At the time of writing, the site has indexed 393,000+ movies and 73,000+ TV shows across 39 languages.

This dataset is a collection of data on around 10,000 movies. The primary goal of this project is to perform a detailed analysis and visualization to answer the questions brainstormed.

We use numpy and pandas for analysis, and matplotlib and seaborn for visualization.

Outline for Analysis

  • Assessed the data and brainstormed questions that could be answered using the data
  • Performed necessary cleaning steps to unify formats, deal with missing data and prepare the dataset for analysis
  • Wrangled and explored the data using pandas and NumPy to gather insights about the relationships between different aspects, created visualizations using matplotlib, and made inferences to answer the research questions
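
The cleaning step in the outline above can be sketched roughly as follows. This is a minimal illustration on a toy frame, not the cleaning actually applied to the dataset: the column names, the date format, and the choice to treat a zero budget or revenue as missing are all assumptions made for the example, and `clean_movies` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the movie data; columns and values are illustrative assumptions.
toy = pd.DataFrame({
    "original_title": ["A", "B", "B", "C"],
    "release_date": ["1/15/2010", "6/3/2012", "6/3/2012", "11/20/2014"],
    "budget": [100, 0, 0, 50],
    "revenue": [250, 80, 80, 0],
})

def clean_movies(df):
    """Hypothetical helper: drop duplicate rows, unify dates, mark zeros as missing."""
    out = df.drop_duplicates().copy()
    # Unify the date format by parsing the strings into datetime values.
    out["release_date"] = pd.to_datetime(out["release_date"], format="%m/%d/%Y")
    # Assumption: a budget or revenue of 0 means "not recorded", so store it as NaN.
    for col in ["budget", "revenue"]:
        out[col] = out[col].replace(0, np.nan)
    return out.reset_index(drop=True)

cleaned = clean_movies(toy)
print(cleaned)
```

Keeping the steps in one function makes it easy to rerun them on the full dataset once the real columns and their quirks are known.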

Now let's look at the data to decide what questions can be asked and answered with this dataset.

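# Analysis libraries
import numpy as np
import pandas as pd
# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
# Render plots inline in the notebook (IPython/Jupyter magic)
%matplotlib inline

# Load the movie data and preview the first five rows
df_mov = pd.read_csv('tmdb-movies.csv')
df_mov.head()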
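
Beyond previewing the first rows, a quick per-column summary helps when deciding which questions the data can support. The sketch below is one possible way to do this, using a small made-up frame so it runs on its own; `assess` is a hypothetical helper and the sample columns are assumptions, not the dataset's actual schema.

```python
import numpy as np
import pandas as pd

def assess(df):
    """Hypothetical helper: one row per column with dtype, missing count, distinct count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN values are not counted as distinct
    })

# Made-up sample data for illustration only.
sample = pd.DataFrame({
    "title": ["A", "B", "B"],
    "runtime": [90.0, np.nan, 120.0],
})

report = assess(sample)
print(report)
print("duplicate rows:", sample.duplicated().sum())
```

In the notebook, the same helper could be called as `assess(df_mov)` to see which columns have gaps before any cleaning is attempted.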