Netflix, IMDb Exploratory Data Analysis.

After searching through dozens of datasets on Kaggle, I stumbled across this one, which caught my fancy. Maybe it was my love of movies that made me download it.

In this EDA project, I am going to analyse this data using several important Python libraries: pandas for reading, cleaning and manipulation; Sorted_dataframe for sorting days and months; and plotly, seaborn and matplotlib for data visualization.

Data Reading

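import pandas as pd

# Load the Kaggle CSV into a DataFrame
df = pd.read_csv('netflix-rotten-tomatoes-metacritic-imdb.csv', encoding='utf-8')
df  # in a notebook, this displays the DataFrame
df.info()  # column names, non-null counts and dtypes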
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15480 entries, 0 to 15479
Data columns (total 29 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Title                  15480 non-null  object
 1   Genre                  13770 non-null  object
 2   Tags                   15413 non-null  object
 3   Languages              13545 non-null  object
 4   Series or Movie        15480 non-null  object
 5   Hidden Gem Score       13379 non-null  float64
 6   Country Availability   15461 non-null  object
 7   Runtime                15479 non-null  object
 8   Director               10772 non-null  object
 9   Writer                 11150 non-null  object
 10  Actors                 13555 non-null  object
 11  View Rating            8456 non-null   object
 12  IMDb Score             13381 non-null  float64
 13  Rotten Tomatoes Score  6382 non-null   float64
 14  Metacritic Score       4336 non-null   float64
 15  Awards Received        6075 non-null   float64
 16  Awards Nominated For   7661 non-null   float64
 17  Boxoffice              4007 non-null   object
 18  Release Date           13373 non-null  object
 19  Netflix Release Date   15480 non-null  object
 20  Production House       5149 non-null   object
 21  Netflix Link           15480 non-null  object
 22  IMDb Link              13177 non-null  object
 23  Summary                15471 non-null  object
 24  IMDb Votes             13379 non-null  float64
 25  Image                  15480 non-null  object
 26  Poster                 11842 non-null  object
 27  TMDb Trailer           7194 non-null   object
 28  Trailer Site           7194 non-null   object
dtypes: float64(7), object(22)
memory usage: 3.4+ MB
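
The info() output shows that many columns have fewer than 15480 non-null values (Metacritic Score, for example, has only 4336). One possible way to make those gaps easier to compare is a per-column table of missing counts and percentages. The sketch below is only an illustration: the helper name missing_summary and the toy DataFrame are assumptions made up so the code runs without the CSV, and are not part of the original notebook.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, worst first."""
    missing = frame.isna().sum()  # number of NaN/None values in each column
    summary = pd.DataFrame({
        "missing": missing,
        "missing_pct": (missing / len(frame) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


# Toy stand-in for df (made-up values), so the sketch runs without the CSV.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Director": ["X", None, None, "Y"],
    "Metacritic Score": [None, None, None, 71.0],
})

summary = missing_summary(toy)
print(summary)
```

With the real data loaded, calling missing_summary(df) should rank the 29 columns by how incomplete they are, which can help decide what to clean or drop first.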