Updated 3 years ago
Netflix, IMDb Exploratory Data Analysis.
After searching through dozens of datasets on Kaggle, I stumbled across this one, which caught my fancy. Maybe it was my love for movies that made me download it.
In this EDA project, I am going to analyse this data using several important Python libraries: pandas for reading, cleaning and manipulation, Sorted_dataframe for sorting days and months, and plotly, seaborn and matplotlib for data visualization.
Data Reading
import pandas as pd
df = pd.read_csv('netflix-rotten-tomatoes-metacritic-imdb.csv', encoding='utf-8')
df
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15480 entries, 0 to 15479
Data columns (total 29 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Title                  15480 non-null  object
 1   Genre                  13770 non-null  object
 2   Tags                   15413 non-null  object
 3   Languages              13545 non-null  object
 4   Series or Movie        15480 non-null  object
 5   Hidden Gem Score       13379 non-null  float64
 6   Country Availability   15461 non-null  object
 7   Runtime                15479 non-null  object
 8   Director               10772 non-null  object
 9   Writer                 11150 non-null  object
10   Actors                 13555 non-null  object
11   View Rating            8456 non-null   object
12   IMDb Score             13381 non-null  float64
13   Rotten Tomatoes Score  6382 non-null   float64
14   Metacritic Score       4336 non-null   float64
15   Awards Received        6075 non-null   float64
16   Awards Nominated For   7661 non-null   float64
17   Boxoffice              4007 non-null   object
18   Release Date           13373 non-null  object
19   Netflix Release Date   15480 non-null  object
20   Production House       5149 non-null   object
21   Netflix Link           15480 non-null  object
22   IMDb Link              13177 non-null  object
23   Summary                15471 non-null  object
24   IMDb Votes             13379 non-null  float64
25   Image                  15480 non-null  object
26   Poster                 11842 non-null  object
27   TMDb Trailer           7194 non-null   object
28   Trailer Site           7194 non-null   object
dtypes: float64(7), object(22)
memory usage: 3.4+ MB
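The info() output above shows that many columns are far from complete: Metacritic Score, for example, has only 4336 non-null values out of 15480 rows, and Boxoffice only 4007. Before cleaning, it can help to rank the columns by how much is missing. The snippet below is only a rough sketch of one way to do that. The helper name missing_summary and the tiny sample frame are my own illustrations (hypothetical values), not part of the dataset.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, most incomplete first."""
    missing = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": missing,
        "percent": (missing / len(frame) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


# Tiny stand-in frame with made-up values, used only to show the helper working.
sample = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Director": ["X", None, None, "Y"],
    "Metacritic Score": [None, None, None, 71.0],
})
summary = missing_summary(sample)
print(summary)
```

On the real data the same helper could be called as missing_summary(df), assuming df is the frame loaded with read_csv above.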