project_name = "streaming-services-eda"
!pip install jovian --upgrade -q
import jovian
jovian.commit(project=project_name)

Streaming Services (EDA)

This is an exploratory data analysis of four streaming services from the United States.
It compares the content available on Amazon Prime Video, Disney Plus, Hulu, and Netflix.

The datasets used for this analysis can be found at:

Amazon: https://www.kaggle.com/shivamb/amazon-prime-movies-and-tv-shows

Disney+: https://www.kaggle.com/shivamb/disney-movies-and-tv-shows

Hulu: https://www.kaggle.com/shivamb/hulu-movies-and-tv-shows

Netflix: https://www.kaggle.com/shivamb/netflix-shows

Library Imports

  • Pandas (pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.)

The data visualization libraries used for this EDA are:

  • Plotly (Plotly provides online graphing, analytics, and statistics tools.)
  • Matplotlib (Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.)
  • Seaborn (Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.)
!pip install plotly
!pip install matplotlib seaborn --upgrade --quiet
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Data Import

amazon = pd.read_csv('amazon_prime_titles.csv')
disney = pd.read_csv('disney_plus_titles.csv')
hulu = pd.read_csv('hulu_titles.csv')
netflix = pd.read_csv('netflix_titles.csv')

Drop the columns that are not used in this analysis.

amazon_df = amazon.drop(['show_id','title','director','cast','date_added','duration','description','country'], axis=1)
disney_df = disney.drop(['show_id','title','director','cast','date_added','duration','description','country'], axis=1)
hulu_df = hulu.drop(['show_id','title','director','cast','date_added','duration','description','country'], axis=1)
netflix_df = netflix.drop(['show_id','title','director','cast','date_added','duration','description','country'], axis=1)
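
The four `drop` calls above repeat the same column list. As a minimal sketch, the list could be kept in one place and applied through a small helper. The names `UNUSED_COLUMNS` and `trim`, and the one-row sample frame, are hypothetical and only for illustration; they are not part of the original notebook.

```python
import pandas as pd

# Columns the analysis does not use (same list as in the drop calls above).
UNUSED_COLUMNS = ['show_id', 'title', 'director', 'cast',
                  'date_added', 'duration', 'description', 'country']

def trim(frame):
    """Return a copy of `frame` without the unused columns."""
    return frame.drop(columns=UNUSED_COLUMNS)

# Hypothetical one-row stand-in for a titles CSV, for illustration only.
sample = pd.DataFrame([{
    'show_id': 's1', 'type': 'Movie', 'title': 'Example Title',
    'director': None, 'cast': None, 'country': None,
    'date_added': None, 'release_year': 2021, 'rating': 'PG',
    'duration': '90 min', 'listed_in': 'Drama, Comedy',
    'description': 'Placeholder text.',
}])

sample_df = trim(sample)
print(list(sample_df.columns))
```

With the real data this would be used as `amazon_df = trim(amazon)`, and likewise for the other three platforms.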

Which platform has the most content?

plt.figure(figsize=(10,12))
plt.subplots_adjust(wspace=0.5, hspace=0.3)
plt.subplot(2,2,1)
plt.title(f'Amazon~Total Content: {amazon_df.type.value_counts().sum()}',size=14)
sns.countplot(x='type', data=amazon_df, palette='deep')

plt.subplot(2,2,2)
plt.title(f'Disney~Total Content: {disney_df.type.value_counts().sum()}',size=14)
sns.countplot(x='type', data=disney_df, palette='deep')

plt.subplot(2,2,3)
plt.title(f'Hulu~Total Content: {hulu_df.type.value_counts().sum()}',size=14)
sns.countplot(x='type', data=hulu_df, palette='deep')

plt.subplot(2,2,4)
plt.title(f'Netflix~Total Content: {netflix_df.type.value_counts().sum()}',size=14)
sns.countplot(x='type', data=netflix_df, palette='deep')

plt.show()

Amazon has the most content available on its platform, with 9,668 titles.

Hulu offers more TV Shows in relation to their Movies than other platforms.
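
The comparison of TV Shows relative to Movies can also be expressed as a share per platform rather than read off the bar heights. The sketch below uses two hypothetical toy frames labelled `A` and `B` in place of the real `amazon_df`, `hulu_df`, and so on; the numbers are made up.

```python
import pandas as pd

# Hypothetical toy frames; the real ones are amazon_df, hulu_df, etc.
platforms = {
    'A': pd.DataFrame({'type': ['Movie', 'Movie', 'Movie', 'TV Show']}),
    'B': pd.DataFrame({'type': ['Movie', 'TV Show', 'TV Show', 'TV Show']}),
}

# Share of each content type per platform (each row sums to 1.0).
type_share = pd.DataFrame({
    name: frame['type'].value_counts(normalize=True)
    for name, frame in platforms.items()
}).T.fillna(0)

print(type_share)
```

Each row is one platform, so the platform with the largest value in the `TV Show` column has the most TV Shows relative to its Movies.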

Which platform has the highest number of New Releases from 2021?

Visualizations made with Plotly are interactive. Hover your mouse over the graph for detailed value information. Click and drag a box over a specific area to zoom in, and double click to return to the default view. Hover over the top right of the graph to access a tools panel.
px.histogram(amazon_df, x='release_year', color='type', title='Amazon').update_xaxes(type='category', categoryorder='total descending')

Amazon has released 1,139 Movies and 303 TV Shows for a total of 1,442 new releases from 2021.

px.histogram(disney_df, x='release_year', color='type', title='Disney').update_xaxes(type='category', categoryorder='total descending')

Disney+ has released 70 Movies and 55 TV Shows for a total of 125 new releases from 2021.

px.histogram(hulu_df, x='release_year', color='type', title='Hulu').update_xaxes(type='category', categoryorder='total descending')

Hulu has released 111 Movies and 115 TV Shows for a total of 226 new releases from 2021.

px.histogram(netflix_df, x='release_year', color='type', title='Netflix').update_xaxes(type='category', categoryorder='total descending')

Netflix has released 277 Movies and 315 TV Shows for a total of 592 new releases from 2021.

With a total of 1,442 so far, Amazon has the highest number of new releases from the year 2021.
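
The per-platform totals above are read from the interactive histograms. As a minimal sketch, the same counts could be computed directly by filtering on `release_year` and counting by `type`. The frame `toy_df` is hypothetical toy data, not one of the real datasets.

```python
import pandas as pd

# Hypothetical toy data with the same two columns used by the histograms.
toy_df = pd.DataFrame({
    'type': ['Movie', 'Movie', 'TV Show', 'Movie', 'TV Show'],
    'release_year': [2021, 2021, 2021, 2020, 2019],
})

# Keep only titles whose release year is 2021, then count by type.
counts_2021 = toy_df.loc[toy_df['release_year'] == 2021, 'type'].value_counts()

print(counts_2021.to_dict())
print(int(counts_2021.sum()))
```

Running the same two lines on `amazon_df`, `disney_df`, `hulu_df`, and `netflix_df` would give the Movie, TV Show, and total figures for each platform.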

Which platform is the most Family Friendly?

#Hulu rating data cleaning.

hulu_df['rating'] = hulu_df['rating'].replace(
    ['NOT RATED','2 Seasons','93 min','4 Seasons','136 min','91 min','85 min','98 min','89 min','94 min',
     '86 min','3 Seasons','121 min','88 min','101 min','1 Season','83 min','100 min','95 min','92 min',
     '96 min','109 min','99 min','75 min','87 min','67 min','104 min','107 min','84 min','103 min',
     '105 min','119 min','114 min','82 min','90 min','130 min','110 min','80 min','6 Seasons','97 min',
     '111 min','81 min','49 min','45 min','41 min','73 min','40 min','36 min','39 min','34 min','47 min',
     '65 min','37 min','78 min','102 min','129 min','115 min','112 min','NR','61 min','106 min','76 min',
     '77 min','79 min','157 min','28 min','64 min','7 min','5 min','6 min','127 min','142 min','108 min',
     '57 min','118 min','116 min','12 Seasons','71 min'],'NR')

#Netflix rating data cleaning

netflix_df['rating'] = netflix_df['rating'].replace(['74 min', '84 min', '66 min','NR'],'NR')
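
The two cleaning cells above list every stray value by hand. A possible alternative, sketched here on a hypothetical toy column, is to match duration-like strings with a regular expression. This is an assumption about the shape of the stray values (`<number> min`, `<number> Season(s)`, and `NOT RATED`), based on the values listed above, and is not the method used in the notebook.

```python
import pandas as pd

# Hypothetical ratings column mixing real ratings with stray duration values.
ratings = pd.Series(['TV-MA', '93 min', '2 Seasons', 'NOT RATED',
                     'PG', '1 Season', 'NR'])

# Duration-like strings ("93 min", "1 Season", "12 Seasons") and the label
# "NOT RATED" are mapped to 'NR'; everything else is left untouched.
stray = r'^(?:\d+ min|\d+ Seasons?|NOT RATED)$'
cleaned = ratings.replace(stray, 'NR', regex=True)

print(cleaned.tolist())
```

The pattern is anchored with `^` and `$`, so only values that consist entirely of a duration are replaced.
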
plt.figure(figsize=(10,12))
plt.subplots_adjust(wspace = 0.5, hspace = 0.3)
plt.subplot(2,2,1)
plt.title('Amazon~Content Rating', size=14)
sns.countplot(y='rating', 
              data=amazon_df, 
              palette='deep', 
              order=amazon_df['rating'].value_counts().index)

plt.subplot(2,2,2)
plt.title('Disney~Content Rating', size=14)
sns.countplot(y='rating', 
              data=disney_df, 
              palette='deep', 
              order=disney_df['rating'].value_counts().index)

plt.subplot(2,2,3)
plt.title('Hulu~Content Rating', size=14)
sns.countplot(y='rating', 
              data=hulu_df, 
              palette='deep', 
              order=hulu_df['rating'].value_counts().index)

plt.subplot(2,2,4)
plt.title('Netflix~Content Rating', size=14)
sns.countplot(y='rating', 
              data=netflix_df, 
              palette='deep', 
              order=netflix_df['rating'].value_counts().index)

plt.show()

It should be no surprise that Disney+ leans more heavily toward family-friendly content than the other platforms.

For more information on the content rating systems, see:

Movie: https://en.wikipedia.org/wiki/Motion_Picture_Association_film_rating_system

Television: https://en.wikipedia.org/wiki/Television_content_rating_system
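
The family-friendly comparison can also be made numeric by computing the share of titles whose rating falls in a chosen set. The set `FAMILY_RATINGS` below is an assumption made for illustration, not a definition taken from the notebook or the datasets, and `toy_ratings` is hypothetical toy data.

```python
import pandas as pd

# Assumption for illustration: these labels are treated as family friendly.
FAMILY_RATINGS = {'G', 'PG', 'TV-Y', 'TV-Y7', 'TV-G', 'TV-PG'}

# Hypothetical ratings column, including one missing value.
toy_ratings = pd.Series(['G', 'TV-MA', 'PG', 'R', 'TV-G', None, 'TV-14', 'TV-Y'])

# True where the rating is in the chosen set; missing values count as False.
is_family = toy_ratings.isin(FAMILY_RATINGS)
family_share = is_family.mean()

print(f'{family_share:.0%} of titles are in the family-friendly set')
```

Applying this to the `rating` column of each platform would allow the shares to be compared directly, regardless of catalog size.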

What is the most common Genre on each platform?

#Amazon genre data cleaning
amazon_genre = amazon_df.copy()
amazon_genre = pd.concat([amazon_genre, amazon_df['listed_in'].str.split(',', expand=True)], axis=1)
amazon_genre = amazon_genre.melt(id_vars=['type','listed_in'], value_vars=range(5), value_name='genre')
amazon_genre = amazon_genre[amazon_genre['genre'].notna()].drop(['variable'], axis=1)
amazon_genre['genre'] = amazon_genre['genre'].str.strip()

#Disney genre data cleaning
disney_genre = disney_df.copy()
disney_genre = pd.concat([disney_genre, disney_df['listed_in'].str.split(',', expand=True)], axis=1)
disney_genre = disney_genre.melt(id_vars=['type','listed_in'], value_vars=range(3), value_name='genre')
disney_genre = disney_genre[disney_genre['genre'].notna()].drop(['variable'], axis=1)
disney_genre['genre'] = disney_genre['genre'].str.strip()

#Hulu genre data cleaning
hulu_genre = hulu_df.copy()
hulu_genre = pd.concat([hulu_genre, hulu_df['listed_in'].str.split(',', expand=True)], axis=1)
hulu_genre = hulu_genre.melt(id_vars=['type','listed_in'], value_vars=range(3), value_name='genre')
hulu_genre = hulu_genre[hulu_genre['genre'].notna()].drop(['variable'], axis=1)
hulu_genre['genre'] = hulu_genre['genre'].str.strip()

#Netflix genre data cleaning
netflix_genre = netflix_df.copy()
netflix_genre = pd.concat([netflix_genre, netflix_df['listed_in'].str.split(',', expand=True)], axis=1)
netflix_genre = netflix_genre.melt(id_vars=['type','listed_in'], value_vars=range(3), value_name='genre')
netflix_genre = netflix_genre[netflix_genre['genre'].notna()].drop(['variable'], axis=1)
netflix_genre['genre'] = netflix_genre['genre'].str.strip()
Note: In the Amazon data, Arts, Entertainment, and Culture show up as separate genres, but they share the same values and appear to be one genre, so they could probably be combined. I could not confirm this from Amazon's own genre list or from the dataset source, so I left the data as is. Either way, this has no effect on this analysis.
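
The genre cleaning above splits `listed_in` into a fixed number of columns and melts them, which depends on knowing the maximum number of genres per title in advance. A possible alternative, sketched on hypothetical toy data, is `str.split` followed by `DataFrame.explode` (available in pandas 0.25 and later), which handles any number of genres per title.

```python
import pandas as pd

# Hypothetical toy data with the two columns the genre charts rely on.
toy_df = pd.DataFrame({
    'type': ['Movie', 'TV Show', 'Movie'],
    'listed_in': ['Drama, Comedy', 'Kids', None],
})

# One row per (title, genre) pair; titles without genres are dropped.
toy_genre = (
    toy_df.assign(genre=toy_df['listed_in'].str.split(','))
          .explode('genre')
          .dropna(subset=['genre'])
          .reset_index(drop=True)
)
toy_genre['genre'] = toy_genre['genre'].str.strip()

print(toy_genre[['type', 'genre']])
```

The result has the same long format as `amazon_genre` and the other genre frames, so it could be passed to `px.histogram` in the same way.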
As with the release-year charts above, these Plotly charts are interactive: hover for detailed values, click and drag to zoom in, and double click to return to the default view.
px.histogram(amazon_genre, x='genre', color='type', title='Amazon').update_xaxes(type='category', categoryorder='total descending')

Drama is the most common genre on Amazon, with a total of 3,687 titles.

px.histogram(disney_genre, x='genre', color='type', title='Disney').update_xaxes(type='category', categoryorder='total descending')

Family is the most common genre on Disney+, with a total of 632 titles.

px.histogram(hulu_genre, x='genre', color='type', title='Hulu').update_xaxes(type='category', categoryorder='total descending')

Drama is the most common genre on Hulu, with a total of 907 titles.

px.histogram(netflix_genre, x='genre', color='type', title='Netflix').update_xaxes(type='category', categoryorder='total descending')

International Movies is the most common genre on Netflix, with a total of 2,752 titles.
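
The top genre and its count for each platform can also be pulled out in code instead of read from the charts. This is a minimal sketch on a hypothetical long-format frame shaped like `amazon_genre`; the data is made up.

```python
import pandas as pd

# Hypothetical long-format genre data (one row per title-genre pair).
toy_genre = pd.DataFrame({
    'type': ['Movie', 'Movie', 'TV Show', 'Movie'],
    'genre': ['Drama', 'Comedy', 'Drama', 'Drama'],
})

# Count titles per genre, then take the most frequent one.
genre_counts = toy_genre['genre'].value_counts()
top_genre = genre_counts.idxmax()
top_count = int(genre_counts.max())

print(top_genre, top_count)
```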

Conclusion:

Findings and possible recommendations:

  • Amazon Prime Video: This platform has the most content and also the most up-to-date content, with the highest number of new releases from the current year, 2021. If you're looking for the Drama genre, Amazon also boasts the greatest number of titles.

  • Disney+: Disney is, and has always been, more focused on family-friendly content. If you have children, then this is definitely the best platform for you, but with Disney acquiring the Star Wars franchise and Marvel Entertainment there is no shortage of content for adults as well.

  • Hulu: Hulu offers more TV Shows relative to Movies than the other platforms. As of 2019, Hulu is a subsidiary of Disney. The future of Hulu is unclear because this acquisition is recent, but Disney has said that "Hulu would be oriented towards General Entertainment and content targeting mature audiences." It therefore appears that this service will continue.

  • Netflix: Coming in a very close second place to Amazon in terms of total content, Netflix has the largest number of International Movies and TV Shows available.

The two big players here appear to be Amazon and Netflix. Both services have basic starting plans at $8.99 USD per month. I don't personally subscribe to either service, but after researching each platform's website, it seems that Amazon's Prime Video is more of a base structure. It does offer a free section, but it charges extra for each additional service through what it calls "Channels", such as Paramount+, AMC+, and Discovery+, and it also charges for Movies and TV Shows that are offered to rent or buy.

This remains purely subjective, but if I had to choose only one it would definitely be Netflix. With Netflix, the only additional cost is for access to HD or Ultra HD content. Netflix still has plenty of content, like Amazon, but without the confusing monetization. Netflix also seems to cater to a global audience across multiple genres.

The good news with any of these services is that you can cancel your subscription at any time. As everyone has their own personal interests, it would be best to try them all out and see what works for you.

For current plans and pricing:

Amazon Cost: https://www.amazon.com/gp/help/customer/display.html?nodeId=G34EUPKVMYFW8N2U

Disney Cost: https://www.disneyplus.com/

Hulu Cost: https://help.hulu.com/s/article/how-much-does-hulu-cost

Netflix Cost: https://help.netflix.com/en/node/24926

Future work for this analysis

  • User rating: It would be interesting to incorporate user ratings from sources such as IMDb, Rotten Tomatoes, etc. to see what the most popular content and genres are on each platform.

  • Hulu source: Find a better source for the Hulu data. This dataset provides no information in the Cast column, and the Director column is jumbled with content descriptions, which made it impossible to parse the directors' names. With cleaner data, I could have compared the four platforms on their Top 5 Cast/Directors.