Video games are tied to many of our childhoods. We played them when we were small, and many of us still play as adults. But is the industry doing well these days? We can analyze a video game sales dataset with graphs to get some insight into that.
The dataset is taken from https://www.kaggle.com/rishidamarla/video-game-sales
Libraries used in this project: opendatasets, pandas, Matplotlib and seaborn.
Thanks to Jovian for the course project.
This is an executable Jupyter notebook hosted on Jovian.ml, a platform for sharing data science projects. You can run and experiment with the code in a couple of ways: using free online resources (recommended) or on your own computer.
The easiest way to start executing this notebook is to click the "Run" button at the top of this page, and select "Run on Binder". This will run the notebook on mybinder.org, a free online service for running Jupyter notebooks. You can also select "Run on Colab" or "Run on Kaggle".
Install Conda by following these instructions. Add Conda binaries to your system PATH, so you can use the conda command on your terminal.
Create a Conda environment and install the required libraries by running these commands on the terminal:
conda create -n zerotopandas -y python=3.8
conda activate zerotopandas
pip install jovian jupyter numpy pandas matplotlib seaborn opendatasets --upgrade
jovian clone notebook-owner/notebook-id
cd directory-name
and start the Jupyter notebook:
jupyter notebook
You can now access Jupyter's web interface by clicking the link that shows up on the terminal or by visiting http://localhost:8888 on your browser. Click on the notebook file (it has a .ipynb extension) to open it.
First, we need to download the dataset. The link is provided in the description above. You can also find a lot of interesting datasets on Kaggle.
!pip install jovian opendatasets --upgrade --quiet
Let's begin by downloading the data, and listing the files within the dataset.
# Change this
dataset_url = 'https://www.kaggle.com/rishidamarla/video-game-sales'
The downloader needs your Kaggle username and API key (generated from your Kaggle account profile), so you should first register an account on Kaggle.
import opendatasets as od
od.download(dataset_url)
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: akariiiii
Your Kaggle Key: ········
100%|██████████| 476k/476k [00:00<00:00, 83.7MB/s]
Downloading video-game-sales.zip to ./video-game-sales
The dataset has been downloaded and extracted.
# Change this
data_dir = './video-game-sales'
import os
os.listdir(data_dir)
['Video_Games.csv']
Let us save and upload our work to Jovian before continuing.
project_name = "data-analysis-of-video-game-sales"
!pip install jovian --upgrade -q
import jovian
jovian.commit(project=project_name)
[jovian] Attempting to save notebook..
[jovian] Please enter your API key ( from https://jovian.ml/ ):
API KEY: ········
[jovian] Updating notebook "indexkyou/data-analysis-of-video-game-sales" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ml/indexkyou/data-analysis-of-video-game-sales
First, we should load the dataset into a Pandas data frame and take a look at what we can get from it.
import pandas as pd
game_sales_df = pd.read_csv('./video-game-sales/Video_Games.csv')
game_sales_df
Pretty cool: we have 16719 rows, which means 16719 game entries. We should check the columns and info to see whether this dataset is already workable.
game_sales_df.columns
Index(['Name', 'Platform', 'Year_of_Release', 'Genre', 'Publisher', 'NA_Sales',
'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales', 'Critic_Score',
'Critic_Count', 'User_Score', 'User_Count', 'Developer', 'Rating'],
dtype='object')
game_sales_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16719 entries, 0 to 16718
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 16717 non-null object
1 Platform 16719 non-null object
2 Year_of_Release 16450 non-null float64
3 Genre 16717 non-null object
4 Publisher 16665 non-null object
5 NA_Sales 16719 non-null float64
6 EU_Sales 16719 non-null float64
7 JP_Sales 16719 non-null float64
8 Other_Sales 16719 non-null float64
9 Global_Sales 16719 non-null float64
10 Critic_Score 8137 non-null float64
11 Critic_Count 8137 non-null float64
12 User_Score 10015 non-null object
13 User_Count 7590 non-null float64
14 Developer 10096 non-null object
15 Rating 9950 non-null object
dtypes: float64(9), object(7)
memory usage: 2.0+ MB
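Looking at the info, we can see that:
- Name and Genre are each missing in 2 rows, Year_of_Release in 269 rows and Publisher in 54 rows.
- Critic_Score, Critic_Count, User_Score, User_Count, Developer and Rating are missing in roughly 40% to 55% of rows.
- User_Score is stored as object rather than as a numeric type.
We should remove the rows with a null Year_of_Release, Name or Publisher for a cleaner dataframe.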
# Drop rows that are missing a release year, a name or a publisher
game_sales_df.dropna(subset=['Year_of_Release', 'Name', 'Publisher'], inplace=True)
game_sales_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 16416 entries, 0 to 16718
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 16416 non-null object
1 Platform 16416 non-null object
2 Year_of_Release 16416 non-null float64
3 Genre 16416 non-null object
4 Publisher 16416 non-null object
5 NA_Sales 16416 non-null float64
6 EU_Sales 16416 non-null float64
7 JP_Sales 16416 non-null float64
8 Other_Sales 16416 non-null float64
9 Global_Sales 16416 non-null float64
10 Critic_Score 7982 non-null float64
11 Critic_Count 7982 non-null float64
12 User_Score 9837 non-null object
13 User_Count 7461 non-null float64
14 Developer 9904 non-null object
15 Rating 9767 non-null object
dtypes: float64(9), object(7)
memory usage: 2.1+ MB
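As a complement to reading the info() output by eye, the null counts can be computed directly. This is a minimal sketch on a made-up toy frame (the rows and the names toy_df, null_pct and cleaned_df are illustrative assumptions, not taken from the dataset); it uses only the standard pandas calls isnull, mean and dropna.

```python
import numpy as np
import pandas as pd

# Toy stand-in for game_sales_df; every value here is made up for illustration.
toy_df = pd.DataFrame({
    'Name': ['Game A', 'Game B', None, 'Game D'],
    'Year_of_Release': [2006.0, np.nan, 2008.0, 2009.0],
    'Publisher': ['Pub X', 'Pub Y', 'Pub X', None],
    'Critic_Score': [76.0, np.nan, np.nan, 80.0],
})

# Share of missing values per column, in percent, largest first.
null_pct = (toy_df.isnull().mean() * 100).sort_values(ascending=False)
print(null_pct)

# Keep only rows where the key columns are present; Critic_Score may stay null.
key_columns = ['Name', 'Year_of_Release', 'Publisher']
cleaned_df = toy_df.dropna(subset=key_columns)
print(len(toy_df), '->', len(cleaned_df))
```

Running the same two steps on the real dataframe would show at a glance which columns are safe to clean by dropping rows and which would lose too much data.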
OK, the dataframe seems good enough. We should take a closer look at the summary statistics.
game_sales_df.describe()
So after cleaning we have 16416 game entries released between 1980 and 2020. North America (NA) seems to be the biggest market for game sales.
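By default describe() only summarises numeric columns, and info() lists User_Score as object, so that column is left out of the summary. A hedged sketch of one way to bring it in is shown below. The sample values, including the 'tbd' placeholder, are assumptions made up for illustration; the real column may contain different non-numeric entries.

```python
import pandas as pd

# Made-up values; the 'tbd' placeholder is an assumption, not read from the dataset.
user_scores = pd.Series(['8.0', '7.5', 'tbd', None, '9.1'], name='User_Score')

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
numeric_scores = pd.to_numeric(user_scores, errors='coerce')

print(numeric_scores.dtype)
print(numeric_scores.describe())
```

Coercing silently discards the unparseable entries, so it is worth counting the NaN values before and after to see how many were lost.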
import jovian
jovian.commit()
[jovian] Attempting to save notebook..
[jovian] Updating notebook "indexkyou/data-analysis-of-video-game-sales" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ml/indexkyou/data-analysis-of-video-game-sales
At first glance the dataframe is already sorted by Global_Sales. For a better view, though, we should create a few graphs.
Let's begin by importing matplotlib.pyplot and seaborn.
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 13
matplotlib.rcParams['figure.figsize'] = (36, 20)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
First, we should look at the number of games released each year. It helps us see when video games were in decline and when they were popular.
sns.countplot(x = 'Year_of_Release', data = game_sales_df)
plt.title('Number of games released each year')
plt.show()
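A count plot draws one bar per year whose height is the number of rows for that year. The same numbers can be obtained as a table, which is easier to inspect precisely. The sketch below uses a made-up toy frame (toy_df and titles_per_year are illustrative names, not from the notebook).

```python
import pandas as pd

# Toy stand-in for game_sales_df; the rows are made up for illustration.
toy_df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'Year_of_Release': [2007.0, 2008.0, 2008.0, 2009.0, 2008.0],
})

# Count rows per year and order the result chronologically.
titles_per_year = (
    toy_df['Year_of_Release']
    .astype(int)          # years are loaded as floats; show them as integers
    .value_counts()
    .sort_index()
)
print(titles_per_year)
```

Note that this counts entries released per year, not units sold; sales totals need an aggregation over the sales columns instead.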
It seems we don't have much data from 2017 to 2020, so let's remove those years and use a line graph of total sales for a better view.
game_sales_df.drop(game_sales_df[game_sales_df.Year_of_Release > 2016].index, inplace = True)
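# Sum only the numeric columns; text columns such as Name cannot be summed meaningfully
sales_df = game_sales_df.groupby('Year_of_Release', as_index = False).sum(numeric_only = True)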
x = sales_df['Year_of_Release']
y = sales_df['Global_Sales']
plt.figure(figsize=(20,10), dpi= 60)
plt.plot(x,y, label = 'Sales', color = 'green')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.title('Total game sale each year')
plt.legend()
plt.show()
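The yearly totals behind this line graph come from a group-and-sum step: rows are bucketed by release year and the sales columns are added up within each bucket. Here is a small sketch of that step on made-up numbers (toy_df and yearly_sales are illustrative names; the figures are not from the dataset).

```python
import pandas as pd

# Toy stand-in for game_sales_df; the values are made up for illustration.
toy_df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Year_of_Release': [2007.0, 2008.0, 2008.0, 2009.0],
    'NA_Sales': [1.0, 2.0, 0.5, 1.5],
    'Global_Sales': [2.0, 3.5, 1.0, 2.5],
})

# as_index=False keeps Year_of_Release as a regular column for plotting;
# numeric_only=True skips text columns such as Name.
yearly_sales = toy_df.groupby('Year_of_Release', as_index=False).sum(numeric_only=True)
print(yearly_sales)
```

Keeping the year as a column rather than as the index means it can be passed straight to plt.plot as the x values.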
Let's add the regional sales as well: NA, EU and JP.
x = sales_df['Year_of_Release']
na = sales_df['NA_Sales']
eu = sales_df['EU_Sales']
jp = sales_df['JP_Sales']
total = sales_df['Global_Sales']
plt.title('Sales comparison between area and global')
plt.plot(x,total, label = 'Global')
plt.plot(x,na, label = 'NA')
plt.plot(x,eu, label = 'EU')
plt.plot(x,jp, label = 'JP')
plt.legend(bbox_to_anchor =(1, 1))
<matplotlib.legend.Legend at 0x7ff838d6ce50>
We can see that North America (NA) is the largest market, followed by EU and JP. JP is pretty consistent and doesn't seem to have declined much. In 2008 and 2009 video games exploded in popularity, so we should take a look at the list of games from those years.
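The ranking of regions and the peak years read off the graph can also be checked numerically. The sketch below assumes a yearly totals frame shaped like sales_df; the numbers are made up for illustration, and yearly, region_share and peak_year are hypothetical names.

```python
import pandas as pd

# Toy yearly totals shaped like sales_df; the numbers are made up.
yearly = pd.DataFrame({
    'Year_of_Release': [2007, 2008, 2009],
    'NA_Sales': [3.0, 3.5, 3.25],
    'EU_Sales': [1.5, 1.75, 2.0],
    'JP_Sales': [0.5, 0.5, 0.5],
    'Global_Sales': [6.0, 7.0, 6.5],
})

# Each region's share of global sales over the whole period, in percent.
region_columns = ['NA_Sales', 'EU_Sales', 'JP_Sales']
region_share = yearly[region_columns].sum() / yearly['Global_Sales'].sum() * 100
print(region_share.round(1))

# Year with the highest global total.
peak_year = yearly.loc[yearly['Global_Sales'].idxmax(), 'Year_of_Release']
print(peak_year)
```

The shares do not have to add up to 100 here, because the dataset also has an Other_Sales column that this sketch leaves out.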
list_games_2008 = game_sales_df.loc[game_sales_df['Year_of_Release'] == 2008]
list_games_2008.sort_values('Global_Sales',ascending = False).head(10)
list_games_2009 = game_sales_df.loc[game_sales_df['Year_of_Release'] == 2009]
list_games_2009.sort_values('Global_Sales',ascending = False).head(10)
In 2008 and 2009, the best-selling games are on the Wii platform. That's pretty interesting, so let's look at a pie chart of platforms (we should combine the two dataframes first).
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames instead
combine_list = pd.concat([list_games_2008, list_games_2009])
platform_counts = combine_list.Platform.value_counts()
platform_counts
DS 895
Wii 607
X360 318
PS3 300
PS2 287
PSP 261
PC 183
DC 1
XB 1
Name: Platform, dtype: int64
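The percentages that a pie chart labels with autopct can be computed up front, which makes them easier to quote in text. This is a sketch on two made-up yearly frames (games_2008, games_2009, combined and platform_share are illustrative names), using pd.concat for the stacking step and value_counts(normalize=True) for the shares.

```python
import pandas as pd

# Toy stand-ins for the 2008 and 2009 game lists; the rows are made up.
games_2008 = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Platform': ['DS', 'Wii', 'DS']})
games_2009 = pd.DataFrame({'Name': ['D', 'E'], 'Platform': ['Wii', 'DS']})

# Stack the two frames; ignore_index=True renumbers the rows from 0.
combined = pd.concat([games_2008, games_2009], ignore_index=True)

# normalize=True returns fractions instead of raw counts.
platform_share = combined['Platform'].value_counts(normalize=True) * 100
print(platform_share.round(1))
```

These shares are based on the number of entries per platform, so a platform with many small releases can rank above one with a few very large sellers.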
plt.figure(figsize=(24,12))
plt.title("Platforms of games released in 2008 and 2009")
plt.pie(platform_counts, labels=platform_counts.index, autopct='%1.1f%%', startangle=180);
plt.legend(loc = 2,fontsize = 10, bbox_to_anchor = (1, 1), ncol = 2)
<matplotlib.legend.Legend at 0x7ff83876ce80>
top10_platforms = game_sales_df.Platform.value_counts().head(10)
plt.figure(figsize=(24,12))
plt.title("Top 10 platforms of all time")
plt.pie(top10_platforms, labels=top10_platforms.index, autopct='%1.1f%%', startangle=180);
plt.legend(loc = 2,fontsize = 10, bbox_to_anchor = (1, 1), ncol = 2)
<matplotlib.legend.Legend at 0x7ff838688e80>
The PS2 dominated for many years; it is truly the best-selling console of all time.
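The pie charts above are built from value_counts, so they rank platforms by how many entries each has rather than by sales. A hedged sketch of how the two rankings can be compared is given below, assuming Global_Sales holds per-title game sales; the toy numbers are made up and chosen so that the two rankings disagree.

```python
import pandas as pd

# Toy stand-in for game_sales_df; the values are made up for illustration.
toy_df = pd.DataFrame({
    'Platform': ['PS2', 'PS2', 'PS2', 'Wii', 'Wii'],
    'Global_Sales': [1.0, 0.5, 0.5, 4.0, 3.0],
})

# Ranking by number of entries per platform.
titles_by_platform = toy_df['Platform'].value_counts()

# Ranking by total game sales per platform.
sales_by_platform = (
    toy_df.groupby('Platform')['Global_Sales']
    .sum()
    .sort_values(ascending=False)
)

print(titles_by_platform)
print(sales_by_platform)
```

Either ranking describes games on a platform, not console hardware units, so neither one measures console sales directly.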
top_publishers = game_sales_df.Publisher.value_counts().head(10)
top_publishers
Electronic Arts 1344
Activision 976
Namco Bandai Games 935
Ubisoft 929
Konami Digital Entertainment 825
THQ 712
Nintendo 700
Sony Computer Entertainment 686
Sega 629
Take-Two Interactive 421
Name: Publisher, dtype: int64
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x = top_publishers.index, y = top_publishers.values);
top_genres = game_sales_df.Genre.value_counts().head(10)
plt.figure(figsize=(12,6))
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x = top_genres.index, y = top_genres.values);
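The bar charts for publishers and genres show how many entries each one has. If sales per entry are also of interest, one possible approach is a named aggregation that computes the count and the sales total in a single pass. The sketch below uses made-up rows; genre_summary and its column names titles, total_sales and sales_per_title are hypothetical.

```python
import pandas as pd

# Toy stand-in for game_sales_df; the values are made up for illustration.
toy_df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Genre': ['Action', 'Action', 'Sports', 'Action'],
    'Global_Sales': [1.0, 2.0, 4.5, 0.0],
})

# Named aggregation: each keyword becomes an output column.
genre_summary = (
    toy_df.groupby('Genre')
    .agg(titles=('Name', 'count'), total_sales=('Global_Sales', 'sum'))
    .sort_values('titles', ascending=False)
)

# Average sales per entry within each genre.
genre_summary['sales_per_title'] = genre_summary['total_sales'] / genre_summary['titles']
print(genre_summary)
```

The same pattern would work for Publisher by changing the groupby key.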
import jovian
jovian.commit()
[jovian] Attempting to save notebook..
[jovian] Updating notebook "indexkyou/data-analysis-of-video-game-sales" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ml/indexkyou/data-analysis-of-video-game-sales
import jovian
jovian.commit()
[jovian] Attempting to save notebook..
[jovian] Updating notebook "indexkyou/data-analysis-of-video-game-sales" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ml/indexkyou/data-analysis-of-video-game-sales