Analysis of PS4 and Xbox One Video Game Sales

This project aims to identify which factor most strongly influences a video game's sales. Video game publishers may benefit from it, since every analysis here relates a factor to sales. The factors included in this study are genre, year, region, and publisher.

The dataset used is the "Video Game Sales Dataset" from Kaggle (https://www.kaggle.com/sidtwr/videogames-sales-dataset?select=XboxOne_GameSales.csv). Although the dataset has three CSV files, we use only two, since this study focuses on the PS4 and Xbox One. The selected files contain the following columns: Game, Year, Genre, Publisher, North America, Europe, Japan, Rest of World, and Global. There are around 1000 games for the PS4 and 600 for the Xbox One (some games may appear in both), which is a decent size considering that the dataset covers only 5 years (the PS4 and Xbox One were both released in 2013).

The following libraries were used for the project: collections, numpy, and pandas for calculation and data analysis, and seaborn and matplotlib for visualization. The techniques used in the project follow the course "Data Analysis with Python: Zero to Pandas" by Jovian in partnership with freeCodeCamp.

1. Assessing and Cleaning the Dataset

In this section, we inspect the dataset and modify it so that it yields better insights and clearer visualizations. This includes removing rows with nulls, rows with no sales, and duplicate games. We also combine the PS4 and Xbox One datasets and add a total sales column. The goal is to remove errors from the dataset so that the data we visualize is reliable.
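The cleaning steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the project's actual code: the rows are made up, the helper names `clean` and `combine` are hypothetical, the "Total Sales" column name is an assumption, and summing regional sales across both platforms is just one possible way to define a total. Only the column names come from the dataset description.

```python
import pandas as pd

# Regional sales columns named in the dataset description.
REGIONS = ["North America", "Europe", "Japan", "Rest of World"]

def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with nulls, rows with no sales, and duplicate games."""
    frame = frame.dropna()
    frame = frame[frame["Global"] > 0]
    return frame.drop_duplicates(subset="Game", keep="first")

def combine(ps4: pd.DataFrame, xbox: pd.DataFrame) -> pd.DataFrame:
    """Stack both platforms and add a total sales column per game."""
    both = pd.concat([clean(ps4), clean(xbox)], ignore_index=True)
    # Keep the first Year/Genre/Publisher seen for each game.
    info = both.groupby("Game")[["Year", "Genre", "Publisher"]].first()
    # Sum sales across platforms for games released on both.
    sales = both.groupby("Game")[REGIONS + ["Global"]].sum()
    combined = info.join(sales)
    # Assumption: total sales is the sum of the regional columns.
    combined["Total Sales"] = combined[REGIONS].sum(axis=1)
    return combined.reset_index()

# Made-up rows for illustration only (not taken from the real dataset).
ps4 = pd.DataFrame({
    "Game": ["A", "A", "B", "C", "D"],
    "Year": [2014, 2014, 2015, None, 2016],
    "Genre": ["Action", "Action", "Sports", "Racing", "Puzzle"],
    "Publisher": ["P1", "P1", "P2", "P2", "P3"],
    "North America": [1.0, 1.0, 0.5, 0.25, 0.0],
    "Europe": [2.0, 2.0, 0.5, 0.25, 0.0],
    "Japan": [0.5, 0.5, 0.0, 0.0, 0.0],
    "Rest of World": [0.5, 0.5, 0.0, 0.0, 0.0],
    "Global": [4.0, 4.0, 1.0, 0.5, 0.0],
})
xbox = pd.DataFrame({
    "Game": ["A", "E"],
    "Year": [2014, 2017],
    "Genre": ["Action", "Shooter"],
    "Publisher": ["P1", "P4"],
    "North America": [1.0, 0.25],
    "Europe": [1.0, 0.25],
    "Japan": [0.0, 0.0],
    "Rest of World": [0.0, 0.0],
    "Global": [2.0, 0.5],
})

combined = combine(ps4, xbox)
print(combined[["Game", "Publisher", "Total Sales"]])
```

In this toy data, game C is dropped for its missing year, game D for having no sales, and the repeated row for game A is removed before the two platforms are combined.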

2. Exploratory Data Analysis and Data Visualization

In this section, we draw insights from visualizations of the preprocessed dataset. As we gain insights, we pose questions in the form of hypotheses to deepen our understanding of the dataset. The hypotheses are as follows:

The publisher of the top game is also the publisher with the highest sales.
The higher the game count of a publisher, the more sales the publisher has.
The top publisher in the region with the most sales is also the top publisher in the other regions.
Regions differ in their interest in a genre due to differences in culture and tradition.
The top publisher's performance is influenced by the market's performance.
The top publisher creates games according to the highest-selling genre.
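As an illustration of how one of these hypotheses might be examined, the sketch below looks at the second one (game count versus sales) by aggregating per publisher and computing a Pearson correlation. It is a sketch under assumptions: the data is made up, the names `games`, `per_publisher`, `game_count`, and `total_sales` are hypothetical, and a correlation coefficient is only one possible way to test the hypothesis.

```python
import pandas as pd

# Made-up rows for illustration only (not taken from the real dataset).
games = pd.DataFrame({
    "Game": ["G1", "G2", "G3", "G4", "G5", "G6"],
    "Publisher": ["P1", "P1", "P1", "P2", "P2", "P3"],
    "Global": [1.0, 2.0, 3.0, 4.0, 1.0, 0.5],
})

# Count games and sum global sales for each publisher.
per_publisher = games.groupby("Publisher").agg(
    game_count=("Game", "count"),
    total_sales=("Global", "sum"),
)

# Pearson correlation between game count and total sales.
correlation = per_publisher["game_count"].corr(per_publisher["total_sales"])

print(per_publisher)
print(f"Correlation between game count and sales: {correlation:.2f}")
```

A coefficient close to 1 would be consistent with the hypothesis, while a value near 0 would suggest that game count alone says little about a publisher's sales.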

3. Conclusion

We sum up our insights and the answers to our 6 hypotheses in this section.

4. Recommendations

In this section, we give recommendations for anyone repeating the study, based on the insights gathered in the exploratory data analysis and data visualization.

5. References

This section lists the sources used in the project.

Assessing and Cleaning the Dataset

We begin by importing the libraries and loading the datasets.

!pip install opendatasets --upgrade

Collecting opendatasets
  Downloading opendatasets-0.1.20-py3-none-any.whl (14 kB)
Collecting kaggle
  Downloading kaggle-1.5.12.tar.gz (58 kB)
Requirement already satisfied: click in /opt/conda/lib/python3.9/site-packages (from opendatasets) (8.0.1)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.9/site-packages (from opendatasets) (4.62.3)
Requirement already satisfied: six>=1.10 in /opt/conda/lib/python3.9/site-packages (from kaggle->opendatasets) (1.16.0)
Requirement already satisfied: certifi in /opt/conda/lib/python3.9/site-packages (from kaggle->opendatasets) (2021.5.30)
Requirement already satisfied: python-dateutil in /opt/conda/lib/python3.9/site-packages (from kaggle->opendatasets) (2.8.2)
Requirement already satisfied: requests in /opt/conda/lib/python3.9/site-packages (from kaggle->opendatasets) (2.26.0)
Collecting python-slugify
  Downloading python_slugify-6.1.1-py2.py3-none-any.whl (9.1 kB)
Requirement already satisfied: urllib3 in /opt/conda/lib/python3.9/site-packages (from kaggle->opendatasets) (1.26.7)
Collecting text-unidecode>=1.3
  Downloading text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.9/site-packages (from requests->kaggle->opendatasets) (3.1)
Requirement already satisfied: charset-normalizer~=2.0.0 in /opt/conda/lib/python3.9/site-packages (from requests->kaggle->opendatasets) (2.0.0)
Building wheels for collected packages: kaggle
  Building wheel for kaggle (setup.py) ... done
  Created wheel for kaggle: filename=kaggle-1.5.12-py3-none-any.whl size=73051 sha256=64e419c2e66eccd07bfa2716a2b483976133c9b552b84469bfce49d4de7e46ba
  Stored in directory: /home/jovyan/.cache/pip/wheels/ac/b2/c3/fa4706d469b5879105991d1c8be9a3c2ef329ba9fe2ce5085e
Successfully built kaggle
Installing collected packages: text-unidecode, python-slugify, kaggle, opendatasets
Successfully installed kaggle-1.5.12 opendatasets-0.1.20 python-slugify-6.1.1 text-unidecode-1.3

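# Calculation and data analysis
import collections
import numpy as np
import pandas as pd

# Visualization
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

# Dataset download (opendatasets) and saving the notebook to Jovian
import opendatasets as od
import jovian

# Render plots inline and set the default plot style
%matplotlib inline
sns.set_style('whitegrid')
matplotlib.rcParams['font.size'] = 10
matplotlib.rcParams['figure.figsize'] = (10, 4)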