
Analysis of "Pokemon with stats" dataset


This notebook evaluates a dataset comprising all Pokemon of the first 6 generations. The dataset contains 13 columns. Each Pokemon has a Name and a unique number that corresponds to its number in the Pokedex. Most of the other columns contain the stats of each Pokemon. In addition, there is information about its Types, the Generation it belongs to, and whether it is a legendary Pokemon.

The dataset originates from Kaggle. Link to Kaggle dataset.

I do not own the copyright of this picture! Link to Banner.

To use Jovian, the jovian library has to be installed and imported.

In [1]:
project_name = "zerotopandas-course-project-Pokemon"
In [2]:
#!pip install jovian --upgrade -q
In [3]:
import jovian
In [4]:
#jovian.commit(project = project_name)

Additionally, several libraries that are used in this notebook are imported.

In [5]:
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Later on, a pairplot from seaborn is generated, which throws several warnings but works anyway. The warnings library is imported so that these warnings can be filtered.

In [6]:
import warnings
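
One way to keep such a filter local to the noisy call is the `warnings.catch_warnings()` context manager from the standard library. The sketch below is only an illustration: `noisy_plot` is a hypothetical stand-in for a plotting call that emits a warning, not a function from seaborn or from this notebook.

```python
import warnings


def noisy_plot():
    """Hypothetical stand-in for a plotting call that emits a warning."""
    warnings.warn("layout has changed", UserWarning)
    return "figure"


# Suppress warnings only inside this block; the previous filters are
# restored automatically when the block ends.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_plot()

# With record=True and the "always" filter, the same call is captured
# in a list instead of being hidden, which is handy for inspection.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_plot()

print(result, len(caught))
```

Compared with a global `warnings.filterwarnings('ignore')`, this keeps warnings from all other cells visible.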

The following parameters set the default font size and figure size for the plots generated later on.

In [7]:
In [8]:
plt.rcParams['font.size'] = 12
plt.rcParams['figure.figsize'] = (15, 5)

Data Preparation and Cleaning

Beforehand I downloaded the dataset to my local drive. The dataset is now imported from the file Pokemon.csv.

In [9]:
Pokemon = pd.read_csv('Pokemon.csv')

Let's take a quick look at the dataset to get an overview of what we are dealing with, using the head method and the shape attribute.

In [10]:
Pokemon.head()
In [11]:
Pokemon.shape
(800, 13)

First, we replace the whitespace in the column names with underscores to avoid problems when accessing columns.

In [12]:
Pokemon.columns = Pokemon.columns.str.replace(' ', '_')
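
As a small illustration of what this renaming does, the sketch below applies the same call to a toy frame. The frame and its values are assumptions made up for the example; only the `Type 1` header is implied by the column names used later in this notebook.

```python
import pandas as pd

# Toy frame whose column names contain a space, as in the raw CSV header
# (assumed example values, not taken from the dataset).
toy = pd.DataFrame({"Type 1": ["Grass"], "Sp. Atk": [65]})

# Literal replacement of every space with an underscore.
toy.columns = toy.columns.str.replace(" ", "_", regex=False)

# A name without spaces can also be used with attribute access: toy.Type_1
print(list(toy.columns))
```

Passing `regex=False` makes it explicit that the space is treated as a literal character.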

As we can see, there is also a small problem with the Pokemon names: for Mega evolutions, the name contains redundant text in front of "Mega". We correct this by removing everything that precedes "Mega".

In [13]:
Pokemon['Name'] = Pokemon['Name'].str.replace(r".*(?=Mega)", "", regex=True)
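
To see how the lookahead pattern behaves, it can be tried on a few strings with the standard `re` module. This is a minimal sketch: the doubled name is an assumption about how the raw entries look, and the other two names are included to check that names without a Mega prefix stay unchanged. Note that in recent pandas versions `Series.str.replace` treats the pattern as a literal string unless `regex=True` is passed.

```python
import re

# Same pattern as in the cell above: greedily match everything up to
# (but not including) "Mega", thanks to the lookahead (?=Mega).
pattern = re.compile(r".*(?=Mega)")

# Example names; the doubled form is an assumption about the raw data.
names = ["VenusaurMega Venusaur", "Bulbasaur", "Meganium"]
cleaned = [pattern.sub("", name) for name in names]

print(cleaned)
```

Because the lookahead does not consume "Mega", a name that merely starts with "Mega" is left as it is.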

Now we set the Name column as the index so that each Pokemon can easily be identified by its name.

In [14]:
Pokemon = Pokemon.set_index('Name')
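
With the name as index, rows can be selected by label using `.loc`. The sketch below uses a toy frame with assumed values to show the idea; it is not part of the original analysis.

```python
import pandas as pd

# Toy rows with assumed values, only to illustrate label-based lookup.
toy = pd.DataFrame(
    {"Name": ["Bulbasaur", "Pikachu"], "HP": [45, 35], "Speed": [45, 90]}
).set_index("Name")

# Select a whole row by name, or a single value by name and column.
pikachu_row = toy.loc["Pikachu"]
pikachu_speed = toy.loc["Pikachu", "Speed"]

# A lookup by label returns exactly one row only if the index is unique.
print(pikachu_speed, toy.index.is_unique)
```

Checking `index.is_unique` after `set_index` is a cheap way to confirm that no two rows share a name.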

Since every Pokemon has one or two types, we generate a new column called Type_combined that holds the combination of both types.

In [15]:
Pokemon['Type_combined'] = Pokemon[['Type_1', 'Type_2']].fillna('').sum(axis=1)
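
The row-wise sum above relies on string concatenation, so a Grass and Poison Pokemon should end up with a label such as `GrassPoison`. As a hedged alternative, the sketch below builds the same kind of label with `Series.str.cat` on a toy frame (assumed values) and adds a variant with a separator, which is not something the original notebook does.

```python
import pandas as pd

# Toy frame with assumed values; the second Pokemon has no secondary type.
toy = pd.DataFrame(
    {"Type_1": ["Grass", "Fire"], "Type_2": ["Poison", None]},
    index=["Bulbasaur", "Charmander"],
)

# Missing secondary types become empty strings before concatenation.
second = toy["Type_2"].fillna("")

# Plain concatenation of primary and secondary type.
combined = toy["Type_1"].str.cat(second)

# Variant with a "/" separator; the trailing "/" left behind for
# single-type Pokemon is stripped again.
separated = toy["Type_1"].str.cat(second, sep="/").str.rstrip("/")

print(combined.tolist(), separated.tolist())
```

A separator makes the combined label easier to read and to split again later.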
In [16]:

Exploratory Analysis and Visualization

First, we want to know how many Pokemon are contained in our dataset.

In [17]:
Number_of_Pokemon = Pokemon.shape[0]
print('There are {} Pokemon in this dataset.'.format(Number_of_Pokemon))
There are 800 Pokemon in this dataset.

To calculate the basic statistics of the dataset, we use the describe method.

In [18]:
Pokemon.describe()

Additionally, we can quickly plot all columns against each other to see potential connections and distributions, using the pairplot function from the seaborn library.

In [19]:
sns.pairplot(Pokemon, hue="Generation");