The following analysis evaluates a dataset comprising all Pokemon of the first six generations. The dataset contains 13 columns. Each Pokemon has a name and a number that corresponds to its number in the Pokedex. Most of the other columns contain the stats of each Pokemon; the remaining columns give its types, the generation it belongs to, and whether it is a legendary Pokemon.
The dataset originates from Kaggle. Link to Kaggle dataset.
I do not own the copyright of this picture! Link to Banner.
To use Jovian, the jovian library has to be installed and imported.
project_name = "zerotopandas-course-project-Pokemon"
#!pip install jovian --upgrade -q
import jovian
#jovian.commit(project = project_name)
Additionally, several libraries that are used in this notebook are imported.
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Later on, a pairplot from seaborn is generated, which throws several warnings but works anyway. The warnings library is imported to suppress them.
import warnings
warnings.filterwarnings("ignore")
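A narrower alternative to the global filter is the standard library's `warnings.catch_warnings()` context manager, which silences warnings only for the code inside the `with` block. The following is a minimal sketch; `noisy_plot` is a hypothetical stand-in for a plotting call and is not part of this notebook.

```python
import warnings


def noisy_plot():
    # Hypothetical stand-in for a plotting call that emits a warning.
    warnings.warn("layout has changed", UserWarning)
    return "figure"


# Warnings are suppressed only inside the with-block;
# the previous filter settings are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_plot()

print(result)
```

With this approach, warnings raised elsewhere in the notebook stay visible.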
The following parameters set the default style, font size, and figure size for the plots in this notebook.
sns.set_style('white')
plt.rcParams['font.size'] = 12
plt.rcParams['figure.figsize'] = (15, 5)
Beforehand I downloaded the dataset to my local drive. The dataset is now imported from the file Pokemon.csv.
Pokemon = pd.read_csv('Pokemon.csv')
Let's take a quick look at the dataset to get an overview of what we are dealing with, using the head method and the shape attribute.
Pokemon.head(10)
Pokemon.shape
(800, 13)
First, we replace the whitespace in the column names with underscores to avoid problems when referencing columns.
Pokemon.columns = Pokemon.columns.str.replace(' ', '_')
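The effect of this replacement can be sketched on an empty toy frame. The column names below are an assumed subset of the original headers, chosen for illustration only.

```python
import pandas as pd

# Assumed subset of the original column names (for illustration only).
toy = pd.DataFrame(columns=["Type 1", "Sp. Atk", "HP"])

# Replace every space in the column labels with an underscore.
toy.columns = toy.columns.str.replace(" ", "_")
print(list(toy.columns))
```

Note that a label such as `Sp._Atk` would still contain a dot, so a column like that has to be accessed with brackets (`toy["Sp._Atk"]`) rather than as an attribute.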
As we can see, there is also a small problem with the Pokemon names: for Mega Evolutions, the name contains redundant text before "Mega". We correct this by removing everything that precedes "Mega".
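# regex=True is needed in pandas >= 2.0, where str.replace treats the pattern as a literal string by default
Pokemon['Name'] = Pokemon['Name'].str.replace(r".*(?=Mega)", "", regex=True)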
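The lookahead pattern can be checked in isolation with the standard `re` module. This is a small sketch on hypothetical sample names, not on the dataset itself.

```python
import re

# Hypothetical sample names in the style described above.
names = ["VenusaurMega Venusaur", "Bulbasaur", "CharizardMega Charizard X", "Meganium"]

# ".*(?=Mega)" matches everything up to, but not including, "Mega".
# The lookahead (?=Mega) only checks that "Mega" follows; it is not consumed.
cleaned = [re.sub(r".*(?=Mega)", "", name) for name in names]
print(cleaned)
```

Names without "Mega" are left untouched, and a name that merely starts with "Mega" is unchanged because nothing precedes the match.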
Now, we are setting the name column as index to easily identify each Pokemon by its name.
Pokemon = Pokemon.set_index('Name')
As every Pokemon can be of one or two different Types, we generate a new column called Type_combined which shows the combinations of both types.
Pokemon['Type_combined'] = Pokemon[['Type_1', 'Type_2']].fillna('').sum(axis=1)
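As a minimal sketch of what this step produces, the same combination can be written as a plain string concatenation on a toy frame. The two rows below are hypothetical sample data, and the `+` operator is a swapped-in alternative to `.sum(axis=1)`.

```python
import pandas as pd

# Toy frame with hypothetical rows; Type_2 is missing for single-type Pokemon.
toy = pd.DataFrame(
    {"Type_1": ["Grass", "Fire"], "Type_2": ["Poison", None]},
    index=["Bulbasaur", "Charmander"],
)

# Concatenate the two type columns, treating a missing second type as "".
toy["Type_combined"] = toy["Type_1"] + toy["Type_2"].fillna("")
print(toy["Type_combined"].tolist())
```

Because the strings are joined in column order, a combination such as "GrassPoison" is treated as distinct from "PoisonGrass".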
Pokemon.head()
First, we want to know how many Pokemon are contained in our dataset.
Number_of_Pokemon = Pokemon.shape[0]
print('There are {} Pokemon in this dataset.'.format(Number_of_Pokemon))
There are 800 Pokemon in this dataset.
To calculate the basic statistics of the dataset, we use the describe method.
Pokemon.describe()
Additionally, we can quickly plot all numeric columns against each other to see potential relationships and distributions, using the pairplot function from the seaborn library.
sns.pairplot(Pokemon, hue="Generation");