Upload your notebook to your Jovian.ml profile using jovian.commit.
Make a submission here: https://jovian.ml/learn/data-analysis-with-python-zero-to-pandas/assignment/course-project
Share your work on the forum: https://jovian.ml/forum/t/course-project-on-exploratory-data-analysis-discuss-and-share-your-work/11684
Browse through projects shared by other participants and give feedback
Your submission will be evaluated using the following criteria:
This notebook analyzes a dataset comprising all Pokemon of the first six generations. The dataset contains 13 columns. Each Pokemon has a name and a unique number that corresponds to its number in the Pokedex. Most of the other columns hold the stats of each Pokemon; the remaining columns give its types, the generation it comes from, and whether it is a legendary Pokemon.
The Dataset originates from Kaggle. Link to Kaggle dataset.
I do not own the copyright of this picture! Link to Banner.
To use Jovian, the jovian package has to be installed and imported.
project_name = "zerotopandas-course-project-Pokemon"
!pip install jovian --upgrade -q
ERROR: Could not install packages due to an EnvironmentError: [WinError 5] Access is denied: 'c:\\programdata\\anaconda3\\etc\\jupyter\\nbconfig\\notebook.d\\jovian_nb_ext.json'
Consider using the `--user` option or check the permissions.
import jovian
jovian.commit(project=project_name)
[jovian] Attempting to save notebook..
Additionally, several libraries that are used in this notebook are imported.
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Later on, a seaborn pairplot is generated, which throws several warnings but works anyway. The warnings library is imported to filter them out.
import warnings
warnings.filterwarnings("ignore") #multiscatter plot throws annoying warning but still works
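Filtering every warning for the whole notebook can also hide messages that matter. As an alternative sketch (not what this notebook does), the standard-library context manager warnings.catch_warnings() limits the suppression to a single block. The function noisy_step is a hypothetical stand-in for the pairplot call.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a call that warns but still works."""
    warnings.warn("this call is noisy", UserWarning)
    return "done"


# Suppress warnings only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    quiet_result = noisy_step()

# Outside that block warnings are reported again; record them to show this.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_step()

print(quiet_result, len(caught))
```

With this pattern, warnings raised by later cells stay visible.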
The following parameters set the style for the plots generated below.
sns.set_style('white')
plt.rcParams['font.size'] = 12
plt.rcParams['figure.figsize'] = (15, 5)
Beforehand I downloaded the dataset to my local drive. The dataset is now imported from the file Pokemon.csv.
Pokemon = pd.read_csv('Pokemon.csv')
Let's take a quick look at the dataset to get an overview, using the head method and the shape attribute.
Pokemon.head(15)
Pokemon.shape
First, we replace the whitespace in the column names with underscores to avoid problems when accessing columns.
Pokemon.columns = Pokemon.columns.str.replace(' ', '_')
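To illustrate the renaming step in isolation, here is a small sketch on an empty frame. The column names are hypothetical samples chosen for illustration, not a listing of the real dataset.

```python
import pandas as pd

# Hypothetical column names containing spaces.
toy = pd.DataFrame(columns=["Type 1", "Sp. Atk", "HP"])

# Replace every space in the column names with an underscore.
toy.columns = toy.columns.str.replace(" ", "_")
print(toy.columns.tolist())
```

Names without spaces can be used with attribute access (for example toy.Type_1), which is not possible for a name such as "Type 1".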
As we can see, there is also a small problem with the Pokemon names: for Mega evolutions, the name contains a redundant prefix. We correct this now.
Pokemon['Name'] = Pokemon['Name'].str.replace(r".*(?=Mega)", "", regex=True)
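The pattern uses a lookahead: ".*" consumes everything in front of "Mega", while "(?=Mega)" only checks that "Mega" follows without removing it. A minimal sketch on hypothetical sample names, assumed here to resemble the raw data, shows the effect. It passes regex=True explicitly, which newer pandas versions require for pattern-based replacement.

```python
import pandas as pd

# Hypothetical sample names, assumed to resemble the raw Name column.
names = pd.Series(
    ["Bulbasaur", "VenusaurMega Venusaur", "CharizardMega Charizard X"]
)

# Drop everything in front of "Mega"; names without "Mega" are unchanged.
cleaned = names.str.replace(r".*(?=Mega)", "", regex=True)
print(cleaned.tolist())
```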
Now we set the Name column as the index to easily identify each Pokemon by its name.
Pokemon = Pokemon.set_index('Name')
As every Pokemon can have one or two types, we generate a new column called Type_combined that holds the combination of both types.
Pokemon['Type_combined'] = Pokemon['Type_1'].fillna('') + Pokemon['Type_2'].fillna('')
Pokemon.head()
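How the combined column treats single-type Pokemon can be seen in a small sketch. The rows are hypothetical toy data, and the explicit string concatenation is one possible way to build the column.

```python
import pandas as pd

# Hypothetical toy rows; None marks a Pokemon with only one type.
toy = pd.DataFrame({
    "Type_1": ["Grass", "Fire", "Water"],
    "Type_2": ["Poison", None, None],
})

# Fill the missing second type with an empty string, then concatenate.
toy["Type_combined"] = toy["Type_1"] + toy["Type_2"].fillna("")
print(toy["Type_combined"].tolist())
```

Without the fillna call, the concatenation would yield a missing value for every single-type Pokemon.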
# jovian.commit()
Number_of_Pokemon = Pokemon.shape[0]
print('There are {} Pokemon in this dataset.'.format(Number_of_Pokemon))
Pokemon.describe()
sns.pairplot(Pokemon, hue="Generation");
Pokemon['Generation'].unique()
Pokemon['Type_1'].unique()
Pokemon['Type_2'].unique()
Pokemon['Type_combined'].unique()
# jovian.commit()
Pokemon.groupby('Type_1').mean(numeric_only=True).sort_values(
    by='Total', ascending=False).head(10)
Pokemon.groupby('Type_2').mean(numeric_only=True).sort_values(
    by='Total', ascending=False).head(10)
Pokemon.groupby('Type_combined').mean(numeric_only=True).sort_values(
    by='Total', ascending=False).head(10)
Pokemon['Rank'] = Pokemon['Total'].rank(method='first', ascending=False)
Pokemon.head()
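The choice of method='first' matters when several Pokemon share the same Total: tied entries receive distinct ranks in order of appearance instead of a shared averaged rank. A minimal sketch with hypothetical totals compares both behaviors.

```python
import pandas as pd

# Hypothetical totals with a tie between "B" and "C".
totals = pd.Series([600, 500, 500, 300], index=["A", "B", "C", "D"])

# method="first": ties are broken by order of appearance.
first = totals.rank(method="first", ascending=False)

# method="average" (the pandas default): tied entries share the mean rank.
average = totals.rank(method="average", ascending=False)

print(first.tolist())
print(average.tolist())
```

Distinct ranks make "top 100" slices unambiguous, at the cost of an arbitrary order among tied Pokemon.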
Pokemon.sort_values(by='Total', ascending=False)
Rank_plot = Pokemon.sort_values(by='Rank', ascending=True)[0:100]
Rank_plot
sns.scatterplot(x='Rank',
                y='Total',
                hue='Type_1',
                s=50,
                alpha=1,
                data=Rank_plot)
plt.title('Ranking Top 100 Pokemon')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
Rank_plot_non_legend = Pokemon[Pokemon.Legendary == False].sort_values(
    by='Rank', ascending=True)[0:100]
Rank_plot_non_legend
sns.scatterplot(x='Rank',
                y='Total',
                hue='Type_1',
                s=50,
                alpha=1,
                data=Rank_plot_non_legend)
plt.title('Ranking Top 100 non-legendary Pokemon')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
mean_Rank_Type_1 = Pokemon.groupby(by='Type_1')['Rank'].mean()
mean_Rank_Type_1.sort_values(ascending=True)
median_Rank_Type_1 = Pokemon.groupby(by='Type_1')['Rank'].median()
median_Rank_Type_1.sort_values(ascending=True)
# Subtraction aligns on the Type_1 index, so no sorting is needed here
shifted_distribution_rank = mean_Rank_Type_1 - median_Rank_Type_1
plt.plot(shifted_distribution_rank.sort_values(ascending=False))
plt.hlines(y=0, xmin=0, xmax=17, linestyles='dashed', alpha=0.25)
plt.title('Difference of mean and median rank grouped by Type 1')
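The difference between mean and median is a rough indicator of skew: a positive value means that a few badly ranked Pokemon pull the mean rank of a type above its median. A small sketch with hypothetical ranks for two made-up types "A" and "B" illustrates this.

```python
import pandas as pd

# Hypothetical ranks: type "A" has one outlier, type "B" is symmetric.
ranks = pd.DataFrame({
    "Type_1": ["A"] * 5 + ["B"] * 5,
    "Rank": [1, 2, 3, 4, 100, 10, 20, 30, 40, 50],
})

grouped = ranks.groupby("Type_1")["Rank"]

# Mean minus median per type; the result is indexed by Type_1.
diff = grouped.mean() - grouped.median()
print(diff)
```

Here type "A" gets a clearly positive difference because of its single outlier, while type "B" gets zero.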
# jovian.commit()
plt.hist(Pokemon.Total, bins=15);
p_thresh = 1e-3
p_norm = stats.normaltest(Pokemon.Total)[1]
if p_norm < p_thresh:
    print('The p-value is smaller than {} and therefore the data does not correspond to a normal distribution. Be aware of that fact when selecting statistical tests.'.format(p_thresh))
else:
    print('The p-value is greater than {}. The data seems to be normally distributed. Go on with your statistics.'.format(p_thresh))
As the data in the Total column are not normally distributed, the ranked values were used to test for significant differences.
Dragon = Pokemon.Rank[Pokemon.Type_1 == 'Dragon']
Flying = Pokemon.Rank[Pokemon.Type_1 == 'Flying']
Fairy = Pokemon.Rank[Pokemon.Type_1 == 'Fairy']
stats.ttest_ind(Dragon, Flying)
stats.ttest_ind(Dragon, Fairy)
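The notebook applies a t-test to the ranked values. A common rank-based alternative for non-normal data, shown here only as a sketch and not as the method used above, is the Mann-Whitney U test from scipy.stats. The two samples are hypothetical values, not taken from the dataset.

```python
from scipy import stats

# Hypothetical samples for two groups (not taken from the dataset).
group_a = [12, 35, 48, 60, 75, 90, 110, 130]
group_b = [200, 240, 310, 355, 420, 480, 510, 600]

# Two-sided test of whether values in one group tend to be larger.
result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(result.statistic, result.pvalue)
```

The same call could be applied to the Total values of two types directly, since the test ranks the observations internally.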
# jovian.commit()
Link the dataset with the possible attacks to infer the strongest possible combination (the all-purpose Pokemon).
# jovian.commit()