Olympics-dataset-analysis

Data Exploration of historical Olympics dataset

In this notebook I use Python to apply some data exploration techniques and present my own view of the dataset.

This is a historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016.

Note that the Winter and Summer Games were held in the same year up until 1992. After that, they were staggered so that the Winter Games occur on a four-year cycle starting with 1994, then Summer in 1996, then Winter in 1998, and so on.
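The staggered schedule described above reduces to simple arithmetic on the year. The sketch below is only an illustration of that rule; the function name `season_for_year` is hypothetical, and it deliberately covers only years from 1994 onward, since earlier Games followed the same-year arrangement.

```python
def season_for_year(year):
    """Return the season held in `year` under the post-1992 staggered schedule.

    Returns None for years in which no Games are scheduled.
    """
    if year < 1994:
        raise ValueError("the staggered schedule only applies from 1994 onward")
    remainder = year % 4
    if remainder == 2:   # 1994, 1998, 2002, ...
        return "Winter"
    if remainder == 0:   # 1996, 2000, 2004, ...
        return "Summer"
    return None


# Build the schedule for the staggered years covered by the dataset (up to Rio 2016).
schedule = {
    year: season_for_year(year)
    for year in range(1994, 2017)
    if season_for_year(year) is not None
}

for year, season in schedule.items():
    print(year, season)
```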

I run my analysis primarily on the Summer Olympics.
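Restricting the analysis to the Summer Games amounts to filtering rows on the Season column. The following is a minimal sketch that uses only the standard library and a made-up three-row sample (the athlete names are placeholders, not rows from the dataset); in a notebook the same filter could be written with a dataframe library instead.

```python
import csv
import io

# Hypothetical three-row sample using a subset of the dataset's columns.
sample_csv = """Name,Year,Season,Sport
Athlete A,1992,Summer,Basketball
Athlete B,1992,Winter,Speed Skating
Athlete C,2016,Summer,Judo
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Keep only the rows whose Season column is "Summer".
summer_rows = [row for row in rows if row["Season"] == "Summer"]

for row in summer_rows:
    print(row["Name"], row["Year"], row["Sport"])
```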

The data were scraped from www.sports-reference.com in May 2018.

Content

The file athlete_events.csv contains 271116 rows and 15 columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events). The columns are the following:


1. ID - Unique number for each athlete;

2. Name - Athlete's name;

3. Sex - M or F;

4. Age - Integer;

5. Height - In centimeters;

6. Weight - In kilograms;

7. Team - Team name;

8. NOC - National Olympic Committee 3-letter code;

9. Games - Year and season;

10. Year - Integer;

11. Season - Summer or Winter;

12. City - Host city;

13. Sport - Sport;

14. Event - Event;

15. Medal - Gold, Silver, Bronze, or NA.
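The column list above can double as a quick schema check once the file is read. The sketch below is an illustration on a made-up two-row sample rather than the real file. The helper `parse_row` is hypothetical, and treating `NA` as missing in Age, Height, and Weight (not only in Medal) is an assumption.

```python
import csv
import io

# The 15 column names, in the order listed above.
EXPECTED_COLUMNS = [
    "ID", "Name", "Sex", "Age", "Height", "Weight", "Team", "NOC",
    "Games", "Year", "Season", "City", "Sport", "Event", "Medal",
]

# Hypothetical two-row sample; the values are made up for illustration.
sample_csv = (
    "ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal\n"
    "1,Athlete A,M,24,180,80,Team X,XXX,1992 Summer,1992,Summer,Barcelona,Basketball,Event A,NA\n"
    "2,Athlete B,F,NA,NA,NA,Team Y,YYY,2016 Summer,2016,Summer,Rio de Janeiro,Judo,Event B,Gold\n"
)


def parse_row(row):
    """Map the NA marker to None and convert numeric fields to numbers."""
    cleaned = {key: (None if value == "NA" else value) for key, value in row.items()}
    for field in ("ID", "Age", "Year"):
        if cleaned[field] is not None:
            cleaned[field] = int(cleaned[field])
    for field in ("Height", "Weight"):
        if cleaned[field] is not None:
            cleaned[field] = float(cleaned[field])
    return cleaned


reader = csv.DictReader(io.StringIO(sample_csv))

# Report any expected column that the header does not contain.
missing = [name for name in EXPECTED_COLUMNS if name not in reader.fieldnames]
athletes = [parse_row(row) for row in reader]

print("Missing columns:", missing)
print("Medals:", [athlete["Medal"] for athlete in athletes])
```

Converting `NA` to `None` early keeps later counts and averages from silently treating the marker as a real value.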

Index of contents

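# Upload kaggle.json (Kaggle API credentials) to the Colab session
from google.colab import files
files.upload()

# Install the opendatasets helper, then download the dataset from Kaggle
!pip install opendatasets --upgrade
import opendatasets as od

dataset_url = 'https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results'
od.download(dataset_url)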
Saving kaggle.json to kaggle.json
Collecting opendatasets
Downloading https://files.pythonhosted.org/packages/18/99/aaa3ebec81dc347302e730e0daff61735ed2f3e736129553fb3f9bf67ed3/opendatasets-0.1.10-py3-none-any.whl
Requirement already satisfied, skipping upgrade: tqdm in /usr/local/lib/python3.7/dist-packages (from opendatasets) (4.41.1)
Requirement already satisfied, skipping upgrade: kaggle in /usr/local/lib/python3.7/dist-packages (from opendatasets) (1.5.12)
Requirement already satisfied, skipping upgrade: click in /usr/local/lib/python3.7/dist-packages (from opendatasets) (7.1.2)
Requirement already satisfied, skipping upgrade: requests in /usr/local/lib/python3.7/dist-packages (from kaggle->opendatasets) (2.23.0)
Requirement already satisfied, skipping upgrade: python-slugify in /usr/local/lib/python3.7/dist-packages (from kaggle->opendatasets) (4.0.1)
Requirement already satisfied, skipping upgrade: python-dateutil in /usr/local/lib/python3.7/dist-packages (from kaggle->opendatasets) (2.8.1)
Requirement already satisfied, skipping upgrade: six>=1.10 in /usr/local/lib/python3.7/dist-packages (from kaggle->opendatasets) (1.15.0)
Requirement already satisfied, skipping upgrade: urllib3 in /usr/local/lib/python3.7/dist-packages (from kaggle->opendatasets) (1.24.3)
Requirement already satisfied, skipping upgrade: certifi in /usr/local/lib/python3.7/dist-packages (from kaggle->opendatasets) (2020.12.5)
Requirement already satisfied, skipping upgrade: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->kaggle->opendatasets) (3.0.4)
Requirement already satisfied, skipping upgrade: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->kaggle->opendatasets) (2.10)
Requirement already satisfied, skipping upgrade: text-unidecode>=1.3 in /usr/local/lib/python3.7/dist-packages (from python-slugify->kaggle->opendatasets) (1.3)
Installing collected packages: opendatasets
Successfully installed opendatasets-0.1.10
100%|██████████| 5.43M/5.43M [00:00<00:00, 91.9MB/s]
Downloading 120-years-of-olympic-history-athletes-and-results.zip to ./120-years-of-olympic-history-athletes-and-results
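Before loading the data it can help to confirm where the download landed. The sketch below builds the path from the directory name printed in the output above and the file name athlete_events.csv. That the archive is extracted with the CSV at the top level of that directory is an assumption, so the code checks for the file rather than relying on it.

```python
from pathlib import Path

# Directory name taken from the download message; the layout inside it is assumed.
data_dir = Path("120-years-of-olympic-history-athletes-and-results")
csv_path = data_dir / "athlete_events.csv"

if csv_path.exists():
    size_mb = csv_path.stat().st_size / 1e6
    print(f"Found {csv_path} ({size_mb:.1f} MB)")
else:
    print(f"{csv_path} not found; check the download step and the directory contents")
```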

Importing Dataset