Learn practical skills, build real-world projects, and advance your career

Data Analysis with Python: Zero to Pandas - Course Project Guidelines

(remove this cell before submission)

Important links:

This is the starter notebook for the course project for Data Analysis with Python: Zero to Pandas. You will pick a real-world dataset of your choice and apply the concepts learned in this course to perform exploratory data analysis. Use this starter notebook as an outline for your project . Focus on documentation and presentation - this Jupyter notebook will also serve as a project report, so make sure to include detailed explanations wherever possible using Markdown cells.

Evaluation Criteria

Your submission will be evaluated using the following criteria:

  • Dataset must contain at least 3 columns and 150 rows of data
  • You must ask and answer at least 4 questions about the dataset
  • Your submission must include at least 4 visualizations (graphs)
  • Your submission must include explanations using markdown cells, apart from the code.
  • Your work must not be plagiarized i.e. copy-pasted for somewhere else.

Follow this step-by-step guide to work on your project.

Step 1: Select a real-world dataset

Here's some sample code for downloading the US Elections Dataset:

import opendatasets as od
dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
od.download('https://www.kaggle.com/tunguz/us-elections-dataset')

You can find a list of recommended datasets here: https://jovian.ml/forum/t/recommended-datasets-for-course-project/11711

Step 2: Perform data preparation & cleaning

  • Load the dataset into a data frame using Pandas
  • Explore the number of rows & columns, ranges of values etc.
  • Handle missing, incorrect and invalid data
  • Perform any additional steps (parsing dates, creating additional columns, merging multiple dataset etc.)

Step 3: Perform exploratory analysis & visualization

  • Compute the mean, sum, range and other interesting statistics for numeric columns
  • Explore distributions of numeric columns using histograms etc.
  • Explore relationship between columns using scatter plots, bar charts etc.
  • Make a note of interesting insights from the exploratory analysis

Step 4: Ask & answer questions about the data

  • Ask at least 4 interesting questions about your dataset
  • Answer the questions either by computing the results using Numpy/Pandas or by plotting graphs using Matplotlib/Seaborn
  • Create new columns, merge multiple dataset and perform grouping/aggregation wherever necessary
  • Wherever you're using a library function from Pandas/Numpy/Matplotlib etc. explain briefly what it does

Step 5: Summarize your inferences & write a conclusion

  • Write a summary of what you've learned from the analysis
  • Include interesting insights and graphs from previous sections
  • Share ideas for future work on the same topic using other relevant datasets
  • Share links to resources you found useful during your analysis

Step 6: Make a submission & share your work

(Optional) Step 7: Write a blog post

  • A blog post is a great way to present and showcase your work.
  • Sign up on Medium.com to write a blog post for your project.
  • Copy over the explanations from your Jupyter notebook into your blog post, and embed code cells & outputs
  • Check out the Jovian.ml Medium publication for inspiration: https://medium.com/jovianml

Example Projects

Refer to these projects for inspiration:

NOTE: Remove this cell containing the instructions before making your submission. You can do using the "Edit > Delete Cells" menu option.

Video game sales Data analysis

  • Dataset obtained from kaggle
  • Aiming to do data analysis for video game sales in depth

Downloading the Dataset

Downloading video game sales dataset using opendatasets module
!pip install jovian opendatasets --upgrade --quiet

Let's begin by downloading the data, and listing the files within the dataset.