Data Analysis with Python: Zero to Pandas - Course Project Guidelines
(remove this cell before submission)
Important links:
- Make submissions here: https://jovian.ml/learn/data-analysis-with-python-zero-to-pandas/assignment/course-project
- Ask questions here: https://jovian.ml/forum/t/course-project-on-exploratory-data-analysis-discuss-and-share-your-work/11684
- Find interesting datasets here: https://jovian.ml/forum/t/recommended-datasets-for-course-project/11711
This is the starter notebook for the course project for Data Analysis with Python: Zero to Pandas. You will pick a real-world dataset of your choice and apply the concepts learned in this course to perform exploratory data analysis. Use this starter notebook as an outline for your project . Focus on documentation and presentation - this Jupyter notebook will also serve as a project report, so make sure to include detailed explanations wherever possible using Markdown cells.
Evaluation Criteria
Your submission will be evaluated using the following criteria:
- Dataset must contain at least 3 columns and 150 rows of data
- You must ask and answer at least 4 questions about the dataset
- Your submission must include at least 4 visualizations (graphs)
- Your submission must include explanations using markdown cells, apart from the code.
- Your work must not be plagiarized i.e. copy-pasted for somewhere else.
Follow this step-by-step guide to work on your project.
Step 1: Select a real-world dataset
- Find an interesting dataset on this page: https://www.kaggle.com/datasets?fileType=csv
- The data should be in CSV format, and should contain at least 3 columns and 150 rows
- Download the dataset using the
opendatasets
Python library
Here's some sample code for downloading the US Elections Dataset:
import opendatasets as od
dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
od.download('https://www.kaggle.com/tunguz/us-elections-dataset')
You can find a list of recommended datasets here: https://jovian.ml/forum/t/recommended-datasets-for-course-project/11711
Step 2: Perform data preparation & cleaning
- Load the dataset into a data frame using Pandas
- Explore the number of rows & columns, ranges of values etc.
- Handle missing, incorrect and invalid data
- Perform any additional steps (parsing dates, creating additional columns, merging multiple dataset etc.)
Step 3: Perform exploratory analysis & visualization
- Compute the mean, sum, range and other interesting statistics for numeric columns
- Explore distributions of numeric columns using histograms etc.
- Explore relationship between columns using scatter plots, bar charts etc.
- Make a note of interesting insights from the exploratory analysis
Step 4: Ask & answer questions about the data
- Ask at least 4 interesting questions about your dataset
- Answer the questions either by computing the results using Numpy/Pandas or by plotting graphs using Matplotlib/Seaborn
- Create new columns, merge multiple dataset and perform grouping/aggregation wherever necessary
- Wherever you're using a library function from Pandas/Numpy/Matplotlib etc. explain briefly what it does
Step 5: Summarize your inferences & write a conclusion
- Write a summary of what you've learned from the analysis
- Include interesting insights and graphs from previous sections
- Share ideas for future work on the same topic using other relevant datasets
- Share links to resources you found useful during your analysis
Step 6: Make a submission & share your work
- Upload your notebook to your Jovian.ml profile using
jovian.commit
. - Make a submission here: https://jovian.ml/learn/data-analysis-with-python-zero-to-pandas/assignment/course-project
- Share your work on the forum: https://jovian.ml/forum/t/course-project-on-exploratory-data-analysis-discuss-and-share-your-work/11684
- Browse through projects shared by other participants and give feedback
(Optional) Step 7: Write a blog post
- A blog post is a great way to present and showcase your work.
- Sign up on Medium.com to write a blog post for your project.
- Copy over the explanations from your Jupyter notebook into your blog post, and embed code cells & outputs
- Check out the Jovian.ml Medium publication for inspiration: https://medium.com/jovianml
Example Projects
Refer to these projects for inspiration:
-
Analyzing your browser history using Pandas & Seaborn by Kartik Godawat
-
WhatsApp Chat Data Analysis by Prajwal Prashanth
-
Understanding the Gender Divide in Data Science Roles by Aakanksha N S
NOTE: Remove this cell containing the instructions before making your submission. You can do using the "Edit > Delete Cells" menu option.
import jovian
project='thanzin-zerotopandas-course-project-1'
jovian.commit(project=project)
Project Title - change this
TODO - Write some introduction about your project here: describe the dataset, where you got it from, what you're trying to do with it, and which tools & techniques you're using. You can also mention about the course Data Analysis with Python: Zero to Pandas, and what you've learned from it.
Title: Data Analysis of the factors influencing National Happiness
TODO- The purpose of this analysis is to determine what variables have a positive and negative influence on happiness and life
satisfaction. I retrieved this dataset from Kaggle. I am using ,, and ____ for my data analysis. I will use matplotlib and seaborn libraries for data visualization tasks.