
Advanced Data Analysis Techniques with Python & Pandas

This tutorial is a part of the Zero to Data Science Bootcamp by Jovian.


Pandas is a popular Python library for working with tabular data (similar to the data stored in a spreadsheet). It offers easy-to-use and efficient utilities for loading, processing, cleaning and analyzing large tabular datasets. Datasets containing millions of records can often be processed with Pandas in a matter of minutes, provided the data fits in memory.
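To make the spreadsheet analogy concrete, here is a minimal sketch of a Pandas data frame. The column names and values are made up purely for illustration and are not part of any dataset used in this tutorial.

```python
import pandas as pd

# Hypothetical toy data, invented only to illustrate the idea of a table
df = pd.DataFrame({
    "item": ["pen", "notebook", "stapler"],
    "quantity": [10, 4, 2],
    "unit_price": [1.5, 3.0, 7.25],
})

# Derive a new column from existing ones, much like a spreadsheet formula
df["total"] = df["quantity"] * df["unit_price"]

# Show the table and a simple aggregate
print(df)
print("Grand total:", df["total"].sum())
```

Each column behaves like a spreadsheet column, and operations such as the multiplication above apply to every row at once.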

This tutorial covers the following topics:

  • Downloading datasets from online sources
  • Processing massive datasets using Pandas
  • Working with categorical data
  • Handling missing and duplicate data
  • Transforming data with type-specific functions
  • Data frame concatenation and merging

How to run the code

This tutorial is an executable Jupyter notebook hosted on Jovian. You can run this tutorial and experiment with the code examples in a couple of ways: using free online resources (recommended) or on your computer.

Option 1: Running using free online resources (1-click, recommended)

The easiest way to start executing the code is to click the Run button at the top of this page and select Run on Colab. Follow these instructions to connect your Google Drive with Jovian.

Option 2: Running on your computer locally

To run the code on your computer locally, you'll need to set up Python, download the notebook and install the required libraries. We recommend using the Conda distribution of Python. Click the Run button at the top of this page, select the Run Locally option, and follow the instructions.

Jupyter Notebooks: This tutorial is a Jupyter notebook - a document made of cells. Each cell can contain code written in Python or explanations in plain English. You can execute code cells and view the results, e.g., numbers, messages, graphs, tables, files, etc., instantly within the notebook. Jupyter is a powerful platform for experimentation and analysis. Don't be afraid to mess around with the code & break things - you'll learn a lot by encountering and fixing errors. You can use the "Kernel > Restart & Clear Output" menu option to clear all outputs and start again from the top.

Let's install and import the required libraries.

# Restart the kernel after installation so the upgraded packages are picked up
!pip install numpy pandas-profiling jovian --upgrade --quiet
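Once the installation finishes, the libraries can be imported. The following is a minimal sketch: the aliases `np` and `pd` are the usual community conventions rather than something specified above, and the profiling and Jovian packages are left out here because their import names may differ between versions.

```python
import numpy as np
import pandas as pd

# Print the installed versions to confirm that the imports work
print("numpy version:", np.__version__)
print("pandas version:", pd.__version__)
```

If either import fails, restarting the kernel and re-running the installation cell is usually the first thing to try.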