
Working with CATEGORICAL data in Pandas:

Pandas is the de facto toolbox for Python data scientists. You can use it before the analysis itself begins to collect, explore, and format the data, and its many I/O and data manipulation functions make these steps straightforward.

During data analysis, you have probably encountered non-numeric data types, mainly dates and columns whose values repeat across a small set of labels. The latter are called categorical data types.
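To make the two non-numeric kinds concrete, here is a minimal sketch of how a date column and a repetitive label column can be given dedicated dtypes in pandas. The DataFrame, its column names (`signup_date`, `plan`), and its values are hypothetical and invented purely for illustration.

```python
import pandas as pd

# Hypothetical toy data; column names and values are invented for illustration.
df = pd.DataFrame({
    "signup_date": ["2021-01-05", "2021-02-17", "2021-02-17", "2021-03-09"],
    "plan": ["free", "pro", "free", "free"],
})

# Both columns start out as plain strings.
# Parse the dates into a datetime dtype.
df["signup_date"] = pd.to_datetime(df["signup_date"])

# Store the repetitive labels as a categorical dtype.
df["plan"] = df["plan"].astype("category")

print(df.dtypes)
print(df["plan"].cat.categories)       # the distinct labels
print(df["plan"].cat.codes.tolist())   # the integer code stored for each row
```

A categorical column keeps each distinct label once and stores a small integer code per row, which is why it suits columns with few unique values.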

In this tutorial, you’ll learn common techniques for handling categorical data and preprocessing it so that it can be used to build machine learning models.

Prerequisites:

  • You should be familiar with Pandas data structures. Here's a notebook by Aakash NS that covers pandas from the very start.
  • You can find help with running this notebook online using this helper notebook.

Loading datasets into Colab:

I have used opendatasets, a Python library for downloading datasets from online sources like Kaggle and Google Drive with a simple Python command.

# Restart the kernel after installation
!pip install opendatasets pandas-profiling --upgrade --quiet