Learn practical skills, build real-world projects, and advance your career

Analyzing Tabular Data using Python and Pandas

Part 7 of "Data Analysis with Python: Zero to Pandas"

This tutorial is the seventh in a series on introduction to programming and data analysis using the Python language. These tutorials take a practical coding-based approach, and the best way to learn the material is to execute the code and experiment with the examples. Check out the full series here:

  1. First Steps with Python and Jupyter
  2. A Quick Tour of Variables and Data Types
  3. Branching using Conditional Statements and Loops
  4. Writing Reusable Code Using Functions
  5. Reading from and Writing to Files
  6. Numerical Computing with Python and Numpy
  7. Analyzing Tabular Data using Pandas

Reading a CSV file using Pandas

Pandas is typically used for working in tabular data (simliar to the data stored in a spreadsheet). Pandas provides helper functions to read data from various file formates like CSV, Excel spreadsheets, HTML tables, JSON, SQL and more. Let's download a file italy-covid-daywise.txt which contains daywise Covid-19 data for Italy in the following format:

date,new_cases,new_deaths,new_tests
2020-04-21,2256.0,454.0,28095.0
2020-04-22,2729.0,534.0,44248.0
2020-04-23,3370.0,437.0,37083.0
2020-04-24,2646.0,464.0,95273.0
2020-04-25,3021.0,420.0,38676.0
2020-04-26,2357.0,415.0,24113.0
2020-04-27,2324.0,260.0,26678.0
2020-04-28,1739.0,333.0,37554.0
...

This format of storing data is known as comma separated values or CSV.

CSVs: A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields. (Wikipedia)

We'll download this file using the urlretrieve function from the urllib.request module.

from urllib.request import urlretrieve
urlretrieve('https://hub.jovian.ml/wp-content/uploads/2020/09/italy-covid-daywise.csv', 
            'italy-covid-daywise.csv')
('italy-covid-daywise.csv', <http.client.HTTPMessage at 0x2d688f740c8>)

To read the file, we can use the read_csv method from Pandas. Let's being by importing the Pandas library. It is typically imported with the alias pd.