Analyzing Tabular Data using Python and Pandas
Part 7 of "Data Analysis with Python: Zero to Pandas"
This tutorial is the seventh in an introductory series on programming and data analysis using the Python language. These tutorials take a practical, coding-based approach, and the best way to learn the material is to execute the code and experiment with the examples.
Reading a CSV file using Pandas
Pandas is typically used for working with tabular data (similar to the data stored in a spreadsheet). It provides helper functions to read data from various file formats such as CSV, Excel spreadsheets, HTML tables, JSON, SQL, and more. Let's download a file, italy-covid-daywise.csv, which contains day-wise Covid-19 data for Italy in the following format:
date,new_cases,new_deaths,new_tests
2020-04-21,2256.0,454.0,28095.0
2020-04-22,2729.0,534.0,44248.0
2020-04-23,3370.0,437.0,37083.0
2020-04-24,2646.0,464.0,95273.0
2020-04-25,3021.0,420.0,38676.0
2020-04-26,2357.0,415.0,24113.0
2020-04-27,2324.0,260.0,26678.0
2020-04-28,1739.0,333.0,37554.0
...
This format of storing data is known as comma-separated values, or CSV.
CSVs: A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields. (Wikipedia)
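To make the definition concrete, here is a minimal sketch that parses the first few rows of the sample above using Python's built-in csv module. The rows are embedded in a string (an assumption made so the example runs without downloading anything), and the variable names are illustrative only.

```python
import csv
import io

# The first rows of the sample shown above, embedded as a string so the
# sketch runs without any file on disk.
sample = """date,new_cases,new_deaths,new_tests
2020-04-21,2256.0,454.0,28095.0
2020-04-22,2729.0,534.0,44248.0
2020-04-23,3370.0,437.0,37083.0
"""

# csv.reader splits each line into a list of fields at the commas.
reader = csv.reader(io.StringIO(sample))
header = next(reader)    # the first line holds the column names
records = list(reader)   # each remaining line is one data record

# In a well-formed tabular CSV, every record has as many fields as the header.
same_width = all(len(record) == len(header) for record in records)

print(header)
print(len(records), "records; consistent field count:", same_width)
```

Note that the csv module returns every field as a string; converting values such as '2256.0' to numbers is left to the caller.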
We'll download this file using the urlretrieve function from the urllib.request module.
from urllib.request import urlretrieve
urlretrieve('https://hub.jovian.ml/wp-content/uploads/2020/09/italy-covid-daywise.csv',
'italy-covid-daywise.csv')
('italy-covid-daywise.csv', <http.client.HTTPMessage at 0x2d688f740c8>)
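The tuple printed above is the return value of urlretrieve: the local path the data was saved to, followed by an object holding the response headers. The sketch below illustrates this without network access by retrieving a temporary local file through a file:// URL. The file names source.csv and copy.csv are hypothetical and used only for this illustration.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

with tempfile.TemporaryDirectory() as tmp:
    # Create a small stand-in file (hypothetical name and contents).
    source = Path(tmp) / 'source.csv'
    source.write_text('date,new_cases\n2020-04-21,2256.0\n')

    # Retrieve it via a file:// URL, just as a remote URL would be retrieved.
    target = Path(tmp) / 'copy.csv'
    saved_path, headers = urlretrieve(source.as_uri(), str(target))

    # Read the copy back while the temporary directory still exists.
    copied_text = target.read_text()

# The first element of the returned tuple is the path we asked for.
print(saved_path == str(target))
print(copied_text)
```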
To read the file, we can use the read_csv function from Pandas. Let's begin by importing the Pandas library. It is typically imported with the alias pd.