Lecture 4: Analyzing Tabular Data with Pandas

What do you mean by give over?

I mean passing it through to the output. At the moment, the output of the mean function looks like this:

print(temp_gr_df[['temp']].mean())

Output:

      temp
year
2000    10
2001    11
2002    12
2003    13
2004    14
2005    15
2006    16
2007    17
2008    18
2009    19

I would like output like this, so that I can access the month information in graphs or further functions.

      temp month
year
2000    10     1
2001    11     1
2002    12     1
2003    13     1
2004    14     1
2005    15     1
2006    16     1
2007    17     1
2008    18     1
2009    19     1

I still don’t understand.

If you group by year, the mean is calculated for the whole year; there are no months.

If you want to calculate the mean by year AND month, you have to group by both of them at once.
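As a rough illustration of grouping by both keys at once, here is a minimal sketch. The DataFrame `df` and its values are made up for the example (the thread does not show the original data); only the column names `year`, `month` and `temp` come from the discussion.

```python
import pandas as pd

# Hypothetical sample data: one temperature reading per row.
df = pd.DataFrame({
    'year':  [2000, 2000, 2000, 2001, 2001, 2001],
    'month': [1, 1, 2, 1, 1, 2],
    'temp':  [8, 12, 14, 9, 13, 15],
})

# Grouping by year only: all months of a year are averaged together.
by_year = df.groupby('year')[['temp']].mean()

# Grouping by year AND month: one mean per (year, month) pair,
# and both keys are kept as index levels.
by_year_month = df.groupby(['year', 'month'])[['temp']].mean()

print(by_year)
print(by_year_month)
```

With the second form, the month is still available in the result (as an index level) for plots or further processing.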

I create a group with all temp values of month 1 for every year. Then I calculate the average temp for every year (the average of month 1 for every year).

I just want to add an extra column, month or year (or, for example, columns merged in from other DataFrames by year), to the output so that it looks like:

      temp month
year
2000    10     1
2001    11     1
2002    12     1
2003    13     1
2004    14     1
2005    15     1
2006    16     1
2007    17     1
2008    18     1
2009    19     1

or

      temp     year
year
2000    10     2000
2001    11     2001
2002    12     2002
2003    13     2003
2004    14     2004
2005    15     2005
2006    16     2006
2007    17     2007
2008    18     2008
2009    19     2009

The question is actually: how can I add extra columns to this output?

OK so, to add columns:

Use df['month'] = 1

I’m not sure that has any meaning, because you are calculating the mean temperature of a single month. If you have a single temperature per month, then calculating the mean doesn’t make sense (you would need temperatures recorded day by day for this to work).

You might also use reset_index() to move the year out of the index and into a regular column, or pass the argument as_index=False to the groupby function.
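A small sketch of both options, plus the constant column, might look like this. The sample data and the names `means_a` and `means_b` are assumptions for illustration; they are not from the original notebook.

```python
import pandas as pd

# Hypothetical sample data: two readings for month 1 of each year.
df = pd.DataFrame({
    'year':  [2000, 2000, 2001, 2001],
    'month': [1, 1, 1, 1],
    'temp':  [9, 11, 10, 12],
})

# Option 1: group normally, then move 'year' from the index into a column.
means_a = df.groupby('year')[['temp']].mean().reset_index()

# Option 2: ask groupby to keep 'year' as a regular column from the start.
means_b = df.groupby('year', as_index=False)[['temp']].mean()

# Add a constant column to the aggregated result.
means_b['month'] = 1

print(means_a)
print(means_b)
```

Both options should give the same `year` and `temp` columns; which one to use is mostly a matter of taste.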


…or pass an argument as_index=False to groupby function.

That was the magic parameter I was looking for :)

Output:

   year  temp
0  2000    10
1  2001    11
2  2002    12
3  2003    13
4  2004    14
5  2005    15
6  2006    16
7  2007    17
8  2008    18
9  2009    19

Thx for the help.

This is pandas-practice-assignment.


ImportError: matplotlib is required for plotting when the default backend "matplotlib" is selected.
If this is the error you are getting, install matplotlib using pip install matplotlib.


The code I write gets committed, and that’s how I am able to complete the assignments. But if I upload a file to Binder, it goes missing after I close the Binder session. Even the version of my notebook on NumPy arrays that you forked had the same problem. In the latest versions, I have worked around that by writing the entire contents of the CSV file as a string, so that the file is recreated every time the notebook is run.
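For anyone who wants to try the same workaround, a minimal sketch could look like the following. The file name `temps.csv` and the CSV contents are made up for the example.

```python
import pandas as pd

# Hypothetical CSV contents kept inside the notebook as a string.
csv_text = """year,month,temp
2000,1,10
2001,1,11
"""

# Recreate the file on every run, so it exists even in a fresh session.
with open('temps.csv', 'w') as f:
    f.write(csv_text)

df = pd.read_csv('temps.csv')
print(df)
```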

In Lecture 4, when talking about working with dates in Pandas there is this line of code

covid_df[covid_df.month == 1 ][['new_cases', 'new_deaths', 'new_tests']].sum()

This returns the sum of these metrics for a single month (as written, month == 1 selects January).

Can anyone help me work out how to write a function that would iterate through each month in the dataframe and return the sums of the metrics for each month?

If you don’t want to give me the answer, can you help me work out what I could look for to find out for myself?

Thanks

You can use two nested for loops.
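An alternative that avoids explicit loops is `groupby`. The sketch below uses a tiny made-up `covid_df` with the column names from the question; the values are invented, and the `month` column is assumed to be derived from a `date` column. A loop-based version is included for comparison.

```python
import pandas as pd

# Hypothetical sample data with the same column names as in the question.
covid_df = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-30', '2020-01-31',
                            '2020-02-01', '2020-02-02']),
    'new_cases':  [2, 3, 5, 7],
    'new_deaths': [0, 1, 1, 2],
    'new_tests':  [10, 20, 30, 40],
})
covid_df['month'] = covid_df.date.dt.month

metrics = ['new_cases', 'new_deaths', 'new_tests']

# One row per month, one column per metric.
monthly_sums = covid_df.groupby('month')[metrics].sum()
print(monthly_sums)

# Loop-based equivalent: filter each month in turn and sum its rows.
loop_sums = {}
for m in sorted(covid_df.month.unique()):
    loop_sums[int(m)] = covid_df[covid_df.month == m][metrics].sum()
```

Searching the pandas documentation for "groupby" and "aggregation" is a good starting point for working this out yourself.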

Please have a look …Covid India analysis using pandas


Here’s the code for extracting country-wise data for Covid-19 using Pandas:

Hi, how do I deal with multiple case entries for a specific date?
For example, this is the code:
cases_df = covid_cases_uk_df[covid_cases_uk_df.newCasesBySpecimenDate > 0]
cases_df.sort_values('date', ascending=True).head(20)

           date  newCasesBySpecimenDate
254  2020-01-30                       2
248  2020-02-05                       1
245  2020-02-08                       4
244  2020-02-09                       1
242  2020-02-11                       1
232  2020-02-21                       1
230  2020-02-23                       1
229  2020-02-24                       2
228  2020-02-25                       5
227  2020-02-26                       4
226  2020-02-27                       7
974  2020-02-27                       1
747  2020-02-28                       1
225  2020-02-28                      11
224  2020-02-29                       5
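One possible way to handle the repeated dates is to group by `date` and sum the counts. This is only a sketch: it assumes the duplicate rows are separate reports (for example, from different areas) that should be added together, which the thread does not confirm. The rows below are copied from the output above; `daily_df` is a name chosen for the example.

```python
import pandas as pd

# A few rows from the output above, including the repeated dates.
cases_df = pd.DataFrame(
    {
        'date': ['2020-02-26', '2020-02-27', '2020-02-27',
                 '2020-02-28', '2020-02-28'],
        'newCasesBySpecimenDate': [4, 7, 1, 1, 11],
    },
    index=[227, 226, 974, 747, 225],
)

# Collapse repeated dates into one row each by summing their counts.
daily_df = (
    cases_df
    .groupby('date', as_index=False)['newCasesBySpecimenDate']
    .sum()
    .sort_values('date')
)
print(daily_df)
```

If the duplicates turn out to be repeated copies of the same record rather than separate reports, `drop_duplicates()` would be the more appropriate tool.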

from urllib.request import urlretrieve

urlretrieve('https://hub.jovian.ai/wp-content/uploads/2020/09/italy-covid-daywise.csv',
            'italy-covid-daywise.csv')

@aakashns Hi, I’m getting the error below when trying to download the CSV file from the Jovian URL using the lines above. From googling, my impression is that the issue involves common trusted certificates, but I am able to download the CSV file without problems by opening the URL directly in my browser. Is this a problem with the URL, or a setup issue on my end?


SSLCertVerificationError Traceback (most recent call last)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
1349 try:
-> 1350 h.request(req.get_method(), req.selector, req.data, headers,
1351 encode_chunked=req.has_header('Transfer-encoding'))

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py in request(self, method, url, body, headers, encode_chunked)
1239 """Send a complete request to the server."""
-> 1240 self._send_request(method, url, body, headers, encode_chunked)
1241

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
1285 body = _encode(body, 'body')
-> 1286 self.endheaders(body, encode_chunked=encode_chunked)
1287

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py in endheaders(self, message_body, encode_chunked)
1234 raise CannotSendHeader()
-> 1235 self._send_output(message_body, encode_chunked=encode_chunked)
1236

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py in _send_output(self, message_body, encode_chunked)
1005 del self._buffer[:]
-> 1006 self.send(msg)
1007

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py in send(self, data)
945 if self.auto_open:
--> 946 self.connect()
947 else:

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py in connect(self)
1408
-> 1409 self.sock = self._context.wrap_socket(self.sock,
1410 server_hostname=server_hostname)

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
499 # ctx._wrap_socket()
--> 500 return self.sslsocket_class._create(
501 sock=sock,

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py in _create(cls, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, context, session)
1039 raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
-> 1040 self.do_handshake()
1041 except (OSError, ValueError):

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py in do_handshake(self, block)
1308 self.settimeout(None)
-> 1309 self._sslobj.do_handshake()
1310 finally:

SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1108)

During handling of the above exception, another exception occurred:

URLError Traceback (most recent call last)
in
----> 1 urlretrieve('https://hub.jovian.ai/wp-content/uploads/2020/09/italy-covid-daywise.csv',
2 'italy-covid-daywise.csv')

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py in urlretrieve(url, filename, reporthook, data)
245 url_type, path = _splittype(url)
246
--> 247 with contextlib.closing(urlopen(url, data)) as fp:
248 headers = fp.info()
249

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
220 else:
221 opener = _opener
--> 222 return opener.open(url, data, timeout)
223
224 def install_opener(opener):

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py in open(self, fullurl, data, timeout)
523
524 sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
--> 525 response = self._open(req, data)
526
527 # post-process response

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py in _open(self, req, data)
540
541 protocol = req.type
--> 542 result = self._call_chain(self.handle_open, protocol, protocol +
543 '_open', req)
544 if result:

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
500 for handler in handlers:
501 func = getattr(handler, meth_name)
--> 502 result = func(*args)
503 if result is not None:
504 return result

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py in https_open(self, req)
1391
1392 def https_open(self, req):
-> 1393 return self.do_open(http.client.HTTPSConnection, req,
1394 context=self._context, check_hostname=self._check_hostname)
1395

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
1351 encode_chunked=req.has_header('Transfer-encoding'))
1352 except OSError as err: # timeout error
-> 1353 raise URLError(err)
1354 r = h.getresponse()
1355 except:

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1108)>

Hello,

I quickly tried your code to download the dataset - I tried both in Jupyter Lab and a new Jovian notebook.

The code worked on my end:

from urllib.request import urlretrieve 
urlretrieve('https://hub.jovian.ai/wp-content/uploads/2020/09/italy-covid-daywise.csv', 'italy-covid-daywise.csv')

I get the following result:

There is a thread on Stack Overflow about this issue. You might want to check it out.
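Since the same code works on other machines, this looks like a local certificate setup problem rather than a problem with the URL. The traceback paths suggest a python.org installation on macOS, where running the bundled "Install Certificates.command" script is a commonly reported fix. Another option, sketched below, is to pass an explicit CA bundle when downloading. The helper names `make_context` and `download` are made up for this example, and the usage line assumes the third-party `certifi` package is installed.

```python
import shutil
import ssl
from urllib.request import urlopen


def make_context(cafile=None):
    """Build a certificate-verifying SSL context.

    If cafile is given, certificates are checked against that CA bundle
    instead of the system default store.
    """
    return ssl.create_default_context(cafile=cafile)


def download(url, filename, cafile=None):
    """Download url to filename, verifying HTTPS certificates."""
    context = make_context(cafile)
    with urlopen(url, context=context) as response, open(filename, 'wb') as out:
        shutil.copyfileobj(response, out)
    return filename


# Example usage (assumes `pip install certifi` has been run):
# import certifi
# download('https://hub.jovian.ai/wp-content/uploads/2020/09/italy-covid-daywise.csv',
#          'italy-covid-daywise.csv', cafile=certifi.where())
```

This keeps certificate verification switched on, which is generally safer than disabling verification just to silence the error.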