
Project: WhatsApp Message Analysis

Write an introduction to your project here: describe the dataset, where you got it from, what you're trying to do with it, and which tools and techniques you're using. You can also mention the course and what you've learned from it.

As a first step, let's upload our Jupyter notebook to Jovian.ml.

In [2]:
!pip install jovian --upgrade --quiet
!pip install numpy --upgrade --quiet
!pip install pandas --upgrade --quiet
!pip install matplotlib --upgrade --quiet
!pip install seaborn --upgrade --quiet
In [3]:
project_name = "whatsapp-chat-analysis-course-project-try"
In [4]:
import jovian
In [7]:
jovian.commit(project=project_name)
[jovian] Attempting to save notebook..
[jovian] Please enter your API key ( from https://jovian.ml/ ):
API KEY: ········
[jovian] Updating notebook "edsenmichaelcy/whatsapp-chat-analysis-course-project-try" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ml/edsenmichaelcy/whatsapp-chat-analysis-course-project-try

Data Preparation and Cleaning

In [5]:
import os
import pandas as pd
import re
import datetime as time
import jovian
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [6]:
whatsapp_df = pd.read_fwf('Chat.txt', header = None)

whatsapp_df
Out[6]:
In [7]:
whatsapp_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23330 entries, 0 to 23329
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       23177 non-null  object
 1   1       23087 non-null  object
 2   2       788 non-null    object
dtypes: object(3)
memory usage: 546.9+ KB

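Above, we used the info() method provided by pandas to inspect the data types in the dataframe. As the output shows, read_fwf split the chat log into three unnamed columns of type object, so nothing has been parsed yet. We still need to do some cleaning, such as converting the dates into proper datetime values and dealing with the "Media omitted" entries.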

In [8]:
whatsapp_df.shape
Out[8]:
(23330, 3)
In [9]:
def txtTodf(txt_file):
    '''Convert WhatsApp chat log text file to a Pandas dataframe.'''
    
    # every message starts with a dd/mm/yyyy date; the lazy match plus the
    # lookahead folds continuation lines into the message they belong to
    pat = re.compile(r'^(\d\d/\d\d/\d\d\d\d.*?)(?=^\d\d/\d\d/\d\d\d\d|\Z)', re.S | re.M)
    with open(txt_file, encoding='utf-8') as file:
        data = [m.group(1).strip().replace('\n', ' ') for m in pat.finditer(file.read())]

    user = []
    message = []
    datetime = []
    
    for row in data:

        # timestamp is before the first dash
        datetime.append(row.split(' - ')[0])

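        # sender sits between the am/pm marker, the dash and the next colon
        try:
            s = re.search(r'm - (.*?):', row).group(1)
            user.append(s)
        except AttributeError:
            # re.search returned None: a system event with no sender
            user.append('')

        # message content is after the first colon
        try:
            message.append(row.split(': ', 1)[1])
        except IndexError:
            # no ': ' separator in this row, so there is no message body
            message.append('')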

    df = pd.DataFrame(zip(datetime, user, message), columns=['datetime', 'user', 'message'])
    df['datetime'] = pd.to_datetime(df.datetime, format='%d/%m/%Y, %I:%M %p')

    # remove events not associated with a sender
    df = df[df.user != ''].reset_index(drop=True)
    
    return df

whatsapp_df = txtTodf('Chat.txt')
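To see what the regular expression in txtTodf does, here is a minimal sketch that runs an equivalent pattern on a small in-memory string instead of Chat.txt. The sample chat, the names Alice and Bob, and the variable names are made up for illustration; only the pattern and the sender search follow the function above.

```python
import re

# Made-up chat excerpt: one two-line message, one normal message,
# and one system event that has no sender.
sample = (
    "25/01/2020, 9:05 pm - Alice: first line\n"
    "second line of the same message\n"
    "25/01/2020, 9:06 pm - Bob: ok\n"
    "25/01/2020, 9:07 pm - Alice left\n"
)

# A row starts at a line beginning with dd/mm/yyyy and runs lazily until
# the next such line (lookahead) or the end of the text (\Z).
pat = re.compile(r'^(\d\d/\d\d/\d\d\d\d.*?)(?=^\d\d/\d\d/\d\d\d\d|\Z)', re.S | re.M)
rows = [m.group(1).strip().replace('\n', ' ') for m in pat.finditer(sample)]

# Same sender search as in txtTodf; None means the row is a system event.
senders = []
for row in rows:
    match = re.search(r'm - (.*?):', row)
    senders.append(match.group(1) if match else '')

for row, sender in zip(rows, senders):
    print(repr(sender), '|', row)
```

The two-line message comes back as a single row, and the "Alice left" event gets an empty sender, which is why txtTodf can drop such rows with df.user != ''.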
In [10]:
whatsapp_df.head(20)
Out[10]:
In [11]:
whatsapp_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22701 entries, 0 to 22700
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   datetime  22701 non-null  datetime64[ns]
 1   user      22701 non-null  object
 2   message   22701 non-null  object
dtypes: datetime64[ns](1), object(2)
memory usage: 532.2+ KB
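The datetime column is now datetime64[ns] because txtTodf parses each timestamp with the format '%d/%m/%Y, %I:%M %p'. As a rough illustration of how that format reads a timestamp, the sketch below applies the same directives with the standard library's datetime.strptime (pd.to_datetime accepts the same strftime-style directives). The two timestamp strings are made up, and the example assumes an English locale for the am/pm marker.

```python
from datetime import datetime

# Format string copied from txtTodf: day/month/year, 12-hour clock, am/pm.
FMT = '%d/%m/%Y, %I:%M %p'

# Made-up timestamps written the way the export prefixes each message.
evening = datetime.strptime('25/01/2020, 9:05 pm', FMT)

# Day comes first, so 03/04 is 3 April, not 4 March; 12:00 am is midnight.
midnight = datetime.strptime('03/04/2020, 12:00 am', FMT)

print(evening.isoformat())
print(midnight.isoformat())
```

Spelling out the format explicitly avoids the day/month ambiguity that automatic date inference can run into with dates such as 03/04/2020.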
In [ ]:
jovian.commit(project=project_name)
[jovian] Attempting to save notebook..

Cleaning the image data

In [27]:
plt.imshow(whatsapp_df.img1[0])
plt.show()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-27-2bfed925f583> in <module>
----> 1 plt.imshow(whatsapp_df.img1[0])
      2 plt.show()

/srv/conda/envs/notebook/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5134             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5135                 return self[name]
-> 5136             return object.__getattribute__(self, name)
   5137 
   5138     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'img1'
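The AttributeError occurs because the dataframe built by txtTodf only has the columns datetime, user and message, so there is no img1 column to display. In a text-only export, images are not stored in the chat file; they appear as placeholder messages. The sketch below shows one possible way to separate those rows. It assumes the placeholder text is exactly '<Media omitted>' (check this against your own export), and it uses a tiny made-up dataframe, sample_df, as a stand-in for whatsapp_df.

```python
import pandas as pd

# Tiny stand-in for whatsapp_df; the rows are made up for illustration.
sample_df = pd.DataFrame({
    'user': ['Alice', 'Bob', 'Alice'],
    'message': ['hello', '<Media omitted>', 'see you'],
})

# Assumed placeholder text; verify it against your own chat export.
MEDIA_PLACEHOLDER = '<Media omitted>'

# Boolean mask marking the rows that only stand in for an image or video.
is_media = sample_df['message'].str.strip() == MEDIA_PLACEHOLDER

media_df = sample_df[is_media]                          # media placeholders
text_df = sample_df[~is_media].reset_index(drop=True)   # real text messages

print(len(media_df), 'media rows,', len(text_df), 'text rows')
```

Keeping media_df around instead of discarding it makes it possible to count how many media messages each user sent, while text_df can be used for any analysis of the message text.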