
A Comprehensive Data Analysis on a WhatsApp Group Chat

Author: Tushar Nankani

Overview

  • Introduction
  • Data Retrieval & Preprocessing
  • Exploratory Data Analysis
  • Data Visualization
  • Data Interpretation
  • Summarizing the Inferences
  • Conclusion

Introduction:

WhatsApp is a cross-platform text and voice messaging application. With over 1.5 billion monthly active users, it is the most popular mobile messenger app worldwide.

  • I considered various datasets for this project, such as the Air Quality Index or the clichéd Covid-19 data.

  • But then I thought: why not analyse a WhatsApp group chat of college students and find out who is most active, who the ghosts are (the ones who do not reply), what my sleep schedule looks like, which emoji is used most, the sentiment score of each person, who swears the most, the most active times of the day, and whether the group uses phones during college teaching hours?

  • These would be some interesting insights for sure, more for me than for you, since the people in this chat are people I know personally.

Beginning: How Do I Export My Conversations? Where Do I Obtain the Data?

  • The first step is Data Retrieval & Preprocessing, that is, gathering the data. WhatsApp lets you export a chat as a .txt file.

  • Go to the chat that you want to export.

  • Tap on Options, then More, and then Export Chat.
  • I will be Exporting Without Media.
NOTE:
  • Without media: exports about 40k messages
  • With media: exports about 10k messages along with pictures/videos
  • While exporting data, avoid including media files: if the number of media files exceeds a certain limit, not all of them are exported.

Opening this .txt file up, you get a plain-text log in which each message starts with a timestamp, followed by the sender’s name and the message text.

Importing Necessary Libraries

We will be using:

  1. Regex (re) to extract and manipulate strings based on specific patterns.
  2. pandas for analysis.
  3. matplotlib and seaborn for visualization.
  4. emoji to deal with emojis.
  5. wordcloud for the most used words.
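
To illustrate the first item in the list, here is a minimal sketch of how re could pull the date, time, sender, and text out of one exported line. The line format in the pattern is an assumption (the export format varies with phone locale and WhatsApp version), the sample line is made up, and the names LINE_PATTERN and parse_line are hypothetical helpers, not part of the original notebook.

```python
import re

# Assumed line format (it varies by phone locale and WhatsApp version):
#   "dd/mm/yy, hh:mm - Sender Name: message text"
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), "   # date, e.g. 24/03/20
    r"(?P<time>\d{1,2}:\d{2}) - "             # 24-hour time, e.g. 21:15
    r"(?P<sender>[^:]+): "                    # sender name, up to the first colon
    r"(?P<message>.*)$"                       # the rest of the line
)


def parse_line(line):
    """Return a dict with date, time, sender, and message, or None if no match."""
    match = LINE_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# Made-up sample line, not taken from the real chat.
sample = "24/03/20, 21:15 - Alice: Is the assignment due tomorrow?"
parsed = parse_line(sample)
print(parsed)
```

Lines that do not fit the pattern, such as system notices like "Alice added Bob", return None, so they can be filtered out or handled separately before loading the rows into pandas.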