Spam detection is one of the most basic applications of lexical processing. In this notebook, we will use the Spam SMS dataset downloaded from
https://www.kaggle.com/uciml/sms-spam-collection-dataset
We will look at text cleaning using NLTK, and use Bag-of-Words (BoW) and TF-IDF features with Naive Bayes to classify a message as "spam" or "ham".
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_colwidth', None)  # show full message text; -1 is deprecated in newer pandas
pd.set_option('display.max_columns', None)
data=pd.read_csv("spam.csv",encoding='latin-1')
data.shape
(5572, 5)
data.head()
data=data[['v1','v2']]
data.columns=['label','message']
data.head()
data['label'].value_counts()
ham 4825
spam 747
Name: label, dtype: int64
## getting the distribution in percentage
(data['label'].value_counts()/data.shape[0])*100
ham 86.593683
spam 13.406317
Name: label, dtype: float64
Only 13% of the data is spam, so this is an imbalanced classification problem. Before we build a model, let us explore the data a little more.
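Because of this imbalance, plain accuracy is a weak yardstick: a model that always predicts "ham" is already about 87% accurate. A quick sketch of that majority-class baseline:
# Trivial baseline: always predict the majority class ("ham").
# Any useful model must beat this on spam recall, not just overall accuracy.
baseline_accuracy = (data['label'] == 'ham').mean()
baseline_accuracy  # ~0.866, matching the 86.59% computed above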
data['len_message']=data['message'].apply(lambda x:len(x.split()))
data.head()
data['len_message'].describe()
count 5572.000000
mean 15.494436
std 11.329427
min 1.000000
25% 7.000000
50% 12.000000
75% 23.000000
max 171.000000
Name: len_message, dtype: float64
sns.kdeplot(data['len_message']).set_title(" Distribution of Length of Message")
Text(0.5,1,' Distribution of Length of Message')
data.loc[data['label']=="ham","len_message"].describe()
count 4825.000000
mean 14.200622
std 11.424511
min 1.000000
25% 7.000000
50% 11.000000
75% 19.000000
max 171.000000
Name: len_message, dtype: float64
data.loc[data['label']=="spam","len_message"].describe()
count 747.000000
mean 23.851406
std 5.811898
min 2.000000
25% 22.000000
50% 25.000000
75% 28.000000
max 35.000000
Name: len_message, dtype: float64
sns.kdeplot(data.loc[data['label']=="ham","len_message"],label='ham');
sns.kdeplot(data.loc[data['label']=="spam","len_message"],label='spam');
# beautifying the labels
plt.xlabel('Length of Message')
plt.ylabel('density')
plt.show()
Spam messages are, on average, longer than non-spam messages.
There is also one very long non-spam message.
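The two per-class describe() calls above can also be collapsed into a single groupby for a side-by-side comparison:
data.groupby('label')['len_message'].describe()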
Let us first tokenise the words and remove stopwords and punctuation.
data['message']
0 Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
1 Ok lar... Joking wif u oni...
2 Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's
3 U dun say so early hor... U c already then say...
4 Nah I don't think he goes to usf, he lives around here though
5 FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it still? Tb ok! XxX std chgs to send, å£1.50 to rcv
6 Even my brother is not like to speak with me. They treat me like aids patent.
7 As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been set as your callertune for all Callers. Press *9 to copy your friends Callertune
8 WINNER!! As a valued network customer you have been selected to receivea å£900 prize reward! To claim call 09061701461. Claim code KL341. Valid 12 hours only.
9 Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera for Free! Call The Mobile Update Co FREE on 08002986030
10 I'm gonna be home soon and i don't want to talk about this stuff anymore tonight, k? I've cried enough today.
11 SIX chances to win CASH! From 100 to 20,000 pounds txt> CSH11 and send to 87575. Cost 150p/day, 6days, 16+ TsandCs apply Reply HL 4 info
12 URGENT! You have won a 1 week FREE membership in our å£100,000 Prize Jackpot! Txt the word: CLAIM to No: 81010 T&C www.dbuk.net LCCLTD POBOX 4403LDNW1A7RW18
13 I've been searching for the right words to thank you for this breather. I promise i wont take your help for granted and will fulfil my promise. You have been wonderful and a blessing at all times.
14 I HAVE A DATE ON SUNDAY WITH WILL!!
15 XXXMobileMovieClub: To use your credit, click the WAP link in the next txt message or click here>> http://wap. xxxmobilemovieclub.com?n=QJKGIGHJJGCBL
16 Oh k...i'm watching here:)
17 Eh u remember how 2 spell his name... Yes i did. He v naughty make until i v wet.
18 Fine if thatåÕs the way u feel. ThatåÕs the way its gota b
19 England v Macedonia - dont miss the goals/team news. Txt ur national team to 87077 eg ENGLAND to 87077 Try:WALES, SCOTLAND 4txt/̼1.20 POBOXox36504W45WQ 16+
20 Is that seriously how you spell his name?
21 IÛ÷m going to try for 2 months ha ha only joking
22 So Ì_ pay first lar... Then when is da stock comin...
23 Aft i finish my lunch then i go str down lor. Ard 3 smth lor. U finish ur lunch already?
24 Ffffffffff. Alright no way I can meet up with you sooner?
25 Just forced myself to eat a slice. I'm really not hungry tho. This sucks. Mark is getting worried. He knows I'm sick when I turn down pizza. Lol
26 Lol your always so convincing.
27 Did you catch the bus ? Are you frying an egg ? Did you make a tea? Are you eating your mom's left over dinner ? Do you feel my Love ?
28 I'm back & we're packing the car now, I'll let you know if there's room
29 Ahhh. Work. I vaguely remember that! What does it feel like? Lol
...
5542 Armand says get your ass over to epsilon
5543 U still havent got urself a jacket ah?
5544 I'm taking derek & taylor to walmart, if I'm not back by the time you're done just leave the mouse on my desk and I'll text you when priscilla's ready
5545 Hi its in durban are you still on this number
5546 Ic. There are a lotta childporn cars then.
5547 Had your contract mobile 11 Mnths? Latest Motorola, Nokia etc. all FREE! Double Mins & Text on Orange tariffs. TEXT YES for callback, no to remove from records.
5548 No, I was trying it all weekend ;V
5549 You know, wot people wear. T shirts, jumpers, hat, belt, is all we know. We r at Cribbs
5550 Cool, what time you think you can get here?
5551 Wen did you get so spiritual and deep. That's great
5552 Have a safe trip to Nigeria. Wish you happiness and very soon company to share moments with
5553 Hahaha..use your brain dear
5554 Well keep in mind I've only got enough gas for one more round trip barring a sudden influx of cash
5555 Yeh. Indians was nice. Tho it did kane me off a bit he he. We shud go out 4 a drink sometime soon. Mite hav 2 go 2 da works 4 a laugh soon. Love Pete x x
5556 Yes i have. So that's why u texted. Pshew...missing you so much
5557 No. I meant the calculation is the same. That <#> units at <#> . This school is really expensive. Have you started practicing your accent. Because its important. And have you decided if you are doing 4years of dental school or if you'll just do the nmde exam.
5558 Sorry, I'll call later
5559 if you aren't here in the next <#> hours imma flip my shit
5560 Anything lor. Juz both of us lor.
5561 Get me out of this dump heap. My mom decided to come to lowes. BORING.
5562 Ok lor... Sony ericsson salesman... I ask shuhui then she say quite gd 2 use so i considering...
5563 Ard 6 like dat lor.
5564 Why don't you wait 'til at least wednesday to see if you get your .
5565 Huh y lei...
5566 REMINDER FROM O2: To get 2.50 pounds free call credit and details of great offers pls reply 2 this text with your valid name, house no and postcode
5567 This is the 2nd time we have tried 2 contact u. U have won the å£750 Pound prize. 2 claim is easy, call 087187272008 NOW1! Only 10p per minute. BT-national-rate.
5568 Will Ì_ b going to esplanade fr home?
5569 Pity, * was in mood for that. So...any other suggestions?
5570 The guy did some bitching but I acted like i'd be interested in buying something else next week and he gave it to us for free
5571 Rofl. Its true to its name
Name: message, Length: 5572, dtype: object
import string
from nltk.corpus import stopwords
from nltk import PorterStemmer as Stemmer
STOPWORDS=stopwords.words("english")
STOPWORDS
['i',
'me',
'my',
'myself',
'we',
'our',
'ours',
'ourselves',
'you',
"you're",
"you've",
"you'll",
"you'd",
'your',
'yours',
'yourself',
'yourselves',
'he',
'him',
'his',
'himself',
'she',
"she's",
'her',
'hers',
'herself',
'it',
"it's",
'its',
'itself',
'they',
'them',
'their',
'theirs',
'themselves',
'what',
'which',
'who',
'whom',
'this',
'that',
"that'll",
'these',
'those',
'am',
'is',
'are',
'was',
'were',
'be',
'been',
'being',
'have',
'has',
'had',
'having',
'do',
'does',
'did',
'doing',
'a',
'an',
'the',
'and',
'but',
'if',
'or',
'because',
'as',
'until',
'while',
'of',
'at',
'by',
'for',
'with',
'about',
'against',
'between',
'into',
'through',
'during',
'before',
'after',
'above',
'below',
'to',
'from',
'up',
'down',
'in',
'out',
'on',
'off',
'over',
'under',
'again',
'further',
'then',
'once',
'here',
'there',
'when',
'where',
'why',
'how',
'all',
'any',
'both',
'each',
'few',
'more',
'most',
'other',
'some',
'such',
'no',
'nor',
'not',
'only',
'own',
'same',
'so',
'than',
'too',
'very',
's',
't',
'can',
'will',
'just',
'don',
"don't",
'should',
"should've",
'now',
'd',
'll',
'm',
'o',
're',
've',
'y',
'ain',
'aren',
"aren't",
'couldn',
"couldn't",
'didn',
"didn't",
'doesn',
"doesn't",
'hadn',
"hadn't",
'hasn',
"hasn't",
'haven',
"haven't",
'isn',
"isn't",
'ma',
'mightn',
"mightn't",
'mustn',
"mustn't",
'needn',
"needn't",
'shan',
"shan't",
'shouldn',
"shouldn't",
'wasn',
"wasn't",
'weren',
"weren't",
'won',
"won't",
'wouldn',
"wouldn't"]
test_doc="Ok lar... Joking wif u oni..."
## Remove punctuation
test_doc_cleaned="".join([x for x in test_doc if x not in string.punctuation])
test_doc_cleaned
'Ok lar Joking wif u oni'
## Lower case all words
test_doc_cleaned=test_doc_cleaned.lower()
test_doc_cleaned
'ok lar joking wif u oni'
## Let us remove the stopwords
test_tokens=test_doc_cleaned.split(" ")
test_tokens=[token for token in test_tokens if token not in STOPWORDS]
test_tokens
['ok', 'lar', 'joking', 'wif', 'u', 'oni']
## Stem the words
from nltk.stem import PorterStemmer
ps = PorterStemmer()
test_doc_cleaned=" ".join([ps.stem(token) for token in test_tokens])
test_doc_cleaned
'ok lar joke wif u oni'
import string
import re
def clean_text(text):
    ps = PorterStemmer()
    # pad punctuation with spaces so it separates cleanly from adjacent words
    text = text.translate(str.maketrans({key: " {0} ".format(key) for key in string.punctuation}))
    # drop the punctuation characters themselves
    text_cleaned = "".join([x for x in text if x not in string.punctuation])
    # collapse the extra whitespace left behind
    text_cleaned = re.sub(' +', ' ', text_cleaned)
    # lower-case, tokenise, remove stopwords, and stem
    text_cleaned = text_cleaned.lower()
    tokens = text_cleaned.split(" ")
    tokens = [token for token in tokens if token not in STOPWORDS]
    text_cleaned = " ".join([ps.stem(token) for token in tokens])
    return text_cleaned
print(clean_text(test_doc))
ok lar joke wif u oni
data['cleaned_messages']=data['message'].apply(lambda x:clean_text(x))
data.head()
from wordcloud import WordCloud
wordcloud = WordCloud(height=2000, width=2000, stopwords=set(stopwords.words('english')), background_color='white')
wordcloud = wordcloud.generate(' '.join(data.loc[data['label']=='spam','cleaned_messages'].tolist()))
plt.imshow(wordcloud)
plt.title("Most common words in spam SMS")
plt.axis('off')
plt.show()
wordcloud = WordCloud(height=2000, width=2000, stopwords=set(stopwords.words('english')), background_color='white')
wordcloud = wordcloud.generate(' '.join(data.loc[data['label']=='ham','cleaned_messages'].tolist()))
plt.imshow(wordcloud)
plt.title("Most common words in Ham SMS")
plt.axis('off')
plt.show()
Spam messages read like marketing copy, with words like free, call, and text. Even a simple word cloud shows how differently words are distributed across ham and spam messages.
For building our model we will use only the text, but as an exercise, try creating more features, such as the length of the message, to separate spam from ham (a sketch follows the BoW section below).
from sklearn.feature_extraction.text import CountVectorizer
bow=CountVectorizer()
bow_data = bow.fit_transform(data['cleaned_messages'])
len(bow.vocabulary_) # Get number of words in vocabulary
7219
## Let us take an SMS and see how the BoW model has transformed it
text=data.iloc[2]['cleaned_messages']
text
'free entri 2 wkli comp win fa cup final tkt 21st may 2005 text fa 87121 receiv entri question std txt rate c appli 08452810075over18'
text_transform=bow.transform([text])
print(text_transform)
(0, 77) 1
(0, 401) 1
(0, 410) 1
(0, 780) 1
(0, 1099) 1
(0, 1929) 1
(0, 2094) 1
(0, 2548) 2
(0, 2668) 2
(0, 2764) 1
(0, 2879) 1
(0, 4176) 1
(0, 5220) 1
(0, 5265) 1
(0, 5304) 1
(0, 6030) 1
(0, 6328) 1
(0, 6442) 1
(0, 6601) 1
(0, 6986) 1
(0, 7018) 1
To understand this better, map each non-zero index back to its vocabulary term:
j = bow.transform([text]).toarray()[0]
print('index\tterm\tcount')
for i in range(len(j)):
    if j[i] != 0:
        print(i, bow.get_feature_names()[i], j[i], sep='\t')
index term count
77 08452810075over18 1
401 2005 1
410 21st 1
780 87121 1
1099 appli 1
1929 comp 1
2094 cup 1
2548 entri 2
2668 fa 2
2764 final 1
2879 free 1
4176 may 1
5220 question 1
5265 rate 1
5304 receiv 1
6030 std 1
6328 text 1
6442 tkt 1
6601 txt 1
6986 win 1
7018 wkli 1
bow_df=pd.DataFrame(bow_data.toarray(),columns= bow.get_feature_names())
bow_df['is_spam']=data['label']
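The dense bow_df above materialises the full 5572 x 7219 array, which is fine at this size but wasteful at scale; MultinomialNB also accepts the sparse bow_data directly. Picking up the earlier suggestion about extra features, here is a hypothetical sketch that stacks the message-length column onto the sparse BoW counts:
from scipy.sparse import hstack, csr_matrix
# Hypothetical: append len_message as one extra (non-negative) column,
# keeping everything sparse rather than going through the dense DataFrame
extra_features = csr_matrix(data[['len_message']].values)
bow_plus_len = hstack([bow_data, extra_features])
bow_plus_len.shape  # (5572, 7220)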
Before modelling, we need to split the data into train and test sets.
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(bow_df[bow.get_feature_names()], bow_df['is_spam'], test_size=0.20, random_state=42, stratify=bow_df['is_spam'])  # stratified split keeps the ham/spam distribution the same in train and test
spamFilter_nb=MultinomialNB()
spamFilter_nb.fit(x_train,y_train)
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
predictions = spamFilter_nb.predict(x_test)
from sklearn.metrics import classification_report
print(classification_report(y_test, predictions))
              precision    recall  f1-score   support

         ham       0.99      0.99      0.99       966
        spam       0.95      0.95      0.95       149

   micro avg       0.99      0.99      0.99      1115
   macro avg       0.97      0.97      0.97      1115
weighted avg       0.99      0.99      0.99      1115
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, predictions)
array([[959, 7],
[ 7, 142]])
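The spam row of the classification report follows directly from this matrix: 7 ham messages were flagged as spam (false positives) and 7 spam messages slipped through (false negatives), so spam precision and recall are both 142/149 ≈ 0.95. A quick cross-check:
cm = confusion_matrix(y_test, predictions)  # rows: true (ham, spam); columns: predicted
precision_spam = cm[1, 1] / cm[:, 1].sum()  # 142 / 149 ≈ 0.953
recall_spam = cm[1, 1] / cm[1, :].sum()     # 142 / 149 ≈ 0.953
print(precision_spam, recall_spam)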
test = x_test.copy()  # copy so we do not mutate x_test in place
test['is_spam'] = y_test
test['bow_prediction'] = predictions
wrong_index=test[(test['is_spam']=='ham') & (test['bow_prediction']=="spam")].index
wrong_index
Int64Index([2635, 2569, 3888, 1742, 1234, 4860, 5044], dtype='int64')
bow_misclassified = data.loc[wrong_index]
bow_misclassified
wrong_index=test[(test['is_spam']=='spam') & (test['bow_prediction']=="ham")].index
wrong_index
Int64Index([855, 3358, 5449, 1939, 2821, 750, 2246], dtype='int64')
bow_misclassified = data.loc[wrong_index]
bow_misclassified
A quick look at these shows that a message containing a phone number is more likely to be spam. We can use features like this as well to separate spam from ham.
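As a sketch, that phone-number cue can be turned into a feature; the regex below is a hypothetical choice, loosely matched to the shortcodes and numbers in the examples above:
# Hypothetical feature: flag messages containing a run of 5+ digits
# (phone numbers and shortcodes such as 87121 in the spam examples)
data['has_number'] = data['message'].str.contains(r'\d{5,}').astype(int)
data.groupby('label')['has_number'].mean()  # expect a far higher rate for spam
Finally, the introduction also mentioned TF-IDF. A minimal sketch of that variant keeps the same split and metrics and simply swaps CountVectorizer for TfidfVectorizer:
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer()
tfidf_data = tfidf.fit_transform(data['cleaned_messages'])
x_train_t, x_test_t, y_train_t, y_test_t = train_test_split(
    tfidf_data, data['label'], test_size=0.20, random_state=42, stratify=data['label'])
spamFilter_tfidf = MultinomialNB()
spamFilter_tfidf.fit(x_train_t, y_train_t)
print(classification_report(y_test_t, spamFilter_tfidf.predict(x_test_t)))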