Introduction

Spam detection is one of the most basic applications of lexical processing.
In this notebook, we will use the SMS Spam Collection dataset, downloaded from

https://www.kaggle.com/uciml/sms-spam-collection-dataset

We will look at text cleaning using NLTK. We will then use bag-of-words (BoW) and TF-IDF features with Naive Bayes to classify a message as "Spam" or "Ham".
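To make the idea concrete before touching the real data, here is a minimal sketch of bag-of-words counting combined with a multinomial Naive Bayes classifier, written with the standard library only. The four training messages, the `bag_of_words` and `predict` helpers, and the whitespace tokenizer are all assumptions made up for illustration; they are not taken from the dataset or from the rest of this notebook, and TF-IDF weighting is left out of the sketch.

```python
import math
from collections import Counter

# Toy messages invented for illustration; they are not from the SMS dataset.
train = [
    ("win a free prize now", "spam"),
    ("free entry claim your prize", "spam"),
    ("are we meeting for lunch", "ham"),
    ("call me when you are home", "ham"),
]

def bag_of_words(text):
    """Lowercase, split on whitespace, and count each token."""
    return Counter(text.lower().split())

# Aggregate token counts per class and count documents per class.
word_counts = {"spam": Counter(), "ham": Counter()}
doc_counts = Counter()
for text, label in train:
    word_counts[label].update(bag_of_words(text))
    doc_counts[label] += 1

vocab = set(word_counts["spam"]) | set(word_counts["ham"])

def predict(text):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        # Log prior: fraction of training messages with this label.
        score = math.log(doc_counts[label] / sum(doc_counts.values()))
        for token, n in bag_of_words(text).items():
            if token in vocab:  # ignore tokens never seen in training
                score += n * math.log((counts[token] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)
```

Calling `predict("free prize")` or `predict("are you home")` on this toy data shows how word counts alone can separate the two classes. On the real dataset the same idea would normally be applied through a library vectorizer and classifier rather than hand-written loops.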


import numpy as np 
import pandas as pd 


import os


import matplotlib.pyplot as plt
import seaborn as sns

# Show full cell contents; None replaces -1, which is deprecated since pandas 1.0
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns', None)

Analysis/Modelling

data = pd.read_csv("spam.csv", encoding='latin-1')
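The `encoding='latin-1'` argument is presumably needed because the file contains bytes that are not valid UTF-8, which is the default pandas tries. The sketch below illustrates the difference using a made-up byte string (an assumption for illustration, not a line from `spam.csv`).

```python
# Hypothetical bytes for illustration: "£100" where £ is the single byte 0xA3.
raw = b"Win \xa3100 now"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False  # 0xA3 on its own is not a valid UTF-8 sequence

# Latin-1 maps every byte 0x00-0xFF to a character, so decoding cannot fail.
text = raw.decode("latin-1")
```

Because Latin-1 never raises a decoding error, it is a convenient fallback, although any text that was really written in another encoding would come out with the wrong characters.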
data.shape
(5572, 5)
data.head()