Created 4 years ago
Introduction
Spam detection is one of the most basic applications of lexical processing.
In this notebook, we will use the Spam SMS dataset downloaded from
https://www.kaggle.com/uciml/sms-spam-collection-dataset
We will look at text cleaning using NLTK. We will then use bag-of-words (BoW) and TF-IDF features with Naive Bayes to classify a message as "Spam" or "Ham".
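To make the plan concrete before touching the real data, here is a minimal sketch of the two feature representations feeding a Naive Bayes classifier. It assumes scikit-learn is installed and uses `CountVectorizer`, `TfidfVectorizer`, and `MultinomialNB`. The toy messages, their labels, and the variable names are invented for illustration and are not taken from the SMS dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy messages invented for illustration; not taken from the SMS dataset.
messages = [
    "free prize call now",
    "win free cash now",
    "are we meeting for lunch today",
    "see you at home tonight",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words: each message becomes a vector of raw word counts.
bow_model = make_pipeline(CountVectorizer(), MultinomialNB())
# TF-IDF: counts are reweighted so words common to many messages matter less.
tfidf_model = make_pipeline(TfidfVectorizer(), MultinomialNB())

bow_model.fit(messages, labels)
tfidf_model.fit(messages, labels)

# Classify two unseen toy messages with each model.
new_messages = ["free cash prize", "lunch at home today"]
bow_pred = bow_model.predict(new_messages)
tfidf_pred = tfidf_model.predict(new_messages)
print(bow_pred, tfidf_pred)
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary learned at training time tied to the model, so new messages are transformed the same way at prediction time.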
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_colwidth', None)  # None shows full cell text; -1 is rejected by pandas >= 1.0
pd.set_option('display.max_columns', None)
Analysis/Modelling
data = pd.read_csv("spam.csv", encoding='latin-1')
data.shape
(5572, 5)
data.head()