Customersentimentclassification - Notebook by anjana248 (anjana248)


Updated 4 years ago

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

Text Preprocessing Steps:

Inspired by another notebook

Only a few of these are required here, since the goal is topic modelling

Lower casing
Removal of punctuation
Removal of stop words
Removal of frequent words (taken care of by NMF)
Removal of rare words (taken care of by NMF)
Stemming/Lemmatization
Removal of emojis
Removal of emoticons
Conversion of emoticons to words
Removal of URLs
Removal of HTML tags
Chat words conversion
Spelling correction
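
A few of the steps listed above (lower casing, removal of URLs, HTML tags, punctuation and stop words) can be sketched with the standard library alone. This is only an illustrative sketch, not the notebook's actual pipeline: the function name `clean_text`, the regular expressions, and the tiny `STOP_WORDS` set are assumptions made for this example, and a real run would likely use a fuller stop-word list.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a fuller list (e.g. from NLTK or spaCy) would normally be used instead.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
              "my", "i", "it", "for", "on", "in"}

# Assumed patterns: simple approximations, not exhaustive matchers.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Apply a handful of the listed preprocessing steps to one string."""
    text = text.lower()                  # lower casing
    text = URL_RE.sub(" ", text)         # removal of URLs (before punctuation is stripped)
    text = HTML_TAG_RE.sub(" ", text)    # removal of HTML tags
    text = text.translate(PUNCT_TABLE)   # removal of punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # removal of stop words
    return " ".join(tokens)


print(clean_text("My phone is <b>BROKEN</b>!! See https://t.co/abc123 for details."))
# prints: phone broken see details
```

URLs and HTML tags are removed before punctuation because stripping punctuation first would break up the `://` and `<...>` markers that the patterns rely on.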

import pandas as pd

raw_data = pd.read_csv("../input/customer-support-on-twitter/twcs/twcs.csv")

sample_data = pd.read_csv("../input/customer-support-on-twitter/sample.csv")