Text Classification with Bag of Words - Natural Language Processing
"Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data." - Wikipedia
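Bag of Words: The bag-of-words (BOW) model is a representation that turns arbitrary text into fixed-length vectors. Each vector has one entry per word in a fixed vocabulary, and each entry counts how many times that word appears in the text. Word order is discarded.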
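To make the definition concrete, here is a minimal sketch of the idea in plain Python. It is an illustration only, not the implementation used later in this tutorial: the helper names build_vocabulary and vectorize and the two toy sentences are made up for this example, and tokenization is simplified to lowercasing and splitting on whitespace.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign each distinct lowercase token a fixed column index (hypothetical helper)."""
    vocab = {}
    for doc in documents:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def vectorize(document, vocab):
    """Return a count vector with exactly one entry per vocabulary word."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocab)
    for token, count in counts.items():
        if token in vocab:  # words outside the vocabulary are ignored
            vector[vocab[token]] = count
    return vector


# Toy documents, invented for illustration
docs = ["the cat sat on the mat", "the dog chased the cat"]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]

print(vocab)       # {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4, 'dog': 5, 'chased': 6}
print(vectors[0])  # [2, 1, 1, 1, 1, 0, 0]
print(vectors[1])  # [2, 1, 0, 0, 0, 1, 1]
```

Both sentences map to vectors of length 7, the vocabulary size, even though the sentences differ in length. That fixed length is what lets a standard ML model consume the text.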
Outline:
- Download and explore a real-world dataset
- Apply text preprocessing techniques
- Implement the bag of words model
- Train ML models for text classification
- Make predictions and submit to Kaggle
Dataset: https://www.kaggle.com/c/quora-insincere-questions-classification
Download and Explore the Data
Outline:
- Download the dataset from Kaggle to Colab
- Explore the data using Pandas
- Create a small working sample
Download the Data to Colab
Upload your kaggle.json file to Colab. Get it here: https://www.kaggle.com/docs/api#authentication
!ls .
kaggle.json sample_data
import os