Natural Language Processing using spaCy

  • Introduction to the spaCy library
  • Load the English language model
  • Find out stop words
  • Create an nlp object from a given document (sentence)
  • Count the frequency of each word using hash values (using count_by(ORTH) and nlp.vocab.strings)
  • Print each word count using a dictionary comprehension
  • Print the index of each token
  • Print various token attributes (e.g. is_alpha, shape_, is_stop, pos_, tag_)
  • Stemming (using nltk)
    • using PorterStemmer()
    • using SnowballStemmer()
  • Lemmatization
  • Display the dependency tree of words using displacy.render()
  • Get the meaning of any label or tag used by spaCy with explain()
  • Find named entities (NER: Named Entity Recognition) in a given doc
  • Display named entities in the doc using displacy.render()
  • Remove stop words/punctuation using the is_stop & is_punct attributes
  • Create a list of words after removing stop words, then rebuild the sentence
  • Sentence and Word Tokenization
  • Pipelining:
    • Get all the factory pipelining options available
    • How to disable preloaded pipeline components to speed up processing
    • Adding custom pipelines
  • Reading a file and displaying entities
  • Chunking
  • Computing word similarity
  • n-grams (using nltk and sklearn-CountVectorizer())
    • bi-grams
    • tri-grams
    • n-grams
In [ ]:
import spacy as sp
from spacy import displacy # used for data visualization
from spacy.lang.en.stop_words import STOP_WORDS
from spacy.attrs import ORTH # to be used for word count
In [ ]:
nlp = sp.load("en_core_web_sm") # ref: https://spacy.io/models/en
To download the English model (run once before loading):

!python -m spacy download en_core_web_sm

In [ ]:
txt = """Commercial writers know that most people don’t want to read 1,000 
words of closely-spaced text in order to see what they are writing about, so 
they also like to keep sentences and paragraphs short. 
They’ll even use lots of sub-headers so you can see what each paragraph is about 
before you read it."""
In [ ]:
obj = nlp(txt)
How to get all the tokens from the text
In [ ]:
for wd in obj:
    print(wd.text)
Commercial writers know that most people do n’t want to read 1,000 words of closely - spaced text in order to see what they are writing about , so they also like to keep sentences and paragraphs short . They ’ll even use lots of sub - headers so you can see what each paragraph is about before you read it .
Find out stop words
In [ ]:
for wd in obj:
    print((wd.text,wd.is_stop))
('Commercial', False) ('writers', False) ('know', False) ('that', True) ('most', True) ('people', False) ('do', True) ('n’t', True) ('want', False) ('to', True) ('read', False) ('1,000', False) ('\n', False) ('words', False) ('of', True) ('closely', False) ('-', False) ('spaced', False) ('text', False) ('in', True) ('order', False) ('to', True) ('see', True) ('what', True) ('they', True) ('are', True) ('writing', False) ('about', True) (',', False) ('so', True) ('\n', False) ('they', True) ('also', True) ('like', False) ('to', True) ('keep', True) ('sentences', False) ('and', True) ('paragraphs', False) ('short', False) ('.', False) ('\n', False) ('They', True) ('’ll', True) ('even', True) ('use', False) ('lots', False) ('of', True) ('sub', False) ('-', False) ('headers', False) ('so', True) ('you', True) ('can', True) ('see', True) ('what', True) ('each', True) ('paragraph', False) ('is', True) ('about', True) ('\n', False) ('before', True) ('you', True) ('read', False) ('it', True) ('.', False)
Split the nlp document into sentences
In [ ]:
for sent in obj.sents:
    print(sent.text)
Commercial writers know that most people don’t want to read 1,000 words of closely-spaced text in order to see what they are writing about, so they also like to keep sentences and paragraphs short. They’ll even use lots of sub-headers so you can see what each paragraph is about before you read it.
To get the individual words from each sentence
In [ ]:
for sent in obj.sents:
    for wd in sent:
        print(wd.text)
Commercial writers know that most people do n’t want to read 1,000 words of closely - spaced text in order to see what they are writing about , so they also like to keep sentences and paragraphs short . They ’ll even use lots of sub - headers so you can see what each paragraph is about before you read it .
In [ ]:
for sent in obj.sents:
    print(sent.text.split(" "))
['Commercial', 'writers', 'know', 'that', 'most', 'people', 'don’t', 'want', 'to', 'read', '1,000', '\nwords', 'of', 'closely-spaced', 'text', 'in', 'order', 'to', 'see', 'what', 'they', 'are', 'writing', 'about,', 'so', '\nthey', 'also', 'like', 'to', 'keep', 'sentences', 'and', 'paragraphs', 'short.', '\n'] ['They’ll', 'even', 'use', 'lots', 'of', 'sub-headers'] ['so', 'you', 'can', 'see', 'what', 'each', 'paragraph', 'is', 'about', '\nbefore', 'you', 'read', 'it.']
Count frequency of each word using hash values (using count_by(ORTH) and nlp.vocab.strings)
In [ ]:
obj.count_by(ORTH)
Out[0]:
{6679199052911211715: 1,
 357501887436434592: 1,
 7743033266031195906: 1,
 4380130941430378203: 1,
 11104729984170784471: 1,
 7593739049417968140: 1,
 2158845516055552166: 1,
 16712971838599463365: 1,
 7597692042947428029: 1,
 3791531372978436496: 3,
 11792590063656742891: 2,
 18254674181385630108: 1,
 962983613142996970: 4,
 10289140944597012527: 1,
 886050111519832510: 2,
 9696970313201087903: 1,
 9153284864653046197: 2,
 16159022834684645410: 1,
 15099781594404091470: 1,
 3002984154512732771: 1,
 13136985495629980461: 1,
 11925638236994514241: 2,
 5865838185239622912: 2,
 16875582379069451158: 2,
 5012629990875267006: 1,
 9147119992364589469: 1,
 942632335873952620: 2,
 2593208677638477497: 1,
 9781598966686434415: 2,
 12084876542534825196: 1,
 18194338103975822726: 1,
 9099225972875567996: 1,
 5257340109698985342: 1,
 2283656566040971221: 1,
 12626284911390218812: 1,
 3563698965725164461: 1,
 12646065887601541794: 2,
 14947529218328092544: 1,
 17092777669037358890: 1,
 17339226045912991082: 1,
 6873750497785110593: 1,
 17842523177576739921: 1,
 144868287865513341: 1,
 18375123465971211096: 1,
 7624161793554793053: 2,
 6635067063807956629: 1,
 5379624210385286023: 1,
 9194963477161408182: 1,
 3411606890003347522: 1,
 11320251846592927908: 1,
 10239237003504588839: 1}
In [ ]:
for k,v in obj.count_by(ORTH).items():
    print((nlp.vocab.strings[k],v))
('Commercial', 1) ('writers', 1) ('know', 1) ('that', 1) ('most', 1) ('people', 1) ('do', 1) ('n’t', 1) ('want', 1) ('to', 3) ('read', 2) ('1,000', 1) ('\n', 4) ('words', 1) ('of', 2) ('closely', 1) ('-', 2) ('spaced', 1) ('text', 1) ('in', 1) ('order', 1) ('see', 2) ('what', 2) ('they', 2) ('are', 1) ('writing', 1) ('about', 2) (',', 1) ('so', 2) ('also', 1) ('like', 1) ('keep', 1) ('sentences', 1) ('and', 1) ('paragraphs', 1) ('short', 1) ('.', 2) ('They', 1) ('’ll', 1) ('even', 1) ('use', 1) ('lots', 1) ('sub', 1) ('headers', 1) ('you', 2) ('can', 1) ('each', 1) ('paragraph', 1) ('is', 1) ('before', 1) ('it', 1)
Print each word count using a dictionary comprehension
In [ ]:
for wd in obj:
    print(wd.text)
    break
Commercial
In [ ]:
print({wd: txt.count(wd.text) for wd in obj})  # note: txt.count() matches substrings in the raw text, so some counts are inflated (e.g. "in" also matches "writing")
{Commercial: 1, writers: 1, know: 1, that: 1, most: 1, people: 1, do: 1, n’t: 1, want: 1, to: 3, read: 2, 1,000: 1, : 4, words: 1, of: 2, closely: 1, -: 2, spaced: 1, text: 1, in: 2, order: 1, to: 3, see: 2, what: 2, they: 2, are: 1, writing: 1, about: 2, ,: 2, so: 3, : 4, they: 2, also: 1, like: 1, to: 3, keep: 1, sentences: 1, and: 1, paragraphs: 1, short: 1, .: 2, : 4, They: 1, ’ll: 1, even: 1, use: 1, lots: 1, of: 2, sub: 1, -: 2, headers: 1, so: 3, you: 2, can: 1, see: 2, what: 2, each: 1, paragraph: 2, is: 1, about: 2, : 4, before: 1, you: 2, read: 2, it: 3, .: 2}
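Because txt.count() matches substrings, a token-based count is usually more reliable. A minimal sketch (an addition, not part of the original notebook) using collections.Counter over the tokens of the same obj:
In [ ]:
from collections import Counter

# count each token's surface form directly from the spaCy tokens, skipping whitespace tokens
word_counts = Counter(wd.text for wd in obj if not wd.is_space)
print(word_counts.most_common(10))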
Print the character offset (idx) of each token
In [ ]:
print({wd.text:wd.idx for wd in obj}) # idx : index of the given word
{'Commercial': 0, 'writers': 11, 'know': 19, 'that': 24, 'most': 29, 'people': 34, 'do': 41, 'n’t': 43, 'want': 47, 'to': 160, 'read': 294, '1,000': 60, '\n': 282, 'words': 67, 'of': 223, 'closely': 76, '-': 229, 'spaced': 84, 'text': 91, 'in': 96, 'order': 99, 'see': 249, 'what': 253, 'they': 145, 'are': 122, 'writing': 126, 'about': 276, ',': 139, 'so': 238, 'also': 150, 'like': 155, 'keep': 163, 'sentences': 168, 'and': 178, 'paragraphs': 182, 'short': 193, '.': 301, 'They': 201, '’ll': 205, 'even': 209, 'use': 214, 'lots': 218, 'sub': 226, 'headers': 230, 'you': 290, 'can': 245, 'each': 258, 'paragraph': 263, 'is': 273, 'before': 283, 'it': 299}
Print various token attributes (e.g. is_alpha, shape_, is_stop, pos_, tag_, is_punct)
In [ ]:
obj2 = nlp("CommercIAl writers")
for wd in obj2:
    print((wd.text,wd.is_alpha,wd.shape_))
('CommercIAl', True, 'XxxxxXXx') ('writers', True, 'xxxx')
In [ ]:
for wd in obj:
    print((wd.text,wd.is_alpha,wd.shape_,wd.pos_,wd.tag_,wd.is_punct))
('Commercial', True, 'Xxxxx', 'ADJ', 'JJ', False) ('writers', True, 'xxxx', 'NOUN', 'NNS', False) ('know', True, 'xxxx', 'VERB', 'VBP', False) ('that', True, 'xxxx', 'SCONJ', 'IN', False) ('most', True, 'xxxx', 'ADJ', 'JJS', False) ('people', True, 'xxxx', 'NOUN', 'NNS', False) ('do', True, 'xx', 'AUX', 'VBP', False) ('n’t', False, 'x’x', 'PART', 'RB', False) ('want', True, 'xxxx', 'VERB', 'VB', False) ('to', True, 'xx', 'PART', 'TO', False) ('read', True, 'xxxx', 'VERB', 'VB', False) ('1,000', False, 'd,ddd', 'NUM', 'CD', False) ('\n', False, '\n', 'SPACE', '_SP', False) ('words', True, 'xxxx', 'NOUN', 'NNS', False) ('of', True, 'xx', 'ADP', 'IN', False) ('closely', True, 'xxxx', 'ADV', 'RB', False) ('-', False, '-', 'PUNCT', 'HYPH', True) ('spaced', True, 'xxxx', 'VERB', 'VBN', False) ('text', True, 'xxxx', 'NOUN', 'NN', False) ('in', True, 'xx', 'ADP', 'IN', False) ('order', True, 'xxxx', 'NOUN', 'NN', False) ('to', True, 'xx', 'PART', 'TO', False) ('see', True, 'xxx', 'VERB', 'VB', False) ('what', True, 'xxxx', 'PRON', 'WP', False) ('they', True, 'xxxx', 'PRON', 'PRP', False) ('are', True, 'xxx', 'AUX', 'VBP', False) ('writing', True, 'xxxx', 'VERB', 'VBG', False) ('about', True, 'xxxx', 'ADP', 'IN', False) (',', False, ',', 'PUNCT', ',', True) ('so', True, 'xx', 'ADV', 'RB', False) ('\n', False, '\n', 'SPACE', '_SP', False) ('they', True, 'xxxx', 'PRON', 'PRP', False) ('also', True, 'xxxx', 'ADV', 'RB', False) ('like', True, 'xxxx', 'VERB', 'VBP', False) ('to', True, 'xx', 'PART', 'TO', False) ('keep', True, 'xxxx', 'VERB', 'VB', False) ('sentences', True, 'xxxx', 'NOUN', 'NNS', False) ('and', True, 'xxx', 'CCONJ', 'CC', False) ('paragraphs', True, 'xxxx', 'NOUN', 'NNS', False) ('short', True, 'xxxx', 'ADJ', 'JJ', False) ('.', False, '.', 'PUNCT', '.', True) ('\n', False, '\n', 'SPACE', '_SP', False) ('They', True, 'Xxxx', 'PRON', 'PRP', False) ('’ll', False, '’xx', 'AUX', 'MD', False) ('even', True, 'xxxx', 'ADV', 'RB', False) ('use', True, 'xxx', 'VERB', 'VB', False) ('lots', True, 'xxxx', 'NOUN', 'NNS', False) ('of', True, 'xx', 'ADP', 'IN', False) ('sub', True, 'xxx', 'NOUN', 'NN', False) ('-', False, '-', 'NOUN', 'NNS', True) ('headers', True, 'xxxx', 'NOUN', 'NNS', False) ('so', True, 'xx', 'SCONJ', 'IN', False) ('you', True, 'xxx', 'PRON', 'PRP', False) ('can', True, 'xxx', 'AUX', 'MD', False) ('see', True, 'xxx', 'VERB', 'VB', False) ('what', True, 'xxxx', 'PRON', 'WP', False) ('each', True, 'xxxx', 'DET', 'DT', False) ('paragraph', True, 'xxxx', 'NOUN', 'NN', False) ('is', True, 'xx', 'AUX', 'VBZ', False) ('about', True, 'xxxx', 'ADP', 'IN', False) ('\n', False, '\n', 'SPACE', '_SP', False) ('before', True, 'xxxx', 'ADP', 'IN', False) ('you', True, 'xxx', 'PRON', 'PRP', False) ('read', True, 'xxxx', 'VERB', 'VBP', False) ('it', True, 'xx', 'PRON', 'PRP', False) ('.', False, '.', 'PUNCT', '.', True)
Exercise: Filter stop words from the given text using list comprehension
In [ ]:
print([wd for wd in obj if wd.is_stop])
[that, most, do, n’t, to, of, in, to, see, what, they, are, about, so, they, also, to, keep, and, They, ’ll, even, of, so, you, can, see, what, each, is, about, before, you, it]
Exercise: Filter words excluding stop words from the given text using list comprehension
In [ ]:
print([wd for wd in obj if not wd.is_stop])
[Commercial, writers, know, people, want, read, 1,000, , words, closely, -, spaced, text, order, writing, ,, , like, sentences, paragraphs, short, ., , use, lots, sub, -, headers, paragraph, , read, .]
Stemming (using nltk)

Using PorterStemmer() and SnowballStemmer()

In [ ]:
from nltk.stem import SnowballStemmer
from nltk.stem import PorterStemmer
In [ ]:
sn_stemmer = SnowballStemmer("english")
po_stemmer = PorterStemmer()
In [ ]:
sent = nlp("give gave given giving gives giving")
for wd in sent:
    #print(wd.text)
    print((wd.text,sn_stemmer.stem(wd.text)))
    
('give', 'give') ('gave', 'gave') ('given', 'given') ('giving', 'give') ('gives', 'give') ('giving', 'give')
In [ ]:
sent = nlp("play plays played playing playable")
for wd in sent:
    #print(wd.text)
    print((wd.text,sn_stemmer.stem(wd.text)))
    
('play', 'play') ('plays', 'play') ('played', 'play') ('playing', 'play') ('playable', 'playabl')
Using PorterStemmer()
In [ ]:
sent = nlp("play plays played playing playable")
for wd in sent:
    #print(wd.text)
    print((wd.text,po_stemmer.stem(wd.text)))
    
('play', 'play') ('plays', 'play') ('played', 'play') ('playing', 'play') ('playable', 'playabl')
Lemmatization
In [ ]:
sent = nlp("play plays played playing playable")
# sent = nlp("give gave given giving gives giving")
# sent = nlp("go went gone")

for wd in sent:
    #print(wd.text)
    print((wd.text,wd.lemma_))
    
('play', 'play') ('plays', 'play') ('played', 'play') ('playing', 'play') ('playable', 'playable')
Display the dependency tree of words using displacy.render()
In [ ]:
obj3 = nlp("This is line1 and used for displacy purpose")
In [ ]:
displacy.render(obj3,jupyter=True)
How to get the meaning of any label or tag used by spaCy with explain()
In [ ]:
sp.explain("nsubj")
Out[0]:
'nominal subject'
In [ ]:
sp.explain("ADP")
Out[0]:
'adposition'
In [ ]:
sp.explain("prep")
Out[0]:
'prepositional modifier'
How to find named entities (NER: Named Entity Recognition) in a given doc
In [ ]:
obj4 = nlp("""The show's name has been variously translated as Chat with Beauties,[1] Chatting Beauties,[2] Beauties's Chatterbox, or Misuda (a shortened version of its Korean name).[3]

The show was hosted by Nam Hui-seok, a television personality and comedian. Later on, announcer Eom Ji-in joined as co-host, and eventually Lee Yun-seok and Seo Gyeong-seok became the final hosts. The song "Bring It All Back" by S Club 7 is played after the opening cut to the studio floor that follows the playing of the opening intro and the viewer advisory that it is a rated "15" program. Unlike most talk shows, Global Talk Show does not have a live studio audience and instead uses audience laughter and applause tracks as well as on-screen text and sound effects.

In 2009 the program came under attack, receiving widespread criticism by internet users after a student panelist labeled short men (men under 180cm) as "losers".[3] The program suffered a decline in popularity thereafter and was later cancelled.[citation needed] Nevertheless, the popularity of the program gave celebrity status within South Korea to some of the panelists.[4] A portion of the program was also published as a book featuring the same subject.""")
In [ ]:
for wd in obj4.ents:
    print((wd.text,wd.label_))
('Misuda', 'GPE') ('Korean', 'NORP') ('Nam Hui-seok', 'PERSON') ('Eom Ji', 'PERSON') ('Lee Yun-seok', 'PERSON') ('Seo Gyeong-seok', 'PERSON') ('Bring It All Back', 'WORK_OF_ART') ('S Club 7', 'ORG') ('15', 'CARDINAL') ('Global Talk Show', 'WORK_OF_ART') ('2009', 'DATE') ('180cm', 'QUANTITY') ('South Korea', 'GPE')
In [ ]:
[wd for wd in obj4.ents if wd.label_ == "PERSON"]
Out[0]:
[Nam Hui-seok, Eom Ji, Lee Yun-seok, Seo Gyeong-seok]
In [ ]:
len([wd for wd in obj4.ents if wd.label_ == "PERSON"])
Out[0]:
4
Display named entities in the doc using displacy.render()
In [ ]:
displacy.render(obj4,style="ent",jupyter=True)
In [ ]:
sp.explain("GPE")
Out[0]:
'Countries, cities, states'
In [ ]:
sp.explain("NORP")
Out[0]:
'Nationalities or religious or political groups'

Reading a file and displaying entities

In [ ]:
fh = open("obama_speech.txt")  # note: the handle stays open here; a with-block would close it automatically
In [ ]:
obj5 = nlp(fh.read())
In [ ]:
displacy.render(obj5,style="ent")
In [ ]:
# displacy.render(obj5,jupyter=True)
Remove stop words/punctuation using the is_stop & is_punct attributes
Already covered above.
Create a list of words after removing stop words, then rebuild the sentence
The stop-word filtering was covered above; a sketch that also rebuilds the sentence follows.
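A minimal sketch (an addition, reusing the obj doc from above) that drops stop words, punctuation and whitespace tokens, then joins the remaining tokens back into a sentence:
In [ ]:
filtered = [wd.text for wd in obj if not (wd.is_stop or wd.is_punct or wd.is_space)]
print(" ".join(filtered))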
Sentence and Word Tokenization
Already covered above.

Pipelining:

  • Get all the factory pipelining options available
  • How to disable preloaded pipeline components to speed up processing
  • Adding custom pipelines
In [ ]:
nlp.pipe_names
Out[0]:
['tagger', 'parser', 'ner']
In [ ]:
nlp.pipeline
Out[0]:
[('tagger', <spacy.pipeline.pipes.Tagger at 0x1a2d067d68>),
 ('parser', <spacy.pipeline.pipes.DependencyParser at 0x1a2daf52e8>),
 ('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x1a2daf5348>)]
In [ ]:
nlp.factories
Out[0]:
{'tokenizer': <function spacy.language.Language.<lambda>(nlp)>,
 'tensorizer': <function spacy.language.Language.<lambda>(nlp, **cfg)>,
 'tagger': <function spacy.language.Language.<lambda>(nlp, **cfg)>,
 'morphologizer': <function spacy.language.Language.<lambda>(nlp, **cfg)>,
 'parser': <function spacy.language.Language.<lambda>(nlp, **cfg)>,
 'ner': <function spacy.language.Language.<lambda>(nlp, **cfg)>,
 'entity_linker': <function spacy.language.Language.<lambda>(nlp, **cfg)>,
 'similarity': <function spacy.language.Language.<lambda>(nlp, **cfg)>,
 'textcat': <function spacy.language.Language.<lambda>(nlp, **cfg)>,
 'sentencizer': <function spacy.language.Language.<lambda>(nlp, **cfg)>,
 'merge_noun_chunks': <function spacy.language.Language.<lambda>(nlp, **cfg)>,
 'merge_entities': <function spacy.language.Language.<lambda>(nlp, **cfg)>,
 'merge_subtokens': <function spacy.language.Language.<lambda>(nlp, **cfg)>,
 'entity_ruler': <function spacy.language.Language.<lambda>(nlp, **cfg)>}
In [ ]:
ner_obj = nlp.disable_pipes("ner")
ner_obj
Out[0]:
[('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x1a2daf5348>)]
In [ ]:
nlp.pipe_names
Out[0]:
['tagger', 'parser']
In [ ]:
ner_obj.restore()
In [ ]:
nlp.pipe_names
Out[0]:
['tagger', 'parser', 'ner']
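Components that are never needed can also be excluded when the model is loaded, so they never run at all. A minimal sketch (an addition; the disable argument of spacy.load is a real option, the nlp_small name is just illustrative):
In [ ]:
nlp_small = sp.load("en_core_web_sm", disable=["parser", "ner"])  # only the tagger (plus the tokenizer) will run
nlp_small.pipe_names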
Adding a custom pipeline component
In [ ]:
def Upperizer(sentence):
    # custom component: print the upper-cased text, then return the Doc
    # so that any component placed after it still receives a Doc object
    print(sentence.text.upper())
    return sentence
In [ ]:
nlp.remove_pipe("Upperizer")  # remove the component if it was added in an earlier run (avoids a duplicate-name error)
Out[0]:
('Upperizer', <function __main__.Upperizer(sentence)>)
In [ ]:
nlp.add_pipe(Upperizer)
In [ ]:
nlp.pipe_names
Out[0]:
['tagger', 'parser', 'ner', 'Upperizer']
In [ ]:
tst = nlp("This is test line for my function.")
THIS IS TEST LINE FOR MY FUNCTION.

Chunking

In [ ]:
txt
Out[0]:
'Commercial writers know that most people don’t want to read 1,000 \nwords of closely-spaced text in order to see what they are writing about, so \nthey also like to keep sentences and paragraphs short. \nThey’ll even use lots of sub-headers so you can see what each paragraph is about \nbefore you read it.'
In [ ]:
for wd in obj.noun_chunks:
    print((wd.text,wd.root.text))
('Commercial writers', 'writers') ('most people', 'people') ('1,000 \nwords', 'words') ('closely-spaced text', 'text') ('order', 'order') ('what', 'what') ('they', 'they') ('they', 'they') ('sentences', 'sentences') ('They', 'They') ('lots', 'lots') ('sub-headers', 'headers') ('you', 'you') ('what', 'what') ('each paragraph', 'paragraph') ('you', 'you') ('it', 'it')
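Each noun chunk also exposes the syntactic role of its root token; a small sketch (an addition, using the same obj) that prints the root's dependency label and the head word it attaches to:
In [ ]:
for chunk in obj.noun_chunks:
    # chunk.root.dep_ is the dependency label of the chunk's root, chunk.root.head is the token it attaches to
    print((chunk.text, chunk.root.dep_, chunk.root.head.text))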

Computing word similarity

In [ ]:
from nltk.corpus import wordnet as wn
In [ ]:
wn.synsets("like")
Out[0]:
[Synset('like.n.01'),
 Synset('like.n.02'),
 Synset('wish.v.02'),
 Synset('like.v.02'),
 Synset('like.v.03'),
 Synset('like.v.04'),
 Synset('like.v.05'),
 Synset('like.a.01'),
 Synset('like.a.02'),
 Synset('alike.a.01'),
 Synset('comparable.s.02')]
In [ ]:
w1 = wn.synset("good.n.01")
w2 = wn.synset("good.n.01")
w1.wup_similarity(w2)
Out[0]:
1.0
In [ ]:
w1 = wn.synset("good.n.01")
w2 = wn.synset("better.n.01")
w1.wup_similarity(w2)
Out[0]:
0.6153846153846154
In [ ]:
w1 = wn.synset("dog.n.01")
w2 = wn.synset("cat.n.01")
In [ ]:
print(w1.wup_similarity(w2)*100)
85.71428571428571
In [ ]:
txt1 = "dog cat lion elephant"
In [ ]:
low1 = txt1.split(" ")
for wd1 in low1:
    w1 = wn.synsets(wd1)[0].name()   # name of the word's first synset, e.g. 'dog.n.01'
    ss1 = wn.synset(w1)              # look the Synset object up by that name
    for wd2 in low1:
        w2 = wn.synsets(wd2)[0].name()
        ss2 = wn.synset(w2)
        # Wu-Palmer similarity between the two synsets, scaled to a percentage
        print("Word similarity:", (wd1, wd2, ss1.wup_similarity(ss2)*100))
Note: the code above works because each word's first synset is looked up (wn.synsets(word)[0]) and the resulting Synset object is used for the comparison; an earlier version skipped that step. It prints output like this:

Word similarity: ('dog', 'dog', 92.85714285714286)
Word similarity: ('dog', 'cat', 85.71428571428571)
Word similarity: ('dog', 'lion', 82.75862068965517)
Word similarity: ('dog', 'elephant', 81.48148148148148)
Word similarity: ('cat', 'dog', 85.71428571428571)
Word similarity: ('cat', 'cat', 100.0)
Word similarity: ('cat', 'lion', 89.65517241379311)
Word similarity: ('cat', 'elephant', 81.48148148148148)
Word similarity: ('lion', 'dog', 82.75862068965517)
Word similarity: ('lion', 'cat', 89.65517241379311)
Word similarity: ('lion', 'lion', 100.0)
Word similarity: ('lion', 'elephant', 78.57142857142857)
Word similarity: ('elephant', 'dog', 81.48148148148148)
Word similarity: ('elephant', 'cat', 81.48148148148148)
Word similarity: ('elephant', 'lion', 78.57142857142857)
Word similarity: ('elephant', 'elephant', 100.0)

n-grams (using nltk and sklearn-CountVectorizer())

  • bi-grams
  • tri-grams
  • n-grams
In [ ]:
# Example documents for sentiment-style features:
# "I like food"        -> positive
# "I don't like food"  -> negative
In [ ]:
# With ngram_range=(1, 2) the features of "I like food" are
# the unigrams and bigrams: I, like, food, I like, like food
In [ ]:
# ngram_range=(1, 1)  -> I, like, food
# ngram_range=(1, 2)  -> I, like, food, I like, like food
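The outline also mentions sklearn's CountVectorizer; a minimal sketch (an addition, with a toy corpus taken from the comments above) showing how ngram_range controls which n-grams become features:
In [ ]:
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["I like food", "I don't like food"]   # toy corpus
cv = CountVectorizer(ngram_range=(1, 2))        # build unigram and bigram features
X = cv.fit_transform(corpus)
print(cv.get_feature_names())                   # learned vocabulary (get_feature_names_out() in newer scikit-learn)
print(X.toarray())                              # per-document n-gram counts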
In [ ]:
from nltk import bigrams,trigrams,ngrams
In [ ]:
sent1 = "I don't like food"
sow = sent1.split(" ")
sow
Out[0]:
['I', "don't", 'like', 'food']

bi-grams

In [ ]:
list(bigrams(sow))
Out[0]:
[('I', "don't"), ("don't", 'like'), ('like', 'food')]

How to rebuild phrases from the bi-grams

In [ ]:
for grams in list(bigrams(sow)):
    print(" ".join(grams))
I don't don't like like food

tri-grams

In [ ]:
list(trigrams(sow))
Out[0]:
[('I', "don't", 'like'), ("don't", 'like', 'food')]
In [ ]:
for grams in list(trigrams(sow)):
    print(" ".join(grams))
I don't like don't like food

n-grams

In [ ]:
list(ngrams(sow,4))
Out[0]:
[('I', "don't", 'like', 'food')]
In [ ]:
for grams in list(ngrams(sow,4)):
    print(" ".join(grams))
I don't like food
In [ ]:
for grams in list(ngrams(sow,1)):
    print(" ".join(grams))
I don't like food
In [ ]:
for grams in list(ngrams(sow,2)):
    print(" ".join(grams))
I don't don't like like food
In [ ]:
for grams in list(ngrams(sow,3)):
    print(" ".join(grams))
I don't like don't like food