Natural Language Processing using spaCy
- Introduction to the spaCy library
- Load the English language model
- Find stop words
- Create an nlp object from a given document (sentence)
- Count the frequency of each word using hash values (via count_by(ORTH) and nlp.vocab.strings)
- Print each word's count using a dictionary comprehension
- Print the index of each token
- Print various token attributes (i.e. tok.is_alpha, tok.shape_, tok.is_stop, tok.pos_, tok.tag_)
- Stemming (using nltk)
  - using PorterStemmer()
  - using SnowballStemmer()
- Lemmatization
- Display a dependency-tree view of words using displacy.render()
- Get the meaning of any tag or label using spacy.explain()
- Find NER (Named Entity Recognition) entities in a given doc
- Display named entities in the doc using displacy.render()
- Remove stop words/punctuation using the is_stop and is_punct attributes
- Create a list of words after removing stop words, then rebuild the sentence
- Sentence and word tokenization
- Pipelining:
  - Get all the available factory pipeline components
  - Disable preloaded pipeline components to reduce processing time
  - Add custom pipeline components
- Reading a file and displaying its entities
- Chunking
- Computing word similarity
- n-grams (using nltk and sklearn's CountVectorizer())
  - bi-grams
  - tri-grams
  - n-grams
import spacy as sp
from spacy import displacy # used for data visualization
from spacy.lang.en.stop_words import STOP_WORDS
from spacy.attrs import ORTH # to be used for word count
nlp = sp.load("en_core_web_sm") # ref: https://spacy.io/models/en