
Natural Language Processing using spaCy

  • Introduction to the spaCy library
  • Load the English model
  • Find the stop words
  • Create an nlp object for a given document (sentence)
  • Count the frequency of each word using hash values (via count_by(ORTH) and nlp.vocab.strings)
  • Print each word's count using a dictionary comprehension
  • Print the index of each token
  • Print various attributes of the nlp object (e.g. is_alpha, shape_, is_stop, pos_, tag_)
  • Stemming (using nltk)
    • using PorterStemmer()
    • using SnowballStemmer()
  • Lemmatization
  • Display the dependency tree of words using displacy.render()
  • Get the meaning of any tag or label using spacy.explain()
  • Find named entities (NER: Named Entity Recognition) in a given doc
  • Display named entities in a doc using displacy.render()
  • Remove stop words/punctuation using the is_stop and is_punct attributes
  • Create a list of words after removing stop words, then rebuild the sentence
  • Sentence and word tokenization
  • Pipelining:
    • Get all the available factory pipeline components
    • Disable preloaded pipeline components to reduce processing time
    • Add custom pipeline components
  • Reading a file and displaying its entities
  • Chunking
  • Computing word similarity
  • n-grams (using nltk and sklearn's CountVectorizer())
    • bi-grams
    • tri-grams
    • n-grams
import spacy as sp
from spacy import displacy  # used for data visualization
from spacy.lang.en.stop_words import STOP_WORDS
from spacy.attrs import ORTH  # used for word counting
nlp = sp.load("en_core_web_sm")  # load the English model; ref: https://spacy.io/models/en