
Natural Language Processing

  • Introduction to NLP
  • Understanding NLP-based Python libraries and their installation
    • nltk
    • spacy
  • Text data wrangling using "re" alone, without any NLP libraries, to understand what actually happens in the background
    when those libraries are used
    • Remove unwanted characters
    • Remove numbers and extra spaces
  • Creating a document (i.e., a list of sentences)
  • Creating paragraph/sentence tokens from a large text
  • Creating word tokens
  • Getting the frequency distribution of the words and visualizing it
    • See the impact of the following:
      • Without converting the text to lowercase
      • After converting the text to lowercase
  • Comparing the number of stop words in NLTK, spaCy, and scikit-learn, and selecting the best list
  • Finding and removing stop words, then visualizing the frequency distribution again
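As a rough preview of the "re"-only wrangling steps in the outline, the sketch below splits a short text into sentence tokens, strips unwanted characters and numbers, collapses extra spaces, and produces word tokens. The `sample` string and the `clean` helper are hypothetical names introduced here for illustration, and the ASCII-only character class is an assumption that would also strip letters such as "æ", so it may need widening for real text.

```python
import re

# Hypothetical sample text, used only for this illustration.
sample = "Newton (born 1642) wrote   Principia in 1687!  It changed physics."

# Rough sentence tokens: split on whitespace that follows ., ! or ?
sentences = re.split(r"(?<=[.!?])\s+", sample.strip())

def clean(text):
    """Drop digits and punctuation, then collapse runs of whitespace."""
    text = re.sub(r"[^A-Za-z\s]", " ", text)  # keep ASCII letters and whitespace only
    text = re.sub(r"\s+", " ", text)          # collapse repeated whitespace
    return text.strip()

cleaned = [clean(s) for s in sentences]        # the "document": a list of sentences
word_tokens = [s.split() for s in cleaned]     # word tokens per sentence

print(cleaned)
print(word_tokens)
```

A simple punctuation-based split like this will break on abbreviations such as "Dr.", which is one reason the tokenizers in nltk and spacy are usually preferred for real text.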
import nltk as nl
import re
import seaborn as sns
import matplotlib.pyplot as plt
from nltk.corpus import stopwords #stopwords.words("english")
from spacy.lang.en.stop_words import STOP_WORDS # list of stopwords
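from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS # frozenset of stopwords (the older sklearn.feature_extraction.stop_words module is not available in recent scikit-learn)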
# For FreqDistribution: nl.FreqDist(<list of words>)
a = """Sir Isaac Newton PRS (25 December 1642 – 20 March 1726/27[a]) was an English mathematician, physicist, astronomer, theologian, and author (described in his own day as a "natural philosopher") who is widely recognised as one of the most influential scientists of all time, and a key figure in the scientific revolution. 
His book Philosophiæ Naturalis Principia Mathematica (Mathematical Principles of Natural Philosophy), first published in 1687, laid the foundations of classical mechanics. Newton also made seminal contributions to optics, and shares credit with Gottfried Wilhelm Leibniz for developing the infinitesimal calculus.

In Principia, sir Newton formulated the laws of motion and universal gravitation that formed the dominant scientific viewpoint until it was superseded by the theory of relativity. 
Newton used his mathematical description of gravity to prove Kepler's laws of planetary motion, account for tides, the trajectories of comets, the precession of the equinoxes and other phenomena, eradicating doubt about the Solar System's heliocentricity. 
He demonstrated that the motion of objects on Earth and celestial bodies could be accounted for by the same principles. Newton's inference that the Earth is an oblate spheroid was later confirmed by the geodetic measurements of Maupertuis, La Condamine, and others, convincing most European scientists of the superiority of Newtonian mechanics over earlier systems."""
print(a)
Sir Isaac Newton PRS (25 December 1642 – 20 March 1726/27[a]) was an English mathematician, physicist, astronomer, theologian, and author (described in his own day as a "natural philosopher") who is widely recognised as one of the most influential scientists of all time, and a key figure in the scientific revolution. His book Philosophiæ Naturalis Principia Mathematica (Mathematical Principles of Natural Philosophy), first published in 1687, laid the foundations of classical mechanics. Newton also made seminal contributions to optics, and shares credit with Gottfried Wilhelm Leibniz for developing the infinitesimal calculus. In Principia, sir Newton formulated the laws of motion and universal gravitation that formed the dominant scientific viewpoint until it was superseded by the theory of relativity. Newton used his mathematical description of gravity to prove Kepler's laws of planetary motion, account for tides, the trajectories of comets, the precession of the equinoxes and other phenomena, eradicating doubt about the Solar System's heliocentricity. He demonstrated that the motion of objects on Earth and celestial bodies could be accounted for by the same principles. Newton's inference that the Earth is an oblate spheroid was later confirmed by the geodetic measurements of Maupertuis, La Condamine, and others, convincing most European scientists of the superiority of Newtonian mechanics over earlier systems.
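The passage contains both "Sir" and "sir", which makes it a convenient case for seeing how lowercasing changes a frequency distribution. The sketch below is a minimal illustration under a few assumptions: `excerpt` is an abridged stand-in for `a`, `word_tokens` is a hypothetical regex-based helper, and `collections.Counter` from the standard library is swapped in for `nl.FreqDist` so the example runs without NLTK.

```python
import re
from collections import Counter

# Abridged stand-in for the passage stored in `a` (an assumption for brevity).
excerpt = ("Sir Isaac Newton was an English mathematician. "
           "In Principia, sir Newton formulated the laws of motion.")

def word_tokens(text):
    """Return runs of ASCII letters as word tokens (regex only, no NLP library)."""
    return re.findall(r"[A-Za-z]+", text)

freq_original = Counter(word_tokens(excerpt))         # case preserved
freq_lowered = Counter(word_tokens(excerpt.lower()))  # case folded first

print(freq_original["Sir"], freq_original["sir"])  # counted as two separate words
print(freq_lowered["sir"])                         # merged into a single entry
print(len(freq_original), len(freq_lowered))       # vocabulary size before and after
```

Without lowercasing, the same word is split across several entries, so its count is understated and the vocabulary is larger than it needs to be. The same comparison should carry over to `nl.FreqDist`, which exposes a similar counting interface.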