
Stock Sentiment Analysis and Summarization via Web Scraping

CHECK OUT

source_code_github (Jupyter Notebook and Python file)

source_code_jovian (Jupyter Notebook)

Inspiration for the project:

The stock market is a huge gamble for some because they are not informed with the right data to make good decisions. People take a lot of time deciding which cafe they will walk into, but do not spend enough time on the stocks they invest in. Time is scarce, and this is where AI comes to the rescue: abstractive summarization and web scraping help us gather the information required to make the right decisions.


Big thumbs up to the Jovian team for introducing the concept of sentiment analysis in their free course assignments, which paved the way for the current pipelines of this project.


Thanks to Nicholas Reonette for his work on NLP code, which is highly customizable for any NLP project and forms the base of pipeline 2 of my project.

Structure of the project:
  1. Install and Import Dependencies
  2. Summarization Models
  • Type 1 Summarization Model ------------> (basic newspaper3k)
  • Type 2 Summarization Model ------------> (Financial Summarization Pegasus model)
  3. News and Sentiment Pipeline 1: Finviz website
  • 3a_1 Web scraping from the Finviz website using a ticker (output: CSV file)
  • 3a_2 Web scraping from the Finviz website using a ticker_list (output: CSV file)
  • 3a_3 View the stock as a DataFrame and perform sentiment analysis
  • 3a_4 Cleaning the data in the DataFrame
  • 3a_5 Sentiment analysis
  • 3a_6 Scraping articles
  4. News and Sentiment Pipeline 2: Stock news from Google & any stock news website
  • 4a_1 Search for stock news using Google and Yahoo Finance and strip out unwanted URLs
  • 4a_2 Searching and web scraping the final URLs
  • 4a_3 Summarizing
  • 4a_4 Adding sentiment analysis (using transformers)
  • 4a_5 Export to CSV
Module references:
  1. Web scraping modules:
  • Requests module:
    The requests module is a boon for web scrapers: it lets developers retrieve the HTML of a target webpage.
  • Beautiful Soup:
    If you are a web developer you will appreciate BeautifulSoup, for it breaks a complex HTML page down into a readable, scrapable soup object (bs4.BeautifulSoup).
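Together, the two modules cover fetch and parse. The sketch below shows the idea; the Finviz URL and the news-table markup are illustrative assumptions (not copied from the project code), and the page is inlined so the example runs offline:

```python
# Sketch: requests fetches the page, BeautifulSoup parses it.
from bs4 import BeautifulSoup
# import requests
# html = requests.get("https://finviz.com/quote.ashx?t=AMZN",
#                     headers={"User-Agent": "Mozilla/5.0"}).text

# Stand-in for the fetched page (assumed markup) so the sketch runs offline:
html = """
<table id="news-table">
  <tr><td>Jul-26-21 09:30AM</td><td><a href="#">Amazon beats earnings estimates</a></td></tr>
  <tr><td>10:15AM</td><td><a href="#">Shares drop after guidance cut</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
news_table = soup.find(id="news-table")
# Each row holds a timestamp cell and a linked headline
headlines = [row.a.get_text() for row in news_table.find_all("tr")]
print(headlines)
```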
  2. Standard modules:
  • Pandas module:
    A well-known tool in a data developer's toolkit for handling large volumes of data and drawing inferences via correlation, grouping, sorting and extended data analysis.
  • Numpy module:
    In simple words, it makes mathematical operations on data a piece of cake. Matrix and array calculations form the heart of this module; Pandas is built on it as well.
  • Matplotlib:
    Of course people want to see cool visuals, and visuals convey a lot more than text on the screen. Don't worry, Matplotlib has got your back.
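A toy illustration of the pandas step: scraped headlines and their sentiment scores in a DataFrame, aggregated per ticker. The headlines and scores here are made up for the sketch:

```python
import numpy as np
import pandas as pd

# Scraped headlines with (made-up) sentiment scores
df = pd.DataFrame({
    "ticker":   ["AMZN", "AMZN", "TSLA", "TSLA"],
    "headline": ["Earnings beat", "Guidance cut",
                 "Record deliveries", "Recall announced"],
    "compound": [0.62, -0.40, 0.55, -0.30],
})

# Mean sentiment per ticker -- the kind of grouped inference pandas makes easy
mean_sentiment = df.groupby("ticker")["compound"].mean()
print(np.round(mean_sentiment, 3))
```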
  3. Sentiment analyser modules:
  • NLTK:
    Thanks to the Jovian team for introducing me to the concept of sentiment analysis: processing text data and inferring emotions from it. When it comes to natural language processing, NLTK and Hugging Face Transformers sort of have an edge in the present market.
  • TextBlob:
    A lightweight sentiment analyser used in the first pipeline of my project.
  • Transformers:
    A sentiment analyzer from the Transformers arsenal.
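To make the idea concrete, here is a toy lexicon-based polarity scorer: a drastically simplified sketch of what analysers like NLTK's VADER or TextBlob do (count positive versus negative words and normalise). The word lists are illustrative only; the project itself uses the real libraries:

```python
# Illustrative (not real VADER/TextBlob) word lists
POSITIVE = {"gain", "gains", "beat", "beats", "surge", "record", "up"}
NEGATIVE = {"loss", "losses", "miss", "drop", "drops", "cut", "down"}

def toy_polarity(text: str) -> float:
    """Return a score in [-1, 1]; >0 leans positive, <0 negative."""
    words = text.lower().split()
    hits = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return hits / max(len(words), 1)

print(toy_polarity("Shares surge after earnings beat"))  # positive
print(toy_polarity("Stock drops on guidance cut"))       # negative
```

Real analysers add negation handling, intensifiers and punctuation cues on top of this basic counting idea.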
  4. Article summarization:
  • Newspaper3k:
    A lightweight article-extraction Python library whose nlp() step produces an extractive summary of the given text.
  • financial-summarization-pegasus:
    A deep-learning arsenal basically meant for NLP projects. In this project we will be using the Pegasus financial summarization model.
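The two summarizers differ in kind: a toy frequency-based extractive summarizer below sketches the idea behind selecting existing sentences (score each sentence by the frequency of its words and keep the top ones), whereas Pegasus is abstractive and generates new wording with a transformer model. The example article is made up:

```python
import re
from collections import Counter

def extractive_summary(text: str, n: int = 1) -> str:
    """Keep the n highest-scoring sentences, in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:n]
    return " ".join(s for s in sentences if s in top)

article = ("Amazon stock rose after earnings. The company beat revenue "
           "estimates. Analysts expect the stock to keep rising.")
print(extractive_summary(article))
```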

1. Install and Import Dependencies