Learn practical skills, build real-world projects, and advance your career

Scraping the data of world's top 500 companies by market capitalization

image

All data scientists and data analysts doing research or analysis need some data to begin with. The questions that come to mind are: "How is the data prepared, where can we get this much data, and how do we extract exactly the data we need?"

The answer is,

There are several ways to collect this data, and one of them is Web Scraping: "Web scraping is a technique used to extract structured data from websites containing unstructured data through an automated process."

Several tools are available for web scraping, and Python comes to your rescue by providing some great libraries that make the work easier and help automate it.

These libraries include: requests, lxml, BeautifulSoup, Scrapy, and Selenium.

In this notebook I've used the requests and BeautifulSoup libraries to scrape data from the website value.today.
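As a minimal sketch of how requests and BeautifulSoup work together, here is the parsing half applied to an inline HTML sample; the tag and class names are illustrative placeholders, not value.today's actual markup:

```python
from bs4 import BeautifulSoup

def parse_company_names(html):
    """Extract company names from HTML; assumes each name sits in
    an <h2 class="company-name"> tag (a made-up structure)."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True)
            for tag in soup.find_all("h2", class_="company-name")]

sample_html = """
<html><body>
  <h2 class="company-name">Apple</h2>
  <h2 class="company-name">Microsoft</h2>
</body></html>
"""
print(parse_company_names(sample_html))  # ['Apple', 'Microsoft']
```

In the notebook itself, the HTML string would come from `requests.get(url).text` rather than an inline sample.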

Value.Today is a software analytics company that provides information on the world's top corporate companies, company financial data, and world financial news.

The dataset created in this notebook will be useful for a comparative study of these 500 companies. You can easily study the rise and fall of these companies at six-month intervals since 2020.

Here is an outline of the full web-scraping journey:

  • First, import all the useful libraries we are going to use in this notebook
  • Parse the HTML data using the requests and BeautifulSoup libraries
  • Find all the useful tags and the data inside those tags
  • After collecting all the data, create a dataframe using the pandas library and save it to a CSV file
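The last step of the outline, turning the collected rows into a dataframe and saving a CSV file, can be sketched like this; the column names and values are placeholders for whatever fields the scrape actually collects:

```python
import pandas as pd

# Rows collected during scraping; field names and values are illustrative.
rows = [
    {"rank": 1, "name": "Apple", "market_cap_usd_billion": 2500},
    {"rank": 2, "name": "Microsoft", "market_cap_usd_billion": 2100},
]

# Build a dataframe from the list of dicts and write it out.
df = pd.DataFrame(rows)
df.to_csv("top_companies.csv", index=False)  # index=False drops the row-index column
print(df.shape)  # (2, 3)
```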
Note
  • Not all websites allow scraping of their data; some websites allow their data to be used for research and study purposes only.

  • Some websites provide their data in CSV format or let you access it through a REST API.

  • Before scraping any website, we should look for a terms and conditions page to see if there are explicit rules about scraping. If there are, we should follow them. If there are not, then it becomes more of a judgement call.

  • You can run the cells by clicking the Run button in the toolbar, or just press Shift + Enter.
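Besides the terms-and-conditions page, a site's robots.txt file states which paths crawlers may fetch, and Python's standard library can check it. The rules below are made-up examples, not value.today's actual policy:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration; against a real site
# you would call rp.set_url("https://example.com/robots.txt") and rp.read().
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "https://example.com/world-top-companies/"))  # True
print(rp.can_fetch("*", "https://example.com/private/data"))          # False
```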

!pip install jovian --upgrade --quiet
import jovian