Scraping the Topics of GitHub

Importing the Libraries for Scraping the Website

import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from selenium import webdriver

# Page that lists the GitHub topics to scrape
url = 'https://github.com/topics'

# Dictionary that collects the scraped data; it is used to create a pandas DataFrame
topics = {"Topic Name": [],
          "Topic Description": [],
          "Topic URL": []}
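To make the purpose of the dictionary concrete, here is a minimal sketch of how a dictionary of equal-length lists becomes a pandas DataFrame. The name `topics_example` and the rows in it are placeholders invented for illustration, not scraped data.

```python
import pandas as pd

# Placeholder rows for illustration only; real values would come from the scraper.
topics_example = {
    "Topic Name": ["example-topic-1", "example-topic-2"],
    "Topic Description": ["First placeholder description.",
                          "Second placeholder description."],
    "Topic URL": ["https://github.com/topics/example-topic-1",
                  "https://github.com/topics/example-topic-2"],
}

# Each key becomes a column; all lists must have the same length.
topics_df = pd.DataFrame(topics_example)
print(topics_df.shape)
```

Appending to all three lists together for every scraped topic keeps the lists the same length, which the DataFrame constructor requires.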

Inspecting the website

  • Before scraping any website, it is essential to inspect and understand its DOM (Document Object Model). Knowing the structure lets us target the needed data precisely: we should check which tags (for example, a div or a p tag) contain the data we want to scrape.
  • Scraping a dynamic website is more complicated than scraping a static one, because its content is rendered by JavaScript after the initial page load. A few extra steps are therefore needed, such as loading the page in a browser driven by Selenium before parsing the HTML.
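The inspection step can be sketched on a small hand-written HTML fragment. The markup, the class names (`topic-card`, `topic-name`, `topic-desc`), and the helper `extract_topics` below are hypothetical and do not reflect GitHub's actual page structure. For the live page, the real tags and classes would have to be read from the browser's developer tools, and the HTML would likely come from Selenium's `driver.page_source` rather than a string literal.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; this is not GitHub's real DOM.
SAMPLE_HTML = """
<div class="topic-card">
  <a href="/topics/3d"><p class="topic-name">3D</p></a>
  <p class="topic-desc">Placeholder description for the first topic.</p>
</div>
<div class="topic-card">
  <a href="/topics/ajax"><p class="topic-name">Ajax</p></a>
  <p class="topic-desc">Placeholder description for the second topic.</p>
</div>
"""

BASE_URL = "https://github.com"


def extract_topics(html):
    """Collect name, description, and URL from every (assumed) topic card."""
    soup = BeautifulSoup(html, "html.parser")
    result = {"Topic Name": [], "Topic Description": [], "Topic URL": []}
    # Each topic is assumed to live in its own <div class="topic-card">.
    for card in soup.find_all("div", class_="topic-card"):
        name = card.find("p", class_="topic-name")
        desc = card.find("p", class_="topic-desc")
        link = card.find("a")
        result["Topic Name"].append(name.get_text(strip=True))
        result["Topic Description"].append(desc.get_text(strip=True))
        # The href is relative, so prefix it with the site's base URL.
        result["Topic URL"].append(BASE_URL + link["href"])
    return result


topics_demo = extract_topics(SAMPLE_HTML)
print(topics_demo["Topic Name"])
```

Keeping the tag and class lookups in one small function means that only this function needs to change once the real structure of the page has been inspected.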