
Top Repositories for GitHub Topics

Pick a website and describe your objective

  • Browse through different sites and pick one to scrape. Check the "Project Ideas" section for inspiration.
  • Identify the information you'd like to scrape from the site. Decide the format of the output CSV file.
  • Summarize your project idea and outline your strategy in a Jupyter notebook under a "Project Outline" heading. Use the "New" button above to open a blank Jupyter notebook.

Project Outline:

  • We're going to scrape https://github.com/topics
  • We'll get a list of topics. For each topic, we'll get the topic title, topic page URL, and topic description
  • For each topic, we'll get the top 25 repositories in the topic from the topic page
  • For each repository, we'll grab the repo name, username, stars, and repo URL
  • For each topic, we'll create a CSV file in the following format:
      Repo Name,Username,Stars,Repo URL
      three.js,mrdoob,69700,https://github.com/mrdoob/three.js
      libgdx,libgdx,18300,https://github.com/libgdx/libgdx
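The per-topic CSV step can be sketched with Python's built-in csv module. This is a minimal sketch, not the project's actual code: the dict keys (name, username, stars, url) and the helper names parse_star_count, topic_rows and write_topic_csv are hypothetical. It also assumes star counts are scraped as abbreviated text such as "69.7k", which would explain rounded values like 69700 in the sample above.

```python
import csv

# Header row matching the CSV format in the outline.
CSV_HEADER = ["Repo Name", "Username", "Stars", "Repo URL"]


def parse_star_count(stars_text):
    """Convert a displayed star count such as "69.7k" or "1,234" to an int.

    Assumption: counts appear either as plain numbers or with a "k"
    (thousands) suffix; other suffixes are not handled here.
    """
    stars_text = stars_text.strip().replace(",", "")
    if stars_text.lower().endswith("k"):
        # round() absorbs float error, e.g. 69.7 * 1000 -> 69700
        return round(float(stars_text[:-1]) * 1000)
    return int(stars_text)


def topic_rows(repos):
    """Build the CSV rows (header first) for one topic.

    `repos` is a list of dicts with the hypothetical keys
    "name", "username", "stars" and "url".
    """
    rows = [CSV_HEADER]
    for repo in repos:
        rows.append([repo["name"], repo["username"], repo["stars"], repo["url"]])
    return rows


def write_topic_csv(path, repos):
    """Write one topic's repositories to `path` as a CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(topic_rows(repos))
```

Keeping topic_rows separate from the file I/O makes the row format easy to check without touching the disk; write_topic_csv("3d.csv", repos) would then produce one file per topic as the outline describes.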

Use the requests library to download web pages

  • Inspect the website's HTML source code and identify the right URLs to download.
  • Download and save web pages locally using the requests library.
  • Create a function to automate downloading for different topics/search queries.
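A download helper along these lines could cover the last two steps. It is a sketch under assumptions: the URL pattern https://github.com/topics/<slug>, the pages/ output directory, the 10-second timeout and the function names topic_page_url, download_page and download_topic_page are illustrative choices, not taken from the original project.

```python
import os

import requests

# Assumption: each topic page lives at https://github.com/topics/<topic-slug>.
BASE_URL = "https://github.com/topics"


def topic_page_url(topic_slug):
    """Build the URL of a topic page from its slug (hypothetical helper)."""
    return f"{BASE_URL}/{topic_slug}"


def download_page(url, path):
    """Download `url` with requests, save the HTML to `path`, and return it.

    raise_for_status() raises requests.HTTPError on a 4xx/5xx response,
    so an error page is never silently saved as if it were real data.
    """
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    with open(path, "w", encoding="utf-8") as f:
        f.write(response.text)
    return response.text


def download_topic_page(topic_slug, out_dir="pages"):
    """Download one topic page to <out_dir>/<topic_slug>.html; return the path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{topic_slug}.html")
    download_page(topic_page_url(topic_slug), path)
    return path
```

For example, download_topic_page("3d") would save the topic page as pages/3d.html, and download_page("https://github.com/topics", "topics.html") would save the topics index itself. Saving pages locally means the parsing code can be developed against the files without re-requesting GitHub on every run.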