CITYPOPULATION WEB SCRAPER

By: Rohan Dawar

In this project, I will scrape the website citypopulation.de with Beautiful Soup to create a CSV file of population data for sub-national entities.

###What is scraping?

  • Web scraping is the process of extracting content and data from a website through its HTML code.

###What is https://www.citypopulation.de/ ?

  • This website provides up-to-date population and area data for all countries of the world, including their territories and subdivisions.

###What is Beautiful Soup?

  • Beautiful Soup (AKA BS4) is a Python library for pulling data out of HTML pages.
  • In this project, I will use BS4 to find the country pages within each continent and to parse the population data for each country's subdivisions.
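To make the two uses above concrete, here is a minimal sketch of BS4 pulling links and table rows out of HTML. The HTML snippet, its `id` values, and the names "Exampleland" and "Region A/B" are made up for illustration; the real markup on citypopulation.de is likely structured differently, so treat this as a pattern rather than the project's actual parsing code.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (placeholder data only).
SAMPLE_HTML = """
<html><body>
  <ul id="countries">
    <li><a href="/en/exampleland/">Exampleland</a></li>
    <li><a href="/en/samplestan/">Samplestan</a></li>
  </ul>
  <table id="subdivisions">
    <tr><th>Name</th><th>Population</th></tr>
    <tr><td>Region A</td><td>1,000</td></tr>
    <tr><td>Region B</td><td>2,500</td></tr>
  </table>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Collect (link text, href) pairs for every anchor inside the country list.
country_list = soup.find("ul", id="countries")
country_links = [(a.get_text(strip=True), a["href"]) for a in country_list.find_all("a")]

# Read the table rows, skipping the header row, and convert "1,000" to an int.
rows = []
for tr in soup.find("table", id="subdivisions").find_all("tr")[1:]:
    name, population = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append((name, int(population.replace(",", ""))))

print(country_links)
print(rows)
```

The same two steps (find the links, then read the table cells) are what the scraper needs for each continent and country page, once the real element names are known.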

###What are sub-national entities?

  • Sub-national entities are any administrative or census division within a country such as provinces, states, territories, municipalities, etc.
  • citypopulation.de tries to keep up-to-date population data for all national and sub-national entities on Earth.

###Outline:

  1. Download the webpage using requests
  2. Test Parsing
  3. Pandas Dataframe
  4. Example Analysis
  5. Exporting to CSV
  6. Summary
  7. Future Work
  8. References

Download The Webpage Using Requests

import requests
from bs4 import BeautifulSoup

# Root URL of the site; page paths are appended to this
base_url = 'https://www.citypopulation.de'

# List of continents we want to parse:
continents = ['Africa', 'Asia', 'Europe', 'America', 'Oceania']
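As a rough sketch of how the download step might proceed from these variables, the block below builds one URL per continent and defines a small download helper. The `<base_url>/<Continent>.html` URL pattern is an assumption about the site layout, and `continent_url` and `download_page` are hypothetical helper names, not part of the original project.

```python
from urllib.parse import urljoin

base_url = 'https://www.citypopulation.de'
continents = ['Africa', 'Asia', 'Europe', 'America', 'Oceania']


def continent_url(continent):
    """Build the assumed URL of a continent index page."""
    # Assumption: each continent page lives at <base_url>/<Continent>.html
    return urljoin(base_url + '/', continent + '.html')


def download_page(url, timeout=10):
    """Return the HTML text of `url`, raising an exception on HTTP errors."""
    import requests  # third-party; imported here so URL building works without it
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text


# Map each continent name to its assumed index-page URL.
continent_urls = {name: continent_url(name) for name in continents}
print(continent_urls)
```

Calling `download_page(continent_urls['Africa'])` would perform the actual network request; its result can then be passed to `BeautifulSoup(html, 'html.parser')` for parsing.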