
Capstone Project - The Battle of Neighborhoods: Toronto vs New York City


This project is part of the IBM Data Science Professional Certificate and aims to offer insights for people who want to immigrate to Toronto or New York City.


Problem


Say you are planning to immigrate to either Toronto, Canada, or New York City, USA, for better life prospects such as education, earnings, security, and much more. You live in India and love your neighborhood, mainly because of the great amenities and venues it offers, such as fast food centers, pharmacies, schools, markets, shopping malls, parks, hospitals, and so on. You want to know which of Toronto and New York City would be more economical and more similar to your current neighborhood.

Introduction


Objective

This project targets stakeholders interested in immigrating to Toronto, Canada, or New York City, USA, especially from India, since many Indians have immigrated to both cities in recent years. It aims to provide a bird's-eye view of how housing prices vary by neighborhood, along with the venues and amenities available in each.

Libraries used

We will use

  • pandas : for data analysis
  • numpy : to handle data in a vectorized manner
  • json : to handle JSON files
  • requests : to send HTTP requests
  • BeautifulSoup : for web scraping
  • geopy : to convert an address into latitude and longitude values and vice versa
  • folium : map rendering library
  • Matplotlib : graph plotting library
  • sklearn : for the k-means algorithm
  • uszipcode : for New York housing price data
  • opendatasets : for downloading the Toronto housing data from Kaggle
  • tqdm : a progress bar library with good support for nested loops and Jupyter/IPython notebooks
  • warnings : to suppress warnings in the output cells of the Jupyter notebook

Disclaimer for running this notebook on your own platform

  • You need a Kaggle account.
  • If you decide to run this notebook, start it on auto-pilot (run all cells), then lie back, put your earpods on, and enjoy the execution, which takes about 30 minutes.

Background


Why Do People Immigrate?

People migrate from or to specific countries because of economic, political, and social influences. Immigrants are motivated to leave their former countries of citizenship, or habitual residence, for a variety of reasons, including a lack of local access to resources, a desire for economic prosperity, to find or engage in paid work, to better their standard of living, family reunification, retirement, climate or environmentally induced migration, exile, escape from prejudice, conflict or natural disaster, or simply the wish to change one's quality of life.

Toronto

New York City (NYC)

  • What is New York City?

    New York City, often simply called New York, is the most populous city in the United States, distributed over about 784 km² (302.6 sq mi). Located at the southern tip of the State of New York, the city is the center of the New York metropolitan area, the largest metropolitan area in the world by urban area.

  • Immigration from India to New York City

    In 2014-18, the U.S. cities with the largest number of Indians were the greater New York, Chicago, San Francisco, and San Jose metropolitan areas. These four metro areas accounted for about 30 percent of Indians in the United States.

  • Explore the neighborhoods of New York City here

Data


  • We will need location data (postal codes, boroughs, latitudes and longitudes) and housing price data for the neighborhoods of each city.

1. Location Data

  • About data
  • For New York City, all the data is available in .csv format from IBM.
  • For Toronto, we will scrape the Wikipedia page to get postal codes, boroughs, and neighborhoods, and then merge them with a dataset containing the corresponding latitudes and longitudes.
  • The location data will be used to segment the neighborhoods by their venues and to plot maps for the rest of the data.
  • The Foursquare REST API will be used to fetch the venues in each neighborhood.
  • Sources
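The Toronto scrape-and-merge step can be sketched as follows. This is a minimal, self-contained sketch, not the notebook's code: the notebook uses requests, BeautifulSoup and pandas, while this version swaps in Python's standard `html.parser` and plain dictionaries. It assumes a simple three-column table (postal code, borough, neighborhood) and a coordinates lookup keyed by postal code; the class `TableRowParser`, the helper `merge_coordinates`, and all sample values are hypothetical.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty and are skipped
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def merge_coordinates(rows, coordinates):
    """Join (postal code, borough, neighborhood) rows with a
    postal code -> (latitude, longitude) lookup; unmatched codes are dropped."""
    merged = []
    for postal_code, borough, neighborhood in rows:
        if postal_code in coordinates:
            lat, lng = coordinates[postal_code]
            merged.append({
                "PostalCode": postal_code,
                "Borough": borough,
                "Neighborhood": neighborhood,
                "Latitude": lat,
                "Longitude": lng,
            })
    return merged


# Toy HTML and coordinates for illustration only (not the real Wikipedia page or dataset).
html = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>AAA</td><td>Borough One</td><td>Neighborhood A</td></tr>
  <tr><td>BBB</td><td>Borough Two</td><td>Neighborhood B</td></tr>
</table>
"""
coordinates = {"AAA": (43.65, -79.38)}

parser = TableRowParser()
parser.feed(html)
print(merge_coordinates(parser.rows, coordinates))
```

Dropping unmatched postal codes mirrors an inner join; a pandas `merge` with `how="inner"` would behave the same way on real DataFrames.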

2.Housing Data

  • About data
  • For New York City, we will scrape the ZIP codes from city-data and then use the uszipcode library to get the housing data for the respective ZIP codes.
  • For Toronto, the housing dataset is hosted on Kaggle.
  • For a fair comparison, the prices will be converted to the same unit and kept within a common limit.
  • Source
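One way to read "converted to the same unit and kept within a common limit" is sketched below, assuming the Toronto prices are in Canadian dollars and must be expressed in US dollars before filtering. The exchange rate `CAD_TO_USD`, the price bounds, and the helper names are placeholders for illustration, not values or functions from the notebook.

```python
def to_common_unit(prices_cad, cad_to_usd):
    """Convert Canadian-dollar prices to US dollars using a supplied rate."""
    return [price * cad_to_usd for price in prices_cad]


def within_limit(prices, low, high):
    """Keep only prices inside the inclusive [low, high] range."""
    return [p for p in prices if low <= p <= high]


# Hypothetical inputs: the rate and the limits are placeholders, not values from the notebook.
CAD_TO_USD = 0.75
toronto_usd = to_common_unit([400_000, 1_000_000, 9_000_000], CAD_TO_USD)
print(within_limit(toronto_usd, low=100_000, high=2_000_000))
```

Filtering drops outliers entirely; if every neighborhood should stay on the map, clipping each price with `min(max(p, low), high)` is an alternative.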

How will the data be used?

  • Factors considered
  • Venues and amenities available in the neighborhoods of each city.
  • Housing price data of each city.
  • Solution
  • We will cluster the neighborhoods of each city with the k-means algorithm twice: into 3 clusters by housing price (Low Budget, Medium Budget, and High Budget) and into 10 clusters by venues.
  • Maps will be used for visualization: click a marker to see its cluster number, and feel free to zoom in and out.
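The price-based clustering can be sketched with a small, dependency-free version of Lloyd's k-means. The notebook itself relies on scikit-learn's `KMeans`; this stand-in uses a deterministic initialization (values spread evenly through the sorted prices) instead of scikit-learn's default k-means++, so the result is reproducible. The function names `kmeans_1d` and `budget_tiers` and the toy prices are hypothetical. Clusters are named Low, Medium and High Budget by sorting their centers, because k-means cluster indices carry no order of their own.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on a list of numbers (k >= 2); returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: pick k values spread evenly through the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, labels


def budget_tiers(prices):
    """Cluster prices into 3 groups and name them by ascending cluster center."""
    centers, labels = kmeans_1d(prices, k=3)
    names = ["Low Budget", "Medium Budget", "High Budget"]
    by_center = sorted(range(3), key=lambda c: centers[c])
    tier = {cluster: names[rank] for rank, cluster in enumerate(by_center)}
    return [tier[lab] for lab in labels]


# Toy median prices for six hypothetical neighborhoods (illustration only).
prices = [300_000, 320_000, 650_000, 700_000, 1_500_000, 1_600_000]
print(budget_tiers(prices))
```

The same assignment/update loop extends to the 10-cluster venue segmentation by replacing `abs(v - centers[c])` with a Euclidean distance over per-neighborhood venue-frequency vectors.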