Capstone Project - The Battle of Neighborhoods: Toronto vs New York City

This project is part of the IBM Data Science Professional Certificate and aims to provide insights for people who want to immigrate to Toronto or New York City.

Problem

Say you are planning to immigrate to either Toronto, Canada or New York City, USA for better life prospects such as education, earnings and security. You live in India and love your neighbourhood, mainly because of its amenities and venues: fast food centres, pharmacies, schools, markets, shopping malls, parks, hospitals and so on. You want to know which of the two cities, Toronto or New York City, will be more economical and more similar to your current neighbourhood.

Introduction

`Objective`

This project is aimed at stakeholders interested in immigrating to Toronto, Canada or New York City, USA, especially those from India, as many Indians have immigrated to these cities in recent years. It aims to provide a bird's-eye view of how housing prices vary by neighborhood, along with the venues and amenities available.

`Libraries used`

We will use

pandas : for data analysis

numpy : to handle data in a vectorized manner

json : to handle JSON files

requests : to handle HTTP requests

BeautifulSoup : for web scraping

geopy : to convert an address into latitude and longitude values and vice versa

folium : map rendering library

Matplotlib : graph plotting library

sklearn : for the K-Means algorithm

uszipcode : for New York Housing Price data

opendatasets : for downloading housing data of Toronto from kaggle

tqdm : progress bar library with good support for nested loops and Jupyter/IPython notebooks

warnings : to suppress warnings in the output cells of the Jupyter notebook

`Disclaimer for running this notebook on your platforms`

You need a Kaggle account.

If you decide to run this notebook, run all cells, then sit back, put your earbuds in and let it execute for about 30 minutes.

Background

Why Do People Immigrate?

People migrate from or to specific countries because of economic, political and social influences. Immigrants are motivated to leave their former countries of citizenship, or habitual residence, for a variety of reasons, including: a lack of local access to resources, a desire for economic prosperity, to find or engage in paid work, to better their standard of living, family reunification, retirement, climate or environmentally induced migration, exile, escape from prejudice, conflict or natural disaster, or simply the wish to change one's quality of life.

`Toronto`

What is Toronto?

Toronto is the capital of the Canadian province of Ontario, the most populous city in Canada and the fourth most populous city in North America. Its current area is 630.2 km² (243.3 sq mi). Read more here

Immigration from India to Canada

India was the number one source country for immigrants to Canada in 2019, followed by China and the Philippines. The highest concentration of Indian Canadians is found in the province of Ontario. In the five years ending in 2019, immigration from India grew by almost 117.6%, from 39,340 in 2015 to 85,590.
Explore more about immigration from different countries to Canada here and Toronto's neighborhoods here

`New York City (NYC)`

What is New York City?

New York City, often simply called New York, is the most populous city in the United States, covering about 784 km² (302.6 sq mi). Located at the southern tip of the State of New York, the city is the center of the New York metropolitan area, the largest metropolitan area in the world by urban area. Read more here

Immigration from India to New York City

As of 2014-18, the U.S. cities with the largest number of Indians were the greater New York, Chicago, San Francisco and San Jose metropolitan areas. These four metro areas accounted for about 30 percent of Indians in the United States. Read more here
Explore the neighborhoods of New York City here

Data

We will need location data (postal codes, boroughs, latitude and longitude) and housing price data for the neighborhoods of the respective cities.

`1. Location Data`

About data

For New York City, all the data is available in JSON format, hosted by IBM.

For Toronto, we will scrape the Wikipedia page to grab postal codes, boroughs and neighborhoods, and then merge them with a dataset containing the respective latitudes and longitudes.

The location data will be used for segmenting the neighborhoods based on their venues and for plotting maps of the rest of the data.
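The Toronto merge step described above can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the notebook's actual code: the sample rows, the column names (`Postal Code`, `Borough`, `Neighbourhood`, `Latitude`, `Longitude`) and the helper name `merge_locations` are assumptions, and the coordinates are illustrative values. With pandas, the same join would be a single merge on the postal code column.

```python
import csv
import io

# Hypothetical rows standing in for the scraped Wikipedia table.
postal_rows = [
    {"Postal Code": "M3A", "Borough": "North York", "Neighbourhood": "Parkwoods"},
    {"Postal Code": "M4A", "Borough": "North York", "Neighbourhood": "Victoria Village"},
    {"Postal Code": "M1A", "Borough": "Not assigned", "Neighbourhood": "Not assigned"},
]

# Hypothetical excerpt standing in for the geospatial coordinates CSV.
geo_csv = """Postal Code,Latitude,Longitude
M3A,43.7533,-79.3297
M4A,43.7259,-79.3156
"""


def merge_locations(rows, geo_csv_text):
    """Attach latitude/longitude to each postal code row (inner join)."""
    coords = {
        rec["Postal Code"]: (float(rec["Latitude"]), float(rec["Longitude"]))
        for rec in csv.DictReader(io.StringIO(geo_csv_text))
    }
    merged = []
    for row in rows:
        # Skip postal codes that have no borough assigned.
        if row["Borough"] == "Not assigned":
            continue
        code = row["Postal Code"]
        # Skip rows without matching coordinates.
        if code not in coords:
            continue
        lat, lng = coords[code]
        merged.append({**row, "Latitude": lat, "Longitude": lng})
    return merged


toronto = merge_locations(postal_rows, geo_csv)
for row in toronto:
    print(row)
```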

The Foursquare REST API will be used to fetch the venues in each neighborhood.
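As a rough sketch of what a venue request might look like, the snippet below only builds the request URL and does not call the API. It assumes the legacy Foursquare v2 `venues/explore` endpoint; this document does not say which endpoint the notebook uses. The credentials are placeholders, and the helper name `build_explore_url` and the `radius`, `limit` and version values are illustrative choices.

```python
from urllib.parse import urlencode


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build a venues/explore URL for one neighborhood centre (assumed v2 API)."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date, illustrative
        "ll": f"{lat},{lng}",            # latitude,longitude of the neighborhood
        "radius": radius,                # search radius in metres, illustrative
        "limit": limit,                  # maximum venues to return, illustrative
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


url = build_explore_url(43.7533, -79.3297, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The resulting URL could then be passed to `requests.get(url).json()` once real credentials are supplied.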

Sources

New York City (JSON) : https://cocl.us/new_york_dataset

Toronto's Postal Code, Borough and Neighborhoods data : https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

Toronto's Geospatial data : https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv

`2. Housing Data`

About data

For New York City, we will scrape the zip codes from city-data and then use the uszipcode library to fetch the housing data for the respective zip codes. Read more here

For Toronto, the housing dataset is hosted on Kaggle.

For a fair comparison, the prices will be converted to the same unit and kept within a common limit.
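One possible reading of this normalisation step is sketched below: convert Toronto prices from CAD to USD, then drop prices outside a shared range. The exchange rate, the price cap, the sample prices and the helper name `normalise_prices` are all hypothetical, chosen only for illustration; the notebook may use different values or a different rule.

```python
CAD_TO_USD = 0.75            # assumed exchange rate, illustrative only
PRICE_LIMIT_USD = 2_000_000  # hypothetical upper limit shared by both cities


def normalise_prices(prices, rate=1.0, lower=0, upper=PRICE_LIMIT_USD):
    """Convert prices with `rate`, then keep only those in (lower, upper]."""
    converted = (p * rate for p in prices)
    return [round(p, 2) for p in converted if lower < p <= upper]


# Hypothetical sample prices: Toronto in CAD, New York City already in USD.
toronto_usd = normalise_prices([800_000, 3_200_000, 450_000], rate=CAD_TO_USD)
nyc_usd = normalise_prices([650_000, 5_000_000])

print(toronto_usd)
print(nyc_usd)
```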

Sources

New York City : http://www.city-data.com/zipmaps/New-York-New-York.html

Toronto : https://www.kaggle.com/mnabaee/ontarioproperties

How will this data be used?

Factors considered

Venues and amenities available in the neighborhoods of each city.

Housing Price data of each city.
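Before venues can be compared across neighborhoods, they need to be turned into numeric features. This document does not say how that is done, so the following is only one common approach, shown as an assumption: compute, for each neighborhood, the share of its venues that fall into each category. The sample venues and the helper name `venue_frequencies` are hypothetical.

```python
from collections import Counter, defaultdict


def venue_frequencies(venues):
    """Return (categories, {neighborhood: [share of venues per category]}).

    `venues` is a list of (neighborhood, category) pairs.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1

    # Fixed, sorted category order so every feature vector lines up.
    categories = sorted({category for _, category in venues})

    features = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        features[neighborhood] = [counter[c] / total for c in categories]
    return categories, features


# Hypothetical venues, as they might come back from the venue API.
sample_venues = [
    ("Parkwoods", "Park"),
    ("Parkwoods", "Pharmacy"),
    ("Parkwoods", "Park"),
    ("Victoria Village", "Coffee Shop"),
]

categories, features = venue_frequencies(sample_venues)
print(categories)
print(features)
```

Each neighborhood ends up with one vector of equal length, which is the shape a clustering algorithm expects.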

Solution

We will cluster the neighbourhoods of each city with the K-Means algorithm in two ways: into 3 clusters based on housing price (Low Budget, Medium Budget and High Budget), and into 10 clusters based on venues.
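To illustrate the price-based clustering, here is a small pure-Python sketch of K-Means (Lloyd's algorithm) on one-dimensional prices, followed by a mapping from clusters to budget labels by centroid order. It is a stand-in for sklearn's `KMeans`, which the notebook lists among its libraries. The sample prices, the deterministic initialisation and the helper names `kmeans_1d` and `budget_labels` are assumptions made for this example.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's algorithm on 1-D data; returns the list of k centroids."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def budget_labels(values, centroids):
    """Label each value by the rank of its nearest centroid (lowest first)."""
    names = ["Low Budget", "Medium Budget", "High Budget"]
    order = sorted(range(len(centroids)), key=lambda j: centroids[j])
    rank = {j: names[r] for r, j in enumerate(order)}
    return [
        rank[min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))]
        for v in values
    ]


# Hypothetical neighborhood prices in a common currency.
prices = [300_000, 320_000, 350_000, 700_000, 750_000, 1_500_000, 1_600_000]
centroids = kmeans_1d(prices, k=3)
labels = budget_labels(prices, centroids)
print(centroids)
print(labels)
```

The venue-based clustering follows the same idea with 10 clusters, except that each neighborhood is described by a multi-dimensional feature vector instead of a single price.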

Maps will be used for visualization. Click on a marker to see its cluster number, and feel free to zoom in and out of the maps.