Capstone Project: The Battle of Neighborhoods, Toronto vs New York City
This project is part of the IBM Data Science Professional Certificate and aims to offer some insights to people who want to immigrate to Toronto or New York City.
Problem
Say you are planning to immigrate to either Toronto, Canada or New York City, USA for better life prospects such as education, earnings and security. You live in India and love your neighbourhood mainly because of its amenities and venues, such as fast food centres, pharmacies, schools, markets, shopping malls, parks and hospitals. You want to know which of the two cities, Toronto or New York City, will be more economical and more similar to your current neighbourhood.
Introduction
Objective
This project is targeted at stakeholders interested in immigrating to Toronto, Canada or New York City, USA, especially those from India, as many Indians have immigrated there in past years. It aims to provide a bird's-eye view of how housing prices vary by neighborhood, along with the venues and amenities available.
Libraries used
We will use:
- pandas: for data analysis
- numpy: to handle data in a vectorized manner
- json: to handle JSON files
- requests: to handle requests
- BeautifulSoup: for web scraping
- geopy: to convert an address into latitude and longitude values and vice versa
- folium: map rendering library
- Matplotlib: graph plotting library
- sklearn: for the K-Means algorithm
- uszipcode: for New York housing price data
- opendatasets: for downloading the Toronto housing data from Kaggle
- tqdm: a progress bar library with good support for nested loops and Jupyter/IPython notebooks
- warnings: to suppress warnings in the output cells of the Jupyter notebook
Disclaimer for running this notebook on your own platform
- You need a Kaggle account.
- If you decide to run this notebook, start the automatic execution of all cells, then sit back, put your earbuds in and enjoy the roughly 30 minutes of execution.
Background
Why Do People Immigrate?
People migrate from or to specific countries because of economic, political, and social influences. Immigrants are motivated to leave their former countries of citizenship, or habitual residence, for a variety of reasons, including: a lack of local access to resources, a desire for economic prosperity, to find or engage in paid work, to better their standard of living, family reunification, retirement, climate or environmentally induced migration, exile, escape from prejudice, conflict or natural disaster, or simply the wish to change one's quality of life.
Toronto
- What is Toronto?
Toronto is the capital of the Canadian province of Ontario, the most populous city in Canada and the fourth most populous city in North America. It covers an area of 630.2 km² (243.3 sq mi). Read more here
- Immigration from India to Canada
In 2019, India was the number one source country for immigrants coming from overseas to Canada, followed by China and the Philippines. The highest concentrations of Indian Canadians are found in the province of Ontario. In the five years that ended in 2019, immigration from India skyrocketed, growing by almost 117.6%, from 39,340 in 2015 to 85,590 in 2019.
- Explore more about immigration from different countries to Canada here and Toronto's neighborhoods here
New York City (NYC)
- What is New York City?
New York City, often simply called New York, is the most populous city in the United States, distributed over about 784 km² (302.6 sq mi). Located at the southern tip of the State of New York, the city is the center of the New York metropolitan area, the largest metropolitan area in the world by urban area. Read more here
- Immigration from India to New York City
As of 2014-18, the U.S. metropolitan areas with the largest number of Indians were greater New York, Chicago, San Francisco, and San Jose. These four metro areas accounted for about 30 percent of Indians in the United States. Read more here
- Explore the neighborhoods of New York City here
Data
- We will need location data, such as postal codes, boroughs, latitude and longitude, as well as housing price data for the neighborhoods of each city.
1. Location Data
About data
- For New York City, all the data is available in JSON format, hosted by IBM.
- For Toronto, we will scrape the Wikipedia page to grab postal codes, boroughs and neighborhoods, and then merge them with a dataset containing the respective latitudes and longitudes.
- The location data will be used to segment the neighborhoods based on their venues and to plot maps for the rest of the data.
- The Foursquare REST API will be used to grab the venues in each neighborhood.
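As a rough illustration of the venue lookup, the sketch below builds a request URL for one neighborhood's coordinates. It assumes the legacy Foursquare v2 `venues/explore` endpoint with `client_id`/`client_secret` credentials; the helper name `build_explore_url`, the radius and limit defaults, and the version date are placeholders rather than values taken from the project, and the current Foursquare API may differ.

```python
from urllib.parse import urlencode

# Assumed endpoint: the legacy Foursquare v2 "explore" route.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius=500, limit=100, version="20180605"):
    """Return a venue-search URL for one neighborhood centre (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date, placeholder value
        "ll": f"{lat},{lng}",  # latitude,longitude of the neighborhood
        "radius": radius,      # search radius in metres
        "limit": limit,        # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Example coordinates and dummy credentials, for illustration only.
url = build_explore_url(43.6532, -79.3832, "YOUR_ID", "YOUR_SECRET")
```

The resulting URL could then be fetched with `requests.get(url).json()`, once real credentials are supplied.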
Sources
- New York City (JSON) : https://cocl.us/new_york_dataset
- Toronto's Postal Code, Borough and Neighborhoods data : https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
- Toronto's Geospatial data : https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv
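The two Toronto sources can be combined by joining on the postal code. The sketch below is a minimal example with pandas, using a few illustrative rows in place of the scraped Wikipedia table and the coordinates CSV. The column names (`Postal Code`, `Borough`, `Neighborhood`, `Latitude`, `Longitude`) and the sample values are assumptions, so adjust them to whatever the real tables contain.

```python
import pandas as pd

# Illustrative rows standing in for the scraped Wikipedia table.
postal = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
    "Neighborhood": ["Parkwoods", "Victoria Village", "Regent Park, Harbourfront"],
})

# Illustrative rows standing in for the geospatial coordinates CSV.
# "M1B" has no match in the postal table and is dropped by the inner join.
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M5A", "M4A", "M3A"],
    "Latitude": [43.8067, 43.6543, 43.7259, 43.7533],
    "Longitude": [-79.1944, -79.3606, -79.3156, -79.3297],
})

# Join on the shared postal code column; row order follows the left table.
toronto = postal.merge(coords, on="Postal Code", how="inner")
```

An inner join keeps only postal codes present in both tables, which avoids rows with missing coordinates when the map is drawn later.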
2. Housing Data
About data
- For New York City, we will scrape the zip codes from city-data and then use the uszipcode library to fetch the housing data for the respective zip codes. Read more here
- For Toronto, the housing dataset is hosted on Kaggle.
- For a fair comparison, prices will be converted to the same currency and kept within a common limit.
Sources
- New York City : http://www.city-data.com/zipmaps/New-York-New-York.html
- Toronto : https://www.kaggle.com/mnabaee/ontarioproperties
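One possible way to put the two housing tables on a common footing is sketched below: convert the Toronto prices to US dollars, then drop listings above a chosen cap. The exchange rate, the cap, and the column names are all assumptions made for illustration, not values taken from the project.

```python
import pandas as pd

CAD_TO_USD = 0.75          # assumed exchange rate, replace with a current value
MAX_PRICE_USD = 2_000_000  # assumed upper limit used to drop extreme listings

# Made-up Toronto listing prices in Canadian dollars.
toronto_prices = pd.DataFrame({"Price_CAD": [400_000, 1_200_000, 4_000_000]})

# Convert to USD so both cities share one currency.
toronto_prices["Price_USD"] = toronto_prices["Price_CAD"] * CAD_TO_USD

# Keep only listings inside the chosen limit.
toronto_prices = toronto_prices[toronto_prices["Price_USD"] <= MAX_PRICE_USD]
```

Applying the same cap to the New York data would keep a handful of very expensive properties from dominating the price clusters in either city.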
How will this data be used?
Factors considered
- Venues and amenities available in the neighborhoods of each city.
- Housing Price data of each city.
Solution
- We will cluster the neighbourhoods of each city with the K-Means algorithm in two ways: into 3 clusters based on housing price (Low Budget, Medium Budget and High Budget) and into 10 clusters based on venues.
- Maps will be used for visualization. Click a marker to see its cluster number, and feel free to zoom in and out of the maps.
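The price-based clustering can be sketched as follows, assuming scikit-learn's `KMeans` and a single price feature per neighborhood. The prices are made up, and the mapping from cluster number to budget name is one possible approach (ranking clusters by their centre price), not necessarily the one used in the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up median prices (USD) for nine neighborhoods.
prices = np.array([150, 180, 200, 480, 500, 520, 950, 1000, 1100], dtype=float) * 1000

# Cluster the one-dimensional price feature into three groups.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(prices.reshape(-1, 1))

# K-Means label numbers are arbitrary, so rank the clusters by centre price
# before attaching the budget names.
order = np.argsort(kmeans.cluster_centers_.ravel())
names = ["Low Budget", "Medium Budget", "High Budget"]
label_to_name = {int(label): names[rank] for rank, label in enumerate(order)}
budget = [label_to_name[int(label)] for label in kmeans.labels_]
```

The venue-based clustering would presumably follow the same pattern with `n_clusters=10` and a feature matrix describing the venues of each neighborhood instead of the price column.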