Hello, folks. I hope you’re doing great.
For the Course Project, I decided to analyze football (soccer) matches from 1872 to 2020. I’m trying to answer two specific questions: whether being the home team has an effect on winning (people often say the fans are the 12th player) and whether altitude (as in elevation, in meters above sea level) plays a role when two teams compete against each other (say, Bolivia vs Uruguay). Anyway…
For the elevation question, I’m using the datasets ‘results.csv’ (football matches) and ‘global_data’ (cities and elevations), and I’m trying to merge both dataframes with .merge on ‘city’. Ideally, I would like to extract a city’s altitude from global_data and assign it to the same city in results_df.
The thing is, when I merge both DataFrames, I end up with MORE rows than I originally had.
results_df, ~41,000 records: date, home_team, away_team, home_score, away_score, tournament, city, country, neutral.
city_data_df, ~13,000 records: city, continent, elevation.
merged_df: ~56,000 records (WHAAAAT)
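For anyone hitting the same thing: a minimal sketch (with made-up toy data, not the real datasets) of how duplicate ‘city’ values in the lookup table make a merge produce extra rows, since each match row is paired with every matching city row:

```python
import pandas as pd

# Toy stand-ins for the real dataframes (names taken from the post, data invented)
results_df = pd.DataFrame({
    "home_team": ["Bolivia", "Uruguay"],
    "city": ["La Paz", "Montevideo"],
})
city_data_df = pd.DataFrame({
    "city": ["La Paz", "La Paz", "Montevideo"],  # 'La Paz' appears twice
    "elevation": [3640, 3640, 43],
})

# One left row matching two right rows yields two merged rows
merged_df = results_df.merge(city_data_df, on="city", how="left")
print(len(merged_df))  # 3 rows, not 2: the duplicated key doubled one match
```

With ~13,000 cities, even a few thousand duplicated city names (same city listed under two countries, or the same name in different countries) is enough to explain the jump from 41,000 to 56,000.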
Even if I drop rows based on NaNs in the date column, I still end up with almost 4,000 more records than I should.
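One way to confirm and fix this (again a sketch with toy data, assuming the duplicates live on the city side): list the duplicated city names, keep one row per city, and pass `validate="many_to_one"` so pandas raises an error if duplicates ever sneak back in:

```python
import pandas as pd

# Toy lookup table with a duplicated city name (invented data)
city_data_df = pd.DataFrame({
    "city": ["La Paz", "La Paz", "Montevideo"],
    "elevation": [3640, 3650, 43],
})

# Show which city names appear more than once in the lookup table
dupes = city_data_df[city_data_df.duplicated("city", keep=False)]
print(dupes["city"].unique())  # ['La Paz']

# Keep one row per city, then merge; validate raises if duplicates remain
clean = city_data_df.drop_duplicates(subset="city", keep="first")
results_df = pd.DataFrame({"home_team": ["Bolivia"], "city": ["La Paz"]})
merged = results_df.merge(clean, on="city", how="left", validate="many_to_one")
print(len(merged))  # 1 row per match, as expected
```

Note that `keep="first"` silently picks one elevation when duplicates disagree; deciding which row is correct (e.g. by matching on country as well as city) is the safer fix.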
Can you take a look at my notebook and tell me what I’m doing wrong? I’ve googled and Stack Overflowed but I can’t find the reason why this is happening.
Thanks in advance and sorry for the clickbaity title