This is my course project for Data Analysis with Python: Zero to Pandas. I have tried to analyze a dataset containing 300,000+ Kickstarter projects from 2009 to 2018. Any feedback would be appreciated. Feel free to interact with the code.
The year 1970 is the beginning of so-called “unix time”. Timestamps in this format count the seconds that have passed since 1 January 1970, 00:00 UTC, so a missing or zero timestamp shows up as a date in 1970. This is clearly a problem with the dataset, and in my opinion you should remove any rows that contain such incorrect data.
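As a rough sketch of that suggestion: the snippet below shows that a timestamp of 0 seconds decodes to 1 January 1970, and then drops the rows whose launch year is 1970. The sample rows are made up for illustration, and the column names `launched_date` and `backers` are taken from this thread, so they may not match the actual notebook.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
df = pd.DataFrame({
    "launched_date": pd.to_datetime(
        ["1970-01-01 01:00:00", "2014-06-12 09:30:00", "2017-03-02 18:45:00"]
    ),
    "backers": [0, 120, 35],
})

# A timestamp of 0 seconds decodes to the Unix epoch, 1970-01-01 00:00:00.
epoch = pd.to_datetime(0, unit="s")

# Flag rows whose launch year equals the epoch year, then drop them.
is_epoch_row = df["launched_date"].dt.year == epoch.year
cleaned = df[~is_epoch_row]
```

Filtering on the year rather than on the exact timestamp also catches rows that are offset by a few hours, which can happen if the raw value went through a time zone conversion.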
Thank you for pointing that out. I did pretty much that after realizing the same thing. The graphs after the heatmap, where the year is significant, do not include the values counted as 1970. I purposely didn’t remove the rows in the data cleaning part, for two reasons:
- I wanted to maintain the flow of the notebook. It only became clear that some dates had defaulted to “unix time” after plotting the heatmap, so I wanted the reader to go through the same process of discovering that this was an error in the dataset.
- The rows where the `launched_date` is “unix time” still contain correct data for year-independent columns like `backers` etc. This can still legitimately play a role in analysis where “time” is insignificant.

I hope this made sense.