
Data Analysis on Google Play Store Applications

Introduction

The dataset describes applications on the Google Play Store and comes from Kaggle (link: https://www.kaggle.com/lava18/google-play-store-apps). It is a .csv file with data from 2010 to 2018 and has columns such as app name, Category, Rating, Reviews, Price and so on. Using these columns we can find the total number of apps installed, the average installations per year, and the most recently updated apps, and we can sort the dataset by different columns at different points. Another important part of the project is to plot the data with the matplotlib and seaborn libraries and identify different outcomes from the graphs. We will also use numpy and pandas, especially for data cleaning, so that we end up with a clean, consistent dataset. The knowledge used to work on this dataset comes from Jovian's Zero to Pandas course, which started with the basics of Python, gave a very good insight into numpy and pandas, and then went deep into analysing and plotting data using dataframes.


As a first step, let's upload our Jupyter notebook to Jovian.ml.

project_name = "data-analysis-on-googleplaystore-app"

Data Preparation and Cleaning

Let us import the pandas and numpy libraries to clean the dataframe. For example, in the dataframe 'app_df' used below, we need to remove the + symbol from the Installs column so that we can apply the required operations to that column. There are numerous fields we need to update. We will discuss them in detail as we go through the data cleaning process.
The pandas library offers a wide range of functions for data analysis. It allows us to clean the dataframe and update it as required. Some commonly used methods are DataFrame.sort_values('column name'), Series.unique(), and DataFrame.info().
The numpy library lets us perform operations on data taken from the dataframe: we can extract columns to arrays, lists, or dictionaries and perform mathematical operations on them.
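
As a rough preview of the cleaning step and the pandas methods mentioned above, here is a minimal sketch on a small made-up dataframe. The rows are illustrative only and are not taken from the Kaggle file; the comma removal is an assumption about how install counts might be formatted, since the text above only mentions the + symbol.

```python
import pandas as pd

# Toy stand-in for app_df; the values are hypothetical, not from the Kaggle file.
app_df = pd.DataFrame({
    "App": ["Alpha", "Beta", "Gamma"],
    "Category": ["GAME", "TOOLS", "GAME"],
    "Installs": ["10,000+", "500+", "1,000,000+"],
})

# Strip the "+" suffix (and, as an assumption, thousands separators),
# then convert the column from text to integers.
app_df["Installs"] = (
    app_df["Installs"]
    .str.replace("+", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)

# Sort by a column, list the distinct values of a column,
# and print a summary of column types and non-null counts.
sorted_df = app_df.sort_values("Installs", ascending=False)
categories = app_df["Category"].unique()
app_df.info()
```

Passing regex=False makes pandas treat "+" as a literal character rather than a regular-expression quantifier. The real dataset may need further handling (for example, rows with unexpected values) before the integer conversion succeeds.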

The links to the official pandas and numpy documentation are given below for reference and further exploration.
pandas: https://pandas.pydata.org/docs/user_guide/index.html
numpy: https://numpy.org/doc/stable/index.html

Let's now import the pandas and numpy libraries.

import pandas as pd
import numpy as np
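
With both libraries imported, the numpy operations described earlier (extracting a column to an array, list, or dictionary and applying mathematical operations) might look like the sketch below. The dataframe name ratings_df and its values are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings; not taken from the Kaggle file.
ratings_df = pd.DataFrame({
    "App": ["Alpha", "Beta", "Gamma"],
    "Rating": [4.5, 3.0, 4.0],
})

# Extract a column as a numpy array, a plain list, and an app -> rating dictionary.
rating_array = ratings_df["Rating"].to_numpy()
rating_list = ratings_df["Rating"].tolist()
rating_by_app = dict(zip(ratings_df["App"], ratings_df["Rating"]))

# Apply numpy's mathematical functions to the extracted array.
mean_rating = np.mean(rating_array)
rounded_up = np.ceil(rating_array)
```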