# ravimishragit/statistics-coding-challenge

2 years ago

Fill valid code/values in place of blanks.

In [16]:
``````# import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
``````
In [2]:
``````scores = [29,27,14,23,29,10]

# find the mean of all items of the list 'scores'
np.mean(scores)
``````
Out[2]:
``22.0``
In [3]:
``````# find the median of all items of the list 'scores'
np.median(scores)
``````
Out[3]:
``25.0``
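As a cross-check on the two results above, the mean and median of `scores` can be reproduced by hand. This is a minimal sketch using only built-in Python; the helper names `manual_mean` and `manual_median` are illustrative and not part of the original notebook.

```python
scores = [29, 27, 14, 23, 29, 10]

def manual_mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

def manual_median(values):
    """Middle value of the sorted data; average the two middle values if the count is even."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2

print(manual_mean(scores))    # 132 / 6 = 22.0
print(manual_median(scores))  # sorted: 10, 14, 23, 27, 29, 29 -> (23 + 27) / 2 = 25.0
```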
In [4]:
``````from statistics import mode

fruits = ['apple', 'grapes', 'orange', 'apple']

# find mode of the list 'fruits'
mode(fruits)
``````
Out[4]:
``'apple'``
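`statistics.mode` returns a single value. If several values tie for most frequent, `statistics.multimode` (available from Python 3.8) returns all of them. The list `tied_fruits` below is made up for illustration and does not appear in the original notebook.

```python
from statistics import mode, multimode

fruits = ['apple', 'grapes', 'orange', 'apple']
# Hypothetical list in which 'apple' and 'grapes' both occur twice
tied_fruits = ['apple', 'grapes', 'orange', 'apple', 'grapes']

print(mode(fruits))            # single most common value: 'apple'
print(multimode(fruits))       # list with one entry: ['apple']
print(multimode(tied_fruits))  # every tied value, in order of first appearance
```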
In [5]:
``````from random import sample
data = sample(range(1,100),50) # generating a list of 50 random integers

# find variance of data
np.var(data)
``````
Out[5]:
``797.3684``
In [6]:
``````# find standard deviation
np.std(data)
``````
Out[6]:
``28.23771237193268``
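Note that `np.var` and `np.std` default to `ddof=0`, so the values above are population statistics (dividing by n). To estimate the variance of a larger population from a sample, the usual choice is to divide by n - 1, which in NumPy means passing `ddof=1`. The sketch below shows the same distinction with the standard-library `statistics` module, using the `scores` list from earlier, since `data` is random and changes on every run.

```python
import math
from statistics import pvariance, variance, pstdev, stdev

scores = [29, 27, 14, 23, 29, 10]

# Population versions divide by n, like np.var(scores) and np.std(scores)
pop_var = pvariance(scores)   # 332 / 6
pop_std = pstdev(scores)

# Sample versions divide by n - 1, like np.var(scores, ddof=1) and np.std(scores, ddof=1)
samp_var = variance(scores)   # 332 / 5
samp_std = stdev(scores)

print(pop_var, samp_var)
print(pop_std, samp_std)
```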

In [7]:
``````# read data_python.csv using pandas (assumes the file is in the working directory)
mydata = pd.read_csv('data_python.csv')
``````
In [8]:
``````# print first few rows of mydata
mydata.head()
``````
Out[8]:
In [17]:
``````# plot histogram for 'Item_Outlet_Sales'

plt.hist(mydata['Item_Outlet_Sales'])
plt.show()
``````
In [18]:
``````# increase the number of bins to 20
plt.hist(mydata['Item_Outlet_Sales'], bins=20)
plt.show()
``````
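To make the effect of `bins` concrete: `plt.hist` splits the range between the smallest and largest value into equal-width intervals (10 by default) and counts the values that fall into each one. Every interval is half-open except the last, which also includes the maximum. The sketch below approximates that counting in plain Python; `equal_width_counts` is a hypothetical helper and the values are made up.

```python
def equal_width_counts(values, bins):
    """Count values per equal-width bin (assumes at least two distinct values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum would land at index `bins`, so clamp it into the last bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

values = [1, 2, 2, 3, 5, 8, 9, 10]  # hypothetical data

print(equal_width_counts(values, 3))  # bin edges 1, 4, 7, 10 -> [4, 1, 3]
print(equal_width_counts(values, 9))  # narrower bins expose gaps in the data
```

More bins give a finer picture of the distribution, but each bin then holds fewer observations and the histogram becomes noisier.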
In [19]:
``````# find mean and median of 'Item_MRP'
np.mean(mydata['Item_MRP']), np.median(mydata['Item_MRP'])``````
Out[19]:
``(140.9927819781768, 143.0128)``
In [ ]:
``````# find mode of 'Outlet_Size'
mydata['Outlet_Size'].mode()``````
In [21]:
``````# most frequent value (mode) of 'Outlet_Type'
mydata['Outlet_Type'].mode()``````
Out[21]:
``````0    Supermarket Type1
dtype: object``````
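`mode()` reports only the most frequent category. For a full frequency table, `Series.value_counts()` is one option. The sketch below runs on a small made-up Series rather than on `mydata`, so the counts are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for mydata['Outlet_Type']
outlet_types = pd.Series(
    ['Supermarket Type1', 'Grocery Store', 'Supermarket Type1', 'Supermarket Type2'],
    name='Outlet_Type',
)

freq = outlet_types.value_counts()                 # absolute counts, most frequent first
share = outlet_types.value_counts(normalize=True)  # relative frequencies that sum to 1

print(freq)
print(share)
```

On the notebook's data the equivalent call would be `mydata['Outlet_Type'].value_counts()`.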
In [22]:
``````# mean of 'Item_Outlet_Sales' for 'Supermarket Type2' outlet type
np.mean(mydata['Item_Outlet_Sales'][mydata['Outlet_Type'] == 'Supermarket Type2'])``````
Out[22]:
``1995.4987392241392``
In [23]:
``````# mean of 'Item_Outlet_Sales' for 'Supermarket Type3' outlet type
np.mean(mydata['Item_Outlet_Sales'][mydata['Outlet_Type'] == 'Supermarket Type3'])``````
Out[23]:
``3694.038557647059``
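The two cells above filter one outlet type at a time. A possible alternative is `groupby`, which returns the mean for every outlet type in a single call. The DataFrame below is a small made-up stand-in for `mydata`, so its numbers do not match the outputs above.

```python
import pandas as pd

# Hypothetical data with the same column names as mydata
toy = pd.DataFrame({
    'Outlet_Type': ['Supermarket Type2', 'Supermarket Type2',
                    'Supermarket Type3', 'Supermarket Type3'],
    'Item_Outlet_Sales': [1000.0, 3000.0, 2000.0, 6000.0],
})

# One mean per outlet type, indexed by the group label
mean_sales = toy.groupby('Outlet_Type')['Item_Outlet_Sales'].mean()
print(mean_sales)
```

On the notebook's data the equivalent call would be `mydata.groupby('Outlet_Type')['Item_Outlet_Sales'].mean()`.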
In [24]:
``````# 2 sample independent t-test
from scipy import stats
type2_sales = mydata.loc[mydata['Outlet_Type'] == 'Supermarket Type2', 'Item_Outlet_Sales']
type3_sales = mydata.loc[mydata['Outlet_Type'] == 'Supermarket Type3', 'Item_Outlet_Sales']
stats.ttest_ind(type2_sales, type3_sales)``````
Out[24]:
``Ttest_indResult(statistic=-20.442923116350805, pvalue=5.856140005446105e-84)``
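With its default `equal_var=True`, `stats.ttest_ind` performs Student's t-test with a pooled variance estimate. A p-value this small means a difference in mean sales of this size would be very unlikely if the two outlet types had the same population mean. The sketch below reproduces the test statistic from its formula on two small made-up samples; `pooled_t_statistic` is a hypothetical helper, not a SciPy function.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Student's t statistic for two independent samples, assuming equal variances."""
    n1, n2 = len(a), len(b)
    # variance() is the sample variance (divides by n - 1)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / std_err

group_a = [1, 2, 3, 4, 5]    # hypothetical sample
group_b = [2, 4, 6, 8, 10]   # hypothetical sample

t_stat = pooled_t_statistic(group_a, group_b)
print(t_stat)  # negative because mean(group_a) < mean(group_b)
```

If the two groups may have unequal variances, passing `equal_var=False` to `stats.ttest_ind` runs Welch's t-test instead.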