In [2]:
import pandas as pd
In [3]:
drinks = pd.read_csv('http://bit.ly/31YGKsh')
In [4]:
drinks.head()
Out[4]:
In [5]:
drinks.beer_servings.mean()
Out[5]:
106.16062176165804

Mean beer servings by continent

In [6]:
drinks.groupby(['continent']).beer_servings.mean()
Out[6]:
continent
Africa            61.471698
Asia              37.045455
Europe           193.777778
North America    145.434783
Oceania           89.687500
South America    175.083333
Name: beer_servings, dtype: float64

Filter the DataFrame by continent (Africa, Asia, Europe) and check each mean against the groupby result

In [7]:
drinks[(drinks.continent == 'Africa')].beer_servings.mean()
Out[7]:
61.471698113207545
In [8]:
drinks[(drinks.continent == 'Asia')].beer_servings.mean()
Out[8]:
37.04545454545455
In [9]:
drinks[(drinks.continent == 'Europe')].beer_servings.mean()
Out[9]:
193.77777777777777
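The three filtered means above match the corresponding rows of the groupby result. A minimal sketch of that equivalence, using a small made-up table (the `toy` frame and its values are hypothetical, not the drinks data):

```python
import pandas as pd

# Hypothetical stand-in for the drinks table; the values are made up.
toy = pd.DataFrame({
    "continent": ["Africa", "Africa", "Asia", "Europe", "Europe"],
    "beer_servings": [10, 30, 5, 200, 100],
})

# One pass: mean beer servings for every continent at once.
by_continent = toy.groupby("continent").beer_servings.mean()

# The same number for a single continent, via a boolean filter.
africa_mean = toy[toy.continent == "Africa"].beer_servings.mean()

print(by_continent["Africa"], africa_mean)
```

The groupby form computes every group in one call, so the filter form is mostly useful as a spot check.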
In [10]:
# cumsum is a running total (one value per row), not a reduction, so it is left out of agg
drinks.groupby('continent').beer_servings.agg(['count', 'min', 'max', 'mean', 'median', 'std'])
Out[10]:
In [11]:
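# Restrict to numeric columns: mean, median and std are undefined for text columns
drinks.groupby('continent')[drinks.select_dtypes('number').columns].agg(['count', 'min', 'max', 'mean', 'median', 'std'])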
Out[11]:
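A short sketch of how `agg` with a list of function names behaves, on a made-up table (the `toy` frame is hypothetical). Each name in the list becomes one output column with one row per group. `cumsum` is different in kind: it returns one value per original row, so it is shown here as a separate call.

```python
import pandas as pd

# Hypothetical data; the values are made up for illustration.
toy = pd.DataFrame({
    "continent": ["Africa", "Africa", "Asia", "Asia"],
    "beer_servings": [10, 30, 5, 15],
})
grouped = toy.groupby("continent").beer_servings

# Reductions: one row per group, one column per function name.
summary = grouped.agg(["count", "min", "max", "mean"])

# Transformation: a running total within each group, aligned to toy's index.
running_total = grouped.cumsum()

print(summary)
print(running_total.tolist())
```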
In [12]:
%matplotlib inline
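# numeric_only=True skips text columns, which newer pandas versions refuse to average
drinks.groupby('continent').mean(numeric_only=True).plot(kind='bar')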
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0xafd8ebe388>
[Figure: bar chart of the mean of each numeric column, grouped by continent]




  • map: a Series method
  • apply: a Series and DataFrame method
  • applymap: a DataFrame method
In [13]:
train = pd.read_csv('http://bit.ly/kaggletrain')
In [14]:
train.head()
Out[14]:
In [15]:
train['sex_num'] = train.Sex.map({'male':0, 'female':1})
In [16]:
train.loc[0:4, ['Sex','sex_num']]
Out[16]:
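One detail worth knowing about `Series.map` with a dict: any value that is not a key in the dict becomes NaN. A minimal sketch, where the `"unknown"` entry is a made-up value and not something taken from the Titanic data:

```python
import pandas as pd

# "unknown" is a hypothetical value added to show the unmapped case.
sex = pd.Series(["male", "female", "unknown"])

# Values missing from the dict map to NaN, which also turns the result into floats.
sex_num = sex.map({"male": 0, "female": 1})

print(sex_num.tolist())
```

Checking `sex_num.isna().sum()` after a mapping like this is a cheap way to catch unexpected categories.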
In [17]:
train.sample()
Out[17]:
In [18]:
train['Name_Length'] = train.Name.apply(len)
In [19]:
train.head()
Out[19]:
In [21]:
train.loc[0:4,['Name','Name_Length']]
Out[21]:
In [22]:
import numpy as np
In [24]:
train['Fare_Ceil']= train.Fare.apply(np.ceil)
In [27]:
train['Fare_Floor']= train.Fare.apply(np.floor)
In [26]:
train.loc[0:4,['Fare','Fare_Ceil']]
Out[26]:
In [28]:
train.loc[0:4,['Fare','Fare_Floor']]
Out[28]:
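The `apply` calls above have vectorized equivalents that give the same results. The sketch below compares them on two made-up rows (the `toy` frame is hypothetical): `.str.len()` for string length, and calling the NumPy function on the whole Series for rounding.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame shaped like the Titanic columns used above.
toy = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
    "Fare": [7.25, 7.925],
})

# Element-wise with apply, as in the cells above.
name_len_apply = toy.Name.apply(len)
fare_ceil_apply = toy.Fare.apply(np.ceil)

# Vectorized equivalents.
name_len_str = toy.Name.str.len()
fare_ceil_vec = np.ceil(toy.Fare)

print(name_len_str.tolist(), fare_ceil_vec.tolist())
```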
In [29]:
train.loc[0:4,['Name','Name_Length']]
train.Name.str.split(',')[1]
Out[29]:
['Cumings', ' Mrs. John Bradley (Florence Briggs Thayer)']
Extract the part after the comma for every row using .apply(), first with a named function and then with a lambda
In [32]:
def get_Element(mylist, position):
    return mylist[position]
In [38]:
train.Name.str.split(',').apply(get_Element, position = 1).head()
Out[38]:
0                                Mr. Owen Harris
1     Mrs. John Bradley (Florence Briggs Thayer)
2                                    Miss. Laina
3             Mrs. Jacques Heath (Lily May Peel)
4                              Mr. William Henry
Name: Name, dtype: object
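`Series.apply` forwards extra arguments to the function, which is why `position = 1` works above. A small sketch on two made-up names (the `names` Series is hypothetical), showing both the keyword form and the `args` tuple form:

```python
import pandas as pd

def get_Element(mylist, position):
    return mylist[position]

# Hypothetical sample in the "Surname, Title. Given names" layout.
names = pd.Series(["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"])
parts = names.str.split(",")

# Extra keyword arguments are passed through to get_Element.
surnames = parts.apply(get_Element, position=0)

# Positional extras go in the args tuple instead.
rest = parts.apply(get_Element, args=(1,))

print(surnames.tolist(), rest.tolist())
```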
In [31]:
train.Name.str.split(',').apply(lambda x: x[1])
Out[31]:
0                                  Mr. Owen Harris
1       Mrs. John Bradley (Florence Briggs Thayer)
2                                      Miss. Laina
3               Mrs. Jacques Heath (Lily May Peel)
4                                Mr. William Henry
                          ...                     
886                                    Rev. Juozas
887                           Miss. Margaret Edith
888                 Miss. Catherine Helen "Carrie"
889                                Mr. Karl Howell
890                                    Mr. Patrick
Name: Name, Length: 891, dtype: object
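The same extraction can be sketched without `apply`, by indexing through the `.str` accessor. The split pieces keep the space that followed the comma (visible in Out[29]), so this version also strips it. The `names` Series below is a hypothetical sample, not the full dataset.

```python
import pandas as pd

# Hypothetical sample in the same layout as the Name column.
names = pd.Series(["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"])

# .str[1] picks element 1 of each split list; .str.strip() removes the leading space.
after_comma = names.str.split(",").str[1].str.strip()

print(after_comma.tolist())
```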
.apply() as a DataFrame method: axis=0 passes each column to the function, axis=1 passes each row
In [40]:
drinks = pd.read_csv('http://bit.ly/31YGKsh')
drinks.head()
Out[40]:
In [44]:
drinks.loc[:,'beer_servings':'wine_servings'].apply(max, axis= 0)
Out[44]:
beer_servings      376
spirit_servings    438
wine_servings      370
dtype: int64
In [45]:
drinks.loc[:,'beer_servings':'wine_servings'].apply(max, axis= 1)
Out[45]:
0        0
1      132
2       25
3      312
4      217
      ... 
188    333
189    111
190      6
191     32
192     64
Length: 193, dtype: int64
In [48]:
drinks.loc[:,'beer_servings':'wine_servings'].apply(np.argmax, axis= 1)
Out[48]:
0      0
1      1
2      0
3      2
4      0
      ..
188    0
189    0
190    0
191    0
192    0
Length: 193, dtype: int64
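For these particular reductions pandas has built-in methods, sketched below on a three-row illustrative table (the `toy` frame is hypothetical). `max(axis=...)` gives the same result as `apply(max, ...)`, and `idxmax(axis=1)` returns the column label of the largest value instead of its integer position.

```python
import pandas as pd

# Hypothetical three-row table with the same column names as the slice above.
toy = pd.DataFrame({
    "beer_servings": [0, 89, 25],
    "spirit_servings": [0, 132, 0],
    "wine_servings": [0, 54, 14],
})

col_max = toy.max(axis=0)       # largest value in each column
row_max = toy.max(axis=1)       # largest value in each row
top_drink = toy.idxmax(axis=1)  # label of the column holding each row's maximum

print(top_drink.tolist())
```

When a row has ties, `idxmax` reports the first column that holds the maximum.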

applymap() is a DataFrame method that applies a function to every cell of a DataFrame

In [49]:
drinks.loc[:,'beer_servings':'wine_servings'].applymap(float)
Out[49]:
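A note on versions: pandas 2.1 and later deprecate `DataFrame.applymap` in favor of `DataFrame.map`, which does the same thing. The sketch below picks whichever one is available, and also shows `astype`, which is the more direct route when the goal is only a type conversion. The `toy` frame is hypothetical.

```python
import pandas as pd

# Hypothetical two-row frame with integer columns.
toy = pd.DataFrame({"beer_servings": [0, 89], "wine_servings": [0, 54]})

# Prefer DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
elementwise = toy.map if hasattr(toy, "map") else toy.applymap
as_float_cells = elementwise(float)

# For a plain type conversion, astype avoids calling a function per cell.
as_float_cast = toy.astype(float)

print(as_float_cast.dtypes.tolist())
```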
> More .apply() examples
In [2]:
import pandas as pd
In [3]:
flight = pd.read_csv('http://bit.ly/flight_stuff')
In [4]:
flight.head()
Out[4]:

Goal: convert the duration column (strings such as '2h 5m') into a new integer column, dur_mins, holding the duration in minutes

In [5]:
d = flight.duration[0]
d
Out[5]:
'2h 5m'
In [6]:
type(d)
Out[6]:
str
In [7]:
int(d.split(' ')[0].replace('h',''))*60 + int(d.split(' ')[1].replace('m',''))
Out[7]:
125
In [18]:
int(d.split()[0].replace('h',''))*60 + int(d.split()[1].replace('m',''))
Out[18]:
125
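Both expressions give 125 here, but `split(' ')` and `split()` are not interchangeable in general. A small sketch of the difference, using a made-up string with two spaces between the parts:

```python
# Hypothetical input with a doubled space between hours and minutes.
d = "2h  5m"

# split(' ') cuts at every single space, so a doubled space yields an empty piece.
by_space = d.split(" ")

# split() with no argument cuts on any run of whitespace and drops empty pieces.
by_whitespace = d.split()

print(by_space, by_whitespace)
```

With `split(' ')`, index 1 would be the empty string and `int('')` would raise, so the argument-free form is the more forgiving choice.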
In [13]:
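def dur(d):
    # Convert a string like '2h 5m' to minutes; assumes both the hour and minute parts are present
    parts = d.split()
    return int(parts[0].replace('h', '')) * 60 + int(parts[1].replace('m', ''))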
In [16]:
flight['dur_mins'] = flight.duration.apply(dur)
In [17]:
flight
Out[17]:
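As an alternative to a per-row function, the conversion can be sketched with `Series.str.extract` and a regular expression. This version also tolerates strings that carry only hours or only minutes. Whether the flight data contains such strings is an assumption; the `durations` Series below is made up for illustration.

```python
import pandas as pd

# Hypothetical durations, including hour-only and minute-only forms.
durations = pd.Series(["2h 5m", "19h", "45m"])

# Two optional capture groups: digits before 'h' and digits before 'm'.
parts = durations.str.extract(r"(?:(\d+)h)?\s*(?:(\d+)m)?")

# A missing part comes back as NaN; treat it as zero.
parts = parts.apply(pd.to_numeric).fillna(0)

dur_mins = (parts[0] * 60 + parts[1]).astype(int)

print(dur_mins.tolist())
```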