
Top Engineering Colleges in India Based on NIRF Ranking

Today I selected a dataset that lists the top engineering colleges in India based on the NIRF rankings. The dataset contains the following data:

  • Institute ID
  • Name of Institute
  • 5 Parameters for Ranking (TLR, RPC, GO, OI, PR)
  • City
  • State
  • Rank

NIRF ranks institutions on the following 5 parameters:

  1. Teaching, Learning & Resources (TLR)

  2. Research and Professional Practice (RPC)

  3. Graduation Outcomes (GO)

  4. Outreach and Inclusivity (OI)

  5. Peer Perception (PR)

Each parameter is rated out of 100.

The NIRF evaluation framework document is available here: https://www.nirfindia.org/nirfpdfcdn/2020/framework/Engineering.pdf

Acknowledgements: All copyrights are owned by NIRF, MHRD, India. https://www.nirfindia.org/Home

I got this dataset from Kaggle: https://www.kaggle.com/nehaprabhavalkar/indian-universities-rankings-2020

System Requirements

Python modules needed for the data analysis:

  • Pandas

  • Matplotlib

  • Seaborn

  • NumPy

In [1]:
project_name = "engineering-college-ranking" # change this
In [2]:
!pip install jovian --upgrade -q
In [3]:
!pip install numpy pandas matplotlib seaborn --upgrade --quiet
In [4]:
import jovian
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline


In [ ]:
jovian.commit(project=project_name)
[jovian] Attempting to save notebook..

Retrieving and Cleaning Data

In [6]:
from urllib.request import urlretrieve

urlretrieve('https://gist.githubusercontent.com/champraku/880fae1919cf6e8a3f47ebf4afb5b46d/raw/f8070953f42d44023a5335d86831b638d5038bfc/engineering.csv', 'engineering.csv')
Out[6]:
('engineering.csv', <http.client.HTTPMessage at 0x7f674b6d9340>)
In [7]:
eng_clg_df = pd.read_csv('engineering.csv')
In [8]:
shape = eng_clg_df.shape
shape
Out[8]:
(200, 10)
In [9]:
print("This Dataset has {0} Rows and {1} Columns".format(shape[0], shape[1]))
This Dataset has 200 Rows and 10 Columns
In [10]:
# info() prints its summary itself and returns None, so no print() is needed
eng_clg_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   institute_id  200 non-null    object
 1   name          200 non-null    object
 2   tlr           200 non-null    float64
 3   rpc           200 non-null    float64
 4   go            200 non-null    float64
 5   oi            200 non-null    float64
 6   perception    200 non-null    float64
 7   city          200 non-null    object
 8   state         200 non-null    object
 9   rank          200 non-null    int64
dtypes: float64(5), int64(1), object(4)
memory usage: 15.8+ KB

The info() method lists each column along with its non-null count and the type of data it holds.
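Every column above shows 200 non-null values, so the dataset has no missing entries. The same check can be done programmatically. This is a minimal sketch on a hypothetical toy frame that reuses some of the dataset's column names; the values (including the deliberately missing score) are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame with some of engineering.csv's column names;
# the values are made up for illustration, not taken from the dataset.
df = pd.DataFrame({
    "institute_id": ["ID-1", "ID-2", "ID-3"],
    "name": ["College A", "College B", "College C"],
    "tlr": [90.0, 75.5, None],  # one deliberately missing score
    "state": ["Tamil Nadu", "Delhi", "Kerala"],
    "rank": [1, 2, 3],
})

# Column dtypes: the same information as the Dtype column of info()
print(df.dtypes)

# Missing values per column; info() reports the complement (the non-null count)
missing = df.isna().sum()
print(missing)
```

On the real `eng_clg_df`, every entry of `eng_clg_df.isna().sum()` should be 0, matching the 200 non-null counts from info().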

In [11]:
eng_clg_df
Out[11]:

Displaying the DataFrame on its own prints only the first 5 and the last 5 rows.
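The 5 + 5 rows come from pandas display options: a frame longer than `display.max_rows` is truncated to `display.min_rows` rows. Here is a small sketch using a hypothetical 200-row frame of the same length as the dataset.

```python
import pandas as pd

# Hypothetical 200-row frame, the same length as engineering.csv
df = pd.DataFrame({"rank": range(1, 201)})

# Frames longer than display.max_rows are cut down to display.min_rows rows
print(pd.get_option("display.max_rows"))  # 60 by default
print(pd.get_option("display.min_rows"))  # 10 by default: 5 from the top, 5 from the bottom

truncated = repr(df)  # contains a "..." row in the middle

# Temporarily lift the limit to render every row without changing global settings
with pd.option_context("display.max_rows", None):
    full = repr(df)
```

Using `option_context` keeps the change local to the `with` block, so later cells still get the compact display.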

In [12]:
eng_clg_df.head(10)
Out[12]:
In [13]:
eng_clg_df.columns
Out[13]:
Index(['institute_id', 'name', 'tlr', 'rpc', 'go', 'oi', 'perception', 'city',
       'state', 'rank'],
      dtype='object')

The dataset has 10 columns. I'm going to remove the "institute_id" column, because it is of no use for this analysis.
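The cells shown in this notebook don't include the one that actually drops the column. One way to do it is `DataFrame.drop` with the `columns` argument. This sketch uses a hypothetical two-row frame with made-up values in place of the real `eng_clg_df`.

```python
import pandas as pd

# Hypothetical toy rows using some of the dataset's column names; values are made up
eng_clg_df = pd.DataFrame({
    "institute_id": ["ID-1", "ID-2"],
    "name": ["College A", "College B"],
    "tlr": [90.0, 80.0],
    "state": ["Tamil Nadu", "Delhi"],
    "rank": [1, 2],
})

# drop() returns a new DataFrame; reassign so later cells see the reduced frame
eng_clg_df = eng_clg_df.drop(columns=["institute_id"])
print(eng_clg_df.columns.tolist())
```

Reassigning the result is generally preferred over `inplace=True`, because it keeps each cell's effect explicit.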

In [14]:
import jovian
In [15]:
jovian.commit()
[jovian] Attempting to save notebook..
[jovian] Updating notebook "champraku/engineering-college-ranking" on https://jovian.ml/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ml/champraku/engineering-college-ranking

Exploratory Analysis and Visualization

Pandas Profiling can generate an automated report with visualizations of the whole dataset.

In [16]:
# By using Profile Reporting
# !pip install pandas_profiling --quiet
# from pandas_profiling import ProfileReport
In [17]:
# ProfileReport(eng_clg_df)

Teaching, Learning & Resources (TLR)

  • Student Strength including Doctoral Students (SS)

  • Faculty-student ratio with emphasis on permanent faculty (FSR)

  • Combined metric for Faculty with PhD (or equivalent) and Experience (FQE)

  • Financial Resources and their Utilisation (FRU)

Research and Professional Practice (RP)

  • Combined metric for Publications (PU)

  • Combined metric for Quality of Publications (QP)

  • IPR and Patents: Published and Granted (IPR)

  • Footprint of Projects and Professional Practice (FPPP)

Graduation Outcomes (GO)

  • Metric for University Examinations (GUE)

  • Metric for Number of Ph.D. Students Graduated (GPHD)

Outreach and Inclusivity (OI)

  • Percentage of Students from Other States/Countries (Region Diversity RD)

  • Percentage of Women (Women Diversity WD)

  • Economically and Socially Challenged Students (ESCS)

  • Facilities for Physically Challenged Students (PCS)

Perception (PR) Ranking

  • Peer Perception

  • Academic Peers and Employers (PR)

For more information, see https://www.nirfindia.org/Documents

In [18]:
tlr_df = eng_clg_df.sort_values('tlr', ascending=False).head(10)
tlr_df
Out[18]:
In [19]:
# clg_state_df = eng_clg_df.groupby('state').count()
In [20]:
eng_clg_df['state'].unique()
Out[20]:
array(['Tamil Nadu', 'Delhi', 'Maharashtra', 'Uttar Pradesh',
       'West Bengal', 'Uttarakhand', 'Assam', 'Telangana',
       'Madhya Pradesh', 'Jharkhand', 'Karnataka', 'Odisha', 'Kerala',
       'Gujarat', 'Punjab', 'Bihar', 'Rajasthan', 'Himachal Pradesh',
       'Haryana', 'Andhra Pradesh', 'Meghalaya', 'Chhattisgarh',
       'Chandigarh', 'Tripura', 'Goa', 'Jammu and Kashmir', 'Pondicherry',
       'Arunachal Pradesh', 'Manipur'], dtype=object)
In [21]:
state_count_df = eng_clg_df['state'].value_counts()
state_count_df
Out[21]:
Tamil Nadu           34
Maharashtra          22
Karnataka            21
Telangana            15
Uttar Pradesh        11
Andhra Pradesh       10
Punjab                8
West Bengal           8
Odisha                7
Gujarat               7
Delhi                 7
Kerala                6
Haryana               6
Madhya Pradesh        5
Rajasthan             4
Himachal Pradesh      4
Uttarakhand           4
Jharkhand             4
Assam                 3
Chandigarh            2
Pondicherry           2
Arunachal Pradesh     2
Bihar                 2
Jammu and Kashmir     1
Manipur               1
Tripura               1
Chhattisgarh          1
Goa                   1
Meghalaya             1
Name: state, dtype: int64
In [22]:
plt.xlabel('State')
plt.ylabel('Number of Colleges')

plt.title("States Count")

state_count_df.plot(kind='bar');
[Figure: States Count, bar chart of the number of colleges from each state]

This shows that Tamil Nadu has 34 colleges in the top 200 list, followed by Maharashtra with 22 and Karnataka with 21.

Next, I create a new column called 'overall_score' that stores the sum of all 5 parameter scores.

In [23]:
eng_clg_df['overall_score'] = eng_clg_df.tlr + eng_clg_df.rpc + eng_clg_df.oi + eng_clg_df.go + eng_clg_df.perception
eng_clg_df
Out[23]:
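The same unweighted total (out of 500) can be written as a row-wise sum over a column list. NIRF itself combines the parameters with weights rather than a plain sum, so `overall_score` will not necessarily reproduce the `rank` column. The sketch below also computes a weighted variant. The weights are an assumption, taken from our reading of the NIRF 2020 engineering framework, and should be checked against the framework PDF linked above. The `weighted_score` column name and all scores are hypothetical.

```python
import pandas as pd

# Hypothetical scores (out of 100) for two made-up colleges
eng_clg_df = pd.DataFrame({
    "name": ["College A", "College B"],
    "tlr": [90.0, 60.0],
    "rpc": [80.0, 40.0],
    "go": [70.0, 50.0],
    "oi": [60.0, 70.0],
    "perception": [50.0, 30.0],
})

score_cols = ["tlr", "rpc", "go", "oi", "perception"]

# Unweighted total (max 500), equivalent to adding the five columns by hand
eng_clg_df["overall_score"] = eng_clg_df[score_cols].sum(axis=1)

# ASSUMPTION: weights as we read them in the NIRF 2020 engineering framework;
# verify against the framework PDF before relying on them.
weights = pd.Series({"tlr": 0.30, "rpc": 0.30, "go": 0.20, "oi": 0.10, "perception": 0.10})

# mul() aligns the Series index with the column names, then sum across each row
eng_clg_df["weighted_score"] = eng_clg_df[score_cols].mul(weights).sum(axis=1)
print(eng_clg_df[["name", "overall_score", "weighted_score"]])
```

With a column list, adding or removing a parameter means editing one list instead of a long chain of `+` expressions.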
In [24]:
overall_score_df = eng_clg_df.sort_values('overall_score', ascending=False).head(30)
overall_score_df
Out[24]:
In [25]:
state_count = overall_score_df['state'].value_counts()
state_count
Out[25]:
Tamil Nadu          4
Kerala              3
West Bengal         3
Telangana           2
Maharashtra         2
Odisha              2
Delhi               2
Uttar Pradesh       2
Rajasthan           1
Jharkhand           1
Assam               1
Bihar               1
Madhya Pradesh      1
Karnataka           1
Uttarakhand         1
Himachal Pradesh    1
Gujarat             1
Punjab              1
Name: state, dtype: int64
In [26]:
plt.xlabel('State')
plt.ylabel('Number of Colleges in Top 30')

plt.title("Total Number of Colleges from Each State in Top 30")
state_count.plot(kind='bar');
[Figure: Total Number of Colleges from Each State in Top 30]

The graph above shows that, based on overall score, Tamil Nadu again tops the list with 4 colleges in the top 30, followed by Kerala and West Bengal with 3 each.


Which state has the most colleges in the top 200 list?

I already plotted a graph for this question above, so here I show only the top 10 states, this time using Matplotlib with a Seaborn style.

In [29]:
state_count_df = eng_clg_df['state'].value_counts().head(10)
state_count_df
Out[29]:
Tamil Nadu        34
Maharashtra       22
Karnataka         21
Telangana         15
Uttar Pradesh     11
Andhra Pradesh    10
Punjab             8
West Bengal        8
Odisha             7
Gujarat            7
Name: state, dtype: int64
In [30]:
sns.set_style("darkgrid")
plt.rcParams['font.size'] = 14
plt.rcParams['figure.figsize'] = (20, 5)
plt.rcParams['figure.facecolor'] = '#00000000'
In [31]:
plt.xlabel('State')
plt.ylabel('No of colleges')

plt.title("Total Number of Colleges from Each State")

plt.plot(state_count_df, 's--r')

Out[31]:
[<matplotlib.lines.Line2D at 0x7f6747ccaa00>]
[Figure: Total Number of Colleges from Each State, line plot of the top 10 states]
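A line plot connects the states as if they lay on a continuous axis, but states are categories, so a bar chart is arguably a better fit. The following sketch uses Matplotlib's object-oriented API with the top-10 counts copied from the value_counts() output above. The Agg backend line is only there so the sketch also runs outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside Jupyter too
import matplotlib.pyplot as plt
import pandas as pd

# Top-10 counts copied from the value_counts() output above
state_count_df = pd.Series(
    [34, 22, 21, 15, 11, 10, 8, 8, 7, 7],
    index=["Tamil Nadu", "Maharashtra", "Karnataka", "Telangana", "Uttar Pradesh",
           "Andhra Pradesh", "Punjab", "West Bengal", "Odisha", "Gujarat"],
    name="state",
)

fig, ax = plt.subplots(figsize=(12, 5))
# One bar per state: categorical x-values, no implied ordering between them
bars = ax.bar(state_count_df.index, state_count_df.values, color="tab:red")
ax.set_xlabel("State")
ax.set_ylabel("Number of colleges")
ax.set_title("Total Number of Colleges from Each State (Top 10)")
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
```

Working on an explicit `ax` also avoids having labels set before the plot call land on a different figure than intended.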

Which colleges have high TLR, RPC, and GO scores (out of 100)?

In [34]:
tlr_scores_df = eng_clg_df.sort_values('tlr', ascending=False).head(10)
tlr_scores_df
Out[34]:
In [35]:
rpc_scores_df = eng_clg_df.sort_values('rpc', ascending=False).head(10)
rpc_scores_df
Out[35]:
In [36]:
major_score_df = eng_clg_df[(eng_clg_df.tlr >= 50) & (eng_clg_df.go >= 50) & (eng_clg_df.rpc >= 50)][['name', 'tlr', 'rpc', 'go', 'city', 'state']]
major_score_df.shape
major_score_df
Out[36]:
In [37]:
sns.set_style("darkgrid")
plt.rcParams['font.size'] = 14
plt.rcParams['figure.figsize'] = (20, 5)
plt.rcParams['figure.facecolor'] = '#00000000'
In [38]:
plt.xticks(rotation=90)

plt.xlabel('College Name')
plt.ylabel('Score (Out of 100)')

plt.title("TLR Scores of Colleges Scoring 50+ in TLR, RPC and GO")

plt.bar(major_score_df.name, major_score_df.tlr)
Out[38]:
<BarContainer object of 18 artists>
[Figure: bar chart of TLR scores for the 18 colleges scoring 50+ in TLR, RPC and GO]

Out of 200 colleges, only 18 score 50 or more in all three of TLR, RPC, and GO.
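The chained `&` conditions can also be written as a single comparison over a column list followed by `.all(axis=1)`. This scales better if more parameters are added. This is a sketch on hypothetical made-up scores, not the real data.

```python
import pandas as pd

# Hypothetical scores for three made-up colleges
eng_clg_df = pd.DataFrame({
    "name": ["College A", "College B", "College C"],
    "tlr": [72.0, 55.0, 48.0],
    "rpc": [65.0, 49.0, 60.0],
    "go": [80.0, 70.0, 75.0],
})

cols = ["tlr", "rpc", "go"]
# True only where every listed score is at least 50 (same as chaining & conditions)
mask = (eng_clg_df[cols] >= 50).all(axis=1)
major_score_df = eng_clg_df.loc[mask, ["name"] + cols]

# mask.sum() counts qualifying rows (18 on the real dataset, per the text above)
print(mask.sum())
```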


Which 15 colleges have the highest GO scores (out of 100)?

A high Graduation Outcomes (GO) score indicates how well a college's students were placed over the past 3 years and what median salary they received. It also reflects the number of Ph.D. students who have graduated from the college.

In [41]:
go_scores_df = eng_clg_df.sort_values('go', ascending=False).head(15)
go_scores_df
Out[41]:
In [42]:
sns.set_style("whitegrid")
plt.rcParams['font.size'] = 14
plt.rcParams['figure.figsize'] = (20, 5)
plt.rcParams['figure.facecolor'] = '#00000000'
In [43]:
plt.xticks(rotation=90)

plt.xlabel('College Name')
plt.ylabel('Score (Out of 100)')

plt.title("Top 15 Colleges Based on GO Score")

plt.bar(go_scores_df.name, go_scores_df.go)
Out[43]:
<BarContainer object of 15 artists>
[Figure: Top 15 Colleges Based on GO Score]

Most of the IITs top this list with high GO scores, presumably because they combine strong academics with extra-curricular activities.
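To put a number on "most of the IITs", one option is to flag names containing the IIT prefix with `Series.str.contains`. This sketch assumes the CSV spells IIT names out in full as "Indian Institute of Technology ...". That is an assumption, so check the real `name` column first. The placeholder names and scores below are hypothetical.

```python
import pandas as pd

# Hypothetical placeholder names and made-up GO scores; check the real CSV's naming first
go_scores_df = pd.DataFrame({
    "name": ["Indian Institute of Technology X",
             "Indian Institute of Technology Y",
             "College Z"],
    "go": [88.0, 86.5, 80.0],
})

# Literal, case-insensitive substring match on the (assumed) IIT prefix
is_iit = go_scores_df["name"].str.contains(
    "Indian Institute of Technology", case=False, regex=False
)
print(f"{is_iit.sum()} of {len(go_scores_df)} colleges are IITs")
```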


Top 15 colleges based on consolidated score out of 500

In [46]:
top_overall_df = eng_clg_df.head(15)
top_overall_df
Out[46]:
In [47]:
sns.pairplot(top_overall_df)
Out[47]:
<seaborn.axisgrid.PairGrid at 0x7f6747f571c0>
[Figure: pair plot of the numeric columns for the top 15 colleges]
In [48]:
sns.set_style("whitegrid")
plt.rcParams['font.size'] = 14
plt.rcParams['figure.figsize'] = (20, 5)
plt.rcParams['figure.facecolor'] = '#00000000'
In [49]:
plt.xticks(rotation=90)

plt.xlabel('College Name')
plt.ylabel('Score (Out of 500)')

plt.title("Top 15 Colleges Based on Overall Score")

plt.bar(top_overall_df.name, top_overall_df.overall_score)
Out[49]:
<BarContainer object of 15 artists>
[Figure: Top 15 Colleges Based on Overall Score]

I also tried a pair plot, but it is hard to draw clear inferences from it.
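A correlation matrix of the five parameter scores is a more compact alternative to a pair plot. It gives one number per pair instead of one scatter panel per pair. The sketch below uses hypothetical made-up scores. With only 15 colleges, any correlations on the real data would be noisy.

```python
import pandas as pd

# Hypothetical scores for four made-up colleges
top_overall_df = pd.DataFrame({
    "tlr": [90.0, 85.0, 70.0, 60.0],
    "rpc": [88.0, 80.0, 65.0, 50.0],
    "go": [80.0, 82.0, 70.0, 68.0],
    "oi": [55.0, 60.0, 58.0, 62.0],
    "perception": [95.0, 85.0, 60.0, 40.0],
})

# Pairwise Pearson correlations between the five parameter scores
corr = top_overall_df.corr()
print(corr.round(2))

# In the notebook, sns.heatmap(corr, annot=True) would draw this as an annotated colour grid.
```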


Top and Bottom 10 colleges

In [52]:
top_clg_df = eng_clg_df.head(10)
top_clg_df
Out[52]:
In [53]:
bottom_clg_df = eng_clg_df.tail(10)
bottom_clg_df
Out[53]:
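`head(10)` and `tail(10)` return the first and last rows in file order. That matches the `overall_score` ordering only if the rows are already sorted by it. `nlargest`/`nsmallest` select by value regardless of row order. This is a sketch on hypothetical rows that are deliberately stored out of score order.

```python
import pandas as pd

# Hypothetical rows, deliberately stored out of score order
eng_clg_df = pd.DataFrame({
    "name": ["College A", "College B", "College C", "College D"],
    "overall_score": [310.0, 355.0, 190.0, 240.0],
})

# head()/tail() follow row order; nlargest()/nsmallest() select by value
top_clg_df = eng_clg_df.nlargest(2, "overall_score")
bottom_clg_df = eng_clg_df.nsmallest(2, "overall_score")
print(top_clg_df["name"].tolist())     # ['College B', 'College A']
print(bottom_clg_df["name"].tolist())  # ['College C', 'College D']
```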
In [54]:
plt.xticks(rotation=90)

plt.xlabel('College Name')
plt.ylabel('Score (Out of 500)')

plt.title("Top 10 Colleges Based on Overall Score")

plt.plot(top_clg_df.name, top_clg_df.overall_score, 'x--b')
Out[54]:
[<matplotlib.lines.Line2D at 0x7f67197125b0>]
[Figure: Top 10 Colleges Based on Overall Score]
In [55]:
plt.xticks(rotation=90)

plt.xlabel('College Name')
plt.ylabel('Score (Out of 500)')

plt.title("Bottom 10 Colleges Based on Overall Score")

plt.plot(bottom_clg_df.name, bottom_clg_df.overall_score, 'x--b')
Out[55]:
[<matplotlib.lines.Line2D at 0x7f67447dc3d0>]
[Figure: Bottom 10 Colleges Based on Overall Score]

Inferences and Conclusion

Based on this analysis, Tamil Nadu appears to offer good-quality engineering education compared with other states: it has the most colleges in the top 200 (34) and the most in the top 30 by overall score (4).
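The conclusion rests mainly on how many colleges each state has. One way to test it further is to look at the average `overall_score` per state next to the count. That way, a state with many lower-scoring colleges is not mistaken for one whose colleges are uniformly strong. This is a sketch on hypothetical toy data. The `count` and `mean_score` column names are our own choice.

```python
import pandas as pd

# Hypothetical toy data; substitute eng_clg_df from the notebook
eng_clg_df = pd.DataFrame({
    "state": ["Tamil Nadu", "Tamil Nadu", "Tamil Nadu", "Kerala", "Delhi"],
    "overall_score": [300.0, 200.0, 160.0, 280.0, 330.0],
})

# Count reflects how many colleges a state has; the mean reflects their typical score
state_summary = (
    eng_clg_df.groupby("state")["overall_score"]
    .agg(count="count", mean_score="mean")
    .sort_values("mean_score", ascending=False)
)
print(state_summary)
```

In this toy example, the state with the most colleges has the lowest mean, which is exactly the case that a count-only comparison cannot detect.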


References and Future Work

Since this is my first data analysis project, I realized that I need a lot more practice. I struggled a lot while working on it, but I'm happy with what I achieved.

Thanks to FreeCodeCamp.org and Jovian.ml for giving me the opportunity to work in an entirely new field. I'm glad I learned something new, and I will keep working to improve this analysis.

Thanks a lot.

Analyze open data sets using pandas in a Python notebook: https://medium.com/ibm-data-science-experience/analyze-open-data-sets-using-pandas-in-a-python-notebook-64e93776370a

Indian Universities Rankings [2020] Rankings of Indian Universities in the fields of Engineering, Medical, Law, etc.: https://www.kaggle.com/nehaprabhavalkar/indian-universities-rankings-2020

Complete Guide to Data Visualization with Python: https://towardsdatascience.com/complete-guide-to-data-visualization-with-python-2dd74df12b5e
