Jovian
⭐️
Sign In

Analyzing K-12 Student Data for School Year 2021

Year 2020 has been a challenging year. On March 13, 2020, all public schools in California were shut down due to COVID-19. All students ended their 2020 school year in a remote learning setting. Early August, all students had to start school year 2021 remotely still. I work in a school district as a Database Manager in Southern California. Our school district is a K-12 school district and have roughly 50,000 students and 50 schools. I work in Information Technology department and my main role is to maintain and support data in any information system that school district is using. Because of school closures since March this year, a lot of changes in the systems were made because students can physically go to schools to get instructions. Because of these changes, we were able to collect some new data points this school year to analyze. It is interesting to see what these new data points can tell us.

As a first step, let's upload our Jupyter notebook to Jovian.ml.

In [1]:
project_name = "analyzing-k12-student-data-2021" # change this
In [2]:
!pip install jovian --upgrade -q
In [3]:
import jovian
In [4]:
jovian.commit(project=project_name)
[jovian] Attempting to save notebook.. [jovian] Please enter your API key ( from https://jovian.ml/ ): API KEY: ········ [jovian] Updating notebook "yinyinw/analyzing-k12-student-data-2021" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/yinyinw/analyzing-k12-student-data-2021

Data Preparation and Cleaning

Data source will be from our SIS (Student Information System). Our SIS is using Microsoft SQL Server as database server so I wrote a T-SQL script to pull student data from database server. Since student data is confidential, I didn't include any sensitive student data. I also replaced unique identifier for each school and student in SIS with some unique ID that I created just for this project.

Here is some information of data file I am using:

  • File name is data-science-project-2021-k12.csv.
  • Only students who are currently enrolled in school district are included in the file.
  • Fields included in the file are below:
    • SchoolType: Traditional or Year Round. (Year Round means these schools have year round schedule. Students are divided into different tracks to accommodate student populations)
    • EnrollmentType: Traditional or Virtual. (In summer, we gave parents opportunity to select learning model they would like to have for their kids once students can return to school. Virtual meaning remotely learning and Traditional meaning students would go back to school physically when condition is allowed.)
    • SchoolCode: Student's enrolled school. This data is created just for this project.
    • Track: Student's primary track.
    • Grade: Student's grade level in school year 2021.
    • GradeLevel: Elementary, Intermediate or High
    • StudentID: Student's unique identifier. This data is created just for this project.
    • HasLaptop: This is created on these conditions: 1. if student answered "Yes" on survey question "Do you have laptop access?" or 2. if student has checked out a laptop from district.
    • HasInternet: This is created on these conditions: 1. if student answered "Yes" on survey question "Do you have internet at home?" or 2. if student has checked out a hot spot from district.
    • LaptopCheckedOut: The student has checked out a laptop from district or not.
    • PCSurveyLaptop: Student's answer on survey question "Do you have laptop access?"
    • MiFiCheckedOut: The student has checked out a hot spot from district or not.
    • PCSurveyInternet: Student's answer on survey question "Do you have internet at home?"
    • PCSurveyInstrProg: Which instruction program (learning model) do you want to have? We assigned students into different learning models based on their response.
    • TotalStudentInHousehold: Total current students in our school district in the same household.
    • IsSED: Is the student in Socio-Economic Disadvantage group?
    • IsSPED: Is the student in Special Education program?
    • SPEDProgram: If the student is in SPED, which SPED program he/she is in?
    • IsEL: Is the student in English Learner program?
    • ELPAC: ELPAC proficiency level. ELPAC is an English Learner exam in California.
    • IsDLI: Is the student in Dual Immersion program? Dual Immersion is a program which offers bi-lingual learning environment.
    • IsGATE: Is the student in GATE program? GATE means Gift And Talented Education.
    • IsGNP: Is the student in GATE Magnet Program?
    • IsFoster: Is the student a foster student?
    • Gender: Student's gender.
    • Ethnicity: Student's ethnicity.
    • ParentHighestEdLevel: Parent's highest education level.
    • DaysEnrolled: Total days enrolled in school year 2021 up to 9/25/20.
    • DaysPresent: Total present days in school year 2021 up to 9/25/20.
    • PresentRate: This is calculated by using DaysPresent / DaysEnrolled.
    • CountT1P_N_IDF: Count "N's" (for grade K-3) or "In Danger of Failing" (for grade 4-6) courses for elementary students for T1 term progress report.
    • CountT1_N_DF: Count of "N's" (for grade K-3) or "D's and F's" (for grade 4-6) courses for elementary students for T1 term marks. Only some students in year round schools have T1 marks so far.
    • CountQ1P_IDF: Count of "In Danger of Failing" courses for secondary (intermediate and high school) students for Q1 term progress report.
    • CountQ1I_DF: Count of "D's and F's" courses for secondary (intermediate and high school) students for Q1 term interim report. Only a few students get this mark.
In [5]:
import numpy as np
In [6]:
import pandas as pd
In [7]:
data_raw_df = pd.read_csv('data-science-project-2021-k12.csv')
In [8]:
data_raw_df
Out[8]:

There are 34 columns and 51,354 rows in this data file. That means there are currently 51,354 students enrolled in our school district.

In [9]:
data_raw_df.columns
Out[9]:
Index(['SchoolType', 'EnrollmentType', 'SchoolCode', 'Track', 'Grade',
       'GradeLevel', 'StudentID', 'HasLaptop', 'HasInternet',
       'LaptopCheckedOut', 'PCSurveyLaptop', 'MiFiCheckedOut',
       'PCSurveyInternet', 'PCSurveyInstrProg', 'TotalStudentInHousehold',
       'IsSED', 'IsSPED', 'SPEDProgram', 'IsEL', 'ELPAC', 'IsDLI', 'IsGATE',
       'IsGNP', 'IsFoster', 'Gender', 'Ethnicity', 'ParentHighestEdLevel',
       'DaysEnrolled', 'DaysPresent', 'PresentRate', 'CountT1P_N_IDF',
       'CountT1_N_DF', 'CountQ1P_IDF', 'CountQ1I_DF'],
      dtype='object')

New data points since school year 2021 are EnrollmentType, HasLaptop, HasInternet, LaptopCheckedOut, PCSurveyLaptop, MiFiCheckedOut, PCSurveyInternet, PCSurveyInstrProg.

In [10]:
data_raw_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 51354 entries, 0 to 51353 Data columns (total 34 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 SchoolType 51354 non-null object 1 EnrollmentType 51354 non-null object 2 SchoolCode 51354 non-null int64 3 Track 51354 non-null object 4 Grade 51354 non-null object 5 GradeLevel 51354 non-null object 6 StudentID 51354 non-null int64 7 HasLaptop 51354 non-null object 8 HasInternet 51354 non-null object 9 LaptopCheckedOut 51354 non-null object 10 PCSurveyLaptop 51354 non-null object 11 MiFiCheckedOut 51354 non-null object 12 PCSurveyInternet 51354 non-null object 13 PCSurveyInstrProg 51354 non-null object 14 TotalStudentInHousehold 51354 non-null int64 15 IsSED 51354 non-null object 16 IsSPED 51354 non-null object 17 SPEDProgram 6910 non-null object 18 IsEL 51354 non-null object 19 ELPAC 5672 non-null float64 20 IsDLI 51354 non-null object 21 IsGATE 51354 non-null object 22 IsGNP 51354 non-null object 23 IsFoster 51354 non-null object 24 Gender 51354 non-null object 25 Ethnicity 51354 non-null object 26 ParentHighestEdLevel 51354 non-null object 27 DaysEnrolled 51354 non-null float64 28 DaysPresent 51354 non-null float64 29 PresentRate 51354 non-null float64 30 CountT1P_N_IDF 51354 non-null int64 31 CountT1_N_DF 51354 non-null int64 32 CountQ1P_IDF 51354 non-null int64 33 CountQ1I_DF 51354 non-null int64 dtypes: float64(4), int64(7), object(23) memory usage: 13.3+ MB
In [11]:
data_raw_df.describe()
Out[11]:

It looks like max number of students in a household is 41. Not sure if this is bad data since 41 students in a household seems high. Let's find out.

In [12]:
data_raw_df[data_raw_df.TotalStudentInHousehold == 41]
Out[12]:

Since the data file was created without any confidential student information, there is no way to identify why there is a household with 41 students so I checked SIS directly. It turns out that there are 41 students with no Household ID in SIS (bad data). I need to contact school sites to have them to correct the data and also I need to change their "TotalStudentInHousehold" to 0.

In [13]:
k12_data = data_raw_df.copy()

Always make a copy of raw data before change data.

In [14]:
k12_data.loc[k12_data.TotalStudentInHousehold == 41, 'TotalStudentInHousehold'] = 0
In [15]:
k12_data.describe()
Out[15]:

Now max number of students in a household seems making more sense.

Here are some findings based on these integer columns:

  • Total active student count is 51,354.
  • Average number of students in a household in the school district is 1.74. Max number of students in a household is 7.
  • Average student enrolled days in this school year is 32.78. Max is 57. Because some elementary schools are year-round schools in our school district and their first day of school was 7/6/2020 so it is reasonable students having 57 enrolled days.
In [16]:
k12_data['StudentCount'] = 1

Add a new column "StudentCount" so it is easier to count student in different category later.

In [17]:
import jovian
In [18]:
jovian.commit()
[jovian] Attempting to save notebook.. [jovian] Updating notebook "yinyinw/analyzing-k12-student-data-2021" on https://jovian.ml/ [jovian] Uploading notebook.. [jovian] Capturing environment.. [jovian] Committed successfully! https://jovian.ml/yinyinw/analyzing-k12-student-data-2021

Exploratory Analysis and Visualization

This section is to explore data points in the data file using visualization. There are some existing data points that I list here for people to understand our school district's basic demographic. New data points that we collected in school year 2021 due to COVID-19 school closures are listed here for exploring the possible analysis.

In [19]:
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

Grade Level

Let's check how grade level distribution looks like in our school district.

In [20]:
grade_level = k12_data.Grade.value_counts()
grade_level
Out[20]:
10    4350
09    4331
11    4298
12    4167
08    4084
07    4015
05    3832
06    3822
04    3813
03    3609
02    3552
01    3474
KN    3035
KA     538
PS     303
TR     130
IN       1
Name: Grade, dtype: int64
In [21]:
plt.figure(figsize=(10,6))
plt.title("Student Grade Level in School Year 2021")
sns.barplot(grade_level.index, grade_level);

In general, K-12 student counts are evenly distributed with high and intermediate school students (7-12) slightly more than elementary students (K-12). Some other grade levels that are not K-12 are for some Special Education students that receive our services so they have relatively low counts.

Ethnicity

How about ethnicity distribution? Let's find out.

In [22]:
ethnicity_count = k12_data.Ethnicity.value_counts()
ethnicity_count
Out[22]:
Hispanic                     27536
White                        11953
Asian                         7758
Black or African American     3284
Other                          823
Name: Ethnicity, dtype: int64
In [23]:
plt.figure(figsize=(10,6))
plt.title("Student Ethnicity in School Year 2021")
plt.xticks(rotation=45)
sns.barplot(ethnicity_count.index, ethnicity_count);

Hispanic ethnicity is a clear winner here followed by White and Asian.

In [24]:
plt.figure(figsize=(10,6))
plt.title("Student Ethnicity in School Year 2021")
plt.pie(ethnicity_count, labels=ethnicity_count.index, autopct='%1.1f%%', startangle=180);

Same data but different chart here to show percentage of each ethnicity using pie chart. A little bit more than half of our students are Hispanic.

Virtual vs Traditional Setting

Newly collected in school year 2021

Because of COVID-19, all California schools began school year 2021 in a remote learning (virtual) setting. During summer time, our school district sent out a survey asking parents' preferred learning models (instruction programs) for their students. There are two selections: Virtual and Traditional. Schools schedule students into different enrollment types based on their survey answers. For those parents who didn't answer the survey, default enrollment type is Traditional.

In [25]:
k12_data.EnrollmentType.unique()
Out[25]:
array(['Traditional', 'Virtual'], dtype=object)
In [26]:
virtual_traditional = k12_data.EnrollmentType.value_counts()
virtual_traditional
Out[26]:
Traditional    38678
Virtual        12676
Name: EnrollmentType, dtype: int64
In [27]:
plt.figure(figsize=(10,6))
plt.title("Student Enrollment Type in School Year 2021")
plt.pie(virtual_traditional, labels=virtual_traditional.index, autopct='%1.1f%%', startangle=180);

Based on this pie chart, about a quarter of students are enrolled in Virtual settings in 2021.

Device Checked Out Status

Newly collected in school year 2021

Because of school closures, students can't physically go to school to receive instructions. Students now have to rely on technologies to attend online meetings with teachers or do their homework. In order to support some families who can't afford these technologies, school district has budget to purchase Chromebooks and hot spots for students to check out to use at home. Parents can make requests for Chromebook or/and hot spot on parent portal. Information Technology department gathers these requests and notifies parents to come to district office to pick up devices when inventory allows.

Let's check how many devices have been checked out since March, 2020.

In [28]:
laptop_checked_out = k12_data.LaptopCheckedOut.value_counts()
laptop_checked_out
Out[28]:
No     32752
Yes    18602
Name: LaptopCheckedOut, dtype: int64

Total 18,602 Chromebooks were checked out to students.

In [30]:
mifi_checked_out = k12_data.MiFiCheckedOut.value_counts()
mifi_checked_out
Out[30]:
No     43532
Yes     7822
Name: MiFiCheckedOut, dtype: int64

Total 7,822 hot spots were checked out to families. We only provide one hot spot to each household.

In [31]:
has_laptop = k12_data.HasLaptop.value_counts()
has_laptop
Out[31]:
Yes    42649
No      8705
Name: HasLaptop, dtype: int64
In [32]:
has_internet = k12_data.HasInternet.value_counts()
has_internet
Out[32]:
Yes    45106
No      6248
Name: HasInternet, dtype: int64

I created HasLaptop and HasInternet two fields based on two conditions for "Yes" when I wrote the SQL query to pull data:

  • If student doesn't request for a device.
  • If student has checked out a device from district.

Because we are still waiting for vendors to deliver Chromebooks and hot spots orders, we need to monitor how many students/families still need to check out Chromebooks or/and hot spots to support students' online learning experience.

Based on the numbers, we still have 8,705 students that will need to check out a Chromebook and 6,248 families that will need to check out a hot spot.

In [33]:
plt.figure(figsize=(10,6))
plt.title("Student Has a Laptop to Use?")
plt.pie(has_laptop, labels=has_laptop.index, autopct='%1.1f%%', startangle=180);

Based on this pie chart, there is still 17% of students in our school district currently don't have a laptop to use for attending school remotely.

In [34]:
plt.figure(figsize=(10,6))
plt.title("Student Has internet to Use?")
plt.pie(has_internet, labels=has_internet.index, autopct='%1.1f%%', startangle=180);

And there is 12% of students in our school district currently don't have internet access for attending school remotely.

Student Attendance

One of many ways to track student engagement is thru student attendance. It is proven higher student attendance rate can lead to higher academic performance. Let's see how attendance rate distribution looks like in school year 2021.

In [35]:
plt.figure(figsize=(10, 6))
plt.title("Student Present Rate in School Year 2021")
plt.xlabel('Present Rate')
plt.ylabel('Number of Student')

plt.hist(k12_data.PresentRate, bins=np.arange(0,1,0.1), color='blue');

Here are some findings:

  • Most students have more than 80% Present Rate.
  • Around 200 students with 0% Present Rate. When I pulled data from Student Information System, I found out that there are some classes set up wrong in Student Schedule. The wrong settings actually caused some students having 0% Present Rate, but they actually have much higher Present Rate. Although I contacted SIS team to correct the settings so we won't get bad data again, the changes won't be reflected quickly enough for me to use on this project.
In [36]:
import jovian
In [ ]:
jovian.commit()
[jovian] Attempting to save notebook..

Asking and Answering Questions

Due to new data points collected in this school year, there are some things I am interested in knowing. Here I listed 5 questions that I have in mind that may be answered using this data set.

Q1: What ethnicity and/or gender group(s) of students prefer to start school year in a Virtual setting?

Because students' enrollment type (Traditional or Virtual) was selected by parents, it is interesting to see if there are certain ethnicity or gender group of students having preference over Traditional or Virtual learning model. I actually had the idea when I obeserved my kids' classes. Both of my kids picked Virtual setting and I feel in both of their classes, they have more Asian classmates than they used to have in previous years. Let's see my "feeling" is correct or not.

In [ ]:
eth_virtual = k12_data[k12_data.EnrollmentType == "Virtual"].groupby(['Ethnicity', 'Gender', 'EnrollmentType'], as_index=False)[['StudentCount']].count()
eth_virtual

I am interested in knowing for both ethnicity and gender group so first I did a group by to count only Virtual students by their ethnicity and gender.

In [ ]:
eth_traditional = k12_data[k12_data.EnrollmentType == "Traditional"].groupby(['Ethnicity', 'Gender', 'EnrollmentType'], as_index=False)[['StudentCount']].count()
eth_traditional

Then I did a group by to count only Traditional students by their ethnicity and gender.

In [ ]:
eth_erollmenttype = eth_virtual.merge(eth_traditional, on=["Ethnicity", "Gender"])
eth_erollmenttype

I merged two data sets together so I can have one column showing Virtual count and another column showing Traditional count for eaiser calculation later.

In [ ]:
eth_erollmenttype['VirtualRate'] = eth_erollmenttype.StudentCount_x / (eth_erollmenttype.StudentCount_x + eth_erollmenttype.StudentCount_y)
eth_erollmenttype

I use count of Virtual students / total count of students to generate a new column called Virtual Rate so I can know how many percents of students in each group picked Virtual learning model.

In [ ]:
pivot_eth = eth_erollmenttype.pivot("Ethnicity", "Gender", "VirtualRate")
pivot_eth

After getting Virtual Rate, I prefer to use Heat Map to show the differences from each group so I have to pivot the data set first. Since count of students in Non-Binary gender group is only 3, I decided to exlude the group from my Heat Map so it won't confuse people.

In [ ]:
plt.title("Q1: What ethnicity and/or gender group(s) of students prefer to start school year in a Virtual setting?")
sns.heatmap(pivot_eth[['Female', 'Male']], annot=True, cmap="Blues");

Here are some findings:

  • I use darker color to represent higher Virtual Rate so you can see Asian student group (no matter female or male) have the highest Virtual Rate. (Does that mean "my guess" was right?)
  • I don't see a huge difference between female and male students in each ethnicity group.
  • White student group has the lowest Virtual Rate, only 16% for females and 15% for males.

Q2: For elementary students, does "access to technology" lead to higher student performance?

The biggest goal since school closure for us, Information Technology department, is to try our best to provide students technologies to attend school remotely. Unfortunately, vendors can't provide enough Chromebooks and hot spots at once for our needs so there are still students who have requested to receive technologies, but haven't be able to receive them. Since first progress report for students are available now. Let's do some analysis to see if students with technologies can help them get better grades.

Since elementar schools and secondary schools (intermediate and high schools) have different terms so I have to separate them.

In [ ]:
k6_data = k12_data[k12_data.GradeLevel == "Elementary"]
k6_data.describe()

First step is to create a data set with only elementary students. There are 25,979 elementary students.

The column I am going to use here to analyze is CountT1P_N_IDF. The column is a count of student's total in danger of failing subjects. Based on the summary here, max number of subjects an elementary student can get is 14. Mean is 0.59.

In [ ]:
sns.scatterplot(k6_data.PresentRate, k6_data.CountT1P_N_IDF, hue=k6_data.HasLaptop, s=100);

First I tried to use Scatter Plot to see if students who don't have access to a laptop can form an obvious group on the chart. I don't see there is an obvious separation between two groups.

In [ ]:
sns.scatterplot(k6_data.PresentRate, k6_data.CountT1P_N_IDF, hue=k6_data.HasInternet, s=100);

Same thing for students who don't have access to Internet. There is no obvious separation. I want to try Barplot and see if it gives the answer that I am looking for.

In [ ]:
plt.title("For elementary students, does ''access to a laptop'' lead to higher student performance?")
sns.barplot('HasLaptop', 'CountT1P_N_IDF', data=k6_data);

This Barplot chart does show "students with access to a laptop" have fewer failing subjects. Variation on "No" is higher than "Yes". I know it doesn't neccessary mean students have better academic performance, but this is the only data point we can use to determine "student success" academically now.

Let's check students with access to internet next.

In [ ]:
plt.title("For elementary students, does ''access to internet'' lead to higher student performance?")
sns.barplot('HasInternet', 'CountT1P_N_IDF', data=k6_data);

The differenct in this chart is bigger than the last one. I believe it shows that "whether students having internet access" is a more influential factor for student performance than "whether students having laptop access".

Q3: For elementary students, do students in Virtual learning model perform better or worse?

In [ ]:
plt.title("Q3: For elementary students, do students in Virtual learning model perform better or worse?")
sns.barplot('EnrollmentType', 'CountT1P_N_IDF', data=k6_data);

In this Barplot chart, it shows Virtual students have more failing subjects than Traditional students. Based on the findings, it shows students are learning better in Traditional setting than Virtual setting. However, since I know there have been multiple problems with the platform we picked for Virutal students to use to learn their course content. It is highly possible to be the reason why Virtual students had a rough start in this school year. I want to wait until I can get more data points regarding to student performance to make a more resonable conclusion.

Q4: For secondary students, does "access to technology" lead to higher student performance?

In [ ]:
secondary_data = k12_data[k12_data.GradeLevel != "Elementary"]
secondary_data.describe()

First step is to create a data set with only secondary students. There are 25,375 secondary students.

The column I am going to use here to analyze is CountQ1P_IDF. The column is a count of student's total in danger of failing subjects. Based on the summary here, max number of subjects an secondary student can get is 8. Mean is 1.42.

In [ ]:
plt.title("For secondary students, does ''access to a laptop'' lead to higher student performance?")
sns.barplot('HasLaptop', 'CountQ1P_IDF', data=secondary_data);

This Barplot chart does show "students with access to a laptop" have fewer failing subjects. I know it doesn't neccessary mean students have better academic performance, but this is the only data point we can use to determine "student success" academically now.

Let's check students with access to internet next.

In [ ]:
plt.title("For secondary students, does ''access to internet'' lead to higher student performance?")
sns.barplot('HasInternet', 'CountQ1P_IDF', data=secondary_data);

Unlike elementary student analysis, secondary students show similar trend for HasLaptop and HasInternet. It shows students having laptop access and having internet access can help student performance greatly.

Q5: For secondary students, do students in Virtual learning model perform better or worse?

In [ ]:
plt.title("Q5: For secondary students, do students in Virtual learning model perform better or worse?")
sns.barplot('EnrollmentType', 'CountQ1P_IDF', data=secondary_data);

In this Barplot chart, it shows Virtual students have more failing subjects than Traditional students. Similar to elementary students, it shows students are learning better in Traditional setting than Virtual setting. For the same reason, I want to wait until I can get more data points regarding to student performance to make a more resonable conclusion.

Q6: What are the laptop checked out rates for each ethnicity group and for SED (Socio-Economic Disadvantage) students?

We provided surveys for families who need to check out laptop from school district since the beginning of the pandemic. It is crucial for students to have access to a laptop to be able to do distance learning. Let's see what student ethnicity group has the highest laptop checked out rate and I am interested to see if SED students have higher checked out rate than non-SED students.

SED means Socio-Economic Disadvange. The criteria defined by California is students who are in Foster or Homeless program, or in Free or Reduced Lunch program or whose parents' highet education level is "Not High School Graduate".

In [ ]:
laptop_yes = k12_data[k12_data.LaptopCheckedOut == "Yes"].groupby(['Ethnicity', 'IsSED', 'LaptopCheckedOut'], as_index=False)[['StudentCount']].count()
laptop_yes

First I did a group by to count only students who have checked out a laptop by their ethnicity and whether they are in SED or not.

In [ ]:
laptop_no = k12_data[k12_data.LaptopCheckedOut == "No"].groupby(['Ethnicity', 'IsSED', 'LaptopCheckedOut'], as_index=False)[['StudentCount']].count()
laptop_no

Then I did a group by to count only students who haven't checked out a laptop by their ethnicity and whether they are in SED or not.

In [ ]:
merge_laptop = laptop_yes.merge(laptop_no, on=["Ethnicity", "IsSED"])
merge_laptop

Merged two data sets.

In [ ]:
merge_laptop['CheckoutLaptopRate'] = merge_laptop.StudentCount_x / (merge_laptop.StudentCount_x + merge_laptop.StudentCount_y)
merge_laptop

Calculated Check out Laptop Rate by using checked out laptop student count / total student count.

In [ ]:
pivot_laptop = merge_laptop.pivot("Ethnicity", "IsSED", "CheckoutLaptopRate")
pivot_laptop

Pivoted the table for Heatmap.

In [ ]:
plt.title("Checkout Laptop Rate by Ethnicity and SED?")
sns.heatmap(pivot_laptop, annot=True, cmap="Blues");

Here are some findings:

  • I use darker color to represent higher Checkout Laptop Rate so you can see Black or African American student group in SED program have the highest Checkout Laptop Rate which is 50%.
  • High difference between SED and Non-SED students in each ethnicity group.
  • There is not a siganificant difference between each ethnicity group for their Checkout Laptop Rate.
In [ ]:
import jovian
In [ ]:
jovian.commit()

Inferences and Conclusion

Here is some summary of this project:

  • Asian student group (no matter female or male) have the highest Virtual Rate.
  • Not a big difference between female and male students in each ethnicity group for their Virtual Rate.
  • White student group has the lowest Virtual Rate, only 16% for females and 15% for males.
  • "Whether students having internet access" is a more influential factor for student performance than "whether students having laptop access".
  • For both elementary students and secondary students, "Students with access to a laptop" have fewer failing subjects.
  • For elementary students, Virtual students have more failing subjects than Traditional students but the result can be caused by multiple problems with the platform we picked for Virutal students to use to learn their course content.
  • Unlike elementary student analysis, secondary students show similar trend for HasLaptop and HasInternet. It shows students having laptop access and having internet access can help student performance greatly.
  • Black or African American student group in SED program have the highest Checkout Laptop Rate which is 50%.
  • High difference between SED and Non-SED students in each ethnicity group. Students in SED have higher Checkout Laptop Rate.
  • Not a big difference between each ethnicity group for their Checkout Laptop Rate.

References and Future Work

Here I listed some discussion points that can potentially be worth while exploring in the future:

  • It would be interesting to know how student's performance is compared to previous years to the student in the same grade level in previous years since this year starts in distance learning setting.
  • As this school year proceeds, we can gather more data points related to students' performance to get more accurate analysis.
  • Our school district is preparing to open schools in a hybrid setting which means we will separte different cohorts to go to school at a different day/time each week to perform social distancing at school sites. More analysis can be done by comparing different cohort or comparing 100% distance learning groups with hybrid setting.

Thank you for your time to check out this project. Hope you can get a better understanding on K-12 student data and how it looks like in this special year 2020 (school year 2021).

In [ ]:
import jovian
In [ ]:
jovian.commit()
In [ ]: