
English Learner (EL) reclassification year prediction using linear regression

I work in a K-12 school district in Southern California as a Database Manager. My team maintains the Student Information System (SIS) databases and also helps create reports in various areas. English Learners (EL) are a student subgroup that we often analyze in these reports.

There are several terms we need to understand before we look at the dataset:

  • English Learner (EL) is a program to help students who do not speak, read, write, or understand English well because English is not their home language.
  • The English Language Proficiency Assessments for California (ELPAC) is the required state test in California for English language proficiency (ELP) that must be given to students whose primary language is a language other than English.
  • Reclassification is the process whereby a student is reclassified from English learner status to fluent English proficient (RFEP) status.

In a nutshell, in California, students whose primary language is not English are given the ELPAC assessment. If the test results show they are not yet fluent English proficient (FEP), they are placed in the English Learner subgroup/program so they can receive more help on campus.

The school district's goal is to help all English Learners (EL) be reclassified within a reasonable timeframe, so that they do not become Long-Term English Learners (LTEL), meaning students who have been in the EL program for more than 5 years.

In this project, I use historical data on EL students to predict how long it will take an EL student to be reclassified (RFEP), based on the data points used to train the model. The data source is tabular, and my assumption is that this is a linear regression problem. The end goal is to run the model on all current EL students, find those who may be at risk, and give them more resources and help.
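To make the linear regression assumption concrete, here is a minimal sketch of the idea with a single feature. It is not the model trained in this project: the numbers are made up for illustration, the function names are hypothetical, and the real model uses many more inputs. It fits a straight line from a starting level to years-until-reclassification, then flags a student whose predicted time exceeds the 5-year LTEL threshold described above.

```python
# Toy illustration only: the values below are invented, not district data.

def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: y is approximated by w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def predict(w, b, x):
    """Predicted years until reclassification for feature value x."""
    return w * x + b


# Hypothetical feature: starting EL level; target: years to reclassify.
start_levels = [1, 2, 3, 4]
reclass_years = [7.0, 5.0, 3.0, 1.0]
w, b = fit_simple_linear_regression(start_levels, reclass_years)

# A student is flagged as at risk when the prediction exceeds the LTEL threshold.
LTEL_THRESHOLD_YEARS = 5
predicted_years = predict(w, b, 1)
at_risk = predicted_years > LTEL_THRESHOLD_YEARS
```

With several input columns the same idea becomes a weighted sum of all features plus a bias, which is what a single linear layer computes.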

project_name = "english-learner-reclassfication"
jovian.commit(project=project_name)
[jovian] Detected Colab notebook...
[jovian] Please enter your API key ( from https://jovian.ai/ ):
API KEY: ··········
[jovian] Uploading colab notebook to Jovian...
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ai/yinyinw/english-learner-reclassfication

Step 1: Data Preparation and Cleaning

The data comes from our SIS (Student Information System). Our SIS uses Microsoft SQL Server as its database server, so I wrote a T-SQL script to pull the student data. Since student data is confidential, I didn't include any sensitive student information. Student IDs are replaced with a unique ID that I created just for this project.

I am pulling all students who were reclassified from English Learner status in the past, along with several data points that I think can be used to train this model:

  • StudentID: Student's unique identifier. This data is created just for this project.
  • ELStartDate: the student's first day in the EL program in our school district.
  • StartLevel: the student's first EL level from the state assessment (e.g., ELPAC)
  • ELStartGradeLevel: the student's grade level when they started the EL program
  • HomeLang: language spoken at home for the student. This information is collected when they first enroll.
  • PrimaryLang: primary language used by the student. This information is collected when they first enroll, and it is used to determine whether the student needs to take the ELPAC or not.
  • IsSED: Is the student in Socio-Economic Disadvantage group?
  • IsSPED: Is the student in Special Education program?
  • IsDLI: Is the student in the Dual Immersion program? Dual Immersion is a program that offers a bilingual learning environment.
  • IsGATE: Is the student in the GATE program? GATE stands for Gifted and Talented Education.
  • IsFoster: Is the student a foster student?
  • IsHomeless: Is the student a homeless student?
  • BirthCountry: birth country for the student
  • Gender: Student's gender.
  • Ethnicity: Student's ethnicity.
  • ParentHighestEdLevel: Parent's highest education level.
  • BehaviorIncidents: count of total behavior incidents of this student before the reclassification
  • AbsentRate: this is calculated by "total days absent before reclassification" / "total days enrolled before reclassification"
  • AvgEnglishMark: this is an average grade for all courses with English as subject
  • ReclassDate: this is used to calculate how many years it takes for the student to reclass
  • ReclassGradeLevel: this is the grade level when the student reclassifies
  • ReclassYears: this is the output of the model and will be the forecast for current EL students
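
Two of the fields above are derived from other values: AbsentRate and ReclassYears. The sketch below shows one way they could be computed. It is an assumption, not the exact T-SQL logic used for this dataset: the function names and the `days_absent` / `days_enrolled` inputs are hypothetical, and ReclassYears is approximated here as elapsed calendar days divided by 365.25, whereas the real extract may count school years instead.

```python
from datetime import date


def absent_rate(days_absent, days_enrolled):
    """Total days absent before reclassification / total days enrolled before reclassification."""
    if days_enrolled <= 0:
        # Avoid dividing by zero for a student with no enrollment days on record.
        return 0.0
    return days_absent / days_enrolled


def reclass_years(el_start_date, reclass_date):
    """Approximate years between ELStartDate and ReclassDate (assumed calendar-based)."""
    elapsed_days = (reclass_date - el_start_date).days
    return elapsed_days / 365.25


# Made-up example values, not real student data.
example_rate = absent_rate(days_absent=9, days_enrolled=180)
example_years = reclass_years(date(2015, 8, 17), date(2019, 8, 17))
```
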
import torch
import jovian
import torchvision
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import torch.nn.functional as F
from torchvision.datasets.utils import download_url
from torch.utils.data import DataLoader, TensorDataset, random_split