Keras API Project Exercise

The Data

We will be using a subset of the LendingClub DataSet obtained from Kaggle: https://www.kaggle.com/wordsforthewise/lending-club

NOTE: Do not download the full zip from the link! We provide a special version of this file that has some extra feature engineering for you to do. You won't be able to follow along with the original file!

LendingClub is a US peer-to-peer lending company, headquartered in San Francisco, California.[3] It was the first peer-to-peer lender to register its offerings as securities with the Securities and Exchange Commission (SEC), and to offer loan trading on a secondary market. LendingClub is the world's largest peer-to-peer lending platform.

Our Goal

Given historical data on loans, including whether or not the borrower defaulted (charge-off), can we build a model that can predict whether or not a borrower will pay back their loan? This way, when we get a new potential customer in the future, we can assess whether or not they are likely to pay back the loan. Keep classification metrics in mind when evaluating the performance of your model!

The "loan_status" column contains our label.
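To see why classification metrics matter here, consider a minimal sketch in plain Python. The ten labels below are made up for illustration (they are not taken from the data set), and the encoding of 1 for a repaid loan and 0 for a charge-off is an assumption. The "model" simply predicts repayment for everyone, which scores well on accuracy but never catches a default.

```python
# Hypothetical labels: 1 = loan repaid, 0 = borrower charged off.
# These values are invented for illustration, not drawn from the data set.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

# A naive "model" that predicts repayment for every borrower.
y_pred = [1] * len(y_true)


def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false positives, and false negatives for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for the class given by `positive`."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
default_precision, default_recall = precision_recall(y_true, y_pred, positive=0)

print(f"accuracy: {accuracy:.2f}")                  # accuracy: 0.80
print(f"recall on defaults: {default_recall:.2f}")  # recall on defaults: 0.00
```

An accuracy of 0.80 looks respectable, yet the recall on defaults is zero, so this model would be useless for screening risky borrowers. Precision and recall per class tell us more than accuracy alone when the classes are imbalanced.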

Data Overview



There are many LendingClub data sets on Kaggle. Here is the information on this particular data set:

| | LoanStatNew | Description |
|---|---|---|
| 0 | loan_amnt | The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value. |
| 1 | term | The number of payments on the loan. Values are in months and can be either 36 or 60. |
| 2 | int_rate | Interest Rate on the loan |
| 3 | installment | The monthly payment owed by the borrower if the loan originates. |
| 4 | grade | LC assigned loan grade |
| 5 | sub_grade | LC assigned loan subgrade |
| 6 | emp_title | The job title supplied by the Borrower when applying for the loan.* |
| 7 | emp_length | Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years. |
| 8 | home_ownership | The home ownership status provided by the borrower during registration or obtained from the credit report. Our values are: RENT, OWN, MORTGAGE, OTHER |
| 9 | annual_inc | The self-reported annual income provided by the borrower during registration. |
| 10 | verification_status | Indicates if income was verified by LC, not verified, or if the income source was verified |
| 11 | issue_d | The month which the loan was funded |
| 12 | loan_status | Current status of the loan |
| 13 | purpose | A category provided by the borrower for the loan request. |
| 14 | title | The loan title provided by the borrower |
| 15 | zip_code | The first 3 numbers of the zip code provided by the borrower in the loan application. |
| 16 | addr_state | The state provided by the borrower in the loan application |
| 17 | dti | A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income. |
| 18 | earliest_cr_line | The month the borrower's earliest reported credit line was opened |
| 19 | open_acc | The number of open credit lines in the borrower's credit file. |
| 20 | pub_rec | Number of derogatory public records |
| 21 | revol_bal | Total credit revolving balance |
| 22 | revol_util | Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit. |
| 23 | total_acc | The total number of credit lines currently in the borrower's credit file |
| 24 | initial_list_status | The initial listing status of the loan. Possible values are – W, F |
| 25 | application_type | Indicates whether the loan is an individual application or a joint application with two co-borrowers |
| 26 | mort_acc | Number of mortgage accounts. |
| 27 | pub_rec_bankruptciesNumber of public record bankruptcies |


Starter Code

Note: We also provide feature information on the data as a .csv file for easy lookup throughout the notebook:
import pandas as pd

# Load the feature descriptions, indexed by feature name (the LoanStatNew column)
data_info = pd.read_csv('../DATA/lending_club_info.csv', index_col='LoanStatNew')
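Because the frame is indexed by feature name, a description can be looked up with `.loc`. The sketch below is self-contained: it builds a two-row stand-in for the CSV from descriptions in the feature table above, since the exact layout of the real file is an assumption here. The helper name `describe_feature` is hypothetical and not part of the provided notebook.

```python
import io

import pandas as pd

# Stand-in for lending_club_info.csv with two rows copied from the feature table.
# Assumption: the real file has the same two columns, LoanStatNew and Description.
sample_csv = io.StringIO(
    "LoanStatNew,Description\n"
    "int_rate,Interest Rate on the loan\n"
    "mort_acc,Number of mortgage accounts.\n"
)
data_info = pd.read_csv(sample_csv, index_col='LoanStatNew')


def describe_feature(col_name):
    """Return the description for one feature name (hypothetical helper)."""
    return data_info.loc[col_name, 'Description']


print(describe_feature('mort_acc'))
```

With the real file loaded as in the starter code, the same call should work for any feature name listed in the table, for example `describe_feature('revol_util')`.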