Jovian
⭐️
Sign In
# Snap Finance Business Intelligence Developer Assessment

This notebook will cover an analysis of a subset of customer applications at Snap Finance. Below summarizes my findings and includes code used to conduct my analysis.

Data

I used AWS's Relational Database Services(RDS) to create and populate a PostgreSQL table with Jupyter Notebooks. Once I had the database instance setup, I was ready to store both the customer_data.csv and merchant_data.csv tables out on the cloud.

In previous work environments, we have used the python package psycopg2 as a wrapper to load the tables into the database. During this exercise, I found a more straight forward approach with Python by using the sqlalchemy library. With this function, I was able to insert my data with pandas API df.to_sql. It was easy to implement however the table loading speed could use optimization.

Customer_data

In [27]:
import pandas as pd
df = pd.read_csv('/Users/rhynoooo/desktop/customer_data.csv')
df.columns = [c.lower() for c in df.columns] 
In [28]:
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://postgres:postgres@mydb.ckj30otfj7ty.us-east-2.rds.amazonaws.com:5432/snap')
df.to_sql("customer_data",
               engine,
               if_exists='replace',
               schema='public',
               index=False,
               chunksize=500)
In [29]:
query = ''' SELECT * from customer_data '''
pd.read_sql(query, engine)
Out[29]:

Merchant_data

In [30]:
import pandas as pd
df2 = pd.read_csv('/Users/rhynoooo/desktop/merchant_data.csv')
df2.columns = [c.lower() for c in df2.columns] 
In [31]:
df2.to_sql("merchant_data",
               engine,
               if_exists='replace',
               schema='public',
               index=False,
               chunksize=500)
In [32]:
columns, rows
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-32-2511705f0fdc> in <module>() ----> 1 columns, rows NameError: name 'columns' is not defined
In [33]:
query = ''' SELECT * from merchant_data '''
pd.read_sql(query, engine)
Out[33]:

1. Descriptive Statistics For Customer's Funded Amount

In [34]:
df.describe()
Out[34]:

2. What percentage of completed leases are completed within seven days of submitting their application?

In [35]:
query = '''
SELECT 
    Round(100*(SUM(CASE WHEN (to_date(completed_time, 'MM/DD/YY') -  to_date(submit_time, 'MM/DD/YY')) <=7 
THEN 1.0 ELSE 0.0 END) 
/ (count(*) * 1.0)), 2) as Percentage
FROM customer_data'''
pd.read_sql(query, engine)
Out[35]:

3. What is the overall approval rate(percentage of applications approved) and completion rate (percentage of completions per approvals) for the entire data set?

Overall approval rate(percentage of applications approved) is 71.75%

In [36]:
query = '''
SELECT 
    Round(100 * (SUM(CASE WHEN approved = 1  THEN 1.0 ELSE 0.0 END) / (count(*) * 1.0)), 2) as Percentage
FROM customer_data'''
pd.read_sql(query, engine)
Out[36]:

Overall completion rate (percentage of completions per approval) is 60.85%

In [37]:
query = '''
SELECT 
    Round(100 * (SUM(CASE WHEN approved = 1 and completed = 1  THEN 1.0 ELSE 0.0 END) / SUM(CASE WHEN approved = 1  
THEN 1.0 ELSE 0.0 END)), 2) as Percentage
FROM customer_data'''
pd.read_sql(query, engine)
Out[37]:

4. Is there a difference in approval rates among the various application sources?

Yes there is a difference in approval rates among the various sources(Store = 69.38, Other = 56.94, Online = 37.93)

In [38]:
query = '''
SELECT Round(100 * (SUM(CASE WHEN approved = 1 and completed = 1  THEN 1.0 ELSE 0.0 END) / SUM(CASE WHEN approved = 1  
THEN 1.0 ELSE 0.0 END)), 2) as Percentage, application_source
FROM customer_data
group by application_source'''
pd.read_sql(query, engine)
Out[38]:

5. Which state has the highest total funded amount?

As you can see below, Utah has the highest total funded amount among Florida and Texas (UT = \( 1,365,658, FL = \) 1,111,361, TX = $ 388,337)

In [39]:
query = '''
select 
    md.state, 
    sum(cd.funded_amount) as fundedamt from customer_data cd
join merchant_data md
    on cd.merchant_id = md.id
group by md.state
'''
pd.read_sql(query, engine)
Out[39]:

6. For the state in #5, how many total applications did they have each month?

Below are the total applications for each month for the state of Utah: January = 54 February = 70 March = 100 April = 129 May = 146 June = 181 July = 232 August = 228 September = 249 October = 300 November = 338 December = 352

In [40]:
query = '''
select 
case when mm = 1 then 'January'
    when mm = 2 then 'February'
    when mm = 3 then 'March'
    when mm = 4 then 'April'
    when mm = 5 then 'May'
    when mm = 6 then 'June'
    when mm = 7 then 'July'
    when mm = 8 then 'August'
    when mm = 9 then 'September'
    when mm = 10 then 'October'
    when mm = 11 then 'November'
    when mm = 12 then 'December'
else 'error not a month'
end,
appCnt
from (
    select 
        extract(month from to_date(submit_time, 'MM/DD/YY')) as mm, 
        count(*) as appCnt from customer_data cd
    join merchant_data md
        on cd.merchant_id = md.id
    where md.state = 'UT'
    group by extract(month from to_date(submit_time, 'MM/DD/YY'))
    order by extract(month from to_date(submit_time, 'MM/DD/YY'))
) as s
'''
pd.read_sql(query, engine)
Out[40]: