
The challenge is to create a model that uses data from the first 24 hours of intensive care to predict patient survival. MIT's GOSSIS community initiative, with privacy certification from the Harvard Privacy Lab, has provided a dataset of more than 130,000 hospital Intensive Care Unit (ICU) visits from patients, spanning a one-year timeframe. This data is part of a growing global effort and consortium spanning Argentina, Australia, New Zealand, Sri Lanka, Brazil, and more than 200 hospitals in the United States.

The data includes:

- Training data for 91,713 encounters.
- Unlabeled test data for 39,308 encounters, which includes all the information in the training data except the values for `hospital_death`.
- The WiDS Datathon 2020 Dictionary, with supplemental information about each variable, including its category (e.g., identifier, demographic, vitals), unit of measure, data type (e.g., numeric, binary), description, and examples.
- Sample submission files.

PyTorch is an open-source machine learning library for Python, initially developed by Facebook's artificial-intelligence research group; Uber's Pyro software for probabilistic programming is built on top of it. Originally, PyTorch was developed by Hugh Perkins as a Python wrapper for the LuaJIT-based Torch framework. There are two PyTorch variants.

PyTorch redesigns and implements Torch in Python while sharing the same core C libraries for the backend code. The PyTorch developers tuned this backend code to run Python efficiently, and they kept the GPU-based hardware acceleration as well as the extensibility features that made Lua-based Torch popular.

The major features of PyTorch are mentioned below −

**Easy interface** − PyTorch offers an easy-to-use API, so it is simple to operate and runs naturally as Python code. Executing code in this framework is straightforward.

**Python usage** − The library is Pythonic and integrates smoothly with the Python data science stack, so it can leverage the services and functionality offered by the broader Python environment.
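
As a small sketch of that interoperability (assuming `torch` and `numpy` are installed), a tensor can be created from a NumPy array and converted back; `from_numpy` even shares the underlying memory:

```
import numpy as np
import torch

# A NumPy array moves into PyTorch as a tensor...
arr = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(arr)  # shares memory with arr

# ...and back out again, so tensors slot into the
# existing Python data science stack.
back = t.numpy()
print(back.sum())  # 6.0
```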

**Computational graphs** − PyTorch provides dynamic computational graphs that users can change during runtime. This is highly useful when a developer does not know in advance how much memory a neural network model will require.
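
A minimal sketch of what "dynamic" means here: the graph is rebuilt on every forward pass, so the number of operations can depend on the data itself:

```
import torch

x = torch.ones(3, requires_grad=True)
y = x.sum()
# The graph can depend on runtime values: keep doubling
# until the running scalar crosses a threshold.
while y < 10:
    y = y * 2
# Backpropagate through however many doublings actually ran.
y.backward()
print(x.grad)
```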

PyTorch is known for having three levels of abstraction as given below −

**Tensor** − An imperative n-dimensional array that can run on a GPU.

**Variable** − A node in the computational graph that stores data and its gradient.

**Module** − A neural-network layer that stores state or learnable weights.
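
All three levels appear together in a few lines. (`Variable` has since been merged into `Tensor`; a plain tensor with `requires_grad=True` now plays that role, which is the form sketched below.)

```
import torch
import torch.nn as nn

# Tensor: an imperative n-dimensional array (call .cuda() to move it to a GPU).
t = torch.randn(4, 3)

# Graph node holding data and gradient (the modern stand-in for Variable).
w = torch.randn(3, 2, requires_grad=True)
(t @ w).sum().backward()
print(w.grad.shape)   # torch.Size([3, 2])

# Module: a neural-network layer that stores learnable weights.
layer = nn.Linear(3, 2)
print(layer.weight.shape)  # torch.Size([2, 3])
```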

The following are the advantages of PyTorch −

- It is easy to debug and to understand the code.
- It includes many of the same layers as Torch.
- It includes many loss functions.
- It can be considered a NumPy extension to GPUs.
- It allows building networks whose structure depends on the computation itself.

In [1]:

```
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from sklearn import preprocessing

torch.manual_seed(1234)
```

In [2]:

```
# importing libraries
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier
import lightgbm as lgb
```

In [3]:

```
# loading dataset
training_v2 = pd.read_csv("../input/widsdatathon2020/training_v2.csv")
```

In [4]:

```
# creating independent features X and dependent target y
y = training_v2['hospital_death']
X = training_v2.drop('hospital_death', axis=1)
```

In [5]:

```
# Remove features with more than 60 percent missing values
train_missing = (X.isnull().sum() / len(X)).sort_values(ascending=False)
train_missing = train_missing.index[train_missing > 0.60]
X = X.drop(columns=train_missing)
```

In [6]:

```
#Convert categorical variable into dummy/indicator variables.
X = pd.get_dummies(X)
```

In [7]:

```
# Imputation transformer for completing missing values.
my_imputer = SimpleImputer()
new_data = pd.DataFrame(my_imputer.fit_transform(X))
new_data.columns = X.columns
X = new_data
```

In [8]:

```
# Threshold for removing correlated variables
threshold = 0.9
# Absolute value correlation matrix
corr_matrix = X.corr().abs()
# Upper triangle of correlations (np.bool is deprecated; use the built-in bool)
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
# Select columns with correlations above the threshold
to_drop = [column for column in upper.columns if any(upper[column] > threshold)]
print('There are %d columns to remove.' % len(to_drop))
# Drop the columns with high correlations
X = X.drop(columns=to_drop)
```

```
There are 36 columns to remove.
```

In [9]:

```
# Initialize an empty array to hold feature importances
feature_importances = np.zeros(X.shape[1])

# Create the model with several hyperparameters
model = lgb.LGBMClassifier(objective='binary', boosting_type='goss',
                           n_estimators=10000, class_weight='balanced')

for i in range(2):
    # Split into training and validation sets
    train_features, valid_features, train_y, valid_y = train_test_split(
        X, y, test_size=0.25, random_state=i)
    # Train using early stopping
    model.fit(train_features, train_y, early_stopping_rounds=100,
              eval_set=[(valid_features, valid_y)], eval_metric='auc', verbose=200)
    # Record the feature importances
    feature_importances += model.feature_importances_
```

```
Training until validation scores don't improve for 100 rounds
[200] valid_0's auc: 0.894784 valid_0's binary_logloss: 0.307204
Early stopping, best iteration is:
[123] valid_0's auc: 0.896074 valid_0's binary_logloss: 0.336856
Training until validation scores don't improve for 100 rounds
[200] valid_0's auc: 0.89045 valid_0's binary_logloss: 0.313477
Early stopping, best iteration is:
[108] valid_0's auc: 0.892429 valid_0's binary_logloss: 0.34919
```

In [10]:

```
# Make sure to average feature importances!
feature_importances = feature_importances / 2
feature_importances = pd.DataFrame({'feature': list(X.columns), 'importance': feature_importances}).sort_values('importance', ascending = False)
# Find the features with zero importance
zero_features = list(feature_importances[feature_importances['importance'] == 0.0]['feature'])
print('There are %d features with 0.0 importance' % len(zero_features))
# Drop features with zero importance
X = X.drop(columns = zero_features)
```

```
There are 19 features with 0.0 importance
```

In [11]:

```
# Normalize the data attributes (note: the split below uses the raw X,
# so normalized_X is computed but not used downstream)
normalized_X = preprocessing.normalize(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
```

In [12]:

```
# Define training hyperparameters.
batch_size = 60
num_epochs = 50
learning_rate = 0.01
size_hidden = 100

# Calculate other quantities from the data.
batch_no = len(X_train) // batch_size  # number of batches per epoch
cols = X_train.shape[1]  # number of columns in the input matrix
classes = len(np.unique(y_train))  # number of output classes (died/survived)
```

In [13]:

```
class Net(nn.Module):
    def __init__(self, cols, size_hidden, classes):
        super(Net, self).__init__()
        # Input layer: one input unit per feature column.
        self.fc1 = nn.Linear(cols, size_hidden)
        # The hidden-layer size is arbitrary but must be consistent across layers;
        # the output layer has one unit per class (died/survived).
        self.fc2 = nn.Linear(size_hidden, classes)

    def forward(self, x):
        x = self.fc1(x)
        x = F.dropout(x, p=0.1)
        x = F.relu(x)
        x = self.fc2(x)
        # Note: nn.CrossEntropyLoss applies log-softmax internally, so
        # returning softmax probabilities here weakens the gradients.
        return F.softmax(x, dim=1)

net = Net(cols, size_hidden, classes)
```

In [14]:

```
# Adam is a variant of gradient descent that typically converges faster.
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()
```

In [15]:

```
from sklearn.utils import shuffle
from torch.autograd import Variable

running_loss = 0.0
for epoch in range(num_epochs):
    # Shuffle mixes up the dataset between epochs
    train_X, train_y = shuffle(X_train, y_train)
    # Mini-batch learning
    for i in range(batch_no):
        start = i * batch_size
        end = start + batch_size
        inputs = Variable(torch.FloatTensor(train_X[start:end].values.astype(np.float32)))
        # Class labels must be integers for CrossEntropyLoss
        labels = Variable(torch.LongTensor(train_y[start:end].values.astype(np.int64)))
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # accumulate statistics
        running_loss += loss.item()
    print('Epoch {}'.format(epoch + 1), "loss: ", running_loss)
    running_loss = 0.0
```

```
Epoch 1 loss: 1498.3719485402107
Epoch 2 loss: 1499.5898114442825
Epoch 3 loss: 1499.5398111343384
Epoch 4 loss: 1499.5731455087662
Epoch 5 loss: 1499.6398088932037
Epoch 6 loss: 1499.5398112535477
Epoch 7 loss: 1499.6231424808502
Epoch 8 loss: 1499.5731439590454
Epoch 9 loss: 1499.5564813613892
Epoch 10 loss: 1499.5898144245148
Epoch 11 loss: 1499.6231430768967
Epoch 12 loss: 1499.5898139476776
Epoch 13 loss: 1499.5731488466263
Epoch 14 loss: 1499.5898113250732
Epoch 15 loss: 1499.5898127555847
Epoch 16 loss: 1499.6231447458267
Epoch 17 loss: 1499.6398134231567
Epoch 18 loss: 1499.6064811944962
Epoch 19 loss: 1499.5731446743011
Epoch 20 loss: 1499.5898134708405
Epoch 21 loss: 1499.5898113250732
Epoch 22 loss: 1499.5398098230362
Epoch 23 loss: 1499.6064813137054
Epoch 24 loss: 1499.6231442689896
Epoch 25 loss: 1499.5064792633057
Epoch 26 loss: 1499.6064791679382
Epoch 27 loss: 1499.5731455087662
Epoch 28 loss: 1499.5731471776962
Epoch 29 loss: 1499.5898129940033
Epoch 30 loss: 1499.589810371399
Epoch 31 loss: 1499.5564786195755
Epoch 32 loss: 1499.523141860962
Epoch 33 loss: 1499.6398117542267
Epoch 34 loss: 1499.639814376831
Epoch 35 loss: 1499.5564798116684
Epoch 36 loss: 1499.5731449127197
Epoch 37 loss: 1499.5731447935104
Epoch 38 loss: 1499.539813041687
Epoch 39 loss: 1499.6398141384125
Epoch 40 loss: 1499.5398124456406
Epoch 41 loss: 1499.5898158550262
Epoch 42 loss: 1499.589810013771
Epoch 43 loss: 1499.573145031929
Epoch 44 loss: 1499.6064821481705
Epoch 45 loss: 1499.5398144721985
Epoch 46 loss: 1499.5731456279755
Epoch 47 loss: 1499.52314722538
Epoch 48 loss: 1499.5731439590454
Epoch 49 loss: 1499.589810371399
Epoch 50 loss: 1499.5731484889984
```

In [16]:

```
# Getting the resulting prediction out of the network takes a little care.
def calculate_accuracy(x, y=[]):
    """
    Return the accuracy if passed x and y, or just the predictions if passed only x.
    """
    # Evaluate the model on the given set.
    X = Variable(torch.FloatTensor(x))
    result = net(X)  # outputs the probability for each class
    _, labels = torch.max(result.data, 1)
    if len(y) != 0:
        num_right = np.sum(labels.data.numpy() == y)
        print('Accuracy {:.2f}'.format(num_right / len(y)), "for a total of ", len(y), "records")
        return pd.DataFrame(data={'actual': y, 'predicted': labels.data.numpy()})
    else:
        print("returning predictions")
        return labels.data.numpy()
```

In [17]:

```
result1 = calculate_accuracy(X_train.values.astype(np.float32), y_train.values.astype(np.float32))
result2 = calculate_accuracy(X_test.values.astype(np.float32), y_test.values.astype(np.float32))
```

```
Accuracy 0.09 for a total of 73370 records
Accuracy 0.09 for a total of 18343 records
```