Sign In


The challenge is to create a model that uses data from the first 24 hours of intensive care to predict patient survival. MIT's GOSSIS community initiative, with privacy certification from the Harvard Privacy Lab, has provided a dataset of more than 130,000 hospital Intensive Care Unit (ICU) visits from patients, spanning a one-year timeframe. This data is part of a growing global effort and consortium spanning Argentina, Australia, New Zealand, Sri Lanka, Brazil, and more than 200 hospitals in the United States.

Data Description

MIT's GOSSIS community initiative, with privacy certification from the Harvard Privacy Lab, has provided a dataset of more than 130,000 hospital Intensive Care Unit (ICU) visits from patients, spanning a one-year timeframe. This data is part of a growing global effort and consortium spanning Argentina, Australia, New Zealand, Sri Lanka, Brazil, and more than 200 hospitals in the United States.

The data includes:

Training data for 91,713 encounters. Unlabeled test data for 39,308 encounters, which includes all the information in the training data except for the values for hospital_death. WiDS Datathon 2020 Dictionary with supplemental information about the data, including the category (e.g., identifier, demographic, vitals), unit of measure, data type (e.g., numeric, binary), description, and examples. Sample submission files


PyTorch is defined as an open source machine learning library for Python. It is initially developed by Facebook artificial-intelligence research group, and Uber’s Pyro software for probabilistic programming which is built on it.Originally, PyTorch was developed by Hugh Perkins as a Python wrapper for the LusJIT based on Torch framework. There are two PyTorch variants.

PyTorch redesigns and implements Torch in Python while sharing the same core C libraries for the backend code. PyTorch developers tuned this back-end code to run Python efficiently. They also kept the GPU based hardware acceleration as well as the extensibility features that made Lua-based Torch.


The major features of PyTorch are mentioned below −

Easy Interface − PyTorch offers easy to use API; hence it is considered to be very simple to operate and runs on Python. The code execution in this framework is quite easy.

Python usage − This library is considered to be Pythonic which smoothly integrates with the Python data science stack. Thus, it can leverage all the services and functionalities offered by the Python environment.

Computational graphs − PyTorch provides an excellent platform which offers dynamic computational graphs. Thus a user can change them during runtime. This is highly useful when a developer has no idea of how much memory is required for creating a neural network model.

PyTorch is known for having three levels of abstraction as given below −

Tensor − Imperative n-dimensional array which runs on GPU.

Variable − Node in computational graph. This stores data and gradient.

Module − Neural network layer which will store state or learnable weights.

Advantages of PyTorch

The following are the advantages of PyTorch −

It is easy to debug and understand the code.
It includes many layers as Torch.
It includes lot of loss functions.
It can be considered as NumPy extension to GPUs.
It allows building networks whose structure is dependent on computation itself.

In [1]:
from __future__ import print_function
from builtins import range
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from sklearn import preprocessing
In [2]:
# importing libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier,GradientBoostingClassifier ,AdaBoostClassifier
from sklearn.model_selection import train_test_split
import lightgbm as lgb

In [3]:
# loading dataset 
training_v2 = pd.read_csv("../input/widsdatathon2020/training_v2.csv")
In [4]:
# creating independent features X and dependant feature Y
y = training_v2['hospital_death']
X = training_v2
X = training_v2.drop('hospital_death',axis = 1)
In [5]:
# Remove Features with more than 75 percent missing values
train_missing = (X.isnull().sum() / len(X)).sort_values(ascending = False)
train_missing = train_missing.index[train_missing > 0.60]
X = X.drop(columns = train_missing)
In [6]:
#Convert categorical variable into dummy/indicator variables.
X = pd.get_dummies(X)
In [7]:
# Imputation transformer for completing missing values.
my_imputer = SimpleImputer()
new_data = pd.DataFrame(my_imputer.fit_transform(X))
new_data.columns = X.columns
X= new_data
In [8]:
# Threshold for removing correlated variables
threshold = 0.9

# Absolute value correlation matrix
corr_matrix = X.corr().abs()
# Upper triangle of correlations
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(np.bool))
# Select columns with correlations above threshold
to_drop = [column for column in upper.columns if any(upper[column] > threshold)]
print('There are %d columns to remove.' % (len(to_drop)))
#Drop the columns with high correlations
X = X.drop(columns = to_drop)
There are 36 columns to remove.
In [9]:
# Initialize an empty array to hold feature importances
feature_importances = np.zeros(X.shape[1])

# Create the model with several hyperparameters
model = lgb.LGBMClassifier(objective='binary', boosting_type = 'goss', n_estimators = 10000, class_weight = 'balanced')
for i in range(2):
    # Split into training and validation set
    train_features, valid_features, train_y, valid_y = train_test_split(X, y, test_size = 0.25, random_state = i)
    # Train using early stopping, train_y, early_stopping_rounds=100, eval_set = [(valid_features, valid_y)],eval_metric = 'auc', verbose = 200)
    # Record the feature importances
    feature_importances += model.feature_importances_

Training until validation scores don't improve for 100 rounds [200] valid_0's auc: 0.894784 valid_0's binary_logloss: 0.307204 Early stopping, best iteration is: [123] valid_0's auc: 0.896074 valid_0's binary_logloss: 0.336856 Training until validation scores don't improve for 100 rounds [200] valid_0's auc: 0.89045 valid_0's binary_logloss: 0.313477 Early stopping, best iteration is: [108] valid_0's auc: 0.892429 valid_0's binary_logloss: 0.34919
In [10]:
# Make sure to average feature importances! 
feature_importances = feature_importances / 2
feature_importances = pd.DataFrame({'feature': list(X.columns), 'importance': feature_importances}).sort_values('importance', ascending = False)
# Find the features with zero importance
zero_features = list(feature_importances[feature_importances['importance'] == 0.0]['feature'])
print('There are %d features with 0.0 importance' % len(zero_features))
# Drop features with zero importance
X = X.drop(columns = zero_features)
There are 19 features with 0.0 importance
In [11]:
# Normalize the data attributes
normalized_X = preprocessing.normalize(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

In [12]:
#Define training hyperprameters.
batch_size = 60
num_epochs = 50
learning_rate = 0.01
size_hidden= 100

#Calculate some other hyperparameters based on data.  
batch_no = len(X_train) // batch_size  #batches
cols=X_train.shape[1] #Number of columns in input matrix
classes= len(np.unique(y_train))
In [13]:
class Net(nn.Module):
    def __init__(self,cols,size_hidden,classes):
        super(Net, self).__init__()
        #Note that 17 is the number of columns in the input matrix. 
        self.fc1 = nn.Linear(cols, size_hidden)
        #variety of # possible for hidden layer size is arbitrary, but needs to be consistent across layers.  3 is the number of classes in the output (died/survived)
        self.fc2 = nn.Linear(size_hidden, classes)
    def forward(self, x):
        x = self.fc1(x)
        x = F.dropout(x, p=0.1)
        x = F.relu(x)
        x = self.fc2(x)
        return F.softmax(x, dim=1)
net = Net(cols, size_hidden, classes)
In [14]:
#Adam is a specific flavor of gradient decent which is typically better
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()
In [15]:
from sklearn.utils import shuffle
from torch.autograd import Variable
running_loss = 0.0
for epoch in range(num_epochs):
    #Shuffle just mixes up the dataset between epocs
    train_X, train_y = shuffle(X_train, y_train)
    # Mini batch learning
    for i in range(batch_no):
        start = i * batch_size
        end = start + batch_size
        inputs = Variable(torch.FloatTensor(train_X[start:end].values.astype(np.float32)))
        labels = Variable(torch.LongTensor(train_y[start:end].values.astype(np.float32)))
        # zero the parameter gradients

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)

        # print statistics
        running_loss += loss.item()
    print('Epoch {}'.format(epoch+1), "loss: ",running_loss)
    running_loss = 0.0  
Epoch 1 loss: 1498.3719485402107 Epoch 2 loss: 1499.5898114442825 Epoch 3 loss: 1499.5398111343384 Epoch 4 loss: 1499.5731455087662 Epoch 5 loss: 1499.6398088932037 Epoch 6 loss: 1499.5398112535477 Epoch 7 loss: 1499.6231424808502 Epoch 8 loss: 1499.5731439590454 Epoch 9 loss: 1499.5564813613892 Epoch 10 loss: 1499.5898144245148 Epoch 11 loss: 1499.6231430768967 Epoch 12 loss: 1499.5898139476776 Epoch 13 loss: 1499.5731488466263 Epoch 14 loss: 1499.5898113250732 Epoch 15 loss: 1499.5898127555847 Epoch 16 loss: 1499.6231447458267 Epoch 17 loss: 1499.6398134231567 Epoch 18 loss: 1499.6064811944962 Epoch 19 loss: 1499.5731446743011 Epoch 20 loss: 1499.5898134708405 Epoch 21 loss: 1499.5898113250732 Epoch 22 loss: 1499.5398098230362 Epoch 23 loss: 1499.6064813137054 Epoch 24 loss: 1499.6231442689896 Epoch 25 loss: 1499.5064792633057 Epoch 26 loss: 1499.6064791679382 Epoch 27 loss: 1499.5731455087662 Epoch 28 loss: 1499.5731471776962 Epoch 29 loss: 1499.5898129940033 Epoch 30 loss: 1499.589810371399 Epoch 31 loss: 1499.5564786195755 Epoch 32 loss: 1499.523141860962 Epoch 33 loss: 1499.6398117542267 Epoch 34 loss: 1499.639814376831 Epoch 35 loss: 1499.5564798116684 Epoch 36 loss: 1499.5731449127197 Epoch 37 loss: 1499.5731447935104 Epoch 38 loss: 1499.539813041687 Epoch 39 loss: 1499.6398141384125 Epoch 40 loss: 1499.5398124456406 Epoch 41 loss: 1499.5898158550262 Epoch 42 loss: 1499.589810013771 Epoch 43 loss: 1499.573145031929 Epoch 44 loss: 1499.6064821481705 Epoch 45 loss: 1499.5398144721985 Epoch 46 loss: 1499.5731456279755 Epoch 47 loss: 1499.52314722538 Epoch 48 loss: 1499.5731439590454 Epoch 49 loss: 1499.589810371399 Epoch 50 loss: 1499.5731484889984
In [16]:
import pandas as pd
#This is a little bit tricky to get the resulting prediction.  
def calculate_accuracy(x,y=[]):
    This function will return the accuracy if passed x and y or return predictions if just passed x. 
    # Evaluate the model with the test set. 
    X = Variable(torch.FloatTensor(x))  
    result = net(X) #This outputs the probability for each class.
    _, labels = torch.max(, 1)
    if len(y) != 0:
        num_right = np.sum( == y)
        print('Accuracy {:.2f}'.format(num_right / len(y)), "for a total of ", len(y), "records")
        return pd.DataFrame(data= {'actual': y, 'predicted':})
        print("returning predictions")

In [17]:

Accuracy 0.09 for a total of 73370 records Accuracy 0.09 for a total of 18343 records