This Jupyter notebook outlines a universal blueprint that can be used to attack and solve any machine learning problem. It is based on the workflow described in the book Deep Learning with Python.
Download the latest version of the notebook using the command:
Change the file name, title and kernel as desired. This notebook was originally written with the kernel
conda:tensorflow_p36 on the AWS Deep Learning AMI.
Follow the steps described below, filling in the blanks (marked as `TODO`).
Once you're done building the final model, you can delete the cells containing instructions (like this one).
Define the problem and assemble a dataset:
Be aware of the hypotheses you are making at this stage:
Remember that machine learning can only be used to memorize patterns which are present in your training data. You can only recognize what you have seen before.
Q: What are you trying to predict?
Q: What will your input data be?
Q: What type of problem are you facing?
Q: What is the size of your dataset?
To achieve success, you must first define what you mean by success. For example:
Q: What is your metric for success?
TODO (e.g. accuracy)
Q: What value of your success metric are you aiming for?
TODO (e.g. 95%)
There are three common evaluation protocols:

- Maintaining a hold-out validation set: the way to go when you have plenty of data.
- K-fold cross-validation: the right choice when you have too few samples for hold-out validation to be reliable.
- Iterated K-fold validation with shuffling: for performing highly accurate evaluation when little data is available.
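For illustration, here is a minimal sketch of the first two protocols using plain NumPy. The arrays `data` and `labels` are hypothetical placeholders; replace them with your own dataset.

```python
import numpy as np

# Hypothetical placeholder data; replace with your own arrays.
data = np.random.rand(1000, 16)
labels = np.random.randint(0, 2, size=1000)

# --- Simple hold-out validation ---
num_validation = 200
val_data, val_labels = data[:num_validation], labels[:num_validation]
train_data, train_labels = data[num_validation:], labels[num_validation:]

# --- K-fold cross-validation ---
k = 4
fold_size = len(data) // k
for fold in range(k):
    start, stop = fold * fold_size, (fold + 1) * fold_size
    fold_val_data = data[start:stop]
    fold_train_data = np.concatenate([data[:start], data[stop:]])
    # Train a fresh model on fold_train_data, evaluate it on fold_val_data,
    # then average the k validation scores.
```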
Prepare your data based on the evaluation protocol:
Q: What approach are you going to follow for validation?
TODO (e.g. K-fold cross-validation)
Q: Does your data require reformatting (into tensors), normalization or scaling?
Q: What is the training/validation/test split?
TODO (e.g. 50-25-25)
Q: Can/should the data be randomized before splitting?
Q: Can you come up with new features using existing ones to make the problem easier?
# TODO: Implement Step 3 here (load, prepare & split the data)
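As a starting point, here is a hedged sketch of what this step might look like. It uses MNIST from `keras.datasets` purely as a stand-in dataset; the shapes, split sizes, and variable names are assumptions to adapt to your own data.

```python
import numpy as np
from tensorflow.keras.datasets import mnist  # stand-in dataset; replace with your own

# Load the raw data (here: MNIST, purely as an example).
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Reformat into float tensors and scale pixel values to [0, 1].
train_images = train_images.reshape((60000, 28 * 28)).astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28)).astype('float32') / 255

# Shuffle before splitting, in case the data is ordered.
indices = np.random.permutation(len(train_images))
train_images, train_labels = train_images[indices], train_labels[indices]

# Carve a validation set out of the training data.
val_images, val_labels = train_images[:10000], train_labels[:10000]
partial_train_images = train_images[10000:]
partial_train_labels = train_labels[10000:]
```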
The first goal is to develop a model that is capable of beating a dumb baseline. There are three key choices to be made:

- Last-layer activation: establishes useful constraints on the network's output (see the table below).
- Loss function: should match the type of problem you are trying to solve (see the table below).
- Optimization configuration: which optimizer and which learning rate (in most cases, `rmsprop` with its default learning rate is good enough).
| Problem Type | Last-layer Activation | Loss Function |
|---|---|---|
| Binary classification | `sigmoid` | `binary_crossentropy` |
| Multi-class, single-label classification | `softmax` | `categorical_crossentropy` |
| Multi-class, multi-label classification | `sigmoid` | `binary_crossentropy` |
| Regression to arbitrary values | None | `mse` |
| Regression to values in `[0,1]` | `sigmoid` | `mse` or `binary_crossentropy` |
# TODO: Implement Step 4 here
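For example, a minimal baseline for multi-class, single-label classification might look like the sketch below. The layer sizes and input shape are placeholder assumptions (matching the MNIST stand-in from Step 3); the last-layer activation and loss follow the table above.

```python
from tensorflow.keras import models, layers

# A deliberately small baseline for multi-class, single-label classification.
# Input/output sizes are placeholders; adjust them to your data.
model = models.Sequential([
    layers.Dense(16, activation='relu', input_shape=(28 * 28,)),
    layers.Dense(10, activation='softmax'),  # last-layer activation from the table
])
model.compile(optimizer='rmsprop',                    # defaults are usually fine
              loss='sparse_categorical_crossentropy', # integer labels; use
                                                      # categorical_crossentropy for one-hot
              metrics=['accuracy'])
```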
The final objective is to find the balance between:

- optimization and generalization;
- underfitting and overfitting;
- under-capacity and over-capacity.

To figure out how big a model is required, you must first develop a model that overfits, using one or more of the following approaches:

- Add more layers.
- Make the layers bigger.
- Train for more epochs.
Plot the values of the loss function and the success metric on the training and validation datasets to identify the epoch at which the model starts overfitting.
# TODO: Implement Step 5 here
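One possible sketch of this step, assuming the model and the placeholder variables from the earlier sketches: train with a validation set, then plot the two loss curves with matplotlib.

```python
import matplotlib.pyplot as plt

# Assumes `model` and the data variables from the earlier sketches.
history = model.fit(partial_train_images, partial_train_labels,
                    epochs=20, batch_size=128,
                    validation_data=(val_images, val_labels))

epochs = range(1, len(history.history['loss']) + 1)
plt.plot(epochs, history.history['loss'], 'b-', label='Training loss')
plt.plot(epochs, history.history['val_loss'], 'r-', label='Validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
# The epoch where the validation loss starts rising while the training loss
# keeps falling marks the onset of overfitting.
```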
This part will take the most time. You will repeatedly modify your model, train it, evaluate it on your validation data, and modify it again, until your model is as good as it can get.
The following are some approaches for improving the model:

- Add dropout.
- Try different architectures: add or remove layers.
- Add L1 and/or L2 regularization.
- Try different hyperparameters (such as the number of units per layer or the learning rate of the optimizer).
- Optionally, iterate on feature engineering: add new features, or remove features that do not seem to be informative.

A sketch combining dropout and L2 regularization follows the note below.
Be mindful of the following: every time you use feedback from your validation process to tune your model, you leak information about the validation data into your model. Repeated over many iterations, this will cause the model to overfit the validation data.
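As an illustration only, the sketch below combines two of the approaches listed above (dropout and L2 weight regularization) on the placeholder architecture from Step 4. The penalty strength and dropout rate are arbitrary starting points, not recommendations.

```python
from tensorflow.keras import models, layers, regularizers

# One possible regularized variant: L2 weight penalties plus dropout.
model = models.Sequential([
    layers.Dense(64, activation='relu',
                 kernel_regularizer=regularizers.l2(0.001),
                 input_shape=(28 * 28,)),
    layers.Dropout(0.5),
    layers.Dense(64, activation='relu',
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```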
Once you have developed a seemingly good enough model configuration, you can train your final production model on all the available data (training and validation) and evaluate it one last time on the test set. Finally, save your model to disk so that it can be reused later.
# TODO: Implement Step 6 here
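A minimal sketch of this final step, reusing the placeholder variables from the earlier sketches; the epoch count and file name (`final_model.h5`) are assumptions.

```python
import numpy as np

# Retrain the chosen configuration on all non-test data.
full_images = np.concatenate([partial_train_images, val_images])
full_labels = np.concatenate([partial_train_labels, val_labels])
model.fit(full_images, full_labels, epochs=12, batch_size=128)

# One final, unbiased estimate of performance on the test set.
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)

# Persist the model to disk for later reuse.
model.save('final_model.h5')
```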