Learn practical skills, build real-world projects, and advance your career
!pip install fastai2
Requirement already satisfied: fastai2 in /content/fastai2 (0.0.18) Requirement already satisfied: fastcore in /content/fastai2/fastcore (from fastai2) (0.1.18) Requirement already satisfied: torch>=1.3.0 in /usr/local/lib/python3.6/dist-packages (from fastai2) (1.5.1+cu101) Requirement already satisfied: torchvision>=0.5 in /usr/local/lib/python3.6/dist-packages (from fastai2) (0.6.1+cu101) Requirement already satisfied: matplotlib in /usr/local/lib/python3.6/dist-packages (from fastai2) (3.2.2) Requirement already satisfied: pandas in /usr/local/lib/python3.6/dist-packages (from fastai2) (1.0.5) Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from fastai2) (2.23.0) Requirement already satisfied: pyyaml in /usr/local/lib/python3.6/dist-packages (from fastai2) (3.13) Requirement already satisfied: fastprogress>=0.1.22 in /usr/local/lib/python3.6/dist-packages (from fastai2) (0.2.3) Requirement already satisfied: pillow in /usr/local/lib/python3.6/dist-packages (from fastai2) (7.0.0) Requirement already satisfied: scikit-learn in /usr/local/lib/python3.6/dist-packages (from fastai2) (0.22.2.post1) Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from fastai2) (1.4.1) Requirement already satisfied: spacy in /usr/local/lib/python3.6/dist-packages (from fastai2) (2.2.4) Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from fastcore->fastai2) (1.18.5) Requirement already satisfied: dataclasses>='0.7' in /usr/local/lib/python3.6/dist-packages (from fastcore->fastai2) (0.7) Requirement already satisfied: future in /usr/local/lib/python3.6/dist-packages (from torch>=1.3.0->fastai2) (0.16.0) Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib->fastai2) (0.10.0) Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->fastai2) (2.4.7) Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->fastai2) (1.2.0) Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->fastai2) (2.8.1) Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas->fastai2) (2018.9) Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->fastai2) (2.9) Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->fastai2) (2020.6.20) Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->fastai2) (3.0.4) Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->fastai2) (1.24.3) Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.6/dist-packages (from scikit-learn->fastai2) (0.15.1) Requirement already satisfied: setuptools in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (47.3.1) Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (1.0.0) Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (3.0.2) Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (2.0.3) Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (1.0.2) Requirement already satisfied: thinc==7.4.0 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (7.4.0) Requirement already satisfied: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (1.1.3) Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (0.7.0) Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (4.41.1) Requirement already satisfied: blis<0.5.0,>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (0.4.1) Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.6/dist-packages (from spacy->fastai2) (1.0.2) Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from cycler>=0.10->matplotlib->fastai2) (1.12.0) Requirement already satisfied: importlib-metadata>=0.20; python_version < "3.8" in /usr/local/lib/python3.6/dist-packages (from catalogue<1.1.0,>=0.0.7->spacy->fastai2) (1.6.1) Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.6/dist-packages (from importlib-metadata>=0.20; python_version < "3.8"->catalogue<1.1.0,>=0.0.7->spacy->fastai2) (3.1.0)

I've used Deep Learning, and particularly fastai, for some time now. Normally I was reluctant to share anything thinking is not going to be of much use, but motivated by this competition, I decided to try and share a notebook in pure Pytorch to show how to implement a Multilabel Stratification and Cross Validation strategy https://www.kaggle.com/ronaldokun/multilabel-stratification-cv-and-ensemble because I thought it was very important to have a competitive edge. I am glad I did that because the notebook proved useful for some, it motivated me to also try to write more pure Pytorch.

I used this competition to learn the latest version of the fastai library, fastai2 (still in development but already ready to use).

fastai is awesome, with a lot of advanced stuff and neat tricks already build in the library, but it can be somewhat of a "black box", hard to inspect and adapt the code because is vastly different than familiar python codebases. That's why I share my entire workflow here in fastai2 so it can be useful to others to get to know the library, use it and adapt to other competitions.

Imports

import multiprocessing as mp
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from skmultilearn.model_selection import IterativeStratification
from fastai2.layers import *
from fastai2.vision.all import *
from fastai2.callback.core import *
from fastai2.optimizer import *
import torch
from torch import nn
import torch.nn.functional as F
from sklearn.metrics import f1_score
import scipy.optimize as opt
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
import gc
import zipfile
import os
from pathlib import Path
import warnings
warnings.filterwarnings("ignore")

      
#ROOT = Path('../input/jovian-pytorch-z2g')
ROOT = Path('/content/clouderizer/fastai-v4/data')
DIR = ROOT / 'Human protein atlas'
TRAIN = DIR / 'train'
TEST = DIR / 'test'
SEED = 2020
FOCAL = False
EPOCHS = 16


#MODEL_PATH = Path('/kaggle/working/models')
#MODEL_PATH.mkdir(exist_ok=True)

submission = pd.read_csv(ROOT / 'submission.csv')

Statistics of the Dataset.

Many top competitors in Kaggle don't just use the Imagenet Statistics but calculate the Statistics of the current whole dataset ( train + test ) and use those instead. I lack expertise to know which is best but nevertheless I calculated the statistics of the current dataset. The values are already calculated, uncomment the following cell if you want to make the calculation yourself.