For my project, I'm doing the Default Final Project of CS 224N:
Question Answering on SQuAD 2.0
Why?
There are several reasons:
- In the future (maybe as project no. 2?), I want to build a model that can generate question-answer pairs from an input paragraph.
- By doing this Question Answering project, I think I'll get a more intuitive understanding of NLP concepts.
- I hope I can reuse some of the modules from this project in the future.
- The course staff give students a baseline model, along with other code that I don't often see in tutorials or courses, which includes:
- logging experiments in TensorBoard
- saving checkpoints along the way (this is HUGE for me, because I'll be training on the GPU in my local machine, so I can stop training and resume it later)
- scripts for training and testing
- basically, showing good coding practice
- I read the slides from the presentation Writing Code for NLP Research, and this codebase "checks" most of the boxes
- I think the practices mentioned above will be useful for all my future projects.
- It provides the SQuAD 2.0 leaderboard which, much like Kaggle, can guide me and tell me whether I'm going in the right direction.
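The checkpointing idea above is the piece I care most about, since I need to be able to interrupt and resume training locally. A minimal, framework-free sketch of the pattern (the starter code presumably does this with `torch.save`/`torch.load`; here I use `pickle` so the example is self-contained, and the atomic-rename trick is my own addition to avoid corrupting a checkpoint if the save is interrupted):

```python
import os
import pickle
import tempfile

def save_checkpoint(path, step, model_state, optimizer_state):
    """Write training state atomically: dump to a temp file, then rename."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step,
                     "model": model_state,
                     "optimizer": optimizer_state}, f)
    os.replace(tmp, path)  # atomic rename, so the checkpoint is never half-written

def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)

# Usage: resume from a checkpoint if one exists, otherwise start from step 0.
ckpt_path = os.path.join(tempfile.gettempdir(), "baseline.ckpt")
state = load_checkpoint(ckpt_path)
start_step = state["step"] + 1 if state else 0
```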
What I've already done:
- I installed the environment for this task and downloaded the code.
- The first obstacle was getting the baseline to train on my local machine. I got an out-of-memory error, but for RAM, not the GPU: the embeddings created with GloVe were too big for my PC (I have 16 GB of RAM). So I looked at the pre-processing step and found that lowering the parameter responsible for the max number of words in a paragraph from 400 to 200 lets me train the baseline model on my machine. I plan to do lots of experiments locally; then, when I'm ready, I'll re-train the model in the cloud without these restrictions.
- I read about the metrics used in SQuAD (Exact Match and F1) and what is being judged.
- I watched the lecture dedicated to Question Answering: Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 10 – Question Answering.
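The two official SQuAD metrics are Exact Match (EM) and token-level F1. A simplified sketch of how they are computed (the real evaluation script additionally handles multiple reference answers and no-answer questions; the normalization steps below mirror its lowercasing, punctuation stripping, and article removal):

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))

def f1_score(prediction, truth):
    """Token-overlap F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    common = Counter(pred_tokens) & Counter(true_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

EM is all-or-nothing, while F1 gives partial credit for overlapping answer spans, which is why both are reported on the leaderboard.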
TODO
The baseline model is based on a model from 2017 (BiDAF), so there is a lot of room for improvement.
- Looking at the leaderboard of The Stanford Question Answering Dataset, I can see that models using BERT perform well, so I'll try implementing that first.
- I'll try to adapt the code from GitHub - huggingface/pytorch-pretrained-BERT (📖 The Big-&-Extending-Repository-of-Transformers: pretrained PyTorch models for Google's BERT, OpenAI GPT & GPT-2, and Google/CMU Transformer-XL), which comes with pretrained weights.
As for what to do after adapting BERT: there are several things I can do to improve the model, mentioned in the handout provided by the CS 224n staff:
Pre-trained Contextual Embeddings (PCE), aka ELMo & BERT
- ELMo
- BERT
Non-PCE Model Types
- Character-level Embeddings
- Self-attention
- Transformers
- Transformer-XL
- Additional input features
More models and papers
- Regularization
- Sharing weights
- Word vectors
- Combining forward and backward states
- Types of RNN
- Model size and number of layers
- Optimization algorithms
- Ensembling
- Experimenting with hyperparameters
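Of the ideas above, self-attention is the core mechanism behind both the Transformer and BERT. A minimal pure-Python sketch of scaled dot-product attention, just to make the idea concrete (real implementations do this with batched matrix multiplications in PyTorch, plus learned query/key/value projections and multiple heads):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def self_attention(queries, keys, values):
    """Scaled dot-product attention: each output is a weighted sum of the
    values, weighted by how well the query matches every key."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs
```

In self-attention the queries, keys, and values all come from the same sequence, so every position can directly attend to every other position regardless of distance, which is what an RNN-based baseline like BiDAF lacks.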
If I manage to get a good model in 4 weeks, during the last week I'll try to fine-tune the model on a different dataset (transfer learning), or I'll try to turn it into an API and deploy it on a server (not sure if building the API is doable in one week).
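For the API idea, a minimal sketch using only Python's standard library, to convince myself the serving part is small (the `answer` function here is a hypothetical placeholder for the trained QA model; a real deployment would load the model once at startup and probably use a proper web framework):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def answer(question, context):
    """Placeholder for the trained model: returns a dummy 'span' and score."""
    return {"answer": context.split(".")[0], "confidence": 0.0}

class QAHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body: {"question": "...", "context": "..."}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        result = answer(payload["question"], payload["context"])
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To run locally:
# HTTPServer(("localhost", 8000), QAHandler).serve_forever()
```

The hard part is clearly the model, not the HTTP plumbing, so the one-week estimate mostly depends on packaging the model and its preprocessing.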