Note: This is just a mirror of the fast.ai NLP Course Notebook, for the dsnet.org meetup. Please refer to the Course GitHub Repo for the latest updates.
Summary of my results:
| model | train_loss | valid_loss | seq2seq_acc | bleu |
|---|---|---|---|---|
| seq2seq | 3.355085 | 4.272877 | 0.382089 | 0.291899 |
| + teacher forcing | 3.154585 | 4.022432 | 0.407792 | 0.310715 |
| + attention | 1.452292 | 3.420485 | 0.498205 | 0.413232 |
| transformer | 1.913152 | 2.349686 | 0.781749 | 0.612880 |
Seq2Seq Translation with Attention
Attention is a technique that makes use of the full output of the encoder: instead of discarding everything but the final hidden state, we combine the encoder outputs with the decoder's hidden state to focus on specific words in the input sentence when predicting each word of the output sentence. Concretely, we compute attention weights, then add to the decoder's input the linear combination of the encoder outputs weighted by those attention weights.
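To make the weighting concrete, here is a minimal NumPy sketch of dot-product attention for a single decoder step. It is illustrative only (the course notebook implements this in PyTorch inside the model); the function and variable names here are assumptions, not the notebook's API.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(decoder_hidden, encoder_outputs):
    """Dot-product attention for one decoder step.

    Scores each encoder output against the decoder hidden state,
    normalizes the scores into attention weights, and returns the
    context vector (the weighted linear combination of encoder
    outputs) along with the weights themselves.
    """
    scores = encoder_outputs @ decoder_hidden   # one score per source position
    weights = softmax(scores)                   # attention weights, sum to 1
    context = weights @ encoder_outputs         # linear combination of encoder outputs
    return context, weights

# Toy example: 5 source positions, hidden size 8.
rng = np.random.default_rng(0)
encoder_outputs = rng.standard_normal((5, 8))
decoder_hidden = rng.standard_normal(8)
context, weights = attention(decoder_hidden, encoder_outputs)
```

The `context` vector is what gets concatenated with the decoder input at each step; because the weights are a softmax over source positions, the decoder can "look back" at different input words as it generates each output word.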
A nice illustration of attention comes from this blog post by Jay Alammar (visualization originally from the Tensor2Tensor notebook):