
Note: This is just a mirror of the fast.ai NLP Course Notebook, for the dsnet.org meetup. Please refer to the Course Github Repo for the latest updates

Summary of my results:

| model             | train_loss | valid_loss | seq2seq_acc | bleu     |
|-------------------|------------|------------|-------------|----------|
| seq2seq           | 3.355085   | 4.272877   | 0.382089    | 0.291899 |
| + teacher forcing | 3.154585   | 4.022432   | 0.407792    | 0.310715 |
| + attention       | 1.452292   | 3.420485   | 0.498205    | 0.413232 |
| transformer       | 1.913152   | 2.349686   | 0.781749    | 0.612880 |

Seq2Seq Translation with Attention

Attention is a technique that makes use of the encoder's outputs: instead of discarding them entirely, we use them together with the decoder's hidden state to pay attention to specific words in the input sentence when predicting each word of the output sentence. Concretely, we compute attention weights over the encoder outputs, then add to the decoder's input the linear combination of those outputs weighted by the attention weights.
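To make this concrete, here is a minimal sketch of an additive (Bahdanau-style) attention layer in PyTorch. The class name, dimensions, and projection layers are illustrative assumptions, not the exact implementation used later in the notebook; it only shows how the attention weights and the weighted combination of encoder outputs are computed.

```python
# Minimal sketch of additive attention (illustrative, not the notebook's exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, attn_dim)   # project encoder outputs
        self.dec_proj = nn.Linear(dec_dim, attn_dim)   # project decoder hidden state
        self.v = nn.Linear(attn_dim, 1, bias=False)    # score each source position

    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden:  (batch, dec_dim)          current decoder hidden state
        # enc_outputs: (batch, src_len, enc_dim) all encoder outputs
        scores = self.v(torch.tanh(
            self.enc_proj(enc_outputs) + self.dec_proj(dec_hidden).unsqueeze(1)
        ))                                             # (batch, src_len, 1)
        attn_weights = F.softmax(scores, dim=1)        # weights over source positions
        context = (attn_weights * enc_outputs).sum(1)  # weighted sum of encoder outputs
        return context, attn_weights.squeeze(-1)
```

At each decoding step, the returned `context` vector would typically be concatenated with the decoder's input embedding before the next prediction.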

A nice illustration of attention comes from this blog post by Jay Alammar (visualization originally from Tensor2Tensor notebook):

[Attention visualization from Jay Alammar's blog post / Tensor2Tensor notebook]