
Note: This is just a mirror of the fast.ai NLP Course Notebook, for the dsnet.org meetup. Please refer to the Course Github Repo for the latest updates

Summary of my results:

| model             | train_loss | valid_loss | seq2seq_acc | bleu     |
|-------------------|------------|------------|-------------|----------|
| seq2seq           | 3.355085   | 4.272877   | 0.382089    | 0.291899 |
| + teacher forcing | 3.154585   | 4.022432   | 0.407792    | 0.310715 |
| + attention       | 1.452292   | 3.420485   | 0.498205    | 0.413232 |
| transformer       | 1.913152   | 2.349686   | 0.781749    | 0.612880 |

Seq2Seq Translation with Attention

Attention is a technique that makes use of the encoder's outputs: instead of discarding them entirely, we use them together with the decoder's hidden state to pay attention to specific words in the input sentence when predicting each word of the output sentence. Concretely, we compute attention weights over the encoder outputs, then add to the decoder's input the linear combination of those outputs weighted by the attention weights.
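To make this concrete, here is a minimal sketch of an additive (Bahdanau-style) attention layer in PyTorch. The class name, dimensions, and projection layers are illustrative assumptions, not the exact implementation used later in the notebook; it only shows how the attention weights and the weighted combination of encoder outputs are computed.

```python
# Minimal sketch of additive attention (illustrative, not the notebook's exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, attn_dim)   # project encoder outputs
        self.dec_proj = nn.Linear(dec_dim, attn_dim)   # project decoder hidden state
        self.v = nn.Linear(attn_dim, 1, bias=False)    # score each source position

    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden:  (batch, dec_dim)          current decoder hidden state
        # enc_outputs: (batch, src_len, enc_dim) all encoder outputs
        scores = self.v(torch.tanh(
            self.enc_proj(enc_outputs) + self.dec_proj(dec_hidden).unsqueeze(1)
        ))                                             # (batch, src_len, 1)
        attn_weights = F.softmax(scores, dim=1)        # weights over source positions
        context = (attn_weights * enc_outputs).sum(1)  # weighted sum of encoder outputs
        return context, attn_weights.squeeze(-1)
```

At each decoding step, the returned `context` vector would typically be concatenated with the decoder's input embedding before the next prediction.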

A nice illustration of attention comes from this blog post by Jay Alammar (visualization originally from Tensor2Tensor notebook):

[Attention visualization from Jay Alammar's blog post / Tensor2Tensor notebook]