8 Translation Transformer - Notebook by Sanyam Bhutani (init27)


Created 5 years ago

Note: This is just a mirror of the fast.ai NLP Course Notebook, for the dsnet.org meetup. Please refer to the Course GitHub Repo for the latest updates.

This notebook is adapted from this one created by Sylvain Gugger.

See also The Annotated Transformer from Harvard NLP.

Attention and the Transformer

Nvidia AI researcher Chip Huyen wrote a great post, Top 8 trends from ICLR 2019, in which one of the trends is that RNNs are losing their luster with researchers.

There's good reason for this: RNNs can be a pain. Parallelization can be tricky, and they can be difficult to debug. Since language is recursive, it seemed like RNNs were a good conceptual fit for NLP, but recently methods using attention have been achieving state-of-the-art results on NLP tasks.
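To make the parallelization point concrete, here is a minimal sketch contrasting the two styles of computation. It is not code from the course notebook: it uses plain NumPy rather than fastai/PyTorch, the names `rnn_forward` and `self_attention` are hypothetical, and the attention shown is a bare scaled dot-product with no learned projections, masking, or multiple heads.

```python
import numpy as np

def rnn_forward(x, W_x, W_h):
    """Toy RNN: step t needs the hidden state from step t-1,
    so the time steps must be computed one after another."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x:  # inherently sequential loop over time steps
        h = np.tanh(W_x @ x_t + W_h @ h)
        states.append(h)
    return np.stack(states)

def self_attention(x):
    """Toy scaled dot-product self-attention (no learned weights).
    Every position attends to every other position through matrix
    products, so all positions are computed at once."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                 # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x, weights

# Illustrative random data: 5 tokens, 4 features each (assumed sizes).
rng = np.random.default_rng(0)
seq_len, d_model = 5, 4
x = rng.normal(size=(seq_len, d_model))
W_x = rng.normal(size=(d_model, d_model)) * 0.1
W_h = rng.normal(size=(d_model, d_model)) * 0.1

rnn_states = rnn_forward(x, W_x, W_h)
attn_out, attn_weights = self_attention(x)
```

Both functions return one vector per token, but only the RNN has a loop in which each iteration depends on the previous one. The attention version reduces to a few matrix multiplications, which is the kind of work GPUs handle well.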

This is still an area of very active research. For instance, a recent paper, Pay Less Attention with Lightweight and Dynamic Convolutions, showed that convolutions can beat attention on some tasks, including English to German translation. More research is needed on the various strengths of RNNs, CNNs, and transformers/attention, and perhaps on approaches to combine the best of each.

from fastai.text import *