Cache mechanism in Transformer
Author: Liu Shaokong (an NLP algorithm engineer)

The Encoder part is relatively simple: only a batch-related mask needs to be considered when performing self-attention. Here we focus on how the attention in each decoder layer works in training and inference modes. In training mode, the decoder uses teacher_forcing […]
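To make the contrast concrete, here is a minimal sketch of the cache idea for decoder self-attention at inference time. This is illustrative only, not the author's implementation: the function `cached_self_attention`, the `cache` dict, and the single-head setup are simplifying assumptions, and the causal mask is omitted because each new token may attend to all cached positions.

```python
import torch
import torch.nn.functional as F

def cached_self_attention(x_new, w_q, w_k, w_v, cache=None):
    """Single-head self-attention over new tokens, reusing cached keys/values.

    x_new : (batch, t_new, d_model) -- new decoder tokens (t_new == 1 at inference)
    cache : dict with previously computed 'k' and 'v', or None on the first step
    """
    q = x_new @ w_q                          # project only the new positions
    k = x_new @ w_k
    v = x_new @ w_v

    if cache is not None:
        # Reuse keys/values of earlier positions instead of recomputing them.
        k = torch.cat([cache["k"], k], dim=1)
        v = torch.cat([cache["v"], v], dim=1)
    new_cache = {"k": k, "v": v}

    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    attn = F.softmax(scores, dim=-1)
    out = attn @ v                           # (batch, t_new, d_model)
    return out, new_cache


# Usage: decode step by step, feeding only the newest token each time.
d = 16
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
cache = None
for step in range(3):
    token = torch.randn(1, 1, d)             # one new token per decoding step
    out, cache = cached_self_attention(token, w_q, w_k, w_v, cache)
    print(step, out.shape, cache["k"].shape)
```

In training mode, by contrast, the full target sequence is fed at once under teacher forcing, so no cache is needed and a causal mask does the work instead.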