memory-efficient-attention-pytorch
Implementation of memory-efficient multi-head attention, as proposed in the paper "Self-attention Does Not Need O(n²) Memory"
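
To give a feel for the idea behind the paper's title, here is a minimal plain-Python sketch of attention for a single query, computed by visiting one key/value pair at a time while keeping a running maximum, a running softmax denominator, and a running weighted sum. It is an illustration only, not this repository's PyTorch code: the function names `naive_attention` and `streaming_attention` are hypothetical, the usual 1/sqrt(d) scaling is left out for brevity, and multi-head batching and chunking are not shown.

```python
import math


def naive_attention(q, keys, values):
    """Reference version: materialise every score for the query, then softmax."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def streaming_attention(q, keys, values):
    """Same result, but only O(1) extra state per query (hypothetical helper)."""
    running_max = float("-inf")      # largest score seen so far
    denom = 0.0                      # softmax denominator, relative to running_max
    acc = [0.0] * len(values[0])     # weighted sum of values, relative to running_max
    for k, v in zip(keys, values):
        score = sum(qi * ki for qi, ki in zip(q, k))
        new_max = max(running_max, score)
        # Rescale what was accumulated under the old max (0.0 on the first step).
        rescale = math.exp(running_max - new_max)
        weight = math.exp(score - new_max)
        denom = denom * rescale + weight
        acc = [a * rescale + weight * vi for a, vi in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]
```

Both functions should agree up to floating-point error; the streaming version simply never holds the full list of scores at once, which is the property that lets the memory cost avoid growing quadratically with sequence length when applied to every query.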