Abstract: The quadratic complexity of self-attention in Transformers has hindered the processing of long text. To alleviate this problem, prior works have proposed sparsifying the attention matrix, ...
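As a minimal illustration of what sparsifying the attention matrix means (a sketch under assumed details, not the paper's own method): the snippet below applies a sliding-window mask so that each token attends only to a fixed-size local band, reducing the number of nonzero attention entries from O(n^2) to O(n*w). The function name and the window size are illustrative assumptions; note that this sketch still materializes the full score matrix, whereas efficient implementations compute only the band.

```python
# Sketch of sliding-window (banded) sparse attention, assuming single-head
# 2-D inputs of shape (n, d). Not the paper's implementation.
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    """Attention restricted to a local band of `window` tokens on each side."""
    n, d = q.shape
    scores = q @ k.T / d ** 0.5                           # (n, n) raw scores
    idx = torch.arange(n)
    band = (idx[None, :] - idx[:, None]).abs() <= window  # banded boolean mask
    scores = scores.masked_fill(~band, float("-inf"))     # drop off-band entries
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(16, 8)
out = sliding_window_attention(q, k, v, window=2)
print(out.shape)  # torch.Size([16, 8])
```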