Abstract: Recently, transformer-based backbones show superior performance over the convolutional counterparts in computer vision. Due to quadratic complexity with respect to the token number in global ...