CAS-ViT: Convolutional Additive Self-Attention Vision Transformers for Efficient Mobile Applications
Abstract: Vision Transformers (ViTs) mark a revolutionary advance in neural networks with their token mixer’s powerful global context capability. However, the pairwise token affinity and complex ...