US 12,306,906 B2
Adaptive token sampling for efficient transformer
Mohsen Fayyaz, Bonn (DE); Soroush Abbasi Koohpayegani, Baltimore, MD (US); Eric Chris Wolfgang Sommerlade, Oxford (GB); and Hamidreza Vaezi Joze, Redmond, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Nov. 14, 2021, as Appl. No. 17/525,908.
Prior Publication US 2023/0153379 A1, May 18, 2023
Int. Cl. G06F 18/2113 (2023.01); G06F 18/24 (2023.01); G06N 3/04 (2023.01)
CPC G06F 18/2113 (2023.01) [G06F 18/24 (2023.01); G06N 3/04 (2013.01)]
20 Claims
 
1. A computer-implemented method for processing a data item, comprising:
obtaining plural item tokens that represent the data item;
obtaining a classification token; and
converting the item tokens and the classification token into embedding vectors;
in an attention operation, using a transformer neural network for:
generating original attention information based on the embedding vectors, the original attention information having a plurality of attention values, each attention value describing an importance that a particular token plays in an interpretation of another particular token;
generating score information based on attention values in the original attention information that pertain to the classification token; and
generating modified attention information by removing attention values from the original attention information, as guided by a sampling operation that is performed based on the score information; and
performing subsequent operations in the transformer neural network based on the modified attention information,
the subsequent operations performing fewer operations by using the modified attention information rather than the original attention information.
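
As an illustration only, the attention operation recited in claim 1 can be sketched in Python with NumPy. The claim does not fix the sampling scheme, the score normalization, or which attention values are removed. The sketch assumes single-head attention, scores taken from the classification token's attention row, deterministic inverse-transform sampling over those scores, and removal of whole attention rows for unsampled tokens. The function name `adaptive_token_sampling`, the weight matrices `Wq`, `Wk`, `Wv`, and the parameter `num_samples` are hypothetical names, not terms from the patent.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_token_sampling(X, Wq, Wk, Wv, num_samples):
    """Sketch of the claimed attention operation (assumptions noted inline).

    X:  (N + 1, d) embedding vectors; row 0 is the classification token,
        rows 1..N are the item tokens.
    Returns the reduced output and the indices of the tokens that were kept.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]

    # Original attention information: entry (i, j) is the importance of
    # token j in the interpretation of token i.
    A = softmax(Q @ K.T / np.sqrt(d), axis=-1)

    # Score information: attention the classification token pays to each
    # item token, renormalized to sum to 1 (assumed normalization).
    scores = A[0, 1:]
    scores = scores / scores.sum()

    # Sampling operation (assumed scheme): inverse-transform sampling at
    # evenly spaced quantiles of the cumulative score distribution.
    cdf = np.cumsum(scores)
    quantiles = (np.arange(num_samples) + 0.5) / num_samples
    picked = np.searchsorted(cdf, quantiles)
    picked = np.minimum(picked, len(scores) - 1)  # guard against rounding

    # Always keep the classification token (index 0); duplicates collapse,
    # so the number of kept tokens can be smaller than num_samples + 1.
    keep = np.unique(np.concatenate(([0], picked + 1)))

    # Modified attention information: rows of unsampled tokens are removed.
    A_mod = A[keep]

    # Subsequent operation: fewer rows means fewer multiply-adds here and
    # fewer tokens for every later layer.
    out = A_mod @ V
    return out, keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(9, 4))  # 1 classification token + 8 item tokens
    Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
    out, keep = adaptive_token_sampling(X, Wq, Wk, Wv, num_samples=4)
    print("kept token indices:", keep)
    print("output shape:", out.shape)
```

Because the modified attention matrix has fewer rows than the original, the product with the value matrix and all later layers operate on at most `num_samples + 1` tokens instead of all `N + 1`.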