US 12,217,054 B2
Method of storing register data elements to interleave with data elements of a different register, a processor thereof, and a system thereof
Duc Quang Bui, Grand Prairie, TX (US); Alan L. Davis, Sugar Land, TX (US); Dheera Balasubramanian Samudrala, Richardson, TX (US); and Timothy David Anderson, University Park, TX (US)
Assigned to TEXAS INSTRUMENTS INCORPORATED, Dallas, TX (US)
Filed by TEXAS INSTRUMENTS INCORPORATED, Dallas, TX (US)
Filed on Sep. 16, 2022, as Appl. No. 17/946,113.
Application 17/946,113 is a continuation of application No. 16/786,457, filed on Feb. 10, 2020, granted, now 11,449,336.
Claims priority of provisional application 62/852,918, filed on May 24, 2019.
Prior Publication US 2023/0015163 A1, Jan. 19, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/30 (2018.01); G06F 9/38 (2018.01)
CPC G06F 9/30018 (2013.01) [G06F 9/30036 (2013.01); G06F 9/30038 (2023.08); G06F 9/30043 (2013.01); G06F 9/30047 (2013.01); G06F 9/30105 (2013.01); G06F 9/3895 (2013.01)]
18 Claims
 
1. A method comprising:
receiving a processor instruction for a first processor that specifies a portion of a first vector stored in a first source register of the first processor, a portion of a second vector stored in a second source register of the first processor, and a first destination register;
generating a predication instruction to mask one or more lanes of each of a third source register and a fourth source register of a second processor, in which the third and fourth source registers store the first and second vectors, respectively, and in which each of the third and fourth source registers is larger than each of the first and second source registers; and
based on a translation of the processor instruction for execution by a second processor:
reading the portion of the first vector from unmasked lanes of the third source register;
reading the portion of the second vector from unmasked lanes of the fourth source register;
interleaving the portion of the first vector read from the unmasked lanes of the third source register with the portion of the second vector read from the unmasked lanes of the fourth source register to produce a third vector; and
storing the third vector in a second destination register of the second processor.
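
The data flow recited in claim 1 can be illustrated with a minimal software sketch. It is not the patented implementation, and it makes several assumptions that the claim does not state: registers are modeled as Python lists, the predicate masks the upper lanes of the wider registers, and the interleave places each first-vector element before the matching second-vector element. The names `make_lane_mask` and `masked_interleave` are hypothetical.

```python
from typing import List, Sequence


def make_lane_mask(total_lanes: int, active_lanes: int) -> List[bool]:
    """Build a predicate for a wide register.

    Assumption: the low `active_lanes` lanes stay unmasked (True) and the
    remaining upper lanes are masked (False).
    """
    if not 0 <= active_lanes <= total_lanes:
        raise ValueError("active_lanes must be between 0 and total_lanes")
    return [lane < active_lanes for lane in range(total_lanes)]


def masked_interleave(
    third_src: Sequence[int],
    fourth_src: Sequence[int],
    mask: Sequence[bool],
) -> List[int]:
    """Interleave the unmasked lanes of two equally sized wide registers.

    Assumption: output order is first-vector element, then second-vector
    element, for each unmasked lane in ascending lane order.
    """
    if not (len(third_src) == len(fourth_src) == len(mask)):
        raise ValueError("registers and mask must have the same lane count")
    result: List[int] = []
    for a, b, unmasked in zip(third_src, fourth_src, mask):
        if unmasked:
            result.append(a)  # element read from the third source register
            result.append(b)  # element read from the fourth source register
    return result


# Hypothetical sizes: 4-lane source registers on the first processor,
# 8-lane source registers on the second processor.
third_src = [1, 2, 3, 4, 0, 0, 0, 0]       # holds the first vector
fourth_src = [10, 20, 30, 40, 0, 0, 0, 0]  # holds the second vector
mask = make_lane_mask(total_lanes=8, active_lanes=4)
second_dest = masked_interleave(third_src, fourth_src, mask)
```

In this sketch the masked upper lanes of the wider registers never contribute to the result, so the interleaved output occupies the same number of lanes that two of the narrower registers would supply.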