US 12,086,594 B2
Vector friendly instruction format and execution thereof
Robert C. Valentine, Kiryat Tivon (IL); Jesus Corbal San Adrian, King City, OR (US); Roger Espasa Sans, Barcelona (ES); Robert D. Cavin, San Francisco, CA (US); Bret L. Toll, Hillsboro, OR (US); Santiago Galan Duran, Molins de Rei (ES); Jeffrey G. Wiedemeier, Austin, TX (US); Sridhar Samudrala, Austin, TX (US); Milind Baburao Girkar, Sunnyvale, CA (US); Edward Thomas Grochowski, San Jose, CA (US); Jonathan Cannon Hall, Hillsboro, OR (US); Dennis R. Bradford, Portland, OR (US); Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); James C Abel, Phoenix, AZ (US); Mark Charney, Lexington, MA (US); Seth Abraham, Tempe, AZ (US); Suleyman Sair, Phoenix, AZ (US); Andrew Thomas Forsyth, Kirkland, WA (US); Lisa Wu, New York, NY (US); and Charles Yount, Phoenix, AZ (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Aug. 28, 2023, as Appl. No. 18/239,106.
Application 18/239,106 is a continuation of application No. 17/524,624, filed on Nov. 11, 2021, granted, now 11,740,904.
Application 17/524,624 is a continuation of application No. 17/004,711, filed on Aug. 27, 2020, granted, now 11,210,096, issued on Dec. 28, 2021.
Application 17/004,711 is a continuation of application No. 16/289,506, filed on Feb. 28, 2019, granted, now 10,795,680, issued on Oct. 6, 2020.
Application 16/289,506 is a continuation of application No. 13/976,707, abandoned, previously published as PCT/US2011/054303, filed on Sep. 30, 2011.
Claims priority of provisional application 61/471,043, filed on Apr. 1, 2011.
Prior Publication US 2024/0061683 A1, Feb. 22, 2024
Int. Cl. G06F 9/30 (2018.01); G06F 9/34 (2018.01); H01L 29/66 (2006.01); H01L 29/775 (2006.01); H01L 29/78 (2006.01); H01L 29/786 (2006.01)

CPC G06F 9/30145 (2013.01) [G06F 9/3001 (2013.01); G06F 9/30014 (2013.01); G06F 9/30025 (2013.01); G06F 9/30032 (2013.01); G06F 9/30036 (2013.01); G06F 9/30047 (2013.01); G06F 9/30149 (2013.01); G06F 9/30181 (2013.01); G06F 9/30185 (2013.01); G06F 9/30192 (2013.01); G06F 9/34 (2013.01); H01L 29/66553 (2013.01); H01L 29/775 (2013.01); H01L 29/7831 (2013.01); H01L 29/78696 (2013.01); G06F 9/30018 (2013.01); H01L 29/66 (2013.01)]

23 Claims

1. An apparatus comprising:

a processor to execute an instruction set, wherein the instruction set includes a first instruction format, wherein the first instruction format includes a first plurality of templates, wherein the first instruction format has a plurality of fields including a base operation field, a data element width field, and a write mask field, wherein the first instruction format supports, through different values in the base operation field, specification of different vector operations, wherein each of the vector operations is to generate a destination vector operand including a plurality of data elements at different data element positions, wherein the first instruction format supports, through different values in the data element width field, specification of different data element widths, wherein the base operation field, the data element width field, and the write mask field may each store only one value on each occurrence of an instruction in the first instruction format in instruction streams, the processor including,

a decode unit to decode the occurrences of the instructions in the first plurality of templates, including to:

distinguish, for each of the occurrences, which one of the data element widths to use based on a value in the data element width field; and

distinguish, for each of the occurrences, which of the data element positions of the destination vector operand are or are not to include corresponding data elements resulting from the vector operation of the occurrence based on the value in the write mask field and the data element width for the occurrence,

wherein different values that may be stored in the write mask field distinguish different write mask registers, of a set of write mask registers, that are to store configurable write masks, and wherein the data element width for the occurrence distinguishes which of the data element positions of the destination vector operand correspond with which bits of the configurable write masks.
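
The interaction recited in claim 1 between the data element width field and the write mask field can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the claimed hardware or any real encoding. The names `Occurrence`, `masked_positions`, `execute`, and `VECTOR_BYTES` are hypothetical. The 16-byte vector length, the mapping of mask bit i to data element position i, and the choice to leave unwritten positions with their prior destination value are all assumptions made for illustration; the claim itself does not fix them.

```python
from dataclasses import dataclass

# Assumed vector length for illustration only; the claim does not specify one.
VECTOR_BYTES = 16


@dataclass
class Occurrence:
    """One occurrence of an instruction in the (hypothetical) format."""
    base_op: str        # value of the base operation field
    element_bytes: int  # width selected by the data element width field
    mask_reg: int       # value of the write mask field (register index)


def masked_positions(occ, mask_regs):
    """Return, per data element position, whether the result is written.

    The element width determines how many positions exist, and therefore
    how many low-order mask bits are consulted (assumption: bit i of the
    mask corresponds to position i).
    """
    positions = VECTOR_BYTES // occ.element_bytes
    mask = mask_regs[occ.mask_reg]
    return [bool((mask >> i) & 1) for i in range(positions)]


def execute(occ, dst, a, b, mask_regs):
    """Apply the base operation element-wise under the selected write mask.

    Positions whose mask bit is clear keep the old destination value
    (assumed merging behavior).
    """
    ops = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}
    op = ops[occ.base_op]
    write = masked_positions(occ, mask_regs)
    return [op(x, y) if w else d for d, x, y, w in zip(dst, a, b, write)]


# Two configurable write masks in a hypothetical two-register set.
mask_regs = [0b1111, 0b0101]

# Same mask register, different element widths: 4 positions vs. 2 positions.
narrow = execute(Occurrence("add", 4, 1), [-1] * 4, [1, 2, 3, 4], [10, 20, 30, 40], mask_regs)
wide = execute(Occurrence("add", 8, 1), [-1] * 2, [1, 2], [10, 20], mask_regs)
print(narrow)  # [11, -1, 33, -1]
print(wide)    # [11, -1]
```

In this model the same mask value `0b0101` governs four positions when the element width is 4 bytes and only two positions when it is 8 bytes, which mirrors the claim's statement that the data element width distinguishes which data element positions correspond with which bits of the write mask.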