US 12,014,180 B2
Dynamically foldable and unfoldable instruction fetch pipeline
John G. Favor, San Francisco, CA (US); Michael N. Michael, Folsom, CA (US); and Vihar Soneji, Newark, CA (US)
Assigned to Ventana Micro Systems Inc., Cupertino, CA (US)
Filed by Ventana Micro Systems Inc., Cupertino, CA (US)
Filed on Jun. 8, 2022, as Appl. No. 17/835,409.
Prior Publication US 2023/0401066 A1, Dec. 14, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/38 (2018.01); G06F 12/0875 (2016.01); G06F 12/1045 (2016.01)
CPC G06F 9/3806 (2013.01) [G06F 9/3867 (2013.01); G06F 12/0875 (2013.01); G06F 12/1054 (2013.01); G06F 2212/305 (2013.01); G06F 2212/452 (2013.01)]
21 Claims
1. A microprocessor, comprising:
a dynamically-foldable instruction fetch pipeline that receives a first fetch request that includes a fetch virtual address;
a buffer structure that holds the first fetch request; and
a branch target buffer (BTB) tagged with the fetch virtual address, wherein the BTB precedes and is decoupled from the instruction fetch pipeline by the buffer structure, wherein the first fetch request includes a hit/miss indicator and includes a predicted set index and a predicted way number provided by the BTB when the fetch virtual address hits in the BTB;
wherein the dynamically-foldable instruction fetch pipeline comprises:
a first sub-pipeline that includes a translation lookaside buffer (TLB) configured to translate the fetch virtual address into a fetch physical address;
a second sub-pipeline that includes a tag random access memory (RAM) of a physically-indexed physically-tagged set associative instruction cache configured to receive a set index that selects a set of tags of the tag RAM for comparison with a tag portion of the fetch physical address to determine a correct way of the instruction cache;
a third sub-pipeline that includes a data RAM of the instruction cache configured to receive the set index and a way number that together specify an entry of the data RAM from which to fetch a block of instructions, wherein the TLB, the tag RAM, and the data RAM are sufficiently large to require multiple clocks to access such that each of the first, the second, and the third sub-pipelines comprise multiple stages; and
a control signal that comprises the hit/miss indicator of the BTB;
wherein when the control signal indicates a folded mode, the first, second and third sub-pipelines are configured to operate in a parallel manner by using the predicted set index and the predicted way number as the set index and the way number, respectively, to fetch the block of instructions from the entry of the data RAM and by using the predicted set index as the set index to select the set of tags of the tag RAM; and
wherein when the control signal indicates an unfolded mode, the first, second and third sub-pipelines are configured to operate in a sequential manner.
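For illustration only, the following minimal Python sketch models the behavior recited in claim 1. It is not part of the patent and is not the claimed implementation. The stage counts, address bit widths, the names FetchRequest, translate, and fetch, the dictionary and list data structures, and the handling of a mispredicted way are all assumptions chosen for the example. The sketch also assumes that a BTB hit selects the folded mode and a BTB miss selects the unfolded mode, which is one reading of the claim's control signal.

```python
from dataclasses import dataclass

# Assumed geometry: 4 KiB pages, 64-byte blocks, 4 sets (all hypothetical).
PAGE_BITS = 12
OFFSET_BITS = 6
SET_BITS = 2

# Assumed stage counts; the claim only requires multiple stages per sub-pipeline.
TLB_STAGES = 2
TAG_STAGES = 2
DATA_STAGES = 2


@dataclass
class FetchRequest:
    fetch_va: int            # fetch virtual address
    btb_hit: bool            # hit/miss indicator from the BTB
    predicted_set: int = 0   # meaningful only when btb_hit is True
    predicted_way: int = 0   # meaningful only when btb_hit is True


def translate(tlb, va):
    """First sub-pipeline: map the virtual page number to a physical one."""
    ppn = tlb[va >> PAGE_BITS]
    return (ppn << PAGE_BITS) | (va & ((1 << PAGE_BITS) - 1))


def fetch(req, tlb, tag_ram, data_ram):
    """Return (block, cycles, prediction_ok) for one fetch request."""
    pa = translate(tlb, req.fetch_va)
    tag = pa >> (OFFSET_BITS + SET_BITS)

    if req.btb_hit:
        # Folded mode: the three sub-pipelines run in parallel. The data RAM
        # and tag RAM are indexed with the BTB's predicted set and way.
        s, w = req.predicted_set, req.predicted_way
        block = data_ram[s][w]
        # Assumed check: compare the selected tag with the translated tag.
        prediction_ok = tag_ram[s][w] == tag
        cycles = max(TLB_STAGES, TAG_STAGES, DATA_STAGES)
        return block, cycles, prediction_ok

    # Unfolded mode: TLB, then tag RAM, then data RAM, in sequence.
    s = (pa >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    w = tag_ram[s].index(tag)  # raises ValueError on an instruction cache miss
    cycles = TLB_STAGES + TAG_STAGES + DATA_STAGES
    return data_ram[s][w], cycles, True
```

Under these assumed stage counts, a folded fetch completes in the latency of the longest sub-pipeline, while an unfolded fetch takes the sum of all three. That difference is the latency benefit the folded mode is meant to capture when the BTB supplies a set index and way number.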