US 12,498,927 B1
Microprocessor that allows same-fetch block start address co-residence of unrolled loop multi-fetch block macro-op cache entry and loop body macro-op cache entry used to build same
John G. Favor, San Francisco, CA (US); and Michael N. Michael, Folsom, CA (US)
Assigned to Ventana Micro Systems Inc., Cupertino, CA (US)
Filed by Ventana Micro Systems Inc., Cupertino, CA (US)
Filed on Apr. 24, 2024, as Appl. No. 18/645,281.
Application 18/645,281 is a continuation in part of application No. 18/380,152, filed on Oct. 13, 2023, granted, now 12,282,430.
Application 18/645,281 is a continuation in part of application No. 18/380,150, filed on Oct. 13, 2023, granted, now 12,299,449.
Application 18/380,152 is a continuation in part of application No. 18/240,249, filed on Aug. 30, 2023, granted, now 12,253,951.
Claims priority of provisional application 63/547,230, filed on Nov. 3, 2023.
Int. Cl. G06F 9/38 (2018.01); G06F 9/30 (2018.01)
CPC G06F 9/3802 (2013.01) [G06F 9/30065 (2013.01); G06F 9/30181 (2013.01)]
31 Claims
1. A microprocessor, comprising:
a prediction unit (PRU) that continuously predicts a sequence of fetch block start addresses (FBSAs) that specify a corresponding sequence of fetch blocks (FBlks) in a program instruction stream, wherein a FBlk comprises a sequential run of architectural instructions;
a macro-op (MOP) cache (MOC) having MOC entries (MEs) that hold MOPs into which the architectural instructions of one or more FBlks are decoded, wherein a MOP comprises an instruction executable by an execution unit of the microprocessor;
a fetch unit; and
a fusion engine;
wherein the PRU is configured to:
install into the MOC a loop body ME using a first FBSA value that specifies the loop body ME;
instruct the fusion engine to build into the MOC an unrolled loop multi-FBlk ME (ULP-MF-ME) using F copies of the MOPs of the loop body ME, wherein F is a loop unroll factor that is at least two, wherein the ULP-MF-ME comprises an unrolled loop iteration count;
install into the MOC the ULP-MF-ME using the first FBSA value that also specifies the ULP-MF-ME;
detect a multiple-hit in the MOC on both the loop body ME and the ULP-MF-ME associated with a current instance of a loop on the loop body ME in the program instruction stream; and
in response to detecting the multiple-hit:
instruct the fetch unit to fetch from the MOC for execution a number of copies of the MOPs of the ULP-MF-ME equal to the unrolled loop iteration count; and
instruct the fetch unit to fetch from the MOC for execution the MOPs of the loop body ME until the PRU predicts the program instruction stream has fallen out of the current instance of the loop on the loop body ME.
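The fetch sequence recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the names ToyMacroOpCache, MacroOpCacheEntry, build_unrolled_entry and fetch_loop are hypothetical, the PRU's loop-exit prediction is stood in for by a predicted_iterations argument, and the choice of unrolled loop iteration count as predicted_iterations // F is an assumption the claim does not state.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MacroOpCacheEntry:
    """Toy MOC entry (ME). All field names are illustrative assumptions."""
    kind: str                          # "loop_body" or "unrolled"
    mops: List[str]                    # macro-ops held by the entry
    unroll_factor: int = 1             # F; 1 for a plain loop body ME
    unrolled_iteration_count: int = 0  # meaningful only for the unrolled ME


class ToyMacroOpCache:
    """Toy MOC in which two entries may co-reside under one FBSA."""

    def __init__(self) -> None:
        self._entries: Dict[int, Dict[str, MacroOpCacheEntry]] = {}

    def install(self, fbsa: int, entry: MacroOpCacheEntry) -> None:
        # Entries of different kinds share the same FBSA key.
        self._entries.setdefault(fbsa, {})[entry.kind] = entry

    def lookup(self, fbsa: int) -> Dict[str, MacroOpCacheEntry]:
        return dict(self._entries.get(fbsa, {}))


def build_unrolled_entry(loop_body: MacroOpCacheEntry, unroll_factor: int,
                         unrolled_iteration_count: int) -> MacroOpCacheEntry:
    """Stand-in for the fusion engine: F copies of the loop body MOPs."""
    if unroll_factor < 2:
        raise ValueError("the unroll factor F must be at least two")
    return MacroOpCacheEntry("unrolled", loop_body.mops * unroll_factor,
                             unroll_factor, unrolled_iteration_count)


def fetch_loop(moc: ToyMacroOpCache, fbsa: int,
               predicted_iterations: int) -> List[str]:
    """Return the MOP stream fetched for one loop instance."""
    hits = moc.lookup(fbsa)
    body = hits["loop_body"]  # this sketch requires a loop body hit
    stream: List[str] = []
    remaining = predicted_iterations
    if "unrolled" in hits:  # multiple-hit: both entries match the FBSA
        unrolled = hits["unrolled"]
        for _ in range(unrolled.unrolled_iteration_count):
            stream.extend(unrolled.mops)
        remaining -= unrolled.unrolled_iteration_count * unrolled.unroll_factor
    # Fall back to the loop body ME until the (modelled) prediction
    # says the instruction stream has left the loop.
    while remaining > 0:
        stream.extend(body.mops)
        remaining -= 1
    return stream


# Usage: a 3-MOP loop body predicted to run 7 times, unrolled by F = 2.
FBSA = 0x1000
body_me = MacroOpCacheEntry("loop_body", ["ld", "add", "bnez"])
moc = ToyMacroOpCache()
moc.install(FBSA, body_me)
moc.install(FBSA, build_unrolled_entry(body_me, 2, 7 // 2))
stream = fetch_loop(moc, FBSA, predicted_iterations=7)
```

In this model the unrolled entry covers six of the seven iterations and the loop body entry supplies the last one, so the resulting MOP stream is the same as seven plain fetches of the loop body entry.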