implement the instruction in the execution pipeline. This decoder mechanism
bridges the gap between an ancient CISC instruction set and a modern RISC data
path.
The decoded micro-ops are fed into the micro-op cache, which Intel refers to
as the L0 (level 0) instruction cache. The micro-op cache is similar to a traditional
instruction cache, but it has a lot of extra breathing room to store the micro-op
sequences that individual instructions produce. When the decoded micro-ops rather
than the original instructions are cached, there is no need to decode the instruction
on subsequent executions. At first glance, you might think that Intel did this to
speed up the pipeline (and indeed it does speed up the process of producing an
instruction), but Intel claims that the micro-op cache was added to reduce front end
power consumption. With the micro-op cache in place, the remainder of the front
end sleeps in an unclocked low-power mode 80% of the time.
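
To make the idea concrete, here is a minimal sketch of a decoded micro-op cache. It is only an illustration, not Intel's design: the instruction strings, the micro-op names, the DECODE_TABLE contents, and the FrontEnd class are all invented for this example. The point it shows is that once the micro-ops for an address have been cached, the decoder does not have to run again for that address.

```python
from typing import Dict, List, Tuple

# Hypothetical decode table: each CISC-style instruction expands into a
# sequence of micro-ops. Names and expansions are invented for illustration.
DECODE_TABLE: Dict[str, Tuple[str, ...]] = {
    "ADD [mem], reg": ("load tmp, [mem]", "add tmp, tmp, reg", "store [mem], tmp"),
    "INC reg": ("add reg, reg, 1",),
}


class FrontEnd:
    """Toy front end with a micro-op cache keyed by instruction address."""

    def __init__(self) -> None:
        self.uop_cache: Dict[int, Tuple[str, ...]] = {}  # address -> micro-ops
        self.decodes = 0  # how often the decoder had to run
        self.hits = 0     # how often the cache supplied the micro-ops

    def fetch(self, address: int, instruction: str) -> Tuple[str, ...]:
        cached = self.uop_cache.get(address)
        if cached is not None:
            self.hits += 1          # decoder stays idle on a hit
            return cached
        self.decodes += 1           # miss: decode once and remember the result
        uops = DECODE_TABLE[instruction]
        self.uop_cache[address] = uops
        return uops


def run_loop(front_end: FrontEnd, program: List[Tuple[int, str]], iterations: int) -> None:
    """Fetch every instruction of a small loop body `iterations` times."""
    for _ in range(iterations):
        for address, instruction in program:
            front_end.fetch(address, instruction)
```

Running a two-instruction loop body ten times with this sketch invokes the decoder only twice; the other eighteen fetches are served from the cache, which is the situation in which a real front end could leave its decoders idle.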
Branch prediction is also performed in the front end. The branch predictor is
responsible for guessing when the program flow breaks from pure sequential
fetching, and it must be able to do this long before the branch instructions are executed.
The branch predictor in the Core i7 is quite remarkable. Unfortunately for us, the
specifics of processor branch predictors are closely held secrets for most designs.
This is because the performance of the predictor is often the most critical
factor in the overall speed of the design. The more prediction accuracy designers can
squeeze out of each square micrometer of silicon, the better the performance of the
entire design. As such, companies hide these secrets under lock and key and even
threaten employees with criminal prosecution should any of them decide to share
these jewels of knowledge. Suffice it to say, though, that all of them keep track of
which way previous branches went and use this to make predictions. It is the
details of precisely what they record and how they store and look up the information
that is top secret. After all, if you had a fantastic way to predict the future, you
probably would not put it on the Web for the whole world to see.
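
Since the Core i7's actual predictor is not public, the following sketch shows only the general idea of recording which way previous branches went. It uses the classic textbook scheme of a table of 2-bit saturating counters indexed by branch address. The class name, the table size, and the initial counter value are assumptions made for this example and say nothing about what Intel really does.

```python
from typing import Iterable


class TwoBitPredictor:
    """Table of 2-bit saturating counters (a textbook scheme, not the i7's)."""

    def __init__(self, table_size: int = 1024) -> None:
        self.table_size = table_size
        # Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
        # Every entry starts at 1 (weakly not taken), an arbitrary choice.
        self.counters = [1] * table_size

    def _index(self, branch_address: int) -> int:
        return branch_address % self.table_size

    def predict(self, branch_address: int) -> bool:
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address: int, taken: bool) -> None:
        """Record the real outcome, saturating at 0 and 3."""
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_correct(predictor: TwoBitPredictor, branch_address: int,
                  outcomes: Iterable[bool]) -> int:
    """Predict each outcome before learning it; return the number of hits."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(branch_address) == taken:
            correct += 1
        predictor.update(branch_address, taken)
    return correct
```

For a loop branch that is taken nine times and then falls through, this sketch gets 8 of 10 predictions right on the first pass and 9 of 10 on a second pass over the same loop, because a single fall-through is not enough to flip a saturated counter.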
Instructions are fed from the micro-op cache to the out-of-order scheduler in
the order dictated by the program, but they are not necessarily issued in program
order. When a micro-op that cannot be executed is encountered, the scheduler
holds it but continues processing the instruction stream to issue subsequent
instructions all of whose resources (registers, functional units, etc.) are available.
Register renaming is also done here to allow instructions with a WAR
(write-after-read) or WAW (write-after-write) dependence to continue without delay.
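
Register renaming can be sketched in a few lines. The version below is a simplification under stated assumptions: instructions are reduced to a destination register plus a tuple of source registers, the physical register names P0, P1, ... are invented, and the supply of physical registers is unlimited. Every write gets a fresh physical register, so two instructions that merely reuse the same architectural register name no longer conflict, while true (read-after-write) dependences are preserved through the mapping.

```python
from typing import Dict, List, Tuple

# (destination register, source registers); a deliberately tiny instruction model.
Instr = Tuple[str, Tuple[str, ...]]


def rename(program: List[Instr]) -> List[Instr]:
    """Give every destination a fresh physical register (P0, P1, ...)."""
    mapping: Dict[str, str] = {}  # architectural name -> current physical name
    renamed: List[Instr] = []
    for next_physical, (dest, sources) in enumerate(program):
        # Sources read whichever physical register currently holds the value;
        # registers never written in this snippet keep their original name.
        new_sources = tuple(mapping.get(s, s) for s in sources)
        new_dest = f"P{next_physical}"
        mapping[dest] = new_dest
        renamed.append((new_dest, new_sources))
    return renamed


def false_dependences(program: List[Instr]) -> int:
    """Count (earlier, later) pairs that form a WAW or WAR dependence."""
    count = 0
    for i, (dest_i, sources_i) in enumerate(program):
        for dest_j, _ in program[i + 1:]:
            if dest_j == dest_i:      # WAW: both write the same register
                count += 1
            if dest_j in sources_i:   # WAR: later write to an earlier source
                count += 1
    return count


# R1 = R2 + R3;  R4 = R1 + R5;  R1 = R6 + R7
EXAMPLE: List[Instr] = [
    ("R1", ("R2", "R3")),
    ("R4", ("R1", "R5")),
    ("R1", ("R6", "R7")),
]
```

In EXAMPLE the third instruction has a WAW dependence on the first and a WAR dependence on the second. After rename(EXAMPLE) the destinations are P0, P1, and P2, both false dependences are gone, and the second instruction still reads P0, the value produced by the first.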
Although instructions can be issued out of order, the Core i7 architecture's
requirement of precise interrupts means that the ISA instructions must be retired
(i.e., have their results made visible) in original program order. The retirement unit
handles this chore.
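
In-order retirement is commonly implemented with a structure called a reorder buffer. The text above does not say how the Core i7's retirement unit is built, so the sketch below is an assumption: a simple queue-based model, with the ReorderBuffer class and its method names invented for this example. Instructions may finish in any order, but a result is retired only when every earlier instruction has been retired too.

```python
from collections import deque
from typing import Deque, List, Set


class ReorderBuffer:
    """Toy reorder buffer: out-of-order completion, in-order retirement."""

    def __init__(self) -> None:
        self.entries: Deque[int] = deque()  # instruction ids in program order
        self.done: Set[int] = set()         # finished but not yet retired
        self.retired: List[int] = []        # retirement order, for inspection

    def issue(self, instr_id: int) -> None:
        """Enter an instruction into the buffer in program order."""
        self.entries.append(instr_id)

    def complete(self, instr_id: int) -> List[int]:
        """Mark an instruction finished; retire every finished one at the head."""
        self.done.add(instr_id)
        newly_retired: List[int] = []
        while self.entries and self.entries[0] in self.done:
            head = self.entries.popleft()
            self.done.remove(head)
            self.retired.append(head)
            newly_retired.append(head)
        return newly_retired
```

If instructions 0 through 3 are issued and then complete in the order 2, 0, 1, 3, instruction 2 has to wait: nothing retires when it finishes, instruction 0 retires alone, and instructions 1 and 2 retire together once 1 is done. The final retirement order is 0, 1, 2, 3, which is what precise interrupts require.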
In the back end of the processor we have the execution units, which carry out
the integer, floating-point, and specialized instructions. Multiple execution units
exist and run in parallel. They get their data from the register file and the L1 data
cache.