 instruction cache VS instruction decode stage 

Joined: Thu Jan 17, 2013 4:38 pm
Posts: 53
(I feel like this should be in the CPU section, but it doesn't fit well there after all.)

If I understand the Intel P4 correctly it had a cache for the internally deconstructed "risc" instructions (in addition to a regular x86 icache).

Have any CPUs jelled the cache memory fetch controller with the decode stage to store pre-decoded instructions in the icache? (It would probably grow a lot in bit size, but not in other aspects for a cache of the same number of opcodes.)

Or does perhaps Intel have some kind of patent on this?


Tue Feb 05, 2013 4:23 pm

Joined: Tue Jan 15, 2013 5:43 am
Posts: 189
Quote:
Have any CPUs jelled the cache memory fetch controller with the decode stage to store pre-decoded instructions in the icache?
Looking back a few years, AMD's K6 series processors stored pre-decoded instructions in the icache. Are you interested strictly in the newest technology, or in a broader view?

Paul Hsieh's web site, azillionmonkeys.com, is well worth a visit. In the Technical section there is a comparison of 6th generation x86 CPU architectures and a comparison of 7th generation x86 CPU architectures. From the 6th generation article: "the K6's cache is divided into two fixed caches for separate code and data. [...] On the K6, the predecode bits are used for determining instruction length boundaries."

Mr Hsieh seems well-informed, articulate and thorough. Another excerpt on this topic (one of many topics carefully analyzed on the site):

"I am not as big a fan of split architectures (commonly referred to as the Harvard Architecture) because they set an artificial lower limit on your working sets. As pointed out to me by the AMD folk, this keeps them from having to worry about data accesses kicking out their instruction cache lines. But I would expect this to be dealt with by associativity and don't believe that it is worth the trade off of lower working set sizes. Among the design benefits they do derive from a split architecture is that they can add pre-decode bits to just the instruction cache."

cheers,
Jeff


_________________
http://LaughtonElectronics.com


Thu Feb 07, 2013 3:29 pm

Joined: Thu Jan 17, 2013 4:38 pm
Posts: 53
Dr Jefyll wrote:
Are you interested strictly in the newest technology, or in a broader view?

Anything. Historical stuff is great.

Didn't know about the K6 doing that - I was kinda more in a fixed opcode size mentality, but find it interesting. Any 68K re-creators would probably get use out of that too.


Fri Feb 08, 2013 3:16 am

Joined: Tue Jan 15, 2013 5:43 am
Posts: 189
Quote:
Didn't know about the K6 doing that - I was kinda more in a fixed opcode size mentality, but find it interesting.
I sure found it interesting when I first learned of this. Every clock cycle is precious -- every pipeline stage, that is. Apparently the K6 designers looked at the cycle when the cache line fills and said, "This can be more than just a load. We can use predecode logic to simultaneously do some useful work, and store the result." It bloats the cache, because now you have extra bits to store. But those extra bits earn their keep.

On page 11 this K6 Data Sheet says, "Decoding x86 instructions is particularly difficult because the instructions are variable-length and can be from 1 to 15 bytes long. Predecode logic supplies the five predecode bits that are associated with each instruction byte. The predecode bits indicate the number of bytes to the start of the next x86 instruction. The predecode bits are stored in an extended instruction cache alongside each x86 instruction byte as shown in Figure 2. The predecode bits are passed with the instruction bytes to the decoders where they assist with parallel x86 instruction decoding."

On a personal note I remember the K6 as a watershed chip because it was my 450 MHz K6-2+ that allowed me to comfortably view PDF files for the first time. (Wow! Electronic documents!) Up until then I'd been running a clock-quadrupled '486, and for PDFs it just didn't have enough steam! :cry:

Jeff

_________________
http://LaughtonElectronics.com


Fri Feb 08, 2013 5:16 am