 Computing jump targets with odd instruction sizes 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
2017/08/23
I'm wondering what to do about the JAL instruction in my current design. The problem is that instructions aren't a whole number of bytes in size, and an odd number of instructions fit into a 256-bit cache line with a few bits left over. Normally JAL takes a register operand and an immediate and adds them together to form a target address. The simple cases, where either the register or the immediate constant is zero, don't cause a problem. However, JAL can also be used to perform computed, table-based jumps and calls.

One problem with the odd instruction organization is that the address for a table-based jump can't be calculated easily. On most processors, calculating the jump target is just a matter of shifting a value in a register by a certain number of bits, then using the JAL instruction to add in the base address of the table. For example, to jump to the 27th element on a machine with 32-bit instructions, the index would simply be multiplied by four (shifted left twice). But what about a machine with three 40-bit instructions packed into 128 bits? One has to multiply by 3.2 to calculate a jump target.

The other problem is: what if the table isn't on a 16-byte (128-bit) boundary? It's a bit restrictive to require that all jump tables be located on 128-bit boundaries. It would cause problems with small structures.
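To make the packing concrete, here is a minimal Python sketch of where the n-th instruction would sit. It assumes the three 5-byte slots start at bytes 0, 5 and 10 of each 16-byte row, with the last byte unused; the post does not spell out the layout, and `slot_byte_offset` is a hypothetical helper, not part of the design.

```python
ROW_BYTES = 16       # 128-bit row
SLOT_BYTES = 5       # 40-bit instruction
SLOTS_PER_ROW = 3    # assumption: slots at bytes 0, 5, 10; byte 15 unused


def slot_byte_offset(index):
    """Byte offset (from an aligned table base) of the index-th instruction."""
    row, slot = divmod(index, SLOTS_PER_ROW)
    return row * ROW_BYTES + slot * SLOT_BYTES


def shifted_offset(index):
    """The easy case: 32-bit instructions, index shifted left twice."""
    return index << 2


if __name__ == "__main__":
    for i in (0, 1, 2, 3, 4, 27):
        print(i, shifted_offset(i), slot_byte_offset(i))
```

The point of the comparison is that the packed case needs a divide and a modulo by three where the 32-bit case needs only a shift.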

It's possible to add an instruction specifically for calculating jump addresses, but it seems like an ugly solution. A "multiply by 3.2 and round to an instruction address" operation would be handy, but it is useful only when the table address is aligned on a 128-bit boundary. For the example above, multiplying 27 by 3.2 yields 86.4, or 56.x hex. This is byte 6 of the 5th 128-bit row. Rounded down to an instruction address it would be byte 5, which actually does work out to the desired target.
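The row/byte split and the round-down step in that example can be sketched in a few lines of Python. This assumes instruction slots begin at bytes 0, 5 and 10 of each 16-byte row and that rounding always goes down to the start of the enclosing slot; `locate` is a hypothetical name used only for illustration.

```python
ROW_BYTES = 16
SLOT_BYTES = 5
SLOTS_PER_ROW = 3  # assumption: slots at bytes 0, 5, 10 of each row


def locate(byte_address):
    """Split an address into (row, byte within row, slot start byte)."""
    row, byte_in_row = divmod(byte_address, ROW_BYTES)
    # Byte 15 is padding under this layout; treat it as part of the last slot.
    slot = min(byte_in_row // SLOT_BYTES, SLOTS_PER_ROW - 1)
    return row, byte_in_row, slot * SLOT_BYTES


if __name__ == "__main__":
    print(locate(0x56))  # (5, 6, 5): row 5, byte 6, rounded down to byte 5
```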

A fixed-point fractional MAC instruction followed by rounding might be more useful than a custom "calculate this address" instruction. 3.2 is almost the same as 3.25. If a multiply by 3.25 were performed instead of by 3.2, there would be the occasional "hole" in the jump table. As long as the holes are known about, they shouldn't be a problem. As long as the compiler keeps the table index consistent with the MAC instruction, it should work out okay.
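One reason 3.25 is attractive is that it is exactly 13/4, so two fractional bits are enough and the scaling is a multiply followed by a shift, whereas 3.2 (16/5) has no finite binary representation. The following Python sketch compares the two; the function names are hypothetical, and the divide in `scale_3_2` stands in for whatever the hardware would actually do.

```python
FRAC_BITS = 2
FACTOR_3_25 = 13  # 3.25 * 2**FRAC_BITS, exact


def scale_3_25(index):
    """floor(3.25 * index) using only a multiply and a shift."""
    return (index * FACTOR_3_25) >> FRAC_BITS


def scale_3_2(index):
    """floor(3.2 * index); exact integer form, but it needs a divide by 5."""
    return (index * 16) // 5


if __name__ == "__main__":
    for i in range(0, 28, 3):
        print(i, scale_3_2(i), scale_3_25(i))
```

The two results drift apart by 0.05 per index, which is presumably where the occasional unused table position comes from once the compiler lays the table out to match the 3.25 scaling.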

_________________
Robert Finch http://www.finitron.ca


Wed Aug 23, 2017 12:10 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
A very few machines are word addressable. Most machines are byte addressable, even those with word-size instructions. Perhaps this is a case for making a machine which is bit-addressable? Or nibble-addressable, depending on how your instruction boundaries sit within a 128 bit boundary.

Having said which, I'd probably start by looking into alignment restrictions.


Wed Aug 23, 2017 9:31 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I worked through a couple of examples of indexed table access using a standard instruction set just to ensure that things could work.

Quote:
A very few machines are word addressable. Most machines are byte addressable, even those with word-size instructions. Perhaps this is a case for making a machine which is bit-addressable? Or nibble-addressable, depending on how your instruction boundaries sit within a 128 bit boundary.

The machine is now byte addressable. I switched the instruction set around, trimming a bit off the instruction size in order to make it a whole number of bytes. Instruction addresses and data addresses are now the same. One key to making things work is a JAL instruction that automatically rounds to a valid instruction address.

Code:
; Hypothetical machine with three 40-bit instructions in 128 bits
; Since three instructions are fit into 128 bits the instruction
; width is effectively 5.333... bytes even though some bits are
; not used.
; Assumes JAL rounds to an instruction address

; Packed table
; R1 = table index
; R3 = address of table
   mul      R1,R1,#21846   ; multiply by 5.333 * 4096
   and      R2,R1,#0xFFF   ; get remainder
   sne      R2,R2,R0      ; set R2 = 1 if remainder <> 0
   shr      R1,R1,#12      ; divide by 4096
   add      R1,R1,R2      ; round up (add 1 if remainder was nonzero)
   add      R1,R1,R3      ; add in table base address (in R3)
   jal      R49,[R1]      ; jump to address

; Faster math
; Table with "holes"
; Simply multiplies by six rather than 5.333 but needs a table
; with holes in it.
   shl      R1,R1,#1      ; *2
   shl      R2,R1,#1      ; *4
   add      R1,R1,R2      ; *6
   add      R1,R1,R3      ; add in table base address (in R3)
   jal      R49,[R1]      ; jump to address

; Even faster to multiply by eight but then the table is only 66% full.
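The cost of the simpler strides can be estimated with a short Python sketch. It assumes JAL rounds a byte address down to the start of its slot and that slots sit at bytes 0, 5 and 10 of each 16-byte row; neither detail is stated in the listing, and `fill_ratio` is a hypothetical helper.

```python
ROW_BYTES = 16
SLOT_BYTES = 5
SLOTS_PER_ROW = 3  # assumption: slots at bytes 0, 5, 10 of each row


def slot_number(byte_offset):
    """Sequential slot number that a byte offset rounds down to."""
    row, byte_in_row = divmod(byte_offset, ROW_BYTES)
    slot = min(byte_in_row // SLOT_BYTES, SLOTS_PER_ROW - 1)
    return row * SLOTS_PER_ROW + slot


def fill_ratio(stride, entries):
    """Fraction of the spanned slots actually used by index * stride."""
    used = {slot_number(i * stride) for i in range(entries)}
    return len(used) / (max(used) + 1)


if __name__ == "__main__":
    for stride in (6, 8):
        print(stride, round(fill_ratio(stride, 3000), 3))
```

Under these assumptions a stride of six leaves roughly one slot in nine empty, and a stride of eight leaves one in three empty, which agrees with the 66% figure in the listing.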

Scaling by 4096 was chosen because it gives 12 bits of precision for fixed point math.
This allows a table with thousands of entries without any holes. Probably gross overkill for most apps.
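The packed-table sequence can be checked with a small Python model. This is a sketch under stated assumptions: 21846 is read as 16/3 * 4096 rounded up, JAL is assumed to round down to a slot start, and slots are assumed to sit at bytes 0, 5 and 10 of each 16-byte row. `packed_offset` and `jal_round` are hypothetical names that mirror the mul/and/sne/shr/add steps and the rounding JAL.

```python
SCALE_BITS = 12
FACTOR = 21846  # 16/3 * 4096 = 21845.33..., rounded up


def packed_offset(index):
    """Mirror of the mul / and / sne / shr / add sequence (before the base add)."""
    product = index * FACTOR                        # mul
    remainder = product & ((1 << SCALE_BITS) - 1)   # and ..., #0xFFF
    offset = product >> SCALE_BITS                  # shr ..., #12
    return offset + (1 if remainder != 0 else 0)    # sne + add


def jal_round(byte_offset):
    """Assumed JAL behaviour: round down to a slot start at byte 0, 5 or 10."""
    row, byte_in_row = divmod(byte_offset, 16)
    slot = min(byte_in_row // 5, 2)
    return row * 16 + slot * 5


if __name__ == "__main__":
    for i in (0, 1, 2, 3, 27):
        print(i, packed_offset(i), jal_round(packed_offset(i)))
```

With these assumptions the rounded result matches the exact slot position for every index below 4096, which fits the remark about tables with thousands of entries.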

_________________
Robert Finch http://www.finitron.ca


Thu Aug 24, 2017 6:46 am