View unanswered posts | View active topics It is currently Thu Mar 28, 2024 8:04 pm



Reply to topic  [ 16 posts ]  Go to page 1, 2  Next
 Scaled-index addressing mode 
Author Message

Joined: Mon Dec 28, 2015 11:37 am
Posts: 13
I never programmed a processor supporting scaled-index addressing mode in assembler. I can see the advantage of the addressing mode as an instruction for indexing into arrays of larger element size but as a total of code I feel that advantage gets smaller. I'm sure my inexperience is limiting my view so I ask for some help.

Looping through an array with scaled-indexing using some OP Rd [Rb,Ri*4] you need INC Ri and a branch, instead of OP Rd [Rb,Ri], ADD Ri #4 and a branch without the addressing mode. With a reverse loop and a decrement-and-branch instruction you can remove the increment instruction on the scaled-index variant. I guess you could create a decrement-and-branch instruction with scale instead (subtract-and-branch).

What features are needed from a processor supporting scaled-index addressing for it to be of greater use and what use cases do it have?


Fri Feb 14, 2020 10:46 am
Profile

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 92
spiff wrote:
What features are needed from a processor supporting scaled-index addressing for it to be of greater use and what use cases do it have?


The use-case that springs immediately to mind is jump tables:
Code:
jsr (_jumptable,r0*4)

Which says to me that scaled addressing modes are most useful in a CISC context, where they can be used with the majority of instructions, and not just with loads/stores. I also think they're most useful in architectures where registers are scarce.


Fri Feb 14, 2020 2:05 pm
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Scaled indexed addressing will sometimes save on register usage compared a processor that doesn’t support it. It can sometimes lead to shorter code which might be important for small loops. There’s no need to load the reg into a temp and add. It is most often used for addressing arrays of data. Another use of scaled indexed addressing occurs when the displacement constant of register indirect mode is too large. It’s easy to then load the constant into a register and use and use scaled indexed addressing with the original displacement register. Without the indexed addressing it can require two additional registers and another instruction to perform the same operation.
However, scaled indexed addressing is not used often enough to justify its availability in some designs. Some processors support it, others don’t. For instance, RISCV doesn’t support it (unless one adds custom instructions). PowerPC supports indexed addressing without scaling.

_________________
Robert Finch http://www.finitron.ca


Fri Feb 14, 2020 6:43 pm
Profile WWW

Joined: Mon Dec 28, 2015 11:37 am
Posts: 13
Oh, thanks [Rob*2]. ;)

I immediately see one misconception I had about scaled-index mode. It is not intended to work on struct type data but only on arrays of primary datatypes already known by the processor.

I didn't even think of jumptables, embarrassed. The displacement trick was also new.

I realised (possibly wrongly) how limited the index addressing modes are on processors. The size of the operands are known by the instructions, shouldn't the processor be able to automatically scale the index to that size? (like inherited by the regular opcodes, the way auto-increment/decrement work.) I think I need to look into an auto-scaled-index addressing mode... As I don't know of any such processor, what am I missing here? I guess there are still uses for a two register base+offset addressing mode so it's not possible to entirely replace the common index mode.


Sat Feb 15, 2020 9:13 am
Profile

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 92
spiff wrote:
I think I need to look into an auto-scaled-index addressing mode... As I don't know of any such processor, what am I missing here? I guess there are still uses for a two register base+offset addressing mode so it's not possible to entirely replace the common index mode.


Auto scaling is an interesting idea. The only problem I can see is that, as you say, you'd still want an unscaled indexed addressing mode, so for byte accesses the scaled and unscaled modes would have the same effect, wasting encoding space. If you disallow scaled mode for byte access and re-use those codes for something else then you're adding to decoding complexity.


Sat Feb 15, 2020 2:56 pm
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I'm a fan of scaled indexed addressing. It's like a divide, it's rarely used but when it's needed it's a slow and a pita to emulate with other instructions.
Quote:
The size of the operands are known by the instructions, shouldn't the processor be able to automatically scale t
Yes it could and some processors work that way.

Having the scale explicitly defined is primarily to reduce the decoding complexity in the processor. Rather than having to decode the instruction to determine the scale, it’s already there in the instruction. The processor may be able to scale a register while it's doing other decoding. There are often plenty of bits available to encode a scale. The “left over” bits in an instruction are sometimes used to also allow a displacement with scaled indexed addressing. It is rarely useful to use the scale for anything other than the size of the object being referenced. For instance, there might be a rare case where it’s desired to load the first byte of an array element, where the elements are actually words in size. So, then a load byte instruction would be used with a scale of 8 (for 64-bit words). But that kind of thing is extremely rare.

_________________
Robert Finch http://www.finitron.ca


Sat Feb 15, 2020 9:17 pm
Profile WWW
User avatar

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
This is a feature currently present in all major processors, starting by Intel and ARM. If I recall it well, even the VAX and the 68000 series processors had it. As already explained above, it's not meant to access struct elements but just plain arrays (or memory, which is the same) of processor sizes supported elements. Index scaling helps to reduce instruction count and improve speed in many scenarios involving loops. It also enables some compiler optimisations that would not be fully possible without it, for example the "loop strength reduction" optimisation consists on finding the best possible formula for the loop induction variable, which may involve replacing expensive operations such as multiplications by additions or collapsing multiple variables into a single ones while reducing code.

If we consider the following code:
Code:
char a[10];
int b[10];
for ( int i=0; i<10 ; ++i) b[i] = a[i];

- On a processor without scaled index addressing, the above code requires either (1) the presence of two index variables, one for the char array and another one for the int array, or (2) the multiplication/shift of one induction variable in the loop to account for the different element size of both arrays.

- On a processor with scaled index addressing, only one induction variable is required, because the instruction used to access the array elements already account for the size of the elements being accessed.

It's also interesting to note, that on processors with a pre/post decrement/increment addressing mode, the scaling also takes effect in this case. This means that pre-decrements, post-increments and so on are not meant to be just by one, they can be any size supported by the processor. Therefore, code like this:

Code:
char *a = "Hello world" ;
int b[12];
while ( *a != '\0' ) *b++ = *a++

may get compiled by simply using the post increment mode, that in the case of 'a' will increment the pointer by one, and in the case of 'b' will do it by 4 (assuming 32 bit ints), without needing a scalar induction variable or any need for adjustments of the different sized pointers.

My ultimate understanding, is that scaled-index addressing is nothing else that having "c" language pointer arithmetic at the processor instructions level, which helped early compilers to just translate raw "c" code into assembly language without having to worry about pointer element sizes.


Sun Feb 16, 2020 8:44 am
Profile

Joined: Mon Dec 28, 2015 11:37 am
Posts: 13
Ahh, thanks for those replies. Even got examples why the size of the scale being explicitly given in the instruction.

robfinch wrote:
[...] and some processors work that way.

Can you provide some model names?


Sun Feb 16, 2020 4:14 pm
Profile
User avatar

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
spiff wrote:
Ahh, thanks for those replies. Even got examples why the size of the scale being explicitly given in the instruction.

robfinch wrote:
[...] and some processors work that way.

Can you provide some model names?

Supported scale factors are almost invariably powers of 2, such as 1, 2, 4, 8. So on a 64 bits cpu this can be encoded with only 2 bits. That’s the most common approach, for what I can tell, for all commercial processors, including Intel and ARM. The encoding for instructions supporting this include (or use) two encoding bits to indicate the desired scale factor. From the point of view of hardware, my understanding is that this is implemented by shifting left the index value by the amount specified on these 2 bits.


Sun Feb 16, 2020 5:58 pm
Profile

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 92
joanlluch wrote:
The encoding for instructions supporting this include (or use) two encoding bits to indicate the desired scale factor. From the point of view of hardware, my understanding is that this is implemented by shifting left the index value by the amount specified on these 2 bits.


I think what spiff was looking for was examples where the index is automatically scaled appropriately for the operand size, rather than the scale factor being encoded separately into the instruction word - i.e. mov scales by 4, mov.h scales by 2, mov.b scales by 1?

(Historical note: 68020 onwards supports scaled indices (with the scale factor of 1, 2, 4 or 8 explicity encoded) but 68000 doesn't.)


Sun Feb 16, 2020 9:06 pm
Profile
User avatar

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
robinsonb5 wrote:
I think what spiff was looking for was examples where the index is automatically scaled appropriately for the operand size, rather than the scale factor being encoded separately into the instruction word - i.e. mov scales by 4, mov.h scales by 2, mov.b scales by 1?


Ok, maybe then this stackoverflow question/answer may help as an example https://stackoverflow.com/questions/48354636/are-scaled-index-addressing-modes-a-good-idea It shows a piece of C code and two different compilers outputs. The GCC 7.2 compiler choses not to use scaled index instructions, LLVM uses them (the link mentions clang 5.0, but that's just the front end, it does not create x86 code. It's the LLVM x86 backend what generates it. Based on my experience with that compiler the, "mov dword ptr [rdi + 2*rax], eax" instruction was most probably inserted during LSR optimisation)

There's an interesting online utility to compare assembly output of several popular compilers, which is definitely worth to try.

compiler comparison

Just for easy reference, I copy below the relevant code as posted in the said stackorverflow question

Code:
void foo(int* __restrict__ a)
{
    int i; int val = 0;
    for (i = 0; i < 100; i++) {
        val = 2 * i;
        a[i] = val;
    }
}


GCC 7.2
Code:
foo(int*):
        xor     eax, eax
.L2:
        mov     DWORD PTR [rdi], eax
        add     eax, 2
        add     rdi, 4
        cmp     eax, 200
        jne     .L2
        rep ret


LLVM 5.0
Code:
foo(int*): # @foo(int*)
  xor eax, eax
.LBB0_1: # =>This Inner Loop Header: Depth=1
  mov dword ptr [rdi + 2*rax], eax
  add rax, 2
  cmp rax, 200
  jne .LBB0_1
  ret

Also, just as a matter of completness, this is the output of the LLVM compiler for the ARM architecture

ARM V7
Code:
foo(int*):
        mov     r1, #0
.LBB0_1:                                @ =>This Inner Loop Header: Depth=1
        str     r1, [r0, r1, lsl #1]
        add     r1, r1, #2
        cmp     r1, #200
        bne     .LBB0_1
        bx      lr
The LLVM produces virtually identical code for the X86 and the ARM.

Interestingly, if we switch to GCC 8.1 or latter, the compiler output is somewhat better, but still not as good as LLVM

GCC 8.1
Code:
foo(int*):
        xor     eax, eax
.L2:
        lea     edx, [rax+rax]
        mov     DWORD PTR [rdi+rax*4], edx
        add     rax, 1
        cmp     rax, 100
        jne     .L2
        ret


In this case, the GCC compiler uses x4 scaled index to store the value in the array, but it fails to fully optimise the loop which can be achieved by choosing a better induction variable. LLVM, on the other hand, uses an induction variable that is already 2*i, so it removes the need for the computation of 2*i inside the loop, and therefore it uses a x2 scaled index to store the value in the array

Quote:
(Historical note: 68020 onwards supports scaled indices (with the scale factor of 1, 2, 4 or 8 explicity encoded) but 68000 doesn't.)

I'm not sure if this is 100% right. I have on my table an old reference book about 68000 programming, and it clearly states that for the "autoincrement and autodecrement" addressing modes the processor will increment the register by 'one', 'two', or 'four' depending on the "array element size". I think this means the type of instruction (byte, word or long). Also, in the "address register indirect with index and displacement" addressing mode the book seems to imply the same, although this is certainly not put in any clear way in this case (?), so you are probably right about that the 68000 did not support scaled indexes in the later case.


Sun Feb 16, 2020 10:54 pm
Profile

Joined: Mon Dec 28, 2015 11:37 am
Posts: 13
Sorry for the confusion, should have included the complete quotes. 68k, x86, and Arm all have the scaling factor of the index explicitly given in the instruction. I was looking for an architecture where the scaling is automatically taken from the operand size. Thanks for the stackoverflow discussion as that relate to the original question.

spiff wrote:
robfinch wrote:
spiff wrote:
The size of the operands are known by the instructions, shouldn't the processor be able to automatically scale the index to that size? (like inherited by the regular opcodes, the way auto-increment/decrement work.)

Yes it could and some processors work that way.

Can you provide some model names?


Mon Feb 17, 2020 11:02 am
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:

spiff wrote:
robfinch wrote:
spiff wrote:
The size of the operands are known by the instructions, shouldn't the processor be able to automatically scale the index to that size? (like inherited by the regular opcodes, the way auto-increment/decrement work.)

Yes it could and some processors work that way.

Can you provide some model names?

I can't think of any off the top of my head. I was sure I'd seen it in the past, but I may be wrong. Thinking of auto-increment / decrement addressing.
What does the VAX do? It has indexed addressing in an interesting fashion.

_________________
Robert Finch http://www.finitron.ca


Mon Feb 17, 2020 4:12 pm
Profile WWW

Joined: Mon Dec 28, 2015 11:37 am
Posts: 13
robfinch wrote:
What does the VAX do? It has indexed addressing in an interesting fashion.

There you have it. VAX scales its index mode to the size of the operand. Thanks. Maybe should have known this myself as I was a heavy user of VAX-11 back in the days, but never programmed in assembly apparently.


Mon Feb 17, 2020 8:00 pm
Profile

Joined: Tue Dec 31, 2013 2:01 am
Posts: 116
Location: Sacramento, CA, United States
My brutally simple method to avoid scaling issues in my own design was to make ints, chars, floats (and even "bytes") all the same width. YMMV, of course ...

Mike B.


Mon Feb 24, 2020 6:56 am
Profile
Display posts from previous:  Sort by  
Reply to topic   [ 16 posts ]  Go to page 1, 2  Next

Who is online

Users browsing this forum: AhrefsBot, Bing [Bot] and 7 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron
Powered by phpBB® Forum Software © phpBB Group
Designed by ST Software