View unanswered posts | View active topics It is currently Thu Mar 28, 2024 9:54 am



Reply to topic  [ 8 posts ] 
 Stack pointer register in ITC Forth. 
Author Message

Joined: Tue Dec 31, 2013 2:01 am
Posts: 116
Location: Sacramento, CA, United States
Hello, all. As a newcomer to forth, I have been working on an implementation to 'shake out' the instruction set in my hobby 65m32 cpu design.

Do any of you (especially Garth, Jeff, and Brad) have any insights about the hazards of using the system stack for the data stack? I know that the 6502 uses the X register for the data stack, but I'm assuming that it's due to the fact that indirect and offset stack addressing modes are notably absent on the original 6502, and therefore using X instead of S is much more efficient. Is this the only reason, or am I missing something?

The reason that I suspect that I should be asking is because pdp-11 figForth uses the system stack for the return stack as well, and its orthogonal addressing treats all registers essentially as 'equals', just like my 65m32 (at least when it's indexing). Is there a compelling reason for doing it that way?

One possibility is the 'questionable' technique of using dstack values that have already been 'dropped', since an interrupt would instantly corrupt those values. Garth does it inside at least one of his primitives in his 65802 ITC Forth, but he's completely safe in doing so, since he's using X for his dstack pointer, and he's waiting until the next 'NEXT' to service the interrupt (in Forth or machine language). Are you aware of any other common code that does this (using pre-dropped dstack items), in primitives and/or compiled words?

Another possibility is that I haven't yet reached far enough in my translation to see any snags. FigForth seems to be very frugal in its use of JSRs and software interrupts, which could cause some complications, should they become necessary in an implementation that uses S for the dstack.

Jeff suggested that ITC might not even be a desirable implementation for a processor that only needs a few machine instructions (on average) per primitive. I am almost certain that he's correct, but I want to finish my ITC version before chasing a new goal.

If there is no particular reason against using the system stack for my data stack, I would like to give it a try, because my push-then-load instructions would save me several machine words and several machine cycles in the implementation that I'm attempting to complete. (Push-then-loaD ... kinda invites a Store-then-pulL, but that opportunity already Came-then-wenT) ;)

With dTOS in A and rTOS in (X):
DUP becomes pha
DROP becomes pla
SWAP becomes exa 0,s
NIP becomes ins
R becomes pda 0,x
R> becomes pda 0,x+
OVER becomes pda 0,s
@ remains lda 0,a
...

Thanks,

Mike


Last edited by barrym95838 on Mon Mar 03, 2014 4:14 pm, edited 2 times in total.



Mon Mar 03, 2014 2:55 am
Profile

Joined: Tue Dec 11, 2012 8:03 am
Posts: 285
Location: California
Quote:
Do any of you (especially Garth, Jeff, and Brad) have any insights about the hazards of using the system stack for the data stack?

That would present problems. Any Forth word could take in parameters on the stack, then pass some of them to other words used in its definition, which might in turn pass one or more to yet another word. If it's all on the return stack (whether it got there by way of JSR or by nest), the nested words won't have any way to know that they have to step over some return addresses to find the parameters, and if so, how many, unless each routine that passes them along also exercises more overhead to adjust the stack. Having data and return stacks separate solves a lot of problems and makes stack management much more efficient. Some Forths have a separate, third stack for other things, mainly floating-point data, also to make stack management easier-- although that's mainly for if the cells on the third stack are at least the width of two cells on the normal data stack. There wouldn't be much advantage if they were the same width.

Quote:
One possibility is the 'questionable' technique of using dstack values that have already been 'dropped', since an interrupt would instantly corrupt those values. Garth does it inside at least one of his primitives in his 65802 ITC Forth, but he's completely safe in doing so, since he's using X for his dstack pointer, and he's waiting until the next 'NEXT' to service the interrupt (in Forth or machine language). Are you aware of any other common code that does this (using pre-dropped dstack items), in primitives and/or compiled words?

It works on the '802 because it does not have more than the 64K of address space, so indexing with FFFE,X just acts like a -2,X instead of going into the next bank. It improved efficiency a bit but I'll have to change it for running on an '816 with multiple banks. OTOH, if your operand there were wide enough to hit the entire addressable range, you can go back to using it like a negative number. I think you would need 32 bits for that on the 65m32, not the 24 I think you have in standard operands. Using a second 32-bit byte as an operand might still be more efficient than the alternative though.

Quote:
Jeff suggested that ITC might not even be a desirable implementation for a processor that only needs a few machine instructions (on average) per primitive. I am almost certain that he's correct, but I want to finish my ITC version before chasing a new goal.

Brad has pointed out in our private emails that if a JSR instruction with operand is a single word, then subroutine threading becomes very efficient. (Bruce Clark gives nine reasons why STC Forth hardly has the expected program-memory penalties even on a 6502/816, at http://forum.6502.org/viewtopic.php?p=3335, starting in the middle of his post in the middle of that long page.)

_________________
http://WilsonMinesCo.com/ lots of 6502 resources


Mon Mar 03, 2014 5:54 am
Profile WWW

Joined: Tue Dec 31, 2013 2:01 am
Posts: 116
Location: Sacramento, CA, United States
Thanks, Garth, but I think that I failed to make my point clearly.

I'm not trying to merge the data and return stacks, just swap the registers designated to point to them. So, in these cases, I would use X to point to the return stack and S to point to the data stack. I must admit that I haven't translated NEST yet, so I'll do some more digging before commenting any further. JSR would certainly complicate things a bit, though, at least for a subroutine that wanted to access d2OS or deeper.
Quote:
I think you would need 32 bits for that on the 65m32, not the 24 I think you have in standard operands. Using a second 32-bit byte as an operand might still be more efficient than the alternative though.

No, any offset in the range [-65535 .. +65535] fits in the same word as the opcode.

Quote:
Brad has pointed out in our private emails that if a JSR instruction with operand is a single word, then subroutine threading becomes very efficient.

My 65m32 can do that, but only if the vocabulary fits in its zero-page, which is in word addresses 0 +/- 65535. I have reserved negative addresses for the 'system space', so that basically means that you have 64KW to implement your complete vocabulary.

Quote:
(Bruce Clark gives nine reasons why STC Forth hardly has the expected program-memory penalties even on a 6502/816, at http://forum.6502.org/viewtopic.php?p=3335, starting in the middle of his post in the middle of that long page.)

I really like that guy! He never ceases to impress me, although he is a bit aloof at times.

Mike


Mon Mar 03, 2014 6:06 am
Profile

Joined: Tue Dec 11, 2012 8:03 am
Posts: 285
Location: California
barrym95838 wrote:
Quote:
I think you would need 32 bits for that on the 65m32, not the 24 I think you have in standard operands. Using a second 32-bit byte as an operand might still be more efficient than the alternative though.

No, any offset in the range [-65535 .. +65535] fits in the same word as the opcode.

Ah yes, I had forgotten. The premise remains though, as a -1 operand for indexing will end up being +FFFF instead of -1. (I can't remember though if you have something entirely different for indexing, which could still make this a moot point.) Edit: No, you said you can do a -1 operand, and so go negative.

Quote:
Quote:
Brad has pointed out in our private emails that if a JSR instruction with operand is a single word, then subroutine threading becomes very efficient.

My 65m32 can do that, but only if the vocabulary fits in its zero-page, which is in word addresses 0 +/- 65535. I have reserved negative addresses for the 'system space', so that basically means that you have 64KW to implement your complete vocabulary.

More will fit in 64K (or 128K) 32-bit bytes than 64K 8-bit ones; so if any big data arrays can go elsewhere, I know I myself would have trouble filling up that much memory with actual program material. Still, if at least the possibility is left to go outside that boundary (albeit with less effiency when you do), that would be good too.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources


Mon Mar 03, 2014 7:25 am
Profile WWW

Joined: Fri Jan 10, 2014 11:19 pm
Posts: 25
I have seen at least one ITC or DTC Forth which used the processor's return stack as the Forth data stack, and used a different address register for Forth's return stack. Alas, I don't recall the CPU in question.

The argument for this is simply one of efficiency. In most ITC/DTC code, data stack operations occur more frequently than Forth return stack operations. For example, nearly every machine-code primitive does something on the data stack, but (in ITC/DTC) is executed without using the return stack. So it's a question of which addressing mode is faster, and the only way to find that out is to code some primitives both ways and compare timing.

For reference, Phil Koopman did a study of how often Forth primitives get called in some benchmark Forth programs. The top ten were CALL a.k.a. ENTER (12.21%), EXIT (11.74%), VARIABLE (5.46%), @ (5.40%), 0BRANCH (4.78%), LIT (4.54%), + (4.18%), SWAP (3.90%), R> (3.89%), >R (3.87%), CONSTANT (3.68%), and DUP (3.05%). Note that CALL, EXIT, R>, and >R combined add up to about 32% of primitive executions. (Also bear in mind, however, that some primitives do multiple data stack operations.)

The argument against this is primarily the occurrence of interrupts. If you code your primitives correctly -- which means no "negative indexing" to refer to stack locations that have been popped -- interrupts can be safely handled. I am careful to always write my primitives this way, because I often use interrupts and multitasking in my applications. (I regard "negative indexing" like I regard self-modifying code -- on very rare occasions useful, but almost always avoidable, and should be considered an absolute last resort whenever suggested.) You also need to ensure that there's always enough free space on the Forth data stack to handle interrupts...but you'd have to make the same allowance on the Forth return stack in a "normal" implementation.

I confess that I succumb to another argument against using the return stack for data, which is a philosophical/pedagogical argument: a newcomer to Forth will expect to see the CPU's CALL/RET stack used for return addresses, and might stumble over the idea of using it to store data, and creating an independent return stack for Forth. So, all other things being equal or nearly equal, I'll always favor using the CPU's stack as the Forth return stack. (Oh, for the days of the 6809, with two designated stack pointers.)

Of course, if you are using Subroutine Threaded Code, the decision is forced: the CPU's stack is the return stack. (Except on devices such as the ARM, which don't have a subroutine return stack.)

_________________
1802, 6809, 8051, 8086, MSP430, Z80 -- there's a Forth for that: http://www.camelforth.com


Mon Mar 03, 2014 12:40 pm
Profile WWW

Joined: Tue Dec 31, 2013 2:01 am
Posts: 116
Location: Sacramento, CA, United States
Thanks for the valuable input, guys! If no one else (including Jeff) has a compelling argument to break tradition, I'll keep X as my data stack pointer. It's not much of a disadvantage anyway ... only an occasional one- instruction penalty for not being able to use push-then-load:

With dTOS in A and rTOS in (S):
DUP becomes sta 0,-x
DROP becomes lda 0,x+
SWAP becomes exa 0,x
NIP becomes inx
>R becomes pda 0,x+ (saving one instruction)
R becomes sta 0,-x : lda 0,s
R> becomes sta 0,-x : pla
OVER becomes sta 0,-x : lda 1,x
@ remains lda 0,a
...

Thanks,

Mike

P.S. Thanks for the word frequency study results, Dr. Brad! I'll do my best to concentrate on those.
EXIT and ;S are synonymous, right? If that's the case, it's ply on my implementation.
ENTER and DOCOL are synonymous too, right? If so, I'm leaning toward pdy #0,u but I haven't confirmed that yet.
My ITC NEXT is ldu 0,y+ : jmp (0,u+)


Mon Mar 03, 2014 3:16 pm
Profile

Joined: Tue Dec 31, 2013 2:01 am
Posts: 116
Location: Sacramento, CA, United States
Garth wrote:
... Brad has pointed out in our private emails that if a JSR instruction with operand is a single word, then subroutine threading becomes very efficient.

I'm reading his Moving Forth series right now ... good stuff!

Quote:
More will fit in 64K (or 128K) 32-bit bytes than 64K 8-bit ones; so if any big data arrays can go elsewhere, I know I myself would have trouble filling up that much memory with actual program material ...

It's still a bit too early to be sure, but I'm starting to think that I can fit a complete ~200-word figForth vocabulary in less than 2KW, headers and all! I'll post the (hand-assembled) .lst file here when it's done, but it will likely be buggy, unless I can get my simulator completed, compiled and debugged first.

Mike


Fri Mar 14, 2014 7:30 am
Profile

Joined: Tue Dec 11, 2012 8:03 am
Posts: 285
Location: California
Quote:
but I'm starting to think that I can fit a complete ~200-word figForth vocabulary in less than 2KW,

My '816 Forth has about a thousand words (although that includes the assembler). :D

_________________
http://WilsonMinesCo.com/ lots of 6502 resources


Fri Mar 14, 2014 9:29 am
Profile WWW
Display posts from previous:  Sort by  
Reply to topic   [ 8 posts ] 

Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron
Powered by phpBB® Forum Software © phpBB Group
Designed by ST Software