


Reply to topic  [ 36 posts ]  Go to page Previous  1, 2, 3  Next
 65ISR 

Joined: Sun Jul 23, 2017 1:06 am
Posts: 93
Garth wrote:
Some advertising tactics try to be self-fulfilling prophecy. "This is the next big standard that everyone will be going to; so don't be left behind."

I have another ISA design also: http://www.forth.org/Stundurd.txt

I coined the name "Stundurd Forth" as a spoof on ANS-Forth that is supposedly the "Standard Forth."

I consider ANS-Forth to be a marketing gimmick from Forth Inc. It got rubber-stamped by ANSI in 1994 and had never been tested (the first ANS-Forth compiler was SwiftForth, which came out in 1997, and it was too bug-ridden to be usable until version 2 came out sometime later). It seems obvious that nobody involved in ANS-Forth cared about the technical aspects of the language standard, or they would have written a reference compiler to test whether it worked. All they cared about was declaring themselves to be the Standard (with a capital 'S'), so they could declare everybody else to be a non-standard wannabe.


Fri Jul 28, 2017 2:45 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
There'll always be someone who says an idea won't fly. And sometimes they will be right. So you have to look at their credentials and experience and motivation, and see if you value their opinion - because rejecting all feedback is probably a poor tactic, just as accepting all feedback would be.

Maybe sometimes a product will be successful merely because it has some particular technology inside (you can sell it because it is based on Forth, or Z80, or MS-DOS), but I think most times a product will be successful because it solves a problem for someone. Whether a calculator is powered by 6502 or Z80 or ARM isn't what the purchaser cares about; they care about usability, features, accuracy, reliability and so on. In some markets, they care about compatibility.

My advice would be: if you're doing it for fun, make sure you're enjoying yourself. If you're doing it for money, make sure your product is useful to somebody. Solve their problem.

When it comes to choosing an MPU, I would think the buyer will care about:
- features, including performance and power efficiency
- price
- reliability
- supplier reputation including longevity and support
- compatibility
- ease of use including quality of toolchain and training


Fri Jul 28, 2017 6:14 am

Joined: Tue Jan 15, 2013 10:11 am
Posts: 114
Location: Norway/Japan
Hugh Aguilar wrote:
Have you used the Propeller? How well did that work out? What did you use it for?
I bought a Propeller QuickStart board some years ago. It's a credit-card sized board with 8 LEDs and 8 touch sensors and a microUSB port for communication and power. The threshold to get it to do something was incredibly low. I plugged it into my Linux notebook, installed 'BST' (there are other options), and loaded some example code to blink an LED. Then I tried some other code, and immediately started modifying it to count or produce a pattern. Very easy. It took me just a minute to actually start programming it. So for some time I kept the board in a tiny paper box (the one it came in) and a stub (10cm) USB cable in a pocket in my jacket, and if I had a notebook I would just whip it out in a cafeteria or something and start programming. Fun and easy.
I have some DIPs too - you can actually use the Propeller just as-is, on a small breadboard, with nearly nothing else. Add resistors and a plug and you can have VGA straight out from a pin - all in software.

But I have only played with it, not used it for anything serious like a control system. Well, except for the built-in Propeller in an Apple I replica I have - but there it's more like a peripheral.


Fri Jul 28, 2017 10:14 am

Joined: Sun Jul 23, 2017 1:06 am
Posts: 93
Hugh Aguilar wrote:
I have a design for a processor (file attached) in case anybody is interested.

I have an update now that can be downloaded.

I added the PSV and PCV instructions to boost the speed of the VIRQ ISR.
I added a JMP zadr instruction so the 65ISR-chico can now have a VIRQ ISR that works similarly.
The 65ISR-chico lacks the PSV and PCV instructions (because there is no W register), so this has to be done manually with A.
This still works, but is less efficient. The 65ISR-chico is mostly intended to be a coprocessor.

There are some other minor changes. For example, SHR and SHL are now called ROR and ROL which makes more sense.

I also rewrote a lot of the documentation. It was hard to read and confusing previously.

thanks for your interest --- Hugh


Attachments:
File comment: update --- primarily the PSV and PCV instructions
65ISR.txt [29.13 KiB]
Downloaded 554 times
Fri Sep 22, 2017 2:20 pm

Joined: Tue Dec 31, 2013 2:01 am
Posts: 116
Location: Sacramento, CA, United States
Congratulations on a neat little project! I don't have time to read through your code samples in depth right now, but I really like their 65xx "flavor". You have clearly put some thought and skill into this design, and it shows. Keep up the good work.

Mike B.


Fri Sep 22, 2017 6:10 pm

Joined: Sun Jul 23, 2017 1:06 am
Posts: 93
barrym95838 wrote:
Congratulations on a neat little project! I don't have time to read through your code samples in depth right now, but I really like their 65xx "flavor". You have clearly put some thought and skill into this design, and it shows. Keep up the good work.

Mike B.

Thanks for the encouragement. :)

I've definitely put some thought into it. As for skill, that is questionable. I don't know anything about Verilog. I don't know how many FPGA resources the processor will use, so I don't know how to decide between different sets of features.

For example, I currently have this:
Quote:
RTI set V-flag to 1 >> unmask the interrupts
WAI set V-flag to 0 >> unmask the interrupts >> go into a low-power wait mode
PSV zadr load W with PC+1 >> store W to memory >> do RTI unsupported in the 65ISR-chico
PCV zadr load W with PC+1 >> store W to memory >> do WAI unsupported in the 65ISR-chico

Let's say that I used the C-flag rather than the V-flag, and I provided these instructions instead:
Quote:
RTI unmask the interrupts >> go into a low-power wait mode if no interrupts pending
POL zadr load W with PC+1 >> store W to memory >> do RTI unsupported in the 65ISR-chico

If I did this, then the programmer would have to use SEC or CLC to set or clear the C-flag manually. So, instead of:
PSV zadr he would have: SEC POL zadr
And, instead of:
PCV zadr he would have: CLC POL zadr

There is a trade-off! I have to implement 2 instructions rather than 4 in Verilog, but the programmer has to use 2 instructions rather than 1 in his code.

Without knowing how difficult it is to implement instructions in Verilog, and how this affects resource usage in the FPGA, I don't know which way to go on the trade-off.
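To put rough numbers on that trade-off, here is a small Python sketch that rewrites code using PSV/PCV into the SEC/CLC + POL form and counts instructions and bytes. The byte sizes are assumptions for illustration only (1-byte opcodes, 1-byte zadr operand); they are not taken from the 65ISR document, and the sketch says nothing about cycle counts.

```python
# Hypothetical sketch: expand the fused PSV/PCV instructions into the
# SEC/CLC + POL sequences of the 2-instruction variant and compare sizes.
# Assumptions (not from the 65ISR spec): every opcode is 1 byte and a
# zadr operand adds 1 more byte.

EXPANSION = {
    "PSV": ["SEC", "POL"],  # PSV zadr  ->  SEC / POL zadr
    "PCV": ["CLC", "POL"],  # PCV zadr  ->  CLC / POL zadr
}

# Instructions assumed to carry a 1-byte zadr operand.
OPERAND_BYTES = {"POL": 1, "PSV": 1, "PCV": 1}


def expand(program):
    """Rewrite a list of mnemonics for the SEC/CLC + POL variant."""
    out = []
    for op in program:
        out.extend(EXPANSION.get(op, [op]))
    return out


def code_bytes(program):
    """Code size under the 1-byte-opcode, 1-byte-zadr assumption."""
    return sum(1 + OPERAND_BYTES.get(op, 0) for op in program)


isr_exits = ["PSV", "PCV", "PSV"]
variant_b = expand(isr_exits)
print(len(isr_exits), code_bytes(isr_exits))  # 3 6
print(len(variant_b), code_bytes(variant_b))  # 6 9
```

Under these assumptions each ISR exit grows by one instruction and one byte, which is the cost being weighed against two fewer opcodes to implement.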

Also, can SEC or CLC be done in one clock cycle? On the 65C02 these take two clock cycles. On modern processors (such as the STM8) these take one clock cycle. I think the STM8 is able to get by with one clock cycle because it has a prefetch queue, whereas the old 65C02 wasn't able to fetch the next opcode while the previous opcode was executing.
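The prefetch idea can be illustrated with a deliberately idealized cycle count in Python. The model (one cycle to fetch an opcode, a given number of execute cycles, and a fetch that can be hidden under the previous instruction's execution) is an assumption for illustration, not a description of the actual 65C02 or STM8 timing.

```python
# Idealized timing model (an assumption, not real 65C02 or STM8 timing):
# each instruction needs one cycle to fetch its opcode plus some execute
# cycles. Without overlap these add up; with a prefetch, the next opcode is
# fetched while the current instruction executes, so only the first fetch
# is exposed.

def cycles_sequential(exec_cycles):
    """Fetch and execute never overlap."""
    return sum(1 + e for e in exec_cycles)


def cycles_prefetch(exec_cycles):
    """Opcode fetch overlaps execution of the previous instruction."""
    if not exec_cycles:
        return 0
    return 1 + sum(exec_cycles)


# A run of flag operations such as SEC, CLC, SEC, each with one execute cycle.
flag_ops = [1, 1, 1]
print(cycles_sequential(flag_ops))  # 6
print(cycles_prefetch(flag_ops))    # 4
```

In steady state the overlapped machine spends one cycle per flag instruction, which is the effect being attributed to the STM8's prefetch queue.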

I'm getting in over my head! Until I learn some Verilog, I'm not really qualified to be designing a new processor.


Sat Sep 23, 2017 4:53 am

Joined: Tue Dec 11, 2012 8:03 am
Posts: 285
Location: California
Hugh Aguilar wrote:
Also, can SEC or CLC be done in one clock cycle? On the 65C02 these take two clock cycles. On modern processors (such as the STM8) these take one clock cycle. I think the STM8 is able to get by with one clock cycle because it has a prefetch queue, whereas the old 65C02 wasn't able to fetch the next opcode while the previous opcode was executing.

The 65CE02 did it in a single cycle. Commodore had some kind of patent on how to do that. They didn't sell very many 65CE02s, though.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources


Sat Sep 23, 2017 5:04 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Fortunately it's a 1988 patent, so it has served its purpose: their methods are open to see and to use by now. See
http://forum.6502.org/viewtopic.php?f=1&t=2551

There will of course be more than one way to do this. It might be worth a read of our OPC1 processor, which is a very simple CPU.
https://revaldinho.github.io/opc/
As you'd expect, as we progress up to OPC6 it becomes more complex. It does at some point gain single-cycle execution. That's mainly a question of a sufficiently subtle state machine to control overlapping actions, and also a simple enough instruction set to decode the one-cycle instructions very rapidly.


Sat Sep 23, 2017 7:30 am

Joined: Sun Jul 23, 2017 1:06 am
Posts: 93
BigEd wrote:
As you'd expect, as we progress up to OPC6 it becomes more complex. It does at some point gain single-cycle execution. That's mainly a question of a sufficiently subtle state machine to control overlapping actions, and also a simple enough instruction set to decode the one-cycle instructions very rapidly.

I'll read up on your OPC --- I had not heard of that.

What does it mean to have a "simple enough instruction set to decode the one-cycle instructions very rapidly"?
Does this mean that there are only a few opcodes? I have fewer than 100.
Or does this mean that there is a bit pattern in the opcodes, such as a field that indicates the addressing-mode?
I have 8 addressing-modes, so I could use a 3-bit field to indicate the addressing-mode. That leaves a 5-bit field to indicate the instruction, which allows only 32 instructions, and I have over 64.
I was trying to speed up the processor by combining two instructions together when they are commonly adjacent to make a third instruction that does both operations (hopefully in parallel), so I end up with a lot of instructions.


Sat Sep 23, 2017 1:54 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Yes, I mean it mustn't take much logic complexity to decide on the next state, according to the just-fetched instruction. In the case of OPC6 it looks like we need to detect single word instructions, and throw out four cases which are not single cycle: Loads, Stores, Pushes, Pops. It's not a trivial decision. It's noteworthy (IIRC) that the machine clocked a bit faster before we added the pushes and pops, but they were useful enough instructions to be worth having anyway.
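A hypothetical Python sketch of that next-state decision, with invented field names and instruction classes (this is not OPC6's real encoding or Verilog), just to show the shape of the predicate:

```python
# Hypothetical sketch of the "can this instruction finish in one cycle?"
# decision. Field names and classes are invented for illustration.

from dataclasses import dataclass

# Classes that need an extra memory cycle even when single-word.
MULTI_CYCLE_CLASSES = {"load", "store", "push", "pop"}


@dataclass(frozen=True)
class Instr:
    mnemonic: str
    op_class: str   # e.g. "alu", "load", "store", "push", "pop"
    words: int      # 1 for single-word, 2 if an operand word follows


def is_single_cycle(instr):
    """Single-word instructions run in one cycle unless they access memory."""
    return instr.words == 1 and instr.op_class not in MULTI_CYCLE_CLASSES


print(is_single_cycle(Instr("add", "alu", 1)),
      is_single_cycle(Instr("ld", "load", 1)),
      is_single_cycle(Instr("add", "alu", 2)))  # True False False
```

Every class excluded from the single-cycle path adds a term to this test, which is one plausible reason the decision logic got slower when pushes and pops were added.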


Sat Sep 23, 2017 2:08 pm

Joined: Sun Jul 23, 2017 1:06 am
Posts: 93
BigEd wrote:
Fortunately it's a 1988 patent, so it has served its purpose: their methods are open to see and to use by now. See
http://forum.6502.org/viewtopic.php?f=1&t=2551

There will of course be more than one way to do this. It might be worth a read of our OPC1 processor, which is a very simple CPU.
https://revaldinho.github.io/opc/
As you'd expect, as we progress up to OPC6 it becomes more complex. It does at some point gain single-cycle execution. That's mainly a question of a sufficiently subtle state machine to control overlapping actions, and also a simple enough instruction set to decode the one-cycle instructions very rapidly.

I read up on your OPC a little bit. You have a load-and-store design.

I have a lot of instructions that act on memory. For example:
Code:
NEG     A               negate A setting Z N flags, and setting C= ~Z
NEG     zadr            negate memory value setting Z N flags, and setting C= ~Z

I could get rid of the NEG version that uses the zadr addressing-mode --- just load A, negate it, then store A again.
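A behavioral Python sketch, assuming 8-bit values and the flag rules given in the two NEG lines above (Z and N from the result, C = not Z), comparing the memory form of NEG with a load/negate/store sequence. The dictionary standing in for zero page and the LDA/STA names in the comments are hypothetical, not mnemonics taken from the 65ISR document.

```python
# Behavioral sketch of NEG on 8-bit values: negate, set Z and N from the
# result, and set C = not Z. Zero page is modeled as a dict; LDA/STA in
# the comments are hypothetical mnemonics.

def neg8(value):
    """Return (result, flags) for an 8-bit two's-complement negate."""
    result = (-value) & 0xFF
    z = result == 0
    n = bool(result & 0x80)
    return result, {"Z": z, "N": n, "C": not z}


def neg_zadr(zero_page, zadr):
    """Memory form: a single NEG zadr instruction."""
    zero_page[zadr], flags = neg8(zero_page[zadr])
    return flags


def neg_load_store(zero_page, zadr):
    """Load/store form: LDA zadr / NEG A / STA zadr (A is overwritten)."""
    a = zero_page[zadr]   # LDA zadr
    a, flags = neg8(a)    # NEG A
    zero_page[zadr] = a   # STA zadr
    return flags


zp_a = {0x10: 5}
zp_b = {0x10: 5}
flags_a = neg_zadr(zp_a, 0x10)
flags_b = neg_load_store(zp_b, 0x10)
print(zp_a[0x10], flags_a == flags_b)  # 251 True
```

Both forms leave the same value and flags; the load/store version costs two extra instructions and overwrites A.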

Of course, for this to work with binary operations such as ADD, I need more than one register, maybe a B register as well as A.

What I remember about the MiniForth is that registers take up a lot of resources, which is why the MiniForth has so few registers. My assumption that this is still true is why my 65ISR has very few registers. Nowadays everybody just goes with 16 registers though, so maybe this is not true anymore (the MiniForth was on the Lattice isp1048 PLD, rather than an FPGA).


Sat Sep 23, 2017 2:19 pm
Profile

Joined: Sun Jul 23, 2017 1:06 am
Posts: 93
BigEd wrote:
Yes, I mean it mustn't take much logic complexity to decide on the next state, according to the just-fetched instruction. In the case of OPC6 it looks like we need to detect single word instructions, and throw out four cases which are not single cycle: Loads, Stores, Pushes, Pops. It's not a trivial decision. It's noteworthy (IIRC) that the machine clocked a bit faster before we added the pushes and pops, but they were useful enough instructions to be worth having anyway.

I don't have a stack at all on the 65ISR --- so there are no push or pop instructions.
I'm taking Walter Banks's advice and relying primarily on direct-access of zero-page variables.


Sat Sep 23, 2017 2:21 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
One thing which really makes a difference with Arlet's 6502 design - even with just four 8 bit registers - is putting them in a register file, because Xilinx FPGAs, at least, have really efficient implementations of register files. So we did the same sort of thing with OPC6, with a 16x16 reg file. It turns out that Lattice FPGAs don't do the same thing, so we don't get a big win there, and in fact on Lattice our reg file ends up being in block RAM. That's not a terribly efficient use of block RAM, but it works and it is fast and there's still a fair amount left for other purposes.

I think experience is needed, and then experimentation, to find out which machine features have how much cost - and then to consider that different FPGA families might have different cost functions.

In fact a lot of the later evolution of our machines was driven by convenience and compactness of the machine code: even if an instruction could be dropped and replaced by two or three others, it might be nicer to keep it, especially from the point of view of writing code in assembly.


Sat Sep 23, 2017 2:36 pm

Joined: Sun Jul 23, 2017 1:06 am
Posts: 93
BigEd wrote:
One thing which really makes a difference with Arlet's 6502 design - even with just four 8 bit registers - is putting them in a register file, because Xilinx FPGAs, at least, have really efficient implementations of register files. So we did the same sort of thing with OPC6, with a 16x16 reg file. It turns out that Lattice FPGAs don't do the same thing, so we don't get a big win there, and in fact on Lattice our reg file ends up being in block RAM. That's not a terribly efficient use of block RAM, but it works and it is fast and there's still a fair amount left for other purposes.

I think experience is needed, and then experimentation, to find out which machine features have how much cost - and then to consider that different FPGA families might have different cost functions.

The Lattice isp1048 PLD was not an FPGA --- this was in 1994 when PLDs were different from FPGAs --- now a PLD is actually an FPGA internally.

I will most likely start with a Xilinx FPGA for the 65ISR. I have heard a lot of good things about them. Ilya Tarasov works for Xilinx teaching people how to use their chips, and he has implemented several Forth processors on Xilinx FPGAs. Testra implemented their RACE processor (the upgraded MiniForth processor) on a Lattice FPGA.

BTW: Michael Morris has a 65c02 implementation --- he spoke well of Arlet's implementation though --- said it was quite efficient. :-)
You were involved in Arlet's implementation? Maybe I could look at that and use it as a guide for my 65ISR implementation. This was Verilog?

BigEd wrote:
In fact a lot of the later evolution of our machines was driven by convenience and compactness of the machine code: even if an instruction could be dropped and replaced by two or three others, it might be nicer to keep it, especially from the point of view of writing code in assembly.

In many cases, having a single instruction also saves you from using a register --- important if the register is already in use.

For example, I previously did an indirect jump like this:
Code:
    LDW vector
    RTS

The problem is that this uses W, and W may be in use for something else. Also, the chico version doesn't have a W register.

I added a JMP zadr instruction to fix these problems (and boost the speed because it is just one instruction). Now I can do this:
Code:
    JMP vector
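For reference, a Python sketch of what JMP zadr is assumed to do: load the PC from a 16-bit little-endian vector in zero page, without touching W. The wrap of zadr+1 back to $00 is an assumption made here; the post doesn't say how the 65ISR handles a vector at $FF.

```python
# Sketch of an indirect jump through a zero-page vector. Assumptions: the
# vector is little-endian (low byte at zadr, high byte at zadr+1), and
# zadr+1 wraps within zero page.

def jmp_zadr(memory, zadr):
    """Return the new PC loaded from the 16-bit vector at zero-page zadr."""
    lo = memory[zadr & 0xFF]
    hi = memory[(zadr + 1) & 0xFF]
    return lo | (hi << 8)


memory = bytearray(65536)
vector = 0x20               # hypothetical zero-page location of the vector
memory[vector] = 0x34       # low byte
memory[vector + 1] = 0x12   # high byte
pc = jmp_zadr(memory, vector)
print(hex(pc))  # 0x1234
```

No register is used as a staging area, which is the advantage over the LDW vector / RTS sequence.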

Anyway, I have an upgrade to my document. I got rid of the VIRQ interrupt and replaced it with the MIRQ interrupt, and added an M-flag to the processor flags. This is much better! I now have only RTI and POL --- interrupts can only occur after these two instructions --- I still think it is a bad idea to allow interrupts to occur anywhere, after any instruction.

I also got rid of the DIV instruction. That could be difficult to implement in HDL, and it is not very important. I provided the SLW and ADW instructions that boost the speed of a software division:
Code:
macro DIV D T B         ; W=numerator, D=denominator, T=-D, B=bit           this is done 16 times for each B in quotient
    SLW
    ADW T               ; W= W*2-D  (partial remainder)
    BPL L1              ; if W>=0 then leave the quotient bit set to one (all quotient bits set to one prior to starting)
    CLC
    STC B               ; set the quotient bit to zero
    ADW D               ; restore W
L1:
endm

; The DIV macro calculates one bit of the quotient. The quotient should be preset with all 1 bits.
; D is a 16-bit denominator; the 8-bit denominator shifted left by 8 bits. T is D negated (see the DNEGATE macro above).
; The bits should be calculated from most-significant to least-significant (right to left because we are little-endian).
; Execute DIV sixteen times. Whatever is left in W is the remainder (should be 8-bit because the denominator was 8-bit).

; MUL is an instruction because this has to be fast for PID control. DIV isn't used much and doesn't have to be fast.
; The most common use for DIV is dividing a 16-bit number by ten to convert it into decimal digits.
; Our DIV macro above is pretty fast though (compared to the 65c02, for example) --- it should be adequate for most uses.
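To check the algorithm independently of the 65ISR register layout, here is a Python reference model of classic restoring (shift-and-subtract) division for a 16-bit numerator and an 8-bit denominator. It mirrors the macro's outline (quotient preset to all ones, one trial subtraction per bit starting from the most significant end, clearing the bit and restoring when the result goes negative) but keeps the partial remainder in its own variable, so it is a swapped-in formulation rather than a line-by-line translation of the macro. The helper to_decimal_digits is hypothetical and shows the divide-by-ten use case.

```python
# Reference model of restoring division: 16-bit numerator, 8-bit
# denominator, quotient bits produced most-significant first.

def restoring_div16_8(numerator, denominator):
    """Return (quotient, remainder) for a 16-bit / 8-bit unsigned divide."""
    if not 0 <= numerator <= 0xFFFF:
        raise ValueError("numerator must be 16-bit")
    if not 1 <= denominator <= 0xFF:
        raise ValueError("denominator must be a non-zero 8-bit value")
    quotient = 0xFFFF   # preset every quotient bit to one
    remainder = 0
    for bit in range(15, -1, -1):   # 16 iterations, MSB first
        # Shift the next numerator bit into the partial remainder.
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        remainder -= denominator    # trial subtraction
        if remainder < 0:
            quotient &= ~(1 << bit) & 0xFFFF   # clear this quotient bit
            remainder += denominator           # restore
    return quotient, remainder


def to_decimal_digits(value):
    """Hypothetical helper: decimal digits of a 16-bit value via divide-by-ten."""
    digits = []
    while True:
        value, digit = restoring_div16_8(value, 10)
        digits.append(digit)
        if value == 0:
            return digits[::-1]


print(restoring_div16_8(12345, 10))  # (1234, 5)
print(to_decimal_digits(65535))      # [6, 5, 5, 3, 5]
```

Comparing a hardware or assembly implementation against a model like this over many inputs is a cheap way to catch off-by-one-bit errors in the shift and sign test.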

One reason I don't want a hardware division is that it will take several clock cycles to execute. This is bad because the main-program will be blocking IRQx interrupts for too long. With a software division, I can do a POL after every iteration (there are 16 iterations) so the main-program isn't holding control for too long.
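A Python sketch of the polling argument: the division is written as a generator that yields where the 65ISR code would execute POL, and a stand-in scheduler services pending handlers at each yield. Both the scheduler and the handler list are illustrative assumptions, not the 65ISR's actual MIRQ mechanism.

```python
# Sketch: a software division that offers a poll point after every
# quotient bit. The scheduler below is a stand-in for interrupt servicing.

def polled_div16_8(numerator, denominator):
    """Restoring division that yields after every quotient bit (a POL point)."""
    quotient, remainder = 0, 0
    for bit in range(15, -1, -1):
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        if remainder >= denominator:
            remainder -= denominator
            quotient |= 1 << bit
        yield   # POL: pending interrupts may be serviced here
    return quotient, remainder


def run_with_polling(task, pending):
    """Run a generator task, servicing pending handlers at each poll point."""
    polls = 0
    while True:
        try:
            next(task)
        except StopIteration as done:
            return done.value, polls
        polls += 1
        while pending:
            pending.pop(0)()   # service one pending handler


serviced = []
pending = [lambda: serviced.append("timer")]
result, polls = run_with_polling(polled_div16_8(12345, 10), pending)
print(result, polls, serviced)  # (1234, 5) 16 ['timer']
```

Under this model a pending interrupt waits at most one iteration rather than the whole 16-bit division.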

I'm definitely interested in feedback on my design --- I want it to be efficient --- I'm pretty happy with the MIRQ idea right now, though, so I think I'm off to a good start. 8-)


Attachments:
File comment: Now I have MIRQ rather than VIRQ.
Also, I got rid of DIV and replaced it with SLW and ADW to boost the speed of a software division.
There are some other changes.

65ISR.txt [31.16 KiB]
Downloaded 475 times
Sun Sep 24, 2017 1:47 am

Joined: Tue Dec 31, 2013 2:01 am
Posts: 116
Location: Sacramento, CA, United States
Hugh Aguilar wrote:
You were involved in Arlet's implementation?

AFAIK, Arlet whipped up that little gem all by himself. Ed did use it as a base for a 16-bit byte 65Org16 experiment a while back, IIRC.
Quote:
Maybe I could look at that and use it as a guide for my 65ISR implementation. This was Verilog?

Yep.
https://github.com/Arlet/verilog-6502
Quote:
... I think I'm off to a good start. 8-)

Go, Hugh, Go!

Mike B.


Sun Sep 24, 2017 6:16 am