 74xx based CPU (yet another) 

Joined: Fri Mar 22, 2019 8:03 am
Posts: 184
Location: Girona-Catalonia
BigEd wrote:
Bret Victor gave a good talk (and an amusing one) a few years ago:
https://retrocomputingforum.com/t/bret- ... -video/681

Wow, that video and presentation is a gem! I must confess that, not being able to catch the subtleties of the (English) language, and not knowing about the work of this guy, I was a bit confused at the beginning, as this speech was apparently given only a few years ago, as you actually pointed out! I didn't figure it out until relatively late into the video, and it got me thinking weird things for a while. Thanks for that! The "enormous amount of resistance" to each step from binary to assembly language to Fortran tells a lot about human nature, very true. I also found the author's page about this presentation, http://worrydream.com/dbx/, with several links to documents (including videos) from the original times.


Thu Sep 12, 2019 7:06 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1277
So glad you liked it - and thanks for the link. I will add that to the post over on retrocomputing. It's one of my favourite talks, for the information, the mode of delivery, and the overall message.

Back to prefixes and so on: I think somewhere we've discussed whether or when the short constants in opcodes should be sign-extended. There are various pros and cons, and it might be that different tactics apply to different opcodes. (Another way to extend the usefulness of a short constant is to left-shift it, perhaps for example to point only to even addresses. This could get messy!)


Thu Sep 12, 2019 8:05 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 184
Location: Girona-Catalonia
Hi Ed, about sign or zero extension of short constants embedded in opcodes: I have given it some thought and also looked at what the compiler most commonly uses. For some instructions, such as PC relative branches, it is obvious that the constant must always be sign extended. For the rest, this is what I decided for the instruction set.

- CALL (11 bits ): Zero extends, as the constant represents absolute unsigned addresses.

- JMP, BRCC (9 bits): Sign extends, as it represents a positive or negative PC relative offset.

- ADD, SUB (8 bits): Both zero extend, as they are complementary instructions that the compiler can choose between depending on whether a constant must be added or subtracted.

- AND (8 bits): Zero extends, as I think using sign extension for this one would look weird.

- MOV (8 bits): Sign extends, so both small positive and negative values can be expressed

- CMP (8 bits): Sign extends, as it is desirable that comparisons with small negative numbers can be carried out

- "Indirect addressing modes" and "load effective address" instructions such as LD.W [SP, 8], R0, or LD.W [R0, 10], R1 or LEA SP, 44, R2 : All of them zero extend the constant, so only positive offsets are available on them (8 bits for the SP, and 5 bits for general purpose registers). I decided against negative offsets because I found that they are virtually never used, especially since I incorporated the "Base Frame Register" feature in the compiler. In the case of the SP or the Frame Pointer/Base Pointer register, all offsets are positive because the SP is always at the top of the stack (the lowest frame address) and the FP is at the lowest address of the interesting access range.

- Access to global variables, for example with LD.W [myVarName], R0, or struct members with LD.W [R0, 10], R1, or even instances of this LD.W [R0, myVar], R1 : All require only positive offsets because the compiler always references base addresses of objects.

- All prefixed instructions, that is, all of the above instructions preceded by a prefix instruction, just combine the 11 bits of the prefix with the lower 5 bits of the embedded constant to get a 16 bit value, so whatever the constant value is, its sign will be implicit through the 16 bit, 2's complement arithmetic that will be performed with it, thus enabling negative offsets if necessary for the indirect addressing modes.
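The extension rules listed above can be sketched in C as follows. This is only an illustration: the function names are made up for the example, and the bit layout of the prefixed form (prefix bits in positions 15..5, embedded constant in positions 4..0) is an assumption taken from the description of the prefix, not from the actual decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low `bits` bits of an instruction field to 16 bits
   (the rule described for CALL, ADD, SUB, AND and the offset forms). */
uint16_t zext_field(uint16_t field, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    if (bits == 16)
        return field;
    return (uint16_t)(field & ((1u << bits) - 1u));
}

/* Sign-extend the low `bits` bits of an instruction field to 16 bits
   (the rule described for JMP, BRCC, MOV and CMP). */
uint16_t sext_field(uint16_t field, unsigned bits)
{
    uint16_t v = zext_field(field, bits);
    uint16_t sign = (uint16_t)(1u << (bits - 1));
    /* XOR then subtract changes the weight of the top field bit
       from +2^(bits-1) to -2^(bits-1). */
    return (uint16_t)((v ^ sign) - sign);
}

/* Prefixed form (assumed layout): the 11 prefix bits become bits 15..5
   and the low 5 bits of the embedded constant become bits 4..0. */
uint16_t prefixed_imm(uint16_t prefix11, uint16_t embedded)
{
    return (uint16_t)(((prefix11 & 0x7FFu) << 5) | (embedded & 0x1Fu));
}
```

For instance, a 9 bit branch field of 0x1F8 sign extends to 0xFFF8 (that is, -8), while a prefix value of 19 combined with an embedded constant of 5 gives 613 under the assumed layout.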

About left shifting the constant, this is also something that was considered and discussed here. In my case I think the only use I have for that is PC relative/absolute addresses, and SP arithmetic.

- Program memory addresses are already expressed in words. All instructions are 1 word long, so the PC and all program memory addresses are already implicitly shifted. The total program memory space is 128K bytes, although I only have 64K addresses.

- Stack Pointer arithmetic could benefit from left-shifted constants because the SP is always word aligned. But given the constraints of the architecture and register set, this is not possible (or practical), because the exact same instruction codes are used to perform General Purpose Register arithmetic and SP arithmetic.

- The specific SP indirect addressing instructions can't benefit from that either because, although the SP always points to an even address, an instruction may need to load a single byte from an odd address. For example, the "load sign extended byte" instruction LD.SB [SP, 3], R0 uses an odd offset. This is admittedly very rare, because it only happens with structs passed by value that contain 'char' fields at odd positions. But it forbids the use of left-shifted constants, unless the compiler were to generate explicit code to access such odd positioned bytes, which I ultimately decided to avoid.


Thu Sep 12, 2019 9:17 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1277
Thanks for the comprehensive response! I confess I mentioned sign-extension without a firm mental picture of what had and hadn't been discussed or decided...


Fri Sep 13, 2019 6:01 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 184
Location: Girona-Catalonia
BigEd wrote:
Thanks for the comprehensive response! I confess I mentioned sign-extension without a firm mental picture of what had and hadn't been discussed or decided...

I believe this was in the context of a comment by Rob (I think), who made an interesting suggestion about "displaced" immediate fields (not sure what he called them), meaning immediate constant fields with a range that is not symmetrical between the negative and positive sides. I had that implemented for a while, but I certainly did not post my latest decision on the subject in any detail. The summary is that I finally decided to make most of them positive only.


Sat Sep 14, 2019 1:10 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 184
Location: Girona-Catalonia
I did some more work and started testing the operations that the processor does not support directly and that therefore require library calls, such as non-constant shifts, multiplication and division, with the compiler + assembler + simulator. So far, I have tested all 16 bit functions, and all 32 bit functions except multiplication and division, which I'm leaving for later.

This is the C code under test, which in this case is configured to test multiplication:

Code:
int indx;

int add(int a, int b) {return a+b;}
int sub(int a, int b) {return a-b;}
int and(int a, int b) {return a&b;}
int or(int a, int b) {return a|b;}
int xor(int a, int b) {return a^b;}
int lsr(unsigned int a, int b) {return a>>b;}
int lsl(int a, int b) {return a<<b;}
int asr(int a, int b) {return a>>b;}
int mul(int a, int b) {return a*b;}
int div(int a, int b) {return a/b;}
int mod(int a, int b) {return a%b;}

int (*funcList[])() = {add, sub, and, or, xor, lsl, asr, lsr, mul, div, mod, };

__attribute__((noinline))
unsigned int funcListTest( unsigned int a, unsigned int b, int i)
{
  return funcList[i]( a, b );
}

int main()
{
  return funcListTest(10, 15, 8);
}


I will not perform exhaustive testing or implement unit tests, because I think this would be overkill for this project. I just test manually a number of cases that I think are representative, including the edge cases that could cause trouble. If these pass, I will just assume that the results will be fine for the untested range of values too. I may well find later that something I considered tested does not work in a particular scenario and that I need to go back to the testing code, but I guess I will have to live with that.

The assembler generates machine code taking into account the Harvard architecture of the processor. This means that it's not possible (or at least not efficient) to store data next to program code, as is usually the case for von Neumann processors. Therefore, the assembler must create specific initialisation code that moves all required initial values from program memory to data memory, including constant data and compiler-initialised user variables. The log file output of the assembler for the C code above looks like this:

Code:
Constant Data:
00000 : 0 bytes

Initialised Variables:
00000 : 0x0e,0x00  add  Program:14
00002 : 0x10,0x00  sub  Program:16
00004 : 0x12,0x00  and  Program:18
00006 : 0x14,0x00  or  Program:20
00008 : 0x16,0x00  xor  Program:22
00010 : 0x1a,0x00  lsl  Program:26
00012 : 0x1c,0x00  asr  Program:28
00014 : 0x18,0x00  lsr  Program:24
00016 : 0x1e,0x00  mul  Program:30
00018 : 0x20,0x00  div  Program:32
00020 : 0x22,0x00  mod  Program:34

Unitialised Variables:
00022 : 2 bytes

-----
file:/Users/joan/Documents-Local/Relay/RelayNou/main.c74

Source: setup
00000 : 1111100000010011  _pfix  :00613
00001 : 0101000000101001  mov setupAddr, r1  :00613
00002 : 0101000000000010  mov dataAddr, r2  :00000
00003 : 0101000001011000  mov wordLength, r0  :00011
00004 : 0101100000000000  cmp r0, 0
00005 : 0100000000000110  brcc 0, .LL1  Program:+6
00006 : 0000111000001011  ld.w {r1}, r3
00007 : 1110000000010011  st.w r3, [r2, 0]
00008 : 0110000000001001  add r1, 1, r1
00009 : 0110000000010010  add r2, 2, r2
00010 : 0110100000001000  sub r0, 1, r0
00011 : 0011111111111000  jmp .LL0  Program:-8
00012 : 1111000000101001  call main  Program:00041
00013 : 0000100011000000  halt

Source: main.c
00014 : 0010000000001000  add r1, r0, r0
00015 : 0000000011000000  ret
00016 : 0010010001000000  sub r0, r1, r0
00017 : 0000000011000000  ret
00018 : 0010101000001000  and r1, r0, r0
00019 : 0000000011000000  ret
00020 : 0010100000001000  or r1, r0, r0
00021 : 0000000011000000  ret
00022 : 0010110000001000  xor r1, r0, r0
00023 : 0000000011000000  ret
00024 : 1111000001011101  call __lshrhi3  Program:00093
00025 : 0000000011000000  ret
00026 : 1111000001101001  call __ashlhi3  Program:00105
00027 : 0000000011000000  ret
00028 : 1111000001010001  call __ashrhi3  Program:00081
00029 : 0000000011000000  ret
00030 : 1111000011010011  call __mulhi3  Program:00211
00031 : 0000000011000000  ret
00032 : 1111000100010011  call __divhi3  Program:00275
00033 : 0000000011000000  ret
00034 : 1111000100100101  call __modhi3  Program:00293
00035 : 0000000011000000  ret
00036 : 1010000000010010  ld.w [SP, 2], r2
00037 : 0000001001010010  lsl r2, r2
00038 : 1100100000010010  ld.w [r2, funcList], r2  Data:00000
00039 : 0000001010000010  call r2
00040 : 0000000011000000  ret
00041 : 0110100000010111  sub SP, 2, SP
00042 : 0101000001000000  mov 8, r0
00043 : 1011000000000000  st.w r0, [SP, 0]
00044 : 0101000001010000  mov 10, r0
00045 : 0101000001111001  mov 15, r1
00046 : 1111000000100100  call funcListTest  Program:00036
00047 : 0110000000010111  add SP, 2, SP
00048 : 0000000011000000  ret

Source: setupData
00613 : 0000000000001110  _imm 14
00614 : 0000000000010000  _imm 16
00615 : 0000000000010010  _imm 18
00616 : 0000000000010100  _imm 20
00617 : 0000000000010110  _imm 22
00618 : 0000000000011010  _imm 26
00619 : 0000000000011100  _imm 28
00620 : 0000000000011000  _imm 24
00621 : 0000000000011110  _imm 30
00622 : 0000000000100000  _imm 32
00623 : 0000000000100010  _imm 34

Assembly completed


For brevity, I removed from the output above all the system library functions, which in this example go from address 00049 to 00612.

The starting sections: "Constant Data", "Initialised Variables" and "Uninitialised Variables" represent addresses in Data memory space. They are output to the Log file just for information purposes, but the assembler does not generate anything in Data memory. All the remaining sections are in Program memory, and they are the actual binary output of the assembler. The very last section, named "Source: setupData" is copied verbatim to Data memory by the processor initialisation code. The initialisation code goes to the "Source: setup" section as the processor starts execution at address 00000 (at least for now). Just at the end of the initialisation code, the "main" user function is called, and user program execution begins.
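What the "Source: setup" loop does can be modelled in C roughly as below. This is a sketch under assumptions: the function name and parameters are hypothetical, program memory is treated as an array of 16 bit words, data memory as an array of bytes, and words are stored low byte first because that is how the "Initialised Variables" listing shows them (0x0e,0x00 for the value 14).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `word_count` 16-bit words from word-addressed program memory
   into byte-addressed data memory, low byte first. */
void copy_setup_data(const uint16_t *program, size_t setup_addr,
                     uint8_t *data, size_t data_addr, size_t word_count)
{
    assert(program != NULL && data != NULL);
    while (word_count != 0) {                        /* cmp r0, 0 + branch out */
        uint16_t w = program[setup_addr];            /* ld.w {r1}, r3          */
        data[data_addr]     = (uint8_t)(w & 0xFFu);  /* st.w r3, [r2, 0]       */
        data[data_addr + 1] = (uint8_t)(w >> 8);
        setup_addr += 1;                             /* add r1, 1, r1          */
        data_addr  += 2;                             /* add r2, 2, r2          */
        word_count -= 1;                             /* sub r0, 1, r0          */
    }
}
```

Note how the source pointer advances by 1 (a word address in program memory) while the destination pointer advances by 2 (a byte address in data memory), matching the two different add constants in the listing.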

For the testing code above, I implemented a call table in C with the interesting functions to test. From the point of view of the C program, it is just an array of function pointers, which is initialised by the compiler with the test function addresses, so this array goes to the "Initialised Variables" section as shown above. Other than just testing the functions themselves, the code above allowed me to verify that at least simple cases of memory/stack access, subroutine calls, and other basic instructions are encoded correctly and executed correctly by the simulator.

So far, 16 bit non-constant shifts, multiplication, division and remainder, are already working fine in the simulator. Also 32 bit shifts, adds, subtraction, logical ops, have been tested.

(Edit: typo)


Sat Sep 14, 2019 10:15 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 184
Location: Girona-Catalonia
While playing with the arithmetic routines in the simulator I figured out a couple of small changes on the instruction set that would be beneficial, and even potentially simplify the actual processor ALU implementation.

For a long time I have kept a double set of Status Registers. The first set was intended for boolean comparisons and logical operations, and the second set was meant for arithmetic ops. The following link to a previous post shows why this was beneficial:

http://anycpu.org/forum/viewtopic.php?f=23&t=583&start=90#p4501

In particular, in the very last example there, the compiler performs a signed addition of a 32 bit value with a 16 bit value. To add the upper word, the 16 bit value needs to be sign extended. Since there was (at the time) no specific instruction for that, a trick with compare (CMP) and set (SETLT) instructions was used. This got inserted between the ADD and the ADDC instructions, thus saving one register. This was correct and possible without any status flag interference because the arithmetic flags were not affected by the in-between comparison.

However, things evolved and at some point I incorporated a sign extend word instruction (SEXTW). The same C code now results in the following assembly code which is 3 instructions shorter than previously:

CPU74
Code:
arith32:
   add   r2, r0, r0
   sextw   r2, r2
   addc   r2, r1, r1
   ret

The SEXTW instruction already leaves the status flags unaffected. As a consequence, the 'dual' set of status flags has lost some utility. I have still seen the compiler take advantage of the dual set, for example by inserting a logical operation between an ADD and an ADDC in order to save a register, but in reality this happens very rarely, and it is possibly not worth the complication of dual flag registers. Many instructions that are candidates to be inserted between add and addc are register moves and memory accesses, but these do not affect status flags anyway, so there's no need for dual flags for them either.
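The add / sextw / addc sequence of arith32 can be modelled in C as below. This is a sketch: the struct and function names are invented for the example, and it assumes that sextw produces the upper word of the sign extension (0x0000 or 0xFFFF) without touching the carry left by the preceding add.

```c
#include <stdint.h>

/* lo models r0 and hi models r1 of the 32-bit operand. */
typedef struct { uint16_t lo, hi; } word_pair;

/* Add the signed 16-bit value `b` (r2) to the 32-bit value in `a`. */
word_pair add32_s16(word_pair a, uint16_t b)
{
    uint32_t low   = (uint32_t)a.lo + b;                /* add  r2, r0, r0       */
    unsigned carry = (unsigned)(low >> 16);             /* C flag from low word  */
    uint16_t ext   = (b & 0x8000u) ? 0xFFFFu : 0x0000u; /* sextw r2, r2          */
    uint32_t high  = (uint32_t)a.hi + ext + carry;      /* addc r2, r1, r1       */
    word_pair r = { (uint16_t)low, (uint16_t)high };
    return r;
}
```

The model only works because nothing between the add and the addc disturbs the carry, which is the property the paragraph above relies on.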

So, I decided to simplify things at the cpu level and removed completely the dual status register. There's now a single set of SR flags. So far: I, V, S, C, Z

The next thing I did was to incorporate shift-through-carry instructions (some processors call them rotate through carry), as well as carry flag setting for the normal shifts. This actually doesn't imply any new physical instruction encoding because, once carry generation is incorporated into the shift instructions, the ones performing left shifts (and left rotation through carry) can be replaced by the existing ADD and ADDC. So now I only have explicit right-shift instructions, namely:
Code:
lsr Rs, Rd   1 bit logical shift right. Bit 0 is shifted to the C Flag. Bit 15 is set to zero. Result is stored in Rd
lsrc Rs, Rd   1 bit shift Right through carry. Bit 0 is shifted to the C Flag. The old C flag is shifted to bit 15. Result is stored in Rd
asr Rs, Rd   1 bit arithmetic shift right. Bit 0 is shifted to the C Flag. bit 15 is preserved. Result is stored in Rd
which should also simplify the ALU design somewhat, thanks to the lack of explicit left shifts.

I chose the name "LSRC", instead of RCR or ROR or something to that effect, because it virtually always occurs after an ASR or LSR, and the name looks more elegant to me as it makes it more obvious that it performs the second half of a shift right. So, the code that is now generated for 1 bit shifts of 32 bit long values is this:

CPU74
Code:
# ---------------------------------------------
# asr32_1
# ---------------------------------------------

asr32_1:
   asr   r1, r1
   lsrc   r0, r0
   ret

# ---------------------------------------------
# lsr32_1
# ---------------------------------------------

lsr32_1:
   lsr   r1, r1
   lsrc   r0, r0
   ret

# ---------------------------------------------
# lsl32_1
# ---------------------------------------------

lsl32_1:
   add   r0, r0, r0
   addc   r1, r1, r1
   ret


Note the use of ADD and ADDC for the left shift.
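The three 1 bit long shifts can be checked against a small C model like the one below. It is a sketch with invented helper names, and it assumes that r0 holds the low word and r1 the high word, as the right-shift routines imply.

```c
#include <stdint.h>

typedef struct { uint16_t value; unsigned carry; } shift_result;

/* lsr: bit 0 goes to C, bit 15 becomes 0. */
shift_result lsr16(uint16_t x)
{
    shift_result r = { (uint16_t)(x >> 1), x & 1u };
    return r;
}

/* asr: bit 0 goes to C, bit 15 is preserved. */
shift_result asr16(uint16_t x)
{
    shift_result r = { (uint16_t)((x >> 1) | (x & 0x8000u)), x & 1u };
    return r;
}

/* lsrc: bit 0 goes to C, the old C enters at bit 15. */
shift_result lsrc16(uint16_t x, unsigned carry_in)
{
    shift_result r = { (uint16_t)((x >> 1) | (carry_in ? 0x8000u : 0u)), x & 1u };
    return r;
}

uint32_t lsr32_1(uint32_t v)
{
    shift_result hi = lsr16((uint16_t)(v >> 16));    /* lsr  r1, r1 */
    shift_result lo = lsrc16((uint16_t)v, hi.carry); /* lsrc r0, r0 */
    return ((uint32_t)hi.value << 16) | lo.value;
}

uint32_t asr32_1(uint32_t v)
{
    shift_result hi = asr16((uint16_t)(v >> 16));    /* asr  r1, r1 */
    shift_result lo = lsrc16((uint16_t)v, hi.carry); /* lsrc r0, r0 */
    return ((uint32_t)hi.value << 16) | lo.value;
}

uint32_t lsl32_1(uint32_t v)
{
    uint16_t lo = (uint16_t)v, hi = (uint16_t)(v >> 16);
    uint32_t low  = (uint32_t)lo + lo;               /* add:  low word first    */
    uint32_t high = (uint32_t)hi + hi + (low >> 16); /* addc: carry into high   */
    return ((uint32_t)(uint16_t)high << 16) | (uint16_t)low;
}
```

Right shifts start at the high word and pass the carry down, while the left shift starts at the low word and passes the carry up.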

Non constant shifts are still generated through library calls.

For constant shifts above 1, the compiler still generates the combination of optimised Left/Right Shifts combined with OR. For example, the following cases of 16, 8 and 4 bit shifts.

Code:
# ---------------------------------------------
# lsl32_16
# ---------------------------------------------

lsl32_16:       # 16 bit constant shift left of long word
   mov   r0, r1
   mov   0, r0
   ret

# ---------------------------------------------
# lsl32_8
# ---------------------------------------------

lsl32_8:      # 8 bit constant shift left of long word
   bswap   r0, r2
   zext   r2, r2
   zext   r1, r1
   bswap   r1, r1
   or   r1, r2, r1
   zext   r0, r0
   bswap   r0, r0
   ret

# ---------------------------------------------
# lsl32_4      (4 bit constant shift left of long word)
# ---------------------------------------------

lsl32_4:
   add   r1, r1, r1
   add   r1, r1, r1
   add   r1, r1, r1
   add   r1, r1, r1
   bswap   r0, r2
   zext   r2, r2
   lsr   r2, r2
   lsr   r2, r2
   lsr   r2, r2
   lsr   r2, r2
   or   r1, r2, r1
   add   r0, r0, r0
   add   r0, r0, r0
   add   r0, r0, r0
   add   r0, r0, r0
   ret


It may seem a bit convoluted, especially the 4 bit shift case, but it's still better than the compiler output I looked at for other processors without multi-bit shift hardware support. This could be improved further in some cases by using the 1 bit long word shifts described above. For example, the 4 bit shift would then take only 8 instructions in total. I'm still undecided about whether I should spend some more time on this. I guess I will leave it for now and resume the simulator tests.
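To see why the 8 bit case works, the lsl32_8 sequence can be replayed step by step in C. This is a sketch: the helper names are made up, and it assumes that bswap swaps the two bytes of a word and that zext clears the upper byte, which is what the generated code suggests.

```c
#include <stdint.h>

/* Assumed semantics of bswap: swap the two bytes of a 16-bit word. */
uint16_t bswap16(uint16_t x)
{
    return (uint16_t)(((unsigned)x << 8) | (x >> 8));
}

/* Assumed semantics of zext: keep the low byte, clear the high byte. */
uint16_t zext8(uint16_t x)
{
    return (uint16_t)(x & 0x00FFu);
}

/* Replay of the lsl32_8 listing; r0 is the low word, r1 the high word. */
uint32_t lsl32_8_model(uint32_t v)
{
    uint16_t r0 = (uint16_t)v;
    uint16_t r1 = (uint16_t)(v >> 16);
    uint16_t r2;

    r2 = bswap16(r0);            /* bswap r0, r2                          */
    r2 = zext8(r2);              /* zext  r2, r2  -> top byte of low word */
    r1 = zext8(r1);              /* zext  r1, r1                          */
    r1 = bswap16(r1);            /* bswap r1, r1  -> high word << 8       */
    r1 = (uint16_t)(r1 | r2);    /* or    r1, r2, r1                      */
    r0 = zext8(r0);              /* zext  r0, r0                          */
    r0 = bswap16(r0);            /* bswap r0, r0  -> low word << 8        */
    return ((uint32_t)r1 << 16) | r0;
}
```

The byte that crosses from the low word into the high word is isolated in r2 before either word is modified, which is why a third register is needed.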


Tue Sep 17, 2019 8:13 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1277
Another interesting evolution - thanks for the update!


Tue Sep 17, 2019 8:18 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 184
Location: Girona-Catalonia
As I already had add and subtract with carry instructions, and having recently added shifts with carry, I decided to complete the lot and added "cmpc" (compare with carry) as well. So I now have the whole set, which makes it easy to extend 16 bit integer operations to 32 or even 64 bits with relatively little effort, following the same approach as the AVR processors.

There are 4 instructions that take the carry flag (I call them the "carry-In instructions") :
Code:
addc Rs, Rn, Rd
subc Rs, Rn, Rd
cmpc Rs, Rn
lsrc Rs, Rd

I also decided to implement the In-Zero flag approach that was recently mentioned in the OPC thread, not only for the 'cmpc' instruction but for all of them. The idea is that the Status Register flags after carry-in instructions correctly reflect the result of the combined (wide) operation by taking the incoming Zero flag into account. This makes it possible to use conditional branches or set/select instructions after wide operations, as if they were normal 16 bit operations.
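A minimal C model of the In-Zero idea for a 32 bit compare built from cmp and cmpc is shown below. The names are invented, only the C and Z flags are modelled, and the carry is treated as a borrow in the AVR style; the polarity used by the actual CPU74 hardware is an assumption here.

```c
#include <stdint.h>

typedef struct { unsigned c, z; } flags16;

/* cmp: C models the borrow out of a - b, Z is set when a == b. */
flags16 cmp16(uint16_t a, uint16_t b)
{
    flags16 f;
    f.c = (a < b);
    f.z = (a == b);
    return f;
}

/* cmpc: compare with borrow in. Z can only stay set if it was already
   set by the lower word, so it describes the whole wide result. */
flags16 cmpc16(uint16_t a, uint16_t b, flags16 in)
{
    flags16 f;
    uint32_t rhs  = (uint32_t)b + in.c;
    uint16_t diff = (uint16_t)(a - rhs);
    f.c = ((uint32_t)a < rhs);
    f.z = (in.z && diff == 0);
    return f;
}

/* 32-bit compare: cmp on the low words, then cmpc on the high words. */
flags16 cmp32(uint32_t a, uint32_t b)
{
    flags16 low = cmp16((uint16_t)a, (uint16_t)b);
    return cmpc16((uint16_t)(a >> 16), (uint16_t)(b >> 16), low);
}
```

Comparing 0x00020000 with 0x0001FFFF shows why the incoming Z matters: the high word difference (with borrow) is zero, yet the values differ, and the chained Z correctly stays clear.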

Other than that, I made the following changes on the ISA:

- Went back to 8 general purpose registers R0 through R7, with a separate SP and PC, instead of 7 registers + SP. That is, the SP is no longer part of the general register set.

The SP requires a number of instructions of its own, as it can no longer be used as a general purpose register in regular instructions. This has not been a major issue because, after the incorporation of prefixed immediates, I had some free slots left that I have now used.

Having separate instructions for the SP not only adds 1 precious general purpose register, but also makes the ISA encodings denser in instructions that can actually be used. I mean that when the SP was a general purpose register, it could theoretically be used as an operand or result in instructions that make little sense for an SP, such as logical operations, shifts, and others, thus wasting encoding combinations that have now been reclaimed for the new general purpose register. The SP has been left with only the instructions that are meaningful for it.

The set of SP instructions includes just basic SP stack adjustment, and stack loads/stores with only one addressing mode. Nothing fancier is required for SP operations. They are the following:

Code:
add SP, K, SP  // Adjust SP up or down, K is a sign extended constant
lea SP, K, Rd   // Get the address pointed by SP+K. this can also be used to move SP to Rd by just setting K to zero
mov Rs, SP     // Copy register Rs to SP. Along with the previous instruction, enables any kind of arithmetic on the SP by first transferring it to a general purpose register
ld.w [SP, K], Rd  // Load word at address SP+K
ld.sb [SP, K], Rd  // Load sign extended byte at address SP+K
st.w Rd, [SP, K]  // Store word to address SP+K
st.b Rd, [SP, K]  // Store byte to address SP+K
push Rd            // Push Rd
pop Rd             // Pop Rd
[I have not listed subroutine calls and returns, which of course also involve the SP]

The first two instructions may look like the same instruction with different operands, but they are not: the SP is not encoded as a register operand but is implicit in the instructions, so they do not even belong to the same encoding pattern.

Zero extended loads from the stack have been removed due to lack of encoding space, but this is not a major problem because explicit zero extends can be generated just after loads, and anyway the cases where this is required are relatively rare, as stack passed arguments always take a word even if they are byte sized.

I am now tempted to completely remove the push and pop instructions. They are really redundant, because push/pops are only generated to save/restore registers upon function entry and exit. Register save/restore can be performed with store/load instructions followed or preceded by a stack adjust. It's one more instruction in most cases at function entry and another one at function exit, but two fewer instructions to implement at the microcode level. So far, I am not totally decided on that one though.

Updated the compiler files and instruction set docs in the GitHub repo:
https://github.com/John-Lluch/CPU74

Joan


Wed Oct 02, 2019 9:28 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 184
Location: Girona-Catalonia
I removed the "push" and "pop" instructions from the set. Function prologue and epilogue code now look like this:
Code:
   .globl   arith
arith:
   add   SP, -4, SP      # Allocate stack space
   st.w   r4, [SP, 2]   # Save register R4
   st.w   r5, [SP, 0]   # Save register R5
   ...
   ...                 # Function body
   ...
   ld.w   [SP, 0], r5   # Restore register R5
   ld.w   [SP, 2], r4   # Restore register R4
   add   SP, 4, SP       # Deallocate stack space
   ret


Compared with Push / Pop code, this requires one more instruction at function entry and exit in simple cases, but the cost is identical for the more general case because stack allocation code after push sequences is required anyway for local variables or stack arguments to calling functions.

With the new approach, all the required stack allocation for the function is folded into a single instruction at the beginning of the function, which also accounts for the callee saved registers. The new approach also normalises Frame Pointer usage when required, because the designated register is just saved normally like any other, and it is set to the initial SP just before the function body begins.

From the hardware point of view, this means two fewer instructions (push/pop) and no need to implement predecrement/postincrement stores/loads just for these two. There are still the 'call' and 'return' instructions requiring pre/post SP decrements/increments, but these instructions only push/pop the PC, so I suppose this should make things easier.

This is something that I had wanted to do for some time, because the LLVM backend appeared to have the required hooks to implement it and it looked like a sensible thing to do, but I was refraining from doing it because I thought it was tricky. However, recently seeing RISC-V do exactly the same has motivated me to go ahead with this change.


Sat Oct 05, 2019 5:40 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 15
Do you have real hardware?


Mon Oct 07, 2019 6:39 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 184
Location: Girona-Catalonia
oldben wrote:
Do you have real hardware?

No, I don't.

I currently only have a hand made diagram of the overall CPU architecture for the purposes of understanding which buses and control lines I might need for the execution of all my instruction set.

I also have a software simulator that is coded in a way that exposes the required decoding logic to convert the instruction encodings and their embedded fields into executable microcodes and their operands, as well as cycle accurate components to perform the execution of such microcodes according to the cpu constraints.

So far, my work has consisted of implementing software tools (most of the time has gone to the compiler) as well as tweaking both the instruction set and the CPU diagram to get the best balance (or at least the best to my understanding) among instruction set completeness, raw performance, code density, and easy instruction decoding. One thing that I learned is that achieving such a balance is much harder for constant-width 16 bit instruction sets than for 32 bit or 8 bit sets.

That said, I will need a lot of help with the actual hardware.


Mon Oct 07, 2019 7:54 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 15
It might be wise to keep a lookout for the kind of connectors you wish to use, for the best prices, as well as front panel switches (if used) and other harder-to-find components. The AM29705 looks like just the RAM you need. The block diagram looks good, other than needing a write path to your program memory for program loading. Am29705s are $7.99 each here: https://unicornelectronics.com/IC/MISCELLANEOUS.html


Wed Oct 09, 2019 5:20 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 184
Location: Girona-Catalonia
Hi Ben, thank you for having looked at my project.

About components, I will purchase switches, connectors, cabling and so on from local stores, and will possibly use Mouser for ICs. I'm located in Europe, so purchasing relatively inexpensive items from the US is not generally viable due to delays and added costs. For PCBs I will possibly use JLCPCB. I have had a great experience with them on a few small PCBs I made in the past, and they offer great quality as far as I can tell.

I'm aware that I need a write path to program memory, but to be honest, I don't currently know what the best way to do it is, or what should be implemented. I think some hobbyists use an Arduino to bridge their CPU to a standard computer through a serial port. In a perfect world, I would want to have a boot loader implemented in my processor, with an ethernet or serial interface to connect to a computer. But so far this is way over my head. Any suggestions would be greatly appreciated.

Joan

Edit: Forgot to add that I intend to use SOIC surface mount components, particularly for ICs, rather than through-hole DIL, for compactness and short PCB traces, as my processor will have a relatively large number of them, and I aim for speeds of about 16 MHz or so.


Wed Oct 09, 2019 7:29 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 184
Location: Girona-Catalonia
I started figuring out an actual hardware schematic for decoding the instruction set. Found that some 'exceptions' to the encoding orthogonality required more gating than I felt comfortable with... The decoder in the software simulator is already implemented in terms that can be directly translated into hardware decoders, multiplexers and logic gates, but you know, just one line of seemingly inoffensive code, or just one tiny logical operator in the middle of an expression, adds extra complexity when it must be translated into actual hardware.

So I thought that I needed to free again, somehow, encoding slots that would allow me to place instruction bits where they are easier to decode. I decided to completely remove the condition code fields from the conditional instructions ("select", "set" and "conditional branch"). So I did this:

- I placed the actual condition code right in the compare instruction, instead of in the conditional instructions. The compare instruction no longer sets the classic S, V, C, Z flags, but just one single flag (I named it T) indicating whether the comparison was true or false.

- For flag-setting arithmetic or logical operations, the convention is to set the "T" flag when the result is zero. So after an ALU op (except a compare), the T flag is synonymous with Z. The compiler already takes advantage of this to optimise things, as shown in one of the functions below.

Code:
int compareselect2(int a)
{
  return a > 3 ? a : 0;
}

int sftest_test( int x, int y, int a, int b )
{
  if ( a < 8 )
  {
      x = x + a;
      y = - b;
  }
  return x+y;
}

int loopTest0( int a )
{
  for ( int i = 0 ; i<10 ; i++ )
    a = a<<1;

  return a;
}


CPU74
Code:
# ---------------------------------------------
# compareselect2
# ---------------------------------------------
   .globl   compareselect2
compareselect2:
   cmp.gt   r0, 3          # compare r0 with 3 for greater than, and update T flag
   mov   0, r1              # move 0 to r1
   selcc   r0, r1, r0       # select r0 if T flag is true, or r1 otherwise, put the result in r0
   ret                    # return

# ---------------------------------------------
# sftest_test
# ---------------------------------------------
   .globl   sftest_test
sftest_test:
   cmp.ge   r2, 8         # compare r2 with 8 for greater than or equal, and update T flag
   brcc   .LBB1_2         # conditional branch if T is true
   add   r2, r0, r0 
   neg   r3, r1
.LBB1_2:
   add   r0, r1, r0
   ret

# ---------------------------------------------
# loopTest0
# ---------------------------------------------
   .globl   loopTest0
loopTest0:
   mov   10, r1         # initial induction variable value, set to 10
.LBB0_1:
   add   r0, r0, r0     # this is the actual shift left
   sub   r1, 1, r1      # decrement the iv
   brncc   .LBB0_1      # loop back if not zero
   ret


So now it's not the conditional instruction but the compare instruction that carries the condition code information. Compiled code quality is not degraded, because checking for multiple conditions after a single compare is extremely rare in compiler generated code. The advantage is that this frees 3 bits in all the previous conditional instruction encodings, and it doesn't add anything to the compare instruction, because I am able to use the space of the (previously unused) destination register field to encode the condition code (a compare does not need a destination register).

The new instruction encodings are pushed here as usual: https://github.com/John-Lluch/CPU74/tree/master/Docs The relevant document is "CPU74InstrSetV9.pdf" (labelled as version 9). The same document also explains how the status flags work, and the instruction decoding patterns, which should be pretty straightforward now. The idea is to convert the 9 bit wide instruction field into a 7 bit microcode that can be input to one or more ATF16V8 devices to produce the control signals.
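The single T flag scheme can be sketched in C as follows. The enum and function names are hypothetical, only a few signed conditions are modelled (the real condition set is defined in the instruction set document), and the two models simply replay the compareselect2 and loopTest0 listings above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of condition codes carried by the compare. */
typedef enum { CC_EQ, CC_NE, CC_LT, CC_GE, CC_GT, CC_LE } cond_code;

/* cmp.<cc>: the condition is evaluated at compare time; the result is T. */
int cmp_cc(cond_code cc, int16_t a, int16_t b)
{
    switch (cc) {
    case CC_EQ: return a == b;
    case CC_NE: return a != b;
    case CC_LT: return a <  b;
    case CC_GE: return a >= b;
    case CC_GT: return a >  b;
    case CC_LE: return a <= b;
    }
    assert(!"unknown condition code");
    return 0;
}

/* selcc: pick the first operand when T is set, the second otherwise. */
int16_t selcc(int t, int16_t if_true, int16_t if_false)
{
    return t ? if_true : if_false;
}

int16_t compareselect2_model(int16_t a)
{
    int t = cmp_cc(CC_GT, a, 3);      /* cmp.gt r0, 3     */
    int16_t r1 = 0;                   /* mov 0, r1        */
    return selcc(t, a, r1);           /* selcc r0, r1, r0 */
}

int16_t loop_test0_model(int16_t a)
{
    uint16_t r0 = (uint16_t)a;
    uint16_t r1 = 10;                 /* mov 10, r1       */
    int t;
    do {
        r0 = (uint16_t)(r0 + r0);     /* add r0, r0, r0   */
        r1 = (uint16_t)(r1 - 1);      /* sub r1, 1, r1    */
        t = (r1 == 0);                /* ALU op: T mirrors Z */
    } while (!t);                     /* brncc .LBB0_1    */
    return (int16_t)r0;
}
```

The conditional instructions themselves take no condition argument in this model; all they ever look at is T, which is the point of the change.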

[The assembler and simulator are not yet updated]

Joan


Sat Oct 12, 2019 5:57 pm