


 Started 6809 core 

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Nice diagrams - helpful - thanks!


Fri Jan 07, 2022 9:05 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Some success! The eight-bit version of the system successfully lights up the LEDs. This involves going through the network to load the instruction cache, executing instructions, then sending a write back to the system to light the LEDs. The program to do this was written in assembler. In the meantime, the compiler was improved.

After getting the compiler to work with 12-bit bytes, I put more work into the 12-bit version of the core. Using a 12-bit core adds about 30% to the overall size of the system. That is not surprising: a 12-bit byte has 50% more bits to process than an 8-bit one, and not all of the system scales with the byte width. The appeal of using 12 bits is 24-bit addressing. The compiler supports pointer values made up of two bytes, and two 12-bit bytes make a 24-bit pointer. It is actually a fair amount of work to get the compiler to use 24-bit pointers when the byte size is only eight bits. Registers cannot contain a 24-bit address, and quite a few hoops must be jumped through to get good compiled output. It is simpler to modify the compiler to use 12-bit bytes; that way pointers fit in registers.
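A minimal sketch of the pointer arithmetic, written in Python rather than assembler so it can be checked off-target. With 12-bit bytes a two-byte pointer already spans 24 bits, while with 8-bit bytes a 24-bit address needs three bytes, one more than a 16-bit register can hold. The function names are hypothetical, and big-endian byte order (high byte first, as on the 6809) is assumed.

```python
def pack_pointer(addr: int, byte_bits: int) -> list[int]:
    """Split a 24-bit address into big-endian bytes of byte_bits bits each."""
    if not 0 <= addr < 1 << 24:
        raise ValueError("address must fit in 24 bits")
    n_bytes = -(-24 // byte_bits)            # ceil(24 / byte_bits)
    mask = (1 << byte_bits) - 1
    # Most significant byte first, as the 6809 stores multi-byte values.
    return [(addr >> (byte_bits * i)) & mask for i in reversed(range(n_bytes))]


def unpack_pointer(data: list[int], byte_bits: int) -> int:
    """Reassemble big-endian bytes into an address."""
    addr = 0
    for b in data:
        addr = (addr << byte_bits) | b
    return addr


if __name__ == "__main__":
    core_id_addr = 0xFFFFE0                  # core id address used in this thread
    print([hex(b) for b in pack_pointer(core_id_addr, 12)])  # two 12-bit bytes
    print([hex(b) for b in pack_pointer(core_id_addr, 8)])   # three 8-bit bytes
```

In the 12-bit core the D register (A:B, two 12-bit bytes) can hold the whole pointer; in the 8-bit core the third byte has to live somewhere else, which is where the extra compiler work comes from.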

_________________
Robert Finch http://www.finitron.ca


Sat Jan 08, 2022 4:04 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Made the number of nodes in the ring configurable. Reduced the node count to one for debugging. This reduces the system build time considerably. Got the system as far as clearing the screen now, but it does not clear to the correct colors.

Working on software for the core. I am thinking of creating a multi-core version of the sieve, code below, in which each core is responsible for applying a portion of the sieve. I do not think this would be any faster than performing it on a single core, since every core would be updating the same shared global memory.
Code:
; First fill screen chars with 'P' indicating prime positions.
; Each core fills every eighth position, starting at offset N,
; where N is the core number minus two.
;
multi_sieve:
   lda      #'P'              ; indicate prime
   ldb      COREID            ; find out which core we are
   subb     #2
   ldx      #0                ; start at first char of screen
   abx
multi_sieve3:
   sta      TEXTSCR,x         ; store 'P'
   leax     8,x               ; advance to next position
   cmpx     #4095
   blo      multi_sieve3
   ; There is no barrier here: a slow core may still be storing
   ; 'P's while faster cores have started storing 'N's.
   addb     #2                ; start sieve at 2 (core id)
   lda      #'N'              ; flag position value of 'N' for non-prime
multi_sieve2:
   ldx      #0
   abx                        ; skip the first position - might be prime
multi_sieve1:
   abx                        ; advance to the next multiple
   cmpx     #4095             ; test before storing so the last
   bhs      multi_sieve5      ; multiple cannot land past the screen
   sta      TEXTSCR,x
   bra      multi_sieve1
multi_sieve5:
   addb     #8                ; number of cores working on it
   bcs      multi_sieve4      ; stop if the factor wrapped past 4095
   cmpb     #4095
   blo      multi_sieve2
multi_sieve4:                 ; hang machine
   bra      multi_sieve4
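The work split in the listing can be checked with a small Python model. It assumes eight cores numbered 2 to 9 (four nodes of two cores, with "core number minus two" as the fill offset) and a screen of 4095 positions to match `cmpx #4095`. The model runs the fill phase on every core before any core starts sieving, i.e. it assumes a barrier between the two phases; the function name is hypothetical.

```python
SCREEN_CELLS = 4095  # positions 0..4094, matching "cmpx #4095"
NUM_CORES = 8        # four nodes, two cores each
FIRST_CORE_ID = 2    # cores assumed to be numbered 2..9


def multi_sieve(cells: int = SCREEN_CELLS, num_cores: int = NUM_CORES) -> list[str]:
    screen = [""] * cells
    core_ids = range(FIRST_CORE_ID, FIRST_CORE_ID + num_cores)
    # Phase 1: each core writes 'P' to every num_cores-th cell, from offset id - 2.
    for core in core_ids:
        for x in range(core - FIRST_CORE_ID, cells, num_cores):
            screen[x] = "P"
    # (Barrier: all cores are assumed to finish phase 1 before phase 2 starts.)
    # Phase 2: each core takes factors id, id + 8, id + 16, ... and marks
    # every multiple from 2 * factor upward as 'N'.
    for core in core_ids:
        for factor in range(core, cells, num_cores):
            for x in range(2 * factor, cells, factor):
                screen[x] = "N"
    return screen
```

Between them the eight cores cover every factor from 2 upward exactly once, so the cells left as 'P' from index 2 on are the primes. Without the barrier, a late 'P' from a slow core can overwrite an 'N', which is one more reason the multi-core version may not beat a single core.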



Sun Jan 09, 2022 7:16 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
three nodes might be a good tradeoff between build time and finding bugs in the network...


Sun Jan 09, 2022 7:36 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Got a note about the Turbo9 project and watched the YouTube video. The core is a pipelined 6809 core. Running at about 120MHz with 3x the per-clock performance of a stock 6809, it gives the equivalent of over 350x the performance of a 1MHz original. I assume they picked a fast FPGA part for timing. My own core runs at over 40MHz in a slow part and would probably run at over 100MHz in a fast part. However, my core does not have an overlapping pipeline, so it is probably only about 120x the performance.

I had the stacks for both cores of a node at the same location, which is not good: values on the stack were being corrupted. There is now a switch for different processing on the odd- and even-numbered cores. For instance, only one core needs to copy the global ROM to RAM, since the RAM is shared.

There seems to be an issue accessing local memory. I tried running the sieve program and it worked, but it only uses global video memory. A more sophisticated program is not working because subroutine calls apparently do not work: a subroutine may be called but does not return properly.

Quote:
three nodes might be a good tradeoff between build time and finding bugs in the network...

I have tried several different numbers of nodes, and the network seems to be working. The sieve was coded for four nodes (eight cores), so it won't run properly with any other number. It still runs, but gives incorrect results.

I am working on other software between trial runs, such as a BIOS/boot ROM and an OS.

There are also peripheral cores to update. Currently eight-bit versions are used with a 12-bit bus. Several peripherals like the keyboard and RTC don’t make sense to update.

I may have fixed the local memory issue. I made the memory controller about as simple as possible, which cost some performance. I think the issue was that the enable signal was being cleared during the same clock cycle in which data was captured from the RAM. I delayed clearing the signal by a clock cycle and now it seems to work. More testing is needed though.



Sun Jan 09, 2022 4:56 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The RAM test routine passed except on core #5, which reported an error. The RAM test is special in that its return address is kept in the U register, so the test does not need any memory. The next thing to try is reducing the clock frequency.
Got a bunch of keyboard driver routines written.
Code:
 ; Local RAM test routine
; Checkerboard testing.
; There is 70kB of local RAM
; Does not use any RAM including no stack

ramtest:
   ldy      #0
   lda      #1
   sta      LEDS
   ldd      #$AAA555
ramtest1:
   std      ,y++
   cmpy   #71680
   blo      ramtest1
   ; now readback values and compare
   ldy      #0
ramtest3:
   ldd      ,y++
   cmpd   #$AAA555
   bne      ramerr
   cmpy   #71680
   blo      ramtest3
   lda      #2
   sta      LEDS
   jmp      ,u
ramerr:
   lda      #$80
   sta      LEDS
   ldx      #TEXTSCR
   ldb      COREID
   abx
   lda      #'F'
   sta      ,x
   sync
   jmp      ,u
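The routine can be modelled in Python to show what a single pattern does and does not catch. This is a sketch under assumptions: memory is a list of 12-bit bytes, `std ,y++` stores $AAA at the even address and $555 at the odd one (big-endian D = A:B), and faults are injected through a hypothetical `stuck` map of address to (bit mask, forced value). With one pattern each cell only ever holds one value, so a bit stuck at the value being written goes unnoticed; the second pass with the complemented pattern in `ram_test_both` is an addition, not part of the original routine.

```python
RAM_SIZE = 71680          # 70 * 1024 twelve-bit bytes, as in the listing
PATTERN = (0xAAA, 0x555)  # A and B halves of #$AAA555


class FaultyRam:
    """12-bit byte RAM with optional stuck-at bits: {address: (mask, value)}."""

    def __init__(self, size, stuck=None):
        self.cells = [0] * size
        self.stuck = stuck or {}

    def write(self, addr, value):
        mask, forced = self.stuck.get(addr, (0, 0))
        self.cells[addr] = ((value & ~mask) | (forced & mask)) & 0xFFF

    def read(self, addr):
        return self.cells[addr]


def ram_test(ram, size, pattern=PATTERN):
    """Return the address of the first failing word, or None if all pass."""
    for addr in range(0, size, 2):        # std ,y++
        ram.write(addr, pattern[0])
        ram.write(addr + 1, pattern[1])
    for addr in range(0, size, 2):        # ldd ,y++ / cmpd #$AAA555
        if (ram.read(addr), ram.read(addr + 1)) != pattern:
            return addr
    return None


def ram_test_both(ram, size):
    """Run the original pattern, then its complement."""
    inverse = (PATTERN[0] ^ 0xFFF, PATTERN[1] ^ 0xFFF)
    first = ram_test(ram, size)
    return first if first is not None else ram_test(ram, size, inverse)
```

For example, bit 0 of an even address stuck at 0 passes the original test, because $AAA already has bit 0 clear, but fails the complemented pass.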



Mon Jan 10, 2022 5:04 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Latest fix: the nybble codes for transfer and exchange instructions were positioned wrong for the 12-bit version of the core. This led to those instructions not working correctly.

Looking at the transfer and exchange instructions for the 12-bit core, there are extra postbyte bits that could be used. The core id input to the core is a register that could potentially be made accessible with the TFR instruction. ATM the core id is available at a dedicated address: $FFFFE0. Other registers common in other architectures, such as a tick count, might also be useful to include.
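For reference, the standard 6809 TFR/EXG postbyte holds the source register code in the high nybble and the destination code in the low nybble. The Python sketch below encodes and decodes that layout with a configurable field width; `field_bits=6` is a hypothetical use of the spare bits in a 12-bit postbyte, not the layout this core actually uses.

```python
# Standard 6809 TFR/EXG register codes.
REG_CODES = {"D": 0x0, "X": 0x1, "Y": 0x2, "U": 0x3, "S": 0x4, "PC": 0x5,
             "A": 0x8, "B": 0x9, "CC": 0xA, "DP": 0xB}
CODE_REGS = {code: name for name, code in REG_CODES.items()}


def encode_tfr(src, dst, field_bits=4):
    """Source code in the upper field, destination code in the lower field."""
    return (REG_CODES[src] << field_bits) | REG_CODES[dst]


def decode_tfr(postbyte, field_bits=4):
    """Return (source, destination) register names."""
    mask = (1 << field_bits) - 1
    return CODE_REGS[(postbyte >> field_bits) & mask], CODE_REGS[postbyte & mask]
```

`TFR X,Y` encodes as $12 and `TFR A,B` as $89. Decoding a postbyte with the wrong field position misreads the registers, which is the kind of fault the misplaced nybble codes caused: $42 written with 6-bit fields (X to Y) decodes with 4-bit fields as S to Y.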

Added: a checkpoint interrupt. The checkpoint register at address $FFFFFFFE1 must be cleared within 1 second or an NMI is generated. The checkpoint register is set automatically by a timer circuit.
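One reading of the checkpoint description, sketched in Python: the timer sets the checkpoint register once per second, software clears it, and if the timer finds it still set on its next firing, software has missed its one-second window and an NMI is raised. The class, the millisecond tick and the register being set at reset are assumptions for illustration, not details of the actual circuit.

```python
class CheckpointTimer:
    """Toy watchdog: timer sets the flag each period; a flag still set = NMI."""

    def __init__(self, period_ms=1000):
        self.period_ms = period_ms
        self.elapsed_ms = 0
        self.flag = True          # assumed set by the timer at reset
        self.nmi_count = 0

    def clear(self):
        """Software write that clears the checkpoint register."""
        self.flag = False

    def tick(self, ms):
        """Advance time one millisecond at a time."""
        for _ in range(ms):
            self.elapsed_ms += 1
            if self.elapsed_ms == self.period_ms:
                self.elapsed_ms = 0
                if self.flag:      # software never cleared it this period
                    self.nmi_count += 1
                self.flag = True   # timer sets it again for the next period
```

A core stuck executing from a wild address never reaches the code that clears the register, so the next timer firing raises the NMI.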

The instruction cache load now uses asynchronous reads across the network for better performance. With asynchronous reads, multiple reads may be issued before any responses are received, which allows the cache to place a high-speed burst of addresses on the network. This was tricky to get working because responses may come back out of order.
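A minimal Python sketch of filling a cache line from out-of-order replies, assuming each read request carries a tag (here the word's offset within the line) that comes back with its reply. The actual NIC packet format is not described in the post, so the tag scheme and function name are hypothetical.

```python
import random


def load_cache_line(base, line_words, memory, rng):
    # Issue phase: every read goes out before any reply is seen.
    outstanding = {tag: base + tag for tag in range(line_words)}  # tag -> address
    # Network phase: replies may come back in any order.
    replies = [(tag, memory[addr]) for tag, addr in outstanding.items()]
    rng.shuffle(replies)
    # Fill phase: place each reply by its tag, not by arrival order.
    line = [None] * line_words
    for tag, data in replies:
        if tag not in outstanding:
            raise RuntimeError(f"unexpected reply tag {tag}")
        line[tag] = data
        del outstanding[tag]
    if outstanding:
        raise RuntimeError("cache line incomplete")
    return line
```

Because each word is placed by tag, the arrival order does not matter; the line is only valid once the set of outstanding tags is empty.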

Code is now running from local RAM after being loaded from global ROM. However, after running numerous iterations of the delay loop, the core jumps off to a wild address and keeps executing unknown instructions from there. That was the incentive to add checkpointing.
All cores are reporting RAM test failures after some slight modifications. More to fix yet.



Tue Jan 11, 2022 3:59 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 585
Have you looked at the Hitachi 6309, an enhanced 6809?


Wed Jan 12, 2022 12:46 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Have you looked at the Hitachi 6309, an enhanced 6809?

Yes. That is a great update to the 6809. I have looked at that instruction set, and I am not fond of the extensions. Extensions to the instruction set for triple byte addressing conflict with 6309 extensions. I had a version coded to support the 6309 instead of the 6809 but dropped it. I never used it. There are a lot of 6809 systems using banked addressing. I feel it is more important to extend the addressing range than it is to change other aspects of the ISA. The easy route is to make a wider machine, 12-bits instead of eight bits.

Latest Fixes: the assembler was only outputting the low order 16-bits of a 24-bit displacement for long branches. This led to issues executing subroutines.

Latest Additions: Added a different configuration for processing nodes. Rather than having two cores share RAM in one node, the new configuration has only a single core with dedicated local RAM. There is half as much RAM available. The new configuration arose out of the desire to simplify the node software. With two cores sharing RAM, there was a lot of testing and setting in the software to control buffer positions for each core. With only a single core present, the software is simpler.

Thinking about adding a second ring to the bus for responses. The issue is that if the bus becomes flooded with requests, there is no space on it for responses. This hangs the bus, because the requests keep looping around until there is a response. A dedicated ring for responses may help.

The keyboard initialization routine is hanging; it loops around forever. Using the ILA, I traced the code to a branch that is failing. I have no idea why the branch would fail. <- Figured this out. The decrement instructions were not encoded correctly by the assembler. In one location I had decided to code DEY instead of LEAY -1,Y.
Character output via the DisplayChar() routine seems to be working. The start-up message displays onscreen.

Just trying to get keyboard input working ATM.



Wed Jan 12, 2022 3:23 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Spent a chunk of time today debugging the keyboard. I finally figured out that another device was placing data on the data bus when a keyboard read occurred, and the two values were wire-OR'd together. Now that the keyboard device is working, using the keyboard with the monitor program still does not work. The scan-codes fetched from the keyboard look correct when dumped to the screen, but the values are incorrect once converted to ASCII: displaying a character from the keyboard simply shows a space instead of the expected character. Strangely, the carriage return key appears to work correctly. I suspect something in the scan-code converter routine is causing the issue, but I have not been able to trace it yet. CTRL-ALT-DEL works as expected.
Maybe someone with good eyes can spot the issue.
Code:
; KeyState2 variable bit meanings
;1176543210
; ||||||||+ = shift
; |||||||+- = alt
; ||||||+-- = control
; |||||+--- = numlock
; ||||+---- = capslock
; |||+----- = scrolllock
; ||+------ = <empty>
; |+------- =    "
; |         =    "
; |         =    "
; |         =    "
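; +-------- = extended
;
; Note: the code below toggles numlock, capslock and scrolllock with
; #16, #32 and #64 (bits 4 to 6), one bit higher than this table shows.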

; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Debug version of keyboard get routine.
;
; Parameters:
;      b:   0 = non blocking, otherwise blocking
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

DBGGetKey:
   pshs   x
   stb      KeybdBlock            ; save off blocking status
dbgk2:
   ldb      KeybdBlock
   pshs   b
   bsr      KeybdGetStatus
   andb   #$80                     ; is key available?
   puls   b
   bne      dbgk1                     ; branch if key
   tstb                              ; block?
   bne      dbgk2                     ; If no key and blocking - loop
   ldd      #-1                        ; return -1 if no block and no key
   puls   x,pc
dbgk1:
   bsr      KeybdGetScancode
;   lbsr   DispByteAsHex
   ; Make sure there is a small delay between scancode reads
   ldx      #20
dbgk3:
   dex
   bne      dbgk3
   ; switch on scan code
   cmpb   #SC_KEYUP
   bne      dbgk4
   clr      KeyState1               ; make KeyState1 = -1
   com      KeyState1               ; COM of 0 is -1 (NEG of 0 stays 0)
   bra      dbgk2                     ; loop back
dbgk4:
   cmpb   #SC_EXTEND
   bne      dbgk5
   lda      KeyState2
   ora      #$800
   sta      KeyState2
   bra      dbgk2
dbgk5:
   cmpb   #SC_CTRL
   bne      dbgkNotCtrl
   tst      KeyState1
   bmi      dbgk7
   lda      KeyState2
   ora      #4
   sta      KeyState2
   bra      dbgk8
dbgk7:
   lda      KeyState2
   anda   #~4
   sta      KeyState2
dbgk8:
   clr      KeyState1
   bra      dbgk2
dbgkNotCtrl:
   cmpb   #SC_RSHIFT
   bne      dbgkNotRshift
   tst      KeyState1
   bmi      dbgk9
   lda      KeyState2
   ora      #1
   sta      KeyState2
   bra      dbgk10
dbgk9:
   lda      KeyState2
   anda   #~1
   sta      KeyState2
dbgk10:
   clr      KeyState1
   bra      dbgk2
dbgkNotRshift:
   cmpb   #SC_NUMLOCK
   bne      dbgkNotNumlock
   lda      KeyState2
   eora   #16
   sta      KeyState2
   lda      KeyLED
   eora   #2
   sta      KeyLED
   tfr      a,b
   clra
   bsr      KeybdSetLED
   bra      dbgk2
dbgkNotNumlock:
   cmpb   #SC_CAPSLOCK
   bne      dbgkNotCapslock
   lda      KeyState2
   eora   #32
   sta      KeyState2
   lda      KeyLED
   eora   #4
   sta      KeyLED
   tfr      a,b
   clra
   bsr      KeybdSetLED
   bra      dbgk2
dbgkNotCapslock:
   cmpb   #SC_SCROLLLOCK
   bne      dbgkNotScrolllock
   lda      KeyState2
   eora   #64
   sta      KeyState2
   lda      KeyLED
   eora   #1
   sta      KeyLED
   tfr      a,b
   clra
   bsr      KeybdSetLED
   bra      dbgk2
dbgkNotScrolllock:
   cmpb   #SC_ALT
   bne      dbgkNotAlt
   tst      KeyState1
   bmi      dbgk11
   lda      KeyState2
   ora      #2
   sta      KeyState2
   bra      dbgk12
dbgk11:
   lda      KeyState2
   anda   #~2
   sta      KeyState2
dbgk12:
   clr      KeyState1
   bra      dbgk2
dbgkNotAlt:
   tst      KeyState1
   beq      dbgk13
   clr      KeyState1
   bra      dbgk2
dbgk13:
   lda      KeyState2      ; Check for CTRL-ALT-DEL
   anda   #6
   cmpa   #6
   bne      dbgk14
   cmpb   #SC_DEL   
   bne      dbgk14
   jmp      [$FFFFFC]      ; jump to NMI vector
dbgk14:
   tst      KeyState2      ; extended code?
   bpl      dbgk15
   lda      KeyState2
   anda   #$7FF
   sta      KeyState2
   ldx      #keybdExtendedCodes
   bra      dbgk18
dbgk15:
   lda      KeyState2      ; Is CTRL down?
   bita   #4
   beq      dbgk16
   ldx      #keybdControlCodes
   bra      dbgk18
dbgk16:
   bita   #1               ; Is shift down?
   beq      dbgk17
   ldx      #shiftedScanCodes
   bra      dbgk18
dbgk17:
   ldx      #unshiftedScanCodes
dbgk18:
   abx                        ; index into table is scancode in accb
   ldb      ,x               ; load accb with ascii from table
   clra
   puls   x,pc            ; and return
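A point worth checking in the key-up path: after `clr KeyState1`, the byte has to be complemented (`com`), not negated, to become -1, because negating zero leaves zero. With KeyState1 stuck at zero, the scancode that follows $F0 is decoded as a second key press instead of being swallowed as a release. Whether this was the error found later in the thread is not stated. The Python sketch below models 12-bit NEG and COM and a cut-down version of the make/break flow; the scancodes assume standard PS/2 set 2 ($F0 break prefix, $E0 extended prefix, $59 right shift, $1C for the A key, $5A for Enter), and the two-entry tables are hypothetical.

```python
MASK12 = 0xFFF
SC_KEYUP, SC_EXTEND, SC_RSHIFT = 0xF0, 0xE0, 0x59   # PS/2 set 2 (assumed)


def neg12(v):
    return (-v) & MASK12


def com12(v):
    return (~v) & MASK12


def is_negative12(v):
    return bool(v & 0x800)


# Tiny hypothetical tables: the A key ($1C) and Enter ($5A).
UNSHIFTED = {0x1C: "a", 0x5A: "\r"}
SHIFTED = {0x1C: "A", 0x5A: "\r"}


def decode(scancodes, keyup_op=com12):
    """Turn a scancode stream into characters, following DBGGetKey's flow."""
    key_state1, shift, out = 0, False, []
    for sc in scancodes:
        if sc == SC_KEYUP:
            key_state1 = keyup_op(0)           # clr, then com (or neg)
        elif sc == SC_EXTEND:
            continue                           # extended keys ignored here
        elif sc == SC_RSHIFT:
            shift = not is_negative12(key_state1)
            key_state1 = 0
        elif key_state1:                       # break code: swallow it
            key_state1 = 0
        else:
            out.append((SHIFTED if shift else UNSHIFTED).get(sc, "?"))
    return "".join(out)
```

Pressing and releasing A produces the stream $1C, $F0, $1C; with `com` this decodes to a single "a", while with `neg` the release is decoded as a second press.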



Fri Jan 14, 2022 4:57 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I found the error in the keyboard routine about five minutes after I posted the code. Keyboard input works now.

Got the UART core up and running for 12-bit operation. It can transmit or receive data in 5- to 12-bit formats and remains 6551 compatible.

There are several 12-bit bus peripherals now: keyboard, UART, text controller, sprite controller and bitmap controller.
Building a 12-bit ecosystem.



Sat Jan 15, 2022 4:16 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Working on interrupts today. They almost work: the system hangs when the second interrupt occurs. I had to build a special circuit to detect the bad address of the second interrupt. <- The adjustment of the stack pointer during an interrupt call was off by one. The system now hangs because the input/output vectors are getting overwritten, but only if interrupts are enabled. It is strange, because if the system is sitting idle, interrupts occur just fine. It appears to hang only when returning to the monitor after executing a monitor command.

Modified the keyboard routines. Getkey() may now either get a character directly from the keyboard device or less directly from a scancode buffer. The scancode buffer will be loaded by an interrupt driven interface.
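An interrupt-driven scancode buffer like this is commonly built as a single-producer, single-consumer ring buffer: the keyboard interrupt only advances the head and Getkey() only advances the tail, so neither side overwrites the other's index. The Python sketch below is for checking the logic only; the buffer size, the drop-when-full policy and the class name are assumptions, not details from the post.

```python
class ScancodeBuffer:
    """Fixed-size ring buffer: the keyboard ISR puts, Getkey() gets."""

    def __init__(self, size=32):
        self.buf = [0] * size
        self.size = size
        self.head = 0   # next slot the ISR writes
        self.tail = 0   # next slot Getkey() reads

    def put(self, scancode):
        """Called from the interrupt handler; returns False if full."""
        nxt = (self.head + 1) % self.size
        if nxt == self.tail:
            return False          # full: drop the scancode
        self.buf[self.head] = scancode
        self.head = nxt
        return True

    def get(self):
        """Called by Getkey(); returns None when empty."""
        if self.head == self.tail:
            return None           # a blocking Getkey() would wait here
        sc = self.buf[self.tail]
        self.tail = (self.tail + 1) % self.size
        return sc
```

One slot is always left empty so that "full" (head one behind tail) can be told apart from "empty" (head equals tail); a buffer of size 4 therefore holds three scancodes.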

Had to rearrange the boot code a little bit, the 16k ROM area is getting full.

For the NoC version, the code in the aging circuit was stripped out. Packets live forever now until they are processed. The issue was that when things were aged, a retry response was sent back to the requester for packets that were too old. This sort of worked, but caused a flood of retry requests and responses, ultimately hanging the network.



Sun Jan 16, 2022 4:49 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Ran into an interesting issue today: indirect addressing was not working properly. The first issue found and fixed was the detection of indirect addressing mode; the wrong bit of the indexing post-byte was being checked. With that fixed, things almost work. The first fetch using an indirect address worked, but the second fetch failed to fetch from the correct address. It turns out that the high-order byte of the indirect address is being complemented every other time it is used. I can see it in the logic analyzer, but I cannot figure out why it is happening. It looks to me like some sort of hardware glitch.
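For comparison, here is how the standard 8-bit 6809 indexed postbyte is decoded: when bit 7 is clear the postbyte holds a 5-bit signed offset and can never be indirect; when bit 7 is set, bits 6-5 select the register, bit 4 is the indirect flag and bits 3-0 select the mode. This Python sketch covers only the original 6809 layout; the 12-bit core's postbyte and its outer-indexing extension are not described in the post and are not modelled.

```python
REGS = ("X", "Y", "U", "S")                  # bits 6-5 of the postbyte
MODES = {0x0: ",R+", 0x1: ",R++", 0x2: ",-R", 0x3: ",--R", 0x4: ",R",
         0x5: "B,R", 0x6: "A,R", 0x8: "n8,R", 0x9: "n16,R", 0xB: "D,R",
         0xC: "n8,PCR", 0xD: "n16,PCR", 0xF: "[n16]"}


def decode_indexed(postbyte):
    """Decode a standard 8-bit 6809 indexed-addressing postbyte."""
    reg = REGS[(postbyte >> 5) & 0x3]
    if not postbyte & 0x80:                  # bit 7 clear: 5-bit offset form
        offset = postbyte & 0x1F
        if offset & 0x10:
            offset -= 0x20                   # sign-extend the 5-bit offset
        return {"reg": reg, "mode": "n5,R", "offset": offset, "indirect": False}
    low = postbyte & 0x0F
    indirect = bool(postbyte & 0x10)         # bit 4 is the indirect flag
    mode = MODES.get(low)
    # ,R+ and ,-R have no indirect form; extended indirect requires bit 4.
    if mode is None or (indirect and low in (0x0, 0x2)) or (low == 0xF and not indirect):
        raise ValueError(f"illegal indexed postbyte ${postbyte:02X}")
    return {"reg": reg, "mode": mode, "indirect": indirect}
```

$84 decodes as `,X` and $94 as `[,X]`: the two differ only in bit 4, so checking any other bit for indirection silently turns one mode into the other.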

Found a further issue with outer indexing. The core was still using the eight-bit method of detecting outer indexes for the twelve-bit version.

Worked on a disassembler for the monitor program. It should handle the twelve-bit nature of the core. It uses a couple of tables with a more or less brute force approach. It could be improved.



Mon Jan 17, 2022 4:24 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The core seems to run at 60MHz. It does not run reliably at 80MHz, although it can display the start-up message. At 60MHz the tools report that timing is missed by a few ns, but it may be that the failing paths are not critical. The FPGA is likely somewhat faster than the minimum for its speed grade.

I should mention I am working on two versions of the system. A single-core version and a network-on-chip version. The NoC version does not really work yet. It has been a lot of fun to debug. It gets as far as lighting up LEDs. The single-core version is obviously further along.

Adjusted the operation of the NIC for write cycles. They must now wait for an ack back before the cycle is complete. The issue is that when writes did not wait for an ack, they could end up writing in the wrong order because the packets might circulate around the network several times before getting the opportunity to write.
I suppose asynchronous writes could be provided but that would mean altering the instruction set somewhat. There would have to be a means to indicate the type of write cycle in the instruction set. Maybe an asynchronous prefix byte? Or maybe an IO port switch. If clearing the screen for instance, we do not care what order the writes take place in. Only that they are all done. This is true for many circumstances such as initializing a buffer.
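The ordering argument can be checked by brute force: enumerate every order in which the network could deliver a set of writes and collect the resulting memory images. This is a toy Python model with no real NIC timing; the 4-cell "screen", the address $100 and the values are made up for illustration.

```python
from itertools import permutations


def final_states(writes):
    """Every memory image reachable if writes may arrive in any order."""
    states = set()
    for order in permutations(writes):
        memory = {}
        for addr, value in order:
            memory[addr] = value
        states.add(tuple(sorted(memory.items())))
    return states


# Clearing a (tiny) screen: each address is written exactly once.
clear = [(addr, 0x20) for addr in range(4)]
# Two writes to the same location.
update = [(0x100, 1), (0x100, 2)]
```

`final_states(clear)` contains a single image, so the order does not matter, while `final_states(update)` contains two, so the result depends on which write arrives last. That is the case where a write has to wait for its ack.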

I changed the text screen layout from 56x29 to 64x32. Gives a few more characters and has a better font.



Tue Jan 18, 2022 5:11 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Maybe a single extra instruction would be enough: a write barrier.
https://en.m.wikipedia.org/wiki/Memory_barrier
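A write barrier like the one suggested above can be sketched as a count of unacknowledged writes: posted writes return immediately, and the barrier does not complete until every earlier write has been acknowledged. In this toy Python model the class and method names are hypothetical, and the network delivers the newest write first, an arbitrary worst case chosen to make reordering visible.

```python
from collections import deque


class AsyncWriteNic:
    """Toy model: posted writes return at once; acks arrive later."""

    def __init__(self):
        self.in_flight = deque()   # writes sent but not yet acknowledged
        self.memory = {}

    def write_async(self, addr, value):
        self.in_flight.append((addr, value))

    def deliver_one(self, index=-1):
        """The network delivers one in-flight write (newest by default)."""
        addr, value = self.in_flight[index]
        del self.in_flight[index]
        self.memory[addr] = value  # write performed and acknowledged

    def barrier(self):
        """Block until every earlier write has been acknowledged."""
        while self.in_flight:
            self.deliver_one()
```

Two posted writes of 1 and then 2 to the same address can end up leaving 1 in memory; putting a barrier between them guarantees the final value is 2, while writes between barriers, such as a screen clear, still proceed without waiting for individual acks.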


Tue Jan 18, 2022 10:35 am