


 Thor Core / FT64 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Put the head pointers and the queue valid signals into modules. The mainline has been broken up into more, smaller modules; it now contains about 30. The tools are better able to optimize a compartmentalized core.

_________________
Robert Finch http://www.finitron.ca


Sat Jun 17, 2023 9:30 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Ran the core for 20us in sim and got an IPC of 1.2. Then I realized that something was wrong with the calculation because the result was too good. The count of instructions committed during the cycle was not being reset to zero, so the instruction count was incremented every clock cycle even if no instructions were committed. The real IPC turned out to be 0.093, or about 10.8 clocks per instruction. Given that loads and stores take about 30 clocks, an average of 10.8 means a lot of the load and store time is hidden. Next may be to get the data cache working better.
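The counting bug is easy to reproduce in a small model. This is a hypothetical Python sketch, not the core's RTL: each trace element is the number of instructions committed in one clock, and the buggy version keeps adding a stale per-cycle count on idle clocks.

```python
# Hypothetical cycle-level model of the IPC statistic; not the core's RTL.
# Each list element is the number of instructions committed in that clock.

def ipc_buggy(commits_per_cycle):
    """Reproduces the bug: the per-cycle count is never cleared, so the last
    non-zero value is added again on every idle clock."""
    total = 0
    commit_count = 0
    for n in commits_per_cycle:
        if n:
            commit_count = n       # only overwritten when something commits
        total += commit_count      # accumulated every clock, even idle ones
    return total / len(commits_per_cycle)


def ipc_fixed(commits_per_cycle):
    """Fixed version: the per-cycle count starts from zero every clock."""
    total = 0
    for n in commits_per_cycle:
        commit_count = 0           # cleared at the start of each cycle
        commit_count += n
        total += commit_count
    return total / len(commits_per_cycle)


trace = [1] + [0] * 9              # one commit in ten clocks
print(ipc_buggy(trace))            # 1.0
print(ipc_fixed(trace))            # 0.1
print(round(1 / 0.093, 1))         # 10.8 clocks per instruction
```

With a mostly idle trace the buggy counter reports roughly one instruction per clock, which is the same kind of too-good result described above.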

The region table was configured for no read or write caching. The core works at the lowest cache level it finds when translating an address, so no read or write caching of data was occurring. The region table was updated to allow read caching with write-through stores.

Ported the rgb2dvi core from VHDL to SystemVerilog to avoid an error message about mixed languages and structure variables.

_________________
Robert Finch http://www.finitron.ca


Sun Jun 18, 2023 3:24 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
When I created the reset micro-code, I forgot to shift the reset address in the boot ROM 12 bits to the left to account for the micro-code ip. This caused the PC to be loaded with an incorrect value, crashing the core right at reset. The reset micro-code works like the 68k's: it loads the stack pointer and program counter from fixed addresses in memory. In this case, though, it uses the highest memory addresses rather than the lowest.
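A minimal Python model of the reset sequence, assuming the PC register holds the macro PC above a 12-bit micro-code ip. The vector addresses and memory contents below are made up for illustration; only the "load SP and PC from the top of memory, then shift the PC left 12" idea comes from the post.

```python
# Hypothetical model of the micro-coded reset; vector addresses are assumed.
UIP_BITS = 12                          # micro-code ip lives in the low 12 bits

SP_VECTOR = 0xFFFF_FFFF_FFFF_FFF0      # assumed: stack pointer vector
PC_VECTOR = 0xFFFF_FFFF_FFFF_FFF8      # assumed: program counter vector


def reset(mem, shift_pc=True):
    """Load SP and PC from the top-of-memory vectors, 68k style.

    Returns (sp, pc_reg). pc_reg holds the macro PC above the micro-code ip,
    which starts at zero. With shift_pc=False the original bug is reproduced.
    """
    sp = mem[SP_VECTOR]
    pc = mem[PC_VECTOR]
    pc_reg = pc << UIP_BITS if shift_pc else pc
    return sp, pc_reg


def macro_pc(pc_reg):
    """Strip the micro-code ip to recover the macro-instruction address."""
    return pc_reg >> UIP_BITS


mem = {SP_VECTOR: 0x0000_8000, PC_VECTOR: 0xFFFC_0100}   # made-up contents
sp, good = reset(mem)
_, bad = reset(mem, shift_pc=False)
print(hex(macro_pc(good)))   # 0xfffc0100
print(hex(macro_pc(bad)))    # 0xfffc0, i.e. the wrong address
```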

When the PC value is held in a register, it includes the micro-code ip in the lowest 12 bits of the register. However, instructions containing displacements are relative to the upper bits of the PC, not including the micro-code ip.
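The distinction can be shown with a short Python sketch, assuming the same 12-bit micro-code ip field. Starting the branch target at micro-code ip 0 is an assumption made for illustration.

```python
# Hypothetical sketch of PC-relative target calculation; field widths assumed.
UIP_BITS = 12
UIP_MASK = (1 << UIP_BITS) - 1


def pack_pc(macro_pc, uip=0):
    """PC register value: macro PC in the upper bits, micro-code ip below."""
    return (macro_pc << UIP_BITS) | (uip & UIP_MASK)


def branch_target(pc_reg, disp):
    """Apply a displacement relative to the macro PC only.

    The micro-code ip is discarded before the add, and the target is
    assumed to start at micro-code ip 0.
    """
    return pack_pc((pc_reg >> UIP_BITS) + disp)


reg = pack_pc(0x1000, uip=5)
print(hex(branch_target(reg, 0x20)))        # 0x1020000
print(hex(branch_target(reg, 0x20) >> 12))  # 0x1020
```

Adding the displacement to the full register value instead would land inside the micro-code ip field, which is the mismatch the paragraph warns about.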

_________________
Robert Finch http://www.finitron.ca


Thu Jun 22, 2023 7:07 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Tried switching from asynchronous to synchronous resets. The core seems to run fine in sim, but not in an FPGA. The issue with asynchronous resets is that not everything may reset at exactly the same instant. I also tried delaying the startup micro-code by about six clock cycles to give other components a chance to reset. I suspect a bad route in the FPGA is the issue, though.
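Delaying the startup micro-code amounts to a small counter behind the synchronous reset. A hypothetical Python model follows; the class and method names are illustrative, and the delay of six clocks matches the figure in the post.

```python
class StartupDelay:
    """Hypothetical model: hold off the startup micro-code until `delay`
    clocks after a synchronous reset is released."""

    def __init__(self, delay=6):
        self.delay = delay
        self.count = 0

    def clock(self, rst):
        """Advance one clock edge; return True once the micro-code may start."""
        if rst:
            self.count = 0               # synchronous reset clears the counter
        elif self.count < self.delay:
            self.count += 1              # count clocks since reset released
        return self.count >= self.delay


d = StartupDelay(delay=6)
start = [d.clock(rst) for rst in [1, 1] + [0] * 8]
print(start.index(True))   # 7: the sixth clock after reset goes low
```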

It is not the FPGA. Functional simulation run on the workstation matches the FPGA results, so there is something in the synthesis build. It looks like the PC register is being shifted left twelve times at some point. The design is very complex, so to break things down, all the PC register updates were extracted from the ifetch code and placed into their own module. Hopefully that will make it easier to debug.

Spent some time working on a 64-bit 68000.

_________________
Robert Finch http://www.finitron.ca


Fri Jun 23, 2023 4:09 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Breaking down the ifetch some more, I created a new module to track micro_ir, the instruction register value of the macro instruction. Still not having much luck with post-synthesis results. The PC register gets loaded correctly on reset, but the first transition is to an invalid address. Where the invalid address comes from is a mystery. Next I tried removing the PC update task and putting the code inline. Issues with the implementation of tasks have cropped up before, so there may be a hidden coding issue there.

_________________
Robert Finch http://www.finitron.ca


Sat Jun 24, 2023 3:59 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Finally figured out what was causing the PC to transition to an invalid address. The excmiss signal, which indicates that exception processing is to take place, was active all the time. This caused the PC to be read from the kernel vector register, which was not initialized yet. The excmiss signal was not active during simulation, so simulation and synthesis differed: simulation defaulted it low, and synthesis defaulted it high. Also, the oddball commit was always active, causing exception processing to take place even for invalid queue entries. With a couple of fixes, LEDs are lighting up.
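Both fixes come down to giving excmiss a defined reset value and qualifying it with the queue entry's valid bit. A hypothetical Python model of that gating (the field names are illustrative, not the core's):

```python
from dataclasses import dataclass


@dataclass
class QueueEntry:
    valid: bool        # entry holds a real instruction
    exception: bool    # instruction raised an exception


class ExcMiss:
    """Hypothetical excmiss register with an explicit reset value."""

    def __init__(self):
        self.q = None  # unknown before the first reset, like an uninitialized flop

    def clock(self, rst, head: QueueEntry):
        if rst:
            self.q = False   # defined value; do not rely on tool defaults
        else:
            # only a valid entry may trigger exception processing
            self.q = head.valid and head.exception


r = ExcMiss()
r.clock(rst=True, head=QueueEntry(valid=True, exception=True))
print(r.q)   # False: reset wins
r.clock(rst=False, head=QueueEntry(valid=False, exception=True))
print(r.q)   # False: invalid entries are ignored
r.clock(rst=False, head=QueueEntry(valid=True, exception=True))
print(r.q)   # True
```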

Milestone: LEDs lighting up on the FPGA. Reset got far enough to light the LEDs, which means the stack pointer and the PC had to be loaded from vectors in memory. The micro-coded reset routine worked. It crashed a few instructions later. Next will be to see if screen output is possible.

Goal is to get software loaded via the serial port.

_________________
Robert Finch http://www.finitron.ca


Sun Jun 25, 2023 1:59 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Modified the core so that stores generally do not wait for an ack. Waiting for an ack is now an option only for indexed stores. This greatly improves performance, turning many stores into single-cycle operations.

In FPGA testing, every other byte of the screen was cleared. In the 128-to-64-bit bridge, address bit 3 was not being recreated, which meant only 128-bit aligned addresses were recognized and sent to the text controller.
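The bridge's job can be sketched in Python, assuming the 128-bit side presents a 128-bit aligned address plus 16 byte-lane selects (the function and argument names are hypothetical). Which half of the byte selects is active determines address bit 3 on the 64-bit side; without that step every request lands on the 128-bit aligned word.

```python
MASK64 = (1 << 64) - 1


def bridge_128_to_64(adr128, sel16, dat128):
    """Split one 128-bit bus request into 64-bit requests.

    adr128: byte address with bits [3:0] zero (128-bit aligned).
    sel16:  16 byte-lane selects from the 128-bit side.
    Returns a list of (adr64, sel8, dat64) for the 64-bit side.
    """
    out = []
    for half in (0, 1):
        sel8 = (sel16 >> (8 * half)) & 0xFF
        if sel8:
            adr64 = adr128 | (half << 3)          # recreate address bit 3
            dat64 = (dat128 >> (64 * half)) & MASK64
            out.append((adr64, sel8, dat64))
    return out


# A store to the upper 64 bits must reach address ...8, not ...0.
print([(hex(a), hex(s)) for a, s, _ in bridge_128_to_64(0x1000, 0xFF00, 0)])
# [('0x1008', '0xff')]
```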

_________________
Robert Finch http://www.finitron.ca


Mon Jun 26, 2023 4:20 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Milestone: Got the first 30 chars of the text screen to clear before a crash. To the correct colors even.

Some of the timing in the scratchpad / ROM memory needed to be adjusted.
The core is working much faster now. It has reached an IPC of 0.56 in a tight memory store loop, or about 1.8 clocks per instruction: 4256 instructions done in 7565 clock cycles. That time includes the startup time and cache loads, so it should improve with longer runs.

_________________
Robert Finch http://www.finitron.ca


Wed Jun 28, 2023 5:05 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 593
Any stats on average clocks for data, stack and program memory access?


Wed Jun 28, 2023 6:41 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Any stats on average clocks for data, stack and program memory access?
Program memory access is easy: two instructions per clock are fetched, assuming the instructions are in the cache. Other stats are harder to come by, and it is a little early yet for measurements. But stores should be taking slightly more than 1 clock per store. Loads are much slower because they need to wait for an okay signal before proceeding. I have been measuring things using the marker bars in the simulator. For instance, 2700 ns for 10 iterations of a loop is 270 ns per iteration; at 50 ns per clock, that is 5.4 clocks to execute the loop. The branchback signal pulses once for every loop iteration, and the pulses look to be about 5 clocks apart. I think there are four instructions in the loop.
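The marker-bar arithmetic, written out as a small Python helper. The IPC line at the end only holds if the loop really is four instructions, as guessed above.

```python
def clocks_per_iteration(total_ns, iterations, clock_ns):
    """Convert a marker-bar measurement into clocks per loop iteration."""
    return total_ns / iterations / clock_ns


cpi_loop = clocks_per_iteration(2700, 10, 50)
print(cpi_loop)                  # 5.4
print(1000 / 50)                 # 20.0, i.e. a 20 MHz clock
print(round(4 / cpi_loop, 2))    # 0.74 IPC if the loop is four instructions
```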

Here is a sample of what I am looking at. There are several screens of bus signals.
Attachment: measure.png (timing measurements)


Access time depends on the device being accessed. The bootrom, scratchpad and text display are fast. DRAM is slower.

_________________
Robert Finch http://www.finitron.ca


Thu Jun 29, 2023 5:17 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Forgot to include code for the shift instructions again. It seems like whenever I work on a machine, I always forget the shift instructions. Including full barrel shifters for shifts and rotates added a chunk of logic to the core. Now considering limiting the shift amount, to 0 to 7 for instance.

The subtract instruction was not encoded correctly by the assembler, leading to infinite loops.

Figured out why only the first 30 chars of the screen were updated. The TLB entry for the screen address was being updated unexpectedly, which changed the physical address so it no longer pointed to the screen. Instead, it pointed to the boot ROM, causing the boot ROM to be overwritten.

Milestone: full clearscreen. Blue screen now visible.
Milestone: first subroutine call performed successfully.

First trial of serial port access: no indication that it works at all. Not even garbage characters coming through.

A postfix instruction was not marked as done when queued, causing the machine to hang. I am trying to figure out why this particular instruction was not marked as done, since postfixes are marked done as soon as they queue. As a stopgap, postfix instructions were added as ALU operations that do not produce a result. That way, if one misses getting marked done, it is executed with no effect and will not hang the machine.

A long sequence of instructions where every other instruction was a postfix caused a postfix fault. This fault is supposed to happen only for a long sequence of postfixes. With intervening instructions it should not happen. The issue was with queuing of instruction pairs.
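The intended rule can be modelled as a run-length check over the program-order instruction stream. In this Python sketch the limit of four consecutive postfixes is a made-up number; the point is that any non-postfix instruction resets the run, so how instructions are paired for queuing should not matter.

```python
MAX_POSTFIX_RUN = 4   # hypothetical limit on consecutive postfixes


def postfix_fault(stream, limit=MAX_POSTFIX_RUN):
    """Scan instructions in program order (True = postfix).

    Return the index at which a postfix fault would be raised, or None.
    Any non-postfix instruction resets the run, so the result does not
    depend on how instructions are grouped into pairs for queuing.
    """
    run = 0
    for i, is_postfix in enumerate(stream):
        run = run + 1 if is_postfix else 0
        if run > limit:
            return i
    return None


print(postfix_fault([False, True] * 50))   # None: alternating is legal
print(postfix_fault([True] * 6))           # 4: fifth postfix in a row
```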

The register file valid flags were not always being updated correctly. This had to do with two different sizes being used for the source id, which caused the core to hang waiting for an update from an invalid source.
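A Python sketch of how mismatched widths break the source-id compare. The 5-bit id width is an assumption; any mismatch shows the same effect, because the truncated side can never equal the full-width tag on the result bus, so the valid flag is never set.

```python
TAG_BITS = 5   # hypothetical: 32 queue entries need a 5-bit source id


def id_matches(reg_src_id, bus_id, reg_bits=TAG_BITS, bus_bits=TAG_BITS):
    """Compare the source id latched for a register against a result-bus id.

    Each side is truncated to its own declared width, as an HDL assignment
    to a narrower signal would do.
    """
    return (reg_src_id & ((1 << reg_bits) - 1)) == (bus_id & ((1 << bus_bits) - 1))


print(id_matches(20, 20))               # True: widths agree
print(id_matches(20, 20, reg_bits=4))   # False: 20 truncated to 4, never matches
```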

_________________
Robert Finch http://www.finitron.ca


Thu Jun 29, 2023 5:19 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Completely re-wrote the data cache controller which also performs non-cached access.

Simplified the logic for an ack in the main module. The dcache controller now always returns an ack pulse to indicate it is ready for new data. Previously it did not ack stores; however, it turned out that some stores went missing because the cache controller could not keep up with the CPU's requests. Between the cache controller and the CPU there are always acks; between the cache controller and memory there may not be acks for stores.
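Why the missing ack loses stores can be seen in a one-request-per-clock Python model. The busy time per request is a made-up parameter, and the model ignores everything except the handshake.

```python
def accepted_requests(requests, busy_cycles, wait_for_ack):
    """Hypothetical one-request-per-clock model of a CPU talking to a cache
    controller that needs `busy_cycles` clocks per request.

    With wait_for_ack the CPU holds a request until the controller acks it;
    without it the CPU moves on every clock and requests made while the
    controller is busy are lost.
    """
    accepted = []
    pending = list(requests)
    busy = 0
    while pending:
        if busy == 0:
            accepted.append(pending.pop(0))   # controller acks this request
            busy = busy_cycles
        elif not wait_for_ack:
            pending.pop(0)                    # CPU did not wait: request dropped
        busy = max(0, busy - 1)               # one clock elapses
    return accepted


stores = ["st0", "st1", "st2", "st3"]
print(accepted_requests(stores, busy_cycles=2, wait_for_ack=True))
# ['st0', 'st1', 'st2', 'st3']
print(accepted_requests(stores, busy_cycles=2, wait_for_ack=False))
# ['st0', 'st2']
```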

For some reason simplifying the logic increased its size. With all the fixes the FPGA is sitting at 89% full.

I declared a task variable called ‘ndx’. Unfortunately, there was also a module variable called ‘ndx’ and the task variable did not hide the module variable. This caused all sorts of issues with the dcache controller.

The return-and-deallocate, RTD, instruction was using the pc instead of the stack pointer for stack pointer updates. This led to a corrupt stack pointer.
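A Python sketch of return-and-deallocate, assuming 68k-like semantics with an 8-byte return address: pop the return address, then release n more bytes of arguments. The operand size and stack layout are assumptions; the buggy variant shows the reported mistake of basing the SP update on the PC.

```python
WORD = 8   # assumed: 64-bit return address on the stack


def rtd(pc, sp, mem, n):
    """Return-and-deallocate: pop the return address, then release n more
    bytes of arguments. Returns the new (pc, sp); the old pc is unused."""
    ret = mem[sp]
    return ret, sp + WORD + n       # correct: update relative to SP


def rtd_buggy(pc, sp, mem, n):
    """The reported bug: the SP update was computed from the PC instead."""
    ret = mem[sp]
    return ret, pc + WORD + n


mem = {0x1000: 0x4000}               # return address at the top of stack
print([hex(v) for v in rtd(0x2000, 0x1000, mem, 16)])        # ['0x4000', '0x1018']
print([hex(v) for v in rtd_buggy(0x2000, 0x1000, mem, 16)])  # ['0x4000', '0x2018']
```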

Multiply and divide were not fully implemented causing a hang when a multiply instruction was encountered.

_________________
Robert Finch http://www.finitron.ca


Sat Jul 01, 2023 6:42 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 593
Rather than shift by 0, would it be better to byte swap for little and big endian data?


Sat Jul 01, 2023 3:16 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Rather than shift by 0, would it be better to byte swap for little and big endian data?
I thought about shifting by 1 to 8, with the zero code meaning 8, but it would require more muxes than just shifting by 0 to 7.
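One way to see the extra muxes is with a logarithmic shifter model in Python. This illustrates the general technique and is not necessarily how the core's shifter is built: a 3-bit amount drives three stages (shift by 1, 2 and 4), while a 1 to 8 encoding needs a zero detect plus a fourth stage for the shift by 8.

```python
def decode_0_to_7(field):
    """3-bit field used directly as the shift amount."""
    return field & 7


def decode_1_to_8(field):
    """3-bit field where code 0 means a shift of 8 (needs a zero detect)."""
    f = field & 7
    return 8 if f == 0 else f


def log_shift_left(x, amount, stages=3, width=64):
    """Logarithmic shifter: stage k shifts by 2**k when bit k of amount is set.
    Three stages cover 0..7; a shift of 8 needs a fourth stage (more muxes)."""
    mask = (1 << width) - 1
    for k in range(stages):
        if (amount >> k) & 1:
            x = (x << (1 << k)) & mask
    return x


print(log_shift_left(1, decode_0_to_7(7)))            # 128
print(log_shift_left(1, decode_1_to_8(0), stages=4))  # 256
print(log_shift_left(1, decode_1_to_8(0)))            # 1: 3 stages cannot shift by 8
```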
The byte map, BMAP, instruction can permute bytes, or be used to swap endian.
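A Python sketch of a byte-map permute and its use for an endian swap. The selector format (one source-byte index per destination byte) is an assumption for illustration, not BMAP's actual encoding.

```python
def bmap(value, selectors):
    """Hypothetical byte-map permute of a 64-bit value.

    selectors[i] is the index (0..7) of the source byte copied into
    destination byte i; byte 0 is the least significant.
    """
    out = 0
    for i, s in enumerate(selectors):
        byte = (value >> (8 * s)) & 0xFF
        out |= byte << (8 * i)
    return out


SWAP_ENDIAN = [7, 6, 5, 4, 3, 2, 1, 0]   # reverse all eight bytes
IDENTITY = list(range(8))

v = 0x0102030405060708
print(hex(bmap(v, SWAP_ENDIAN)))   # 0x807060504030201
print(hex(bmap(v, IDENTITY)))      # 0x102030405060708
```

Because each selector is independent, the same mechanism also covers broadcasts and other permutations, not just swaps.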

_________________
Robert Finch http://www.finitron.ca


Sun Jul 02, 2023 3:27 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Spent some time trying to get serial IO happening. Had to load up a demo project to determine which COM port was the one connected to the FPGA board. After identifying the COM port, testing showed the serial port on the FPGA did not work. It turned out the receiver and transmitter were enabled using a defunct bus signal. That was fixed; more testing is pending. Worked on Xmodem routines in the meantime.
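For checking Xmodem routines from the host side, here is a Python sketch of a standard XMODEM block (128 data bytes) with either the original 8-bit checksum or CRC-16/XMODEM. This follows the common XMODEM/XMODEM-CRC conventions; it is not the author's target-side code.

```python
SOH = 0x01        # start of a 128-byte XMODEM block
SUB = 0x1A        # conventional padding byte (Ctrl-Z)


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for b in data:
        crc ^= b << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def xmodem_block(block_num: int, payload: bytes, use_crc: bool = True) -> bytes:
    """Build one XMODEM block: SOH, block number, its complement, 128 data
    bytes, then either a CRC-16 (high byte first) or an 8-bit checksum."""
    if len(payload) > 128:
        raise ValueError("payload must fit in 128 bytes")
    data = payload.ljust(128, bytes([SUB]))
    n = block_num & 0xFF              # block numbers start at 1 and wrap at 256
    header = bytes([SOH, n, 0xFF - n])
    if use_crc:
        crc = crc16_xmodem(data)
        trailer = bytes([crc >> 8, crc & 0xFF])
    else:
        trailer = bytes([sum(data) & 0xFF])
    return header + data + trailer


print(hex(crc16_xmodem(b"123456789")))   # 0x31c3 (standard check value)
blk = xmodem_block(1, b"hello")
print(len(blk), blk[:3])                 # 133 b'\x01\x01\xfe'
```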

The core is hanging on a return to an invalid address during the display of a character. A push or pop operation is not working correctly. This is hit after executing thousands of instructions including pushes and pops. My feeling is that this is going to be one of those hard-to-find bugs as opposed to the immediate and obvious.

The ATOM instruction did not have a functional unit assigned to it causing the core to hang when the instruction was encountered. ATOM is now queued as a NOP. ATOM is special in that it operates immediately at the fetch stage, so by the time it queues it is done.

_________________
Robert Finch http://www.finitron.ca


Sun Jul 02, 2023 3:29 am