


 RTF64 processor 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
To get that kind of output from an instruction set you need non-flag-altering instructions as well as conditional moves or selects, as you seem to have.
Most ALU operations can optionally set the compare results register (flag register). Since a wide instruction encoding is in use there is a bit in the instruction to control this. It is important to have both kinds of instructions (flag setting and non-flag setting).

Loop mode is entered based on a comparison of the low-order 32 bits of the branch target address with addresses in the pipeline. This was done to limit the size of the comparators used to detect a loop, reducing hardware. It means that a loop should not cross a 4GB boundary.

With a little fiddling of what is going on during states, the minimum CPI has been reduced to three when loop mode is active. This makes loop mode a whopping 40% faster than usual.

With the addition of loop mode and a few other fixes, the size budget for the core has been blown.

Something is amiss in the FPGA version. It does not quite get the LEDs lighting. As can be seen from the logic analyzer, it does start up: it burst-loads the I$ and jumps to a subroutine at $400. Eventually it hangs.
Attachment: FPGAStartup.png (RTF64 FPGA Startup)

_________________
Robert Finch http://www.finitron.ca


Sun Nov 08, 2020 4:44 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Found a ‘C’ compiler test suite on the web and ran the test programs through the compiler. Needless to say, many, many errors surfaced. A lot of the errors were due to things I had not planned on supporting, but I guess are now a part of the C standard. For instance, variable declarations in the middle of code: this is now supported; previously vars had to be declared at the top of a function. Others include the long long type, arbitrary statements within a switch statement, and nameless structures. And the list goes on. One can do a lot of nonsensical things in C that are legal for the compiler but are almost never seen in real code.

I fixed up the preprocessor as well. Previously it did not support variable argument lists for macros, and it now does. The basic preprocessor was written over 25 years ago when memory was much more limited. As a result, it croaked on some gigantic macro expansions in the test programs. It previously had a limit of something like 5,000 chars; I bumped it up to 100,000. There was a macro that expanded out to about 20,000 chars in the test program. IMHO this is ridiculous.

An example of what needs to be updated in the compiler. For the following code the compiler outputs three times the required storage because it sees x declared three times. One probably would not write code like this on purpose.
Code:
int x, x = 3, x;

int
main()
{
   if (x != 3)
      return 0;

   x = 0;
   return x;
}

Fixing it requires complicating the compiler to record the generated storage location and reuse it.
The compiler compiles about 2/3 of the test suite without errors. There are about 220 "programs" in the test suite.

_________________
Robert Finch http://www.finitron.ca


Tue Nov 10, 2020 4:17 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1783
Well I for one am surprised that first line is valid.

Personally, I'd be happy with an old-fashioned C compiler - C89 might be the thing - but then of course I'd need a testsuite for that particular version. It might be interesting to try a modern compiler on the testsuite with a -std=c89 or -ansi, and only match the passes and fails that you get from that.

But good on you for tackling the testsuite as you find it.


Tue Nov 10, 2020 7:14 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
Hi Rob,

Could you please provide the link to that 'C compiler test suite'? I would be interested to have a look at it.

Thanks


Tue Nov 10, 2020 7:22 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 593
BigEd wrote:
Well I for one am surprised that first line is valid.

Personally, I'd be happy with an old-fashioned C compiler - C89 might be the thing - but then of course I'd need a testsuite for that particular version. It might be interesting to try a modern compiler on the testsuite with a -std=c89 or -ansi, and only match the passes and fails that you get from that.

But good on you for tackling the testsuite as you find it.


Will modern hardware let you run that thing? I got bitten by Windows
when I lost a bunch of software after being forced to use a 64-bit OS
(mostly because the help files did not work).


Another question: are the libraries thread safe? Does kernel memory locking work?
As the CPU gets fewer bugs, I suspect multi-tasking will be the next debugging project.
Ben.


Tue Nov 10, 2020 8:06 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Here is the link
https://github.com/c-testsuite/c-testsuite

I believe it is C11 testing which is a bit newer.

Quote:
Personally, I'd be happy with an old-fashioned C compiler - C89 might be the thing -

I would be happy if it is easy to port real software, regardless of whether the compiler is 100% C compatible. I wanna play Doom on my own cpu. Most of the issues with compatibility would not be encountered with most software. The CC64 compiler already has several extensions like bit selects that are not a part of C.

_________________
Robert Finch http://www.finitron.ca


Wed Nov 11, 2020 3:44 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The LEDs are now blinking and the screen clears! Running in the FPGA.
Now to update the software.

Running @57MHz with a measured CPI of about 8.5. It's got to be at least 3-4 or more times faster than an 8MHz 68000.
Wondering if it will be able to run Doom.

_________________
Robert Finch http://www.finitron.ca


Wed Nov 11, 2020 7:28 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1783
robfinch wrote:
Here is the link
https://github.com/c-testsuite/c-testsuite

I believe it is C11 testing which is a bit newer.

I had a quick look: it looks like each test is tagged, so if you want to run the subset that's C89, or C99, you can. That's handy. (I also notice that clang passes all the tests, although other compilers have some fails, so that tells me that some non-trivial things are being tested.)


Wed Nov 11, 2020 8:45 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1783
robfinch wrote:
The LEDs are now blinking and screen clears ! Running in the FPGA.
...
Running @57MHz ...

Well done! That's certainly a milestone.


Wed Nov 11, 2020 8:46 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
It is always tiny steps that get things working.

The system works a tiny bit further now. Clears the screen and displays a single letter ‘A’ on-screen. It is supposed to display three ‘A’s. This is after a software fix in the assembler. The assembler was only outputting six of eight bytes for eight-byte instructions. The two high order bytes were always zero. This messed up some of the larger constants.

The delay routine had to be modified for 16x as long a delay. It was only counting to 300,000; that has been changed to 3,000,000 now. It completes in about 2 seconds. Completing the routine shows the processor can successfully call a routine and return from it. Some part of the stack memory must be working then, along with the call and ret instructions. This is working with pipeline operation mode; all the instructions for the loop fit in the pipe, so there are no instruction fetches taking place.

Code:
FFFFFFFFFFFC0400 09 15 00 1B                       ldi         $a1,#3000000
FFFFFFFFFFFC0404 38 F5 16 00                 
FFFFFFFFFFFC0408 00 00 00 00                 
                           .0001:
FFFFFFFFFFFC040C 0C B6 42 09                       lsr         $a2,$a1,#16
FFFFFFFFFFFC0410 09 02 00 18                       stb         $a2,LEDS
FFFFFFFFFFFC0414 38 02 EE 7F                 
FFFFFFFFFFFC0418 FF FF FF FF                 
FFFFFFFFFFFC041C B8 16 00 01                 
FFFFFFFFFFFC0420 30 B5 FE                          sub.      $a1,$a1,#1
FFFFFFFFFFFC0423 47 A4 FF                          bne         .0001
FFFFFFFFFFFC0426 43 00                             rts



Forgot to implement the MVSEG instruction in the processor.

_________________
Robert Finch http://www.finitron.ca


Thu Nov 12, 2020 1:48 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The cause of the system hanging in the FPGA has been determined to be software. A number of subroutines did not get included in the build and that caused undefined addresses to be used, resulting in a hang.

Several software glitches have been corrected. For some reason, when VS17 was uninstalled, nmake, which was being used to build the system, no longer worked the same. nmake commands were embedded in the makefiles themselves, and when run returned an ‘nmake not found’ error. This had worked previously. So, assuming some difference in the system, I decided to spend some time looking at cmake again. After being unable to get cmake to build using a cross toolset, the nmake issue was examined more closely. It turns out that using $(MAKE) instead of the actual filename nmake in a makefile is the better practice, especially since it got things working again. The issue was outdated directory paths: a newer version of VS was recently installed, outdating the directory paths set up in the makefiles.

Adding pipeline loop mode broke trace, or did it? Trace traces instructions from the IFETCH2 state, which is never entered while in loop mode. I am not sure if this is a good or bad thing. It means not being able to trace through short loops; however, that may be a desired behaviour. It is kind of pointless to validate every loop iteration if the loop is known to be working. Some loops contain thousands or millions of instructions. It is probably better if they are not fully traced. It is probably sufficient to know that the loop ran and terminated successfully. In any case, skipping over loops after the first iteration is a feature at the moment.

_________________
Robert Finch http://www.finitron.ca


Fri Nov 13, 2020 5:41 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Latest fixes: the call and jump instructions implemented only the low order 22 bits. The instruction format supports 30 bits. The return instruction RET implemented the stack adjustment incorrectly. This led to the RET working, but the next stack operation hung the processor at an invalid address. On flow control ops the pipeline was being NOPped out but the register file was still being updated. This was switched to a single valid bit for each pipeline stage which controls the register updates. Memory access was triggered when a call instruction was decoded even though the pipeline stage was invalid. This caused the pipeline to hang.

Scraped the dust off the ole PTI component. PTI standing for Parallel Transfer Interface, which is a parallel port style interface driven by the USB port of the FPGA board. I had not used it for a while because of lack of room in the FPGA. But it allows transferring information. With a little bit more room in the FPGA I am going to try and get it working. It should make software updates and debugging easier. The PTI transfers data in a streamed fashion using DMA.

I built a little utility making use of the vendor’s APIs that can send data from the host workstation through the PTI interface. Previously the interface could not be made to work very well because it was too fast for PIO. But since the last try the interface has been rewritten to use DMA.

On the software side, source code linking does not seem to work very well; it is missing subroutines. This has got me thinking about implementing a real linker and librarian.

_________________
Robert Finch http://www.finitron.ca


Sat Nov 14, 2020 5:21 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Latest fixes: flags results were not being calculated soon enough. This caused loops to loop one extra time. Bypassing did not check if values were being written to the register file. This caused false bypass operations to occur.

After noting that the signals advanced by the pipeline depend only on the pipeline advance signal and not on the state, the signal advancement was moved outside of the state case. This should save a logic gate or two for every signal and reduce the propagation delays.

Well, after updates to the core it now loops forever blinking LEDs instead of stopping after two seconds. It seems to work in simulation. An attempt will be made to run the system at a reduced clock rate to see if that helps.

_________________
Robert Finch http://www.finitron.ca


Sun Nov 15, 2020 4:06 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Got the three 'A's to appear onscreen as expected.

Latest fixes: the Rs1 selection for the gcsub instruction had mixed up blocking versus non-blocking assigns due to a typo. This caused results forwarding to fail. Rs1 was not being set for the stack short forms of load and store instructions. This led to address zero being the target all the time.

Branch validation has moved from the m-stage to the e-stage, one stage sooner. This reduces the number of wasted cycles for every mis-predicted branch.

Some work was done to reduce the number of clock cycles required for execution of stages. The register fetch stage was reduced to two cycles from three. Normally it takes two clock cycles to read the register file RAM; however, by using an extra bypass mux one of the register file's clock cycles can be ignored. The same trick could be applied again, making the register fetch stage single cycle if needed. The EX stage was reduced to two cycles from three in cases where the result flags do not need to be updated. The EX stage had been split into three cycles to allow the result flags to be calculated after the result is calculated.

The goal now is a minimum CPI of two in pipeline loop mode.

_________________
Robert Finch http://www.finitron.ca


Mon Nov 16, 2020 6:10 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Last Milestone: four ‘A’s on screen

Latest fixes: result flags were not being updated for load operations. This resulted in improper loop termination in the PutString() function. Several functions were not adjusting the stack pointer before returning; the result was returns to invalid addresses. In the SoC the PTI controller’s ack line was tied low, causing the system to hang when the PTI controller was accessed. It had been tied low as the PTI controller was not present in an earlier version of the system. The debug printchar routine was not incrementing the cursor position, causing all the characters to be piled up in one corner of the screen. Several routines were found not to have the minimum alignment of four for target call addresses.

Ran into an interesting pipeline loop mode bug. There was a pipeline bubble inserted after a load instruction. The instruction in the bubble position, a repeat of the load instruction, was simply marked invalid so that it would not be executed. The bug was that the next time around in loop mode the extra bubble instruction was still there, so there were then two load instructions and another bubble was required. This caused one of the instructions in the loop to be booted out, replaced by a load instruction. Then guess what? On the next loop iteration, the same thing happened again, and another instruction got booted out of the loop; then there were four load instructions. Eventually all that was left in the loop were the load instructions and a branch back to the load instruction. This caused an infinite loop as the loop counter no longer changed, having been booted out of the loop. The solution was simple: convert the instruction in the bubble position to a NOP. That way the next time through the loop it is harmless.

_________________
Robert Finch http://www.finitron.ca


Tue Nov 17, 2020 5:33 am