


FISA64

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Hi Rob, please could you sketch out the resource utilisation for this dual CPU setup? Pasting a bit of a Xilinx report would be fine - but also knowing which FPGA you're on, and which dev board, would be interesting.
I added an HTML page and a .GIF file with core design summary information to GitHub.
I'm using a nice shiny new Nexys4ddr board from Digilent. It has an Artix7 xc7a100t-1csg324 FPGA on it.

Some stats:

For the entire dual core test system (numerous other devices besides the CPUs):
Number of Slice LUTs
38,801 out of 63,400 (61%)

Number of occupied Slices
12,584 out of 15,850 (79%)

For the core (not the entire system)
Minimum period: 15.423ns (Maximum Frequency: 64.838MHz) *
* This is with the 64x64 multiplier affecting the rating.
The core is actually a fair bit faster (80MHz? IIRC) with the multiplier set up as a multi-cycle operation.
The core is about 12,000 LUTs (19,200 LCs).

However, I had trouble getting it to work at 25MHz with the entire system put together, so it's running at 16.67MHz at the moment.
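
As a quick check of the figures above, the reported maximum frequency is just the reciprocal of the minimum period, and the percentages are used/available. A minimal sketch in C; the helper names `fmax_mhz` and `percent_used` are made up for this illustration and do not come from the Xilinx tools:

```c
#include <assert.h>

/* Convert a minimum clock period in nanoseconds to a maximum frequency in MHz.
   1 / (period in ns) gives GHz, so multiply by 1000 for MHz. */
static double fmax_mhz(double period_ns)
{
    assert(period_ns > 0.0);
    return 1000.0 / period_ns;
}

/* Percentage of a device resource that is in use. */
static double percent_used(long used, long available)
{
    assert(available > 0 && used >= 0);
    return 100.0 * (double)used / (double)available;
}
```

Calling `fmax_mhz(15.423)` gives roughly 64.838, and `percent_used(38801, 63400)` and `percent_used(12584, 15850)` give roughly 61 and 79, matching the numbers in the report.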

_________________
Robert Finch http://www.finitron.ca


Wed Mar 25, 2015 1:16 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The test system is currently getting bus errors with dual cores. It runs for a little bit then hangs with a bus timeout. (The bus error signal and routine themselves work.) I believe the culprit is the memory controller, which I modified to support holding onto the bus so that some operations could be done as atomic memory operations (compare and swap, increment, and a couple of others). I need to write a test bench for the memory controller, I guess.
I wrote a little more code in the interrupt routine and *poof* bus errors resulted.

_________________
Robert Finch http://www.finitron.ca


Wed Mar 25, 2015 1:29 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The test system is working much better at the moment, so I took a snapshot of the bitfile and saved it on GitHub in case anyone would like to try it on their Nexys4ddr board.

The assembler's also been updated and now supports a "message" directive which helps identify problems where the assembler hangs up because it finds something it can't assemble. It really needs better error checking.

Load word and reserve (LWAR) and store word clear reservation (SWCR) instructions have been added to the instruction set. It occurred to me that the memory system might not allow for atomic updates, so then what does one do to implement semaphores? These types of instructions were developed for RISC processors which didn't support complex read-modify-write instructions. LWAR and SWCR are similar to the load-linked and store-conditional operations on a MIPS processor. They are also found on a PowerPC processor. In the case of this system, the memory system itself is what supports the address reservations. All the load and store instructions do is drive an output port bit high from the processor while the instruction is running. It's up to the memory system to make use of them.
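
To make the idea of memory-side reservations concrete, here is a small software model in C. It is only a sketch under assumptions: the post does not describe how the FISA64 memory controller tracks reservations, so the one-reservation-per-CPU scheme, the structure `MemSystem`, and the functions `mem_lwar`, `mem_store` and `mem_swcr` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CPUS  2    /* assumption: matches the dual-core test system */
#define MEM_WORDS 16   /* tiny memory, for illustration only */

typedef struct {
    uint64_t mem[MEM_WORDS];
    bool     reserved[NUM_CPUS];   /* does this CPU hold a reservation? */
    unsigned resv_addr[NUM_CPUS];  /* word index the reservation covers */
} MemSystem;

/* A write to a word cancels every reservation held on that word. */
static void clear_reservations(MemSystem *m, unsigned addr)
{
    for (int c = 0; c < NUM_CPUS; c++)
        if (m->reserved[c] && m->resv_addr[c] == addr)
            m->reserved[c] = false;
}

/* LWAR-like access: return the word and record a reservation for this CPU. */
static uint64_t mem_lwar(MemSystem *m, int cpu, unsigned addr)
{
    assert(cpu >= 0 && cpu < NUM_CPUS && addr < MEM_WORDS);
    m->reserved[cpu] = true;
    m->resv_addr[cpu] = addr;
    return m->mem[addr];
}

/* Ordinary store: always writes, and breaks reservations on that word. */
static void mem_store(MemSystem *m, unsigned addr, uint64_t value)
{
    assert(addr < MEM_WORDS);
    m->mem[addr] = value;
    clear_reservations(m, addr);
}

/* SWCR-like access: write only if this CPU's reservation on addr survived.
   Returns true on success; the reservation is cleared either way. */
static bool mem_swcr(MemSystem *m, int cpu, unsigned addr, uint64_t value)
{
    assert(cpu >= 0 && cpu < NUM_CPUS && addr < MEM_WORDS);
    bool ok = m->reserved[cpu] && m->resv_addr[cpu] == addr;
    m->reserved[cpu] = false;
    if (ok) {
        m->mem[addr] = value;
        clear_reservations(m, addr);
    }
    return ok;
}
```

In this model, if both CPUs do `mem_lwar` on the same word, only the first `mem_swcr` succeeds; the second one fails because the first store cancelled its reservation. That is the property a semaphore needs.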

_________________
Robert Finch http://www.finitron.ca


Wed Mar 25, 2015 7:20 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Thanks for the resource summary Rob - I see the core alone is 4x bigger than an LX9, so this is a serious core indeed!


Wed Mar 25, 2015 7:26 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Thanks for the resource summary Rob - I see the core alone is 4x bigger than an LX9, so this is a serious core indeed!

Well, I wasn't shooting for small size. It's at least 2-3 times larger than a minimalist 64 bit RISC processor would be, and another factor of two or more larger than a 32 bit RISC. The core does a lot of things, some to enhance performance, like indirect memory jumps and branch prediction. It has somewhat superfluous operations like bitfield instructions and rotates. It also detects things like privilege violations and bus errors. Despite its large size, it still easily fits into the new FPGA. More features yet, like floating point, could be added. Something like the RISC-V core with its minimal integer instruction set would probably be much smaller (and faster). The FISA64 core probably loses 20-30% in clock cycle time because of superfluous instructions. I just happen to like them for programming, that's all. For instance, the RTS instruction isn't really necessary; it can be done with a combination of LW, JAL and ADD instructions. The core also supports unaligned memory accesses, and *poof*, the bus interfacing doubles in size. I like the unaligned memory access support for structure packing in software.
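
To illustrate why unaligned access costs extra bus logic, here is a rough software analogy in C: a 64-bit read at an arbitrary byte address may straddle two aligned words, so two aligned accesses have to be made and stitched together. The function name `load64_unaligned` and the little-endian byte order are assumptions for this sketch; the post does not say how the FISA64 bus interface does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 64-bit little-endian value at any byte address from a memory that
   can only be read as aligned 64-bit words. */
static uint64_t load64_unaligned(const uint64_t *words, size_t nwords,
                                 size_t byte_addr)
{
    size_t   idx   = byte_addr / 8;                 /* first aligned word */
    unsigned shift = (unsigned)(byte_addr % 8) * 8; /* bit offset within it */

    assert(idx < nwords);
    uint64_t lo = words[idx];
    if (shift == 0)
        return lo;                                  /* aligned: one access */

    assert(idx + 1 < nwords);
    uint64_t hi = words[idx + 1];                   /* second access */
    return (lo >> shift) | (hi << (64 - shift));
}
```

With the two words `0x0807060504030201` and `0x100F0E0D0C0B0A09`, a read at byte address 3 returns `0x0B0A090807060504`, built from the top five bytes of the first word and the bottom three of the second.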

_________________
Robert Finch http://www.finitron.ca


Wed Mar 25, 2015 11:31 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Hi Rob, no slight intended! You have a serious FPGA and a serious CPU to go in it.


Thu Mar 26, 2015 10:03 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The following bit of code in an interrupt routine fails. The code attempts to set a semaphore before further entry into the interrupt code. However, it always takes the path that updates missed_ticks rather than flowing into the interrupt routine. I believe the problem is either the update of the control register bit, or the readback of that bit by the mfspr instruction. In the current system it should always be able to set the semaphore.
Code:
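   lwar    r1,tcb_sema          ; load the semaphore and set a reservation
   bne     r1,.0006             ; non-zero: already held, count a missed tick
   swcr    tr,tcb_sema          ; conditional store of tr; clears the reservation
   mfspr   r1,cr0
   and     r1,r1,#$1000000000   ; bit 36 indicates successful swcr
   bne     r1,.0005             ; swcr succeeded: enter the interrupt routine
.0006:
   inc     missed_ticks
   pop     r2
   pop     r1
   rti
.0005: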
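
For comparison, the intent of the sequence above can be expressed with C11 atomics. This is only an analogy, not FISA64 code: `atomic_compare_exchange_strong` stands in for the LWAR/SWCR pair, the function names `tick_enter` and `tick_leave` are invented here, and the convention that zero means "free" is inferred from the `bne r1,.0006` test.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int tcb_sema;      /* 0 = free, non-zero = id of the holder */
static atomic_int missed_ticks;  /* ticks skipped because the semaphore was held */

/* Try to claim the semaphore on entry to the tick interrupt.
   Returns true if the caller may run the body of the routine;
   otherwise a missed tick is recorded and the caller should return. */
static bool tick_enter(int owner_id)
{
    assert(owner_id != 0);       /* 0 is reserved to mean "free" */
    int expected = 0;
    if (atomic_compare_exchange_strong(&tcb_sema, &expected, owner_id))
        return true;
    atomic_fetch_add(&missed_ticks, 1);
    return false;
}

/* Release the semaphore at the end of the routine. */
static void tick_leave(void)
{
    atomic_store(&tcb_sema, 0);
}
```

A handler would call `tick_enter` first, return immediately if it yields false, and call `tick_leave` when done. With a single caller and a free semaphore, `tick_enter` always succeeds, which is the behaviour expected of the assembly version.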

FISA64 is turning into more of a software project now. Most of the integer instruction set has been determined.

_________________
Robert Finch http://www.finitron.ca


Fri Mar 27, 2015 7:27 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Well, I found about half a dozen bugs in both hardware and software within about five minutes of my last post. I forgot to flag the LWAR / SWCR instructions as multi-cycle operations, which caused the processor to work incorrectly. I also forgot to gate the LWAR result onto the databus. It's all fixed now, and the code seems to work as it should. At least it runs the larger portion of the interrupt routine now.
Today work was done getting FMTK (Finitron Multi-Tasking Kernel) to run in the test system as built into the boot ROM. It's not really necessary, as it's possible to bootstrap from the serial port. I hope to get something like an FTP client running soon.
I gave some more thought as to how to integrate an external debug facility onto the processor. I can't see how to single step the processor other than to single step the clock. Since the processor has an overlapped pipeline, multiple instructions are affected at the same time by a clock pulse. Stopping the clock in the middle of a multi-cycle operation could result in a bus error timeout. There's lots to consider anyways.

_________________
Robert Finch http://www.finitron.ca


Sat Mar 28, 2015 12:45 am

Joined: Fri Jan 10, 2014 9:46 pm
Posts: 37
[removed due to user inquiry]


Sat Mar 28, 2015 11:58 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
i think you are talking about this project, i can't see sources.

I placed it under the "Bootrom" source directory for FISA64: http://github.com/robfinch/Cores/blob/master/FISA64/trunk/software/bootrom/FMTK.s

Tonight's problem is driving me crazy. The test system quits with a "priv fault" privilege violation error. The problem is that it shouldn't because the processor is currently always in kernel mode which has access to everything. It looks to me like a real hardware problem of some sort. In the processor itself, the HDL is coded so that the processor only ever switches to kernel mode, it never switches to user mode. In order to get to user mode, a software program must clear the kernel mode bit in a register, then execute an interrupt or exception return instruction. This is by design. As far as I can tell this isn't being done by any software. The priv fault does occur on instructions that are protected with a privilege test. So it's not random. Somehow the "km" bit in the processor ends up being reset to zero.
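
The mode-switching rule described above can be written down as a tiny state model, which makes it easier to see what would have to happen for "km" to end up at zero. This is a sketch in C under assumptions: the names `ModeState`, `reg_km` and the functions below are invented for illustration, and the real HDL may hold the bit differently.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool km;      /* effective mode: true = kernel mode */
    bool reg_km;  /* hypothetical software-visible kernel mode bit */
} ModeState;

/* At reset the processor is in kernel mode. */
static void mode_reset(ModeState *s)
{
    assert(s);
    s->km = true;
    s->reg_km = true;
}

/* Hardware only ever switches *to* kernel mode (interrupt or exception). */
static void mode_take_exception(ModeState *s)
{
    assert(s);
    s->km = true;
}

/* Software write to the mode bit; only permitted from kernel mode.
   Returns false to model a privilege fault. */
static bool mode_write_reg_km(ModeState *s, bool value)
{
    assert(s);
    if (!s->km)
        return false;
    s->reg_km = value;
    return true;
}

/* Interrupt or exception return: the register bit becomes the mode. */
static void mode_return(ModeState *s)
{
    assert(s);
    s->km = s->reg_km;
}
```

In this model the only way to leave kernel mode is a `mode_write_reg_km(s, false)` followed by `mode_return(s)`. If no software performs that write, a cleared "km" bit points at something outside the model, such as a hardware fault.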

_________________
Robert Finch http://www.finitron.ca


Sun Mar 29, 2015 12:00 am

Joined: Fri Jan 10, 2014 9:46 pm
Posts: 37
[removed due to user inquiry]


Sun Mar 29, 2015 7:37 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
i see C files and assembly files, no Makefile. Do you have a C compiler for the target processor ?
I see there is a sort of "port" from a 68K C compiler. Does it work ?

There is a C compiler for the target processor. I haven't actually gotten a lot of compiled code to work, so I don't want to dare to say it's working, but it looks that way. I'm currently working on a debugger written in 'C'. The first thing I'm attempting is a disassembler which I haven't got to work yet. Calls to perform I/O aren't working properly. The program can call the BIOS putstring function correctly, but not the putchar function. This is strange because they are both processed in an identical fashion.
Quote:
Also, it's not clear to the kernel design, if you have SMP or AMP. I do not understand how the 2 cores are connected.

Currently the design is asymmetrical in the sense that CPU#1 doesn't have access to I/O. Both CPUs are of the same type and so can potentially run the same code. The design is symmetrical in that sense. The plan is to have I/O done through a system call. Once that happens it will be possible to write software that either core could execute. The implementation of the system call will be different for each processor, but it'll hide the differences.
Each CPU now has its own interrupt controller. The system tick period was doubled, bringing the tick rate down to 30Hz. A separate 30Hz pulse interrupts each CPU. The pulses for each CPU are phase shifted by 180 degrees, so that the CPUs don't both try to process a clock tick at the same time. The net effect is that the Tick routine gets processed at a 60Hz rate (30Hz by each CPU). There's no savings in cache access from having both CPUs process a tick at the same time, as there is no data cache and no shared cache memory. I suppose I could draw a diagram.
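
In place of a diagram, the tick timing can be sketched in a few lines of C. Time is counted in slots of 1/60 s; each CPU ticks every second slot, and a 180 degree phase shift is exactly one slot. The names `tick_slot` and `ticks_in_one_second` are made up for this illustration.

```c
#include <assert.h>

#define TICK_HZ          30                    /* per-CPU tick rate */
#define NUM_CPUS         2
#define SLOTS_PER_SECOND (TICK_HZ * NUM_CPUS)  /* 60 slots of 1/60 s */

/* Slot at which CPU `cpu` receives its n-th tick.
   CPU 0 gets the even slots, CPU 1 the odd ones. */
static unsigned tick_slot(unsigned cpu, unsigned n)
{
    assert(cpu < NUM_CPUS);
    return n * NUM_CPUS + cpu;
}

/* Count how many ticks land in each slot during one second.
   Returns the total number of ticks across both CPUs. */
static unsigned ticks_in_one_second(unsigned hits[SLOTS_PER_SECOND])
{
    unsigned total = 0;
    for (unsigned s = 0; s < SLOTS_PER_SECOND; s++)
        hits[s] = 0;
    for (unsigned cpu = 0; cpu < NUM_CPUS; cpu++) {
        for (unsigned n = 0; n < TICK_HZ; n++) {
            hits[tick_slot(cpu, n)]++;
            total++;
        }
    }
    return total;
}
```

Every one of the 60 slots receives exactly one tick, so the two CPUs never handle a tick at the same instant and the Tick routine runs at a combined 60Hz.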

_________________
Robert Finch http://www.finitron.ca


Sun Mar 29, 2015 4:35 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
putstr2 works correctly, and putchar doesn't appear to work. They both call the BIOS in an identical fashion. It may, however, have something to do with printf(), now that I think about it.

Code:
pascal void putstr2(char *p)
{
    asm {
        push    r6
        lw      r1,24[bp]
        ldi     r6,#15   ; BIOS DisplayString16 function
        sys     #10
        pop     r6
    }
}


Code:
pascal void putch(char ch)
{
    asm {
        push    r6
        lw      r1,24[bp]
        ldi     r6,#5    ; OutChar function
        sys     #10
        ldi     r1,#$1234
        sc      r1,LEDS
        pop     r6
    }
}


In the disassem20 routine the putstr2 works! :D but the printf() doesn't :cry: It just displays the first character 'D' then hangs.
Code:
void disassem20(unsigned int ad)
{
    int nn;
    unsigned int ad1;

    putstr2("Disassem:\r\n");
    printf("Disassem:\r\n");
    getchar();
    for (nn = 0; nn < 20; nn++) {
        disassem(&ad);
    }
}

_________________
Robert Finch http://www.finitron.ca


Sun Mar 29, 2015 4:48 pm

Joined: Fri Jan 10, 2014 9:46 pm
Posts: 37
[removed due to user inquiry]


Sun Mar 29, 2015 4:50 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
which working scheme ? GDB ? any TAP ?

I would like to mimic the GDB user interface, but I've not used GDB before so I might get it wrong. I was looking at the GDB site and there is just a mountain of information to try and absorb. I'd have to admit TAP isn't a high priority for me right now. But I added a debug processor (dbg16) to the design. It's a low performance 16 bit RISC processor. I had the core coded already for another project (Table887). It takes tons of clock cycles to execute instructions, as it doesn't have an overlapped pipeline and the pipeline's about seven or eight stages long. It has access to a ROM, a shared debug memory and a serial port. I need to add a PIO yet so it can control things. I added a TAP interrupt to FISA64 which will work like a breakpoint interrupt. None of this works yet, but there's a good chunk of it coded now. It isn't posted on GitHub yet.

I got the disassembler written in C basically working. There are a couple of bugs in it I don't understand yet. The display of one branch displacement isn't correct and I don't know why. The assembler code looks correct, so I suppose it could be a processor bug yet.

_________________
Robert Finch http://www.finitron.ca


Mon Mar 30, 2015 2:18 pm