AnyCPU http://anycpu.org/forum/ |
DSD7 http://anycpu.org/forum/viewtopic.php?f=23&t=331 |
Page 6 of 7 |
Author: | robfinch [ Fri Mar 03, 2017 5:45 am ] |
Post subject: | Re: DSD7 |
I'm dead in the water. The toolset refuses to run. It keeps telling me it needs to install the Visual C++ 12.0 redistributable. I've tried repairing, uninstalling, installing, and rebooting, with no luck. Evidently some variable or key setting is corrupt. This is too much trouble for what amounts to a toy, so I'm going to move on to a different hobby for now. Attachment: vc2012err.png |
Author: | BigEd [ Fri Mar 03, 2017 9:02 am ] |
Post subject: | Re: DSD7 |
:unhappy face: :boggle eyes: :roll eyes: :stick of dynamite: |
Author: | robfinch [ Tue May 02, 2017 4:41 am ] |
Post subject: | Re: DSD7 - DSD9 |
I've managed to get past the redistributable error by uninstalling additional versions of the toolset and reinstalling. The One Page Computer challenge has actually spurred me to work on DSD9/DSD7 again. The code has been ported to a working FPGA board. I'm currently waiting for the design to route, which might take up to 4.5 hours. I really want to get to the point where software can be run so I can experiment with mobile programs. |
Author: | BigEd [ Tue May 02, 2017 6:16 am ] |
Post subject: | Re: DSD7 |
Glad to hear you're back in the saddle. |
Author: | robfinch [ Fri May 05, 2017 3:17 am ] |
Post subject: | Re: DSD7 |
2017/05/04 Added plain JMP and CALL instructions to DSD9. These allow a transfer directly to a 32-bit target address (the upper bits of the PC are set to zero). Previously a jump or call required specifying a register, which was summed with an immediate to form the target address. This allowed for many types of jump instructions, but most of the time the register specified is R0. Got DSD9 to the point where it clears the screen. It had been locking up on an access to DRAM because I renamed the RAM acknowledge signal and left it unconnected when I ported the code, which caused it to default to always inactive. I happen to know that the RET instruction isn't the proper one, so that's probably what's causing a problem now: the RET at the end of the clear-screen routine. |
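The difference between the two ways of forming a jump target can be sketched in C. This is only an illustration of the description above: the function names and the 64-bit PC width are assumptions made for the sketch, not details taken from the DSD9 source.

```c
#include <stdint.h>

/* Hypothetical model of the two target calculations.
 * The 64-bit PC width is an assumption for illustration only. */

/* Older form: the target is a register value plus an immediate.
 * With R0 as the register (value 0) this degenerates to the immediate. */
uint64_t jmp_reg_target(uint64_t reg_value, int64_t imm)
{
    return reg_value + (uint64_t)imm;
}

/* Plain JMP/CALL: a 32-bit target is used directly and the
 * upper bits of the PC are set to zero. */
uint64_t jmp_direct_target(uint32_t target32)
{
    return (uint64_t)target32;   /* zero-extends into the upper bits */
}
```

Since the register is R0 most of the time, the direct form covers the common case without spending an instruction field on a register specifier.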
Author: | Arlet [ Fri May 05, 2017 6:11 pm ] |
Post subject: | Re: DSD7 - DSD9 |
robfinch wrote: The One Page Computer challenge has actually spurred me to work on DSD9/DSD7 again. You're thinking of fitting it in one page ? |
Author: | robfinch [ Sat May 06, 2017 10:08 am ] |
Post subject: | Re: DSD7 |
It might fit on a microfiche page. |
Author: | robfinch [ Mon May 08, 2017 2:30 pm ] |
Post subject: | Re: DSD7 |
2017/05/07 Been working on MMU code for DSD9. DSD9 uses a simple paged MMU where the page tables are held in the MMU itself rather than in main memory. Allocating and freeing MMU pages is made tricky by the use of 4MB pages in addition to 64kB pages. To manage 64kB pages, a FAT-style table of 16-bit ints is used. There are 8192 table entries, corresponding to an address range of 512MB. To manage the 4MB pages, a bitmap of the 64kB pages in use within each 4MB page is maintained in a 128-entry table. There are 64 64kB pages in each 4MB page. For 4MB allocations a simple linear search of the table is performed.

Memory management variables.
Code:
    // There are 1024 pages in each map. In the normal 64k page size that means a max of
    // 64MiB of memory for an app. Since there is 512MB ram in the system that equates
    // to 8192 x 64k pages.
    #define NPAGES 8192
    private __int16 pam[NPAGES];       // page allocation map (links like a DOS FAT)
    // There are 128, 4MB pages in the system. Each 4MB page is composed of 64 64kb pages.
    private int pam4mb[NPAGES/64];     // 4MB page allocation map (bit for each 64k page)
    private __int16 freelist;
    int syspages;                      // number of pages reserved at the start for the system
    int sys_pages_available;           // number of available pages in the system
    int sys_4mbpages_available;

Allocate a 64k page. Also sets a bit for the 4MB page container.
Code:
    pascal int alloc_sys_page()
    {
        int sb, pg4;

        // Free list is empty or corrupt: no page available
        if (freelist < syspages || freelist >= NPAGES)
            return 0xffff;
        sb = freelist;                 // take the page at the head of the free list
        freelist = pam[freelist];      // follow the FAT-style link to the next free page
        sys_pages_available--;
        pg4 = (sb >> 6);               // index of the containing 4MB page
        pam4mb[pg4] |= (1 << (sb & 63));
        return sb;
    }
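The linear search for 4MB allocations mentioned above could look roughly like the following. This is a sketch in standard C rather than the author's code: `find_free_4mb_page` and `mark_64k_page` are hypothetical names, and the bitmap is declared as `uint64_t` (an assumption) so that shifting by up to 63 bits is well defined.

```c
#include <stdint.h>

#define NPAGES     8192            /* 64kB pages in 512MB */
#define N4MBPAGES  (NPAGES / 64)   /* 128 4MB pages */

/* One bit per 64kB page; a zero word means the whole 4MB page is free. */
static uint64_t pam4mb[N4MBPAGES];

/* Hypothetical helper: mark 64kB page 'sb' as in use in the 4MB bitmap. */
void mark_64k_page(int sb)
{
    pam4mb[sb >> 6] |= (uint64_t)1 << (sb & 63);
}

/* Hypothetical helper: linear search for a 4MB page with no 64kB pages
 * in use, starting at index 'first4mb'. Returns the index, or -1 if
 * every remaining 4MB page has at least one 64kB page allocated. */
int find_free_4mb_page(int first4mb)
{
    for (int pg4 = first4mb; pg4 < N4MBPAGES; pg4++) {
        if (pam4mb[pg4] == 0)
            return pg4;
    }
    return -1;
}
```

A 4MB page is only usable when all 64 of its bits are clear, which is why a single 64kB allocation is enough to take the whole 4MB container out of the candidate list.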
Author: | robfinch [ Thu May 11, 2017 10:11 am ] |
Post subject: | Re: DSD7 - DSD9 |
Movement on the DSD9 front! The system is now able to get to the BIOS main menu after executing a couple hundred lines of assembler code. Impressive, as it's executing instructions from an L1/L2 cache combo. Pressing a button does flow into the correct switch case, but after that the system hangs. For instance, selecting the ramtest button causes the "RAM test" title to display along with the first address. I noticed I have interrupts enabled while executing BIOSMain(), so I'm going to try disabling them as they are not tested yet. The core is probably about 7,000 slices. It contains an 80-bit datapath with a plethora of instructions. (When built with other SoC components the total system size is about 9,200 slices.)
Code:
    void BIOSMain()
    {
        float pi = 3.1415926535897932384626;
        float a,b;
        int btn;
        int seln=0;

        DBGAttr = 0x087FC00;    //0b0000_1000_0111_1111_1100_0000_0000;
        DBGClearScreen();
        DBGHomeCursor();
        DBGDisplayString(" DSD9 Bios Started\r\n");
        DBGDisplayString(" Menu\r\n up = ramtest\r\n left = float test\r\n right=TinyBasic\r\n");
        forever {
            //0b0000_1000_0111_1111_1100_0000_0000;
            //0b1111_1111_1000_0100_0000_0000_0000;
            btn = GetButton();
            switch(btn) {
            case BTNU:
                while(GetButton());     // wait for the button to be released
                ramtest();
                break;
            case BTNL:
                while(GetButton());
                FloatTest();
                break;
            case BTNR:
                while(GetButton());
                asm { jmp TinyBasicDSD9 };
                break;
            }
        }
    }
Author: | BigEd [ Thu May 11, 2017 10:19 am ] |
Post subject: | Re: DSD7 |
Excellent! Is there any special technique you used to make your caches on FPGA? What's their timing, in terms of where the clock boundaries are? |
Author: | Arlet [ Thu May 11, 2017 10:29 am ] |
Post subject: | Re: DSD7 |
What kind of device/board are you using ? |
Author: | robfinch [ Thu May 11, 2017 12:23 pm ] |
Post subject: | Re: DSD7 |
Quote: What kind of device/board are you using ?
Quote: Excellent! Is there any special technique you used to make your caches on FPGA? What's their timing, in terms of where the clock boundaries are?

There are a couple of techniques used for the cache. The current implementation may not be the best possible but it seems to work. Splitting the cache up allows for more tweaking options.

The caches operate in a synchronous fashion at the rising edge of the clock. All transfers take place on the positive clock edge. I’ve had difficulty in the past getting the toolset to co-operate with ½-width clock timing because of the use of inverted clock signals, so I’ve stuck with a positive-edge-only arrangement.

Block RAM resources in the FPGA have registered outputs. For better performance the outputs can be double-registered. This doesn’t work that well for an instruction cache. It creates a two-cycle pipeline delay, and it would turn branch operations into a minimum of three cycles even with prediction. It would also extend the length of the pipeline and make it more difficult to handle. An instruction aligner is needed at the output of the cache because the core uses 40-bit instructions.

The solution used here is to break the cache up into two levels. The first-level L1 cache is in distributed RAM, which does not require registers on the outputs and is accessible in a single cycle. L1 is purposefully kept small (64 lines, 2kB) so that the LUT memories don’t require cascading levels of LUTs. The L1 cache is connected to L2 with a cache-line-wide bus (256 bits) so that it can be loaded from L2 in only two clock cycles on a cache miss.

The L2 cache is much larger (16kB) than L1. The L2 cache interfaces through the core’s 128-bit-wide data bus to main memory. Loading an L2 cache line then takes only two memory bus cycles. However, each bus cycle takes multiple clock cycles depending on the type of memory accessed. Accessing the system’s ROM takes about four clock cycles. Accessing DRAM may take eight or more cycles depending on how much memory traffic is present. L2 cache memory access is also slowed down by a couple of clock cycles due to address translation through an MMU.

The cache is controlled by a secondary state machine activated from the core’s primary state machine on a cache miss.

Cache timing trace: cyc and stb show an access to the system ROM, which takes 9 cycles (4 + 1 dead + 4) to load a cache line. Attachment: CacheTiming1.png |
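The figures quoted in this post fit together arithmetically, which a few lines of C can check. The constants come from the post; the function names are hypothetical, and treating the dead cycle as occurring once between consecutive bus cycles is an assumption generalized from the single ROM trace.

```c
#include <stdint.h>

enum {
    LINE_BITS = 256,   /* cache line width, also the L1<->L2 bus width */
    BUS_BITS  = 128,   /* core data bus to main memory */
    L1_BYTES  = 2048   /* 2kB L1 cache */
};

/* Bytes in one cache line: 256 bits = 32 bytes. */
int line_bytes(void) { return LINE_BITS / 8; }

/* Number of L1 lines: 2kB / 32 bytes. */
int l1_lines(void) { return L1_BYTES / line_bytes(); }

/* Memory bus cycles needed to fetch one line over the 128-bit bus. */
int bus_cycles_per_line(void) { return LINE_BITS / BUS_BITS; }

/* Clock cycles to fill one L2 line, given the clocks per bus cycle and
 * the dead clocks between consecutive bus cycles (assumed model). */
int line_fill_clocks(int clocks_per_bus_cycle, int dead_between)
{
    int n = bus_cycles_per_line();
    return n * clocks_per_bus_cycle + (n - 1) * dead_between;
}
```

With four clocks per ROM bus cycle and one dead clock, `line_fill_clocks(4, 1)` gives the 9 cycles seen in the trace, and `l1_lines()` gives the 64 lines quoted for L1.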
Author: | BigEd [ Thu May 11, 2017 12:27 pm ] |
Post subject: | Re: DSD7 |
That's great information - thanks! |
Author: | robfinch [ Sun May 14, 2017 12:25 pm ] |
Post subject: | Re: DSD7 |
I added L1 and L2 cache hit and miss counters for performance measurement and changed the timing of the cache slightly to improve performance. And now it doesn’t work. Busted back to the screen-clear stage instead of the BIOS menu. I noticed that on an L2 miss the L1 cache line was being loaded twice due to a loop in the cache state machine. There’s loads of room for improvement yet. For instance, the L2 cache doesn’t begin reading until after a miss in L1. In theory it could start reading before that, in sync with PC movements. This would shave a clock cycle off every cache load on an L1 miss. The improvement would be modest, as a cache load takes 20 to 30 clock cycles. A to-do for the next version of the core. The character display / memory bug that I wasn’t able to figure out seems to have disappeared with the port of the core to a new FPGA board. |
Author: | robfinch [ Tue May 16, 2017 3:08 pm ] |
Post subject: | Re: DSD7 |
Well, I changed something in DSD9 and now it's back to the BIOS menu. The code is hardly stable at this point. Got the RAM tester to work, with a glitch. Working on converting the DSD9 icache to use CAMs rather than ordinary RAM, having seen the ARM3 cache design in this thread: http://anycpu.org/forum/viewtopic.php?f=3&t=379&sid=875740a6a8b0210ab637e7841ee098c8#p2503 To support these changes the L1 cache size is being reduced to only 1kB; however, that allows it to be fully associative using only a single CAM for tags. The L2 cache will be 16-set, 32-way associative instead of direct mapped. The L2 cache remains 16kB in size. A diagram of the L2 cache looks like the ARM3's diagram except that there are 16 blocks to the cache rather than 4. Also, high-order address bits are used to select the set, rather than low-order bits. |
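A fully associative lookup through a tag CAM can be modelled in C as below. This is a software sketch only: it assumes the 32-byte (256-bit) line size carries over from the earlier design, which would give 32 lines in a 1kB L1, and the names `l1_cam`, `l1_lookup`, and `l1_fill` are hypothetical. A hardware CAM compares every entry at once; the loop here just stands in for that parallel compare.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 32                    /* assumed: 256-bit cache line */
#define L1_LINES   (1024 / LINE_BYTES)   /* 1kB L1 gives 32 lines */

struct cam_entry {
    bool     valid;
    uint64_t tag;     /* line address: byte address / LINE_BYTES */
};

static struct cam_entry l1_cam[L1_LINES];

/* Return the index of the line holding 'addr', or -1 on a miss.
 * Any line may hold any address, so every tag is compared. */
int l1_lookup(uint64_t addr)
{
    uint64_t tag = addr / LINE_BYTES;
    for (int i = 0; i < L1_LINES; i++) {
        if (l1_cam[i].valid && l1_cam[i].tag == tag)
            return i;
    }
    return -1;
}

/* Record that line 'line' now holds the cache line containing 'addr'. */
void l1_fill(int line, uint64_t addr)
{
    l1_cam[line].valid = true;
    l1_cam[line].tag = addr / LINE_BYTES;
}
```

Under the same 32-byte line assumption, the L2 figures are also consistent: 16 sets of 32 ways is 512 lines, or 16kB.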
All times are UTC |