
 Page 2 of 11 [ 159 posts ] Go to page Previous  1, 2, 3, 4, 5 ... 11  Next
ANY-1

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2117
I was thinking along the same lines, that vector and scalar operations should be largely the same. Um, could I point out that the suggested instruction format has a ‘V’ indicator to indicate vector operation, and it uses the same opcode for vector or scalar ops. ADD with the ‘V’ bit clear is scalar and with the ‘V’ bit set is vector.

Dedicating a bit in the register spec for all registers to specify a vector or scalar register is a bit wasteful as most of the time one wants to perform operations with the same class of registers. Since many operations are commutative it may suffice to have an extra bit on only the ‘B’ operand. That would allow things like adding a scalar to a vector, or performing a vector logic operation with a scalar register. One does not normally want to perform a vector operation then place the result in a scalar register. Similarly, it is rare to perform a scalar operation and place the result in a vector register. There will be instructions for transferring results between vector and scalar registers.
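
To make the idea concrete, here is a minimal sketch of how a decoder might pick register classes from a 'V' bit on the opcode plus one extra bit on the 'B' operand. The bit positions (`V_BIT`, `B_SCALAR_BIT`) and the function name are hypothetical, chosen only for illustration; the actual ANY-1 encoding may well differ.

```python
# Hypothetical bit positions, not taken from the ANY-1 documentation.
V_BIT = 1 << 7         # assumed: set = vector operation, clear = scalar
B_SCALAR_BIT = 1 << 8  # assumed: in a vector op, operand B is a scalar register


def operand_classes(instr):
    """Return the register class ('S' or 'V') for (target, A, B)."""
    if not instr & V_BIT:
        # Scalar operation: every operand comes from the scalar file.
        return ("S", "S", "S")
    # Vector operation: only operand B may be switched to a scalar,
    # which covers cases such as adding a scalar to a vector.
    b_class = "S" if instr & B_SCALAR_BIT else "V"
    return ("V", "V", b_class)


print(operand_classes(0))                     # scalar ADD
print(operand_classes(V_BIT))                 # vector ADD
print(operand_classes(V_BIT | B_SCALAR_BIT))  # vector + scalar ADD
```

Only one extra bit is spent per instruction this way, instead of one per register field.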

Quote:
Take SLT (Set Less Than) for example….
There would be an instruction in the instruction set (V2BITS – vector to bits) to convert a vector to a scalar by copying the least significant bit of the vector to each bit in the scalar register. There would also be an instruction to go the other way (BITS2V). It may be worthwhile to have the compare instructions (SLT, SGT, …) place the result directly to a mask register. So the target register for a SLT instruction could be Vx, Sx, or Mx. I do not know how often one would want to compare a vector to a scalar, I would think it would be rare enough that it would be better to broadcast the scalar into another vector register then perform an ordinary vector compare. So, the ‘B’ operand bit could be used to indicate a scalar target in this case.
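
A rough software model of the two conversions, assuming V2BITS packs the least significant bit of element i into bit i of the scalar and BITS2V does the reverse. That reading of the description, and the function names, are assumptions for the sketch.

```python
def v2bits(vec):
    """Pack the least significant bit of element i into bit i of a scalar."""
    scalar = 0
    for i, elem in enumerate(vec):
        scalar |= (elem & 1) << i
    return scalar


def bits2v(scalar, length):
    """Expand bit i of the scalar into element i (0 or 1) of a vector."""
    return [(scalar >> i) & 1 for i in range(length)]


flags = [1, 0, 1, 1]          # e.g. the result of a vector compare
packed = v2bits(flags)
assert bits2v(packed, len(flags)) == flags  # round trip
```

Because only the low bit is sampled, the same V2BITS works whether true is stored as 1 or as -1.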

As to the -1, my own preference is for 1 and 0 to represent true and false. It is an arbitrary choice; true and false can be represented either way equally well. I got used to -1 when I started programming in BASIC, then switched my preference later. Maybe the choice could be placed in a config parameter.
I find RISC-V a bit spartan when it comes to compare operations. I would like to see (SLT, SGT, SEQ, SNE, SLTU, SGTU) for immediate operations or (SLT, SGE, SLTU, SGEU, SEQ, and SNE) for register operations.

I am not sure how LEs map onto logic cells, but the MRISC32 is about 13,000 LEs for a 32-bit machine. Since this machine is 64-bit it will probably more than double, so a guess at size would be 30,000 LEs. This maps to roughly 87,000 logic cells if I calc'd it right.

_________________
Robert Finch http://www.finitron.ca

Wed Jan 27, 2021 4:32 am

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
Gotcha... yeah I guess it doesn't make sense to add vector, scalar, vector for example. A V bit on the opcode is a fine compromise.

The only reason he went with -1 for true was to be able to use it in bitwise operations more easily, namely 'and', but there are other ways to do that same thing. I don't think the MRISC32 has any mask registers. So if there were instructions that compared directly into the mask registers, I think that would save a vector op: instead of having to compare to a vector, 'and' that vector to another vector, then do the masked op, you can just compare to a mask register and then do the masked op right away.
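
Here is a small sketch of the two approaches side by side, assuming 64-bit elements and a masked op that leaves unselected elements of the destination unchanged. Both of those, and all the function names, are assumptions for illustration rather than anything specified for MRISC32 or ANY-1.

```python
MASK64 = (1 << 64) - 1  # all ones, i.e. -1 in a 64-bit element


def slt_all_ones(a, b):
    """Vector SLT that writes all ones for true and 0 for false per element."""
    return [MASK64 if x < y else 0 for x, y in zip(a, b)]


def slt_to_mask(a, b):
    """Vector SLT that writes one bit per element into a mask register."""
    mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x < y:
            mask |= 1 << i
    return mask


def masked_add(dst, a, b, mask):
    """Add where the mask bit is set; other elements keep the old dst value."""
    return [(x + y) & MASK64 if (mask >> i) & 1 else d
            for i, (d, x, y) in enumerate(zip(dst, a, b))]


a, b = [1, 5, 3], [2, 4, 9]
values = [10, 20, 30]

# Without mask registers: compare to a vector, then 'and' it with the data.
selected = [v & t for v, t in zip(values, slt_all_ones(a, b))]

# With mask registers: compare straight into the mask, then do the masked op.
result = masked_add([0, 0, 0], values, [1, 1, 1], slt_to_mask(a, b))
```

The all-ones form is what makes the plain 'and' work as a select; with a mask register the intermediate vector and the extra 'and' are not needed.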

As for a full complement of compare instructions, we could provide the rest as pseudo instructions in the assembler. I want to do this for my processor because I am trying to fit the instructions into 16 bits. But with 64-bit instructions there's plenty of room, it's not that much hardware to have a full set, and if you're looking at disassembled code (say, in a debugging interface), a full set of compares makes it a lot more obvious what's happening.
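
For what it's worth, the pseudo-instruction route is just an operand swap in the assembler. A minimal sketch, assuming the base register-register set is (SLT, SGE, SLTU, SGEU, SEQ, SNE); the table and function name are made up for the example.

```python
# Pseudo compare -> base compare with the two source operands swapped.
# a > b is the same as b < a; a <= b is the same as b >= a.
SWAPPED = {"SGT": "SLT", "SLE": "SGE", "SGTU": "SLTU", "SLEU": "SGEU"}


def expand(mnemonic, rd, ra, rb):
    """Rewrite a pseudo compare into a base compare; pass others through."""
    if mnemonic in SWAPPED:
        return (SWAPPED[mnemonic], rd, rb, ra)
    return (mnemonic, rd, ra, rb)


print(expand("SGT", "r1", "r2", "r3"))  # becomes SLT r1, r3, r2
print(expand("SEQ", "r1", "r2", "r3"))  # already a base instruction
```

The downside mentioned above still applies: a disassembler only sees the swapped base form unless it is taught to undo the mapping.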

But, if I might make a suggestion? I am likely to never own a 200T device. Even a 100T is a bit too pricey for me given I am a novice, and by the time my skill levels up to 100T I am hoping there will be better dev boards available. I might buy a ulx3s with 84k luts, mainly because there's a lot of existing software. So if it was configurable for 84k lut-4, 100k lut-6 and 200k lut-6 -- probably by configuring the length of the vector registers -- that would make it a lot more accessible. There could be an instruction to reset the vector length register to the max, and the ability to subtract that from a scalar register to help programs work on any vector length available.
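
Roughly what I have in mind for programs that work on any vector length, modelled in software. `max_vl` stands in for whatever the hardware's maximum vector length turns out to be; the name and the loop structure are only a sketch, not a proposed instruction sequence.

```python
def vector_add(a, b, max_vl):
    """Add two arrays in chunks of at most max_vl elements."""
    n = len(a)
    out = []
    i = 0
    while i < n:
        # Models setting the vector length register: the max, or what is left.
        vl = min(max_vl, n - i)
        # Models a single vector ADD over vl elements.
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        # Models subtracting the vector length from the remaining count.
        i += vl
    return out


a = list(range(10))
b = list(range(10, 20))
# The same program gives the same answer on a small or a large device.
assert vector_add(a, b, 4) == vector_add(a, b, 64)
```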

Wed Jan 27, 2021 12:41 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1791
(Quick aside on FPGA dev boards: Arrow DECA Max 10 FPGA development board with 50k LUTs offered for $37 (Promo))

Wed Jan 27, 2021 3:07 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2117
Quote:
I don't think the MRISC32 has any mask registers.
I believe he dedicated one of the scalar registers as the mask register. The issue with doing that is that the vector length is then restricted to the size of the scalar register. The RISC-V vector extension uses one of the vector registers (v0) for the mask. The issue with that is that another read port is required on the vector register file. The x86 AVX-512 extension has eight mask registers.

Quote:
But, if I might make a suggestion? I am likely to never own a 200T device. Even a 100T is a bit too pricey for me given I am a novice, and by the time my skill levels up to 100T
I think the length of the vector registers does not make a huge difference in the size of the core. (Assuming the registers are implemented with block ram). It is the width of the datapath processed and amount of parallel processing taking place. Also an instruction / data cache made up of LUT rams can use a lot of LUTs.
I would also really like to see the core fit into <100k LUTs even <50k LUTs if possible. I am just trying not to set unrealistic expectations. Reducing the width of the datapath to 32-bits would make the core considerably smaller. How it is configured may make a lot of difference. Is it going to process multiple elements in parallel? Having 256-bit wide ALU’s and busses to support processing four elements at once will make the core much larger. One thought for simplicity is to process all the elements at once using a 4096-bit wide data path. Obviously, such an approach would be far too large. It would be nice to be able to configure the number of elements processed.
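
A quick back-of-envelope model of the trade-off, assuming 64-bit elements and a 64-element vector (which is what a 4096-bit all-at-once datapath implies). The function names and the `lanes` parameter are hypothetical.

```python
import math


def datapath_bits(lanes, element_bits=64):
    """Width of the ALUs and busses needed to process `lanes` elements at once."""
    return lanes * element_bits


def passes(vector_length, lanes):
    """Number of passes through the ALU to finish one vector instruction."""
    return math.ceil(vector_length / lanes)


for lanes in (1, 4, 64):
    print(lanes, "lanes:", datapath_bits(lanes), "bits wide,",
          passes(64, lanes), "passes per 64-element vector")
```

Making `lanes` a configuration parameter would let the same design trade core size against throughput.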
Quote:
(Quick aside on FPGA dev boards: Arrow DECA Max 10 FPGA development board with 50k LUTs offered for $37 (Promo))
That looks like quite a deal. I might just buy one.
I have had trouble with Altera software licensing in the past. So I am hesitant to buy Altera, but I hear they have very good software.

I bought one of these boards (via ebay) last year ($80): (QMTech xc7A100T-2)
https://www.aliexpress.com/i/4000170003 ... 6da7iOyzwJ
I have not put it to use yet so I can not say how well it works. It is basically the chip and a dram, so anything else needs to be wired up. It also needs a power supply and programming cable and those can be a bit pricey too.

_________________
Robert Finch http://www.finitron.ca

Thu Jan 28, 2021 3:33 am

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 92
robfinch wrote:
That looks like quite a deal. I might just buy one.

I succumbed to the temptation yesterday and bought a couple to play with.

Quote:
I have had trouble with Altera software licensing in the past. So I am hesitant to buy Altera, but I hear they have very good software.

My only complaint with the software is that if you need to support multiple generations of devices you might need to keep more than one version installed. I still need to build for Cyclone III, so I need Quartus 13.1 for that, as well as a newer version for Max10, Cyclone IV, V and 10LP.

(Plus the free version produces time-limited cores if you use NIOS or some of the other higher-end IPs - but for open designs that's not a problem.)

Quote:
I bought one of these boards (via ebay) last year ($80): (QMTech xc7A100T-2)
https://www.aliexpress.com/i/4000170003 ... 6da7iOyzwJ
I have not put it to use yet so I can not say how well it works. It is basically the chip and a dram, so anything else needs to be wired up. It also needs a power supply and programming cable and those can be a bit pricey too.

The QMTech boards are nice - I have the 55KLE Cyclone IV board. I especially like the standardised form factor and pin spacing, so if your project outgrows the FPGA you can easily migrate to a more capable board in the range.

Thu Jan 28, 2021 3:21 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 620
One thing keeping me from buying new FPGA stuff is that I don't have an SDRAM module for them.
I need to write my own and SDRAM is a pain in the ass to interface, and the docs are confusing.
Ben.

Thu Jan 28, 2021 6:27 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2117
The docs for ANY-1 have been updated.

It looks like there are a lot of instructions (60+ pages and growing) but really there are fewer basic ones. It is just extra mnemonics to help with the encoding.

For vector loads and stores the precision of the load / store is currently coming from a precision register rather than being encoded in the instruction. There is lots of room in the instruction to encode the precision, and I think it is better if it is encoded in the instruction. However, if load / store formats are kept the same as scalar formats then there would be unused bits in the scalar load / stores.
I think the precision could be handled like the rounding mode. Values 0 to 6 would specify byte, wyde, tetra, etc.; 7 would specify that the precision comes from the precision register.
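
A sketch of how that 3-bit field might be resolved. It assumes code n means an element of 2^n bytes (0 = byte, 1 = wyde, 2 = tetra, and so on) and that the precision register holds the same kind of code; neither detail is settled, and the names are only for illustration.

```python
USE_PRECISION_REGISTER = 7  # field value meaning "take it from the register"


def element_bytes(field, precision_register):
    """Resolve a 3-bit precision field to an element size in bytes."""
    if not 0 <= field <= 7:
        raise ValueError("precision field is 3 bits wide")
    # Assumed: the precision register holds a code in the same 0..6 format.
    code = precision_register if field == USE_PRECISION_REGISTER else field
    return 1 << code  # assumed mapping: code n -> 2**n bytes


print(element_bytes(0, 3))  # byte, encoded in the instruction
print(element_bytes(2, 3))  # tetra, encoded in the instruction
print(element_bytes(7, 3))  # size taken from the precision register
```

Scalar loads and stores could use the same field, so the formats stay identical and no bits go unused.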

_________________
Robert Finch http://www.finitron.ca

Fri Jan 29, 2021 3:39 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1791
oldben wrote:
One thing keeping me from buying new FPGA stuff is that I don't have an SDRAM module for them.

Fri Jan 29, 2021 8:47 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2117
The docs for ANY-1 have been updated.

Did not change very much today. Fleshed out the vector load instructions, including instruction formats.

_________________
Robert Finch http://www.finitron.ca

Sun Jan 31, 2021 3:49 am

Joined: Sun Dec 20, 2020 1:54 pm
Posts: 74
Where is the doc?
Are you going to write a software simulator to test the ISA?

I am doing something similar (software simulator), but for a super simple ISA.

Sun Jan 31, 2021 4:58 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2117
The ANY-1 repository is at:
https://github.com/robfinch/ANY-1

Quote:
Are you going to write a software simulator to test the ISA?

At some point there will probably be a software simulator.

What language is your simulator written in?

_________________
Robert Finch http://www.finitron.ca

Sun Jan 31, 2021 7:16 pm

Joined: Sun Dec 20, 2020 1:54 pm
Posts: 74
It's written in pure C/89.
I'd like to rewrite it in Go.

Sun Jan 31, 2021 7:57 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 620
I keep thinking of Go the game, some strange cellular automaton with a strange screen display flashing.

Sun Jan 31, 2021 8:45 pm

Joined: Sun Dec 20, 2020 1:54 pm
Posts: 74
You are right, I had better say "Go-lang"

Sun Jan 31, 2021 10:00 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
I wrote my simulator in Go. Would recommend. Just, resist the urge to use goroutines and channels, there be dragons.

Tue Feb 02, 2021 2:39 am