


 Noc - Network on chip 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1531
Location: Canada
Taking the advice of members on the board, network speed has been increased. The network has been changed to byte-wide serial with an 8x clock, which makes it about 16x faster than before (6.25 MB/s, or 50 Mbit/s). There is still some room for it to go faster if needed. Data is transferred in packets consisting of a leading zero byte, sixteen data bytes, and a trailing byte of all ones.
Several errors were identified and fixed in the router component and the connection grid.
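The packet format described above can be modelled in software to make the byte layout concrete. This is a minimal C sketch, not the router's actual HDL. The names `frame_packet`, `unframe_packet` and the `PKT_*` constants are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define PKT_PAYLOAD 16                 /* sixteen data bytes per packet */
#define PKT_LEN     (PKT_PAYLOAD + 2)  /* plus leading and trailing delimiters */
#define PKT_START   0x00u              /* leading zero byte */
#define PKT_END     0xFFu              /* trailing byte of all ones */

/* Wrap a 16-byte payload in the leading/trailing delimiter bytes. */
void frame_packet(const uint8_t payload[PKT_PAYLOAD], uint8_t frame[PKT_LEN])
{
    frame[0] = PKT_START;
    memcpy(&frame[1], payload, PKT_PAYLOAD);
    frame[PKT_LEN - 1] = PKT_END;
}

/* Check both delimiters and copy out the payload.
   Returns 1 for a well-formed frame, 0 otherwise. */
int unframe_packet(const uint8_t frame[PKT_LEN], uint8_t payload[PKT_PAYLOAD])
{
    if (frame[0] != PKT_START || frame[PKT_LEN - 1] != PKT_END)
        return 0;
    memcpy(payload, &frame[1], PKT_PAYLOAD);
    return 1;
}
```

Each frame puts 18 bytes on the wire for 16 bytes of payload. If 6.25 MB/s is the raw line rate, then roughly 16/18 of it, or about 5.6 MB/s, is payload.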

_________________
Robert Finch http://www.finitron.ca


Wed Jun 14, 2017 5:13 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1531
Location: Canada
The capability to follow fixed routes was added to the router. Then it was removed, because it made the router too big and caused P & R to fail. Everything added to a node component is multiplied by 64 in resource usage, since there are 64 nodes.

A bug was found in the router causing it to retransmit the same packet over and over again. A flag bit needed to be reset.

The node status display still doesn't work. Possibly the network isn't working in the FPGA.

_________________
Robert Finch http://www.finitron.ca


Thu Jun 15, 2017 6:14 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1531
Location: Canada
At long last, some success. A program to ping all the nodes in the network worked. The biggest problem encountered was a comparison for the node number that compared only the three LSBs of the number rather than all four LSBs. This caused row and column number 8 of the nodes to be missed, so nothing was able to travel past row/column 7.
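To see why a three-bit compare breaks at 8: with coordinates numbered 1 to 8, the value 8 is binary 1000, so keeping only three bits turns it into 0. The C sketch below is a hypothetical model of one routing-direction decision. It assumes each row/column coordinate sits in its own 4-bit nibble of the node number, and `step_toward` and the mask names are illustrative, not taken from the actual router.

```c
/* Assumed node numbering: each coordinate (row or column) runs 1..8 and is
   stored in its own 4-bit nibble of the node number. */
#define COORD_MASK_BUGGY 0x7u  /* three LSBs: 8 (binary 1000) truncates to 0 */
#define COORD_MASK       0xFu  /* all four LSBs */

/* Direction to move along one axis toward dest:
   +1 (increasing), -1 (decreasing), or 0 when already in that row/column. */
int step_toward(unsigned here, unsigned dest, unsigned mask)
{
    unsigned h = here & mask;
    unsigned d = dest & mask;
    return (d > h) - (d < h);
}
```

With the three-bit mask, a node in column 7 trying to reach column 8 sees the destination as column 0 and steps the wrong way (-1). That is one way the "nothing past row/column 7" symptom can arise. The four-bit mask gives +1.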
The network had to change to four bits wide at a 4x clock rate, from eight bits at 8x clock. The clock rate was doubled and the number of bits transferred per clock halved, so the overall bit/byte rate remained the same. Going with a higher speed clock and narrower bus reduced the size of the design. There was trouble getting place and route to work with the wider bus.
Once more than 100,000 LUTs were used, the design couldn't be placed and routed, even though 136,000 LUTs are available. I've found this with a few designs now, and I wonder if it's a general rule: the device needs to be about 25 to 30% larger than the design, otherwise the tools have a heck of a time during P & R.
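Putting the failing case into numbers helps. One way to read the 25 to 30% rule of thumb is as free headroom left in the device. By that reading, the design that failed left about 26.5% of the LUTs unused, which is at the low end of that range. This is arithmetic on the figures above only, not a vendor guideline.

```latex
\[
\frac{100\,000}{136\,000} \approx 0.735
\quad\Rightarrow\quad
1 - 0.735 \approx 0.265
\]
```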

_________________
Robert Finch http://www.finitron.ca


Fri Jun 16, 2017 4:07 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1647
Congratulations on the success, Rob! I'm sure you are pushing FPGAs a lot further than anyone else here!


Fri Jun 16, 2017 7:51 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1531
Location: Canada
Trying to get keyboard character input working. It’s complicated by the fact that the keyboard controller is connected to a different node ($21) than the node that has the input focus ($11). There don’t seem to be any keystroke messages being transmitted, so I set up the node with the keyboard controller to transmit a continuous stream of ‘A’s rather than wait for a keypress. The ‘A’s are not being picked up by the node with the input focus. I'm not sure whether the problem is at the transmitting end or the receiving end.

I’ve set up node $21 to handle all tactile input devices (keyboard, buttons, switches). Node $11 is set up to handle text video output and LED output. Node $11 is also the master node for Tiny Basic. I plan on using different nodes to support different I/O devices. This avoids a large input multiplexer and distributes the workload of handling I/O devices to improve system performance. It also allows the I/O devices to be handled with polling. However, it means there have to be different software ROMs for different nodes. I'm wondering if it would be better to have smaller ROMs and more RAM, and push the device drivers out from a root node to the nodes that need them.

And it turns out that node $21 does send a continuous stream of keystroke messages to node $11. The router was set to receive all messages, and the messages were dumped to the screen. However, the message payload is wrong: it’s all zeros instead of containing the character ‘A’. This likely filled the keyboard buffer at the receiving end with null characters, which would be ignored.

I really should do more research on parallel computer operating systems rather than arriving there ad-hoc.

_________________
Robert Finch http://www.finitron.ca


Sun Jun 18, 2017 3:06 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1531
Location: Canada
2017/06/18
Code was added to the router to filter out messages with an illegal destination. In some cases a message of all zeros was appearing, probably from a FIFO during reset. This zero message would circulate around the network because it never matched any destination, until it eventually aged out and disappeared.
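A destination filter of this kind amounts to a range check on the destination field. The C sketch below assumes the destination byte holds the row in the high nibble and the column in the low nibble, each numbered from 1 on the 8x8 grid. That layout and the name `dest_is_legal` are assumptions for illustration.

```c
#include <stdint.h>

#define GRID_ROWS 8
#define GRID_COLS 8

/* Hypothetical layout: row in the high nibble, column in the low nibble,
   both numbered from 1. An all-zero destination is therefore illegal. */
int dest_is_legal(uint8_t dest)
{
    unsigned row = (dest >> 4) & 0xFu;
    unsigned col = dest & 0xFu;
    return row >= 1 && row <= GRID_ROWS && col >= 1 && col <= GRID_COLS;
}
```

A router would discard a message for which this returns 0 instead of forwarding it, so a stray all-zero message never gets a chance to circulate.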

The message age field was changed to a time-to-live field operating in the same manner as TCP/IP. However, the field is only six bits wide, which allows a maximum of 63 hops. Since the grid is 8x8, it should never take more than about 20 hops for a message to reach its destination. Six bits are processed more efficiently in the FPGA than eight bits.
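A six-bit time-to-live check can be sketched as follows. This is a minimal C model, assuming the TTL counts remaining hops and a message is dropped once it reaches zero. The name `ttl_forward` is hypothetical.

```c
#include <stdint.h>

#define TTL_BITS 6u
#define TTL_MAX  ((1u << TTL_BITS) - 1u)   /* 63, the largest 6-bit value */

/* Called once per hop. Returns 1 if the message may be forwarded (and
   consumes one hop from its TTL), 0 if it has expired and must be dropped. */
int ttl_forward(uint8_t *ttl)
{
    unsigned t = *ttl & TTL_MAX;    /* only six bits are meaningful */
    if (t == 0)
        return 0;                   /* expired: drop the message */
    *ttl = (uint8_t)(t - 1u);
    return 1;
}
```

A message launched with `TTL_MAX` can make 63 hops. For comparison, strict X-then-Y routing on an 8x8 grid would never need more than 7 + 7 = 14 hops, so the six-bit field leaves plenty of margin for detours.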

A TCP/IP protocol for handling a local grid computer is desired. The grid computer may eventually contain hundreds of nodes, and it is undesirable to use an IP address for each node. Instead, a single local IP address would be assigned to the whole computer, and an application layer would take care of accessing the grid’s nodes.

It is desirable to have some Ethernet connectivity for downloading software and data to the grid computer.
The opencores.org ethmac core was modified to make it accessible to an eight-bit data bus instead of a 32-bit bus. Node $31 was dedicated to handling Ethernet traffic.

_________________
Robert Finch http://www.finitron.ca


Mon Jun 19, 2017 4:59 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1531
Location: Canada
2017/06/19
Logic was added to support routing between grids. The router now has an optional Z route in addition to X and Y. Only a handful of nodes support Z routing. One column of nodes supports Z routing so on a Z miss the router routes along the X direction until it hits a column that can transmit in the Z direction.
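The Z-miss behaviour above can be sketched as a routing decision in C. This sketch assumes each node knows its own (z, y, x) position, and that one hypothetical column `Z_COL` is the column whose nodes carry the Z links. The port names, the address struct, and the X-then-Y order within a grid are also illustrative assumptions, not the router's actual signals.

```c
typedef enum {
    PORT_LOCAL, PORT_X_PLUS, PORT_X_MINUS, PORT_Y_PLUS, PORT_Y_MINUS, PORT_Z
} port_t;

typedef struct { unsigned z, y, x; } node_addr_t;

#define Z_COL 1u  /* hypothetical: the one column whose nodes have a Z link */

port_t route(node_addr_t here, node_addr_t dest)
{
    if (dest.z != here.z) {
        /* Z miss: travel along X until reaching the Z-capable column. */
        if (here.x == Z_COL)
            return PORT_Z;
        return (Z_COL > here.x) ? PORT_X_PLUS : PORT_X_MINUS;
    }
    /* Same grid: ordinary X-then-Y routing. */
    if (dest.x != here.x)
        return (dest.x > here.x) ? PORT_X_PLUS : PORT_X_MINUS;
    if (dest.y != here.y)
        return (dest.y > here.y) ? PORT_Y_PLUS : PORT_Y_MINUS;
    return PORT_LOCAL;
}
```

Only the nodes in `Z_COL` ever return `PORT_Z`, which matches the idea that only a handful of nodes need the extra Z hardware.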
The clocking of the transceivers in the router was changed to use a clock separate from the CPU bus clock, in order to hopefully allow a much higher clock rate. The system hasn't been built yet, so the following is somewhat hypothetical but should work. X,Y routing messages will now travel at 133 Mb/s; Z messages travel at 100 Mb/s. The Z router is only 3 bits parallel, so 9 channels could fit onto the grid router (the router for the whole grid). There are physically only three differential pairs of wires for HDMI data, so three channels are fit onto each wire pair.
The X,Y routers are 4 bits parallel. The Z router bits are GCR encoded from 3 bits to 4 bits for transmission. One gotcha was that the serializers don’t support the number of bits (12) needed to encode the channel data; the choices were 10 or 14. So 14 bits are used to transmit 12 bits of data.
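Fitting three GCR-encoded channels into one serializer word can be sketched as a packing step. The actual 3b/4b GCR code table isn't given above, so this C sketch just packs three already-encoded 4-bit codes into a 14-bit word. The channel order, the zero padding in the two spare bits, and the function names are all assumptions. A real design might pick pad values that help keep transitions in the stream.

```c
#include <stdint.h>

#define SER_WIDTH 14u  /* serializer word width: 12 bits needed, 10 or 14 offered */

/* Pack three 4-bit GCR codes (one per channel sharing a wire pair) into a
   14-bit serializer word. Bits 12 and 13 are spare and left as zero here. */
uint16_t pack_wire_word(uint8_t ch0, uint8_t ch1, uint8_t ch2)
{
    return (uint16_t)((ch0 & 0xFu)
                    | ((unsigned)(ch1 & 0xFu) << 4)
                    | ((unsigned)(ch2 & 0xFu) << 8));
}

/* Recover the three 4-bit codes from a received 14-bit word. */
void unpack_wire_word(uint16_t w, uint8_t ch[3])
{
    for (int i = 0; i < 3; i++)
        ch[i] = (uint8_t)((w >> (4 * i)) & 0xFu);
}
```

Of the 14 serializer bits, 12 carry encoded data (about 86%). After the 3-to-4 GCR overhead, 9 of the 14 bits are actual payload (about 64%).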
The interface between grids will use HDMI ports. The HDMI port supports 9 channels at 100 Mb/s each, for a total of 900 Mb/s. The grid transceiver can actually support up to 2,400 Mb/s, but the channels don’t feed it fast enough. The grid-to-grid interface hasn't been tested yet.
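The bandwidth figures can be checked with some quick arithmetic. This assumes the 2,400 Mb/s ceiling comes from the three HDMI data pairs each driven by DDR outputs at 400 MHz, which is the grid router clock mentioned in a later post.

```latex
\begin{align*}
\text{per wire pair:}\quad & 2 \times 400\ \text{MHz} = 800\ \text{Mb/s} \\
\text{three pairs:}\quad & 3 \times 800\ \text{Mb/s} = 2400\ \text{Mb/s} \\
\text{nine channels:}\quad & 9 \times 100\ \text{Mb/s} = 900\ \text{Mb/s} = 37.5\%\ \text{of } 2400\ \text{Mb/s}
\end{align*}
```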

_________________
Robert Finch http://www.finitron.ca


Tue Jun 20, 2017 8:54 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1531
Location: Canada
2017/06/22
Read the spec. (Something I’m telling myself).

It finally dawned on me that the 133 Mb/s grid interconnect network couldn’t be transferred to the grid router at that speed, because a 7x clock is required and the grid router was based around a 400 MHz clock. What to do? I didn't want to try to build a transmission-rate-changing system. The max clock spec’d for the DDR outputs is 475 MHz. 400 MHz is a reasonable trade-off because it’s easy to generate in the clock generator, so that’s not changing, though 480 MHz is looking really tempting. Two other parameters can change: the clock multiplication factor and the grid interconnect clock. Changing the multiplication factor to 4x and the interconnect clock to 100 MHz allows two three-bit parallel channels per wire to be transferred, six channels in all.
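The new clocking choice works out as follows. This assumes the DDR outputs send two bits per 400 MHz cycle, and that each 3-bit channel is still GCR-encoded to 4 bits as described earlier.

```latex
\begin{align*}
\text{serializer clock:}\quad & 4 \times 100\ \text{MHz} = 400\ \text{MHz} \\
\text{bits per wire per 100 MHz cycle (DDR):}\quad & 4 \times 2 = 8 = 2\ \text{channels} \times 4\ \text{GCR bits} \\
\text{line rate per wire:}\quad & 8 \times 100\ \text{MHz} = 800\ \text{Mb/s} \\
\text{channels over three wires:}\quad & 3 \times 2 = 6
\end{align*}
```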

Still haven’t managed to get keyboard input working, but several software problems have been found. In the meantime, a real-time clock and a bitmap controller have been added to the system. Access to main memory is through the bitmap controller, which has its own special network node. It already has a pixel accelerator that can read and write memory, so it was easy to modify to support general memory read and write operations.

_________________
Robert Finch http://www.finitron.ca


Fri Jun 23, 2017 11:59 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1647
Hi Rob
just to clarify your earlier post on Z routing: does this mean you now support a 3D arrangement of nodes, but rather than a fully connected 3D mesh, you have a stack of 2D plates and a backplane? That is, one column of nodes on each level is able to route in the Z direction.


Fri Jun 23, 2017 12:04 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1531
Location: Canada
Quote:
just to clarify your earlier post on Z routing: does this mean you now support a 3D arrangement of nodes, but rather than a fully connected 3D mesh, you have a stack of 2D plates and a backplane? That is, one column of nodes on each level is able to route in the Z direction.
Yes that is correct. In theory any node can still reach any other node. It's just the number of wires and channel bandwidth between boards is limited. There is a trade-off in the number of channels supported between grids. Fewer faster channels or more slower ones. I have to get more actual hardware and shelving to test it though. I had to reduce the number of nodes to an 8x7 grid rather than 8x8.

_________________
Robert Finch http://www.finitron.ca


Fri Jun 23, 2017 12:58 pm

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 206
Location: Huntsville, AL
Rob:

You wouldn't happen to have a diagram of the 3D grid routing you are working on, would you? I'm just interested in how you are approaching this problem.

_________________
Michael A.


Sat Jun 24, 2017 3:19 am

Joined: Tue Dec 31, 2013 2:01 am
Posts: 112
Location: Sacramento, CA, United States
Is it possible that it looks something like this? [Please excuse the ugly free-hand]

Attachment:
noc.jpg


Each "sheet" is 8 x 7 nodes, at least in my imagination.

Mike B.




Sat Jun 24, 2017 6:38 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1531
Location: Canada
Quote:
Is it possible that it looks something like this? [Please excuse the ugly free-hand]
That's a decent diagram.
There's some sort of problem with the grid computer. Not all nodes are responding to ping requests. According to the ping response, nodes $21, $25, $26, $27, $31, $53, $45, $55 don't respond. It might be a software problem with nodes $21 or $31, as they have different software, but the remaining nodes are all generic. I'm going to try reducing the operating frequency a bit to see if that helps. I wouldn't mind if some of the nodes didn't work, but $21 is keyboard input. Maybe I should try moving it to node $51.
Picture of the grid computer master node screen. The red characters at the right-hand side of the screen indicate nodes responding to ping requests. The green character indicates Tiny Basic is scanning for an input character. Sorry about the angle on the picture.



_________________
Robert Finch http://www.finitron.ca


Last edited by robfinch on Tue Jun 27, 2017 5:52 pm, edited 1 time in total.



Sat Jun 24, 2017 10:52 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1531
Location: Canada
2017/06/24
Reducing the clock frequency of the nodes was tried and made matters worse. Increasing the clock frequency also made matters worse. The master node was able to run at 66MHz. An 80MHz clock may be tried next.

And it turns out the ping routine was pinging the wrong set of nodes: it pinged row 8, which had been chopped off, instead of column 8. This cluttered the network with ping messages that could never reach their destination and so aged away. Once the extra ping messages were removed from the network, all nodes except $21 and $31 responded to the ping.
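After the grid was cut down to 8x7, a ping sweep needs to enumerate only the nodes that still exist. This is a minimal C sketch of that enumeration. It assumes the row is in the high nibble and the column in the low nibble, both numbered from 1, with row 8 removed. The name `ping_targets` and the layout are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define GRID_ROWS 7   /* row 8 was removed */
#define GRID_COLS 8

/* Write every node number in the 8x7 grid except self into out[]
   (at most cap entries). Returns the number of targets written. */
size_t ping_targets(uint8_t self, uint8_t *out, size_t cap)
{
    size_t n = 0;
    for (unsigned r = 1; r <= GRID_ROWS; r++)
        for (unsigned c = 1; c <= GRID_COLS; c++) {
            uint8_t node = (uint8_t)((r << 4) | c);
            if (node != self && n < cap)
                out[n++] = node;
        }
    return n;
}
```

From master node $11 this yields 55 targets, none in row 8. Under this layout, the buggy sweep would also have produced $81 through $88, which no longer exist.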

2017/06/25
Changed the routers from asynchronous to synchronous operation. This reduced the complexity slightly and should allow higher-speed operation. And it doesn’t work. Spent most of the day trying to get a simple synchronous interface working. It appears to work in sim, but there are no ping responses when it's loaded into the FPGA. So, back to the async interface.

_________________
Robert Finch http://www.finitron.ca


Mon Jun 26, 2017 3:09 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1531
Location: Canada
The grid-to-grid communications modules have been rewritten to work like RGB-to-DVI and DVI-to-RGB converters. That means synchronization pulses are inserted into the data stream, which was deemed necessary to recover received data reliably. This also makes use of the rgb2dvi and dvi2rgb code supplied by the board vendor. Sending the data out isn’t too bad, but receiving the data is fairly complex. The code was modified so that 36-bit data is transferred instead of 24-bit RGB data. 36 bits allow nine 4-bit channels to be transferred.
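The 36-bit "pixel" can be viewed as nine 4-bit lanes, one per channel. The C sketch below models that view in software, holding the word in the low 36 bits of a `uint64_t`. The lane ordering (channel i in bits 4i to 4i+3) and the function names are assumptions for illustration. They do not describe the vendor's rgb2dvi/dvi2rgb code.

```c
#include <stdint.h>

#define NUM_CHANNELS 9   /* 9 channels x 4 bits = 36-bit pixel word */

/* Combine one 4-bit nibble from each channel into a 36-bit word. */
uint64_t pack_pixel(const uint8_t nib[NUM_CHANNELS])
{
    uint64_t w = 0;
    for (int i = 0; i < NUM_CHANNELS; i++)
        w |= (uint64_t)(nib[i] & 0xFu) << (4 * i);
    return w;
}

/* Split a received 36-bit word back into per-channel nibbles. */
void unpack_pixel(uint64_t w, uint8_t nib[NUM_CHANNELS])
{
    for (int i = 0; i < NUM_CHANNELS; i++)
        nib[i] = (uint8_t)((w >> (4 * i)) & 0xFu);
}
```

Each pixel clock therefore advances every channel by 4 bits, which is where the per-slot clock count in the next posts comes from.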

One thing that may cause a problem is the large vertical blanking interval: the transmitter FIFOs might become full before there’s a chance to transmit. So a WXGA sync generator was modified to generate shorter vertical sync and blanking pulses, and the FIFOs were made large (511 entries) to help compensate. About 168 transmit slots are unavailable due to vertical blanking, and about 10 transmit slots are used by horizontal blanking.

What is a transmit slot? It is a strip of 40 pixel clock cycles during which 128 bits of data for each of nine channels are transmitted. Each pixel clock transfers 36 bits of data, 4 bits per channel. 128 bits of data could be transferred in only 32 clock cycles, except that the FIFO needs a couple of clock cycles before data is available. Rather than use 34 clocks per data strip, 40 was chosen because it fits evenly into the horizontal display time.
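The slot sizing can be reproduced with two small helpers. This C sketch assumes a 2-clock FIFO start-up latency (the "couple of clock cycles" above). The 1280-pixel active width used in the check below is only an assumed WXGA width, and the function names are hypothetical.

```c
/* Minimum clocks to move one packet per channel: ceil(bits / bits_per_clock)
   plus the FIFO's start-up latency. */
unsigned min_slot_clocks(unsigned packet_bits, unsigned bits_per_clock,
                         unsigned fifo_latency)
{
    return (packet_bits + bits_per_clock - 1) / bits_per_clock + fifo_latency;
}

/* Whole transmit slots that fit in the active part of one line. */
unsigned slots_per_line(unsigned active_pixels, unsigned slot_clocks)
{
    return active_pixels / slot_clocks;
}
```

With 128-bit packets, 4 bits per channel per clock, and 2 clocks of latency, the minimum is 34 clocks. Rounding up to 40 means 32 of every 40 clocks (80%) carry payload. In exchange, a 40-clock slot divides evenly into a 1280-pixel line, giving 32 slots per line under that assumed width.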

_________________
Robert Finch http://www.finitron.ca


Tue Jun 27, 2017 9:52 am