 DSD7 
Author Message

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 179
Location: Huntsville, AL
Rob:

Verilog blocking and non-blocking assignments are a constant source of confusion. From a synthesis perspective there's not too much of a difference between them. However, from a simulation perspective, there's a major difference between them. Incorrect use of these two assignment constructs frequently leads to differences between simulated behavior and synthesized behavior.

I've attached a great paper by Cliff Cummings that I used to develop my Verilog coding style when I was first learning Verilog and encountering differences in behavior between simulation and synthesis. I highly recommend the section of the attached paper that relates to the proper use of blocking (=) and non-blocking (<=) assignments. Mr. Cummings offers a number of coding rules that help prevent many of the types of simulation vs. synthesis behavioral differences that you described above.

I also highly recommend his company's website. He provides links to many of his papers on Verilog synthesis and simulation. He also provides papers on critical subjects such as clock domain crossing circuits.


Attachments:
File comment: Non-blocking Assignments in Verilog Synthesis, Coding Styles That Kill!
CummingsSNUG2000SJ_NBA.pdf [96.9 KiB]
Downloaded 225 times

_________________
Michael A.
Sat Dec 03, 2016 12:57 pm
Profile

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
Using Verilator as a simulation tool is also helpful, as it doesn't really use a typical simulation approach. Instead it "synthesizes" the Verilog to C++, similar to how a synthesis tool would translate the Verilog to hardware.


Sat Dec 03, 2016 1:38 pm
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 968
Location: Canada
Thanks for the reference. I spent some time last night reviewing.
I'll have to try Verilator sometime.

But I think I understand how blocking and non-blocking assignments are supposed to work from the simple perspective.
I had coded something like this (mixing blocking and non-blocking assignments in the same always block and cascading blocking assignments):
Code:
always @(posedge clk)
begin
a = b;
c = a;
d <= c;
end

Unfortunately the value of 'c' didn't change until after the clock edge, as if it were a non-blocking assignment. It didn't match the value of 'b', which caused the assignment to 'd' to be delayed by a clock cycle. Some of the blocking assignments actually worked; the one that didn't was for a PC increment signal. Anyway, I moved the combo logic out to a separate always block so the assignments weren't mixed (as recommended).
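For reference, here is a minimal sketch in C of how a simulator is expected to evaluate that block at one clock edge under the Verilog scheduling rules, next to an all-non-blocking version. The struct and function names are made up for illustration. It models simulation semantics only and says nothing about what a particular synthesis tool produces from mixed assignments.

```c
/* Registers from the always block in the post. */
struct regs { int a, c, d; };

/* Mixed version: a = b; c = a; d <= c;
   Blocking assignments take effect immediately and in order.
   The non-blocking assignment samples its right-hand side when it is
   reached and updates the target after the block has finished. */
void posedge_mixed(struct regs *r, int b)
{
    int d_next;
    r->a = b;        /* a = b;                    */
    r->c = r->a;     /* c = a;  sees the new a    */
    d_next = r->c;   /* d <= c; RHS sampled now   */
    r->d = d_next;   /* update at end of the step */
}

/* All non-blocking: a <= b; c <= a; d <= c;
   Every right-hand side is sampled before any target changes,
   so the values move one stage per clock edge. */
void posedge_all_nba(struct regs *r, int b)
{
    int a_next = b;
    int c_next = r->a;   /* old a */
    int d_next = r->c;   /* old c */
    r->a = a_next;
    r->c = c_next;
    r->d = d_next;
}
```

In this model the mixed block gives d the value of b on the same edge, while the all-non-blocking block takes three edges. Hardware that behaves like the second function for a block written like the first is the kind of simulation vs. synthesis mismatch the paper warns about.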


I got the text display to work a bit better by maintaining the cursor position and character attribute in the system’s scratchpad RAM rather than main memory. This hints at a problem with main memory. But there’s a problem with the display that has me mystified. About ½ the screen appears blank. If I didn’t know better I’d say this looks like an addressing problem, except that it’s exactly ½ the screen and the screen width isn’t an even multiple of two. It is also the right half that’s blank, not the top or bottom. If it were an addressing problem the edge should be jagged. The only other thing I can think of is that the number of columns displayed is getting corrupted somehow. I’m writing code to initialize the text controller’s registers in case the default values are being corrupted.
The bootrom software runs all the way through what I’ve coded now, at least according to the status LEDs.
I fixed up the paging unit to register the outputs. This adds a cycle to memory access times. The paging unit acts a bit like a bridge now.

Since I can’t seem to get text out of the display I figured I’d try randomizing the display memory. But nothing is updated onscreen when the program runs. According to the seven segment display things are messed up.
The high order address bits are flipping around and the low order bits are a stable ‘00000’. This is opposite to the way it’s supposed to be. It should be a stable ‘FFD’ followed by flipping values.

Multiply / divide instructions weren’t setting the target register. Testing is so much fun.

_________________
Robert Finch http://www.finitron.ca


Sat Dec 03, 2016 8:24 pm
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 968
Location: Canada
Finally, some text output. Not perfect yet.
Indexed half-word store was storing the wrong register. This resulted in a garbage display for the hex print routine.
The compiler didn’t output the correct code for pointer additions. This caused a ramtest routine to fail. Adding a constant to a pointer is supposed to advance it by that constant times the size of the pointed-to object. The spaghetti logic in expression parsing needed to be fixed.
Figured out the problem with main memory. The address to the memory controller needed to be shifted left once and the LSB set to zero because the memory controller accepts a byte address, and the core only generates wyde addresses. Main memory now passes the ramtest successfully.
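The address fix can be sketched in C as follows. The function name and the 32-bit address width are assumptions for illustration; the real fix lives in the Verilog between the core and the memory controller.

```c
#include <stdint.h>

/* Convert a wyde (16-bit unit) address from the core into the byte
   address the memory controller expects. Shifting left by one
   doubles the address and leaves the least significant bit at zero. */
uint32_t wyde_to_byte_addr(uint32_t wyde_addr)
{
    return wyde_addr << 1;
}
```

For example, wyde address 0x8000 becomes byte address 0x10000, and the result is always even.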
Code:
void ramtest()
{
   int badcount = 0;
   int *p;
   
   DBGDisplayString("  RAM Test\r\n");
   for (p = (int *)0x10000; p < (int *)67108864; p+=2) {
      if (((int)p & 0xFFF)==0) {
         TwoSpaces();
         puthex((int)p);
         putch('\r');
      }
      p[0] = 0xAAAAAAAA;
      p[1] = 0x55555555;
   }
   for (p = (int *)0x10000; p < (int *)67108864; p+=2) {
      if (((int)p & 0xFFF)==0) {
         TwoSpaces();
         puthex((int)p);
         putch('\r');
      }
      if (p[0] != 0xAAAAAAAA) {
         badcount++;
         dumpaddr(p);
      }
      if (p[1] != 0x55555555) {
         badcount++;
         dumpaddr(p);
      }
      if (badcount > 10)
         break;
   }
   putch('\r');
   putch('\n');

   for (p = (int *)0x10000; p < (int *)67108864; p+=2) {
      if (((int)p & 0xFFF)==0) {
         TwoSpaces();
         puthex((int)p);
         putch('\r');
      }
      p[0] = 0x55555555;
      p[1] = 0xAAAAAAAA;
   }
   badcount = 0;
   for (p = (int *)0x10000; p < (int *)67108864; p+=2) {
      if (((int)p & 0xFFF)==0) {
         TwoSpaces();
         puthex((int)p);
         putch('\r');
      }
      if (p[0] != 0x55555555) {
         badcount++;
         dumpaddr(p);
      }
      if (p[1] != 0xAAAAAAAA) {
         badcount++;
         dumpaddr(p);
      }
      if (badcount > 10)
         break;
   }
}
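The pointer-addition fix matters for this listing, because p+=2 has to advance by two ints rather than by two address units. Below is a small sketch of the rule in standard C on a byte-addressed host. The function name is only for illustration, and DSD7's own address units may differ.

```c
#include <stddef.h>

/* Distance in bytes between p and p + n for an int pointer.
   n must be at most 8 so the result stays inside the array. */
size_t int_ptr_step_bytes(size_t n)
{
    static int buf[9];
    int *p = buf;
    return (size_t)((char *)(p + n) - (char *)p);
}
```

A correct compiler makes int_ptr_step_bytes(2) equal to 2 * sizeof(int). The bug described above would have shown up as a smaller step.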

_________________
Robert Finch http://www.finitron.ca


Sun Dec 04, 2016 5:04 pm
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 968
Location: Canada
Continuing the blogging....

2016/12/05

The float branches treated the register as if it were only a float single. This caused a compare to fail as the low order bits of the register were zero and it matched with a FBEQ.q instruction.
Float branches weren’t loading the operand registers.
PUSHI5 instruction failed to fetch the stack pointer for update.
The CLI/SEI (enable / disable) interrupt instructions had the opcodes reversed in the assembler. For the longest time I couldn’t figure out why interrupts weren’t happening and I finally decided to take a closer look at the software.
I thought there was a bizarre pipelining problem in the processor causing the same routine to be called over and over again. It turned out to be software / UI related. On selecting the ramtest routine it would display the start message multiple times down the screen before starting the ramtest. It turns out the ramtest can be aborted by pressing a button. The button press to select the ramtest was still active, causing the ramtest to quickly abort and restart multiple times.
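One possible way to avoid that kind of repeated abort is to react only to a new press rather than to the button level. This is a generic sketch, not the fix actually used here, and button_edge is a hypothetical helper.

```c
#include <stdbool.h>

/* Returns true only on the transition from released to pressed.
   *prev holds the button level seen on the previous call. */
bool button_edge(bool *prev, bool now)
{
    bool edge = now && !*prev;
    *prev = now;
    return edge;
}
```

If the ramtest starts with *prev set to true (the select press is still held), the abort check stays quiet until the button has been released and pressed again.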
I haven't been able to get interrupts working yet.

_________________
Robert Finch http://www.finitron.ca


Tue Dec 06, 2016 7:58 am
Profile WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1316
As a matter of interest, are you accumulating a series of short programs which test the various aspects of the machine, or one big program, or perhaps something else?


Tue Dec 06, 2016 8:08 am
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 968
Location: Canada
Quote:
As a matter of interest, are you accumulating a series of short programs which test the various aspects of the machine, or one big program, or perhaps something else?

At the moment there's no linker or loader or OS. So everything is being built as one large program by the assembler which .includes everything. It is then used to set the contents of block RAM before synthesis. The program is made up of a number of separate files however. I suppose I could put some of the code on GitHub. The problem with bootstrapping the processing core is that by the time an OS can be run, it has to be mostly working already.

2016/12/06
Working on exception processing logic today.
Came up with a new register to control exception routing. That is, some exceptions, such as arithmetic divide by zero, can now optionally be routed to a local exception handler rather than being treated as global exceptions.
A locally handled exception just jumps directly to the exception handler that’s part of a try / catch statement. It passes something called an __exception type (basically an integer code) to the catch handler.
IRQs are giving me trouble. The IRQ line to the core is activated (it’s displayed on an LED), but the processor isn’t jumping to the IRQ routine. When I manually create an IRQ via a button press the core hangs. So it does recognize the IRQ to some extent.
The IRQ table was defined at two different locations. The assembler spit out a warning that I didn’t notice. Vectors weren’t being set at the correct location.
The core now jumps to the IRQ routine but there’s something wrong in the IRQ routine with the fetching of jump vectors. So I bypassed this and just hard coded a jump to an IRQ routine. Well it worked. The interrupt jump / return themselves seem to work.

_________________
Robert Finch http://www.finitron.ca


Tue Dec 06, 2016 9:12 pm
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 968
Location: Canada
2016/12/06b
The cause code coming from the PIC was internally truncated by the PIC. The fifth bit was chopped off so the value 30 would come back as 14 for instance.
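The 30-to-14 symptom is what you get from keeping only four bits of a five-bit code. A quick sketch of the arithmetic, with a made-up function name:

```c
#include <stdint.h>

/* Keep only the low `width` bits of a cause code (width < 8). */
uint8_t truncate_cause(uint8_t cause, unsigned width)
{
    return (uint8_t)(cause & ((1u << width) - 1u));
}
```

30 is binary 11110. With width 4 the top bit is dropped, leaving 01110, which is 14. With width 5 the value comes back intact.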
The RET instruction was updating the stack pointer from the wrong result bus. This caused the RET instruction to work only for routines where there were no parameters to pop off the stack. RET failed in routines with parameters and Pascal calling conventions.


Getting further, trying to display floating point numbers is a bit of a puzzle.
For pi the display is: 33119966338877
10+10 displays: 11..9999999999
10*10 displays 9999999999999
At first this looks like the values are way off, but it’s a display problem. Notice that every other digit is correct, the alternate digits are missing, and the digits that do appear are doubled up. This looks for all the world like a half-word store acting like a word store and overwriting previously stored characters.
Half-word stores place the same data on both halves of the databus, but then activate only the strobe line for the half of the databus that should be active.
The actual values are probably:
pi: 3.1415926535897
10+10: 19.9999999999
10*10: 99.9999999999
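The suspected failure can be modelled in a few lines of C. This is only a sketch under assumptions: characters are 16-bit half-words packed two per 32-bit word, the string is stored front to back, and the faulty store fires both strobes. All names are invented, and the real store order in the print routine may differ, so it is not meant to reproduce the exact strings above.

```c
#include <stdint.h>
#include <string.h>

#define WORDS 8
/* Memory as 32-bit words, each holding two 16-bit half-words. */
static uint16_t mem[WORDS][2];

/* Correct half-word store: only the selected half is written. */
void store_half_ok(unsigned idx, uint16_t v)
{
    mem[idx / 2][idx % 2] = v;
}

/* Faulty store: the data is on both halves of the bus and both
   strobes are active, so the whole word is overwritten. */
void store_half_faulty(unsigned idx, uint16_t v)
{
    mem[idx / 2][0] = v;
    mem[idx / 2][1] = v;
}

/* Store s one character per half-word, then read it back into out.
   out must have room for 2 * WORDS + 1 characters. */
void store_string(const char *s, char *out,
                  void (*store)(unsigned, uint16_t))
{
    size_t n = strlen(s), i;
    if (n > 2 * WORDS)
        n = 2 * WORDS;
    memset(mem, 0, sizeof mem);
    for (i = 0; i < n; i++)
        store((unsigned)i, (uint16_t)(unsigned char)s[i]);
    for (i = 0; i < n; i++)
        out[i] = (char)mem[i / 2][i % 2];
    out[n] = '\0';
}
```

With the faulty store, "3.1415" reads back as "..4455": half the characters are lost and the survivors are doubled, which is the same kind of pattern as on the display.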

_________________
Robert Finch http://www.finitron.ca


Thu Dec 08, 2016 1:33 am
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 968
Location: Canada
2016/12/07
I’ve been working on the prtflt() routine which converts raw binary floats to displayable ASCII. Unfortunately it’s dismally inaccurate for some values. I’ve studied a number of conversion routines available on the web, some good and some not so good. I think it's necessary to implement log10() and pow() functions to get better results. For instance 1.11111111111111111e+23 displays as 1.12E+23.
The doubled-up display problem was circumvented by storing the characters using word operations rather than half-word operations (buffer declared as int rather than char). The values do indeed display correctly. I haven't been able to identify yet where the mistake is with half-word memory operations.
In the process I’ve discovered that the compiler does not return float values from functions properly. So it needs some more work.

_________________
Robert Finch http://www.finitron.ca


Thu Dec 08, 2016 10:06 pm
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 968
Location: Canada
2016/12/08
Fixed the compiler return value problem. Ported over the log10q() function from gcc, but it doesn’t seem to work. The result displays as zero for any log.
The precision of the FP unit is being reduced to 80 bits from 128. I can’t think of a good reason beyond academia for a 128 bit unit, and it’s wasteful of resources to support more. I want the FPGA resources for other things!
Who needs that kind of precision anyway? I’d rather work on something more practical.
Got the precision downshifted to 80 bits. Running the quick FP test I got the same results as for 128 bits.
The compiler didn’t output the correct constants for negative values. It was treating a negative constant like a negative variable and outputting a NEG operation. This worked fine except for constants defined as initializers. The constant tables in the log() function weren’t correct. This was fixed and log10() still returns zero.

_________________
Robert Finch http://www.finitron.ca


Fri Dec 09, 2016 11:28 pm
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 968
Location: Canada
2016/12/09
The timing decode for latency-of-one (LOO) operations was not decoding the correct field, so the delay before an FP op completes was wrong. The LOO op results weren’t being placed on the FP data output bus. This affects the ftoi and itof instructions.
The compiler wasn’t outputting good code to convert types between floats and ints.
log10() now returns 1.76E-4940; at least it's non-zero.
Vastly improved the accuracy of the prtflt() routine by removing the divides and the effect of most of the subtractions. The numbers seem to be accurate to at least 16 digits (double precision). The format only supports about 20 digits of precision. prtflt() doesn't use the log10() or pow() functions! However, it may be slow for some numbers with large exponents.
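As a rough check on those digit counts, a binary significand of p bits carries about p * log10(2) decimal digits. The sketch below uses 53 bits for double precision and, as an assumption, a 64-bit significand like the x87 extended format, since the field layout of the 80-bit format isn't given here. The function name is invented.

```c
/* Approximate decimal digits carried by a binary significand:
   bits * log10(2). */
double decimal_digits(unsigned significand_bits)
{
    return significand_bits * 0.30102999566398120;
}
```

That works out to roughly 15.95 digits for 53 bits and roughly 19.3 for 64 bits, in line with the "16" and "about 20" figures.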

_________________
Robert Finch http://www.finitron.ca


Sat Dec 10, 2016 9:18 pm
Profile WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1316
It's quite hard to get floating point conversions right - indeed it's hard to figure out what 'right' means. There's a good blog at
http://www.exploringbinary.com/properti ... ys-strtod/
if you like this kind of numerical analysis.

(Simple code which is nearly right almost all the time is probably good enough for hobby purposes, but whether it's good enough in an emotional sense is another question.)


Sat Dec 10, 2016 9:41 pm
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 968
Location: Canada
2016/12/10
prtflt() is probably accurate enough now for my purposes.
Working on exceptions. Found the counter for the 30Hz interrupt wasn’t being incremented, which explains why there were no 30Hz interrupts occurring. Modified the core to use triple precision format to store the 80 bit floats rather than quad precision. The floats needed to be word sized to keep stack alignment proper. Since they were three words on stack, the same was done for other memory storage.
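The three-word figure follows from rounding 80 bits up to whole words, assuming 32-bit words (which is what three words for an 80 bit float implies). A one-line sketch, with an invented function name:

```c
/* Number of whole words needed to hold `bits` bits. */
unsigned words_needed(unsigned bits, unsigned word_bits)
{
    return (bits + word_bits - 1) / word_bits;
}
```

words_needed(80, 32) is 3, so each float occupies 96 bits in memory, compared with 4 words for the old 128 bit format.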
I still haven’t figured out the half-word store issue. I’m pretty sure it’s not to do with loads because strings are printed using half-word loads and that works okay.

Some routines for ftoa can be found here:
http://stackoverflow.com/questions/2302969/how-to-implement-char-ftoafloat-num-without-sprintf-library-function-I

_________________
Robert Finch http://www.finitron.ca


Sun Dec 11, 2016 8:45 pm
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 968
Location: Canada
2016/12/11
Working on a TinyBasic interpreter for DSD7.
I tried ramping the clock frequency of the core up to 75 MHz (the speed of memory) from 25MHz and it didn’t work. Then I tried 50 (no luck) and finally 37.5 MHz. The core seems to work at 37.5MHz which is ½ the memory clock frequency. I must have missed a timing constraint because the tools say it should work at 75MHz. It does use both clock phases. I can’t do a whole lot more with the core right now without investing a ton of work to get an OS up and running. It needs an emulator before writing more sophisticated software, and I plain just don’t want to bother writing one. I’d rather work on the next core.

Having got a number of bugs worked out of DSD7, I started working on DSD9. I’m going to go for an 80/32/16/8-bit machine with 40/20-bit instructions. This is primarily to support the 80 bit floating point.
Working out the instruction addressing should be fun.

_________________
Robert Finch http://www.finitron.ca


Tue Dec 13, 2016 12:15 am
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 968
Location: Canada
2016/12/12
DSD9 has 34 bits per instruction average using a combination of 40 and 20 bit instructions. But, I've since decided to scrap the 20 bit instructions and go with 24 bits instead.
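The 34 bit average implies a particular mix of long and short instructions, assuming it is a plain average over instruction count. A sketch of that arithmetic follows. The function names are made up, and the figure for 24 bit instructions assumes the same mix would carry over, which is not stated here.

```c
/* Fraction of instructions that are long, given the average length. */
double long_fraction(double avg_bits, double long_bits, double short_bits)
{
    return (avg_bits - short_bits) / (long_bits - short_bits);
}

/* Average length for a given fraction of long instructions. */
double average_bits(double frac_long, double long_bits, double short_bits)
{
    return frac_long * long_bits + (1.0 - frac_long) * short_bits;
}
```

With 40 and 20 bit instructions, a 34 bit average means about 70% of instructions are 40 bits. The same 70/30 mix with 24 bit short instructions would average about 35.2 bits.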
Got most of it coded. And updated the assembler to support DSD9. Also did some work on the ‘C’ compiler.
I changed the data handling to 80/40/16/8. So it supports deci-bytes, penta-bytes, wydes, and bytes. The core doesn't have alignment restrictions. Data or code may be on any byte address. Bunches of code were stolen from the DSD7 core.

_________________
Robert Finch http://www.finitron.ca


Wed Dec 14, 2016 4:15 am
Profile WWW