


 [ 15 posts ] 
 Neo-Retro-Computing And Moore's Law 

Joined: Sat Dec 25, 2021 6:49 pm
Posts: 6
Location: A magnetic field
I am greatly influenced by kc5tja's articles about Neo-Retro-Computing and Software Survivalism. GARTHWILSON's publication of these articles led me to the 6502 Forum and anycpu. I am in strong agreement with GARTHWILSON that simple systems may be quite powerful and that theoretical advances may be applied to historical systems.

We have long since passed the point where it is possible to render Jurassic Park at home, and it is questionable how a good laptop can cost more than a good laptop from 10 years ago. While the price stays level, the functionality doubles every two years. Unfortunately, most of this is squandered on whizzy graphics. For example, increased processing power allows better pattern matching which allows video codecs to improve efficiency at the rate of 15% per year.

I argue this is the true rate of progress. While it is possible to populate one address line every other year, robust progress occurs at 1/3 of this rate. People complain that code bloat started in the 16 bit era. However, when measuring efficiency per transistor, Intel's 4004 was the most efficient design. This is Fred Brooks Jr.'s No Silver Bullet and Amdahl's law (diminishing returns). Merely switching from 4 bit bus to 8 bit bus fails to double efficiency. Since then we've seen Wirth's law (software bloat) as a lost cause. In 1979, a 6502 system might be 1MHz, 1KB RAM and no video. By 1985, a system of the same cost might be 3MHz, 128KB RAM and play PCM audio. However, there is a minor fraud here. The early systems were boot-strapped from toggle switches and were self-hosting. The later systems weren't self-hosting. For example, Apple ProDOS3.3, Microsoft BASIC and Acorn BASIC Version 4 were all assembled on mini-computers. Computers with such firmware could not self-host. Applications were similar. Early ones were written in assembly. Later ones were written in compiled languages, such as Pascal. That's why RAM bloated so much but processing power didn't keep pace.
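
A quick sanity check on these rates, as a sketch only. It converts between a compound annual growth rate and a doubling time. Reading "1/3 of this rate" as "doubling every six years instead of every two" is an assumption made for the example, and the function names are invented for illustration.

```python
import math

def doubling_time_years(annual_growth):
    """Years needed to double at a fixed compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

def annual_growth(doubling_years):
    """Compound annual growth rate implied by a given doubling time."""
    return 2 ** (1 / doubling_years) - 1

if __name__ == "__main__":
    # 15% per year, the figure quoted for video codec efficiency.
    print(f"15% per year doubles every {doubling_time_years(0.15):.2f} years")
    # Doubling every two years, the usual statement of Moore's law.
    print(f"Doubling every 2 years is {annual_growth(2) * 100:.1f}% per year")
    # Assumed reading of "1/3 of this rate": doubling every six years.
    print(f"Doubling every 6 years is {annual_growth(6) * 100:.1f}% per year")
```

Under these assumptions, 15% per year works out to a doubling time of roughly five years, which sits between the two-year and six-year figures.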

This continued in the 16 bit era. People assume that x86 has good instruction density - and I suppose it does if you keep adding hundreds of instructions. However, source compatible 8080 applications are 20% larger on 8086. This was rarely noticed because the 8086 systems typically had at least 4x RAM and processing power.

This continues to the present. The recommendation for compiling FreeBSD is to use a minimum of 48 4GHz cores and a minimum of 2GB RAM per core. That's a minimum of 96GB RAM and therefore 64 bit hardware is mandatory. I tried self-hosting GCC and LLVM on Raspberry Pi. I'm possibly the only person to succeed. After writing wrapper scripts around the binaries to force the most conservative parameters, GCC required more than 600MB RAM over 40 hours and LLVM required more than 700MB over 36 hours. The default configurations fail to work within a 31 bit address-space. That's why 64 bit, quad core Raspberry Pi with 1GB RAM is regarded as "embedded" - not a self-hosting system. While it is welcome to have LLVM compile to 6502, it is overwhelmingly likely that it only works on 64 bit hardware.
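
The address-space arithmetic can be checked with a short sketch. It assumes the quoted sizes are binary units (GB read as GiB, MB as MiB) and a flat, byte-addressed space. The helper name is made up for the example.

```python
def address_bits_needed(size_bytes):
    """Smallest number of address bits that can index size_bytes bytes."""
    if size_bytes < 1:
        raise ValueError("size must be at least one byte")
    # The highest address is size_bytes - 1; its bit length is the answer.
    return (size_bytes - 1).bit_length()

GIB = 2 ** 30
MIB = 2 ** 20

if __name__ == "__main__":
    freebsd_build = 48 * 2 * GIB  # 48 cores at 2GB per core
    print(freebsd_build // GIB, "GiB needs", address_bits_needed(freebsd_build), "bits")
    print("700 MiB needs", address_bits_needed(700 * MIB), "bits")
    print("16 KiB needs", address_bits_needed(16 * 1024), "bits")
```

96GB needs 37 address bits, which is past anything a 32 bit flat address can reach, while a 700MB working set fits in 30 bits.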

I'd like to have a 64 bit extension of 6502 because it covers all cases. In particular, it would definitely run GCC and LLVM. However, it is increasingly preferable to have a self-hosting system which fits within 32 bit, 24 bit, 20 bit or 16 bit address-space. However, anything smaller than 24 bit may require an exponential amount of effort. CollapseOS on Z80 shows that it is possible to self-host within 14 bit address-space. However, the core system is a shell, a line editor and an assembler. None of these is large or requires a large kernel. How am I supposed to assemble 16KB ROM if the branch labels are 8x larger? How am I supposed to assemble anything larger? Well, Elite on BBC Micro was assembled in eight pieces. That must have been tedious.

Many of my early attempts at processor design were developed in ignorance of 16 bit mini-computers or extensions thereof. However, it is now quite obvious to me that I worked towards the systems which overcame obvious limitations. Indeed, if 16 bit mini-computers were not available to squeeze the last economies of scale from 8 bit computing then 16 bit mini-computers would have been developed as a necessity. We see variations of this to the present. The GPT-3 neural network required a *horrendous* amount of resources during training. Energy consumption alone exceeded the typical energy consumption of a vehicle during its life cycle. However, the trained system requires fewer resources than a video player.

Without economies of scale, some of the homebrew 8 bit computers are now eclipsing the consumer 16 bit systems of the 1990s. They are quietly advancing at 15% per year - and are vastly cheaper than a gaming laptop or desktop. Moore's law still holds but 1/3 of the advancement is in functionality and 2/3 is price. This is significant because cheaper systems will overwhelm by quantity. For microcontrollers, sales volume of 32 bit systems and larger is almost equal to smaller systems. While sales of x86 and ARM have increased, sales of 8051 and AVR have also increased. This has been apparent for decades. In the 1990s, Sun Microsystems predicted that conventional laptop/desktop computers would only be 1/49 of all networked systems. In an era of smartphones and dual core light bulbs, that prediction might have been conservative.

The easiest way to scale performance is a loose cluster of cores. I believe this has been the case for the Top100 super-computers for more than 10 years. However, there is demand for maximum single-threaded performance because it allows old software to run faster with no refinement. Depending on how you count, x86 has more than 1000 instructions. When Intel Xeon can dispatch up to five instructions per clock cycle, that's 10^15 instruction permutations; possibly working on 128 bit data. It is impossible to exhaustively test these systems for security guarantees - and only one failure makes a system insecure. So far, more than 23 classes of failure have been found. While most were found about two years ago, the most recent has been found within the last month or so.
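
The 10^15 figure is 1000 opcodes raised to the power of a five-wide dispatch group. The sketch below reproduces it and shows how the space grows once operand data is counted. The test rate (one group per cycle at 4GHz) and the single 128 bit operand are hypothetical values chosen for illustration, not measurements.

```python
def dispatch_groups(instruction_count, issue_width):
    """Ordered ways to fill one dispatch group, repeats allowed."""
    return instruction_count ** issue_width

def seconds_to_enumerate(cases, cases_per_second):
    """Time to visit every case once at a fixed, hypothetical test rate."""
    return cases / cases_per_second

if __name__ == "__main__":
    groups = dispatch_groups(1000, 5)
    print(f"{groups:.0e} opcode groups")
    # Hypothetical rate: one group checked per cycle at 4 GHz.
    days = seconds_to_enumerate(groups, 4e9) / 86400
    print(f"{days:.1f} days for opcodes alone")
    # A single 128 bit operand multiplies the space by 2**128.
    with_data = groups * 2 ** 128
    years = seconds_to_enumerate(with_data, 4e9) / (86400 * 365)
    print(f"{years:.1e} years with one 128 bit operand")
```

Opcode groups alone could be enumerated in days at that rate. It is the operand data, and any internal processor state, that pushes exhaustive testing out of reach.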

In any programming language, there is one bug per 50 lines and one critical bug per 5000 lines. While it is possible to reduce bugs - or even estimate bugs - the remainder can be doozies. Unfortunately, hardware description language is not exempt from lines per bug and the proprietary processor designers/manufacturers have amassed *millions* of lines of flawed HDL. When mitigations for known flaws are added, an Intel processor runs at less than 3% of its maximum speed. That undoes more than five iterations of Moore's law. That's more than 10 years of illusory progress - rather than the robust progress advocated by kc5tja and GARTHWILSON. Actually, it is only 10 years, so far. The insecurity researchers aren't finished. We may find that 2/3 of the progress since 80586 only works in corner cases, such as not running arbitrary code on shared hosts.
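
The figures in this paragraph can be put into a small sketch. The bug densities are the ones stated above (one bug per 50 lines, one critical bug per 5000 lines). The one million line code base and the two years per doubling are assumptions for the example.

```python
import math

def expected_bugs(lines, lines_per_bug=50, lines_per_critical=5000):
    """Rough (bugs, critical bugs) estimate from the stated densities."""
    return lines // lines_per_bug, lines // lines_per_critical

def doublings_undone(fraction_of_peak):
    """How many performance doublings are lost when running at this fraction of peak."""
    return math.log2(1 / fraction_of_peak)

if __name__ == "__main__":
    bugs, critical = expected_bugs(1_000_000)  # hypothetical 1M lines of HDL
    print(f"{bugs} bugs, of which {critical} critical")
    lost = doublings_undone(0.03)  # running at 3% of maximum speed
    # Assumes one doubling every two years.
    print(f"{lost:.2f} doublings, about {lost * 2:.1f} years")
```

Running at 3% of peak is just over five halvings, which is where "more than five iterations" and "more than 10 years" come from.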

Open source RISC won't save you. After fragmentation of operating system distributions, fragmentation of web browsers and fragmentation of user interfaces, we may incur fragmentation of instruction sets. So far, RISC-V has more than 10 optional extensions. Of these, the privilege scheme is stuck at Version 0.0. That works in a Western Digital harddisk but that's useless for anything which runs JavaScript. Arguably, JavaScript shouldn't be used anywhere and the security/efficiency is questionable. However, JavaScript is the only common language which runs on servers, desktops and a range of proprietary environments. With appropriate hooks, it is possible to implement a wide range of applications using DHTML (or equivalent) and zero arbitrary code. We seem to be taking the opposite approach of arbitrary code and no structure - and this is often to the detriment of minorities and our elders. This may get much worse. I had a horrible nightmare that Augmented Reality would be implemented with terabytes of geographic three dimensional CSS and JavaScript triggers. Unfortunately, my nightmare has not ended. Please suggest a better implementation because Microsoft Hololens, PhotoSynth, Minecraft and Quake's Binary Space Partitioning would be worse.

I'm not against the inevitable progress which comes from miniaturization - but we have to do it correctly on the first attempt. Hopefully, people will get wise and see that time spent on rounded corners and cutesy pastels is often contrary to a secure, robust system. In the 1920s, people got wise to any "phoney". In the 2020s, people execute arbitrary code on their phone and wonder how their bank account gets emptied.

We need a Plan B and this is most concisely achieved by pretending that the vast majority of computing never happened. Many of us would like a hypothetical system which combines the best parts of a Commodore 64 and Windows XP. This has never been achieved. Specifically, it needs to be fast, relevant, easy to repair, wholly knowable, bend to a user's whims and never need a security update. After more than 200 Patch Tuesdays, no-one has a patched Windows computer. However, those cruddy little computers from the 1980s worked first time and never needed a security update. Perhaps we lost something along the way. Perhaps we can regain it.

_________________
Beep, Beep! I'm a sheep!


Tue Mar 22, 2022 1:59 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 317
a: Part of the computing problem is that hardware and software need to sell a new system every few years. This means adding new gimmicks like cloud computing, or all-singing, all-dancing apps to read a web page or view a video. With the old model you paid for hardware and software support instead.
b: Part of the problem is not GCC, but the C language standard and the software models used. PL/I could define variable length variables: FIXED BINARY (15,0), FIXED BINARY (31,0). When it came out in the mid-60s it was never expanded to use variables of any size. Hardware and software never got past this stage, and that led to patching software and hardware for each new data type: C, C++, K&R, C99, C#, Microsoft C, to name a few.
c: The other problem is that you still have the FLAT memory model, which wastes space compared to a model like the 6502 that segments code, variable, stack and array type objects. Most programs are not written with segments in mind. Segments like the stack and variables are short, code is mid-sized and arrays are large.
d: GUIs are a resource hog.
Ben.
PS: stay "a sheep" after 3 then it Zzzz for Me. :)


Wed Mar 23, 2022 6:35 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1678
Location: Canada
Interesting diatribe. I think our understanding of computing systems has grown progressively, and as a consequence we have lost some of the simplicity. In order to achieve high levels of performance and reliability, things like virtual memory and encryption are required, and they add complexity to the systems. People desire to understand how systems are built and work because it gives one a feeling of control. When we do not know how things work we are not in control and have to have faith in the systems that someone else built instead. Faith and trust do not come easy for some people. There is a certain amount of value in knowing how something is built and works. But many of today's more complex systems are beyond what one could expect one person to know. I picked up a text on USB with the thought of learning enough about it to build my own USB components. But it is complicated compared to a simple serial port. It needs to be to get the performance out of the system.

It is possible to put together simple systems on one’s own. The issue is it is a bit unreasonable to put together a really complex system on one’s own. I read the F35 has 8 million lines of code. Who can write 8 million lines of code by themselves? And get it all right? Still we place our faith and trust in things others constructed. I drive a car, a lot of people do. I wonder how many LOC a modern car has?

_________________
Robert Finch http://www.finitron.ca


Thu Mar 24, 2022 4:14 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 317
I suspect most of said 8 million lines of code are for real-world bugs and workarounds, unlike the 'teaching' operating systems such as Minix, Xinu or a PDP-11 Unix. Information hiding is a big problem with today's hardware like USB, but you need the $$$ to play the new computer game.
Ben.


Thu Mar 24, 2022 5:45 am

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 84
I can certainly identify both with the neo-retro and software survivalist mindsets.

The tragic irony is that much of the complexity that exists in modern systems actually came about due to (misguided?) attempts to make things simpler, by hiding the existing complexity behind multiple abstraction layers. Thus we end up with the current situation of websites and other systems built upon a teetering 30-foot-high Jenga-tower of dependencies that no-one fully understands, or even has a complete and accurate overview of - and which frequently fail in subtle ways. And when they do fall over, the error happens at some innermost core layer and doesn't get propagated back to the presentation layer - giving birth to the era of the "Oops, something went wrong" error message.

Self-hosting is certainly a worthy goal, and one that's going to become ever-more important as mainstream systems get more and more locked down (hello, Microsoft Pluton!) I hope within the next few years we'll see a board with a large FPGA supported by the open source toolflow, and with enough onboard RAM for a RISC-V setup to become entirely self-hosting.

In the meantime, in my own software-survivalist tinkerings, I'll gradually work towards having 832's software toolchain become self-hosting.


Fri Mar 25, 2022 6:12 pm

Joined: Fri Oct 20, 2017 7:54 pm
Posts: 8
robinsonb5 wrote:
I hope within the next few years we'll see a board with a large FPGA supported by the open source toolflow, and with enough onboard RAM for a RISC-V setup to become entirely self-hosting.

That exactly is what I dream of.

There already is a Linux on ICE40UP5k (IceBreaker). Maybe a different Unix flavour could be more efficient, and if we have all aspects of the system under control from gate level upwards, the software would only need to cope with the hardware this system really has. Another front to get rid of a lot of unused stuff.

A swarm of smaller FPGAs on a bus or grid might even be nicer to cope with and if they are equally addressed, that system even might reconfigure components incrementally. A hardware amoeba? What an adventure!

Still some miles to go...

It would need drastically more RAM than that UP5K project to run its own FPGA toolchain and that's where my alarm bells are ringing. Icestorm already isn't like TinyCC and the toolchains for other chips are more in the multi-gigabyte range. Maybe sticking to Icestorm and many smaller FPGAs might help there again? Which cures for the complexity problem in the toolchains are possible (especially the one for the FPGAs; smaller C compilers such as PCC still exist)?

_________________
"Stay OmmmMMMmmmPtimistic!" — yeti
"Logic, my dear Zoe, merely enables one to be wrong with authority." — The 2nd Doctor
"Don't we all wait for SOMETHING-ELSE-1.0?" — yeti


Fri Mar 25, 2022 7:05 pm

Joined: Sun Oct 14, 2018 5:05 pm
Posts: 15
robinsonb5 wrote:
Self-hosting is certainly a worthy goal, and one that's going to become ever-more important as mainstream systems get more and more locked down (hello, Microsoft Pluton!) I hope within the next few years we'll see a board with a large FPGA supported by the open source toolflow, and with enough onboard RAM for a RISC-V setup to become entirely self-hosting.


I think we're seeing a lot of "cross-pollination" (for lack of a better term!) here and on other forums in recent weeks (no bad thing IMO) - there is a minimalist computing group on facebook, and recent threads on the retrocomputing forum too...

However in this context, it exists today! It also existed 40+ years ago...

However what are your expectations... the Apple IIe running Appleworks was capable of being a business tool and self hosting things like Pascal and BASIC (also C). CP/M was (still is) "a thing".

Today - I'm just about to start experimenting with FPGAs and RISC-V is a nice CPU that's open and usable and fits inside many FPGAs - the toolsets are getting there and more and more open source support - or at least "community support" is there.

As for the self hosting bit - back to expectations again... My own retro thing is written in BCPL and is self hosting in that I can run the BCPL compiler on my current target (a W65x816 "16-bit 6502" system with 512KB of RAM). Using that I've written a command-line OS that supports multi-tasking, with an interface that's very much Unix shell-like. Using that I wrote a RISC-V emulator and made my BCPL OS run under that, and now I'm looking at real (as real as FPGA gets) hardware to move it over to.

Self hosting from scratch is hard though. I was lucky in that I had a compiler and more importantly a system I could compile the compiler on, so all I had to do was write the bytecode interpreter that the compiler produces... then write the editor, CLI, filesystem, terminal handler, memory allocator, etc., etc., etc. ... My to-do list has .... n-2: Write GUI, n-1: Make a box, n0: dig a hole ...

(also see: One Man Unix)

RISC-V has been an interesting journey of discovery for me - it seems that broadly speaking there are 2 classes of devices for it now - either small 32-bit systems with relatively low RAM (say 32KB) aimed at the "IoT" or embedded development market, or the other end of the scale, 1+ 64-bit cores, GB of RAM aimed at running Linux. There is just one device I've found that would fit my purposes and that's the ESP32-C3 device with 400KB of RAM. The FPGA I have is the Sipeed Tang Nano 9K which has 2MB of RAM, lots of flash, video output and some other widgetry to go. It has a proprietary IDE which does run on Linux, however there are community supported open source tools for it now too, so there is some hope.

FPGAs do seem to be full of proprietary stuff too. This one has some "binary blobs" needed for the high speed RAM and video for example. The ESP devices have proprietary interfaces to their on-board peripherals, and so on. If/when I have the ability, I'll look into making a "pure" RISC-V core with external memory interface - just like an old-school CPU.

I would love to make my own CPU that runs the bytecode that BCPL compiles to directly, but for now I'll settle for writing the bytecode VM in assembler (and the RISC-V version was a joy to write compared to the 65816 one!) but I have to start my FPGA journey somewhere..

However, even if I did develop a nice little self-hosting system, would I always be wondering "what if"? What if I had more RAM, faster CPU, ways to connect CPUs together, networking, more graphics, ... I think I can actually say that I've been there and seen it happen and worked in it for the past 40+ years, time to go round that circle again?

Cheers,

-Gordon


Fri Mar 25, 2022 7:50 pm

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 84
yeti wrote:
A swarm of smaller FPGAs on a bus or grid might even be nicer to cope with and if they are equally addressed, that system even might reconfigure components incrementally. A hardware amoeba? What an adventure!


I think there were some crypto-mining rigs with multiple FPGAs in just such a grid - though since the interconnect didn't need to carry much information I think they had low-bandwidth links such as SPI.

Quote:
It would need drastically more RAM than that UP5K project to run its own FPGA toolchain and that's where my alarm bells are ringing. Icestorm already isn't like TinyCC and the toolchains for other chips are more in the multi-gigabyte range.


Indeed - RAM is the bottleneck with current easily-available boards. The QMTech Kintex7 board, for instance, has a crazy powerful FPGA for the money, but "only" 256 megabytes of DDR3 - a similar board with, say, 4 gigabytes would suddenly make the whole idea more believable.

drogon wrote:
However what are your expectations... the Apple IIe running Appleworks was capable of being a business tool and self hosting things like Pascal and BASIC (also C). CP/M was (still is) "a thing".


My own expectation, or hope at least, is to have something cool I can tinker with, that I can still bring up from scratch in five years' time, and not find some crucial piece of infrastructure isn't available any more. I develop cores for the Turbo Chameleon 64 and MiST boards - both with Cyclone III chips. Quartus 13.1 is no longer easily downloadable from the Intel website (unless you happen to know the URL for the actual downloads) and is no longer trivial to install and run on a modern Linux install. It fails the "5 year" test.
Similarly, the QMTech Kintex7 board I mentioned is supported by ISE Embedded Edition (free to use) - but ISE is also difficult to install on a modern system. It fails the "5 year" test. (I never actually succeeded in installing Lattice Diamond, though I have to admit to not spending much time trying.)

Quote:
Self hosting from scratch is hard though. I was lucky in that I had a compiler and more importantly a system I could compile the compiler on, so all I had to do was write the bytecode interpreter that the compiler produces... then write the editor, CLI, filesystem, terminal handler, memory allocator, etc., etc., etc. ... My to-do list has .... n-2: Write GUI, n-1: Make a box, n0: dig a hole ...


Hehe, yeah - as you say, far from trivial. For the software side of my project I wrote the assembler and linker completely from scratch, and wrote a backend for the VBCC C compiler. The whole software toolchain (except the on-chip debugger, which requires ncurses) can be compiled and run on AmigaOS, so I believe that given a suitable command-line OS / runtime, it could self-host.

Quote:
however there are community supported open source tools for it now too, so there is some hope.


The open-source tools are picking up momentum nicely - work's well underway on supporting 7-series Xilinx devices, and also Intel Cyclone V chips - as well as the Lattice chips already supported.

Quote:
FPGAs do seem to be full of proprietary stuff too. This one has some "binary blobs" needed for the high speed RAM and video for example. The ESP devices have proprietary interfaces to their on-board peripherals, and so on. If/when I have the ability, I'll look into making a "pure" RISC-V core with external memory interface - just like an old-school CPU.


Indeed - and such blobs also raise awkward questions regarding licensing as applied to open-source HDL code. The situation is improving all the time, however. Did you see BrianHG's open-source DDR3 controller, posted at the EEVBlog forums a few months back (targeting MAX10 and Cyclone V)? That's a very interesting project.

Quote:
I think I can actually say that I've been there and seen it happen and worked in it for the past 40+ years, time to go round that circle again?


Would it be fun? If so, sure, why not? :)


Fri Mar 25, 2022 11:39 pm

Joined: Sun Oct 14, 2018 5:05 pm
Posts: 15
robinsonb5 wrote:

Quote:
I think I can actually say that I've been there and seen it happen and worked in it for the past 40+ years, time to go round that circle again?


Would it be fun? If so, sure, why not? :)


Well, this time I am doing it for fun... Mostly because no-one is paying me to do it!

I think it's nice to re-imagine technology this way.

-Gordon


Mon Mar 28, 2022 4:15 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 317
I too have problems with FPGAs, and a few random thoughts.
With FPGAs you can't even get all the source, even if you pay for it.
I have an Altera DE-1 with DDR RAM but no driver code. The data sheet says 'adjust phase lock loop until it reads correctly'. I want to drive it with a 4 phase clock, but you can't get any useful timing for the FPGA or the DRAM.

Altera (now Intel) just seems to want to sell FPGA ARM cores with some tiny FPGA logic around them, from the feel of the DE-1. I have been unhappy since FPGAs lost 5V I/O and set/clear (both) of flip flops.


You can't even trust 8 bit micros to stay small.
The 6502 has mutated to the trinary 5500. https://www.ternary-computing.com/

I got looking at trinary logic, U(nknown) T F, and it makes more sense than Boolean logic.
Do any programming languages use that? It would be something more useful, in my mind, than a 15 bit graphic floating point extension for the QXA 123 INTEL-I-AM for the 986 graphics card.
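
For what a U/T/F logic looks like in code, here is a minimal sketch using Kleene's strong three-valued logic. That choice is an assumption: it is one common definition, and the linked ternary project may define its operations differently. The class and function names are made up for the example.

```python
from enum import Enum

class Tri(Enum):
    """Three truth values, ordered F < U < T."""
    F = 0
    U = 1
    T = 2

def tri_not(a):
    """NOT swaps T and F and leaves U unknown."""
    return Tri(2 - a.value)

def tri_and(a, b):
    """AND is the minimum: any F forces F, otherwise U dominates T."""
    return Tri(min(a.value, b.value))

def tri_or(a, b):
    """OR is the maximum: any T forces T, otherwise U dominates F."""
    return Tri(max(a.value, b.value))

if __name__ == "__main__":
    # Print the full AND and OR tables.
    for a in Tri:
        for b in Tri:
            print(a.name, b.name, tri_and(a, b).name, tri_or(a, b).name)
```

The useful property is that an unknown input only makes the result unknown when it could actually change the answer: F AND U is still F, and T OR U is still T.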


Anyone can build a fast CPU. The trick is to build a fast system.

Seymour Cray


Mon Mar 28, 2022 6:35 pm

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 210
Location: Huntsville, AL
oldben:
oldben wrote:
I have an Altera DE-1 with DDR RAM but no driver code. The data sheet says 'adjust phase lock loop until it reads correctly'.
Just curious. What driver were you expecting to be supplied by Intel / Altera?

The FPGA-based DDR interfaces I have had the opportunity to use, whether provided as soft-cores or as hard cores, have all been complete interfaces. They have not required any HDL / SW code on my part. I suspect that what Intel / Altera is asking for your logic to provide is a "trained" clock signal that places the DDR sample points at the "optimal" points in the eye pattern. I don't use Intel / Altera FPGAs, and the ones from Xilinx with which I've implemented DDR interfaces have also included the training logic as a built-in function. Xilinx FPGAs have programmable delays in the I/O blocks of all of their recent FPGA families, and a built-in controller for adjusting the interface clock phase to "optimally" sample the eye pattern. My understanding is that training takes place in both directions. Although not up to date on the DDR SDRAM specifications, I assumed that the pattern used in the SDRAM - I/F controller direction was set by an easily selected function of the SDRAM control logic. In the other direction, the I/F controller supplies a predetermined pattern so the SDRAM can "optimally" adjust its input delays to align its data sampling to the circuit delays. I don't know how the training cycles are started or terminated.

In the distant past, when I did do a design with an Altera Flex 10K FPGA, I found that Altera's application engineers provided well written application notes, not unlike those provided by Xilinx. Perhaps Intel / Altera has such a detailed application note available for its DDR SDRAM interface on the DE-1 board.
oldben wrote:
I have been unhappy since FPGAs lost 5V I/O and set/clear (both) of flip flops.
Agree on the 5V tolerance issue, but the set/clear FFs issue is not that big of a deal. Asynchronous setting / clearing of the FFs presents major problems with signal routing, especially if the set / clear signals are themselves generated asynchronously with a deep logic tree in the FPGA. In my first few FPGA projects I used many of the techniques that I'd used with discrete logic FFs and PALs. It did not take long to adopt the approach that all signals within an FPGA, including the setting and clearing of the FPGA FFs, must be synchronous. A first corollary is to minimize the number of clocks used in a system, and a second is to explicitly synchronize all external signals entering the FPGA. Adopting that synchronous logic approach has been very beneficial for me in my more complex FPGA designs.

I may occasionally use an asynchronous reset, but the output(s) of the FF(s) that is (are) asynchronously reset must be synchronized before use in the remainder of the FPGA logic. Thus, the latency advantage of the asynchronous set / clear functions of an FF in an FPGA is often lost in the resynchronization frequently required within the FPGA logic.

The moral for FPGAs is: just say no to asynchronous logic.
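
As an illustration of "explicitly synchronize all external signals", here is a behavioural sketch of a two flip-flop synchronizer with a synchronous reset. It is written in Python rather than an HDL and is only a model: it shows the added latency but cannot show metastability, which is the real reason the second flip-flop exists. The class and method names are invented for the example.

```python
class TwoStageSynchronizer:
    """Behavioural model of two flip-flops in series on one clock."""

    def __init__(self):
        self.stage1 = 0  # first flip-flop, samples the external signal
        self.stage2 = 0  # second flip-flop, feeds the internal logic

    def clock(self, async_in, reset=0):
        """Apply one rising clock edge and return the synchronized output."""
        if reset:
            # Synchronous reset: only takes effect on a clock edge.
            self.stage1, self.stage2 = 0, 0
        else:
            # Both flip-flops update together: stage2 takes the old stage1.
            self.stage2, self.stage1 = self.stage1, async_in
        return self.stage2

if __name__ == "__main__":
    sync = TwoStageSynchronizer()
    external = [0, 1, 1, 1, 0, 0]
    print([sync.clock(bit) for bit in external])
```

In this model, a change on the external input reaches the output one edge after the edge that first sampled it. That delay is the resynchronization latency mentioned above.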

_________________
Michael A.


Tue Mar 29, 2022 12:27 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 317
Quote:
Just curious. What driver were you expecting to be supplied by Intel / Altera?

Not what they gave me.
The DE1 was designed as a teaching FPGA device: switches, LEDs, static RAM and DRAM.
I bought it many moons ago for breadboarding TTL logic. A D flip-flop has set and reset.
If I want to use both, it is my right as a designer, not some CEO looking for a new sales gimmick telling me what THEY THINK a D flip-flop is.
Any supplied complex logic for that card is internal bus specific for an ARM RISC device, not something that is a generic device. It is all point-and-click interfacing that is vendor specific, after you pay big $$$ for the IP stuff.





Quote:
The FPGA-based DDR interfaces I have had the opportunity to use, whether provided as soft-cores or as hard cores, have all been complete interfaces. They have not required any HDL / SW code on my part. I suspect that what Intel / Altera is asking for your logic to provide is a "trained" clock signal that places the DDR sample points at the "optimal" points in the eye pattern. I don't use Intel / Altera FPGAs, and the ones from Xilinx with which I've implemented DDR interfaces have also included the training logic as a built-in function. Xilinx FPGAs have programmable delays in the I/O blocks of all of their recent FPGA families, and a built-in controller for adjusting the interface clock phase to "optimally" sample the eye pattern. My understanding is that training takes place in both directions. Although not up to date on the DDR SDRAM specifications, I assumed that the pattern used in the SDRAM - I/F controller direction was set by an easily selected function of the SDRAM control logic. In the other direction, the I/F controller supplies a predetermined pattern so the SDRAM can "optimally" adjust its input delays to align its data sampling to the circuit delays. I don't know how the training cycles are started or terminated.


But I don't want to do that! That is a single phase clock with data delayed by +-90 degrees from the master clock. I want to use a 4 phase clock instead.

Quote:
In the distant past, when I did do a design with an Altera Flex 10K FPGA, I found that Altera's application engineers provided well written application notes, not unlike those provided by Xilinx.

I use AHDL rather than the horrid VHDL and Verilog. Altera has rather poor quality documentation on this.

Quote:
The moral for FPGAs is: just say no to asynchronous logic.

I use RESET rather a lot from the front panel. It is not like the Bugs Bunny cartoon. SAM: "I said stop, you stupid dragon." Hits dragon with a hammer. Dragon stops just before the cliff. SAM keeps moving forward over the dragon and stops in mid air. Falling SAM: "Dragons are so stupid."

The only real asynchronous logic used for this design is power-on clear.
Ben.

Seymour Cray
When people asked why he didn’t use caches, he replied “You can’t fake memory bandwidth that isn’t there.” No cache could handle the data sets that his machines were meant for. Instead, he used hundreds of megabytes of interleaved 20 ns ECL RAM.


Tue Mar 29, 2022 5:56 am

Joined: Sat Dec 25, 2021 6:49 pm
Posts: 6
Location: A magnetic field
robfinch on Thu 24 Mar 2022 wrote:
Interesting diatribe.


Thank you. I have longer ones, if you're interested.

Anyone who has recently completed NAND2Tetris will get the impression that computer systems are strictly layered and that good architecture is always followed. They'll get a shock when they encounter real systems because the coupling is far too high and components are rarely discarded. This is particularly apparent in networking where, for example, some people are quite happily using Banyan Vines or EcoNet. In networking, a year never passes where it gets simpler.

We seem to place systems into two categories: toy systems and "real world" systems. We really need to examine the validity of these categories or why it is acceptable to ignore the practice which we teach. This may be a deeper but related problem where, for example, language designers typically work in academia or pure research departments and their work is ignored or adopted very slowly. As an example, C is more than 50 years old but common use is clustered around the 1999 standard. A similar principle applies to applications. Most word processing documents can be exported in 20 year old formats without loss of fidelity. Likewise, I suspect that the typical web browser choice would be more than 10 years old if automatic updates were not default.

Actually, we have a whole set of contradictory figures where everything is in flux:

  • RAM, processing power, network bandwidth, storage and graphics all advance at different rates.
  • Hardware truly doubles in capability every six years.
  • Data inter-change advances at 1/2 the rate of a protocol. (This applies to grammar and techniques of programming.)
  • The average duration of an application installation is 22 months whereas the average duration of an operating system installation is 15 months.
  • Programmers generally quit or are promoted within 15 years.

This leads to odd situations such as:

  • A continuous procession of audio and video codecs.
  • File formats from the 1980s or before which remain in common use.
  • Unmaintained software where all of the primary authors are dead.
  • Maintained software which is continuously re-written.
  • Mature software which is indistinguishable from abandoned software.
  • What JMZ described as "perpetual beta" where software is abandoned at the same difficult stage.
  • A continuous procession of simplified software. For example, Nginx simplifying Apache HTTPD Server.
  • Use of insecure programming languages which are older than the programmers.
  • Two or more layers of software simulation with horrendous inefficiency.
  • Too many platforms.
  • Too many programming languages.
  • Too many licenses.
  • The language of your choice or the license of your choice but not both.
  • Constant corner cases and fragmentation.
  • Perpetual work-arounds.

A particularly sad example of perpetual work-arounds is the non-existent day of 29 Feb 1900. Leap year rules are:

  • if (year%400)=0 then leap year.
  • elsif (year%100)=0 then not leap year.
  • elsif (year%4)=0 then leap year.
  • else not leap year.

It is really easy to get this wrong. Every fourth year from 1904 to 2096 is a leap year but 1900 and 2100 are not leap years. Well, early versions of Lotus 1-2-3 get this wrong. Later versions maintain compatibility. Microsoft Excel imports Lotus 1-2-3 files and therefore handles this case. OpenOffice Calc imports Lotus 1-2-3 and Microsoft Excel and also handles this case. Whatever replaces OpenOffice Calc may also handle the erroneous leap day. It is the most famous day which never happened.
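As a minimal sketch of the rules listed above, here they are in Python next to the "every fourth year" shortcut which produces the 29 Feb 1900 error. The function names are mine and purely illustrative. I'm not claiming the shortcut is literally how Lotus 1-2-3 was coded, only that it gives the same wrong answer for 1900.

```python
def is_leap_year(year: int) -> bool:
    """Gregorian rule, tested in the same order as the list above."""
    if year % 400 == 0:
        return True
    if year % 100 == 0:
        return False
    return year % 4 == 0


def is_leap_year_naive(year: int) -> bool:
    """Illustrative shortcut: every fourth year, which wrongly accepts 1900."""
    return year % 4 == 0


# Years between 1800 and 2200 where the shortcut disagrees with the full rule.
disagreements = [y for y in range(1800, 2201)
                 if is_leap_year(y) != is_leap_year_naive(y)]
print(disagreements)
```

The two functions agree for every year from 1901 to 2099, which is exactly why the shortcut survives testing for so long.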

I realize that I'm complaining about old stuff being broken and attempts to support/fix/replace it always being the wrong choice; usually due to short-term financial choice or other technical debt. However, a common root cause is poor assumptions when an application is being written. And many of these root causes can be traced to a singleton:

  • Application is written in unsuitable programming language.
  • Application works in one natural language.
  • Application works in one currency.
  • Application works with one (broken) date format. Two digit year is a typical wrong assumption - even in the 22 years since Year 2000.
  • Application has one interface.
  • Application assumes one display.
  • Application assumes single core system.
  • Application assumes one float implementation.
  • Application works on one processor architecture.
  • Application works on one operating system.
  • Application assumes single user system.
  • Application assumes single task environment.
  • Application assumes one serialization format.
  • Application assumes one file volume.
  • Application assumes one type of network interface.
  • Application assumes one consistent network connection.

Many of the remainder can be traced to hard limits, such as address-space, volume limit, inode limit, file size limit, peripheral port limit or network connection limit. oldben suggests scalable types but that isn't a complete solution. Arguably, an application should have zero untyped integers and zero untyped strings. Potentially, all data-types could be parameterized, after linking, at load - or possibly during execution. Hit a 16 bit connection limit? Raise it to 32 bit. Unfortunately, this is incompatible with most operating systems. It is also incompatible with Greenspun's tenth rule where, for example, any case statement in a while loop may be an untyped virtual machine.

I agree with robinsonb5 that abstraction can be the cause of complexity. BeOS is a famous example. It was intended to be a clean-sheet, multi-media replacement for MacOS. The clean-sheet efficiency claims became increasingly tenuous and it fell behind MacOS. You could easily argue that economies of scale allowed MacOS to progress. Indeed, if equal effort had been applied to MacOS, BeOS and AmigaOS then Apple's effort would only be preferred due to application support. This is the problem. Clean-sheet designs have no data, no applications to manipulate data and no finance to develop applications. Whereas, real designs devolve into one giant hack. The most economically valuable effort is to keep extending a large pile of hacks. Then we get massive security problems when, for example, energy, medical, legal, financial, government and military all use the same flawed commodity hardware. But that's alright because that's an externalized cost.

yeti on Fri 25 Mar 2022 wrote:
Gigabytes


Anything which gets to 30-32 bit address range encounters multiple barriers. The most obvious is the 4GB addressing limit of common 32 bit processors. This is often reduced to 3GB or less in practical systems. However, there is a more subtle limit. If I solder a DIP 8 bit processor which runs at 20MHz and it can block copy data from storage at 1MB/s then it will take more than 1000 seconds to read 1GB. Presumably, it will take many multiples of that to process that volume of data. Part of the reason for keeping a self-hosting system within 24 bit address-space is the total number of bus cycles required to touch all of it.
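The arithmetic can be sketched in a few lines of Python. I'm assuming binary units (1MB = 2^20 bytes) and the 1MB/s block copy rate from above. The function name is just for illustration.

```python
MB = 1024 * 1024  # assumption: binary megabyte


def transfer_seconds(size_bytes: int, bytes_per_second: int = MB) -> float:
    """Time to read every byte once at the given block copy rate."""
    return size_bytes / bytes_per_second


# One full pass over an address-space of the given width at 1MB/s.
for bits in (16, 24, 30, 32):
    print(f"{bits} bit: {transfer_seconds(2 ** bits)} seconds")
```

A full 24 bit address-space can be read in 16 seconds whereas 30 bit takes 1024 seconds and 32 bit takes 4096 seconds, before any processing is done.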

Dr Jefyll and drogon have had success with stand-alone systems and both prefer to work in the 16-24 bit address range. Dr Jefyll has 65C02 Forth acceleration and a four bank register scheme which provides functionality outside of 65816. drogon has self-hosting BCPL with 32 bit pointers running on 65816 with 512KB RAM. To me, this indicates that 18-30 bit addressing is most favorable for self-hosting. 16 bit is too cramped. 32 bit reduces options. In particular, it requires a large set of dependencies which are not stable.

drogon on Fri 25 Mar 2022 wrote:
However what are your expectations... the Apple IIe running Appleworks was capable of being a business tool and self hosting things like Pascal and BASIC (also C). CP/M was (still is) "a thing".


I take an extended view of self-hosting. I'd like to display PDF, run circuit design software and submit designs to JLCPCB or PCBWay - and that requires JavaScript served over SSL. Perhaps JavaScript is unrealistic but I'd definitely like to send designs via SMTP to a local manufacturer. PDF could be rendered in batch mode to a bitmap format. The speed of conversion and volume of output is not critical. Nor do I care about text search when many legacy documents do not have this functionality.

I would also like a video player for the primary purpose of playing tutorial videos. The display and codec are arbitrary and may be designed for each other.

I've described Z80 CP/M Pascal as "frighteningly conformist" but one stable stack is better than no stable stack. Actually, an idea which I do not have the ability to pursue is a microcontroller agnostic design which allows either W65C265 or eZ80 to be fitted (but not both). W65C265 is 100 pin SMD 6502 with 16 bit extensions, integrated peripherals and banking for 24 bit address-space. eZ80 is 100 pin SMD, cycle efficient Z80 with integrated peripherals and extensions for 24 bit address-space. They are similarly matched extensions of 6502 and Z80 and both are overlooked options for a self-hosting system. W65C265 has four UARTs and integrated sound through a generalized DTMF peripheral. eZ80 runs CP/M very fast. Both are suitable to run Fuzix or neutral bytecode (BASIC, BCPL, Forth, Perl, WebAssembly) - and I have many significant finds related to bytecode formats.

Firstly, a KIM-1, Commodore VIC20 or Commander X16 memory map may use one 74x138 to obtain 8*8KB blocks of memory. Up to six RAM chips may provide 48KB RAM while also allowing 8KB for I/O and 8KB ROM. This can be extended beyond 16 bit address-space and allow multiple applications to have 16 bit address-space. It also allows more efficient use of larger RAM or ROM. Separately, 8*8KB blocks can be rotated around by one position for Z80. In this case, it is possible to make one ROM which boots 65816 from the top of 16 bit address-space or Z80 from the bottom of 16 bit address-space. The ROM may contain 65816 bytecode interpreter and eZ80 bytecode interpreter which present the same memory map. Specifically, 65816 LDA $E000,Y may perform the same action as eZ80 LD A,(IY) - and both offer separation of program and data.
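To make the 74x138 decode concrete, here is a small Python model. The top three address lines (A15-A13) pick one of eight 8KB blocks. The block names and their order are my own placeholder assignment (six RAM, then I/O, then ROM at the top); real machines differ in detail. The Z80 map is the same list rotated by one position so that ROM lands at the bottom.

```python
BLOCK_SIZE = 8 * 1024

# Hypothetical 6502/65816 style map: ROM in the top 8KB for the reset vector.
MAP_6502 = ["RAM0", "RAM1", "RAM2", "RAM3", "RAM4", "RAM5", "IO", "ROM"]

# Same blocks rotated by one position: ROM in the bottom 8KB for Z80 reset.
MAP_Z80 = MAP_6502[-1:] + MAP_6502[:-1]


def decode(address: int, block_map: list[str]) -> tuple[str, int]:
    """Model one 74x138: A15-A13 select the block, A12-A0 are the offset."""
    block = (address >> 13) & 0x7
    offset = address & (BLOCK_SIZE - 1)
    return block_map[block], offset


print(decode(0xFFFC, MAP_6502))  # 6502 reset vector falls in ROM
print(decode(0x0000, MAP_Z80))   # Z80 starts executing from ROM
```

Because only the select outputs are re-ordered, the same ROM and RAM chips serve either processor.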

Secondly, WebAssembly is entirely compatible with little-endian 8 bit processors. WebAssembly specifically requires IEEE-754 floating point implementation. However, I strongly suspect that most software works with BASIC style floats. For example, 8 bit exponent for single precision and 16 bit exponent for double precision. This reduces bytecode interpreter size and processing time.

Thirdly, I understand that Perl has its own memory allocation system in which 8KB blocks are handled internally. This was quite typical in the 1990s to maximize cache performance. InnoDB is another example. I believe that 8KB was chosen because it is compatible with Unix systems which had virtual memory page size of 4KB or 8KB. However, it is also convenient for 8*8KB KIM-1, Commodore VIC20 or Commander X16 type memory maps if they are extended beyond 16 bits. This is ideal for self-hosting. In particular, it is possible to write a C compiler in Perl which outputs bytecode of some form. On the 6502 Forum, I believe that GARTHWILSON suggested a C to Forth compiler. Unfortunately, that's lost in the noise. WebAssembly or SIMPL may also be suitable. The C compiler only has to compile the subset of grammar used by Perl. The bytecode interpreter only has to request 8KB blocks and implement a subset of POSIX functions. This requires thorough understanding of Perl, C, POSIX, bytecode and an obscure hardware target. It is otherwise a concise route to self-hosting a moderately mainstream compiler and interpreter on minimal hardware. It would also run command line utilities, such as PerlPowerTools. Python might be preferable, however Python is significantly more fussy about float support and memory allocation while using more memory and running slower.

None of these choices preclude 65C02 with instruction stream bank switching, 8080 or Z80 with I/O segment bank switching or FPGA implementation. So, yes, you can have an 80MHz DDR VERA implementation which is compatible with ASIC implementation. I strongly prefer each component to have a maximum of 10000 transistors (100*100) so that visible wavelength lithography is possible. Potentially, home lithography may be possible. Dozens of units in one FPGA is interesting if it is faster, cheaper and more widely available. However, sockets for discrete, hackable, replaceable components remain preferable.

drogon on Fri 25 Mar 2022 wrote:
I think we're seeing a lot of "cross-pollination" (for lack of a better term!) here and on other forums in recent weeks (no bad thing IMO) - there is a minimalist computing group on facebook, and recent threads on the retrocomputing forum too...


I've been thinking about this for too long. Other people are in a similar situation. For those who are much newer, there is a ground-swell among the Reddit Arduino and Ben Eater acolytes who are bored with breadboards, bored with I2C modules, bored with PCB design, bored with TMS9918/VGA text mode, bored with video tutorials/edutainment and bored with waiting for Commander X16. However, I'm not aware of recent efforts beyond the ongoing War Against General Purpose Computing and the Commander X16's VERA graphics implementation being published. The latter is helpful to me because my 6502/65816 designs are loosely compatible with the Commander X16's ABI, memory map, physical layout and planned size/cost reduction. (Actually, the memory map is equally incompatible with VIC20, Commander X16, Apple II, Acorn and W65C265.) I skipped the graphics system entirely and assumed I would connect something like Gameduino or VERA over SPI. Now that VERA itself is available, Commander X16 is redundant for many people. If anyone can purchase a pre-assembled, pre-programmed VERA - or extend one binary compatible design - then it may be a considerable step change for self-hosting systems. We aren't limited to composite video from an obsolete chip. We aren't limited to VGA text. We aren't limited to one screen either.

oldben on Mon 28 Mar 2022 wrote:
The DATA sheet says
'adjust phase lock loop until it reads correctly'.


Presumably, this is followed by 'Have you tried turning it off and on again?'

_________________
Beep, Beep! I'm a sheep!


Fri Apr 15, 2022 2:04 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 317
Quote:
Presumably, this is followed by 'Have you tried turning it off and on again?'

Is that before or after you hit it with a large hammer a few times?
(random thoughts)
I wonder if most of the need for large virtual memory is for all the memory needed
for streaming ads, which seem to go with every app you download.

Computers from the 1980s onward no longer have the (mostly) free documentation
and/or basic software that came with the earlier systems. I can get PDP-8 stuff easily
for a $2K used computer; or $10K of software and docs for a $500 MicroVAX. This may be
why hobby programmers have never been able to program more than 'toy' systems.

Computer designs have been pushing the RISC model, but many high-level languages don't follow
this model, thus a mismatch of code and data. C and C++ are quite popular but what about BCPL, BASIC, Pascal, ALGOL, Ada, SNOBOL, LISP, FORTRAN IV, structured ASM and Forth, to name a few? COBOL does live again. Do we need some new standard to add back features that have been
dropped for the sake of speed, like BCD math, boolean operations, and stack-based architecture
with displays?


Sat Apr 16, 2022 7:03 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1678
Location: Canada
I do not know what to make of all the comments. It is a bit much to digest. I have my own random thoughts.

6502/65816 would be a challenge to run modern software on. Some design aspects are required to obtain acceptable performance levels. Using memory banks instead of a flat model leads to its own complexities. Virtual memory is used for reliability. Bad pages can be re-routed, and apps protected from one another.

I sit writing this using an i7-7700 quad core with Nvidia GeForce graphics. I cannot imagine trying to run something like Word on a 65816 with low power graphics. I have read that the ISA does not make much difference, so why not 65816 / eZ80, stack based or others? Everything gets converted to micro-ops. I think once one has a Turing complete machine anything can be emulated. I am reminded of running GEOS on a 1 MHz 6502 which was slowed down by the graphics display. If a 1 MHz 6502 can do that, then what can a 100 MHz 6502 handle? I believe there are measured characteristics that say things like with X MIPS of processing power Y is achievable.

I think the situation is the result of the push towards building better people. Everybody wants to outdo the next guy. People build on other people’s work. It is costly to start from scratch. One ends up with software and hardware stacked on software and hardware. I do not want to touch ‘X’ because we know ‘X’ works, but we still want to get to ‘Y’ from ‘X’.

Gigabytes of memory are not enough. I do not see any 3D displays with real-time processing. It takes multi-megabytes of memory to render a 2D display. Multiply that by 3D, then by animation.

I am using 64kiB memory pages for the project I am working on.

I would argue the reason many people do not get past the ‘toy’ systems is the sheer number of man-hours required to develop something more sophisticated. Some systems require multiple parties working on them. I do not think it is lack of documentation. There are all kinds of documentation on the web. It is time consuming to absorb it. Using modern tools to build something helps a lot.

_________________
Robert Finch http://www.finitron.ca


Sun Apr 17, 2022 7:39 am