


 [ 6 posts ] 
 Two modern discrete-component computer architectures 
Author Message
Online

Joined: Sat Apr 04, 2026 1:21 pm
Posts: 5
Hello friends, I'm designing two open-source computers that don't use any complex VLSI (no microprocessors, FPGAs, PLDs, ASICs, or DRAM) for critical infrastructure, privacy seekers, and education. These are hand-solderable but use surface mount components in significant numbers, so as much development as possible is done in simulation. Neither design is ready to be built as physical hardware yet.

Dauug|18 will come first. It's an 18-bit "solder defined" controller built primarily around 6 SRAM ICs that contain the lion's share of the system logic. The circuit board can be about the size of a postcard. I expect the CPU to run at 40 MHz, and each instruction to take one clock cycle, which limits how much each instruction can get done.

• 3 SRAMs are a single-layer, fed-back, bit-sliced arithmetic logic unit. Thus these SRAMs contain only firmware and aren't written to after the system is initialized. The smallest synchronous SRAMs on the market are essentially 256 Ki x 18 bits, so they have 18 inputs and 18 outputs. That means the address space for the ALU can be split 6+6+6 bits with the left operand, right operand, and operation to be done taking 6 bits each.

• 2 SRAMs are for code memory and are wired in parallel so they act like a 256 Ki x 36 bit memory. Of these two, the first is control signals (12 bits for general control + 6 bits to specify the ALU operation), and the other is an 18-bit literal argument for each instruction, not always used.

• 1 SRAM is data memory.
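
To make the "SRAM as ALU" idea concrete, here is a minimal Python sketch of one slice as a lookup table. It is only an illustration, not the actual Dauug firmware: the address layout (operation in the top 6 bits, then left operand, then right operand), the opcode numbers, and the choice to return the carry in bit 6 of the sum are all assumptions made for this example.

```python
# Sketch of one 6-bit ALU slice as a 256 Ki-entry lookup table.
# Assumed address layout (hypothetical): op in bits 17..12,
# left operand in bits 11..6, right operand in bits 5..0.

OP_ADD, OP_AND, OP_XOR = 0, 1, 2  # hypothetical opcode numbers


def slice_function(op, left, right):
    """Return the word stored for one (op, left, right) address."""
    if op == OP_ADD:
        return left + right      # up to 7 bits; bit 6 acts as a carry out
    if op == OP_AND:
        return left & right
    if op == OP_XOR:
        return left ^ right
    return 0                     # unused opcodes are left as zero here


def build_slice_table():
    """Precompute all 2**18 entries, as firmware would be prepared offline."""
    table = [0] * (1 << 18)
    for op in range(64):
        for left in range(64):
            for right in range(64):
                address = (op << 12) | (left << 6) | right
                table[address] = slice_function(op, left, right)
    return table


def alu_slice(table, op, left, right):
    """One lookup: a single read of the table at the packed address."""
    return table[(op << 12) | (left << 6) | right]


TABLE = build_slice_table()
print(alu_slice(TABLE, OP_ADD, 45, 30))  # 75: 6-bit sum 11 with bit 6 set
```

Once the table is filled in, every operation costs the same single read, which is why the slice never needs to be written again after initialization.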

Dauug|36 will follow Dauug|18, in part because it will use Dauug|18 as a coprocessor for initialization and I/O. It's a 36-bit minicomputer (I can't call it a microcomputer without a microprocessor) with a board a little larger than a sheet of copier paper (hard to call it typing paper if people don't use typewriters). This architecture is a lot more "full service" than Dauug|18 and has preemptive multitasking, paged virtual memory, and strong separation between user programs, strong enough that I would trust it without further virtualization hardware and software.

Dauug|36 already has about 190 opcodes tested and working in its firmware. I've also written a real-time operating system kernel (RTOS) named Osmin, which in simulation can already boot and run multiple programs at once. There is still more to do. Osmin only has one API call thus far; namely, terminate the currently running program.

Dauug|36 requires about 36 SRAMs in the architecture. I won't enumerate them here. About two dozen only contain firmware. Dauug|36 will run at 40 MHz like Dauug|18, but only achieve 10 MIPS. Its instructions are quite robust; for example, you can count the number of ones in a 36-bit word in two instructions. Switching context from one program running to the kernel to a different program running takes just 20 clock cycles. There are no registers to spill when the context changes: registers are in SRAM, so every program has 512 registers of its own. This makes local scalar variables very convenient. There is a limit of 256 running programs.

Both architectures have very unusual characteristics relative to mainstream CPUs. I would exhaust my 60,000 character limit going into them here; I've already written about 200,000 words of documentation to date. Some have questioned the choice of 18- and 36-bit word sizes. These sizes are correct, and they stem from the peculiar characteristics of SRAM arithmetic logic units.

Marc
Executive Engineer, The Dauug House


Sat Apr 04, 2026 4:57 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1879
Thanks for sharing! I'm afraid I'm a bit boggled - I'd hoped to be able to get a clue and then make an intelligent comment, but it's not happening.


Mon Apr 06, 2026 11:42 am
Online

Joined: Sat Apr 04, 2026 1:21 pm
Posts: 5
Hi Ed,

My bad for not starting this thread with more specific context. I used to work in communication privacy, especially metadata privacy, and although I was fine writing protocols, ciphers, applications, network stacks, operating systems, I was always stymied by the utter lack of trustworthy hardware we could use at communication endpoints. Without secure hardware, there is no secure software.

Today's die-implemented CPUs are too complex for anyone to predict their behavior in the presence of unusual inputs and circumstances; they are not inspectable at the logic gate level by end users, despite serious principal-agent problems surrounding the potential existence of backdoors; and they are too backwards-compatible to eliminate longstanding causes of software defects (arithmetic results that don't fit, stacks that extend out of bounds, etc.).

I claim there are applications in privacy and critical infrastructure where complex VLSI as manufactured today—microprocessors, PLDs, FPGAs, ASICs, and DRAM—inherently cannot meet security objectives, and workarounds are needed to meet the needs of these applications. I further claim we can build computers that are small enough, cheap enough, fast enough, for many applications without using any complex VLSI at all. Such computers would be faster than a '286, but inspectable like a soroban.

Unfortunately, I don't think it's enough to stop with these claims. I think people need to actually see these "solder-defined," transparently functioning architectures that I say are practical to build and use. Folks also need off-the-shelf reference implementations they can immediately build and use, as well as tools, operating systems, and documentation. So that's what I do full-time now.

Is this thread any clearer? Thanks for your patience.

Marc


Mon Apr 06, 2026 4:36 pm
Online

Joined: Sat Apr 04, 2026 1:21 pm
Posts: 5
I'll be the host for this Thursday's show at Hacker Public Radio. The program is

hpr4614 :: Dauug|18: Faster Than a ’286, but Inspectable Like a Soroban

and is for a very technical audience. My show from last year is less technical and is at

hpr4333 :: A Radically Transparent Computer Without Complex VLSI


Mon Apr 06, 2026 4:43 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1879
Thanks... I'm interested presently in the technological and engineering aspect: what does this CPU do, and how does it do it.

Because you want an immutable and auditable machine, you've made certain design choices, one of which is to use synchronous static RAM as the main building block.

One question comes to mind at once: how is that RAM loaded with the necessary data at boot time? I assume some storage, some address counting, some multiplexors.

But once the RAM is loaded, I'm supposing the machine works somewhat like an EPROM-based machine: look up tables can do whatever logical operations are needed. I'm happy with that.

So the second question is more about design than implementation: these six bit ALU slices, what can they do, and how are those operations used.

You have a diagram about permuting 18 bit inputs, I think - I didn't understand what was being done, and why it was needed. Hopefully one or two sentences will suffice!

I'm supposing the high level goal is to program first in assembly language, and then perhaps in a higher level language?

Apologies if all this is already answered in the docs.

It's surely an interesting thing you have there, and I'd like to understand how it works.


Tue Apr 07, 2026 7:57 am
Online

Joined: Sat Apr 04, 2026 1:21 pm
Posts: 5
Hi Ed,

It would be nice to have multiplexers at various points, but the AUC logic family doesn't include any. The next closest option is the parts that offer hi-Z outputs, which include the SRAMs, 16-bit (quad 4-bit) buffers, and 16-bit (dual 8-bit) flip-flops. The firmware loaders are still being designed, but we know what they have to do:

Firmware injection points are shown on page 19 of the Dauug|36 preprint and page 1 of the Dauug|18 overview. (These are simplified diagrams, so where stuff is not drawn, the firmware load path isn't shown either.) In many instances the location where firmware is brought in is not where it needs to end up, but it can then follow existing flows within the CPU to get where it needs to go. So the firmware loader in effect replaces the control decoder ('36) or code SRAM ('18) when the system boots.

As for firmware storage, Dauug|18 needs at most 256Ki * 18 * 5 (size * width * chips) bits. "Inspectable" to me implies punched tape, but on "standard" (pretend with me) 8-bit tape with holes every 0.1 inch and no error checking, that's 4.7 miles of tape. Dauug|36 would use much more. Humanly achievable, but not practical for use and not very fast to boot.
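
For anyone who wants to check the tape figure, the arithmetic can be reproduced in a few lines of Python. The constants come straight from the paragraph above; the variable names are just for this sketch.

```python
WORDS_PER_SRAM = 256 * 1024   # 256 Ki words per chip
BITS_PER_WORD = 18
FIRMWARE_SRAMS = 5            # chip count quoted above
BITS_PER_TAPE_ROW = 8         # "standard" 8-bit tape, no error checking
INCHES_PER_ROW = 0.1
INCHES_PER_MILE = 63360

total_bits = WORDS_PER_SRAM * BITS_PER_WORD * FIRMWARE_SRAMS
rows = total_bits // BITS_PER_TAPE_ROW
miles = rows * INCHES_PER_ROW / INCHES_PER_MILE

print(total_bits)        # 23592960
print(rows)              # 2949120
print(round(miles, 2))   # 4.65
```

The same bit count is 2,949,120 bytes, or a little under 3 MiB.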

So as the next best option, I'll use a serial NOR flash IC. This will be the only IC on the board with non-volatile memory, so we'll know the bounds of where state can "hide" when the power is off. I chose a flash chip that supports booting in that it doesn't need a read command to start a data transfer when the power comes up. A power-ok input + clock will start outputting bits serially. We won't provide an electrical path from the CPU to the flash, only from the flash to the CPU, so it won't be possible for remote attackers to save modified firmware.

Once bits are streaming from the flash, the logic's not terribly hard. First up, we need data in parallel, so we fashion a shift register from 16-bit flip-flops. (The AUC family doesn't offer shift registers.) Then between some minimal combinational logic (soldered) and some precomputed control signals embedded within the firmware bitstream, we load the SRAMs, set the program counter, disconnect from the firmware loader, and start the CPU. This is the Dauug|18 boot process.
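
The serial-to-parallel step can be modeled in software as below. This is a rough sketch only: the loader is still being designed, so the 18-bit word width and the "first bit received is the most significant bit" ordering are assumptions, and `bits_to_words` is a name made up for this example.

```python
def bits_to_words(bits, width=18):
    """Group a serial bitstream into parallel words.

    Assumes the first bit received becomes the most significant bit.
    Trailing bits that do not fill a whole word are dropped.
    """
    words = []
    shift_register = 0
    count = 0
    for bit in bits:
        # Shift left by one and bring the new bit in at the bottom.
        shift_register = ((shift_register << 1) | (bit & 1)) & ((1 << width) - 1)
        count += 1
        if count == width:
            words.append(shift_register)  # latch one complete word
            shift_register = 0
            count = 0
    return words


stream = [0] * 15 + [1, 0, 1] + [1] * 18
print(bits_to_words(stream))  # [5, 262143]
```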

Booting Dauug|36 is more logically and electrically complex, so what we do with that is build a Dauug|18 onto the same board, boot the '18, then have the '18 read the '36 firmware from the same NOR flash (just continue the existing bitstream), and use that to initialize the '36. Then that same Dauug|18 branches to code that lets it implement I/O (bit banging, all that) for the '36.

To the extent we need address counting hardware for the firmware loader, this will use Galois linear feedback shift registers (LFSRs) to keep the component count small. So the order of the firmware words won't be linear in the NOR flash; they will follow the LFSR's ordering instead.
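
To show what "following the LFSR's ordering" means, here is a small Python model of a Galois LFSR used as an address counter. The feedback masks are my own picks for illustration (a 4-bit register for x^4 + x^3 + 1, and an 18-bit register for x^18 + x^11 + 1, a commonly tabulated maximal-length choice); the actual loader's polynomial and seed are not specified here.

```python
def galois_lfsr_addresses(mask, seed=1):
    """Yield the states of a right-shifting Galois LFSR until it returns to seed."""
    state = seed
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= mask      # apply the feedback taps
        if state == seed:
            return


# Tiny 4-bit example so the ordering is easy to see.
small = list(galois_lfsr_addresses(mask=0b1100))
print(small)  # [1, 12, 6, 3, 13, 10, 5, 14, 7, 15, 11, 9, 8, 4, 2]

# 18-bit version, matching the SRAM address width.
MASK_18 = (1 << 17) | (1 << 10)
addresses = list(galois_lfsr_addresses(MASK_18))
print(len(addresses), len(set(addresses)))


def flash_order(firmware_words, address_sequence):
    """Arrange firmware so that streaming it in order fills the right addresses."""
    return [firmware_words[a] for a in address_sequence]
```

One wrinkle: an LFSR never reaches the all-zeros state, so it visits 2^18 - 1 addresses rather than 2^18. Address 0 would need separate handling, and how the real loader deals with that is not covered in this sketch.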

As for the 6-bit ALU slices, the '18 and '36 work a little differently, so this description is approximate. Each slice is an SRAM that works like an EPROM, as you suggested, except it's 10 times as fast (SRAM is faster than EPROM). Each slice can compute any function of its 18 input bits and gives up to 18 output bits. For the input, that gives us a 6-bit left operand, a 6-bit right operand, and 64 functions the slice can implement (there are 6 bits available to choose which function).

Functions you'll find in these slices include 6-bit add, subtract, AND, NAND, OR, NOR, XOR, XNOR, NOT, left OR not right, right OR not left, left AND not right, right AND not left, exactly left, exactly right, NOT left, NOT right, FALSE (ignores operands), TRUE (ignores operands). Those are the easy ones.

Harder functions to understand: 6-bit multiply produces a 12-bit result. So there's a function to produce the low 6 bits of the product, and a different function for the high 6 bits. Popcount involves some serious gymnastics; the operation to count bits within a 6-bit slice is easy, but operations are needed to combine the slices for a final total. Carry adjustments for addition and subtraction. Magnitude compare. Minimum and maximum. S-box operations for hashing, pseudorandom number generation, and possibly block ciphers. Bit permutations. Various unary functions. Special instructions to accelerate full-word multiplications and division. Shifts and rotates.
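
The split multiply and the per-slice popcount are easy to state as plain functions. The Python below is only a sketch of what each table entry would hold; the function names are invented for this example, and it says nothing about how the hardware combines slice results into a full-word total.

```python
def mul_low(left, right):
    """Low 6 bits of the 12-bit product of two 6-bit operands."""
    return (left * right) & 0x3F


def mul_high(left, right):
    """High 6 bits of the 12-bit product of two 6-bit operands."""
    return (left * right) >> 6


def slice_popcount(value):
    """Number of 1 bits in a 6-bit value (0 through 6)."""
    return bin(value & 0x3F).count("1")


# The largest case: 63 * 63 = 3969, which still fits in 12 bits.
print(mul_high(63, 63), mul_low(63, 63))  # 62 1
print(slice_popcount(0b101101))           # 4
```

Recombining is just `(mul_high(a, b) << 6) | mul_low(a, b)`, which equals `a * b` for any pair of 6-bit operands.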

Why we transpose (you wrote "permute" which is accurate but less specific) 18- and 36-bit words: because otherwise, we're stuck. Suppose I want to rotate a word left 1 bit, so ABCDEF GHIJKL MNOPQR needs to become BCDEFG HIJKLM NOPQRA. The ALU is bit sliced, so there's no way to move G from the middle to the left subword, M from the right to the middle subword, or A from the left to the right subword.

Here's how the rotate works in real life: ABCDEF GHIJKL MNOPQR is rotated locally 1 bit left in each subword, becoming BCDEFA HIJKLG NOPQRM. That's legal, no bit crosses a subword boundary. Then we do our transposition: the left, middle, right two bits of each subword are relocated to the left, middle, right subwords. How? Just copper circuit board traces that go where we want the bits. That gives us BCHINO DEJKPQ FALGRM.

What does that buy us? We were trying to solve the problem that we couldn't get the G, M, or A to cross into their correct subwords. But in the transposed form, all three letters are in the right subword, currently FALGRM. Now one of the bit slice operations is to rearrange the right subword only: FALGRM is replaced with FGLMRA, the left and middle slices leave their subwords alone, and the 18-bit word is now BCHINO DEJKPQ FGLMRA.

We now transpose a second time, again that's just done using copper. The transposition is self-inverse, but we made a small rotation in the right subword while transposed. Our BCHINO DEJKPQ FGLMRA now becomes BCDEFG HIJKLM NOPQRA, which is our original ABCDEF GHIJKL MNOPQR rotated left 1 bit as we desired. That's how rotates work, and that's one of several examples of why we need a way to transpose 18-bit words.
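
The whole rotate walk-through can be replayed in Python using the same letters. This is just a model of the data movement described above, with strings standing in for bits; the function names are made up for the sketch.

```python
def split(word):
    """'ABCDEFGHIJKLMNOPQR' -> ['ABCDEF', 'GHIJKL', 'MNOPQR']"""
    return [word[i:i + 6] for i in range(0, 18, 6)]


def rotate_each_subword_left(subwords):
    """Slice-local step: rotate every 6-bit subword left by 1."""
    return [s[1:] + s[0] for s in subwords]


def transpose(subwords):
    """Wiring step: bit pair p of subword s moves to bit pair s of subword p."""
    return ["".join(subwords[s][2 * p:2 * p + 2] for s in range(3))
            for p in range(3)]


def fix_right_subword(subwords):
    """Slice-local step on the right subword only: FALGRM -> FGLMRA."""
    left, middle, r = subwords
    return [left, middle, r[0] + r[3] + r[2] + r[5] + r[4] + r[1]]


def rotate_left_1(word):
    steps = rotate_each_subword_left(split(word))  # BCDEFA HIJKLG NOPQRM
    steps = transpose(steps)                       # BCHINO DEJKPQ FALGRM
    steps = fix_right_subword(steps)               # BCHINO DEJKPQ FGLMRA
    steps = transpose(steps)                       # BCDEFG HIJKLM NOPQRA
    return "".join(steps)


print(rotate_left_1("ABCDEFGHIJKLMNOPQR"))  # BCDEFGHIJKLMNOPQRA
```

Every function except `transpose` touches one subword at a time, which is the constraint the bit-sliced ALU imposes; `transpose` is the part done by board traces.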

Dauug|18 handles these transpositions awkwardly in the sense that it takes several instructions to complete tasks such as shifts and additions (addition also needs this transposition). Dauug|36 has three Dauug|18-like ALU layers with transpositions going into and out of the second layer, so one instruction can do the whole rotation, the whole addition, and so on. The penalty is there are a lot more components, and each instruction takes 4 clock cycles on the '36 instead of 1 clock on the '18. (The 4th clock cycle for the '36 is for register fetch and store, which the '18 doesn't have because it has no registers.)

Yes, assembly language support will precede any higher-level languages.

Marc


Tue Apr 07, 2026 2:26 pm