2000 lines #255

samiam95124 · 2025-11-13T17:11:28Z

Maintainer

2000 lines of code

So one of the ongoing refactors is breaking pgen down into machine-dependent and machine-independent sections. The main idea is to separate the CPU-dependent code from the CPU-independent code, which makes it very clear just how much code it takes to port Pascal-P6 to a new CPU.

However, right now it's fairly clear what the answer is: about 2000 lines of code (LOC). The phases of pgen are:

Load the intermediate and form graphs from it (generate).
Output statement-level code (assemble).
Assign registers to expressions (assreg).
Output expression code (genexp).
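As a rough illustration of the first phase, the Python sketch below turns a stack-machine instruction sequence into an expression tree by simulating the stack symbolically: loads push leaf nodes, and an operator pops its two operands and pushes a new node. It is not pgen's code; the opcodes (ldci, lodi, adi, sbi, mpi) are P-code-style mnemonics assumed for illustration, and build_tree and show are hypothetical helpers.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                   # intermediate opcode, e.g. "adi"
    arg: object = None                        # constant value or variable name for leaves
    kids: list = field(default_factory=list)  # operand subtrees for operators

BINARY_OPS = {"adi", "sbi", "mpi"}  # hypothetical integer add/subtract/multiply

def build_tree(code):
    """Simulate the stack machine, pushing tree nodes instead of values."""
    stack = []
    for op, *args in code:
        if op in ("ldci", "lodi"):      # load constant / load variable: a leaf
            stack.append(Node(op, args[0]))
        elif op in BINARY_OPS:          # pop right operand, then left operand
            right = stack.pop()
            left = stack.pop()
            stack.append(Node(op, kids=[left, right]))
        else:
            raise ValueError(f"unknown opcode {op!r}")
    assert len(stack) == 1, "an expression should leave exactly one value"
    return stack[0]

def show(n):
    """Render a tree with explicit parentheses."""
    if not n.kids:
        return str(n.arg)
    return f"({show(n.kids[0])} {n.op} {show(n.kids[1])})"

# b * 3 + c in stack-machine form (hypothetical encoding)
code = [("lodi", "b"), ("ldci", 3), ("mpi",), ("lodi", "c"), ("adi",)]
print(show(build_tree(code)))  # ((b mpi 3) adi c)
```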

The AMD64 CPU model is reasonably complex, so it is unlikely that a port to another CPU will need a more complex implementation. There are plenty of advanced implementation details still to come, such as variable sets and vector operations, so let's apply a rule of thumb and say it will double to 4000 LOC at some point. The entire backend is 6000 lines of code at this point.

Is 2000 lines a lot? A basic page is about 80 lines of code, that is, the number of lines an editor displays on a modern 4K display. I used to use the rule of thumb that a page is about 65 LOC, because that is what a typical dumb terminal would display[2]. At 80 lines per page, 2000 lines is about 25 pages of code, and that is what you have to read through and understand to implement a typical CPU.

The backend of IP Pascal (machine.pas) is 16,000 lines. Applying pgen's ratio, where the CPU-specific code is about 1/3 of the backend (2000 of 6000 LOC), gives about 5000 LOC of CPU-specific code, or roughly 2.5 times pgen.

Why is that important?

These 2000 lines are the fulcrum that Pascal-P6 stands on. There are one or two other paths, such as the cmach implementation, but basically pgen decides whether the compiler lives or dies in the future. I assume that nobody is going to accept runtimes an order of magnitude (or more) slower, which is what cmach gives. Java tried this and ended up with a reputation for poor performance that has been very hard to wash away, even to this day.

Is 2000 lines the final number?

As in, would the total LOC grow from improvements like advanced register allocation? I don't think so. Doing register allocation first, then generating code from those assignments, has the nice property that the base code generation (genexp) does not need to change, because it just uses the registers it was assigned. It does not care how they were assigned. This means that a new register allocation algorithm would not add to the CPU-specific code.
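The point that genexp does not care how registers were assigned can be sketched in a few lines of Python. This is a hypothetical illustration, not pgen's design: two interchangeable allocators (alloc_fresh and alloc_stack, both invented here) annotate the same expression tree, and a single emitter (emit) produces three-address pseudo-assembly from whatever registers it finds on the nodes.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

@dataclass
class Node:
    op: str                                   # "load", "add" or "mul" (hypothetical ops)
    arg: object = None                        # operand for "load" leaves
    kids: list = field(default_factory=list)
    reg: Optional[str] = None                 # set by an allocator, read by emit

def alloc_fresh(node, counter=None):
    """Naive allocator: every node gets a new virtual register, in post-order."""
    if counter is None:
        counter = count()
    for kid in node.kids:
        alloc_fresh(kid, counter)
    node.reg = f"r{next(counter)}"

def alloc_stack(node, depth=0):
    """Reuse registers like stack slots: a node evaluated at depth d gets r<d>."""
    for i, kid in enumerate(node.kids):
        alloc_stack(kid, depth + i)  # left child at depth, right child at depth + 1
    node.reg = f"r{depth}"

def emit(node, out):
    """Emit pseudo-assembly using node.reg as given; ignores how it was chosen."""
    for kid in node.kids:
        emit(kid, out)
    if node.op == "load":
        out.append(f"mov {node.reg}, {node.arg}")
    else:
        left, right = node.kids
        out.append(f"{node.op} {node.reg}, {left.reg}, {right.reg}")
    return out

def make_tree():
    # b * 3 + c
    return Node("add", kids=[
        Node("mul", kids=[Node("load", "b"), Node("load", 3)]),
        Node("load", "c"),
    ])

t1, t2 = make_tree(), make_tree()
alloc_fresh(t1)
alloc_stack(t2)
print(emit(t1, []))  # ['mov r0, b', 'mov r1, 3', 'mul r2, r0, r1', 'mov r3, c', 'add r4, r2, r3']
print(emit(t2, []))  # ['mov r0, b', 'mov r1, 3', 'mul r0, r0, r1', 'mov r1, c', 'add r0, r0, r1']
```

Swapping one allocator for the other changes the register names in the output but not a single line of emit, which is the property described above.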

Is pgen the minimum code generator?

Far from it, actually. The minimum model is threaded code generation, which I haven't used since the 1970s, followed by "in-line" code generation, which is the model you would see in the check encoder of IP Pascal. That puppy has gone unused for decades, but for reference it is about 10,000 lines of code.

If you really wanted to see the minimum LOC with the current pgen, you would put the "code strips" from pgen (the parameters to wrtins) into a big honkin' table. This would actually make the net code bigger, since the code that selects which code strip to output would need to be systematized. However, it would concentrate the source code for generation into a smaller space.
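A minimal sketch of the table idea, in Python and under stated assumptions: the opcode names and the Intel-syntax AMD64 templates are illustrative, not pgen's actual code strips or its wrtins calls, and gen is a hypothetical helper. Each strip is looked up by intermediate opcode and the assigned registers are substituted into it, so the CPU-specific text lives in one table.

```python
# Hypothetical "code strip" table: intermediate opcode -> instruction templates
# (Intel-syntax AMD64, for illustration only).
# {l} holds the left operand and also receives the result,
# {r} holds the right operand, and {c} is an immediate constant.
STRIPS = {
    "ldci": ["mov {l}, {c}"],
    "adi":  ["add {l}, {r}"],
    "sbi":  ["sub {l}, {r}"],
    "mpi":  ["imul {l}, {r}"],
    "ngi":  ["neg {l}"],
}

def gen(op, l, r=None, c=None):
    """Look up the strip for op and substitute the assigned registers/constant."""
    if op not in STRIPS:
        raise ValueError(f"no code strip for {op!r}")
    # str.format ignores keyword arguments a template does not use
    return [t.format(l=l, r=r, c=c) for t in STRIPS[op]]

# b * 3 + c, assuming b is already in rax and c in rcx
code = gen("ldci", "rdx", c=3) + gen("mpi", "rax", "rdx") + gen("adi", "rax", "rcx")
print("\n".join(code))
```

The trade-off shows up as soon as an operation does not fit a single template: x86 integer division, for example, uses fixed registers (rdx:rax), so the selection code around the table has to grow to handle it.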

Is pgen a good code generator?

Pascal-P6 uses an executable intermediate model: a stack machine. Other models include, for example:

Abstract machine model (gcc).
Abstract syntax tree (llvm).

The advantages of an executable intermediate are that you can understand the intermediate in terms of what it does, and that you can also write an interpreter for it (pint). In both pgen and IP Pascal's machine.pas, the intermediate reaches all the way down to the case statement that generates the code. To me, this means that the abstraction works, and works well.
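To show why an executable intermediate is easy to interpret, here is a hypothetical Python sketch of a tiny stack-machine interpreter, vastly smaller than pint. The opcodes are illustrative P-code-style mnemonics and run is an invented helper; a real intermediate needs much more, such as procedure calls and addressing.

```python
import operator

# Hypothetical integer operators, keyed by illustrative opcode names.
BINOPS = {"adi": operator.add, "sbi": operator.sub, "mpi": operator.mul}

def run(code, env):
    """Execute a tiny stack-machine program; env maps variable names to values."""
    stack = []
    for op, *args in code:
        if op == "ldci":            # push a constant
            stack.append(args[0])
        elif op == "lodi":          # push a variable's value
            stack.append(env[args[0]])
        elif op == "stri":          # pop the top of stack into a variable
            env[args[0]] = stack.pop()
        elif op in BINOPS:          # pop right, then left, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(BINOPS[op](left, right))
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return env

# a := b * 3 + c
prog = [("lodi", "b"), ("ldci", 3), ("mpi",), ("lodi", "c"), ("adi",), ("stri", "a")]
print(run(prog, {"b": 4, "c": 5}))  # {'b': 4, 'c': 5, 'a': 17}
```

The instruction list this loop executes is the same kind of input a code generator reads, which is what lets you understand the intermediate in terms of what it does.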

The future

I have committed to producing a pgen for each of AMD64, ARM32/64, and RISC-V 32/64. That's 5 times 2000 lines, ignoring for now the commonalities between the 32- and 64-bit models (which are considerable), for a total of 10,000 LOC.

Pascaline is designed to run efficiently on 16-bit processors, as evidenced by the range and static features[1]. These are only useful on 16-bit, address-constrained models. Unlike gcc and llvm, neither Pascal-P6 nor IP Pascal is designed to favor regular instruction sets (where any register can be used with any operation). However, the reality is that neither Pascal-P6 nor IP Pascal will see a CPU that is irregular or narrower than 32 bits. There is simply not enough time left for me, nor enough interest, to do that.

[1] The range feature allows you to exactly specify the numeric range of intermediate results. The static feature allows you to specify that routines will not recurse. Both are designed for limited CPUs typical of 16-bit models. IP Pascal came from 8-bit and then 16-bit CPUs, so those features were important (8-bit CPUs are really treated as 16-bit CPUs because of addressing).

[2] This is also about what would be printed on a page at 12-point type. That's 13 double-sided sheets. Pretty reasonable to carry around.

samiam95124 · 2025-12-12T18:03:00Z

Maintainer Author

And the final number is:

3317 lines for the CPU-specific module, pgen.pas. So I underestimated. Sue me.
