replace TREE with PDTREE #127

Closed
mateuszviste opened this issue Sep 24, 2024 · 104 comments
Labels
packages SvarDOS packages

Comments

@mateuszviste
Collaborator

SvarDOS currently ships with TREE v3.7.2 from the FreeDOS project, which is originally 1995 commercial code by Dave Dunfield, GPL-ized sometime in the 2000s.

There is also pdTree: a public domain version written by @PerditionC
https://web.archive.org/web/20011212151852/http://www.darklogic.org/fdos/projects/tree/
https://github.com/FDOS/tree

Glancing at the source code, it appears to be a very clean and memory-efficient implementation. Perhaps SvarDOS could use it instead of the current GPL TREE after some minor work:

  • "port" it from TCC to OpenWatcom, this should be a formality, code looks very nice and standard
  • replace CPP parts with ANSI C (did not investigate this, but main file is CPP so I guess there must be some C++ hidden somewhere)
  • create a proper makefile
  • drop all Windows-related stuff
  • maybe remove LFN support, if it leads to any significant simplification
  • replace CATGETS with SvarLANG
  • add a few translations (DE, FR, PL, TR, ...)
  • remove (/reimplement) all GPL-tainted code (DB.C, DB.H, GET_LINE.C ? perhaps these are only CATGETS dependencies)
  • drop libc, reimplement necessary (few) std functions and link with WMINCRT
  • add some documentation/notes that explains where this TREE comes from and to avoid any confusion with KJD's original
@mateuszviste added the packages (SvarDOS packages) label Sep 24, 2024
@boeckmann
Collaborator

Well, that bullet list doesn't exactly look like "minor work". But if you think it's worth it, go for it. Maybe we should put a counter somewhere that counts the packages still being GPL-licensed, and try to converge that towards zero :)

@PerditionC

It should build with OW given it builds with MSVC, but I will check it this week. Note I have a PD implementation of cats that I plan on switching out the LGPL one with and releasing as FD TREE 4.0. It's been a while; I'm guessing it builds with a batch file. Adding a make-based build for OW should be trivial, so I will check on it. I guess I need to make sure all translations are updated; both versions use the same files for translations. There is no C++ other than maybe some variable scoping; I believe it was a naming conflict resolution. The API is based on Win32 FindFile, so removing LFN support likely won't really reduce size much.

I have to look; I may need to add a findfile using OW pragma aux.
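
For reference, OpenWatcom's libc already wraps the plain (non-LFN) DOS FindFirst/FindNext calls in <dos.h>, so a quick sketch of that route could look like the code below. This is only an illustration of the libc alternative, not pdTree code and not the pragma aux wrapper discussed here.

```c
/* Minimal directory scan via OpenWatcom's _dos_findfirst/_dos_findnext.
 * Illustrative only; pdTree itself is organised around a Win32-FindFile-style
 * API, as noted above. */
#include <stdio.h>
#include <dos.h>

static void list_entries(const char *mask) {
  struct find_t f;
  unsigned rc = _dos_findfirst(mask, _A_SUBDIR | _A_RDONLY | _A_ARCH, &f);
  while (rc == 0) {
    printf("%-12s%s\n", f.name, (f.attrib & _A_SUBDIR) ? " <DIR>" : "");
    rc = _dos_findnext(&f);
  }
}

int main(void) {
  list_entries("*.*");
  return 0;
}
```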

(Off topic, but FD FORMAT is switching back to the small model, with a small penalty to copy strings from the far buffer to a near-accessible one.)

I will try to get some updates on GitHub later this week.

@mateuszviste
Collaborator Author

Maybe we should put a counter somewhere that counts the packages still being GPL-licensed, and try to converge that towards zero :)

I did think about it for a short time, but ultimately dropped the idea because I do not want to give the impression that "fighting" with the GPL is one of SvarDOS' goals.

I plan on switching out the LGPL one with and releasing as FD TREE 4.0.

So pdTree will soon become the default FreeDOS tree, replacing the 3.7.2 version from Dave Dunfield? Cool. Is there any practical reason for it?

BTW the FreeDOS listing is somewhat confusing: it lists TREE 3.7.2 by Dave Dunfield, but provides a link to your PD version:
https://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/distributions/1.2/repos/pkg-html/tree.html

I have to look; I may need to add a findfile using OW pragma aux.

That's exactly what I was planning to do :)
Great to hear that you are still interested in this 20-year-old code!

@mateuszviste
Collaborator Author

I started doing some maintenance & trimming on pdTree here:
http://svn.svardos.org/log.php?repname=SvarDOS&path=%2Ftree%2Ftrunk%2F&isdir=1

So far I have removed most of the Windows-specific stuff, UTF-8 output, CATGETS, and wide-character support. I have also replaced all C++-isms with ANSI C equivalents, so this fork of pdTree is 100% C now.

@mateuszviste
Collaborator Author

After some more hours of hacking, slashing and reformatting, SvarDOS TREE is finally functional.
It compiles with OpenWatcom with a proper makefile, loads its translations via SvarLANG and appears to work reliably as far as I can tell. I removed a LOT of code (no LFN, no Windows support, no Unicode...), simplified many parts of it and replaced some functions with OpenWatcom-specific calls.

UPXed, it is an 11K COM file now. Not as small as I want it to be, but there is still hope to get a binary smaller than the GPL FreeDOS TREE (10K) once I get rid of printf. I will probably not go the WMINCRT route, though, since I rely heavily on OpenWatcom's libc now.
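
As an illustration of what "getting rid of printf" could look like (a sketch only, not the actual SvarDOS TREE code), string output can be routed through OpenWatcom's _dos_write wrapper around the DOS write call instead of the printf machinery:

```c
/* Illustrative sketch: printf-free string output via the DOS write call
 * (handle 1 = stdout). Not taken from the SvarDOS TREE source. */
#include <dos.h>
#include <string.h>

static void outstr(const char *s) {
  unsigned written;
  _dos_write(1, s, (unsigned)strlen(s), &written);
}

int main(void) {
  outstr("hello from TREE\r\n");
  return 0;
}
```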

Two things that I have yet to do:

  • write some (light) document that explains the history of this TREE
  • review the translation strings; there are at least some unused strings, and removing them will help decrease the footprint

@boeckmann
Collaborator

Well done :) We could spare some bytes by not writing the default language to the .lng file. We could adapt tlumacz to at least optionally skip outputting the default language. For FDISK, for example, that would save over 10k bytes. And I am sure that I will recompile the program when I have to update the default language anyway.

@mateuszviste
Collaborator Author

We could spare some bytes by not writing the default language to the .lng file.

Yes, but then it would not be possible to reload EN from within the application (if the application supports such reloads). This is used by the SvarDOS installer.

@mateuszviste
Collaborator Author

a win would be to have each language in the LNG file optionally compressed. It's human language, lots of redundancy. There must be some algorithm nowadays that is capable of in-place depacking with a reasonably small depacking code.

@boeckmann
Collaborator

I took the opportunity to make the makefile compatible with Linux. It should still work fine under DOS (at least in DOSBox it does).

@ecm-pushbx

a win would be to have each language in the LNG file optionally compressed. It's human language, lots of redundancy. There must be some algorithm nowadays that is capable of in-place depacking with a reasonably small depacking code.

You may want to look at my use of heatshrink in my Extensions for lDebug, extpak.eld and list.eld. You need a buffer the size of the depack window. If you access the compressed file in a linear way then you can re-use the same depacker state across several calls into the depacker.

@ecm-pushbx

@mateuszviste
Collaborator Author

Main depacker is in https://hg.pushbx.org/ecm/ldebug/file/a35f88de973a/source/eld/depack.asm

Seems a bit complicated. I was thinking about something much simpler. Maybe a custom algorithm that would read bytes from the data file and, when spotting a special marker (say, 0xFF O S), would know that it has to copy S bytes from offset -O in the past stream. Probably not the most efficient approach, but it should be easy to implement in any language, and it might provide good enough results for text strings. I might investigate this in the coming days, once I am done with SvarTREE.
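
To make the idea concrete, a toy depacker for exactly that scheme could look like the sketch below (the 0xFF marker with a one-byte offset and a one-byte length is the hypothetical encoding described above; escaping of literal 0xFF bytes is left out):

```c
/* Toy depacker for the hypothetical marker scheme sketched above: literal
 * bytes are copied as-is, and the sequence 0xFF O S means "copy S bytes
 * starting O bytes back in the already-decompressed output". */
#include <stddef.h>

size_t toy_depack(unsigned char *dst, const unsigned char *src, size_t srclen) {
  size_t di = 0, si = 0;
  while (si < srclen) {
    if (src[si] == 0xFF && si + 2 < srclen) {
      unsigned char off = src[si + 1];
      unsigned char len = src[si + 2];
      size_t from = di - off;
      while (len--) dst[di++] = dst[from++];
      si += 3;
    } else {
      dst[di++] = src[si++];
    }
  }
  return di; /* number of decompressed bytes */
}
```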

@mateuszviste
Collaborator Author

mateuszviste commented Sep 28, 2024

I committed an experimental change to SvarLANG: TLUMACZ now outputs two versions of the LNG file: one as usual (OUT.LNG) and the other compressed (OUTC.LNG). SvarLANG is able to read both. Older versions of SvarLANG will not crash on it; they just ignore the compressed languages.

The compression scheme is very simple, I invented it during a coffee break and implemented it within an evening hour. It is not meant to be state of the art - just somewhat better than uncompressed text and as simple as possible to decompress with no extra memory buffer. It assumes high redundancy of the compressed data, so it should work well with text, but attempting to compress binary things is likely to produce "compressed" data twice as large as the original.

It works on a simple example; I will proceed with further tests tomorrow or next week. There are also one or two tweaks I'd like to check.

@mateuszviste
Collaborator Author

Some preliminary results:

  • compressing TREE.LNG yields no benefit; the size of the strings is almost the same (8.7K uncompressed, 8.8K compressed).
  • compressing FDISK.LNG shrinks the string resources to less than half their size (72K uncompressed, 35K compressed).

TREE still works after compression. FDISK I could not check because for some reason I am not able to compile it (unrelated to SvarLANG).

@mateuszviste
Collaborator Author

SVARCOM.LNG compresses from 59K to 35K. Works properly.

This looks quite good. I think I will publish a new SvarLANG version today or tomorrow, and then migrate SVARCOM to it. I just need to test it on some real & ancient (286) hardware first to make sure it is not too slow.

@boeckmann
Collaborator

compressing FDISK.LNG shrinks the string resources to less than half their size (72K uncompressed, 35K compressed).

Awesome :) Time then for a new FDISK version.

FDISK I could not check because for some reason I am not able to compile it (unrelated to SvarLANG).

Can you give some details about the build environment? Then I can try to reproduce.

@boeckmann
Collaborator

Btw, the current FDISK.LNG (version 1.3.16) is 104k. Could it be that you are working with an older FDISK source?

@boeckmann
Collaborator

Or is it only the strings you are counting?

@mateuszviste
Collaborator Author

Could it be that you are working with an older FDISK source?

Yes, I took some old source tree that was lying on my HDD; I did not have enough courage to study git again to find out how to check out the latest code. I will download the latest source as a zip file and try again.

@mateuszviste
Collaborator Author

Tested with the latest FDISK source tree. With SvarLANG's MVCOMP the strings are:
uncompressed = 102K
compressed = 49.7K

FDISK works fine (help screen and main screen display correctly).

@boeckmann
Collaborator

Perfect! Do you mind if I add a /x switch to tlumacz to exclude the default lang? Would save another 6-8k in case of FDISK...

@mateuszviste
Collaborator Author

Perfect! Do you mind if I add a /x switch to tlumacz to exclude the default lang? Would save another 6-8k in case of FDISK...

Be my guest. :)

I think I am done with SvarLANG for now; MVCOMP is working unexpectedly well for such a quick hack.

@mateuszviste
Collaborator Author

I have moved mvcomp compression of lang blocks to a dedicated switch: TLUMACZ /comp
When compression is enabled, it will actually compress the lang block only if it is beneficial (saves at least one byte) - so using /comp will never be worse than an uncompressed LNG (in the worst case the block will be left uncompressed).
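
Schematically, the per-block decision boils down to the sketch below (function names are made up for illustration; this is not the actual TLUMACZ source):

```c
/* Sketch of the "never worse than uncompressed" rule: keep the mvcomp output
 * only if it saved at least one byte, otherwise store the block as-is.
 * mvcomp_pack() and lng_write_block() are hypothetical names. */
#include <stdio.h>

unsigned short mvcomp_pack(void *dst, const void *src, unsigned short len);
void lng_write_block(FILE *f, const void *buf, unsigned short len, int compressed);

void emit_lang_block(FILE *f, const void *raw, unsigned short rawlen, void *tmp) {
  unsigned short complen = mvcomp_pack(tmp, raw, rawlen);
  if (complen < rawlen) {
    lng_write_block(f, tmp, complen, 1);  /* mvcomp-ressed */
  } else {
    lng_write_block(f, raw, rawlen, 0);   /* left UNCOMPRESSED */
  }
}
```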

@mateuszviste
Collaborator Author

Having the compression flag on a per-language basis allows having some languages in the LNG compressed and others not. I was not sure whether it was useful to do it like that, but now I see it was a good decision. In the case of TREE, some languages compress slightly, while others do not, so TLUMACZ applies compression only where it makes sense:

lang EN mvcomp-ressed (949 bytes -> 942 bytes)
lang DE mvcomp-ressed (1001 bytes -> 990 bytes)
lang ES mvcomp-ressed (1091 bytes -> 1032 bytes)
lang FI left UNCOMPRESSED (uncomp=943 bytes ; mvcomp=956 bytes)
lang LV left UNCOMPRESSED (uncomp=948 bytes ; mvcomp=1034 bytes)
lang PT mvcomp-ressed (1019 bytes -> 978 bytes)
lang RU left UNCOMPRESSED (uncomp=959 bytes ; mvcomp=1082 bytes)
lang TR left UNCOMPRESSED (uncomp=985 bytes ; mvcomp=1038 bytes)

@mateuszviste
Collaborator Author

Perfect! Do you mind if I add a /x switch to tlumacz to exclude the default lang? Would save another 6-8k in case of FDISK...

It saves exactly 6.7K. I have added /excref to TLUMACZ.

@boeckmann
Collaborator

Oh, OK, then we both did it :D

@boeckmann
Collaborator

Also interesting: ZIP only gets the compressed FDISK.LNG 12% (!) smaller when compressing it as wmake dist is called.

@boeckmann
Collaborator

Coming within 12% of ZIP is quite remarkable for such a "simple" algorithm. And a fair bit of that can be attributed to the compression of the dictionary, I think.

@boeckmann
Collaborator

Regarding the tree makefile: did rm -f actually fail for you under DOS? Because this is a WMake internal and, according to the manual, it should work perfectly fine even on DOS.

@boeckmann
Collaborator

I was thinking about doing a similar calculation during packing, it would be almost free there, and it could be reported by mvcomp.

That's what I suggested in one of my comments. It should not only be reported but written to the file, so that a depacker can decide if it can depack in-place or has to allocate a buffer.

@ecm-pushbx

I created the mvsize tool which displays the number of paragraphs needed before the source in order to depack

Very nice. It more or less simulates depacking (without bothering to actually produce a result)

It does actually depack repeatedly. It doesn't store the result anywhere, but it does compare it with the original file. That comparison is not strictly needed (corruption of needed source data should be detectable without it), but it does so to harden the process.

and waits for a "fake overwrite" to happen.

It actually does a binary search, repeatedly depacking, where the lower bound starts out as 0 (no additional buffer allocation) and the upper bound starts out as the rounded-up size of the original file (full additional buffer allocation). In every subsequent run, the upper bound is known to work and the lower bound is 1 above a size that is known to fail.

So upper bound minus lower bound, divided by two (rounding down), plus lower bound, is the next additional buffer size to try. If depacking succeeds at this size then this size becomes the new upper bound. If it fails then this size plus 1 becomes the new lower bound.

When the two bounds match, that is the exact minimum size needed before the source payload.
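
In other words it is a plain binary search over the number of extra paragraphs; a schematic version (with try_depack() standing in for the real depack-and-verify run) could look like this:

```c
/* Schematic version of the binary search described above. try_depack(extra)
 * is a stand-in for "depack with `extra` paragraphs of room before the
 * compressed source and verify the result". */
unsigned find_min_extra_paras(unsigned uncompressed_paras,
                              int (*try_depack)(unsigned extra)) {
  unsigned lo = 0;                  /* 0 extra paragraphs: fully in-place   */
  unsigned hi = uncompressed_paras; /* a full extra buffer is known to work */
  while (lo < hi) {
    unsigned mid = lo + (hi - lo) / 2;
    if (try_depack(mid)) hi = mid;  /* success: new upper bound             */
    else lo = mid + 1;              /* failure: minimum must be above mid   */
  }
  return lo; /* exact minimum number of paragraphs needed before the source */
}
```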

I was thinking about doing a similar calculation during packing, it would be almost free there, and it could be reported by mvcomp.

This is certainly possible but you'd have to study the exact boundaries needed for that. My implementation was already done in inicomp test A mode and I just had to translate it. (By the way, mvsize is a little more hardened than inicomp because it checks the destination size returned from depack. For inicomp the destination size is implied / assumed.)

I compared the results of mvsize and inicomp. Accounting for the addition of the payload size and INIT1 section size they give the same result.

By the way, trying to pack an empty file results in "Floating point exception".

I was thinking about doing a similar calculation during packing, it would be almost free there, and it could be reported by mvcomp.

That's what I suggested in one of my comments. It should not only be reported but written to the file, so that a depacker can decide if it can depack in-place or has to allocate a buffer.

That'd change the compressed file format so I didn't want to do it yet.

@mateuszviste
Collaborator Author

It should not only be reported but written to the file, so that a depacker can decide if it can depack in-place or has to allocate a buffer.

In the case of TLUMACZ I was rather thinking of doing it in two steps:

  1. the mvcomp routine embedded in TLUMACZ calculates the necessary overhead and allocates the in-lib memory buffer accordingly (the one that is currently fixed to +5%)

  2. make the svarlang_memsz a hard requirement, ie. the library would load an LNG file and if its svarlang_memsz is less than what the LNG wants, fail. This is more strict than what we do currently, but in practice I don't think it will make much of a difference. Plus, no need to change the LNG format, just the way svarlang_memsz is calculated and how svarlang.lib validates it.

So it's similar to the idea you proposed 3 days ago, but without the "you can resort to the slower decompression".

How the "generic" (demo-like) mvpack compressor does it does not matter much since we do not rely on it for SvarLANG compression, but still it would be nice for it to be able to report the size as an information - then if anyone wants, the code is easy to adapt to other needs. After all mvcomp is targeted at programmers that wish to hack around a bit, as it's completely worthless to "normal" users.

By the way, trying to pack an empty file results in "Floating point exception".

Fixed.

I created the mvsize tool which displays the number of paragraphs

FYI - I've imported it to mvcomp's svn. I hope it's fine.

mvcomp packer is slow, compressing the debugger (< 200 KiB) takes more than a dozen seconds. Heatshrink is much faster (...)

Took a quick look at this, typed some letters, removed others. mvcomp is about 33x faster now (my compression test went from 59s to 1.7s). Is it still noticeably slower than heatshrink?

@boeckmann
Collaborator

2. make the svarlang_memsz a hard requirement, ie. the library would load an LNG file and if its svarlang_memsz is less than what the LNG wants, fail. This is more strict than what we do currently, but in practice I don't think it will make much of a difference. Plus, no need to change the LNG format, just the way svarlang_memsz is calculated and how svarlang.lib validates it.

Sounds good to me!

@ecm-pushbx

By the way, trying to pack an empty file results in "Floating point exception".

Fixed.

I imported all your changes (except the test suite) into my hg repo using a command like for rev in 15 16 18 19; do hg expo -R ../svn -r $rev | hg impo - --config hooks.pretxncommit.checkcommitmessage=true; done

The empty file case works as expected now.

I created the mvsize tool which displays the number of paragraphs

FYI - I've imported it to mvcomp's svn. I hope it's fine.

Yes, absolutely.

mvcomp packer is slow, compressing the debugger (< 200 KiB) takes more than a dozen seconds. Heatshrink is much faster (...)

Took a quick look at this, typed some letters, removed others. mvcomp is about 33x faster now (my compression test went from 59s to 1.7s).

Great! It is certainly much faster now. I also checked: the lCDebugX image that I used as a test is byte-for-byte identical to the earlier revision's output.

Is it still noticeably slower than heatshrink?

It is, but only about ten times as slow. That's still faster than some of the other packers that inicomp supports. The heatshrink method is a special case in that it may try up to 66 different combinations of -l and -w to find the best one, whereas mvcomp doesn't have any parameters and therefore a single run is enough. Here's a comparison:

ldebug/source$ 123
12345678901234567890123456789012345678901234567890123456789012345678901234567890
ldebug/source$ time (for x in $(seq 0 65); do rm -f packed && ./mvcomp ~/proj/ldebug/tmp/cdebugx.big packed > /dev/null; echo -ne x; done; echo "")
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

real    0m17.238s
user    0m16.973s
sys     0m0.257s
ldebug/source$ time ../../hstest/mvtest/hs.sh ~/proj/ldebug/tmp/cdebugx.big packed
!!.!..!...!....!.....!......!.....................................
/home/evln/proj/ldebug/tmp/cdebugx.big 18.42 %   139616 -> 113893 (-w 11 -l 3)

real    0m1.701s
user    0m1.514s
sys     0m0.187s
ldebug/source$

@mateuszviste
Collaborator Author

mateuszviste commented Oct 4, 2024

It is, but only about ten times as slow.

Ugh... still quite a difference. I will see next week if I can make it faster (edit: probably by replacing my primitive tape-rewinding buffer by a proper circular construct). I do not think that I will be able to beat heatshrink, but even a few percent closer to it would be nice.
This evening I slightly improved the compression ratio. The EDR kernel compresses to 89% of its zerocomp size now.

@mateuszviste
Copy link
Collaborator Author

Gained an extra 4x speedup on compression during the weekend and slightly improved the compression ratio. Not much more I can do; I will focus on depacker size now.

@mateuszviste
Copy link
Collaborator Author

That's what I suggested in one of my comments. It should not only be reported but written to the file, so that a depacker can decide if it can depack in-place or has to allocate a buffer.

Looking at mvcomp last evening I realized that calculating the maximum buffer for in-place decompression is not really needed. All we need is to know the "worst case scenario", and allocate this. Since I introduced support for literal strings, mvcomp's worst case scenario went down from 200% to 103%. That is max. file size after decompression is FILESIZE + (FILESIZE / 31) + 1. I've adjusted TLUMACZ accordingly.

I have also implemented a depacker in assembly. It's about 60 bytes long, excluding the compiler's function prolog and epilog. It does not make any difference for SvarLang, though - the code generated from C is about the same size. Changing it to a pragma aux might make a small difference, maybe. But there is not much to gain overall.

@boeckmann
Collaborator

boeckmann commented Oct 7, 2024

the code generated from C is about the same size. Changing it to a pragma aux might make a small difference, maybe. But there is not much to gain overall.

That does not surprise me at all. The OpenWatcom code generator is quite good. WAY better than the one Turbo / Borland C provides. It also generates quite fast code. I once compared my assembly svarlang_strid function with the disassembled OpenWatcom version. Not THAT much of a difference. While the assembly version might still be improved, I found the OpenWatcom output quite remarkable.

Disassembled C code:

0000                            svarlang_strid_:
0000  53                                push            bx
0001  51                                push            cx
0002  52                                push            dx
0003  56                                push            si
0004  57                                push            di
0005  89 C7                             mov             di,ax
0007  8B 1E 00 00                       mov             bx,_svarlang_string_count
000B  31 D2                             xor             dx,dx
000D  4B                                dec             bx
000E  83 3E 00 00 00                    cmp             word ptr _svarlang_string_count,0x0000
0013  74 23                             je              L$2
0015                            L$1:
0015  39 DA                             cmp             dx,bx
0017  77 1F                             ja              L$2
0019  89 D8                             mov             ax,bx
001B  29 D0                             sub             ax,dx
001D  D1 E8                             shr             ax,0x01
001F  D1 E8                             shr             ax,0x01
0021  01 D0                             add             ax,dx
0023  89 C6                             mov             si,ax
0025  D1 E6                             shl             si,0x01
0027  D1 E6                             shl             si,0x01
0029  8B 8C 00 00                       mov             cx,_svarlang_dict[si]
002D  39 CF                             cmp             di,cx
002F  74 10                             je              L$4
0031  76 17                             jbe             L$5
0033  89 C2                             mov             dx,ax
0035  42                                inc             dx
0036  EB DD                             jmp             L$1
0038                            L$2:
0038  B8 00 00                          mov             ax,offset DGROUP:L$31
003B                            L$3:
003B  5F                                pop             di
003C  5E                                pop             si
003D  5A                                pop             dx
003E  59                                pop             cx
003F  5B                                pop             bx
0040  C3                                ret

@ecm-pushbx

That's what I suggested in one of my comments. It should not only be reported but written to the file, so that a depacker can decide if it can depack in-place or has to allocate a buffer.

Looking at mvcomp last evening I realized that calculating the maximum buffer for in-place decompression is not really needed. All we need is to know the "worst case scenario", and allocate this. Since I introduced support for literal strings, mvcomp's worst case scenario went down from 200% to 103%. That is max. file size after decompression is FILESIZE + (FILESIZE / 31) + 1. I've adjusted TLUMACZ accordingly.

I don't understand this at all. Are you saying the worst case is the worst compression, or the best compression? If I follow, the worst (most) amount of memory needed for depacking is linked to the best compression?

I don't understand FILESIZE + (FILESIZE / 31) + 1 either. Is "FILESIZE" the uncompressed data size? Or the compressed data size?

@mateuszviste
Collaborator Author

I don't understand this at all.

That's normal, it's because I don't know what I'm doing. Or rather - I focused on the wrong end of the spectrum. :-P I will get back to this subject later during the week.

That does not surprise me at all. The OpenWatcom code generator is quite good.

I did not have high expectations; in fact, I was expecting the OW-generated code to be very compact already, as it's not the first time I fail to beat the compiler. But I did want to create an asm version anyway, if only to include it in mvcomp's set of examples. I will nonetheless try to convert my assembly into a pragma, just to have a definitive comparison. A depacker in ~60 bytes is quite nice. It will definitely fit in the EDR deblocking buffer (if/when I figure out the rest of the puzzle for this).

WAY better than the one Turbo / Borland C provides.

OW's code generation is excellent, yes. Too bad that programs are still much bigger than when compiled with Turbo C. The OW libc is heavy (also feature-full, but heavy).

@mateuszviste
Collaborator Author

With mvucomp() rewritten as a pragma aux I am able to obtain a svarlang.obj only 11 bytes smaller than with native all-C code. So it's a worthless move; I will revert to C-only code tomorrow.

I've hit one glitch on the road that I cannot explain. Isn't this:

rep movsb

supposed to be equivalent to this?

ag:
lodsb
stosb
loop ag

the rep movsb version does not seem to increment di when used in a pragma aux block, albeit it did work in my earlier version as a standalone function. I must be missing something obvious. The code is here. If I uncomment line 199 and comment lines 200-203, then di is not incremented (which obviously corrupts all the strings).

@ecm-pushbx

I've hit one glitch on the road that I cannot explain. Isn't this:

rep movsb

supposed to be equivalent to this?

ag:
lodsb
stosb
loop ag

Technically you'd have to add a jcxz before the loop and of course movsb doesn't use al. Other than that they are equivalent.

the rep movsb version does not seem to increment di when used in a pragma aux block, albeit it did work in my earlier version as a standalone function. I must be missing something obvious. The code is here. If I uncomment line 199 and comment lines 200-203, then di is not incremented (which obviously corrupts all the strings).

Please compile an example program that fails to increment di for you, and upload it. I don't see why it fails.

@ecm-pushbx

Additional question: Does the rep movsb increment di by 1 or not at all? Does it increment si?

@mateuszviste
Collaborator Author

As far as I can tell si is incremented properly, and di is not incremented at all.
I observed this on the svarlang test program (in svarlang.lib)

wmake
cd test
build.bat
test.exe

I will compile it tomorrow and upload the binary.

@boeckmann
Collaborator

boeckmann commented Oct 7, 2024

the rep movsb version does not seem to increment di when used in a pragma aux block, albeit it did work in my earlier version as a standalone function. I must be missing something obvious. The code is here. If I uncomment line 199 and comment lines 200-203, then di is not incremented (which obviously corrupts all the strings).

http://svn.svardos.org/blame.php?repname=SvarDOS&path=%2Fsvarlang.lib%2Ftrunk%2Fsvarlang.c&rev=2119#l199

The REP MOVSB is not emitted at all. It seems to be silently ignored because of the C comment inside the quoted string. Put the C comment in the above line outside the string; for me that solved it. You may use the OpenWatcom wdis command on svarlang.obj to disassemble the object file and compare the disassembled code with the one in the C file.

Index: svarlang.c
===================================================================
--- svarlang.c	(Revision 2119)
+++ svarlang.c	(Arbeitskopie)
@@ -196,11 +196,7 @@
 "    mov si, di"\
 "    sub si, ax"\
 "    /* do the copy */"\
-"    /* rep movsb *//* copy cx bytes from ds:si to es:di + inc si + inc di */"\
-"ag:"\
-"lodsb"\
-"stosb"\
-"loop ag"\
+"    rep movsb" /* copy cx bytes from ds:si to es:di + inc si + inc di */ \
 \
 "    /* restore regs */"\
 "    pop si"\

@mateuszviste
Collaborator Author

Ha, I knew it would be something stupid... always is, when the problem starts looking too esoteric. :)
Thanks for checking it out! Good idea comparing the code with its wdis version.

I did not think about rep being flatly ignored because incrementing di manually (inc di, inc di) made the test program display strings that looked better. Of course it must have been a coincidence.

@mateuszviste
Collaborator Author

Looking at mvcomp last evening I realized that calculating the maximum buffer for in-place decompression is not really needed. All we need is to know the "worst case scenario", and allocate this.

I don't understand this at all. Are you saying the worst case is the worst compression, or the best compression? If I follow, the worst (most) amount of memory needed for depacking is linked to the best compression?

No, "worst case" is "worst compressability".

I don't understand FILESIZE + (FILESIZE / 31) + 1 either. Is "FILESIZE" the uncompressed data size? Or the compressed data size?

FILESIZE is the uncompressed size. And the result of the above calculation is the size of the depacking buffer that would guarantee (if I'm right) that the compressed stream will never be overwritten during in-place decompression.
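
As a concrete illustration (my own arithmetic, not a figure from the thread): a 102400-byte uncompressed block would get a 102400 + 3303 + 1 = 105704 byte depacking buffer, i.e. roughly 3% of overhead.

```c
/* Worst-case in-place depacking buffer per the formula above, with FILESIZE
 * being the uncompressed size. Example: 102400 -> 102400 + 3303 + 1 = 105704. */
unsigned long depack_buf_size(unsigned long filesize) {
  return filesize + (filesize / 31) + 1;
}
```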

Had to think about it again to replay my thought pattern from last weekend, when I first had the idea, because it became fuzzy to me once I sobered up and when you asked for details I panicked. Let's try again:

If a chunk of data is highly compressible then we do not have to worry about the depacking buffer size, because since the compressed stream will be placed at the end of the depacking buffer, more compressible strings automatically translate to a bigger buffer, thus further placement of the compressed stream. We basically cannot lose. Zero-compressibility strings are also safe, because they advance the buffer size by exactly the amount of space they take in the compressed stream.

The problem occurs if there is data that is NOT compressible, because such data grows the depacking buffer size slower than how the compressed stream advances, and then we are at risk of catching up to the compressed stream.

Simple example: let's imagine two compressed words AAAA and BBBB. AAAA decompresses to 3 bytes and BBBB to 6 bytes. Hence:

+----buff----+
|            |
|AAAABBBBAAAA|   <-- compressed string
|aaabbbbbbaaa|   <-- decompressed string

We see here that if decompressed in-place with no margin space in the depack buffer, by the time we have decompressed BBBB we have already overwritten the first byte of the second AAAA. This is because the two AAAA's take more space in the compressed stream than they grow the buffer.

My last weekend idea was to make sure that every compressed word grows the depacking buffer by at least the amount of space it takes in the compressed stream. And to enforce this, I simply increase the depacking buffer by the "worst compression" (negative) margin.

Back to our example words: the maximum overhead here is 33% ('aaa' => 'AAAA'). So following my idea we need to increase the depacking buffer by 4 bytes:

+------buff------+
|                |
|    AAAABBBBAAAA|   <-- compressed string
|aaabbbbbbaaa    |   <-- decompressed string

and now it works.

I am severely lacking in math theory here, so I am working my way up somewhat mechanically. I might be totally wrong. If any ideas, or examples that would prove me wrong - please share.

I am also almost sure that growing the depack buffer by the "worst compression" margin is overkill, the optimal theoretical value is somewhere lower, but for mvcomp it's only 3% anyway so it has little practical impact.

@mateuszviste
Collaborator Author

mateuszviste commented Oct 8, 2024

the size of the svarlang_load() + mvucomp code is as follows:

with mvucomp as a C function = 420 bytes (mvucomp=88 bytes + svarlang_load=332 bytes)
with mvucomp as a pragma aux = 393 bytes

So mvcomp support in LNG files costs 61 bytes when implemented as a pragma, and 88 bytes when implemented as a C function.

I do not know if it makes any sense to keep the asm version for just 27 bytes of gain, but since it's done I left it for the time being and made it controllable through a -D directive in the makefile. This code might perhaps be reused later in the assembly version of the svarlang loader.

I have also extended TLUMACZ so it reports the size of the required in-place depacking buffer for any given language. Apparently for all languages only a few extra bytes are needed, between 2 and 8 depending on the language file. These few bytes are added to the 5% safety margin now.

@ecm-pushbx

Looking at mvcomp last evening I realized that calculating the maximum buffer for in-place decompression is not really needed. All we need is to know the "worst case scenario", and allocate this.

I don't understand this at all. Are you saying the worst case is the worst compression, or the best compression? If I follow, the worst (most) amount of memory needed for depacking is linked to the best compression?

No, "worst case" is "worst compressability".

My thought process is that badly compressed data (ie literals) never is a problem for in-place depacking, as they always consume at least as many source bytes as the amount of destination bytes that they produce. This is why in mvsize and inicomp I only put a pointer overlap check at the very beginning and after processing of well compressed data (ie backreference matches).

I don't understand FILESIZE + (FILESIZE / 31) + 1 either. Is "FILESIZE" the uncompressed data size? Or the compressed data size?

FILESIZE is the uncompressed size. And the result of the above calculation is the size of the depacking buffer that would guarantee (if I'm right) that the compressed stream will never be overwritten during in-place decompression.

What exactly is the buffer size? I assume you don't mean UncompressedSize + (UncompressedSize / 31) + 1 + CompressedSize (that'd be trivially equivalent to not-in-place depacking) , so is the entire buffer (including the space for the compressed payload initially written to the tail end) supposed to be UncompressedSize + (UncompressedSize / 31) + 1 ?

Had to think about it again to replay my thought pattern from last weekend, when I first had the idea, because it became fuzzy to me once I sobered up and when you asked for details I panicked. Let's try again:

If a chunk of data is highly compressible then we do not have to worry about the depacking buffer size, because since the compressed stream will be placed at the end of the depacking buffer, more compressible strings automatically translate to a bigger buffer, thus further placement of the compressed stream. We basically cannot lose.

I tried to construct a test case that'd fail this but I was unable to, your UncompressedSize + (UncompressedSize / 31) + 1 formula actually seems to work. I also compared it to the mvsize result. It still works, albeit mvsize finds a smaller minimal size than your formula:

mvcomp$ time ./mvsize ~/proj/ldebug/tmp/cdebugx.big packed
Done depack with full buffer.
SSSFSFFSFFSSSF
Done binary search, 14 attempts, needs 1557 paragraphs before source.
1557

real    0m0.007s
user    0m0.004s
sys     0m0.003s
mvcomp$ stat -c %s ~/proj/ldebug/tmp/cdebugx.big
139840
mvcomp$ stat -c %s ~/proj/ldebug/tmp/cdebugx.big
139840
mvcomp$ uncompressedsize="$(stat -c %s ~/proj/ldebug/tmp/cdebugx.big)"
mvcomp$ buffersize="$((uncompressedsize + uncompressedsize / 31 + 1))"
mvcomp$ compressedsize="$(stat -c %s packed)"
mvcomp$ mvsizelength="$((1557 * 16))"
mvcomp$ echo $((compressedsize + mvsizelength))
139852
mvcomp$ echo $((buffersize))
144351
mvcomp$

However, I think that what you implemented is yet another approach:

    /* monitor the amount of bytes that the compressed stream is "ahead" of
     * uncompressed data, this is an information used later to size a proper
     * buffer for in-place depacking */
    if (complen + litqueuelen + 2 > bytesprocessed) {
      unsigned short mvstreamlen = complen + litqueuelen + 2;
      if (*maxbytesahead < (mvstreamlen - bytesprocessed)) *maxbytesahead = mvstreamlen - bytesprocessed;
    }

I think this is probably equivalent to the mvsize implementation in its result.

Zero-compressibility strings are also safe, because they advance the buffer size by exactly the amount of space they take in the compressed stream.

The problem occurs if there is data that is NOT compressible, because such data grows the depacking buffer size slower than how the compressed stream advances, and then we are at risk of catching up to the compressed stream.

I think your approach is backwards as compared to mine: You care about the back end of the source stream "catching up" to the destination write pointer, which it can only do with uncompressible/literal data. I care about the destination write pointer "catching up" to the source read pointer, which it can only do with compressed/backreference data. In truth both conditions are the same, but my check is after backreferences (and it is exacting) whereas your approach is to grow the buffer enough (depacked size + 1/31st) so that the end of the compressed source data is never possibly reached at the end of the destination data. (At every point of progressing through the buffer, your formula holds this true for the then-end of source and the then-last destination write pointer.)

Simple example: let's imagine two compressed words AAAA and BBBB. AAAA decompresses to 3 bytes and BBBB to 6 bytes. Hence:

+----buff----+
|            |
|AAAABBBBAAAA|   <-- compressed string
|aaabbbbbbaaa|   <-- decompressed string

We see here that if decompressed in-place with no margin space in the depack buffer, by the time we have decompressed BBBB we have already overwritten the first byte of the second AAAA. This is because the two AAAA's take more space in the compressed stream than they grow the buffer.

My last weekend idea was to make sure that every compressed word grows the depacking buffer by at least the amount of space it takes in the compressed stream. And to enforce this, I simply increase the depacking buffer by the "worst compression" (negative) margin.

Back to our example words: the maximum overhead here is 33% ('aaa' => 'AAAA'). So following my idea we need to increase the depacking buffer by 4 bytes:

+------buff------+
|                |
|    AAAABBBBAAAA|   <-- compressed string
|aaabbbbbbaaa    |   <-- decompressed string

and now it works.

Note that you could grow your buffer by just a single byte in that case. Your maxbytesahead and my approach should both result in this smaller buffer size, which still works (bbbbbb destination writing doesn't corrupt the second source AAAA before it is read):

+------buff------+
|                |
| AAAABBBBAAAA   |   <-- compressed string
|aaabbbbbbaaa    |   <-- decompressed string

I am severely lacking in math theory here, so I am working my way up somewhat mechanically. I might be totally wrong. If any ideas, or examples that would prove me wrong - please share.

I am also almost sure that growing the depack buffer by the "worst compression" margin is overkill, the optimal theoretical value is somewhere lower, but for mvcomp it's only 3% anyway so it has little practical impact.

Yes, I agree, as in my example mvsize actually finds a smaller combined buffer size than your formula. However, I now think that your maxbytesahead approach will result in the same size as mvsize.

@ecm-pushbx

Note that you could grow your buffer by just a single byte in that case. Your maxbytesahead and my approach should both result in this smaller buffer size, which still works (bbbbbb destination writing doesn't corrupt the second source AAAA before it is read):

...

Yes, I agree, as in my example mvsize actually finds a smaller combined buffer size than your formula. However, I now think that your maxbytesahead approach will result in the same size as mvsize.

Modulo the fact that mvsize operates on paragraphs of course. Cannot check how the source data length is rounded up to a paragraph boundary right now as our server is down for a while today.

@mateuszviste
Collaborator Author

My thought process is that badly compressed data (ie literals) never is a problem for in-place depacking, as they always consume at least as many source bytes as the amount of destination bytes that they produce. This is why in mvsize and inicomp I only put a pointer overlap check at the very beginning and after processing of well compressed data (ie backreference matches).

It is true that they are not a problem in the sense that they will never overwrite the destination themselves. But they cause the problem indirectly: since their result is small, they do not contribute enough to the final size of the depacking buffer, and this puts the entire depacking operation at risk because the compressed stream gets dangerously close to the start of the buffer.

FILESIZE is the uncompressed size. And the result of the above calculation is the size of the depacking buffer that would guarantee (if I'm right) that the compressed stream will never be overwritten during in-place decompression.

What exactly is the buffer size? I assume you don't mean UncompressedSize + (UncompressedSize / 31) + 1 + CompressedSize (that'd be trivially equivalent to not-in-place depacking) , so is the entire buffer (including the space for the compressed payload initially written to the tail end) supposed to be UncompressedSize + (UncompressedSize / 31) + 1 ?

Yes, exactly. My postulate is that a buffer sized as uncompressed_len + (uncompressed_len / 31) + 1 guarantees that the source may be depacked in place, independently of its content and compression ratio.

Note however, that this is a theoretical thing now since TLUMACZ sizes the depacking buffer in a more practical and measured way.

I tried to construct a test case that'd fail this but I was unable to, your UncompressedSize + (UncompressedSize / 31) + 1 formula actually seems to work. I also compared it to the mvsize result. It still works, albeit mvsize finds a smaller minimal size than your formula:

This is very much expected, because mvsize operates on actual data so it calculates the real practical value. My formula is only providing a universal limit that guarantees everything that matches the uncompressedSize will depack properly in-place. But even this formula is probably an overestimation, because for the problem to happen there must be at least something that compresses well. But I am too short in my jeans to come up with a precise mathematical formula. This all came to me instinctively, I have no clue about the math mechanics behind it.

However, I think that what you implemented is yet another approach:

Yes, this is the trivial computation that I mentioned a few days ago, which matches also what Bernd was suggesting a little bit earlier. I'm just checking the length difference between compressed data and uncompressed data at compress time. It's nothing smart, just a stupid measure. But it provides the exact solution to the problem (just like your mvsize algorithm).

I think this is probably equivalent to the mvsize implementation in its result.

Most certainly, yes. It's just much simpler to do it while compressing data rather than working on already compressed data like what you did.

I think your approach is backwards as compared to mine: You care about the back end of the source stream "catching up" to the destination write pointer, which it can only do with uncompressible/literal data. I care about the destination write pointer "catching up" to the source read pointer, which it can only do with compressed/backreference data. In truth both conditions are the same, but my check is after backreferences (and it is exacting) whereas your approach is to grow the buffer enough (depacked size + 1/31st) so that the end of the compressed source data is never possibly reached at the end of the destination data. (At every point of progressing through the buffer, your formula holds this true for the then-end of source and the then-last destination write pointer.)

Exactly. That's why I answered yesterday that "I focused on the wrong end of the spectrum" - because in the meantime I forgot how weekend-Mateusz came up with this formula (plus he didn't leave me any notes) and I felt I had got it backward. But replaying the thought process today it came back to me. And it is backward indeed, but still true and has the advantage to be universal for all sets of compressed data (I think).

Note that you could grow your buffer by just a single byte in that case.

Yes, but this is data-dependent. Another set of data might fail. The formula I proposed was meant to simplify the programmer's life by not worrying at all about the data - ie. "just know that you will be able to depack in-place any compressed data of length X if you provide a buffer of size Y". There may be situations where the programmer does not know the data up front.

Your maxbytesahead and my approach should both result in this smaller buffer size, which still works (bbbbbb destination writing doesn't corrupt the second source AAAA before it is read):

True. Measuring it provides the best buffer size for the data at hand. And that's why I ended up doing it the "maxbytesahead" way in TLUMACZ.

I think SvarLANG is as ready as can reasonably be, so I will most probably release a new version tomorrow. Then I will finally get to focus on TREE again, likely publishing a first SvarDOS version later this week.

@ecm-pushbx

Note that you could grow your buffer by just a single byte in that case. Your maxbytesahead and my approach should both result in this smaller buffer size, which still works (bbbbbb destination writing doesn't corrupt the second source AAAA before it is read):

...

Yes, I agree, as in my example mvsize actually finds a smaller combined buffer size than your formula. However, I now think that your maxbytesahead approach will result in the same size as mvsize.

Modulo the fact that mvsize operates on paragraphs of course. Cannot check how the source data length is rounded up to a paragraph boundary right now as our server is down for a while today.

I'm always writing the compressed source at a paragraph boundary, and the entire depack buffer consists of the rounded-up original size plus the rounded-up packed size. Up to 15 trailing bytes in the source buffer are unused.

I also updated mvsize to fix a few wrong filename displays: https://hg.pushbx.org/ecm/mvcomp/rev/338250570cff

@ecm-pushbx

My thought process is that badly compressed data (ie literals) never is a problem for in-place depacking, as they always consume at least as many source bytes as the amount of destination bytes that they produce. This is why in mvsize and inicomp I only put a pointer overlap check at the very beginning and after processing of well compressed data (ie backreference matches).

It is true that they are not a problem in the sense that they will never overwrite the destination themselves. But they cause the problem indirectly: since their result is small, they do not contribute enough to the final size of the depacking buffer, and this puts the entire depacking operation at risk because the compressed stream gets dangerously close to the start of the buffer.

Yes, backwards as compared to my thinking.

FILESIZE is the uncompressed size. And the result of the above calculation is the size of the depacking buffer that would guarantee (if I'm right) that the compressed stream will never be overwritten during in-place decompression.

What exactly is the buffer size? I assume you don't mean UncompressedSize + (UncompressedSize / 31) + 1 + CompressedSize (that'd be trivially equivalent to not-in-place depacking) , so is the entire buffer (including the space for the compressed payload initially written to the tail end) supposed to be UncompressedSize + (UncompressedSize / 31) + 1 ?

Yes, exactly. My postulate is that a buffer sized as uncompressed_len + (uncompressed_len / 31) + 1 guarantees that the source may be depacked in place, independently of its content and compression ratio.

Note however, that this is a theoretical thing now since TLUMACZ sizes the depacking buffer in a more practical and measured way.

Indeed.

I tried to construct a test case that'd fail this but I was unable to, your UncompressedSize + (UncompressedSize / 31) + 1 formula actually seems to work. I also compared it to the mvsize result. It still works, albeit mvsize finds a smaller minimal size than your formula:

This is very much expected, because mvsize operates on actual data so it calculates the real practical value. My formula is only providing a universal limit that guarantees everything that matches the uncompressedSize will depack properly in-place. But even this formula is probably an overestimation, because for the problem to happen there must be at least something that compresses well. But I am too short in my jeans to come up with a precise mathematical formula. This all came to me instinctively, I have no clue about the math mechanics behind it.

I'm no good at maths myself, as evidenced by mvsize/inicomp using a very stubborn way to detect the minimum size.

However, I think that what you implemented is yet another approach:

Yes, this is the trivial computation that I mentioned a few days ago, which matches also what Bernd was suggesting a little bit earlier. I'm just checking the length difference between compressed data and uncompressed data at compress time. It's nothing smart, just a stupid measure. But it provides the exact solution to the problem (just like your mvsize algorithm).

Writing of which I think mvsize could do with a single run as well if it keeps track of (base-)plus-dst versus (base-)plus-src and compares them in a similar fashion to what you're doing here.

I think this is probably equivalent to the mvsize implementation in its result.

Most certainly, yes. It's just much simpler to do it while compressing data rather than working on already compressed data like what you did.

See above. I may try to do this to simplify mvsize's performance.

I think your approach is backwards as compared to mine: You care about the back end of the source stream "catching up" to the destination write pointer, which it can only do with uncompressible/literal data. I care about the destination write pointer "catching up" to the source read pointer, which it can only do with compressed/backreference data. In truth both conditions are the same, but my check is after backreferences (and it is exacting) whereas your approach is to grow the buffer enough (depacked size + 1/31st) so that the end of the compressed source data is never possibly reached at the end of the destination data. (At every point of progressing through the buffer, your formula holds this true for the then-end of source and the then-last destination write pointer.)

Exactly. That's why I answered yesterday that "I focused on the wrong end of the spectrum" - because in the meantime I forgot how weekend-Mateusz came up with this formula (plus he didn't leave me any notes) and I felt I had got it backward. But replaying the thought process today it came back to me. And it is backward indeed, but still true and has the advantage to be universal for all sets of compressed data (I think).

Yeah, likely so.

Note that you could grow your buffer by just a single byte in that case.

Yes, but this is data-dependent. Another set of data might fail. The formula I proposed was meant to simplify the programmer's life by not worrying at all about the data - ie. "just know that you will be able to depack in-place any compressed data of length X if you provide a buffer of size Y". There may be situations where the programmer does not know the data up front.

Now you've got it backward again =) In the case of the magic 1/31th formula you need to know the maximum uncompressed size. For inicomp, the uncompressed size is not known (currently) at all to the depacker, it only knows how large the source is and how much memory it has available. So it wouldn't be able to check the free size against the 1/31th formula because it doesn't know the uncompressed size.

Your maxbytesahead and my approach should both result in this smaller buffer size, which still works (bbbbbb destination writing doesn't corrupt the second source AAAA before it is read):

True. Measuring it provides the best buffer size for the data at hand. And that's why I ended up doing it the "maxbytesahead" way in TLUMACZ.

Agreed.

@mateuszviste
Collaborator Author

Now you've got it backward again =) In the case of the magic 1/31th formula you need to know the maximum uncompressed size.

Bad wording on my part, sorry. Of course you need to know the uncompressed size up front, otherwise all bets are off and you are left with only 3 unattractive options:

  • allocate the best-ratio-case buffer compressedSize * 8 + (compressedSize * 8 / 31) + 1
  • fake-decomp the stream in memory to learn its uncompressed length, then allocate a suitable buffer and decompress it for real
  • analyze the stream mvsize-style to learn its uncompressed length

It's much easier and more efficient to just save the original size somewhere (or even better, save the size of the required depacking buffer for in-place decompression).
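
One minimal way to do that would be a tiny header in front of the compressed payload, along these lines (purely a sketch; neither the LNG format nor mvcomp actually stores these fields like this):

```c
/* Hypothetical header sketch for "save the sizes somewhere": a depacker can
 * read it and decide up front whether in-place decompression fits into the
 * memory it has. Not an actual mvcomp or LNG structure. */
struct packed_hdr {
  unsigned long  uncompressed_len; /* original data size                      */
  unsigned long  compressed_len;   /* size of the payload that follows        */
  unsigned short inplace_extra;    /* extra bytes needed ahead of the source  */
};                                 /* for safe in-place depacking             */
```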

@boeckmann
Collaborator

I think SvarLANG is as ready as can reasonably be, so I will most probably release a new version tomorrow.

Perfect, then I will import it into FDISK in the next few days :)

@mateuszviste
Collaborator Author

SvarLANG 20241010 released.

...closely followed by SvarDOS TREE 20241010 (available through PKGNET or on the packages web repo).

@ecm-pushbx

  1. I updated mvsize to allow specifying a different unit, albeit it must be a power of 2.
  2. I also added a maxbytesahead calculation to the first depack run. The resulting maxunitsahead should match what the stubborn mvsize run calculates. That means if everything works we no longer have to do the entire run; just the first depack call would suffice.
  3. Fixed a few typos in the in-place.txt file.

@mateuszviste
Collaborator Author

Will pick it all up tomorrow. Thanks!
