replace TREE with PDTREE #127
Well, that bullet list doesn't exactly look like "minor work". But if you think it's worth it, go for it. Maybe we should put a counter somewhere that counts the packages still being GPL licensed, and try to converge that towards zero :)
It should build with OW given that it builds with MSVC, but I will check it this week. Note that I have a PD implementation of cats that I plan on swapping in for the GPL one and releasing as FD tree 4.0. It's been a while; I'm guessing it builds with a batch file. Adding a make-based build for OW should be trivial, so I will check on it. I guess I need to make sure all translations are updated - both versions use the same files for translations. There is no C++ other than maybe some variable scoping; I believe it was a naming conflict resolution. The API is based on Win32 FindFile, so removing LFN support likely won't really reduce size much. I have to look; I may need to add a findfile using an OW pragma aux. (Off topic, but FD FORMAT is switching back to small model, with a small penalty to copy strings from a far buffer to a near accessible one.) I will try to get some updates on GitHub later this week.
I did think about it for a short time, but ultimately dropped the idea because I do not want to give the impression that "fighting" with the GPL is one of SvarDOS' goals.
So pdTree will soon become the default FreeDOS tree, replacing the 3.7.2 version from Dave Dunfield? Cool. Is there any practical reason for it? BTW the FreeDOS listing is somewhat confusing: it lists TREE 3.7.2 by Dave Dunfield, but provides a link to your PD version:
It's exactly what I was planning to do :)
I started doing some maintenance & trimming on pdTree here: So far I removed most of the Windows-specific stuff, utf8 output, CATGETS, and wide character support. I have also replaced all C++-isms with ANSI C equivalents, so this fork of pdTree is 100% C now.
After some more hours of hacking, slashing and reformatting, SvarDOS TREE is finally functional. UPXed, it is an 11K COM file now. Not as small as I want it to be, but there is still hope to get a binary smaller than the GPL FreeDOS TREE (10K) once I get rid of printf. I will probably not go the WMINCRT route, though, since I rely heavily on OpenWatcom's libc now. Two things that I have yet to do:
Well done :) We could save some bytes by not writing the default language to the .lng file. We could adapt tlumacz to at least optionally skip outputting the default language. For FDISK, for example, that would save over 10k bytes. And I am sure that I will recompile the program when I have to update the default language anyway.
Yes, but then it would not be possible to re-load EN from within the application (if the application supports such reloads). This is used by the SvarDOS installer.
A win would be to have each language in the LNG file optionally compressed. It's human language, lots of redundancy. There must be some algorithm nowadays that is capable of in-place depacking with reasonably small depacking code.
I took the opportunity to make the makefile compatible with Linux. Should still work fine under DOS (at least in DosBox it does).
You may want to look at my use of heatshrink in my Extensions for lDebug, extpak.eld and list.eld. You need a buffer the size of the depack window. If you access the compressed file in a linear way then you can re-use the same depacker state across several calls into the depacker.
The main depacker is in https://hg.pushbx.org/ecm/ldebug/file/a35f88de973a/source/eld/depack.asm
Seems a bit complicated. I was thinking about something much simpler. Maybe a custom algorithm that would read bytes from the data file and, when spotting a special marker (say, 0xFF O S), would know that it has to copy S bytes from offset -O in the past output stream. Probably not the most efficient approach, but it should be easy to implement in any language, and might provide good enough results for text strings. I might investigate this in the coming days, once I am done with SvarTREE.
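For illustration, here is a minimal C sketch of that marker idea - not the format MVCOMP eventually used. A 0xFF byte is followed by a one-byte offset O and a one-byte count S; everything else is a literal. Escaping of literal 0xFF bytes is deliberately left out to keep it short.

```c
#include <stddef.h>

/* Depack a stream using the hypothetical "0xFF O S" marker scheme: 0xFF,
 * then offset O, then count S means "copy S bytes starting O bytes back in
 * the already-produced output"; any other byte is a plain literal.
 */
size_t demo_depack(const unsigned char *src, size_t srclen, unsigned char *dst)
{
  size_t s = 0, d = 0;
  while (s < srclen) {
    if (src[s] == 0xFF) {
      unsigned char off = src[s + 1];
      unsigned char cnt = src[s + 2];
      unsigned char i;
      for (i = 0; i < cnt; i++) {   /* byte by byte, so overlapping  */
        dst[d] = dst[d - off];      /* back-references work too      */
        d++;
      }
      s += 3;
    } else {
      dst[d++] = src[s++];          /* literal byte                  */
    }
  }
  return d;                         /* number of bytes produced      */
}
```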
I committed an experimental change to SvarLANG: TLUMACZ now outputs two versions of the LNG file: one as usual (OUT.LNG) and the other one compressed (OUTC.LNG). SvarLANG is able to read both. Older versions of SvarLANG will not crash on it, they will just ignore the compressed languages. The compression scheme is very simple, I invented it during a coffee break and implemented it within an evening hour. It is not meant to be state of the art - just somewhat better than uncompressed text and the simplest possible to decompress with no extra memory buffer. It assumes high redundancy in the compressed data, so it should work well with text, but attempting to compress binary things is likely to produce "compressed" data twice as large as the original. It works on a simple example; I will proceed with further tests tomorrow or next week. There are also one or two tweaks I'd like to check.
Some preliminary results:
TREE still works after compression. FDISK I could not check because for some reason I am not able to compile it (unrelated to SvarLANG).
SVARCOM.LNG compresses from 59K to 35K. Works properly. This looks quite good. I think I will publish a new SvarLANG version today or tomorrow, and then migrate SVARCOM to it. I just need to test it on some real & ancient (286) hardware first to make sure it is not too slow.
Awesome :) Time then for a new FDISK version.
Can you give some details about the build environment? Then I can try to reproduce.
Btw. the current FDISK.LNG (version 1.3.16) is 104k. Could it be that you are working with an older FDISK source?
Or is it only the strings you are counting?
Yes, I took some old source tree that was lying on my HDD; I did not have enough courage to study git again to figure out how to check out the latest code. I will download the latest source as a zip file and try again.
Tested with the latest FDISK source tree. With SvarLANG's MVCOMP the strings are: FDISK works fine (help screen and main screen display correctly).
Perfect! Do you mind if I add a /x switch to tlumacz to exclude the default lang? Would save another 6-8k in the case of FDISK...
Be my guest. :) I think I am done with SvarLANG for now; MVCOMP is working unexpectedly well for such a quick hack.
I have moved mvcomp compression of lang blocks to a dedicated switch:
Having the compression flag on a per-language basis makes it possible to have some languages in the LNG compressed and others not. I was not sure whether it was useful to do it like that, but now I see it was a good decision. In the case of TREE, some languages can be slightly compressed while others cannot, so TLUMACZ applies compression only where it makes sense:
That saves exactly 6.7K. I have added /excref to TLUMACZ.
Oh, ok, then we both did it :D
Also interesting: ZIP only gets the compressed FDISK.LNG 12%(!) smaller when compressing it.
Coming within 12% of ZIP is quite remarkable for such a "simple" algorithm. And a fair bit of that can be attributed to the compression of the dictionary, I think.
Regarding the tree makefile: did
That's what I suggested in one of my comments. It should not only be reported but also written to the file, so that a depacker can decide whether it can depack in-place or has to allocate a buffer.
It does actually depack repeatedly. It doesn't store the result anywhere, but it does compare it with the original file. That comparison is not strictly needed - corruption of needed source data should be detectable without it - but it does that to harden the process.
It actually does a binary search, repeatedly depacking, where lower bound starts out as 0 (no additional buffer allocation) and upper bound starts out as the rounded up size of the original file (full additional buffer allocation). In every subsequent run, upper bound is known to work and lower bound is 1 higher than certain failure. So upper bound minus lower bound, divided by two (rounding down), plus lower bound, is the next additional buffer size to try. If it succeeds to depack at this size then this size is the new upper bound. If it fails then this size plus 1 is the new lower bound. When the two bounds match then this is the exact minimum size needed before the source payload.
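A rough C sketch of that binary search follows; this is not the actual mvsize code. try_depack() stands in for a hypothetical helper that places the compressed payload at the end of a buffer of uncompressed size plus `extra` bytes, depacks in place, and reports success.

```c
/* Find the smallest extra space (in bytes) that still lets the payload
 * depack in place, by binary search. try_depack(extra) is a hypothetical
 * predicate: nonzero means the in-place depack succeeded with that much
 * extra room in front of the compressed source.
 */
unsigned long min_extra_space(unsigned long uncompressed_size,
                              int (*try_depack)(unsigned long extra))
{
  unsigned long lo = 0;                 /* no extra allocation at all      */
  unsigned long hi = uncompressed_size; /* known-good upper bound          */

  while (lo < hi) {
    unsigned long mid = lo + (hi - lo) / 2;
    if (try_depack(mid))
      hi = mid;       /* worked: mid is the new known-good upper bound     */
    else
      lo = mid + 1;   /* failed: everything up to mid is certain failure   */
  }
  return lo;          /* lo == hi == minimal extra space that still works  */
}
```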
This is certainly possible but you'd have to study the exact boundaries needed for that. My implementation was already done in inicomp test A mode and I just had to translate it. (By the way, mvsize is a little more hardened than inicomp because it checks the destination size returned from depack. For inicomp the destination size is implied / assumed.) I compared the results of mvsize and inicomp. Accounting for the addition of the payload size and INIT1 section size they give the same result. By the way, trying to pack an empty file results in "Floating point exception".
That'd change the compressed file format, so I didn't want to do it yet.
In the case of TLUMACZ I was rather thinking of doing it in two steps:
So it's similar to the idea you proposed 3 days ago, but without the "you can resort to the slower decompression" part. How the "generic" (demo-like) mvpack compressor does it does not matter much, since we do not rely on it for SvarLANG compression, but it would still be nice for it to be able to report the size as information - then, if anyone wants, the code is easy to adapt to other needs. After all, mvcomp is targeted at programmers who wish to hack around a bit, as it's completely worthless to "normal" users.
Fixed.
FYI - I've imported it into mvcomp's svn. I hope that's fine.
Took a quick look at this, typed some letters, removed others. mvcomp is about 33x faster now (my compression test went from 59s to 1.7s). Is it still noticeably slower than heatshrink?
Sounds good to me!
I imported all your changes (except the test suite) into my hg repo using a command like The empty file case works as expected now.
Yes, absolutely.
Great! It is certainly much faster now. I also checked: the lCDebugX image that I used as a test is byte-for-byte identical to the earlier revision's output.
It is, but only about ten times as slow. That's still faster than some of the other packers that inicomp supports. The heatshrink method is a special case in that it may try up to 66 different combinations of -l and -w to find the best one. mvcomp, on the other hand, doesn't have any parameters, so a single run is enough. Here's a comparison:
Ugh... still quite a difference. I will see next week if I can make it faster (edit: probably by replacing my primitive tape-rewinding buffer with a proper circular construct). I do not think that I will be able to beat heatshrink, but even getting a few percent closer to it would be nice.
Gained an extra 4x speedup on compression during the weekend and slightly improved the compression ratio. Not much more I can do. Will focus on depacker size now.
Looking at mvcomp last evening I realized that calculating the maximum buffer for in-place decompression is not really needed. All we need to know is the "worst case scenario", and allocate that. Since I introduced support for literal strings, mvcomp's worst case scenario went down from 200% to 103%, i.e. the "compressed" stream is at most 103% of the original file size.
I have also implemented a depacker in assembly. It's about 60 bytes long, excluding the compiler's function prolog and epilog. It does not make any difference for SvarLang, though - the code generated from C is about the same size. Changing it to a pragma aux might make a small difference, maybe. But there is not much to gain overall.
That does not surprise me at all. The OpenWatcom code generator is quite good - WAY better than the one Turbo / Borland C provides. It also generates quite fast code. I once compared my assembly svarlang_strid function with the disassembled OpenWatcom version. Not THAT much of a difference. While the assembly version might still be improved, I found the OpenWatcom output quite remarkable. Disassembled C code:
I don't understand this at all. Are you saying the worst case is the worst compression, or the best compression? If I follow, the worst (largest) amount of memory needed for depacking is linked to the best compression? I don't understand.
That's normal, it's because I don't know what I'm doing. Or rather - I focused on the wrong end of the spectrum. :-P I will get back to this subject later during the week.
I did not have many expectations - in fact I was expecting the OW-generated code to be very compact already; it's not the first time I've failed to beat the compiler. But I did want to create an asm version anyway, if only to include it in mvcomp's set of examples. I will nonetheless try to convert my assembly into a pragma, just to have a definitive comparison. A depacker in ~60 bytes is quite nice. It will definitely fit in the EDR deblocking buffer (if/when I figure out the rest of the puzzle for this).
OW's code generation is excellent, yes. Too bad that programs are still much bigger than when compiled with Turbo C. The OW libc is heavy (also feature-full, but heavy). |
With mvucomp() rewritten as a pragma aux I am able to obtain a svarlang.obj that is 11 bytes smaller than with the native all-C code. So it's a worthless move; I will revert to C-only code tomorrow. I've hit one glitch on the road that I cannot explain. Isn't this:
supposed to be equivalent to this?
the
Technically you'd have to add a
Please compile an example program that fails to increment
Additional question: Does the
As far as I can tell si is incremented properly, and di is not incremented at all.
I will compile it tomorrow and upload the binary.
The
Ha, I knew it would be something stupid... it always is, when the problem starts looking too esoteric. :) I did not think about rep being flatly ignored, because incrementing di manually (inc di, inc di) made the test program display strings that looked better. Of course it must have been a coincidence.
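For anyone hitting the same thing, here is a hypothetical illustration of what a working form might look like. far_copy(), its register assignments, and the guess that a lone "rep" string was what got dropped are all assumptions for this example - it is not the actual mvucomp pragma.

```c
/* Sketch for 16-bit OpenWatcom C: a far-to-far copy helper where the rep
 * prefix is written in the same in-line assembly string as the movsb it
 * modifies, so the prefix cannot be silently lost. Names and register
 * choices are invented for illustration only.
 */
void far_copy(const void far *src, void far *dst, unsigned short len);
#pragma aux far_copy =          \
    "rep movsb"                 \
    parm [ds si] [es di] [cx]   \
    modify [si di cx]
```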
No, "worst case" is "worst compressability".
FILESIZE is the uncompressed size. And the result of the above calculation is the size of the depacking buffer that would guarantee (if I'm right) that the compressed stream will never be overwritten during in-place decompression. I had to think about it again to replay my thought pattern from last weekend, when I first had the idea, because it became fuzzy to me once I sobered up, and when you asked for details I panicked. Let's try again: If a chunk of data is highly compressible then we do not have to worry about the depacking buffer size, because since the compressed stream is placed at the end of the depacking buffer, more compressible strings automatically translate to a bigger buffer, and thus a further placement of the compressed stream. We basically cannot lose. Zero-compressibility strings are also safe, because they advance the buffer size by exactly the amount of space they take in the compressed stream. The problem occurs if there is data that is NOT compressible, because such data grows the depacking buffer size more slowly than the compressed stream advances, and then we are at risk of catching up to the compressed stream. Simple example: let's imagine two compressed words AAAA and BBBB. AAAA decompresses to 3 bytes and BBBB to 6 bytes. Hence:
We see here that if decompressed in-place with no margin space in the depack buffer, by the time we have decompressed BBBB, we have already overwritten the first byte of the second AAAA. This is because the two AAAA's take more space in the compressed stream than they grow the buffer. My last weekend's idea was to make sure that every compressed word grows the depacking buffer by at least the amount of space it takes in the compressed stream. And to enforce this, I simply increase the depacking buffer by the "worst compression" (negative) margin. Back to our example words: the maximum overhead here is 33% ('aaa' => 'AAAA'). So following my idea we need to increase the depacking buffer by 4 bytes:
and now it works. I am severely lacking in math theory here, so I am working my way up somewhat mechanically. I might be totally wrong. If you have any ideas, or examples that would prove me wrong - please share. I am also almost sure that growing the depack buffer by the "worst compression" margin is overkill; the optimal theoretical value is somewhere lower, but for mvcomp it's only 3% anyway, so it has little practical impact.
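In code form, the rule of thumb above could look something like this. The exact worst-case ratio is format-dependent; the roughly 3% (about 1/31) figure for mvcomp is taken from this thread, so treat the constant as an assumption.

```c
/* Size an in-place depacking buffer from the uncompressed size alone, using
 * the "worst compression" margin idea described above. The compressed
 * payload is then loaded at the very end of this buffer and depacked towards
 * the start. WORST_DEN approximates mvcomp's ~3% worst-case expansion
 * (about 1/31); the exact value is an assumption in this sketch.
 */
#define WORST_DEN 31u

unsigned long inplace_buffer_size(unsigned long uncompressed_size)
{
  unsigned long margin = uncompressed_size / WORST_DEN + 1; /* round up */
  return uncompressed_size + margin;
}
```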
The size of the svarlang_load() + mvucomp code is as follows: with mvucomp as a C function it is 420 bytes (mvucomp = 88 bytes + svarlang_load = 332 bytes). So mvcomp support in LNG files costs 61 bytes when implemented as a pragma, and 88 bytes when implemented as a C function. I do not know whether it makes any sense to keep the asm version for just 27 bytes of gain, but since it's done I left it in for the time being and made it controllable through a -D directive in the makefile. This code might perhaps be reused later in the assembly version of the svarlang loader.
I have also extended TLUMACZ so it reports the size of the required in-place depacking buffer for any given language. Apparently all languages need only a few extra bytes, between 2 and 8 depending on the language file. These few bytes are added to the 5% safety margin now.
My thought process is that badly compressed data (i.e. literals) is never a problem for in-place depacking, as literals always consume at least as many source bytes as the number of destination bytes they produce. This is why in mvsize and inicomp I only put a pointer overlap check at the very beginning and after processing of well-compressed data (i.e. backreference matches).
What exactly is the buffer size? I assume you don't mean
I tried to construct a test case that'd fail this but I was unable to, your
However, I think that what you implemented is yet another approach:
I think this is probably equivalent to the mvsize implementation in its result.
I think your approach is backwards as compared to mine: You care about the back end of the source stream "catching up" to the destination write pointer, which it can only do with uncompressible/literal data. I care about the destination write pointer "catching up" to the source read pointer, which it can only do with compressed/backreference data. In truth both conditions are the same, but my check is after backreferences (and it is exacting) whereas your approach is to grow the buffer enough (depacked size + 1/31st) so that the end of the compressed source data is never possibly reached at the end of the destination data. (At every point of progressing through the buffer, your formula holds this true for the then-end of source and the then-last destination write pointer.)
Note that you could grow your buffer by just a single byte in that case. Your maxbytesahead and my approach should both result in this smaller buffer size, which still works (bbbbbb destination writing doesn't corrupt the second source AAAA before it is read):
Yes, I agree, as in my example mvsize actually finds a smaller combined buffer size than your formula. However, I now think that your maxbytesahead approach will result in the same size as mvsize.
...
Modulo the fact that mvsize operates on paragraphs of course. Cannot check how the source data length is rounded up to a paragraph boundary right now as our server is down for a while today. |
It is true that they are not a problem in the sense that they will never overwrite the destination themselves. But they cause the problem indirectly: since their result is small, they do not contribute enough to the final size of the depacking buffer, and this puts the entire depacking operation at risk because the compressed stream gets dangerously close to the start of the buffer.
Yes, exactly. My postulate is that a buffer sized as described above will always be sufficient. Note however that this is a theoretical thing now, since TLUMACZ sizes the depacking buffer in a more practical and measured way.
This is very much expected, because mvsize operates on actual data, so it calculates the real practical value. My formula only provides a universal limit that guarantees that anything matching the uncompressedSize will depack properly in-place. But even this formula is probably an overestimation, because for the problem to happen there must be at least something that compresses well. I am too short in my jeans to come up with a precise mathematical formula, though. This all came to me instinctively; I have no clue about the math mechanics behind it.
Yes, this is the trivial computation that I mentioned a few days ago, which also matches what Bernd was suggesting a little bit earlier. I'm just checking the length difference between compressed data and uncompressed data at compress time. It's nothing smart, just a stupid measure. But it provides the exact solution to the problem (just like your mvsize algorithm).
Most certainly, yes. It's just much simpler to do it while compressing the data rather than working on already-compressed data, as you did.
Exactly. That's why I answered yesterday that "I focused on the wrong end of the spectrum" - because in the meantime I forgot how weekend-Mateusz came up with this formula (plus he didn't leave me any notes) and I felt I had got it backward. But replaying the thought process today, it came back to me. And it is backward indeed, but still true, and it has the advantage of being universal for all sets of compressed data (I think).
Yes, but this is data-dependent. Another set of data might fail. The formula I proposed was meant to simplify the programmer's life by not worrying at all about the data - ie. "just know that you will be able to depack in-place any compressed data of length X if you provide a buffer of size Y". There may be situations where the programmer does not know the data up front.
True. Measuring it provides the best buffer size for the data at hand. And that's why I ended up doing it the "maxbytesahead" way in TLUMACZ. I think SvarLANG is as ready as it can reasonably be, so I will most probably release a new version tomorrow. Then I will finally get to focus on TREE again -- likely publishing a first SvarDOS version later this week.
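A sketch of how such a compress-time measurement could look (this is not the actual TLUMACZ code; the per-token bookkeeping is abstracted into a hypothetical struct): track how many compressed bytes have been emitted versus how many uncompressed bytes they cover, and derive the extra space needed so the write pointer can never overtake unread source data.

```c
/* Measure, at compress time, the extra space an in-place depack of this
 * particular stream needs. tok[] is a hypothetical per-token record of how
 * many compressed bytes each token occupies (emitted) and how many
 * uncompressed bytes it expands to (covered). Assumes the compressed
 * payload sits at the very end of the buffer, output is written from the
 * start, and each token is fully read before its output is written.
 */
typedef struct {
  long emitted;   /* compressed bytes this token occupies        */
  long covered;   /* uncompressed bytes this token expands to    */
} token_size;

long inplace_extra_space(const token_size *tok, unsigned count,
                         long compressed_total, long uncompressed_total)
{
  long emitted = 0, covered = 0, maxdiff = 0, margin;
  unsigned i;

  for (i = 0; i < count; i++) {
    emitted += tok[i].emitted;
    covered += tok[i].covered;
    if (covered - emitted > maxdiff)
      maxdiff = covered - emitted;   /* output running ahead of input read */
  }
  /* total in-place buffer = uncompressed_total + margin (never negative) */
  margin = maxdiff + (compressed_total - uncompressed_total);
  return (margin > 0) ? margin : 0;
}
```

On the AAAA/BBBB example above this should yield a margin of a single byte, matching the hand-worked result mentioned earlier in the thread.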
I'm always writing the compressed source at a paragraph boundary, and the entire depack buffer consists of the rounded-up original size plus the rounded-up packed size. Up to 15 trailing bytes in the source buffer are unused. I also updated mvsize to fix a few wrong filename displays: https://hg.pushbx.org/ecm/mvcomp/rev/338250570cff
Yes, backwards as compared to my thinking.
Indeed.
I'm no good at maths myself. As evidenced by mvsize/inicomp using a very stubborn way to detect the minimum size.
Writing of which I think mvsize could do with a single run as well if it keeps track of (base-)plus-dst versus (base-)plus-src and compares them in a similar fashion to what you're doing here.
See above. I may try to do this to simplify mvsize and improve its performance.
Yeah, likely so.
Now you've got it backward again =) In the case of the magic 1/31st formula you need to know the maximum uncompressed size. For inicomp, the uncompressed size is not known (currently) to the depacker at all; it only knows how large the source is and how much memory it has available. So it wouldn't be able to check the free size against the 1/31st formula, because it doesn't know the uncompressed size.
Agreed.
Bad wording on my part, sorry. Of course you need to know the uncompressed size up front, otherwise all bets are off and you are left with only 3 unattractive options:
It's much easier and more efficient to just save the original size somewhere (or even better - save the size of the required depacking buffer for in-place decompression).
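A hypothetical header along those lines (the struct and field names are invented for illustration; this is not the actual LNG or mvcomp on-disk layout):

```c
/* Store the sizes next to the compressed payload so the depacker never has
 * to guess; the third field directly answers "can I depack in place with
 * the buffer I have?". Illustrative only.
 */
typedef struct {
  unsigned long compressed_size;    /* bytes of payload that follow      */
  unsigned long uncompressed_size;  /* size of the data once depacked    */
  unsigned long inplace_buf_size;   /* buffer needed for in-place depack */
} packed_blob_header;
```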
Perfect, then I will import it into FDISK in the next few days :)
SvarLANG 20241010 released. ...closely followed by SvarDOS TREE 20241010 (available through PKGNET or on the packages web repo). |
Will pick it all up tomorrow. Thanks!
SvarDOS currently comes with TREE v3.7.2 from the FreeDOS project, itself being 1995 commercial code by Dave Dunfield that was GPL-ized somewhere in the 2000s.
There is also pdTree: a public domain version written by @PerditionC
https://web.archive.org/web/20011212151852/http://www.darklogic.org/fdos/projects/tree/
https://github.com/FDOS/tree
Glancing at the source code it appears to be a very clean and memory-efficient implementation. Perhaps SvarDOS could use it instead of the current GPL tree after some minor work: