- memory: real mem 977KB - 464KB (basic program), 8MB - 9MB (with objects)
Productions: 169,372, actions: 4,449,272, gotos: 616,826
> asizeof.asizeof([0, 1, 3])
176
> asizeof.asizeof((0, 1, 3))
152
> asizeof.asizeof(p)
7,755,824
> asizeof.asizeof(parser.action)
6,776,520
> asizeof.asizeof('stringme' * 10)
136
> asizeof.asizeof(b'stringme' * 10)
120
> asizeof.asizeof(p)
7,548,088
> write_parser_table(output_path=Path('/tmp/v1-str.pickle'))
data size: asizeof.asizeof(data)=7,322,024
PosixPath('/tmp/v1-str.pickle')
> write_parser_table(output_path=Path('/tmp/v1-bytes.pickle'))
pickle size: asizeof.asizeof(data)=11,326,744
PosixPath('/tmp/v1-bytes.pickle')
- since Python interns strings (a str with the same content is allocated only once), bytes don't bring any improvement; a sketch of the comparison is below
- there was not much difference with pickle either
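A minimal way to reproduce that comparison, assuming pympler is installed (the symbol names and table shapes here are made up):

from pympler import asizeof

symbols = ["NAME", "NEWLINE", "INDENT"]
# every dict references the same three str objects, so asizeof counts them once
str_tables = [{s: i for i, s in enumerate(symbols)} for _ in range(100)]
# each .encode() call allocates a fresh bytes object per table
bytes_tables = [{s.encode(): i for i, s in enumerate(symbols)} for _ in range(100)]

print(asizeof.asizeof(str_tables))   # smaller: shared keys
print(asizeof.asizeof(bytes_tables)) # larger: duplicated keys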
- initial sizes
- actions:
list[dict[str, int] | int]
where the index is the state id and the value is either a dict mapping symbol to action, or a single default action (see the lookup sketch below)
asizeof.asizeof(productions)=260KiB
asizeof.asizeof(actions)=6.18MiB
asizeof.asizeof(gotos)=560KiB
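For reference, a lookup over that structure might look like this (the helper name is hypothetical):

def get_action(actions: list[dict[str, int] | int], state: int, symbol: str) -> int:
    entry = actions[state]
    if isinstance(entry, int):
        return entry      # the state has one default action for every symbol
    return entry[symbol]  # per-symbol action; KeyError means a parse error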
format | file size |
---|---|
pickle | 975.55 KiB |
py | 1.61 MiB |
jsonl | 1.61 MiB |
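A rough sketch of how the three files could be produced (the helper and the .py/.jsonl layouts are assumptions, not the project's actual writer):

import json
import pickle
from pathlib import Path

def dump_all(actions: list[dict[str, int] | int], base: Path) -> None:
    # .pickle: binary dump, smallest on disk in the table above
    base.with_suffix(".pickle").write_bytes(
        pickle.dumps(actions, protocol=pickle.HIGHEST_PROTOCOL)
    )
    # .py: an importable module holding the table as a literal
    base.with_suffix(".py").write_text(f"actions = {actions!r}\n")
    # .jsonl: one state per line
    with base.with_suffix(".jsonl").open("w") as f:
        for entry in actions:
            f.write(json.dumps(entry) + "\n")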
- timings for the Xonsh v0.16 parser with
tasks.profile_mem.py::xonsh_ply_small_string()
current=9197.4KiB, peak=11233.3KiB
Total allocated size: 8932.5 KiB
Took: 1.21s
- merging overridden actions into the base class - no change in sizes:
_object_size(productions)='260.96 KiB'
_object_size(actions)='6.18 MiB'
_object_size(gotos)='560.97 KiB'
- peak memory usage by lr-table file type

benchmarks.PeakMemSuite.peakmem_parser_init_ ok [75.00%]

param1 | peak mem |
---|---|
/tmp/xonsh-lr-table.pickle | 31.5M |
/tmp/xonsh-lr-table.py | 215M |
/tmp/xonsh-lr-table.jsonl | 33.5M |

benchmarks.TrackLrParserSize.track_lr_parser_size 1/4 failed [100.00%]

param1 | size |
---|---|
/tmp/xonsh-lr-table.pickle | 7.54M |
/tmp/xonsh-lr-table.py | 4.8M |
/tmp/xonsh-lr-table.jsonl | 7.53M |
/tmp/xonsh-lr-table.cpickle | 7.54M |
- using the default-action reduction strategy reduce_to_default_action (see the sketch after the size figures):
_object_size(productions)='244.72 KiB'
_object_size(actions)='5.16 MiB' len(actions)=1661
_object_size(gotos)='465.41 KiB'
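A minimal sketch of what such a reduction pass could look like (the exact heuristic of reduce_to_default_action is assumed here):

def reduce_to_default_action(
    actions: list[dict[str, int] | int],
) -> list[dict[str, int] | int]:
    reduced: list[dict[str, int] | int] = []
    for entry in actions:
        if isinstance(entry, dict) and entry and len(set(entry.values())) == 1:
            # every symbol maps to the same action: keep only the int
            reduced.append(next(iter(entry.values())))
        else:
            reduced.append(entry)
    return reduced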
- timings of the PLY parser with
tasks.profile_mem.py::ply_small_string()
current=3370.6KiB, peak=10706.6KiB
Total allocated size: 10399.7 KiB
Took: 2.60s
current=2173.5KiB, peak=2231.1KiB
...
Total allocated size: 2166.9 KiB
Took: 1.57s
but at the same time, PLY parsing was faster while using more memory:
Took: 0.31s
current=2625.3KiB, peak=9884.7KiB
...
Total allocated size: 9799.6 KiB
- after some optimizations to the tokenizer and the PEG-generated grammar:
current=1391.4KiB, peak=1448.3KiB
...
1178 other: 418.8 KiB
Total allocated size: 1390.9 KiB
Took: 0.55s
needed to add the following to use it successfully:

from mypy_extensions import mypyc_attr

@mypyc_attr(allow_interpreted_subclasses=True)
class Parser:
    ...
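allow_interpreted_subclasses=True lets interpreted code keep subclassing the compiled Parser; the module is then compiled ahead of time, e.g. `mypyc parser.py` (module path assumed)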
with mypyc
current=2125.4KiB, peak=28725.6KiB
...
1836 other: 757.9 KiB
Total allocated size: 1387.7 KiB
Took: 4.03s
without mypyc
current=1904.7KiB, peak=1947.3KiB
...
1948 other: 596.0 KiB
Total allocated size: 1901.8 KiB
Took: 1.39s
- final peg_parser/parser.py sizes
step | size | Uniq/Code/Lines |
---|---|---|
initial | 361K | |
after removing extra spaces | 356K | |
optimize LOCATIONS | 322K | |
after repetitions deduplication | 294K | |
after short location names | 287K | 2336/9775/10083 |
optimize gathered | 261K | 2104/8665/8973 |
after ruff formatting | 229K | 2257/6006/6316 |
after mark passing from memoize | 223K | 2261/5638/5949 |
reuse rhs if rule is same | 213K | 2165/5307/5617 |
simplified seq_alts | 203K | 2122/5001/5311 |
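Several of these steps deduplicate identical generated code; a hypothetical version of the "reuse rhs if rule is same" pass over generated rule sources:

def dedup_rule_bodies(rules: dict[str, str]) -> dict[str, str]:
    # rules maps rule name -> generated body source (def line stripped, so
    # equal right-hand sides compare equal); identical bodies are emitted
    # once and every later duplicate becomes an alias assignment
    first_with_body: dict[str, str] = {}
    out: dict[str, str] = {}
    for name, body in rules.items():
        if body in first_with_body:
            out[name] = f"{name} = {first_with_body[body]}\n"
        else:
            first_with_body[body] = name
            out[name] = body
    return out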
- optimized get_last_non_whitespace_token, which brought benchmarks.PeakMemSuite.peakmem_parser_large_file runtime from 10s down to 5s
ncalls tottime percall cumtime percall filename:lineno(function)
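# before optimization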
42517 6.443 0.000 6.529 0.000 tokenizer.py:169(get_last_non_whitespace_token)
# after optimization
39117 0.056 0.000 0.123 0.000 tokenizer.py:169(get_last_non_whitespace_token)
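The tottime drop (6.443s to 0.056s) points at the function no longer rescanning the token stream on every call; one way to get that shape of win is to cache the answer as tokens arrive (an illustration with assumed token/attribute names, not necessarily the actual fix):

WHITESPACE_TYPES = {"NL", "NEWLINE", "INDENT", "DEDENT", "COMMENT"}

class TokenBuffer:
    def __init__(self):
        self.tokens = []
        self._last_non_ws = None  # maintained incrementally

    def push(self, tok):
        self.tokens.append(tok)
        if tok.type not in WHITESPACE_TYPES:
            self._last_non_ws = tok  # O(1) update instead of a backwards scan

    def get_last_non_whitespace_token(self):
        return self._last_non_ws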
- there is no huge improvement in speed from memoization
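For context, packrat memoization in a PEG parser is usually a decorator along these lines (a generic sketch with assumed _pos/_cache attributes, not the project's actual code); if most rules succeed on their first alternative, cache hits are rare and the speedup stays small:

from functools import wraps

def memoize(rule):
    @wraps(rule)
    def wrapper(self):
        key = (rule.__name__, self._pos)
        if key in self._cache:
            result, end_pos = self._cache[key]
            self._pos = end_pos          # replay the cached consumption
            return result
        result = rule(self)
        self._cache[key] = (result, self._pos)
        return result
    return wrapper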