from activity monitor for rust program

  • real memory, basic program: 977KB - 464KB
  • real memory, with objects: 8MB - 9MB

rust repr

Productions: 169,372 actions: 4,449,272 gotos: 616,826

python obj sizes

> asizeof.asizeof([0, 1, 3])
176
> asizeof.asizeof((0, 1, 3))
152

> asizeof.asizeof(p)
7,755,824
> asizeof.asizeof(parser.action)
6,776,520

> asizeof.asizeof('stringme' * 10)
136
> asizeof.asizeof(b'stringme' * 10)
120
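The container and string comparisons above can be roughly reproduced with the standard library alone. This is a minimal sketch using `sys.getsizeof`, which reports only the shallow size of each object, so the numbers will not match `asizeof` (which also counts the referenced elements). Exact values depend on the CPython version, so none are shown here.

```python
import sys

# Shallow sizes only: the ints referenced by the containers are not counted.
list_size = sys.getsizeof([0, 1, 3])
tuple_size = sys.getsizeof((0, 1, 3))

# ASCII text: one byte per character in both cases, but different headers.
str_size = sys.getsizeof("stringme" * 10)
bytes_size = sys.getsizeof(b"stringme" * 10)

print(f"list={list_size} tuple={tuple_size}")
print(f"str={str_size} bytes={bytes_size}")
```

On CPython the tuple and the bytes object come out slightly smaller than the list and the str, which matches the direction of the `asizeof` figures above.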

after using tuples instead of top-level dict

> asizeof.asizeof(p)
7,548,088

> write_parser_table(output_path=Path('/tmp/v1-str.pickle'))
data size: asizeof.asizeof(data)=7,322,024
PosixPath('/tmp/v1-str.pickle')

using bytes for action names

> write_parser_table(output_path=Path('/tmp/v1-bytes.pickle'))
pickle size: asizeof.asizeof(data)=11,326,744
PosixPath('/tmp/v1-bytes.pickle')
  • since Python interns strings (a str with the same content is allocated only once), bytes don't bring any improvement; the measured size grew from 7,322,024 to 11,326,744
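The sharing effect described above can be illustrated with a small sketch. It assumes the action names end up as shared `str` objects; in CPython that is guaranteed only for strings passed through `sys.intern`, so the explicit call here is an assumption about how the table is built, not something the notes confirm. The name `"reduce"` is a made-up action name.

```python
import sys

def build_names(count):
    # Build the same name at runtime so nothing is shared by the compiler.
    return ["".join(["re", "duce"]) for _ in range(count)]

names = build_names(1000)

# Interning maps every equal string to one shared object.
interned = [sys.intern(name) for name in names]

# Each encode() call produces a separate bytes object with the same content.
as_bytes = [name.encode() for name in names]

unique_str_objects = len({id(obj) for obj in interned})
unique_bytes_objects = len({id(obj) for obj in as_bytes})

print(f"distinct str objects:   {unique_str_objects}")
print(f"distinct bytes objects: {unique_bytes_objects}")
```

A deep-size tool counts a shared object once, so the interned strings contribute a single allocation while every bytes object is counted separately.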

tried using marshal

  • marshal made little difference compared with pickle
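A comparison like this can be sketched with the standard library. The table below is a toy stand-in (hypothetical symbols and action numbers), not the real parser table, so the byte counts it prints say nothing about the actual files.

```python
import marshal
import pickle

# Toy stand-in for a parser table: index is the state id,
# value maps a symbol to an action number.
table = [{"NAME": i, "NUMBER": -i, "$end": 0} for i in range(1000)]

pickled = pickle.dumps(table, protocol=pickle.HIGHEST_PROTOCOL)
marshalled = marshal.dumps(table)

print(f"pickle:  {len(pickled)} bytes")
print(f"marshal: {len(marshalled)} bytes")

# Both formats round-trip plain lists, dicts, strings and ints.
restored_pickle = pickle.loads(pickled)
restored_marshal = marshal.loads(marshalled)
```

Note that the marshal format is specific to the Python version that wrote it, which is one more reason to stay with pickle when the sizes are similar.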

PLY table size reduction strategies

  1. initial sizes
  • actions: list[dict[str, int] | int] where index is state id and value is either a dict of symbol to action or a default action
asizeof.asizeof(productions)=260KiB
asizeof.asizeof(actions)=6.18MiB
asizeof.asizeof(gotos)=560KiB
| type | file-size |
| --- | --- |
| pickle | 975.55 KiB |
| py | 1.61 MiB |
| jsonl | 1.61 MiB |
  • timings for Xonsh v0.16 parser with tasks.profile_mem.py::xonsh_ply_small_string()
current=9197.4KiB,  peak=11233.3KiB
Total allocated size: 8932.5 KiB
Took:  1.21s
  1. merge overridden actions to base class - no change in sizes
_object_size(productions)='260.96 KiB'
_object_size(actions)='6.18 MiB'
_object_size(gotos)='560.97 KiB'
  1. peak memory usage using lr-table.py type

benchmarks.PeakMemSuite.peakmem_parser_init_ (ok, [75.00%])

| param1 | value |
| --- | --- |
| /tmp/xonsh-lr-table.pickle | 31.5M |
| /tmp/xonsh-lr-table.py | 215M |
| /tmp/xonsh-lr-table.jsonl | 33.5M |

benchmarks.TrackLrParserSize.track_lr_parser_size (1/4 failed, [100.00%])

| param1 | value |
| --- | --- |
| /tmp/xonsh-lr-table.pickle | 7.54M |
| /tmp/xonsh-lr-table.py | 4.8M |
| /tmp/xonsh-lr-table.jsonl | 7.53M |
| /tmp/xonsh-lr-table.cpickle | 7.54M |

  1. Using default action reduction strategy reduce_to_default_action
_object_size(productions)='244.72 KiB'
_object_size(actions)='5.16 MiB' len(actions)=1661
_object_size(gotos)='465.41 KiB'
  • timings of PLY parser with tasks.profile_mem.py::ply_small_string()

current=3370.6KiB,  peak=10706.6KiB
Total allocated size: 10399.7 KiB
Took:  2.60s
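The notes do not show `reduce_to_default_action` itself, so the following is only one plausible reading of the strategy, written against the `list[dict[str, int] | int]` layout described above. It assumes PLY's convention that negative action numbers are reductions; `collapse_default_actions` and `lookup` are hypothetical names.

```python
def collapse_default_actions(actions):
    """Replace a state's dict with a single int when every symbol
    in that state triggers the same reduction (assumed: value < 0)."""
    collapsed = []
    for row in actions:
        if isinstance(row, dict) and row:
            targets = set(row.values())
            if len(targets) == 1:
                (only,) = targets
                if only < 0:
                    collapsed.append(only)
                    continue
        collapsed.append(row)
    return collapsed


def lookup(actions, state, symbol):
    """Return the action for (state, symbol), or None on a syntax error."""
    row = actions[state]
    if isinstance(row, int):
        # Default action: taken regardless of the lookahead symbol.
        return row
    return row.get(symbol)


# Toy table: state 0 shifts, state 1 always reduces by rule 2,
# state 2 is already stored as a default action.
actions = [{"NAME": 3, "NUMBER": 4}, {"PLUS": -2, "$end": -2}, -5]
collapsed = collapse_default_actions(actions)
print(collapsed)
```

One trade-off of this design: a collapsed state reduces even on a lookahead that was not in the original dict, so a syntax error is reported a few reductions later than it would be with the full table.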

with PEGen based parser

current=2173.5KiB,  peak=2231.1KiB
...
Total allocated size: 2166.9 KiB
Took:  1.57s

At the same time, PLY parsing was faster but used more memory

Took:  0.31s
current=2625.3KiB,  peak=9884.7KiB
...
Total allocated size: 9799.6 KiB
    1. after some optimizations to the tokenizer and grammar generated by PEG
current=1391.4KiB,  peak=1448.3KiB
...
1178 other: 418.8 KiB
Total allocated size: 1390.9 KiB
Took:  0.55s
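The `current=`, `peak=` and `Total allocated size` lines above look like output from the standard `tracemalloc` module. Assuming that is how they were produced, a minimal sketch of such a measurement could look like this; `profile` is a hypothetical helper, not the contents of `tasks.profile_mem.py`.

```python
import time
import tracemalloc


def profile(func, *args):
    """Run func under tracemalloc and print memory and timing figures."""
    tracemalloc.start()
    started = time.perf_counter()
    result = func(*args)
    took = time.perf_counter() - started

    # Bytes currently traced and the high-water mark since start().
    current, peak = tracemalloc.get_traced_memory()
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()

    # Sum of all live allocations, grouped by source file.
    total = sum(stat.size for stat in snapshot.statistics("filename"))
    print(f"current={current / 1024:.1f}KiB,  peak={peak / 1024:.1f}KiB")
    print(f"Total allocated size: {total / 1024:.1f} KiB")
    print(f"Took:  {took:.2f}s")
    return result, current, peak, took


# Stand-in workload; a real run would construct and call the parser here.
result, current, peak, took = profile(lambda: [str(i) for i in range(10_000)])
```

Tracing adds noticeable overhead, so timings taken under `tracemalloc` are not directly comparable with timings from an untraced benchmark run.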

Compiling with mypyc doesn't improve much

Needed to add the following to use it successfully:

from mypy_extensions import mypyc_attr


@mypyc_attr(allow_interpreted_subclasses=True)
class Parser():
    ...

with mypyc

current=2125.4KiB,  peak=28725.6KiB
...
1836 other: 757.9 KiB
Total allocated size: 1387.7 KiB
Took:  4.03s

without mypyc

current=1904.7KiB,  peak=1947.3KiB
...
1948 other: 596.0 KiB
Total allocated size: 1901.8 KiB
Took:  1.39s

PEG parser sizes

  • final peg_parser/parser.py sizes
| step | size | Uniq/Code/Lines |
| --- | --- | --- |
| initial | 361K | |
| after removing extra spaces | 356K | |
| optimize LOCATIONS | 322K | |
| after repetitions deduplication | 294K | |
| after short location names | 287K | 2336/9775/10083 |
| optimize gathered | 261K | 2104/8665/8973 |
| after ruff formatting | 229K | 2257/6006/6316 |
| after mark passing from memoize | 223K | 2261/5638/5949 |
| reuse rhs if rule is same | 213K | 2165/5307/5617 |
| simplified seq_alts | 203K | 2122/5001/5311 |
  • optimizing get_last_non_whitespace_token brought the benchmarks.PeakMemSuite.peakmem_parser_large_file runtime down from 10s to 5s
ncalls  tottime  percall  cumtime  percall filename:lineno(function)

42517    6.443    0.000    6.529    0.000 tokenizer.py:169(get_last_non_whitespace_token)

# after optimization
39117    0.056    0.000    0.123    0.000 tokenizer.py:169(get_last_non_whitespace_token)
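The notes do not say what changed inside `get_last_non_whitespace_token`. One common way to get a drop of this size is to stop rescanning the token list on every call and track the answer as tokens are appended. The sketch below shows that idea only; `Token`, `TokenBuffer` and the `WHITESPACE` set are hypothetical and not taken from `tokenizer.py`.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical set of token types treated as whitespace.
WHITESPACE = {"NEWLINE", "NL", "INDENT", "DEDENT", "COMMENT"}


@dataclass
class Token:
    type: str
    string: str


@dataclass
class TokenBuffer:
    tokens: list = field(default_factory=list)
    _last_significant: Optional[Token] = None

    def append(self, tok: Token) -> None:
        self.tokens.append(tok)
        # Remember the answer at insertion time.
        if tok.type not in WHITESPACE:
            self._last_significant = tok

    def get_last_non_whitespace_token_scan(self) -> Optional[Token]:
        # Baseline: walks the list on every call.
        for tok in reversed(self.tokens):
            if tok.type not in WHITESPACE:
                return tok
        return None

    def get_last_non_whitespace_token(self) -> Optional[Token]:
        # Cached: constant time regardless of how many tokens were seen.
        return self._last_significant


buf = TokenBuffer()
for tok in [Token("NAME", "ls"), Token("NEWLINE", "\n"), Token("NL", "\n")]:
    buf.append(tok)
print(buf.get_last_non_whitespace_token())
```

The cached version only stays correct if tokens are never removed from the buffer; a tokenizer that rewinds would need to reset the cached value as well.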

Memoization

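  • memoization does not bring a huge speed improvement: memoizing all rules roughly halves PegenParser's time (5,312 ms to 2,675 ms on the large file), but it is still slower than PlyParser (1,792 ms)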

benchmark 'large-file': 4 tests

| Name (time in ms) | Mean | StdDev | Median | IQR | Outliers | OPS | Rounds | Iterations |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| test_large_file[RuffParser] | 24.4814 (1.0) | 0.3910 (1.0) | 24.3660 (1.0) | 0.2627 (1.0) | 4;3 | 40.8473 (1.0) | 37 | 1 |
| test_large_file[TreeSitter] | 41.4037 (1.69) | 4.0364 (10.32) | 40.5053 (1.66) | 0.5786 (2.20) | 1;1 | 24.1524 (0.59) | 24 | 1 |
| test_large_file[PlyParser] | 1,791.7076 (73.19) | 16.0015 (40.92) | 1,795.4418 (73.69) | 26.7454 (101.81) | 2;0 | 0.5581 (0.01) | 5 | 1 |
| test_large_file[PegenParser-memoize-all] | 2,674.6126 (171.35) | 20.5175 (66.92) | 2,669.3158 (171.85) | 17.8950 (111.88) | 1;1 | 0.3739 (0.01) | 5 | 1 |
| test_large_file[PegenParser] | 5,312.4585 (217.00) | 44.2033 (113.05) | 5,325.4162 (218.56) | 80.0707 (304.81) | 1;0 | 0.1882 (0.00) | 5 | 1 |

benchmark 'small-string': 4 tests

| Name (time in us) | Mean | StdDev | Median | IQR | Outliers | OPS | Rounds | Iterations |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| test_small_string[RuffParser] | 6.8298 (1.0) | 0.3288 (1.0) | 6.7920 (1.0) | 0.0840 (1.0) | 12;22 | 146,417.6823 (1.0) | 773 | 1 |
| test_small_string[TreeSitter] | 10.2818 (1.51) | 1.0306 (3.13) | 10.1670 (1.50) | 0.1250 (1.49) | 10;27 | 97,259.6897 (0.66) | 786 | 1 |
| test_small_string[PlyParser] | 268.5103 (39.31) | 13.1569 (40.02) | 263.9170 (38.86) | 6.8540 (81.60) | 15;19 | 3,724.2525 (0.03) | 179 | 1 |
| test_small_string[PegenParser-memoize-all] | 405.4760 (134.34) | 41.5348 (180.66) | 400.7085 (133.57) | 34.6040 (823.40) | 13;7 | 2.4662 (0.01) | 440 | 1 |
| test_small_string[PegenParser] | 1,074.9712 (157.39) | 55.4020 (168.50) | 1,067.2500 (157.13) | 13.5200 (160.95) | 8;13 | 930.2574 (0.01) | 171 | 1 |
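For context on what the memoize-all variant is doing: packrat-style memoization caches each rule's result per input position, so a rule retried at the same position after backtracking is not re-run. The sketch below is a simplified illustration of that idea, not the generated parser's actual decorator; `TinyParser` and its `number` rule are hypothetical.

```python
import functools


def memoize(method):
    """Cache (result, end position) per (rule name, start position)."""
    @functools.wraps(method)
    def wrapper(self):
        key = (method.__name__, self.pos)
        if key in self.cache:
            # Cache hit: restore the result and jump to where the rule ended.
            result, self.pos = self.cache[key]
            return result
        result = method(self)
        self.cache[key] = (result, self.pos)
        return result
    return wrapper


class TinyParser:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.cache = {}
        self.calls = 0  # counts real (uncached) rule executions

    @memoize
    def number(self):
        self.calls += 1
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos].isdigit():
            self.pos += 1
        return self.text[start:self.pos] or None


parser = TinyParser("123+4")
first = parser.number()
parser.pos = 0            # simulate backtracking to the start
second = parser.number()  # served from the cache
print(first, second, parser.calls)
```

The cache trades memory for time, which fits the measurements above: memoizing every rule speeds up the PEG parser but each cached entry stays alive for the duration of the parse.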