Skip to content

Latest commit

 

History

History
38 lines (27 loc) · 2.53 KB

README.md

File metadata and controls

38 lines (27 loc) · 2.53 KB

python-dict-stats

Quick & dirty analysis of Python dict performance characteristics across versions

Installation and usage

Clone the repository, then run ./generate.sh (assumes Linux).

The versions.txt file lists versions of Python to try. The script will assume that the corresponding Python environments can be found like "python{version}" - so, python2.7, python3.5 etc.

An output/ folder will appear with summary reports for each Python version, and a CSV file comparing all the results.

Discussion

First see this reference link on Stack Overflow: How are Python's Built In Dictionaries Implemented?

The results should look something like:

version 2.7.x 3.5.x 3.6.x 3.7.x 3.8.x 3.9.x 3.10.x 3.11.x 3.12.x
empty dict size 280 288 240 248 64 64 64 64 64
5-key letter-to-number dict size 280 288 240 248 232 232 232 184 184
5-key number-to-letter dict size 280 288 240 248 232 232 232 224 224
10-key letter-to-number dict size 1048 480 368 376 360 360 360 272 272
10-key number-to-letter dict size 1048 480 368 376 360 360 360 352 352
resizes if 0 keys removed 1 * 1 1 1 1 1 1 1
resizes if 1 key removed 5 * 5 5 5 5 5 5 5
resizes if 2 keys removed 10 * 10 10 10 10 10 10 10
resizes if 3 keys removed 10 * 0 0 0 0 10 10 10
resizes if 4 keys removed 5 * 0 0 0 0 5 5 5
resizes if 5 keys removed 0 * 0 0 0 0 0 0 0

(The resize results for Python 3.5 are not consistent between runs.)

The first set of rows shows the size in bytes (as reported by sys.getsizeof) for the specified dicts, created directly from literals (e.g. the empty dict is created by {}). This does not count the size of the key or value objects themselves.

For the second set of rows: first, a dict is created with 5 key-value pairs; then some number of keys is deld; then a key is added. Each possible combination of keys to remove is tried, and the code reports how many combinations result in the dict being resized. (N.B. Across all versions of Python, the smallest hash table has 8 entries, and two-thirds of table entries (so, 5) may be used before a table resize is triggered. However, deleting keys may or may not "make room" for the sixth key to be added, depending on the implementation.)