Support duplicate bag in insert and lookup #1718

jgonet · 2025-06-17T14:53:08Z

~~Merge after #1614. New code after "ets hash table: rename status enum".~~

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

jgonet · 2025-06-17T14:55:32Z

src/libAtomVM/nifs.c

+        switch (key_atom) {
+            case NAMED_TABLE_ATOM: {
+                is_named = true;
+                break;
+            }
+            case PRIVATE_ATOM: {
+                private = true;
+                public = false;
+                break;
+            }
+            case PUBLIC_ATOM: {
+                private = false;
+                public = true;
+                break;
+            }
+            case SET_ATOM: {
+                is_duplicate_bag = false;
+                break;
+            }
+            case DUPLICATE_BAG_ATOM: {
+                is_duplicate_bag = true;
+                break;
+            }
+            case KEYPOS_ATOM: {
+                VALIDATE_VALUE(value, term_is_integer);
+                avm_int_t keypos = term_to_int(value);
+                if (UNLIKELY(keypos < 1)) {
+                    RAISE_ERROR(BADARG_ATOM);
+                }
+                key_index = keypos - 1;
+                break;
+            }
+            default:
+                RAISE_ERROR(BADARG_ATOM);
+        }
+        options = term_get_list_tail(options);


I was thinking about adding interop_parse_kw_opts(term kw, &opts) but this loop relies on ability to override keys when they're repeated (e.g. [set, duplicate_bag, set]). Maybe in the future.

jgonet · 2025-06-17T14:57:57Z

src/libAtomVM/ets_hashtable.c

+}
+
+// TODO: create list elsewhere, by copying terms from orig heap, appending new copied tuple and using ets_hashtable_new_node
+struct HNode *ets_hashtable_new_node_from_list(term old_tuples, term tuple, size_t key_index)


This copies old tuple list from previous node's heap, copies tuple from process' heap and merges them together on the new heap. TODO is for removing this function and using utility function to do that and then just pass new heap to new_node.

src/libAtomVM/ets.c

bettio

I'm adding a first round of comments here, I did not yet finished this review, so more comments will follow.
A rebase on latest main would defintely help, since the previous ets PR has been merged.

src/libAtomVM/ets.c

bettio · 2025-07-01T17:58:43Z

src/libAtomVM/nifs.c

    return term_invalid_term();
 }

+static inline term get_option_key(term option, term *maybe_value)


it took me a moment to find the purpose of this function, so I suggest naming it extract_option_key_value().

Also, if you plan to make something generic (while keeping a clean and simple API) that can be used elsewhere as well, returning true as value for atoms might be a good choice (while using undefined when no property is found).

bettio · 2025-07-01T19:14:45Z

src/libAtomVM/nifs.c

+        VALIDATE_VALUE(key_atom, term_is_atom);
+
+        switch (key_atom) {
+            case NAMED_TABLE_ATOM: {


fun fact: on OTP it is not a proplist, so ets:new(foo, [{named_table, true}]). raises badarg, but ets:new(fiz, [named_table]) doesn't.
So I think this can be handled with a simple change:

switch (head) { case NAMED_TABLE_ATOM: ... default: VALIDATE_VALUE(head, term_is_tuple); term value; term key_atom = get_option_key(head, &value); VALIDATE_VALUE(key_atom, term_is_atom);

ets:new(foo, [{named_table, true}]). also, let's make sure in tests that fails as expected.

src/libAtomVM/ets.c

Signed-off-by: Jakub Gonet <[email protected]>

ETS mixes position (1-based) with indexing (0-based). Tuple are 0-indexed so we should convert to it at earliest possible moment. This commit doesn't introduce any change in behavior. Signed-off-by: Jakub Gonet <[email protected]>

Signed-off-by: Jakub Gonet <[email protected]>

ETS supports setting different key position (default: first element in tuple) in ets:new/2. Checking if updated element shouldn't be placed as first element is wrong. Signed-off-by: Jakub Gonet <[email protected]>

Additionally, unifies tuple arity checks (no change in behavior). Signed-off-by: Jakub Gonet <[email protected]>

Signed-off-by: Jakub Gonet <[email protected]>

bettio

I left some minor comments, I think this PR is already quite good.

Let's remember to update this in CHANGELOG, so we can let everyone know that we have support for additional options: there should be already an entry for ETS, there is no need to add a new one since that feature has not been released yet, let's just make sure that the ets line is updated.

Important: it looks like "test_ets:FAILED" is failing on the BEAM, please check the error and let's make sure that tests behave the same way on the BEAM and AtomVM.

bettio · 2025-07-10T09:16:57Z

tests/erlang_tests/test_ets.erl

+    ok = assert_badarg(fun() -> ets:update_counter(Tid, foo, 10) end),
+    % ok = assert_badarg(fun() -> ets:update_element(Tid, foo, {1, error}) end),
+
+    % true = ets:delete_object(Tid, {bad, bad}),


In case we are leaving some commented out tests, let's write a comment why.

bettio · 2025-07-14T13:38:39Z

src/libAtomVM/ets_hashtable.c

+struct HNode *ets_hashtable_new_node(term tuple, int key_index)
 {
+    assert(term_is_tuple(tuple));
+    assert(term_get_tuple_arity(tuple) >= key_index);


Let's add an empty line after this, in order to separate the logical block with assertions from the rest of the function.

bettio · 2025-07-14T14:08:32Z

src/libAtomVM/ets.c

-    if ((size_t) term_get_tuple_arity(entry) < keypos + 1) {
+    while (term_is_nonempty_list(list)) {
+        term tuple = term_get_list_head(list);
+        nodes[last_i] = tuple_to_insertion_node(ets_table, tuple, global);


I did a check here, but I'd love a confirm from your side: in case of OOM, do we remain in a consistent state? I want to be sure we don't leave half inserted stuff or any other incosistency.

Here, yes. The main flow is to create hash table nodes first and then insert, assuming they don't fail.
If we're really low on memory, term_compare may fail and we end up in the inconsistent state (we assert for that).

Unfortunately, I don't see how this can be fixed without making hashtable code more complicated – we'd need to essentially copy entire hashtable, add new nodes and discard it if OOM.

bettio · 2025-07-14T14:12:13Z

src/libAtomVM/ets.c

+    } else if (is_duplicate_bag) {
+        assert(term_is_list(ets_entry));
+        // for tuple list and it reversed version - we don't want to copy terms in the loop
+        int _proper;


we have an UNUSED(x) macro, let's use it.
Also, in C, identifiers starting with a single underscore are reserved.

bettio · 2025-07-14T14:17:54Z

src/libAtomVM/ets.c

    }

-    bool _found = ets_hashtable_remove(ets_table->hashtable, key, ets_table->keypos, ctx->global);
+    bool _found = ets_hashtable_remove(ets_table->hashtable, key, ctx->global);


Se previous comment about underscore. We can even ignore and discard return value. No need to assign it.

I'll leave it for readability, it will be optimized out anyway.

bettio · 2025-07-14T14:20:48Z

src/libAtomVM/term.h

    const term *boxed_value = term_to_const_term_ptr(t);

    TERM_DEBUG_ASSERT((size_t) elem_index < term_get_size_from_boxed_header(boxed_value[0]));
-


This empty line can be kept.

jgonet · 2025-08-19T23:29:56Z

@bettio I discovered a problem with ets:insert(T, list) – if we have a list of [a, b, a, c], the second a wins the write, dropping the data. I think I'll split this PR and rework the hashtable internals to have two levels of mapping. One for having just key, one for entry:

struct HNode
{
    struct HNode *next;
    term key; // owned by first entry
    struct Entry *entries;
};

struct Entry {
    struct Entry* next;
    term key; // key from tuple
    term item; // a tuple
    Heap heap;
};

This way I avoid problems with list allocation – current approach is to have key -> list of tuples mapping. This is annoying when you discover you need to add another item to list because you need to allocate new heap, copy terms, etc.

Ets insertion would create a new empty hash table, insert tuples there, and pass HNodes from it to original hashtable. Single item would create single HNode with single Entry.

If you have any comments on how to simplify this design, please share.

jgonet commented Jun 17, 2025

View reviewed changes

bettio requested changes Jul 1, 2025

View reviewed changes

jgonet added 17 commits July 8, 2025 21:46

ets: add duplicate_bag option

f632d7b

Signed-off-by: Jakub Gonet <[email protected]>

ets: no prefix for internal functions, add missing headers

3700a10

Signed-off-by: Jakub Gonet <[email protected]>

term: improve docs

fbcf57f

Signed-off-by: Jakub Gonet <[email protected]>

ets: use index instead of position (NIFs)

e87cc6a

ETS mixes position (1-based) with indexing (0-based). Tuple are 0-indexed so we should convert to it at earliest possible moment. This commit doesn't introduce any change in behavior. Signed-off-by: Jakub Gonet <[email protected]>

ets: use 0-indexing for lookup/2

bf77d3b

Signed-off-by: Jakub Gonet <[email protected]>

ets: use 0-indexing for update_counter/4

d892b24

Signed-off-by: Jakub Gonet <[email protected]>

ets: fix update_counter position check

64f2bb2

ETS supports setting different key position (default: first element in tuple) in ets:new/2. Checking if updated element shouldn't be placed as first element is wrong. Signed-off-by: Jakub Gonet <[email protected]>

ets: rename ets_table->keypos to key_index

d791640

Additionally, unifies tuple arity checks (no change in behavior). Signed-off-by: Jakub Gonet <[email protected]>

ets: rename params

72ea5b2

Signed-off-by: Jakub Gonet <[email protected]>

ets: rename entry to tuple in insert

c804e42

Signed-off-by: Jakub Gonet <[email protected]>

ets: rename internal lookup function

daaabc2

Signed-off-by: Jakub Gonet <[email protected]>

ets: use has_key_at helper everywhere

492beb1

Signed-off-by: Jakub Gonet <[email protected]>

ets: support duplicate_bag in lookup/2

684d4b8

Signed-off-by: Jakub Gonet <[email protected]>

ets: support duplicate_bag in insert/2 list

98acad0

Signed-off-by: Jakub Gonet <[email protected]>

ets: remove prefix from internal functions

bca4c4f

Signed-off-by: Jakub Gonet <[email protected]>

ets: add duplicate_bag error for update_counter/4

d6eee06

Signed-off-by: Jakub Gonet <[email protected]>

ets: uncomment working tests

0e7d3f5

Signed-off-by: Jakub Gonet <[email protected]>

jgonet force-pushed the jgonet/ets-2 branch from 42fd2a1 to 0e7d3f5 Compare July 8, 2025 19:48

bettio requested changes Jul 14, 2025

View reviewed changes

		const term *boxed_value = term_to_const_term_ptr(t);

		TERM_DEBUG_ASSERT((size_t) elem_index < term_get_size_from_boxed_header(boxed_value[0]));

Support duplicate bag in insert and lookup #1718

Are you sure you want to change the base?

Support duplicate bag in insert and lookup #1718

Uh oh!

Conversation

jgonet commented Jun 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

bettio left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bettio left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jgonet commented Aug 19, 2025

Uh oh!

Uh oh!

jgonet commented Jun 17, 2025 •

edited

Loading