Skip to content

Conversation

jgonet
Copy link
Contributor

@jgonet jgonet commented Jun 17, 2025

Merge after #1614. New code after "ets hash table: rename status enum".

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

Comment on lines +3341 to +3618
switch (key_atom) {
case NAMED_TABLE_ATOM: {
is_named = true;
break;
}
case PRIVATE_ATOM: {
private = true;
public = false;
break;
}
case PUBLIC_ATOM: {
private = false;
public = true;
break;
}
case SET_ATOM: {
is_duplicate_bag = false;
break;
}
case DUPLICATE_BAG_ATOM: {
is_duplicate_bag = true;
break;
}
case KEYPOS_ATOM: {
VALIDATE_VALUE(value, term_is_integer);
avm_int_t keypos = term_to_int(value);
if (UNLIKELY(keypos < 1)) {
RAISE_ERROR(BADARG_ATOM);
}
key_index = keypos - 1;
break;
}
default:
RAISE_ERROR(BADARG_ATOM);
}
options = term_get_list_tail(options);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about adding interop_parse_kw_opts(term kw, &opts) but this loop relies on ability to override keys when they're repeated (e.g. [set, duplicate_bag, set]). Maybe in the future.

}

// TODO: create list elsewhere, by copying terms from orig heap, appending new copied tuple and using ets_hashtable_new_node
struct HNode *ets_hashtable_new_node_from_list(term old_tuples, term tuple, size_t key_index)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This copies old tuple list from previous node's heap, copies tuple from process' heap and merges them together on the new heap. TODO is for removing this function and using utility function to do that and then just pass new heap to new_node.

Copy link
Collaborator

@bettio bettio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding a first round of comments here, I did not yet finished this review, so more comments will follow.
A rebase on latest main would defintely help, since the previous ets PR has been merged.

return term_invalid_term();
}

static inline term get_option_key(term option, term *maybe_value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it took me a moment to find the purpose of this function, so I suggest naming it extract_option_key_value().

Also, if you plan to make something generic (while keeping a clean and simple API) that can be used elsewhere as well, returning true as value for atoms might be a good choice (while using undefined when no property is found).

VALIDATE_VALUE(key_atom, term_is_atom);

switch (key_atom) {
case NAMED_TABLE_ATOM: {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fun fact: on OTP it is not a proplist, so ets:new(foo, [{named_table, true}]). raises badarg, but ets:new(fiz, [named_table]) doesn't.
So I think this can be handled with a simple change:

switch (head) {
   case NAMED_TABLE_ATOM:
      ...
   default:
       VALIDATE_VALUE(head, term_is_tuple);
       term value;
       term key_atom = get_option_key(head, &value);
       VALIDATE_VALUE(key_atom, term_is_atom);

ets:new(foo, [{named_table, true}]). also, let's make sure in tests that fails as expected.

jgonet added 17 commits July 8, 2025 21:46
Signed-off-by: Jakub Gonet <[email protected]>
ETS mixes position (1-based) with indexing (0-based).
Tuple are 0-indexed so we should convert to it at earliest possible moment.

This commit doesn't introduce any change in behavior.

Signed-off-by: Jakub Gonet <[email protected]>
ETS supports setting different key position (default: first element in tuple) in ets:new/2.
Checking if updated element shouldn't be placed as first element is wrong.

Signed-off-by: Jakub Gonet <[email protected]>
Additionally, unifies tuple arity checks (no change in behavior).

Signed-off-by: Jakub Gonet <[email protected]>
Signed-off-by: Jakub Gonet <[email protected]>
Signed-off-by: Jakub Gonet <[email protected]>
Copy link
Collaborator

@bettio bettio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comments, I think this PR is already quite good.

Let's remember to update this in CHANGELOG, so we can let everyone know that we have support for additional options: there should be already an entry for ETS, there is no need to add a new one since that feature has not been released yet, let's just make sure that the ets line is updated.

Important: it looks like "test_ets:FAILED" is failing on the BEAM, please check the error and let's make sure that tests behave the same way on the BEAM and AtomVM.

ok = assert_badarg(fun() -> ets:update_counter(Tid, foo, 10) end),
% ok = assert_badarg(fun() -> ets:update_element(Tid, foo, {1, error}) end),

% true = ets:delete_object(Tid, {bad, bad}),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case we are leaving some commented out tests, let's write a comment why.

struct HNode *ets_hashtable_new_node(term tuple, int key_index)
{
assert(term_is_tuple(tuple));
assert(term_get_tuple_arity(tuple) >= key_index);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an empty line after this, in order to separate the logical block with assertions from the rest of the function.

if ((size_t) term_get_tuple_arity(entry) < keypos + 1) {
while (term_is_nonempty_list(list)) {
term tuple = term_get_list_head(list);
nodes[last_i] = tuple_to_insertion_node(ets_table, tuple, global);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a check here, but I'd love a confirm from your side: in case of OOM, do we remain in a consistent state? I want to be sure we don't leave half inserted stuff or any other incosistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, yes. The main flow is to create hash table nodes first and then insert, assuming they don't fail.
If we're really low on memory, term_compare may fail and we end up in the inconsistent state (we assert for that).

Unfortunately, I don't see how this can be fixed without making hashtable code more complicated – we'd need to essentially copy entire hashtable, add new nodes and discard it if OOM.

} else if (is_duplicate_bag) {
assert(term_is_list(ets_entry));
// for tuple list and it reversed version - we don't want to copy terms in the loop
int _proper;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have an UNUSED(x) macro, let's use it.
Also, in C, identifiers starting with a single underscore are reserved.

}

bool _found = ets_hashtable_remove(ets_table->hashtable, key, ets_table->keypos, ctx->global);
bool _found = ets_hashtable_remove(ets_table->hashtable, key, ctx->global);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Se previous comment about underscore. We can even ignore and discard return value. No need to assign it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it for readability, it will be optimized out anyway.

const term *boxed_value = term_to_const_term_ptr(t);

TERM_DEBUG_ASSERT((size_t) elem_index < term_get_size_from_boxed_header(boxed_value[0]));

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This empty line can be kept.

@jgonet
Copy link
Contributor Author

jgonet commented Aug 19, 2025

@bettio I discovered a problem with ets:insert(T, list) – if we have a list of [a, b, a, c], the second a wins the write, dropping the data. I think I'll split this PR and rework the hashtable internals to have two levels of mapping. One for having just key, one for entry:

struct HNode
{
    struct HNode *next;
    term key; // owned by first entry
    struct Entry *entries;
};

struct Entry {
    struct Entry* next;
    term key; // key from tuple
    term item; // a tuple
    Heap heap;
};
image

This way I avoid problems with list allocation – current approach is to have key -> list of tuples mapping. This is annoying when you discover you need to add another item to list because you need to allocate new heap, copy terms, etc.

Ets insertion would create a new empty hash table, insert tuples there, and pass HNodes from it to original hashtable. Single item would create single HNode with single Entry.

If you have any comments on how to simplify this design, please share.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants