@rruuaanng rruuaanng commented Oct 18, 2025

Hash constants are widely used in file verification, so
I suggest adding a hash method for strings to obtain
the hash of a string. Usage example:

```meson
hash_str = 'foobar'.hash('sha1') # return a sha1 constant
# some process
```

@rruuaanng rruuaanng marked this pull request as ready for review October 18, 2025 02:52
@bonzini
Collaborator

bonzini commented Oct 18, 2025

MD5 is insecure and totally should not be used. Maybe SHA1 or SHA256 but certainly not MD5.

@rruuaanng
Author

Maybe I should add a hash() that behaves in the same way as the cmake string().

@rruuaanng rruuaanng changed the title Add a built-in md5 function Add a built-in hash function Oct 18, 2025
@rruuaanng
Author

rruuaanng commented Oct 18, 2025

@bonzini I think it looks really great, please review it again :)

Collaborator

@bonzini bonzini left a comment

Left a few comments: testcases are needed and it should be a method rather than a function.

@rruuaanng rruuaanng changed the title Add a built-in hash function Add a hash method for strings Oct 18, 2025
- sha3_224
- sha3_256
- sha3_384
- sha3_512
Member

The Python documentation lists more algorithms that are always present.

Author

[image] It looks like there are no more :(

Member

Wrong section of the docs. That section specifically uses the language "such as these", indicating it isn't complete. Elsewhere on that page, it says:

> Constructors for hash algorithms that are always present in this module are sha1(), sha224(), sha256(), sha384(), sha512(), sha3_224(), sha3_256(), sha3_384(), sha3_512(), shake_128(), shake_256(), blake2b(), and blake2s(). md5() is normally available as well, though it may be missing or blocked if you are using a rare “FIPS compliant” build of Python. These correspond to algorithms_guaranteed.
>
> Additional algorithms may also be available if your Python distribution’s hashlib was linked against a build of OpenSSL that provides others. Others are not guaranteed available on all installations and will only be accessible by name via new(). See algorithms_available.
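
The distinction drawn in the quoted documentation can be checked directly in Python. A minimal sketch using only the standard `hashlib` module; the helper name `is_portable_algorithm` is hypothetical and is not part of Meson or of this pull request:

```python
import hashlib

def is_portable_algorithm(name: str) -> bool:
    """Return True if hashlib guarantees `name` on every platform."""
    return name in hashlib.algorithms_guaranteed

for name in ("sha256", "sha3_512", "blake2b", "shake_128"):
    print(name, is_portable_algorithm(name))

# Whether the guaranteed set is contained in what this interpreter offers.
print(hashlib.algorithms_guaranteed.issubset(hashlib.algorithms_available))

# Names outside algorithms_guaranteed are only reachable through new(),
# which raises ValueError when the algorithm is not provided.
try:
    hashlib.new("not-a-real-algorithm")
except ValueError as exc:
    print("unsupported:", exc)
```

Restricting a build-system feature to `algorithms_guaranteed` would keep build files portable across Python installations, at the cost of excluding OpenSSL-only algorithms.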

@dnicolodi
Member

Unless I'm missing something, this implementation of hash() cannot support binary files (str in Meson is always a Unicode string, and there is no facility to read a binary file into a str object anyway), so I think it is of limited use. What is the use case for adding a hash() method to the str class?
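
For context on the string-versus-bytes concern: hash functions operate on bytes, so any string method has to pick an encoding first. A minimal sketch of what such a method might do, assuming UTF-8 encoding and a hex digest as output; `hash_str` is a hypothetical helper, not the implementation in this pull request:

```python
import hashlib

def hash_str(text: str, algorithm: str) -> str:
    """Hash a Unicode string by encoding it as UTF-8 first (an assumption)."""
    if algorithm not in hashlib.algorithms_guaranteed:
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    if algorithm.startswith("shake_"):
        # SHAKE digests have no fixed length, so hexdigest() needs a length.
        raise ValueError("variable-length digests are not handled in this sketch")
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()

print(hash_str("foobar", "sha1"))
```

Because the input is encoded text, this approach cannot represent arbitrary binary data, which is the limitation raised in the comment.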

@bonzini
Collaborator

bonzini commented Oct 18, 2025

> What is the use case for adding a hash() method to the str class?

I could have used it in the past to create an opaque identifier for the version (to build a private symbol in a library for example) but it's indeed quite niche.
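
The opaque-identifier use case could be sketched as follows. Everything here is hypothetical: the `private_symbol` name, the `mylib` prefix, and the choice to truncate a SHA-256 hex digest are illustrative assumptions only.

```python
import hashlib

def private_symbol(version: str, prefix: str = "mylib") -> str:
    """Build a version-specific symbol name from a hash of the version string."""
    digest = hashlib.sha256(version.encode("utf-8")).hexdigest()
    # Hex digits and underscores keep the result a valid C identifier.
    return f"_{prefix}_private_{digest[:16]}"

print(private_symbol("1.2.3"))
```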

It would be useful to hear from the submitter though.

Member

@eli-schwartz eli-schwartz left a comment

> hash constant is widely used in file verification, so

The fs module already supports this:

```meson
fs = import('fs')

myhash = fs.hash('foo.txt', 'sha512')
```

I do not understand the purpose of hashing a string object.

@dnicolodi,

> Unless I'm missing something, this implementation of hash() cannot support binary files (str in Meson is always a Unicode string, and there is no facility to read a binary file into a str object anyway), so I think it is of limited use. What is the use case for adding a hash() method to the str class?

Yes, and the fs module (already) solves this. :)
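
By contrast, hashing a file works on raw bytes and needs no encoding step. A rough Python sketch of that idea, not Meson's actual `fs.hash` implementation; `hash_file` and the chunk size are assumptions for illustration:

```python
import hashlib
from pathlib import Path

def hash_file(path: Path, algorithm: str, chunk_size: int = 65536) -> str:
    """Hash a file's raw bytes in chunks, so binary content works too."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as stream:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use bounded for large files; the result is the same as hashing the whole content at once.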

@rruuaanng
Author

rruuaanng commented Oct 19, 2025

@eli-schwartz

> I do not understand the purpose of hashing a string object.

Alright, it seems there was a problem with my description. Actually, I hope to replicate the string() function in CMake, which can encode a string into a specified hash, such as in the following CMake example:

```cmake
string(MD5 MD5_SUM ${CMAKE_CURRENT_LIST_DIR})
if(WIN32)
  execute_process(COMMAND ${CMAKE_COMMAND}
                  -E  write_regv
                 "HKEY_CURRENT_USER\\Software\\Kitware\\CMake\\Packages\\Zephyr\;${MD5_SUM}" "${CMAKE_CURRENT_LIST_DIR}"
)
...
```

This is my first time using Meson. I believe I need a string function, but it seems that Meson doesn't have such a built-in feature.

And this can do many things, for example, using configure to generate a set of build-time signatures that can be read and verified by the program, without having to sign them within the program itself. This greatly reduces runtime overhead, since the operations are moved to build time.

Hash constants are widely used in signatures, so
I suggest adding a hash method for strings to obtain
the hash constant of a string. For its usage example:

```meson
hash_str = 'foobar'.hash('sha1') # return a sha1 constant
# some process
```

@eli-schwartz
Member

> Alright, it seems there was a problem with my description. Actually, I hope to replicate the string() function in CMake, which can encode a string into a specified hash, such as in the following CMake example:
>
> ```cmake
> string(MD5 MD5_SUM ${CMAKE_CURRENT_LIST_DIR})
> if(WIN32)
>   execute_process(COMMAND ${CMAKE_COMMAND}
>                   -E  write_regv
>                  "HKEY_CURRENT_USER\\Software\\Kitware\\CMake\\Packages\\Zephyr\;${MD5_SUM}" "${CMAKE_CURRENT_LIST_DIR}"
> )
> ...
> ```

This CMake example is based on the CMake documentation for registering a search directory path for find_package(). The md5 sum here serves as a form of UUID; it doesn't have to be md5. Formally speaking, the exact value is irrelevant and "has no meaning" (it is not even used for collating a search order).

I am not sure this is an entirely compelling argument in favor of supporting md5 in particular, or hashes at all. I suppose that if I were using a script to set a registry key anyway, I'd include UUID generation (or even cheap hashing) inside that script.

Although I do agree that in this case what you're interested in using as input is indeed a string, not a file.
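
The UUID-style alternative mentioned above could look roughly like this inside a helper script. This is a sketch under assumptions: `path_identifier` is a hypothetical name, and `uuid.uuid5` (a name-based UUID from the Python standard library) stands in for the MD5 sum in the CMake example.

```python
import uuid

def path_identifier(source_dir: str) -> str:
    """Derive a stable, opaque identifier from a directory path string."""
    # uuid5 is deterministic: the same namespace and name give the same UUID.
    return uuid.uuid5(uuid.NAMESPACE_URL, source_dir).hex

print(path_identifier("/home/user/project"))
```

Because the value only needs to be unique and stable, any deterministic function of the path would serve the same purpose.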

> And this can do many things, for example, using configure to generate a set of build-time signatures that can be read and verified by the program, without having to sign them within the program itself. This greatly reduces runtime overhead, since the operations are moved to build time.

I'm not certain that I fully understand this use case. By build time signatures do you mean security codesigning? How does this relate to a string value? (I would assume you'd want to compute the hash of a compiled artifact that the program would want to load, and requires a known matching hash before agreeing to load it.)

@rruuaanng
Author

rruuaanng commented Oct 19, 2025

> I would assume you'd want to compute the hash of a compiled artifact that the program would want to load, and requires a known matching hash before agreeing to load it.

Yes, that's one of the applications. Other examples include creating hash information for a set of build-time data, or generating a unique identifier for a file path. There may be many other scenarios as well; I believe it has great potential.

(Most importantly, it’s built-in and doesn’t require writing any external scripts. It's quite simple, which I think is good :) )
