Encapsulate non-obvious code used in a half-dozen places #23872

khwilliamson · 2025-10-22T12:54:43Z

This turns some non-obvious code into macros so that each place that uses it is easier to read and maintain.

When an operation could be executed either byte-at-a-time or word-at-a-time (or a mix of the two), this code decides which approach to use and does some of the setup for it.

This set of changes does not require a perldelta entry.

is_utf8_invariant_string_loc: This moves two variable declarations so that the ARGS_ASSERT macro comes first, and it combines one declaration with its initialization.

variant_under_utf8_count: It is better to have the ARGS_ASSERT macro first in the function.

This macro encapsulates the task of finding how far the passed-in address is from the next word boundary. Several places could use it directly, but instead of converting them to do so, the next commit creates macros that depend on this one, and those places will be converted to use the new macros instead.
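
As a rough illustration only, the C sketch below shows one common way to compute the distance to the next word boundary. `WORDSIZE` and `BYTES_TO_WORD_BOUNDARY` are hypothetical stand-ins, not the actual perl macros, and the word size is assumed to be `sizeof(size_t)`, which must be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: treat the natural word as a size_t. */
#define WORDSIZE sizeof(size_t)

/* The mask trick below only works if the word size is a power of two. */
static_assert((WORDSIZE & (WORDSIZE - 1)) == 0,
              "word size must be a power of two");

/* Hypothetical macro: number of bytes from address 'p' up to the next
 * word-aligned address (0 if 'p' is already aligned).  Negating the
 * address (modulo 2^N) and keeping only the low bits yields the
 * distance to the next multiple of WORDSIZE. */
#define BYTES_TO_WORD_BOUNDARY(p) \
    ((size_t) (-(uintptr_t) (p) & (uintptr_t) (WORDSIZE - 1)))
```

An address already on a boundary yields 0, and an address one byte past a boundary yields `WORDSIZE - 1`.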

There are several places in the perl core that, for performance, use word-at-a-time operations on byte data when the data to be processed is long enough to overcome the extra setup overhead. The code that does this is not immediately obvious, and is currently repeated at each such place. This commit creates two macros that encapsulate this logic, making each place that uses them easier to read. One macro is for data that isn't dependent on the character set; the other is for character data. EBCDIC data is not suitable for per-word operation, so the character-data macro always returns false on an EBCDIC platform. This allows for the removal of some EBCDIC #ifdefs in our code base.
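
A minimal sketch of the kind of decision these macros make, under stated assumptions: the threshold of two full words, the function names, and the `EBCDIC` guard are illustrative only, and the real perl macros and thresholds may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORDSIZE sizeof(size_t)

/* Bytes from 'p' to the next word-aligned address (0 if aligned). */
#define BYTES_TO_WORD_BOUNDARY(p) \
    ((size_t) (-(uintptr_t) (p) & (uintptr_t) (WORDSIZE - 1)))

/* Hypothetical tuning knob: require at least this many full words after
 * the leading partial word before word-at-a-time pays for its setup. */
#define MIN_FULL_WORDS 2

/* Binary (character-set-independent) data: worth a per-word loop if
 * [s, e) still holds MIN_FULL_WORDS words after skipping the bytes
 * before the first word boundary. */
static inline int
worth_per_word_loop(const unsigned char *s, const unsigned char *e)
{
    assert(s <= e);
    size_t len = (size_t) (e - s);

    /* Written as an addition so a short range cannot underflow. */
    return len >= BYTES_TO_WORD_BOUNDARY(s) + MIN_FULL_WORDS * WORDSIZE;
}

/* Character data: the per-word tricks assume an ASCII-like encoding,
 * so on an EBCDIC platform this always says no. */
static inline int
worth_per_word_char_loop(const unsigned char *s, const unsigned char *e)
{
#ifdef EBCDIC
    (void) s;
    (void) e;
    return 0;
#else
    return worth_per_word_loop(s, e);
#endif
}
```

Putting the EBCDIC test inside the character-data variant is what lets callers drop their own `#ifdef`s: the call site looks the same on every platform.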

This converts the places that could benefit from WORTH_PER_WORD_LOOP() (and its kin) to use them.
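
To show the general shape of such a call site, here is a hypothetical check that a byte range contains only bytes below 0x80 (the UTF-8 invariants on ASCII platforms). The function name, the two-word threshold, and the use of `memcpy` for the word loads are assumptions for this sketch; perl's own loops are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORDSIZE sizeof(size_t)
#define BYTES_TO_WORD_BOUNDARY(p) \
    ((size_t) (-(uintptr_t) (p) & (uintptr_t) (WORDSIZE - 1)))

/* 0x8080...80: the high bit of every byte in a word. */
#define HIGH_BITS (((size_t) -1 / 0xFF) * 0x80)

/* Hypothetical: true if every byte in [s, e) is below 0x80. */
static bool
all_bytes_invariant(const unsigned char *s, const unsigned char *e)
{
    assert(s <= e);

    /* Only set up the word loop if there is enough data to pay for it. */
    if ((size_t) (e - s) >= BYTES_TO_WORD_BOUNDARY(s) + 2 * WORDSIZE) {

        /* Per-byte until 's' is word-aligned. */
        for (size_t n = BYTES_TO_WORD_BOUNDARY(s); n > 0; n--, s++) {
            if (*s & 0x80) return false;
        }

        /* Per-word while at least one full word remains. */
        while ((size_t) (e - s) >= WORDSIZE) {
            size_t word;
            memcpy(&word, s, WORDSIZE);  /* avoids strict-aliasing issues */
            if (word & HIGH_BITS) return false;
            s += WORDSIZE;
        }
    }

    /* Per-byte for the tail, or for the whole range if it was short. */
    for (; s < e; s++) {
        if (*s & 0x80) return false;
    }
    return true;
}
```

For a short range the word loop is skipped entirely and only the final per-byte loop runs.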

PERL_IS_SUBWORD_ADDR has been subsumed by BYTES_REMAINING_IN_WORD and is no longer used, so this removes it.

variant_byte_number: Instead of refusing to compile on platforms with an unusual byte ordering, this handles that case, which turns out to be easy.
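
The commit's own code is not shown here. As an illustration of why an unusual byte order can be handled instead of rejected, this hypothetical sketch finds the first byte, in memory order, whose high bit is set in a word. It uses the GCC/Clang bit-scan builtins when the byte order is known to be little- or big-endian, and otherwise falls back to a plain byte scan. The function name and the fallback strategy are assumptions, and 8-bit bytes are assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

/* Hypothetical: 'marks' has only the high bits of bytes set
 * (e.g. word & 0x8080...80).  Returns the memory-order index of the
 * first byte whose high bit is set, or sizeof(size_t) if none is. */
static size_t
first_marked_byte(size_t marks)
{
    if (marks == 0) return sizeof marks;  /* builtins are undefined for 0 */

#if defined(__GNUC__) && defined(__BYTE_ORDER__) \
 && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    /* Lowest address is the least significant byte: count trailing zeros. */
    return (size_t) __builtin_ctzll(marks) / 8;
#elif defined(__GNUC__) && defined(__BYTE_ORDER__) \
   && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    /* Lowest address is the most significant byte: count leading zeros,
     * discounting any extra width of unsigned long long over size_t. */
    return ((size_t) __builtin_clzll(marks)
            - 8 * (sizeof(unsigned long long) - sizeof marks)) / 8;
#else
    /* Any other (or unknown) byte order: inspect the bytes directly. */
    unsigned char bytes[sizeof marks];
    memcpy(bytes, &marks, sizeof marks);
    for (size_t i = 0; i < sizeof marks; i++) {
        if (bytes[i]) return i;
    }
    return sizeof marks;  /* not reached, since marks != 0 */
#endif
}
```

The fallback is slower than a single bit-scan instruction, but it is correct for any byte order, which is why compiling an error for such platforms is unnecessary.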

The 'variant_byte_number' function was written to find the byte number, within a word, of the first byte whose meaning varies depending on whether the string it is part of is encoded in UTF-8. On ASCII machines, that is simply a byte with the upper bit set. On EBCDIC machines there is no similar pattern, so this function hasn't been compiled on those. A long time ago, I realized that this function could also handle binary data, by first coercing each byte so that its upper bit is set or clear depending on whether it matches the pattern being looked for, and then calling the function. But until now I hadn't realized that what was being worked on was binary data not tied to any character set. This commit rectifies that: a new alias for the function emphasizes that it works on binary data, the function is now compiled on EBCDIC platforms, and the EBCDIC-only code that avoided using it is removed.
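
One well-known way to coerce binary data into that form is shown below as a hypothetical sketch, not perl's actual code. To look for a particular byte value, XOR the word with that value replicated into every byte, so matching bytes become zero, then turn each zero byte into a byte with only its high bit set. The result can be handed to a "first byte with the high bit set" routine; here a simple memory-order scan stands in for `variant_byte_number`, and the function name `find_byte_in_word` is invented for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

static_assert(CHAR_BIT == 8, "the byte masks below assume 8-bit bytes");

/* Hypothetical: memory-order index of the first byte of 'word' equal
 * to 'c', or sizeof(size_t) if there is none. */
static size_t
find_byte_in_word(size_t word, unsigned char c)
{
    const size_t ones = (size_t) -1 / 0xFF;  /* 0x0101...01 */
    const size_t low7 = ones * 0x7F;         /* 0x7F7F...7F */

    /* Bytes equal to 'c' become 0x00; all others are nonzero. */
    size_t x = word ^ (ones * c);

    /* Per byte, (x & 0x7F) + 0x7F sets the high bit iff the low 7 bits
     * are nonzero, and it can never carry into the next byte.  OR-ing in
     * x and 0x7F..7F, then complementing, leaves exactly the high bit set
     * in the bytes that were zero and clears every other bit. */
    size_t marks = ~(((x & low7) + low7) | x | low7);

    /* Stand-in for variant_byte_number(): first byte with high bit set. */
    unsigned char bytes[sizeof marks];
    memcpy(bytes, &marks, sizeof marks);
    for (size_t i = 0; i < sizeof marks; i++) {
        if (bytes[i]) return i;
    }
    return sizeof marks;
}
```

This variant of the zero-byte test is chosen because it has no cross-byte carries, so the first flagged byte is correct on any byte order. The shorter `(x - 0x01..01) & ~x & 0x80..80` form can flag false positives in more significant bytes, which precede the true match on big-endian machines.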

khwilliamson added 11 commits October 22, 2025 06:36

is_utf8_invariant_string_loc: mv declarations, do init

0b6ba69

variant_under_utf8_count: mv ARGS_ASSERT

f414db4

regexec.c: Move declarations to the point of initialization

0547e32

Use new WORTH_PER_WORD_LOOP()

25227ee

utf8.c: Clarify comment

4d39196

Remove PERL_IS_SUBWORD_ADDR

a34159b

inline.h: Remove trailing blanks

7dff316

variant_byte_number: Handle unusual byte ordering

ed6b6e2
