@khwilliamson
Contributor

This turns some repeated code into macros so that each place that uses it is easier to read and maintain.

The code decides, when there is a choice, whether to execute some operation on a per-byte basis and/or a per-word basis, and does the setup that the per-word path needs.

  • This set of changes does not require a perldelta entry.

This moves two variable declarations so that the ARGS_ASSERT macro
comes first, and it combines one declaration with its initialization.

It is better to have this first in the function.

This macro encapsulates the task of finding how far the passed-in
address is from the next word boundary.

There are several places that could use this, but instead of converting
those places to use it directly, the next commit will create macros
that depend on this one, and those places will be converted to use
those new macros instead.
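
A minimal C sketch of such a macro follows. It is not perl's
definition: it assumes a "word" is `sizeof(size_t)` bytes (a power of
two) and that converting a pointer to `uintptr_t` exposes its
alignment. The names `SKETCH_WORDSIZE`, `SKETCH_WORD_BOUNDARY_MASK` and
`SKETCH_BYTES_REMAINING_IN_WORD` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: a "word" is sizeof(size_t) bytes, a power of two. */
#define SKETCH_WORDSIZE           sizeof(size_t)
#define SKETCH_WORD_BOUNDARY_MASK (SKETCH_WORDSIZE - 1)

/* Hypothetical: how many bytes from address p to the next word
 * boundary; 0 if p is already word-aligned.  The outer mask turns the
 * aligned case (SKETCH_WORDSIZE - 0) into 0. */
#define SKETCH_BYTES_REMAINING_IN_WORD(p)                             \
    ((SKETCH_WORDSIZE - ((uintptr_t)(p) & SKETCH_WORD_BOUNDARY_MASK)) \
     & SKETCH_WORD_BOUNDARY_MASK)
```

With 8-byte words, for example, an address 3 bytes past a word
boundary gives (8 - 3) & 7 = 5, and an aligned address gives 0.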

There are several places in the perl core that, for performance, use
word-at-a-time operations on byte data when the data to be processed is
long enough to overcome the extra setup overhead required.

The code that does this is not immediately obvious, and is currently
repeated at each such place.

This commit creates two macros that encapsulate this logic, making each
place that uses them easier to read.

One macro is for data that isn't dependent on the character set.  The
other is for character data.  EBCDIC data is not suitable for per-word
operation, so the character-data macro always returns false on an
EBCDIC platform.  This allows for the removal of some EBCDIC #ifdefs in
our code base.
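
The two-macro split can be sketched in C as below. The threshold
`SKETCH_MIN_FULL_WORDS` and all `SKETCH_` names are hypothetical, the
word size is assumed to be `sizeof(size_t)`, and the EBCDIC branch
assumes the build defines `EBCDIC` on such platforms; perl's real
macros and tuning may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_WORDSIZE           sizeof(size_t)
#define SKETCH_WORD_BOUNDARY_MASK (SKETCH_WORDSIZE - 1)
#define SKETCH_BYTES_REMAINING_IN_WORD(p)                             \
    ((SKETCH_WORDSIZE - ((uintptr_t)(p) & SKETCH_WORD_BOUNDARY_MASK)) \
     & SKETCH_WORD_BOUNDARY_MASK)

/* Hypothetical tuning constant: whole words that must remain after
 * reaching alignment for the per-word setup to pay off. */
#define SKETCH_MIN_FULL_WORDS 2

/* Binary data, no character-set dependence.  True when [s, e) (with
 * s <= e) is long enough to reach a word boundary and then process at
 * least SKETCH_MIN_FULL_WORDS whole words.  Evaluates s twice. */
#define SKETCH_WORTH_PER_WORD_BYTES(s, e)                       \
    ((size_t)((e) - (s)) >= SKETCH_BYTES_REMAINING_IN_WORD(s)   \
                             + SKETCH_MIN_FULL_WORDS * SKETCH_WORDSIZE)

/* Character data: per-word tricks rely on ASCII bit patterns, so on
 * EBCDIC this is always false and callers take the per-byte path
 * without an #ifdef of their own. */
#ifdef EBCDIC
#  define SKETCH_WORTH_PER_WORD_CHARS(s, e) 0
#else
#  define SKETCH_WORTH_PER_WORD_CHARS(s, e) SKETCH_WORTH_PER_WORD_BYTES(s, e)
#endif
```

Because the character-data macro is the constant 0 on EBCDIC, a
caller's `if (...)` around its per-word path simply becomes dead code
there.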

This converts the places that could benefit from this new macro (and
its kin) to use them.
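
A converted call site might then look like the following C sketch,
which finds the first byte with its upper bit set (a UTF-8 variant
byte on ASCII platforms). So that it stands alone, it repeats
hypothetical versions of the macros; it assumes `sizeof(size_t)`-byte
words and reads each word with `memcpy`. The function name
`sketch_first_variant` is hypothetical as well.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers; word size assumed to be sizeof(size_t). */
#define SKETCH_WORDSIZE           sizeof(size_t)
#define SKETCH_WORD_BOUNDARY_MASK (SKETCH_WORDSIZE - 1)
#define SKETCH_BYTES_REMAINING_IN_WORD(p)                             \
    ((SKETCH_WORDSIZE - ((uintptr_t)(p) & SKETCH_WORD_BOUNDARY_MASK)) \
     & SKETCH_WORD_BOUNDARY_MASK)
#define SKETCH_MIN_FULL_WORDS 2
#ifdef EBCDIC
#  define SKETCH_WORTH_PER_WORD_CHARS(s, e) 0
#else
#  define SKETCH_WORTH_PER_WORD_CHARS(s, e)                     \
    ((size_t)((e) - (s)) >= SKETCH_BYTES_REMAINING_IN_WORD(s)   \
                             + SKETCH_MIN_FULL_WORDS * SKETCH_WORDSIZE)
#endif

/* 0x8080...80: the upper bit of every byte in a word. */
#define SKETCH_HIGH_BITS_MASK ((~(size_t)0 / 0xFF) * 0x80)

/* Returns the first byte in [s, e) with its upper bit set, or e. */
static const unsigned char *
sketch_first_variant(const unsigned char *s, const unsigned char *e)
{
    if (SKETCH_WORTH_PER_WORD_CHARS(s, e)) {
        /* Per-byte until s reaches a word boundary. */
        const unsigned char *aligned = s + SKETCH_BYTES_REMAINING_IN_WORD(s);
        for (; s < aligned; s++) {
            if (*s & 0x80) return s;
        }
        /* Per-word while at least one whole word remains. */
        while ((size_t)(e - s) >= SKETCH_WORDSIZE) {
            size_t word;
            memcpy(&word, s, sizeof word);
            if (word & SKETCH_HIGH_BITS_MASK) break; /* pinpoint it below */
            s += SKETCH_WORDSIZE;
        }
    }
    /* Per-byte for the rest (or all of a string too short for words). */
    for (; s < e; s++) {
        if (*s & 0x80) return s;
    }
    return e;
}
```

When a word contains a variant byte, the loop falls back to the
per-byte tail to locate it exactly, so the per-word path only has to
answer "is there one somewhere in this word?".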

This has been subsumed by BYTES_REMAINING_IN_WORD, and is no longer
used.

Instead of refusing to compile, it is easy to handle this case.

The 'variant_byte_number' function was written to find the byte number
in a word of the first byte whose meaning varies depending on whether
the string it is part of is encoded in UTF-8 or not.  On ASCII
machines, that is simply when the upper bit is set.  On EBCDIC machines,
there is no similar pattern, so this function hasn't been compiled on
those.

A long time ago, I realized that this function could also handle binary
data by coercing that binary data into a form where that bit is set or
not, depending on the pattern being looked for, and then calling that
function.

But until now I hadn't realized that what was being worked on in those
cases was binary data not tied to a character set.  This commit
rectifies that.  A new alias is added for that function that emphasizes
that it works on binary data, the function is now compiled for EBCDIC,
and the EBCDIC-only code that avoided using it is now removed.
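
The coercion described above can be sketched in C: XOR a word with the
target byte repeated in every position, so matching bytes become zero,
then turn each zero byte into 0x80 and every other byte into 0x00.
That is exactly the form a variant_byte_number-style function
searches. `sketch_variant_byte_number` is a hypothetical, portable
stand-in (a plain byte scan in memory order, not perl's bit
manipulation), and all names here are assumptions.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_ONES  (~(size_t)0 / 0xFF)  /* 0x0101...01 */
#define SKETCH_LOWS  (SKETCH_ONES * 0x7F) /* 0x7F7F...7F */

/* Hypothetical stand-in for variant_byte_number(): index, in memory
 * order, of the first byte of word with its upper bit set, or
 * sizeof(size_t) if there is none. */
static unsigned
sketch_variant_byte_number(size_t word)
{
    unsigned char bytes[sizeof word];
    unsigned i;
    memcpy(bytes, &word, sizeof word);
    for (i = 0; i < sizeof word; i++) {
        if (bytes[i] & 0x80) return i;
    }
    return (unsigned) sizeof word;
}

/* Coerce binary data into that form: 0x80 in exactly the bytes of word
 * equal to target, 0x00 elsewhere.  The XOR zeroes matching bytes; the
 * add/OR test then flags zero bytes, and since (b & 0x7F) + 0x7F is at
 * most 0xFE, no carry crosses a byte boundary. */
static size_t
sketch_flag_bytes_equal_to(size_t word, unsigned char target)
{
    size_t x = word ^ (SKETCH_ONES * target);
    return ~(((x & SKETCH_LOWS) + SKETCH_LOWS) | x | SKETCH_LOWS);
}

/* Index of the first byte of word equal to target. */
static unsigned
sketch_first_byte_equal_to(size_t word, unsigned char target)
{
    return sketch_variant_byte_number(sketch_flag_bytes_equal_to(word, target));
}
```

Nothing in this path depends on the character set, which is why such a
function can be compiled and used on EBCDIC as well.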