@Copilot Copilot AI commented Oct 11, 2025

Summary

This PR adds Vector512 and Vector256 SIMD code paths to UTF-8 utility functions, providing significant performance improvements for string operations on systems with AVX2 and AVX-512 support.

Fixes #110940

Changes

New Vector Helper Functions (Utf8Utility.cs)

Extended existing Vector128-only functions to support larger vector sizes:

  • AllBytesInVector256AreAscii / AllBytesInVector512AreAscii - Efficiently check if 32/64 bytes are ASCII
  • Vector256OrdinalIgnoreCaseAscii / Vector512OrdinalIgnoreCaseAscii - Perform case-insensitive comparison of 32/64 ASCII bytes
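
As a rough illustration of what the ASCII-check helpers do, the sketch below uses the cross-platform `System.Runtime.Intrinsics` API and assumes .NET 8 or later. `AsciiCheckSketch`, `AllBytesAreAscii`, and `IsAscii` are hypothetical names, and the loop structure is an assumption made for illustration; it is not the code added in this PR.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical sketch; not the helpers added in Utf8Utility.cs.
public static class AsciiCheckSketch
{
    // A byte is ASCII when its most significant bit is clear, so a vector is
    // all-ASCII when no lane contributes a set bit to the extracted mask.
    public static bool AllBytesAreAscii(Vector512<byte> v) => v.ExtractMostSignificantBits() == 0;
    public static bool AllBytesAreAscii(Vector256<byte> v) => v.ExtractMostSignificantBits() == 0;

    public static bool IsAscii(ReadOnlySpan<byte> data)
    {
        int i = 0;

        // 64 bytes per iteration when Vector512 is hardware accelerated.
        if (Vector512.IsHardwareAccelerated)
        {
            for (; data.Length - i >= Vector512<byte>.Count; i += Vector512<byte>.Count)
            {
                if (!AllBytesAreAscii(Vector512.Create<byte>(data.Slice(i)))) return false;
            }
        }

        // 32 bytes per iteration for whatever is left.
        if (Vector256.IsHardwareAccelerated)
        {
            for (; data.Length - i >= Vector256<byte>.Count; i += Vector256<byte>.Count)
            {
                if (!AllBytesAreAscii(Vector256.Create<byte>(data.Slice(i)))) return false;
            }
        }

        // Scalar tail: check the remaining bytes one at a time.
        for (; i < data.Length; i++)
        {
            if (data[i] > 0x7F) return false;
        }
        return true;
    }
}
```

The vector loops are skipped on hardware without the corresponding acceleration, so the sketch gives the same answer everywhere and only the speed differs.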

Updated String Comparison Functions (Ordinal.Utf8.cs)

Implemented Vector512/Vector256 fast paths with graceful fallback:

  • EqualsIgnoreCaseUtf8_Vector512 / EqualsIgnoreCaseUtf8_Vector256 - Accelerated equality comparison for OrdinalIgnoreCase
  • StartsWithIgnoreCaseUtf8_Vector512 / StartsWithIgnoreCaseUtf8_Vector256 - Accelerated prefix matching for OrdinalIgnoreCase
  • Updated entry points to prioritize: Vector512 → Vector256 → Vector128 → Scalar
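
The widest-first ordering can be sketched roughly as follows, assuming .NET 8 or later. `CascadeSketch`, `Fold`, and `EqualsIgnoreCaseAscii` are hypothetical names. The sketch folds only ASCII letters and requires every other byte to match exactly, which is a simplification; the actual entry points in Ordinal.Utf8.cs handle non-ASCII data separately.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical sketch of a widest-first cascade; not the code in Ordinal.Utf8.cs.
// Assumption: only ASCII letters are case-folded; all other bytes must match exactly.
public static class CascadeSketch
{
    // Setting bit 0x20 maps 'A'..'Z' onto 'a'..'z'; keep the result only for letters.
    private static Vector512<byte> Fold(Vector512<byte> v)
    {
        Vector512<byte> lowered = v | Vector512.Create((byte)0x20);
        Vector512<byte> isLetter =
            Vector512.GreaterThanOrEqual(lowered, Vector512.Create((byte)'a')) &
            Vector512.LessThanOrEqual(lowered, Vector512.Create((byte)'z'));
        return Vector512.ConditionalSelect(isLetter, lowered, v);
    }

    private static Vector256<byte> Fold(Vector256<byte> v)
    {
        Vector256<byte> lowered = v | Vector256.Create((byte)0x20);
        Vector256<byte> isLetter =
            Vector256.GreaterThanOrEqual(lowered, Vector256.Create((byte)'a')) &
            Vector256.LessThanOrEqual(lowered, Vector256.Create((byte)'z'));
        return Vector256.ConditionalSelect(isLetter, lowered, v);
    }

    private static Vector128<byte> Fold(Vector128<byte> v)
    {
        Vector128<byte> lowered = v | Vector128.Create((byte)0x20);
        Vector128<byte> isLetter =
            Vector128.GreaterThanOrEqual(lowered, Vector128.Create((byte)'a')) &
            Vector128.LessThanOrEqual(lowered, Vector128.Create((byte)'z'));
        return Vector128.ConditionalSelect(isLetter, lowered, v);
    }

    private static byte Fold(byte b)
    {
        int lowered = b | 0x20;
        return lowered >= 'a' && lowered <= 'z' ? (byte)lowered : b;
    }

    public static bool EqualsIgnoreCaseAscii(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        if (a.Length != b.Length) return false;
        int i = 0;

        // Vector512 first: 64 bytes per iteration.
        if (Vector512.IsHardwareAccelerated)
        {
            for (; a.Length - i >= Vector512<byte>.Count; i += Vector512<byte>.Count)
            {
                if (Fold(Vector512.Create<byte>(a.Slice(i))) != Fold(Vector512.Create<byte>(b.Slice(i)))) return false;
            }
        }

        // Then Vector256: 32 bytes per iteration.
        if (Vector256.IsHardwareAccelerated)
        {
            for (; a.Length - i >= Vector256<byte>.Count; i += Vector256<byte>.Count)
            {
                if (Fold(Vector256.Create<byte>(a.Slice(i))) != Fold(Vector256.Create<byte>(b.Slice(i)))) return false;
            }
        }

        // Then Vector128: 16 bytes per iteration.
        if (Vector128.IsHardwareAccelerated)
        {
            for (; a.Length - i >= Vector128<byte>.Count; i += Vector128<byte>.Count)
            {
                if (Fold(Vector128.Create<byte>(a.Slice(i))) != Fold(Vector128.Create<byte>(b.Slice(i)))) return false;
            }
        }

        // Scalar path for trailing elements.
        for (; i < a.Length; i++)
        {
            if (Fold(a[i]) != Fold(b[i])) return false;
        }
        return true;
    }
}
```

Each stage consumes as many full vectors as it can and leaves the remainder to the next narrower stage, so at most 15 bytes reach the scalar loop when Vector128 is accelerated.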

Performance Impact

These changes accelerate common string operations:

  • Dictionary lookups using StringComparer.OrdinalIgnoreCase
  • String comparisons with case-insensitive matching
  • Text processing operations on ASCII data

Performance improvements scale with vector size:

  • Vector512 (AVX-512): Process 64 bytes per iteration
  • Vector256 (AVX2): Process 32 bytes per iteration
  • Vector128 (SSE2/NEON): Process 16 bytes per iteration (existing baseline)

Testing

All existing tests pass successfully:

  • System.Runtime.Tests: 69,396 tests passed
  • System.Globalization.Tests: 2,735 tests passed

Notes

Per the original issue, GetPointerToFirstInvalidChar in Utf16Utility.Validation.cs is intentionally not modified due to known performance issues with Vector512 execution unit contention (VPADDUSW vs VPMOVB2M).

Functions GetPointerToFirstInvalidByte and TranscodeToUtf8 already benefit from Vector512 acceleration through their use of Ascii.GetIndexOfFirstNonAsciiByte and Ascii.NarrowUtf16ToAscii for ASCII fast paths. The complex multi-byte validation/transcoding logic would require extensive rework for further vectorization.

Related Work

This implementation follows the established patterns in Ascii.Utility.cs which already includes Vector512 support, ensuring consistency across the codebase.

Original prompt

This section details the original issue you should resolve.

<issue_title>Light up Utf8Utility.*.cs and Utf16Utility.*.cs with Vector512 code paths.</issue_title>
<issue_description>This task is to identify opportunities to use Vector512 in these libraries (ASCII/UTF) and add Vector512 paths where possible to further accelerate them using SIMD.

@dotnet/avx512-contrib

See below for the list of functions that have SIMD paths:

  • GetPointerToFirstInvalidChar(Utf16Utility.Validation.cs)
    Note from Ruihan: The existing implementation has an Sse2 code path and a Vector128 code path; an Sse2.AddSaturate call differentiates the two. We worked internally to upgrade it with Vector256/Avx2 and Vector512/Avx512 code paths. The performance of the Vector256 path was fine, but the upgrade was blocked by a performance issue we found on the Vector512 path: execution unit contention between AddSaturate (VPADDUSW) and ExtractMostSignificantBits (VPMOVB2M) leads to no performance gain compared with Vector256.

  • GetPointerToFirstInvalidByte(Utf8Utility.Validation.cs)
    Up for grabs for community to work on

  • TranscodeToUtf8(Utf8Utility.Transcoding.cs)
    Up for grabs for community to work on

  • AllBytesInVector128AreAscii(Utf8Utility.cs), Vector128OrdinalIgnoreCaseAscii(Utf8Utility.cs) -> EqualsIgnoreCaseUtf8_Vector128(Ordinal.Utf8.cs) -> EqualsIgnoreCaseUtf8(Ordinal.Utf8.cs)
    Up for grabs for community to work on</issue_description>

Comments on the Issue (you are @copilot in this section)

@ Tagging subscribers to this area: @dotnet/area-meta See info in [area-owners.md](https://github.com/dotnet/runtime/blob/main/docs/area-owners.md) if you want to be subscribed.
Issue Details

This task is to identify potential opportunities to use Vector512 in these libraries(ASCII/UTF) and add Vector512 paths where possible to further accelerate using SIMD.

@dotnet/avx512-contrib

Author: DeepakRajendrakumaran
Assignees: -
Labels:

area-Meta

Milestone: -
@ Tagging subscribers to this area: @dotnet/area-system-text-encoding See info in [area-owners.md](https://github.com/dotnet/runtime/blob/main/docs/area-owners.md) if you want to be subscribed.
Issue Details

This task is to identify potential opportunities to use Vector512 in these libraries(ASCII/UTF) and add Vector512 paths where possible to further accelerate using SIMD.

@dotnet/avx512-contrib

Author: DeepakRajendrakumaran
Assignees: -
Labels:

area-Meta, area-System.Text.Encoding, untriaged

Milestone: -

Fixes #86119


@Copilot Copilot AI changed the title [WIP] Light up Utf8Utility and Utf16Utility with Vector512 code paths Add Vector512/Vector256 SIMD acceleration to UTF-8 utility functions Oct 11, 2025
Copilot finished work on behalf of stephentoub October 11, 2025 17:58
@Copilot Copilot AI requested a review from stephentoub October 11, 2025 17:58
@xtqqczze
Contributor

Fixes #110940

?

@jkotas jkotas added the tenet-performance Performance related issue label Oct 11, 2025
@stephentoub stephentoub marked this pull request as ready for review October 12, 2025 02:20
@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds Vector512 and Vector256 SIMD acceleration to UTF-8 utility functions, enhancing performance for string operations on systems with AVX2 and AVX-512 support. The implementation follows established patterns from existing Vector128 code and provides graceful fallback to smaller vector sizes or scalar operations.

Key changes include:

  • New Vector512 and Vector256 helper functions for ASCII detection and case-insensitive comparison
  • Enhanced entry points that prioritize larger vector sizes (Vector512 → Vector256 → Vector128 → Scalar)
  • Accelerated implementations for string equality and prefix matching operations

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

Files changed:

  • src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf8Utility.cs: Added Vector512/Vector256 ASCII detection and case-insensitive comparison helper functions
  • src/libraries/System.Private.CoreLib/src/System/Globalization/Ordinal.Utf8.cs: Implemented Vector512/Vector256 accelerated string comparison and prefix matching with cascading fallback logic

Comment on lines +147 to +157
return EqualsIgnoreCaseUtf8_Vector256(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));
}

// Use Vector128 path for trailing elements if possible
if (Vector128.IsHardwareAccelerated && (lengthU - i) >= (nuint)Vector128<byte>.Count)
{
return EqualsIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));
}

// Use scalar path for trailing elements
return EqualsIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));

Copilot AI Oct 12, 2025

The remaining length calculation is incorrect. It should pass the actual remaining lengths of each string, not the minimum remaining length for both strings. This could cause buffer overruns or incorrect comparisons when the strings have different lengths.

Suggested change
- return EqualsIgnoreCaseUtf8_Vector256(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));
- }
- // Use Vector128 path for trailing elements if possible
- if (Vector128.IsHardwareAccelerated && (lengthU - i) >= (nuint)Vector128<byte>.Count)
- {
- return EqualsIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));
- }
- // Use scalar path for trailing elements
- return EqualsIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));
+ return EqualsIgnoreCaseUtf8_Vector256(ref Unsafe.Add(ref charA, i), (int)(lengthA - i), ref Unsafe.Add(ref charB, i), (int)(lengthB - i));
+ }
+ // Use Vector128 path for trailing elements if possible
+ if (Vector128.IsHardwareAccelerated && (lengthU - i) >= (nuint)Vector128<byte>.Count)
+ {
+ return EqualsIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref charA, i), (int)(lengthA - i), ref Unsafe.Add(ref charB, i), (int)(lengthB - i));
+ }
+ // Use scalar path for trailing elements
+ return EqualsIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref charA, i), (int)(lengthA - i), ref Unsafe.Add(ref charB, i), (int)(lengthB - i));
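
The distinction the comment draws can be seen with a little arithmetic. The sketch below assumes, as the comment implies, that `lengthU` is the minimum of the two input lengths; `RemainingLengthSketch` and `After` are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical sketch; shows the arithmetic only, not the PR's code.
public static class RemainingLengthSketch
{
    // Returns the per-input remaining lengths and the shared minimum remaining
    // length after `consumed` bytes have been processed by a wider vector loop.
    public static (int remainingA, int remainingB, int minRemaining) After(int lengthA, int lengthB, nuint consumed)
    {
        // Assumption: lengthU is the smaller of the two lengths.
        nuint lengthU = (nuint)Math.Min(lengthA, lengthB);

        return (lengthA - (int)consumed, lengthB - (int)consumed, (int)(lengthU - consumed));
    }
}
```

With lengths 100 and 70 and 64 bytes consumed, the per-input remainders are 36 and 6 while the shared minimum is 6, so passing the minimum for both arguments would hide the length difference from the callee. When both inputs have the same length, the three values coincide.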

Comment on lines +147 to +157
return EqualsIgnoreCaseUtf8_Vector256(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));
}

// Use Vector128 path for trailing elements if possible
if (Vector128.IsHardwareAccelerated && (lengthU - i) >= (nuint)Vector128<byte>.Count)
{
return EqualsIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));
}

// Use scalar path for trailing elements
return EqualsIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));

Copilot AI Oct 12, 2025

The remaining length calculation is incorrect. It should pass the actual remaining lengths of each string, not the minimum remaining length for both strings. This could cause buffer overruns or incorrect comparisons when the strings have different lengths.

Suggested change
- return EqualsIgnoreCaseUtf8_Vector256(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));
- }
- // Use Vector128 path for trailing elements if possible
- if (Vector128.IsHardwareAccelerated && (lengthU - i) >= (nuint)Vector128<byte>.Count)
- {
- return EqualsIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));
- }
- // Use scalar path for trailing elements
- return EqualsIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));
+ return EqualsIgnoreCaseUtf8_Vector256(ref Unsafe.Add(ref charA, i), (int)(lengthA - (int)i), ref Unsafe.Add(ref charB, i), (int)(lengthB - (int)i));
+ }
+ // Use Vector128 path for trailing elements if possible
+ if (Vector128.IsHardwareAccelerated && (lengthU - i) >= (nuint)Vector128<byte>.Count)
+ {
+ return EqualsIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref charA, i), (int)(lengthA - (int)i), ref Unsafe.Add(ref charB, i), (int)(lengthB - (int)i));
+ }
+ // Use scalar path for trailing elements
+ return EqualsIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref charA, i), (int)(lengthA - (int)i), ref Unsafe.Add(ref charB, i), (int)(lengthB - (int)i));

Comment on lines +215 to +219
return EqualsIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));
}

// Use scalar path for trailing elements
return EqualsIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));

Copilot AI Oct 12, 2025

The remaining length calculation is incorrect. It should pass the actual remaining lengths of each string, not the minimum remaining length for both strings. This could cause buffer overruns or incorrect comparisons when the strings have different lengths.

Suggested change
- return EqualsIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));
- }
- // Use scalar path for trailing elements
- return EqualsIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref charA, i), (int)(lengthU - i), ref Unsafe.Add(ref charB, i), (int)(lengthU - i));
+ return EqualsIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref charA, i), (int)(lengthA - i), ref Unsafe.Add(ref charB, i), (int)(lengthB - i));
+ }
+ // Use scalar path for trailing elements
+ return EqualsIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref charA, i), (int)(lengthA - i), ref Unsafe.Add(ref charB, i), (int)(lengthB - i));

Comment on lines +597 to +607
return StartsWithIgnoreCaseUtf8_Vector256(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
}

// Use Vector128 path for trailing elements if possible
if (Vector128.IsHardwareAccelerated && (lengthU - i) >= (nuint)Vector128<byte>.Count)
{
return StartsWithIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
}

// Use scalar path for trailing elements
return StartsWithIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));

Copilot AI Oct 12, 2025

The remaining length calculation is incorrect. It should pass the actual remaining lengths of each string, not the minimum remaining length for both strings. This could cause buffer overruns or incorrect comparisons when the strings have different lengths.

Suggested change
- return StartsWithIgnoreCaseUtf8_Vector256(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
- }
- // Use Vector128 path for trailing elements if possible
- if (Vector128.IsHardwareAccelerated && (lengthU - i) >= (nuint)Vector128<byte>.Count)
- {
- return StartsWithIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
- }
- // Use scalar path for trailing elements
- return StartsWithIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
+ return StartsWithIgnoreCaseUtf8_Vector256(ref Unsafe.Add(ref source, i), (int)(sourceLength - i), ref Unsafe.Add(ref prefix, i), (int)(prefixLength - i));
+ }
+ // Use Vector128 path for trailing elements if possible
+ if (Vector128.IsHardwareAccelerated && (lengthU - i) >= (nuint)Vector128<byte>.Count)
+ {
+ return StartsWithIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref source, i), (int)(sourceLength - i), ref Unsafe.Add(ref prefix, i), (int)(prefixLength - i));
+ }
+ // Use scalar path for trailing elements
+ return StartsWithIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref source, i), (int)(sourceLength - i), ref Unsafe.Add(ref prefix, i), (int)(prefixLength - i));

Comment on lines +597 to +607
return StartsWithIgnoreCaseUtf8_Vector256(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
}

// Use Vector128 path for trailing elements if possible
if (Vector128.IsHardwareAccelerated && (lengthU - i) >= (nuint)Vector128<byte>.Count)
{
return StartsWithIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
}

// Use scalar path for trailing elements
return StartsWithIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));

Copilot AI Oct 12, 2025

The remaining length calculation is incorrect. It should pass the actual remaining lengths of each string, not the minimum remaining length for both strings. This could cause buffer overruns or incorrect comparisons when the strings have different lengths.

Suggested change
- return StartsWithIgnoreCaseUtf8_Vector256(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
- }
- // Use Vector128 path for trailing elements if possible
- if (Vector128.IsHardwareAccelerated && (lengthU - i) >= (nuint)Vector128<byte>.Count)
- {
- return StartsWithIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
- }
- // Use scalar path for trailing elements
- return StartsWithIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
+ return StartsWithIgnoreCaseUtf8_Vector256(ref Unsafe.Add(ref source, i), sourceLength - (int)i, ref Unsafe.Add(ref prefix, i), prefixLength - (int)i);
+ }
+ // Use Vector128 path for trailing elements if possible
+ if (Vector128.IsHardwareAccelerated && (lengthU - i) >= (nuint)Vector128<byte>.Count)
+ {
+ return StartsWithIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref source, i), sourceLength - (int)i, ref Unsafe.Add(ref prefix, i), prefixLength - (int)i);
+ }
+ // Use scalar path for trailing elements
+ return StartsWithIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref source, i), sourceLength - (int)i, ref Unsafe.Add(ref prefix, i), prefixLength - (int)i);

}

// Use scalar path for trailing elements
return StartsWithIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));

Copilot AI Oct 12, 2025

The remaining length calculation is incorrect. It should pass the actual remaining lengths of each string, not the minimum remaining length for both strings. This could cause buffer overruns or incorrect comparisons when the strings have different lengths.

Suggested change
- return StartsWithIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
+ return StartsWithIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref source, i), (int)(sourceLength - i), ref Unsafe.Add(ref prefix, i), (int)(prefixLength - i));

Comment on lines +665 to +669
return StartsWithIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
}

// Use scalar path for trailing elements
return StartsWithIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));

Copilot AI Oct 12, 2025

The remaining length calculation is incorrect. It should pass the actual remaining lengths of each string, not the minimum remaining length for both strings. This could cause buffer overruns or incorrect comparisons when the strings have different lengths.

Suggested change
- return StartsWithIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
- }
- // Use scalar path for trailing elements
- return StartsWithIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref source, i), (int)(lengthU - i), ref Unsafe.Add(ref prefix, i), (int)(lengthU - i));
+ return StartsWithIgnoreCaseUtf8_Vector128(ref Unsafe.Add(ref source, i), (int)(sourceLength - i), ref Unsafe.Add(ref prefix, i), (int)(prefixLength - i));
+ }
+ // Use scalar path for trailing elements
+ return StartsWithIgnoreCaseUtf8_Scalar(ref Unsafe.Add(ref source, i), (int)(sourceLength - i), ref Unsafe.Add(ref prefix, i), (int)(prefixLength - i));

@jkotas
Member

jkotas commented Oct 12, 2025

What's the public API that this change is expected to materially improve? We need micro-benchmarks for it.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Light up Utf8Utility.*.cs and Utf16Utility.*.cs with Vector512 code paths.

4 participants