array_unique() with SORT_REGULAR returns duplicate values #20262

@jmarble

Description

The following code:

<?php
$units = ['5', '10', '5', '3A', '5', '5'];
$unique = array_unique($units, SORT_REGULAR);
print_r($unique);

Resulted in this output:

Array
(
    [0] => 5
    [1] => 10
    [3] => 3A
    [4] => 5
)

But I expected this output instead:

Array
(
    [0] => 5
    [1] => 10
    [3] => 3A
)

Demonstrations:


PRs in progress:


Root Cause

The algorithm:

  1. Sort array using comparison function from php_get_data_compare_func_unstable()
  2. Walk through sorted array comparing only adjacent elements
  3. Delete duplicates when adjacent elements compare equal
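The three steps above can be sketched in userland PHP. This is a simplified illustration, not the engine's C implementation: `sortThenDedupe` is a hypothetical helper, `usort()` stands in for the internal sort, and, unlike `array_unique()`, the sketch does not preserve original keys.

```php
<?php
// Hypothetical helper illustrating the sort-then-walk approach.
function sortThenDedupe(array $values, callable $cmp): array
{
    if ($values === []) {
        return [];
    }
    $values = array_values($values);

    // Step 1: sort with the supplied comparison function.
    usort($values, $cmp);

    // Steps 2 and 3: walk the sorted list, compare each element only
    // with the last kept one, and drop it when the two compare equal.
    $kept = [$values[0]];
    $lastKept = $values[0];
    foreach (array_slice($values, 1) as $value) {
        if ($cmp($lastKept, $value) !== 0) {
            $kept[] = $value;
            $lastKept = $value;
        }
    }
    return $kept;
}

// With a transitive comparator such as strcmp(), equal values end up adjacent.
$result = sortThenDedupe(['5', '10', '5', '3A', '5', '5'], 'strcmp');
print_r($result); // ['10', '3A', '5']
```

The approach is only as reliable as the comparator: the walk never looks further back than the last kept element.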

The bug:

SORT_REGULAR uses zend_compare() which calls zendi_smart_strcmp() for string comparisons. This function has non-transitive behavior when mixing numeric and non-numeric strings:

  • "5" < "10" → true (numeric comparison: 5 < 10)
  • "10" < "3A" → true (lexicographic: "1" < "3")
  • "3A" < "5" → true (lexicographic: "3" < "5")

Together these form a cycle: "5" < "10" < "3A" < "5".

Because the comparison is non-transitive, sorting algorithms (which require transitive comparisons) produce inconsistent results depending on input order.
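The cycle can be checked directly with the `<` operator, which, to my understanding, applies the same loose string comparison rules (numeric when both operands are numeric strings, byte-wise otherwise). A minimal check:

```php
<?php
// Each entry pairs a label with the result of the comparison it describes.
$cycle = [
    '"5" < "10"'  => '5' < '10',   // both numeric strings: 5 < 10
    '"10" < "3A"' => '10' < '3A',  // "3A" is not numeric: byte-wise, "1" < "3"
    '"3A" < "5"'  => '3A' < '5',   // byte-wise, "3" < "5"
];

foreach ($cycle as $label => $holds) {
    echo $label, ' => ', var_export($holds, true), "\n";
}
```

All three comparisons hold at once, so no ordering of these three values can satisfy every pairwise result.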

The deduplication pass then walks the sorted array, comparing each element only with the last kept one (a schematic trace):

lastkept = position_0;  // "5"
position_1: "10" != "5"   → keep, lastkept = position_1
position_2: "10" == "10"  → delete
position_3: "3A" != "10"  → keep, lastkept = position_3
position_4: "5"  != "3A"  → keep, lastkept = position_4 (Bug! Never compared to position_0)
position_5: "5"  == "5"   → delete

The root issue: Non-transitive comparisons break the sorting algorithm's guarantee that equal values will be grouped together. The adjacent-only comparison is itself correct, but it requires the array to be properly sorted first, which in turn requires a transitive comparison.
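To make the dependence on grouping concrete, here is a small sketch of the adjacent-only walk applied to two hand-picked orderings of the same values. `dedupeAdjacent` is a hypothetical helper, and neither ordering is claimed to be what the engine's sort actually produces.

```php
<?php
// Hypothetical helper: adjacent-only de-duplication of an already ordered list.
function dedupeAdjacent(array $ordered): array
{
    $kept = [];
    foreach ($ordered as $value) {
        // Compare only with the last kept element, using loose equality.
        if ($kept === [] || $kept[count($kept) - 1] != $value) {
            $kept[] = $value;
        }
    }
    return $kept;
}

// Equal values grouped together: every duplicate is caught.
$grouped = dedupeAdjacent(['5', '5', '5', '5', '10', '3A']);

// Equal values split into two runs: one duplicate "5" survives.
$split = dedupeAdjacent(['5', '10', '3A', '5', '5', '5']);

echo count($grouped), "\n"; // 3
echo count($split), "\n";   // 4
```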


Comparison with SORT_STRING

<?php
$units = ['5', '10', '5', '3A', '5', '5'];
echo count(array_unique($units, SORT_REGULAR)) . "\n"; // 4 ✗ Wrong
echo count(array_unique($units, SORT_STRING)) . "\n";  // 3 ✓ Correct

SORT_STRING compares values as plain strings, with no numeric interpretation. That ordering is transitive, so equal values end up adjacent after sorting and the duplicates are removed.


Workaround

For simple arrays of scalar values, you can use array_unique() with the SORT_STRING flag, which is also the default:

<?php
$unique = array_unique($array, SORT_STRING);

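For arrays of arrays or objects, where SORT_STRING does not apply, deduplicate manually with in_array():

<?php
// $addresses is the input list; in_array() uses loose (==) comparison here.
$uniqueAddr = [];
foreach ($addresses as $addr) {
    if (! in_array($addr, $uniqueAddr)) {
        $uniqueAddr[] = $addr;
    }
}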
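If the original keys matter, as they do in the output of `array_unique()`, the same loop can be wrapped in a function that keeps the key of each first occurrence. This is only a sketch: `uniqueLoose` is a hypothetical name, and the approach is quadratic in the number of elements, so it suits small arrays.

```php
<?php
// Hypothetical helper: order-independent de-duplication with loose (==)
// comparison, preserving the key of each first occurrence.
function uniqueLoose(array $values): array
{
    $unique = [];
    foreach ($values as $key => $value) {
        // Every candidate is checked against all kept values,
        // so the result does not depend on any sort order.
        if (!in_array($value, $unique)) {
            $unique[$key] = $value;
        }
    }
    return $unique;
}

$units = ['5', '10', '5', '3A', '5', '5'];
print_r(uniqueLoose($units)); // keys 0, 1, 3 with values 5, 10, 3A
```

Applied to the array from this report, it yields the three entries given above as the expected output.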

PHP Version

PHP 8.4.13 (cli) (built: Sep 26 2025 00:45:36) (NTS clang 15.0.0)
Copyright (c) The PHP Group
Built by Laravel Herd
Zend Engine v4.4.13, Copyright (c) Zend Technologies
    with Zend OPcache v8.4.13, Copyright (c), by Zend Technologies
