Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce uppercase-string #3613

Open
wants to merge 6 commits into
base: 1.12.x
Choose a base branch
from
Open

Conversation

pmjones
Copy link

@pmjones pmjones commented Nov 8, 2024

This PR introduces uppercase-string in all places where lowercase-string is recognized, except for SprintfFunctionDynamicReturnTypeExtension.

The omission is because sprintf() format strings cannot be typed reliably as uppercase-string because most of the formatting specifiers are lowercase. For example, the format string 'FOO %s BAR' is treated as a plain string rather than uppercase-string, because %s is lowercase (even though it might represent an uppercase string).

It appears that lowercase-string exhibits similar behavior. For example, consider the format string foo %F. It formats as the lowercase string 'foo' and a non-locale-aware float; the returned string will be the same after strtolower(), so it should be valid as a lowercase-string.

However, in nsrt/lowercase-string-sprintf.php, if you add assertType('lowercase-string', sprintf('foo %F', $lowercase));, it will fail with:

Expected: lowercase-string
Actual:   non-falsy-string

That's because %F is uppercase, even though it represents a float.

If you consider these to be problems, they might be resolved by adding a new type, printf-string, that can ignore the formatting specifiers when determining isLowercase() and isUppercase(), but I imagine that would have rather broad implcations.

Let me know how you'd like to proceed.

@pmjones pmjones changed the base branch from 2.0.x to 1.12.x November 8, 2024 19:16
Copy link
Member

@ondrejmirtes ondrejmirtes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very complete and very nice, thank you! Just one thing I noticed.

@@ -312,6 +312,11 @@ public function isLowercaseString(): TrinaryLogic
return TrinaryLogic::createMaybe();
}

public function isUppercaseString(): TrinaryLogic
{
return TrinaryLogic::createMaybe();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numeric string might be a lowercase-string because of the scientific "e" notation. But can a numeric string ever be an uppercase string?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1e100 and 1E100 are both numeric strings, so both LC/UC are allowed

and php prints scientific notation in UC by default - https://3v4l.org/Tg497 - so if float is casted into string, numeric-string&uppercase-string is the best narrowed type

@ondrejmirtes
Copy link
Member

One more thing - Windows tests are failing https://github.com/phpstan/phpstan-src/actions/runs/11749776276/job/32736693637?pr=3613

@pmjones
Copy link
Author

pmjones commented Nov 9, 2024

One more thing - Windows tests are failing https://github.com/phpstan/phpstan-src/actions/runs/11749776276/job/32736693637?pr=3613

Yes; it's related to this change where two reasons are given instead of one, and the PHP_EOL between them causes the Windows failures.

After thinking about it some more, I believe it should remain as one reason (not two) and am trying to track down where the second reason is coming from. Any help there would be appreciated!

Copy link
Member

@ondrejmirtes ondrejmirtes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a few more things.

@@ -635,10 +635,6 @@ public function testArrayUdiffCallback(): void
'Parameter #3 $data_comp_func of function array_udiff expects callable(1|2|3|4|5|6, 1|2|3|4|5|6): int, Closure(string, string): string given.',
6,
],
[
'Parameter #3 $data_comp_func of function array_udiff expects callable(1|2|3|4|5|6, 1|2|3|4|5|6): int, Closure(int, int): (literal-string&lowercase-string&non-falsy-string&numeric-string) given.',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this error disappear?

@@ -472,7 +472,7 @@ function coalesce()

assertType('int<0, max>', rand() ?? false);

assertType('0|lowercase-string', preg_replace('', '', '') ?? 0);
assertType('0|(lowercase-string&uppercase-string)', preg_replace('', '', '') ?? 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type shouldn't be possible. My guess is that it wouldn't survive type normalization. Because from their implementation of isSuperTypeOf, these types are mutually exclusive. For the same reason string&int can't be created.

I understand it's possible for a string to comply with both constraints, but I'd prefer if this extension produced just a general string instead of lowercase-string&uppercase-string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Losing lowercase-string&uppercase-string would create issue, even for users of only one type

/** @param lowercase-string $string */
public function acceptsOnlyLowercase($string) {}

/** @var int $int */
$int = ...;

acceptsOnlyLowercase((string) $int) {} // Will report an error

(string) $int is both a lowercase and uppercase string, so should be accepted by acceptsOnlyLowercase but if we lose those accessory in normalization ; wouldn't we get an error "lowercase-string expected, string passed" ?


$x = rand(0,2);
assertType("literal-string&lowercase-string", str_repeat('19', $x));
assertType("literal-string&lowercase-string&uppercase-string", str_repeat('19', $x));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point here

@ondrejmirtes
Copy link
Member

After you rebase on top of 1.12.x with this commit ccfb4ab, the rule error tip should appear only once.

if ($leftStringType->isLowercaseString()->and($rightStringType->isLowercaseString())->yes()) {
$accessoryTypes[] = new AccessoryLowercaseStringType();
}

if ($leftStringType->isUppercaseString()->and($rightStringType->isUppercaseString())->yes()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ($leftStringType->isUppercaseString()->and($rightStringType->isUppercaseString())->yes()) {
elseif ($leftStringType->isUppercaseString()->and($rightStringType->isUppercaseString())->yes()) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldnot be elseif because a string can be both lowercase and uppercase.

Example: '33' or '-'.

@pmjones
Copy link
Author

pmjones commented Nov 10, 2024

After you rebase on top of 1.12.x with this commit ccfb4ab, the rule error tip should appear only once.

That did the trick; thanks!

Copy link
Contributor

@VincentLanglet VincentLanglet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR should be copied too. #3510

IntegerRangeType::toString and IntegerType::toString are both lowercase-string and uppercase-string.

And a whole reflection on lowercase-string&uppercase-string seems needed.

Cause if this intersection is lost the current code

/** @param lowercase-string $string */
public function acceptsOnlyLowercase($string) {}

/** @var int $int */
$int = ...;

acceptsOnlyLowercase((string) $int); // Will report an error

and doesn't actually cf
https://phpstan.org/r/e722e920-af05-4c7b-8290-1205df68c4f8

if ($leftStringType->isLowercaseString()->and($rightStringType->isLowercaseString())->yes()) {
$accessoryTypes[] = new AccessoryLowercaseStringType();
}

if ($leftStringType->isUppercaseString()->and($rightStringType->isUppercaseString())->yes()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldnot be elseif because a string can be both lowercase and uppercase.

Example: '33' or '-'.

@@ -93,6 +94,24 @@ public function processNode(Node $node, Scope $scope): array
$verbosity = VerbosityLevel::precise();
}

if (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can be merged with the previous if.

$leftType->isConstantScalarValue()->yes()
&& !$leftType->isString()->no()
&& !$rightType->isConstantScalarValue()->yes()
&& !$rightType->isString()->no()
&& (
    TrinaryLogic::extremeIdentity($leftType->isLowercaseString(), $rightType->isLowercaseString())->maybe()
    || TrinaryLogic::extremeIdentity($leftType->isUppercaseString(), $rightType->isUppercaseString())->maybe()
) || (
    ...
)

@@ -309,6 +309,11 @@ public function isLowercaseString(): TrinaryLogic
return TrinaryLogic::createYes();
}

public function isUppercaseString(): TrinaryLogic
{
return TrinaryLogic::createNo();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be TrinaryLogic::createMaybe().

'42' or '-' is both lowercase and uppercase...


public function isLowercaseString(): TrinaryLogic
{
return TrinaryLogic::createNo();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be TrinaryLogic::createMaybe();

) {
if ($type instanceof AccessoryLowercaseStringType && !$level->isPrecise()) {
continue;
}
if ($type instanceof AccessoryUppercaseStringType && !$level->isPrecise()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be merged with previous if (?)

@@ -69,7 +69,13 @@ function str_shuffle(string $string): string {}

/**
* @param array<mixed> $result
* @param-out ($string is lowercase-string ? array<int|string, array<mixed>|lowercase-string> : array<int|string, array<mixed>|string>) $result
* @param-out ($string is lowercase-string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, a first check of
string is lowercase-string&uppercase-string should be done since such character exists. (Like '42' or '-')

But I dunno if real case exists with parse_str.

@@ -315,16 +321,16 @@ function is_callable(mixed $value, bool $syntax_only = false, ?string &$callable
function abs($num) {}

/**
* @return ($string is lowercase-string ? lowercase-string : string)
* @return ($string is lowercase-string ? lowercase-string : ($string is uppercase-string ? uppercase-string : string))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should check $string is lowercase-string&uppercase-string first.
https://phpstan.org/r/98dd06f1-2543-45c5-96b9-58c0495e3302

or we will have the following issue:

'-23-' // isLowercase TRUE ; isUppercase TRUE
trim('-23-', '-') // isLowercase TRUE ; isUppercase FALSE

*/
function trim(string $string, string $characters = " \n\r\t\v\x00"): string {}

/**
* @return ($string is lowercase-string ? lowercase-string : string)
* @return ($string is lowercase-string ? lowercase-string : ($string is uppercase-string ? uppercase-string : string))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, should check $string is lowercase-string&uppercase-string first.

*/
function ltrim(string $string, string $characters = " \n\r\t\v\x00"): string {}

/**
* @return ($string is lowercase-string ? lowercase-string : string)
* @return ($string is lowercase-string ? lowercase-string : ($string is uppercase-string ? uppercase-string : string))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, should check $string is lowercase-string&uppercase-string first.

@@ -472,7 +472,7 @@ function coalesce()

assertType('int<0, max>', rand() ?? false);

assertType('0|lowercase-string', preg_replace('', '', '') ?? 0);
assertType('0|(lowercase-string&uppercase-string)', preg_replace('', '', '') ?? 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Losing lowercase-string&uppercase-string would create issue, even for users of only one type

/** @param lowercase-string $string */
public function acceptsOnlyLowercase($string) {}

/** @var int $int */
$int = ...;

acceptsOnlyLowercase((string) $int) {} // Will report an error

(string) $int is both a lowercase and uppercase string, so should be accepted by acceptsOnlyLowercase but if we lose those accessory in normalization ; wouldn't we get an error "lowercase-string expected, string passed" ?

@ondrejmirtes
Copy link
Member

In this case we need to verify that TypeCombinator::intersect() of both lowercase-string and uppercase-string leads to expected result. And both types need to tell that it's possible it can be the other too.

Should be done in TypeCombinatorTest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants