Incorrect Regex class example with pattern "\d{2}?" #5043

@adisib

(At least as of current commit c9445d2) Line 195 of System.Text.RegularExpressions/Regex.xml states the following:
\d{2}? Match two decimal digits zero or one time.

This description does not match what .NET actually does: the pattern never matches "zero times", so it requires two digits to be present. I suspect that a ? placed after a quantifier makes that quantifier lazy (non-greedy), which is useful for {n,} or {n,m} expressions, rather than making the quantified element optional.

The following is a minimal example of the issue. Here r1 is the expression quoted above, which according to the Regex class documentation should match an empty string as a "zero times" match. r2 moves the ? outside a non-capturing group, which gives the documented behavior of matching two decimal digits zero or one time.

using System;
using System.Text.RegularExpressions;
					
public class Program
{
	public static void Main()
	{
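		// r1: the documented pattern; the trailing ? makes {2} lazy, not optional.
		Regex r1 = new Regex("\\d{2}?");
		// r2: the ? applies to the non-capturing group, so the two digits are optional.
		Regex r2 = new Regex("(?:\\d{2})?");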

		Console.WriteLine(string.Format("Regex 1: {0}", r1.IsMatch("")));
		Console.WriteLine(string.Format("Regex 2: {0}", r2.IsMatch("")));
	}
}

The output of this example on .NET Framework 4.7.2 is shown below. If the documentation on the Regex class page were correct, both lines would show True:

Regex 1: False
Regex 2: True
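
To illustrate the lazy-quantifier interpretation suggested above, here is a small sketch that compares greedy and lazy forms on the same input. The class name LazyQuantifierDemo and the helper FirstMatch are made up for this illustration; only Regex.Match and Regex.IsMatch come from the library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyQuantifierDemo
{
    // Hypothetical helper: text of the first match of pattern in input ("" if none).
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        const string input = "12345";

        // Greedy: takes as many digits as allowed (up to 4).
        Console.WriteLine(FirstMatch(@"\d{2,4}", input));   // 1234
        // Lazy: takes as few digits as allowed (the minimum, 2).
        Console.WriteLine(FirstMatch(@"\d{2,4}?", input));  // 12
        // With a fixed count, lazy and greedy are indistinguishable.
        Console.WriteLine(FirstMatch(@"\d{2}?", input));    // 12
        // The ? does not make the two digits optional.
        Console.WriteLine(Regex.IsMatch("", @"\d{2}?"));    // False
    }
}
```

With an exact count such as {2}, the lazy modifier has nothing to choose between, so \d{2}? behaves the same as \d{2}.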

Furthermore (though this could arguably be filed as a separate issue), the related example that uses the {n}? expression also appears to be incorrect:
(\d*\.?\d{2}?){1} Match the pattern of integral and fractional digits separated by a decimal point symbol at least one time.

The problem shows up in the enclosing example expression ^\s*[\+-]?\s?\$?\s?(\d*\.?\d{2}?){1}$ and in the example currency program that uses it. The example program reports that "$42" represents a currency value but that "$4" does not. This happens because the expression actually requires at least two digits: the ? character does not make the digits from \d{2} optional, contrary to what the Regex class documentation states (the issue described above). I think the examples should instead be (\d*(?:\.\d{2})?){1} and ^\s*[\+-]?\s?\$?\s?(\d*(?:\.\d{2})?){1}$ respectively.
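
As a rough check of the two expressions discussed above, the following sketch runs both against a few sample strings. The class name CurrencyPatternDemo and the constant names Documented and Proposed are invented for this illustration, and this is not the documentation's own currency program.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrencyPatternDemo
{
    // Pattern as it appears in the Regex class documentation example.
    public const string Documented = @"^\s*[\+-]?\s?\$?\s?(\d*\.?\d{2}?){1}$";
    // Pattern proposed in this issue: the fractional part is an optional group.
    public const string Proposed = @"^\s*[\+-]?\s?\$?\s?(\d*(?:\.\d{2})?){1}$";

    public static void Main()
    {
        foreach (string s in new[] { "$4", "$42", "$4.25" })
        {
            Console.WriteLine("{0}: documented={1}, proposed={2}",
                s, Regex.IsMatch(s, Documented), Regex.IsMatch(s, Proposed));
        }
        // $4: documented=False, proposed=True
        // $42: documented=True, proposed=True
        // $4.25: documented=True, proposed=True
    }
}
```

One caveat with the proposed expression: because \d* can match zero digits, it also accepts input with no digits at all (for example "$"), so it may need further tightening depending on what the example is meant to accept.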
