[Clang] Make the SizeType, SignedSizeType and PtrdiffType be named sugar types instead of built-in types #143653

Open · wants to merge 26 commits into base: main

@YexuanXiao YexuanXiao commented Jun 11, 2025

This covers the results of sizeof, sizeof..., __datasizeof, __alignof, _Alignof, alignof, _Countof, size_t literals, and signed size_t literals; the results of pointer-pointer subtraction; and the checks for standard library functions (and calls to them).
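
For concreteness, the following C++ sketch (an illustration only, not code from the patch) checks the language-level types of several of the expressions listed above. `pack_size` is a hypothetical helper, and the `uz`/`z` literal checks are guarded by a feature-test macro because they need C++23.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Illustration only: core-language expressions whose result type is
// size_t, its signed counterpart, or ptrdiff_t.

// sizeof...(Ts) yields a size_t; pack_size is a hypothetical helper.
template <typename... Ts>
constexpr std::size_t pack_size() {
  static_assert(std::is_same<decltype(sizeof...(Ts)), std::size_t>::value,
                "sizeof... yields size_t");
  return sizeof...(Ts);
}

static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "sizeof yields size_t");
static_assert(std::is_same<decltype(alignof(int)), std::size_t>::value,
              "alignof yields size_t");
static_assert(
    std::is_same<decltype(std::declval<int *>() - std::declval<int *>()),
                 std::ptrdiff_t>::value,
    "pointer subtraction yields ptrdiff_t");

#if defined(__cpp_size_t_suffix) // C++23 literal suffixes
static_assert(std::is_same<decltype(0uz), std::size_t>::value,
              "uz literal is size_t");
static_assert(std::is_same<decltype(0z),
                           std::make_signed<std::size_t>::type>::value,
              "z literal is the signed counterpart of size_t");
#endif
```

All of these checks already pass today; the point of the change is only how such types are spelled in the AST, hints, and diagnostics.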

The goal is to enable clang and downstream tools such as clangd and clang-tidy to provide more portable hints and diagnostics.

The previous discussion can be found at #136542.

The current HEAD commit implements this feature by introducing a new subtype of Type called PredefinedSugarType, which was considered appropriate in discussions. I tried to keep PredefinedSugarType simple enough yet not limited to size_t and ptrdiff_t so that it can be used for other purposes. PredefinedSugarType wraps a canonical Type and provides a name, conceptually similar to a compiler internal TypedefType but without depending on a TypedefDecl or a source file.
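
The relationship between a sugar type and its canonical type can be pictured with a toy model. This is a heavily simplified sketch, not Clang's implementation: `ToyType`, `canonical`, and `sameType` are hypothetical names, the `__size_t` spelling is only an example, and `unsigned long` is just one possible underlying type for a target.

```cpp
// Toy model only; none of these names are Clang APIs.
struct ToyType {
  const char *Name;          // spelling a tool would show to the user
  const ToyType *Underlying; // nullptr when the type is already canonical
};

// Strip every layer of sugar and return the canonical type.
constexpr const ToyType *canonical(const ToyType *T) {
  return T->Underlying ? canonical(T->Underlying) : T;
}

// Two types are the same if their canonical types are the same node.
constexpr bool sameType(const ToyType *A, const ToyType *B) {
  return canonical(A) == canonical(B);
}

// Example target where size_t is unsigned long. The sugar node carries only
// a name and a pointer to the canonical type; no declaration is involved.
constexpr ToyType UnsignedLong{"unsigned long", nullptr};
constexpr ToyType SizeSugar{"__size_t", &UnsignedLong};
```

In this model, a tool that prints `SizeSugar.Name` shows the portable spelling, while anything that compares types goes through `canonical` and still sees the built-in type.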

Additionally, checks for the z and t format specifiers in scanf and printf format strings were added. They precisely match arguments whose types come from the typedefs or from built-in expressions such as sizeof.
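
As a reminder of what these specifiers expect, here is a small C++ sketch (`describe` is a hypothetical helper, not part of the patch): `%zu` takes a `size_t` and `%td` takes a `ptrdiff_t`, which are the types produced by `sizeof` and by pointer subtraction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats the length of [first, last) in elements and in bytes.
inline std::string describe(const int *first, const int *last) {
  assert(first <= last && "range must not be reversed");
  const std::ptrdiff_t count = last - first; // pointer subtraction: ptrdiff_t
  const std::size_t bytes =
      static_cast<std::size_t>(count) * sizeof(int); // sizeof: size_t
  char buf[64];
  // %td matches ptrdiff_t and %zu matches size_t on every target.
  std::snprintf(buf, sizeof buf, "%td elements, %zu bytes", count, bytes);
  return std::string(buf);
}
```

Using `%ld` or `%lu` here instead would happen to work on targets where these types are `long` and `unsigned long`, but would not be portable.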

The affected tests indicate that it works very well.

Several pieces of code assume that SizeType is canonical and must remain canonical, so I converted SizeType to its canonical form in those places.

@YexuanXiao YexuanXiao requested a review from Endilll as a code owner June 11, 2025 04:19

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the clang, clang:frontend, clang:codegen, clang:static analyzer, coroutines, and clang:openmp labels Jun 11, 2025
@YexuanXiao
Author

CC @AaronBallman

@llvmbot
Member

llvmbot commented Jun 11, 2025

@llvm/pr-subscribers-clang-static-analyzer-1
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: YexuanXiao (YexuanXiao)

Changes

This includes the results of sizeof, sizeof..., __datasizeof, __alignof, _Alignof, alignof, _Countof, size_t literals, and signed size_t literals, as well as the results of pointer-pointer subtraction. The goal is to enable clang and downstream tools such as clangd and clang-tidy to provide more portable hints and diagnostics.

The previous discussion can be found at #136542.

It was implemented by injecting __size_t, __signed_size_t, and __ptrdiff_t into the AST. Additionally, checks for the z and t format specifiers in format strings for scanf and printf were added.

Several pieces of code assume that SizeType is canonical and must remain canonical, so I converted SizeType to its canonical form there. Extensive testing of the modifications indicates that it works very well (aside from the unsightly double underscores).

The test CodeGen/cfi-unrelated-cast.cpp could not be fixed because I am unfamiliar with LLVM IR. The tests Modules/new-delete.cpp, PCH/cxx-exprs.cpp, PCH/cxx1z-aligned-alloc.cpp, SemaCXX/delete.cpp, and OpenMP/declare_target_codegen.cpp reported ambiguity issues with new and delete expressions. Since I have no clue how to resolve them, I was unable to fix these tests. I would be very grateful if someone could fix them.


Patch is 325.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143653.diff

56 Files Affected:

  • (modified) clang/include/clang/AST/ASTContext.h (+19-11)
  • (modified) clang/lib/AST/ASTContext.cpp (+50-17)
  • (modified) clang/lib/AST/FormatString.cpp (+87-21)
  • (modified) clang/lib/AST/PrintfFormatString.cpp (+6-3)
  • (modified) clang/lib/AST/ScanfFormatString.cpp (+12-7)
  • (modified) clang/lib/CodeGen/CGCall.cpp (+2-1)
  • (modified) clang/lib/CodeGen/CGCoroutine.cpp (+2-2)
  • (modified) clang/lib/CodeGen/CGObjCMac.cpp (+1-1)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+1-1)
  • (modified) clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp (+44-36)
  • (modified) clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp (+1-1)
  • (modified) clang/test/AST/ast-dump-array.cpp (+1-1)
  • (modified) clang/test/AST/ast-dump-expr-json.c (+9-3)
  • (modified) clang/test/AST/ast-dump-expr-json.cpp (+18-10)
  • (modified) clang/test/AST/ast-dump-expr.c (+3-3)
  • (modified) clang/test/AST/ast-dump-expr.cpp (+8-8)
  • (modified) clang/test/AST/ast-dump-openmp-distribute-parallel-for-simd.c (+10-10)
  • (modified) clang/test/AST/ast-dump-openmp-distribute-parallel-for.c (+10-10)
  • (modified) clang/test/AST/ast-dump-openmp-target-teams-distribute-parallel-for-simd.c (+80-80)
  • (modified) clang/test/AST/ast-dump-openmp-target-teams-distribute-parallel-for.c (+80-80)
  • (modified) clang/test/AST/ast-dump-openmp-teams-distribute-parallel-for-simd.c (+80-80)
  • (modified) clang/test/AST/ast-dump-openmp-teams-distribute-parallel-for.c (+80-80)
  • (modified) clang/test/AST/ast-dump-recovery.c (+1-1)
  • (modified) clang/test/AST/ast-dump-stmt-json.cpp (+58-28)
  • (modified) clang/test/AST/ast-dump-stmt.cpp (+2-2)
  • (modified) clang/test/AST/ast-dump-traits.cpp (+4-4)
  • (modified) clang/test/AST/ast-dump-types-errors-json.cpp (+3-1)
  • (modified) clang/test/Analysis/cfg.cpp (+1-1)
  • (modified) clang/test/Analysis/explain-svals.cpp (+1-1)
  • (modified) clang/test/Analysis/std-c-library-functions-arg-weakdeps.c (+1-1)
  • (modified) clang/test/Analysis/std-c-library-functions-lookup.c (+1-1)
  • (modified) clang/test/Analysis/std-c-library-functions-vs-stream-checker.c (+2-2)
  • (modified) clang/test/Analysis/std-c-library-functions.c (+2-2)
  • (modified) clang/test/CXX/drs/cwg2xx.cpp (+1-1)
  • (modified) clang/test/CXX/lex/lex.literal/lex.ext/p2.cpp (+5-5)
  • (modified) clang/test/CXX/lex/lex.literal/lex.ext/p5.cpp (+3-3)
  • (modified) clang/test/CXX/lex/lex.literal/lex.ext/p7.cpp (+1-1)
  • (modified) clang/test/FixIt/fixit-format-ios-nopedantic.m (+1-1)
  • (modified) clang/test/FixIt/format.m (+3-3)
  • (modified) clang/test/Sema/format-strings-fixit-ssize_t.c (+1-1)
  • (modified) clang/test/Sema/format-strings-int-typedefs.c (+6-6)
  • (modified) clang/test/Sema/format-strings-scanf.c (+4-4)
  • (modified) clang/test/Sema/format-strings-size_t.c (+6-7)
  • (modified) clang/test/Sema/matrix-type-builtins.c (+4-4)
  • (modified) clang/test/Sema/ptrauth-atomic-ops.c (+1-1)
  • (modified) clang/test/Sema/ptrauth.c (+1-1)
  • (modified) clang/test/SemaCXX/cxx2c-trivially-relocatable.cpp (+1-1)
  • (modified) clang/test/SemaCXX/enum-scoped.cpp (+2-2)
  • (modified) clang/test/SemaCXX/new-delete.cpp (+1-1)
  • (modified) clang/test/SemaCXX/static-assert-cxx26.cpp (+7-7)
  • (modified) clang/test/SemaCXX/type-aware-new-delete-basic-free-declarations.cpp (+1-1)
  • (modified) clang/test/SemaCXX/unavailable_aligned_allocation.cpp (+12-12)
  • (modified) clang/test/SemaObjC/format-size-spec-nsinteger.m (+5-12)
  • (modified) clang/test/SemaObjC/matrix-type-builtins.m (+1-1)
  • (modified) clang/test/SemaOpenCL/cl20-device-side-enqueue.cl (+3-3)
  • (modified) clang/test/SemaTemplate/type_pack_element.cpp (+6-6)
diff --git a/clang/include/clang/AST/ASTContext.h b/clang/include/clang/AST/ASTContext.h
index 8d24d393eab09..bd4600e479b1b 100644
--- a/clang/include/clang/AST/ASTContext.h
+++ b/clang/include/clang/AST/ASTContext.h
@@ -25,6 +25,7 @@
 #include "clang/AST/RawCommentList.h"
 #include "clang/AST/SYCLKernelInfo.h"
 #include "clang/AST/TemplateName.h"
+#include "clang/AST/Type.h"
 #include "clang/Basic/LLVM.h"
 #include "clang/Basic/PartialDiagnostic.h"
 #include "clang/Basic/SourceLocation.h"
@@ -1952,6 +1953,13 @@ class ASTContext : public RefCountedBase<ASTContext> {
                                                         bool IsDependent,
                                                         QualType Canon) const;
 
+  // The core language uses these types as the result types of some expressions,
+  // which are typically standard integer types and consistent with its
+  // typedefs (if any). These variables store the typedefs generated in the AST,
+  // not the typedefs provided in the header files.
+  mutable QualType SizeType;       // __size_t
+  mutable QualType SignedSizeType; // __signed_size_t
+  mutable QualType PtrdiffType;    // __ptrdiff_t
 public:
   /// Return the unique reference to the type for the specified TagDecl
   /// (struct/union/class/enum) decl.
@@ -1961,11 +1969,20 @@ class ASTContext : public RefCountedBase<ASTContext> {
   /// <stddef.h>.
   ///
   /// The sizeof operator requires this (C99 6.5.3.4p4).
-  CanQualType getSizeType() const;
+  QualType getSizeType() const;
 
   /// Return the unique signed counterpart of
   /// the integer type corresponding to size_t.
-  CanQualType getSignedSizeType() const;
+  QualType getSignedSizeType() const;
+
+  /// Return the unique type for "ptrdiff_t" (C99 7.17) defined in
+  /// <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
+  QualType getPointerDiffType() const;
+
+  /// Return the unique unsigned counterpart of "ptrdiff_t"
+  /// integer type. The standard (C11 7.21.6.1p7) refers to this type
+  /// in the definition of %tu format specifier.
+  QualType getUnsignedPointerDiffType() const;
 
   /// Return the unique type for "intmax_t" (C99 7.18.1.5), defined in
   /// <stdint.h>.
@@ -2006,15 +2023,6 @@ class ASTContext : public RefCountedBase<ASTContext> {
   /// as defined by the target.
   QualType getUIntPtrType() const;
 
-  /// Return the unique type for "ptrdiff_t" (C99 7.17) defined in
-  /// <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
-  QualType getPointerDiffType() const;
-
-  /// Return the unique unsigned counterpart of "ptrdiff_t"
-  /// integer type. The standard (C11 7.21.6.1p7) refers to this type
-  /// in the definition of %tu format specifier.
-  QualType getUnsignedPointerDiffType() const;
-
   /// Return the unique type for "pid_t" defined in
   /// <sys/types.h>. We need this to compute the correct type for vfork().
   QualType getProcessIDType() const;
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index 45f9602856840..00f8f87466273 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -6726,17 +6726,63 @@ QualType ASTContext::getTagDeclType(const TagDecl *Decl) const {
   return getTypeDeclType(const_cast<TagDecl*>(Decl));
 }
 
+// Inject __size_t, __signed_size_t, and __ptrdiff_t to provide portable hints
+// and diagnostics. In C and C++, expressions of type size_t can be obtained via
+// the sizeof operator, expressions of type ptrdiff_t via pointer subtraction,
+// and expressions of type signed size_t via the z literal suffix (since C++23).
+// However, no core language mechanism directly produces an expression of type
+// unsigned ptrdiff_t. The unsigned ptrdiff_t type is solely required by format
+// specifiers for printf and scanf. Consequently, no expression's type needs to
+// be displayed as unsigned ptrdiff_t. Verification of whether a type is
+// unsigned ptrdiff_t is also unnecessary, as no corresponding typedefs exist.
+// Therefore, injecting a typedef for unsigned ptrdiff_t is not required.
+
 /// getSizeType - Return the unique type for "size_t" (C99 7.17), the result
 /// of the sizeof operator (C99 6.5.3.4p4). The value is target dependent and
 /// needs to agree with the definition in <stddef.h>.
-CanQualType ASTContext::getSizeType() const {
-  return getFromTargetType(Target->getSizeType());
+QualType ASTContext::getSizeType() const {
+  if (SizeType.isNull()) {
+    if (auto const &LO = getLangOpts(); !LO.HLSL && (LO.C99 || LO.CPlusPlus))
+      SizeType = getTypedefType(buildImplicitTypedef(
+          getFromTargetType(Target->getSizeType()), "__size_t"));
+    else
+      SizeType = getFromTargetType(Target->getSizeType());
+  }
+  return SizeType;
 }
 
 /// Return the unique signed counterpart of the integer type
 /// corresponding to size_t.
-CanQualType ASTContext::getSignedSizeType() const {
-  return getFromTargetType(Target->getSignedSizeType());
+QualType ASTContext::getSignedSizeType() const {
+  if (SignedSizeType.isNull()) {
+    if (auto const &LO = getLangOpts(); !LO.HLSL && (LO.C99 || LO.CPlusPlus))
+      SignedSizeType = getTypedefType(buildImplicitTypedef(
+          getFromTargetType(Target->getSignedSizeType()), "__signed_size_t"));
+    else
+      SignedSizeType = getFromTargetType(Target->getSignedSizeType());
+  }
+  return SignedSizeType;
+}
+
+/// getPointerDiffType - Return the unique type for "ptrdiff_t" (C99 7.17)
+/// defined in <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
+QualType ASTContext::getPointerDiffType() const {
+  if (PtrdiffType.isNull()) {
+    if (auto const &LO = getLangOpts(); !LO.HLSL && (LO.C99 || LO.CPlusPlus))
+      PtrdiffType = getTypedefType(buildImplicitTypedef(
+          getFromTargetType(Target->getPtrDiffType(LangAS::Default)),
+          "__ptrdiff_t"));
+    else
+      PtrdiffType = getFromTargetType(Target->getPtrDiffType(LangAS::Default));
+  }
+  return PtrdiffType;
+}
+
+/// Return the unique unsigned counterpart of "ptrdiff_t"
+/// integer type. The standard (C11 7.21.6.1p7) refers to this type
+/// in the definition of %tu format specifier.
+QualType ASTContext::getUnsignedPointerDiffType() const {
+  return getFromTargetType(Target->getUnsignedPtrDiffType(LangAS::Default));
 }
 
 /// getIntMaxType - Return the unique type for "intmax_t" (C99 7.18.1.5).
@@ -6771,19 +6817,6 @@ QualType ASTContext::getUIntPtrType() const {
   return getCorrespondingUnsignedType(getIntPtrType());
 }
 
-/// getPointerDiffType - Return the unique type for "ptrdiff_t" (C99 7.17)
-/// defined in <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
-QualType ASTContext::getPointerDiffType() const {
-  return getFromTargetType(Target->getPtrDiffType(LangAS::Default));
-}
-
-/// Return the unique unsigned counterpart of "ptrdiff_t"
-/// integer type. The standard (C11 7.21.6.1p7) refers to this type
-/// in the definition of %tu format specifier.
-QualType ASTContext::getUnsignedPointerDiffType() const {
-  return getFromTargetType(Target->getUnsignedPtrDiffType(LangAS::Default));
-}
-
 /// Return the unique type for "pid_t" defined in
 /// <sys/types.h>. We need this to compute the correct type for vfork().
 QualType ASTContext::getProcessIDType() const {
diff --git a/clang/lib/AST/FormatString.cpp b/clang/lib/AST/FormatString.cpp
index 5d3b56fc4e713..0c1fd33b56f25 100644
--- a/clang/lib/AST/FormatString.cpp
+++ b/clang/lib/AST/FormatString.cpp
@@ -11,6 +11,7 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "clang/AST/FormatString.h"
 #include "FormatStringParsing.h"
 #include "clang/Basic/LangOptions.h"
 #include "clang/Basic/TargetInfo.h"
@@ -320,6 +321,69 @@ bool clang::analyze_format_string::ParseUTF8InvalidSpecifier(
 // Methods on ArgType.
 //===----------------------------------------------------------------------===//
 
+static bool namedTypeToLengthModifierKind(QualType QT,
+                                          LengthModifier::Kind &K) {
+  for (/**/; const auto *TT = QT->getAs<TypedefType>();
+       QT = TT->getDecl()->getUnderlyingType()) {
+    StringRef Name = TT->getDecl()->getIdentifier()->getName();
+    if (Name == "size_t" || Name == "__size_t") {
+      K = LengthModifier::AsSizeT;
+      return true;
+    } else if (Name == "__signed_size_t" ||
+               Name == "ssize_t" /*Not C99, but common in Unix.*/) {
+      K = LengthModifier::AsSizeT;
+      return true;
+    } else if (Name == "ptrdiff_t" || Name == "__ptrdiff_t") {
+      K = LengthModifier::AsPtrDiff;
+      return true;
+    } else if (Name == "intmax_t") {
+      K = LengthModifier::AsIntMax;
+      return true;
+    } else if (Name == "uintmax_t") {
+      K = LengthModifier::AsIntMax;
+      return true;
+    }
+  }
+  return false;
+}
+
+// Check whether T and E are compatible size_t/ptrdiff_t typedefs. E must be
+// consistent with LE.
+// T is the type of the actual expression in the code to be checked, and E is
+// the expected type parsed from the format string.
+static clang::analyze_format_string::ArgType::MatchKind
+matchesSizeTPtrdiffT(ASTContext &C, QualType T, QualType E,
+                     LengthModifier::Kind LE) {
+  using Kind = LengthModifier::Kind;
+  using MatchKind = clang::analyze_format_string::ArgType::MatchKind;
+  assert(LE == Kind::AsPtrDiff || LE == Kind::AsSizeT);
+
+  if (!T->isIntegerType())
+    return MatchKind::NoMatch;
+
+  if (C.getCorrespondingSignedType(T.getCanonicalType()) !=
+      C.getCorrespondingSignedType(E.getCanonicalType()))
+    return MatchKind::NoMatch;
+
+  // signed size_t and unsigned ptrdiff_t do not have typedefs in C and C++.
+  if (LE == Kind::AsSizeT && E->isSignedIntegerType())
+    return T->isSignedIntegerType() ? MatchKind::Match
+                                    : MatchKind::NoMatchSignedness;
+
+  if (LE == LengthModifier::Kind::AsPtrDiff && E->isUnsignedIntegerType())
+    return T->isUnsignedIntegerType() ? MatchKind::Match
+                                      : MatchKind::NoMatchSignedness;
+
+  if (Kind Actual = Kind::None; namedTypeToLengthModifierKind(T, Actual)) {
+    if (Actual == LE)
+      return MatchKind::Match;
+    else if (Actual == Kind::AsPtrDiff || Actual == Kind::AsSizeT)
+      return MatchKind::NoMatchSignedness;
+  }
+
+  return MatchKind::NoMatch;
+}
+
 clang::analyze_format_string::ArgType::MatchKind
 ArgType::matchesType(ASTContext &C, QualType argTy) const {
   // When using the format attribute in C++, you can receive a function or an
@@ -394,6 +458,13 @@ ArgType::matchesType(ASTContext &C, QualType argTy) const {
     }
 
     case SpecificTy: {
+      if (TK != TypeKind::DontCare) {
+        return matchesSizeTPtrdiffT(C, argTy, T,
+                                    TK == TypeKind::SizeT
+                                        ? LengthModifier::Kind::AsSizeT
+                                        : LengthModifier::AsPtrDiff);
+      }
+
       if (const EnumType *ETy = argTy->getAs<EnumType>()) {
         // If the enum is incomplete we know nothing about the underlying type.
         // Assume that it's 'int'. Do not use the underlying type for a scoped
@@ -653,6 +724,18 @@ ArgType::matchesArgType(ASTContext &C, const ArgType &Other) const {
 
   if (Left.K == AK::SpecificTy) {
     if (Right.K == AK::SpecificTy) {
+      if (Left.TK != TypeKind::DontCare) {
+        return matchesSizeTPtrdiffT(C, Right.T, Left.T,
+                                    Left.TK == TypeKind::SizeT
+                                        ? LengthModifier::Kind::AsSizeT
+                                        : LengthModifier::AsPtrDiff);
+      } else if (Right.TK != TypeKind::DontCare) {
+        return matchesSizeTPtrdiffT(C, Left.T, Right.T,
+                                    Right.TK == TypeKind::SizeT
+                                        ? LengthModifier::Kind::AsSizeT
+                                        : LengthModifier::AsPtrDiff);
+      }
+
       auto Canon1 = C.getCanonicalType(Left.T);
       auto Canon2 = C.getCanonicalType(Right.T);
       if (Canon1 == Canon2)
@@ -1200,27 +1283,10 @@ FormatSpecifier::getCorrectedLengthModifier() const {
 
 bool FormatSpecifier::namedTypeToLengthModifier(QualType QT,
                                                 LengthModifier &LM) {
-  for (/**/; const auto *TT = QT->getAs<TypedefType>();
-       QT = TT->getDecl()->getUnderlyingType()) {
-    const TypedefNameDecl *Typedef = TT->getDecl();
-    const IdentifierInfo *Identifier = Typedef->getIdentifier();
-    if (Identifier->getName() == "size_t") {
-      LM.setKind(LengthModifier::AsSizeT);
-      return true;
-    } else if (Identifier->getName() == "ssize_t") {
-      // Not C99, but common in Unix.
-      LM.setKind(LengthModifier::AsSizeT);
-      return true;
-    } else if (Identifier->getName() == "intmax_t") {
-      LM.setKind(LengthModifier::AsIntMax);
-      return true;
-    } else if (Identifier->getName() == "uintmax_t") {
-      LM.setKind(LengthModifier::AsIntMax);
-      return true;
-    } else if (Identifier->getName() == "ptrdiff_t") {
-      LM.setKind(LengthModifier::AsPtrDiff);
-      return true;
-    }
+  if (LengthModifier::Kind Out = LengthModifier::Kind::None;
+      namedTypeToLengthModifierKind(QT, Out)) {
+    LM.setKind(Out);
+    return true;
   }
   return false;
 }
diff --git a/clang/lib/AST/PrintfFormatString.cpp b/clang/lib/AST/PrintfFormatString.cpp
index 293164ddac8f8..397a1d4c1172f 100644
--- a/clang/lib/AST/PrintfFormatString.cpp
+++ b/clang/lib/AST/PrintfFormatString.cpp
@@ -543,7 +543,8 @@ ArgType PrintfSpecifier::getScalarArgType(ASTContext &Ctx,
       case LengthModifier::AsIntMax:
         return ArgType(Ctx.getIntMaxType(), "intmax_t");
       case LengthModifier::AsSizeT:
-        return ArgType::makeSizeT(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+        return ArgType::makeSizeT(
+            ArgType(Ctx.getSignedSizeType(), "signed size_t"));
       case LengthModifier::AsInt3264:
         return Ctx.getTargetInfo().getTriple().isArch64Bit()
                    ? ArgType(Ctx.LongLongTy, "__int64")
@@ -626,9 +627,11 @@ ArgType PrintfSpecifier::getScalarArgType(ASTContext &Ctx,
       case LengthModifier::AsIntMax:
         return ArgType::PtrTo(ArgType(Ctx.getIntMaxType(), "intmax_t"));
       case LengthModifier::AsSizeT:
-        return ArgType::PtrTo(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+        return ArgType::PtrTo(ArgType::makeSizeT(
+            ArgType(Ctx.getSignedSizeType(), "signed size_t")));
       case LengthModifier::AsPtrDiff:
-        return ArgType::PtrTo(ArgType(Ctx.getPointerDiffType(), "ptrdiff_t"));
+        return ArgType::PtrTo(ArgType::makePtrdiffT(
+            ArgType(Ctx.getPointerDiffType(), "ptrdiff_t")));
       case LengthModifier::AsLongDouble:
         return ArgType(); // FIXME: Is this a known extension?
       case LengthModifier::AsAllocate:
diff --git a/clang/lib/AST/ScanfFormatString.cpp b/clang/lib/AST/ScanfFormatString.cpp
index 7ee21c8c61954..e3926185860db 100644
--- a/clang/lib/AST/ScanfFormatString.cpp
+++ b/clang/lib/AST/ScanfFormatString.cpp
@@ -251,9 +251,11 @@ ArgType ScanfSpecifier::getArgType(ASTContext &Ctx) const {
         case LengthModifier::AsIntMax:
           return ArgType::PtrTo(ArgType(Ctx.getIntMaxType(), "intmax_t"));
         case LengthModifier::AsSizeT:
-          return ArgType::PtrTo(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+          return ArgType::PtrTo(ArgType::makeSizeT(
+              ArgType(Ctx.getSignedSizeType(), "signed size_t")));
         case LengthModifier::AsPtrDiff:
-          return ArgType::PtrTo(ArgType(Ctx.getPointerDiffType(), "ptrdiff_t"));
+          return ArgType::PtrTo(ArgType::makePtrdiffT(
+              ArgType(Ctx.getPointerDiffType(), "ptrdiff_t")));
         case LengthModifier::AsLongDouble:
           // GNU extension.
           return ArgType::PtrTo(Ctx.LongLongTy);
@@ -292,10 +294,11 @@ ArgType ScanfSpecifier::getArgType(ASTContext &Ctx) const {
         case LengthModifier::AsIntMax:
           return ArgType::PtrTo(ArgType(Ctx.getUIntMaxType(), "uintmax_t"));
         case LengthModifier::AsSizeT:
-          return ArgType::PtrTo(ArgType(Ctx.getSizeType(), "size_t"));
-        case LengthModifier::AsPtrDiff:
           return ArgType::PtrTo(
-              ArgType(Ctx.getUnsignedPointerDiffType(), "unsigned ptrdiff_t"));
+              ArgType::makeSizeT(ArgType(Ctx.getSizeType(), "size_t")));
+        case LengthModifier::AsPtrDiff:
+          return ArgType::PtrTo(ArgType::makePtrdiffT(
+              ArgType(Ctx.getUnsignedPointerDiffType(), "unsigned ptrdiff_t")));
         case LengthModifier::AsLongDouble:
           // GNU extension.
           return ArgType::PtrTo(Ctx.UnsignedLongLongTy);
@@ -390,9 +393,11 @@ ArgType ScanfSpecifier::getArgType(ASTContext &Ctx) const {
         case LengthModifier::AsIntMax:
           return ArgType::PtrTo(ArgType(Ctx.getIntMaxType(), "intmax_t"));
         case LengthModifier::AsSizeT:
-          return ArgType::PtrTo(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+          return ArgType::PtrTo(ArgType::makeSizeT(
+              ArgType(Ctx.getSignedSizeType(), "signed size_t")));
         case LengthModifier::AsPtrDiff:
-          return ArgType::PtrTo(ArgType(Ctx.getPointerDiffType(), "ptrdiff_t"));
+          return ArgType::PtrTo(ArgType::makePtrdiffT(
+              ArgType(Ctx.getPointerDiffType(), "ptrdiff_t")));
         case LengthModifier::AsLongDouble:
           return ArgType(); // FIXME: Is this a known extension?
         case LengthModifier::AsAllocate:
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 46a5d64412275..3ff2597d65e54 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -223,7 +223,8 @@ static void appendParameterTypes(
   for (unsigned I = 0, E = FPT->getNumParams(); I != E; ++I) {
     prefix.push_back(FPT->getParamType(I));
     if (ExtInfos[I].hasPassObjectSize())
-      prefix.push_back(CGT.getContext().getSizeType());
+      prefix.push_back(
+          CGT.getContext().getSizeType()->getCanonicalTypeUnqualified());
   }
 
   addExtParameterInfosForCall(paramInfos, FPT.getTypePtr(), PrefixSize,
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index 0fc488e98aaf0..265dedf228e69 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -1002,14 +1002,14 @@ RValue CodeGenFunction::EmitCoroutineIntrinsic(const CallExpr *E,
   }
   case llvm::Intrinsic::coro_size: {
     auto &Context = getContext();
-    CanQualType SizeTy = Context.getSizeType();
+    CanQualType SizeTy = Context.getSizeType()->getCanonicalTypeUnqualified();
     llvm::IntegerType *T = Builder.getIntNTy(Context.getTypeSize(SizeTy));
     llvm::Function *F = CGM.getIntrinsic(llvm::Intrinsic::coro_size, T);
     return RValue::get(Builder.CreateCall(F));
   }
   case llvm::Intrinsic::coro_align: {
     auto &Context = getContext();
-    CanQualType SizeTy = Context.getSizeType();
+    CanQualType SizeTy = Context.getSizeType()->getCanonicalTypeUnqualified();
     llvm::IntegerType *T = Builder.getIntNTy(Context.getTypeSize(SizeTy));
     llvm::Function *F = CGM.getIntrinsic(llvm::Intrinsic::coro_align, T);
     return RValue::get(Builder.CreateCall(F));
diff --git a/clang/lib/CodeGen/CGObjCMac.cpp b/clang/lib/CodeGen/CGObjCMac.cpp
index 1c23a8b4db918..5a0d2a2286bac 100644
--- a/clang/lib/CodeGen/CGObjCMac.cpp
+++ b/clang/lib/CodeGen/CGObjCMac.cpp
@@ -285,7 +285,7 @@ class ObjCCommonTypesHelper {
     SmallVector<CanQualType, 5> Params;
     Params.push_back(Ctx.VoidPtrTy);
     Params.push_back(Ctx.VoidPtrTy);
-    Params.push_back(Ctx.getSizeType());
+    Params.push_back(Ctx.getSizeType()->getCanonicalTypeUnqualified());
     Params.push_back(Ctx.BoolTy);
     Params.push_back(Ctx.BoolTy);
     llvm::FunctionType *FTy = Types.GetFunctionType(
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 8f8e1ceb7197e..9a0d824a26ae6 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -5131,7 +5131,7 @@ bool Sema::BuiltinVAStartARMMicrosoft(CallExpr *Call) {
         << 3                                      /* parameter mismatch */
         << 2 << Arg1->getType() << ConstCharPtrTy;
 
-  const QualType SizeTy = Context.getSizeType();
+  const QualType SizeTy = Context.getSizeTyp...
[truncated]
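
The typedef walk in `namedTypeToLengthModifierKind` from the FormatString.cpp hunk above can be modeled outside of Clang as follows. This is a simplified sketch under stated assumptions: `TypedefLink`, `LengthKind`, `named`, and `lengthKindFor` are hypothetical stand-ins for `TypedefType`, `LengthModifier::Kind`, and the helper in the diff, and a linked list of names replaces the real AST.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for LengthModifier::Kind (only the cases used here).
enum class LengthKind { None, AsSizeT, AsPtrDiff, AsIntMax };

// Hypothetical stand-in for a chain of typedefs: each link names one typedef
// and points at the typedef it was declared in terms of (or nullptr).
struct TypedefLink {
  const char *Name;
  const TypedefLink *Next;
};

inline bool named(const TypedefLink *T, const char *Name) {
  assert(T != nullptr && Name != nullptr);
  return std::strcmp(T->Name, Name) == 0;
}

// Walk outward-in through the chain and report the first well-known name.
inline LengthKind lengthKindFor(const TypedefLink *T) {
  for (; T != nullptr; T = T->Next) {
    if (named(T, "size_t") || named(T, "__size_t") ||
        named(T, "__signed_size_t") || named(T, "ssize_t"))
      return LengthKind::AsSizeT; // printed with the z modifier
    if (named(T, "ptrdiff_t") || named(T, "__ptrdiff_t"))
      return LengthKind::AsPtrDiff; // printed with the t modifier
    if (named(T, "intmax_t") || named(T, "uintmax_t"))
      return LengthKind::AsIntMax; // printed with the j modifier
  }
  return LengthKind::None;
}
```

Because the walk follows the whole chain, a user typedef declared in terms of `size_t` still maps to the z modifier in this model, while a typedef of a plain integer type maps to nothing.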

@llvmbot
Member

llvmbot commented Jun 11, 2025

@llvm/pr-subscribers-coroutines

  • (modified) clang/test/SemaObjC/matrix-type-builtins.m (+1-1)
  • (modified) clang/test/SemaOpenCL/cl20-device-side-enqueue.cl (+3-3)
  • (modified) clang/test/SemaTemplate/type_pack_element.cpp (+6-6)
diff --git a/clang/include/clang/AST/ASTContext.h b/clang/include/clang/AST/ASTContext.h
index 8d24d393eab09..bd4600e479b1b 100644
--- a/clang/include/clang/AST/ASTContext.h
+++ b/clang/include/clang/AST/ASTContext.h
@@ -25,6 +25,7 @@
 #include "clang/AST/RawCommentList.h"
 #include "clang/AST/SYCLKernelInfo.h"
 #include "clang/AST/TemplateName.h"
+#include "clang/AST/Type.h"
 #include "clang/Basic/LLVM.h"
 #include "clang/Basic/PartialDiagnostic.h"
 #include "clang/Basic/SourceLocation.h"
@@ -1952,6 +1953,13 @@ class ASTContext : public RefCountedBase<ASTContext> {
                                                         bool IsDependent,
                                                         QualType Canon) const;
 
+  // The core language uses these types as the result types of some expressions,
+  // which are typically standard integer types and consistent with its
+  // typedefs (if any). These variables store the typedefs generated in the AST,
+  // not the typedefs provided in the header files.
+  mutable QualType SizeType;       // __size_t
+  mutable QualType SignedSizeType; // __signed_size_t
+  mutable QualType PtrdiffType;    // __ptrdiff_t
 public:
   /// Return the unique reference to the type for the specified TagDecl
   /// (struct/union/class/enum) decl.
@@ -1961,11 +1969,20 @@ class ASTContext : public RefCountedBase<ASTContext> {
   /// <stddef.h>.
   ///
   /// The sizeof operator requires this (C99 6.5.3.4p4).
-  CanQualType getSizeType() const;
+  QualType getSizeType() const;
 
   /// Return the unique signed counterpart of
   /// the integer type corresponding to size_t.
-  CanQualType getSignedSizeType() const;
+  QualType getSignedSizeType() const;
+
+  /// Return the unique type for "ptrdiff_t" (C99 7.17) defined in
+  /// <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
+  QualType getPointerDiffType() const;
+
+  /// Return the unique unsigned counterpart of "ptrdiff_t"
+  /// integer type. The standard (C11 7.21.6.1p7) refers to this type
+  /// in the definition of %tu format specifier.
+  QualType getUnsignedPointerDiffType() const;
 
   /// Return the unique type for "intmax_t" (C99 7.18.1.5), defined in
   /// <stdint.h>.
@@ -2006,15 +2023,6 @@ class ASTContext : public RefCountedBase<ASTContext> {
   /// as defined by the target.
   QualType getUIntPtrType() const;
 
-  /// Return the unique type for "ptrdiff_t" (C99 7.17) defined in
-  /// <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
-  QualType getPointerDiffType() const;
-
-  /// Return the unique unsigned counterpart of "ptrdiff_t"
-  /// integer type. The standard (C11 7.21.6.1p7) refers to this type
-  /// in the definition of %tu format specifier.
-  QualType getUnsignedPointerDiffType() const;
-
   /// Return the unique type for "pid_t" defined in
   /// <sys/types.h>. We need this to compute the correct type for vfork().
   QualType getProcessIDType() const;
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index 45f9602856840..00f8f87466273 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -6726,17 +6726,63 @@ QualType ASTContext::getTagDeclType(const TagDecl *Decl) const {
   return getTypeDeclType(const_cast<TagDecl*>(Decl));
 }
 
+// Inject __size_t, __signed_size_t, and __ptrdiff_t to provide portable hints
+// and diagnostics. In C and C++, expressions of type size_t can be obtained via
+// the sizeof operator, expressions of type ptrdiff_t via pointer subtraction,
+// and expressions of type signed size_t via the z literal suffix (since C++23).
+// However, no core language mechanism directly produces an expression of type
+// unsigned ptrdiff_t. The unsigned ptrdiff_t type is solely required by format
+// specifiers for printf and scanf. Consequently, no expression's type needs to
+// be displayed as unsigned ptrdiff_t. Verification of whether a type is
+// unsigned ptrdiff_t is also unnecessary, as no corresponding typedefs exist.
+// Therefore, injecting a typedef for unsigned ptrdiff_t is not required.
+
 /// getSizeType - Return the unique type for "size_t" (C99 7.17), the result
 /// of the sizeof operator (C99 6.5.3.4p4). The value is target dependent and
 /// needs to agree with the definition in <stddef.h>.
-CanQualType ASTContext::getSizeType() const {
-  return getFromTargetType(Target->getSizeType());
+QualType ASTContext::getSizeType() const {
+  if (SizeType.isNull()) {
+    if (auto const &LO = getLangOpts(); !LO.HLSL && (LO.C99 || LO.CPlusPlus))
+      SizeType = getTypedefType(buildImplicitTypedef(
+          getFromTargetType(Target->getSizeType()), "__size_t"));
+    else
+      SizeType = getFromTargetType(Target->getSizeType());
+  }
+  return SizeType;
 }
 
 /// Return the unique signed counterpart of the integer type
 /// corresponding to size_t.
-CanQualType ASTContext::getSignedSizeType() const {
-  return getFromTargetType(Target->getSignedSizeType());
+QualType ASTContext::getSignedSizeType() const {
+  if (SignedSizeType.isNull()) {
+    if (auto const &LO = getLangOpts(); !LO.HLSL && (LO.C99 || LO.CPlusPlus))
+      SignedSizeType = getTypedefType(buildImplicitTypedef(
+          getFromTargetType(Target->getSignedSizeType()), "__signed_size_t"));
+    else
+      SignedSizeType = getFromTargetType(Target->getSignedSizeType());
+  }
+  return SignedSizeType;
+}
+
+/// getPointerDiffType - Return the unique type for "ptrdiff_t" (C99 7.17)
+/// defined in <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
+QualType ASTContext::getPointerDiffType() const {
+  if (PtrdiffType.isNull()) {
+    if (auto const &LO = getLangOpts(); !LO.HLSL && (LO.C99 || LO.CPlusPlus))
+      PtrdiffType = getTypedefType(buildImplicitTypedef(
+          getFromTargetType(Target->getPtrDiffType(LangAS::Default)),
+          "__ptrdiff_t"));
+    else
+      PtrdiffType = getFromTargetType(Target->getPtrDiffType(LangAS::Default));
+  }
+  return PtrdiffType;
+}
+
+/// Return the unique unsigned counterpart of "ptrdiff_t"
+/// integer type. The standard (C11 7.21.6.1p7) refers to this type
+/// in the definition of %tu format specifier.
+QualType ASTContext::getUnsignedPointerDiffType() const {
+  return getFromTargetType(Target->getUnsignedPtrDiffType(LangAS::Default));
 }
 
 /// getIntMaxType - Return the unique type for "intmax_t" (C99 7.18.1.5).
@@ -6771,19 +6817,6 @@ QualType ASTContext::getUIntPtrType() const {
   return getCorrespondingUnsignedType(getIntPtrType());
 }
 
-/// getPointerDiffType - Return the unique type for "ptrdiff_t" (C99 7.17)
-/// defined in <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
-QualType ASTContext::getPointerDiffType() const {
-  return getFromTargetType(Target->getPtrDiffType(LangAS::Default));
-}
-
-/// Return the unique unsigned counterpart of "ptrdiff_t"
-/// integer type. The standard (C11 7.21.6.1p7) refers to this type
-/// in the definition of %tu format specifier.
-QualType ASTContext::getUnsignedPointerDiffType() const {
-  return getFromTargetType(Target->getUnsignedPtrDiffType(LangAS::Default));
-}
-
 /// Return the unique type for "pid_t" defined in
 /// <sys/types.h>. We need this to compute the correct type for vfork().
 QualType ASTContext::getProcessIDType() const {
diff --git a/clang/lib/AST/FormatString.cpp b/clang/lib/AST/FormatString.cpp
index 5d3b56fc4e713..0c1fd33b56f25 100644
--- a/clang/lib/AST/FormatString.cpp
+++ b/clang/lib/AST/FormatString.cpp
@@ -11,6 +11,7 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "clang/AST/FormatString.h"
 #include "FormatStringParsing.h"
 #include "clang/Basic/LangOptions.h"
 #include "clang/Basic/TargetInfo.h"
@@ -320,6 +321,69 @@ bool clang::analyze_format_string::ParseUTF8InvalidSpecifier(
 // Methods on ArgType.
 //===----------------------------------------------------------------------===//
 
+static bool namedTypeToLengthModifierKind(QualType QT,
+                                          LengthModifier::Kind &K) {
+  for (/**/; const auto *TT = QT->getAs<TypedefType>();
+       QT = TT->getDecl()->getUnderlyingType()) {
+    StringRef Name = TT->getDecl()->getIdentifier()->getName();
+    if (Name == "size_t" || Name == "__size_t") {
+      K = LengthModifier::AsSizeT;
+      return true;
+    } else if (Name == "__signed_size_t" ||
+               Name == "ssize_t" /*Not C99, but common in Unix.*/) {
+      K = LengthModifier::AsSizeT;
+      return true;
+    } else if (Name == "ptrdiff_t" || Name == "__ptrdiff_t") {
+      K = LengthModifier::AsPtrDiff;
+      return true;
+    } else if (Name == "intmax_t") {
+      K = LengthModifier::AsIntMax;
+      return true;
+    } else if (Name == "uintmax_t") {
+      K = LengthModifier::AsIntMax;
+      return true;
+    }
+  }
+  return false;
+}
+
+// Check whether T and E are compatible size_t/ptrdiff_t typedefs. E must be
+// consistent with LE.
+// T is the type of the actual expression in the code to be checked, and E is
+// the expected type parsed from the format string.
+static clang::analyze_format_string::ArgType::MatchKind
+matchesSizeTPtrdiffT(ASTContext &C, QualType T, QualType E,
+                     LengthModifier::Kind LE) {
+  using Kind = LengthModifier::Kind;
+  using MatchKind = clang::analyze_format_string::ArgType::MatchKind;
+  assert(LE == Kind::AsPtrDiff || LE == Kind::AsSizeT);
+
+  if (!T->isIntegerType())
+    return MatchKind::NoMatch;
+
+  if (C.getCorrespondingSignedType(T.getCanonicalType()) !=
+      C.getCorrespondingSignedType(E.getCanonicalType()))
+    return MatchKind::NoMatch;
+
+  // signed size_t and unsigned ptrdiff_t do not have typedefs in C and C++.
+  if (LE == Kind::AsSizeT && E->isSignedIntegerType())
+    return T->isSignedIntegerType() ? MatchKind::Match
+                                    : MatchKind::NoMatchSignedness;
+
+  if (LE == LengthModifier::Kind::AsPtrDiff && E->isUnsignedIntegerType())
+    return T->isUnsignedIntegerType() ? MatchKind::Match
+                                      : MatchKind::NoMatchSignedness;
+
+  if (Kind Actual = Kind::None; namedTypeToLengthModifierKind(T, Actual)) {
+    if (Actual == LE)
+      return MatchKind::Match;
+    else if (Actual == Kind::AsPtrDiff || Actual == Kind::AsSizeT)
+      return MatchKind::NoMatchSignedness;
+  }
+
+  return MatchKind::NoMatch;
+}
+
 clang::analyze_format_string::ArgType::MatchKind
 ArgType::matchesType(ASTContext &C, QualType argTy) const {
   // When using the format attribute in C++, you can receive a function or an
@@ -394,6 +458,13 @@ ArgType::matchesType(ASTContext &C, QualType argTy) const {
     }
 
     case SpecificTy: {
+      if (TK != TypeKind::DontCare) {
+        return matchesSizeTPtrdiffT(C, argTy, T,
+                                    TK == TypeKind::SizeT
+                                        ? LengthModifier::Kind::AsSizeT
+                                        : LengthModifier::AsPtrDiff);
+      }
+
       if (const EnumType *ETy = argTy->getAs<EnumType>()) {
         // If the enum is incomplete we know nothing about the underlying type.
         // Assume that it's 'int'. Do not use the underlying type for a scoped
@@ -653,6 +724,18 @@ ArgType::matchesArgType(ASTContext &C, const ArgType &Other) const {
 
   if (Left.K == AK::SpecificTy) {
     if (Right.K == AK::SpecificTy) {
+      if (Left.TK != TypeKind::DontCare) {
+        return matchesSizeTPtrdiffT(C, Right.T, Left.T,
+                                    Left.TK == TypeKind::SizeT
+                                        ? LengthModifier::Kind::AsSizeT
+                                        : LengthModifier::AsPtrDiff);
+      } else if (Right.TK != TypeKind::DontCare) {
+        return matchesSizeTPtrdiffT(C, Left.T, Right.T,
+                                    Right.TK == TypeKind::SizeT
+                                        ? LengthModifier::Kind::AsSizeT
+                                        : LengthModifier::AsPtrDiff);
+      }
+
       auto Canon1 = C.getCanonicalType(Left.T);
       auto Canon2 = C.getCanonicalType(Right.T);
       if (Canon1 == Canon2)
@@ -1200,27 +1283,10 @@ FormatSpecifier::getCorrectedLengthModifier() const {
 
 bool FormatSpecifier::namedTypeToLengthModifier(QualType QT,
                                                 LengthModifier &LM) {
-  for (/**/; const auto *TT = QT->getAs<TypedefType>();
-       QT = TT->getDecl()->getUnderlyingType()) {
-    const TypedefNameDecl *Typedef = TT->getDecl();
-    const IdentifierInfo *Identifier = Typedef->getIdentifier();
-    if (Identifier->getName() == "size_t") {
-      LM.setKind(LengthModifier::AsSizeT);
-      return true;
-    } else if (Identifier->getName() == "ssize_t") {
-      // Not C99, but common in Unix.
-      LM.setKind(LengthModifier::AsSizeT);
-      return true;
-    } else if (Identifier->getName() == "intmax_t") {
-      LM.setKind(LengthModifier::AsIntMax);
-      return true;
-    } else if (Identifier->getName() == "uintmax_t") {
-      LM.setKind(LengthModifier::AsIntMax);
-      return true;
-    } else if (Identifier->getName() == "ptrdiff_t") {
-      LM.setKind(LengthModifier::AsPtrDiff);
-      return true;
-    }
+  if (LengthModifier::Kind Out = LengthModifier::Kind::None;
+      namedTypeToLengthModifierKind(QT, Out)) {
+    LM.setKind(Out);
+    return true;
   }
   return false;
 }
diff --git a/clang/lib/AST/PrintfFormatString.cpp b/clang/lib/AST/PrintfFormatString.cpp
index 293164ddac8f8..397a1d4c1172f 100644
--- a/clang/lib/AST/PrintfFormatString.cpp
+++ b/clang/lib/AST/PrintfFormatString.cpp
@@ -543,7 +543,8 @@ ArgType PrintfSpecifier::getScalarArgType(ASTContext &Ctx,
       case LengthModifier::AsIntMax:
         return ArgType(Ctx.getIntMaxType(), "intmax_t");
       case LengthModifier::AsSizeT:
-        return ArgType::makeSizeT(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+        return ArgType::makeSizeT(
+            ArgType(Ctx.getSignedSizeType(), "signed size_t"));
       case LengthModifier::AsInt3264:
         return Ctx.getTargetInfo().getTriple().isArch64Bit()
                    ? ArgType(Ctx.LongLongTy, "__int64")
@@ -626,9 +627,11 @@ ArgType PrintfSpecifier::getScalarArgType(ASTContext &Ctx,
       case LengthModifier::AsIntMax:
         return ArgType::PtrTo(ArgType(Ctx.getIntMaxType(), "intmax_t"));
       case LengthModifier::AsSizeT:
-        return ArgType::PtrTo(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+        return ArgType::PtrTo(ArgType::makeSizeT(
+            ArgType(Ctx.getSignedSizeType(), "signed size_t")));
       case LengthModifier::AsPtrDiff:
-        return ArgType::PtrTo(ArgType(Ctx.getPointerDiffType(), "ptrdiff_t"));
+        return ArgType::PtrTo(ArgType::makePtrdiffT(
+            ArgType(Ctx.getPointerDiffType(), "ptrdiff_t")));
       case LengthModifier::AsLongDouble:
         return ArgType(); // FIXME: Is this a known extension?
       case LengthModifier::AsAllocate:
diff --git a/clang/lib/AST/ScanfFormatString.cpp b/clang/lib/AST/ScanfFormatString.cpp
index 7ee21c8c61954..e3926185860db 100644
--- a/clang/lib/AST/ScanfFormatString.cpp
+++ b/clang/lib/AST/ScanfFormatString.cpp
@@ -251,9 +251,11 @@ ArgType ScanfSpecifier::getArgType(ASTContext &Ctx) const {
         case LengthModifier::AsIntMax:
           return ArgType::PtrTo(ArgType(Ctx.getIntMaxType(), "intmax_t"));
         case LengthModifier::AsSizeT:
-          return ArgType::PtrTo(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+          return ArgType::PtrTo(ArgType::makeSizeT(
+              ArgType(Ctx.getSignedSizeType(), "signed size_t")));
         case LengthModifier::AsPtrDiff:
-          return ArgType::PtrTo(ArgType(Ctx.getPointerDiffType(), "ptrdiff_t"));
+          return ArgType::PtrTo(ArgType::makePtrdiffT(
+              ArgType(Ctx.getPointerDiffType(), "ptrdiff_t")));
         case LengthModifier::AsLongDouble:
           // GNU extension.
           return ArgType::PtrTo(Ctx.LongLongTy);
@@ -292,10 +294,11 @@ ArgType ScanfSpecifier::getArgType(ASTContext &Ctx) const {
         case LengthModifier::AsIntMax:
           return ArgType::PtrTo(ArgType(Ctx.getUIntMaxType(), "uintmax_t"));
         case LengthModifier::AsSizeT:
-          return ArgType::PtrTo(ArgType(Ctx.getSizeType(), "size_t"));
-        case LengthModifier::AsPtrDiff:
           return ArgType::PtrTo(
-              ArgType(Ctx.getUnsignedPointerDiffType(), "unsigned ptrdiff_t"));
+              ArgType::makeSizeT(ArgType(Ctx.getSizeType(), "size_t")));
+        case LengthModifier::AsPtrDiff:
+          return ArgType::PtrTo(ArgType::makePtrdiffT(
+              ArgType(Ctx.getUnsignedPointerDiffType(), "unsigned ptrdiff_t")));
         case LengthModifier::AsLongDouble:
           // GNU extension.
           return ArgType::PtrTo(Ctx.UnsignedLongLongTy);
@@ -390,9 +393,11 @@ ArgType ScanfSpecifier::getArgType(ASTContext &Ctx) const {
         case LengthModifier::AsIntMax:
           return ArgType::PtrTo(ArgType(Ctx.getIntMaxType(), "intmax_t"));
         case LengthModifier::AsSizeT:
-          return ArgType::PtrTo(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+          return ArgType::PtrTo(ArgType::makeSizeT(
+              ArgType(Ctx.getSignedSizeType(), "signed size_t")));
         case LengthModifier::AsPtrDiff:
-          return ArgType::PtrTo(ArgType(Ctx.getPointerDiffType(), "ptrdiff_t"));
+          return ArgType::PtrTo(ArgType::makePtrdiffT(
+              ArgType(Ctx.getPointerDiffType(), "ptrdiff_t")));
         case LengthModifier::AsLongDouble:
           return ArgType(); // FIXME: Is this a known extension?
         case LengthModifier::AsAllocate:
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 46a5d64412275..3ff2597d65e54 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -223,7 +223,8 @@ static void appendParameterTypes(
   for (unsigned I = 0, E = FPT->getNumParams(); I != E; ++I) {
     prefix.push_back(FPT->getParamType(I));
     if (ExtInfos[I].hasPassObjectSize())
-      prefix.push_back(CGT.getContext().getSizeType());
+      prefix.push_back(
+          CGT.getContext().getSizeType()->getCanonicalTypeUnqualified());
   }
 
   addExtParameterInfosForCall(paramInfos, FPT.getTypePtr(), PrefixSize,
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index 0fc488e98aaf0..265dedf228e69 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -1002,14 +1002,14 @@ RValue CodeGenFunction::EmitCoroutineIntrinsic(const CallExpr *E,
   }
   case llvm::Intrinsic::coro_size: {
     auto &Context = getContext();
-    CanQualType SizeTy = Context.getSizeType();
+    CanQualType SizeTy = Context.getSizeType()->getCanonicalTypeUnqualified();
     llvm::IntegerType *T = Builder.getIntNTy(Context.getTypeSize(SizeTy));
     llvm::Function *F = CGM.getIntrinsic(llvm::Intrinsic::coro_size, T);
     return RValue::get(Builder.CreateCall(F));
   }
   case llvm::Intrinsic::coro_align: {
     auto &Context = getContext();
-    CanQualType SizeTy = Context.getSizeType();
+    CanQualType SizeTy = Context.getSizeType()->getCanonicalTypeUnqualified();
     llvm::IntegerType *T = Builder.getIntNTy(Context.getTypeSize(SizeTy));
     llvm::Function *F = CGM.getIntrinsic(llvm::Intrinsic::coro_align, T);
     return RValue::get(Builder.CreateCall(F));
diff --git a/clang/lib/CodeGen/CGObjCMac.cpp b/clang/lib/CodeGen/CGObjCMac.cpp
index 1c23a8b4db918..5a0d2a2286bac 100644
--- a/clang/lib/CodeGen/CGObjCMac.cpp
+++ b/clang/lib/CodeGen/CGObjCMac.cpp
@@ -285,7 +285,7 @@ class ObjCCommonTypesHelper {
     SmallVector<CanQualType, 5> Params;
     Params.push_back(Ctx.VoidPtrTy);
     Params.push_back(Ctx.VoidPtrTy);
-    Params.push_back(Ctx.getSizeType());
+    Params.push_back(Ctx.getSizeType()->getCanonicalTypeUnqualified());
     Params.push_back(Ctx.BoolTy);
     Params.push_back(Ctx.BoolTy);
     llvm::FunctionType *FTy = Types.GetFunctionType(
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 8f8e1ceb7197e..9a0d824a26ae6 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -5131,7 +5131,7 @@ bool Sema::BuiltinVAStartARMMicrosoft(CallExpr *Call) {
         << 3                                      /* parameter mismatch */
         << 2 << Arg1->getType() << ConstCharPtrTy;
 
-  const QualType SizeTy = Context.getSizeType();
+  const QualType SizeTy = Context.getSizeTyp...
[truncated]


github-actions bot commented Jun 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@YexuanXiao YexuanXiao changed the title [Clang] Make the result type of sizeof/pointer subtraction/size_t literals be typedefs instead of built-in types [Clang] Make the result type of sizeof/pointer subtraction/size_t literals be a sugar types instead of built-in types Jun 14, 2025
@YexuanXiao YexuanXiao changed the title [Clang] Make the result type of sizeof/pointer subtraction/size_t literals be a sugar types instead of built-in types [Clang] Make the result type of sizeof/pointer subtraction/size_t literals be sugar types instead of built-in types Jun 14, 2025
@YexuanXiao YexuanXiao changed the title [Clang] Make the result type of sizeof/pointer subtraction/size_t literals be sugar types instead of built-in types [Clang] Make the SizeType, SignedSizeType and PtrdiffType be sugar types instead of built-in types Jun 14, 2025
@YexuanXiao YexuanXiao changed the title [Clang] Make the SizeType, SignedSizeType and PtrdiffType be sugar types instead of built-in types [Clang] Make the SizeType, SignedSizeType and PtrdiffType be named sugar types instead of built-in types Jun 14, 2025
Contributor

@mizvekov mizvekov left a comment

Thanks for this!

I have left a small review, but since I am traveling to the WG21 meeting, I can't look much into it for the next couple of weeks.

Also, please try this on the LLVM compile-time tracker, and take a look at any changes to the number of AST nodes created when compiling some complex test case (there is a compiler option to print this, but I can't remember the spelling right now).

We want to make sure this doesn't regress performance.

Comment on lines 14592 to 14594
case Type::PredefinedSugar: {
return QualType();
}
Contributor

If you had an underlying type that's non-canonical, then this should return a predefined type with an underlying type which is the common sugar between the two.

Since I suggested above to drop the underlying type, you should never hit here, and you can put this in the list of unreachable type classes.

Author

Making it unreachable causes the test PCH/cxx2a-constraints.cpp to fail. I will try to investigate it.

@YexuanXiao YexuanXiao requested review from mizvekov and Endilll June 14, 2025 10:49
@YexuanXiao
Author

YexuanXiao commented Jun 18, 2025

CI shows that all tests passed on Linux, but 4 tests failed on Windows; those failures seem unrelated to this PR.

Labels
  • clang:codegen: IR generation bugs: mangling, exceptions, etc.
  • clang:frontend: Language frontend issues, e.g. anything involving "Sema"
  • clang:openmp: OpenMP related changes to Clang
  • clang:static analyzer
  • clang: Clang issues not falling into any other category
  • coroutines: C++20 coroutines
5 participants