LLVM and SPIRV-LLVM-Translator pulldown (WW33 2024) #15106

This commit lowers `arith.divui` and `arith.remui` to EmitC by wrapping those operations with type conversions.

Fix the location of `diag::note_constexpr_uninitialized_base` and make it the same as in the current interpreter. This PR does not print the type name with namespace that was used to improve the current interpreter's type dump of the base class type. --------- Signed-off-by: yronglin <[email protected]>

This patch sorts the clause lists for the following OpenMP operations:
- omp.parallel
- omp.teams
- omp.sections
- omp.wsloop
- omp.distribute
- omp.task

This change results in the reordering of operation arguments, so impacted unit tests are updated accordingly.

This patch sorts the clause lists for the following OpenMP operations:
- omp.taskloop
- omp.taskgroup
- omp.target_data
- omp.target_enter_data
- omp.target_exit_data
- omp.target_update
- omp.target

This change results in the reordering of operation arguments, so impacted unit tests are updated accordingly.

…ts (#101229)

which suffers from v_mov issue.

This reverts commit d230442. The m_Add and m_Mul are commutative but the code does not expect the commutativity.

@Endilll

Factor out the processing of unsaved files into its own function as suggested by @Endilll [here](https://github.com/llvm/llvm-project/pull/78114/files#r1697730196)

On its own, this change leads to _more_ strict typing errors as the functions are mostly not annotated so far, so the `# type: ignore`s are reported as Unused. This is part of the work leading up to #78114 though, and one of the bigger parts factored out from it, so these will later lead to less strict typing errors as the functions are annotated with return types.

…on (#101319)

…783) This makes the tests less flaky and also makes a few other refactorings like using traits instead of .compile.fail.cpp tests.

… (#100741) P1937R2 only contains core language change and doesn't touch the library at all. Closes #100613.

…#101326) Host::LaunchProcess() requires SetMonitorProcessCallback to be set. This callback is called from the child process monitor thread, which we cannot control. lldb-server may crash if there is logging around this callback because TestLogHandler is not thread safe. I hit this issue while debugging 100 simultaneous child processes. Note that StreamLogHandler::Emit() in lldb/source/Utility/Log.cpp already contains a similar mutex.

If an unreachable block B branches to a block S inside a cycle, it may cause S to be incorrectly treated as an entry to the cycle. We avoid that by skipping unreachable predecessors when locating entries.
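The rule above can be illustrated with a minimal sketch. It is not the actual cycle analysis code: the CFG representation (`Succs`), `BlockId`, and both function names are hypothetical and exist only for illustration. A block of the cycle counts as an entry only if it has a predecessor that is outside the cycle and reachable from the function entry.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical CFG model: Succs[B] lists the successors of block B.
using BlockId = int;

// Collect every block reachable from Entry by following successor edges.
std::set<BlockId> reachableFrom(BlockId Entry,
                                const std::vector<std::vector<BlockId>> &Succs) {
  std::set<BlockId> Seen{Entry};
  std::vector<BlockId> Worklist{Entry};
  while (!Worklist.empty()) {
    BlockId B = Worklist.back();
    Worklist.pop_back();
    for (BlockId S : Succs[B])
      if (Seen.insert(S).second)
        Worklist.push_back(S);
  }
  return Seen;
}

// A cycle block is an entry if some reachable predecessor lies outside the
// cycle. Unreachable predecessors are skipped on purpose.
std::set<BlockId> findEntries(const std::set<BlockId> &Cycle,
                              const std::vector<std::vector<BlockId>> &Succs,
                              BlockId FnEntry) {
  std::set<BlockId> Reachable = reachableFrom(FnEntry, Succs);
  std::set<BlockId> Entries;
  for (BlockId P = 0; P < static_cast<BlockId>(Succs.size()); ++P) {
    if (!Reachable.count(P) || Cycle.count(P))
      continue; // Unreachable or inside the cycle: cannot define an entry.
    for (BlockId S : Succs[P])
      if (Cycle.count(S))
        Entries.insert(S);
  }
  return Entries;
}
```

With edges 0->1, 1->2, 2->1 and an unreachable block 3 that branches to 2, only block 1 is reported as an entry of the cycle {1, 2}.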

… build" (#101340) Reverts llvm/llvm-project#100908

Broken out from #93429 Somewhat closing the loop opened by 7017e6c. Co-authored-by: Ryan Prichard <[email protected]>

…#100685) Those two `__find_end` functions are no longer used after 101d1e9. After that commit, `std::find_end` started dispatching to `__find_end_classic`, and `ranges::find_end` to `__find_end_impl`, which means that the two `__find_end` functions were no longer necessary. Fixes #100569

Picolib testing skips any test requiring this feature, I just didn't know the feature existed until now.

This takes 1m40s to run when testing picolib on qemu. This isn't the end of the world but that's on an AArch64 server. So if someone felt the need to mark this unsupported in the first place, it's likely much slower on average hardware.

Follow-up to #96171 in an attempt to fix the Solaris bots.

…100507)

…(#99562) This is a follow up to llvm/llvm-project#98717, which made lock_guard available under _LIBCPP_HAS_NO_THREADS. We can make unique_lock available under similar circumstances. This patch follows the example in #98717, by:
- Removing the preprocessor guards for _LIBCPP_HAS_NO_THREADS in the unique_lock header.
- Providing a set of custom mutex implementations in a local header.
- Using custom locks in tests that can be made to work under `no-threads`.
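As a rough illustration of why this works, `std::unique_lock` only needs its mutex type to provide `lock()`, `try_lock()` and `unlock()`. The sketch below uses a hypothetical `FakeMutex` (not the custom mutex from the patch's test header) that merely records its state, so no threading support is involved.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical single-threaded "mutex": it only tracks whether it is held.
struct FakeMutex {
  bool locked = false;
  void lock() {
    assert(!locked);
    locked = true;
  }
  bool try_lock() {
    if (locked)
      return false;
    locked = true;
    return true;
  }
  void unlock() {
    assert(locked);
    locked = false;
  }
};

// Returns true if the lock is held while the unique_lock is alive.
bool lockedInsideScope(FakeMutex &M) {
  std::unique_lock<FakeMutex> Guard(M);
  return Guard.owns_lock() && M.locked;
}
```

After `lockedInsideScope` returns, the destructor of `Guard` has released the mutex again.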

This is an intermediate and fairly mechanical step towards unifying the benchmarks with the rest of the test suite. Moving this around requires a few changes, notably making sure we don't throw a wrench into the discovery process of the normal test suite. This won't be a problem anymore once benchmarks are taken into account by the test setup out of the box.

- For languages following SPMD/SIMT programming model, functions and call sites are marked 'convergent' by default. 'noconvergent' is added in this patch to allow developers to remove that 'convergent' attribute when it's safe. Reviewers: nhaehnle, Sirraide, yxsamliu, Artem-B, ilovepi, jayfoad, ssahasra, arsenm Reviewed By: arsenm Pull Request: llvm/llvm-project#100637

Previously, building libc for AArch64 in `LLVM_LIBC_FULL_BUILD` mode would fail because no implementation of setjmp/longjmp was available. This was the only obstacle, so now a full AArch64 build of libc is possible. This implementation automatically supports PAC and BTI if compiled with the appropriate options. I would have liked to do the same for MTE stack tagging, but as far as I can see there's currently no predefined macro that allows detection of `-fsanitize=memtag-stack`, so I've left that one as a TODO. AAPCS64 delegates the x18 register to individual platform ABIs, and allows them to choose what it's used for, which may or may not require setjmp and longjmp to save and restore it. To accommodate this, I've introduced a libc configuration option. The default is on, because the only use of x18 I've so far encountered uses it to store information specific to the current stack frame (so longjmp does need to restore it), and this is also safe behavior in the default situation where the platform ABI specifies no use of x18 and it becomes a temporary register (restoring it to its previous value is no worse than any _other_ way for a function call to clobber it). But if a platform ABI needs to use x18 in a way that requires longjmp to leave it alone, they can turn the option off.

Initially, the LRU list stored all mapped entries with no distinction between the committed (non-madvise()'d) entries and decommitted (madvise()'d) entries. Now these two types of entries are separated into two lists, allowing future cache logic to branch depending on whether or not entries are committed or decommitted. Furthermore, the retrieval algorithm will prioritize committed entries over decommitted entries. Specifically, valid-fit, committed entries (not necessarily optimal-fit) are retrieved before optimal-fit, decommitted entries.
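The retrieval priority described above might look roughly like the following sketch (C++17). The `Entry` and `Cache` types are hypothetical stand-ins rather than the real cache data structures, and "valid fit" is simplified here to "large enough for the request".

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>

// Hypothetical cache entry; only the size matters for this sketch.
struct Entry {
  std::size_t Size;
};

struct Cache {
  std::list<Entry> Committed;   // entries that were not madvise()'d
  std::list<Entry> Decommitted; // entries that were madvise()'d

  std::optional<Entry> retrieve(std::size_t Request) {
    // 1. Any committed entry that fits wins, even if it is not optimal.
    for (auto It = Committed.begin(); It != Committed.end(); ++It) {
      if (It->Size >= Request) {
        Entry E = *It;
        Committed.erase(It);
        return E;
      }
    }
    // 2. Otherwise take the best (smallest fitting) decommitted entry.
    auto Best = Decommitted.end();
    for (auto It = Decommitted.begin(); It != Decommitted.end(); ++It)
      if (It->Size >= Request &&
          (Best == Decommitted.end() || It->Size < Best->Size))
        Best = It;
    if (Best == Decommitted.end())
      return std::nullopt;
    Entry E = *Best;
    Decommitted.erase(Best);
    return E;
  }
};
```

In this sketch a request for 16 bytes is served from a 64-byte committed entry before a 16-byte decommitted entry is considered.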

This patch folds `(bitcast (or (and (bitcast X to int), signmask), nneg Y) to fp)` into `copysign((bitcast Y to fp), X)`. I found this pattern exists in some graphics applications/math libraries. Alive2: https://alive2.llvm.org/ce/z/ggQZV2
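To see why the fold is sound, the two forms can be compared on 32-bit floats. This is only an illustrative check, not the InstCombine code; the helper names `toBits`, `toFloat`, `orAndForm` and `copysignForm` are made up for the example, and `Y` is assumed to have its sign bit clear (the `nneg` condition).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Bit-level reinterpretation helpers (the equivalent of an IR bitcast).
std::uint32_t toBits(float F) {
  std::uint32_t B;
  std::memcpy(&B, &F, sizeof B);
  return B;
}
float toFloat(std::uint32_t B) {
  float F;
  std::memcpy(&F, &B, sizeof F);
  return F;
}

// Original pattern: keep only the sign bit of X, then OR in the bits of Y.
// Y must be non-negative as a signed integer (sign bit clear).
float orAndForm(float X, std::uint32_t Y) {
  const std::uint32_t SignMask = 0x80000000u;
  return toFloat((toBits(X) & SignMask) | Y);
}

// Folded pattern: magnitude taken from Y, sign taken from X.
float copysignForm(float X, std::uint32_t Y) {
  return std::copysign(toFloat(Y), X);
}
```

Both functions produce the same bit pattern for any `X` as long as the sign bit of `Y` is zero.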

This patch implements sandboxir::AddrSpaceCastInst which mirrors llvm::AddrSpaceCastInst.

It's barely testable - the test does exercise the code, but wouldn't fail on an empty implementation. It would cause a memory leak though (because the error handle wouldn't be unwrapped/reowned) which could be detected by asan and other leak detectors.

CONFLICT (content): Merge conflict in clang/lib/Sema/SemaDecl.cpp

This patch implements sandboxir::IntToPtrInst which mirrors llvm::IntToPtrInst.

Given vscale is a power of two, we should be able to prove no-self-wrap in these cases. We currently don't, but an upcoming change will fix this.

Currently, CommandObjects are obtaining a target in a variety of ways. Often the command incorrectly operates on the selected target. As an example, when a breakpoint command is running, the current target is passed into the command but the target that hit the breakpoint is not the selected target. In other places we use the CommandObject's execution context, which is frozen during the execution of the command, and comes with its own limitations. Finally, we often want to fall back to the dummy target if no real target is available. Instead of having to guess how to get the target, this patch introduces one helper function in CommandObject to get the most relevant target. In order of priority, that's the target from the command object's execution context, from the interpreter's execution context, the selected target or the dummy target. rdar://110846511

Also standardize the license comment in several files where it was different from what we normally do.

…y (#101233) Fixes #101127 See this working example: https://godbolt.org/z/z15oj15eP

The polynomial approximation for asin is only good between [-9/16, 9/16]. Values beyond that range must be remapped to achieve good numeric results. This is done by the equation below: `arcsin(x) = PI/2 - arcsin(sqrt(1.0 - x*x))`
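The identity can be checked numerically with a small sketch. Two assumptions are made here that are not stated in the commit message: `std::asin` stands in for the polynomial approximation, and negative inputs are handled through the odd symmetry of arcsin, since the identity as written holds for 0 <= x <= 1. The function name `asinViaRemap` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Evaluate arcsin(x) through the remapping identity
//   arcsin(x) = PI/2 - arcsin(sqrt(1 - x*x)),  valid for 0 <= x <= 1.
// Negative inputs reuse the identity on |x| and restore the sign afterwards.
double asinViaRemap(double X) {
  const double HalfPi = 1.57079632679489661923;
  double A = std::fabs(X);
  double R = HalfPi - std::asin(std::sqrt(1.0 - A * A));
  return std::copysign(R, X);
}
```

For inputs such as 0.9 or -0.75 the result agrees with `std::asin` to within floating-point rounding.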

This patch implements sandboxir::FPToSIInst which mirrors llvm::FPToSIInst.

RFC: https://discourse.llvm.org/t/rfc-extend-machine-value-type-from-uint8-t-to-uint16-t/80274

compile-time-tracker: https://llvm-compile-time-tracker.com/compare.php?from=4b9fab591916eec9fd1942f37afe3b137b564089&to=177d28247efe5a4d59a8d8150b4daf01e4f57d74&stat=wall-time

Currently 208 out of 256 MVTs are used and they will run out soon, so ultimately we need to extend the original `MVT::SimpleValueType` from `uint8_t` to `uint16_t` to accommodate more types. The `MatcherTable` uses `unsigned char` for encoding the matcher code, so the extended MVTs no longer fit into the table; thus we need to use VBR to encode them, as we do for other values that are wider than 8 bits.

The statistics below show the difference in "Total Array size" of the matcher table that appears in every file:

```
Table                     Before    After     Change(%)
WebAssemblyGenDAGISel.inc 23576     23775     0.844
NVPTXGenDAGISel.inc       173498    173498    0
RISCVGenDAGISel.inc       2179121   2369929   8.756
AVRGenDAGISel.inc         2754      2754      0
PPCGenDAGISel.inc         163315    163617    0.185
MipsGenDAGISel.inc        47280     47447     0.353
SystemZGenDAGISel.inc     56243     56461     0.388
AArch64GenDAGISel.inc     467893    487830    4.261
MSP430GenDAGISel.inc      8069      8069      0
LoongArchGenDAGISel.inc   78928     79131     0.257
XCoreGenDAGISel.inc       3432      3432      0
BPFGenDAGISel.inc         3733      3733      0
VEGenDAGISel.inc          65174     66456     1.967
LanaiGenDAGISel.inc       2067      2067      0
X86GenDAGISel.inc         628787    636987    1.304
ARMGenDAGISel.inc         170968    171036    0.040
HexagonGenDAGISel.inc     155764    155764    0
SparcGenDAGISel.inc       5762      5798      0.625
AMDGPUGenDAGISel.inc      504356    504463    0.021
R600GenDAGISel.inc        29785     29785     0
```

The statistics below show the runtime peak memory usage when compiling a simple C program: `/bin/time -v clang -target $TARGET -O3 -c test.c`

```
int test(int a) { return a * 3; }
```

```
Target      Before(kbytes) After(kbytes) Change(%)
wasm64      110172         110088        -0.076
nvptx64     109784         109980        0.179
riscv64     114020         113656        -0.319
avr         110352         110068        -0.257
ppc64       112612         112476        -0.120
mips64      113588         113668        0.070
systemz     110860         110760        -0.090
aarch64     113704         113432        -0.239
msp430      110284         110200        -0.076
loongarch64 111052         110756        -0.267
xcore       108340         108020        -0.295
bpf         110620         110708        0.080
ve          110960         110920        -0.036
lanai       110180         109960        -0.200
x86_64      113640         113304        -0.296
arm64       113540         113172        -0.324
hexagon     114620         114684        0.056
sparc       110412         110136        -0.250
amdgcn      118164         117144        -0.863
r600        111200         110508        -0.622
```
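For readers unfamiliar with VBR (variable bit rate) encoding in a byte table, here is a generic sketch: values are split into 7-bit groups, low bits first, and the top bit of each byte signals that another byte follows. The exact layout used by the matcher table is an assumption here, and `emitVBR` / `readVBR` are illustrative names, not the TableGen functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append Val as 7-bit groups, least significant group first.
// The high bit of a byte is set when more bytes follow.
void emitVBR(std::uint64_t Val, std::vector<unsigned char> &Out) {
  while (Val >= 128) {
    Out.push_back(static_cast<unsigned char>((Val & 127) | 128));
    Val >>= 7;
  }
  Out.push_back(static_cast<unsigned char>(Val));
}

// Decode one value starting at Idx and advance Idx past it.
std::uint64_t readVBR(const std::vector<unsigned char> &In, std::size_t &Idx) {
  std::uint64_t Val = 0;
  unsigned Shift = 0;
  unsigned char Byte;
  do {
    Byte = In[Idx++];
    Val |= static_cast<std::uint64_t>(Byte & 127) << Shift;
    Shift += 7;
  } while (Byte & 128);
  return Val;
}
```

Values below 128 still take a single byte, which is consistent with targets that use few wide types seeing no table growth, while larger values cost an extra byte each.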

Change eraseNode to require that the basic block is still contained inside the function. This is a preparation for using numbers of basic blocks inside the dominator tree, which are invalid for blocks that are not inside a function.

…100624) This is useful for language runtimes that compute register values by inspecting the state of the currently running process. Currently, there are no mechanisms enabling these runtimes to set register values to arbitrary values. The alternative considered would involve creating a dwarf expression that produces an arbitrary integer (e.g. using OP_constu). However, the current data structure for Rows is such that they do not own any memory associated with dwarf expressions, which implies any such expression would need to have static storage and therefore could not contain a runtime value. Adding a new rule for constants leads to a simpler implementation. It's also worth noting that this does not make the "Location" union any bigger, since it already contains a pointer+size pair.

This patch implements sandboxir::FPToUIInst which mirrors llvm::FPToUIInst.

…H file. (#101280) You can provide more than one AST file as an input. Emit a path for a file with a problem, so you can disambiguate between multiple files. rdar://65005546

…009) There are some cases in which variables used in OpenMP constructs are predetermined as private. The semantic checks for copyprivate were not handling those cases. Besides that, shared symbols were not being properly represented in some cases. When there was no previously declared private (implicit) symbol, no new association symbols, representing shared ones, were being created. These symbols must always be inserted in constructs that may privatize the original symbol: parallel, teams and task generating constructs. Fixes #87214 and #86907

…rue16 flags for GFX12 (#100849) duplicate vop1 tests to fake16 and update real-true16 flags for GFX12 creating duplications here to avoid bulk copy in the following true16 patches --------- Co-authored-by: guochen2 <[email protected]>

Reference: https://en.cppreference.com/w/cpp/numeric/math/nan

This introduces a `target.object-map` which allows us to remap module locations, much in the same way as source mapping works today. This is useful, for instance, when debugging coredumps, so we can replace some of the locations where LLDB attempts to load shared libraries and executables from, without having to setup an entire sysroot.

This patch implements sandboxir::SIToFPInst which mirrors llvm::SIToFPInst.

Reverts llvm/llvm-project#100818

bf02f41 changed sema handling of alignas to accommodate C23, which implements alignas as a type specifier instead of an attribute. When merged, the SYCL-specific conditions that were applied before for CXX11 weren't brought over. This patch re-adds them, which addresses a number of test regressions in SemaSYCL.

Sorts GDBIndexTUEntryVector in decreasing order by hash to ensure determinism when parallelized.

Similar to (de)allocation traces, we can record kernel launch stack traces and display them in case of an error. However, the AMD GPU plugin signal handler, which is invoked on memory faults, cannot pinpoint the offending kernel. Instead, we print `<NUM>` many traces, where `<NUM>` is set via `OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=<NUM>`. The recording uses a ring buffer of fixed size (for now 8). For `trap` errors, we print the actual kernel name, and the trace if recorded.
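A fixed-size ring buffer of this kind can be sketched as follows. This is not the offload plugin's implementation; `TraceRing`, `record` and `lastTraces` are hypothetical names, and traces are modeled as plain strings.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Keeps the N most recent traces; older ones are overwritten.
template <std::size_t N> class TraceRing {
  std::array<std::string, N> Slots;
  std::size_t Next = 0; // Total number of traces recorded so far.

public:
  void record(std::string Trace) {
    Slots[Next % N] = std::move(Trace);
    ++Next;
  }

  // Return up to Count traces, most recent first.
  std::vector<std::string> lastTraces(std::size_t Count) const {
    std::size_t Avail = Next < N ? Next : N;
    if (Count > Avail)
      Count = Avail;
    std::vector<std::string> Out;
    for (std::size_t I = 0; I < Count; ++I)
      Out.push_back(Slots[(Next - 1 - I) % N]);
    return Out;
  }
};
```

With a capacity of 8, recording ten launches keeps only the last eight, and a request for more traces than are stored is clamped to what is available.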

Also updates and sorts CMake target dependencies, and corrects the smoke test that expected expf16(sNaN) to return sNaN instead of aNaN, although the test still passed, as FPMatcher only checks whether both sides are NaN, not whether they're the same NaN value.

These add some IR tests for 57d10b4. These do rely on some lucky MIR placement to test the scc input, but I haven't found a better way to do it. Also, scc handling in inline asm is extremely buggy.

…ForallOpUsingTileSizes` (#91878) The implementation of these methods is legacy, and they are removed in favor of using the `scf::tileUsingSCF` methods as replacements. To get the latter on par with the requirements of the deprecated methods, the tiling allows one to specify the maximum number of tiles to use instead of specifying the tile sizes. When tiling to `scf.forall` this specification is used to generate the `num_threads` version of the operation.

A slight deviation from the previous implementation is that the deprecated method always generated the `num_threads` variant of the `scf.forall` operation. Instead now this is driven by the tiling options specified. This reduces the indexing math generated when the tile sizes are specified.

**Moving from `linalg::tileToForallOp` to `scf::tileUsingSCF`**

```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> numThreads;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result = linalg::tileToForallOp(b, op, numThreads, mapping);
```

can be replaced by

```
scf::SCFTilingOptions options;
options.setNumThreads(numThreads);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```

This generates the `numThreads` version of the `scf.forall` for the inter-tile loops, i.e.

```
... = scf.forall (%arg0, %arg1) in (%nt0, %nt1) shared_outs(...)
```

**Moving from `linalg::tileToForallOpUsingTileSizes` to `scf::tileUsingSCF`**

```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> tileSizes;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result = linalg::tileToForallOpUsingTileSizes(b, op, tileSizes, mapping);
```

can be replaced by

```
scf::SCFTilingOptions options;
options.setTileSizes(tileSizes);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```

Also note that `linalg::tileToForallOpUsingTileSizes` would effectively call `linalg::tileToForallOp` by computing the `numThreads` from the `op` and `tileSizes`, and generate the `numThreads` version of the `scf.forall`. That is not the case anymore. Instead this will directly generate the `tileSizes` version of the `scf.forall` op

```
... = scf.forall(%arg0, %arg1) = (%lb0, %lb1) to (%ub0, %ub1) step(%step0, %step1) shared_outs(...)
```

If you actually want to use the `numThreads` version, it is up to the caller to compute the `numThreads` and set `options.setNumThreads` instead of `options.setTileSizes`. Note that there is a slight difference in the num threads version and tile size version. The former requires an additional `affine.max` on the tile size to ensure non-negative tile sizes. When lowering to `numThreads` version this `affine.max` is not needed since by construction the tile sizes are non-negative. In previous implementations, the `numThreads` version generated when using the `linalg::tileToForallOpUsingTileSizes` method would avoid generating the `affine.max` operation. To get the same state, downstream users will have to additionally normalize the `scf.forall` operation.

**Changes to `transform.structured.tile_using_forall`**

The transform dialect op that called into `linalg::tileToForallOp` and `linalg::tileToForallOpUsingTileSizes` has been modified to call `scf::tileUsingSCF`. The transform dialect op always generates the `numThreads` version of the `scf.forall` op. So when `tile_sizes` are specified for the transform dialect op, first the `tile_sizes` version of the `scf.forall` is generated by the `scf::tileUsingSCF` method, which is then further normalized to get back to the same state. So there is no functional change to `transform.structured.tile_using_forall`. It always generates the `numThreads` version of the `scf.forall` op (as it did before this change).

--------- Signed-off-by: MaheshRavishankar <[email protected]>

Summary: The `nvlink-wrapper` can do LTO now, which means we can still create some LLVM-IR without needing an architecture. In the case that we try to invoke `nvlink` internally, that will still fail. This patch simply defers the error until later so we can use `--lto-emit-llvm` to get the IR without specifying an architecture.

GNU ld since 2.41 supports this option, which is mildly useful. It omits the section header table and non-ALLOC sections (including .symtab/.strtab (--strip-all)). This option is simple to implement and might be used by LLDB to test program headers parsing without the section header table (#100900). -z sectionheader, which is the default, is also added. Pull Request: llvm/llvm-project#101286

Split from #100596

…sicScalars. (#101384) Previously, we created a vsetvlimax intrinsic. Using X0 simplifies the code and enables some optimizations to kick when the exact value of vlmax is known.

…s (#101089) Co-authored-by: OverMighty <[email protected]>

…uiltin 89946bd changed uses of llvm_check_{compiler,linker} calls with equivalent CMake builtins and removed the llvm versions. Some references still existed to llvm_check_linker_flag, so this commit replaces those.

Split DIEBuilder::finish so that code updating .debug_names is in a separate function.

Add "-*- C++ -*-"

…sue" This reverts commit e72cdae, which broke LLVM's lldb builder for Windows msvc.

…ags (#101380) SCEV has logic for inferring wrap flags on AddRecs which are known to control an exit based on whether the step is a power of two. This logic only considered constants, and thus did not trigger for steps such as (4 x vscale) which are common in scalably vectorized loops. The net effect is that we were very sensitive to the preservation of nsw/nuw flags on such IVs, and could not infer trip counts if they got lost for any reason. --------- Co-authored-by: Nikita Popov <[email protected]>

…ning without converter (#101318) TODO: test case

…01220)

…es. NFC ValueTypeByHwMode contains a std::map. We shouldn't copy it if we don't need to. Fixes #101406.

…tor to be removed. NFC This constructor was taking a ValueTypeByMode by value to create an ArrayRef. By adding an explicit cast from ValueTypeByHwMode to TypeSetByHwMode we allow the ArrayRef to be implicitly converted from a single element.

…+20 (#82008) When we initially implemented the C++20 synchronization library, we reluctantly accepted for the implementation to be backported to C++03 upon request from the person who provided the patch. This was when we were only starting to have experience with the issues this can create, so we flinched. Nowadays, we have a much stricter stance about not backporting features to previous standards. We have recently started fixing several bugs (and near bugs) in our implementation of the synchronization library. A recurring theme during these reviews has been how difficult to understand the current code is, and upon inspection it becomes clear that being able to use a few recent C++ features (in particular lambdas) would help a great deal. The code would still be pretty intricate, but it would be a lot easier to reason about the flow of callbacks through things like __thread_poll_with_backoff. As a result, this patch drops support for the synchronization library before C++20. This makes us more strictly conforming and opens the door to major simplifications, in particular around atomic_wait which was supported all the way to C++03. This change will probably have some impact on downstream users, however since the C++20 synchronization library was added only in LLVM 10 (~3 years ago) and it's quite a niche feature, the set of people trying to use this part of the library before C++20 should be reasonably small.

Summary: Adds support for the `vsscanf` function similar to `sscanf`. Based off of llvm/llvm-project#97529.

This PR introduces `sparse_tensor.coiterate` operation, which represents a loop that traverses multiple sparse iteration space.

… tests. I'm going to do a review to make sure we are testing Zvfhmin instead of Zvfh where clang expects it to work for half types, like loads/stores. Removing unnecessary FP leaves fewer things to review.

This patch replaces getAs with castAs and dyn_cast with cast to ensure type safety and prevents potential null pointer dereferences. These changes enforce compile-time checks for correct type casting in ASTContext and CodeGenModule.

We want to use newer instructions if we are targeting sufficiently new SM and PTX versions. If we cannot use those newer instructions, let LLVM synthesize the sequence from more fundamental instructions.

To make future PRs smaller.

This patch supports legalizing load and store instructions for scalable vectors in RISC-V.

This patch supports GlobalISel for register bank selection for scalable vector load and store instructions in RISC-V

…eck (#101412) This commit removes an unnecessary call to `E->hasStoredFPFeatures()` within the `VisitUnaryPlus` function. The method's return value was not being used, leading to a redundant operation. The removal of this line streamlines the function and eliminates an unneeded check for stored floating-point features.

…nt. NFC

This patch implements sandboxir::PHINode which mirrors llvm::PHINode. Based almost entirely on work by vporpo.

… system. (#101392) Fixes build errors on some SDKs. rdar://132607572

ucmp can be promoted with either sext or zext. RISC-V and LoongArch prefer sext for promoting i32 to i64 unless the inputs are known to be zero extended already. This patch uses the existing SExtOrZExtPromotedOperands function that is used by SETCC promotion to intelligently handle this.
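The claim that either extension is valid can be checked with a small sketch: an unsigned three-way compare of two 32-bit values gives the same answer after both operands are zero-extended or both are sign-extended to 64 bits. The helper names below are illustrative only, and the narrowing cast in `sext` relies on two's-complement conversion (guaranteed since C++20, implementation-defined but universal before that).

```cpp
#include <cassert>
#include <cstdint>

// Three-way unsigned compare: -1, 0 or 1.
int ucmp32(std::uint32_t A, std::uint32_t B) { return (A > B) - (A < B); }
int ucmp64(std::uint64_t A, std::uint64_t B) { return (A > B) - (A < B); }

// Zero-extend and sign-extend a 32-bit value to 64 bits.
std::uint64_t zext(std::uint32_t V) { return V; }
std::uint64_t sext(std::uint32_t V) {
  // Reinterpret as signed (two's complement), widen, then view as unsigned.
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(static_cast<std::int32_t>(V)));
}
```

Sign extension maps values with the top bit set into the upper end of the 64-bit unsigned range while leaving the others in place, so the relative unsigned order is preserved as long as both operands use the same extension.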

…eMD*. NFC These passes will be replaced soon as we move to the target extension based resource handling in the DirectX backend, but removing them now before the replacement stuff is all up and running would be very disruptive. However, we do need to move these passes out of the way to avoid symbol conflicts with the new DXILResourceAnalysis in the Analysis library. Note: I tried an even simpler hack in #100698 but it doesn't really work. A rename is the most expedient path forward here. Pull Request: llvm/llvm-project#101393

The GetTarget helper returns a Target reference, so there's no reason to convert it to a pointer and check its validity.

…ments. (#101329) Previously, it was hard to create a scalable vector splat with a specific vector length in LLVM IR, so we used riscv.vmv.v.x and riscv.vmv.v.f to do this work. But these two RVV intrinsics need strict type constraints and cannot support fixed vector types or illegal vector types. Using vp.splat preserves the old functionality and also generates more optimized code for vector types and illegal vectors. This patch also fixes a crash caused by getEVT not serving ptr types.

…401) MachineValueTypeSet in tablegen allocates an array with a bit per MVT. This used to be 256 bits, with the introduction of 16-bit MVT it ballooned to 65536 bits. I suspect this is increasing the memory usage of many of the data structures used by CodeGenDAGPatterns. Since we don't need the full 16-bit range yet, this patch proposes lowering the maximum MVT to 511 and using only 512 bits for MachineValueTypeSet's storage.
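To make the storage arithmetic concrete, here is a sketch of a bit-per-type set capped at 512 entries. It is not the TableGen `MachineValueTypeSet`; the class name `SmallTypeSet` and its members are hypothetical. At one bit per type, 512 ids need 64 bytes, whereas the full 16-bit range would need 65536 bits, i.e. 8192 bytes, per set.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical bit set with one bit per type id in the range [0, 511].
class SmallTypeSet {
  static constexpr unsigned Capacity = 512;
  std::array<std::uint64_t, Capacity / 64> Words{};

public:
  void insert(unsigned Id) {
    assert(Id < Capacity && "type id out of range");
    Words[Id / 64] |= std::uint64_t(1) << (Id % 64);
  }

  bool count(unsigned Id) const {
    assert(Id < Capacity && "type id out of range");
    return (Words[Id / 64] >> (Id % 64)) & 1;
  }

  // Bytes of bit storage: 512 bits / 8.
  static constexpr unsigned storageBytes() { return Capacity / 8; }
};
```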

…RE. NFC (#101431) Merge the isVector early out with the previous check for isVector.

…#101378) This script looks for existing definitions with the `SPIRV_` prefix, so that it can preserve them when updating the file. When the commit 2d62833 changed the prefix from `SPV_`, the number of characters to strip from matched names was not updated, which broke this feature. This commit fixes remaining cases that weren't fixed by 339c87a. The relationship of this script to the files it is meant to maintain is still bitrotten in other ways.

When a function has an indirect call in LTO mode, it triggers `assert(Alias)` in `findProfiledCalleeThroughTailCalls`.

…410) The tests for most CastInst sub-classes, except AddrSpaceCastInst, are very similar. This patch creates a common template function for all of them.

PrintIRPass, PrintOpStatsPass and PrintOpGraphPass don't mutate IR, so they now preserve all analyses to save some computation.

…#101285)

Use `cast` instead to replace `dyn_cast` when `dyn_cast` is not needed/not checked.

This is just like AArch64. Changing the threshold to 6 will increase the code size, but will also decrease unconditional branches. CPUs with wide fetch/issue units can benefit from it. The value 6 may be debatable, we can set it to `SchedModel.IssueWidth`.

At the time this was written there were no vector types in MVT. The order was:
- scalar integer types
- scalar FP types
- isVoid

I believe this isVoid check was to catch walking off the end of the scalar FP types, while the isInteger()==isInteger() check caught walking off the end of scalar integer types. These days we have:
- scalar integer types
- scalar FP types
- fixed vector integer types
- fixed vector FP types
- scalable vector integer types
- scalable vector FP types
- Glue
- isVoid

So checking isVoid doesn't detect what it used to. I've changed it to check isFloatingPoint() == isFloatingPoint() instead.

Test Plan: added bolt/test/timers.c Reviewers: ayermolo, maksfb, rafaelauler, dcci Reviewed By: dcci Pull Request: llvm/llvm-project#101267

When BOLT is run in AggregateOnly mode (perf2bolt), it exits with code zero so destructors are not run thus TimerGroup never prints the timers. Add explicit printing just before the exit to honor options requesting timers (`--time-rewrite`, `--time-aggr`). Test Plan: updated bolt/test/timers.c Reviewers: ayermolo, maksfb, rafaelauler, dcci Reviewed By: dcci Pull Request: llvm/llvm-project#101270

eliminateFrameIndex should now properly handle materializing frame indices in SGPRs, so treat this like the other constant operand types. On average this will produce worse code; we need to detect VGPR uses, and improve SGPR->VGPR frame index folds.

…FC (#101453) Size is updated in sync with MemoryType. Instead of maintaining a separate Size, use the size from MemoryType where needed.

Allow for matching & rewriting a variable number of arguments in an instruction. Solves #87459

This adds bf16 to the unit stride, strided, and index load and store intrinsics. clang already assumes these work with Zvfbfmin.

…s. NFC clang uses these with Zvfhmin so we should test them.

Use a regex instead of hardcoded numbers for anonymous pattern suffixes.

See the attached test for the motivation example. If we're too greedy to not emit the definition for inline builtins, we may meet a middle end crash. And it should be good to emit inline builtins always.

The base AMDGPUISelLowering was setting custom action on 16-bit vector types, but these are also set in SIISelLowering.

Looks like these got left behind in 17e2d07

… declarations (#101403) Similar to #97796, fix the type of the `native_thread` parameter for the arm, mips64 and powerpc variants of `NativeRegisterContextFreeBSD_*`. Otherwise, this leads to compile errors similar to:

```
lldb/source/Plugins/Process/FreeBSD/NativeRegisterContextFreeBSD_powerpc.cpp:85:39: error: out-of-line definition of 'NativeRegisterContextFreeBSD_powerpc' does not match any declaration in 'lldb_private::process_freebsd::NativeRegisterContextFreeBSD_powerpc'
   85 | NativeRegisterContextFreeBSD_powerpc::NativeRegisterContextFreeBSD_powerpc(
      |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

…when running without converter (#101318)" This reverts commit 2aa96fc. This was merged without a test. Also it seems it was only fixing an issue for users which used a particular workaround that is not actually needed anymore (skipping UnrealizedConversionCast operands).

This also adds a few FIXMEs where we use UB in the tests.

…ry intrinsics. NFC

Reference: https://en.cppreference.com/w/c/numeric/math/nan The logic is copied from clang frontend: https://github.com/llvm/llvm-project/blob/1d2b2d29d733200b704f38d220d22ecc07d6cf42/clang/lib/AST/ExprConstant.cpp#L14741-L14777 --------- Co-authored-by: Nikita Popov <[email protected]>

Summary: Macros like `va_start`/`va_end` are marked as builtin functions, which makes these identifiers special. As a result, the identifiers are redefined as builtins, which hides the macro definitions when preloading C++ modules. For modules, Clang ignores special identifiers, but `PP.getCurrentModule()` was not set. This diff fixes the IsModule detection logic for this particular case. Test Plan: check-clang --------- Co-authored-by: Chuanqi Xu <[email protected]>

~0.1% instruction count improvements https://llvm-compile-time-tracker.com/compare.php?from=07d2709a17860a202d91781769a88837e4fb5f2a&to=d5cc47831ecd9f0a2b164b16da67f74b94e9aafc&stat=instructions:u

Exact mode regions within WQM may have EXEC=0 in divergent control flow. This occurs if a branch is only taken by helper lanes and an instruction requiring WQM disabling is encountered. The current code extends the exact region as far as possible; however, this can result in it including instructions with unwanted side effects at EXEC=0. In particular, readfirstlane combined with scalar loads can produce invalid memory accesses in this circumstance. Work around this by shrinking exact regions to only the instructions requiring WQM disabling when unwanted side effects are present. Eventually we should branch over these regions when EXEC=0, but this requires visibility of CFG/divergence information not currently available.

…` association (#100531) Handles variables that are storage associated via `equivalence`. The problem is that these variables are declared as `fir.ptr`s while their privatized storage is declared as `fir.ref` which was triggering a validation error in the OpenMP dialect.

https://cplusplus.github.io/CWG/issues/882.html This was implemented for Clang 3.5 by b63b6ee

Adds more tests to: * vector-transfer-permutation-lowering.mlir Specifically, adds tests for: * out-of-bounds access for the `TransferWritePermutationLowering` pattern * in-bounds access for `TransferWriteNonPermutationLowering` + `TransferWritePermutationLowering` Also renames `@permutation_with_mask_xfer_write_fixed_width` as `@xfer_write_non_transposing_permutation_map`. This is a part of a larger effort to make sure that all key cases for patterns under populateVectorTransferPermutationMapLoweringPatterns (*) are tested. I also want to make sure that tests use consistent function and variable names. (*) transform.apply_patterns.vector.transfer_permutation_patterns in TD parlance.

…1312)" This reverts commit 65c000a.

Based on the [proposal PDF](https://llvm.org/devmtg/2020-09/slides/Hahn-Matrix_Support_in_LLVM_and_Clang.pdf) and the test code under [llvm/test/Transforms/LowerMatrixIntrinsics](https://github.com/llvm/llvm-project/tree/main/llvm/test/Transforms/LowerMatrixIntrinsics), the suffix for the `@llvm.matrix.multiply.*` intrinsic should be {output matrix type}.{input matrix 1 type}.{input matrix 2 type} (e.g., `@llvm.matrix.multiply.v4i32.v4i32.v4i32`). This PR corrects the places where these suffixes do not follow the aforementioned format.
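The naming rule above can be illustrated with a minimal sketch that assembles the name from three type strings. The helper `matrixMultiplyName` is hypothetical and not part of LLVM; real code would let LLVM mangle the overloaded intrinsic name itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an LLVM API): builds the name in the order
// {output matrix type}.{input matrix 1 type}.{input matrix 2 type}.
// Type strings such as "v4i32" are supplied by the caller.
inline std::string matrixMultiplyName(const std::string &outTy,
                                      const std::string &lhsTy,
                                      const std::string &rhsTy) {
  assert(!outTy.empty() && !lhsTy.empty() && !rhsTy.empty());
  return "llvm.matrix.multiply." + outTy + "." + lhsTy + "." + rhsTy;
}
```

Calling it with `"v4i32"` for all three types reproduces the example name from the commit message.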

The checker could report false positives if pointer arithmetic was done on pointers to non-array data before pointer subtraction. This also fixes another problem that could cause false positives when members of the same structure, but in different memory objects, are subtracted.

…in LIT tests (#101196) This patch fixes/unifies AIX target triples used in libc++, libc++abi, and libunwind LIT tests.

In the cost model of partial inlining, the cost of intrinsics is taken into account. However, some vector intrinsics have an invalid cost, which is not allowed for partial inlining. Instead of asserting, we skip partial inlining in this case to avoid compilation errors.

Summary: I am going to rework these tools to just be LLVM tools. This patch is pretty much NFC to set up the CMake for that.

This reverts commit llvm/llvm-project@0a5e572, since I forgot to start a pull request.

We can use known bits information to avoid masking out one or both of the operands.

Only the bf16 declarations are needed, as only they are lowered in AutoUpgrade.cpp. f16 and other builtins have LLVM intrinsics already defined.

We originally put implementation-detail function objects into individual namespaces for `std::ranges` without a good reason for doing so. This practice was continued, presumably because there was prior art. Since there's no reason to keep these namespaces, this commit removes them, which will slightly impact binary size. This commit does not apply to CPOs, some of which need additional work.

… (#67044) We were not declaring `__uses_allocator_construction_args` helper functions, leading to several valid uses failing to compile. This patch solves the problem by moving these helper functions into a struct, which also reduces the amount of redundant SFINAE we need to perform since most overloads are checking for a cv-qualified pair. Fixes #66714 Co-authored-by: Louis Dionne <[email protected]>

…erations (#84840) Instead of using **this in error messages for std::expected monadic operations, use value(). As shown in LWG3969, **this can trigger unintended ADL and while it's only an error message, we might as well be ADL-correct there too. Co-authored-by: Louis Dionne <[email protected]>

…eady -> ready / review (#97200) Also classes the "ready" status similarly to "tentatively ready" in make_cxx_dr_status

Since "vselect cond, (binop, x, y), x" became the canonical form the equivalent PatFrags for "binop x, (vselect cond, y, 0)" are no longer required.

For the same reasons as 6cfac49. This test was added in llvm/llvm-project#100710. It fails because when we're linking with link.exe, -gdwarf has no effect and we get a PDB file anyway. The Windows on Arm lldb bot uses link.exe. "C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.34.31933\\bin\\Hostx86\\arm64\\link.exe" <...> 08/01/2024 01:47 PM 2,956,488 vla.cpp.ilk 08/01/2024 01:47 PM 6,582,272 vla.cpp.pdb 08/01/2024 01:47 PM 734,208 vla.cpp.tmp

This patch modifies the flang driver to introduce the `-fopenmp-targets` option to the frontend compiler invocations corresponding to the OpenMP host device on offloading-enabled compilations. This option holds the list of offloading triples associated to the compilation and is used by clang to determine whether offloading calls should be generated for the host.

llvm/llvm-project#99888 added a check (and corresponding options) to flag uses of `#pragma mc_func` on AIX. This PR turns on the check by default.

There are 5 unary operators that can be followed by a non-parenthesized expression: `sizeof`, `__datasizeof`, `__alignof`, `alignof`, `_Alignof`. When we nest them too deep, `BalancedDelimiterTracker` does not help, because there are no parentheses, and we crash. Instead, this patch recognizes chains of those operators and parses them with sufficient stack space. Fixes #45061

…#92103) Also changes the behaviour of `__builtin_is_layout_compatible`. Neither the historic nor the current definition of layout-compatible classes mentions anything about base classes (other than implicitly through being standard-layout); both are defined in terms of members, not direct members.

…9654) This patch increases the alignment requirement for std::atomic_ref such that we can guarantee lockfree operations more often. Specifically, we require types that are 1, 2, 4, 8, or 16 bytes in size to be aligned to at least their size to be used with std::atomic_ref. This is the case for most types, however a notable exception is `long long` on x86, which is 8 bytes in length but has an alignment of 4. As a result of this patch, one has to be more careful about the alignment of objects used with std::atomic_ref. Failure to provide a properly-aligned object to std::atomic_ref is a precondition violation and is technically UB. On the flipside, this allows us to provide an atomic_ref that is actually lockfree more often, which is an important QOI property. More information in the discussion at llvm/llvm-project#99570 (comment). Co-authored-by: Louis Dionne <[email protected]>
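A minimal sketch of the alignment rule described above, not libc++'s actual implementation. The names `requiredAlignmentSketch` and `isSuitablyAligned` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of the rule: objects whose size is 1, 2, 4, 8, or 16 bytes must be
// aligned to at least their size; all other types keep their natural alignment.
constexpr std::size_t requiredAlignmentSketch(std::size_t size,
                                              std::size_t naturalAlign) {
  return ((size == 1 || size == 2 || size == 4 || size == 8 || size == 16) &&
          naturalAlign < size)
             ? size
             : naturalAlign;
}

// Runtime check a caller could perform before binding an atomic_ref to an
// object. `requiredAlign` must be a non-zero power of two.
inline bool isSuitablyAligned(const void *p, std::size_t requiredAlign) {
  assert(requiredAlign != 0 && (requiredAlign & (requiredAlign - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(p) % requiredAlign == 0;
}
```

Under this rule, an 8-byte type with a natural alignment of 4 (the `long long` on x86 case mentioned above) needs 8-byte alignment, which is why callers have to be more careful about how such objects are declared.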

When src/dest passed into memset/memcpy is null:
```
len == 0: this call is a noop.
len != 0: the behavior is undefined.
```
See also https://llvm.org/docs/LangRef.html#llvm-memset-intrinsics Alive2: https://alive2.llvm.org/ce/z/tJeRNL This patch converts these mem intrinsic calls into an assumption `len == 0` to mitigate code-size bloat caused by JumpThreading.

As mentioned in the LWG issue libc++ has already implemented the optimization. This adds tests and documents the implementation defined behaviour. Drive-by fixes an initialization.

…#101457) Clang expects that this works.

Very minor performance improvement.

Currently, the LowerConstantIntrinsics pass does an RPO traversal of every function... only to find that many functions don't have constant intrinsics (is.constant, objectsize). In the CodeGen pipeline, there is already a pre-isel intrinsic lowering pass, which iterates over intrinsic declarations and lowers all users. Call lowerConstantIntrinsics from this pass to avoid the extra iteration over the entire IR and the RPO traversal.

This patch adds initial support for `ConstantRange::shlWithNoWrap` to fold dtcxzyw/llvm-tools#22. However, this patch cannot fix the original issue. Improvements will be submitted in subsequent patches.

When the constant extender optimization pass encounters an instruction that uses an extended address pointing to another function's block, avoid adding the instruction to the extender list for the current machine function. Fixes llvm/llvm-project#99714

Summary: This fills the terminal with information already present from the `add_custom_command(COMMENT ...)` field, so it breaks everything into new lines. Remove this print to clean that up.

When a concatenation of vector instructions is formed, a vector rotation is performed as part of it. The direction of the shift was not calculated correctly. This fixes the rotation factor.

Updates formatting and variable names in: * vector-transfer-permutation-lowering.mlir This is primarily to improve consistency, both within this particular test file as well as across tests. In particular, with this PR I'm adopting a naming convention similar to the one already present in vector-transfer-flatten.mlir. Overview of changes: * All memref input arguments are re-named as `%mem`. * All vector input arguments are re-named as `%vec`. * All tensor input arguments are re-named as `%dest`. * LIT variables are updated to be consistent with input arguments. * Renamed all output arguments as `%res`. * Updated indentation to be more C-like.

Device compilers may fail to identify maximum stack size required by a kernel that calls CopyElement due to potential recursive calls. To avoid this, we can use dynamically allocated Stack. To avoid dynamic allocations on the host for simple cases, the Stack implementation has a reserved space (that ends up being allocated on the program stack). I tested both pre-allocated and 0-reserve implementations on the host, and all passed. The actual reserve values might be tuned as needed.
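The idea of an explicit stack with reserved inline space can be sketched as follows. This is a hypothetical illustration, not the runtime's actual `Stack` class: the name `ReservedStack`, its interface, and the doubling growth policy are all assumptions. It also assumes `T` is default-constructible and copy-assignable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical sketch: the first `Reserve` elements live inside the object
// (typically on the program stack); heap allocation happens only beyond that.
template <typename T, std::size_t Reserve>
class ReservedStack {
public:
  void push(const T &value) {
    if (size_ == capacity_)
      grow();
    data()[size_++] = value;
  }
  T pop() {
    assert(size_ > 0 && "pop on empty stack");
    return data()[--size_];
  }
  bool empty() const { return size_ == 0; }
  bool usesHeap() const { return heap_ != nullptr; }

private:
  T *data() { return heap_ ? heap_.get() : inline_; }
  void grow() {
    // Double the capacity and copy the live elements into the new buffer.
    std::size_t newCapacity = capacity_ == 0 ? 1 : capacity_ * 2;
    std::unique_ptr<T[]> bigger(new T[newCapacity]);
    for (std::size_t i = 0; i < size_; ++i)
      bigger[i] = data()[i];
    heap_ = std::move(bigger);
    capacity_ = newCapacity;
  }

  T inline_[Reserve == 0 ? 1 : Reserve]; // dummy slot when Reserve == 0
  std::unique_ptr<T[]> heap_;
  std::size_t size_ = 0;
  std::size_t capacity_ = Reserve;
};
```

With a reserve of 2, the first two pushes stay in the inline buffer and the third moves the contents to the heap; a reserve of 0 corresponds to the fully dynamic variant mentioned above.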

…00690) This patch enhances the descriptor with the ability to have a specialized allocator. The allocators are registered in a dedicated registry and the index of the desired allocator is stored in the descriptor. The default allocator, std::malloc, is registered at index 0. In order to have this allocator index in the descriptor, the f18Addendum field is repurposed to be able to hold the presence flag for the addendum (lsb) and the allocator index. Since this is a change in the semantics and name of the 7th field of the descriptor, the CFI_VERSION is bumped to the date of the initial change. This patch only adds the ability to have this feature as part of the descriptor but does not add a specific allocator yet. CUDA Fortran will be the first user of this feature to allocate descriptor data in the different types of device memory based on the CUDA attribute. --------- Co-authored-by: Slava Zakharin <[email protected]>
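Packing a presence flag and an allocator index into one small field can be sketched as below. The commit message only states that the flag sits in the least significant bit; the 8-bit field width, the placement of the index in the remaining bits, and all names here are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (illustration only): bit 0 = addendum presence flag,
// bits 1-7 = allocator index. The real runtime's width and masks may differ.
constexpr std::uint8_t kAddendumBit = 0x1;

inline std::uint8_t packExtra(bool addendumPresent, unsigned allocatorIdx) {
  assert(allocatorIdx < 128 && "index must fit in the 7 remaining bits");
  return static_cast<std::uint8_t>((allocatorIdx << 1) |
                                   (addendumPresent ? kAddendumBit : 0));
}

constexpr bool hasAddendum(std::uint8_t extra) {
  return (extra & kAddendumBit) != 0;
}

constexpr unsigned allocatorIndex(std::uint8_t extra) { return extra >> 1; }
```

In this layout a field value of 0 means "no addendum, allocator index 0", which matches the default allocator being registered at index 0.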

Extracted from #100923.

[libc] Implement vasprintf and asprintf --------- Co-authored-by: Izaak Schroeder <[email protected]>

The condition was duplicated, the correct one for this message would have been `ImportsEnd > SymbolsEnd`. However, this is a subset of `ImportsEnd > Symbols` (since `Symbols <= SymbolsEnd`), so it can be removed altogether. I made this thinko in 686d8ce. Note that that change wasn't intended to be permanent, and served as a quick stopgap to facilitate testing chained fixups in LLD before Apple upstreamed their implementation. Fixes #90662 Fixes #87203

... using the temporary section type code 0x40000020 (`clang -c -Wa,--crel,--allow-experimental-crel`). LLVM will change the code and break compatibility (Clang and lld of different versions are not guaranteed to cooperate, unlike other features). CREL with implicit addends is not supported. --- Introduce `RelsOrRelas::crels` to iterate over SHT_CREL sections and update users to check `crels`. (The decoding performance is critical and error checking is difficult. Follow `skipLeb` and `R_*LEB128` handling, do not use `llvm::decodeULEB128`, which compiles to a lot of code.) A few users (e.g. .eh_frame, LLDDwarfObj, s390x) require random access. Pass `/*supportsCrel=*/false` to `relsOrRelas` to allocate a buffer and convert CREL to RELA (`relas` instead of `crels` will be used). Since allocating a buffer increases memory usage, the conversion is only performed when absolutely necessary. --- Non-alloc SHT_CREL sections may be created in -r and --emit-relocs links. SHT_CREL and SHT_RELA components need reencoding since r_offset/r_symidx/r_type/r_addend may change. (r_type may change because relocations referencing a symbol in a discarded section are converted to `R_*_NONE`). * SHT_CREL components: decode with `RelsOrRelas` and re-encode (`OutputSection::finalizeNonAllocCrel`) * SHT_RELA components: convert to CREL (`relToCrel`). An output section can only have one relocation section. * SHT_REL components: print an error for now. SHT_REL to SHT_CREL conversion for -r/--emit-relocs is complex and not yet supported. Link: https://discourse.llvm.org/t/rfc-crel-a-compact-relocation-format-for-elf/77600 Pull Request: llvm/llvm-project#98115

…ses (#101427) The CastInst subclasses all have pretty much the same implementation. Add a helper templated class to help stamp out the subclasses more succinctly.

…n (#101415) The current MCInstBuilder for generating an ALGFI when loading something from the ADA is incorrect and will crash the compiler. r0 must also be excluded from the registers returned as the result, since it is treated as the value "0" on z/OS. Also add some tests to properly test the paths where LLILF and ALGFI are generated. --------- Co-authored-by: Tony Tao <[email protected]>

Verifies that sin function output is correct by comparing with MPFR output. NaN and inf are not tested (as our output will vary compared to MPFR), and signed zeroes are already tested in unit tests.

Fix minor typos that accumulated while the math fuzzers were disabled.

Including src/__support/macros/config.h is unnecessary.

…ring (#96759) Consider the new atomic metadata when choosing to expand as cmpxchg instead.

…73451) We were only checking that the comparator was rvalue callable, when in reality the algorithms always call comparators as lvalues. This patch also refactors the tests for callable requirements and expands it to a few missing algorithms. Fixes #69554
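The difference between the two checks can be shown with a small sketch. It requires C++17 for `std::is_invocable_r_v`; the comparator types and helper names below are hypothetical and are not libc++'s internal traits.

```cpp
#include <type_traits>

// A comparator that can only be invoked on an rvalue.
struct RvalueOnlyLess {
  bool operator()(int a, int b) && { return a < b; }
};

// An ordinary comparator, invocable on lvalues and rvalues alike.
struct PlainLess {
  bool operator()(int a, int b) const { return a < b; }
};

// Testing `Comp` checks an rvalue call; testing `Comp &` checks the lvalue
// call that the algorithms actually perform.
template <class Comp>
constexpr bool rvalueCallable = std::is_invocable_r_v<bool, Comp, int, int>;
template <class Comp>
constexpr bool lvalueCallable = std::is_invocable_r_v<bool, Comp &, int, int>;

// The call pattern used by an algorithm: `comp` is a named object, so
// `comp(a, b)` is an lvalue call.
template <class Comp>
bool firstIsLess(Comp comp, int a, int b) {
  static_assert(lvalueCallable<Comp>,
                "comparator must be callable as an lvalue");
  return comp(a, b);
}
```

`RvalueOnlyLess` passes the rvalue check but fails the lvalue one, so an rvalue-only check would accept it even though `firstIsLess` could never call it.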

Support V_SWAP_B16 true16 encoding in asm/disasm for GFX11/12 Co-authored-by: guochen2 <[email protected]>

Add the "Separate" option `--irpgo-profile-sort <profile>` instead of just the "Joined" option `--irpgo-profile-sort=<profile>`. This is useful if the path has a `,` for some reason which would break when trying to use `-Wl,--irpgo-profile-sort=<profile-with-comma>`. While I'm here, use `static_cast<>` instead of the C style cast introduced in llvm/llvm-project#100627

The old out-of-tree build configuration stopped working and in tree builds are supported now, so we should use the in tree configuration. The only downside is we can't run the tests any more, but at least we will be able to test the build again.

Extra tests for llvm/llvm-project#99808, including cost model tests.

This patch implements: sandboxir::UIToFPInst sandboxir::FPExtInst sandboxir::FPTruncInst sandboxir::SExtInst sandboxir::ZExtInst sandboxir::TruncInst

Summary: This patch removes the ad-hoc parsing that I used previously and replaces it with the LLVM CommandLine interface. This doesn't change any functionality, but makes it easier to maintain.

Renaming to `Disallowed`.

…tializer (#101447)

As with other loops, we need only look at a RecordDecl's FieldDecls. Convert to using them. In the meantime, we can improve the generation of the 'counted_by' FieldDecl's GEP by creating one GEP instead of a series of GEPs.

…(#101212) #100690 introduces allocator registry with the ability to store allocator index in the descriptor. This patch adds an attribute to fir.embox and fircg.ext_embox to be able to set the allocator index while populating the descriptor fields.

…ithms (#73451)" This reverts commit 8d151f8, which broke some build bots. I think that is caused by an invalid argument order when checking __is_comparable in upper_bound.

…bugbreak(), __builtin_verbose_trap() (#101549) 1. It fixes the problem of llvm.trap() not getting the nomerge attribute. 2. It sets the nomerge flag for the node if the instruction has the nomerge attribute. This is a copy of https://reviews.llvm.org/D146164. This only attempts to fix `nomerge` for `__builtin_trap()`, `__debugbreak()`, `__builtin_verbose_trap()`; `nomerge` still does not work for non-trap builtins. Fixes #53011

We have existing code which reasons that a step evenly dividing the iteration space in a finite loop with a single exit implies no-self-wrap. The sign of the step doesn't affect this. --------- Co-authored-by: Nikita Popov <[email protected]>

…s (#100456) - fadd removed because I need to add for different input types - finishing rest of basic operations - noticed duplicates will remove --------- Co-authored-by: OverMighty <[email protected]>

**Summary**: When ASan checks for a potential ODR violation on a global it loops over a linked list of all globals to find those with the matching value of an indicator. With the default setting 'detect_odr_violation=1', ASan doesn't report violations on same-size globals but it still has to traverse the list. For larger binaries with a ton of shared libs and globals (and a non-trivial volume of same-sized duplicates) this gets extremely expensive. This patch adds an indicator indexed (multi-)map of globals to speed up the search. > Note: asan used to use a map to store globals a while ago which was replaced with a list when the codebase [moved off of STL](llvm/llvm-project@e4bada2). Internally we see many examples where ODR checking takes *seconds* (even double digits). With this patch it's practically free and `__asan_register_globals` doesn't show up prominently in the perf profile anymore. There are several high-level questions: 1. I understand that the intent is that we hit the slow path rarely, ideally once before the process dies with an error. But in practice we hit the slow path a lot. It feels reasonable to keep the amount of work bounded even in the worst case, even if it requires a bit of extra memory. But if not, it'd be great to learn about the tradeoffs. 2. Poisoning based ODR checking remains on the slow path. Internally we build everything with `-fsanitize-address-use-odr-indicator` so I'm not sure if poisoning-based check would exhibit the same behavior (looking at the code, the shape looks very similar, so it might?). 3. Globals with an ODR indicator of `-1` need to be skipped for the purposes of ODR checking (cf. llvm/llvm-project@a257639). But they are still getting added to the list of globals and hence take up space and slow down the iteration over the list of globals. It would be a good saving if we could avoid adding them to the globals list. 4. Any reason to use a linked list instead of e.g. a vector to store globals? 
**Test Plan**: * `cmake --build build --target check-asan` looks good * Perf-wise things look good when linking against this version of compiler-rt. --------- Co-authored-by: Vitaly Buka <[email protected]>
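The indicator-indexed lookup can be sketched as follows. This is only an illustration: the sanitizer runtime cannot use the STL, so `std::unordered_multimap` stands in for whatever container the patch uses, and `GlobalSketch`, `GlobalIndexSketch`, and their members are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Simplified stand-in for a registered global; field names are illustrative.
struct GlobalSketch {
  const char *name;
  std::size_t size;
  std::uintptr_t odrIndicator; // address of the global's ODR indicator
};

// Index of registered globals keyed by ODR indicator.
class GlobalIndexSketch {
public:
  void add(const GlobalSketch *g) { byIndicator_.emplace(g->odrIndicator, g); }

  // Returns only the globals that share `indicator`, instead of walking the
  // list of every registered global.
  std::vector<const GlobalSketch *> lookup(std::uintptr_t indicator) const {
    std::vector<const GlobalSketch *> out;
    auto range = byIndicator_.equal_range(indicator);
    for (auto it = range.first; it != range.second; ++it)
      out.push_back(it->second);
    return out;
  }

private:
  std::unordered_multimap<std::uintptr_t, const GlobalSketch *> byIndicator_;
};
```

A lookup then costs roughly the number of globals sharing the indicator rather than the total number of registered globals, at the price of the extra memory for the index.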

~0.1% instruction count improvements https://llvm-compile-time-tracker.com/compare.php?from=07d2709a17860a202d91781769a88837e4fb5f2a&to=d5cc47831ecd9f0a2b164b16da67f74b94e9aafc&stat=instructions:u

If the string is too long for a short string, we can simply check for the long bit. If that's false we can do an early return. This improves the code gen slightly.

There were a few places where we didn't properly quote entries in the CSV status pages, or where we followed inconsistent spacing. This causes issue when trying to synchronize status pages with Github issues.

To avoid breaking searchability of when a paper was implemented.

…(), __debugbreak(), __builtin_verbose_trap() (#101549)" This reverts commit 5e84646, which broke 'nomerge.ll' test on llvm bots.

…name (#101400) The kernel names for OpenMP are manually mangled and not ideal when we report something to the user. We demangle them now, providing the function and line number of the target region, together with the actual kernel name.

…p(), __debugbreak(), __builtin_verbose_trap() (#101549)" This reverts commit 667598d and fixes failed tests: llvm/test/CodeGen/X86/nomerge.ll and llvm/test/MC/AArch64/local-bounds-single-trap.ll.

…ted issues (#93115) Fix codegen of consteval functions returning an empty class, and related issues If a class is empty, don't store it to memory: the store might overwrite useful data. Similarly, if a class has tail padding that might overlap other fields, don't store the tail padding to memory. The problem here turned out a bit more general than I initially thought: basically all uses of EmitAggregateStore were broken. Call lowering had a method that did mostly the right thing, though: CreateCoercedStore. Adapt CreateCoercedStore so it always does the conservatively right thing, and use it for both calls and ConstantExpr. Also, along the way, fix the "overlap" bit in AggValueSlot: the bit was set incorrectly for empty classes in some cases. Fixes #93040.

This patch adds support for verifying local type units in .debug_names section. It adds a test to test if the TU index is valid, and a test that tests that an error is found inside the name entry for a type unit. We don't need to test all other errors in the name entry because these are essentially identical to compile unit entries, they just use a different DWARF unit offset index.

Also edited file header formatting on sin_fuzz and cos_fuzz

- Added the dialect's prefix to operations' descriptions to follow the same style inside the TableGen file. - Minor changes in the 'emitc.yield' operation's description.

This tutorial gives an introduction to the `mlir-opt` tool, focusing on how to run basic passes with and without options, run pass pipelines from the CLI, and point out particularly useful flags. --------- Co-authored-by: Jeremy Kun <[email protected]> Co-authored-by: Mehdi Amini <[email protected]>

This patch implements sandboxir::UnaryInstruction class and updates sandboxir::LoadInst and sandboxir::CastInst to inherit from it instead of sandboxir::Instruction.

- After 'lowerConstantIntrinsics' is merged into pre-isel lowering

Follow up to #100923

Currently a Module has a std::optional<UnwindTable> which is created when the UnwindTable is requested from outside the Module. The idea is to delay its creation until the Module has an ObjectFile initialized, which will have been done by the time we're doing an unwind. However, Module::GetUnwindTable wasn't doing any locking, so it was possible for two threads to ask for the UnwindTable for the first time: one would be created and returned while another thread would create one, destroying the first in the process of emplacing it. It was an uncommon crash, but it was possible. Grabbing the Module's mutex would be one way to address it, but when loading ELF binaries, we start creating the SymbolTable on one thread (ObjectFileELF) grabbing the Module's mutex, and then spin up worker threads to parse the individual DWARF compilation units, which then try to also get the UnwindTable and deadlock if they try to get the Module's mutex. This changes Module to have a concrete UnwindTable as an ivar, and when it adds an ObjectFile or SymbolFileVendor, it will call the Update method on it, which will re-evaluate which sections exist in the ObjectFile/SymbolFile. UnwindTable used to have an Initialize method which set all the sections, and an Update method which would set some of them if they weren't set. I unified these with the Initialize method taking a `force` option to re-initialize the section pointers even if they had been done already before. This addresses a rare crash report we've received, and also a failure Adrian spotted on the -fsanitize=address CI bot last week; it's still uncommon with ASAN, but it can happen with the standard testsuite. rdar://128876433

`struct SuperEmpty { struct { int a[0]; } b; };` Such zero-sized structs cannot be ignored on i386 in C++ mode, because C++ fields are never considered empty. But in EmitVAArg the size is 0, so the va_list is not advanced. Maybe we can just ignore this kind of argument, as X86_64 does. Fixes #86385.

…ltsPass (#101281) Using a DenseMap minimizes the traversal time of callOps and greatly improves the efficiency of this pass.

…e (#101546)

…y has non-zero address space (#101589)

…by a `getAddressSpace`

…sts. NFC (#101540) Loads/stores/reinterpret/vfncvt.f.f.w/vfwcvt.f.f.v/vmerge/vmv.v.v are all expected to work for f16 vectors with Zvfhmin. Remove the handcrafted Zvfhmin test that partially tested this. Splits the vfwcvt.f.f.v and vfncvt.f.f.w tests into their own file so we can have a separate RUN line from the float<->int conversions.

Memcpy, and other memory intrinsics, typically try to use wider load/store if the source and destination addresses are aligned. In CodeGenPrepare, look for calls to memory intrinsics and, if the object is on the stack, align it to 4-byte (32-bit) or 8-byte (64-bit) boundaries if it is large enough that we expect memcpy to use wider load/store instructions to copy it. Fixes #101295

Implement handling for `v8plus` feature bit to allow the user to switch between V8 and V8+ mode with 32-bit code. Currently this only sets the appropriate ELF machine type and flags; codegen changes will be done in future patches. This is done as a prerequisite for `-mv8plus` flag on clang (#98713).

Remove elementwise description for builtins that don't perform elementwise operations.

CONFLICT (content): Merge conflict in clang/lib/CodeGen/CGExprAgg.cpp

CONFLICT (content): Merge conflict in sycl/CMakeLists.txt

CONFLICT (content): Merge conflict in llvm/lib/SYCLLowerIR/SYCLVirtualFunctionsAnalysis.cpp

…v_pulldown

Original commit: KhronosGroup/SPIRV-LLVM-Translator@097435f74df64bd

Update a test after llvm-project commit 92a0654 ("[LowerMemIntrinsics] Lower llvm.memmove to wide memory accesses (#100122)", 2024-07-26). Original commit: KhronosGroup/SPIRV-LLVM-Translator@84f525abd741c30

The spirv-tools package used by the job seems no longer available for Ubuntu 20.04. Original commit: KhronosGroup/SPIRV-LLVM-Translator@88e546a689b2679

#2656) This change fixes the assertion: Assertion `C->getType() == Ty->getElementType() && "Wrong type in array element initializer"' failed Original commit: KhronosGroup/SPIRV-LLVM-Translator@e099f77cc6d02b9

Add translation for atan, acos, asin, cosh, sinh and tanh LLVM intrinsics which are mapped to corresponding OpenCL extended instructions. Original commit: KhronosGroup/SPIRV-LLVM-Translator@95605477e7fe635

Verified locally by changing the version from `65536` to `66560` in `test/transcoding/atomics.spt`. Original commit: KhronosGroup/SPIRV-LLVM-Translator@62ea823e64307e8
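For reference, the two numbers are SPIR-V version words, which store the major version in bits 16-23 and the minor version in bits 8-15. A minimal decoding sketch (the struct and function names are hypothetical):

```cpp
#include <cstdint>

struct SpirvVersion {
  unsigned majorVer;
  unsigned minorVer;
};

// SPIR-V version word layout: 0x00MMmm00
// (major version in bits 16-23, minor version in bits 8-15).
constexpr SpirvVersion decodeSpirvVersion(std::uint32_t word) {
  return SpirvVersion{(word >> 16) & 0xFFu, (word >> 8) & 0xFFu};
}
```

Decoded this way, `65536` (0x00010000) is version 1.0 and `66560` (0x00010400) is version 1.4.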

OpenCL spec supports atomic_float/atomic_double type for atomic_compare_exchange* functions. However, value and return type in OpAtomicCompareExchange in SPIR-V spec must be integer type. Therefore, in OCLToSPIRV translation we need to translate floating-point type to corresponding integer variant that has the same type size. Floating-point value is bitcasted so that bits remain the same. Original commit: KhronosGroup/SPIRV-LLVM-Translator@e5544014fba77d3
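The same idea can be sketched in host C++: the floating-point operands are reinterpreted as same-sized integers and the compare-exchange runs purely on the bit patterns. This is an analogy using `std::atomic`, not the translator's code, and the helper names are hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as a same-sized integer, and back.
inline std::uint32_t floatBits(float f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "float must be 32 bits wide");
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

inline float bitsToFloat(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Compare-exchange on float data performed on integer bit patterns.
// Returns the value that was stored before the operation.
inline float compareExchangeFloat(std::atomic<std::uint32_t> &storage,
                                  float expected, float desired) {
  std::uint32_t exp = floatBits(expected);
  storage.compare_exchange_strong(exp, floatBits(desired));
  // On failure `exp` now holds the value found in storage; on success it is
  // unchanged and therefore equals the previous contents.
  return bitsToFloat(exp);
}
```

Because the comparison is bitwise, `0.0f` and `-0.0f` do not match each other, while a NaN matches a NaN with the identical bit pattern.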

This change is due to llvm/llvm-project#98949.

We overwrite the value in 8096a6f from llvm::Module::Override (4) to llvm::Module::Max (7).

Align with community commit: 0953fb4

…eam behavior. (#15051) Signed-off-by: Marcos Maronas <[email protected]>

…ion (#14992)" This reverts commit 0a9db37.

Commits on Aug 17, 2024

Merge branch 'sycl' into llvmspirv_pulldown

jsji committed Aug 17, 2024

43f6fd3

Commits on Aug 22, 2024

Fix conflict resolution in libclc

jsji committed Aug 22, 2024

78703d9

Commits on Aug 23, 2024

Use -ffp-model=fast instead of Ofast

jsji committed Aug 23, 2024

a142ad3
