LLVM and SPIRV-LLVM-Translator pulldown (WW33 2024) #15106
Commits on Jul 31, 2024
Commit 79996cd
[mlir][emitc] Lower arith.divui, remui (#99313)
This commit lowers `arith.divui` and `arith.remui` to EmitC by wrapping those operations with type conversions.
Commit 36b2c22
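A minimal C++ sketch of the scalar semantics behind "wrapping those operations with type conversions", assuming the signless `i32` operands are held in `int32_t`. The function names are hypothetical, and this is not the exact C code the EmitC lowering produces: the operands are converted to the unsigned type, divided, and the result is converted back.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned division on values stored in a signed type: convert, divide, convert back.
int32_t divui_i32(int32_t lhs, int32_t rhs) {
  assert(rhs != 0 && "division by zero is undefined");
  return static_cast<int32_t>(static_cast<uint32_t>(lhs) /
                              static_cast<uint32_t>(rhs));
}

// Unsigned remainder, using the same wrapping conversions.
int32_t remui_i32(int32_t lhs, int32_t rhs) {
  assert(rhs != 0 && "division by zero is undefined");
  return static_cast<int32_t>(static_cast<uint32_t>(lhs) %
                              static_cast<uint32_t>(rhs));
}
```

Without the conversions, `-1 / 2` would be a signed division yielding 0; with them, the bit pattern of `-1` is treated as 4294967295.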
Commit f395d82
[Clang][Interp] Fix the location of uninitialized base warning (#100761)
Fix the location of `diag::note_constexpr_uninitialized_base`, making it the same as in the current interpreter. This PR does not print the type name with the namespace that was used to improve the current interpreter's type dump of the base class type. Signed-off-by: yronglin
Commit 6434dce
Commit 38e6453
[MLIR][OpenMP] NFC: Sort clauses alphabetically (1/2) (#101193)
This patch sorts the clause lists for the following OpenMP operations:
- omp.parallel
- omp.teams
- omp.sections
- omp.wsloop
- omp.distribute
- omp.task
This change results in the reordering of operation arguments, so impacted unit tests are updated accordingly.
Commit b3b4696
[MLIR][OpenMP] NFC: Sort clauses alphabetically (2/2) (#101194)
This patch sorts the clause lists for the following OpenMP operations:
- omp.taskloop
- omp.taskgroup
- omp.target_data
- omp.target_enter_data
- omp.target_exit_data
- omp.target_update
- omp.target
This change results in the reordering of operation arguments, so impacted unit tests are updated accordingly.
Commit a3800a6
Commit 5c406ea
[AMDGPU,test] Add one more while-break case (#101300)
This case suffers from the v_mov issue.
Commit dae7fb8
Revert "[DAG][NFC] Use SDPatternMatch for VScale in some instances"
This reverts commit d230442. The m_Add and m_Mul matchers are commutative, but the code does not expect commutativity.
Commit 22ce333
[libclang/python] Factor out unsaved files processing (#101308)
Factor out the processing of unsaved files into its own function as suggested by @Endilll [here](https://github.com/llvm/llvm-project/pull/78114/files#r1697730196)
Commit 4c670b2
[libclang/python] type-ignore `Any` returns from library calls (#101310)
On its own, this change leads to _more_ strict typing errors, as the functions are mostly not annotated so far, so the `# type: ignore`s are reported as unused. This is part of the work leading up to #78114, though, and one of the bigger parts factored out from it, so these will later lead to fewer strict typing errors as the functions are annotated with return types.
Commit 5525566
Merge from 'sycl' to 'sycl-web'
iclsrc committed Jul 31, 2024
Commit af8011d
Commit d8b985c
[libc++] Refactor tests for shared_mutex and shared_timed_mutex (#100783)
This makes the tests less flaky and also makes a few other refactorings, like using traits instead of .compile.fail.cpp tests.
Commit 29ef92b
[libc++][docs] Remove misadded entry for P1937R2 from Cxx20Papers.csv (#100741)
P1937R2 only contains core language changes and doesn't touch the library at all. Closes #100613.
Commit 569d8ce
[lldb] Fixed lldb-server crash (TestLogHandler was not thread safe) (#101326)
Host::LaunchProcess() requires a callback to be set with SetMonitorProcessCallback. This callback is called from the child process monitor thread, which we cannot control. lldb-server may crash if there is logging around this callback, because TestLogHandler is not thread safe. I faced this issue while debugging 100 simultaneous child processes. Note that StreamLogHandler::Emit() in lldb/source/Utility/Log.cpp already contains a similar mutex.
Commit 93fecc2
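The fix boils down to serializing `Emit()` calls that can arrive from the child process monitor thread and from other threads at the same time. A minimal sketch of that pattern, with a hypothetical `ThreadSafeLogHandler` that only stores messages (it is not lldb's `TestLogHandler`):

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical handler: stores messages instead of printing them.
class ThreadSafeLogHandler {
public:
  // May be called concurrently, e.g. from a process-monitor callback.
  void Emit(const std::string &message) {
    std::lock_guard<std::mutex> guard(mutex_);
    messages_.push_back(message);
  }

  // Returns a copy taken under the lock.
  std::vector<std::string> Snapshot() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return messages_;
  }

private:
  mutable std::mutex mutex_;
  std::vector<std::string> messages_;
};

// Simulates many monitor threads logging at the same time.
void EmitFromThreads(ThreadSafeLogHandler &handler, int threads, int per_thread) {
  assert(threads >= 0 && per_thread >= 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t)
    workers.emplace_back([&handler, per_thread] {
      for (int i = 0; i < per_thread; ++i)
        handler.Emit("child process exited");
    });
  for (std::thread &worker : workers)
    worker.join();
}
```

Without the mutex, concurrent `push_back` calls on the shared vector would be a data race.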
[CycleInfo] skip unreachable predecessors (#101316)
If an unreachable block B branches to a block S inside a cycle, it may cause S to be incorrectly treated as an entry to the cycle. We avoid that by skipping unreachable predecessors when locating entries.
Commit 05c3a4b
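To make the failure mode concrete, here is a hedged sketch on a toy CFG, using plain integer block IDs and a hypothetical `FindCycleEntries` helper (this is not LLVM's `GenericCycleInfo` code): a block counts as a cycle entry only if it has a predecessor outside the cycle that is reachable from the function entry.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical CFG encoding: succs[b] lists the successors of block b.
using Successors = std::vector<std::vector<int>>;

// Returns the blocks of `cycle` that have a predecessor outside the cycle,
// ignoring predecessors that cannot be reached from `entry`.
std::set<int> FindCycleEntries(const Successors &succs,
                               const std::set<int> &cycle, int entry) {
  assert(entry >= 0 && static_cast<std::size_t>(entry) < succs.size());
  // Forward reachability from the function entry (iterative DFS).
  std::vector<bool> reachable(succs.size(), false);
  std::vector<int> worklist{entry};
  reachable[entry] = true;
  while (!worklist.empty()) {
    int block = worklist.back();
    worklist.pop_back();
    for (int succ : succs[block])
      if (!reachable[succ]) {
        reachable[succ] = true;
        worklist.push_back(succ);
      }
  }
  std::set<int> entries;
  for (std::size_t pred = 0; pred < succs.size(); ++pred) {
    // Skip unreachable predecessors (the fix) and edges inside the cycle.
    if (!reachable[pred] || cycle.count(static_cast<int>(pred)))
      continue;
    for (int succ : succs[pred])
      if (cycle.count(succ))
        entries.insert(succ);
  }
  return entries;
}
```

In the test CFG (0 -> 1, 1 -> 2, 2 -> 1, 2 -> 3, and an unreachable 4 -> 2), block 2 would also be reported as an entry if the reachability check were dropped.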
Revert "[CMake][Fuchsia] Include libunwind and libc++abi in baremetal build" (#101340)
Reverts llvm/llvm-project#100908
Commit 9b017db
[cmake] switch to CMake's native `check_{compiler,linker}_flag` (#96171)
Broken out from #93429. Somewhat closing the loop opened by 7017e6c. Co-authored-by: Ryan Prichard
Commit 89946bd
[libc++][NFC] Remove two unused implementation details `__find_end` (#100685)
Those two `__find_end` functions are no longer used after 101d1e9. After that commit, `std::find_end` started dispatching to `__find_end_classic`, and `ranges::find_end` to `__find_end_impl`, which means that the two `__find_end` functions were no longer necessary. Fixes #100569
Commit 5b6b488
[libcxx][test] Mark sort.pass.cpp as a long test (#100720)
Picolib testing skips any test requiring this feature; I just didn't know the feature existed until now.
Commit f90e51a
[libcxx][test] Require long_tests for eval.PR44847.pass.cp (#100722)
This takes 1m40s to run when testing picolib on qemu. This isn't the end of the world, but that's on an AArch64 server, so if someone felt the need to mark this unsupported in the first place, it's likely much slower on average hardware.
Commit 23d188e
[libclang] Use check_linker_flag instead of llvm_check_linker_flag
Follow-up to #96171 in an attempt to fix the Solaris bots.
Commit 7ab6433
Commit b5a7d3b
[libc++] Make std::unique_lock available with _LIBCPP_HAS_NO_THREADS (#99562)
This is a follow-up to llvm/llvm-project#98717, which made lock_guard available under _LIBCPP_HAS_NO_THREADS. We can make unique_lock available under similar circumstances. This patch follows the example in #98717 by:
- removing the preprocessor guards for _LIBCPP_HAS_NO_THREADS in the unique_lock header;
- providing a set of custom mutex implementations in a local header;
- using custom locks in tests that can be made to work under `no-threads`.
Commit e9d5842
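`std::unique_lock` only needs its mutex type to provide `lock()` and `unlock()` (plus `try_lock()` for the try-lock operations), so it can be exercised without any real threads. A sketch with a hypothetical `CheckingMutex`; the custom mutexes the patch adds to libc++'s test headers may look different. The snippet compiles against any standard `<mutex>`; using it with threads disabled requires a libc++ that includes this patch.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical single-threaded "mutex": it provides lock/try_lock/unlock and
// only tracks its own state, which is all std::unique_lock requires.
struct CheckingMutex {
  bool locked = false;
  void lock() {
    assert(!locked && "relocking a non-recursive mutex");
    locked = true;
  }
  bool try_lock() {
    if (locked)
      return false;
    locked = true;
    return true;
  }
  void unlock() {
    assert(locked && "unlocking a mutex that is not held");
    locked = false;
  }
};
```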
[libc++] Move the benchmarks under libcxx/test (#99371)
This is an intermediate and fairly mechanical step towards unifying the benchmarks with the rest of the test suite. Moving this around requires a few changes, notably making sure we don't throw a wrench into the discovery process of the normal test suite. This won't be a problem anymore once benchmarks are taken into account by the test setup out of the box.
Commit 78b4b5c
[clang][CUDA] Add 'noconvergent' function and statement attribute
For languages following the SPMD/SIMT programming model, functions and call sites are marked 'convergent' by default. This patch adds 'noconvergent' to allow developers to remove that 'convergent' attribute when it is safe to do so. Reviewers: nhaehnle, Sirraide, yxsamliu, Artem-B, ilovepi, jayfoad, ssahasra, arsenm Reviewed By: arsenm Pull Request: llvm/llvm-project#100637
Commit fa84297
[libc][AArch64] Add an AArch64 setjmp/longjmp. (#101177)
Previously, building libc for AArch64 in `LLVM_LIBC_FULL_BUILD` mode would fail because no implementation of setjmp/longjmp was available. This was the only obstacle, so now a full AArch64 build of libc is possible. This implementation automatically supports PAC and BTI if compiled with the appropriate options. I would have liked to do the same for MTE stack tagging, but as far as I can see there's currently no predefined macro that allows detection of `-fsanitize=memtag-stack`, so I've left that one as a TODO. AAPCS64 delegates the x18 register to individual platform ABIs, and allows them to choose what it's used for, which may or may not require setjmp and longjmp to save and restore it. To accommodate this, I've introduced a libc configuration option. The default is on, because the only use of x18 I've so far encountered uses it to store information specific to the current stack frame (so longjmp does need to restore it), and this is also safe behavior in the default situation where the platform ABI specifies no use of x18 and it becomes a temporary register (restoring it to its previous value is no worse than any _other_ way for a function call to clobber it). But if a platform ABI needs to use x18 in a way that requires longjmp to leave it alone, they can turn the option off.
Commit 2a6268d
[scudo] Separated committed and decommitted entries. (#100818)
Initially, the LRU list stored all mapped entries with no distinction between the committed (non-madvise()'d) entries and decommitted (madvise()'d) entries. Now these two types of entries are separated into two lists, allowing future cache logic to branch depending on whether or not entries are committed or decommitted. Furthermore, the retrieval algorithm will prioritize committed entries over decommitted entries. Specifically, valid-fit, committed entries (not necessarily optimal-fit) are retrieved before optimal-fit, decommitted entries.
Commit 8b2688b
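A simplified model of the new retrieval order, assuming a cache entry is described only by its size and commit state. The `TwoListCache` name and the use of `std::list` are illustrative; this is not scudo's implementation. Any committed entry that fits is taken first; only if none fits is the best-fitting decommitted entry used.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>

// Hypothetical cache entry: only the block size and commit state matter here.
struct CacheEntry {
  std::size_t size;
  bool committed;
};

class TwoListCache {
public:
  // Most recently used entries go to the front of their list.
  void Insert(CacheEntry entry) {
    (entry.committed ? committed_ : decommitted_).push_front(entry);
  }

  std::optional<CacheEntry> Retrieve(std::size_t request) {
    // 1) Any committed entry that is large enough (valid fit, not necessarily optimal).
    for (auto it = committed_.begin(); it != committed_.end(); ++it)
      if (it->size >= request) {
        CacheEntry found = *it;
        committed_.erase(it);
        return found;
      }
    // 2) Otherwise, the smallest decommitted entry that is large enough (optimal fit).
    auto best = decommitted_.end();
    for (auto it = decommitted_.begin(); it != decommitted_.end(); ++it)
      if (it->size >= request &&
          (best == decommitted_.end() || it->size < best->size))
        best = it;
    if (best == decommitted_.end())
      return std::nullopt;
    CacheEntry found = *best;
    decommitted_.erase(best);
    return found;
  }

private:
  std::list<CacheEntry> committed_;
  std::list<CacheEntry> decommitted_;
};
```

Preferring a committed block avoids faulting pages back in, even if the block is larger than needed.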
[InstCombine] Recognize copysign idioms (#101324)
This patch folds `(bitcast (or (and (bitcast X to int), signmask), nneg Y) to fp)` into `copysign((bitcast Y to fp), X)`. I found this pattern exists in some graphics applications/math libraries. Alive2: https://alive2.llvm.org/ce/z/ggQZV2
Commit b455edb
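The fold relies on the fact that the integer idiom keeps only `X`'s sign bit and takes every other bit from a non-negative `Y`, which is exactly what `copysign` does. A small check of that equivalence in C++, assuming IEEE-754 `float`; the helper names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Bit reinterpretation helpers (memcpy keeps this well-defined before C++20's std::bit_cast).
std::uint32_t BitsOf(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}
float FloatOf(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

const std::uint32_t kSignMask = 0x80000000u;

// The source idiom: (bitcast (or (and (bitcast X to int), signmask), nneg Y) to fp).
float CopysignIdiom(float x, std::uint32_t y_nonneg) {
  assert((y_nonneg & kSignMask) == 0 && "Y must be non-negative");
  return FloatOf((BitsOf(x) & kSignMask) | y_nonneg);
}

// The folded form: copysign((bitcast Y to fp), X).
float CopysignFolded(float x, std::uint32_t y_nonneg) {
  return std::copysign(FloatOf(y_nonneg), x);
}
```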
[SandboxIR] Implement AddrSpaceCastInst (#101260)
This patch implements sandboxir::AddrSpaceCastInst which mirrors llvm::AddrSpaceCastInst.
Commit d36c9f8
Commit 3715035
Add llvm::Error C API, LLVMCantFail
It's barely testable - the test does exercise the code, but wouldn't fail on an empty implementation. It would cause a memory leak though (because the error handle wouldn't be unwrapped/reowned) which could be detected by asan and other leak detectors.
Commit 45ef0d4
Merge from 'main' to 'sycl-web' (200 commits)
CONFLICT (content): Merge conflict in clang/lib/Sema/SemaDecl.cpp
Commit f712941
[SandboxIR] Implement IntToPtrInst (#101359)
This patch implements sandboxir::IntToPtrInst which mirrors llvm::IntToPtrInst.
Commit f0197a7
[SCEV] Add coverage for flag inference with vscale strided IVs
Given vscale is a power of two, we should be able to prove no-self-wrap in these cases. We currently don't, but an upcoming change will fix this.
Commit faf3333
[lldb] Unify the way we get the Target in CommandObject (#101208)
Currently, CommandObjects are obtaining a target in a variety of ways. Often the command incorrectly operates on the selected target. As an example, when a breakpoint command is running, the current target is passed into the command but the target that hit the breakpoint is not the selected target. In other places we use the CommandObject's execution context, which is frozen during the execution of the command, and comes with its own limitations. Finally, we often want to fall back to the dummy target if no real target is available. Instead of having to guess how to get the target, this patch introduces one helper function in CommandObject to get the most relevant target. In order of priority, that's the target from the command object's execution context, from the interpreter's execution context, the selected target or the dummy target. rdar://110846511
Commit 8398ad9
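The priority order in the commit message reads naturally as a fallback chain. A sketch with hypothetical, stripped-down `Target` and `ExecutionContext` types; lldb's real classes and the actual helper's name and signature differ:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for lldb's types.
struct Target {
  std::string name;
};

struct ExecutionContext {
  Target *target = nullptr;
};

// Priority: command's execution context, interpreter's execution context,
// selected target, and finally the dummy target.
Target &GetMostRelevantTarget(const ExecutionContext &command_ctx,
                              const ExecutionContext &interpreter_ctx,
                              Target *selected_target, Target &dummy_target) {
  if (command_ctx.target)
    return *command_ctx.target;
  if (interpreter_ctx.target)
    return *interpreter_ctx.target;
  if (selected_target)
    return *selected_target;
  return dummy_target;
}
```

In the breakpoint-command scenario from the message, the interpreter's context would carry the target that hit the breakpoint, so it wins over the selected target.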
[libc++][NFC] Add missing license headers
Also standardize the license comment in several files where it was different from what we normally do.
Commit 6a54dfb
Remove already implemented target independent optimization opportunity (#101233)
Fixes #101127. See this working example: https://godbolt.org/z/z15oj15eP
Commit a847b0f
Commit 28a0792
[mlir][math] Fix polynomial `math.asin` approximation (#101247)
The polynomial approximation for asin is only good between [-9/16, 9/16]. Values beyond that range must be remapped to achieve good numeric results. This is done by the equation below: `arcsin(x) = PI/2 - arcsin(sqrt(1.0 - x*x))`
Commit a3fb301
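The remapping identity can be checked numerically. The sketch below uses `std::asin` as a stand-in for the polynomial and applies the identity for `0 <= x <= 1`, extending it to negative inputs through `asin(-x) = -asin(x)`; the helper name is hypothetical and this is not the MLIR lowering itself.

```cpp
#include <cassert>
#include <cmath>

// arcsin(x) = PI/2 - arcsin(sqrt(1 - x*x)) for 0 <= x <= 1; odd symmetry for x < 0.
double AsinViaComplement(double x) {
  assert(x >= -1.0 && x <= 1.0);
  const double pi_over_2 = std::acos(0.0);
  const double ax = std::fabs(x);
  const double r = pi_over_2 - std::asin(std::sqrt(1.0 - ax * ax));
  return std::copysign(r, x);
}
```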
[SandboxIR] Implement FPToSIInst (#101362)
This patch implements sandboxir::FPToSIInst which mirrors llvm::FPToSIInst.
Commit 9718f3d
[MVT][TableGen] Extend Machine Value Type to `uint16_t` (#99657)
RFC: https://discourse.llvm.org/t/rfc-extend-machine-value-type-from-uint8-t-to-uint16-t/80274
compile-time-tracker: https://llvm-compile-time-tracker.com/compare.php?from=4b9fab591916eec9fd1942f37afe3b137b564089&to=177d28247efe5a4d59a8d8150b4daf01e4f57d74&stat=wall-time
Currently 208 out of 256 MVTs are used and they will run out soon, so ultimately we need to extend the original `MVT::SimpleValueType` from `uint8_t` to `uint16_t` to accommodate more types. The `MatcherTable` uses `unsigned char` for encoding the matcher code, so the extended MVTs no longer fit into the table; thus we need to use VBR to encode them, as we do for other values that are wider than 8 bits.
The statistics below show the difference in "Total Array size" of the matcher table that appears in each file:
```
Table                       Before    After  Change(%)
WebAssemblyGenDAGISel.inc    23576    23775      0.844
NVPTXGenDAGISel.inc         173498   173498          0
RISCVGenDAGISel.inc        2179121  2369929      8.756
AVRGenDAGISel.inc             2754     2754          0
PPCGenDAGISel.inc           163315   163617      0.185
MipsGenDAGISel.inc           47280    47447      0.353
SystemZGenDAGISel.inc        56243    56461      0.388
AArch64GenDAGISel.inc       467893   487830      4.261
MSP430GenDAGISel.inc          8069     8069          0
LoongArchGenDAGISel.inc      78928    79131      0.257
XCoreGenDAGISel.inc           3432     3432          0
BPFGenDAGISel.inc             3733     3733          0
VEGenDAGISel.inc             65174    66456      1.967
LanaiGenDAGISel.inc           2067     2067          0
X86GenDAGISel.inc           628787   636987      1.304
ARMGenDAGISel.inc           170968   171036      0.040
HexagonGenDAGISel.inc       155764   155764          0
SparcGenDAGISel.inc           5762     5798      0.625
AMDGPUGenDAGISel.inc        504356   504463      0.021
R600GenDAGISel.inc           29785    29785          0
```
The statistics below show the runtime peak memory usage when compiling a simple C program with `/bin/time -v clang -target $TARGET -O3 -c test.c`:
```
int test(int a) { return a * 3; }
```
```
Target        Before(kbytes)  After(kbytes)  Change(%)
wasm64                110172         110088     -0.076
nvptx64               109784         109980      0.179
riscv64               114020         113656     -0.319
avr                   110352         110068     -0.257
ppc64                 112612         112476     -0.120
mips64                113588         113668      0.070
systemz               110860         110760     -0.090
aarch64               113704         113432     -0.239
msp430                110284         110200     -0.076
loongarch64           111052         110756     -0.267
xcore                 108340         108020     -0.295
bpf                   110620         110708      0.080
ve                    110960         110920     -0.036
lanai                 110180         109960     -0.200
x86_64                113640         113304     -0.296
arm64                 113540         113172     -0.324
hexagon               114620         114684      0.056
sparc                 110412         110136     -0.250
amdgcn                118164         117144     -0.863
r600                  111200         110508     -0.622
```
Commit a4c6ebe
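For reference, a VBR encoding of this kind stores 7 payload bits per byte and uses the high bit to signal that more bytes follow, so any value of 128 or more needs at least two bytes. A hedged sketch of such an encoder and decoder; the function names are hypothetical, not TableGen's emitter API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Emits the low 7 bits first; bit 7 of each byte is set when more bytes follow.
void EmitVBR(std::uint64_t value, std::vector<std::uint8_t> &out) {
  while (value >= 128) {
    out.push_back(static_cast<std::uint8_t>((value & 127) | 128));
    value >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(value));
}

// Decodes one value starting at `idx` and advances `idx` past it.
std::uint64_t ReadVBR(const std::vector<std::uint8_t> &in, std::size_t &idx) {
  std::uint64_t value = 0;
  unsigned shift = 0;
  while (true) {
    assert(idx < in.size() && "truncated VBR value");
    std::uint8_t byte = in[idx++];
    value |= static_cast<std::uint64_t>(byte & 127) << shift;
    if (!(byte & 128))
      return value;
    shift += 7;
  }
}
```

Values below 128 still occupy one byte, which fits the small size changes reported for targets that only use low-numbered MVTs.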
[Support] Erase blocks after DomTree::eraseNode (#101195)
Change eraseNode to require that the basic block is still contained inside the function. This is a preparation for using numbers of basic blocks inside the dominator tree, which are invalid for blocks that are not inside a function.
Commit 6d103d7
[lldb] Add constant value mode for RegisterLocation in UnwindPlans (#100624)
This is useful for language runtimes that compute register values by inspecting the state of the currently running process. Currently, there is no mechanism enabling these runtimes to set registers to arbitrary values. The alternative considered would involve creating a DWARF expression that produces an arbitrary integer (e.g. using OP_constu). However, the current data structure for Rows is such that they do not own any memory associated with DWARF expressions, which implies any such expression would need to have static storage and therefore could not contain a runtime value. Adding a new rule for constants leads to a simpler implementation. It's also worth noting that this does not make the "Location" union any bigger, since it already contains a pointer+size pair.
Commit 9fe455f
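The closing remark about the union size can be illustrated with a simplified, hypothetical layout (these are not lldb's actual `RegisterLocation` definitions): an alternative holding a 64-bit constant fits inside a union that already stores a pointer plus a length.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout before the change: a non-owning expression pointer + length.
union LocationWithoutConstant {
  std::int32_t offset;
  struct {
    const std::uint8_t *opcodes;
    std::uint16_t length;
  } expr;
};

// The same union with a new alternative for a runtime-computed constant.
union LocationWithConstant {
  std::int32_t offset;
  struct {
    const std::uint8_t *opcodes;
    std::uint16_t length;
  } expr;
  std::uint64_t constant_value;
};
```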
[SandboxIR] Implement FPToUIInst (#101369)
This patch implements sandboxir::FPToUIInst which mirrors llvm::FPToUIInst.
Commit 8b17b12
Commit 35a2e6d
[Modules][Diagnostic] Don't claim a METADATA mismatch is always in PCH file. (#101280)
You can provide more than one AST file as an input. Emit a path for the file with a problem, so you can disambiguate between multiple files. rdar://65005546
Commit f9827e6
[flang][OpenMP] Reland Fix copyprivate semantic checks (#95799) (#101009)
There are some cases in which variables used in OpenMP constructs are predetermined as private. The semantic checks for copyprivate were not handling those cases. Besides that, shared symbols were not being properly represented in some cases: when there was no previously declared private (implicit) symbol, no new association symbols representing shared ones were being created. These symbols must always be inserted in constructs that may privatize the original symbol: parallel, teams and task generating constructs. Fixes #87214 and #86907
Commit 366eade
[AMDGPU][True16][MC] duplicate vop1 tests to fake16 and update real-true16 flags for GFX12 (#100849)
Duplicate vop1 tests to fake16 and update real-true16 flags for GFX12. The duplicates are created here to avoid a bulk copy in the following true16 patches. Co-authored-by: guochen2
Commit 055893f
Commit 6aa723d
[lldb] Allow mapping object file paths (#101361)
This introduces a `target.object-map` setting, which allows us to remap module locations much in the same way as source mapping works today. This is useful, for instance, when debugging coredumps, so we can replace some of the locations where LLDB attempts to load shared libraries and executables from, without having to set up an entire sysroot.
Commit 0a01e8f
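Conceptually, an object map rewrites a module path by replacing a matching prefix, as source mapping does for source files. A sketch with a hypothetical `RemapPath` helper; lldb's actual matching rules (rule order, path-component boundaries) are not described in the commit and may differ:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Each rule is {from_prefix, to_prefix}; the first matching rule wins (an assumption).
// A plain string-prefix test is used; a real implementation would likely also
// respect path-component boundaries ("/usr/lib" should not match "/usr/lib64").
std::optional<std::string> RemapPath(
    const std::vector<std::pair<std::string, std::string>> &rules,
    const std::string &path) {
  for (const auto &[from, to] : rules)
    if (path.compare(0, from.size(), from) == 0)
      return to + path.substr(from.size());
  return std::nullopt;
}
```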
[SandboxIR] Implement SIToFPInst (#101374)
This patch implements sandboxir::SIToFPInst which mirrors llvm::SIToFPInst.
Commit 6d3317e
Revert "[scudo] Separated committed and decommitted entries." (#101375)
Reverts llvm/llvm-project#100818
Commit 496feda
Commit d0b4b6b
[Sema][sycl] Restore additional SYCL condition in alignas handling
bf02f41 changed Sema handling of alignas to accommodate C23, which implements alignas as a type specifier instead of an attribute. When it was merged, the SYCL-specific conditions that were previously applied for CXX11 weren't brought over. This patch re-adds them, which addresses a number of test regressions in SemaSYCL.
Commit eab0074
[BOLT][DWARF] Sort GDBIndexTUEntryVector (#101264)
Sorts GDBIndexTUEntryVector in decreasing order by hash to ensure determinism when parallelized.
Commit 33960ce
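Sorting after the parallel phase is what makes the output order independent of thread scheduling. A sketch with a hypothetical `TUEntry` type; the tie-break on `cu_offset` is an added assumption for entries with equal hashes, not something the commit states:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical entry; only the hash ordering comes from the commit message.
struct TUEntry {
  std::uint64_t type_hash;
  std::uint64_t cu_offset;
};

// Decreasing order by hash; equal hashes fall back to ascending cu_offset.
void SortForDeterminism(std::vector<TUEntry> &entries) {
  std::sort(entries.begin(), entries.end(),
            [](const TUEntry &a, const TUEntry &b) {
              if (a.type_hash != b.type_hash)
                return a.type_hash > b.type_hash;
              return a.cu_offset < b.cu_offset;
            });
}
```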
[Offload] Allow to record kernel launch stack traces (#100472)
Similar to (de)allocation traces, we can record kernel launch stack traces and display them in case of an error. However, the AMD GPU plugin signal handler, which is invoked on memory faults, cannot pinpoint the offending kernel. Instead, it prints `<NUM>` traces, where `<NUM>` is set via `OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=<NUM>`. The recording uses a ring buffer of fixed size (for now 8). For `trap` errors, we print the actual kernel name, and its trace if recorded.
Commit 9a10132
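The fixed-size trace buffer behaves like a classic ring buffer: once full, every new kernel launch overwrites the oldest record. A generic sketch using the capacity of 8 mentioned in the message; the `RingBuffer` name and interface are hypothetical, not the offload plugin's code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Fixed-capacity ring buffer: once full, each Push overwrites the oldest record.
template <typename T, std::size_t N>
class RingBuffer {
public:
  void Push(T value) {
    slots_[next_] = std::move(value);
    next_ = (next_ + 1) % N;
    if (size_ < N)
      ++size_;
  }

  // Returns up to `count` records, newest first.
  std::vector<T> Newest(std::size_t count) const {
    std::vector<T> out;
    for (std::size_t i = 0; i < count && i < size_; ++i)
      out.push_back(slots_[(next_ + N - 1 - i) % N]);
    return out;
  }

private:
  std::array<T, N> slots_{};
  std::size_t next_ = 0;
  std::size_t size_ = 0;
};
```

Asking for more records than were kept, as with a large `<NUM>`, simply returns everything still in the buffer.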
[libc][math][c23] Refactor expf16 (#101373)
Also updates and sorts CMake target dependencies, and corrects the smoke test that expected expf16(sNaN) to return sNaN instead of aNaN, although the test still passed, as FPMatcher only checks whether both sides are NaN, not whether they're the same NaN value.
Commit b66aa3b
AMDGPU: Add testcase for materializing sgpr frame indexes (#101306)
These add some IR tests for 57d10b4. These do rely on some lucky MIR placement to test the scc input, but I haven't found a better way to do it. Also, scc handling in inline asm is extremely buggy.
Commit ef67664
[mlir][Linalg] Deprecate `linalg::tileToForallOp` and `linalg::tileToForallOpUsingTileSizes` (#91878)
The implementation of these methods is legacy, and they are removed in favor of the `scf::tileUsingSCF` methods as replacements. To bring the latter on par with the requirements of the deprecated methods, tiling now allows one to specify the maximum number of tiles to use instead of the tile sizes. When tiling to `scf.forall`, this specification is used to generate the `num_threads` version of the operation.
A slight deviation from the previous implementation is that the deprecated method always generated the `num_threads` variant of the `scf.forall` operation. Now this is driven by the tiling options specified. This reduces the indexing math generated when the tile sizes are specified.
**Moving from `linalg::tileToForallOp` to `scf::tileUsingSCF`**
```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> numThreads;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result = linalg::tileToForallOp(b, op, numThreads, mapping);
```
can be replaced by
```
scf::SCFTilingOptions options;
options.setNumThreads(numThreads);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```
This generates the `numThreads` version of the `scf.forall` for the inter-tile loops, i.e.
```
... = scf.forall (%arg0, %arg1) in (%nt0, %nt1) shared_outs(...)
```
**Moving from `linalg::tileToForallOpUsingTileSizes` to `scf::tileUsingSCF`**
```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> tileSizes;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result = linalg::tileToForallOpUsingTileSizes(b, op, tileSizes, mapping);
```
can be replaced by
```
scf::SCFTilingOptions options;
options.setTileSizes(tileSizes);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```
Also note that `linalg::tileToForallOpUsingTileSizes` would effectively call `linalg::tileToForallOp` by computing the `numThreads` from the `op` and `tileSizes`, and generate the `numThreads` version of the `scf.forall`. That is no longer the case. Instead, this directly generates the `tileSizes` version of the `scf.forall` op
```
... = scf.forall(%arg0, %arg1) = (%lb0, %lb1) to (%ub0, %ub1) step(%step0, %step1) shared_outs(...)
```
If you actually want to use the `numThreads` version, it is up to the caller to compute the `numThreads` and call `options.setNumThreads` instead of `options.setTileSizes`. Note that there is a slight difference between the num threads version and the tile size version. The former requires an additional `affine.max` on the tile size to ensure non-negative tile sizes. When the `numThreads` version is derived from tile sizes, this `affine.max` is not needed, since by construction the tile sizes are non-negative. In previous implementations, the `numThreads` version generated by the `linalg::tileToForallOpUsingTileSizes` method would avoid generating the `affine.max` operation. To get the same state, downstream users will have to additionally normalize the `scf.forall` operation.
**Changes to `transform.structured.tile_using_forall`**
The transform dialect op that called into `linalg::tileToForallOp` and `linalg::tileToForallOpUsingTileSizes` has been modified to call `scf::tileUsingSCF`. The transform dialect op always generates the `numThreads` version of the `scf.forall` op. So when `tile_sizes` are specified for the transform dialect op, the `tile_sizes` version of the `scf.forall` is first generated by the `scf::tileUsingSCF` method and then further normalized to get back to the same state. So there is no functional change to `transform.structured.tile_using_forall`: it always generates the `numThreads` version of the `scf.forall` op (as it did before this change). Signed-off-by: MaheshRavishankar
Commit 6740d70
[Clang] Suppress missing architecture error when doing LTO (#100652)
Summary: The `nvlink-wrapper` can do LTO now, which means we can still create some LLVM-IR without needing an architecture. In the case that we try to invoke `nvlink` internally, that will still fail. This patch simply defers the error until later so we can use `--lto-emit-llvm` to get the IR without specifying an architecture.
Commit 2bf58f5
-
GNU ld supports `-z nosectionheader` since 2.41; the option is mildly useful. It omits the section header table and non-ALLOC sections (including .symtab/.strtab, as with --strip-all). This option is simple to implement and might be used by LLDB to test program header parsing without the section header table (#100900). `-z sectionheader`, which is the default, is also added. Pull Request: llvm/llvm-project#101286
Commit 5d972c5
-
[NFC][LLVM] Add RealtimeSanitizer LLVM code owners (#101231)
Split from #100596
Commit bf5e56d
-
[RISCV] Use X0 for VLMax for slide1up/slide1down in lowerVectorIntrinsicScalars. (#101384)
Previously, we created a vsetvlimax intrinsic. Using X0 simplifies the code and enables some optimizations to kick in when the exact value of vlmax is known.
Commit 3626443
-
[libc][math][c23] Add dfma{l,f128} and dsub{l,f128} C23 math functions (#101089)
Co-authored-by: OverMighty <[email protected]>
Commit 30b5d4a
-
[cmake] Replace remaining uses of llvm_check_linker_flag with CMake builtin
89946bd replaced calls to llvm_check_{compiler,linker} with equivalent CMake builtins and removed the llvm versions. Some references to llvm_check_linker_flag still existed, so this commit replaces those.
Commit b287447
-
[BOLT][DWARF][NFC] Split DIEBuilder::finish (#101244)
Split DIEBuilder::finish so that code updating .debug_names is in a separate function.
Commit 910012e
-
Commit c6a3f4e
-
Revert "[lldb] Reland 2402b32 with `/H` to debug the windows build issue"
This reverts commit e72cdae, which broke LLVM's lldb builder for Windows msvc.
Commit 9effefb
-
[SCEV] Use power of two facts involving vscale when inferring wrap flags (#101380)
SCEV has logic for inferring wrap flags on AddRecs which are known to control an exit based on whether the step is a power of two. This logic only considered constants, and thus did not trigger for steps such as (4 x vscale), which are common in scalably vectorized loops. The net effect is that we were very sensitive to the preservation of nsw/nuw flags on such IVs, and could not infer trip counts if they got lost for any reason.
---------
Co-authored-by: Nikita Popov <[email protected]>
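The fact this change relies on can be illustrated with integer bit tricks: the product of two powers of two is itself a power of two, so a step such as (4 x vscale) qualifies whenever vscale is known to be a power of two. This is a minimal sketch with hypothetical helpers (`isPowerOf2`, `scalableStep`), not SCEV code, and it assumes vscale is known to be a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not SCEV's API: true iff x is a non-zero power of two.
bool isPowerOf2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Step of a scalably vectorized IV such as (4 x vscale). The product of two
// powers of two is a power of two (barring overflow), so the step qualifies
// whenever vscale itself is known to be a power of two.
uint64_t scalableStep(uint64_t factor, uint64_t vscale) {
  return factor * vscale;
}
```

A constant-only check would see `factor * vscale` as an unknown value; reasoning about both factors separately is what lets the power-of-two fact survive.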
Commit 7583c48
-
[mlir][Transforms] Dialect conversion: Skip materializations when running without converter (#101318)
TODO: test case
Commit 2aa96fc
-
[mlir][sparse] implement `sparse_tensor.extract_value` operation. (#101220)
Peiming Liu authored Jul 31, 2024
Commit 951a363
-
[TableGen] Pass ValueTypeByHwMode by const reference in a couple places. NFC
ValueTypeByHwMode contains a std::map. We shouldn't copy it if we don't need to. Fixes #101406.
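Why passing by const reference matters for a map-holding type can be shown by counting copies. `ByHwModeLike` below is a hypothetical stand-in that, like ValueTypeByHwMode, wraps a `std::map`; it is not the TableGen class.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for a type that wraps a std::map; it counts how many
// times it is copy-constructed.
struct ByHwModeLike {
  static inline int copies = 0;
  std::map<unsigned, int> Map;

  ByHwModeLike() = default;
  ByHwModeLike(const ByHwModeLike &Other) : Map(Other.Map) { ++copies; }
  ByHwModeLike &operator=(const ByHwModeLike &) = default;
};

// By value: every call copy-constructs the parameter, duplicating map nodes.
std::size_t sizeByValue(ByHwModeLike V) { return V.Map.size(); }

// By const reference: the callee reads the caller's object, no copy is made.
std::size_t sizeByConstRef(const ByHwModeLike &V) { return V.Map.size(); }
```

Each by-value call allocates a fresh copy of every map node, which is the cost the commit avoids.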
Commit c2dc46c
-
[TableGen] Add an explicit cast to allow one TypeSetByHwMode constructor to be removed. NFC
This constructor was taking a ValueTypeByHwMode by value to create an ArrayRef. By adding an explicit cast from ValueTypeByHwMode to TypeSetByHwMode we allow the ArrayRef to be implicitly converted from a single element.
Commit 24f8d10
-
[libc++] Drop support for the C++20 Synchronization Library before C++20 (#82008)
When we initially implemented the C++20 synchronization library, we reluctantly agreed to let the implementation be backported to C++03 upon request from the person who provided the patch. This was when we were only starting to have experience with the issues this can create, so we flinched. Nowadays, we have a much stricter stance about not backporting features to previous standards.

We have recently started fixing several bugs (and near bugs) in our implementation of the synchronization library. A recurring theme during these reviews has been how difficult the current code is to understand, and upon inspection it becomes clear that being able to use a few recent C++ features (in particular lambdas) would help a great deal. The code would still be pretty intricate, but it would be a lot easier to reason about the flow of callbacks through things like __thread_poll_with_backoff.

As a result, this patch drops support for the synchronization library before C++20. This makes us more strictly conforming and opens the door to major simplifications, in particular around atomic_wait, which was supported all the way back to C++03.

This change will probably have some impact on downstream users. However, since the C++20 synchronization library was added only in LLVM 10 (~3 years ago) and it's quite a niche feature, the set of people trying to use this part of the library before C++20 should be reasonably small.
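The kind of callback flow mentioned above can be sketched with lambdas. `pollWithBackoff` is a hypothetical, simplified function, not libc++'s `__thread_poll_with_backoff`; the spin count of 64 and the "backoff returns true to give up" contract are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical, simplified sketch; NOT libc++'s __thread_poll_with_backoff.
// `poll` returns true when the awaited condition holds; `backoff` receives the
// elapsed time and returns true to stop polling (e.g. to block instead).
template <class Poll, class Backoff>
bool pollWithBackoff(Poll poll, Backoff backoff) {
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0;; ++i) {
    if (poll())
      return true;
    if (i < 64)
      continue;  // spin briefly before consulting the backoff policy
    if (backoff(std::chrono::steady_clock::now() - start))
      return false;
    std::this_thread::yield();
  }
}
```

Because the caller's state lives in lambda captures, the polling loop stays generic; expressing the same thing in C++03 requires hand-written function objects for every call site.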
Commit bf1666f
-
[libc] Add vsscanf function (#101402)
Summary: Adds support for the `vsscanf` function similar to `sscanf`. Based off of llvm/llvm-project#97529.
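The usual relationship between the two functions is that a variadic front end packs its arguments into a `va_list` and forwards them to `vsscanf`. This sketch uses the host C library's `std::vsscanf`, not LLVM libc internals, and `scanFormatted` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Hypothetical variadic wrapper: collects its arguments into a va_list and
// forwards them to vsscanf, the way sscanf can be layered on vsscanf.
int scanFormatted(const char *input, const char *format, ...) {
  va_list args;
  va_start(args, format);
  int matched = std::vsscanf(input, format, args);
  va_end(args);
  return matched;
}
```

The return value is the number of successfully assigned conversions, exactly as with `sscanf`.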
Commit 38ef692
-
[mlir][sparse] introduce `sparse_tensor.coiterate` operation. (#101100)
This PR introduces the `sparse_tensor.coiterate` operation, which represents a loop that traverses multiple sparse iteration spaces.
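What "traversing multiple sparse iteration spaces" means can be shown with a hand-written two-pointer merge over two sorted coordinate lists. This is a conceptual sketch with a hypothetical `SparseVec` type; it is not the IR that `sparse_tensor.coiterate` lowers to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sparse vector: sorted coordinates with matching values.
struct SparseVec {
  std::vector<int> crd;
  std::vector<double> val;
};

// Co-iterate both iteration spaces in lockstep (a two-pointer merge). Only
// coordinates present in both contribute, i.e. the intersection case.
double sparseDot(const SparseVec &a, const SparseVec &b) {
  std::size_t i = 0, j = 0;
  double sum = 0.0;
  while (i < a.crd.size() && j < b.crd.size()) {
    if (a.crd[i] == b.crd[j]) {
      sum += a.val[i] * b.val[j];
      ++i;
      ++j;
    } else if (a.crd[i] < b.crd[j]) {
      ++i;
    } else {
      ++j;
    }
  }
  return sum;
}
```

A union-style co-iteration (for example, sparse addition) would instead also emit the unmatched coordinates from either side.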
Peiming Liu authored Jul 31, 2024
Commit 785a24f
-
[RISCV] Remove unnecessary FP extensions from some integer only vector tests.
I'm going to do a review to make sure we are testing Zvfhmin instead of Zvfh where clang expects it to work for half types, like loads/stores. Removing unnecessary FP extensions leaves fewer things to review.
Commit 26766a0
-
Commit 74f9579
-
[Clang] [NFC] Fix potential dereferencing of nullptr (#101405)
This patch replaces getAs with castAs and dyn_cast with cast in ASTContext and CodeGenModule, making the type assumptions explicit and preventing potential null pointer dereferences. Instead of silently yielding a null pointer, castAs and cast assert that the conversion is valid (in builds with assertions enabled).
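The difference between the two casts can be shown with a tiny, hypothetical re-creation of LLVM-style RTTI (a kind tag plus `classof`). `dynCast` and `checkedCast` are stand-ins that only mirror the contracts of `dyn_cast` and `cast`; they are not the templates from `llvm/Support/Casting.h`.

```cpp
#include <cassert>

// Hypothetical class hierarchy using LLVM-style RTTI: a kind tag plus classof.
struct Shape {
  enum Kind { K_Circle, K_Square } kind;
  explicit Shape(Kind k) : kind(k) {}
};
struct Circle : Shape {
  Circle() : Shape(K_Circle) {}
  static bool classof(const Shape *s) { return s->kind == K_Circle; }
};
struct Square : Shape {
  Square() : Shape(K_Square) {}
  static bool classof(const Shape *s) { return s->kind == K_Square; }
};

// dyn_cast-like: returns nullptr on a mismatch, so callers must check it.
template <class To, class From> To *dynCast(From *p) {
  return To::classof(p) ? static_cast<To *>(p) : nullptr;
}

// cast-like: the caller claims the type is right; a mismatch trips an
// assertion here instead of producing a null pointer dereferenced later.
template <class To, class From> To *checkedCast(From *p) {
  assert(To::classof(p) && "checkedCast argument of incompatible type");
  return static_cast<To *>(p);
}
```

When the result of a `dyn_cast` is used without a null check, switching to `cast` moves the failure to the point where the wrong assumption is made.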
Commit cf79aba
-
[NVPTX] Make minimum/maximum work on older GPUs
We want to use newer instructions if we are targeting sufficiently new SM and PTX versions. If we cannot use those newer instructions, let LLVM synthesize the sequence from more fundamental instructions.
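The semantics being synthesized are those of `llvm.minimum`/`llvm.maximum`: a NaN operand produces NaN, and -0.0 is treated as less than +0.0 (unlike `fmin`/`fmax`, which return the non-NaN operand). The sketch below builds them from comparisons and sign tests in C++; the function names are hypothetical and this is not the actual NVPTX lowering sequence.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical sketch of llvm.minimum semantics from basic operations:
// NaN propagates, and -0.0 counts as smaller than +0.0.
float minimumSketch(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  if (a == b)                        // also true for -0.0 == +0.0
    return std::signbit(a) ? a : b;  // prefer the negative zero
  return a < b ? a : b;
}

// Hypothetical sketch of llvm.maximum semantics; +0.0 wins over -0.0.
float maximumSketch(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  if (a == b)
    return std::signbit(a) ? b : a;  // prefer the positive zero
  return a > b ? a : b;
}
```

The extra NaN and signed-zero checks are what newer hardware instructions provide directly.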
Commit 6f318d4
-
[SandboxIR][NFC] Move BasicBlock class definition up (#101422)
To make future PRs smaller.
Commit ee0f43a
-
[RISCV][GlobalISel] Legalize Scalable Vector Loads and Stores (#84965)
This patch supports legalizing load and store instructions for scalable vectors in RISC-V.
Commit a0d8fa5
-
[GISEL][RISCV] RegBank Select for Scalable Vector Load/Store (#99932)
This patch adds GlobalISel register bank selection support for scalable vector load and store instructions in RISC-V.
Commit 1c66ef9
-
[NFC][Clang] Clean up VisitUnaryPlus by removing unused FP feature check (#101412)
This commit removes an unnecessary call to `E->hasStoredFPFeatures()` within the `VisitUnaryPlus` function. The method's return value was not being used, leading to a redundant operation. The removal of this line streamlines the function and eliminates an unneeded check for stored floating-point features.
Commit d5d1cf0
-
Commit 65d3c22
-
[SandboxIR] Implement PHINodes (#101111)
This patch implements sandboxir::PHINode which mirrors llvm::PHINode. Based almost entirely on work by vporpo.
Commit 3403b59
-
Forward declare OSSpinLockLock on MacOS since it's not shipped on the system. (#101392)
Fixes build errors on some SDKs. rdar://132607572
Commit 3a4c7cc
Commits on Aug 1, 2024
-
[LegalizeTypes][RISCV][LoongArch] Optimize promotion of ucmp. (#101366)
ucmp can be promoted with either sext or zext. RISC-V and LoongArch prefer sext for promoting i32 to i64 unless the inputs are known to be zero extended already. This patch uses the existing SExtOrZExtPromotedOperands function that is used by SETCC promotion to intelligently handle this.
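Why either extension is legal follows from a property of unsigned ordering: zero extension obviously preserves it, and sign extension does too, because values with the top bit set all move to the very top of the 64-bit range in the same relative order. The sketch below checks this on boundary values; the helper names are hypothetical and this is not SelectionDAG code.

```cpp
#include <cassert>
#include <cstdint>

// ucmp: -1 if a < b, 0 if equal, 1 if a > b, comparing as unsigned.
int ucmp32(uint32_t a, uint32_t b) { return (a > b) - (a < b); }
int ucmp64(uint64_t a, uint64_t b) { return (a > b) - (a < b); }

// Promote i32 -> i64 by zero extension.
uint64_t zext(uint32_t x) { return x; }

// Promote i32 -> i64 by sign extension (assumes two's complement conversion,
// which is guaranteed from C++20 and universal in practice).
uint64_t sext(uint32_t x) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(x)));
}
```

Since both promotions give the same answer, a target is free to pick whichever extension is already present in registers, which is the choice the patch delegates to SExtOrZExtPromotedOperands.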
Commit 307d124
-
[DirectX] Rename backend DXIL resource analysis passes to DXILResourceMD*. NFC
These passes will be replaced soon as we move to the target extension based resource handling in the DirectX backend, but removing them now before the replacement stuff is all up and running would be very disruptive. However, we do need to move these passes out of the way to avoid symbol conflicts with the new DXILResourceAnalysis in the Analysis library.

Note: I tried an even simpler hack in #100698 but it doesn't really work. A rename is the most expedient path forward here.

Pull Request: llvm/llvm-project#101393
Commit 1c5f6cf
-
[lldb] Use Target references instead of pointers in CommandObject (NFC)
The GetTarget helper returns a Target reference, so there's no reason to convert it to a pointer and check its validity.
Commit 5dbbc3b
-
[RISCV] Use experimental.vp.splat to splat specific vector length elements. (#101329)
Previously, it was hard to create a scalable vector splat with a specific vector length in LLVM IR, so we used riscv.vmv.v.x and riscv.vmv.v.f to do this work. However, these two RVV intrinsics have strict type constraints and cannot support fixed vector types or illegal vector types. Using vp.splat preserves the old functionality and also generates more optimized code for vector types and illegal vectors. This patch also fixes a crash caused by getEVT not handling ptr types.
Commit 87af9ee
-
[TableGen][MVT] Lower the maximum 16-bit MVT from 16384 to 511. (#101401)
MachineValueTypeSet in tablegen allocates an array with a bit per MVT. This used to be 256 bits; with the introduction of 16-bit MVTs it ballooned to 65536 bits. I suspect this is increasing the memory usage of many of the data structures used by CodeGenDAGPatterns. Since we don't need the full 16-bit range yet, this patch proposes lowering the maximum MVT to 511 and using only 512 bits for MachineValueTypeSet's storage.
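The size arithmetic behind the change can be made concrete with a one-bit-per-MVT set. `ValueTypeSetSketch` is a hypothetical miniature built on `std::bitset`, not TableGen's MachineValueTypeSet; the `static_assert`s just restate the byte counts.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical miniature of a per-MVT bitset (not TableGen's real class).
constexpr std::size_t MaxMVT = 511;

class ValueTypeSetSketch {
  std::bitset<MaxMVT + 1> Bits;  // one bit per MVT value 0..511

public:
  void insert(std::size_t VT) { Bits.set(VT); }
  bool contains(std::size_t VT) const { return Bits.test(VT); }
  std::size_t size() const { return Bits.count(); }
};

// Payload per set: 512 bits is 64 bytes, while covering the full 16-bit
// range (65536 bits) would need 8192 bytes, i.e. 128 times as much.
static_assert((MaxMVT + 1) / 8 == 64, "512 bits = 64 bytes");
static_assert(65536 / 8 == 8192, "65536 bits = 8 KiB");
static_assert(8192 / 64 == 128, "ratio between the two layouts");
```

Because such sets are embedded in many per-pattern data structures, the per-set payload multiplies quickly.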
Commit e2c74aa
-
[RISCV][GISel] Slightly simplify the regbank selection for G_LOAD/STORE. NFC (#101431)
Merge the isVector early out with the previous check for isVector.
Commit a1ba4fb
-
[mlir][spirv] Fix tablegen generator script's stripping of prefixes (#101378)
This script looks for existing definitions with the `SPIRV_` prefix, so that it can preserve them when updating the file. When the commit 2d62833 changed the prefix from `SPV_`, the number of characters to strip from matched names was not updated, which broke this feature. This commit fixes remaining cases that weren't fixed by 339c87a. The relationship of this script to the files it is meant to maintain is still bitrotten in other ways.
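The bug class here is a hard-coded strip count drifting out of sync with the prefix. The script itself is Python; this C++ sketch only shows the idea of deriving the count from the prefix. `stripPrefix` and the name `SPIRV_ExampleOp` are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: strip `prefix` from `name`, taking the character count
// from the prefix itself instead of a hard-coded number.
std::optional<std::string> stripPrefix(std::string_view name,
                                       std::string_view prefix) {
  if (name.substr(0, prefix.size()) != prefix)
    return std::nullopt;
  return std::string(name.substr(prefix.size()));
}
```

With a stale count of 4 (the length of `SPV_`), `SPIRV_ExampleOp` would become `V_ExampleOp` instead of `ExampleOp`, so existing definitions would no longer be recognized.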
Commit bc6834f
-
[MemProf] Fix when function has indirect call (#101170)
When a function has an indirect call in LTO mode, it triggers `assert(Alias)` in `findProfiledCalleeThroughTailCalls`.
Commit e6aeb3f
-
[SandboxIR][NFC] Factor out common test for CastInst subclasses (#101410)
The tests for most CastInst sub-classes, except AddrSpaceCastInst, are very similar. This patch creates a common template function for all of them.
Commit 9227fd7
-
[mlir][Transforms] Preserve all analysis in print passes (#101315)
PrintIRPass, PrintOpStatsPass and PrintOpGraphPass don't mutate the IR, so they now preserve all analyses to save a bit of computation.
Commit 42c413b
-
Commit ed12f80
-
[nsan][NFC] Use cast when dyn_cast is not needed. (#101147)
Use `cast` instead of `dyn_cast` where the null check that `dyn_cast` allows is not needed or not performed.
Commit 430b90f
-
[RISCV] Increase default tail duplication threshold to 6 at -O3 (#98873)
This is just like AArch64. Changing the threshold to 6 will increase code size, but will also decrease the number of unconditional branches. CPUs with wide fetch/issue units can benefit from it. The value 6 may be debatable; we could set it to `SchedModel.IssueWidth` instead.
Commit 27b6080
-
[TargetLowering] Remove weird use of MVT::isVoid in an assert. (#101436)
At the time this was written there were no vector types in MVT. The order was:
- scalar integer types
- scalar FP types
- isVoid

I believe this isVoid check was to catch walking off the end of the scalar FP types, while the isInteger() == isInteger() check caught walking off the end of the scalar integer types.

These days we have:
- scalar integer types
- scalar FP types
- fixed vector integer types
- fixed vector FP types
- scalable vector integer types
- scalable vector FP types
- Glue
- isVoid

So checking isVoid doesn't detect what it used to. I've changed it to check isFloatingPoint() == isFloatingPoint() instead.
Commit 991a621
-
[BOLT][NFC] Add timers for MetadataManager invocations
Test Plan: added bolt/test/timers.c Reviewers: ayermolo, maksfb, rafaelauler, dcci Reviewed By: dcci Pull Request: llvm/llvm-project#101267
Commit fb97b4f
-
[BOLT][NFC] Print timers in perf2bolt invocation
When BOLT is run in AggregateOnly mode (perf2bolt), it exits with code zero without running destructors, so TimerGroup never prints the timers. Add explicit printing just before the exit to honor options requesting timers (`--time-rewrite`, `--time-aggr`). Test Plan: updated bolt/test/timers.c Reviewers: ayermolo, maksfb, rafaelauler, dcci Reviewed By: dcci Pull Request: llvm/llvm-project#101270
Commit 3f51bec
-
Commit 9d068f7
-
AMDGPU/GlobalISel: Permit mapping G_FRAME_INDEX to sgprs (#101325)
eliminateFrameIndex should now properly handle materializing frame indices in SGPRs, so treat this like the other constant operand types. On average this will produce worse code; we need to detect VGPR uses, and improve SGPR->VGPR frame index folds.
Commit 86815a1
-
[MIR] Remove separate Size variable from parseMachineMemoryOperand. NFC (#101453)
Size is updated in sync with MemoryType. Instead of maintaining a separate Size, use the size from MemoryType where needed.
Commit 72ed808
-
Commit 129a8e1
-
[GlobalISel][TableGen] MIR Pattern Variadics (#100563)
Allow for matching & rewriting a variable number of arguments in an instruction. Solves #87459
Commit 972c029
-
[RISCV] Add vector bf16 load/store intrinsic tests. NFC
This adds bf16 to the unit stride, strided, and index load and store intrinsics. clang already assumes these work with Zvfbfmin.
Commit 04e8433
-
[RISCV] Replace Zvfh with Zvfhmin on vector load/store intrinsic tests. NFC
clang uses these with Zvfhmin so we should test them.
Commit 84a3739
-
[GlobalISel][TableGen] Make variadic-errors.td test more robust
Use a regex instead of hardcoded numbers for anonymous pattern suffixes.
Commit ab33c3d
-
[C++20] [Modules] Always emit the inline builtins (#101278)
See the attached test for the motivating example. If we are too eager to skip emitting the definitions of inline builtins, we may hit a middle-end crash. Always emitting inline builtins should be fine.
Commit e167f75
-
AMDGPU: Cleanup extract_subvector actions (NFC) (#101454)
The base AMDGPUISelLowering was setting a custom action on 16-bit vector types, but this is also set in SIISelLowering.
Commit 1d2b2d2
-
[RISCV] Add back missing vmv_v_x_vl pattern predicates (#101455)
Looks like these got left behind in 17e2d07
Commit fdce0bf
-
[lldb][FreeBSD] Fix NativeRegisterContextFreeBSD_{arm,mips64,powerpc} declarations (#101403)
Similar to #97796, fix the type of the `native_thread` parameter for the arm, mips64 and powerpc variants of `NativeRegisterContextFreeBSD_*`.

Otherwise, this leads to compile errors similar to:

```
lldb/source/Plugins/Process/FreeBSD/NativeRegisterContextFreeBSD_powerpc.cpp:85:39: error: out-of-line definition of 'NativeRegisterContextFreeBSD_powerpc' does not match any declaration in 'lldb_private::process_freebsd::NativeRegisterContextFreeBSD_powerpc'
   85 | NativeRegisterContextFreeBSD_powerpc::NativeRegisterContextFreeBSD_powerpc(
      |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Commit 7088a5e
-
Revert "[mlir][Transforms] Dialect conversion: Skip materializations when running without converter (#101318)"
This reverts commit 2aa96fc. This was merged without a test. Also, it seems it was only fixing an issue for users who used a particular workaround that is not actually needed anymore (skipping UnrealizedConversionCast operands).
Commit 17ba4f4
-
[libc++][NFC] Avoid opening namespace std in the tests (#94160)
This also adds a few FIXMEs where we use UB in the tests.
Commit 5dfdac7
-
[LoongArch] Pre-commit test for aligning stack objects passed to memory intrinsics. NFC
Commit f51a479
-
[SimplifyLibCalls] Constant fold nan libcall (#101459)
Reference: https://en.cppreference.com/w/c/numeric/math/nan

The logic is copied from the clang frontend: https://github.com/llvm/llvm-project/blob/1d2b2d29d733200b704f38d220d22ecc07d6cf42/clang/lib/AST/ExprConstant.cpp#L14741-L14777

---------

Co-authored-by: Nikita Popov <[email protected]>
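Folding `nan(tag)` amounts to turning a constant tag string into a quiet-NaN bit pattern. This is a hypothetical C++ sketch for `double`, not the LLVM or clang code; it assumes an empty tag means a zero payload, a non-empty tag must parse entirely as an integer (base auto-detected, as `strtoull` with base 0 does), and the payload fills the mantissa bits below the quiet bit. Tags that fail to parse are left unfolded.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical sketch: constant-fold nan(tag) for IEEE double, or return
// nullopt when the tag is not a plain integer (leave the call alone).
std::optional<double> foldNan(const std::string &tag) {
  uint64_t payload = 0;
  if (!tag.empty()) {
    char *end = nullptr;
    errno = 0;
    payload = std::strtoull(tag.c_str(), &end, 0);
    if (errno != 0 || end != tag.c_str() + tag.size())
      return std::nullopt;
  }
  const uint64_t quietBit = uint64_t{1} << 51;
  const uint64_t payloadMask = quietBit - 1;  // mantissa bits below quiet bit
  uint64_t bits = 0x7FF0000000000000ULL | quietBit | (payload & payloadMask);
  double result;
  std::memcpy(&result, &bits, sizeof result);
  return result;
}
```

Exact payload handling for out-of-range or signed tags may differ from the real implementation; the sketch only illustrates the shape of the folding.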
Commit 4f42deb
-
[C++20][Modules] Allow using stdarg.h with header units (#100739)
Summary: Macros like `va_start`/`va_end` are marked as builtin functions, which makes these identifiers special. This results in the identifiers being redefined as builtins, which hides their macro definitions when preloading C++ modules. In the case of modules, Clang ignores special identifiers, but `PP.getCurrentModule()` was not set. This diff fixes the IsModule detection logic for this particular case.

Test Plan: check-clang

---------

Co-authored-by: Chuanqi Xu <[email protected]>
Commit f3761a4
-
Simplify hot-path size computations in BumpPtrAllocator. (#101312)
Commit 65c000a
-
[AMDGPU] SIWholeQuadMode: avoid execz effects in exact regions (#101157)
Exact mode regions within WQM may have EXEC=0 in divergent control flow. This occurs if a branch is only taken by helper lanes and an instruction requiring WQM disabling is encountered. The current code extends the exact region as far as possible; however, this can result in it including instructions with unwanted side effects at EXEC=0. In particular, readfirstlane combined with scalar loads can produce invalid memory accesses in this circumstance. Work around this by shrinking exact regions to only the instructions requiring WQM disabling when unwanted side effects are present. Eventually we should branch over these regions when EXEC=0, but this requires visibility of CFG/divergence information that is not currently available.
Commit 3611c0b
-
[flang][OpenMP] Delayed privatization for variables with `equivalence` association (#100531)
Handles variables that are storage associated via `equivalence`. The problem is that these variables are declared as `fir.ptr`s while their privatized storage is declared as `fir.ref`, which was triggering a validation error in the OpenMP dialect.
Commit bbadbf7
-
[clang][NFC] Add CWG882 test (Defining `main` as deleted) (#101382)
https://cplusplus.github.io/CWG/issues/882.html
This was implemented for Clang 3.5 by b63b6ee
Commit 3b3b891
-
[mlir][vector] Add tests xfer-permute-lowering (nfc)(2/n) (#96033)
Adds more tests to:
* vector-transfer-permutation-lowering.mlir

Specifically, adds tests for:
* out-of-bounds access for the `TransferWritePermutationLowering` pattern
* in-bounds access for `TransferWriteNonPermutationLowering` + `TransferWritePermutationLowering`

Also renames `@permutation_with_mask_xfer_write_fixed_width` as `@xfer_write_non_transposing_permutation_map`.

This is a part of a larger effort to make sure that all key cases for patterns under populateVectorTransferPermutationMapLoweringPatterns (*) are tested. I also want to make sure that tests use consistent function and variable names.

(*) transform.apply_patterns.vector.transfer_permutation_patterns in TD parlance
Commit 85fbc4f
-
Revert "Simplify hot-path size computations in BumpPtrAllocator. (#101312)"
This reverts commit 65c000a.
Commit 67730ae
-
[LowerMatrixIntrinsics] Fix type suffix for matrix.multiply.* (#100940)
Based on the [proposal PDF](https://llvm.org/devmtg/2020-09/slides/Hahn-Matrix_Support_in_LLVM_and_Clang.pdf) and the test code under [llvm/test/Transforms/LowerMatrixIntrinsics](https://github.com/llvm/llvm-project/tree/main/llvm/test/Transforms/LowerMatrixIntrinsics), the suffix for the `@llvm.matrix.multiply.*` intrinsic should be {output matrix type}.{input matrix 1 type}.{input matrix 2 type} (e.g., `@llvm.matrix.multiply.v4i32.v4i32.v4i32`). This PR corrects the places where these suffixes do not follow the aforementioned format.
Commit 05d3f5e
-
[clang][analyzer] Improve PointerSubChecker (#96501)
The checker could report false positives if pointer arithmetic was done on pointers to non-array data before pointer subtraction. Another problem is fixed that could cause a false positive when members of the same structure, but in different memory objects, are subtracted.
Commit cab91ec
-
[NFC][libc++][libc++abi][libunwind][test] Fix/unify AIX triples used in LIT tests (#101196)
This patch fixes/unifies AIX target triples used in libc++, libc++abi, and libunwind LIT tests.
Commit 2d36550
-
[Inliner] Fix bugs for partial inlining with vector
In the cost model of partial inlining, the cost of intrinsics is taken into account. However, some vector intrinsics have an invalid cost, which is not allowed for partial inlining. Instead of asserting, we simply skip partial inlining in this circumstance to avoid compilation errors.
Commit 0a5e572
-
[libc] Change the GPU loaders to LLVM executables (#101442)
Summary: I am going to rework these tools to just be LLVM tools. This patch is pretty much NFC to set up the CMake for that.
Commit feeb833
-
Revert "[Inliner] Fix bugs for partial inlining with vector"
This reverts commit llvm/llvm-project@0a5e572, since I forgot to start a pull request.
Commit 241a05a
-
Commit 097a1d2
-
AMDGPU: Add baseline test for copysign combine
We can use known bits information to avoid masking out one or both of the operands.
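The combine concerns the generic bitwise expansion of copysign: clear the magnitude's sign bit, keep only the sign operand's sign bit, and OR the two. If known bits already prove the magnitude's sign bit is zero, the first mask is redundant. This C++ sketch assumes 32-bit IEEE `float`; the helper names are hypothetical and it is not the AMDGPU combine itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

uint32_t bitsOf(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
float floatOf(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

constexpr uint32_t SignMask = 0x80000000u;

// Generic expansion: mask both operands, then combine.
float copysignBits(float mag, float sgn) {
  return floatOf((bitsOf(mag) & ~SignMask) | (bitsOf(sgn) & SignMask));
}

// When the magnitude's sign bit is known to be clear, its AND can be dropped.
float copysignKnownNonNegMag(float mag, float sgn) {
  assert((bitsOf(mag) & SignMask) == 0 && "sign bit must be known clear");
  return floatOf(bitsOf(mag) | (bitsOf(sgn) & SignMask));
}
```

Symmetrically, if the sign operand's non-sign bits were known to be zero, its AND could be dropped as well.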
Commit 2feb058
-
[NVPTX][NFC] Remove unneeded declarations in test (#101167)
Only the bf16 declarations are needed, as only they are lowered in AutoUpgrade.cpp. f16 and other builtins have LLVM intrinsics already defined.
Commit 3d1e1d9
-
[libc++] Remove dedicated namespaces for ranges functions (#76543)
We originally put implementation-detail function objects into individual namespaces for `std::ranges` without a good reason for doing so. This practice was continued, presumably because there was prior art. Since there's no reason to keep these namespaces, this commit removes them, which will slightly impact binary size. This commit does not apply to CPOs, some of which need additional work.
Commit d10dc5a
-
[libc++] Fix missing declarations of uses_allocator_construction_args (#67044)
We were not declaring `__uses_allocator_construction_args` helper functions, leading to several valid uses failing to compile. This patch solves the problem by moving these helper functions into a struct, which also reduces the amount of redundant SFINAE we need to perform since most overloads are checking for a cv-qualified pair. Fixes #66714 Co-authored-by: Louis Dionne <[email protected]>
Commit: beecf2c
[libc++] Avoid using **this in error messages for expected monadic operations (#84840)
Instead of using **this in error messages for std::expected monadic operations, use value(). As shown in LWG3969, **this can trigger unintended ADL and while it's only an error message, we might as well be ADL-correct there too. Co-authored-by: Louis Dionne <[email protected]>
Commit: 3891468
[NFC] [Clang] Some core issues have changed status from tentatively ready -> ready / review (#97200)
Also classes the "ready" status similarly to "tentatively ready" in make_cxx_dr_status
Commit: 14c8feb
[LLVM][ISel][SVE] Remove redundant merging fp patterns. (#101351)
Since "vselect cond, (binop, x, y), x" became the canonical form, the equivalent PatFrags for "binop x, (vselect cond, y, 0)" are no longer required.
Commit: 1fbd7be
[lldb][test] Disable vla test on Windows
For the same reasons as 6cfac49. This test was added in llvm/llvm-project#100710. It fails because when we're linking with link.exe, -gdwarf has no effect and we get a PDB file anyway. The Windows on Arm lldb bot uses link.exe.
"C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.34.31933\\bin\\Hostx86\\arm64\\link.exe" <...>
08/01/2024 01:47 PM 2,956,488 vla.cpp.ilk
08/01/2024 01:47 PM 6,582,272 vla.cpp.pdb
08/01/2024 01:47 PM 734,208 vla.cpp.tmp
Commit: 229a165
[Flang][Driver] Introduce -fopenmp-targets offloading option (#100152)
This patch modifies the flang driver to introduce the `-fopenmp-targets` option to the frontend compiler invocations corresponding to the OpenMP host device on offloading-enabled compilations. This option holds the list of offloading triples associated to the compilation and is used by clang to determine whether offloading calls should be generated for the host.
Commit: e145123
[AIX] Turn on `#pragma mc_func` check by default (#101336)
llvm/llvm-project#99888 added a check (and corresponding options) to flag uses of `#pragma mc_func` on AIX. This PR turns on the check by default.
Commit: b933517
[clang] Fix crash with multiple non-parenthesized `sizeof` (#101297)
There are 5 unary operators that can be followed by a non-parenthesized expression: `sizeof`, `__datasizeof`, `__alignof`, `alignof`, `_Alignof`. When we nest them too deep, `BalancedDelimiterTracker` does not help, because there are no parentheses, and we crash. Instead, this patch recognizes chains of those operators and parses them with sufficient stack space. Fixes #45061
Commit: 130c135
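For illustration, `sizeof` applied to an expression needs no parentheses, so the operator can be chained, and each step yields a `std::size_t`. The crash fixed above came from nesting such chains very deeply. The short chain below is only meant to show the grammar involved.

```cpp
#include <cassert>
#include <cstddef>

int value = 0;

// `sizeof value` is sizeof(int); applying sizeof again to that result
// (a std::size_t) gives sizeof(std::size_t), however many times it is chained.
constexpr std::size_t inner = sizeof value;
constexpr std::size_t chained = sizeof sizeof sizeof value;

static_assert(inner == sizeof(int), "sizeof of an int object");
static_assert(chained == sizeof(std::size_t), "sizeof yields std::size_t");
```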
[Clang] Fix definition of layout-compatible to ignore empty classes (#92103)
Also changes the behaviour of `__builtin_is_layout_compatible`. Neither the historic nor the current definition of layout-compatible classes mentions anything about base classes (other than implicitly through being standard-layout), and both are defined in terms of members, not direct members.
Commit: 5d7357c
[libc++] Increase atomic_ref's required alignment for small types (#99654)
This patch increases the alignment requirement for std::atomic_ref such that we can guarantee lockfree operations more often. Specifically, we require types that are 1, 2, 4, 8, or 16 bytes in size to be aligned to at least their size to be used with std::atomic_ref. This is the case for most types, however a notable exception is `long long` on x86, which is 8 bytes in length but has an alignment of 4. As a result of this patch, one has to be more careful about the alignment of objects used with std::atomic_ref. Failure to provide a properly-aligned object to std::atomic_ref is a precondition violation and is technically UB. On the flipside, this allows us to provide an atomic_ref that is actually lockfree more often, which is an important QOI property. More information in the discussion at llvm/llvm-project#99570 (comment). Co-authored-by: Louis Dionne <[email protected]>
Commit: 59ca618
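The stated policy fits in a small compile-time helper. This is a sketch of the rule as described above (1-, 2-, 4-, 8- and 16-byte types must be aligned to at least their size), not libc++'s implementation. `requiredAlignment`, `Rgb` and `AlignedCounter` are hypothetical names. In C++20 code, the alignment an implementation actually demands is exposed as `std::atomic_ref<T>::required_alignment`, and `alignas` is the usual way to meet it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper modelling the stated policy: types of 1, 2, 4, 8 or 16
// bytes must be aligned to at least their size; other sizes keep alignof(T).
// sizeof(T) is always a multiple of alignof(T), so the size is never smaller.
template <class T>
constexpr std::size_t requiredAlignment() {
  return (sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 ||
          sizeof(T) == 8 || sizeof(T) == 16)
             ? sizeof(T)
             : alignof(T);
}

// A 3-byte type is outside the listed sizes, so it keeps its natural alignment.
struct Rgb {
  unsigned char r, g, b;
};

// Over-aligning an 8-byte integer satisfies the policy even where
// alignof(long long) is only 4 (as on 32-bit x86).
struct alignas(8) AlignedCounter {
  long long value = 0;
};

static_assert(alignof(AlignedCounter) >= requiredAlignment<long long>(),
              "counter is suitably aligned");
```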
[InstCombine] Convert mem intrinsic with null into a noop (#100388)
When src/dest passed into memset/memcpy is null:
```
len == 0: this call is a noop.
len != 0: the behavior is undefined.
```
See also https://llvm.org/docs/LangRef.html#llvm-memset-intrinsics
Alive2: https://alive2.llvm.org/ce/z/tJeRNL
This patch converts these mem intrinsic calls into an assumption `len == 0` to mitigate code-size bloat caused by JumpThreading.
Commit: 4e89d11
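The decision can be modelled with a toy call representation. This sketch is hypothetical and is not InstCombine's API. If any pointer operand of a memset/memcpy is known to be null, the only well-defined execution is the one with `len == 0`, which does nothing. The call can therefore be replaced by an assumption of that fact.

```cpp
#include <cassert>

// Hypothetical, simplified model of a memory intrinsic call.
struct MemCall {
  bool destIsNull;
  bool srcIsNull;  // always false for memset, which has no source pointer
};

enum class Fold { Keep, AssumeLenIsZero };

// With a null operand, len != 0 is undefined behaviour and len == 0 is a
// no-op, so the call can become `assume(len == 0)`.
Fold foldNullMemIntrinsic(const MemCall &call) {
  if (call.destIsNull || call.srcIsNull)
    return Fold::AssumeLenIsZero;
  return Fold::Keep;
}
```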
[libc++][stringbuf] Test and document LWG2995. (#100879)
As mentioned in the LWG issue, libc++ has already implemented the optimization. This adds tests and documents the implementation-defined behaviour. Drive-by fixes an initialization.
Commit: d5a6ec1
Commit: 5ad15e5
Commit: e7630a0
[RISCV] Support f16 vmv.v.v and vmerge.vvm intrinsics with Zvfhmin. (#101457)
Clang expects that this works.
Commit: d2c0459
[Mem2Reg] Replace block maps with block numbers (#101391)
Very minor performance improvement.
Commit: e833e8b
[CodeGen] Merge lowerConstantIntrinsics into pre-isel lowering (#97727)
Currently, the LowerConstantIntrinsics pass does an RPO traversal of every function... only to find that many functions don't have constant intrinsics (is.constant, objectsize). In the CodeGen pipeline, there is already a pre-isel intrinsic lowering pass, which iterates over intrinsic declarations and lowers all users. Call lowerConstantIntrinsics from this pass to avoid the extra iteration over the entire IR and the RPO traversal.
Commit: b5fc083
[ConstantRange] Add support for `shlWithNoWrap` (#100594)
This patch adds initial support for `ConstantRange::shlWithNoWrap` to fold dtcxzyw/llvm-tools#22. However, this patch cannot fix the original issue. Improvements will be submitted in subsequent patches.
Commit: 1a5d892
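As a rough illustration of what folding a left shift with a no-wrap flag over a range involves, here is the simplest case. An unsigned 32-bit range `[lo, hi]` is shifted by one constant amount under `nuw`. This is not `ConstantRange::shlWithNoWrap`, which handles shift-amount ranges and signed wrap as well. `URange` and `shlNUW` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical inclusive unsigned range [lo, hi] over 32-bit values.
struct URange {
  std::uint32_t lo, hi;
};

// Shift every value in r left by a constant amount, assuming `nuw`.
// Inputs whose shifted value would wrap yield poison, so only the
// non-wrapping part [lo, limit] contributes. The result is the hull
// [lo << amount, hi << amount], a superset of the actual multiples of
// 2^amount. Returns nullopt when every input would wrap.
std::optional<URange> shlNUW(URange r, unsigned amount) {
  assert(amount < 32 && r.lo <= r.hi);
  const std::uint32_t limit = UINT32_MAX >> amount;  // largest x with no wrap
  if (r.lo > limit)
    return std::nullopt;
  const std::uint32_t hi = r.hi < limit ? r.hi : limit;
  return URange{r.lo << amount, hi << amount};
}
```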
[Hexagon] Do not optimize address of another function's block (#101209)
When the constant extender optimization pass encounters an instruction that uses an extended address pointing to another function's block, avoid adding the instruction to the extender list for the current machine function. Fixes llvm/llvm-project#99714
Commit: 68df06a
[libc] Remove verbose printing from hdrgen tool (#101376)
Summary: This fills the terminal with information already present from the `add_custom_command(COMMENT ...)` field, so it breaks everything into new lines. Remove this print to clean that up.
Commit: 6d40580
[Hexagon] Fix concat lowering for HVX for 64B vector length (#98318)
When a concatenation of vector instructions is formed, a vector rotation is performed as part of it. The direction of the shift was not calculated correctly; this fixes the rotation factor.
Commit: 2771ea4
[mlir][vector] Update tests for xfer-permute-lowering (nfc) (#101468)
Updates formatting and variable names in:
* vector-transfer-permutation-lowering.mlir
This is primarily to improve consistency, both within this particular test file and across tests. In particular, with this PR I'm adopting a naming convention similar to the one already present in vector-transfer-flatten.mlir. Overview of changes:
* All memref input arguments are renamed as `%mem`.
* All vector input arguments are renamed as `%vec`.
* All tensor input arguments are renamed as `%dest`.
* LIT variables are updated to be consistent with input arguments.
* Renamed all output arguments as `%res`.
* Updated indentation to be more C-like.
Commit: 98e4413
[flang][runtime] Avoid call recursion in CopyElement runtime. (#101421)
Device compilers may fail to identify the maximum stack size required by a kernel that calls CopyElement due to potential recursive calls. To avoid this, we can use a dynamically allocated Stack. To avoid dynamic allocations on the host for simple cases, the Stack implementation has a reserved space (that ends up being allocated on the program stack). I tested both pre-allocated and 0-reserve implementations on the host, and all passed. The actual reserve values might be tuned as needed.
Commit: 2177a17
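The general technique is to replace recursion with an explicit work stack whose first few slots live inline, so shallow cases never touch the heap. Below is a minimal sketch of that technique. It uses a simple tree in place of Flang's descriptor and derived-type machinery. `SmallStack`, `Node` and `countLeaves` are hypothetical names, and the inline reserve of 8 is arbitrary; the commit notes the real values may be tuned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stack keeping the first N entries in inline storage and only
// allocating from the heap once that reserve is exhausted.
template <class T, std::size_t N>
class SmallStack {
public:
  void push(const T &v) {
    if (size_ < N)
      inline_[size_] = v;
    else
      overflow_.push_back(v);
    ++size_;
  }
  T pop() {
    assert(size_ > 0 && "pop from empty stack");
    --size_;
    if (size_ < N)
      return inline_[size_];
    T v = overflow_.back();
    overflow_.pop_back();
    return v;
  }
  bool empty() const { return size_ == 0; }

private:
  T inline_[N] = {};
  std::vector<T> overflow_;
  std::size_t size_ = 0;
};

// Hypothetical tree standing in for a value with nested components.
struct Node {
  std::vector<Node> children;
};

// Visits every node without recursive calls and counts the leaves.
std::size_t countLeaves(const Node &root) {
  SmallStack<const Node *, 8> work;
  work.push(&root);
  std::size_t leaves = 0;
  while (!work.empty()) {
    const Node *n = work.pop();
    if (n->children.empty())
      ++leaves;
    for (const Node &child : n->children)
      work.push(&child);
  }
  return leaves;
}
```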
[flang] Add ability to have special allocator for descriptor data (#100690)
This patch enhances the descriptor with the ability to have a specialized allocator. The allocators are registered in a dedicated registry and the index of the desired allocator is stored in the descriptor. The default allocator, std::malloc, is registered at index 0. In order to have this allocator index in the descriptor, the f18Addendum field is repurposed to hold both the presence flag for the addendum (lsb) and the allocator index. Since this is a change in the semantics and name of the 7th field of the descriptor, the CFI_VERSION is bumped to the date of the initial change. This patch only adds the ability to have this feature as part of the descriptor but does not add any specific allocator yet. CUDA Fortran will be the first user of this feature, to allocate descriptor data in the different types of device memory based on the CUDA attribute. --------- Co-authored-by: Slava Zakharin <[email protected]>
Commit: 6df4e7c
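The repurposed field packs two things into one integer: the addendum presence flag in the least significant bit and the allocator index in the remaining bits. The sketch below shows that packing, plus a tiny registry whose index 0 forwards to `std::malloc`. The 8-bit field width and all names (`packExtra`, `hasAddendum`, `allocatorIndex`, `AllocatorRegistry`) are illustrative assumptions, not Flang's runtime API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical encoding of the repurposed field:
// bit 0 = "addendum present" flag, remaining bits = allocator index.
constexpr std::uint8_t packExtra(bool addendumPresent, unsigned allocatorIdx) {
  return static_cast<std::uint8_t>((allocatorIdx << 1) |
                                   (addendumPresent ? 1u : 0u));
}
constexpr bool hasAddendum(std::uint8_t extra) { return (extra & 1u) != 0; }
constexpr unsigned allocatorIndex(std::uint8_t extra) { return extra >> 1u; }

// Hypothetical registry: index 0 is the default allocator, backed by std::malloc.
using AllocFn = void *(*)(std::size_t);

inline void *defaultAlloc(std::size_t bytes) { return std::malloc(bytes); }

struct AllocatorRegistry {
  static constexpr unsigned kMaxAllocators = 8;
  AllocFn fns[kMaxAllocators] = {&defaultAlloc};  // remaining slots are null

  void *allocate(unsigned idx, std::size_t bytes) const {
    assert(idx < kMaxAllocators && fns[idx] != nullptr);
    return fns[idx](bytes);
  }
};
```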
Commit: c7c5e05
[NFC][asan][odr] Use IntrusiveList for a ListOfGlobals
Extracted from #100923.
Commit: 2a5f7e5
[libc] Implement vasprintf and asprintf (#98824)
Co-authored-by: Izaak Schroeder <[email protected]>
Commit: a5e67fb
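`asprintf`/`vasprintf` format into a newly allocated buffer and return the number of characters written, or -1 on failure. One portable way to build them is to measure first and then format. The sketch below does this on top of `std::vsnprintf` rather than LLVM libc's internal printf machinery. The `my_` prefixes mark these functions as hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

// Sketch of vasprintf: measure, allocate, then format into the new buffer.
int my_vasprintf(char **out, const char *fmt, va_list args) {
  va_list measure;
  va_copy(measure, args);  // vsnprintf consumes the list, so copy it first
  const int len = std::vsnprintf(nullptr, 0, fmt, measure);
  va_end(measure);
  if (len < 0)
    return -1;
  const std::size_t size = static_cast<std::size_t>(len) + 1;  // + NUL
  char *buf = static_cast<char *>(std::malloc(size));
  if (!buf)
    return -1;
  std::vsnprintf(buf, size, fmt, args);
  *out = buf;
  return len;
}

// asprintf just packages its variadic arguments and forwards them.
int my_asprintf(char **out, const char *fmt, ...) {
  va_list args;
  va_start(args, fmt);
  const int len = my_vasprintf(out, fmt, args);
  va_end(args);
  return len;
}
```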
Commit: 0c31123
[MachO] Remove redundant bounds check (#100176)
The condition was duplicated, the correct one for this message would have been `ImportsEnd > SymbolsEnd`. However, this is a subset of `ImportEnd > Symbols` (since `Symbols <= SymbolsEnd`), so it can be removed altogether. I made this thinko in 686d8ce. Note that that change wasn't intended to be permanent, and served as a quick stopgap to facilitate testing chained fixups in LLD before Apple upstreamed their implementation. Fixes #90662 Fixes #87203
Commit: 7da1dbb
[ELF] Support relocatable files using CREL with explicit addends
... using the temporary section type code 0x40000020 (`clang -c -Wa,--crel,--allow-experimental-crel`). LLVM will change the code and break compatibility (Clang and lld of different versions are not guaranteed to cooperate, unlike other features). CREL with implicit addends is not supported.
---
Introduce `RelsOrRelas::crels` to iterate over SHT_CREL sections and update users to check `crels`. (The decoding performance is critical and error checking is difficult. Follow `skipLeb` and `R_*LEB128` handling; do not use `llvm::decodeULEB128`, which compiles to a lot of code.)
A few users (e.g. .eh_frame, LLDDwarfObj, s390x) require random access. Pass `/*supportsCrel=*/false` to `relsOrRelas` to allocate a buffer and convert CREL to RELA (`relas` instead of `crels` will be used). Since allocating a buffer increases memory usage, the conversion is only performed when absolutely necessary.
---
Non-alloc SHT_CREL sections may be created in -r and --emit-relocs links. SHT_CREL and SHT_RELA components need re-encoding since r_offset/r_symidx/r_type/r_addend may change. (r_type may change because relocations referencing a symbol in a discarded section are converted to `R_*_NONE`.)
* SHT_CREL components: decode with `RelsOrRelas` and re-encode (`OutputSection::finalizeNonAllocCrel`)
* SHT_RELA components: convert to CREL (`relToCrel`). An output section can only have one relocation section.
* SHT_REL components: print an error for now. SHT_REL to SHT_CREL conversion for -r/--emit-relocs is complex and not yet supported.
Link: https://discourse.llvm.org/t/rfc-crel-a-compact-relocation-format-for-elf/77600
Pull Request: llvm/llvm-project#98115
Commit: 0af07c0
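CREL stores its fields as LEB128 variable-length integers, hence the concern above about `skipLeb` and `llvm::decodeULEB128`. The sketch below shows plain unsigned LEB128: 7 payload bits per byte, with the high bit set on every byte except the last. It deliberately omits the CREL record layout and the error checking that a robust decoder performs. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append `value` as unsigned LEB128: low 7 bits first, 0x80 = "more follows".
void encodeULEB128(std::uint64_t value, std::vector<std::uint8_t> &out) {
  do {
    std::uint8_t byte = static_cast<std::uint8_t>(value & 0x7f);
    value >>= 7;
    if (value != 0)
      byte |= 0x80;
    out.push_back(byte);
  } while (value != 0);
}

// Decode one value starting at `p` and advance `p` past it.
// No bounds or overflow checks: callers must supply well-formed input.
std::uint64_t decodeULEB128Unchecked(const std::uint8_t *&p) {
  std::uint64_t result = 0;
  unsigned shift = 0;
  std::uint8_t byte;
  do {
    byte = *p++;
    result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return result;
}
```

For example, 624485 encodes to the three bytes `E5 8E 26`.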
[SandboxIR][NFC] Introduce templated CastInstImpl to simplify subclasses (#101427)
The CastInst subclasses all have pretty much the same implementation. Add a helper templated class to help stamp out the subclasses more succinctly.
Commit: d68a4d5
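The pattern is a class template parameterized by the opcode, so each concrete cast class shrinks to a one-line declaration instead of a hand-written subclass. This is a simplified sketch. The opcode enum, the base class and the LLVM-style `classof` hook are assumptions for illustration and do not reproduce SandboxIR's actual code.

```cpp
#include <cassert>

// Hypothetical opcode set and common base class.
enum class Opcode { Trunc, ZExt, SExt, FPTrunc, FPExt, UIToFP };

class CastInst {
public:
  explicit CastInst(Opcode op) : op_(op) {}
  virtual ~CastInst() = default;
  Opcode getOpcode() const { return op_; }

private:
  Opcode op_;
};

// One template stamps out every concrete cast class.
template <Opcode Op>
class CastInstImpl : public CastInst {
public:
  CastInstImpl() : CastInst(Op) {}
  // LLVM-style RTTI hook: is this CastInst really a CastInstImpl<Op>?
  static bool classof(const CastInst *I) { return I->getOpcode() == Op; }
};

using TruncInst = CastInstImpl<Opcode::Trunc>;
using ZExtInst = CastInstImpl<Opcode::ZExt>;
using SExtInst = CastInstImpl<Opcode::SExt>;
```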
[SystemZ][z/OS] Fix incorrect codegen for ADA_ENTRY pseudo instruction (#101415)
The current MCInstBuilder for generating an ALGFI when loading something from the ADA is incorrect and will crash the compiler. r0 must also be excluded from the registers returned as the result, since it is treated as the value "0" on z/OS. Also add some tests to properly test the paths where LLILF and ALGFI are generated. --------- Co-authored-by: Tony Tao <[email protected]>
Commit: bc747c3
[libc] created fuzz test for sin function (#101411)
Verifies that sin function output is correct by comparing with MPFR output. NaN and inf are not tested (as our output will vary compared to MPFR), and signed zeroes are already tested in unit tests.
Commit: 90065da
[libc] Fix math fuzzers (#101529)
Fix minor typos that accumulated while the math fuzzers were disabled.
Commit: 3497211
[libc] heap_sort_fuzz deleted unnecessary includes (#101535)
Including src/__support/macros/config.h is unnecessary.
Commit: 83e6d87
AMDGPU: Handle remote/fine-grained memory in atomicrmw fmin/fmax lowering (#96759)
Consider the new atomic metadata when choosing to expand as cmpxchg instead.
Commit: 41439d5
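Expanding an `atomicrmw fmin` "as cmpxchg" means emitting a compare-and-swap retry loop. The sketch below shows that loop shape in portable C++ with `std::atomic<float>`. It illustrates the generic expansion and is not AMDGPU's generated code. Using `std::fmin` for the NaN behaviour is an assumption of the sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>

// Atomically replace *addr with fmin(*addr, value) and return the previous
// contents, mirroring atomicrmw semantics.
float atomicFMin(std::atomic<float> &addr, float value) {
  float expected = addr.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak reloads `expected` with the current
  // value, so each retry recomputes the minimum from fresh data.
  while (!addr.compare_exchange_weak(expected, std::fmin(expected, value),
                                     std::memory_order_seq_cst,
                                     std::memory_order_relaxed)) {
  }
  return expected;
}
```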
[libc++] Check correctly ref-qualified __is_callable in algorithms (#73451)
We were only checking that the comparator was rvalue callable, when in reality the algorithms always call comparators as lvalues. This patch also refactors the tests for callable requirements and expands them to cover a few missing algorithms. Fixes #69554
Commit: 8d151f8
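The distinction matters because a comparator can be callable as an rvalue but not as an lvalue. The C++17 snippet below shows such a type using `std::is_invocable_v`. `RvalueOnlyLess` is a contrived, hypothetical example, and libc++'s internal `__is_callable` trait is not reproduced here.

```cpp
#include <cassert>
#include <type_traits>

// Contrived comparator whose call operator only accepts an rvalue *this.
struct RvalueOnlyLess {
  bool operator()(int a, int b) && { return a < b; }
};

// Callable as an rvalue, but not as an lvalue, which is how algorithms
// invoke a stored comparator (`comp(a, b)`).
static_assert(std::is_invocable_v<RvalueOnlyLess, int, int>,
              "rvalue call is well-formed");
static_assert(!std::is_invocable_v<RvalueOnlyLess &, int, int>,
              "lvalue call is ill-formed");
```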
[AMDGPU][True16][MC] Support v_swap_b16. (#100442)
Support V_SWAP_B16 true16 encoding in asm/disasm for GFX11/12. Co-authored-by: guochen2 <[email protected]>
Commit: ab91371
[lld][InstrProf] Add "Separate" irpgo-profile-sort option (#101084)
Add the "Separate" option `--irpgo-profile-sort <profile>` in addition to the "Joined" option `--irpgo-profile-sort=<profile>`. This is useful if the path has a `,` for some reason, which would break when trying to use `-Wl,--irpgo-profile-sort=<profile-with-comma>`. While I'm here, use `static_cast<>` instead of the C-style cast introduced in llvm/llvm-project#100627
Commit: f95bd62
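The comma problem comes from how the compiler driver handles `-Wl,<args>`: it splits the text on commas and passes each piece to the linker as a separate argument. The hypothetical `splitWlArgs` helper below mimics that splitting to show how a Joined value containing a comma gets broken apart.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper mimicking how a driver expands "-Wl,a,b,c" into the
// separate linker arguments "a", "b" and "c".
std::vector<std::string> splitWlArgs(const std::string &flag) {
  const std::string prefix = "-Wl,";
  assert(flag.compare(0, prefix.size(), prefix) == 0 && "expected a -Wl, flag");
  std::vector<std::string> args;
  std::string current;
  for (std::string::size_type i = prefix.size(); i < flag.size(); ++i) {
    if (flag[i] == ',') {
      args.push_back(current);  // every comma ends one linker argument
      current.clear();
    } else {
      current += flag[i];
    }
  }
  args.push_back(current);
  return args;
}
```

For `-Wl,--irpgo-profile-sort=run,1.profdata`, the linker would receive `--irpgo-profile-sort=run` and a stray `1.profdata`.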
workflows: Fix libclc-tests (#101524)
The old out-of-tree build configuration stopped working and in tree builds are supported now, so we should use the in tree configuration. The only downside is we can't run the tests any more, but at least we will be able to test the build again.
Commit: 0512ba0
[LV] Add more tests with switches.
Extra tests for llvm/llvm-project#99808, including cost model tests.
Commit: 8557035
[SandboxIR] Implement the remaining CastInst sub-classes (#101537)
This patch implements: sandboxir::UIToFPInst, sandboxir::FPExtInst, sandboxir::FPTruncInst, sandboxir::SExtInst, sandboxir::ZExtInst, sandboxir::TruncInst
Commit: b6b0a24
[libc] Use LLVM CommandLine for loader tool (#101501)
Summary: This patch removes the ad-hoc parsing that I used previously and replaces it with the LLVM CommandLine interface. This doesn't change any functionality, but makes it easier to maintain.
Commit: 5e32698
[clang-format] Rename variable more sensitively (#100943)
Renaming to `Disallowed`.
Commit: 18b58d4
[clang] fix classification of a string literal expression used as initializer (#101447)
Commit: ea46e20
[Clang][NFC] Improve generation of GEP and RecordDecl loop (#101434)
As with other loops, we need only look at a RecordDecl's FieldDecls. Convert to using them. In the meantime, we can improve the generation of the 'counted_by' FieldDecl's GEP by creating one GEP instead of a series of GEPs.
Commit: 160fb11
[flang] Add allocator_idx attribute on fir.embox and fircg.ext_embox (#101212)
#100690 introduces an allocator registry with the ability to store the allocator index in the descriptor. This patch adds an attribute to fir.embox and fircg.ext_embox to be able to set the allocator index while populating the descriptor fields.
Commit: 0def9a9
[libc++] Revert "Check correctly ref-qualified __is_callable in algorithms (#73451)"
This reverts commit 8d151f8, which broke some build bots. I think that is caused by an invalid argument order when checking __is_comparable in upper_bound.
Commit: 451bba6
[Clang] Fix nomerge attribute not working with __builtin_trap(), __debugbreak(), __builtin_verbose_trap() (#101549)
1. It fixes the problem of llvm.trap() not getting the nomerge attribute.
2. It sets the nomerge flag for the node if the instruction has the nomerge attribute.
This is a copy of https://reviews.llvm.org/D146164. This only attempts to fix `nomerge` for `__builtin_trap()`, `__debugbreak()` and `__builtin_verbose_trap()`; it does not address non-trap builtins. Fixes #53011
Commit: 5e84646
Commit: e89129e
[SCEV] Prove no-self-wrap from negative power of two step (#101416)
We have existing code which reasons that a step evenly dividing the iteration space of a finite loop with a single exit implies no-self-wrap. The sign of the step doesn't affect this. --------- Co-authored-by: Nikita Popov <[email protected]>
Commit: f0944f4
Commit: 97f723b
[libc][math][c23] Add dadd{l,f128} and ddiv{l,f128} C23 math functions (#100456)
- fadd removed because I need to add for different input types
- finishing rest of basic operations
- noticed duplicates will remove
--------- Co-authored-by: OverMighty <[email protected]>
Commit: 8f33f1d
[asan] Speed up ASan ODR indicator-based checking (#100923)
**Summary**: When ASan checks for a potential ODR violation on a global, it loops over a linked list of all globals to find those with the matching value of an indicator. With the default setting 'detect_odr_violation=1', ASan doesn't report violations on same-size globals, but it still has to traverse the list. For larger binaries with a ton of shared libs and globals (and a non-trivial volume of same-sized duplicates) this gets extremely expensive.
This patch adds an indicator-indexed (multi-)map of globals to speed up the search.
> Note: asan used to use a map to store globals a while ago which was replaced with a list when the codebase [moved off of STL](llvm/llvm-project@e4bada2).
Internally we see many examples where ODR checking takes *seconds* (even double digits). With this patch it's practically free and `__asan_register_globals` doesn't show up prominently in the perf profile anymore.
There are several high-level questions:
1. I understand that the intent is that we hit the slow path rarely, ideally once before the process dies with an error. But in practice we hit the slow path a lot. It feels reasonable to keep the amount of work bounded even in the worst case, even if it requires a bit of extra memory. But if not, it'd be great to learn about the tradeoffs.
2. Poisoning-based ODR checking remains on the slow path. Internally we build everything with `-fsanitize-address-use-odr-indicator`, so I'm not sure if the poisoning-based check would exhibit the same behavior (looking at the code, the shape looks very similar, so it might?).
3. Globals with an ODR indicator of `-1` need to be skipped for the purposes of ODR checking (cf. llvm/llvm-project@a257639). But they are still getting added to the list of globals and hence take up space and slow down the iteration over the list of globals. It would be a good saving if we could avoid adding them to the globals list.
4. Any reason to use a linked list instead of e.g. a vector to store globals?
**Test Plan**:
* `cmake --build build --target check-asan` looks good
* Perf-wise things look good when linking against this version of compiler-rt.
--------- Co-authored-by: Vitaly Buka <[email protected]>
Commit: c584c42
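The core idea is an index keyed by ODR indicator, so that registering a global only touches globals that share its indicator instead of walking the whole list. The sketch below uses a standard multimap. The real runtime avoids the STL and uses its own containers. `GlobalInfo` and `OdrIndex` are hypothetical stand-ins, and the size comparison follows the default `detect_odr_violation=1` behaviour described above. Indicators equal to `-1` are skipped, as the commit message says they should be.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified description of an instrumented global.
struct GlobalInfo {
  std::string name;
  std::size_t size;
  std::uintptr_t odrIndicator;
};

class OdrIndex {
public:
  // Registers a global and returns the names of previously registered
  // globals with the same indicator but a different size (same-size
  // duplicates are not reported, as with detect_odr_violation=1).
  std::vector<std::string> registerGlobal(const GlobalInfo &g) {
    std::vector<std::string> conflicts;
    if (g.odrIndicator == static_cast<std::uintptr_t>(-1))
      return conflicts;  // indicator -1: skipped and not indexed
    auto range = byIndicator_.equal_range(g.odrIndicator);
    for (auto it = range.first; it != range.second; ++it)
      if (it->second.size != g.size)
        conflicts.push_back(it->second.name);
    byIndicator_.emplace(g.odrIndicator, g);
    return conflicts;
  }

private:
  std::unordered_multimap<std::uintptr_t, GlobalInfo> byIndicator_;
};
```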
Simplify hot-path size computations in BumpPtrAllocator. (#101467)
Commit: 289c049
[libc++] Improve code gen for string's operator== (#100926)
If the string is too long for a short string, we can simply check for the long bit. If that's false we can do an early return. This improves the code gen slightly.
Commit: 3af26be
[libc++][NFC] Fix inconsistent quoting and spacing in our CSV files
There were a few places where we didn't properly quote entries in the CSV status pages, or where we followed inconsistent spacing. This causes issues when trying to synchronize status pages with Github issues.
Commit: 64946fd
[libc++] Add status page consistency change to git-blame-ignore-revs
To avoid breaking searchability of when a paper was implemented.
Commit: 8d83fae
Revert "[Clang] Fix nomerge attribute not working with __builtin_trap(), __debugbreak(), __builtin_verbose_trap() (#101549)"
This reverts commit 5e84646, which broke the 'nomerge.ll' test on llvm bots.
Commit: 667598d
Commit: b45d362
Commit: 7471387
[Offload][OpenMP] Prettify error messages by "demangling" the kernel name (#101400)
The kernel names for OpenMP are manually mangled and not ideal when we report something to the user. We demangle them now, providing the function and line number of the target region, together with the actual kernel name.
Commit: f3bfc56
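Clang typically names OpenMP offload entries `__omp_offloading_<device-id>_<file-id>_<function>_l<line>`, with hexadecimal IDs. Assuming that pattern, a minimal sketch of recovering the function and line for an error message could look like the code below. `parseOmpKernelName` and `KernelLocation` are hypothetical names. The real implementation may handle additional suffixes and may also demangle the C++ function name.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

struct KernelLocation {
  std::string function;
  unsigned line;
};

// Hypothetical parser for names like "__omp_offloading_10302_5c0f4a_main_l12".
std::optional<KernelLocation> parseOmpKernelName(const std::string &name) {
  const std::string prefix = "__omp_offloading_";
  if (name.compare(0, prefix.size(), prefix) != 0)
    return std::nullopt;
  // Skip the "<device-id>_" and "<file-id>_" fields.
  std::string::size_type pos = prefix.size();
  for (int field = 0; field < 2; ++field) {
    pos = name.find('_', pos);
    if (pos == std::string::npos)
      return std::nullopt;
    ++pos;
  }
  // The line number follows the last "_l"; everything before it is the function.
  const std::string::size_type marker = name.rfind("_l");
  if (marker == std::string::npos || marker <= pos || marker + 2 == name.size())
    return std::nullopt;
  unsigned line = 0;
  for (std::string::size_type i = marker + 2; i < name.size(); ++i) {
    if (!std::isdigit(static_cast<unsigned char>(name[i])))
      return std::nullopt;
    line = line * 10 + static_cast<unsigned>(name[i] - '0');
  }
  return KernelLocation{name.substr(pos, marker - pos), line};
}
```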
Reapply "[Clang] Fix nomerge attribute not working with __builtin_trap(), __debugbreak(), __builtin_verbose_trap() (#101549)"
This reverts commit 667598d and fixes the failed tests: llvm/test/CodeGen/X86/nomerge.ll and llvm/test/MC/AArch64/local-bounds-single-trap.ll.
Commit: ae6dc64
Fix codegen of consteval functions returning an empty class, and related issues (#93115)
If a class is empty, don't store it to memory: the store might overwrite useful data. Similarly, if a class has tail padding that might overlap other fields, don't store the tail padding to memory. The problem here turned out a bit more general than I initially thought: basically all uses of EmitAggregateStore were broken. Call lowering had a method that did mostly the right thing, though: CreateCoercedStore. Adapt CreateCoercedStore so it always does the conservatively right thing, and use it for both calls and ConstantExpr. Also, along the way, fix the "overlap" bit in AggValueSlot: the bit was set incorrectly for empty classes in some cases. Fixes #93040.
Commit: 1762e01
Add support for verifying local type units in .debug_names. (#101133)
This patch adds support for verifying local type units in .debug_names section. It adds a test to test if the TU index is valid, and a test that tests that an error is found inside the name entry for a type unit. We don't need to test all other errors in the name entry because these are essentially identical to compile unit entries, they just use a different DWARF unit offset index.
Commit: b6a2eb0
[libc] created tan function fuzzer (#101570)
Also edited file header formatting on sin_fuzz and cos_fuzz
Commit: 0142bd6
[mlir][emitc] Fix EmitC dialect's operations' descriptions (#101523)
- Added the dialect's prefix to operations' descriptions to follow the same style inside the TableGen file.
- Minor changes in the 'emitc.yield' operation's description.
Commit: c89e9e7
Add a tutorial on mlir-opt (#96105)
This tutorial gives an introduction to the `mlir-opt` tool, focusing on how to run basic passes with and without options and how to run pass pipelines from the CLI, and pointing out particularly useful flags.
Co-authored-by: Jeremy Kun
Co-authored-by: Mehdi Amini
Commit 7f19686
Commits on Aug 2, 2024
[SandboxIR] Implement UnaryInstruction class (#101541)
This patch implements sandboxir::UnaryInstruction class and updates sandboxir::LoadInst and sandboxir::CastInst to inherit from it instead of sandboxir::Instruction.
Commit f9392fc
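As a rough illustration of the class-hierarchy change described in the SandboxIR commit, the following C++ sketch places a single-operand base class between `Instruction` and its load/cast subclasses. All classes here are hypothetical, heavily simplified stand-ins, not the real `sandboxir` API (which wraps LLVM IR and has many more members).

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; not the real sandboxir classes.
class Value {
public:
  virtual ~Value() = default;
};

class Instruction : public Value {};

// Shared base for instructions that take exactly one operand.
class UnaryInstruction : public Instruction {
public:
  explicit UnaryInstruction(Value *Op) : Op(Op) {}
  Value *getOperand() const { return Op; }

private:
  Value *Op;
};

// A load's single operand is the pointer it reads from.
class LoadInst : public UnaryInstruction {
public:
  using UnaryInstruction::UnaryInstruction;
};

// A cast's single operand is the value being converted.
class CastInst : public UnaryInstruction {
public:
  using UnaryInstruction::UnaryInstruction;
};
```

With the shared base, code that only needs "the one operand" can handle loads and casts uniformly through `UnaryInstruction` instead of special-casing each subclass.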
[M68k] Fix compilation pipeline check
- After 'lowerConstantIntrinsics' is merged into pre-isel lowering
Commit 7b0f143
Commit 54c9404
[lldb] Change Module to have a concrete UnwindTable, update (#101130)
Currently a Module has a std::optional<UnwindTable>, which is created when the UnwindTable is requested from outside the Module. The idea is to delay its creation until the Module has an ObjectFile initialized, which will have been done by the time we're doing an unwind.
However, Module::GetUnwindTable wasn't doing any locking, so two threads could ask for the UnwindTable for the first time at once: one would create and return a table while the other created its own, destroying the first in the process of emplacing it. It was an uncommon crash, but it was possible.
Grabbing the Module's mutex would be one way to address it, but when loading ELF binaries, we start creating the SymbolTable on one thread (ObjectFileELF) while holding the Module's mutex, and then spin up worker threads to parse the individual DWARF compilation units. Those threads also try to get the UnwindTable, and would deadlock if they tried to acquire the Module's mutex.
This changes Module to have a concrete UnwindTable as an ivar. When the Module adds an ObjectFile or SymbolFileVendor, it calls the Update method on the UnwindTable, which re-evaluates which sections exist in the ObjectFile/SymbolFile.
UnwindTable used to have an Initialize method, which set all the sections, and an Update method, which set some of them if they weren't already set. I unified these, with the Initialize method taking a `force` option to re-initialize the section pointers even if they had already been set.
This addresses a rare crash report we've received, and also a failure Adrian spotted on the -fsanitize=address CI bot last week. It's still uncommon with ASAN, but it can happen with the standard testsuite.
rdar://128876433
Commit 7ad073a
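The design in the lldb commit (a concrete member refreshed in place, with an `Initialize(force)` option, instead of a lazily emplaced `std::optional`) can be sketched in C++ as follows. `Section`, `UnwindTableSketch`, and `ModuleSketch` are hypothetical stand-ins, not LLDB's classes; the table-local mutex and `Update` delegating to `Initialize(..., true)` are assumptions, not details stated in the commit.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for an object-file section; not LLDB's Section type.
struct Section {
  int id;
};

// Sketch of an unwind table that always exists and is refreshed in place.
class UnwindTableSketch {
public:
  // Without `force`, only fills in pointers that are still unset;
  // with `force`, re-evaluates all of them.
  void Initialize(const Section *eh_frame, const Section *debug_frame,
                  bool force = false) {
    std::lock_guard<std::mutex> guard(m_mutex); // assumption: table-local lock
    if (force || m_eh_frame == nullptr)
      m_eh_frame = eh_frame;
    if (force || m_debug_frame == nullptr)
      m_debug_frame = debug_frame;
  }

  // Assumption: Update forces re-evaluation of every section pointer.
  void Update(const Section *eh_frame, const Section *debug_frame) {
    Initialize(eh_frame, debug_frame, /*force=*/true);
  }

  const Section *GetEHFrame() const {
    std::lock_guard<std::mutex> guard(m_mutex);
    return m_eh_frame;
  }

private:
  mutable std::mutex m_mutex;
  const Section *m_eh_frame = nullptr;
  const Section *m_debug_frame = nullptr;
};

class ModuleSketch {
public:
  // No lazy emplace and no Module-wide lock: the table already exists, so
  // every caller gets a reference to the same, never-destroyed object.
  UnwindTableSketch &GetUnwindTable() { return m_unwind_table; }

  // Called when an object file (or symbol file) is added.
  void AddObjectFile(const Section *eh_frame, const Section *debug_frame) {
    m_unwind_table.Update(eh_frame, debug_frame);
  }

private:
  UnwindTableSketch m_unwind_table; // concrete member, not std::optional
};
```

Because the member is never replaced, a reference obtained before an object file is added stays valid afterwards, which is the property the lazy `std::optional` emplace could violate.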
Commit c4fac0e
Commit c5f1395
[X86_32][C++] fix 0 sized struct case in vaarg. (#86388)
`struct SuperEmpty { struct { int a[0]; } b; };`
In C++ mode on i386, such zero-sized structs cannot be ignored, because C++ fields are never empty. However, EmitVAArg sees their size as 0, so the va_list is not advanced. We can simply ignore this kind of argument, as X86_64 does.
Fixes #86385.
Commit 4461b69
[mlir][bufferization] Improve performance of DropEquivalentBufferResultsPass (#101281)
Use a DenseMap to minimize the traversal time of callOps, which greatly improves the efficiency of running this pass.
Commit 6867324
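The bufferization commit gives no code, but the general idea of replacing repeated scans over call ops with a one-time index can be sketched as follows. This C++ sketch uses `std::unordered_map` in place of LLVM's `DenseMap`, and `CallSite`, `indexCallsByCallee`, and `callsTo` are hypothetical names; it is not the pass's actual implementation.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a call operation: which function it calls.
struct CallSite {
  std::string callee;
  int id;
};

using CallIndex =
    std::unordered_map<std::string, std::vector<const CallSite *>>;

// One pass over all calls (O(#calls)) builds a callee -> call sites map.
CallIndex indexCallsByCallee(const std::vector<CallSite> &calls) {
  CallIndex index;
  for (const CallSite &call : calls)
    index[call.callee].push_back(&call);
  return index;
}

// Each per-function query is then an average O(1) hash lookup instead of a
// fresh scan over every call in the module.
const std::vector<const CallSite *> &callsTo(const CallIndex &index,
                                             const std::string &fn) {
  static const std::vector<const CallSite *> none;
  auto it = index.find(fn);
  return it == index.end() ? none : it->second;
}
```

The index stores pointers into the `calls` vector, so that vector must outlive the index and must not be resized while the index is in use.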
Commit e3d9b01
Commit ca26ea2
[Attributor] Indicate optimistic fixed point if an instruction already has non-zero address space (#101589)
Commit 9373a43
Commit 6c375ae
[Attributor] Use `getPointerAddressSpace` to replace a cast followed by a `getAddressSpace`
Commit e7f73c0
[RISCV] Use Zvfhmin instead of Zvfh on RUN lines for some intrinsic tests. NFC (#101540)
Loads/stores/reinterpret/vfncvt.f.f.w/vfwcvt.f.f.v/vmerge/vmv.v.v are all expected to work for f16 vectors with Zvfhmin. Remove the handcrafted Zvfhmin test that partially tested this. Split the vfwcvt.f.f.v and vfncvt.f.f.w tests into their own file so they can have a separate RUN line from the float<->int conversions.
Commit 7a134f5
[LoongArch] Align stack objects passed to memory intrinsics (#101309)
Memcpy, and other memory intrinsics, typically try to use wider load/store if the source and destination addresses are aligned. In CodeGenPrepare, look for calls to memory intrinsics and, if the object is on the stack, align it to 4-byte (32-bit) or 8-byte (64-bit) boundaries if it is large enough that we expect memcpy to use wider load/store instructions to copy it. Fixes #101295
Commit 8b26c02
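The alignment decision described in the LoongArch commit can be sketched as a small C++ helper. The function name is hypothetical, and the size threshold (the object must be at least as large as one 4-byte or 8-byte access) is an assumption, since the commit only says "large enough that we expect memcpy to use wider load/store instructions".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: preferred alignment (in bytes) for a stack object
// that is passed to a memory intrinsic such as memcpy.
// Assumption: "large enough" means the object can hold at least one wide
// access (4 bytes on 32-bit targets, 8 bytes on 64-bit targets).
std::uint64_t preferredStackObjectAlign(std::uint64_t objectSize,
                                        std::uint64_t currentAlign,
                                        bool is64Bit) {
  const std::uint64_t wideAccess = is64Bit ? 8 : 4;
  if (objectSize < wideAccess)
    return currentAlign; // too small to benefit from wider loads/stores
  // Raise the alignment, but never lower one that is already stronger.
  return currentAlign < wideAccess ? wideAccess : currentAlign;
}
```

For example, a 16-byte byte-aligned buffer would be raised to 8-byte alignment on a 64-bit target, while a 3-byte object keeps its original alignment.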
[SPARC][IAS] Add v8plus feature bit (#101367)
Implement handling for `v8plus` feature bit to allow the user to switch between V8 and V8+ mode with 32-bit code. Currently this only sets the appropriate ELF machine type and flags; codegen changes will be done in future patches. This is done as a prerequisite for `-mv8plus` flag on clang (#98713).
Commit aca971d
Merge from 'sycl' to 'sycl-web'
iclsrc committed Aug 2, 2024
Commit 668af1c
[HLSL] cleanup builtin names elementwise usage (#101543)
Remove elementwise description for builtins that don't perform elementwise operations.
Commit 96e6255
Commits on Aug 5, 2024
Merge from 'main' to 'sycl-web' (207 commits)
CONFLICT (content): Merge conflict in clang/lib/CodeGen/CGExprAgg.cpp
Commit 9fe070a
Merge from 'sycl' to 'sycl-web' (12 commits)
CONFLICT (content): Merge conflict in sycl/CMakeLists.txt
Commit 5685396
Merge from 'sycl' to 'sycl-web' (5 commits)
CONFLICT (content): Merge conflict in llvm/lib/SYCLLowerIR/SYCLVirtualFunctionsAnalysis.cpp
Commit c281123
Commit ab66ccc
Merge from 'sycl' to 'sycl-web' (3 commits)
iclsrc committed Aug 5, 2024
Commit 9c4aab8
Commits on Aug 15, 2024
Commit 5e66b4f
Commit 7e1776d
Update LLVM version from 19 to 20 (#2652)
Original commit: KhronosGroup/SPIRV-LLVM-Translator@097435f74df64bd
Commit 8a14027
Update llvm.memmove test after LLVM change (#2655)
Update a test after llvm-project commit 92a0654 ("[LowerMemIntrinsics] Lower llvm.memmove to wide memory accesses (#100122)", 2024-07-26).
Original commit: KhronosGroup/SPIRV-LLVM-Translator@84f525abd741c30
Commit 30b827e
Upgrade in-tree job to Ubuntu 22.04 (#2658)
The spirv-tools package used by the job no longer seems to be available for Ubuntu 20.04.
Original commit: KhronosGroup/SPIRV-LLVM-Translator@88e546a689b2679
Commit d0c3bea
Fix addrspace generation in reverse translation for global annotations (#2656)
This change fixes the following assertion failure:
Assertion `C->getType() == Ty->getElementType() && "Wrong type in array element initializer"' failed
Original commit: KhronosGroup/SPIRV-LLVM-Translator@e099f77cc6d02b9
Commit e7bfe86
Add translation for Intrinsic::{atan,acos,asin,cosh,sinh,tanh} (#2657)
Add translation for the atan, acos, asin, cosh, sinh, and tanh LLVM intrinsics, which are mapped to the corresponding OpenCL extended instructions.
Original commit: KhronosGroup/SPIRV-LLVM-Translator@95605477e7fe635
Commit 9479076
Removed OpAtomicCompareExchangeWeak (#2665)
Verified locally by changing the version from `65536` to `66560` in `test/transcoding/atomics.spt`.
Original commit: KhronosGroup/SPIRV-LLVM-Translator@62ea823e64307e8
Commit 3fa5900
Translate floating-point atomic_compare_exchange as integer (#2668)
The OpenCL spec supports the atomic_float/atomic_double types for the atomic_compare_exchange* functions. However, in the SPIR-V spec, the value and return types of OpAtomicCompareExchange must be integer types. Therefore, the OCLToSPIRV translation must translate the floating-point type to the integer type of the same size. The floating-point value is bitcast so that its bits remain the same.
Original commit: KhronosGroup/SPIRV-LLVM-Translator@e5544014fba77d3
Commit c6519ef
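The translator operates on LLVM IR rather than C++ source, but the bitcast approach from the compare-exchange commit can be illustrated at the source level. This C++ sketch assumes a 32-bit `float`, stores the float as its integer bit pattern, and performs the compare-exchange on the integer; the function names are hypothetical and this is not the translator's code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "float must have the same size as its integer stand-in");

// Reinterpret a float's bits as an integer without changing them.
std::uint32_t floatToBits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Reinterpret an integer's bits as a float without changing them.
float bitsToFloat(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Compare-exchange on a float that is stored as a 32-bit integer.
// Like std::atomic::compare_exchange_strong, `expected` receives the
// current value when the exchange fails.
bool compareExchangeFloatAsInt(std::atomic<std::uint32_t> &object,
                               float &expected, float desired) {
  std::uint32_t expectedBits = floatToBits(expected);
  bool exchanged =
      object.compare_exchange_strong(expectedBits, floatToBits(desired));
  if (!exchanged)
    expected = bitsToFloat(expectedBits);
  return exchanged;
}
```

Because the comparison is on bit patterns, `+0.0f` and `-0.0f` do not match here, while a NaN does match a NaN with the identical bit pattern.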
Commits on Aug 16, 2024
Add missing fpbuiltin math functions. (#15039)
This change is due to llvm/llvm-project#98949.
Commit 22f9f89
[CodeGenCUDA] Update module flag value in test
We overwrite the value in 8096a6f from llvm::Module::Override (4) to llvm::Module::Max (7).
Commit 8dd5df1
Commit 2b80605
[sycl-web] Undo bad conflict resolution and adjust tests to new upstream behavior. (#15051)
Signed-off-by: Marcos Maronas
Commit 616728e
Commits on Aug 17, 2024
Commit 43f6fd3
Commits on Aug 20, 2024
Commit 86385ed
Commit 9fd763d
Commits on Aug 21, 2024
Commit 0cd0a6a
Commit 200d768
Commits on Aug 22, 2024
Commit 78703d9
Commits on Aug 23, 2024
Commit a142ad3