Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow link to llvm shared library for current distros #68

Open
wants to merge 10,000 commits into
base: amd-stg-open
Choose a base branch
from

Conversation

littlewu2508
Copy link

Current distros build llvm shared libs. Same as comgr: 1322aa0

andjo403 and others added 30 commits April 30, 2024 09:42
…lvm#90486)

Noticed that there already was a function in APInt that updated a
FoldingSet so there was no need for me to add it in
llvm#84617.
This ensures the explicit value is generated (and not a load into the
values array). Note that actually not storing values array at all is
still TBD, this is just the very first step.
…imental late parsing mode "extension" (llvm#88596)

This patch changes the `LateParsed` field of `Attr` in `Attr.td` to be
an instantiation of the new `LateAttrParseKind` class. The instation can be one of the following:

* `LateAttrParsingNever` - Corresponds with the false value of `LateParsed` prior to this patch (the default for an attribute).
* `LateAttrParseStandard` - Corresponds with the true value of `LateParsed` prior to this patch.
* `LateAttrParseExperimentalExt` - A new mode described below.

`LateAttrParseExperimentalExt` is an experimental extension to
`LateAttrParseStandard`. Essentially this allows
`Parser::ParseGNUAttributes(...)` to distinguish between these cases:

1. Only `LateAttrParseExperimentalExt` attributes should be late parsed.
2. Both `LateAttrParseExperimentalExt` and `LateAttrParseStandard`
  attributes should be late parsed.

Callers (and indirect callers) of `Parser::ParseGNUAttributes(...)`
indicate the desired behavior by setting a flag in the
`LateParsedAttrList` object that is passed to the function.

In addition to the above, a new driver and frontend flag
(`-fexperimental-late-parse-attributes`) with a corresponding LangOpt
(`ExperimentalLateParseAttributes`) is added that changes how
`LateAttrParseExperimentalExt` attributes are parsed.

* When the flag is disabled (default), in cases where only
  `LateAttrParsingExperimentalOnly` late parsing is requested, the
  attribute will be parsed immediately (i.e. **NOT** late parsed). This
  allows the attribute to act just like a `LateAttrParseStandard`
  attribute when the flag is disabled.

* When the flag is enabled, in cases where only
  `LateAttrParsingExperimentalOnly` late parsing is requested, the
  attribute will be late parsed.

The motivation behind this change is to allow the new `counted_by`
attribute (part of `-fbounds-safety`) to support late parsing but
**only** when `-fexperimental-late-parse-attributes` is enabled. This
attribute needs to support late parsing to allow it to refer to fields
later in a struct definition (or function parameters declared later).
However, there isn't a precedent for supporting late attribute parsing
in C so this flag allows the new behavior to exist in Clang but not be
on by default. This behavior was requested as part of the
`-fbounds-safety` RFC process
(https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854/68).

This patch doesn't introduce any uses of `LateAttrParseExperimentalExt`.
This will be added for the `counted_by` attribute in a future patch
(llvm#87596). A consequence is the
new behavior added in this patch is not yet testable. Hence, the lack of
tests covering the new behavior.

rdar://125400257
In new pass system, `MachineFunction` could be an analysis result again,
machine module pass can now fetch them from analysis manager.
`MachineModuleInfo` no longer owns them.
Remove `FreeMachineFunctionPass`, replaced by
`InvalidateAnalysisPass<MachineFunctionAnalysis>`.

Now `FreeMachineFunction` is replaced by
`InvalidateAnalysisPass<MachineFunctionAnalysis>`, the workaround in
`MachineFunctionPassManager` is no longer needed, there is no difference
between `unittests/MIR/PassBuilderCallbacksTest.cpp` and
`unittests/IR/PassBuilderCallbacksTest.cpp`.
This is used when -march=native run on an unknown CPU to old version of
LLVM.
Skip updating references for operands that do not directly
refer to jump table symbols but fall within a jump table's
address range to prevent unintended modifications.
within module purview

Close llvm#90259

Technically, the static declarations shouldn't be leaked from the module
interface, otherwise it is an illegal program according to the spec. So
we can get rid of the static declarations from the reduced BMI
technically. Then we can close the above issue.

However, there are too many `static inline` codes in existing headers.
So it will be a pretty big breaking change if we do this globally.
Change-Id: Icf8748fff11482f16cbeb1f19baf5a3404b57c6e
Disable this test on x86_64h for LSan.

This test is failing with malformed object only on x86_64h.
Disabling for now. 

rdar://125052424
…elism with multi-frame parallelism

https://reviews.llvm.org/D133679 utilizes zstd's multithread API to
create one single frame. This provides a higher compression ratio but is
significantly slower than concatenating multiple frames.

With manual parallelism, it is easier to parallelize memcpy in
OutputSection::writeTo for parallel memcpy.

In addition, as the individual allocated decompression buffers are much
smaller, we can make a wild guess (compressed_size/4) without worrying
about a resize (due to wrong guess) would waste memory.
…ng-parentheses` (llvm#90279)

When a binary operator is the last operand of a macro, the end location
that is past the `BinaryOperator` will be inside the macro and therefore
an
invalid location to insert a `FixIt` into, which is why the check bails
when encountering such a pattern.
However, the end location is only required for the `FixIt` and the
diagnostic can still be emitted, just without an attached fix.
…te module file for C++20 modules instead of PCHGenerator

Previously we're re-using PCHGenerator to generate the module file for
C++20 modules. But this is slighty more or less odd. This patch tries
to use a new class 'CXX20ModulesGenerator' to generate the module file
for C++20 modules.
llvm#90522)

LegalizeVectorType is responsible for legalizing nodes that perform an
operation on each element may need to scalarize.

This is not true for nodes like VP_REDUCE.*, BUILD_VECTOR,
SHUFFLE_VECTOR, EXTRACT_SUBVECTOR, etc.

This patch drops any nodes with a scalar result from LegalizeVectorOps
and handles them in LegalizeDAG instead.

This required moving the reduction promotion to LegalizeDAG. I have
removed the support integer promotion as it was incorrect for integer
min/max reductions. Since it was untested, it was best to assert on it
until it was really needed.

There are a couple regressions that can be fixed with a small DAG
combine which I will do as a follow up.
Close llvm#75057

Previously, I thought the diagnostic mappings is not meaningful with
modules incorrectly. And this problem get revealed by another change
recently. So this patch tried to rever the previous "optimization"
partially.
…uctions

Marking them as `hasSideEffects=1` stops some optimizations.

According to `Target.td`:

> // Does the instruction have side effects that are not captured by any
> // operands of the instruction or other flags?
> bit hasSideEffects = ?;

It seems we don't need to set `hasSideEffects` for vleNff since we have
modelled `vl` as an output operand.

As for saturating instructions, I think that explicit Def/Use list
is kind of side effects captured by any operands of the instruction,
so we don't need to set `hasSideEffects` either. And I have just
investigated AArch64's implementation, they don't set this flag and
don't add `Def` list.

These changes make optimizations like `performCombineVMergeAndVOps`
and MachineCSE possible for these instructions.

As a consequence, `copyprop.mir` can't test what we want to test in
https://reviews.llvm.org/D155140, so we replace `vssra.vi` with a
VCIX instruction (it has side effects).

Reviewers: jacquesguan, topperc, preames, asb, lukel97

Reviewed By: topperc, lukel97

Pull Request: llvm#90049
and "[NFC] [C++20] [Modules] Use new class CXX20ModulesGenerator to
generate module file for C++20 modules instead of PCHGenerator"

This reverts commit fb21343.
and commit 18268ac.

It looks like there are some problems about linking the compiler
)

We can use the original vector as long as the type of X matches the
result type of the vmv_s_x_vl.
Simultaneously implemented parsing support for the `%desc_*` modifiers.

Reviewers: SixWeining, heiher, xen0n

Reviewed By: xen0n, SixWeining

Pull Request: llvm#90158
Close llvm#75057

Previously, I thought the diagnostic mappings is not meaningful with
modules incorrectly. And this problem get revealed by another change
recently. So this patch tried to rever the previous "optimization"
partially.
Extends `omp.private` with a new region: `dealloc` where deallocation
logic for Fortran deallocatables will be outlined (this will happen in
later PRs).
This removes various subtitles or converts them to bold text so that the
table of contents is less cluttered.

This includes "Example", "Notes", "Priority To Implement" and
"Response".
The implementation only enables when the `-enable-tlsdesc` option is
passed and the TLS model is `dynamic`.

LoongArch's GCC has the same option(-mtls-dialet=) as RISC-V.

Reviewers: heiher, MaskRay, SixWeining

Reviewed By: SixWeining, MaskRay

Pull Request: llvm#90159
…ixed. (llvm#90484)

The original PR llvm#90083 had to be reverted in PR llvm#90444 as it caused one
of the gfortran tests to fail. The issue was using `isIntOrIndex` for
checking for integer type. It allowed index type which later caused
assertion when calling `getIntOrFloatBitWidth`. I have now replaced it
with `isInteger` which should fix this regression.
…90471)

In the debug intrinsic class heirachy, a dbg.assign is a (inherits from)
dbg.value, so `findDbgValues` returns dbg.values and dbg.assigns (by
design). That hierarchy doesn't exist for DbgRecords - fix findDbgValues
to return dbg_assign records as well as dbg_values and add unittest.
keith and others added 26 commits May 1, 2024 09:01
Previously if you passed an ELF binary it would be silently copied with no changes.
Section unification cannot just use names, because it's valid for ELF
binaries to have multiple sections with the same name. We should check
other section properties too.

Fixes llvm#88001.

rdar://124467787
I strongly suspect nobody ever used that macro since it wasn't very well
known. Furthermore, it only affects a handful of diagnostics and I think
it makes sense to either provide them unconditionally, or to not
provided them at all.
…n emulation (llvm#89131)

This PR builds on llvm#79494 with an additional path for efficient unsigned `i4 ->i8` type extension for 1D/2D operations. This will impact any i4 -> i8/i16/i32/i64 unsigned extensions as well as sitofp i4 -> f8/f16/f32/f64.
…0508)

Empty ISG1 and OSG1 parts are generated for compute shader since there's
no signature for compute shader.

Fixes llvm#88778
The output on eel.is has similar oddities, so I expect this was copy
pasted.
…m#89992)

The dependency scanner only puts top-level affecting module map files on
the command line for explicitly building a module. This is done because
any affecting child module map files should be referenced by the
top-level one, meaning listing them explicitly does not have any meaning
and only makes the command lines longer.

However, a problem arises whenever the definition of an affecting module
lives in a module map that is not top-level. Considering the rules
explained above, such module map file would not make it to the command
line. That's why 83973cf started
marking the parents of an affecting module map file as affecting too.
This way, the top-level file does make it into the command line.

This can be problematic, though. On macOS, for example, the Darwin
module lives in "/usr/include/Darwin.modulemap" one of many module map
files included by "/usr/include/module.modulemap". Reporting the parent
on the command line forces explicit builds to parse all the other module
map files included by it, which is not necessary and can get expensive
in terms of file system traffic.

This patch solves that performance issue by stopping marking parent
module map files as affecting, and marking module map files as top-level
whenever they are top-level among the set of affecting files, not among
the set of all known files. This means that the top-level
"/usr/include/module.modulemap" is now not marked as affecting and
"/usr/include/Darwin.modulemap" is.
In case of functions without a stack frame no "stack" field is
serialized into MIR which leads to isCalleeSavedInfoValid being false
when reading a MIR file back in. To fix this we should serialize
MachineFrameInfo::isCalleeSavedInfoValid() into MIR.
ROTL and ROTR can take a shift amount larger than the element size, in
which case the effective shift amount should be the shift amount modulo
the element size.

This patch adds the modulo step when the shift amount isn't known at
compile time. Without it the existing implementation would end up
shifting beyond the type size and give incorrect results.
…lvm#90700)

Instead of hardcoding the 4 current profile prefixes, treat profile
selection as a fallback if we don't find "rv32" or "rv64".

Update the error message accordingly.
…O summaries" (llvm#90610) (llvm#90692)

This reverts commit 2aabfc8.

Add fixes to LLD and Gold tests missed in original change.

Co-authored-by: Jan Voung <[email protected]>
It was pointed out in post commit review of
llvm#90597 that the pass should
never have been run in parallel over all functions (and now other top
level operations) in the first place. The mutex used in the pass was
ineffective at preventing races since each instance of the pass would
have a different mutex.
Fixes llvm#84968. 

Implements the `fcntl()` function defined in the `fcntl.h` header.
…ons. (llvm#90414)

Treat a compound operator such as |=, array subscription, sizeof, and
non-type template parameter as trivial so long as subexpressions are
also trivial.

Also treat true/false boolean literal as trivial.
The function may return Z_MEM_ERROR or Z_STREAM_ERR. The former does not
have a good way of testing. The latter will be possible with a pending
change that allows setting the compression level, which will come with a
test.
zstd excels at scaling from low-ratio-very-fast to
high-ratio-pretty-slow. Some users prioritize speed and prefer disk read
speed, while others focus on achieving the highest compression ratio
possible, similar to traditional high-ratio codecs like LZMA.

Add an optional `level` to `--compress-sections` (llvm#84855) to cater to
these diverse needs. While we initially aimed for a one-size-fits-all
approach, this no longer seems to work.
(https://richg42.blogspot.com/2015/11/the-lossless-decompression-pareto.html)

When --compress-debug-sections is used together, make
--compress-sections take precedence since --compress-sections is usually
more specific.

Remove the level distinction between -O/-O1 and -O2 for
--compress-debug-sections=zlib for a more consistent user experience.

Pull Request: llvm#90567
Change-Id: I4968e32ce2fcf8592f4ab65f9b2eb89b5fbb67dc
Change-Id: I8d57fc9053f1ee71230ac48337f73b474581188f
@littlewu2508 littlewu2508 force-pushed the device-libs-link-llvm-dylib branch from 1c7e7f8 to 509146d Compare May 2, 2024 09:58
@littlewu2508 littlewu2508 force-pushed the device-libs-link-llvm-dylib branch from 509146d to 7311c1b Compare May 2, 2024 10:27
@lamb-j
Copy link
Collaborator

lamb-j commented May 3, 2024

Did you mean to make a PR against amd-staging (amd-stg-open is deprecated now)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment