Develop upstream sync 250427 #2939

Open
wants to merge 1,344 commits into
base: develop-upstream
from

linchen1-robot
No description provided.

tensorflower-gardener and others added 30 commits April 22, 2025 02:14
PiperOrigin-RevId: 750102543
PiperOrigin-RevId: 750108264
PiperOrigin-RevId: 750118925
…K rewrite pass

In the splitK rewrite, the type of the intermediate tensor should be the accumulator type rather than dot output type (unless disabled by the flag).
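To illustrate why the type of the split-K intermediate matters, here is a minimal numeric sketch in plain Python. It is not the XLA pass: `round_to_f16` and `split_k_dot` are hypothetical helpers, f16 stands in for the dot output type, and a full-precision Python float stands in for the accumulator type.

```python
import struct


def round_to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def split_k_dot(a, b, num_splits, round_intermediate):
    """Dot product with the K dimension split into `num_splits` chunks.

    Each chunk produces one partial sum (the "intermediate tensor").
    If `round_intermediate` is True the partials are stored as f16
    (assumed output type); otherwise they keep full precision
    (assumed accumulator type). The final result is f16 either way.
    """
    chunk = len(a) // num_splits
    partials = []
    for s in range(num_splits):
        lo, hi = s * chunk, (s + 1) * chunk
        partial = sum(x * y for x, y in zip(a[lo:hi], b[lo:hi]))
        partials.append(round_to_f16(partial) if round_intermediate else partial)
    return round_to_f16(sum(partials))


# All inputs are exactly representable in f16.
a = [683.0, -2047.0]
b = [3.0, 1.0]

print(split_k_dot(a, b, 2, round_intermediate=False))  # 2.0 (exact)
print(split_k_dot(a, b, 2, round_intermediate=True))   # 1.0 (2049 was rounded to 2048)
```

The partial sum 2049 is not representable in f16, so storing it in the output type loses one unit before the final reduction, even though the exact result fits in f16.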

PiperOrigin-RevId: 750121205
…tion in the HLO runner.

Imported from GitHub PR openxla/xla#25166

Copybara import of the project:

--
faa4fd13636facefe1625d168d036c1abc8dbb81 by Ilia Sergachev <[email protected]>:

Add slow operation alarm for argument initialization in the HLO runner.

Merging this change closes tensorflow#25166

PiperOrigin-RevId: 750121264
PiperOrigin-RevId: 750121823
PiperOrigin-RevId: 750122018
…rolling.

Imported from GitHub PR openxla/xla#25275

Currently, this pass drops the `known_induction_variable` and `known_init_step` fields in `WhileLoopBackendConfig`. This change keeps them if they were present before and updates them with the updated value, just like `known_trip_count`. This is only done for double buffered loops, not for fully unrolled ones.
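The metadata update described above can be sketched as follows. This is an illustration under stated assumptions, not the pass itself: `LoopMetadata` is a hypothetical stand-in for the relevant fields of `WhileLoopBackendConfig`, and the sketch assumes that double buffering unrolls by a factor of 2 and peels one iteration ahead of the loop when the trip count is odd.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LoopMetadata:
    """Hypothetical model of the loop fields kept in the backend config."""
    trip_count: int
    induction_variable_index: Optional[int] = None  # passed through unchanged
    init_step: Optional[Tuple[int, int]] = None     # (init, step), if known


def double_buffer(meta: LoopMetadata) -> LoopMetadata:
    """Return the metadata of the loop after unrolling it by 2.

    Assumption: an odd trip count is handled by peeling one iteration
    before the loop, which advances the initial value by one step.
    """
    peeled = meta.trip_count % 2
    init_step = None
    if meta.init_step is not None:
        init, step = meta.init_step
        init_step = (init + peeled * step, 2 * step)
    return LoopMetadata(
        trip_count=meta.trip_count // 2,
        induction_variable_index=meta.induction_variable_index,
        init_step=init_step,
    )


# Five iterations starting at 0 with step 1: one is peeled, two remain.
print(double_buffer(LoopMetadata(5, 0, (0, 1))))
```

Fields that were absent before the transformation stay absent, matching the "keeps them if they were present before" wording.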
Copybara import of the project:

--
6bd44597c2e1dba443a367edf4a21024f6248cb6 by Johannes Reifferscheid <[email protected]>:

Keep loop metadata after loop unrolling.

--
369ad9bedd031d1e1fd6abacb7a5002f290c72f2 by Johannes Reifferscheid <[email protected]>:

Fix init update and add tests.

Merging this change closes tensorflow#25275

PiperOrigin-RevId: 750122105
Imported from GitHub PR openxla/xla#25388

Fix failed test in ROCM CI build.
Copybara import of the project:

--
37540db7cd36d668996ce6b01ac16cad122c463f by alekstheod <[email protected]>:

Fix gpu_kernel_test

Merging this change closes tensorflow#25388

PiperOrigin-RevId: 750128132
Reverts 1984135

PiperOrigin-RevId: 750135878
This means the cub_sort_thunk and the sort_rewriter
won't directly depend on the CUB implementation anymore.

Instead they go through the FFI registry.
This decouples CUDA and ROCm implementations of CUB from the
users of CUB and will also help with splitting the
compiler from the runtime.

This also removes the requirement for having a stub for SortRewriter.
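The decoupling described above follows a general registry pattern, sketched here in Python. Nothing below is the XLA FFI API: the registry, the handler name `sort_keys`, and the platform strings are hypothetical and only show how a caller can reach a backend implementation without depending on it directly.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[List[int]], List[int]]

# Hypothetical registry: (handler name, platform) -> implementation.
_REGISTRY: Dict[Tuple[str, str], Handler] = {}


def register(name: str, platform: str, handler: Handler) -> None:
    """Register a backend implementation under a name and platform."""
    key = (name, platform)
    if key in _REGISTRY:
        raise ValueError(f"duplicate handler for {key}")
    _REGISTRY[key] = handler


def lookup(name: str, platform: str) -> Handler:
    """Find the implementation for a platform, or raise LookupError."""
    try:
        return _REGISTRY[(name, platform)]
    except KeyError:
        raise LookupError(f"no handler {name!r} for {platform!r}") from None


# Each backend registers itself; `sorted` stands in for a real kernel.
register("sort_keys", "CUDA", sorted)
register("sort_keys", "ROCM", sorted)


def run_sort(platform: str, keys: List[int]) -> List[int]:
    """Caller side: depends only on the registry, not on any backend."""
    return lookup("sort_keys", platform)(keys)


print(run_sort("CUDA", [3, 1, 2]))  # [1, 2, 3]
```

Because the caller resolves the handler at run time, a build without a given backend needs no stub: the lookup simply fails for that platform.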

PiperOrigin-RevId: 750167216
… on the function arguments.

The `frontend_attr` dictionary may be used by other systems in XLA, and a function argument may not be sharded or may lack an `xla.sdy.sharding` attribute.

PiperOrigin-RevId: 750184012
PiperOrigin-RevId: 750188041
Imported from GitHub PR openxla/xla#25489

Copybara import of the project:

--
095bede8e28bf1d02620a129e3854d33c0d4593e by Dimitris Vardoulakis <[email protected]>:

Fix typos in gpu_dot_fusion_cost_model.h

Merging this change closes tensorflow#25489

PiperOrigin-RevId: 750195619
PiperOrigin-RevId: 750217362
Prepare for enabling XLA-pthreadpool adaptor by default in XLA and implement more pthreadpool APIs that might be used when XLA is linked together with XNNPACK (i.e. in tflite).

PiperOrigin-RevId: 750222358
Now the tool correctly processes profiles with multiple kernels as a list of rows (previously, kernels with the same name replaced one another, making analysis of multiple kernels impossible).
To enable processing of complex profiles, we now have aggregation and filtering. Aggregation looks at the prefix of the metric and might not be completely correct at the moment.

Filtering allows picking a subset of kernels and aggregating only the ones we care about; e.g. --filter='after:name:setup' considers only kernels that ran after the kernel with "setup" in its name. Again, the filter is somewhat simplistic, but overall it provides the ability to, e.g., sum durations or memory transfers across multiple kernels, which Nsight Compute does not provide.
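A rough sketch of the filter-then-aggregate idea, assuming a profile is a list of rows with a `name` and a `duration_ns` field. The function names, the row schema, and the choice to exclude the matching kernel itself are assumptions for illustration, not the tool's actual code.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]


def filter_after(rows: List[Row], needle: str) -> List[Row]:
    """Keep rows that come strictly after the first kernel whose name
    contains `needle`; return an empty list if no kernel matches."""
    for i, row in enumerate(rows):
        if needle in str(row["name"]):
            return rows[i + 1:]
    return []


def aggregate(rows: List[Row], metric: str) -> int:
    """Sum one numeric metric across the given rows."""
    return sum(int(row[metric]) for row in rows)


# Two kernels share the name "gemm"; both stay as separate rows.
profile = [
    {"name": "init", "duration_ns": 5},
    {"name": "setup_buffers", "duration_ns": 10},
    {"name": "gemm", "duration_ns": 100},
    {"name": "gemm", "duration_ns": 120},
]

selected = filter_after(profile, "setup")
print(aggregate(selected, "duration_ns"))  # 220
```

Keeping the profile as a list rather than a name-keyed mapping is what lets the two `gemm` rows survive to be summed.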

Misc:

- Updated argument names; the profile name is now passed as an argument, making the tool easier to use.

- To aggregate values properly, it now always works with basic units, for example 'ns' instead of 's' or 'ms'.
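The unit normalization in the last item can be sketched as below. The table of units and the helper name `to_ns` are assumptions; the point is only that values are converted to one unit before they are summed.

```python
# Assumed conversion table: multiplier from each unit to nanoseconds.
_TO_NS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def to_ns(value: float, unit: str) -> float:
    """Convert a duration to nanoseconds; raises KeyError on unknown units."""
    return value * _TO_NS[unit]


# Durations reported in mixed units become directly summable.
durations = [(2, "us"), (500, "ns"), (1.5, "ms")]
total_ns = sum(to_ns(value, unit) for value, unit in durations)
print(total_ns)  # 1502500.0
```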

PiperOrigin-RevId: 750232931
… versions from logs

PiperOrigin-RevId: 750240223
…_parametrized_test.cc`.

Delete them in the old parametrized test file (`fusion_emitter_parametrized_legacy_test.cc`).

PiperOrigin-RevId: 750240820
…ples a path at random and finds the best configurations of the nodes in the sampled path.

PiperOrigin-RevId: 750241531
PiperOrigin-RevId: 750254159
A loaded executable should have an executable rather than be one. The inheritance relationship as it exists today leads to unnecessary complexity in the API design and requires plumbing the same behavior through LoadedExecutable as in Executable. This change separates out the functionality, delegating calls to the executable contained by LoadedExecutable via the GetExecutable() call.
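The shift from inheritance to composition can be illustrated with a small Python sketch. The classes below only mirror the names in the message; their members are hypothetical and much simpler than the real PjRt interfaces.

```python
from typing import List


class Executable:
    """Compiled program: holds properties that do not depend on devices."""

    def __init__(self, name: str) -> None:
        self._name = name

    def name(self) -> str:
        return self._name


class LoadedExecutable:
    """Executable bound to devices: contains an Executable (has-a)
    instead of inheriting from it (is-a)."""

    def __init__(self, executable: Executable, device_ids: List[int]) -> None:
        self._executable = executable
        self._device_ids = list(device_ids)

    def get_executable(self) -> Executable:
        """Expose the contained executable; callers query it directly."""
        return self._executable

    def device_ids(self) -> List[int]:
        return list(self._device_ids)


loaded = LoadedExecutable(Executable("main"), device_ids=[0, 1])
print(loaded.get_executable().name())    # main
print(isinstance(loaded, Executable))    # False
```

With this shape, a method added to `Executable` does not have to be re-declared or forwarded on `LoadedExecutable`.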

PiperOrigin-RevId: 750254936
`SetProto()` is redundant with `ToProto()`. Prefer the latter, because returning the value directly is better style than returning it through an output parameter.

PiperOrigin-RevId: 750258019
ezhulenev and others added 30 commits April 25, 2025 13:19
… compile time function pointers

```
----------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations
----------------------------------------------------------------------
BM_EnumAttrs                      31.1 ns         31.1 ns     22541907
BM_EnumAttrsFunction              37.4 ns         37.4 ns     1874469
BM_EnumAttrsFunctionWrapper       31.3 ns         31.3 ns     22312783
```

PiperOrigin-RevId: 751532370
XLA has migrated from `std::string_view` to `absl::string_view`, which supports construction from `nullptr`. The old comments are no longer relevant.

PiperOrigin-RevId: 751543245
…ython.

Stop exposing XlaBuilder, XlaOp and a number of related classes from JAX.

PiperOrigin-RevId: 751555286
…nting a `Layout`:

- If all dimensions of a layout are dense, there's no need to print their attributes as the default is "all dimensions are dense".
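The printing rule above amounts to omitting attributes that equal the default. A simplified sketch follows; the output format here is made up for illustration and is not guaranteed to match what `Layout::ToString()` actually prints.

```python
from typing import List


def layout_to_string(minor_to_major: List[int], dim_kinds: List[str]) -> str:
    """Print a layout, skipping per-dimension kinds when all are dense.

    `dim_kinds` holds one word per dimension, e.g. "dense" or "compressed";
    each is abbreviated to its first letter in the (hypothetical) output.
    """
    base = ",".join(str(d) for d in minor_to_major)
    if all(kind == "dense" for kind in dim_kinds):
        return "{" + base + "}"  # default case: nothing extra to print
    abbrev = ",".join(kind[0].upper() for kind in dim_kinds)
    return "{" + base + ":D(" + abbrev + ")}"


print(layout_to_string([1, 0], ["dense", "dense"]))       # {1,0}
print(layout_to_string([1, 0], ["dense", "compressed"]))  # {1,0:D(D,C)}
```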

Also extend the unit tests for `Layout::ToString()` to be more comprehensive.

Also fix some mistakes in the `Layout::Print()` comment.

PiperOrigin-RevId: 751559635
PiperOrigin-RevId: 751560477
PiperOrigin-RevId: 751576285
PiperOrigin-RevId: 751578594
…prefer clearing the tree by calling the method.

PiperOrigin-RevId: 751580849
JAX does not use this class any more.

PiperOrigin-RevId: 751584043
…st memory space color" from HostOffloadLegalize.

The pass is currently identical before and after LayoutAssignment. Host memory space color is now available in xla::Layout.

PiperOrigin-RevId: 751603615
```
name                                       old cpu/op   new cpu/op   delta
BM_HloModule/jax.issue.26021/process_time  35.7µs ± 8%  31.6µs ± 4%  -11.48%  (p=0.000 n=80+76)

name                                       old time/op          new time/op          delta
BM_HloModule/jax.issue.26021/process_time  35.6µs ± 6%          31.6µs ± 3%  -11.31%  (p=0.000 n=80+78)
```

Improves benchmark from jax-ml/jax#26021

PiperOrigin-RevId: 751615510
In preparation for using ObjectPool in the GPU runtime, move it to the top-level xla/runtime folder.

PiperOrigin-RevId: 751618221
…command

Port custom call thunk optimizations from XLA:CPU to GPU backend.

PiperOrigin-RevId: 751624394
PiperOrigin-RevId: 751648430
…ed on

simpler underlying raw-buffer primitives.

PiperOrigin-RevId: 751658445
Updates LLVM usage to match
[c60f24dca96d](llvm/llvm-project@c60f24dca96d)

PiperOrigin-RevId: 751669474
PiperOrigin-RevId: 751675251
PiperOrigin-RevId: 751712422
…t is redundant.

PiperOrigin-RevId: 751777386
This method is redundant with `ToProto()`.

PiperOrigin-RevId: 751804504
* Enable tosa support under conditional compilation

* fix genrule

* Move all conditional stuff under tosa

* rename; actually need an implementation for the -DTF_TOSA_ENABLED to work

* moving the conditional compilation back into mlir out of tosa

* moved everything under tosa

* internal -> :internal
…work`.

This change is needed to fix TF GPU wheel after tensorflow#91495 was submitted.

PiperOrigin-RevId: 751821415
…macro has been removed from Eigen.

PiperOrigin-RevId: 751910419