forked from tensorflow/tensorflow
Develop upstream sync 250427 #2939
Open: linchen1-robot wants to merge 1,344 commits into develop-upstream from develop-upstream-sync-250427
+141,840
−164,389
PiperOrigin-RevId: 750102543
PiperOrigin-RevId: 750102601
PiperOrigin-RevId: 750108264
PiperOrigin-RevId: 750110090
PiperOrigin-RevId: 750118925
…K rewrite pass In the splitK rewrite, the type of the intermediate tensor should be the accumulator type rather than dot output type (unless disabled by the flag). PiperOrigin-RevId: 750121205
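To illustrate why the type of the split-K intermediate matters, here is a minimal pure-Python sketch. It is not XLA's implementation: it assumes an f32 accumulator and an f16 dot output (the commit message names neither type), and the helpers `round_f16`, `round_f32`, and `split_k_dot` are hypothetical names introduced only for this example.

```python
import struct


def round_f16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_f32(x: float) -> float:
    """Round x to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def split_k_dot(a, b, splits, round_intermediate):
    """Dot product with the K dimension split into `splits` chunks.

    Each chunk is accumulated in f32 (assumed accumulator type), stored
    through `round_intermediate` (the intermediate tensor's type), then the
    partial results are reduced in f32 and cast to f16 (assumed output type).
    """
    chunk = len(a) // splits  # assumes len(a) is divisible by splits
    partials = []
    for s in range(splits):
        acc = 0.0
        for i in range(s * chunk, (s + 1) * chunk):
            acc = round_f32(acc + a[i] * b[i])
        partials.append(round_intermediate(acc))
    total = 0.0
    for p in partials:
        total = round_f32(total + p)
    return round_f16(total)
```

With `a = [2049.0] * 3`, `b = [1.0] * 3`, and three splits, an f16 intermediate rounds every partial result to 2048 and the final answer is 6144.0, while an f32 intermediate gives 6148.0, the f16 value nearest to the exact result 6147.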
…tion in the HLO runner. Imported from GitHub PR openxla/xla#25166 Copybara import of the project: -- faa4fd13636facefe1625d168d036c1abc8dbb81 by Ilia Sergachev <[email protected]>: Add slow operation alarm for argument initialization in the HLO runner. Merging this change closes tensorflow#25166 PiperOrigin-RevId: 750121264
PiperOrigin-RevId: 750121823
PiperOrigin-RevId: 750122018
…rolling. Imported from GitHub PR openxla/xla#25275 Currently, this pass drops the `known_induction_variable` and `known_init_step` fields in `WhileLoopBackendConfig`. This change keeps them if they were present before and updates them with the updated value, just like `known_trip_count`. This is only done for double buffered loops, not for fully unrolled ones. Copybara import of the project: -- 6bd44597c2e1dba443a367edf4a21024f6248cb6 by Johannes Reifferscheid <[email protected]>: Keep loop metadata after loop unrolling. -- 369ad9bedd031d1e1fd6abacb7a5002f290c72f2 by Johannes Reifferscheid <[email protected]>: Fix init update and add tests. Merging this change closes tensorflow#25275 PiperOrigin-RevId: 750122105
Imported from GitHub PR openxla/xla#25388 Fix failed test in ROCM CI build. Copybara import of the project: -- 37540db7cd36d668996ce6b01ac16cad122c463f by alekstheod <[email protected]>: Fix gpu_kernel_test Merging this change closes tensorflow#25388 PiperOrigin-RevId: 750128132
PiperOrigin-RevId: 750133157
PiperOrigin-RevId: 750135874
Reverts 1984135 PiperOrigin-RevId: 750135878
This means the cub_sort_thunk and the sort_rewriter won't directly depend on the CUB implementation anymore. Instead they go through the FFI registry. This decouples CUDA and ROCm implementations of CUB from the users of CUB and will also help with splitting the compiler from the runtime. This also removes the requirement for having a stub for SortRewriter. PiperOrigin-RevId: 750167216
… on the function arguments. The `frontend_attr` dictionary may be used by other systems in XLA, and a function argument may be unsharded or may lack an `xla.sdy.sharding` attr. PiperOrigin-RevId: 750184012
PiperOrigin-RevId: 750185511
Imported from GitHub PR openxla/xla#25489 Copybara import of the project: -- 095bede8e28bf1d02620a129e3854d33c0d4593e by Dimitris Vardoulakis <[email protected]>: Fix typos in gpu_dot_fusion_cost_model.h Merging this change closes tensorflow#25489 PiperOrigin-RevId: 750195619
PiperOrigin-RevId: 750217362
Prepare for enabling XLA-pthreadpool adaptor by default in XLA and implement more pthreadpool APIs that might be used when XLA is linked together with XNNPACK (i.e. in tflite). PiperOrigin-RevId: 750222358
The tool now correctly processes profiles with multiple kernels as a list of rows (previously it replaced kernels that had the same name, which made analysis of multiple kernels impossible). To enable processing of complex profiles there are now aggregation and filtering. Aggregation looks at the prefix of the metric and might not be completely correct at the moment. Filtering lets you pick a subset of kernels and aggregate only the ones you care about, e.g. --filter='after:name:setup' will only consider kernels that were run after the kernel with "setup" in the name. The filter is also somewhat simplistic, but overall this makes it possible to, e.g., sum durations or memory transfers across multiple kernels, which Nsight Compute does not provide.
Misc:
- Updated argument names; the profile name is passed as an argument, making the tool easier to use.
- To aggregate values properly, the tool now always works with basic units, for example 'ns' instead of 's' or 'ms'.
PiperOrigin-RevId: 750232931
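The filter and aggregation behaviour described in this commit can be sketched roughly as follows. This is an illustration only, not the tool's code: the row layout (`name`, `duration`, `unit`) and the helper names are hypothetical. It also assumes that the first kernel whose name contains the filter text is the cut-off point and that nothing is kept when no kernel matches.

```python
# Conversion factors to the base unit (nanoseconds).
UNIT_TO_NS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def to_ns(value, unit):
    """Convert a duration to nanoseconds so values can be summed safely."""
    return value * UNIT_TO_NS[unit]


def filter_after_name(rows, needle):
    """Keep only the rows that come after the first row whose name contains needle.

    Assumption: the first match is the cut-off; no match keeps nothing.
    """
    for index, row in enumerate(rows):
        if needle in row["name"]:
            return rows[index + 1:]
    return []


def sum_duration_ns(rows):
    """Aggregate durations across kernels, always in nanoseconds."""
    return sum(to_ns(row["duration"], row["unit"]) for row in rows)


# Kernels are kept as a list of rows, so two kernels named "gemm" both survive.
rows = [
    {"name": "setup_buffers", "duration": 1, "unit": "ms"},
    {"name": "gemm", "duration": 2, "unit": "us"},
    {"name": "gemm", "duration": 500, "unit": "ns"},
]
selected = filter_after_name(rows, "setup")
total_ns = sum_duration_ns(selected)
```

Keeping rows in a list rather than a dictionary keyed by kernel name is what lets repeated kernel names coexist, which is the first fix the commit describes.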
… versions from logs PiperOrigin-RevId: 750240223
…_parametrized_test.cc`. Delete them in the old parametrized test file (`fusion_emitter_parametrized_legacy_test.cc`). PiperOrigin-RevId: 750240820
…ng in OSS. PiperOrigin-RevId: 750240827
…ples a path at random and finds the best configurations of the nodes in the sampled path. PiperOrigin-RevId: 750241531
…llel loops PiperOrigin-RevId: 750251414
PiperOrigin-RevId: 750254159
A loaded executable should have an executable (has-a rather than is-a). The inheritance relationship as it exists today leads to unnecessary complexity in the API design and requires plumbing the same behavior through LoadedExecutable as through Executable. This change separates the functionality, delegating calls to the executable contained by LoadedExecutable via the GetExecutable() call. PiperOrigin-RevId: 750254936
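The shift from inheritance to composition can be pictured with a small sketch. These are Python stand-ins for the C++ classes, written only for illustration: `get_executable` mirrors the `GetExecutable()` call named in the commit, while `name` and `device_ids` are hypothetical members.

```python
class Executable:
    """Compiled program; knows nothing about devices."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


class LoadedExecutable:
    """Holds an Executable instead of inheriting from it."""

    def __init__(self, executable, device_ids):
        self._executable = executable
        self._device_ids = list(device_ids)

    def get_executable(self):
        # Callers reach Executable behaviour through this accessor, so
        # LoadedExecutable does not have to re-expose every method.
        return self._executable

    def device_ids(self):
        return list(self._device_ids)


loaded = LoadedExecutable(Executable("my_module"), device_ids=[0, 1])
module_name = loaded.get_executable().name()
```

With this shape, new methods on the executable do not need a matching forwarding method on the loaded wrapper.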
`SetProto()` is redundant with `ToProto()`. Prefer the latter as it's a better style to return the value directly than to return it as an output parameter. PiperOrigin-RevId: 750258019
… compile time function pointers
```
----------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
----------------------------------------------------------------------
BM_EnumAttrs                   31.1 ns         31.1 ns     22541907
BM_EnumAttrsFunction           37.4 ns         37.4 ns      1874469
BM_EnumAttrsFunctionWrapper    31.3 ns         31.3 ns     22312783
```
PiperOrigin-RevId: 751532370
PiperOrigin-RevId: 751536187
PiperOrigin-RevId: 751542019
XLA has migrated from `std::string_view` to `absl::string_view`, which supports construction from `nullptr`. The old comments are no longer relevant. PiperOrigin-RevId: 751543245
…ython. Stop exposing XlaBuilder, XlaOp and a number of related classes from JAX. PiperOrigin-RevId: 751555286
…nting a `Layout`: - If all dimensions of a layout are dense, there's no need to print their attributes as the default is "all dimensions are dense". Also extend the unit tests for `Layout::ToString()` to be more comprehensive. Also fix some mistakes in the `Layout::Print()` comment. PiperOrigin-RevId: 751559635
PiperOrigin-RevId: 751560477
PiperOrigin-RevId: 751576285
PiperOrigin-RevId: 751578594
…prefer clearing the tree by calling the method. PiperOrigin-RevId: 751580849
JAX does not use this class any more. PiperOrigin-RevId: 751584043
PiperOrigin-RevId: 751601296
…st memory space color" from HostOffloadLegalize. The pass is currently identical before and after LayoutAssignment. Host memory space color is now available in xla::Layout. PiperOrigin-RevId: 751603615
```
name                                       old cpu/op   new cpu/op   delta
BM_HloModule/jax.issue.26021/process_time  35.7µs ± 8%  31.6µs ± 4%  -11.48%  (p=0.000 n=80+76)

name                                       old time/op  new time/op  delta
BM_HloModule/jax.issue.26021/process_time  35.6µs ± 6%  31.6µs ± 3%  -11.31%  (p=0.000 n=80+78)
```
Improves benchmark from jax-ml/jax#26021
PiperOrigin-RevId: 751615510
In preparation for using ObjectPool in the GPU runtime, move it to the top-level xla/runtime folder. PiperOrigin-RevId: 751618221
…command Port custom call thunk optimizations from XLA:CPU to GPU backend. PiperOrigin-RevId: 751624394
permissive. PiperOrigin-RevId: 751655159
…ed on simpler underlying raw-buffer primitives. PiperOrigin-RevId: 751658445
Updates LLVM usage to match [c60f24dca96d](llvm/llvm-project@c60f24dca96d) PiperOrigin-RevId: 751669474
PiperOrigin-RevId: 751675251
PiperOrigin-RevId: 751712422
PiperOrigin-RevId: 751712467
…t is redundant. PiperOrigin-RevId: 751777386
This method is redundant with `ToProto()`. PiperOrigin-RevId: 751804504
* Enable tosa support under conditional compilation
* fix genrule
* Move all conditional stuff under tosa
* rename; actually need an implementation for the -DTF_TOSA_ENABLED to work
* moving the conditional compilation back into mlir out of tosa
* moved everything under tosa
* internal -> :internal
…work`. This change is needed to fix TF GPU wheel after tensorflow#91495 was submitted. PiperOrigin-RevId: 751821415
…macro has been removed from Eigen. PiperOrigin-RevId: 751910419
No description provided.