- Jan 03, 2025
Xiaodong Wang authored
Replaces https://github.com/pytorch/pytorch/pull/138947 for re-import. Replaces https://github.com/ROCm/pytorch/pull/1592

This PR contains the initial implementation of SDPA with the composable_kernel backend. The CK path can be forced by simply calling `torch.backends.cuda.preferred_rocm_fa_library("ck")`. Similarly, you can force the incumbent aotriton implementation by passing in "aotriton" or "default". As you'd expect, not setting this option results in aotriton being used as the backend. In the case of CK, if PyTorch deems flash attention usable, it will use the CK path in all the same places aotriton would have been used. This PR makes no changes to the heuristics that select which attention scheme to use (i.e. flash attention vs. memory-efficient attention vs. math); the CK path is only invoked when flash attention is both enabled (via `USE_FLASH_ATTENTION`) and selected at runtime by the existing heuristics.

Files located in pytorch/aten/src/ATen/native/transformers/hip/flash_attn/ck/mha* have been pulled from https://github.com/Dao-AILab/flash-attention, courtesy of @tridao's hard work; he is a co-author of this PR.

NOTE: In order to use this backend, the user MUST set USE_CK_FLASH_ATTENTION=1 in their environment when building PyTorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143695
Approved by: https://github.com/malfet
Co-authored-by: Andy Lugo <Andy.LugoReyes@amd.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
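A minimal sketch of how this backend switch is meant to be used, assuming a ROCm build of PyTorch compiled with USE_CK_FLASH_ATTENTION=1; the `preferred_rocm_fa_library` call is the one named in the commit message, while the tensor shapes and dtypes are illustrative only:

```python
# Sketch: opting into the composable_kernel (CK) flash-attention path on ROCm.
# Assumes PyTorch was built with USE_CK_FLASH_ATTENTION=1 (build-time env var).
import torch
import torch.nn.functional as F

# Force the CK backend; passing "aotriton" or "default" instead keeps the
# incumbent aotriton implementation.
torch.backends.cuda.preferred_rocm_fa_library("ck")

q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)

# The existing SDPA heuristics still decide whether flash attention runs at all;
# when they pick flash attention, the CK kernels are dispatched instead of aotriton.
out = F.scaled_dot_product_attention(q, k, v)
```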
- Dec 17, 2024
PyTorch MergeBot authored
This reverts commit 500d0292. Reverted https://github.com/pytorch/pytorch/pull/138947 on behalf of https://github.com/atalman due to Breaks default windows checkout ([comment](https://github.com/pytorch/pytorch/pull/138947#issuecomment-2548998359))
Andy Lugo authored
Replaces https://github.com/ROCm/pytorch/pull/1592

This PR contains the initial implementation of SDPA with the composable_kernel backend. The CK path can be forced by simply calling `torch.backends.cuda.preferred_rocm_fa_library("ck")`. Similarly, you can force the incumbent aotriton implementation by passing in "aotriton" or "default". As you'd expect, not setting this option results in aotriton being used as the backend. In the case of CK, if PyTorch deems flash attention usable, it will use the CK path in all the same places aotriton would have been used. This PR makes no changes to the heuristics that select which attention scheme to use (i.e. flash attention vs. memory-efficient attention vs. math); the CK path is only invoked when flash attention is both enabled (via `USE_FLASH_ATTENTION`) and selected at runtime by the existing heuristics.

Files located in pytorch/aten/src/ATen/native/transformers/hip/flash_attn/ck/mha* have been pulled from https://github.com/Dao-AILab/flash-attention, courtesy of @tridao's hard work; he is a co-author of this PR.

NOTE: In order to use this backend, the user MUST set USE_CK_FLASH_ATTENTION=1 in their environment when building PyTorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138947
Approved by: https://github.com/pruthvistony, https://github.com/xw285cornell, https://github.com/leitian
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>
- Aug 24, 2024
Jonathan Deakin authored
Some historical commits from arm:
- 2021 664126ba
- 2023 26301447
- 2024 ce613001

See https://github.com/pytorch/pytorch/pull/126687 for initial discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133982
Approved by: https://github.com/malfet
- Mar 04, 2022
wayi1 authored
Summary: Implement hierarchical model averaging proposed in https://github.com/pytorch/pytorch/issues/71325. Unit tests are added. Since I don't have access to 4-GPU machines in the open-source environment, I expect that a branch with the `ci-all` prefix can run the test that requires 4 GPUs.

In the future, the internals of `PeriodicModelAveraging` can be simplified as a specialized case of hierarchical model averaging, where `period_group_size_dict` holds only a single period-to-world-size pair.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73285
Reviewed By: mrshenli
Differential Revision: D34457792
Pulled By: rohan-varma
fbshipit-source-id: 39a6c5bf8a2852b6394a56abbad17b8a909b9fba
(cherry picked from commit 5f543d46103edb515db199dbb80db43c85665f29)
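A rough sketch of how hierarchical model averaging is driven by `period_group_size_dict`, assuming the averager added by this PR is exposed as `HierarchicalModelAverager` under `torch.distributed.algorithms.model_averaging`; the module path, constructor arguments, and 16-rank topology shown here are assumptions for illustration, not taken verbatim from the PR:

```python
# Sketch (assumed API): hierarchical model averaging after local SGD steps.
# Launched with e.g. `torchrun --nproc_per_node=16 this_script.py`.
from collections import OrderedDict

import torch
import torch.distributed as dist
# Assumed module path for the averager introduced by this PR.
from torch.distributed.algorithms.model_averaging import hierarchical_model_averager as hma

dist.init_process_group("gloo")
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Average within 4-rank subgroups every 2 steps, and across all 16 ranks every 8 steps.
# A dict with a single period -> world-size entry degenerates to plain periodic averaging.
averager = hma.HierarchicalModelAverager(
    period_group_size_dict=OrderedDict([(2, 4), (8, 16)]),
    warmup_steps=10,
)

for step in range(100):
    optimizer.zero_grad()
    model(torch.randn(4, 10)).sum().backward()
    optimizer.step()
    # Averages parameters according to the hierarchy above (no-op during warmup
    # and on steps that do not hit any configured period).
    averager.average_parameters(model.parameters())
```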
- Oct 22, 2020
Pritam Damania authored
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44090

This is an initial commit pulling in the torchgpipe fork at https://github.com/facebookresearch/fairscale. The purpose of this commit is just to pull in the code and ensure all tests and builds work fine. We will slowly modify this to match our intended API mentioned in https://fb.quip.com/txurAV3zIFox#RPZACAfAKMq. Follow-up PRs will address further changes needed on top of this initial commit.

We're pulling the code into the `torch.distributed._pipeline.sync` package. The package is private on purpose since there is a lot of work (e.g. docs, API changes) that needs to go in before we can officially support it.

ghstack-source-id: 114864254
Test Plan: 1) waitforbuildbot 2) Ran all tests on my devgpu
Reviewed By: mrshenli
Differential Revision: D23493316
fbshipit-source-id: fe3c8b7dadeeb86abdc00e8a8652491b0b16743a
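Because this is a straight pull-in of the torchgpipe fork, usage presumably follows torchgpipe's `Pipe` wrapper; the sketch below assumes the private package keeps that interface and exposes `Pipe` at the stated path (both are assumptions, and the API was explicitly expected to change in follow-up PRs):

```python
# Sketch (assumed torchgpipe-style API) of the pipeline package pulled in here.
# Requires at least 2 GPUs; module path and Pipe signature are assumptions.
import torch
import torch.nn as nn
from torch.distributed._pipeline.sync import Pipe  # private package from this commit

# A toy three-layer model split into two pipeline stages.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)

# balance=[2, 1]: first two modules on cuda:0, the last one on cuda:1.
# chunks=8: each mini-batch is split into 8 micro-batches for pipelining.
pipe = Pipe(model, balance=[2, 1], chunks=8)

x = torch.randn(64, 1024, device="cuda:0")  # input lives on the first stage's device
out = pipe(x)                               # output comes back on the last stage's device
```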
- Jun 12, 2018
Edward Z. Yang authored
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
- Sep 28, 2017
Yangqing Jia authored
Summary: Closes https://github.com/caffe2/caffe2/pull/1260
Differential Revision: D5906739
Pulled By: Yangqing
fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
- Feb 09, 2017
Pieter Noordhuis authored
Summary: In the GitHub repository this directory will be mirrored similarly to folly, such that the repository has a single top-level directory called "gloo". This allows for versioning or renaming of the project root without having to mangle the include paths; they will always use the "gloo" prefix.

fbshipit-source-id: 24502e4185fc7cbe19b5249f83609e2b8118e9d7
- Jan 05, 2017
Bram Wasti authored
- Dec 09, 2016
Bram Wasti authored
- Nov 15, 2016
Yangqing Jia authored
- Sep 18, 2016
Adam Paszke authored
- Sep 07, 2016
Soumith Chintala authored
- Sep 04, 2015
Yangqing Jia authored
- Aug 28, 2015
Yangqing Jia authored
- Aug 08, 2015
Yangqing Jia authored
- Jul 07, 2015
Yangqing Jia authored
- Jun 25, 2015
Yangqing Jia authored
commits.