- Jan 03, 2025
Xiaodong Wang authored
Replaces https://github.com/pytorch/pytorch/pull/138947 for re-import. Replaces https://github.com/ROCm/pytorch/pull/1592

This PR contains the initial implementation of SDPA with the composable_kernel backend. The CK path can be forced by simply calling `torch.backends.cuda.preferred_rocm_fa_library("ck")`. Similarly, you can force the incumbent aotriton implementation by passing in "aotriton" or "default". As you'd expect, not setting this option results in aotriton being used as the backend. In the case of CK, if PyTorch deems flash attention usable, it will use the CK path in all the same places aotriton would have been used. This PR makes no changes to the heuristics that select which attention scheme to use (i.e. flash attention vs. memory-efficient attention vs. math); the CK path is only invoked when flash attention is both enabled (via `USE_FLASH_ATTENTION`) and selected at runtime by the existing heuristics.

Files located in pytorch/aten/src/ATen/native/transformers/hip/flash_attn/ck/mha* have been pulled from https://github.com/Dao-AILab/flash-attention, courtesy of @tridao's hard work; he is a co-author of this PR.

NOTE: In order to use this backend, the user MUST set USE_CK_FLASH_ATTENTION=1 in their environment when building PyTorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143695
Approved by: https://github.com/malfet
Co-authored-by: Andy Lugo <Andy.LugoReyes@amd.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
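A minimal sketch of how this backend switch is meant to be used, assuming a ROCm build of PyTorch compiled with USE_CK_FLASH_ATTENTION=1; the `preferred_rocm_fa_library` call is the one named in the commit message, while the tensor shapes and dtypes are illustrative only:

```python
# Sketch: opting into the composable_kernel (CK) flash-attention path on ROCm.
# Assumes PyTorch was built with USE_CK_FLASH_ATTENTION=1 (build-time env var).
import torch
import torch.nn.functional as F

# Force the CK backend; passing "aotriton" or "default" instead keeps the
# incumbent aotriton implementation.
torch.backends.cuda.preferred_rocm_fa_library("ck")

q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)

# The existing SDPA heuristics still decide whether flash attention runs at all;
# when they pick flash attention, the CK kernels are dispatched instead of aotriton.
out = F.scaled_dot_product_attention(q, k, v)
```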
- Dec 17, 2024
PyTorch MergeBot authored
This reverts commit 500d0292. Reverted https://github.com/pytorch/pytorch/pull/138947 on behalf of https://github.com/atalman due to Breaks default windows checkout ([comment](https://github.com/pytorch/pytorch/pull/138947#issuecomment-2548998359))
Andy Lugo authored
Replaces https://github.com/ROCm/pytorch/pull/1592

This PR contains the initial implementation of SDPA with the composable_kernel backend. The CK path can be forced by simply calling `torch.backends.cuda.preferred_rocm_fa_library("ck")`. Similarly, you can force the incumbent aotriton implementation by passing in "aotriton" or "default". As you'd expect, not setting this option results in aotriton being used as the backend. In the case of CK, if PyTorch deems flash attention usable, it will use the CK path in all the same places aotriton would have been used. This PR makes no changes to the heuristics that select which attention scheme to use (i.e. flash attention vs. memory-efficient attention vs. math); the CK path is only invoked when flash attention is both enabled (via `USE_FLASH_ATTENTION`) and selected at runtime by the existing heuristics.

Files located in pytorch/aten/src/ATen/native/transformers/hip/flash_attn/ck/mha* have been pulled from https://github.com/Dao-AILab/flash-attention, courtesy of @tridao's hard work; he is a co-author of this PR.

NOTE: In order to use this backend, the user MUST set USE_CK_FLASH_ATTENTION=1 in their environment when building PyTorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138947
Approved by: https://github.com/pruthvistony, https://github.com/xw285cornell, https://github.com/leitian
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>
- Aug 24, 2024
Jonathan Deakin authored
Some historical commits from arm:
- 2021 664126ba
- 2023 26301447
- 2024 ce613001

See https://github.com/pytorch/pytorch/pull/126687 for initial discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133982
Approved by: https://github.com/malfet
- Mar 04, 2022
wayi1 authored
Summary: Implement hierarchical model averaging proposed in https://github.com/pytorch/pytorch/issues/71325. Unit tests are added. Since I don't have access to 4-GPU machines in the open-source environment, I expect that a branch with the `ci-all` prefix can run the test that requires 4 GPUs.

In the future, the internals of `PeriodicModelAveraging` can be simplified as a specialized case of hierarchical model averaging, where `period_group_size_dict` holds only a single period-to-world-size pair.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73285
Reviewed By: mrshenli
Differential Revision: D34457792
Pulled By: rohan-varma
fbshipit-source-id: 39a6c5bf8a2852b6394a56abbad17b8a909b9fba
(cherry picked from commit 5f543d46103edb515db199dbb80db43c85665f29)
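A rough sketch of how hierarchical model averaging is driven by `period_group_size_dict`, assuming the averager added by this PR is exposed as `HierarchicalModelAverager` under `torch.distributed.algorithms.model_averaging`; the module path, constructor arguments, and 16-rank topology shown here are assumptions for illustration, not taken verbatim from the PR:

```python
# Sketch (assumed API): hierarchical model averaging after local SGD steps.
# Launched with e.g. `torchrun --nproc_per_node=16 this_script.py`.
from collections import OrderedDict

import torch
import torch.distributed as dist
# Assumed module path for the averager introduced by this PR.
from torch.distributed.algorithms.model_averaging import hierarchical_model_averager as hma

dist.init_process_group("gloo")
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Average within 4-rank subgroups every 2 steps, and across all 16 ranks every 8 steps.
# A dict with a single period -> world-size entry degenerates to plain periodic averaging.
averager = hma.HierarchicalModelAverager(
    period_group_size_dict=OrderedDict([(2, 4), (8, 16)]),
    warmup_steps=10,
)

for step in range(100):
    optimizer.zero_grad()
    model(torch.randn(4, 10)).sum().backward()
    optimizer.step()
    # Averages parameters according to the hierarchy above (no-op during warmup
    # and on steps that do not hit any configured period).
    averager.average_parameters(model.parameters())
```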
- Oct 22, 2020
Pritam Damania authored
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44090

This is an initial commit pulling in the torchgpipe fork at https://github.com/facebookresearch/fairscale. The purpose of this commit is just to pull in the code and ensure all tests and builds work fine. We will slowly modify this to match our intended API mentioned in https://fb.quip.com/txurAV3zIFox#RPZACAfAKMq. Follow-up PRs will address further changes needed on top of this initial commit.

We're pulling the code into the `torch.distributed._pipeline.sync` package. The package is private on purpose since there is a lot of work (e.g. docs, API changes) that needs to go in before we can officially support it.

ghstack-source-id: 114864254
Test Plan: 1) waitforbuildbot 2) Ran all tests on my devgpu
Reviewed By: mrshenli
Differential Revision: D23493316
fbshipit-source-id: fe3c8b7dadeeb86abdc00e8a8652491b0b16743a
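Because this is a straight pull-in of the torchgpipe fork, usage presumably follows torchgpipe's `Pipe` wrapper; the sketch below assumes the private package keeps that interface and exposes `Pipe` at the stated path (both are assumptions, and the API was explicitly expected to change in follow-up PRs):

```python
# Sketch (assumed torchgpipe-style API) of the pipeline package pulled in here.
# Requires at least 2 GPUs; module path and Pipe signature are assumptions.
import torch
import torch.nn as nn
from torch.distributed._pipeline.sync import Pipe  # private package from this commit

# A toy three-layer model split into two pipeline stages.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)

# balance=[2, 1]: first two modules on cuda:0, the last one on cuda:1.
# chunks=8: each mini-batch is split into 8 micro-batches for pipelining.
pipe = Pipe(model, balance=[2, 1], chunks=8)

x = torch.randn(64, 1024, device="cuda:0")  # input lives on the first stage's device
out = pipe(x)                               # output comes back on the last stage's device
```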
- Jun 12, 2018
Edward Z. Yang authored
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
- Sep 28, 2017
Yangqing Jia authored
Summary: Closes https://github.com/caffe2/caffe2/pull/1260
Differential Revision: D5906739
Pulled By: Yangqing
fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
- Feb 09, 2017
Pieter Noordhuis authored
Summary: In the GitHub repository this directory will be mirrored similarly to folly, such that the repository has a single top-level directory called "gloo". This allows for versioning or renaming of the project root without having to mangle the include paths; they will always use the "gloo" prefix.

fbshipit-source-id: 24502e4185fc7cbe19b5249f83609e2b8118e9d7
- Jan 05, 2017
Bram Wasti authored
- Dec 09, 2016
Bram Wasti authored
- Nov 15, 2016
Yangqing Jia authored
- Sep 18, 2016
Adam Paszke authored
- Sep 07, 2016
Soumith Chintala authored
- Sep 04, 2015
Yangqing Jia authored
- Aug 28, 2015
Yangqing Jia authored
- Aug 08, 2015
Yangqing Jia authored
- Jul 07, 2015
Yangqing Jia authored
- Jun 25, 2015
Yangqing Jia authored
commits.