- Jul 24, 2013
Herbert Xu authored
This reverts commits 67822649, 39761214, 0b95a7f8, 31d93962 and 2d31e518. Unfortunately this change broke boot on some systems that used an initrd which does not include the newly created crct10dif modules. As these modules are required by sd_mod under certain configurations, this is a serious problem.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- Jun 21, 2013
Jussi Kivilinna authored
This reverts commit cf1521a1. The instruction (vpgatherdd) that this implementation relied on turned out to be a slow performer on real hardware (i5-4570). The previous 8-way twofish/AVX implementation is therefore faster, and this implementation should be removed. Converting this implementation to use the same method as in twofish/AVX for table look-ups would give an additional ~3% speed-up over twofish/AVX, but would hardly be worth the added code and binary size.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Jussi Kivilinna authored
This reverts commit 60488010. The instruction (vpgatherdd) that this implementation relied on turned out to be a slow performer on real hardware (i5-4570). The previous 4-way blowfish implementation is therefore faster, and this implementation should be removed.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- Jun 05, 2013
Jussi Kivilinna authored
It appears that the performance of 'vpgatherdd' is suboptimal for this kind of workload (tested on Core i5-4570) and causes blowfish-avx2 to be significantly slower than blowfish-amd64. So disable the AVX2 implementation to avoid performance regressions.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Jussi Kivilinna authored
It appears that the performance of 'vpgatherdd' is suboptimal for this kind of workload (tested on Core i5-4570) and causes twofish_avx2 to be significantly slower than twofish_avx. So disable the AVX2 implementation to avoid performance regressions.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- May 24, 2013
Tim Chen authored
Glue code that plugs the PCLMULQDQ-accelerated CRC T10 DIF hash into the crypto framework. The config option CRYPTO_CRCT10DIF_PCLMUL should be turned on to enable the feature. The crc_t10dif crypto library function will use this faster algorithm when the crct10dif_pclmul module is loaded.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
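As a sketch of the glue pattern (not the actual pclmul code): the module registers a shash_alg under the same cra_name as the generic implementation but with a higher cra_priority, so the crypto core prefers it once loaded. Here the table-driven crc_t10dif_generic() helper stands in for the PCLMULQDQ body, which the real module runs between kernel_fpu_begin()/kernel_fpu_end():

```c
#include <crypto/internal/hash.h>
#include <linux/crc-t10dif.h>
#include <linux/module.h>

/* Sketch only: crc_t10dif_generic() stands in for the SIMD body. */
static int chksum_init(struct shash_desc *desc)
{
	*(u16 *)shash_desc_ctx(desc) = 0;
	return 0;
}

static int chksum_update(struct shash_desc *desc, const u8 *data,
			 unsigned int length)
{
	u16 *crcp = shash_desc_ctx(desc);

	*crcp = crc_t10dif_generic(*crcp, data, length);
	return 0;
}

static int chksum_final(struct shash_desc *desc, u8 *out)
{
	*(u16 *)out = *(u16 *)shash_desc_ctx(desc);
	return 0;
}

static struct shash_alg alg = {
	.digestsize	= sizeof(u16),	/* CRC T10 DIF is a 16-bit CRC */
	.init		= chksum_init,
	.update		= chksum_update,
	.final		= chksum_final,
	.descsize	= sizeof(u16),
	.base		= {
		.cra_name	 = "crct10dif",		/* same name as generic */
		.cra_driver_name = "crct10dif-pclmul",
		.cra_priority	 = 200,	/* above the generic's 100, so preferred */
		.cra_blocksize	 = 1,
		.cra_module	 = THIS_MODULE,
	},
};

static int __init crct10dif_sketch_init(void)
{
	return crypto_register_shash(&alg);
}
module_init(crct10dif_sketch_init);
MODULE_LICENSE("GPL");
```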
- May 20, 2013
Tim Chen authored
When CRC T10 DIF is calculated using the crypto transform framework, we wrap the crc_t10dif function call to utilize it. This allows us to take advantage of any accelerated CRC T10 DIF transform that is plugged into the crypto framework.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
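Call sites keep using the one-shot library helper; a minimal sketch of a hypothetical caller (the sector-size framing is illustrative):

```c
#include <linux/crc-t10dif.h>

/* Compute the 16-bit T10 DIF guard tag over a 512-byte sector.
 * crc_t10dif() transparently uses whatever "crct10dif" transform the
 * crypto framework selected (generic or accelerated). */
static u16 compute_guard_tag(const u8 *sector)
{
	return crc_t10dif(sector, 512);
}
```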
- Apr 25, 2013
Jussi Kivilinna authored
Patch adds an AVX2/AES-NI/x86-64 implementation of the Camellia cipher, requiring 32 parallel blocks of input (512 bytes). Compared to the AVX implementation, this version is extended to use the 256-bit wide YMM registers. For the AES-NI instructions, data is split into two 128-bit registers and merged afterwards. Even with this additional handling, performance should be higher than the AES-NI/AVX implementation.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
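The split-and-merge handling can be pictured with intrinsics; a standalone illustration of the technique (compile with -mavx2 -maes), not the kernel's hand-written assembly:

```c
#include <immintrin.h>

/* Apply one AES round (AESENC) to a 256-bit vector: AES-NI only
 * operates on XMM registers, so split the YMM register into two
 * 128-bit lanes, process each, then merge the halves back. */
static __m256i aesenc_ymm(__m256i state, __m128i round_key)
{
	__m128i lo = _mm256_castsi256_si128(state);
	__m128i hi = _mm256_extracti128_si256(state, 1);

	lo = _mm_aesenc_si128(lo, round_key);
	hi = _mm_aesenc_si128(hi, round_key);

	return _mm256_inserti128_si256(_mm256_castsi128_si256(lo), hi, 1);
}
```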
Jussi Kivilinna authored
Patch adds an AVX2/x86-64 implementation of the Serpent cipher, requiring 16 parallel blocks of input (256 bytes). The implementation is based on the AVX implementation and is extended to use the 256-bit wide YMM registers. Since Serpent does not use table look-ups, this implementation should be close to twice as fast as the AVX implementation.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Jussi Kivilinna authored
Patch adds an AVX2/x86-64 implementation of the Twofish cipher, requiring 16 parallel blocks of input (256 bytes). Table look-ups are performed with the vpgatherdd instruction directly from vector registers and thus should be faster than in earlier implementations. The implementation also uses the 256-bit wide YMM registers, which should give an additional speed-up over the AVX implementation.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
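In intrinsics form, a vpgatherdd s-box look-up is a single gather; an illustration of the technique rather than the actual assembly:

```c
#include <immintrin.h>

/* Gather eight 32-bit s-box entries in one instruction: each lane of
 * 'indices' selects table[index], with a scale of 4 bytes per entry.
 * This is what vpgatherdd does for the vectorized table look-ups. */
static __m256i sbox_lookup(const unsigned int *table, __m256i indices)
{
	return _mm256_i32gather_epi32((const int *)table, indices, 4);
}
```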
Jussi Kivilinna authored
Patch adds an AVX2/x86-64 implementation of the Blowfish cipher, requiring 32 parallel blocks of input (256 bytes). Table look-ups are performed with the vpgatherdd instruction directly from vector registers and thus should be faster than in earlier implementations.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Jussi Kivilinna authored
The Kconfig setting for the glue helper module is CRYPTO_GLUE_HELPER_X86, but a recent change to aesni_intel used CRYPTO_GLUE_HELPER instead. Patch corrects this issue.
Cc: kbuild-all@01.org
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Jussi Kivilinna authored
Add more optimized XTS code for aesni_intel in 64-bit mode, for smaller stack usage and a speed boost. tcrypt results (old vs. new ratios), with an Intel i5-2450M:

256-bit key:
  size   enc    dec
  16B    0.98x  0.99x
  64B    0.64x  0.63x
  256B   1.29x  1.32x
  1024B  1.54x  1.58x
  8192B  1.57x  1.60x

512-bit key:
  size   enc    dec
  16B    0.98x  0.99x
  64B    0.60x  0.59x
  256B   1.24x  1.25x
  1024B  1.39x  1.42x
  8192B  1.38x  1.42x

I chose not to optimize block sizes smaller than 256 bytes, since XTS is practically always used with data blocks of 512 bytes. This is why performance is reduced in tcrypt for 64-byte blocks.
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
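A minimal sketch of exercising this mode through the crypto API, written against today's skcipher interface rather than the blkcipher-era API of this commit (the sector framing and helper name are illustrative):

```c
#include <crypto/skcipher.h>
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/* Encrypt one 512-byte sector in place with xts(aes).
 * 'key' is 2 x 32 bytes for aes-256-xts; 'iv' carries the sector number. */
static int xts_encrypt_sector(u8 *buf, const u8 *key, u8 *iv)
{
	struct crypto_skcipher *tfm;
	struct skcipher_request *req;
	struct scatterlist sg;
	DECLARE_CRYPTO_WAIT(wait);
	int err;

	tfm = crypto_alloc_skcipher("xts(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_skcipher_setkey(tfm, key, 64);
	if (err)
		goto out;

	req = skcipher_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		err = -ENOMEM;
		goto out;
	}

	sg_init_one(&sg, buf, 512);
	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP,
				      crypto_req_done, &wait);
	skcipher_request_set_crypt(req, &sg, &sg, 512, iv);
	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);

	skcipher_request_free(req);
out:
	crypto_free_skcipher(tfm);
	return err;
}
```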
Jussi Kivilinna authored
Patch adds support for the NIST-recommended block cipher mode CMAC to the Crypto API. This work is based on Tom St Denis' earlier patch: http://marc.info/?l=linux-crypto-vger&m=135877306305466&w=2
Cc: Tom St Denis <tstdenis@elliptictech.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
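With the template in place, an AES-CMAC can be computed through the regular shash interface; a minimal sketch (the helper name is illustrative):

```c
#include <crypto/hash.h>
#include <linux/err.h>

/* Compute a 16-byte AES-CMAC tag over 'data'. */
static int compute_cmac(const u8 *key, unsigned int keylen,
			const u8 *data, unsigned int len, u8 *mac)
{
	struct crypto_shash *tfm;
	int err;

	tfm = crypto_alloc_shash("cmac(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_shash_setkey(tfm, key, keylen);
	if (!err) {
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;
		err = crypto_shash_digest(desc, data, len, mac);
	}

	crypto_free_shash(tfm);
	return err;
}
```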
Jussi Kivilinna authored
The GMAC code assumes that dst == src, which causes problems when trying to add rfc4543(gcm(aes)) test vectors. So fix this code to work when the source and destination buffers are different.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
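The rfc4543 wrapping mentioned above treats all input as associated data and emits only the tag (GMAC); a minimal allocation sketch against the modern aead API, with the request setup (where src and dst may now legitimately differ) omitted:

```c
#include <crypto/aead.h>
#include <linux/err.h>

/* Allocate the transform the new test vectors exercise. rfc4543 wraps
 * GCM so that all input is authenticated but none is encrypted, and
 * only the 16-byte authentication tag is produced. */
static struct crypto_aead *alloc_gmac_tfm(void)
{
	struct crypto_aead *tfm = crypto_alloc_aead("rfc4543(gcm(aes))", 0, 0);

	if (IS_ERR(tfm))
		return tfm;
	if (crypto_aead_setauthsize(tfm, 16)) {	/* full 128-bit tag */
		crypto_free_aead(tfm);
		return ERR_PTR(-EINVAL);
	}
	return tfm;
}
```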
Tim Chen authored
crypto: sha512 - Create module providing optimized SHA512 routines using SSSE3, AVX or AVX2 instructions. We added glue code and config options to create a crypto module that uses the SSE/AVX/AVX2-optimized SHA512 x86_64 assembly routines.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tim Chen authored
crypto: sha256 - Create module providing optimized SHA256 routines using SSSE3, AVX or AVX2 instructions. We added glue code and config options to create a crypto module that uses the SSE/AVX/AVX2-optimized SHA256 x86_64 assembly routines.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- Feb 26, 2013
Herbert Xu authored
This bool option can never be set to anything other than y. So let's just kill it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- Jan 19, 2013
Alexander Boyko authored
This patch adds crc32 algorithms to the shash crypto API. One is a wrapper around the generic crc32_le function. The second is a crc32 pclmulqdq implementation, which uses the hardware-provided PCLMULQDQ instruction to accelerate the CRC32 computation. This instruction is present from Intel Westmere and AMD Bulldozer CPUs onward. On an Intel Core i5 I got 450 MB/s for the table implementation and 2100 MB/s for the pclmulqdq implementation.
Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
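Once registered as a shash, the algorithm is also reachable from user space through AF_ALG, which resolves "crc32" to the highest-priority provider (the pclmulqdq driver when available, otherwise the generic wrapper). A rough userspace sketch; the 4-byte ALG_SET_KEY payload seeds the CRC state:

```c
/* Compute a CRC32 via the kernel's AF_ALG hash interface. */
#include <linux/if_alg.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type   = "hash",
		.salg_name   = "crc32",
	};
	unsigned char seed[4] = { 0xff, 0xff, 0xff, 0xff }; /* initial CRC state */
	unsigned char out[4];
	const char msg[] = "hello";
	int tfm, op;

	tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (tfm < 0 || bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		return 1;
	if (setsockopt(tfm, SOL_ALG, ALG_SET_KEY, seed, sizeof(seed)) < 0)
		return 1;
	op = accept(tfm, NULL, 0);
	if (op < 0)
		return 1;
	if (write(op, msg, sizeof(msg) - 1) != sizeof(msg) - 1)
		return 1;
	if (read(op, out, sizeof(out)) != sizeof(out))
		return 1;
	printf("crc32: %02x%02x%02x%02x\n", out[0], out[1], out[2], out[3]);
	return 0;
}
```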
- Jan 11, 2013
Kees Cook authored
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David S. Miller <davem@davemloft.net>
- Jan 10, 2013
Michael Ellerman authored
This patch adds a crypto driver which provides a powerpc-accelerated implementation of SHA-1, accelerated in that it is written in asm. Original patch by Paul, minor fixups for upstream by moi. Lightly tested on 64-bit with the test program here: http://michael.ellerman.id.au/files/junkcode/sha1test.c Seems to work, and is "not slower" than the generic version. Needs testing on 32-bit.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
- Dec 06, 2012
Jussi Kivilinna authored
CAST5 and CAST6 both use the same lookup tables, which can be moved into a shared module, 'cast_common'.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- Nov 09, 2012
Jussi Kivilinna authored
This patch adds an AES-NI/AVX/x86_64 assembler implementation of the Camellia block cipher. The implementation processes data in sixteen-block chunks, which are byte-sliced, and the AES SubBytes operation is reused for the Camellia s-box with the help of pre- and post-filtering. Patch has been tested with tcrypt and automated filesystem tests. tcrypt test results, Intel Core i5-2450M, camellia-aesni-avx vs. camellia-asm-x86_64-2way:

128-bit key (lrw: 256-bit, xts: 256-bit):
  size   ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
  16B    0.98x   0.96x   0.99x   0.96x   0.96x   0.95x   0.95x   0.94x   0.97x   0.98x
  64B    0.99x   0.98x   1.00x   0.98x   0.98x   0.99x   0.98x   0.93x   0.99x   0.98x
  256B   2.28x   2.28x   1.01x   2.29x   2.25x   2.24x   1.96x   1.97x   1.91x   1.90x
  1024B  2.57x   2.56x   1.00x   2.57x   2.51x   2.53x   2.19x   2.17x   2.19x   2.22x
  8192B  2.49x   2.49x   1.00x   2.53x   2.48x   2.49x   2.17x   2.17x   2.22x   2.22x

256-bit key (lrw: 384-bit, xts: 512-bit):
  size   ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
  16B    0.97x   0.98x   0.99x   0.97x   0.97x   0.96x   0.97x   0.98x   0.98x   0.99x
  64B    1.00x   1.00x   1.01x   0.99x   0.98x   0.99x   0.99x   0.99x   0.99x   0.99x
  256B   2.37x   2.37x   1.01x   2.39x   2.35x   2.33x   2.10x   2.11x   1.99x   2.02x
  1024B  2.58x   2.60x   1.00x   2.58x   2.56x   2.56x   2.28x   2.29x   2.28x   2.29x
  8192B  2.50x   2.52x   1.00x   2.56x   2.51x   2.51x   2.24x   2.25x   2.26x   2.29x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- Oct 15, 2012
Tim Chen authored
This patch adds the crc_pcl function that calculates the CRC32C checksum using the PCLMULQDQ instruction on processors that support this feature, providing a speedup over using the CRC32 instruction alone. The use of PCLMULQDQ necessitates invoking kernel_fpu_begin and kernel_fpu_end and incurs some overhead, so the new crc_pcl function is only invoked for buffers of 512 bytes or more; larger buffers can expect to see a greater speedup. This feature is best used coupled with eager_fpu, which reduces the kernel_fpu_begin/end overhead. For a buffer size of 1K the speedup is around 1.6x, and for buffer sizes greater than 4K the speedup is around 3x compared to the original implementation in the crc32c-intel module. Testing was performed on a Sandy Bridge based platform with the CPU set to a constant frequency. A white paper detailing the algorithm can be found here: http://download.intel.com/design/intarch/papers/323405.pdf
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
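Schematically, the dispatch described above is a length check plus FPU bracketing. A simplified sketch, with crc32c_hw() standing in for the plain CRC32-instruction path and header/prototype spellings as in current kernels rather than the tree of the time:

```c
#include <asm/fpu/api.h>	/* kernel_fpu_begin/end, irq_fpu_usable */
#include <linux/linkage.h>
#include <linux/types.h>

#define PCL_BREAKEVEN 512	/* below this, PCLMULQDQ setup cost dominates */

asmlinkage unsigned int crc_pcl(const u8 *buffer, int len, unsigned int crc_init);
extern u32 crc32c_hw(u32 crc, const u8 *data, size_t len);	/* stand-in */

static u32 crc32c_update(u32 crc, const u8 *data, size_t len)
{
	/* Only pay for kernel_fpu_begin()/end() when the buffer is large
	 * enough for the PCLMULQDQ version to win, and only when the FPU
	 * is usable in this context. */
	if (len >= PCL_BREAKEVEN && irq_fpu_usable()) {
		u32 ret;

		kernel_fpu_begin();
		ret = crc_pcl(data, len, crc);
		kernel_fpu_end();
		return ret;
	}
	return crc32c_hw(crc, data, len);
}
```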
- Oct 08, 2012
David Howells authored
Create a key type that can be used to represent an asymmetric key for use in appropriate cryptographic operations, such as encryption, decryption, signature generation and signature verification. The key type is "asymmetric" and can provide access to a variety of cryptographic algorithms. Possibly this would be better named "public_key", but that has the disadvantage that "public key" is an overloaded term.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
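From user space such a key is instantiated by handing a payload blob to add_key(2); a small sketch using libkeyutils, assuming an X.509 DER parser is available for the "asymmetric" type (cert/cert_len are assumed to hold the DER blob):

```c
/* Load a DER-encoded certificate as an "asymmetric" key on the session
 * keyring. The description is left empty; the key type derives one
 * from the payload itself. */
#include <keyutils.h>
#include <stdio.h>

int load_cert(const void *cert, size_t cert_len)
{
	key_serial_t key = add_key("asymmetric", "", cert, cert_len,
				   KEY_SPEC_SESSION_KEYRING);

	if (key < 0) {
		perror("add_key");
		return -1;
	}
	printf("loaded key %d\n", (int)key);
	return 0;
}
```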
- Oct 03, 2012
Dave Jones authored
Asking for this option on x86 seems a bit pointless.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Sep 06, 2012
David McCullough authored
Add assembler versions of AES and SHA1 for ARM platforms. This has provided up to a 50% improvement in IPsec/TCP throughput for tunnels using AES128/SHA1.

  Platform  CPU speed  Endian  Before (bps)  After (bps)  Improvement
  IXP425    533 MHz    big     11217042      15566294     ~38%
  KS8695    166 MHz    little  3828549       5795373      ~51%

Signed-off-by: David McCullough <ucdevel@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- Aug 29, 2012
David S. Miller authored
Signed-off-by: David S. Miller <davem@davemloft.net>
- Aug 26, 2012
David S. Miller authored
Signed-off-by: David S. Miller <davem@davemloft.net>
- Aug 23, 2012
David S. Miller authored
Signed-off-by: David S. Miller <davem@davemloft.net>
- Aug 22, 2012
David S. Miller authored
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
- Aug 20, 2012
David S. Miller authored
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
David S. Miller authored
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
David S. Miller authored
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
David S. Miller authored
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Jussi Kivilinna authored
crypto: aesni_intel - improve lrw and xts performance by utilizing parallel AES-NI hardware pipelines. Use parallel LRW and XTS encryption facilities to better utilize the AES-NI hardware pipelines and gain extra performance. Tcrypt benchmark results (async), old vs. new ratios, Intel Core i5-2450M CPU (fam: 6, model: 42, step: 7):

aes: 128-bit, lrw: 256-bit, xts: 256-bit:
  size   lrw-enc lrw-dec xts-enc xts-dec
  16B    0.99x   1.00x   1.22x   1.19x
  64B    1.38x   1.50x   1.58x   1.61x
  256B   2.04x   2.02x   2.27x   2.29x
  1024B  2.56x   2.54x   2.89x   2.92x
  8192B  2.85x   2.99x   3.40x   3.23x

aes: 192-bit, lrw: 320-bit, xts: 384-bit:
  size   lrw-enc lrw-dec xts-enc xts-dec
  16B    1.08x   1.08x   1.16x   1.17x
  64B    1.48x   1.54x   1.59x   1.65x
  256B   2.18x   2.17x   2.29x   2.28x
  1024B  2.67x   2.67x   2.87x   3.05x
  8192B  2.93x   2.84x   3.28x   3.33x

aes: 256-bit, lrw: 384-bit, xts: 512-bit:
  size   lrw-enc lrw-dec xts-enc xts-dec
  16B    1.07x   1.07x   1.18x   1.19x
  64B    1.56x   1.56x   1.70x   1.71x
  256B   2.22x   2.24x   2.46x   2.46x
  1024B  2.76x   2.77x   3.13x   3.05x
  8192B  2.99x   3.05x   3.40x   3.30x

Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Reviewed-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- Aug 01, 2012
Seth Jennings authored
This patch adds the 842 cryptographic API driver that submits compression requests to the 842 hardware compression accelerator driver (nx-compress). If the hardware accelerator goes offline for any reason (dynamic disable, migration, etc.), this driver will use LZO as a software failover for all future compression requests. For decompression requests, the 842 hardware driver contains a software implementation of the 842 decompressor to support decompression of data that was compressed before the accelerator went offline.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
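The failover amounts to trying the accelerator and falling back to the software compressor on any error; a schematic sketch in which hw_842_available()/hw_842_compress() are hypothetical stand-ins for the real nx-compress entry points:

```c
#include <linux/lzo.h>
#include <linux/types.h>

/* Hypothetical accelerator hooks, for illustration only. */
extern bool hw_842_available(void);
extern int hw_842_compress(const u8 *src, unsigned int slen,
			   u8 *dst, unsigned int *dlen);

static int compress_842_or_lzo(const u8 *src, unsigned int slen,
			       u8 *dst, unsigned int *dlen, void *wrkmem)
{
	size_t out = *dlen;
	int err;

	if (hw_842_available()) {
		err = hw_842_compress(src, slen, dst, dlen);
		if (!err)
			return 0;	/* hardware path succeeded */
	}

	/* Accelerator offline or failed: fall back to LZO. The output is
	 * then LZO-format data, which the driver must flag so that
	 * decompression takes the matching software path. */
	err = lzo1x_1_compress(src, slen, dst, &out, wrkmem);
	*dlen = out;
	return err;
}
```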
Johannes Goetzfried authored
This patch adds a x86_64/avx assembler implementation of the Cast6 block cipher. The implementation processes eight blocks in parallel (two 4-block chunk AVX operations). The table look-ups are done in general-purpose registers. For small block sizes the functions from the generic module are called. A good performance increase is provided for block sizes greater than or equal to 128B. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results, Intel Core i5-2500 CPU (fam:6, model:42, step:7), cast6-avx-x86_64 vs. cast6-generic:

128-bit key (lrw: 256-bit, xts: 256-bit):
  size   ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
  16B    0.97x   1.00x   1.01x   1.01x   0.99x   0.97x   0.98x   1.01x   0.96x   0.98x
  64B    0.98x   0.99x   1.02x   1.01x   0.99x   1.00x   1.01x   0.99x   1.00x   0.99x
  256B   1.77x   1.84x   0.99x   1.85x   1.77x   1.77x   1.70x   1.74x   1.69x   1.72x
  1024B  1.93x   1.95x   0.99x   1.96x   1.93x   1.93x   1.84x   1.85x   1.89x   1.87x
  8192B  1.91x   1.95x   0.99x   1.97x   1.95x   1.91x   1.86x   1.87x   1.93x   1.90x

256-bit key (lrw: 384-bit, xts: 512-bit):
  size   ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
  16B    0.97x   0.99x   1.02x   1.01x   0.98x   0.99x   1.00x   1.00x   0.98x   0.98x
  64B    0.98x   0.99x   1.01x   1.00x   1.00x   1.00x   1.01x   1.01x   0.97x   1.00x
  256B   1.77x   1.83x   1.00x   1.86x   1.79x   1.78x   1.70x   1.76x   1.71x   1.69x
  1024B  1.92x   1.95x   0.99x   1.96x   1.93x   1.93x   1.83x   1.86x   1.89x   1.87x
  8192B  1.94x   1.95x   0.99x   1.97x   1.95x   1.95x   1.87x   1.87x   1.93x   1.91x

Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Johannes Goetzfried authored
This patch adds a x86_64/avx assembler implementation of the Cast5 block cipher. The implementation processes sixteen blocks in parallel (four 4-block chunk AVX operations). The table look-ups are done in general-purpose registers. For small block sizes the functions from the generic module are called. A good performance increase is provided for block sizes greater than or equal to 128B. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results, Intel Core i5-2500 CPU (fam:6, model:42, step:7), cast5-avx-x86_64 vs. cast5-generic:

64-bit key:
  size   ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
  16B    0.99x   0.99x   1.00x   1.00x   1.02x   1.01x
  64B    1.00x   1.00x   0.98x   1.00x   1.01x   1.02x
  256B   2.03x   2.01x   0.95x   2.11x   2.12x   2.13x
  1024B  2.30x   2.24x   0.95x   2.29x   2.35x   2.35x
  8192B  2.31x   2.27x   0.95x   2.31x   2.39x   2.39x

128-bit key:
  size   ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
  16B    0.99x   0.99x   1.00x   1.00x   1.01x   1.01x
  64B    1.00x   1.00x   0.98x   1.01x   1.02x   1.01x
  256B   2.17x   2.13x   0.96x   2.19x   2.19x   2.19x
  1024B  2.29x   2.32x   0.95x   2.34x   2.37x   2.38x
  8192B  2.35x   2.32x   0.95x   2.35x   2.39x   2.39x

Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
- Jun 27, 2012
Sebastian Andrzej Siewior authored
Since commit ce6dd368 ("crypto: arc4 - improve performance by adding ecb(arc4)") we need to pull in a blkcipher:

  ERROR: "crypto_blkcipher_type" [crypto/arc4.ko] undefined!
  ERROR: "blkcipher_walk_done" [crypto/arc4.ko] undefined!
  ERROR: "blkcipher_walk_virt" [crypto/arc4.ko] undefined!

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>