- Jul 14, 2021
-
Pavel Begunkov authored
commit 976517f1 upstream. There is a complaint against sys_io_uring_enter() blocking if it submits stdin reads. The problem is in __io_file_supports_async(), which sees that it's a cdev and allows it to be processed inline. Punt char devices using generic rules of io_file_supports_async(), including checking for presence of *_iter() versions of rw callbacks. Apparently, it will affect most of cdevs with some exceptions like null and zero devices. Cc: stable@vger.kernel.org Reported-by:
Birk Hirdman <lonjil@gmail.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d60270856b8a4560a639ef5f76e55eb563633599.1623236455.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Long Li authored
commit c9c9762d upstream. After commit 07173c3e ("block: enable multipage bvecs"), a bvec can have multiple pages. But bio_will_gap() still assumes one page bvec while checking for merging. If the pages in the bvec go across the seg_boundary_mask, this check for merging can potentially succeed if only the 1st page is tested, and can fail if all the pages are tested. Later, when SCSI builds the SG list the same check for merging is done in __blk_segment_map_sg_merge() with all the pages in the bvec tested. This time the check may fail if the pages in bvec go across the seg_boundary_mask (but tested okay in bio_will_gap() earlier, so those BIOs were merged). If this check fails, we end up with a broken SG list for drivers assuming the SG list not having offsets in intermediate pages. This results in incorrect pages written to the disk. Fix this by returning the multi-page bvec when testing gaps for merging. Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 07173c3e ("block: enable multipage bvecs") Signed-off-by:
Long Li <longli@microsoft.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/1623094445-22332-1-git-send-email-longli@linuxonhyperv.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wei Yongjun authored
commit 0508c1ad upstream. 'ret' will be overwritten to 0 if erofs_sb_has_sb_chksum() returns true, so 0 will be returned in some error handling cases. Fix this by returning the negative error code -EINVAL instead of 0. Link: https://lore.kernel.org/r/20210519141657.3062715-1-weiyongjun1@huawei.com Fixes: b858a484 ("erofs: support superblock checksum") Cc: stable <stable@vger.kernel.org> # 5.5+ Reported-by:
Hulk Robot <hulkci@huawei.com> Signed-off-by:
Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by:
Gao Xiang <xiang@kernel.org> Reviewed-by:
Chao Yu <yuchao0@huawei.com> Signed-off-by:
Gao Xiang <xiang@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
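The bug pattern in the erofs commit above, where a reused status variable is overwritten to 0 and a later error path returns that stale value, can be sketched in plain C. This is only an illustration: verify_checksum(), read_super_buggy() and read_super_fixed() are hypothetical names and do not reproduce the erofs code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a checksum verifier: 0 on success. */
static int verify_checksum(bool ok)
{
	return ok ? 0 : -EBADMSG;
}

/* Buggy shape: a successful checksum step leaves ret == 0, so the
 * later "bad layout" error path returns 0 instead of an error. */
static int read_super_buggy(bool has_chksum, bool chksum_ok, bool layout_ok)
{
	int ret = -EINVAL;

	if (has_chksum) {
		ret = verify_checksum(chksum_ok);
		if (ret)
			goto out;
	}
	if (!layout_ok)
		goto out;	/* BUG: ret may be 0 here */
	ret = 0;
out:
	return ret;
}

/* Fixed shape: reset ret to -EINVAL after the step that may clear it. */
static int read_super_fixed(bool has_chksum, bool chksum_ok, bool layout_ok)
{
	int ret;

	if (has_chksum) {
		ret = verify_checksum(chksum_ok);
		if (ret)
			goto out;
	}
	ret = -EINVAL;
	if (!layout_ok)
		goto out;
	ret = 0;
out:
	return ret;
}
```

With a valid checksum and a bad layout, the buggy variant reports success while the fixed variant returns -EINVAL.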
-
Jarkko Sakkinen authored
commit 0178f9d0 upstream. Do not tear down the system when getting invalid status from a TPM chip. This can happen when panic-on-warn is used. Instead, introduce TPM_TIS_INVALID_STATUS bitflag and use it to trigger once the error reporting per chip. In addition, print out the value of TPM_STS for improved forensics. Link: https://lore.kernel.org/keyrings/YKzlTR1AzUigShtZ@kroah.com/ Fixes: 55707d53 ("tpm_tis: Add a check for invalid status") Cc: stable@vger.kernel.org Signed-off-by:
Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Biggers authored
commit 2fc2b430 upstream. Typically, the cryptographic APIs that fscrypt uses take keys as byte arrays, which avoids endianness issues. However, siphash_key_t is an exception. It is defined as 'u64 key[2];', i.e. the 128-bit key is expected to be given directly as two 64-bit words in CPU endianness. fscrypt_derive_dirhash_key() and fscrypt_setup_iv_ino_lblk_32_key() forgot to take this into account. Therefore, the SipHash keys used to index encrypted+casefolded directories differ on big endian vs. little endian platforms, as do the SipHash keys used to hash inode numbers for IV_INO_LBLK_32-encrypted directories. This makes such directories non-portable between these platforms. Fix this by always using the little endian order. This is a breaking change for big endian platforms, but this should be fine in practice since these features (encrypt+casefold support, and the IV_INO_LBLK_32 flag) aren't known to actually be used on any big endian platforms yet. Fixes: aa408f83 ("fscrypt: derive dirhash key for casefolded directories") Fixes: e3b1078b ("fscrypt: add support for IV_INO_LBLK_32 policies") Cc: <stable@vger.kernel.org> # v5.6+ Link: https://lore.kernel.org/r/20210605075033.54424-1-ebiggers@kernel.org Signed-off-by:
Eric Biggers <ebiggers@google.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
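The endianness issue described in the fscrypt SipHash commit above comes from treating a 16-byte key as two native 64-bit words. A minimal sketch of the "always little endian" approach follows. It assumes a user-space setting, and load_le64(), struct two_word_key and key_from_bytes_le() are hypothetical helpers that only mirror the 'u64 key[2]' layout; they are not the kernel's functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble 8 bytes into a 64-bit word in little-endian order.
 * The result is the same on big-endian and little-endian hosts. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (size_t i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* Hypothetical two-word key, mirroring a 'u64 key[2]' layout. */
struct two_word_key {
	uint64_t key[2];
};

/* Convert a raw 16-byte key into two words with a fixed byte order,
 * instead of copying the bytes and letting the CPU interpret them. */
static void key_from_bytes_le(struct two_word_key *k,
			      const unsigned char raw[16])
{
	k->key[0] = load_le64(raw);
	k->key[1] = load_le64(raw + 8);
}
```

Because the words are built with shifts rather than a memcpy(), the derived key does not depend on the host byte order.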
-
Eric Biggers authored
commit 77f30bfc upstream. When initializing a no-key name, fscrypt_fname_disk_to_usr() sets the minor_hash to 0 if the (major) hash is 0. This doesn't make sense because 0 is a valid hash code, so we shouldn't ignore the filesystem-provided minor_hash in that case. Fix this by removing the special case for 'hash == 0'. This is an old bug that appears to have originated when the encryption code in ext4 and f2fs was moved into fs/crypto/. The original ext4 and f2fs code passed the hash by pointer instead of by value. So 'if (hash)' actually made sense then, as it was checking whether a pointer was NULL. But now the hashes are passed by value, and filesystems just pass 0 for any hashes they don't have. There is no need to handle this any differently from the hashes actually being 0. It is difficult to reproduce this bug, as it only made a difference in the case where a filename's 32-bit major hash happened to be 0. However, it probably had the largest chance of causing problems on ubifs, since ubifs uses minor_hash to do lookups of no-key names, in addition to using it as a readdir cookie. ext4 only uses minor_hash as a readdir cookie, and f2fs doesn't use minor_hash at all. Fixes: 0b81d077 ("fs crypto: move per-file encryption from f2fs tree to fs/crypto") Cc: <stable@vger.kernel.org> # v4.6+ Link: https://lore.kernel.org/r/20210527235236.2376556-1-ebiggers@kernel.org Signed-off-by:
Eric Biggers <ebiggers@google.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sibi Sankar authored
commit d6fbfdbc upstream. Fix IPCC (Inter-Processor Communication Controller) channel exhaustion by setting the channel private data to NULL on mbox shutdown. Err Logs: remoteproc: MBA booted without debug policy, loading mpss remoteproc: glink-edge: failed to acquire IPC channel remoteproc: failed to probe subdevices for remoteproc: -16 Fixes: fa74a025 ("mailbox: Add support for Qualcomm IPCC") Signed-off-by:
Sibi Sankar <sibis@codeaurora.org> Cc: stable@vger.kernel.org Reviewed-by:
Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by:
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by:
Jassi Brar <jaswinder.singh@linaro.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Varun Prakash authored
commit 6ecdafae upstream. Instead of calling dma_unmap_sg() after completing WRITE I/O, call dma_unmap_sg() before calling target_execute_cmd() to sync the DMA buffer. Link: https://lore.kernel.org/r/1618403949-3443-1-git-send-email-varun@chelsio.com Cc: <stable@vger.kernel.org> # 5.4+ Signed-off-by:
Varun Prakash <varun@chelsio.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Javed Hasan authored
commit 40445fd2 upstream. As per the FC-GS-5 specification, the attribute lengths of node_name and manufacturer should be in the range of "4 to 64 Bytes" only. Link: https://lore.kernel.org/r/20210603101404.7841-2-jhasan@marvell.com Fixes: e721eb06 ("scsi: scsi_transport_fc: Match HBA Attribute Length with HBAAPI V2.0 definitions") CC: stable@vger.kernel.org Reviewed-by:
Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by:
Javed Hasan <jhasan@marvell.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Namjae Jeon authored
commit 1e5654de upstream. Florian reported a compatibility issue between Linux exfat and the exfat implementation of a camera vendor. In their exfat, if the number of files exceeds a certain limit, the DataLength in the stream entry of the directory is no longer updated. As a result, some files created by the camera do not show up in Linux exfat, because Linux exfat doesn't allow cpos to become larger than the DataLength of the stream entry. This patch checks DataLength in the stream entry only if the type is ALLOC_NO_FAT_CHAIN, and adds a check to ensure that the dentry offset does not exceed the max dentries size (256 MB) to avoid the circular FAT chain issue. Fixes: ca061973 ("exfat: add directory operations") Cc: stable@vger.kernel.org # v5.9 Reported-by:
Florian Cramer <flrncrmr@gmail.com> Reviewed-by:
Sungjong Seo <sj1557.seo@samsung.com> Tested-by:
Chris Down <chris@chrisdown.name> Signed-off-by:
Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Guo Ren authored
[ Upstream commit 6ea42c84 ] The current csky logic of sys_cacheflush is wrong: it causes an icache flush to also call the dcache flush. Fix it with a conditional "break & fallthrough". Fixes: 997153b9 ("csky: Add flush_icache_mm to defer flush icache all") Fixes: 0679d29d ("csky: fix syscache.c fallthrough warning") Acked-by:
Randy Dunlap <rdunlap@infradead.org> Co-Developed-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Guo Ren <guoren@linux.alibaba.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Sasha Levin <sashal@kernel.org>
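The conditional "break & fallthrough" mentioned in the csky commit above can be illustrated with a small stand-alone switch. This is a sketch under assumptions: the enum values, struct flush_counts and do_cacheflush() are hypothetical, and the counters stand in for the real flush routines.

```c
/* Illustrative request kinds; the values are assumptions. */
enum cache_kind { ICACHE = 1, DCACHE = 2, BCACHE = 3 };

/* Counters standing in for the real flush routines. */
struct flush_counts {
	int icache;
	int dcache;
};

static int do_cacheflush(enum cache_kind cache, struct flush_counts *c)
{
	switch (cache) {
	case BCACHE:
	case ICACHE:
		c->icache++;		/* icache flush */
		if (cache != BCACHE)
			break;		/* ICACHE only: stop here */
		/* fall through */
	case DCACHE:
		c->dcache++;		/* dcache flush */
		break;
	default:
		return -1;
	}
	return 0;
}
```

Only a BCACHE request reaches the dcache step through the fallthrough; an ICACHE request leaves the switch before it.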
-
Randy Dunlap authored
[ Upstream commit 0679d29d ] This case of the switch statement falls through to the following case. This appears to be on purpose, so declare it as OK. ../arch/csky/mm/syscache.c: In function '__do_sys_cacheflush': ../arch/csky/mm/syscache.c:17:3: warning: this statement may fall through [-Wimplicit-fallthrough=] 17 | flush_icache_mm_range(current->mm, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 18 | (unsigned long)addr, | ~~~~~~~~~~~~~~~~~~~~ 19 | (unsigned long)addr + bytes); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../arch/csky/mm/syscache.c:20:2: note: here 20 | case DCACHE: | ^~~~ Fixes: 997153b9 ("csky: Add flush_icache_mm to defer flush icache all") Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Guo Ren <guoren@kernel.org> Cc: linux-csky@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Arnaldo Carvalho de Melo authored
[ Upstream commit c435c166 ] Zhihao sent a patch but it made llvm__compile_bpf() return what asprintf() returns on error, which is just -1, but since this function returns -errno, fix it by returning -ENOMEM for this case instead. Fixes: cb763714 ("perf llvm: Allow passing options to llc ...") Fixes: 5eab5a7e ("perf llvm: Display eBPF compiling command ...") Reported-by:
Hulk Robot <hulkci@huawei.com> Reported-by:
Zhihao Cheng <chengzhihao1@huawei.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yu Kuai <yukuai3@huawei.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20210609115945.2193194-1-chengzhihao1@huawei.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
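The convention at stake in the perf commit above is that asprintf() reports failure as -1, while the caller is expected to return -errno style codes. A minimal sketch of mapping one onto the other, assuming glibc's asprintf() and a hypothetical build_command() helper that is not part of perf:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* asprintf() is a GNU/BSD extension */
#endif
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Build "<compiler> -c <src>" into a newly allocated string.
 * Returns 0 on success or a negative errno value on failure,
 * never the raw -1 that asprintf() itself returns. */
static int build_command(char **out, const char *compiler, const char *src)
{
	if (asprintf(out, "%s -c %s", compiler, src) < 0) {
		*out = NULL;	/* contents are undefined after failure */
		return -ENOMEM;
	}
	return 0;
}
```

The caller owns the returned string and releases it with free().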
-
Dave Hansen authored
[ Upstream commit 6039ca25 ] The pkey test code keeps a "shadow" of the pkey register around. This ensures that any bugs which might write to the register can be caught more quickly. Generally, userspace has a good idea when the kernel is going to write to the register. For instance, alloc_pkey() is passed a permission mask. The caller of alloc_pkey() can update the shadow based on the return value and the mask. But, the kernel can also modify the pkey register in a more sneaky way. For mprotect(PROT_EXEC) mappings, the kernel will allocate a pkey and write the pkey register to create an execute-only mapping. The kernel never tells userspace what key it uses for this. This can cause the test to fail with messages like: protection_keys_64.2: pkey-helpers.h:132: _read_pkey_reg: Assertion `pkey_reg == shadow_pkey_reg' failed. because the shadow was not updated with the new kernel-set value. Forcibly update the shadow value immediately after an mprotect(). Link: https://lkml.kernel.org/r/20210611164200.EF76AB73@viggo.jf.intel.com Fixes: 6af17cf8 ("x86/pkeys/selftests: Add PROT_EXEC test") Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Dave Hansen authored
[ Upstream commit bf68294a ] The alloc_pkey() selftest function wraps the sys_pkey_alloc() system call. On success, it updates its "shadow" register value because sys_pkey_alloc() updates the real register. But, the success check is wrong. pkey_alloc() considers any non-zero return code to indicate success where the pkey register will be modified. This fails to take negative return codes into account. Consider only a positive return value as a successful call. Link: https://lkml.kernel.org/r/20210611164157.87AB4246@viggo.jf.intel.com Fixes: 5f23f6d0 ("x86/pkeys: Add self-tests") Reported-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Tested-by:
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
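The success check described in the alloc_pkey() commit above can be sketched as follows. The helpers are hypothetical, and the two-bits-per-key layout of the shadow value is an assumption made for illustration; it is not taken from the selftest source.

```c
#include <stdint.h>

/* Hypothetical helper: store 'rights' (2 bits) for 'pkey' in a
 * shadow copy of the register, assuming 2 bits per key. */
static uint64_t set_shadow_bits(uint64_t shadow, int pkey, uint64_t rights)
{
	int shift = pkey * 2;

	shadow &= ~((uint64_t)0x3 << shift);
	shadow |= (rights & 0x3) << shift;
	return shadow;
}

/* Update the shadow only when the allocation really succeeded.
 * A negative return is an error, so the register was not written;
 * a plain "if (ret)" check would wrongly treat it as success. */
static uint64_t update_shadow_after_alloc(uint64_t shadow, int ret,
					  uint64_t rights)
{
	if (ret > 0)
		return set_shadow_bits(shadow, ret, rights);
	return shadow;
}
```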
-
Dave Hansen authored
[ Upstream commit f36ef407 ] Patch series "selftests/vm/pkeys: Bug fixes and a new test". There has been a lot of activity on the x86 front around the XSAVE architecture which is used to context-switch processor state (among other things). In addition, AMD has recently joined the protection keys club by adding processor support for PKU. The AMD implementation helped uncover a kernel bug around the PKRU "init state", which actually applied to Intel's implementation but was just harder to hit. This series adds a test which is expected to help find this class of bug both on AMD and Intel. All the work around pkeys on x86 also uncovered a few bugs in the selftest. This patch (of 4): The "random" pkey allocation code currently does the good old: srand((unsigned int)time(NULL)); *But*, it unfortunately does this on every random pkey allocation. There may be thousands of these a second. time() has a one second resolution. So, each time alloc_random_pkey() is called, the PRNG is *RESET* to time(). This is nasty. Normally, if you do: srand(<ANYTHING>); foo = rand(); bar = rand(); You'll be quite guaranteed that 'foo' and 'bar' are different. But, if you do: srand(1); foo = rand(); srand(1); bar = rand(); You are quite guaranteed that 'foo' and 'bar' are the *SAME*. The recent "fix" effectively forced the test case to use the same "random" pkey for the whole test, unless the test run crossed a second boundary. Only run srand() once at program startup. This explains some very odd and persistent test failures I've been seeing. Link: https://lkml.kernel.org/r/20210611164153.91B76FB8@viggo.jf.intel.com Link: https://lkml.kernel.org/r/20210611164155.192D00FF@viggo.jf.intel.com Fixes: 6e373263 ("selftests/vm/pkeys: fix alloc_random_pkey() to make it really random") Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
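The srand() behaviour described in the commit above is easy to reproduce in isolation. The sketch below uses only the standard C library; seed_once() and the function names are hypothetical and show one possible way to seed a single time, not the selftest's actual code.

```c
#include <stdlib.h>

/* Reseeding with the same value before each draw restarts the
 * sequence, so both draws return the same number. */
static int draws_match_when_reseeded(unsigned int seed)
{
	int foo, bar;

	srand(seed);
	foo = rand();
	srand(seed);
	bar = rand();
	return foo == bar;
}

static int seeded;

/* Seed the PRNG at most once, e.g. at program startup. */
static void seed_once(unsigned int seed)
{
	if (!seeded) {
		srand(seed);
		seeded = 1;
	}
}

/* Pick a pseudo-random index in [0, nr) without reseeding. */
static int pick_random_index(int nr)
{
	return rand() % nr;
}
```

Calling seed_once((unsigned int)time(NULL)) early in main() and then only pick_random_index() afterwards avoids resetting the generator on every allocation.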
-
Trent Piepho authored
[ Upstream commit 65a0d3c1 ] If the input is out of the range of the allowed values, either larger than the largest value or closer to zero than the smallest non-zero allowed value, then a division by zero would occur. In the case of input too large, the division by zero will occur on the first iteration. The best result (largest allowed value) will be found by always choosing the semi-convergent and excluding the denominator based limit when finding it. In the case of the input too small, the division by zero will occur on the second iteration. The numerator based semi-convergent should not be calculated to avoid the division by zero. But the semi-convergent vs previous convergent test is still needed, which effectively chooses between 0 (the previous convergent) vs the smallest allowed fraction (best semi-convergent) as the result. Link: https://lkml.kernel.org/r/20210525144250.214670-1-tpiepho@gmail.com Fixes: 323dd2c3 ("lib/math/rational.c: fix possible incorrect result from rational fractions helper") Signed-off-by:
Trent Piepho <tpiepho@gmail.com> Reported-by:
Yiyuan Guo <yguoaz@gmail.com> Reviewed-by:
Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Oskar Schirmer <oskar@scara.com> Cc: Daniel Latypov <dlatypov@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Miaohe Lin authored
[ Upstream commit 28473d91 ] We should use release_z3fold_page_locked() to release z3fold page when it's locked, although it looks harmless to use release_z3fold_page() now. Link: https://lkml.kernel.org/r/20210619093151.1492174-7-linmiaohe@huawei.com Fixes: dcf5aedb ("z3fold: stricter locking and more careful reclaim") Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Reviewed-by:
Vitaly Wool <vitaly.wool@konsulko.com> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Miaohe Lin authored
[ Upstream commit dac0d1cf ] There is a memory leak in z3fold_destroy_pool() as it forgets to free_percpu pool->unbuddied. Call free_percpu for pool->unbuddied to fix this issue. Link: https://lkml.kernel.org/r/20210619093151.1492174-6-linmiaohe@huawei.com Fixes: d30561c5 ("z3fold: use per-cpu unbuddied lists") Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Reviewed-by:
Vitaly Wool <vitaly.wool@konsulko.com> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Ralph Campbell authored
[ Upstream commit ebfe1b8f ] The external function definitions don't need the "extern" keyword. Remove them so future changes don't copy the function definition style. Link: https://lkml.kernel.org/r/20201106235135.32109-1-rcampbell@nvidia.com Signed-off-by:
Ralph Campbell <rcampbell@nvidia.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Mike Kravetz authored
[ Upstream commit 48b8d744 ] Patch series "Fix prep_compound_gigantic_page ref count adjustment". These patches address the possible race between prep_compound_gigantic_page and __page_cache_add_speculative as described by Jann Horn in [1]. The first patch simply removes the unnecessary/obsolete helper routine prep_compound_huge_page to make the actual fix a little simpler. The second patch is the actual fix and has a detailed explanation in the commit message. This potential issue has existed for almost 10 years and I am unaware of anyone actually hitting the race. I did not cc stable, but would be happy to squash the patches and send to stable if anyone thinks that is a good idea. [1] https://lore.kernel.org/linux-mm/CAG48ez23q0Jy9cuVnwAe7t_fdhMk2S7N5Hdi-GLcCeq5bsfLxw@mail.gmail.com/ This patch (of 2): I could not think of a reliable way to recreate the issue for testing. Rather, I 'simulated errors' to exercise all the error paths. The routine prep_compound_huge_page is a simple wrapper to call either prep_compound_gigantic_page or prep_compound_page. However, it is only called from gather_bootmem_prealloc which only processes gigantic pages. Eliminate the routine and call prep_compound_gigantic_page directly. Link: https://lkml.kernel.org/r/20210622021423.154662-1-mike.kravetz@oracle.com Link: https://lkml.kernel.org/r/20210622021423.154662-2-mike.kravetz@oracle.com Signed-off-by:
Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Youquan Song <youquan.song@intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Yanfei Xu authored
[ Upstream commit 5291c09b ] Gigantic page is a compound page and its order is more than 1. Thus it must be available for hpage_pincount. Let's remove the redundant check for gigantic page. Link: https://lkml.kernel.org/r/20210202112002.73170-1-yanfei.xu@windriver.com Signed-off-by:
Yanfei Xu <yanfei.xu@windriver.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Miaohe Lin authored
[ Upstream commit c78a7f36 ] Since commit a5516438 ("hugetlb: modular state for hugetlb page size"), we can use huge_page_order to access hstate->order and pages_per_huge_page to fetch the pages per huge page. But gather_bootmem_prealloc() forgot to use it. Link: https://lkml.kernel.org/r/20210114114435.40075-1-linmiaohe@huawei.com Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Miaohe Lin authored
[ Upstream commit babbbdd0 ] If other processes are mapping any other subpages of the hugepage, i.e. in pte-mapped thp case, page_mapcount() will return 1 incorrectly. Then we would discard the page while other processes are still mapping it. Fix it by using total_mapcount() which can tell whether other processes are still mapping it. Link: https://lkml.kernel.org/r/20210511134857.1581273-6-linmiaohe@huawei.com Fixes: b8d3c4c3 ("mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called") Reviewed-by:
Yang Shi <shy828301@gmail.com> Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Rik van Riel <riel@surriel.com> Cc: Song Liu <songliubraving@fb.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Miaohe Lin authored
[ Upstream commit e6be37b2 ] Since commit 99cb0dbd ("mm,thp: add read-only THP support for (non-shmem) FS"), read-only THP file mapping is supported. But it forgot to add checking for it in transparent_hugepage_enabled(). To fix it, we add checking for read-only THP file mapping and also introduce helper transhuge_vma_enabled() to check whether thp is enabled for specified vma to reduce duplicated code. We rename transparent_hugepage_enabled to transparent_hugepage_active to make the code easier to follow as suggested by David Hildenbrand. [linmiaohe@huawei.com: define transhuge_vma_enabled next to transhuge_vma_suitable] Link: https://lkml.kernel.org/r/20210514093007.4117906-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20210511134857.1581273-4-linmiaohe@huawei.com Fixes: 99cb0dbd ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Reviewed-by:
Yang Shi <shy828301@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Rik van Riel <riel@surriel.com> Cc: Song Liu <songliubraving@fb.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Miaohe Lin authored
[ Upstream commit b2bd53f1 ] Patch series "Cleanup and fixup for huge_memory", v3. This series contains cleanups to remove a dedicated macro and remove an unnecessary tlb_remove_page_size() for huge zero pmd. It also adds the missing read-only THP checking for transparent_hugepage_enabled() and avoids discarding a hugepage if other processes are mapping it. More details can be found in the respective changelogs. This patch (of 5): Rewrite the pgoff checking logic to remove macro HPAGE_CACHE_INDEX_MASK which is only used here to simplify the code. Link: https://lkml.kernel.org/r/20210511134857.1581273-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20210511134857.1581273-2-linmiaohe@huawei.com Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Reviewed-by:
Yang Shi <shy828301@gmail.com> Reviewed-by:
Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Song Liu <songliubraving@fb.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Aneesh Kumar K.V authored
[ Upstream commit bae84953 ] Differentiate between hardware not supporting hugepages and user disabling THP via 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'. For the devdax namespace, the kernel handles the above via the supported_alignment attribute and by failing to initialize the namespace if the namespace align value is not supported on the platform. For the fsdax namespace, the kernel will continue to initialize the namespace. This can result in the kernel creating a huge pte entry even though the hardware doesn't support it. We do want hugepage support with pmem even if the end-user disabled THP via the sysfs file (/sys/kernel/mm/transparent_hugepage/enabled). Hence differentiate between hardware/firmware lacking support vs user-controlled disable of THP and prevent a huge fault if the hardware lacks hugepage support. Link: https://lkml.kernel.org/r/20210205023956.417587-1-aneesh.kumar@linux.ibm.com Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by:
Dan Williams <dan.j.williams@intel.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Alex Williamson authored
[ Upstream commit 6a45ece4 ] io_remap_pfn_range() will trigger a BUG_ON if it encounters a populated pte within the mapping range. This can occur because we map the entire vma on fault and multiple faults can be blocked behind the vma_lock. This leads to traces like the one reported below. We can use our vma_list to test whether a given vma is mapped to avoid this issue.
[ 1591.733256] kernel BUG at mm/memory.c:2177!
[ 1591.739515] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 1591.747381] Modules linked in: vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O)
[ 1591.760536] CPU: 2 PID: 227 Comm: lcore-worker-2 Tainted: G O 5.11.0-rc3+ #1
[ 1591.770735] Hardware name: , BIOS HixxxxFPGA 1P B600 V121-1
[ 1591.778872] pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--)
[ 1591.786134] pc : remap_pfn_range+0x214/0x340
[ 1591.793564] lr : remap_pfn_range+0x1b8/0x340
[ 1591.799117] sp : ffff80001068bbd0
[ 1591.803476] x29: ffff80001068bbd0 x28: 0000042eff6f0000
[ 1591.810404] x27: 0000001100910000 x26: 0000001300910000
[ 1591.817457] x25: 0068000000000fd3 x24: ffffa92f1338e358
[ 1591.825144] x23: 0000001140000000 x22: 0000000000000041
[ 1591.832506] x21: 0000001300910000 x20: ffffa92f141a4000
[ 1591.839520] x19: 0000001100a00000 x18: 0000000000000000
[ 1591.846108] x17: 0000000000000000 x16: ffffa92f11844540
[ 1591.853570] x15: 0000000000000000 x14: 0000000000000000
[ 1591.860768] x13: fffffc0000000000 x12: 0000000000000880
[ 1591.868053] x11: ffff0821bf3d01d0 x10: ffff5ef2abd89000
[ 1591.875932] x9 : ffffa92f12ab0064 x8 : ffffa92f136471c0
[ 1591.883208] x7 : 0000001140910000 x6 : 0000000200000000
[ 1591.890177] x5 : 0000000000000001 x4 : 0000000000000001
[ 1591.896656] x3 : 0000000000000000 x2 : 0168044000000fd3
[ 1591.903215] x1 : ffff082126261880 x0 : fffffc2084989868
[ 1591.910234] Call trace:
[ 1591.914837] remap_pfn_range+0x214/0x340
[ 1591.921765] vfio_pci_mmap_fault+0xac/0x130 [vfio_pci]
[ 1591.931200] __do_fault+0x44/0x12c
[ 1591.937031] handle_mm_fault+0xcc8/0x1230
[ 1591.942475] do_page_fault+0x16c/0x484
[ 1591.948635] do_translation_fault+0xbc/0xd8
[ 1591.954171] do_mem_abort+0x4c/0xc0
[ 1591.960316] el0_da+0x40/0x80
[ 1591.965585] el0_sync_handler+0x168/0x1b0
[ 1591.971608] el0_sync+0x174/0x180
[ 1591.978312] Code: eb1b027f 540000c0 f9400022 b4fffe02 (d4210000)
Fixes: 11c4cd07 ("vfio-pci: Fault mmaps to enable vma tracking") Reported-by:
Zeng Tao <prime.zeng@hisilicon.com> Suggested-by:
Zeng Tao <prime.zeng@hisilicon.com> Link: https://lore.kernel.org/r/162497742783.3883260.3282953006487785034.stgit@omen Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Pali Rohár authored
[ Upstream commit 2cbfdede ] UART1 (standard variant with DT node name 'uart0') has register space 0x12000-0x12018 and not the whole size 0x200. So fix this in the example as well. Signed-off-by:
Pali Rohár <pali@kernel.org> Fixes: c737abc1 ("arm64: dts: marvell: Fix A37xx UART0 register size") Link: https://lore.kernel.org/r/20210624224909.6350-6-pali@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Pali Rohár authored
[ Upstream commit deeaf963 ] For the default (x16) scheme, which is currently used by the mvebu-uart.c driver, the maximal divisor of the UART base clock is 1023*16. Therefore there is a limit on the minimal supported baudrate. This change calculates it correctly and prevents setting the invalid divisor 0 into hardware registers. Signed-off-by:
Pali Rohár <pali@kernel.org> Fixes: 68a0db1d ("serial: mvebu-uart: add function to change baudrate") Link: https://lore.kernel.org/r/20210624224909.6350-4-pali@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
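The divisor limit described above can be worked through with a small standalone sketch. This is not the driver's code: the helper names, the 25 MHz clock used in the note below, and the upper bound of uartclk / 16 are assumptions made for illustration. Only the 1023*16 maximum divisor comes from the commit message.

```c
#include <assert.h>

/* From the commit text: the x16 scheme allows a total divisor of at most 1023*16. */
#define MAX_TOTAL_DIVISOR (1023u * 16u)

/*
 * Hypothetical helper (not the driver's function): lowest baud rate the
 * x16 scheme can reach for a given parent clock. Rounding up keeps the
 * required divisor from exceeding MAX_TOTAL_DIVISOR.
 */
static unsigned int min_baud_x16(unsigned int uartclk)
{
    assert(uartclk > 0); /* with a zero clock no baud rate can be computed */
    return (unsigned int)(((unsigned long long)uartclk + MAX_TOTAL_DIVISOR - 1)
                          / MAX_TOTAL_DIVISOR);
}

/*
 * Hypothetical helper: clamp a requested rate into [min, max].
 * The upper bound uartclk / 16 (divisor of 1) is an assumption.
 */
static unsigned int clamp_baud_x16(unsigned int uartclk, unsigned int requested)
{
    unsigned int lo = min_baud_x16(uartclk);
    unsigned int hi = uartclk / 16u;

    if (hi < lo)
        hi = lo; /* very slow clocks: keep the range non-empty */
    if (requested < lo)
        return lo;
    if (requested > hi)
        return hi;
    return requested;
}

/* Divisor for a given rate: uartclk / (baud * 16), rounded to nearest. */
static unsigned int divisor_x16(unsigned int uartclk, unsigned int baud)
{
    unsigned long long denom = (unsigned long long)baud * 16u;

    return (unsigned int)((uartclk + denom / 2) / denom);
}
```

With an assumed 25 MHz parent clock, min_baud_x16() gives 1528, so a request for 300 baud is raised to 1528 and the resulting divisor stays within 1..1023 instead of wrapping to an invalid value.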
-
Pali Rohár authored
[ Upstream commit ecd6b010 ] Testing mvuart->clk for non-error is not enough: mvuart->clk may contain a valid clk pointer, but if clk_prepare_enable(mvuart->clk) failed, port->uartclk is zero. When mvuart->clk is not available, port->uartclk is zero too. The parent clock rate port->uartclk is needed to calculate the UART clock divisor, and without it the baudrate cannot be changed. So fix the condition that tests whether the baudrate can be changed. Signed-off-by:
Pali Rohár <pali@kernel.org> Fixes: 68a0db1d ("serial: mvebu-uart: add function to change baudrate") Link: https://lore.kernel.org/r/20210624224909.6350-3-pali@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Christophe JAILLET authored
[ Upstream commit 0cbbeaf3 ] The intent here is to return an error code if we don't find what we are looking for in the 'list_for_each_entry()' loop. 's' is not NULL if the list is empty or if we scan the complete list. Introduce a new 'found' variable to handle such cases. Fixes: 60dd4929 ("ALSA: firewire-lib: handle several AMDTP streams in callback handler of IRQ target") Signed-off-by:
Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by:
Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/9c9a53a4905984a570ba5672cbab84f2027dedc1.1624560484.git.christophe.jaillet@wanadoo.fr Signed-off-by:
Takashi Iwai <tiwai@suse.de> Signed-off-by:
Sasha Levin <sashal@kernel.org>
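The pitfall named above is that the loop cursor is never NULL once the loop ends, so it cannot signal "not found". A minimal userspace sketch of the 'found' flag pattern follows. The list type, macro, and function names are hypothetical stand-ins rather than the kernel's list.h, and the -1 return value is a placeholder for whatever error code the real driver uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular list with a sentinel head; a stand-in for the kernel's list_head. */
struct list_node {
    struct list_node *next;
};

struct stream {
    int id;
    struct list_node link;
};

/* Same idea as container_of(): recover the entry from its embedded node. */
#define node_to_stream(n) \
    ((struct stream *)((char *)(n) - offsetof(struct stream, link)))

static void list_init(struct list_node *head)
{
    head->next = head;
}

static void list_push(struct list_node *head, struct list_node *node)
{
    node->next = head->next;
    head->next = node;
}

/*
 * Returns 0 and stores the match in *out, or -1 (placeholder error code)
 * when no entry has the wanted id.
 */
static int find_stream(struct list_node *head, int wanted_id, struct stream **out)
{
    struct stream *s;
    bool found = false;

    /*
     * Mirrors the shape of list_for_each_entry(): when the loop runs to
     * completion, or the list is empty, 's' is computed from the head
     * itself and is therefore not NULL and not a valid entry.
     */
    for (s = node_to_stream(head->next); &s->link != head;
         s = node_to_stream(s->link.next)) {
        if (s->id == wanted_id) {
            found = true;
            break;
        }
    }

    if (!found)
        return -1; /* testing 's' against NULL here would never fail */

    *out = s;
    return 0;
}
```

An explicit flag keeps the "no match" case separate from the cursor value, which is what the commit message describes.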
-
Vaibhav Jain authored
[ Upstream commit ed78f56e ] In case performance stats for an nvdimm are not available, reading the 'perf_stats' sysfs file returns an -ENOENT error. A better approach is to make the 'perf_stats' file entirely invisible to indicate that performance stats for an nvdimm are unavailable. So this patch updates 'papr_nd_attribute_group' to add an 'is_visible' callback implemented as the newly introduced 'papr_nd_attribute_visible()' that returns an appropriate mode in case performance stats aren't supported in a given nvdimm. Also the initialization of 'papr_scm_priv.stat_buffer_len' is moved from papr_scm_nvdimm_init() to papr_scm_probe() so that its value is available when 'papr_nd_attribute_visible()' is called during nvdimm initialization. Even though the 'perf_stats' attribute is available since v5.9, there are no known user-space tools/scripts that are dependent on presence of its sysfs file. Hence I don't expect any user-space breakage with this patch. Fixes: 2d02bf83 ("powerpc/papr_scm: Fetch nvdimm performance stats from PHYP") Signed-off-by:
Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210513092349.285021-1-vaibhav@linux.ibm.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
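The visibility rule described above can be modelled in plain C. This is a simplified sketch, not the patch itself: the struct and function names are hypothetical stand-ins for the kernel's attribute types, and the rule that a zero 'stat_buffer_len' means "stats unavailable" is an assumption. It relies only on the sysfs convention that an is_visible callback returning mode 0 hides the file.

```c
#include <stddef.h>

/* Simplified stand-in for a sysfs attribute; not the kernel's struct attribute. */
struct sketch_attr {
    const char *name;
    unsigned short mode; /* e.g. 0444 for a world-readable file */
};

/* Simplified stand-in for the driver's private data. */
struct sketch_priv {
    size_t stat_buffer_len; /* assumed to be 0 when stats are unavailable */
};

static const struct sketch_attr perf_stats_attr = { "perf_stats", 0444 };
static const struct sketch_attr other_attr = { "other", 0444 }; /* hypothetical */

/*
 * Hypothetical visibility callback: hide 'perf_stats' when no stats
 * buffer is available, and leave every other attribute unchanged.
 */
static unsigned short sketch_attr_visible(const struct sketch_priv *p,
                                          const struct sketch_attr *a)
{
    if (a == &perf_stats_attr && p->stat_buffer_len == 0)
        return 0; /* mode 0: the file is not created at all */

    return a->mode;
}
```

Because the callback reads 'stat_buffer_len', that field has to be filled in before visibility is evaluated, which is why the commit moves its initialization to probe time.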
-
Nicholas Piggin authored
[ Upstream commit f35d2f24 ] copy-paste contains implicit "copy buffer" state that can contain arbitrary user data (if the user process executes a copy instruction). This could be snooped by another process if a context switch hits while the state is live. So cp_abort is executed on context switch to clear out possible sensitive data and prevent the leak. cp_abort is done after the low level _switch(), which means it is never reached by newly created tasks, so they could snoop on this buffer between their first and second context switch. Fix this by doing the cp_abort before calling _switch. Add some comments which should make the issue harder to miss. Fixes: 07d2a628 ("powerpc/64s: Avoid cpabort in context switch when possible") Signed-off-by:
Nicholas Piggin <npiggin@gmail.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210622053036.474678-1-npiggin@gmail.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Andy Shevchenko authored
[ Upstream commit 0e8554b5 ] Parse to and export from the UUID's own type before dereferencing. This also fixes a wrong comment (Little Endian UUID is something else) and should eliminate the direct strict types assignments. Fixes: 43001c52 ("powerpc/papr_scm: Use ibm,unit-guid as the iset cookie") Fixes: 259a948c ("powerpc/pseries/scm: Use a specific endian format for storing uuid from the device tree") Signed-off-by:
Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by:
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210616134303.58185-1-andriy.shevchenko@linux.intel.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Nicholas Piggin authored
[ Upstream commit bab26238 ] printk_safe_flush_on_panic() has special lock breaking code for the case where we panic()ed with the console lock held. It relies on panic IPI causing other CPUs to mark themselves offline. Do as most other architectures do. This effectively reverts commit de6e5d38 ("powerpc: smp_send_stop do not offline stopped CPUs"). Unfortunately, it may result in some false positive warnings, but the alternative is more situations where we can crash without getting messages out. Fixes: de6e5d38 ("powerpc: smp_send_stop do not offline stopped CPUs") Signed-off-by:
Nicholas Piggin <npiggin@gmail.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210623041245.865134-1-npiggin@gmail.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Vignesh Raghavendra authored
[ Upstream commit b67e830d ] On the K3 family of SoCs (which includes the AM654 SoC), it is observed that RX TIMEOUT is signalled after the RX FIFO has been drained, in which case a dummy read of the RX FIFO is required to clear the RX TIMEOUT condition. Otherwise, this would lead to an interrupt storm. Fix this by introducing the UART_RX_TIMEOUT_QUIRK flag and doing a dummy read in the IRQ handler when RX TIMEOUT is reported with no data in the RX FIFO. Fixes: be708744 ("serial: 8250_omap: Add support for AM654 UART controller") Reported-by:
Jan Kiszka <jan.kiszka@siemens.com> Tested-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Vignesh Raghavendra <vigneshr@ti.com> Link: https://lore.kernel.org/r/20210622145704.11168-1-vigneshr@ti.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Vignesh Raghavendra authored
[ Upstream commit 439c7183 ] UARTs on TI SoCs prior to J7200 don't provide independent control over the RX FIFO not empty interrupt (RHR_IT) and the RX timeout interrupt. Starting with the J7200 SoC, it is possible to disable RHR_IT independently of the RX timeout interrupt using bit 2 of the IER2 register. So disable RHR_IT once RX DMA is started so as to avoid a spurious interrupt being raised when data is in the RX FIFO but is yet to be drained by DMA (a known erratum in older SoCs). Signed-off-by:
Vignesh Raghavendra <vigneshr@ti.com> Link: https://lore.kernel.org/r/20201029051930.7097-1-vigneshr@ti.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Krzysztof Kozlowski authored
[ Upstream commit 07b60713 ] When running event-no-pid test on small machines (e.g. cloud 1-core instance), other events might not happen:
+ cat trace
+ cnt=0
+ [ 0 -eq 0 ]
+ fail No other events were recorded
[15] event tracing - restricts events based on pid notrace filtering [FAIL]
Schedule a simple sleep task to be sure that some other process events get recorded. Fixes: ebed9628 ("selftests/ftrace: Add test to test new set_event_notrace_pid file") Signed-off-by:
Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Acked-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Shuah Khan <skhan@linuxfoundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Christophe JAILLET authored
[ Upstream commit ee78b936 ] In 'ktd2692_parse_dt()', if an error occurs after a successful 'regulator_enable()' call, we should call 'regulator_disable()'. This is the same in 'ktd2692_probe()', if an error occurs after a successful 'ktd2692_parse_dt()' call. Instead of adding 'regulator_disable()' in several places, implement a resource managed solution and simplify the remove function accordingly. Fixes: b7da8c5c ("leds: Add ktd2692 flash LED driver") Signed-off-by:
Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by:
Pavel Machek <pavel@ucw.cz> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-