- Feb 10, 2021
-
-
Greg Kroah-Hartman authored
Tested-by:
Florian Fainelli <f.fainelli@gmail.com> Tested-by:
Shuah Khan <skhan@linuxfoundation.org> Tested-by:
Linux Kernel Functional Testing <lkft@linaro.org> Tested-by:
Igor Matheus Andrade Torrente <igormtorrente@gmail.com> Tested-by:
Jason Self <jason@bluehome.net> Tested-by:
Guenter Roeck <linux@roeck-us.net> Tested-by:
Ross Schmidt <ross.schm.dev@gmail.com> Link: https://lore.kernel.org/r/20210208145810.230485165@linuxfoundation.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pali Rohár authored
commit 3241929b upstream. Older ATF does not provide SMC call for USB 3.0 phy power on functionality and therefore initialization of xhci-hcd is failing when older version of ATF is used. In this case phy_power_on() function returns -EOPNOTSUPP. [ 3.108467] mvebu-a3700-comphy d0018300.phy: unsupported SMC call, try updating your firmware [ 3.117250] phy phy-d0018300.phy.0: phy poweron failed --> -95 [ 3.123465] xhci-hcd: probe of d0058000.usb failed with error -95 This patch introduces a new plat_setup callback for xhci platform drivers which is called prior calling usb_add_hcd() function. This function at its beginning skips PHY init if hcd->skip_phy_initialization is set. Current init_quirk callback for xhci platform drivers is called from xhci_plat_setup() function which is called after chip reset completes. It happens in the middle of the usb_add_hcd() function and therefore this callback cannot be used for setting if PHY init should be skipped or not. For Armada 3720 this patch introduce a new xhci_mvebu_a3700_plat_setup() function configured as a xhci platform plat_setup callback. This new function calls phy_power_on() and in case it returns -EOPNOTSUPP then XHCI_SKIP_PHY_INIT quirk is set to instruct xhci-plat to skip PHY initialization. This patch fixes above failure by ignoring 'not supported' error in xhci-hcd driver. In this case it is expected that phy is already power on. It fixes initialization of xhci-hcd on Espressobin boards where is older Marvell's Arm Trusted Firmware without SMC call for USB 3.0 phy power. This is regression introduced in commit bd3d25b0 ("arm64: dts: marvell: armada-37xx: link USB hosts with their PHYs") where USB 3.0 phy was defined and therefore xhci-hcd on Espressobin with older ATF started failing. Fixes: bd3d25b0 ("arm64: dts: marvell: armada-37xx: link USB hosts with their PHYs") Cc: <stable@vger.kernel.org> # 5.1+: ea17a0f1: phy: marvell: comphy: Convert internal SMCC firmware return codes to errno Cc: <stable@vger.kernel.org> # 5.1+: f768e718: usb: host: xhci-plat: add priv quirk for skip PHY initialization Tested-by:
Tomasz Maciej Nowak <tmn505@gmail.com> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> # On R-Car Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> # xhci-plat Acked-by:
Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by:
Pali Rohár <pali@kernel.org> Link: https://lore.kernel.org/r/20210201150803.7305-1-pali@kernel.org [pali: Backported to 5.4 by replacing of_phy_put() with phy_put()] Signed-off-by:
Pali Rohár <pali@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Ovechkin authored
commit 938e0fcd upstream. Commit e5f0e8f8 ("net: sched: introduce and use qdisc tree flush/purge helpers") introduced qdisc tree flush/purge helpers, but erroneously used flush helper instead of purge helper in qdisc_replace function. This issue was found in our CI, that tests various qdisc setups by configuring qdisc and sending data through it. Call of invalid helper sporadically leads to corruption of vt_tree/cf_tree of hfsc_class that causes kernel oops: Oops: 0000 [#1] SMP PTI CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.11.0-8f6859df #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:rb_insert_color+0x18/0x190 Code: c3 31 c0 c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 48 8b 07 48 85 c0 0f 84 05 01 00 00 48 8b 10 f6 c2 01 0f 85 34 01 00 00 <48> 8b 4a 08 49 89 d0 48 39 c1 74 7d 48 85 c9 74 32 f6 01 01 75 2d RSP: 0018:ffffc900000b8bb0 EFLAGS: 00010246 RAX: ffff8881ef4c38b0 RBX: ffff8881d956e400 RCX: ffff8881ef4c38b0 RDX: 0000000000000000 RSI: ffff8881d956f0a8 RDI: ffff8881d956e4b0 RBP: 0000000000000000 R08: 000000d5c4e249da R09: 1600000000000000 R10: ffffc900000b8be0 R11: ffffc900000b8b28 R12: 0000000000000001 R13: 000000000000005a R14: ffff8881f0905000 R15: ffff8881f0387d00 FS: 0000000000000000(0000) GS:ffff8881f8b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 00000001f4796004 CR4: 0000000000060ee0 Call Trace: <IRQ> init_vf.isra.19+0xec/0x250 [sch_hfsc] hfsc_enqueue+0x245/0x300 [sch_hfsc] ? fib_rules_lookup+0x12a/0x1d0 ? __dev_queue_xmit+0x4b6/0x930 ? hfsc_delete_class+0x250/0x250 [sch_hfsc] __dev_queue_xmit+0x4b6/0x930 ? ip6_finish_output2+0x24d/0x590 ip6_finish_output2+0x24d/0x590 ? ip6_output+0x6c/0x130 ip6_output+0x6c/0x130 ? __ip6_finish_output+0x110/0x110 mld_sendpack+0x224/0x230 mld_ifc_timer_expire+0x186/0x2c0 ? igmp6_group_dropped+0x200/0x200 call_timer_fn+0x2d/0x150 run_timer_softirq+0x20c/0x480 ? tick_sched_do_timer+0x60/0x60 ? tick_sched_timer+0x37/0x70 __do_softirq+0xf7/0x2cb irq_exit+0xa0/0xb0 smp_apic_timer_interrupt+0x74/0x150 apic_timer_interrupt+0xf/0x20 </IRQ> Fixes: e5f0e8f8 ("net: sched: introduce and use qdisc tree flush/purge helpers") Signed-off-by:
Alexander Ovechkin <ovov@yandex-team.ru> Reported-by:
Alexander Kuznetsov <wwfq@yandex-team.ru> Acked-by:
Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Acked-by:
Dmitry Yakunin <zeil@yandex-team.ru> Acked-by:
Cong Wang <xiyou.wangcong@gmail.com> Link: https://lore.kernel.org/r/20210201200049.299153-1-ovov@yandex-team.ru Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
DENG Qingfang authored
commit f72f2fb8 upstream. Having multiple destination ports for a unicast address does not make sense. Make port_db_load_purge override existent unicast portvec instead of adding a new port bit. Fixes: 88472939 ("net: dsa: mv88e6xxx: handle multiple ports in ATU") Signed-off-by:
DENG Qingfang <dqfext@gmail.com> Reviewed-by:
Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20210130134334.10243-1-dqfext@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vadim Fedorenko authored
commit 28e104d0 upstream. dev->hard_header_len for tunnel interface is set only when header_ops are set too and already contains full overhead of any tunnel encapsulation. That's why there is not need to use this overhead twice in mtu calc. Fixes: fdafed45 ("ip_gre: set dev->hard_header_len and dev->needed_headroom properly") Reported-by:
Slava Bacherikov <mail@slava.cc> Signed-off-by:
Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/1611959267-20536-1-git-send-email-vfedorenko@novek.ru Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chinmay Agarwal authored
commit eb4e8fac upstream. Following race condition was detected: <CPU A, t0> - neigh_flush_dev() is under execution and calls neigh_mark_dead(n) marking the neighbour entry 'n' as dead. <CPU B, t1> - Executing: __netif_receive_skb() -> __netif_receive_skb_core() -> arp_rcv() -> arp_process().arp_process() calls __neigh_lookup() which takes a reference on neighbour entry 'n'. <CPU A, t2> - Moves further along neigh_flush_dev() and calls neigh_cleanup_and_release(n), but since reference count increased in t2, 'n' couldn't be destroyed. <CPU B, t3> - Moves further along, arp_process() and calls neigh_update()-> __neigh_update() -> neigh_update_gc_list(), which adds the neighbour entry back in gc_list(neigh_mark_dead(), removed it earlier in t0 from gc_list) <CPU B, t4> - arp_process() finally calls neigh_release(n), destroying the neighbour entry. This leads to 'n' still being part of gc_list, but the actual neighbour structure has been freed. The situation can be prevented from happening if we disallow a dead entry to have any possibility of updating gc_list. This is what the patch intends to achieve. Fixes: 9c29a2f5 ("neighbor: Fix locking order for gc_list changes") Signed-off-by:
Chinmay Agarwal <chinagar@codeaurora.org> Reviewed-by:
Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by:
David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210127165453.GA20514@chinagar-linux.qualcomm.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kai-Heng Feng authored
commit 2e99dedc upstream. Similar to commit 165ae7a8 ("igb: Report speed and duplex as unknown when device is runtime suspended"), if we try to read speed and duplex sysfs while the device is runtime suspended, igc will complain and stops working: [ 123.449883] igc 0000:03:00.0 enp3s0: PCIe link lost, device now detached [ 123.450052] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 123.450056] #PF: supervisor read access in kernel mode [ 123.450058] #PF: error_code(0x0000) - not-present page [ 123.450059] PGD 0 P4D 0 [ 123.450064] Oops: 0000 [#1] SMP NOPTI [ 123.450068] CPU: 0 PID: 2525 Comm: udevadm Tainted: G U W OE 5.10.0-1002-oem #2+rkl2-Ubuntu [ 123.450078] RIP: 0010:igc_rd32+0x1c/0x90 [igc] [ 123.450080] Code: c0 5d c3 b8 fd ff ff ff c3 0f 1f 44 00 00 0f 1f 44 00 00 55 89 f0 48 89 e5 41 56 41 55 41 54 49 89 c4 53 48 8b 57 08 48 01 d0 <44> 8b 28 41 83 fd ff 74 0c 5b 44 89 e8 41 5c 41 5d 4 [ 123.450083] RSP: 0018:ffffb0d100d6fcc0 EFLAGS: 00010202 [ 123.450085] RAX: 0000000000000008 RBX: ffffb0d100d6fd30 RCX: 0000000000000000 [ 123.450087] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff945a12716c10 [ 123.450089] RBP: ffffb0d100d6fce0 R08: ffff945a12716550 R09: ffff945a09874000 [ 123.450090] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000008 [ 123.450092] R13: ffff945a12716000 R14: ffff945a037da280 R15: ffff945a037da290 [ 123.450094] FS: 00007f3b34c868c0(0000) GS:ffff945b89200000(0000) knlGS:0000000000000000 [ 123.450096] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 123.450098] CR2: 0000000000000008 CR3: 00000001144de006 CR4: 0000000000770ef0 [ 123.450100] PKRU: 55555554 [ 123.450101] Call Trace: [ 123.450111] igc_ethtool_get_link_ksettings+0xd6/0x1b0 [igc] [ 123.450118] __ethtool_get_link_ksettings+0x71/0xb0 [ 123.450123] duplex_show+0x74/0xc0 [ 123.450129] dev_attr_show+0x1d/0x40 [ 123.450134] sysfs_kf_seq_show+0xa1/0x100 [ 123.450137] kernfs_seq_show+0x27/0x30 [ 123.450142] seq_read+0xb7/0x400 [ 123.450148] ? common_file_perm+0x72/0x170 [ 123.450151] kernfs_fop_read+0x35/0x1b0 [ 123.450155] vfs_read+0xb5/0x1b0 [ 123.450157] ksys_read+0x67/0xe0 [ 123.450160] __x64_sys_read+0x1a/0x20 [ 123.450164] do_syscall_64+0x38/0x90 [ 123.450168] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 123.450170] RIP: 0033:0x7f3b351fe142 [ 123.450173] Code: c0 e9 c2 fe ff ff 50 48 8d 3d 3a ca 0a 00 e8 f5 19 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 [ 123.450174] RSP: 002b:00007fffef2ec138 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 123.450177] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b351fe142 [ 123.450179] RDX: 0000000000001001 RSI: 00005644c047f070 RDI: 0000000000000003 [ 123.450180] RBP: 00007fffef2ec340 R08: 00005644c047f070 R09: 00007f3b352d9320 [ 123.450182] R10: 00005644c047c010 R11: 0000000000000246 R12: 00005644c047cbf0 [ 123.450184] R13: 00005644c047e6d0 R14: 0000000000000003 R15: 00007fffef2ec140 [ 123.450189] Modules linked in: rfcomm ccm cmac algif_hash algif_skcipher af_alg bnep toshiba_acpi industrialio toshiba_haps hp_accel lis3lv02d btusb btrtl btbcm btintel bluetooth ecdh_generic ecc joydev input_leds nls_iso8859_1 snd_sof_pci snd_sof_intel_byt snd_sof_intel_ipc snd_sof_intel_hda_common snd_soc_hdac_hda snd_hda_codec_hdmi snd_sof_xtensa_dsp snd_sof_intel_hda snd_sof snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core ath10k_pci snd_hwdep intel_rapl_msr intel_rapl_common ath10k_core soundwire_bus snd_soc_core x86_pkg_temp_thermal ath intel_powerclamp snd_compress ac97_bus snd_pcm_dmaengine mac80211 snd_pcm coretemp snd_seq_midi snd_seq_midi_event snd_rawmidi kvm_intel cfg80211 snd_seq snd_seq_device snd_timer mei_hdcp kvm libarc4 snd crct10dif_pclmul ghash_clmulni_intel aesni_intel mei_me dell_wmi [ 123.450266] dell_smbios soundcore sparse_keymap dcdbas crypto_simd cryptd mei dell_uart_backlight glue_helper ee1004 wmi_bmof intel_wmi_thunderbolt dell_wmi_descriptor mac_hid efi_pstore acpi_pad acpi_tad intel_cstate sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec crc32_pclmul rc_core drm intel_lpss_pci i2c_i801 ahci igc intel_lpss i2c_smbus idma64 xhci_pci libahci virt_dma xhci_pci_renesas wmi video pinctrl_tigerlake [ 123.450335] CR2: 0000000000000008 [ 123.450338] ---[ end trace 9f731e38b53c35cc ]--- The more generic approach will be wrap get_link_ksettings() with begin() and complete() callbacks, and calls runtime resume and runtime suspend routine respectively. However, igc is like igb, runtime resume routine uses rtnl_lock() which upper ethtool layer also uses. So to prevent a deadlock on rtnl, take a different approach, use pm_runtime_suspended() to avoid reading register while device is runtime suspended. Fixes: 8c5ad0da ("igc: Add ethtool support") Signed-off-by:
Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by:
Sasha Neftin <sasha.neftin@intel.com> Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Xiao Ni authored
commit dc5d17a3 upstream. One customer reports a crash problem which causes by flush request. It triggers a warning before crash. /* new request after previous flush is completed */ if (ktime_after(req_start, mddev->prev_flush_start)) { WARN_ON(mddev->flush_bio); mddev->flush_bio = bio; bio = NULL; } The WARN_ON is triggered. We use spin lock to protect prev_flush_start and flush_bio in md_flush_request. But there is no lock protection in md_submit_flush_data. It can set flush_bio to NULL first because of compiler reordering write instructions. For example, flush bio1 sets flush bio to NULL first in md_submit_flush_data. An interrupt or vmware causing an extended stall happen between updating flush_bio and prev_flush_start. Because flush_bio is NULL, flush bio2 can get the lock and submit to underlayer disks. Then flush bio1 updates prev_flush_start after the interrupt or extended stall. Then flush bio3 enters in md_flush_request. The start time req_start is behind prev_flush_start. The flush_bio is not NULL(flush bio2 hasn't finished). So it can trigger the WARN_ON now. Then it calls INIT_WORK again. INIT_WORK() will re-initialize the list pointers in the work_struct, which then can result in a corrupted work list and the work_struct queued a second time. With the work list corrupted, it can lead in invalid work items being used and cause a crash in process_one_work. We need to make sure only one flush bio can be handled at one same time. So add spin lock in md_submit_flush_data to protect prev_flush_start and flush_bio in an atomic way. Reviewed-by:
David Jeffery <djeffery@redhat.com> Signed-off-by:
Xiao Ni <xni@redhat.com> Signed-off-by:
Song Liu <songliubraving@fb.com> Signed-off-by:
Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nadav Amit authored
commit 29b32839 upstream. When an Intel IOMMU is virtualized, and a physical device is passed-through to the VM, changes of the virtual IOMMU need to be propagated to the physical IOMMU. The hypervisor therefore needs to monitor PTE mappings in the IOMMU page-tables. Intel specifications provide "caching-mode" capability that a virtual IOMMU uses to report that the IOMMU is virtualized and a TLB flush is needed after mapping to allow the hypervisor to propagate virtual IOMMU mappings to the physical IOMMU. To the best of my knowledge no real physical IOMMU reports "caching-mode" as turned on. Synchronizing the virtual and the physical IOMMU tables is expensive if the hypervisor is unaware which PTEs have changed, as the hypervisor is required to walk all the virtualized tables and look for changes. Consequently, domain flushes are much more expensive than page-specific flushes on virtualized IOMMUs with passthrough devices. The kernel therefore exploited the "caching-mode" indication to avoid domain flushing and use page-specific flushing in virtualized environments. See commit 78d5f0f5 ("intel-iommu: Avoid global flushes with caching mode.") This behavior changed after commit 13cf0174 ("iommu/vt-d: Make use of iova deferred flushing"). Now, when batched TLB flushing is used (the default), full TLB domain flushes are performed frequently, requiring the hypervisor to perform expensive synchronization between the virtual TLB and the physical one. Getting batched TLB flushes to use page-specific invalidations again in such circumstances is not easy, since the TLB invalidation scheme assumes that "full" domain TLB flushes are performed for scalability. Disable batched TLB flushes when caching-mode is on, as the performance benefit from using batched TLB invalidations is likely to be much smaller than the overhead of the virtual-to-physical IOMMU page-tables synchronization. Fixes: 13cf0174 ("iommu/vt-d: Make use of iova deferred flushing") Signed-off-by:
Nadav Amit <namit@vmware.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Cc: stable@vger.kernel.org Acked-by:
Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210127175317.1600473-1-namit@vmware.com Signed-off-by:
Joerg Roedel <jroedel@suse.de> Signed-off-by:
Nadav Amit <namit@vmware.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Benjamin Valentin authored
commit 9bbd77d5 upstream. There is a fork of this driver on GitHub [0] that has been updated with new device IDs. Merge those into the mainline driver, so the out-of-tree fork is not needed for users of those devices anymore. [0] https://github.com/paroj/xpad Signed-off-by:
Benjamin Valentin <benpicco@googlemail.com> Link: https://lore.kernel.org/r/20210121142523.1b6b050f@rechenknecht2k11 Signed-off-by:
Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Luca Coelho authored
commit 64f55156 upstream. If we have only a single RX queue, such as when MSI-X is not available, we should not send the RFH_QUEUEU_CONFIG_CMD, because our only queue is the same as the command queue and will be configured as part of the context info. Our code was actually trying to send the command with 0 queues, which caused UMAC assert 0x1D04. Fix that by not sending the command when we have a single queue. Signed-off-by:
Luca Coelho <luciano.coelho@intel.com> Signed-off-by:
Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20201008180656.c35eeb3299f8.I08f79a6ebe150a7d180b7005b24504bfdba6d8b5@changeid Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dave Hansen authored
commit 25a068b8 upstream. Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain MFENCE while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for MFENCE; LFENCE. Short summary: we have special MSRs that have weaker ordering than all the rest. Add fencing consistent with current SDM recommendations. This is not known to cause any issues in practice, only in theory. Longer story below: The reason the kernel uses a different semantic is that the SDM changed (roughly in late 2017). The SDM changed because folks at Intel were auditing all of the recommended fences in the SDM and realized that the x2apic fences were insufficient. Why was the pain MFENCE judged insufficient? WRMSR itself is normally a serializing instruction. No fences are needed because the instruction itself serializes everything. But, there are explicit exceptions for this serializing behavior written into the WRMSR instruction documentation for two classes of MSRs: IA32_TSC_DEADLINE and the X2APIC MSRs. Back to x2apic: WRMSR is *not* serializing in this specific case. But why is MFENCE insufficient? MFENCE makes writes visible, but only affects load/store instructions. WRMSR is unfortunately not a load/store instruction and is unaffected by MFENCE. This means that a non-serializing WRMSR could be reordered by the CPU to execute before the writes made visible by the MFENCE have even occurred in the first place. This means that an x2apic IPI could theoretically be triggered before there is any (visible) data to process. Does this affect anything in practice? I honestly don't know. It seems quite possible that by the time an interrupt gets to consume the (not yet) MFENCE'd data, it has become visible, mostly by accident. To be safe, add the SDM-recommended fences for all x2apic WRMSRs. This also leaves open the question of the _other_ weakly-ordered WRMSR: MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as the x2APIC MSRs, it seems substantially less likely to be a problem in practice. While writes to the in-memory Local Vector Table (LVT) might theoretically be reordered with respect to a weakly-ordered WRMSR like TSC_DEADLINE, the SDM has this to say: In x2APIC mode, the WRMSR instruction is used to write to the LVT entry. The processor ensures the ordering of this write and any subsequent WRMSR to the deadline; no fencing is required. But, that might still leave xAPIC exposed. The safest thing to do for now is to add the extra, recommended LFENCE. [ bp: Massage commit message, fix typos, drop accidentally added newline to tools/arch/x86/include/asm/barrier.h. ] Reported-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by:
Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200305174708.F77040DD@viggo.jf.intel.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Josh Poimboeuf authored
commit 20bf2b37 upstream. With retpolines disabled, some configurations of GCC, and specifically the GCC versions 9 and 10 in Ubuntu will add Intel CET instrumentation to the kernel by default. That breaks certain tracing scenarios by adding a superfluous ENDBR64 instruction before the fentry call, for functions which can be called indirectly. CET instrumentation isn't currently necessary in the kernel, as CET is only supported in user space. Disable it unconditionally and move it into the x86's Makefile as CET/CFI... enablement should be a per-arch decision anyway. [ bp: Massage and extend commit message. ] Fixes: 29be86d7 ("kbuild: add -fcf-protection=none when using retpoline flags") Reported-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Tested-by:
Nikolay Borisov <nborisov@suse.com> Cc: <stable@vger.kernel.org> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Link: https://lkml.kernel.org/r/20210128215219.6kct3h2eiustncws@treble Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hugh Dickins authored
commit 1c2f6730 upstream. Sergey reported deadlock between kswapd correctly doing its usual lock_page(page) followed by down_read(page->mapping->i_mmap_rwsem), and madvise(MADV_REMOVE) on an madvise(MADV_HUGEPAGE) area doing down_write(page->mapping->i_mmap_rwsem) followed by lock_page(page). This happened when shmem_fallocate(punch hole)'s unmap_mapping_range() reaches zap_pmd_range()'s call to __split_huge_pmd(). The same deadlock could occur when partially truncating a mapped huge tmpfs file, or using fallocate(FALLOC_FL_PUNCH_HOLE) on it. __split_huge_pmd()'s page lock was added in 5.8, to make sure that any concurrent use of reuse_swap_page() (holding page lock) could not catch the anon THP's mapcounts and swapcounts while they were being split. Fortunately, reuse_swap_page() is never applied to a shmem or file THP (not even by khugepaged, which checks PageSwapCache before calling), and anonymous THPs are never created in shmem or file areas: so that __split_huge_pmd()'s page lock can only be necessary for anonymous THPs, on which there is no risk of deadlock with i_mmap_rwsem. Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101161409470.2022@eggly.anvils Fixes: c444eb56 ("mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()") Signed-off-by:
Hugh Dickins <hughd@google.com> Reported-by:
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Reviewed-by:
Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rokudo Yan authored
commit 74e21484 upstream. In fast_isolate_freepages, high_pfn will be used if a prefered one (ie PFN >= low_fn) not found. But the high_pfn is not reset before searching an free area, so when it was used as freepage, it may from another free area searched before. As a result move_freelist_head(freelist, freepage) will have unexpected behavior (eg corrupt the MOVABLE freelist) Unable to handle kernel paging request at virtual address dead000000000200 Mem abort info: ESR = 0x96000044 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000044 CM = 0, WnR = 1 [dead000000000200] address between user and kernel address ranges -000|list_cut_before(inline) -000|move_freelist_head(inline) -000|fast_isolate_freepages(inline) -000|isolate_freepages(inline) -000|compaction_alloc(?, ?) -001|unmap_and_move(inline) -001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p -002|__read_once_size(inline) -002|static_key_count(inline) -002|static_key_false(inline) -002|trace_mm_compaction_migratepages(inline) -002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0) -003|kcompactd_do_work(inline) -003|kcompactd([X19] p = 0xFFFFFF93227FBC40) -004|kthread([X20] _create = 0xFFFFFFE1AFB26380) -005|ret_from_fork(asm) The issue was reported on an smart phone product with 6GB ram and 3GB zram as swap device. This patch fixes the issue by reset high_pfn before searching each free area, which ensure freepage and freelist match when call move_freelist_head in fast_isolate_freepages(). Link: http://lkml.kernel.org/r/20190118175136.31341-12-mgorman@techsingularity.net Link: https://lkml.kernel.org/r/20210112094720.1238444-1-wu-yan@tcl.com Fixes: 5a811889 ("mm, compaction: use free lists to quickly locate a migration target") Signed-off-by:
Rokudo Yan <wu-yan@tcl.com> Acked-by:
Mel Gorman <mgorman@techsingularity.net> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Muchun Song authored
commit ecbf4724 upstream. The page_huge_active() can be called from scan_movable_pages() which do not hold a reference count to the HugeTLB page. So when we call page_huge_active() from scan_movable_pages(), the HugeTLB page can be freed parallel. Then we will trigger a BUG_ON which is in the page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the VM_BUG_ON_PAGE. Link: https://lkml.kernel.org/r/20210115124942.46403-6-songmuchun@bytedance.com Fixes: 7e1f049e ("mm: hugetlb: cleanup using paeg_huge_active()") Signed-off-by:
Muchun Song <songmuchun@bytedance.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Acked-by:
Michal Hocko <mhocko@suse.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Muchun Song authored
commit 0eb2df2b upstream. There is a race between isolate_huge_page() and __free_huge_page(). CPU0: CPU1: if (PageHuge(page)) put_page(page) __free_huge_page(page) spin_lock(&hugetlb_lock) update_and_free_page(page) set_compound_page_dtor(page, NULL_COMPOUND_DTOR) spin_unlock(&hugetlb_lock) isolate_huge_page(page) // trigger BUG_ON VM_BUG_ON_PAGE(!PageHead(page), page) spin_lock(&hugetlb_lock) page_huge_active(page) // trigger BUG_ON VM_BUG_ON_PAGE(!PageHuge(page), page) spin_unlock(&hugetlb_lock) When we isolate a HugeTLB page on CPU0. Meanwhile, we free it to the buddy allocator on CPU1. Then, we can trigger a BUG_ON on CPU0, because it is already freed to the buddy allocator. Link: https://lkml.kernel.org/r/20210115124942.46403-5-songmuchun@bytedance.com Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Signed-off-by:
Muchun Song <songmuchun@bytedance.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Acked-by:
Michal Hocko <mhocko@suse.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Muchun Song authored
commit 7ffddd49 upstream. There is a race condition between __free_huge_page() and dissolve_free_huge_page(). CPU0: CPU1: // page_count(page) == 1 put_page(page) __free_huge_page(page) dissolve_free_huge_page(page) spin_lock(&hugetlb_lock) // PageHuge(page) && !page_count(page) update_and_free_page(page) // page is freed to the buddy spin_unlock(&hugetlb_lock) spin_lock(&hugetlb_lock) clear_page_huge_active(page) enqueue_huge_page(page) // It is wrong, the page is already freed spin_unlock(&hugetlb_lock) The race window is between put_page() and dissolve_free_huge_page(). We should make sure that the page is already on the free list when it is dissolved. As a result __free_huge_page would corrupt page(s) already in the buddy allocator. Link: https://lkml.kernel.org/r/20210115124942.46403-4-songmuchun@bytedance.com Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Signed-off-by:
Muchun Song <songmuchun@bytedance.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Muchun Song authored
commit 585fc0d2 upstream. If a new hugetlb page is allocated during fallocate it will not be marked as active (set_page_huge_active) which will result in a later isolate_huge_page failure when the page migration code would like to move that page. Such a failure would be unexpected and wrong. Only export set_page_huge_active, just leave clear_page_huge_active as static. Because there are no external users. Link: https://lkml.kernel.org/r/20210115124942.46403-3-songmuchun@bytedance.com Fixes: 70c3547e (hugetlbfs: add hugetlbfs_fallocate()) Signed-off-by:
Muchun Song <songmuchun@bytedance.com> Acked-by:
Michal Hocko <mhocko@suse.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Russell King authored
commit 39d3454c upstream. Building with gcc 4.9.2 reveals a latent bug in the PCI accessors for Footbridge platforms, which causes a fatal alignment fault while accessing IO memory. Fix this by making the assembly volatile. Cc: stable@vger.kernel.org Signed-off-by:
Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sean Christopherson authored
commit 943dea8a upstream. Set the emulator context to PROT64 if SYSENTER transitions from 32-bit userspace (compat mode) to a 64-bit kernel, otherwise the RIP update at the end of x86_emulate_insn() will incorrectly truncate the new RIP. Note, this bug is mostly limited to running an Intel virtual CPU model on an AMD physical CPU, as other combinations of virtual and physical CPUs do not trigger full emulation. On Intel CPUs, SYSENTER in compatibility mode is legal, and unconditionally transitions to 64-bit mode. On AMD CPUs, SYSENTER is illegal in compatibility mode and #UDs. If the vCPU is AMD, KVM injects a #UD on SYSENTER in compat mode. If the pCPU is Intel, SYSENTER will execute natively and not trigger #UD->VM-Exit (ignoring guest TLB shenanigans). Fixes: fede8076 ("KVM: x86: handle wrap around 32-bit address space") Cc: stable@vger.kernel.org Signed-off-by:
Jonny Barker <jonny@jonnybarker.com> [sean: wrote changelog] Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20210202165546.2390296-1-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sean Christopherson authored
commit ccd85d90 upstream. Don't let KVM load when running as an SEV guest, regardless of what CPUID says. Memory is encrypted with a key that is not accessible to the host (L0), thus it's impossible for L0 to emulate SVM, e.g. it'll see garbage when reading the VMCB. Technically, KVM could decrypt all memory that needs to be accessible to the L0 and use shadow paging so that L0 does not need to shadow NPT, but exposing such information to L0 largely defeats the purpose of running as an SEV guest. This can always be revisited if someone comes up with a use case for running VMs inside SEV guests. Note, VMLOAD, VMRUN, etc... will also #GP on GPAs with C-bit set, i.e. KVM is doomed even if the SEV guest is debuggable and the hypervisor is willing to decrypt the VMCB. This may or may not be fixed on CPUs that have the SVME_ADDR_CHK fix. Cc: stable@vger.kernel.org Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20210202212017.2486595-1-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thorsten Leemhuis authored
commit 538e4a8c upstream. Some Kingston A2000 NVMe SSDs sooner or later get confused and stop working when they use the deepest APST sleep while running Linux. The system then crashes and one has to cold boot it to get the SSD working again. Kingston seems to known about this since at least mid-September 2020: https://bbs.archlinux.org/viewtopic.php?pid=1926994#p1926994 Someone working for a German company representing Kingston to the German press confirmed to me Kingston engineering is aware of the issue and investigating; the person stated that to their current knowledge only the deepest APST sleep state causes trouble. Therefore, make Linux avoid it for now by applying the NVME_QUIRK_NO_DEEPEST_PS to this SSD. I have two such SSDs, but it seems the problem doesn't occur with them. I hence couldn't verify if this patch really fixes the problem, but all the data in front of me suggests it should. This patch can easily be reverted or improved upon if a better solution surfaces. FWIW, there are many reports about the issue scattered around the web; most of the users disabled APST completely to make things work, some just made Linux avoid the deepest sleep state: https://bugzilla.kernel.org/show_bug.cgi?id=195039#c65 https://bugzilla.kernel.org/show_bug.cgi?id=195039#c73 https://bugzilla.kernel.org/show_bug.cgi?id=195039#c74 https://bugzilla.kernel.org/show_bug.cgi?id=195039#c78 https://bugzilla.kernel.org/show_bug.cgi?id=195039#c79 https://bugzilla.kernel.org/show_bug.cgi?id=195039#c80 https://askubuntu.com/questions/1222049/nvmekingston-a2000-sometimes-stops-giving-response-in-ubuntu-18-04dell-inspir https://community.acer.com/en/discussion/604326/m-2-nvme-ssd-aspire-517-51g-issue-compatibility-kingston-a2000-linux-ubuntu For the record, some data from 'nvme id-ctrl /dev/nvme0' NVME Identify Controller: vid : 0x2646 ssvid : 0x2646 mn : KINGSTON SA2000M81000G fr : S5Z42105 [...] ps 0 : mp:9.00W operational enlat:0 exlat:0 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:- ps 1 : mp:4.60W operational enlat:0 exlat:0 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:- ps 2 : mp:3.80W operational enlat:0 exlat:0 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:- ps 3 : mp:0.0450W non-operational enlat:2000 exlat:2000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:- ps 4 : mp:0.0040W non-operational enlat:15000 exlat:15000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:- Cc: stable@vger.kernel.org # 4.14+ Signed-off-by:
Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stylon Wang authored
commit 1a10e524 upstream. This reverts commit b24bdc37. It caused memory leak after S3 on 4K HDMI displays. Signed-off-by:
Stylon Wang <stylon.wang@amd.com> Reviewed-by:
Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Acked-by:
Anson Jacob <Anson.Jacob@amd.com> Tested-by:
Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Fengnan Chang authored
commit f92e04f7 upstream. When analysing tuples fails we may loop indefinitely to retry. Let's avoid this by using a 10s timeout and bail if not completed earlier. Signed-off-by:
Fengnan Chang <fengnanchang@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210123033230.36442-1-fengnanchang@gmail.com Signed-off-by:
Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Shilovsky authored
commit 91792bb8 upstream. Currently we try to guess if a compound request is going to succeed waiting for credits or not based on the number of requests in flight. This approach doesn't work correctly all the time because there may be only one request in flight which is going to bring multiple credits satisfying the compound request. Change the behavior to fail a request only if there are no requests in flight at all and proceed waiting for credits otherwise. Cc: <stable@vger.kernel.org> # 5.1+ Signed-off-by:
Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by:
Tom Talpey <tom@talpey.com> Reviewed-by:
Shyam Prasad N <nspmangalore@gmail.com> Signed-off-by:
Steve French <stfrench@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gustavo A. R. Silva authored
commit 8d8d1dbe upstream. While addressing some warnings generated by -Warray-bounds, I found this bug that was introduced back in 2017: CC [M] fs/cifs/smb2pdu.o fs/cifs/smb2pdu.c: In function ‘SMB2_negotiate’: fs/cifs/smb2pdu.c:822:16: warning: array subscript 1 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds] 822 | req->Dialects[1] = cpu_to_le16(SMB30_PROT_ID); | ~~~~~~~~~~~~~^~~ fs/cifs/smb2pdu.c:823:16: warning: array subscript 2 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds] 823 | req->Dialects[2] = cpu_to_le16(SMB302_PROT_ID); | ~~~~~~~~~~~~~^~~ fs/cifs/smb2pdu.c:824:16: warning: array subscript 3 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds] 824 | req->Dialects[3] = cpu_to_le16(SMB311_PROT_ID); | ~~~~~~~~~~~~~^~~ fs/cifs/smb2pdu.c:816:16: warning: array subscript 1 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds] 816 | req->Dialects[1] = cpu_to_le16(SMB302_PROT_ID); | ~~~~~~~~~~~~~^~~ At the time, the size of array _Dialects_ was changed from 1 to 3 in struct validate_negotiate_info_req, and then in 2019 it was changed from 3 to 4, but those changes were never made in struct smb2_negotiate_req, which has led to a 3 and a half years old out-of-bounds bug in function SMB2_negotiate() (fs/cifs/smb2pdu.c). Fix this by increasing the size of array _Dialects_ in struct smb2_negotiate_req to 4. Fixes: 9764c02f ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)") Fixes: d5c7076b ("smb3: add smb3.1.1 to default dialect list") Cc: stable@vger.kernel.org Signed-off-by:
Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by:
Steve French <stfrench@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Aurelien Aptel authored
commit 21b200d0 upstream. Assuming - //HOST/a is mounted on /mnt - //HOST/b is mounted on /mnt/b On a slow connection, running 'df' and killing it while it's processing /mnt/b can make cifs_get_inode_info() returns -ERESTARTSYS. This triggers the following chain of events: => the dentry revalidation fail => dentry is put and released => superblock associated with the dentry is put => /mnt/b is unmounted This patch makes cifs_d_revalidate() return the error instead of 0 (invalid) when cifs_revalidate_dentry() fails, except for ENOENT (file deleted) and ESTALE (file recreated). Signed-off-by:
Aurelien Aptel <aaptel@suse.com> Suggested-by:
Shyam Prasad N <nspmangalore@gmail.com> Reviewed-by:
Shyam Prasad N <nspmangalore@gmail.com> CC: stable@vger.kernel.org Signed-off-by:
Steve French <stfrench@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mathias Nyman authored
commit d4a61063 upstream. xhci driver may in some special cases need to copy small amounts of payload data to a bounce buffer in order to meet the boundary and alignment restrictions set by the xHCI specification. In the majority of these cases the data is in a sg list, and driver incorrectly assumed data is always in urb->sg when using the bounce buffer. If data instead is contiguous, and in urb->transfer_buffer, we may still need to bounce buffer a small part if data starts very close (less than packet size) to a 64k boundary. Check if sg list is used before copying data to/from it. Fixes: f9c589e1 ("xhci: TD-fragment, align the unsplittable case with a bounce buffer") Cc: stable@vger.kernel.org Reported-by:
Andreas Hartmann <andihartmann@01019freenet.de> Tested-by:
Andreas Hartmann <andihartmann@01019freenet.de> Signed-off-by:
Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20210203113702.436762-2-mathias.nyman@linux.intel.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marc Zyngier authored
commit 4c457e8c upstream. When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI), __msi_domain_alloc_irqs() performs the activation of the interrupt (which in the case of PCI results in the endpoint being programmed) as soon as the interrupt is allocated. But it appears that this is only done for the first vector, introducing an inconsistent behaviour for PCI Multi-MSI. Fix it by iterating over the number of vectors allocated to each MSI descriptor. This is easily achieved by introducing a new "for_each_msi_vector" iterator, together with a tiny bit of refactoring. Fixes: f3b0946d ("genirq/msi: Make sure PCI MSIs are activated early") Reported-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210123122759.1781359-1-maz@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Williams authored
commit 7018c897 upstream. Richard reports that the following test: (while true; do cat /sys/bus/nd/devices/nmem*/available_slots 2>&1 > /dev/null done) & while true; do for i in $(seq 0 4); do echo nmem$i > /sys/bus/nd/drivers/nvdimm/bind done for i in $(seq 0 4); do echo nmem$i > /sys/bus/nd/drivers/nvdimm/unbind done done ...fails with a crash signature like: divide error: 0000 [#1] SMP KASAN PTI RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm] [..] Call Trace: available_slots_show+0x4e/0x120 [libnvdimm] dev_attr_show+0x42/0x80 ? memset+0x20/0x40 sysfs_kf_seq_show+0x218/0x410 The root cause is that available_slots_show() consults driver-data, but fails to synchronize against device-unbind setting up a TOCTOU race to access uninitialized memory. Validate driver-data under the device-lock. Fixes: 4d88a97a ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure") Cc: <stable@vger.kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Coly Li <colyli@suse.com> Reported-by:
Richard Palethorpe <rpalethorpe@suse.com> Acked-by:
Richard Palethorpe <rpalethorpe@suse.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wang ShaoBo authored
commit 0188b878 upstream. Our system encountered a re-init error when re-registering same kretprobe, where the kretprobe_instance in rp->free_instances is illegally accessed after re-init. Implementation to avoid re-registration has been introduced for kprobe before, but lags for register_kretprobe(). We must check if kprobe has been re-registered before re-initializing kretprobe, otherwise it will destroy the data struct of kretprobe registered, which can lead to memory leak, system crash, also some unexpected behaviors. We use check_kprobe_rereg() to check if kprobe has been re-registered before running register_kretprobe()'s body, for giving a warning message and terminate registration process. Link: https://lkml.kernel.org/r/20210128124427.2031088-1-bobo.shaobowang@huawei.com Cc: stable@vger.kernel.org Fixes: 1f0ab409 ("kprobes: Prevent re-registration of the same kprobe") [ The above commit should have been done for kretprobes too ] Acked-by:
Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by:
Ananth N Mavinakayanahalli <ananth@linux.ibm.com> Acked-by:
Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by:
Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Rostedt (VMware) authored
commit 7e0a9220 upstream. On some archs, the idle task can call into cpu_suspend(). The cpu_suspend() will disable or pause function graph tracing, as there's some paths in bringing down the CPU that can have issues with its return address being modified. The task_struct structure has a "tracing_graph_pause" atomic counter, that when set to something other than zero, the function graph tracer will not modify the return address. The problem is that the tracing_graph_pause counter is initialized when the function graph tracer is enabled. This can corrupt the counter for the idle task if it is suspended in these architectures. CPU 1 CPU 2 ----- ----- do_idle() cpu_suspend() pause_graph_tracing() task_struct->tracing_graph_pause++ (0 -> 1) start_graph_tracing() for_each_online_cpu(cpu) { ftrace_graph_init_idle_task(cpu) task-struct->tracing_graph_pause = 0 (1 -> 0) unpause_graph_tracing() task_struct->tracing_graph_pause-- (0 -> -1) The above should have gone from 1 to zero, and enabled function graph tracing again. But instead, it is set to -1, which keeps it disabled. There's no reason that the field tracing_graph_pause on the task_struct can not be initialized at boot up. Cc: stable@vger.kernel.org Fixes: 380c4b14 ("tracing/function-graph-tracer: append the tracing_graph_flag") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339 Reported-by:
<pierre.gondois@arm.com> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Felix Fietkau authored
commit 18fe0fae upstream. If the driver uses .sta_add, station entries are only uploaded after the sta is in assoc state. Fix early station rate table updates by deferring them until the sta has been uploaded. Cc: stable@vger.kernel.org Signed-off-by:
Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210201083324.3134-1-nbd@nbd.name [use rcu_access_pointer() instead since we won't dereference here] Signed-off-by:
Johannes Berg <johannes.berg@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Liangyan authored
commit e04527fe upstream. We need to lock d_parent->d_lock before dget_dlock, or this may have d_lockref updated parallelly like calltrace below which will cause dentry->d_lockref leak and risk a crash. CPU 0 CPU 1 ovl_set_redirect lookup_fast ovl_get_redirect __d_lookup dget_dlock //no lock protection here spin_lock(&dentry->d_lock) dentry->d_lockref.count++ dentry->d_lockref.count++ [ 49.799059] PGD 800000061fed7067 P4D 800000061fed7067 PUD 61fec5067 PMD 0 [ 49.799689] Oops: 0002 [#1] SMP PTI [ 49.800019] CPU: 2 PID: 2332 Comm: node Not tainted 4.19.24-7.20.al7.x86_64 #1 [ 49.800678] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8a46cfe 04/01/2014 [ 49.801380] RIP: 0010:_raw_spin_lock+0xc/0x20 [ 49.803470] RSP: 0018:ffffac6fc5417e98 EFLAGS: 00010246 [ 49.803949] RAX: 0000000000000000 RBX: ffff93b8da3446c0 RCX: 0000000a00000000 [ 49.804600] RDX: 0000000000000001 RSI: 000000000000000a RDI: 0000000000000088 [ 49.805252] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff993cf040 [ 49.805898] R10: ffff93b92292e580 R11: ffffd27f188a4b80 R12: 0000000000000000 [ 49.806548] R13: 00000000ffffff9c R14: 00000000fffffffe R15: ffff93b8da3446c0 [ 49.807200] FS: 00007ffbedffb700(0000) GS:ffff93b927880000(0000) knlGS:0000000000000000 [ 49.807935] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 49.808461] CR2: 0000000000000088 CR3: 00000005e3f74006 CR4: 00000000003606a0 [ 49.809113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 49.809758] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 49.810410] Call Trace: [ 49.810653] d_delete+0x2c/0xb0 [ 49.810951] vfs_rmdir+0xfd/0x120 [ 49.811264] do_rmdir+0x14f/0x1a0 [ 49.811573] do_syscall_64+0x5b/0x190 [ 49.811917] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 49.812385] RIP: 0033:0x7ffbf505ffd7 [ 49.814404] RSP: 002b:00007ffbedffada8 EFLAGS: 00000297 ORIG_RAX: 0000000000000054 [ 49.815098] RAX: ffffffffffffffda RBX: 00007ffbedffb640 RCX: 00007ffbf505ffd7 [ 49.815744] RDX: 0000000004449700 RSI: 0000000000000000 RDI: 0000000006c8cd50 [ 49.816394] RBP: 00007ffbedffaea0 R08: 0000000000000000 R09: 0000000000017d0b [ 49.817038] R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000012 [ 49.817687] R13: 00000000072823d8 R14: 00007ffbedffb700 R15: 00000000072823d8 [ 49.818338] Modules linked in: pvpanic cirrusfb button qemu_fw_cfg atkbd libps2 i8042 [ 49.819052] CR2: 0000000000000088 [ 49.819368] ---[ end trace 4e652b8aa299aa2d ]--- [ 49.819796] RIP: 0010:_raw_spin_lock+0xc/0x20 [ 49.821880] RSP: 0018:ffffac6fc5417e98 EFLAGS: 00010246 [ 49.822363] RAX: 0000000000000000 RBX: ffff93b8da3446c0 RCX: 0000000a00000000 [ 49.823008] RDX: 0000000000000001 RSI: 000000000000000a RDI: 0000000000000088 [ 49.823658] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff993cf040 [ 49.825404] R10: ffff93b92292e580 R11: ffffd27f188a4b80 R12: 0000000000000000 [ 49.827147] R13: 00000000ffffff9c R14: 00000000fffffffe R15: ffff93b8da3446c0 [ 49.828890] FS: 00007ffbedffb700(0000) GS:ffff93b927880000(0000) knlGS:0000000000000000 [ 49.830725] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 49.832359] CR2: 0000000000000088 CR3: 00000005e3f74006 CR4: 00000000003606a0 [ 49.834085] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 49.835792] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Cc: <stable@vger.kernel.org> Fixes: a6c60655 ("ovl: redirect on rename-dir") Signed-off-by:
Liangyan <liangyan.peng@linux.alibaba.com> Reviewed-by:
Joseph Qi <joseph.qi@linux.alibaba.com> Suggested-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Chen authored
commit f768e718 upstream. Some DRD controllers (eg, dwc3 & cdns3) have PHY management at their own driver to cover both device and host mode, so add one priv quirk for such users to skip PHY management from HCD core. Reviewed-by:
Jun Li <jun.li@nxp.com> Signed-off-by:
Peter Chen <peter.chen@nxp.com> Signed-off-by:
Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20200918131752.16488-5-mathias.nyman@linux.intel.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chunfeng Yun authored
commit a50ea34d upstream. No need to check the following endpoints after finding the endpoint wanted to drop. Fixes: 54f6a8af ("usb: xhci-mtk: skip dropping bandwidth of unchecked endpoints") Cc: stable <stable@vger.kernel.org> Reported-by:
Ikjoon Jang <ikjn@chromium.org> Signed-off-by:
Chunfeng Yun <chunfeng.yun@mediatek.com> Link: https://lore.kernel.org/r/1612255104-5363-1-git-send-email-chunfeng.yun@mediatek.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chunfeng Yun authored
commit 54f6a8af upstream. For those unchecked endpoints, we don't allocate bandwidth for them, so no need free the bandwidth, otherwise will decrease the allocated bandwidth. Meanwhile use xhci_dbg() instead of dev_dbg() to print logs and rename bw_ep_list_new as bw_ep_chk_list. Fixes: 1d69f9d9 ("usb: xhci-mtk: fix unreleased bandwidth data") Cc: stable <stable@vger.kernel.org> Reviewed-and-tested-by:
Ikjoon Jang <ikjn@chromium.org> Signed-off-by:
Chunfeng Yun <chunfeng.yun@mediatek.com> Link: https://lore.kernel.org/r/1612159064-28413-1-git-send-email-chunfeng.yun@mediatek.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ikjoon Jang authored
commit 1d69f9d9 upstream. xhci-mtk needs XHCI_MTK_HOST quirk functions in add_endpoint() and drop_endpoint() to handle its own sw bandwidth management. It stores bandwidth data into an internal table every time add_endpoint() is called, and drops those in drop_endpoint(). But when bandwidth allocation fails at one endpoint, all earlier allocation from the same interface could still remain at the table. This patch moves bandwidth management codes to check_bandwidth() and reset_bandwidth() path. To do so, this patch also adds those functions to xhci_driver_overrides and lets mtk-xhci to release all failed endpoints in reset_bandwidth() path. Fixes: 08e469de ("usb: xhci-mtk: supports bandwidth scheduling with multi-TT") Signed-off-by:
Ikjoon Jang <ikjn@chromium.org> Link: https://lore.kernel.org/r/20210113180444.v6.1.Id0d31b5f3ddf5e734d2ab11161ac5821921b1e1e@changeid Cc: stable <stable@vger.kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gary Bisson authored
commit 0e5a3c82 upstream. Commit fe8abf33 ("usb: dwc3: support clocks and resets for DWC3 core") introduced clock support and a new function named dwc3_core_init_for_resume() which enables the clock before calling dwc3_core_init() during resume as clocks get disabled during suspend. Unfortunately in this commit the DWC3_GCTL_PRTCAP_OTG case was forgotten and therefore during resume, a platform could call dwc3_core_init() without re-enabling the clocks first, preventing to resume properly. So update the resume path to call dwc3_core_init_for_resume() as it should. Fixes: fe8abf33 ("usb: dwc3: support clocks and resets for DWC3 core") Cc: stable@vger.kernel.org Signed-off-by:
Gary Bisson <gary.bisson@boundarydevices.com> Link: https://lore.kernel.org/r/20210125161934.527820-1-gary.bisson@boundarydevices.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-