  1. Nov 18, 2021
    • mtd: rawnand: pasemi: Keep the driver compatible with on-die ECC engines · b4e2e9fb
      Miquel Raynal authored
      
      commit f16b7d2a upstream.
      
      Following the introduction of the generic ECC engine infrastructure, it
      was necessary to reorganize the code and move the ECC configuration in
      the ->attach_chip() hook. Failing to do that properly led to a first
      series of fixes supposed to stabilize the situation. Unfortunately, this
      only fixed the use of software ECC engines, preventing any other kind of
      engine from being used, including on-die ones.
      
      It is now time to (finally) fix the situation by ensuring that we still
      provide a default (e.g. software ECC) but will still support different
      ECC engines such as on-die ECC engines if properly described in the
      device tree.
      
      There are no changes needed on the core side in order to do this, but we
      just need to leverage the logic there which allows:
      1- a subsystem default (set to Host engines in the raw NAND world)
      2- a driver specific default (here set to software ECC engines)
      3- any type of engine requested by the user (i.e. described in the DT)
      
      As the raw NAND subsystem has not yet been fully converted to the ECC
      engine infrastructure, in order to provide a default ECC engine for this
      driver we need to set chip->ecc.engine_type *before* calling
      nand_scan(). During the initialization step, the core will consider this
      entry as the default engine for this driver. This value may of course
      be overloaded by the user if the usual DT properties are provided.
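
The three-level precedence described above can be modeled in user space as follows. This is only a sketch of the selection logic, not the kernel code: the enum and the function `select_ecc_engine()` are hypothetical names invented for illustration, and "invalid" stands for "nothing was specified at this level".

```c
/* Hypothetical stand-ins for the ECC engine types discussed above. */
enum ecc_engine_type {
    ECC_ENGINE_INVALID = 0, /* nothing specified at this level */
    ECC_ENGINE_SOFT,
    ECC_ENGINE_ON_HOST,
    ECC_ENGINE_ON_DIE,
};

/*
 * Pick an engine using the precedence from the commit message:
 *   1. whatever the user requested (e.g. through the device tree),
 *   2. otherwise the driver-specific default,
 *   3. otherwise the subsystem default (host engine for raw NAND).
 */
static enum ecc_engine_type
select_ecc_engine(enum ecc_engine_type driver_default,
                  enum ecc_engine_type user_request)
{
    if (user_request != ECC_ENGINE_INVALID)
        return user_request;
    if (driver_default != ECC_ENGINE_INVALID)
        return driver_default;
    return ECC_ENGINE_ON_HOST;
}
```

In this model, a driver that sets its default to the software engine before scanning still ends up with the on-die engine when the user asks for it, which is the behavior the patch aims to restore.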
      
      Fixes: 8fc6f1f0 ("mtd: rawnand: pasemi: Move the ECC initialization to ->attach_chip()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Link: https://lore.kernel.org/linux-mtd/20210928222258.199726-7-miquel.raynal@bootlin.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mtd: rawnand: gpio: Keep the driver compatible with on-die ECC engines · 963db3cc
      Miquel Raynal authored
      
      commit b5b5b4dc upstream.
      
      Following the introduction of the generic ECC engine infrastructure, it
      was necessary to reorganize the code and move the ECC configuration in
      the ->attach_chip() hook. Failing to do that properly led to a first
      series of fixes supposed to stabilize the situation. Unfortunately, this
      only fixed the use of software ECC engines, preventing any other kind of
      engine from being used, including on-die ones.
      
      It is now time to (finally) fix the situation by ensuring that we still
      provide a default (e.g. software ECC) but will still support different
      ECC engines such as on-die ECC engines if properly described in the
      device tree.
      
      There are no changes needed on the core side in order to do this, but we
      just need to leverage the logic there which allows:
      1- a subsystem default (set to Host engines in the raw NAND world)
      2- a driver specific default (here set to software ECC engines)
      3- any type of engine requested by the user (i.e. described in the DT)
      
      As the raw NAND subsystem has not yet been fully converted to the ECC
      engine infrastructure, in order to provide a default ECC engine for this
      driver we need to set chip->ecc.engine_type *before* calling
      nand_scan(). During the initialization step, the core will consider this
      entry as the default engine for this driver. This value may of course
      be overloaded by the user if the usual DT properties are provided.
      
      Fixes: f6341f64 ("mtd: rawnand: gpio: Move the ECC initialization to ->attach_chip()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Link: https://lore.kernel.org/linux-mtd/20210928222258.199726-4-miquel.raynal@bootlin.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mtd: rawnand: mpc5121: Keep the driver compatible with on-die ECC engines · 13566bc1
      Miquel Raynal authored
      
      commit f9d8570b upstream.
      
      Following the introduction of the generic ECC engine infrastructure, it
      was necessary to reorganize the code and move the ECC configuration in
      the ->attach_chip() hook. Failing to do that properly led to a first
      series of fixes supposed to stabilize the situation. Unfortunately, this
      only fixed the use of software ECC engines, preventing any other kind of
      engine from being used, including on-die ones.
      
      It is now time to (finally) fix the situation by ensuring that we still
      provide a default (e.g. software ECC) but will still support different
      ECC engines such as on-die ECC engines if properly described in the
      device tree.
      
      There are no changes needed on the core side in order to do this, but we
      just need to leverage the logic there which allows:
      1- a subsystem default (set to Host engines in the raw NAND world)
      2- a driver specific default (here set to software ECC engines)
      3- any type of engine requested by the user (i.e. described in the DT)
      
      As the raw NAND subsystem has not yet been fully converted to the ECC
      engine infrastructure, in order to provide a default ECC engine for this
      driver we need to set chip->ecc.engine_type *before* calling
      nand_scan(). During the initialization step, the core will consider this
      entry as the default engine for this driver. This value may of course
      be overloaded by the user if the usual DT properties are provided.
      
      Fixes: 6dd09f77 ("mtd: rawnand: mpc5121: Move the ECC initialization to ->attach_chip()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Link: https://lore.kernel.org/linux-mtd/20210928222258.199726-5-miquel.raynal@bootlin.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mtd: rawnand: xway: Keep the driver compatible with on-die ECC engines · 9b366f52
      Miquel Raynal authored
      
      commit 6bcd2960 upstream.
      
      Following the introduction of the generic ECC engine infrastructure, it
      was necessary to reorganize the code and move the ECC configuration in
      the ->attach_chip() hook. Failing to do that properly led to a first
      series of fixes supposed to stabilize the situation. Unfortunately, this
      only fixed the use of software ECC engines, preventing any other kind of
      engine from being used, including on-die ones.
      
      It is now time to (finally) fix the situation by ensuring that we still
      provide a default (e.g. software ECC) but will still support different
      ECC engines such as on-die ECC engines if properly described in the
      device tree.
      
      There are no changes needed on the core side in order to do this, but we
      just need to leverage the logic there which allows:
      1- a subsystem default (set to Host engines in the raw NAND world)
      2- a driver specific default (here set to software ECC engines)
      3- any type of engine requested by the user (i.e. described in the DT)
      
      As the raw NAND subsystem has not yet been fully converted to the ECC
      engine infrastructure, in order to provide a default ECC engine for this
      driver we need to set chip->ecc.engine_type *before* calling
      nand_scan(). During the initialization step, the core will consider this
      entry as the default engine for this driver. This value may of course
      be overloaded by the user if the usual DT properties are provided.
      
      Fixes: d525914b ("mtd: rawnand: xway: Move the ECC initialization to ->attach_chip()")
      Cc: stable@vger.kernel.org
      Cc: Jan Hoffmann <jan@3e8.eu>
      Cc: Kestrel seventyfour <kestrelseventyfour@gmail.com>
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Tested-by: Jan Hoffmann <jan@3e8.eu>
      Link: https://lore.kernel.org/linux-mtd/20210928222258.199726-10-miquel.raynal@bootlin.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mtd: rawnand: ams-delta: Keep the driver compatible with on-die ECC engines · cbc55cf4
      Miquel Raynal authored
      
      commit d707bb74 upstream.
      
      Following the introduction of the generic ECC engine infrastructure, it
      was necessary to reorganize the code and move the ECC configuration in
      the ->attach_chip() hook. Failing to do that properly led to a first
      series of fixes supposed to stabilize the situation. Unfortunately, this
      only fixed the use of software ECC engines, preventing any other kind of
      engine from being used, including on-die ones.
      
      It is now time to (finally) fix the situation by ensuring that we still
      provide a default (e.g. software ECC) but will still support different
      ECC engines such as on-die ECC engines if properly described in the
      device tree.
      
      There are no changes needed on the core side in order to do this, but we
      just need to leverage the logic there which allows:
      1- a subsystem default (set to Host engines in the raw NAND world)
      2- a driver specific default (here set to software ECC engines)
      3- any type of engine requested by the user (i.e. described in the DT)
      
      As the raw NAND subsystem has not yet been fully converted to the ECC
      engine infrastructure, in order to provide a default ECC engine for this
      driver we need to set chip->ecc.engine_type *before* calling
      nand_scan(). During the initialization step, the core will consider this
      entry as the default engine for this driver. This value may of course
      be overloaded by the user if the usual DT properties are provided.
      
      Fixes: 59d93473 ("mtd: rawnand: ams-delta: Move the ECC initialization to ->attach_chip()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Link: https://lore.kernel.org/linux-mtd/20210928222258.199726-2-miquel.raynal@bootlin.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/cio: make ccw_device_dma_* more robust · 1f420818
      Halil Pasic authored
      
      commit ad9a1451 upstream.
      
      Since commit 48720ba5 ("virtio/s390: use DMA memory for ccw I/O and
      classic notifiers") we were supposed to make sure that
      virtio_ccw_release_dev() completes before the ccw device and the
      attached dma pool are torn down, but unfortunately we did not.  Before
      that commit it used to be OK to delay cleaning up the memory allocated
      by virtio-ccw indefinitely (which isn't really intuitive for those used
      to destruction happening in reverse construction order), but now we
      trigger a BUG_ON if the genpool is destroyed before all memory allocated
      from it is deallocated. Which brings down the guest. We can observe this
      problem, when unregister_virtio_device() does not give up the last
      reference to the virtio_device (e.g. because a virtio-scsi attached scsi
      disk got removed without previously unmounting its previously mounted
      partition).
      
      To make sure that the genpool is only destroyed after all the necessary
      freeing is done let us take a reference on the ccw device on each
      ccw_device_dma_zalloc() and give it up on each ccw_device_dma_free().
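
The idea of pinning the device for the lifetime of each allocation can be illustrated with a small user-space model. This is a sketch under assumptions, not the s390 implementation: `struct model_ccw_device` and the `model_*` functions are hypothetical, and the "pool" is reduced to a flag that is set when the last reference goes away.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a device that owns a DMA pool. */
struct model_ccw_device {
    int refcount;        /* references currently held on the device */
    bool pool_destroyed; /* set when the last reference is dropped */
};

static void model_get(struct model_ccw_device *dev)
{
    dev->refcount++;
}

static void model_put(struct model_ccw_device *dev)
{
    /* The pool is torn down only together with the last reference. */
    if (--dev->refcount == 0)
        dev->pool_destroyed = true;
}

/* Every successful allocation takes a reference on the device. */
static void *model_dma_zalloc(struct model_ccw_device *dev, size_t size)
{
    void *p = calloc(1, size);

    if (p)
        model_get(dev);
    return p;
}

/* Every free gives the reference back. */
static void model_dma_free(struct model_ccw_device *dev, void *p)
{
    if (!p)
        return;
    free(p);
    model_put(dev);
}
```

In this model the pool cannot be destroyed while an allocation is outstanding, even if the driver drops its own reference first; the cost is that a leaked allocation keeps the whole device alive.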
      
      Actually there are multiple approaches to fixing the problem at hand
      that can work. The upside of this one is that it is the safest one while
      remaining simple. We don't crash the guest even if the driver does not
      pair allocations and frees. The downside is the reference counting
      overhead, that the reference counting for ccw devices becomes more
      complex, in a sense that we need to pair the calls to the aforementioned
      functions for it to be correct, and that if we happen to leak, we leak
      more than necessary (the whole ccw device instead of just the genpool).
      
      Some alternatives to this approach are taking a reference in
      virtio_ccw_online() and giving it up in virtio_ccw_release_dev() or
      making sure virtio_ccw_release_dev() completes its work before
      virtio_ccw_remove() returns. The downside of these approaches is that
      these are less safe against programming errors.
      
      Cc: <stable@vger.kernel.org> # v5.3
      Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
      Fixes: 48720ba5 ("virtio/s390: use DMA memory for ccw I/O and classic notifiers")
      Reported-by: <bfu@redhat.com>
      Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
      Acked-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/ap: Fix hanging ioctl caused by orphaned replies · c9ca9669
      Harald Freudenberger authored
      
      commit 3826350e upstream.
      
      When a queue is switched to soft offline during heavy load and later
      switched to soft online again and now used, it may be that the caller
      is blocked forever in the ioctl call.
      
      The failure occurs because there is a pending reply after the queue(s)
      have been switched to offline. This orphaned reply is received when
      the queue is switched to online and is accidentally counted for the
      outstanding replies. So when there was a valid outstanding reply and
      this orphaned reply is received it counts as the outstanding one thus
      dropping the outstanding counter to 0. Voila, with this counter the
      receive function is not called any more and the real outstanding reply
      is never received (until another request comes in...) and the ioctl
      blocks.
      
      The fix is simple. However, instead of readjusting the counter when an
      orphaned reply is detected, I check the queue status for not empty and
      compare this to the outstanding counter. So if the queue is not empty
      then the counter must not drop to 0 but at least have a value of 1.
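
The rule in the last sentence can be sketched as a small pure function. The name `adjust_pending_count()` and its parameters are hypothetical and only model the bookkeeping described here; the real driver keeps this state in its queue structure.

```c
#include <stdbool.h>

/*
 * Model of the outstanding-reply bookkeeping after one reply was received.
 * 'pending' is the counter before the reply, 'queue_empty' is the queue
 * status reported together with that reply.
 */
static int adjust_pending_count(int pending, bool queue_empty)
{
    /* One reply has been consumed; never go below zero. */
    if (pending > 0)
        pending--;

    /*
     * If the queue still holds replies, at least one must be counted as
     * outstanding, otherwise the receive path would stop polling.
     */
    if (!queue_empty && pending < 1)
        pending = 1;

    return pending;
}
```

With an orphaned reply arriving while one real reply is outstanding, the model returns 1 rather than 0, so the real reply is still fetched.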
      
      Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/tape: fix timer initialization in tape_std_assign() · 57de1fbe
      Sven Schnelle authored
      
      commit 213fca9e upstream.
      
      commit 9c6c273a ("timer: Remove init_timer_on_stack() in favor
      of timer_setup_on_stack()") changed the timer setup from
      init_timer_on_stack() to timer_setup(), but failed to update the
      mod_timer() call. While at it, use msecs_to_jiffies() instead
      of the open coded timeout calculation.
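
As a rough illustration of why a conversion helper reads better than an open-coded timeout, here is a user-space sketch of a milliseconds-to-ticks conversion. `MODEL_HZ` and `model_msecs_to_ticks()` are assumptions made for this example only; the real tick rate is a kernel build option and the kernel helper has its own implementation.

```c
/* Assumed tick rate for this sketch only. */
#define MODEL_HZ 100ULL

/*
 * Convert milliseconds to timer ticks, rounding up so that a timeout
 * computed this way never expires earlier than requested.
 */
static unsigned long long model_msecs_to_ticks(unsigned long long ms)
{
    return (ms * MODEL_HZ + 999ULL) / 1000ULL;
}
```

The caller then states the timeout in a unit a reader can check at a glance, for example `model_msecs_to_ticks(2000)` for two seconds, instead of multiplying by the tick rate by hand.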
      
      Cc: stable@vger.kernel.org
      Fixes: 9c6c273a ("timer: Remove init_timer_on_stack() in favor of timer_setup_on_stack()")
      Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
      Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/cio: check the subchannel validity for dev_busid · 1174298a
      Vineeth Vijayan authored
      
      commit a4751f15 upstream.
      
      Check the validity of the subchannel before reading other fields in
      the schib.
      
      Fixes: d3683c05 ("s390/cio: add dev_busid sysfs entry for each subchannel")
      CC: <stable@vger.kernel.org>
      Reported-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Link: https://lore.kernel.org/r/20211105154451.847288-1-vneethv@linux.ibm.com
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • video: backlight: Drop maximum brightness override for brightness zero · 7d0341b3
      Marek Vasut authored
      
      commit 33a5471f upstream.
      
      The note in c2adda27 ("video: backlight: Add of_find_backlight helper
      in backlight.c") says that gpio-backlight uses brightness as power state.
      This has been fixed since in ec665b75 ("backlight: gpio-backlight:
      Correct initial power state handling") and other backlight drivers do not
      require this workaround. Drop the workaround.
      
      This fixes the case where e.g. pwm-backlight can perfectly well be set to
      brightness 0 on boot in DT, which without this patch leads to the display
      brightness being set to max instead of off.
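
The behavioral difference can be shown with a tiny model. `struct model_backlight` and both functions are hypothetical and only mirror the description above: the old helper treated an initial brightness of zero as "unset" and substituted the maximum, the new behavior takes the configured value as is.

```c
/* Hypothetical model of the two properties involved. */
struct model_backlight {
    int brightness;     /* initial brightness from the configuration */
    int max_brightness; /* highest supported brightness level */
};

/* Behavior with the workaround: zero is replaced by the maximum. */
static int initial_brightness_with_override(const struct model_backlight *bl)
{
    return bl->brightness ? bl->brightness : bl->max_brightness;
}

/* Behavior after the workaround is dropped: zero stays zero (display off). */
static int initial_brightness_without_override(const struct model_backlight *bl)
{
    return bl->brightness;
}
```

For a device configured with brightness 0 and a maximum of 255, the first function yields 255 and the second yields 0, which is the case the patch fixes.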
      
      Fixes: c2adda27 ("video: backlight: Add of_find_backlight helper in backlight.c")
      Cc: <stable@vger.kernel.org> # 5.4+
      Cc: <stable@vger.kernel.org> # 4.19.x: ec665b75: backlight: gpio-backlight: Correct initial power state handling
      Signed-off-by: Marek Vasut <marex@denx.de>
      Acked-by: Noralf Trønnes <noralf@tronnes.org>
      Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mfd: dln2: Add cell for initializing DLN2 ADC · 332306b1
      Jack Andersen authored
      commit 313c84b5 upstream.
      
      This patch extends the DLN2 driver by adding a cell for the adc_dln2 module.
      
      The original patch[1] fell through the cracks when the driver was added
      so ADC has never actually been usable. That patch did not have ACPI
      support which was added in v5.9, so the oldest supported version this
      current patch can be backported to is 5.10.
      
      [1] https://www.spinics.net/lists/linux-iio/msg33975.html

      Cc: <stable@vger.kernel.org> # 5.10+
      Signed-off-by: Jack Andersen <jackoalan@gmail.com>
      Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Link: https://lore.kernel.org/r/20211018112541.25466-1-noralf@tronnes.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, oom: do not trigger out_of_memory from the #PF · 1d457987
      Michal Hocko authored
      commit 60e2793d upstream.
      
      Any allocation failure during the #PF path will return with VM_FAULT_OOM
      which in turn results in pagefault_out_of_memory.  This can happen for 2
      different reasons.  a) Memcg is out of memory and we rely on
      mem_cgroup_oom_synchronize to perform the memcg OOM handling or b)
      normal allocation fails.
      
      The latter is quite problematic because allocation paths already trigger
      out_of_memory and the page allocator tries really hard to not fail
      allocations.  Anyway, if the OOM killer has been already invoked there
      is no reason to invoke it again from the #PF path.  Especially when the
      OOM condition might be gone by that time and we have no way to find out
      other than allocate.
      
      Moreover if the allocation failed and the OOM killer hasn't been invoked
      then we are unlikely to do the right thing from the #PF context because
      we have already lost the allocation context and restrictions and
      therefore might oom kill a task from a different NUMA domain.
      
      This all suggests that there is no legitimate reason to trigger
      out_of_memory from pagefault_out_of_memory so drop it.  Just to be sure
      that no #PF path returns with VM_FAULT_OOM without allocation print a
      warning that this is happening before we restart the #PF.
      
      [VvS: #PF allocation can hit into limit of cgroup v1 kmem controller.
      This is a local problem related to memcg, however, it causes unnecessary
      global OOM kills that are repeated over and over again and escalate into a
      real disaster.  This has been broken since kmem accounting has been
      introduced for cgroup v1 (3.8).  There was no kmem specific reclaim for
      the separate limit so the only way to handle kmem hard limit was to return
      with ENOMEM.  In upstream the problem will be fixed by removing the
      outdated kmem limit, however stable and LTS kernels cannot do it and are
      still affected.  This patch fixes the problem and should be backported
      into stable/LTS.]
      
      Link: https://lkml.kernel.org/r/f5fd8dd8-0ad4-c524-5f65-920b01972a42@virtuozzo.com
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks · ac7f6bef
      Vasily Averin authored
      commit 0b28179a upstream.
      
      Patch series "memcg: prohibit unconditional exceeding the limit of dying tasks", v3.
      
      Memory cgroup charging allows killed or exiting tasks to exceed the hard
      limit.  This can be misused to trigger a global OOM from inside
      a memcg-limited container.  On the other hand, if a memcg allocation
      fails inside the #PF handler, it triggers a global OOM from inside
      pagefault_out_of_memory().
      
      To prevent these problems this patchset:
       (a) removes execution of out_of_memory() from
           pagefault_out_of_memory(), because nobody can explain why it is
           necessary.
       (b) allows memcg to fail allocation of dying/killed tasks.
      
      This patch (of 3):
      
      Any allocation failure during the #PF path will return with VM_FAULT_OOM
      which in turn results in pagefault_out_of_memory which in turn executes
      out_of_memory() and can kill a random task.
      
      An allocation might fail when the current task is the oom victim and
      there are no memory reserves left.  The OOM killer is already handled at
      the page allocator level for the global OOM and at the charging level
      for the memcg one.  Both have much more information about the scope of
      allocation/charge request.  This means that either the OOM killer has
      been invoked properly and didn't lead to the allocation success or it
      has been skipped because it couldn't have been invoked.  In both cases
      triggering it from here is pointless and even harmful.
      
      It makes much more sense to let the killed task die rather than to wake
      up an eternally hungry oom-killer and send him to choose a fatter victim
      for breakfast.
      
      Link: https://lkml.kernel.org/r/0828a149-786e-7c06-b70a-52d086818ea3@virtuozzo.com
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Suggested-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC · 1ada8699
      Naveen N. Rao authored
      
      upstream commit b7540d62
      
      Emit similar instruction sequences to commit a048a07d
      ("powerpc/64s: Add support for a store forwarding barrier at kernel
      entry/exit") when encountering BPF_NOSPEC.
      
      Mitigations are enabled depending on what the firmware advertises. In
      particular, we do not gate these mitigations based on current settings,
      just like in x86. Due to this, we don't need to take any action if
      mitigations are enabled or disabled at runtime.
      
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/956570cbc191cd41f8274bed48ee757a86dac62a.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
      [adjust macros to account for commits 1c9debbc and ef909ba9.
      adjust security feature checks to account for commit 84ed26fd]
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/lib: Add helper to check if offset is within conditional branch range · 51cf71d5
      Naveen N. Rao authored
      
      upstream commit 4549c3ea
      
      Add a helper to check if a given offset is within the branch range for a
      powerpc conditional branch instruction, and update some sites to use the
      new helper.
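
A range check of this kind can be sketched as below. The function name is hypothetical and this is not claimed to be the helper the patch adds; it assumes the usual encoding of a powerpc conditional branch, where the displacement is a signed 16-bit byte offset whose two low bits are always zero because instructions are word aligned.

```c
#include <stdbool.h>

/*
 * Return true if 'offset' (in bytes, relative to the branch) can be encoded
 * in a conditional branch: it must fit in a signed 16-bit range and be a
 * multiple of 4.
 */
static bool offset_in_cond_branch_range(long offset)
{
    return offset >= -0x8000 && offset <= 0x7fff && !(offset & 0x3);
}
```

A code generator would call such a check before emitting a conditional branch and fall back to a longer sequence (for example an inverted condition around an unconditional branch) when it returns false.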
      
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/442b69a34ced32ca346a0d9a855f3f6cfdbbbd41.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • memcg: prohibit unconditional exceeding the limit of dying tasks · 74293225
      Vasily Averin authored
      commit a4ebf1b6 upstream.
      
      Memory cgroup charging allows killed or exiting tasks to exceed the hard
      limit.  It is assumed that the amount of the memory charged by those
      tasks is bound and most of the memory will get released while the task
      is exiting.  This resembles a heuristic for the global OOM situation
      when tasks get access to memory reserves.  There is no global memory
      shortage at the memcg level so the memcg heuristic is more relaxed.
      
      The above assumption is overly optimistic though.  E.g.  vmalloc can
      scale to really large requests and the heuristic would allow that.  We
      used to have an early break in the vmalloc allocator for killed tasks
      but this has been reverted by commit b8c8a338 ("Revert "vmalloc:
      back off when the current task is killed"").  There are likely other
      similar code paths which do not check for fatal signals in an
      allocation&charge loop.  Also there are some kernel objects charged to a
      memcg which are not bound to a process life time.
      
      It has been observed that it is not really hard to trigger these
      bypasses and cause a global OOM situation.
      
      One potential way to address these runaways would be to limit the amount
      of excess (similar to the global OOM with limited oom reserves).  This
      is certainly possible but it is not really clear how much of an excess
      is desirable and still protects from global OOMs as that would have to
      consider the overall memcg configuration.
      
      This patch is addressing the problem by removing the heuristic
      altogether.  Bypass is only allowed for requests which either cannot
      fail or where the failure is not desirable while excess should be still
      limited (e.g.  atomic requests).  Implementation wise a killed or dying
      task fails to charge if it has passed the OOM killer stage.  That should
      give all forms of reclaim chance to restore the limit before the failure
      (ENOMEM) and tell the caller to back off.
      
      In addition, this patch renames should_force_charge() helper to
      task_is_dying() because now its use is not associated with forced
      charging.
      
      This patch depends on pagefault_out_of_memory() to not trigger
      out_of_memory(), because then a memcg failure can unwind to VM_FAULT_OOM
      and cause a global OOM killer.
      
      Link: https://lkml.kernel.org/r/8f5cebbb-06da-4902-91f0-6566fc4b4203@virtuozzo.com
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Suggested-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net, neigh: Enable state migration between NUD_PERMANENT and NTF_USE · a8cdf34f
      Daniel Borkmann authored
      
      [ Upstream commit 3dc20f47 ]
      
      Currently, it is not possible to migrate a neighbor entry between NUD_PERMANENT
      state and NTF_USE flag with a dynamic NUD state from a user space control plane.
      Similarly, it is not possible to add/remove NTF_EXT_LEARNED flag from an existing
      neighbor entry in combination with NTF_USE flag.
      
      This is because the NTF_USE path calls directly into neigh_event_send() without
      any of the metadata updates that happen in __neigh_update(). Thus, to enable this
      use case, extend __neigh_update() with a NEIGH_UPDATE_F_USE flag where we break
      the NUD_PERMANENT state in particular so that a later neigh_event_send() is able
      to re-resolve a neighbor entry.
      
      Before fix, NUD_PERMANENT -> NUD_* & NTF_USE:
      
        # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a
        # ./ip/ip n
        192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT
        [...]
        # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
        # ./ip/ip n
        192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT
        [...]
      
      As can be seen, despite the admin-triggered replace, the entry remains in the
      NUD_PERMANENT state.
      
      After fix, NUD_PERMANENT -> NUD_* & NTF_USE:
      
        # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a
        # ./ip/ip n
        192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT
        [...]
        # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
        # ./ip/ip n
        192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
        [...]
        # ./ip/ip n
        192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn STALE
        [...]
        # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a
        # ./ip/ip n
        192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT
        [...]
      
      After the fix, the admin-triggered replace switches to a dynamic state from
      the NTF_USE flag which triggered a new neighbor resolution. Likewise, we can
      transition back from there, if needed, into NUD_PERMANENT.
      
      Similar before/after behavior can be observed for below transitions:
      
      Before fix, NTF_USE -> NTF_USE | NTF_EXT_LEARNED -> NTF_USE:
      
        # ./ip/ip n replace 192.168.178.30 dev enp5s0 use
        # ./ip/ip n
        192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
        [...]
        # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
        # ./ip/ip n
        192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
        [...]
      
      After fix, NTF_USE -> NTF_USE | NTF_EXT_LEARNED -> NTF_USE:
      
        # ./ip/ip n replace 192.168.178.30 dev enp5s0 use
        # ./ip/ip n
        192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
        [...]
        # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
        # ./ip/ip n
        192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
        [...]
        # ./ip/ip n replace 192.168.178.30 dev enp5s0 use
        # ./ip/ip n
        192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
        [...]
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Roopa Prabhu <roopa@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a8cdf34f
    • Jaegeuk Kim's avatar
      f2fs: should use GFP_NOFS for directory inodes · 0bf5c6a1
      Jaegeuk Kim authored
      
      commit 92d602bc upstream.
      
      We use inline_dentry, which requires allocating a dentry page when adding a link.
      If we allow memory reclaim from the filesystem, we do down_read(&sbi->cp_rwsem)
      twice via f2fs_lock_op(). I think this should be okay, but how about stopping
      the lockdep complaint [1]?
      
      f2fs_create()
       - f2fs_lock_op()
       - f2fs_do_add_link()
        - __f2fs_find_entry
         - f2fs_get_read_data_page()
         -> kswapd
          - shrink_node
           - f2fs_evict_inode
            - f2fs_lock_op()
      
      [1]
      
      (fs_reclaim){+.+.}-{0:0}:
      kswapd0:        lock_acquire+0x114/0x394
      kswapd0:        __fs_reclaim_acquire+0x40/0x50
      kswapd0:        prepare_alloc_pages+0x94/0x1ec
      kswapd0:        __alloc_pages_nodemask+0x78/0x1b0
      kswapd0:        pagecache_get_page+0x2e0/0x57c
      kswapd0:        f2fs_get_read_data_page+0xc0/0x394
      kswapd0:        f2fs_find_data_page+0xa4/0x23c
      kswapd0:        find_in_level+0x1a8/0x36c
      kswapd0:        __f2fs_find_entry+0x70/0x100
      kswapd0:        f2fs_do_add_link+0x84/0x1ec
      kswapd0:        f2fs_mkdir+0xe4/0x1e4
      kswapd0:        vfs_mkdir+0x110/0x1c0
      kswapd0:        do_mkdirat+0xa4/0x160
      kswapd0:        __arm64_sys_mkdirat+0x24/0x34
      kswapd0:        el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8
      kswapd0:        do_el0_svc+0x28/0xa0
      kswapd0:        el0_svc+0x24/0x38
      kswapd0:        el0_sync_handler+0x88/0xec
      kswapd0:        el0_sync+0x1c0/0x200
      kswapd0:
      -> #1 (&sbi->cp_rwsem){++++}-{3:3}:
      kswapd0:        lock_acquire+0x114/0x394
      kswapd0:        down_read+0x7c/0x98
      kswapd0:        f2fs_do_truncate_blocks+0x78/0x3dc
      kswapd0:        f2fs_truncate+0xc8/0x128
      kswapd0:        f2fs_evict_inode+0x2b8/0x8b8
      kswapd0:        evict+0xd4/0x2f8
      kswapd0:        iput+0x1c0/0x258
      kswapd0:        do_unlinkat+0x170/0x2a0
      kswapd0:        __arm64_sys_unlinkat+0x4c/0x68
      kswapd0:        el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8
      kswapd0:        do_el0_svc+0x28/0xa0
      kswapd0:        el0_svc+0x24/0x38
      kswapd0:        el0_sync_handler+0x88/0xec
      kswapd0:        el0_sync+0x1c0/0x200
      
      Cc: stable@vger.kernel.org
      Fixes: bdbc90fa ("f2fs: don't put dentry page in pagecache into highmem")
      Reviewed-by: Chao Yu <chao@kernel.org>
      Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
      Reviewed-by: Light Hsieh <light.hsieh@mediatek.com>
      Tested-by: Light Hsieh <light.hsieh@mediatek.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bf5c6a1
    • Guo Ren's avatar
      irqchip/sifive-plic: Fixup EOI failed when masked · 7930892c
      Guo Ren authored
      commit 69ea4630 upstream.
      
      When using "devm_request_threaded_irq(,,,,IRQF_ONESHOT,,)" in a driver,
      only the first interrupt is handled, and following interrupts are never
      delivered (initially reported in [1]).
      
      That's because the RISC-V PLIC cannot EOI masked interrupts, as explained
      in the description of Interrupt Completion in the PLIC spec [2]:
      
      <quote>
      The PLIC signals it has completed executing an interrupt handler by
      writing the interrupt ID it received from the claim to the claim/complete
      register. The PLIC does not check whether the completion ID is the same
      as the last claim ID for that target. If the completion ID does not match
      an interrupt source that *is currently enabled* for the target, the
      completion is silently ignored.
      </quote>
      
      Re-enable the interrupt before completion if it has been masked during
      the handling, and remask it afterwards.
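
      The behaviour quoted from the spec, and the workaround described above, can be
      sketched with a small user-space model. This is only an illustration under
      stated assumptions: struct plic_source, plic_complete() and plic_eoi() are
      hypothetical names for this sketch and are not the driver's actual code.

```c
#include <assert.h>   /* only needed by callers that assert on the model */
#include <stdbool.h>

/* Hypothetical model of one PLIC interrupt source as seen by one target. */
struct plic_source {
	bool enabled;    /* enable bit for this source on this target */
	bool in_service; /* claimed, completion still pending */
};

/* Completion write: silently ignored if the source is not enabled. */
void plic_complete(struct plic_source *src)
{
	if (src->enabled)
		src->in_service = false;
}

/* EOI that re-enables a masked source around the completion write. */
void plic_eoi(struct plic_source *src)
{
	bool was_masked = !src->enabled;

	if (was_masked)
		src->enabled = true;   /* re-enable so the completion is seen */
	plic_complete(src);
	if (was_masked)
		src->enabled = false;  /* restore the mask afterwards */
}
```

      In this model, calling plic_complete() on a masked source leaves in_service
      set, which corresponds to the "only the first interrupt is handled" symptom,
      while plic_eoi() clears it and leaves the source masked.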
      
      [1] http://lists.infradead.org/pipermail/linux-riscv/2021-July/007441.html
      [2] https://github.com/riscv/riscv-plic-spec/blob/8bc15a35d07c9edf7b5d23fec9728302595ffc4d/riscv-plic.adoc
      
      
      
      Fixes: bb0fed1c ("irqchip/sifive-plic: Switch to fasteoi flow")
      Reported-by: Vincent Pelletier <plr.vincent@gmail.com>
      Tested-by: Nikita Shubin <nikita.shubin@maquefel.me>
      Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Atish Patra <atish.patra@wdc.com>
      Reviewed-by: Anup Patel <anup@brainfault.org>
      [maz: amended commit message]
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211105094748.3894453-1-guoren@kernel.org
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7930892c
    • Michael Pratt's avatar
      posix-cpu-timers: Clear task::posix_cputimers_work in copy_process() · f67f6eb7
      Michael Pratt authored
      
      commit ca7752ca upstream.
      
      copy_process currently copies task_struct.posix_cputimers_work as-is. If a
      timer interrupt arrives while handling clone and before dup_task_struct
      completes then the child task will have:
      
      1. posix_cputimers_work.scheduled = true
      2. posix_cputimers_work.work queued.
      
      copy_process clears task_struct.task_works, so (2) will have no effect and
      posix_cpu_timers_work will never run (not to mention it doesn't make sense
      for two tasks to share a common linked list).
      
      Since posix_cpu_timers_work never runs, posix_cputimers_work.scheduled is
      never cleared. Since scheduled is set, future timer interrupts will skip
      scheduling work, with the ultimate result that the task will never receive
      timer expirations.
      
      Together, the complete flow is:
      
      1. Task 1 calls clone(), enters kernel.
      2. Timer interrupt fires, schedules task work on Task 1.
         2a. task_struct.posix_cputimers_work.scheduled = true
         2b. task_struct.posix_cputimers_work.work added to
             task_struct.task_works.
      3. dup_task_struct() copies Task 1 to Task 2.
      4. copy_process() clears task_struct.task_works for Task 2.
      5. Future timer interrupts on Task 2 see
         task_struct.posix_cputimers_work.scheduled = true and skip scheduling
         work.
      
      Fix this by explicitly clearing contents of task_struct.posix_cputimers_work
      in copy_process(). This was never meant to be shared or inherited across
      tasks in the first place.
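
      The flow above can be reproduced in a small user-space model. This is a
      sketch only: struct task_model, timer_tick() and copy_task() are hypothetical
      stand-ins for task_struct, the timer interrupt path and copy_process(), and
      the clear_timer_work parameter exists only to compare the behaviour with and
      without the fix.

```c
#include <assert.h>   /* only needed by callers that assert on the model */
#include <stdbool.h>
#include <stddef.h>

struct work_model {
	struct work_model *next;
};

struct task_model {
	struct work_model *task_works;      /* pending task work list */
	struct {
		struct work_model work;
		bool scheduled;
	} posix_cputimers_work;
};

/* Timer interrupt: queue the work unless it is already marked scheduled. */
bool timer_tick(struct task_model *t)
{
	if (t->posix_cputimers_work.scheduled)
		return false;                   /* skipped: assumed to be pending */
	t->posix_cputimers_work.scheduled = true;
	t->posix_cputimers_work.work.next = t->task_works;
	t->task_works = &t->posix_cputimers_work.work;
	return true;
}

/* Clone: copy the parent as-is, then clear task_works for the child. */
void copy_task(struct task_model *child, const struct task_model *parent,
	       bool clear_timer_work)
{
	*child = *parent;
	child->task_works = NULL;
	if (clear_timer_work) {
		/* The fix: do not inherit the parent's timer work state. */
		child->posix_cputimers_work.scheduled = false;
		child->posix_cputimers_work.work.next = NULL;
	}
}
```

      Without the clearing step, timer_tick() on the child returns false forever
      because scheduled was inherited as true while the work itself was dropped
      with task_works.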
      
      Fixes: 1fb497dd ("posix-cpu-timers: Provide mechanisms to defer timer handling to task_work")
      Reported-by: Rhys Hiltner <rhys@justin.tv>
      Signed-off-by: Michael Pratt <mpratt@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20211101210615.716522-1-mpratt@google.com
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f67f6eb7
    • Dave Jones's avatar
      x86/mce: Add errata workaround for Skylake SKX37 · 1372eb18
      Dave Jones authored
      
      commit e629fc14 upstream.
      
      Errata SKX37 is word-for-word identical to the other errata listed in
      this workaround.   I happened to notice this after investigating a CMCI
      storm on a Skylake host.  While I can't confirm this was the root cause,
      spurious corrected errors do sound like a likely suspect.
      
      Fixes: 2976908e ("x86/mce: Do not log spurious corrected mce errors")
      Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20211029205759.GA7385@codemonkey.org.uk
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1372eb18
    • Maciej W. Rozycki's avatar
      MIPS: Fix assembly error from MIPSr2 code used within MIPS_ISA_ARCH_LEVEL · 1ee5bc2b
      Maciej W. Rozycki authored
      
      commit a923a267 upstream.
      
      Fix assembly errors like:
      
      {standard input}: Assembler messages:
      {standard input}:287: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
      {standard input}:680: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
      {standard input}:1274: Error: opcode not supported on this processor: mips3 (mips3) `dins $12,$9,32,32'
      {standard input}:2175: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
      make[1]: *** [scripts/Makefile.build:277: mm/highmem.o] Error 1
      
      with code produced from `__cmpxchg64' for MIPS64r2 CPU configurations
      using CONFIG_32BIT and CONFIG_PHYS_ADDR_T_64BIT.
      
      This is due to MIPS_ISA_ARCH_LEVEL downgrading the assembly architecture
      to `r4000' i.e. MIPS III for MIPS64r2 configurations, while there is a
      block of code containing a DINS MIPS64r2 instruction conditionalized on
      MIPS_ISA_REV >= 2 within the scope of the downgrade.
      
      The assembly architecture override code pattern has been put there for
      LL/SC instructions, so that code compiles for configurations that select
      a processor to build for that does not support these instructions while
      still providing run-time support for processors that do, dynamically
      switched by non-constant `cpu_has_llsc'.  It went in with linux-mips.org
      commit aac8aa77 ("Enable a suitable ISA for the assembler around
      ll/sc so that code builds even for processors that don't support the
      instructions. Plus minor formatting fixes.") back in 2005.
      
      Fix the problem by wrapping these instructions along with the adjacent
      SYNC instructions only, following the practice established with commit
      cfd54de3 ("MIPS: Avoid move psuedo-instruction whilst using
      MIPS_ISA_LEVEL") and commit 378ed6f0 ("MIPS: Avoid using .set mips0
      to restore ISA").  Strictly speaking the SYNC instructions do not have
      to be wrapped as they are only used as a Loongson3 erratum workaround,
      so they will be enabled in the assembler by default, but do this so as
      to keep code consistent with other places.
      
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
      Fixes: c7e2d71d ("MIPS: Fix set_pte() for Netlogic XLR using cmpxchg64()")
      Cc: stable@vger.kernel.org # v5.1+
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ee5bc2b
    • Helge Deller's avatar
      parisc: Fix backtrace to always include init funtion names · fc42bbb7
      Helge Deller authored
      
      commit 279917e2 upstream.
      
      I noticed that sometimes at kernel startup the backtraces did not
      include the function names of init functions. Their addresses were not
      resolved to function names and instead only the address was printed.
      
      Debugging shows that the culprit is is_ksym_addr() which is called
      by the backtrace functions to check if an address belongs to a function in
      the kernel. The problem occurs only for CONFIG_KALLSYMS_ALL=y.
      
      When looking at is_ksym_addr() one can see that for CONFIG_KALLSYMS_ALL=y
      the function only tries to resolve the address via is_kernel() function,
      which checks like this:
              if (addr >= _stext && addr <= _end)
                      return 1;
      On parisc the init functions are located before _stext, so this check fails.
      Other platforms seem to have all functions (including init functions)
      behind _stext.
      
      The following patch moves the _stext symbol to the beginning of the
      kernel and thus includes the init section. This fixes the check and does
      not seem to have any negative side effects on where the kernel mapping
      happens in the map_pages() function in arch/parisc/mm/init.c.
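
      The effect of the range check on a layout where init code precedes _stext
      can be illustrated with a small model. This is a sketch with made-up
      addresses: struct layout_model and in_kernel_range() are hypothetical and
      only mirror the comparison quoted above.

```c
#include <assert.h>   /* only needed by callers that assert on the model */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical section layout; the addresses used with it are made up. */
struct layout_model {
	uintptr_t stext; /* value of the _stext symbol */
	uintptr_t end;   /* value of the _end symbol */
};

/* Same comparison as the check quoted in the commit message. */
bool in_kernel_range(const struct layout_model *l, uintptr_t addr)
{
	return addr >= l->stext && addr <= l->end;
}
```

      With an init function at 0x1800, a layout of { .stext = 0x2000, .end = 0x9000 }
      rejects the address, whereas moving _stext down to 0x1000 makes the same
      check accept it.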
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: stable@kernel.org # 5.4+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc42bbb7
    • Arnd Bergmann's avatar
      ARM: 9156/1: drop cc-option fallbacks for architecture selection · 241c74cc
      Arnd Bergmann authored
      commit 418ace99 upstream.
      
      Naresh and Antonio ran into a build failure with latest Debian
      armhf compilers, with lots of output like
      
       tmp/ccY3nOAs.s:2215: Error: selected processor does not support `cpsid i' in ARM mode
      
      As it turns out, $(cc-option) fails early here when the FPU is not
      selected before CPU architecture is selected, as the compiler
      option check runs before enabling -msoft-float, which causes
      a problem when testing a target architecture level without an FPU:
      
      cc1: error: '-mfloat-abi=hard': selected architecture lacks an FPU
      
      Passing e.g. -march=armv6k+fp in place of -march=armv6k would avoid this
      issue, but the fallback logic is already broken because all supported
      compilers (gcc-5 and higher) are much more recent than these options,
      and building with -march=armv5t as a fallback no longer works.
      
      The best way forward that I see is to just remove all the checks, which
      also has the nice side-effect of slightly improving the startup time for
      'make'.
      
      The -mtune=marvell-f option was apparently never supported by any mainline
      compiler, and the custom Codesourcery gcc build that did support it is
      now too old to build kernels, so just use -mtune=xscale unconditionally
      for those.
      
      This should be safe to apply on all stable kernels, and will be required
      in order to keep building them with gcc-11 and higher.
      
      Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996419
      
      
      
      Reported-by: Antonio Terceiro <antonio.terceiro@linaro.org>
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Reported-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Tested-by: Sebastian Reichel <sebastian.reichel@collabora.com>
      Tested-by: Klaus Kudielka <klaus.kudielka@gmail.com>
      Cc: Matthias Klose <doko@debian.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      241c74cc
    • Michał Mirosław's avatar
      ARM: 9155/1: fix early early_iounmap() · 03f25781
      Michał Mirosław authored
      
      commit 0d08e7bf upstream.
      
      Currently __set_fixmap() bails out with a warning when called in early boot
      from early_iounmap(). Fix it, and while at it, make the comment a bit easier
      to understand.
      
      Cc: <stable@vger.kernel.org>
      Fixes: b089c31c ("ARM: 8667/3: Fix memory attribute inconsistencies when using fixmap")
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      03f25781
    • Willem de Bruijn's avatar
      selftests/net: udpgso_bench_rx: fix port argument · ee79560c
      Willem de Bruijn authored
      
      [ Upstream commit d336509c ]
      
      The below commit added optional support for passing a bind address.
      It configures the sockaddr bind arguments before parsing options and
      reconfigures on options -b and -4.
      
      This broke support for passing port (-p) on its own.
      
      Configure sockaddr after parsing all arguments.
      
      Fixes: 3327a9c4 ("selftests: add functionals test for UDP GRO")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ee79560c
    • Rahul Lakkireddy's avatar
      cxgb4: fix eeprom len when diagnostics not implemented · 8b215edb
      Rahul Lakkireddy authored
      
      [ Upstream commit 4ca110bf ]
      
      Ensure diagnostics monitoring support is implemented for the SFF 8472
      compliant port module and set the correct length for ethtool port
      module eeprom read.
      
      Fixes: f56ec676 ("cxgb4: Add support for ethtool i2c dump")
      Signed-off-by: Manoj Malviya <manojmalviya@chelsio.com>
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8b215edb
    • Dust Li's avatar
      net/smc: fix sk_refcnt underflow on linkdown and fallback · 93bc3ef6
      Dust Li authored
      
      [ Upstream commit e5d5aadc ]
      
      We got the following WARNING when running an ab/nginx
      test with RDMA link flapping (up-down-up).
      The reason is that when an smc_sock falls back and a linkdown
      happens simultaneously, we may get the following situation:
      
      __smc_lgr_terminate()
       --> smc_conn_kill()
          --> smc_close_active_abort()
                 smc_sock->sk_state = SMC_CLOSED
                 sock_put(smc_sock)
      
      smc_sock was set to SMC_CLOSED and sock_put() was called
      when terminating the link group. But later the application calls
      close() on the socket, and then we get:
      
      __smc_release():
          if (smc_sock->fallback)
              smc_sock->sk_state = SMC_CLOSED
              sock_put(smc_sock)
      
      Again we set the smc_sock to CLOSED though it's already
      in CLOSED state, and put the refcnt a second time, so the following
      warning happens:
      
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 5 PID: 860 at lib/refcount.c:28 refcount_warn_saturate+0x8d/0xf0
      Modules linked in:
      CPU: 5 PID: 860 Comm: nginx Not tainted 5.10.46+ #403
      Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
      RIP: 0010:refcount_warn_saturate+0x8d/0xf0
      Code: 05 5c 1e b5 01 01 e8 52 25 bc ff 0f 0b c3 80 3d 4f 1e b5 01 00 75 ad 48
      
      RSP: 0018:ffffc90000527e50 EFLAGS: 00010286
      RAX: 0000000000000026 RBX: ffff8881300df2c0 RCX: 0000000000000027
      RDX: 0000000000000000 RSI: ffff88813bd58040 RDI: ffff88813bd58048
      RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000001
      R10: ffff8881300df2c0 R11: ffffc90000527c78 R12: ffff8881300df340
      R13: ffff8881300df930 R14: ffff88810b3dad80 R15: ffff8881300df4f8
      FS:  00007f739de8fb80(0000) GS:ffff88813bd40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000a01b008 CR3: 0000000111b64003 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       smc_release+0x353/0x3f0
       __sock_release+0x3d/0xb0
       sock_close+0x11/0x20
       __fput+0x93/0x230
       task_work_run+0x65/0xa0
       exit_to_user_mode_prepare+0xf9/0x100
       syscall_exit_to_user_mode+0x27/0x190
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This patch adds a check in __smc_release() to make
      sure we won't do an extra sock_put() and set the
      socket to CLOSED when it's already in CLOSED state.
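
      The double put and the state check that prevents it can be sketched with a
      user-space model. This is an illustration only, not the actual patch:
      struct smc_sock_model, link_abort() and release_fallback() are hypothetical
      names, and the check_state parameter exists only to compare both behaviours.

```c
#include <assert.h>   /* only needed by callers that assert on the model */
#include <stdbool.h>

enum smc_state_model { MODEL_ACTIVE, MODEL_CLOSED };

struct smc_sock_model {
	enum smc_state_model state;
	int refcnt;
	bool fallback;
};

/* Link group termination: close the socket and drop one reference. */
void link_abort(struct smc_sock_model *sk)
{
	sk->state = MODEL_CLOSED;
	sk->refcnt--;
}

/* Release path for a fallback socket, optionally guarded by a state check. */
void release_fallback(struct smc_sock_model *sk, bool check_state)
{
	if (!sk->fallback)
		return;
	if (check_state && sk->state == MODEL_CLOSED)
		return;          /* already closed: the reference is gone */
	sk->state = MODEL_CLOSED;
	sk->refcnt--;
}
```

      Starting from refcnt = 1, link_abort() followed by an unguarded
      release_fallback() drives the count to -1, which is the underflow reported
      in the warning; with the state check the count stays at 0.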
      
      Fixes: 51f1de79 (net/smc: replace sock_put worker by socket refcounting)
      Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      93bc3ef6
    • Eiichi Tsukata's avatar
      vsock: prevent unnecessary refcnt inc for nonblocking connect · 7e03b797
      Eiichi Tsukata authored
      
      [ Upstream commit c7cd82b9 ]
      
      Currently vsock_connect() increments the sock refcount for a nonblocking
      socket each time it's called, which can lead to a memory leak if
      it's called multiple times, because the connect timeout function decrements
      the sock refcount only once.
      
      Fix it by making vsock_connect() return -EALREADY immediately when
      the sock state is already SS_CONNECTING.
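
      The leak and the early return can be illustrated with a user-space model.
      This is a simplified sketch, not the real function: struct vsock_model,
      connect_nonblocking() and connect_timeout() are hypothetical names, and the
      return_early parameter exists only to compare both behaviours.

```c
#include <assert.h>   /* only needed by callers that assert on the model */
#include <errno.h>
#include <stdbool.h>

struct vsock_model {
	bool connecting; /* stands in for the SS_CONNECTING state */
	int refcnt;
};

/* Nonblocking connect: takes one reference for the timeout work. */
int connect_nonblocking(struct vsock_model *s, bool return_early)
{
	int err = -EINPROGRESS;

	if (s->connecting) {
		err = -EALREADY;
		if (return_early)
			return err;  /* the fix: no second reference */
	}
	s->connecting = true;
	s->refcnt++;                 /* held until the connect timeout runs */
	return err;
}

/* Connect timeout: drops the reference exactly once. */
void connect_timeout(struct vsock_model *s)
{
	s->connecting = false;
	s->refcnt--;
}
```

      Calling connect_nonblocking() three times without the early return takes
      three references while connect_timeout() gives back only one, so two are
      leaked; with the early return the count goes back to its starting value.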
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7e03b797
    • Vladimir Oltean's avatar
      net: stmmac: allow a tc-taprio base-time of zero · ad3d219e
      Vladimir Oltean authored
      
      [ Upstream commit f64ab8e4 ]
      
      Commit fe28c53e ("net: stmmac: fix taprio configuration when
      base_time is in the past") allowed some base time values in the past,
      but apparently not all: the base-time value of 0 (Jan 1st 1970) is still
      explicitly denied by the driver.
      
      Remove the bogus check.
      
      Fixes: b60189e0 ("net: stmmac: Integrate EST with TAPRIO scheduler API")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ad3d219e
    • Guangbin Huang's avatar
      net: hns3: allow configure ETS bandwidth of all TCs · b30459c0
      Guangbin Huang authored
      
      [ Upstream commit 688db0c7 ]
      
      Currently, the driver only allows configuring the ETS bandwidth of TCs according
      to the max TC number queried from firmware. However, the hardware actually
      supports 8 TCs and users may need to configure the ETS bandwidth of all TCs,
      so remove the restriction.
      
      Fixes: 330baff5 ("net: hns3: add ETS TC weight setting in SSU module")
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b30459c0
    • Yufeng Mo's avatar
      net: hns3: fix kernel crash when unload VF while it is being reset · ee11f16f
      Yufeng Mo authored
      
      [ Upstream commit e140c798 ]
      
      When VLANs are fully configured for a VF, unloading the VF while
      triggering a PF reset causes a kernel crash because the
      irq is already uninitialized.
      
      [ 293.177579] ------------[ cut here ]------------
      [ 293.183502] kernel BUG at drivers/pci/msi.c:352!
      [ 293.189547] Internal error: Oops - BUG: 0 [#1] SMP
      ......
      [ 293.390124] Workqueue: hclgevf hclgevf_service_task [hclgevf]
      [ 293.402627] pstate: 80c00009 (Nzcv daif +PAN +UAO)
      [ 293.414324] pc : free_msi_irqs+0x19c/0x1b8
      [ 293.425429] lr : free_msi_irqs+0x18c/0x1b8
      [ 293.436545] sp : ffff00002716fbb0
      [ 293.446950] x29: ffff00002716fbb0 x28: 0000000000000000
      [ 293.459519] x27: 0000000000000000 x26: ffff45b91ea16b00
      [ 293.472183] x25: 0000000000000000 x24: ffffa587b08f4700
      [ 293.484717] x23: ffffc591ac30e000 x22: ffffa587b08f8428
      [ 293.497190] x21: ffffc591ac30e300 x20: 0000000000000000
      [ 293.509594] x19: ffffa58a062a8300 x18: 0000000000000000
      [ 293.521949] x17: 0000000000000000 x16: ffff45b91dcc3f48
      [ 293.534013] x15: 0000000000000000 x14: 0000000000000000
      [ 293.545883] x13: 0000000000000040 x12: 0000000000000228
      [ 293.557508] x11: 0000000000000020 x10: 0000000000000040
      [ 293.568889] x9 : ffff45b91ea1e190 x8 : ffffc591802d0000
      [ 293.580123] x7 : ffffc591802d0148 x6 : 0000000000000120
      [ 293.591190] x5 : ffffc591802d0000 x4 : 0000000000000000
      [ 293.602015] x3 : 0000000000000000 x2 : 0000000000000000
      [ 293.612624] x1 : 00000000000004a4 x0 : ffffa58a1e0c6b80
      [ 293.623028] Call trace:
      [ 293.630340] free_msi_irqs+0x19c/0x1b8
      [ 293.638849] pci_disable_msix+0x118/0x140
      [ 293.647452] pci_free_irq_vectors+0x20/0x38
      [ 293.656081] hclgevf_uninit_msi+0x44/0x58 [hclgevf]
      [ 293.665309] hclgevf_reset_rebuild+0x1ac/0x2e0 [hclgevf]
      [ 293.674866] hclgevf_reset+0x358/0x400 [hclgevf]
      [ 293.683545] hclgevf_reset_service_task+0xd0/0x1b0 [hclgevf]
      [ 293.693325] hclgevf_service_task+0x4c/0x2e8 [hclgevf]
      [ 293.702307] process_one_work+0x1b0/0x448
      [ 293.710034] worker_thread+0x54/0x468
      [ 293.717331] kthread+0x134/0x138
      [ 293.724114] ret_from_fork+0x10/0x18
      [ 293.731324] Code: f940b000 b4ffff00 a903e7b8 f90017b6 (d4210000)
      
      This patch fixes the problem by waiting for the VF reset to be done
      while unloading the VF.
      
      Fixes: e2cb1dec ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ee11f16f
    • Eric Dumazet's avatar
      net/sched: sch_taprio: fix undefined behavior in ktime_mono_to_any · 79aa8706
      Eric Dumazet authored
      
      [ Upstream commit 6dc25401 ]
      
      1) if q->tk_offset == TK_OFFS_MAX, then get_tcp_tstamp() calls
         ktime_mono_to_any() with an out-of-bounds value.
      
      2) if q->tk_offset is changed in taprio_parse_clockid(),
         taprio_get_time() might also call ktime_mono_to_any()
         with an out-of-bounds value, as syzbot found:
      
      UBSAN: array-index-out-of-bounds in kernel/time/timekeeping.c:908:27
      index 3 is out of range for type 'ktime_t *[3]'
      CPU: 1 PID: 25668 Comm: kworker/u4:0 Not tainted 5.15.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:151
       __ubsan_handle_out_of_bounds.cold+0x62/0x6c lib/ubsan.c:291
       ktime_mono_to_any+0x1d4/0x1e0 kernel/time/timekeeping.c:908
       get_tcp_tstamp net/sched/sch_taprio.c:322 [inline]
       get_packet_txtime net/sched/sch_taprio.c:353 [inline]
       taprio_enqueue_one+0x5b0/0x1460 net/sched/sch_taprio.c:420
       taprio_enqueue+0x3b1/0x730 net/sched/sch_taprio.c:485
       dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3785
       __dev_xmit_skb net/core/dev.c:3869 [inline]
       __dev_queue_xmit+0x1f6e/0x3630 net/core/dev.c:4194
       batadv_send_skb_packet+0x4a9/0x5f0 net/batman-adv/send.c:108
       batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:393 [inline]
       batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:421 [inline]
       batadv_iv_send_outstanding_bat_ogm_packet+0x6d7/0x8e0 net/batman-adv/bat_iv_ogm.c:1701
       process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
       worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
       kthread+0x405/0x4f0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
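      
      The out-of-bounds index reported above can be illustrated outside the
      kernel. What follows is a minimal userspace sketch, not the upstream
      fix: the enum, the offset table and mono_to_any_checked() are
      hypothetical stand-ins for TK_OFFS_MAX, the timekeeping offsets and
      ktime_mono_to_any(), and it assumes that the sentinel value should mean
      "no conversion".

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's clock offset enum.
 * OFFS_MAX is a sentinel, one past the last valid index, like TK_OFFS_MAX.
 */
enum clock_offs { OFFS_REAL, OFFS_BOOT, OFFS_TAI, OFFS_MAX };

/* Illustrative offsets in nanoseconds, one per valid enum value. */
static const int64_t offsets_ns[OFFS_MAX] = { 1000, 2000, 3000 };

/*
 * Convert a monotonic timestamp using the selected offset.
 * The index is copied into a local once, then validated, so the value
 * that is checked is the same value that indexes the array.
 */
static int64_t mono_to_any_checked(int64_t mono_ns, enum clock_offs offs)
{
	unsigned int idx = (unsigned int)offs;

	/* The sentinel (or anything larger) is not a valid array index. */
	if (idx >= (unsigned int)OFFS_MAX)
		return mono_ns;

	return mono_ns + offsets_ns[idx];
}
```

      With this guard, passing OFFS_MAX returns the monotonic value unchanged
      instead of reading past the end of the three-element table.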
      
      Fixes: 7ede7b03 ("taprio: make clock reference conversions easier")
      Fixes: 54002066 ("taprio: Adjust timestamps for TCP packets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Vedang Patel <vedang.patel@intel.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Link: https://lore.kernel.org/r/20211108180815.1822479-1-eric.dumazet@gmail.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Muchun Song's avatar
      seq_file: fix passing wrong private data · b5703462
      Muchun Song authored
      
      [ Upstream commit 10a6de19 ]
      
      DEFINE_PROC_SHOW_ATTRIBUTE() is supposed to be used to define a series
      of functions and variables to register a proc file easily. Users can
      call proc_create_data() to pass their own private data and get it via
      seq->private in the callback. Unfortunately, the proc file system
      uses PDE_DATA() to get private data instead of inode->i_private. So fix
      it. Fortunately, there is only one user of it, which does not pass any
      private data, so this bug does not break any in-tree code.
      
      Link: https://lkml.kernel.org/r/20211029032638.84884-1-songmuchun@bytedance.com
      
      
      Fixes: 97a32539 ("proc: convert everything to "struct proc_ops"")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Florent Revest <revest@chromium.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Dan Carpenter's avatar
      gve: Fix off by one in gve_tx_timeout() · 4af0cd17
      Dan Carpenter authored
      
      [ Upstream commit 1c360cc1 ]
      
      The priv->ntfy_blocks[] array has "priv->num_ntfy_blks" elements, so
      this > comparison needs to be >= to prevent an off-by-one bug.  The
      priv->ntfy_blocks[] array is allocated in gve_alloc_notify_blocks().
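      
      The difference between the two comparisons can be seen in a small
      standalone sketch. The helper names below are hypothetical and only
      contrast the two bounds checks; they are not the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Off-by-one variant: idx == num_blks is NOT rejected, although the
 * valid indices of an array with num_blks elements are 0..num_blks-1.
 */
static bool idx_rejected_buggy(size_t idx, size_t num_blks)
{
	return idx > num_blks;
}

/* Corrected variant: every index at or past the array length is rejected. */
static bool idx_rejected_fixed(size_t idx, size_t num_blks)
{
	return idx >= num_blks;
}
```

      For an array of 4 elements, index 4 passes the first check and would
      read one element past the end, while the second check rejects it.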
      
      Fixes: 87a7f321 ("gve: Recover from queue stall due to missed IRQ")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • John Fastabend's avatar
      bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding · c842a4c4
      John Fastabend authored
      
      [ Upstream commit e0dc3b93 ]
      
      Strparser is reusing the qdisc_skb_cb struct to stash the skb message handling
      progress, e.g. offset and length of the skb. First, this is poorly named and
      inherits a struct from qdisc that doesn't reflect the actual usage of cb[] at
      this layer.
      
      But, more importantly, strparser is using the following to access its metadata:
      
        (struct _strp_msg *)((void *)skb->cb + offsetof(struct qdisc_skb_cb, data))
      
      Where _strp_msg is defined as:
      
        struct _strp_msg {
              struct strp_msg            strp;                 /*     0     8 */
              int                        accum_len;            /*     8     4 */
      
              /* size: 12, cachelines: 1, members: 2 */
              /* last cacheline: 12 bytes */
        };
      
      So we use 12 bytes of ->data[] in the struct. However, in BPF code running
      parser and verdict programs, the user has read access to the data[] array as
      well. It's not too problematic, but we should not be exposing internal state to
      BPF programs. If it's really needed, then we can use the probe_read() APIs,
      which allow reading kernel memory. And I don't believe the cb[] layer poses any
      API breakage by moving this around, because programs can't depend on cb[]
      across layers.
      
      In order to fix another issue with a ctx rewrite, we need to stash a temp
      variable somewhere. To make this work cleanly, this patch builds a cb struct
      for sk_skb types called struct sk_skb_cb. Then we can use this consistently
      in the strparser and sockmap space. Additionally, we can start allowing ->cb[]
      write access after this.
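      
      The idea of a dedicated control-block struct can be sketched in
      userspace C. This is an illustration under stated assumptions, not the
      layout introduced by the patch: fake_skb, sk_skb_cb_sketch, the 20-byte
      private region, temp_reg and the cb_store()/cb_load() helpers are all
      hypothetical names. The only fact taken from the kernel is that
      skb->cb is a 48-byte scratch area.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for sk_buff: only the 48-byte control block is modelled. */
struct fake_skb {
	_Alignas(8) char cb[48];
};

/* Mirrors the 12-byte _strp_msg shown above: 8-byte strp plus accum_len. */
struct strp_msg_sketch {
	int full_len;
	int offset;
};

/* Hypothetical per-layer layout: program-visible bytes, then internal state. */
struct sk_skb_cb_sketch {
	unsigned char private_data[20];  /* assumed region exposed to programs */
	struct strp_msg_sketch strp;     /* internal parser progress */
	int accum_len;                   /* internal accumulated length */
	unsigned long long temp_reg;     /* assumed scratch slot for a rewrite */
};

/* Fail the build if the layout ever outgrows the control block. */
_Static_assert(sizeof(struct sk_skb_cb_sketch) <= sizeof(((struct fake_skb *)0)->cb),
	       "sk_skb_cb_sketch must fit in cb[]");

/* Copy-based accessors keep the sketch free of aliasing assumptions. */
static void cb_store(struct fake_skb *skb, const struct sk_skb_cb_sketch *v)
{
	memcpy(skb->cb, v, sizeof(*v));
}

static struct sk_skb_cb_sketch cb_load(const struct fake_skb *skb)
{
	struct sk_skb_cb_sketch v;

	memcpy(&v, skb->cb, sizeof(v));
	return v;
}
```

      Keeping the internal fields behind a named struct, with a compile-time
      size check, makes it explicit which bytes of cb[] belong to this layer
      and which could be exposed to programs.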
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Jussi Maki <joamaki@gmail.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20211103204736.248403-5-john.fastabend@gmail.com
      
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • John Fastabend's avatar
      bpf, sockmap: Remove unhash handler for BPF sockmap usage · 8b5c98a6
      John Fastabend authored
      
      [ Upstream commit b8b8315e ]
      
      We do not need to handle unhash from the BPF side; we can simply wait for
      the close to happen. The original concern was that a socket could transition
      from ESTABLISHED state to a new state while the BPF hook was still attached.
      But we convinced ourselves this is no longer possible, and we also improved
      BPF sockmap to handle listen sockets, so this is no longer a problem.
      
      More importantly, though, there are cases where unhash is called when data is
      in the receive queue. The BPF unhash logic will flush this data, which is
      wrong. To be correct, it should keep the data in the receive queue and allow
      a receiving application to continue reading the data. This may happen when
      tcp_abort() is received, for example. Instead of complicating the logic in
      unhash, simply moving all of this to the tcp_close() hook solves the problem.
      
      Fixes: 51199405 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Jussi Maki <joamaki@gmail.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20211103204736.248403-3-john.fastabend@gmail.com
      
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>