Skip to content
Snippets Groups Projects
  1. Dec 01, 2021
    • Alexander Mikhalitsyn's avatar
      shm: extend forced shm destroy to support objects from several IPC nses · a15261d2
      Alexander Mikhalitsyn authored
      commit 85b6d246 upstream.
      
      Currently, the exit_shm() function not designed to work properly when
      task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
      
      This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
      leads to use-after-free (reproducer exists).
      
      This is an attempt to fix the problem by extending exit_shm mechanism to
      handle shm's destroy from several IPC ns'es.
      
      To achieve that we do several things:
      
      1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
      
      2. during new shm object creation (newseg()/shmget syscall) we
         initialize this pointer by current task IPC ns
      
      3. exit_shm() fully reworked such that it traverses over all shp's in
         task->sysvshm.shm_clist and gets IPC namespace not from current task
         as it was before but from shp's object itself, then call
         shm_destroy(shp, ns).
      
      Note: We need to be really careful here, because as it was said before
      (1), our pointer to IPC ns non-refcnt'ed.  To be on the safe side we
      using special helper get_ipc_ns_not_zero() which allows to get IPC ns
      refcounter only if IPC ns not in the "state of destruction".
      
      Q/A
      
      Q: Why can we access shp->ns memory using non-refcounted pointer?
      A: Because shp object lifetime is always shorther than IPC namespace
         lifetime, so, if we get shp object from the task->sysvshm.shm_clist
         while holding task_lock(task) nobody can steal our namespace.
      
      Q: Does this patch change semantics of unshare/setns/clone syscalls?
      A: No. It's just fixes non-covered case when process may leave IPC
         namespace without getting task->sysvshm.shm_clist list cleaned up.
      
      Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.com
      Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
      
      
      Fixes: ab602f79 ("shm: make exit_shm work proportional to task activity")
      Co-developed-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Signed-off-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Signed-off-by: default avatarAlexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Andrei Vagin <avagin@gmail.com>
      Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Cc: Vasily Averin <vvs@virtuozzo.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a15261d2
  2. Nov 26, 2021
    • Alexander Mikhalitsyn's avatar
      ipc: WARN if trying to remove ipc object which is absent · 99032adf
      Alexander Mikhalitsyn authored
      commit 126e8bee upstream.
      
      Patch series "shm: shm_rmid_forced feature fixes".
      
      Some time ago I met kernel crash after CRIU restore procedure,
      fortunately, it was CRIU restore, so, I had dump files and could do
      restore many times and crash reproduced easily.  After some
      investigation I've constructed the minimal reproducer.  It was found
      that it's use-after-free and it happens only if sysctl
      kernel.shm_rmid_forced = 1.
      
      The key of the problem is that the exit_shm() function not handles shp's
      object destroy when task->sysvshm.shm_clist contains items from
      different IPC namespaces.  In most cases this list will contain only
      items from one IPC namespace.
      
      How can this list contain object from different namespaces? The
      exit_shm() function is designed to clean up this list always when
      process leaves IPC namespace.  But we made a mistake a long time ago and
      did not add a exit_shm() call into the setns() syscall procedures.
      
      The first idea was just to add this call to setns() syscall but it
      obviously changes semantics of setns() syscall and that's
      userspace-visible change.  So, I gave up on this idea.
      
      The first real attempt to address the issue was just to omit forced
      destroy if we meet shp object not from current task IPC namespace [1].
      But that was not the best idea because task->sysvshm.shm_clist was
      protected by rwsem which belongs to current task IPC namespace.  It
      means that list corruption may occur.
      
      Second approach is just extend exit_shm() to properly handle shp's from
      different IPC namespaces [2].  This is really non-trivial thing, I've
      put a lot of effort into that but not believed that it's possible to
      make it fully safe, clean and clear.
      
      Thanks to the efforts of Manfred Spraul working an elegant solution was
      designed.  Thanks a lot, Manfred!
      
      Eric also suggested the way to address the issue in ("[RFC][PATCH] shm:
      In shm_exit destroy all created and never attached segments") Eric's
      idea was to maintain a list of shm_clists one per IPC namespace, use
      lock-less lists.  But there is some extra memory consumption-related
      concerns.
      
      An alternative solution which was suggested by me was implemented in
      ("shm: reset shm_clist on setns but omit forced shm destroy").  The idea
      is pretty simple, we add exit_shm() syscall to setns() but DO NOT
      destroy shm segments even if sysctl kernel.shm_rmid_forced = 1, we just
      clean up the task->sysvshm.shm_clist list.
      
      This chages semantics of setns() syscall a little bit but in comparision
      to the "naive" solution when we just add exit_shm() without any special
      exclusions this looks like a safer option.
      
      [1] https://lkml.org/lkml/2021/7/6/1108
      [2] https://lkml.org/lkml/2021/7/14/736
      
      This patch (of 2):
      
      Let's produce a warning if we trying to remove non-existing IPC object
      from IPC namespace kht/idr structures.
      
      This allows us to catch possible bugs when the ipc_rmid() function was
      called with inconsistent struct ipc_ids*, struct kern_ipc_perm*
      arguments.
      
      Link: https://lkml.kernel.org/r/20211027224348.611025-1-alexander.mikhalitsyn@virtuozzo.com
      Link: https://lkml.kernel.org/r/20211027224348.611025-2-alexander.mikhalitsyn@virtuozzo.com
      
      
      Co-developed-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Signed-off-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Signed-off-by: default avatarAlexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Andrei Vagin <avagin@gmail.com>
      Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Cc: Vasily Averin <vvs@virtuozzo.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99032adf
  3. May 26, 2021
    • Varad Gautam's avatar
      ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry · 4528c0c3
      Varad Gautam authored
      commit a11ddb37 upstream.
      
      do_mq_timedreceive calls wq_sleep with a stack local address.  The
      sender (do_mq_timedsend) uses this address to later call pipelined_send.
      
      This leads to a very hard to trigger race where a do_mq_timedreceive
      call might return and leave do_mq_timedsend to rely on an invalid
      address, causing the following crash:
      
        RIP: 0010:wake_q_add_safe+0x13/0x60
        Call Trace:
         __x64_sys_mq_timedsend+0x2a9/0x490
         do_syscall_64+0x80/0x680
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f5928e40343
      
      The race occurs as:
      
      1. do_mq_timedreceive calls wq_sleep with the address of `struct
         ext_wait_queue` on function stack (aliased as `ewq_addr` here) - it
         holds a valid `struct ext_wait_queue *` as long as the stack has not
         been overwritten.
      
      2. `ewq_addr` gets added to info->e_wait_q[RECV].list in wq_add, and
         do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call
         __pipelined_op.
      
      3. Sender calls __pipelined_op::smp_store_release(&this->state,
         STATE_READY).  Here is where the race window begins.  (`this` is
         `ewq_addr`.)
      
      4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it
         will see `state == STATE_READY` and break.
      
      5. do_mq_timedreceive returns, and `ewq_addr` is no longer guaranteed
         to be a `struct ext_wait_queue *` since it was on do_mq_timedreceive's
         stack.  (Although the address may not get overwritten until another
         function happens to touch it, which means it can persist around for an
         indefinite time.)
      
      6. do_mq_timedsend::__pipelined_op() still believes `ewq_addr` is a
         `struct ext_wait_queue *`, and uses it to find a task_struct to pass to
         the wake_q_add_safe call.  In the lucky case where nothing has
         overwritten `ewq_addr` yet, `ewq_addr->task` is the right task_struct.
         In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a
         bogus address as the receiver's task_struct causing the crash.
      
      do_mq_timedsend::__pipelined_op() should not dereference `this` after
      setting STATE_READY, as the receiver counterpart is now free to return.
      Change __pipelined_op to call wake_q_add_safe on the receiver's
      task_struct returned by get_task_struct, instead of dereferencing `this`
      which sits on the receiver's stack.
      
      As Manfred pointed out, the race potentially also exists in
      ipc/msg.c::expunge_all and ipc/sem.c::wake_up_sem_queue_prepare.  Fix
      those in the same way.
      
      Link: https://lkml.kernel.org/r/20210510102950.12551-1-varad.gautam@suse.com
      
      
      Fixes: c5b2cbdb ("ipc/mqueue.c: update/document memory barriers")
      Fixes: 8116b54e ("ipc/sem.c: document and update memory barriers")
      Fixes: 0d97a82b ("ipc/msg.c: update and document memory barriers")
      Signed-off-by: default avatarVarad Gautam <varad.gautam@suse.com>
      Reported-by: default avatarMatthias von Faber <matthias.vonfaber@aox-tech.de>
      Acked-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Acked-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4528c0c3
  4. Sep 05, 2020
    • Tobias Klauser's avatar
      ipc: adjust proc_ipc_sem_dointvec definition to match prototype · fff1662c
      Tobias Klauser authored
      
      Commit 32927393 ("sysctl: pass kernel pointers to ->proc_handler")
      changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
      signature of proc_ipc_sem_dointvec to match ctl_table.proc_handler which
      fixes the following sparse error/warning:
      
        ipc/ipc_sysctl.c:94:47: warning: incorrect type in argument 3 (different address spaces)
        ipc/ipc_sysctl.c:94:47:    expected void *buffer
        ipc/ipc_sysctl.c:94:47:    got void [noderef] __user *buffer
        ipc/ipc_sysctl.c:194:35: warning: incorrect type in initializer (incompatible argument 3 (different address spaces))
        ipc/ipc_sysctl.c:194:35:    expected int ( [usertype] *proc_handler )( ... )
        ipc/ipc_sysctl.c:194:35:    got int ( * )( ... )
      
      Fixes: 32927393 ("sysctl: pass kernel pointers to ->proc_handler")
      Signed-off-by: default avatarTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv...
      fff1662c
  5. Aug 23, 2020
  6. Aug 12, 2020
  7. Aug 07, 2020
  8. Jun 09, 2020
  9. Jun 08, 2020
  10. May 14, 2020
    • Vasily Averin's avatar
      ipc/util.c: sysvipc_find_ipc() incorrectly updates position index · 5e698222
      Vasily Averin authored
      
      Commit 89163f93 ("ipc/util.c: sysvipc_find_ipc() should increase
      position index") is causing this bug (seen on 5.6.8):
      
         # ipcs -q
      
         ------ Message Queues --------
         key        msqid      owner      perms      used-bytes   messages
      
         # ipcmk -Q
         Message queue id: 0
         # ipcs -q
      
         ------ Message Queues --------
         key        msqid      owner      perms      used-bytes   messages
         0x82db8127 0          root       644        0            0
      
         # ipcmk -Q
         Message queue id: 1
         # ipcs -q
      
         ------ Message Queues --------
         key        msqid      owner      perms      used-bytes   messages
         0x82db8127 0          root       644        0            0
         0x76d1fb2a 1          root       644        0            0
      
         # ipcrm -q 0
         # ipcs -q
      
         ------ Message Queues --------
         key        msqid      owner      perms      used-bytes   messages
         0x76d1fb2a 1          root       644        0            0
         0x76d1fb2a 1          root       644        0            0
      
         # ipcmk -Q
         Message queue id: 2
         # ipcrm -q 2
         # ipcs -q
      
         ------ Message Queues --------
         key        msqid      owner      perms      used-bytes   messages
         0x76d1fb2a 1          root       644        0            0
         0x76d1fb2a 1          root       644        0            0
      
         # ipcmk -Q
         Message queue id: 3
         # ipcrm -q 1
         # ipcs -q
      
         ------ Message Queues --------
         key        msqid      owner      perms      used-bytes   messages
         0x7c982867 3          root       644        0            0
         0x7c982867 3          root       644        0            0
         0x7c982867 3          root       644        0            0
         0x7c982867 3          root       644        0            0
      
      Whenever an IPC item with a low id is deleted, the items with higher ids
      are duplicated, as if filling a hole.
      
      new_pos should jump through hole of unused ids, pos can be updated
      inside "for" cycle.
      
      Fixes: 89163f93 ("ipc/util.c: sysvipc_find_ipc() should increase position index")
      Reported-by: default avatarAndreas Schwab <schwab@suse.de>
      Reported-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarWaiman Long <longman@redhat.com>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/4921fe9b-9385-a2b4-1dc4-1099be6d2e39@virtuozzo.com
      
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5e698222
  11. May 09, 2020
    • Christian Brauner's avatar
      nsproxy: add struct nsset · f2a8d52e
      Christian Brauner authored
      
      Add a simple struct nsset. It holds all necessary pieces to switch to a new
      set of namespaces without leaving a task in a half-switched state which we
      will make use of in the next patch. This patch switches the existing setns
      logic over without causing a change in setns() behavior. This brings
      setns() closer to how unshare() works(). The prepare_ns() function is
      responsible to prepare all necessary information. This has two reasons.
      First it minimizes dependencies between individual namespaces, i.e. all
      install handler can expect that all fields are properly initialized
      independent in what order they are called in. Second, this makes the code
      easier to maintain and easier to follow if it needs to be changed.
      
      The prepare_ns() helper will only be switched over to use a flags argument
      in the next patch. Here it will still use nstype as a simple integer
      argument which was argued would be clearer. I'm not particularly
      opinionated about this if it really helps or not. The struct nsset itself
      already contains the flags field since its name already indicates that it
      can contain information required by different namespaces. None of this
      should have functional consequences.
      
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      Reviewed-by: default avatarSerge Hallyn <serge@hallyn.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Link: https://lore.kernel.org/r/20200505140432.181565-2-christian.brauner@ubuntu.com
      f2a8d52e
  12. May 08, 2020
  13. Apr 27, 2020
  14. Apr 10, 2020
  15. Apr 07, 2020
  16. Feb 21, 2020
    • Ioanna Alifieraki's avatar
      Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()" · edf28f40
      Ioanna Alifieraki authored
      This reverts commit a9795584.
      
      Commit a9795584 ("ipc,sem: remove uneeded sem_undo_list lock usage
      in exit_sem()") removes a lock that is needed.  This leads to a process
      looping infinitely in exit_sem() and can also lead to a crash.  There is
      a reproducer available in [1] and with the commit reverted the issue
      does not reproduce anymore.
      
      Using the reproducer found in [1] is fairly easy to reach a point where
      one of the child processes is looping infinitely in exit_sem between
      for(;;) and if (semid == -1) block, while it's trying to free its last
      sem_undo structure which has already been freed by freeary().
      
      Each sem_undo struct is on two lists: one per semaphore set (list_id)
      and one per process (list_proc).  The list_id list tracks undos by
      semaphore set, and the list_proc by process.
      
      Undo structures are removed either by freeary() or by exit_sem().  The
      freeary function is invoked when the user invokes a syscall to remove a
      semaphore set.  During this operation freeary() traverses the list_id
      associated with the semaphore set and removes the undo structures from
      both the list_id and list_proc lists.
      
      For this case, exit_sem() is called at process exit.  Each process
      contains a struct sem_undo_list (referred to as "ulp") which contains
      the head for the list_proc list.  When the process exits, exit_sem()
      traverses this list to remove each sem_undo struct.  As in freeary(),
      whenever a sem_undo struct is removed from list_proc, it is also removed
      from the list_id list.
      
      Removing elements from list_id is safe for both exit_sem() and freeary()
      due to sem_lock().  Removing elements from list_proc is not safe;
      freeary() locks &un->ulp->lock when it performs
      list_del_rcu(&un->list_proc) but exit_sem() does not (locking was
      removed by commit a9795584 ("ipc,sem: remove uneeded sem_undo_list
      lock usage in exit_sem()").
      
      This can result in the following situation while executing the
      reproducer [1] : Consider a child process in exit_sem() and the parent
      in freeary() (because of semctl(sid[i], NSEM, IPC_RMID)).
      
       - The list_proc for the child contains the last two undo structs A and
         B (the rest have been removed either by exit_sem() or freeary()).
      
       - The semid for A is 1 and semid for B is 2.
      
       - exit_sem() removes A and at the same time freeary() removes B.
      
       - Since A and B have different semid sem_lock() will acquire different
         locks for each process and both can proceed.
      
      The bug is that they remove A and B from the same list_proc at the same
      time because only freeary() acquires the ulp lock. When exit_sem()
      removes A it makes ulp->list_proc.next to point at B and at the same
      time freeary() removes B setting B->semid=-1.
      
      At the next iteration of for(;;) loop exit_sem() will try to remove B.
      
      The only way to break from for(;;) is for (&un->list_proc ==
      &ulp->list_proc) to be true which is not. Then exit_sem() will check if
      B->semid=-1 which is and will continue looping in for(;;) until the
      memory for B is reallocated and the value at B->semid is changed.
      
      At that point, exit_sem() will crash attempting to unlink B from the
      lists (this can be easily triggered by running the reproducer [1] a
      second time).
      
      To prove this scenario instrumentation was added to keep information
      about each sem_undo (un) struct that is removed per process and per
      semaphore set (sma).
      
                CPU0                                CPU1
        [caller holds sem_lock(sma for A)]      ...
        freeary()                               exit_sem()
        ...                                     ...
        ...                                     sem_lock(sma for B)
        spin_lock(A->ulp->lock)                 ...
        list_del_rcu(un_A->list_proc)           list_del_rcu(un_B->list_proc)
      
      Undo structures A and B have different semid and sem_lock() operations
      proceed.  However they belong to the same list_proc list and they are
      removed at the same time.  This results into ulp->list_proc.next
      pointing to the address of B which is already removed.
      
      After reverting commit a9795584 ("ipc,sem: remove uneeded
      sem_undo_list lock usage in exit_sem()") the issue was no longer
      reproducible.
      
      [1] https://bugzilla.redhat.com/show_bug.cgi?id=1694779
      
      Link: http://lkml.kernel.org/r/20191211191318.11860-1-ioanna-maria.alifieraki@canonical.com
      
      
      Fixes: a9795584 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()")
      Signed-off-by: default avatarIoanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
      Acked-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Acked-by: default avatarHerton R. Krzesinski <herton@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: <malat@debian.org>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      edf28f40
  17. Feb 04, 2020
  18. Dec 09, 2019
  19. Nov 15, 2019
  20. Sep 26, 2019
  21. Sep 07, 2019
    • Arnd Bergmann's avatar
      ipc: fix sparc64 ipc() wrapper · fb377eb8
      Arnd Bergmann authored
      
      Matt bisected a sparc64 specific issue with semctl, shmctl and msgctl
      to a commit from my y2038 series in linux-5.1, as I missed the custom
      sys_ipc() wrapper that sparc64 uses in place of the generic version that
      I patched.
      
      The problem is that the sys_{sem,shm,msg}ctl() functions in the kernel
      now do not allow being called with the IPC_64 flag any more, resulting
      in a -EINVAL error when they don't recognize the command.
      
      Instead, the correct way to do this now is to call the internal
      ksys_old_{sem,shm,msg}ctl() functions to select the API version.
      
      As we generally move towards these functions anyway, change all of
      sparc_ipc() to consistently use those in place of the sys_*() versions,
      and move the required ksys_*() declarations into linux/syscalls.h
      
      The IS_ENABLED(CONFIG_SYSVIPC) check is required to avoid link
      errors when ipc is disabled.
      
      Reported-by: default avatarMatt Turner <mattst88@gmail.com>
      Fixes: 275f2214 ("ipc: rename old-style shmctl/semctl/msgctl syscalls")
      Cc: stable@vger.kernel.org
      Tested-by: default avatarMatt Turner <mattst88@gmail.com>
      Tested-by: default avatarAnatoly Pugachev <matorola@gmail.com>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      fb377eb8
  22. Sep 05, 2019
  23. Jul 19, 2019
  24. Jul 17, 2019
    • Kees Cook's avatar
      ipc/mqueue.c: only perform resource calculation if user valid · a318f12e
      Kees Cook authored
      Andreas Christoforou reported:
      
        UBSAN: Undefined behaviour in ipc/mqueue.c:414:49 signed integer overflow:
        9 * 2305843009213693951 cannot be represented in type 'long int'
        ...
        Call Trace:
          mqueue_evict_inode+0x8e7/0xa10 ipc/mqueue.c:414
          evict+0x472/0x8c0 fs/inode.c:558
          iput_final fs/inode.c:1547 [inline]
          iput+0x51d/0x8c0 fs/inode.c:1573
          mqueue_get_inode+0x8eb/0x1070 ipc/mqueue.c:320
          mqueue_create_attr+0x198/0x440 ipc/mqueue.c:459
          vfs_mkobj+0x39e/0x580 fs/namei.c:2892
          prepare_open ipc/mqueue.c:731 [inline]
          do_mq_open+0x6da/0x8e0 ipc/mqueue.c:771
      
      Which could be triggered by:
      
              struct mq_attr attr = {
                      .mq_flags = 0,
                      .mq_maxmsg = 9,
                      .mq_msgsize = 0x1fffffffffffffff,
                      .mq_curmsgs = 0,
              };
      
              if (mq_open("/testing", 0x40, 3, &attr) == (mqd_t) -1)
                      perror("mq_open");
      
      mqueue_get_inode() was correctly rejecting the giant mq_msgsize, and
      preparing to return -EINVAL.  During the cleanup, it calls
      mqueue_evict_inode() which performed resource usage tracking math for
      updating "user", before checking if there was a valid "user" at all
      (which would indicate that the calculations would be sane).  Instead,
      delay this check to after seeing a valid "user".
      
      The overflow was real, but the results went unused, so while the flaw is
      harmless, it's noisy for kernel fuzzers, so just fix it by moving the
      calculation under the non-NULL "user" where it actually gets used.
      
      Link: http://lkml.kernel.org/r/201906072207.ECB65450@keescook
      
      
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Reported-by: default avatarAndreas Christoforou <andreaschristofo@gmail.com>
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a318f12e
  25. Jun 05, 2019
  26. May 25, 2019
  27. May 24, 2019
  28. May 15, 2019
    • Manfred Spraul's avatar
      ipc: do cyclic id allocation for the ipc object. · 99db46ea
      Manfred Spraul authored
      For ipcmni_extend mode, the sequence number space is only 7 bits.  So
      the chance of id reuse is relatively high compared with the non-extended
      mode.
      
      To alleviate this id reuse problem, this patch enables cyclic allocation
      for the index to the radix tree (idx).  The disadvantage is that this
      can cause a slight slow-down of the fast path, as the radix tree could
      be higher than necessary.
      
      To limit the radix tree height, I have chosen the following limits:
       1) The cycling is done over in_use*1.5.
       2) At least, the cycling is done over
         "normal" ipcnmi mode: RADIX_TREE_MAP_SIZE elements
         "ipcmni_extended": 4096 elements
      
      Result:
      - for normal mode:
      	No change for <= 42 active ipc elements. With more than 42
      	active ipc elements, a 2nd level would be added to the radix
      	tree.
      	Without cyclic allocation, a 2nd level would be added only with
      	more than 63 active elements.
      
      - for extended mode:
      	Cycling creates always at least a 2-level radix tree.
      	With more than 2730 active objects, a 3rd level would be
      	added, instead of > 4095 active objects until the 3rd level
      	is added without cyclic allocation.
      
      For a 2-level radix tree compared to a 1-level radix tree, I have
      observed < 1% performance impact.
      
      Notes:
      1) Normal "x=semget();y=semget();" is unaffected: Then the idx
        is e.g. a and a+1, regardless if idr_alloc() or idr_alloc_cyclic()
        is used.
      
      2) The -1% happens in a microbenchmark after this situation:
      	x=semget();
      	for(i=0;i<4000;i++) {t=semget();semctl(t,0,IPC_RMID);}
      	y=semget();
      	Now perform semget calls on x and y that do not sleep.
      
      3) The worst-case reuse cycle time is unfortunately unaffected:
         If you have 2^24-1 ipc objects allocated, and get/remove the last
         possible element in a loop, then the id is reused after 128
         get/remove pairs.
      
      Performance check:
      A microbenchmark that performes no-op semop() randomly on two IDs,
      with only these two IDs allocated.
      The IDs were set using /proc/sys/kernel/sem_next_id.
      The test was run 5 times, averages are shown.
      
      1 & 2: Base (6.22 seconds for 10.000.000 semops)
      1 & 40: -0.2%
      1 & 3348: - 0.8%
      1 & 27348: - 1.6%
      1 & 15777204: - 3.2%
      
      Or: ~12.6 cpu cycles per additional radix tree level.
      The cpu is an Intel I3-5010U. ~1300 cpu cycles/syscall is slower
      than what I remember (spectre impact?).
      
      V2 of the patch:
      - use "min" and "max"
      - use RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE instead of
      	(2<<12).
      
      [akpm@linux-foundation.org: fix max() warning]
      Link: http://lkml.kernel.org/r/20190329204930.21620-3-longman@redhat.com
      
      
      Signed-off-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Acked-by: default avatarWaiman Long <longman@redhat.com>
      Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      99db46ea
    • Manfred Spraul's avatar
      ipc: conserve sequence numbers in ipcmni_extend mode · 3278a2c2
      Manfred Spraul authored
      Rewrite, based on the patch from Waiman Long:
      
      The mixing in of a sequence number into the IPC IDs is probably to avoid
      ID reuse in userspace as much as possible.  With ipcmni_extend mode, the
      number of usable sequence numbers is greatly reduced leading to higher
      chance of ID reuse.
      
      To address this issue, we need to conserve the sequence number space as
      much as possible.  Right now, the sequence number is incremented for
      every new ID created.  In reality, we only need to increment the
      sequence number when new allocated ID is not greater than the last one
      allocated.  It is in such case that the new ID may collide with an
      existing one.  This is being done irrespective of the ipcmni mode.
      
      In order to avoid any races, the index is first allocated and then the
      pointer is replaced.
      
      Changes compared to the initial patch:
       - Handle failures from idr_alloc().
       - Avoid that concurrent operations can see the wrong sequence number.
         (This is achieved by using idr_replace()).
       - IPCMNI_SEQ_SHIFT is not a constant, thus renamed to
         ipcmni_seq_shift().
       - IPCMNI_SEQ_MAX is not a constant, thus renamed to ipcmni_seq_max().
      
      Link: http://lkml.kernel.org/r/20190329204930.21620-2-longman@redhat.com
      
      
      Signed-off-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Signed-off-by: default avatarWaiman Long <longman@redhat.com>
      Suggested-by: default avatarMatthew Wilcox <willy@infradead.org>
      Acked-by: default avatarWaiman Long <longman@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3278a2c2
Loading