mm/gup: fix try_grab_compound_head() race with split_huge_page()

    From: Jann Horn <jannh@google.com>

    commit c24d3732 upstream.
    
    try_grab_compound_head() is used to grab a reference to a page from
    get_user_pages_fast(), which is only protected against concurrent freeing
    of page tables (via local_irq_save()), but not against concurrent TLB
    flushes, freeing of data pages, or splitting of compound pages.
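
    For reference, the fast-GUP structure is roughly the following (a
    simplified sketch, with variable and helper details condensed from
    mm/gup.c, not the exact code):

        /*
         * Sketch: interrupts are disabled so that page tables cannot be
         * freed under us, but nothing here prevents a concurrent
         * split_huge_page() or the freeing of the data pages themselves.
         */
        unsigned long flags;
        int nr_pinned = 0;

        local_irq_save(flags);
        gup_pgd_range(start, end, gup_flags, pages, &nr_pinned);
        local_irq_restore(flags);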
    
    Because no reference is held to the page when try_grab_compound_head() is
    called, the page may have been freed and reallocated by the time its
    refcount has been elevated; therefore, once we're holding a stable
    reference to the page, the caller re-checks whether the PTE still points
    to the same page (with the same access rights).
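
    In gup_pte_range(), that re-check looks roughly like this (a sketch
    with the surrounding loop and error handling elided):

        head = try_grab_compound_head(page, 1, flags);
        if (!head)
                goto pte_unmap;

        /*
         * Only now that a reference is held is the PTE re-read; if it
         * changed (the page was unmapped, freed or replaced in the
         * meantime), drop the reference again and bail out.
         */
        if (unlikely(pte_val(pte) != pte_val(*ptep))) {
                put_compound_head(head, 1, flags);
                goto pte_unmap;
        }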
    
    The problem is that try_grab_compound_head() has to grab a reference on
    the head page; but between the time we look up what the head page is and
    the time we actually grab a reference on the head page, the compound page
    may have been split up (either explicitly through split_huge_page() or by
    freeing the compound page to the buddy allocator and then allocating its
    individual order-0 pages).  If that happens, get_user_pages_fast() may end
    up returning the right page but lifting the refcount on a now-unrelated
    page, leading to use-after-free of pages.
    
    To fix it: Re-check whether the pages still belong together after lifting
    the refcount on the head page.  Move anything else that checks
    compound_head(page) below the refcount increment.
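
    Condensed, the resulting check in try_get_compound_head() is along
    these lines (simplified from the patch; put_page_refs() is the small
    helper used to drop the references again):

        struct page *head = compound_head(page);

        if (unlikely(!page_cache_add_speculative(head, refs)))
                return NULL;

        /*
         * We now hold a stable reference on the head page, but the
         * compound page may have been split between the compound_head()
         * lookup above and the refcount increment; in that case the
         * reference is on an unrelated page, so drop it and make the
         * caller fall back to the slow path.
         */
        if (unlikely(compound_head(page) != head)) {
                put_page_refs(head, refs);
                return NULL;
        }

        return head;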
    
    This can't actually happen on bare-metal x86 (because there, disabling
    IRQs locks out remote TLB flushes), but it can happen on virtualized x86
    (e.g.  under KVM) and probably also on arm64.  The race window is pretty
    narrow, and constantly allocating and shattering hugepages isn't exactly
    fast; for now I've only managed to reproduce this in an x86 KVM guest with
    an artificially widened timing window (by adding a loop that repeatedly
    calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force VM exits, so
    that PV TLB flushes are used instead of IPIs).
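
    (The widening was along these lines; the loop count is arbitrary and
    the code below is purely an illustration, not part of the fix:)

        /*
         * Illustration only: each inl() from the serial port's line
         * status register forces a VM exit under KVM, stretching the
         * window between the compound_head() lookup and the refcount
         * increment.
         */
        for (i = 0; i < 10000; i++)
                inl(0x3f8 + 5);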
    
    As requested on the list, also replace the existing VM_BUG_ON_PAGE() with
    a warning and bailout.  Since the existing code only performed the BUG_ON
    check on DEBUG_VM kernels, ensure that the new code also only performs the
    check under that configuration - I don't want to mix two logically
    separate changes together too much.  The macro VM_WARN_ON_ONCE_PAGE()
    doesn't return a value on !DEBUG_VM, so wrap the whole check in an #ifdef
    block.  An alternative would be to change the VM_WARN_ON_ONCE_PAGE()
    definition for !DEBUG_VM such that it always returns false, but since that
    would differ from the behavior of the normal WARN macros, it might be too
    confusing for readers.
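
    Condensed from the patch, the wrapped check looks like this:

        #ifdef CONFIG_DEBUG_VM
                if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
                        return;
        #endif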
    
    Link: https://lkml.kernel.org/r/20210615012014.1100672-1-jannh@google.com
    
    
    Fixes: 7aef4172 ("mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton")
    Signed-off-by: Jann Horn <jannh@google.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Kirill A. Shutemov <kirill@shutemov.name>
    Cc: Jan Kara <jack@suse.cz>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>