  1. Feb 10, 2023
    • mm/memremap.c: fix outdated comment in devm_memremap_pages · 223ec6ab
      Li Zhijian authored
      Commit a4574f63 ("mm/memremap_pages: convert to 'struct range'")
      converted 'res' to 'range'; update the comment accordingly.
      
      Link: https://lkml.kernel.org/r/1675751220-2-1-git-send-email-lizhijian@fujitsu.com
      
      
      Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      223ec6ab
    • mm: introduce __vm_flags_mod and use it in untrack_pfn · 68f48381
      Suren Baghdasaryan authored
      There are scenarios when vm_flags can be modified without exclusive
      mmap_lock, such as:
      - after VMA was isolated and mmap_lock was downgraded or dropped
      - in exit_mmap when there are no other mm users and locking is unnecessary
      Introduce __vm_flags_mod to avoid assertions when the caller takes
      responsibility for the required locking.
      Pass a hint to untrack_pfn to conditionally use __vm_flags_mod for
      flags modification and avoid the assertion.
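
      A minimal sketch of what the pair of helpers could look like, assuming
      the vm_flags_init() wrapper introduced earlier in this series; the
      bodies below are an approximation, not a verbatim copy of the patch:

        /* Unlocked variant: the caller guarantees no concurrent readers. */
        static inline void __vm_flags_mod(struct vm_area_struct *vma,
                                          vm_flags_t set, vm_flags_t clear)
        {
                vm_flags_init(vma, (vma->vm_flags | set) & ~clear);
        }

        /* Locked variant: asserts the exclusive mmap_lock is held. */
        static inline void vm_flags_mod(struct vm_area_struct *vma,
                                        vm_flags_t set, vm_flags_t clear)
        {
                mmap_assert_write_locked(vma->vm_mm);
                __vm_flags_mod(vma, set, clear);
        }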
      
      Link: https://lkml.kernel.org/r/20230126193752.297968-7-surenb@google.com
      
      
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjun Roy <arjunroy@google.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: Laurent Dufour <ldufour@linux.ibm.com>
      Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@google.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Peter Oskolkov <posk@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Punit Agrawal <punit.agrawal@bytedance.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      68f48381
  2. Nov 08, 2022
  3. Oct 13, 2022
    • mm/memremap.c: take a pgmap reference on page allocation · 0dc45ca1
      Alistair Popple authored
      ZONE_DEVICE pages have a struct dev_pagemap which is allocated by a
      driver.  When the struct page is first allocated by the kernel in
      memremap_pages() a reference is taken on the associated pagemap to ensure
      it is not freed prior to the pages being freed.
      
      Prior to 27674ef6 ("mm: remove the extra ZONE_DEVICE struct page
      refcount") pages were considered free and returned to the driver when the
      reference count dropped to one.  However the pagemap reference was not
      dropped until the page reference count hit zero.  This would occur as part
      of the final put_page() in memunmap_pages() which would wait for all pages
      to be freed prior to returning.
      
      When the extra refcount was removed the pagemap reference was no longer
      being dropped in put_page().  Instead memunmap_pages() was changed to
      explicitly drop the pagemap references.  This means that memunmap_pages()
      can complete even though pages are still mapped by the kernel which can
      lead to kernel crashes, particularly if a driver frees the pagemap.
      
      To fix this drivers should take a pagemap reference when allocating the
      page.  This reference can then be returned when the page is freed.
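
      An abridged sketch of the resulting pattern (helper names follow
      mm/memremap.c, but the bodies are simplified approximations with
      error handling and unrelated bookkeeping omitted):

        /* Driver hands out a free ZONE_DEVICE page: pin its pagemap. */
        void zone_device_page_init(struct page *page)
        {
                /* Keeps memunmap_pages() from completing while in use. */
                WARN_ON_ONCE(!percpu_ref_tryget_live(&page->pgmap->ref));
                set_page_count(page, 1);
                lock_page(page);
        }

        /* Final put of the page: return the pagemap reference. */
        void free_zone_device_page(struct page *page)
        {
                page->pgmap->ops->page_free(page);  /* back to the driver */
                put_dev_pagemap(page->pgmap);
        }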
      
      Link: https://lkml.kernel.org/r/12d155ec727935ebfbb4d639a03ab374917ea51b.1664366292.git-series.apopple@nvidia.com
      
      
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Fixes: 27674ef6 ("mm: remove the extra ZONE_DEVICE struct page refcount")
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0dc45ca1
    • mm: free device private pages have zero refcount · ef233450
      Alistair Popple authored
      Since 27674ef6 ("mm: remove the extra ZONE_DEVICE struct page
      refcount") device private pages have no longer had an extra reference
      count when the page is in use.  However before handing them back to the
      owning device driver we add an extra reference count such that free pages
      have a reference count of one.
      
      This makes it difficult to tell whether a page is free, because both free
      and in-use pages will have a non-zero refcount.  Instead we should return
      pages to the driver's page allocator with a zero reference count.  Kernel
      code can then safely use kernel functions such as get_page_unless_zero().
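
      A sketch of how a driver-side allocator looks once free pages sit at
      refcount zero (structure and names are illustrative, loosely modelled
      on lib/test_hmm.c; zone_device_page_init() is the memremap.c helper
      that re-initializes the refcount when a page is handed out):

        static struct page *devmem_alloc_page(struct devmem *devmem)
        {
                struct page *page = NULL;

                spin_lock(&devmem->lock);
                if (devmem->free_pages) {
                        page = devmem->free_pages;      /* refcount is 0 */
                        devmem->free_pages = page->zone_device_data;
                }
                spin_unlock(&devmem->lock);

                if (page)
                        zone_device_page_init(page);    /* refcount -> 1 */
                return page;
        }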
      
      Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com
      
      
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ef233450
  4. Sep 12, 2022
  5. Jul 18, 2022
  6. Jun 17, 2022
  7. Jun 01, 2022
  8. May 10, 2022
    • mm/page-flags: reuse PG_mappedtodisk as PG_anon_exclusive for PageAnon() pages · 78fbe906
      David Hildenbrand authored
      The basic question we would like to have a reliable and efficient answer
      to is: is this anonymous page exclusive to a single process or might it be
      shared?  We need that information for ordinary/single pages, hugetlb
      pages, and possibly each subpage of a THP.
      
      Introduce a way to mark an anonymous page as exclusive, with the ultimate
      goal of teaching our COW logic to not do "wrong COWs", whereby GUP pins
      lose consistency with the pages mapped into the page table, resulting in
      reported memory corruptions.
      
      Most pageflags already have semantics for anonymous pages; however,
      PG_mappedtodisk should never apply to pages in the swapcache, so let's
      reuse that flag.
      
      As PG_has_hwpoisoned also uses that flag on the second tail page of a
      compound page, convert it to PG_error instead, which is marked as
      PF_NO_TAIL, so never used for tail pages.
      
      Use custom page flag modification functions such that we can do additional
      sanity checks.  The semantics we'll put into some kernel doc in the future
      are:
      
      "
        PG_anon_exclusive is *usually* only expressive in combination with a
        page table entry. Depending on the page table entry type it might
        store the following information:
      
             Is what's mapped via this page table entry exclusive to the
             single process and can be mapped writable without further
             checks? If not, it might be shared and we might have to COW.
      
        For now, we only expect PTE-mapped THPs to make use of
        PG_anon_exclusive in subpages. For other anonymous compound
        folios (i.e., hugetlb), only the head page is logically mapped and
        holds this information.
      
        For example, an exclusive, PMD-mapped THP only has PG_anon_exclusive
        set on the head page. When replacing the PMD by a page table full
        of PTEs, PG_anon_exclusive, if set on the head page, will be set on
        all tail pages accordingly. Note that converting from a PTE-mapping
        to a PMD mapping using the same compound page is currently not
        possible and consequently doesn't require care.
      
        If GUP wants to take a reliable pin (FOLL_PIN) on an anonymous page,
        it should only pin if the relevant PG_anon_exclusive is set. In that
        case, the pin will be fully reliable and stay consistent with the pages
        mapped into the page table, as the bit cannot get cleared (e.g., by
        fork(), KSM) while the page is pinned. For anonymous pages that
        are mapped R/W, PG_anon_exclusive can be assumed to always be set
        because such pages cannot possibly be shared.
      
        The page table lock protecting the page table entry is the primary
        synchronization mechanism for PG_anon_exclusive; GUP-fast that does
        not take the PT lock needs special care when trying to clear the
        flag.
      
        Page table entry types and PG_anon_exclusive:
        * Present: PG_anon_exclusive applies.
        * Swap: the information is lost. PG_anon_exclusive was cleared.
        * Migration: the entry holds this information instead.
                     PG_anon_exclusive was cleared.
        * Device private: PG_anon_exclusive applies.
        * Device exclusive: PG_anon_exclusive applies.
        * HW Poison: PG_anon_exclusive is stale and not changed.
      
        If the page may be pinned (FOLL_PIN), clearing PG_anon_exclusive is
        not allowed and the flag will stick around until the page is freed
        and folio->mapping is cleared.
      "
      
      We won't be clearing PG_anon_exclusive on destructive unmapping (i.e.,
      zapping) of page table entries; the page freeing code will handle that
      when it also invalidates page->mapping so the page no longer indicates
      PageAnon().  Letting information about exclusivity stick around will be
      an important property when adding sanity checks to unpinning code.
      
      Note that we properly clear the flag in free_pages_prepare() via
      PAGE_FLAGS_CHECK_AT_PREP for each individual subpage of a compound page,
      so there is no need to manually clear the flag.
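
      A hedged illustration of the kind of decision the new flag enables
      (PageAnonExclusive() is the accessor added by this patch; the wrapper
      below is hypothetical and simplified, not the patch's GUP code):

        /* Only a page marked exclusive can be FOLL_PIN'ed reliably. */
        static bool anon_page_reliably_pinnable(struct page *page)
        {
                if (!PageAnon(page))
                        return true;    /* not anonymous: not our concern */
                return PageAnonExclusive(page); /* shared -> break COW first */
        }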
      
      Link: https://lkml.kernel.org/r/20220428083441.37290-12-david@redhat.com
      
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Liang Zhang <zhangliang5@huawei.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Oded Gabbay <oded.gabbay@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      78fbe906
  9. Apr 29, 2022
    • mm/sparse-vmemmap: improve memory savings for compound devmaps · 4917f55b
      Joao Martins authored
      A compound devmap is a dev_pagemap with @vmemmap_shift > 0: its pages
      are mapped at a given huge page alignment and use compound pages, as
      opposed to order-0 pages.
      
      Take advantage of the fact that most tail pages look the same (except the
      first two) to minimize struct page overhead.  Allocate a separate page for
      the vmemmap area which contains the head page and separate for the next 64
      pages.  The rest of the subsections then reuse this tail vmemmap page to
      initialize the rest of the tail pages.
      
      Sections are arch-dependent (e.g. on x86 they are 64M, 128M or 512M) and
      when initializing a compound devmap with a big enough @vmemmap_shift
      (e.g. a 1G PUD) it may cross multiple sections.  The vmemmap code needs
      to consult @pgmap
      so that multiple sections that all map the same tail data can refer back
      to the first copy of that data for a given gigantic page.
      
      On compound devmaps with 2M alignment, this mechanism saves 6 of the 8
      vmemmap PFNs needed to map a subsection's 512 struct pages.  On a 1G
      compound devmap it saves 4094 pages.
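
      The arithmetic behind those numbers, assuming 64-byte struct pages and
      4K base/vmemmap pages:

        2M compound page:  512 struct pages * 64 B = 32 KiB of memmap
                           = 8 vmemmap PFNs; keeping one head page and one
                           reusable tail page leaves 6 of 8 saved.
        1G compound page:  262144 struct pages * 64 B = 16 MiB of memmap
                           = 4096 vmemmap PFNs; keeping 2 saves 4094.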
      
      Altmap isn't supported yet, given various restrictions in the altmap pfn
      allocator; fall back to the already-in-use vmemmap_populate().  It is
      worth noting that altmap for devmap mappings was there to relieve the
      pressure of inordinate amounts of memmap space needed to map terabytes
      of pmem.  With compound pages the motivation for altmaps for pmem is
      reduced.
      
      Link: https://lkml.kernel.org/r/20220420155310.9712-5-joao.m.martins@oracle.com
      
      
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4917f55b
  10. Mar 25, 2022
  11. Mar 22, 2022
  12. Mar 03, 2022
  13. Jan 15, 2022
    • mm/memremap: add ZONE_DEVICE support for compound pages · c4386bd8
      Joao Martins authored
      Add a new @vmemmap_shift property for struct dev_pagemap which specifies
      that a devmap is composed of a set of compound pages of order
      @vmemmap_shift, instead of base pages.  When a compound page devmap is
      requested, all but the first page are initialised as tail pages instead
      of order-0 pages.
      
      For certain ZONE_DEVICE users like device-dax which have a fixed page
      size, this creates an opportunity to optimize GUP and GUP-fast walkers,
      treating it the same way as THP or hugetlb pages.
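
      A hedged example of how a ZONE_DEVICE user could request 2M compound
      pages through the new field (variable names and the surrounding pgmap
      setup are illustrative, not copied from device-dax):

        pgmap->type = MEMORY_DEVICE_GENERIC;
        /* Compound pages of order PMD_SHIFT - PAGE_SHIFT (2M on x86-64). */
        pgmap->vmemmap_shift = PMD_SHIFT - PAGE_SHIFT;
        addr = devm_memremap_pages(dev, pgmap);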
      
      Additionally, commit 7118fc29 ("hugetlb: address ref count racing in
      prep_compound_gigantic_page") removed set_page_count() because setting
      the page ref count to zero was redundant.  devmap pages don't come from
      the page allocator, though, and only the head page refcount is used for
      compound pages, hence initialize the tail page count to zero.
      
      Link: https://lkml.kernel.org/r/20211202204422.26777-5-joao.m.martins@oracle.com
      
      
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c4386bd8
  14. Dec 04, 2021
  15. Sep 27, 2021
  16. Sep 08, 2021
    • mm/memory_hotplug: remove nid parameter from arch_remove_memory() · 65a2aa5f
      David Hildenbrand authored
      The parameter is unused, let's remove it.
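
      The interface change amounts to roughly the following (per-arch bodies
      are unchanged; shown as a sketch rather than the exact hunk):

        -void arch_remove_memory(int nid, u64 start, u64 size,
        -                        struct vmem_altmap *altmap)
        +void arch_remove_memory(u64 start, u64 size,
        +                        struct vmem_altmap *altmap)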
      
      Link: https://lkml.kernel.org/r/20210712124052.26491-3-david@redhat.com
      
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
      Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390]
      Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Laurent Dufour <ldufour@linux.ibm.com>
      Cc: Sergei Trofimovich <slyfox@gentoo.org>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Michel Lespinasse <michel@lespinasse.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Pierre Morel <pmorel@linux.ibm.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Scott Cheloha <cheloha@linux.ibm.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      65a2aa5f
  17. Apr 30, 2021
  18. Feb 26, 2021
  19. Nov 02, 2020
  20. Oct 16, 2020
  21. Oct 14, 2020
    • mm/memremap.c: convert devmap static branch to {inc,dec} · 433e7d31
      Ira Weiny authored
      While reviewing Protection Key Supervisor support it was pointed out that
      using a counter to track static branch enable was an anti-pattern which
      was better solved using the provided static_branch_{inc,dec} functions.[1]
      
      Fix up devmap_managed_key to work the same way.  This should also be
      slightly safer, because the old code had a very small (very unlikely)
      race when multiple callers tried to enable at the same time.
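
      A sketch of the before/after pattern (devmap_managed_key is the
      existing static key in mm/memremap.c; the open-coded counter shown
      first is the anti-pattern being removed):

        /* Before: hand-rolled counter guarding enable/disable (racy). */
        if (atomic_inc_return(&devmap_managed_enable) == 1)
                static_branch_enable(&devmap_managed_key);

        /* After: static keys reference-count enable/disable internally. */
        static_branch_inc(&devmap_managed_key);         /* on enable */
        static_branch_dec(&devmap_managed_key);         /* on teardown */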
      
      [1] https://lore.kernel.org/lkml/20200714194031.GI5523@worktop.programming.kicks-ass.net/
      
      
      
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Link: https://lkml.kernel.org/r/20200810235319.2796597-1-ira.weiny@intel.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      433e7d31
    • mm: remove superfluous __ClearPageActive() · 6f4dd8de
      Yu Zhao authored
      
      To activate a page, mark_page_accessed() always holds a reference on it.
      It either gets a new reference when adding a page to
      lru_pvecs.activate_page or reuses an existing one it previously got when
      it added a page to lru_pvecs.lru_add.  So it doesn't call SetPageActive()
      on a page that doesn't have any reference left.  Therefore, the race is
      impossible these days (I didn't bother to dig into its history).
      
      For other paths, namely reclaim and migration, a reference count is always
      held while calling SetPageActive() on a page.
      
      SetPageSlabPfmemalloc() also uses SetPageActive(), but it's irrelevant to
      LRU pages.
      
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Qian Cai <cai@lca.pw>
      Link: http://lkml.kernel.org/r/20200818184704.3625199-2-yuzhao@google.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6f4dd8de
    • mm/memremap_pages: support multiple ranges per invocation · b7b3c01b
      Dan Williams authored
      
      In support of device-dax growing the ability to front physically
      discontiguous ranges of memory, update devm_memremap_pages() to track
      multiple ranges with a single reference counter and devm instance.
      
      Convert all [devm_]memremap_pages() users to specify the number of ranges
      they are mapping in their 'struct dev_pagemap' instance.
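
      A hedged example of what the single-range conversion looks like for a
      typical caller (field names follow the new 'struct dev_pagemap' layout;
      the surrounding variables are illustrative):

        pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
        pgmap->nr_range = 1;                    /* new: number of ranges */
        pgmap->range = (struct range) {
                .start = pci_resource_start(pdev, bar) + offset,
                .end   = pci_resource_start(pdev, bar) + offset + size - 1,
        };
        addr = devm_memremap_pages(&pdev->dev, pgmap);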
      
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hulk Robot <hulkci@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b7b3c01b
    • mm/memremap_pages: convert to 'struct range' · a4574f63
      Dan Williams authored
      The 'struct resource' in 'struct dev_pagemap' is only used for holding
      resource span information.  The other fields, 'name', 'flags', 'desc',
      'parent', 'sibling', and 'child' are all unused wasted space.
      
      This is in preparation for introducing a multi-range extension of
      devm_memremap_pages().
      
      The bulk of this change is unwinding all the places internal to libnvdimm
      that used 'struct resource' unnecessarily, and replacing instances of
      'struct dev_pagemap'.res with 'struct dev_pagemap'.range.
      
      P2PDMA had a minor usage of the resource flags field, but only to report
      failures with "%pR".  That is replaced with an open coded print of the
      range.
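
      For instance, the open-coded replacement for the old "%pR" print looks
      roughly like the following (message text is illustrative, not quoted
      from the P2PDMA driver):

        /* before: relied on 'struct resource' formatting via %pR */
        dev_err(dev, "cannot map region %pR\n", &pgmap->res);

        /* after: print the bare span held in 'struct range' */
        dev_err(dev, "cannot map region %#llx-%#llx\n",
                pgmap->range.start, pgmap->range.end);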
      
      [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
        Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam
      
      
      
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	[xen]
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hulk Robot <hulkci@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a4574f63
  22. Sep 04, 2020
  23. Apr 10, 2020
    • mm/memremap: set caching mode for PCI P2PDMA memory to WC · a50d8d98
      Logan Gunthorpe authored
      
      PCI BAR IO memory should never be mapped as WB, however prior to this
      the PAT bits were set WB and it was typically overridden by MTRR
      registers set by the firmware.
      
      Set PCI P2PDMA memory to be UC as this is what it currently, typically,
      ends up being mapped as on x86 after the MTRR registers override the
      cache setting.
      
      Future use-cases may need to generalize this by adding flags to select
      the caching type, as some P2PDMA cases may not want UC.  However, those
      use-cases are not upstream yet and this can be changed when they arrive.
      
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric Badger <ebadger@gigaio.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lkml.kernel.org/r/20200306170846.9333-8-logang@deltatee.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a50d8d98
    • mm/memory_hotplug: add pgprot_t to mhp_params · bfeb022f
      Logan Gunthorpe authored
      
      devm_memremap_pages() is currently used by the PCI P2PDMA code to create
      struct page mappings for IO memory.  At present, these mappings are
      created with PAGE_KERNEL which implies setting the PAT bits to be WB.
      However, on x86, an MTRR register will typically override this and force
      the cache type to be UC-.  If the firmware doesn't set this register, the
      mapping is effectively WB and accessing it will typically result in a
      machine check exception.

      Other arches are unlikely to function correctly either, since they have
      no MTRR registers to fall back on.
      
      To solve this, provide a way to specify the pgprot value explicitly to
      arch_add_memory().
      
      Of the arches that support MEMORY_HOTPLUG, x86_64 and arm64 need only a
      simple change to pass the pgprot_t down to their respective functions
      which set up the page tables.  For x86_32, set the page tables explicitly
      using _set_memory_prot() (as they are already mapped).
      
      For ia64, s390 and sh, reject anything but PAGE_KERNEL settings -- this
      should be fine, for now, seeing these architectures don't support
      ZONE_DEVICE.
      
      A check in __add_pages() is also added to ensure the pgprot parameter
      was set for all arches.
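
      A sketch of the resulting interface (struct layout and call shape as
      described above; the exact field set in mainline may differ slightly):

        struct mhp_params {
                struct vmem_altmap *altmap;
                pgprot_t pgprot;       /* caching attributes for the mapping */
        };

        /* Callers now pick the protection explicitly, e.g.: */
        struct mhp_params params = { .pgprot = PAGE_KERNEL };

        err = arch_add_memory(nid, start, size, &params);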
      
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Badger <ebadger@gigaio.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lkml.kernel.org/r/20200306170846.9333-7-logang@deltatee.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bfeb022f
    • mm/memory_hotplug: rename mhp_restrictions to mhp_params · f5637d3b
      Logan Gunthorpe authored
      
      The mhp_restrictions struct really doesn't specify anything resembling a
      restriction anymore, so rename it to mhp_params, as it is now a list of
      extended parameters.
      
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Badger <ebadger@gigaio.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lkml.kernel.org/r/20200306170846.9333-3-logang@deltatee.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f5637d3b
  24. Mar 26, 2020
  25. Feb 21, 2020
    • mm/memremap_pages: Introduce memremap_compat_align() · 9ffc1d19
      Dan Williams authored
      The "sub-section memory hotplug" facility allows memremap_pages() users
      like libnvdimm to compensate for hardware platforms like x86 that have a
      section size larger than their hardware memory mapping granularity.  The
      compensation that sub-section support affords is being tolerant of
      physical memory resources shifting by units smaller (64MiB on x86) than
      the memory-hotplug section size (128 MiB), where the platform
      physical-memory mapping granularity is limited by the number and
      capability of address-decode registers in the memory controller.
      
      While the sub-section support allows memremap_pages() to operate on
      sub-section (2MiB) granularity, the Power architecture may still
      require 16MiB alignment on "!radix_enabled()" platforms.
      
      In order for libnvdimm to be able to detect and manage this per-arch
      limitation, introduce memremap_compat_align() as a common minimum
      alignment across all driver-facing memory-mapping interfaces, and let
      Power override it to 16MiB in the "!radix_enabled()" case.
      
      The assumption / requirement for 16MiB to be a viable
      memremap_compat_align() value is that Power does not have platforms
      where its equivalent of address-decode registers ever hardware-remaps a
      persistent memory resource on smaller than 16MiB boundaries.  Note that
      I tried my best to not add a new Kconfig symbol, but header include
      entanglements defeated the #ifndef memremap_compat_align design pattern
      and the need to export it defeats the __weak design pattern for arch
      overrides.
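
      A sketch of the shape this takes (default in mm/memremap.c gated by an
      arch-selected Kconfig symbol; the symbol name and powerpc details are
      stated here from memory and hedged accordingly):

        /* mm/memremap.c: default when the arch provides no stricter value. */
        #ifndef CONFIG_ARCH_HAS_MEMREMAP_COMPAT_ALIGN
        unsigned long memremap_compat_align(void)
        {
                return SUBSECTION_SIZE; /* sub-section hotplug granularity */
        }
        EXPORT_SYMBOL_GPL(memremap_compat_align);
        #endif
        /* powerpc selects the symbol and returns 16MiB when !radix_enabled(). */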
      
      Based on an initial patch by Aneesh.
      
      Link: http://lore.kernel.org/r/CAPcyv4gBGNP95APYaBcsocEa50tQj9b5h__83vgngjq3ouGX_Q@mail.gmail.com
      
      
      Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reported-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      9ffc1d19