Skip to content
Snippets Groups Projects
  1. Feb 24, 2013
    • Cody P Schafer's avatar
      mm/page_alloc: add informative debugging message in page_outside_zone_boundaries() · b5e6a5a2
      Cody P Schafer authored
      
      Add a debug message which prints when a page is found outside of the
      boundaries of the zone it should belong to. Format is:
      	"page $pfn outside zone [ $start_pfn - $end_pfn ]"
      
      [akpm@linux-foundation.org: s/pr_debug/pr_err/]
      Signed-off-by: default avatarCody P Schafer <cody@linux.vnet.ibm.com>
      Cc: David Hansen <dave@linux.vnet.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b5e6a5a2
    • Cody P Schafer's avatar
      mm/page_alloc: add a VM_BUG in __free_one_page() if the zone is uninitialized. · d29bb978
      Cody P Schafer authored
      
      Freeing pages to uninitialized zones is not handled by
      __free_one_page(), and should never happen when the code is correct.
      
      Ran into this while writing some code that dynamically onlines extra
      zones.
      
      Signed-off-by: default avatarCody P Schafer <cody@linux.vnet.ibm.com>
      Cc: David Hansen <dave@linux.vnet.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d29bb978
    • Cody P Schafer's avatar
      mm: add & use zone_end_pfn() and zone_spans_pfn() · 108bcc96
      Cody P Schafer authored
      
      Add 2 helpers (zone_end_pfn() and zone_spans_pfn()) to reduce code
      duplication.
      
      This also switches to using them in compaction (where an additional
      variable needed to be renamed), page_alloc, vmstat, memory_hotplug, and
      kmemleak.
      
      Note that in compaction.c I avoid calling zone_end_pfn() repeatedly
      because I expect at some point the sycronization issues with start_pfn &
      spanned_pages will need fixing, either by actually using the seqlock or
      clever memory barrier usage.
      
      Signed-off-by: default avatarCody P Schafer <cody@linux.vnet.ibm.com>
      Cc: David Hansen <dave@linux.vnet.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      108bcc96
    • Hugh Dickins's avatar
      mm: remove offlining arg to migrate_pages · 9c620e2b
      Hugh Dickins authored
      
      No functional change, but the only purpose of the offlining argument to
      migrate_pages() etc, was to ensure that __unmap_and_move() could migrate a
      KSM page for memory hotremove (which took ksm_thread_mutex) but not for
      other callers.  Now all cases are safe, remove the arg.
      
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Petr Holasek <pholasek@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9c620e2b
    • Mel Gorman's avatar
      mm: rename page struct field helpers · 22b751c3
      Mel Gorman authored
      
      The function names page_xchg_last_nid(), page_last_nid() and
      reset_page_last_nid() were judged to be inconsistent so rename them to a
      struct_field_op style pattern.  As it looked jarring to have
      reset_page_mapcount() and page_nid_reset_last() beside each other in
      memmap_init_zone(), this patch also renames reset_page_mapcount() to
      page_mapcount_reset().  There are others like init_page_count() but as
      it is used throughout the arch code a rename would likely cause more
      conflicts than it is worth.
      
      [akpm@linux-foundation.org: fix zcache]
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Suggested-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      22b751c3
    • Ming Lei's avatar
      mm: teach mm by current context info to not do I/O during memory allocation · 21caf2fc
      Ming Lei authored
      
      This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
      'struct task_struct'), so that the flag can be set by one task to avoid
      doing I/O inside memory allocation in the task's context.
      
      The patch trys to solve one deadlock problem caused by block device, and
      the problem may happen at least in the below situations:
      
      - during block device runtime resume, if memory allocation with
        GFP_KERNEL is called inside runtime resume callback of any one of its
        ancestors(or the block device itself), the deadlock may be triggered
        inside the memory allocation since it might not complete until the block
        device becomes active and the involed page I/O finishes.  The situation
        is pointed out first by Alan Stern.  It is not a good approach to
        convert all GFP_KERNEL[1] in the path into GFP_NOIO because several
        subsystems may be involved(for example, PCI, USB and SCSI may be
        involved for usb mass stoarage device, network devices involved too in
        the iSCSI case)
      
      - during block device runtime suspend, because runtime resume need to
        wait for completion of concurrent runtime suspend.
      
      - during error handling of usb mass storage deivce, USB bus reset will
        be put on the device, so there shouldn't have any memory allocation with
        GFP_KERNEL during USB bus reset, otherwise the deadlock similar with
        above may be triggered.  Unfortunately, any usb device may include one
        mass storage interface in theory, so it requires all usb interface
        drivers to handle the situation.  In fact, most usb drivers don't know
        how to handle bus reset on the device and don't provide .pre_set() and
        .post_reset() callback at all, so USB core has to unbind and bind driver
        for these devices.  So it is still not practical to resort to GFP_NOIO
        for solving the problem.
      
      Also the introduced solution can be used by block subsystem or block
      drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
      actual I/O transfer.
      
      It is not a good idea to convert all these GFP_KERNEL in the affected
      path into GFP_NOIO because these functions doing that may be implemented
      as library and will be called in many other contexts.
      
      In fact, memalloc_noio_flags() can convert some of current static
      GFP_NOIO allocation into GFP_KERNEL back in other non-affected contexts,
      at least almost all GFP_NOIO in USB subsystem can be converted into
      GFP_KERNEL after applying the approach and make allocation with GFP_NOIO
      only happen in runtime resume/bus reset/block I/O transfer contexts
      generally.
      
      [1], several GFP_KERNEL allocation examples in runtime resume path
      
      - pci subsystem
      acpi_os_allocate
      	<-acpi_ut_allocate
      		<-ACPI_ALLOCATE_ZEROED
      			<-acpi_evaluate_object
      				<-__acpi_bus_set_power
      					<-acpi_bus_set_power
      						<-acpi_pci_set_power_state
      							<-platform_pci_set_power_state
      								<-pci_platform_power_transition
      									<-__pci_complete_power_transition
      										<-pci_set_power_state
      											<-pci_restore_standard_config
      												<-pci_pm_runtime_resume
      - usb subsystem
      usb_get_status
      	<-finish_port_resume
      		<-usb_port_resume
      			<-generic_resume
      				<-usb_resume_device
      					<-usb_resume_both
      						<-usb_runtime_resume
      
      - some individual usb drivers
      usblp, uvc, gspca, most of dvb-usb-v2 media drivers, cpia2, az6007, ....
      
      That is just what I have found.  Unfortunately, this allocation can only
      be found by human being now, and there should be many not found since
      any function in the resume path(call tree) may allocate memory with
      GFP_KERNEL.
      
      Signed-off-by: default avatarMing Lei <ming.lei@canonical.com>
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Oliver Neukum <oneukum@suse.de>
      Cc: Jiri Kosina <jiri.kosina@suse.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Greg KH <greg@kroah.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Decotigny <david.decotigny@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      21caf2fc
    • Minchan Kim's avatar
      mm: remove MIGRATE_ISOLATE check in hotpath · 194159fb
      Minchan Kim authored
      
      Several functions test MIGRATE_ISOLATE and some of those are hotpath but
      MIGRATE_ISOLATE is used only if we enable CONFIG_MEMORY_ISOLATION(ie,
      CMA, memory-hotplug and memory-failure) which are not common config
      option.  So let's not add unnecessary overhead and code when we don't
      enable CONFIG_MEMORY_ISOLATION.
      
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: default avatarMichal Nazarewicz <mina86@mina86.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      194159fb
    • Jiang Liu's avatar
      mm: set zone->present_pages to number of existing pages in the zone · 306f2e9e
      Jiang Liu authored
      
      Now all users of "number of pages managed by the buddy system" have been
      converted to use zone->managed_pages, so set zone->present_pages to what
      it should be:
      
      	present_pages = spanned_pages - absent_pages;
      
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
      Cc: Chris Clayton <chris2553@googlemail.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      306f2e9e
    • Jiang Liu's avatar
      mm: use zone->present_pages instead of zone->managed_pages where appropriate · b40da049
      Jiang Liu authored
      
      Now we have zone->managed_pages for "pages managed by the buddy system
      in the zone", so replace zone->present_pages with zone->managed_pages if
      what the user really wants is number of allocatable pages.
      
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
      Cc: Chris Clayton <chris2553@googlemail.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b40da049
    • Tang Chen's avatar
      acpi, memory-hotplug: support getting hotplug info from SRAT · 01a178a9
      Tang Chen authored
      
      We now provide an option for users who don't want to specify physical
      memory address in kernel commandline.
      
               /*
                * For movablemem_map=acpi:
                *
                * SRAT:                |_____| |_____| |_________| |_________| ......
                * node id:                0       1         1           2
                * hotpluggable:           n       y         y           n
                * movablemem_map:              |_____| |_________|
                *
                * Using movablemem_map, we can prevent memblock from allocating memory
                * on ZONE_MOVABLE at boot time.
                */
      
      So user just specify movablemem_map=acpi, and the kernel will use
      hotpluggable info in SRAT to determine which memory ranges should be set
      as ZONE_MOVABLE.
      
      If all the memory ranges in SRAT is hotpluggable, then no memory can be
      used by kernel.  But before parsing SRAT, memblock has already reserve
      some memory ranges for other purposes, such as for kernel image, and so
      on.  We cannot prevent kernel from using these memory.  So we need to
      exclude these ranges even if these memory is hotpluggable.
      
      Furthermore, there could be several memory ranges in the single node
      which the kernel resides in.  We may skip one range that have memory
      reserved by memblock, but if the rest of memory is too small, then the
      kernel will fail to boot.  So, make the whole node which the kernel
      resides in un-hotpluggable.  Then the kernel has enough memory to use.
      
      NOTE: Using this way will cause NUMA performance down because the
            whole node will be set as ZONE_MOVABLE, and kernel cannot use memory
            on it.  If users don't want to lose NUMA performance, just don't use
            it.
      
      [akpm@linux-foundation.org: fix warning]
      [akpm@linux-foundation.org: use strcmp()]
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: "Brown, Len" <len.brown@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      01a178a9
    • Tang Chen's avatar
      acpi, memory-hotplug: extend movablemem_map ranges to the end of node · 27168d38
      Tang Chen authored
      
      When implementing movablemem_map boot option, we introduced an array
      movablemem_map.map[] to store the memory ranges to be set as
      ZONE_MOVABLE.
      
      Since ZONE_MOVABLE is the latst zone of a node, if user didn't specify
      the whole node memory range, we need to extend it to the node end so
      that we can use it to prevent memblock from allocating memory in the
      ranges user didn't specify.
      
      We now implement movablemem_map boot option like this:
      
              /*
               * For movablemem_map=nn[KMG]@ss[KMG]:
               *
               * SRAT:                |_____| |_____| |_________| |_________| ......
               * node id:                0       1         1           2
               * user specified:                |__|                 |___|
               * movablemem_map:                |___| |_________|    |______| ......
               *
               * Using movablemem_map, we can prevent memblock from allocating memory
               * on ZONE_MOVABLE at boot time.
               *
               * NOTE: In this case, SRAT info will be ingored.
               */
      
      [akpm@linux-foundation.org: clean up code, fix build warning]
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: "Brown, Len" <len.brown@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      27168d38
    • Tang Chen's avatar
      page_alloc: make movablemem_map have higher priority · 42f47e27
      Tang Chen authored
      
      If kernelcore or movablecore is specified at the same time with
      movablemem_map, movablemem_map will have higher priority to be
      satisfied.  This patch will make find_zone_movable_pfns_for_nodes()
      calculate zone_movable_pfn[] with the limit from zone_movable_limit[].
      
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Tested-by: default avatarLin Feng <linfeng@cn.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      42f47e27
    • Tang Chen's avatar
      page_alloc: introduce zone_movable_limit[] to keep movable limit for nodes · 6981ec31
      Tang Chen authored
      
      Introduce a new array zone_movable_limit[] to store the ZONE_MOVABLE
      limit from movablemem_map boot option for all nodes.  The function
      sanitize_zone_movable_limit() will find out to which node the ranges in
      movable_map.map[] belongs, and calculates the low boundary of
      ZONE_MOVABLE for each node.
      
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: default avatarLiu Jiang <jiang.liu@huawei.com>
      Reviewed-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Tested-by: default avatarLin Feng <linfeng@cn.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6981ec31
    • Tang Chen's avatar
      page_alloc: add movable_memmap kernel parameter · 34b71f1e
      Tang Chen authored
      
      Add functions to parse movablemem_map boot option.  Since the option
      could be specified more then once, all the maps will be stored in the
      global variable movablemem_map.map array.
      
      And also, we keep the array in monotonic increasing order by start_pfn.
      And merge all overlapped ranges.
      
      [akpm@linux-foundation.org: improve comment]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: remove unneeded parens]
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Reviewed-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Tested-by: default avatarLin Feng <linfeng@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      34b71f1e
    • Andrew Morton's avatar
      mm/page_alloc.c:__setup_per_zone_wmarks: make min_pages unsigned long · 90ae8d67
      Andrew Morton authored
      
      `int' is an inappropriate type for a number-of-pages counter.
      
      While we're there, use the clamp() macro.
      
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Satoru Moriya <satoru.moriya@hds.com>
      Cc: Simon Jeons <simon.jeons@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      90ae8d67
    • Srinivas Pandruvada's avatar
      CMA: make putback_lru_pages() call conditional · 2a6f5124
      Srinivas Pandruvada authored
      
      As per documentation and other places calling putback_lru_pages(),
      putback_lru_pages() is called on error only.  Make the CMA code behave
      consistently.
      
      [akpm@linux-foundation.org: remove a test-n-branch in the wrapup code]
      Signed-off-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Acked-by: default avatarMichal Nazarewicz <mina86@mina86.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2a6f5124
  2. Feb 18, 2013
    • Linus Torvalds's avatar
      mm: fix pageblock bitmap allocation · 7c45512d
      Linus Torvalds authored
      
      Commit c060f943 ("mm: use aligned zone start for pfn_to_bitidx
      calculation") fixed out calculation of the index into the pageblock
      bitmap when a !SPARSEMEM zome was not aligned to pageblock_nr_pages.
      
      However, the _allocation_ of that bitmap had never taken this alignment
      requirement into accout, so depending on the exact size and alignment of
      the zone, the use of that index could then access past the allocation,
      resulting in some very subtle memory corruption.
      
      This was reported (and bisected) by Ingo Molnar: one of his random
      config builds would hang with certain very specific kernel command line
      options.
      
      In the meantime, commit c060f943 has been marked for stable, so this
      fix needs to be back-ported to the stable kernels that backported the
      commit to use the right alignment.
      
      Bisected-and-tested-by: default avatarIngo Molnar <mingo@kernel.org>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7c45512d
  3. Feb 12, 2013
  4. Feb 07, 2013
  5. Jan 11, 2013
    • Mel Gorman's avatar
      mm: compaction: partially revert capture of suitable high-order page · 8fb74b9f
      Mel Gorman authored
      
      Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
      waiting for POLLIN on a local TCP socket.  It was easier to trigger if
      there was disk IO and dirty pages at the same time and he bisected it to
      commit 1fb3f8ca ("mm: compaction: capture a suitable high-order page
      immediately when it is made available").
      
      The intention of that patch was to improve high-order allocations under
      memory pressure after changes made to reclaim in 3.6 drastically hurt
      THP allocations but the approach was flawed.  For Eric, the problem was
      that page->pfmemalloc was not being cleared for captured pages leading
      to a poor interaction with swap-over-NFS support causing the packets to
      be dropped.  However, I identified a few more problems with the patch
      including the fact that it can increase contention on zone->lock in some
      cases which could result in async direct compaction being aborted early.
      
      In retrospect the capture patch took the wrong approach.  What it should
      have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
      was allocating for THP and avoided races that way.  While the patch was
      showing to improve allocation success rates at the time, the benefit is
      marginal given the relative complexity and it should be revisited from
      scratch in the context of the other reclaim-related changes that have
      taken place since the patch was first written and tested.  This patch
      partially reverts commit 1fb3f8ca ("mm: compaction: capture a
      suitable high-order page immediately when it is made available").
      
      Reported-and-tested-by: default avatarEric Wong <normalperson@yhbt.net>
      Tested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8fb74b9f
    • Laura Abbott's avatar
      mm: use aligned zone start for pfn_to_bitidx calculation · c060f943
      Laura Abbott authored
      
      The current calculation in pfn_to_bitidx assumes that (pfn -
      zone->zone_start_pfn) >> pageblock_order will return the same bit for
      all pfn in a pageblock.  If zone_start_pfn is not aligned to
      pageblock_nr_pages, this may not always be correct.
      
      Consider the following with pageblock order = 10, zone start 2MB:
      
        pfn     | pfn - zone start | (pfn - zone start) >> page block order
        ----------------------------------------------------------------
        0x26000 | 0x25e00	   |  0x97
        0x26100 | 0x25f00	   |  0x97
        0x26200 | 0x26000	   |  0x98
        0x26300 | 0x26100	   |  0x98
      
      This means that calling {get,set}_pageblock_migratetype on a single page
      will not set the migratetype for the full block.  Fix this by rounding
      down zone_start_pfn when doing the bitidx calculation.
      
      For our use case, the effects of this bug were mostly tied to the fact
      that CMA allocations would either take a long time or fail to happen.
      Depending on the driver using CMA, this could result in anything from
      visual glitches to application failures.
      
      Signed-off-by: default avatarLaura Abbott <lauraa@codeaurora.org>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c060f943
    • Mel Gorman's avatar
      mm: compaction: Partially revert capture of suitable high-order page · 47ecfcb7
      Mel Gorman authored
      
      Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
      waiting for POLLIN on a local TCP socket.  It was easier to trigger if
      there was disk IO and dirty pages at the same time and he bisected it to
      commit 1fb3f8ca ("mm: compaction: capture a suitable high-order page
      immediately when it is made available").
      
      The intention of that patch was to improve high-order allocations under
      memory pressure after changes made to reclaim in 3.6 drastically hurt
      THP allocations but the approach was flawed.  For Eric, the problem was
      that page->pfmemalloc was not being cleared for captured pages leading
      to a poor interaction with swap-over-NFS support causing the packets to
      be dropped.  However, I identified a few more problems with the patch
      including the fact that it can increase contention on zone->lock in some
      cases which could result in async direct compaction being aborted early.
      
      In retrospect the capture patch took the wrong approach.  What it should
      have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
      was allocating for THP and avoided races that way.  While the patch was
      showing to improve allocation success rates at the time, the benefit is
      marginal given the relative complexity and it should be revisited from
      scratch in the context of the other reclaim-related changes that have
      taken place since the patch was first written and tested.  This patch
      partially reverts commit 1fb3f8ca "mm: compaction: capture a suitable
      high-order page immediately when it is made available".
      
      Reported-and-tested-by: default avatarEric Wong <normalperson@yhbt.net>
      Tested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      47ecfcb7
  6. Jan 05, 2013
  7. Dec 21, 2012
  8. Dec 18, 2012
  9. Dec 13, 2012
  10. Dec 12, 2012
    • Marek Szyprowski's avatar
      mm: cma: remove watermark hacks · bc357f43
      Marek Szyprowski authored
      
      Commits 2139cbe6 ("cma: fix counting of isolated pages") and
      d95ea5d1 ("cma: fix watermark checking") introduced a reliable
      method of free page accounting when memory is being allocated from CMA
      regions, so the workaround introduced earlier by commit 49f223a9
      ("mm: trigger page reclaim in alloc_contig_range() to stabilise
      watermarks") can be finally removed.
      
      Signed-off-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Acked-by: default avatarMichal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bc357f43
    • Marek Szyprowski's avatar
      mm: cma: skip watermarks check for already isolated blocks in split_free_page() · 2e30abd1
      Marek Szyprowski authored
      
      Since commit 2139cbe6 ("cma: fix counting of isolated pages") free
      pages in isolated pageblocks are not accounted to NR_FREE_PAGES counters,
      so watermarks check is not required if one operates on a free page in
      isolated pageblock.
      
      Signed-off-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Acked-by: default avatarMichal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2e30abd1
    • Rafael Aquini's avatar
      mm: introduce putback_movable_pages() · 5733c7d1
      Rafael Aquini authored
      
      The PATCH "mm: introduce compaction and migration for virtio ballooned pages"
      hacks around putback_lru_pages() in order to allow ballooned pages to be
      re-inserted on balloon page list as if a ballooned page was like a LRU page.
      
      As ballooned pages are not legitimate LRU pages, this patch introduces
      putback_movable_pages() to properly cope with cases where the isolated
      pageset contains ballooned pages and LRU pages, thus fixing the mentioned
      inelegant hack around putback_lru_pages().
      
      Signed-off-by: default avatarRafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5733c7d1
    • Wen Congyang's avatar
      memory-hotplug: allocate zone's pcp before onlining pages · 6dcd73d7
      Wen Congyang authored
      
      We use __free_page() to put a page to buddy system when onlining pages.
      __free_page() will store NR_FREE_PAGES in zone's pcp.vm_stat_diff, so we
      should allocate zone's pcp before onlining pages, otherwise we will lose
      some free pages.
      
      [mhocko@suse.cz: make zone_pcp_reset independent of MEMORY_HOTREMOVE]
      Signed-off-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6dcd73d7
    • Wen Congyang's avatar
      memory-hotplug: fix NR_FREE_PAGES mismatch · 97d0da22
      Wen Congyang authored
      
      NR_FREE_PAGES will be wrong after offlining pages.  We add/dec
      NR_FREE_PAGES like this now:
      
      1. move all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES
      
      2. don't add NR_FREE_PAGES when it is freed and the migratetype is
         MIGRATE_ISOLATE
      
      3. dec NR_FREE_PAGES when offlining isolated pages.
      
      4. add NR_FREE_PAGES when undoing isolate pages.
      
      When we come to step 3, all pages are in MIGRATE_ISOLATE list, and
      NR_FREE_PAGES are right.  When we come to step4, all pages are not in
      buddy system, so we don't change NR_FREE_PAGES in this step, but we change
      NR_FREE_PAGES in step3.  So NR_FREE_PAGES is wrong after offlining pages.
      So there is no need to change NR_FREE_PAGES in step3.
      
      This patch also fixs a problem in step2: if the migratetype is
      MIGRATE_ISOLATE, we should not add NR_FRR_PAGES when we remove pages from
      pcppages.
      
      Signed-off-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Jianguo Wu <wujianguo106@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      97d0da22
    • Wen Congyang's avatar
      memory-hotplug: skip HWPoisoned page when offlining pages · b023f468
      Wen Congyang authored
      
      hwpoisoned may be set when we offline a page by the sysfs interface
      /sys/devices/system/memory/soft_offline_page or
      /sys/devices/system/memory/hard_offline_page. If we don't clear
      this flag when onlining pages, this page can't be freed, and will
      not in free list. So we can't offline these pages again. So we
      should skip such page when offlining pages.
      
      Signed-off-by: default avatarWen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b023f468
    • Kirill A. Shutemov's avatar
      mm: use IS_ENABLED(CONFIG_NUMA) instead of NUMA_BUILD · e5adfffc
      Kirill A. Shutemov authored
      
      We don't need custom NUMA_BUILD anymore, since we have handy
      IS_ENABLED().
      
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e5adfffc
    • Rabin Vincent's avatar
      mm: show migration types in show_mem · 377e4f16
      Rabin Vincent authored
      
      This is useful to diagnose the reason for page allocation failure for
      cases where there appear to be several free pages.
      
      Example, with this alloc_pages(GFP_ATOMIC) failure:
      
       swapper/0: page allocation failure: order:0, mode:0x0
       ...
       Mem-info:
       Normal per-cpu:
       CPU    0: hi:   90, btch:  15 usd:  48
       CPU    1: hi:   90, btch:  15 usd:  21
       active_anon:0 inactive_anon:0 isolated_anon:0
        active_file:0 inactive_file:84 isolated_file:0
        unevictable:0 dirty:0 writeback:0 unstable:0
        free:4026 slab_reclaimable:75 slab_unreclaimable:484
        mapped:0 shmem:0 pagetables:0 bounce:0
       Normal free:16104kB min:2296kB low:2868kB high:3444kB active_anon:0kB
       inactive_anon:0kB active_file:0kB inactive_file:336kB unevictable:0kB
       isolated(anon):0kB isolated(file):0kB present:331776kB mlocked:0kB
       dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:300kB
       slab_unreclaimable:1936kB kernel_stack:328kB pagetables:0kB unstable:0kB
       bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
       lowmem_reserve[]: 0 0
      
      Before the patch, it's hard (for me, at least) to say why all these free
      chunks weren't considered for allocation:
      
       Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB
       1*1024kB 1*2048kB 3*4096kB = 16128kB
      
      After the patch, it's obvious that the reason is that all of these are
      in the MIGRATE_CMA (C) freelist:
      
       Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB
       (C) 1*1024kB (C) 1*2048kB (C) 3*4096kB (C) = 16128kB
      
      Signed-off-by: default avatarRabin Vincent <rabin.vincent@stericsson.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      377e4f16
  11. Dec 11, 2012
Loading