  1. Apr 29, 2008
  2. Apr 28, 2008
  3. Feb 15, 2008
  4. Feb 07, 2008
    • memcgroup: reinstate swapoff mod · 044d66c1
      Hugh Dickins authored
      
      This patch reinstates the "swapoff: scan ptes preemptibly" mod we started
      with: in due course it should be rendered down into the earlier patches,
      leaving us with a more straightforward mem_cgroup_charge mod to unuse_pte,
      allocating with GFP_KERNEL while holding no spinlock and no atomic kmap.
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      044d66c1
    • Memory controller: make charging gfp mask aware · e1a1cd59
      Balbir Singh authored
      
      Nick Piggin pointed out that swap cache and page cache addition routines
      could be called from non-GFP_KERNEL contexts.  This patch makes the
      charging routine aware of the gfp context.  Charging might fail if the
      cgroup is over its limit, in which case a suitable error is returned.
      
      This patch was tested on a PowerPC box.  I am still looking for a way to
      test the path through which allocations happen in non-GFP_KERNEL
      contexts.
      
      [kamezawa.hiroyu@jp.fujitsu.com: problem with ZONE_MOVABLE]
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e1a1cd59
    • Memory controller: memory accounting · 8a9f3ccd
      Balbir Singh authored
      
      Add the accounting hooks.  The accounting is carried out for RSS and Page
      Cache (unmapped) pages.  There is now a common limit and accounting for both.
      RSS is accounted at page_add_*_rmap() and page_remove_rmap() time.  Page
      cache is accounted at add_to_page_cache() and __delete_from_page_cache().
      Swap cache is also accounted for.
      
      Each page's page_cgroup is protected with the last bit of the
      page_cgroup pointer; this makes handling of race conditions involving
      simultaneous mappings of a page easier.  A reference count is kept in the
      page_cgroup to deal with cases where a page might be unmapped from the RSS
      of all tasks, but still lives in the page cache.
      
      Credits go to Vaidyanathan Srinivasan for helping with reference counting work
      of the page cgroup.  Almost all of the page cache accounting code has help
      from Vaidyanathan Srinivasan.
      
      [hugh@veritas.com: fix swapoff breakage]
      [akpm@linux-foundation.org: fix locking]
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <Valdis.Kletnieks@vt.edu>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8a9f3ccd
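
The memory accounting entry above says each page's page_cgroup is protected by the last bit of the page_cgroup pointer. The following is a minimal userspace sketch of that idea, assuming C11 atomics. The names pc_slot, pc_lock, pc_unlock, pc_assign and pc_get are hypothetical and are not taken from the kernel patch; the kernel uses its own bit-spinlock primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-page field holding a page_cgroup pointer.
 * Bit 0 of the stored word doubles as a lock bit. This relies on the pointed-to
 * objects being at least 2-byte aligned, so bit 0 of a real pointer is 0. */
struct pc_slot {
    atomic_uintptr_t word;
};

#define PC_LOCK_BIT ((uintptr_t)1)

/* Spin until this caller is the one that flips bit 0 from 0 to 1. */
static void pc_lock(struct pc_slot *s)
{
    while (atomic_fetch_or_explicit(&s->word, PC_LOCK_BIT,
                                    memory_order_acquire) & PC_LOCK_BIT)
        ;
}

/* Clear bit 0, leaving the pointer bits untouched. */
static void pc_unlock(struct pc_slot *s)
{
    atomic_fetch_and_explicit(&s->word, ~PC_LOCK_BIT, memory_order_release);
}

/* Replace the pointer; the caller must hold the lock, so bit 0 stays set. */
static void pc_assign(struct pc_slot *s, void *pc)
{
    uintptr_t v = (uintptr_t)pc;

    assert((v & PC_LOCK_BIT) == 0); /* pointer must leave bit 0 free */
    atomic_store_explicit(&s->word, v | PC_LOCK_BIT, memory_order_relaxed);
}

/* Read the pointer with the lock bit masked off. */
static void *pc_get(struct pc_slot *s)
{
    uintptr_t v = atomic_load_explicit(&s->word, memory_order_relaxed);

    return (void *)(v & ~PC_LOCK_BIT);
}
```

Keeping the lock in the pointer word means two tasks mapping the same page serialize on the page itself, with no extra lock field. The reference count mentioned in the commit would live in the pointed-to structure and is left out of this sketch.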
    • memcgroup: temporarily revert swapoff mod · 59bd2658
      Hugh Dickins authored
      
      This patch precisely reverts the "swapoff: scan ptes preemptibly" patch
      just presented.  It's a temporary measure to allow existing memory
      controller patches to apply without rejects: in due course they should be
      rendered down into one sensible patch, and this reversion disappear.
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      59bd2658
  5. Feb 05, 2008
    • tmpfs: open a window in shmem_unuse_inode · 2e0e26c7
      Hugh Dickins authored
      
      There are a couple of reasons (patches follow) why it would be good to open a
      window for sleep in shmem_unuse_inode, between its search for a matching swap
      entry, and its handling of the entry found.
      
      shmem_unuse_inode must then use igrab to hold the inode against deletion in
      that window, and its corresponding iput might result in deletion: so it had
      better unlock_page before the iput, and might as well release the page too.
      
      Nor is there any need to hold on to shmem_swaplist_mutex once we know we'll
      leave the loop.  So this unwinding moves from try_to_unuse and shmem_unuse
      into shmem_unuse_inode, in the case when it finds a match.
      
      Let try_to_unuse break on error in the shmem_unuse case, as it does in the
      unuse_mm case: though at this point in the series, no error to break on.
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2e0e26c7
    • swapoff: scan ptes preemptibly · 2e441889
      Hugh Dickins authored
      
      Provided that CONFIG_HIGHPTE is not set, unuse_pte_range can reduce latency
      in swapoff by scanning the page table preemptibly: so long as unuse_pte is
      careful to recheck that entry under pte lock.
      
      (To tell the truth, this patch was not inspired by any cries for lower
      latency here: rather, this restructuring permits a future memory controller
      patch to allocate with GFP_KERNEL in unuse_pte, where before it could not.
      But it would be wrong to tuck this change away inside a memcgroup patch.)
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2e441889
    • swapin: fix valid_swaphandles defect · 8952898b
      Hugh Dickins authored
      
      valid_swaphandles is supposed to do a quick pass over the swap map entries
      neighbouring the entry which swapin_readahead is targeting, to determine for
      it a range worth reading all together.  But since it always starts its search
      from the beginning of the swap "cluster", a reject (free entry) there
      immediately curtails the readaround, and every swapin_readahead from that
      cluster is for just a single page.  Instead scan forwards and backwards around
      the target entry.
      
      Use better names for some variables: a swap_info pointer is usually called
      "si" not "swapdev".  And at the end, if only the target page should be read,
      return count of 0 to disable readaround, to avoid the unnecessarily repeated
      call to read_swap_cache_async.
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8952898b
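
The valid_swaphandles fix above replaces a scan from the start of the cluster with a scan forwards and backwards around the target entry. The following is a simplified userspace model of that search, not the kernel function. The function name readaround_range, the map representation (0 meaning a free slot) and the parameters are assumptions made for illustration; the real code also has to deal with locking and bad or reserved slots.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the swap map: map[i] == 0 means slot i is free,
 * any other value means the slot is in use.
 *
 * Returns the number of contiguous in-use slots around `target` within its
 * aligned cluster of (1 << cluster_shift) slots, and stores the first slot
 * of that run in *first. Returns 0 when only the target itself would be
 * read, mirroring the commit's "return count of 0 to disable readaround". */
static size_t readaround_range(const unsigned short *map, size_t nslots,
                               size_t target, unsigned cluster_shift,
                               size_t *first)
{
    size_t base, end, lo, hi, count;

    assert(target < nslots);

    /* Bounds of the aligned cluster containing the target. */
    base = (target >> cluster_shift) << cluster_shift;
    end = base + ((size_t)1 << cluster_shift);
    if (end > nslots)
        end = nslots;

    lo = hi = target;

    /* Scan forwards until a free slot or the end of the cluster. */
    while (hi + 1 < end && map[hi + 1] != 0)
        hi++;

    /* Scan backwards until a free slot or the start of the cluster. */
    while (lo > base && map[lo - 1] != 0)
        lo--;

    *first = lo;
    count = hi - lo + 1;
    return count > 1 ? count : 0;
}
```

With a map of {0, 1, 1, 1, 0, 1, 1, 1} and an 8-slot cluster, a target of slot 2 yields the run starting at slot 1 with 3 entries. A scan that began at slot 0 and stopped at the first free entry would have found nothing worth reading around.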
    • swapin needs gfp_mask for loop on tmpfs · 02098fea
      Hugh Dickins authored
      
      Building in a filesystem on a loop device on a tmpfs file can hang when
      swapping, the loop thread caught in that infamous throttle_vm_writeout.
      
      In theory this is a long standing problem, which I've either never seen in
      practice, or long ago suppressed the recollection, after discounting my load
      and my tmpfs size as unrealistically high.  But now, with the new aops, it has
      become easy to hang on one machine.
      
      Loop used to grab_cache_page before the old prepare_write to tmpfs, which
      seems to have been enough to free up some memory for any swapin needed; but
      the new write_begin lets tmpfs find or allocate the page (much nicer, since
      grab_cache_page missed tmpfs pages in swapcache).
      
      When allocating a fresh page, tmpfs respects loop's mapping_gfp_mask, which
      has __GFP_IO|__GFP_FS stripped off, and throttle_vm_writeout is designed to
      break out when __GFP_IO or __GFP_FS is unset; but when tmpfs swaps in,
      read_swap_cache_async allocates with GFP_HIGHUSER_MOVABLE regardless of the
      mapping_gfp_mask - hence the hang.
      
      So, pass gfp_mask down the line from shmem_getpage to shmem_swapin to
      swapin_readahead to read_swap_cache_async to add_to_swap_cache.
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      02098fea
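
The hang described in the gfp_mask entry above comes down to which allocation mask reaches the throttling check. The following is a toy model of that reasoning, assuming made-up flag values and hypothetical helper names (everything prefixed SK_ or sk_ is invented here). It does not reproduce the kernel's gfp definitions or throttle_vm_writeout itself.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int sk_gfp_t;

/* Illustrative bit values only; the kernel's real flag values differ. */
#define SK_GFP_IO               0x1u
#define SK_GFP_FS               0x2u
#define SK_GFP_HIGHUSER_MOVABLE (SK_GFP_IO | SK_GFP_FS | 0x4u)

/* Models the break-out rule from the commit message: throttling only
 * applies when both IO and FS are permitted by the mask. */
static bool sk_may_be_throttled(sk_gfp_t gfp)
{
    return (gfp & (SK_GFP_IO | SK_GFP_FS)) == (SK_GFP_IO | SK_GFP_FS);
}

/* The loop device's mapping mask has IO and FS stripped off. */
static sk_gfp_t sk_loop_mapping_mask(void)
{
    return SK_GFP_HIGHUSER_MOVABLE & ~(SK_GFP_IO | SK_GFP_FS);
}

/* Before the fix: swapin ignored the caller's mask and always used the
 * full mask, so the loop thread could end up throttled. */
static sk_gfp_t sk_swapin_mask_before(sk_gfp_t mapping_mask)
{
    (void)mapping_mask;
    return SK_GFP_HIGHUSER_MOVABLE;
}

/* After the fix: the caller's mask is passed down the call chain. */
static sk_gfp_t sk_swapin_mask_after(sk_gfp_t mapping_mask)
{
    return mapping_mask;
}
```

In this model, sk_may_be_throttled(sk_swapin_mask_before(sk_loop_mapping_mask())) is true while the same call through sk_swapin_mask_after is false, which is the difference the commit relies on.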
  6. Jul 29, 2007
  7. Jul 16, 2007
  8. May 07, 2007
  9. Jan 06, 2007
  10. Dec 08, 2006
  11. Dec 07, 2006
  12. Sep 29, 2006
  13. Aug 27, 2006
  14. Jun 30, 2006
  15. Jun 23, 2006
    • [PATCH] read_mapping_page for address space · 090d2b18
      Pekka Enberg authored
      
      Add read_mapping_page() which is used for callers that pass
      mapping->a_ops->readpage as the filler for read_cache_page.  This removes
      some duplication from filesystem code.
      
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      090d2b18
    • [PATCH] swapoff: use atomic_inc_not_zero() on mm_users · 70af7c5c
      Hugh Dickins authored
      
      Now that we have atomic_inc_not_zero, it's more elegant for try_to_unuse to
      use that on mm_users: doesn't actually matter at present, but safer to be
      sure that once mm_users has gone to 0, nothing raises it for an instant.
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      70af7c5c
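
The atomic_inc_not_zero entry above depends on an increment that refuses to revive a counter that has already reached 0. The following is a minimal userspace sketch of those semantics using C11 atomics. The function name inc_not_zero is hypothetical, and this is not the kernel's implementation, which is architecture-specific.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v unless it is 0. Returns true if the increment happened.
 * Once the counter has dropped to 0, no caller can raise it again, which is
 * the property the commit wants for mm_users. */
static bool inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure, `old` is refreshed with the current value and the
         * zero check is repeated before retrying. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}
```

A plain increment followed by a check would briefly raise a dead counter from 0 to 1; the compare-and-exchange loop never writes anything once it has observed 0.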
    • [PATCH] Swapless page migration: rip out swap based logic · d75a0fcd
      Christoph Lameter authored
      
      Rip the page migration logic out.
      
      Remove all code that has to do with swapping during page migration.
      
      This also guts the ability to migrate pages to swap.  No one used that so let's
      let it go for good.
      
      Page migration should be a bit broken after this patch.
      
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d75a0fcd
    • [PATCH] Swapless page migration: add R/W migration entries · 0697212a
      Christoph Lameter authored
      
      Implement read/write migration ptes
      
      We take the upper two swapfiles for the two types of migration ptes and define
      a series of macros in swapops.h.
      
      The VM is modified to handle the migration entries.  Migration entries can
      only be encountered when the page they are pointing to is locked.  This limits
      the number of places one has to fix.  We also check in copy_pte_range and in
      mprotect_pte_range() for migration ptes.
      
      We check for migration ptes in do_swap_page and call a function that will
      then wait on the page lock.  This allows us to effectively stop all accesses
      to a page.
      
      Migration entries are created by try_to_unmap if called for migration and
      removed by local functions in migrate.c.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration (I've no NUMA, just
        hacking it up to migrate recklessly while running load), I've hit the
        BUG_ON(!PageLocked(p)) in migration_entry_to_page.
      
        This comes from an orphaned migration entry, unrelated to the current
        correctly locked migration, but hit by remove_anon_migration_ptes as it
        checks an address in each vma of the anon_vma list.
      
        Such an orphan may be left behind if an earlier migration raced with fork:
        copy_one_pte can duplicate a migration entry from parent to child, after
        remove_anon_migration_ptes has checked the child vma, but before it has
        removed it from the parent vma.  (If the process were later to fault on this
        orphaned entry, it would hit the same BUG from migration_entry_wait.)
      
        This could be fixed by locking anon_vma in copy_one_pte, but we'd rather
        not.  There's no such problem with file pages, because vma_prio_tree_add
        adds child vma after parent vma, and the page table locking at each end is
        enough to serialize.  Follow that example with anon_vma: add new vmas to the
        tail instead of the head.
      
        (There's no corresponding problem when inserting migration entries,
        because a missed pte will leave the page count and mapcount high, which is
        allowed for.  And there's no corresponding problem when migrating via swap,
        because a leftover swap entry will be correctly faulted.  But the swapless
        method has no refcounting of its entries.)
      
      From: Ingo Molnar <mingo@elte.hu>
      
        pte_unmap_unlock() takes the pte pointer as an argument.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration, gcc has tried to exec
        a pointer instead of a string: smells like COW mappings are not being
        properly write-protected on fork.
      
        The protection in copy_one_pte looks very convincing, until at last you
        realize that the second arg to make_migration_entry is a boolean "write",
        and SWP_MIGRATION_READ is 30.
      
        Anyway, it's better done like in change_pte_range, using
        is_write_migration_entry and make_migration_entry_read.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Remove unnecessary obfuscation from sys_swapon's range check on swap type,
        which blew up causing memory corruption once swapless migration made
        MAX_SWAPFILES no longer 2 ^ MAX_SWAPFILES_SHIFT.
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      From: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      0697212a
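
One of the fixes folded into the migration entries commit above concerns the second argument of make_migration_entry: it is a boolean "write", so passing SWP_MIGRATION_READ (30, nonzero) silently creates a write entry. The following is a self-contained model of that mistake and of the corrected pattern. All sk_ and SK_ names are hypothetical stand-ins, the struct layout is invented, and the shift value of 5 is an assumption chosen to be consistent with the value 30 quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed shift; the top two swap types are taken for migration entries,
 * so the number of usable swapfiles is no longer a power of two. */
#define SK_MAX_SWAPFILES_SHIFT 5
#define SK_MAX_SWAPFILES       ((1u << SK_MAX_SWAPFILES_SHIFT) - 2) /* 30 */
#define SK_SWP_MIGRATION_READ  SK_MAX_SWAPFILES                     /* 30 */
#define SK_SWP_MIGRATION_WRITE (SK_MAX_SWAPFILES + 1)               /* 31 */

/* Invented, unpacked representation of a swap-style entry. */
struct sk_entry {
    unsigned type;
    unsigned long offset;
};

/* The second argument is a boolean, not a swap type. */
static struct sk_entry sk_make_migration_entry(unsigned long pfn, int write)
{
    struct sk_entry e;

    e.type = write ? SK_SWP_MIGRATION_WRITE : SK_SWP_MIGRATION_READ;
    e.offset = pfn;
    return e;
}

static bool sk_is_write_migration_entry(struct sk_entry e)
{
    return e.type == SK_SWP_MIGRATION_WRITE;
}

/* Downgrade an entry to read-only in place, as the fixed fork path does
 * conceptually for COW mappings. */
static void sk_make_migration_entry_read(struct sk_entry *e)
{
    e->type = SK_SWP_MIGRATION_READ;
}
```

Calling sk_make_migration_entry(pfn, SK_SWP_MIGRATION_READ) yields a write entry because 30 is truthy. Testing with sk_is_write_migration_entry and then calling sk_make_migration_entry_read avoids passing a type where a flag is expected.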
    • [PATCH] migration: remove unnecessary PageSwapCache checks · 3c5a87f4
      Christoph Lameter authored
      
      Remove two unnecessary PageSwapCache checks.  The page refcount is raised
      and therefore page migration cannot occur in both functions.
      
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      3c5a87f4
  16. Mar 31, 2006
  17. Mar 23, 2006
    • [PATCH] swsusp: userland interface · 6e1819d6
      Rafael J. Wysocki authored
      
      This patch introduces a user space interface for swsusp.
      
      The interface is based on a special character device, called the snapshot
      device, that allows user space processes to perform suspend and resume-related
      operations with the help of some ioctls and the read()/write() functions.
      Additionally it allows these processes to allocate free swap pages from a
      selected swap partition, called the resume partition, so that they know which
      sectors of the resume partition are available to them.
      
      The interface uses the same low-level system memory snapshot-handling
      functions that are used by the built-in swap-writing/reading code of swsusp.
      
      The interface documentation is included in the patch.
      
      The patch assumes that the major and minor numbers of the snapshot device will
      be 10 (i.e. misc device) and 231, the registration of which has already been
      requested.
      
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      6e1819d6
    • [PATCH] swsusp: low level interface · f577eb30
      Rafael J. Wysocki authored
      
      Introduce the low level interface that can be used for handling the
      snapshot of the system memory by the in-kernel swap-writing/reading code of
      swsusp and the userland interface code (to be introduced shortly).
      
      Also change the way in which swsusp records the allocated swap pages and,
      consequently, simplify the in-kernel swap-writing/reading code (this is
      necessary for the userland interface too).  To this end, introduce two
      helper functions in mm/swapfile.c, so that the swsusp code does not refer
      directly to the swap internals.
      
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f577eb30
  18. Mar 22, 2006
  19. Feb 01, 2006
  20. Jan 19, 2006
  21. Jan 12, 2006
  22. Jan 11, 2006
  23. Jan 09, 2006