Skip to content
Snippets Groups Projects
  1. Sep 30, 2021
  2. Oct 06, 2020
  3. Sep 24, 2020
  4. May 14, 2020
    • Satya Tangirala's avatar
      block: Make blk-integrity preclude hardware inline encryption · d145dc23
      Satya Tangirala authored
      
      Whenever a device supports blk-integrity, make the kernel pretend that
      the device doesn't support inline encryption (essentially by setting the
      keyslot manager in the request queue to NULL).
      
      There's no hardware currently that supports both integrity and inline
      encryption. However, it seems possible that there will be such hardware
      in the near future (like the NVMe key per I/O support that might support
      both inline encryption and PI).
      
      But properly integrating both features is not trivial, and without
      real hardware that implements both, it is difficult to tell if it will
      be done correctly by the majority of hardware that support both.
      So it seems best not to support both features together right now, and
      to decide what to do at probe time.
      
      Signed-off-by: default avatarSatya Tangirala <satyat@google.com>
      Reviewed-by: default avatarEric Biggers <ebiggers@google.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d145dc23
  5. Sep 18, 2019
  6. Jul 15, 2019
  7. Apr 30, 2019
  8. Apr 25, 2019
  9. Sep 24, 2018
  10. May 24, 2018
  11. Jun 09, 2017
  12. Apr 23, 2017
  13. Apr 21, 2017
    • Ilya Dryomov's avatar
      block: get rid of blk_integrity_revalidate() · 19b7ccf8
      Ilya Dryomov authored
      
      Commit 25520d55 ("block: Inline blk_integrity in struct gendisk")
      introduced blk_integrity_revalidate(), which seems to assume ownership
      of the stable pages flag and unilaterally clears it if no blk_integrity
      profile is registered:
      
          if (bi->profile)
                  disk->queue->backing_dev_info->capabilities |=
                          BDI_CAP_STABLE_WRITES;
          else
                  disk->queue->backing_dev_info->capabilities &=
                          ~BDI_CAP_STABLE_WRITES;
      
      It's called from revalidate_disk() and rescan_partitions(), making it
      impossible to enable stable pages for drivers that support partitions
      and don't use blk_integrity: while the call in revalidate_disk() can be
      trivially worked around (see zram, which doesn't support partitions and
      hence gets away with zram_revalidate_disk()), rescan_partitions() can
      be triggered from userspace at any time.  This breaks rbd, where the
      ceph messenger is responsible for generating/verifying CRCs.
      
      Since blk_integrity_{un,}register() "must" be used for (un)registering
      the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
      setting there.  This way drivers that call blk_integrity_register() and
      use integrity infrastructure won't interfere with drivers that don't
      but still want stable pages.
      
      Fixes: 25520d55 ("block: Inline blk_integrity in struct gendisk")
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.4+, needs backporting
      Tested-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      19b7ccf8
  14. Mar 25, 2017
  15. Feb 02, 2017
  16. Oct 21, 2015
  17. Sep 11, 2015
  18. Jun 02, 2015
    • Tejun Heo's avatar
      writeback: separate out include/linux/backing-dev-defs.h · 66114cad
      Tejun Heo authored
      
      With the planned cgroup writeback support, backing-dev related
      declarations will be more widely used across block and cgroup;
      unfortunately, including backing-dev.h from include/linux/blkdev.h
      makes cyclic include dependency quite likely.
      
      This patch separates out backing-dev-defs.h which only has the
      essential definitions and updates blkdev.h to include it.  c files
      which need access to more backing-dev details now include
      backing-dev.h directly.  This takes backing-dev.h off the common
      include dependency chain making it a lot easier to use it across block
      and cgroup.
      
      v2: fs/fat build failure fixed.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      66114cad
  19. Sep 27, 2014
  20. Nov 24, 2013
    • Kent Overstreet's avatar
      bio-integrity: Convert to bvec_iter · d57a5f7c
      Kent Overstreet authored
      
      The bio integrity is also stored in a bvec array, so if we use the bvec
      iter code we just added, the integrity code won't need to implement its
      own iteration stuff (bio_integrity_mark_head(), bio_integrity_mark_tail())
      
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      d57a5f7c
  21. Mar 20, 2013
  22. Feb 22, 2013
    • Darrick J. Wong's avatar
      bdi: allow block devices to say that they require stable page writes · 7d311cda
      Darrick J. Wong authored
      
      This patchset ("stable page writes, part 2") makes some key
      modifications to the original 'stable page writes' patchset.  First, it
      provides creators (devices and filesystems) of a backing_dev_info a flag
      that declares whether or not it is necessary to ensure that page
      contents cannot change during writeout.  It is no longer assumed that
      this is true of all devices (which was never true anyway).  Second, the
      flag is used to relaxed the wait_on_page_writeback calls so that wait
      only occurs if the device needs it.  Third, it fixes up the remaining
      disk-backed filesystems to use this improved conditional-wait logic to
      provide stable page writes on those filesystems.
      
      It is hoped that (for people not using checksumming devices, anyway)
      this patchset will give back unnecessary performance decreases since the
      original stable page write patchset went into 3.0.  Sorry about not
      fixing it sooner.
      
      Complaints were registered by several people about the long write
      latencies introduced by the original stable page write patchset.
      Generally speaking, the kernel ought to allocate as little extra memory
      as possible to facilitate writeout, but for people who simply cannot
      wait, a second page stability strategy is (re)introduced: snapshotting
      page contents.  The waiting behavior is still the default strategy; to
      enable page snapshotting, a superblock flag (MS_SNAP_STABLE) must be
      set.  This flag is used to bandaid^Henable stable page writeback on
      ext3[1], and is not used anywhere else.
      
      Given that there are already a few storage devices and network FSes that
      have rolled their own page stability wait/page snapshot code, it would
      be nice to move towards consolidating all of these.  It seems possible
      that iscsi and raid5 may wish to use the new stable page write support
      to enable zero-copy writeout.
      
      Thank you to Jan Kara for helping fix a couple more filesystems.
      
      Per Andrew Morton's request, here are the result of using dbench to measure
      latencies on ext2:
      
      3.8.0-rc3:
         Operation      Count    AvgLat    MaxLat
         ----------------------------------------
         WriteX        109347     0.028    59.817
         ReadX         347180     0.004     3.391
         Flush          15514    29.828   287.283
      
        Throughput 57.429 MB/sec  4 clients  4 procs  max_latency=287.290 ms
      
      3.8.0-rc3 + patches:
         WriteX        105556     0.029     4.273
         ReadX         335004     0.005     4.112
         Flush          14982    30.540   298.634
      
        Throughput 55.4496 MB/sec  4 clients  4 procs  max_latency=298.650 ms
      
      As you can see, for ext2 the maximum write latency decreases from ~60ms
      on a laptop hard disk to ~4ms.  I'm not sure why the flush latencies
      increase, though I suspect that being able to dirty pages faster gives
      the flusher more work to do.
      
      On ext4, the average write latency decreases as well as all the maximum
      latencies:
      
      3.8.0-rc3:
         WriteX         85624     0.152    33.078
         ReadX         272090     0.010    61.210
         Flush          12129    36.219   168.260
      
        Throughput 44.8618 MB/sec  4 clients  4 procs  max_latency=168.276 ms
      
      3.8.0-rc3 + patches:
         WriteX         86082     0.141    30.928
         ReadX         273358     0.010    36.124
         Flush          12214    34.800   165.689
      
        Throughput 44.9941 MB/sec  4 clients  4 procs  max_latency=165.722 ms
      
      XFS seems to exhibit similar latency improvements as ext2:
      
      3.8.0-rc3:
         WriteX        125739     0.028   104.343
         ReadX         399070     0.005     4.115
         Flush          17851    25.004   131.390
      
        Throughput 66.0024 MB/sec  4 clients  4 procs  max_latency=131.406 ms
      
      3.8.0-rc3 + patches:
         WriteX        123529     0.028     6.299
         ReadX         392434     0.005     4.287
         Flush          17549    25.120   188.687
      
        Throughput 64.9113 MB/sec  4 clients  4 procs  max_latency=188.704 ms
      
      ...and btrfs, just to round things out, also shows some latency
      decreases:
      
      3.8.0-rc3:
         WriteX         67122     0.083    82.355
         ReadX         212719     0.005     2.828
         Flush           9547    47.561   147.418
      
        Throughput 35.3391 MB/sec  4 clients  4 procs  max_latency=147.433 ms
      
      3.8.0-rc3 + patches:
         WriteX         64898     0.101    71.631
         ReadX         206673     0.005     7.123
         Flush           9190    47.963   219.034
      
        Throughput 34.0795 MB/sec  4 clients  4 procs  max_latency=219.044 ms
      
      Before this patchset, all filesystems would block, regardless of whether
      or not it was necessary.  ext3 would wait, but still generate occasional
      checksum errors.  The network filesystems were left to do their own
      thing, so they'd wait too.
      
      After this patchset, all the disk filesystems except ext3 and btrfs will
      wait only if the hardware requires it.  ext3 (if necessary) snapshots
      pages instead of blocking, and btrfs provides its own bdi so the mm will
      never wait.  Network filesystems haven't been touched, so either they
      provide their own wait code, or they don't block at all.  The blocking
      behavior is back to what it was before 3.0 if you don't have a disk
      requiring stable page writes.
      
      This patchset has been tested on 3.8.0-rc3 on x64 with ext3, ext4, and
      xfs.  I've spot-checked 3.8.0-rc4 and seem to be getting the same
      results as -rc3.
      
      [1] The alternative fixes to ext3 include fixing the locking order and
      page bit handling like we did for ext4 (but then why not just use
      ext4?), or setting PG_writeback so early that ext3 becomes extremely
      slow.  I tried that, but the number of write()s I could initiate dropped
      by nearly an order of magnitude.  That was a bit much even for the
      author of the stable page series! :)
      
      This patch:
      
      Creates a per-backing-device flag that tracks whether or not pages must
      be held immutable during writeout.  Eventually it will be used to waive
      wait_for_page_writeback() if nothing requires stable pages.
      
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7d311cda
  23. Oct 31, 2011
  24. Apr 05, 2011
    • Mike Snitzer's avatar
      dm: improve block integrity support · a63a5cf8
      Mike Snitzer authored
      
      The current block integrity (DIF/DIX) support in DM is verifying that
      all devices' integrity profiles match during DM device resume (which
      is past the point of no return).  To some degree that is unavoidable
      (stacked DM devices force this late checking).  But for most DM
      devices (which aren't stacking on other DM devices) the ideal time to
      verify all integrity profiles match is during table load.
      
      Introduce the notion of an "initialized" integrity profile: a profile
      that was blk_integrity_register()'d with a non-NULL 'blk_integrity'
      template.  Add blk_integrity_is_initialized() to allow checking if a
      profile was initialized.
      
      Update DM integrity support to:
      - check all devices with _initialized_ integrity profiles match
        during table load; uninitialized profiles (e.g. for underlying DM
        device(s) of a stacked DM device) are ignored.
      - disallow a table load that would result in an integrity profile that
        conflicts with a DM device's existing (in-use) integrity profile
      - avoid clearing an existing integrity profile
      - validate all integrity profiles match during resume; but if they
        don't all we can do is report the mismatch (during resume we're past
        the point of no return)
      
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
      a63a5cf8
  25. Oct 15, 2010
  26. Sep 10, 2010
  27. Mar 30, 2010
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Guess-its-ok-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
Loading