- Apr 30, 2019
-
-
Christoph Hellwig authored
Various block layer files do not have any licensing information at all. Add SPDX tags for the default kernel GPLv2 license to those. Reviewed-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 22, 2019
-
-
Yufen Yu authored
commit 2da78092 "block: Fix dev_t minor allocation lifetime" specifically moved blk_free_devt(dev->devt) call to part_release() to avoid reallocating device number before the device is fully shutdown. However, it can cause use-after-free on gendisk in get_gendisk(). We use md device as example to show the race scenes: Process1 Worker Process2 md_free blkdev_open del_gendisk add delete_partition_work_fn() to wq __blkdev_get get_gendisk put_disk disk_release kfree(disk) find part from ext_devt_idr get_disk_and_module(disk) cause use after free delete_partition_work_fn put_device(part) part_release remove part from ext_devt_idr Before <devt, hd_struct pointer> is removed from ext_devt_idr by delete_partition_work_fn(), we can find the devt and then access gendisk by hd_struct pointer. But, if we access the gendisk after it have been freed, it can cause in use-after-freeon gendisk in get_gendisk(). We fix this by adding a new helper blk_invalidate_devt() in delete_partition() and del_gendisk(). It replaces hd_struct pointer in idr with value 'NULL', and deletes the entry from idr in part_release() as we do now. Thanks to Jan Kara for providing the solution and more clear comments for the code. Fixes: 2da78092 ("block: Fix dev_t minor allocation lifetime") Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Reviewed-by:
Keith Busch <keith.busch@intel.com> Reviewed-by:
Jan Kara <jack@suse.cz> Suggested-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Yufen Yu <yuyufen@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 15, 2019
-
-
Yufen Yu authored
commit 2da78092 "block: Fix dev_t minor allocation lifetime" specifically moved blk_free_devt(dev->devt) call to part_release() to avoid reallocating device number before the device is fully shutdown. However, it can cause use-after-free on gendisk in get_gendisk(). We use md device as example to show the race scenes: Process1 Worker Process2 md_free blkdev_open del_gendisk add delete_partition_work_fn() to wq __blkdev_get get_gendisk put_disk disk_release kfree(disk) find part from ext_devt_idr get_disk_and_module(disk) cause use after free delete_partition_work_fn put_device(part) part_release remove part from ext_devt_idr Before <devt, hd_struct pointer> is removed from ext_devt_idr by delete_partition_work_fn(), we can find the devt and then access gendisk by hd_struct pointer. But, if we access the gendisk after it have been freed, it can cause in use-after-freeon gendisk in get_gendisk(). We fix this by adding a new helper blk_invalidate_devt() in delete_partition() and del_gendisk(). It replaces hd_struct pointer in idr with value 'NULL', and deletes the entry from idr in part_release() as we do now. Thanks to Jan Kara for providing the solution and more clear comments for the code. Fixes: 2da78092 ("block: Fix dev_t minor allocation lifetime") Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Reviewed-by:
Keith Busch <keith.busch@intel.com> Reviewed-by:
Jan Kara <jack@suse.cz> Suggested-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Yufen Yu <yuyufen@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 12, 2019
-
-
Martin Wilck authored
Drivers now report to the block layer if they support media change events. If this is not the case, there's no need to allocate the event structure, and all event handling code can effectively be skipped. This simplifies code flow in particular for non-removable sd devices. This effectively reverts commit 75e3f3ee ("block: always allocate genhd->ev if check_events is implemented"). The sysfs files for the events are kept in place even if no events are supported, as user space may rely on them being present. The only difference is that an error code is now returned if the user tries to set poll_msecs. Reviewed-by:
Hannes Reinecke <hare@suse.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Martin Wilck <mwilck@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Martin Wilck authored
Currently, an empty disk->events field tells the block layer not to forward media change events to user space. This was done in commit 7c88a168 ("block: don't propagate unlisted DISK_EVENTs to userland") in order to avoid events from "fringe" drivers to be forwarded to user space. By doing so, the block layer lost the information which events were supported by a particular block device, and most importantly, whether or not a given device supports media change events at all. Prepare for not interpreting the "events" field this way in the future any more. This is done by adding an additional field "event_flags" to struct gendisk, and two flag bits that can be set to have the device treated like one that had the "events" field set to a non-zero value before. This applies only to the sd and sr drivers, which are changed to set the new flags. The new flags are DISK_EVENT_FLAG_POLL to enforce polling of the device for synchronous events, and DISK_EVENT_FLAG_UEVENT to tell the blocklayer to generate udev events from kernel events. In order to add the event_flags field to struct gendisk, the events field is converted to an "unsigned short"; it doesn't need to hold values bigger than 2 anyway. This patch doesn't change behavior. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Martin Wilck <mwilck@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Martin Wilck authored
The async_events field, intended to be used for drivers that support asynchronous notifications about disk events (aka media change events), isn't currently used by any driver, and apparently that has been that way for a long time (if not forever). Remove it. Reviewed-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Martin Wilck <mwilck@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Feb 28, 2019
-
-
Keyur Patel authored
Replace hard coded function name register_blkdev with __func__, to improve robustness and to conform to the Linux kernel coding style. Issue found using checkpatch. Signed-off-by:
Keyur Patel <iamkeyur96@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
zhengbin authored
If __device_add_disk-->bdi_register_owner-->bdi_register--> bdi_register_va-->device_create_vargs fails, bdi->dev is still NULL, __device_add_disk-->register_disk will visit bdi->dev->kobj. This patch fixes that. Signed-off-by:
zhengbin <zhengbin13@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 10, 2018
-
-
Mikulas Patocka authored
The previous patches deleted all the code that needed the second value returned from part_in_flight - now the kernel only uses the first value. Consequently, part_in_flight (and blk_mq_in_flight) may be changed so that it only returns one value. This patch just refactors the code, there's no functional change. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Mikulas Patocka authored
Now when part_round_stats is gone, we can switch to per-cpu in-flight counters. We use the local-atomic type local_t, so that if part_inc_in_flight or part_dec_in_flight is reentrantly called from an interrupt, the value will be correct. The other counters could be corrupted due to reentrant interrupt, but the corruption only results in slight counter skew - the in_flight counter must be exact, so it needs local_t. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Mikulas Patocka authored
We want to convert to per-cpu in_flight counters. The function part_round_stats needs the in_flight counter every jiffy, it would be too costly to sum all the percpu variables every jiffy, so it must be deleted. part_round_stats is used to calculate two counters - time_in_queue and io_ticks. time_in_queue can be calculated without part_round_stats, by adding the duration of the I/O when the I/O ends (the value is almost as exact as the previously calculated value, except that time for in-progress I/Os is not counted). io_ticks can be approximated by increasing the value when I/O is started or ended and the jiffies value has changed. If the I/Os take less than a jiffy, the value is as exact as the previously calculated value. If the I/Os take more than a jiffy, io_ticks can drift behind the previously calculated value. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Mike Snitzer authored
All of part_stat_* and related methods are used with preempt disabled, so there is no need to pass cpu around to allow of them. Just call smp_processor_id() as needed. Suggested-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 16, 2018
-
-
Jens Axboe authored
Various spots check for q->mq_ops being non-NULL, but provide a helper to do this instead. Where the ->mq_ops != NULL check is redundant, remove it. Since mq == rq-based now that legacy is gone, get rid of the queue_is_rq_based() and just use queue_is_mq() everywhere. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 28, 2018
-
-
Hannes Reinecke authored
Update device_add_disk() to take an 'groups' argument so that individual drivers can register a device with additional sysfs attributes. This avoids race condition the driver would otherwise have if these groups were to be created with sysfs_add_groups(). Signed-off-by:
Martin Wilck <martin.wilck@suse.com> Signed-off-by:
Hannes Reinecke <hare@suse.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 22, 2018
-
-
Omar Sandoval authored
Klaus Kusche reported that the I/O busy time in /proc/diskstats was not updating properly on 4.18. This is because we started using ktime to track elapsed time, and we convert nanoseconds to jiffies when we update the partition counter. However, this gets rounded down, so any I/Os that take less than a jiffy are not accounted for. Previously in this case, the value of jiffies would sometimes increment while we were doing I/O, so at least some I/Os were accounted for. Let's convert the stats to use nanoseconds internally. We still report milliseconds as before, now more accurately than ever. The value is still truncated to 32 bits for backwards compatibility. Fixes: 522a7775 ("block: consolidate struct request timestamp fields") Cc: stable@vger.kernel.org Reported-by:
Klaus Kusche <klaus.kusche@computerix.info> Signed-off-by:
Omar Sandoval <osandov@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jul 18, 2018
-
-
Michael Callahan authored
Add tracking of REQ_OP_DISCARD ios to the partition statistics and append them to the various stat files in /sys as well as /proc/diskstats. These are tracked with the same four stats as reads and writes: Number of discard ios completed. Number of discard ios merged Number of discard sectors completed Milliseconds spent on discard requests This is done via adding a new STAT_DISCARD define to genhd.h and then using it to index that stat field for discard requests. tj: Refreshed on top of v4.17 and other previous updates. Signed-off-by:
Michael Callahan <michaelcallahan@fb.com> Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Andy Newell <newella@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Michael Callahan authored
Add defines for STAT_READ and STAT_WRITE for indexing the partition stat entries. This clarifies some fs/ code which has hardcoded 1 for STAT_WRITE and will make it easier to extend the stats with additional fields. tj: Refreshed on top of v4.17. Signed-off-by:
Michael Callahan <michaelcallahan@fb.com> Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- May 24, 2018
-
-
Joe Perches authored
Convert the S_<FOO> symbolic permissions to their octal equivalents as using octal and not symbolic permissions is preferred by many as more readable. see: https://lkml.org/lkml/2016/8/2/1945 Done with automated conversion via: $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...> Miscellanea: o Wrapped modified multi-line calls to a single line where appropriate o Realign modified multi-line calls to open parenthesis Signed-off-by:
Joe Perches <joe@perches.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- May 16, 2018
-
-
Christoph Hellwig authored
Variants of proc_create{,_data} that directly take a struct seq_operations argument and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
- Apr 26, 2018
-
-
Omar Sandoval authored
When the blk-mq inflight implementation was added, /proc/diskstats was converted to use it, but /sys/block/$dev/inflight was not. Fix it by adding another helper to count in-flight requests by data direction. Fixes: f299b7c7 ("blk-mq: provide internal in-flight variant") Signed-off-by:
Omar Sandoval <osandov@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Mar 15, 2018
-
-
Srivatsa S. Bhat authored
register_blkdev() and __register_chrdev_region() treat the major number as an unsigned int. So print it the same way to avoid absurd error statements such as: "... major requested (-1) is greater than the maximum (511) ..." (and also fix off-by-one bugs in the error prints). While at it, also update the comment describing register_blkdev(). Signed-off-by:
Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by:
Logan Gunthorpe <logang@deltatee.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Feb 26, 2018
-
-
Jan Kara authored
When two blkdev_open() calls for a partition race with device removal and recreation, we can hit BUG_ON(!bd_may_claim(bdev, whole, holder)) in blkdev_open(). The race can happen as follows: CPU0 CPU1 CPU2 del_gendisk() bdev_unhash_inode(part1); blkdev_open(part1, O_EXCL) blkdev_open(part1, O_EXCL) bdev = bd_acquire() bdev = bd_acquire() blkdev_get(bdev) bd_start_claiming(bdev) - finds old inode 'whole' bd_prepare_to_claim() -> 0 bdev_unhash_inode(whole); <device removed> <new device under same number created> blkdev_get(bdev); bd_start_claiming(bdev) - finds new inode 'whole' bd_prepare_to_claim() - this also succeeds as we have different 'whole' here... - bad things happen now as we have two exclusive openers of the same bdev The problem here is that block device opens can see various intermediate states while gendisk is shutting down and then being recreated. We fix the problem by introducing new lookup_sem in gendisk that synchronizes gendisk deletion with get_gendisk() and furthermore by making sure that get_gendisk() does not return gendisk that is being (or has been) deleted. This makes sure that once we ever manage to look up newly created bdev inode, we are also guaranteed that following get_gendisk() will either return failure (and we fail open) or it returns gendisk for the new device and following bdget_disk() will return new bdev inode (i.e., blkdev_open() follows the path as if it is completely run after new device is created). Reported-and-analyzed-by:
Hou Tao <houtao1@huawei.com> Tested-by:
Hou Tao <houtao1@huawei.com> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jan Kara authored
Add a proper counterpart to get_disk_and_module() - put_disk_and_module(). Currently it is opencoded in several places. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jan Kara authored
Rename get_disk() to get_disk_and_module() to make sure what the function does. It's not a great name but at least it is now clear that put_disk() is not it's counterpart. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jan Kara authored
Commit 8ddcd653 "block: introduce GENHD_FL_HIDDEN" added handling of hidden devices to get_gendisk() but forgot to drop module reference which is also acquired by get_disk(). Drop the reference as necessary. Arguably the function naming here is misleading as put_disk() is *not* the counterpart of get_disk() but let's fix that in the follow up commit since that will be more intrusive. Fixes: 8ddcd653 CC: Christoph Hellwig <hch@lst.de> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 15, 2018
-
-
Mike Snitzer authored
Since I can remember DM has forced the block layer to allow the allocation and initialization of the request_queue to be distinct operations. Reason for this is block/genhd.c:add_disk() has requires that the request_queue (and associated bdi) be tied to the gendisk before add_disk() is called -- because add_disk() also deals with exposing the request_queue via blk_register_queue(). DM's dynamic creation of arbitrary device types (and associated request_queue types) requires the DM device's gendisk be available so that DM table loads can establish a master/slave relationship with subordinate devices that are referenced by loaded DM tables -- using bd_link_disk_holder(). But until these DM tables, and their associated subordinate devices, are known DM cannot know what type of request_queue it needs -- nor what its queue_limits should be. This chicken and egg scenario has created all manner of problems for DM and, at times, the block layer. Summary of changes: - Add device_add_disk_no_queue_reg() and add_disk_no_queue_reg() variant that drivers may use to add a disk without also calling blk_register_queue(). Driver must call blk_register_queue() once its request_queue is fully initialized. - Return early from blk_unregister_queue() if QUEUE_FLAG_REGISTERED is not set. It won't be set if driver used add_disk_no_queue_reg() but driver encounters an error and must del_gendisk() before calling blk_register_queue(). - Export blk_register_queue(). These changes allow DM to use add_disk_no_queue_reg() to anchor its gendisk as the "master" for master/slave relationships DM must establish with subordinate devices referenced in DM tables that get loaded. Once all "slave" devices for a DM device are known its request_queue can be properly initialized and then advertised via sysfs -- important improvement being that no request_queue resource initialization performed by blk_register_queue() is missed for DM devices anymore. Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Mike Snitzer authored
device_add_disk() will only call bdi_register_owner() if !GENHD_FL_HIDDEN, so it follows that del_gendisk() should only call bdi_unregister() if !GENHD_FL_HIDDEN. Found with code inspection. bdi_unregister() won't do any harm if bdi_register_owner() wasn't used but best to avoid the unnecessary call to bdi_unregister(). Fixes: 8ddcd653 ("block: introduce GENHD_FL_HIDDEN") Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Hannes Reinecke <hare@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 19, 2017
-
-
Randy Dunlap authored
Fix typo in error message. Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
weiping zhang authored
device_add_disk need do more safety error handle, so this patch just add WARN_ON. Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
weiping zhang <zhangweiping@didichuxing.com> Adapted for current series by me. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 11, 2017
-
-
Colin Ian King authored
It is possible that the pointer disk can be null and hence we can get a null pointer deference when accessing disk->flags. Add a null pointer check to avoid the dereference. Detected by CoverityScan, CID#1461133 ("Explicit null dereferenced") Fixes: 8ddcd653 ("block: introduce GENHD_FL_HIDDEN") Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Colin Ian King <colin.king@canonical.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Hannes Reinecke authored
When creating nvme multipath devices we should populate the 'slaves' and 'holders' directorys properly to aid userspace topology detection. Signed-off-by:
Hannes Reinecke <hare@suse.com> [hch: split from a larger patch] Reviewed-by:
Keith Busch <keith.busch@intel.com> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 03, 2017
-
-
Christoph Hellwig authored
With this flag a driver can create a gendisk that can be used for I/O submission inside the kernel, but which is not registered as user facing block device. This will be useful for the NVMe multipath implementation. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The hidden gendisks introduced in the next patch need to keep the dev field in their struct device empty so that udev won't try to create block device nodes for them. To support that rewrite disk_devt to look at the major and first_minor fields in the gendisk itself instead of looking into the struct device. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by:
Hannes Reinecke <hare@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 26, 2017
-
-
Byungchul Park authored
Darrick posted the following warning and Dave Chinner analyzed it: > ====================================================== > WARNING: possible circular locking dependency detected > 4.14.0-rc1-fixes #1 Tainted: G W > ------------------------------------------------------ > loop0/31693 is trying to acquire lock: > (&(&ip->i_mmaplock)->mr_lock){++++}, at: [<ffffffffa00f1b0c>] xfs_ilock+0x23c/0x330 [xfs] > > but now in release context of a crosslock acquired at the following: > ((complete)&ret.event){+.+.}, at: [<ffffffff81326c1f>] submit_bio_wait+0x7f/0xb0 > > which lock already depends on the new lock. > > the existing dependency chain (in reverse order) is: > > -> #2 ((complete)&ret.event){+.+.}: > lock_acquire+0xab/0x200 > wait_for_completion_io+0x4e/0x1a0 > submit_bio_wait+0x7f/0xb0 > blkdev_issue_zeroout+0x71/0xa0 > xfs_bmapi_convert_unwritten+0x11f/0x1d0 [xfs] > xfs_bmapi_write+0x374/0x11f0 [xfs] > xfs_iomap_write_direct+0x2ac/0x430 [xfs] > xfs_file_iomap_begin+0x20d/0xd50 [xfs] > iomap_apply+0x43/0xe0 > dax_iomap_rw+0x89/0xf0 > xfs_file_dax_write+0xcc/0x220 [xfs] > xfs_file_write_iter+0xf0/0x130 [xfs] > __vfs_write+0xd9/0x150 > vfs_write+0xc8/0x1c0 > SyS_write+0x45/0xa0 > entry_SYSCALL_64_fastpath+0x1f/0xbe > > -> #1 (&xfs_nondir_ilock_class){++++}: > lock_acquire+0xab/0x200 > down_write_nested+0x4a/0xb0 > xfs_ilock+0x263/0x330 [xfs] > xfs_setattr_size+0x152/0x370 [xfs] > xfs_vn_setattr+0x6b/0x90 [xfs] > notify_change+0x27d/0x3f0 > do_truncate+0x5b/0x90 > path_openat+0x237/0xa90 > do_filp_open+0x8a/0xf0 > do_sys_open+0x11c/0x1f0 > entry_SYSCALL_64_fastpath+0x1f/0xbe > > -> #0 (&(&ip->i_mmaplock)->mr_lock){++++}: > up_write+0x1c/0x40 > xfs_iunlock+0x1d0/0x310 [xfs] > xfs_file_fallocate+0x8a/0x310 [xfs] > loop_queue_work+0xb7/0x8d0 > kthread_worker_fn+0xb9/0x1f0 > > Chain exists of: > &(&ip->i_mmaplock)->mr_lock --> &xfs_nondir_ilock_class --> (complete)&ret.event > > Possible unsafe locking scenario by crosslock: > > CPU0 CPU1 > ---- ---- > lock(&xfs_nondir_ilock_class); > lock((complete)&ret.event); > lock(&(&ip->i_mmaplock)->mr_lock); > unlock((complete)&ret.event); > > *** DEADLOCK *** The warning is a false positive, caused by the fact that all wait_for_completion()s in submit_bio_wait() are waiting with the same lock class. However, some bios have nothing to do with others, for example in the case of loop devices, there's no direct connection between the bios of an upper device and the bios of a lower device(=loop device). The safest way to assign different lock classes to different devices is to do it for each gendisk. In other words, this patch assigns a lockdep_map per gendisk and uses it when initializing completion in submit_bio_wait(). Analyzed-by:
Dave Chinner <david@fromorbit.com> Reported-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Byungchul Park <byungchul.park@lge.com> Reviewed-by:
Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: amir73il@gmail.com Cc: axboe@kernel.dk Cc: david@fromorbit.com Cc: hch@infradead.org Cc: idryomov@gmail.com Cc: johan@kernel.org Cc: johannes.berg@intel.com Cc: kernel-team@lge.com Cc: linux-block@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-xfs@vger.kernel.org Cc: oleg@redhat.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/1508921765-15396-10-git-send-email-byungchul.park@lge.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Aug 23, 2017
-
-
Christoph Hellwig authored
This helper allows looking up a partion under RCU protection without grabbing a reference to it. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Aug 18, 2017
-
-
Bart Van Assche authored
Annotate gendisk.part_tbl and disk_part_tbl.part dereferences with rcu_dereference_protected(). This patch does not change the behavior of the modified code but ensures that sparse does not complain about disk->part_tbl manipulations nor about part_tbl->part accesses. Additionally, improve documentation of the locking requirements of the modified functions. Signed-off-by:
Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by:
Hannes Reinecke <hare@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Aug 09, 2017
-
-
Jens Axboe authored
We don't have to inc/dec some counter, since we can just iterate the tags. That makes inc/dec a noop, but means we have to iterate busy tags to get an in-flight count. Reviewed-by:
Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by:
Omar Sandoval <osandov@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Instead of returning the count that matches the partition, pass in an array of two ints. Index 0 will be filled with the inflight count for the partition in question, and index 1 will filled with the root inflight count, if the partition passed in is not the root. This is in preparation for being able to calculate both in one go. Reviewed-by:
Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by:
Omar Sandoval <osandov@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
No functional change in this patch, just in preparation for basing the inflight mechanism on the queue in question. Reviewed-by:
Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by:
Omar Sandoval <osandov@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-