- Mar 30, 2021
-
-
Daniel Wagner authored
[ Upstream commit 9ec49144 ] register_disk() suppress uevents for devices with the GENHD_FL_HIDDEN but enables uevents at the end again in order to announce disk after possible partitions are created. When the device is removed the uevents are still on and user land sees 'remove' messages for devices which were never 'add'ed to the system. KERNEL[95481.571887] remove /devices/virtual/nvme-fabrics/ctl/nvme5/nvme0c5n1 (block) Let's suppress the uevents for GENHD_FL_HIDDEN by not enabling the uevents at all. Signed-off-by:
Daniel Wagner <dwagner@suse.de> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Martin Wilck <mwilck@suse.com> Link: https://lore.kernel.org/r/20210311151917.136091-1-dwagner@suse.de Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- Jan 17, 2021
-
-
Ming Lei authored
commit aebf5db9 upstream. Make sure that bdgrab() is done on the 'block_device' instance before referring to it for avoiding use-after-free. Cc: <stable@vger.kernel.org> Reported-by:
<syzbot+825f0f9657d4e528046e@syzkaller.appspotmail.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Nov 12, 2020
-
-
Christoph Hellwig authored
Return if the function ended up sending an uevent or not. Cc: stable@vger.kernel.org # v5.9 Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Petr Vorel <pvorel@suse.cz> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 05, 2020
-
-
Christoph Hellwig authored
All remaining callers of bdget() outside of fs/block_dev.c want to get a reference to the struct block_device for a given struct hd_struct. Add a helper just for that and then mark bdget static. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 25, 2020
-
-
Christoph Hellwig authored
No need to go through the hd_struct to find the partition number. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 23, 2020
-
-
Christoph Hellwig authored
Use blkdev_get_by_dev instead of open coding it using bdget_disk + blkdev_get, and split the code to read the partition table into a separate helper to make it a little more obvious. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We can only scan for partitions on the whole disk, so move the flag from struct block_device to struct gendisk. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 10, 2020
-
-
Christoph Hellwig authored
Like check_disk_changed, except that it does not call ->revalidate_disk but leaves that to the caller. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 02, 2020
-
-
Christoph Hellwig authored
Only virtio_blk and xen-blkfront set the revalidate argument to true, and both do not implement the ->revalidate_disk method. So switch to the helper that just updates the size instead. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Replace bd_invalidate with a new BDEV_NEED_PART_SCAN flag in a bd_flags variable to better describe the condition. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We can trivially derive the gendisk from the hd_struct. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 01, 2020
-
-
Christoph Hellwig authored
Use early returns and goto-based unwinding to simplify the flow a bit. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jul 31, 2020
-
-
Randy Dunlap authored
Drop the repeated word "to" in multiple places. Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jul 18, 2020
-
-
Boris Burkov authored
In order to improve consistency and usability in cgroup stat accounting, we would like to support the root cgroup's io.stat. Since the root cgroup has processes doing io even if the system has no explicitly created cgroups, we need to be careful to avoid overhead in that case. For that reason, the rstat algorithms don't handle the root cgroup, so just turning the file on wouldn't give correct statistics. To get around this, we simulate flushing the iostat struct by filling it out directly from global disk stats. The result is a root cgroup io.stat file consistent with both /proc/diskstats and io.stat. Note that in order to collect the disk stats, we needed to iterate over devices. To facilitate that, we had to change the linkage of a disk_type to external so that it can be used from blk-cgroup.c to iterate over disks. Suggested-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Boris Burkov <boris@bur.io> Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jul 08, 2020
-
-
Christoph Hellwig authored
md is the last driver using the legacy media_changed method. Switch it over to (not so) new ->clear_events approach, which also removes the need for the ->revalidate_disk method. Signed-off-by:
Christoph Hellwig <hch@lst.de> [axboe: remove unused 'bdops' variable in disk_clear_events()] Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jun 24, 2020
-
-
Luis Chamberlain authored
Commit dc9edc44 ("block: Fix a blk_exit_rl() regression") merged on v4.12 moved the work behind blk_release_queue() into a workqueue after a splat floated around which indicated some work on blk_release_queue() could sleep in blk_exit_rl(). This splat would be possible when a driver called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue() as its final call) from an atomic context. blk_put_queue() decrements the refcount for the request_queue kobject, and upon reaching 0 blk_release_queue() is called. Although blk_exit_rl() is now removed through commit db6d9952 ("block: remove request_list code") on v5.0, we reserve the right to be able to sleep within blk_release_queue() context. The last reference for the request_queue must not be called from atomic context. *When* the last reference to the request_queue reaches 0 varies, and so let's take the opportunity to document when that is expected to happen and also document the context of the related calls as best as possible so we can avoid future issues, and with the hopes that the synchronous request_queue removal sticks. We revert back to synchronous request_queue removal because asynchronous removal creates a regression with expected userspace interaction with several drivers. An example is when removing the loopback driver, one uses ioctls from userspace to do so, but upon return and if successful, one expects the device to be removed. Likewise if one races to add another device the new one may not be added as it is still being removed. This was expected behavior before and it now fails as the device is still present and busy still. Moving to asynchronous request_queue removal could have broken many scripts which relied on the removal to have been completed if there was no error. Document this expectation as well so that this doesn't regress userspace again. Using asynchronous request_queue removal however has helped us find other bugs. In the future we can test what could break with this arrangement by enabling CONFIG_DEBUG_KOBJECT_RELEASE. While at it, update the docs with the context expectations for the request_queue / gendisk refcount decrement, and make these expectations explicit by using might_sleep(). Fixes: dc9edc44 ("block: Fix a blk_exit_rl() regression") Suggested-by:
Nicolai Stange <nstange@suse.de> Signed-off-by:
Luis Chamberlain <mcgrof@kernel.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: yu kuai <yukuai3@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Luis Chamberlain authored
Let us clarify the context under which the helpers to increment the refcount for the gendisk and request_queue can be called under. We make this explicit on the places where we may sleep with might_sleep(). We don't address the decrement context yet, as that needs some extra work and fixes, but will be addressed in the next patch. Signed-off-by:
Luis Chamberlain <mcgrof@kernel.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Luis Chamberlain authored
This adds documentation for the gendisk / request_queue refcount helpers. Signed-off-by:
Luis Chamberlain <mcgrof@kernel.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- May 27, 2020
-
-
Konstantin Khlebnikov authored
The RCU lock is required only in disk_map_sector_rcu() to lookup the partition. After that request holds reference to related hd_struct. Replace get_cpu() with preempt_disable() - returned cpu index is unused. [hch: rebased] Signed-off-by:
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
percpu variables have a perfectly fine working stub implementation for UP kernels, so use that. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- May 19, 2020
-
-
Christoph Hellwig authored
part_inc_in_flight and part_dec_in_flight only have one caller each, and those callers are purely for bio based drivers. Merge each function into the only caller, and remove the superflous blk-mq checks. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Don't bother to call part_in_flight / part_in_flight_rw on blk-mq devices, just call the blk-mq versions directly. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- May 13, 2020
-
-
Ming Lei authored
gendisk can't be gone when there is IO activity, so not hold part0's refcount in IO path. Signed-off-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Christoph Hellwig <hch@infradead.org> Cc: Yufen Yu <yuyufen@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hou Tao <houtao1@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
The seqcount of 'nr_sects_seq' is only needed in case of 32bit SMP, so define it just for 32bit SMP. Signed-off-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Christoph Hellwig <hch@infradead.org> Cc: Yufen Yu <yuyufen@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hou Tao <houtao1@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
delete_partition() clears the cached last_lookup partition. However the .last_lookup cache may be overwritten by one IO path after it is cleared from delete_partition(). Then another IO path may use the cached deleting partition after hd_struct_free() is called, then use-after-free is triggered on the cached partition. Fixes the issue by the following approach: 1) always get the partition's refcount via hd_struct_try_get() before setting .last_lookup 2) move clearing .last_lookup from delete_partition() to hd_struct_free() which is the release handle of the partition's percpu-refcount, so that no IO path can cache deleteing partition via .last_lookup. It is one candidate approach of Yufen's patch[1] which adds overhead in fast path by indirect lookup which may introduce one extra cacheline in IO path. Also this patch relies on percpu-refcount's protection, and it is easier to understand and verify. [1] https://lore.kernel.org/linux-block/20200109013551.GB9655@ming.t460p/T/#t Reported-by:
Yufen Yu <yuyufen@huawei.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Christoph Hellwig <hch@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hou Tao <houtao1@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- May 09, 2020
-
-
Christoph Hellwig authored
Split out a new bdi_set_owner helper to set the owner, and move the policy for creating the bdi name back into genhd.c, where it belongs. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 20, 2020
-
-
Christoph Hellwig authored
invalidate_partition and bdev_unhash_inode are always paired, and invalidate_partition already does an icache lookup for the block device inode. Piggy back on that to remove the inode from the hash. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
invalidate_partition is only used in genhd.c, so mark it static. Also drop the return value given that is is always ignored. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
All callers have the hd_struct at hand, so pass it instead of performing another lookup. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Mar 27, 2020
-
-
Christoph Hellwig authored
There really isn't any good reason to stash a method directly into struct gendisk. Move it together with the other block device operations. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Mar 25, 2020
-
-
Christoph Hellwig authored
get_gendisk is not used by any modular code. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
disk_map_sector_rcu is not used by any modular code. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
disk_get_part is not used by any modular code. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Konstantin Khlebnikov authored
Column "time_in_queue" in diskstats is supposed to show total waiting time of all requests. I.e. value should be equal to the sum of times from other columns. But this is not true, because column "time_in_queue" is counted separately in jiffies rather than in nanoseconds as other times. This patch removes redundant counter for "time_in_queue" and shows total time of read, write, discard and flush requests. Signed-off-by:
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Konstantin Khlebnikov authored
Reading /proc/diskstats iterates over all cpus for summing each field. It's faster to sum all fields in one pass. Hammering /proc/diskstats with fio shows 2x performance improvement: fio --name=test --numjobs=$JOBS --filename=/proc/diskstats \ --size=1k --bs=1k --fallocate=none --create_on_open=1 \ --time_based=1 --runtime=10 --invalidate=0 --group_report JOBS=1 JOBS=10 Before: 7k iops 64k iops After: 18k iops 120k iops Also this way code is more compact: add/remove: 1/0 grow/shrink: 0/2 up/down: 194/-1540 (-1346) Function old new delta part_stat_read_all - 194 +194 diskstats_show 1344 631 -713 part_stat_show 1219 392 -827 Total: Before=14966947, After=14965601, chg -0.01% Signed-off-by:
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Mar 24, 2020
-
-
Christoph Hellwig authored
Move the sysfs _show methods that are used both on the full disk and partition nodes to genhd.c instead of hiding them in the partitioning code. Also move the declaration for these methods to block/blk.h so that we don't expose them to drivers. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Thes functions aren't really related to partition support, so move them to a more suitable place. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-