- Sep 15, 2021
-
-
Pavel Begunkov authored
commit d9cf3bd5 upstream. __bio_iov_append_get_pages() doesn't put the pages that were not appended when bio_add_hw_page() fails, potentially leaking them. Fix it. Also do the same for __bio_iov_iter_get_pages(), even though it looks like the failure can't be triggered by userspace in that case. Fixes: 0512a75b ("block: Introduce REQ_OP_ZONE_APPEND") Cc: stable@vger.kernel.org # 5.8+ Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1edfa6a2ffd66d55e6345a477df5387d2c1415d0.1626653825.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
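The leak described in this commit can be illustrated with a small user-space model. This is only a sketch: `fake_page`, `fake_bio`, `fake_add_page()`, `fake_put_page()` and `append_pinned_pages()` are hypothetical stand-ins, not the kernel's types or functions. It shows the idea of the fix, which is to drop the reference on every pinned page the bio did not accept.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only a reference count. */
struct fake_page {
	int refcount;
};

/* Hypothetical stand-in for a bio with a fixed capacity in pages. */
struct fake_bio {
	size_t nr_pages;
	size_t max_pages;
};

/* Models put_page(): drop the reference taken when the page was pinned. */
static void fake_put_page(struct fake_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Models bio_add_hw_page(): returns 0 when the bio cannot take the page. */
static int fake_add_page(struct fake_bio *bio, struct fake_page *page)
{
	(void)page;
	if (bio->nr_pages >= bio->max_pages)
		return 0;
	bio->nr_pages++;
	return 1;
}

/*
 * Append up to @n pinned pages. On failure, release every page that was
 * not appended (including the one that failed) so no reference leaks.
 * Returns the number of pages appended.
 */
static size_t append_pinned_pages(struct fake_bio *bio,
				  struct fake_page *pages, size_t n)
{
	size_t i, appended;

	for (i = 0; i < n; i++) {
		if (!fake_add_page(bio, &pages[i]))
			break;
	}
	appended = i;

	/* The fix: put the pages the bio never took ownership of. */
	for (; i < n; i++)
		fake_put_page(&pages[i]);

	return appended;
}
```

Without the second loop, the pages after the failure point would keep their pinned reference forever, which is the leak the commit message describes.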
-
- Apr 16, 2021
-
-
Yufen Yu authored
[ Upstream commit 3edf5346 ] For multiple split bios, if one of the bios fails, the whole I/O should return an error to the application. But we found a race between bio_integrity_verify_fn and bio completion, which returns I/O success to the application after one of the bios has failed. The race is as follows:

split bio(READ)                          kworker

nvme_complete_rq
blk_update_request //split error=0
  bio_endio
    bio_integrity_endio
      queue_work(kintegrityd_wq, &bip->bip_work);

                                         bio_integrity_verify_fn
                                         bio_endio //split bio
                                          __bio_chain_endio
                                             if (!parent->bi_status)

                               <interrupt entry>
                               nvme_irq
                                blk_update_request //parent error=7
                                req_bio_endio
                                   bio->bi_status = 7 //parent bio
                               <interrupt exit>

                                               parent->bi_status = 0
                                        parent->bi_end_io() // return bi_status=0

The bio has been split in two: split and parent. When the split bio completes, it depends on the kworker to do endio, while bio_integrity_verify_fn is interrupted by the irq handler completing the parent bio. Then the parent bio->bi_status that was set in the irq handler is overwritten by the kworker. In fact, even without the above race, we also need to consider concurrency between multiple split bios completing and updating the same parent bi_status. Normally, multiple split bios are issued to the same hctx and complete from the same irq vector. But if the queue map has been updated between multiple split bios, these bios may complete on different hw queues and different irq vectors. Then concurrent updates of the parent bi_status may make the final status wrong. Suggested-by:
Keith Busch <kbusch@kernel.org> Signed-off-by:
Yufen Yu <yuyufen@huawei.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210331115359.1125679-1-yuyufen@huawei.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
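The status-propagation rule this commit argues for can be sketched in plain C. This is a single-threaded model under stated assumptions: `fake_status_t`, `fake_bio` and `chain_endio_update()` are hypothetical names, and the sketch does not claim to reproduce the exact upstream change. The point is that a successful child never writes to the parent's status, so it cannot erase an error recorded by another completion.

```c
#include <stdint.h>

/* Hypothetical stand-in for blk_status_t: 0 means success. */
typedef uint8_t fake_status_t;

struct fake_bio {
	fake_status_t status;
};

/*
 * Propagate a child's completion status to its parent. The parent is
 * written only when the child actually failed and no earlier error is
 * recorded, so a successful child never touches the parent's status.
 */
static void chain_endio_update(struct fake_bio *parent,
			       const struct fake_bio *child)
{
	if (child->status && !parent->status)
		parent->status = child->status;
}
```

In the race shown above, the kworker completing the successful split bio would then skip the store entirely, leaving the error value 7 set by the interrupt handler in place.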
-
- Oct 28, 2020
-
-
Naohiro Aota authored
When the bio's size reaches max_append_sectors, bio_add_hw_page returns 0 and then __bio_iov_append_get_pages returns -EINVAL. This is an expected result of building a bio small enough not to be split in the IO path. However, iov_iter is not advanced in this case, so the same pages are filled into the bio again and again. Fix the case by properly advancing the iov_iter for already processed pages. Fixes: 0512a75b ("block: Introduce REQ_OP_ZONE_APPEND") Cc: stable@vger.kernel.org # 5.8+ Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
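A minimal user-space sketch of the iterator handling described in this commit follows. The names `fake_iter`, `fake_bio`, `fill_bio()` and `FAKE_PAGE_SIZE` are illustrative assumptions, and the iterator is reduced to a byte counter. The sketch shows why the iterator must be advanced on the error path too: otherwise a retry with a new bio starts from the same data.

```c
#include <errno.h>
#include <stddef.h>

#define FAKE_PAGE_SIZE 4096u

/* Hypothetical, much-simplified iterator: just a count of bytes left. */
struct fake_iter {
	size_t remaining;
};

struct fake_bio {
	size_t size;      /* bytes already in the bio */
	size_t max_size;  /* models max_append_sectors, in bytes */
};

/*
 * Fill @bio from @iter one page at a time. Returns 0 on success or
 * -EINVAL when the bio reached its limit. In both cases the iterator is
 * advanced past the bytes that were actually added.
 */
static int fill_bio(struct fake_bio *bio, struct fake_iter *iter)
{
	size_t added = 0;
	int ret = 0;

	while (added < iter->remaining) {
		size_t len = iter->remaining - added;

		if (len > FAKE_PAGE_SIZE)
			len = FAKE_PAGE_SIZE;
		if (bio->size + len > bio->max_size) {
			ret = -EINVAL;
			break;
		}
		bio->size += len;
		added += len;
	}

	/* The fix: advance even when returning an error. */
	iter->remaining -= added;
	return ret;
}
```

A caller that sees -EINVAL can submit the full bio and call `fill_bio()` again with a fresh one; the second call then picks up the bytes that did not fit.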
-
- Oct 15, 2020
-
-
Mauro Carvalho Chehab authored
Fix this warning: ./block/bio.c:1098: WARNING: Inline emphasis start-string without end-string. The thing is that *iter is not a valid markup. That seems to be a typo: *iter -> @iter Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
-
Mauro Carvalho Chehab authored
Using "@bio's parent" causes the following warning: ./block/bio.c:10: WARNING: Inline emphasis start-string without end-string. The main problem here is that this would be converted into: **bio**'s parent By kernel-doc, which is not a valid notation. It would be possible to use, instead, this kernel-doc markup: ``bio's`` parent Yet, here, it is probably simpler to just use alternative wording: the parent of @bio Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
-
- Oct 05, 2020
-
-
Eric Biggers authored
bio_crypt_clone() assumes its gfp_mask argument always includes __GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed. However, bio_crypt_clone() might be called with GFP_ATOMIC via setup_clone() in drivers/md/dm-rq.c, or with GFP_NOWAIT via kcryptd_io_read() in drivers/md/dm-crypt.c. Neither case is currently reachable with a bio that actually has an encryption context. However, it's fragile to rely on this. Just make bio_crypt_clone() able to fail, analogous to bio_integrity_clone(). Reported-by:
Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
Eric Biggers <ebiggers@google.com> Reviewed-by:
Mike Snitzer <snitzer@redhat.com> Reviewed-by:
Satya Tangirala <satyat@google.com> Cc: Satya Tangirala <satyat@google.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 09, 2020
-
-
Ritesh Harjani authored
If we hit the UINT_MAX limit of bio->bi_iter.bi_size and so are not merging this page into this bio anyway, then it makes sense to also set same_page to false before returning. Without this patch, we hit the WARNING below in iomap. This mostly happens on systems with very large memory and/or after tweaking the vm dirty threshold params to delay writeback of dirty data.

WARNING: CPU: 18 PID: 5130 at fs/iomap/buffered-io.c:74 iomap_page_release+0x120/0x150
CPU: 18 PID: 5130 Comm: fio Kdump: loaded Tainted: G W 5.8.0-rc3 #6
Call Trace:
 __remove_mapping+0x154/0x320 (unreliable)
 iomap_releasepage+0x80/0x180
 try_to_release_page+0x94/0xe0
 invalidate_inode_page+0xc8/0x110
 invalidate_mapping_pages+0x1dc/0x540
 generic_fadvise+0x3c8/0x450
 xfs_file_fadvise+0x2c/0xe0 [xfs]
 vfs_fadvise+0x3c/0x60
 ksys_fadvise64_64+0x68/0xe0
 sys_fadvise64+0x28/0x40
 system_call_exception+0xf8/0x1c0
 system_call_common+0xf0/0x278

Fixes: cc90bc68 ("block: fix "check bi_size overflow before merge"") Reported-by:
Shivaprasad G Bhat <sbhat@linux.ibm.com> Suggested-by:
Christoph Hellwig <hch@infradead.org> Signed-off-by:
Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by:
Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
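The interaction between the size-overflow check and the same_page output can be sketched as below. `try_merge()` and its parameters are hypothetical; the real helper takes a bio and a page. The sketch only models the rule from the commit message: if the merge is refused because bi_size would exceed UINT_MAX, same_page must be reported as false.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical model of the merge attempt. @bi_size is the bio's current
 * byte count, @len the bytes to merge, @candidate_same_page whether the
 * data would land in the page ending the last segment, and @same_page
 * the value reported back to the caller.
 */
static bool try_merge(unsigned int *bi_size, unsigned int len,
		      bool candidate_same_page, bool *same_page)
{
	*same_page = candidate_same_page;

	/* Adding @len would overflow the 32-bit size: refuse the merge. */
	if (*bi_size > UINT_MAX - len) {
		*same_page = false;	/* nothing was merged, so say so */
		return false;
	}

	*bi_size += len;
	return true;
}
```

A caller that adjusts page reference or usage counts based on same_page would otherwise act on a merge that never happened, which is consistent with the iomap warning quoted in the commit message.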
-
- Aug 18, 2020
-
-
Matthew Wilcox (Oracle) authored
If we pass in an offset which is larger than PAGE_SIZE, then page_is_mergeable() thinks it's not mergeable with the previous bio_vec, leading to a large number of bio_vecs being used. Use a slightly more obvious test that the two pages are compatible with each other. Fixes: 52d52d1c ("block: only allow contiguous page structs in a bio_vec") Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jul 31, 2020
-
-
Randy Dunlap authored
Drop the repeated words "a" and "the". Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jul 01, 2020
-
-
Christoph Hellwig authored
generic_make_request has always been very confusingly misnamed, so rename it to submit_bio_noacct to make it clear that it is submit_bio minus accounting and a few checks. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jun 29, 2020
-
-
Christoph Hellwig authored
Keep the cgroup code together. Acked-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bio_associate_blkg_from_page is a special purpose helper for swap bios that doesn't need access to bio internals. Move it to the swap code instead of having it in bio.c. Acked-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Merge __bio_associate_blkg into its only caller, which slightly reduces the RCU critical section and better explains the code flow. Acked-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bio_clone_blkg_association is supposed to clone the association, but actually ends up doing a search with a tryget. As we know we have a reference on the source cgroup, just get an unconditional additional reference to it and call it a day. That also removes the need for an RCU critical section. Acked-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bio_disassociate_blkg has two callers, of which one immediately assigns a new value to ->bi_blkg. Just open code the function in the two callers. Acked-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jun 24, 2020
-
-
Gustavo A. R. Silva authored
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle, and audited and fixed manually. Signed-off-by:
Gustavo A. R. Silva <gustavoars@kernel.org> Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83 Signed-off-by:
Jens Axboe <axboe@kernel.dk>
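For readers unfamiliar with struct_size(), the following user-space approximation shows the kind of computation it replaces. `struct vec_table` and `vec_table_size()` are made-up names for illustration; the kernel helper is a macro that works for any structure ending in a flexible array member. The sketch keeps the key property: on overflow the result saturates to SIZE_MAX, so a following allocation fails instead of being silently too small.

```c
#include <stddef.h>
#include <stdint.h>

/* Example structure ending in a flexible array member. */
struct vec_table {
	size_t count;
	uint64_t entries[];
};

/*
 * Approximates struct_size(p, entries, n):
 * sizeof(*p) + n * sizeof(p->entries[0]), saturating to SIZE_MAX
 * when the multiplication or addition would overflow.
 */
static size_t vec_table_size(size_t n)
{
	const size_t elem = sizeof(uint64_t);
	const size_t head = sizeof(struct vec_table);

	if (n > (SIZE_MAX - head) / elem)
		return SIZE_MAX;
	return head + n * elem;
}
```

The open-coded form `sizeof(*p) + n * sizeof(...)` has no such check and also invites mistakes when the element type changes, which is the "type mistakes" concern mentioned above.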
-
- Jun 05, 2020
-
-
Christoph Hellwig authored
The status can be trivially derived from the bio itself. That also prevents callers like NVMe from incorrectly passing a blk_status_t instead of the errno, and avoids the overhead of translating the blk_status_t to the errno in the I/O completion fast path when no tracing is enabled. Fixes: 35fe0d12 ("nvme: trace bio completion") Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Reviewed-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- May 27, 2020
-
-
Christoph Hellwig authored
All callers are in blk-core.c, so move update_io_ticks over. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Remove these now unused functions. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- May 19, 2020
-
-
Christoph Hellwig authored
part_inc_in_flight and part_dec_in_flight only have one caller each, and those callers are purely for bio based drivers. Merge each function into its only caller, and remove the superfluous blk-mq checks. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- May 14, 2020
-
-
Satya Tangirala authored
We must have some way of letting a storage device driver know what encryption context it should use for en/decrypting a request. However, it's the upper layers (like the filesystem/fscrypt) that know about and manage encryption contexts. As such, when the upper layer submits a bio to the block layer, and this bio eventually reaches a device driver with support for inline encryption, the device driver will need to have been told the encryption context for that bio.

We want to communicate the encryption context from the upper layer to the storage device along with the bio, when the bio is submitted to the block layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can represent an encryption context (note that we can't use the bi_private field in struct bio to do this because that field does not function to pass information across layers in the storage stack). We also introduce various functions to manipulate the bio_crypt_ctx and make the bio/request merging logic aware of the bio_crypt_ctx.

We also make changes to blk-mq to make it handle bios with encryption contexts. blk-mq can merge many bios into the same request. These bios need to have contiguous data unit numbers (the necessary changes to blk-merge are also made to ensure this) - as such, it suffices to keep the data unit number of just the first bio, since that's all a storage driver needs to infer the data unit number to use for each data block in each bio in a request. blk-mq keeps track of the encryption context to be used for all the bios in a request with the request's rq_crypt_ctx. When the first bio is added to an empty request, blk-mq will program the encryption context of that bio into the request_queue's keyslot manager, and store the returned keyslot in the request's rq_crypt_ctx. All the functions to operate on encryption contexts are in blk-crypto.c.
Upper layers only need to call bio_crypt_set_ctx with the encryption key, algorithm and data_unit_num; they don't have to worry about getting a keyslot for each encryption context, as blk-mq/blk-crypto handles that. Blk-crypto also makes it possible for request-based layered devices like dm-rq to make use of inline encryption hardware by cloning the rq_crypt_ctx and programming a keyslot in the new request_queue when necessary. Note that any user of the block layer can submit bios with an encryption context, such as filesystems, device-mapper targets, etc. Signed-off-by:
Satya Tangirala <satyat@google.com> Reviewed-by:
Eric Biggers <ebiggers@google.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
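The requirement that merged bios have contiguous data unit numbers can be sketched as a simple predicate. This is a deliberately reduced model: `fake_crypt_ctx` and `dun_contiguous()` are hypothetical, the data unit number is assumed to fit in one 64-bit value, and keys and algorithms are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified encryption context (no key, no algorithm). */
struct fake_crypt_ctx {
	uint64_t first_dun;       /* data unit number of the first data unit */
	uint32_t data_unit_size;  /* bytes per data unit */
};

/*
 * Two bios may share a request only if the second bio's first data unit
 * number directly follows the last data unit of the first bio.
 * @a_bytes is the size of the first bio in bytes.
 */
static bool dun_contiguous(const struct fake_crypt_ctx *a, uint32_t a_bytes,
			   const struct fake_crypt_ctx *b)
{
	assert(a->data_unit_size != 0);

	if (a->data_unit_size != b->data_unit_size)
		return false;
	if (a_bytes % a->data_unit_size)
		return false;	/* partial data unit: cannot continue */
	return a->first_dun + a_bytes / a->data_unit_size == b->first_dun;
}
```

Because contiguity is enforced at merge time, keeping only the first bio's data unit number per request is enough: the number for any later block follows from its offset within the request.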
-
- May 13, 2020
-
-
Johannes Thumshirn authored
Export bio_release_pages and bio_iov_iter_get_pages, so they can be used from modular code. Signed-off-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Keith Busch authored
Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned block device. This is a no-merge write operation. A zone append write BIO must:
* Target a zoned block device
* Have a sector position indicating the start sector of the target zone
* The target zone must be a sequential write zone
* The BIO must not cross a zone boundary
* The BIO size must not be split to ensure that a single range of LBAs is written with a single command.

Implement these checks in generic_make_request_checks() using the helper function blk_check_zone_append(). To avoid write append BIO splitting, introduce the new max_zone_append_sectors queue limit attribute and ensure that a BIO size is always lower than this limit. Export this new limit through sysfs and check these limits in bio_full().

Also, when an LLDD can't dispatch a request to a specific zone, it will return BLK_STS_ZONE_RESOURCE, indicating this request needs to be delayed, e.g. because the zone it will be dispatched to is still write-locked. If this happens, set the request aside in a local list to continue trying to dispatch requests such as READ requests or WRITE/ZONE_APPEND requests targeting other zones. This way we can still keep a high queue depth without starving other requests even if one request can't be served due to zone write-locking.

Finally, make sure that the bio sector position indicates the actual write position as indicated by the device on completion. Signed-off-by:
Keith Busch <kbusch@kernel.org> [ jth: added zone-append specific add_page and merge_page helpers ] Signed-off-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
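The list of conditions a zone append BIO must satisfy maps naturally onto a validation function. The sketch below is a user-space model, not the kernel's blk_check_zone_append(): `fake_zoned_dev`, `fake_append_bio` and `zone_append_ok()` are hypothetical, and whether the target zone is sequential is passed in as a flag rather than looked up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device description; all units are 512-byte sectors. */
struct fake_zoned_dev {
	bool     zoned;
	uint64_t zone_sectors;
	uint32_t max_zone_append_sectors;
};

struct fake_append_bio {
	uint64_t sector;             /* requested position */
	uint32_t nr_sectors;         /* size of the write */
	bool     zone_is_sequential; /* assumed to be known by the caller */
};

/* Returns true when the bio meets every condition listed above. */
static bool zone_append_ok(const struct fake_zoned_dev *dev,
			   const struct fake_append_bio *bio)
{
	if (!dev->zoned || dev->zone_sectors == 0)
		return false;	/* not a zoned block device */
	if (bio->sector % dev->zone_sectors)
		return false;	/* not the start sector of a zone */
	if (!bio->zone_is_sequential)
		return false;	/* conventional zone */
	if (bio->nr_sectors > dev->zone_sectors)
		return false;	/* would cross the zone boundary */
	if (bio->nr_sectors > dev->max_zone_append_sectors)
		return false;	/* would have to be split */
	return true;
}
```

Rejecting oversized bios up front, rather than splitting them later, is what lets a single command write a single range of LBAs.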
-
Christoph Hellwig authored
Rename __bio_add_pc_page() to bio_add_hw_page() and explicitly pass in a max_sectors argument. This max_sectors argument can be used to specify constraints from the hardware. Signed-off-by:
Christoph Hellwig <hch@lst.de> [ jth: rebased and made public for blk-map.c ] Signed-off-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by:
Daniel Wagner <dwagner@suse.de> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Mar 27, 2020
-
-
Christoph Hellwig authored
The bio_map_* helpers are just the low-level helpers for the blk_rq_map_* APIs. Move them together for better logical grouping, as there isn't much overlap with other code in bio.c. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Mar 25, 2020
-
-
Christoph Hellwig authored
This is bio layer functionality and not related to buffer heads. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Konstantin Khlebnikov authored
Column "time_in_queue" in diskstats is supposed to show total waiting time of all requests. I.e. value should be equal to the sum of times from other columns. But this is not true, because column "time_in_queue" is counted separately in jiffies rather than in nanoseconds as other times. This patch removes redundant counter for "time_in_queue" and shows total time of read, write, discard and flush requests. Signed-off-by:
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Konstantin Khlebnikov authored
Currently io_ticks is approximated by adding one at each start and end of requests if the jiffies counter has changed. This works perfectly for requests shorter than a jiffy or if one of the requests starts/ends at each jiffy. If the disk executes just one request at a time and they are longer than two jiffies, then only the first and last jiffies will be accounted. The fix is simple: at the end of a request, add to io_ticks the jiffies passed since the last update rather than just one jiffy.

Example: a common HDD executes random read 4k requests in around 12ms.

fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
iostat -x 10 sdb

Note the change of iostat's "%util" from 8,43% to 99,99% before/after the patch:

Before:
Device:  rrqm/s  wrqm/s  r/s    w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb      0,00    0,00    82,60  0,00  330,40  0,00   8,00      0,96      12,09  12,09    0,00     1,02   8,43

After:
Device:  rrqm/s  wrqm/s  r/s    w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb      0,00    0,00    82,50  0,00  330,00  0,00   8,00      1,00      12,10  12,10    0,00     12,12  99,99

Now io_ticks does not lose time between the start and end of requests, but for queue-depth > 1 some I/O time between adjacent starts might be lost. For load estimation "%util" is not as useful as average queue length, but it clearly shows how often the disk queue is completely empty. Fixes: 5b18b5a7 ("block: delete part_round_stats and switch to less precise counting") Signed-off-by:
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
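The accounting rule from this commit can be modelled in a few lines. This is a single-threaded sketch with a hypothetical `fake_part` structure; the in-kernel code has to cope with concurrent updates, which is ignored here. At the start of a request a changed jiffy counts as one tick, and at the end all jiffies since the last update are added.

```c
#include <stdbool.h>

struct fake_part {
	unsigned long stamp;     /* jiffy of the last update */
	unsigned long io_ticks;  /* jiffies during which the disk was busy */
};

/*
 * Update the busy-time counter at request start (@end == false) or
 * request completion (@end == true). @now is the current jiffy.
 */
static void update_io_ticks(struct fake_part *part, unsigned long now,
			    bool end)
{
	if (part->stamp != now) {
		part->io_ticks += end ? now - part->stamp : 1;
		part->stamp = now;
	}
}
```

For a single request that starts at jiffy 100 and ends at jiffy 112, this accounts 1 + 12 = 13 ticks, where adding one tick at each end would account only 2. That difference is the effect visible in the "%util" column above.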
-
- Mar 24, 2020
-
-
Christoph Hellwig authored
These functions aren't really related to partition support, so move them to a more suitable place. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Mar 18, 2020
-
-
Ming Lei authored
submit_bio_wait() can be called from ioctl(BLKSECDISCARD), which may take a long time to complete; as Salman mentioned, a 4K BLKSECDISCARD takes up to 100 seconds on some devices. Any block I/O operation that occurs after the BLKSECDISCARD is submitted will also potentially be affected by the hung task timeouts. Another report is that a task hang can be observed when running mkfs over raid10, which takes a small max discard sectors limit because of chunk size. So prevent hung_check from firing by taking the same approach used in blk_execute_rq(); the wake-up interval is set to half the hung_check timer period, which keeps overhead low enough. Cc: Salman Qazi <sqazi@google.com> Cc: Jesse Barnes <jsbarnes@google.com> Cc: Bart Van Assche <bvanassche@acm.org> Link: https://lkml.org/lkml/2020/2/12/1193 Reported-by:
Salman Qazi <sqazi@google.com> Reviewed-by:
Jesse Barnes <jsbarnes@google.com> Reviewed-by:
Salman Qazi <sqazi@google.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 09, 2020
-
-
Ming Lei authored
Commit 85a8ce62 ("block: add bio_truncate to fix guard_bio_eod") adds bio_truncate() for handling bio EOD. However, bio_truncate() doesn't use the passed 'op' parameter from guard_bio_eod's callers. So bio_truncate() may retrieve the wrong 'op', and zeroing pages may not be done for a READ bio. Fix this issue by moving guard_bio_eod() after bio_set_op_attrs() in submit_bh_wbc() so that bio_truncate() can always retrieve the correct op info. Meanwhile, remove the 'op' parameter from guard_bio_eod() because it isn't used any more. Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: linux-fsdevel@vger.kernel.org Fixes: 85a8ce62 ("block: add bio_truncate to fix guard_bio_eod") Signed-off-by:
Ming Lei <ming.lei@redhat.com> Fold in kerneldoc and bio_op() change. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 28, 2019
-
-
Ming Lei authored
Some filesystems, such as vfat, may send a bio which crosses the device boundary, and worse, an IO request starting within the device boundaries can contain more than one segment past EOD. Commit dce30ca9 ("fs: fix guard_bio_eod to check for real EOD errors") tries to fix this issue by returning -EIO for this situation. However, this way fs user code loses the chance to handle -EIO, and sync_inodes_sb() may then hang forever. Also, the current truncation of the last segment is dangerous because it updates the last bvec, so the bvec table is no longer immutable, and fs bio users may not retrieve the truncated pages via bio_for_each_segment_all() in their .end_io callback. Fix this issue by supporting multi-segment truncation. And the approach is simpler:
- just update the bio size, since the block layer can make correct bvecs with the updated bio size. Then the bvec table becomes really immutable.
- zero all truncated segments for read bios

Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: linux-fsdevel@vger.kernel.org Fixed-by: dce30ca9 ("fs: fix guard_bio_eod to check for real EOD errors") Reported-by:
<syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
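The two-step approach listed in this commit (shrink only the size, zero everything past it for reads) can be sketched as follows. This is a user-space model with hypothetical `fake_seg`, `fake_bio` and `truncate_bio()` names; real bios hold pages and offsets rather than plain buffers.

```c
#include <stddef.h>
#include <string.h>

struct fake_seg {
	unsigned char *data;
	size_t len;
};

struct fake_bio {
	struct fake_seg *segs;
	size_t nr_segs;
	size_t size;    /* total bytes the bio covers */
	int is_read;
};

/*
 * Shrink @bio to @new_size bytes. The segment table is left untouched;
 * only the size changes. For reads, every byte past @new_size is
 * zeroed, however many segments that spans.
 */
static void truncate_bio(struct fake_bio *bio, size_t new_size)
{
	size_t offset = 0;

	if (new_size >= bio->size)
		return;

	if (bio->is_read) {
		for (size_t i = 0; i < bio->nr_segs; i++) {
			struct fake_seg *seg = &bio->segs[i];
			size_t end = offset + seg->len;

			if (end > new_size) {
				/* Bytes of this segment that stay valid. */
				size_t keep = new_size > offset ?
					      new_size - offset : 0;

				memset(seg->data + keep, 0, seg->len - keep);
			}
			offset = end;
		}
	}
	bio->size = new_size;
}
```

Since no segment length is modified, code that later walks the full segment table still sees every page it originally added, which is the immutability property the commit message refers to.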
-
- Dec 10, 2019
-
-
Andreas Gruenbacher authored
This partially reverts commit e3a5d8e3. Commit e3a5d8e3 ("check bi_size overflow before merge") adds a bio_full check to __bio_try_merge_page. This will cause __bio_try_merge_page to fail when the last bi_io_vec has been reached. Instead, what we want here is only the bi_size overflow check. Fixes: e3a5d8e3 ("block: check bi_size overflow before merge") Cc: stable@vger.kernel.org # v5.4+ Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 05, 2019
-
-
Justin Tee authored
7c20f116 ("bio-integrity: stop abusing bi_end_io") moves bio_integrity_free from bio_uninit() to bio_integrity_verify_fn() and bio_endio(). This looks wrong because a bio may be freed without calling bio_endio(); for example, blk_rq_unprep_clone() is called from dm_mq_queue_rq() when the underlying queue of dm-mpath is busy. So commit 7c20f116 causes a memory leak of bio integrity data. Fix this issue by re-adding bio_integrity_free() to bio_uninit(). Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io") Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Add commit log, and simplify/fix the original patch written by Justin. Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 12, 2019
-
-
Junichi Nomura authored
__bio_try_merge_page() may merge a page to bio without bio_full() check and cause bi_size overflow. The overflow typically ends up with sd_init_command() warning on zero segment request with call trace like this:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 1986 at drivers/scsi/scsi_lib.c:1025 scsi_init_io+0x156/0x180
CPU: 2 PID: 1986 Comm: kworker/2:1H Kdump: loaded Not tainted 5.4.0-rc7 #1
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:scsi_init_io+0x156/0x180
RSP: 0018:ffffa11487663bf0 EFLAGS: 00010246
RAX: 00000000002be0a0 RBX: ffff8e6e9ff30118 RCX: 0000000000000000
RDX: 00000000ffffffe1 RSI: 0000000000000000 RDI: ffff8e6e9ff30118
RBP: ffffa11487663c18 R08: ffffa11487663d28 R09: ffff8e6e9ff30150
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8e6e9ff30000
R13: 0000000000000001 R14: ffff8e74a1cf1800 R15: ffff8e6e9ff30000
FS: 0000000000000000(0000) GS:ffff8e6ea7680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff18cf0fe8 CR3: 0000000659f0a001 CR4: 00000000001606e0
Call Trace:
 sd_init_command+0x326/0xb40 [sd_mod]
 scsi_queue_rq+0x502/0xaa0
 ? blk_mq_get_driver_tag+0xe7/0x120
 blk_mq_dispatch_rq_list+0x256/0x5a0
 ? elv_rb_del+0x24/0x30
 ? deadline_remove_request+0x7b/0xc0
 blk_mq_do_dispatch_sched+0xa3/0x140
 blk_mq_sched_dispatch_requests+0xfb/0x170
 __blk_mq_run_hw_queue+0x81/0x130
 blk_mq_run_work_fn+0x1b/0x20
 process_one_work+0x179/0x390
 worker_thread+0x4f/0x3e0
 kthread+0x105/0x140
 ? max_active_store+0x80/0x80
 ? kthread_bind+0x20/0x20
 ret_from_fork+0x35/0x40
---[ end trace f9036abf5af4a4d3 ]---
blk_update_request: I/O error, dev sdd, sector 2875552 op 0x1:(WRITE) flags 0x0 phys_seg 0 prio class 0
XFS (sdd1): writeback error on sector 2875552

__bio_try_merge_page() should check the overflow before actually doing merge. Fixes: 07173c3e ("block: enable multipage bvecs") Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Aug 22, 2019
-
-
Christoph Hellwig authored
Hiding page refcount manipulation inside a low-level bio helper is somewhat awkward. Instead return the same page information to the callers, where it fits in much better. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Passthrough bio handling should be the same as normal bio handling, except that we need to take hardware limitations into account. Thus use the common try_merge implementation after checking the hardware limits. This changes behavior in that we now also check segment and dma boundary settings for same page merges, which is a little more work but has no effect as those need to be larger than the page size. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
If we can add more data into an existing segment we do not create a gap per definition, so move the check for a gap after the attempt to merge into the segment. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Aug 14, 2019
-
-
Johannes Weiner authored
psi tracks the time tasks wait for refaulting pages to become uptodate, but it does not track the time spent submitting the IO. The submission part can be significant if backing storage is contended or when cgroup throttling (io.latency) is in effect - a lot of time is spent in submit_bio(). In that case, we underreport memory pressure. Annotate submit_bio() to account submission time as memory stall when the bio is reading userspace workingset pages. Tested-by:
Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Johannes Weiner <hannes@cmpxchg.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Aug 06, 2019
-
-
Hans Holmberg authored
Now that there are no module users left of bio_map_kern, stop exporting the symbol. Reviewed-by:
Javier González <javier@javigon.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Hans Holmberg <hans@owltronix.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-