diff mbox series

[RFC,76/86] treewide: block: remove cond_resched()

Message ID 20231107230822.371443-20-ankur.a.arora@oracle.com (mailing list archive)
State New
Headers show
Series Make the kernel preemptible | expand

Commit Message

Ankur Arora Nov. 7, 2023, 11:08 p.m. UTC
There are broadly three sets of uses of cond_resched():

1.  Calls to cond_resched() out of the goodness of our heart,
    otherwise known as avoiding lockup splats.

2.  Open coded variants of cond_resched_lock() which call
    cond_resched().

3.  Retry or error handling loops, where cond_resched() is used as a
    quick alternative to spinning in a tight-loop.

When running under a full preemption model, the cond_resched() reduces
to a NOP (not even a barrier) so removing it obviously cannot matter.

But considering only voluntary preemption models (for say code that
has been mostly tested under those), for set-1 and set-2 the
scheduler can now preempt kernel tasks running beyond their time
quanta anywhere they are preemptible() [1]. Which removes any need
for these explicitly placed scheduling points.

The cond_resched() calls in set-3 are a little more difficult.
To start with, given it's NOP character under full preemption, it
never actually saved us from a tight loop.
With voluntary preemption, it's not a NOP, but it might as well be --
for most workloads the scheduler does not have an interminable supply
of runnable tasks on the runqueue.

So, cond_resched() is useful to not get softlockup splats, but not
terribly good for error handling. Ideally, these should be replaced
with some kind of timed or event wait.
For now we use cond_resched_stall(), which tries to schedule if
possible, and executes a cpu_relax() if not.

All the uses here are in set-1 (some right after we give up the
lock, causing an explicit preemption check.)

We can remove all of them.

[1] https://lore.kernel.org/lkml/20231107215742.363031-1-ankur.a.arora@oracle.com/

Cc: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: cgroups@vger.kernel.org
Cc: linux-block@vger.kernel.org
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 block/blk-cgroup.c |  2 --
 block/blk-lib.c    | 11 -----------
 block/blk-mq.c     |  3 ---
 block/blk-zoned.c  |  6 ------
 4 files changed, 22 deletions(-)
diff mbox series

Patch

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 4a42ea2972ad..145c378367ec 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -597,7 +597,6 @@  static void blkg_destroy_all(struct gendisk *disk)
 		if (!(--count)) {
 			count = BLKG_DESTROY_BATCH_SIZE;
 			spin_unlock_irq(&q->queue_lock);
-			cond_resched();
 			goto restart;
 		}
 	}
@@ -1234,7 +1233,6 @@  static void blkcg_destroy_blkgs(struct blkcg *blkcg)
 			 * need to rescheduling to avoid softlockup.
 			 */
 			spin_unlock_irq(&blkcg->lock);
-			cond_resched();
 			spin_lock_irq(&blkcg->lock);
 			continue;
 		}
diff --git a/block/blk-lib.c b/block/blk-lib.c
index e59c3069e835..0bb118e9748b 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -69,14 +69,6 @@  int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		bio->bi_iter.bi_size = req_sects << 9;
 		sector += req_sects;
 		nr_sects -= req_sects;
-
-		/*
-		 * We can loop for a long time in here, if someone does
-		 * full device discards (like mkfs). Be nice and allow
-		 * us to schedule out to avoid softlocking if preempt
-		 * is disabled.
-		 */
-		cond_resched();
 	}
 
 	*biop = bio;
@@ -145,7 +137,6 @@  static int __blkdev_issue_write_zeroes(struct block_device *bdev,
 			bio->bi_iter.bi_size = nr_sects << 9;
 			nr_sects = 0;
 		}
-		cond_resched();
 	}
 
 	*biop = bio;
@@ -189,7 +180,6 @@  static int __blkdev_issue_zero_pages(struct block_device *bdev,
 			if (bi_size < sz)
 				break;
 		}
-		cond_resched();
 	}
 
 	*biop = bio;
@@ -336,7 +326,6 @@  int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 			bio_put(bio);
 			break;
 		}
-		cond_resched();
 	}
 	blk_finish_plug(&plug);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1fafd54dce3c..f45ee6a69700 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1372,7 +1372,6 @@  static void blk_rq_poll_completion(struct request *rq, struct completion *wait)
 {
 	do {
 		blk_hctx_poll(rq->q, rq->mq_hctx, NULL, 0);
-		cond_resched();
 	} while (!completion_done(wait));
 }
 
@@ -4310,7 +4309,6 @@  static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 	for (i = 0; i < set->nr_hw_queues; i++) {
 		if (!__blk_mq_alloc_map_and_rqs(set, i))
 			goto out_unwind;
-		cond_resched();
 	}
 
 	return 0;
@@ -4425,7 +4423,6 @@  static int blk_mq_realloc_tag_set_tags(struct blk_mq_tag_set *set,
 				__blk_mq_free_map_and_rqs(set, i);
 			return -ENOMEM;
 		}
-		cond_resched();
 	}
 
 done:
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 619ee41a51cc..8005f55e22e5 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -208,9 +208,6 @@  static int blkdev_zone_reset_all_emulated(struct block_device *bdev,
 				   gfp_mask);
 		bio->bi_iter.bi_sector = sector;
 		sector += zone_sectors;
-
-		/* This may take a while, so be nice to others */
-		cond_resched();
 	}
 
 	if (bio) {
@@ -293,9 +290,6 @@  int blkdev_zone_mgmt(struct block_device *bdev, enum req_op op,
 		bio = blk_next_bio(bio, bdev, 0, op | REQ_SYNC, gfp_mask);
 		bio->bi_iter.bi_sector = sector;
 		sector += zone_sectors;
-
-		/* This may take a while, so be nice to others */
-		cond_resched();
 	}
 
 	ret = submit_bio_wait(bio);