Message ID | 20240405085018.243260-1-hch@lst.de (mailing list archive) |
---|---|
State | New, archived |
Series | block: work around sparse in queue_limits_commit_update |
On 05/04/2024 09:50, Christoph Hellwig wrote:
> The sparse lock context tracking warns about the mutex unlock in
> queue_limits_commit_update despite the __releases annotation:
>
> block/blk-settings.c:263:9: warning: context imbalance in
> 'queue_limits_commit_update' - wrong count at exit
>
> As far as I can tell that is because the sparse lock tracking code is
> busted for inline functions. Work around it by splitting an inline
> wrapper out of queue_limits_commit_update and doing the unlock there.

I find that just removing the __releases() from queue_limits_commit_update
makes it go away. I have been playing around with this and I can't see why.

I'm not convinced that mutexes are handled properly by sparse. Here is a
similar issue in the net code:

john@localhost:~/mnt_sda4/john/mkp-scsi> make C=2 net/core/sock.o
  CHECK   scripts/mod/empty.c
  CALL    scripts/checksyscalls.sh
  DESCEND objtool
  INSTALL libsubcmd_headers
  CHECK   net/core/sock.c
net/core/sock.c:2393:9: warning: context imbalance in 'sk_clone_lock' - wrong count at exit
net/core/sock.c:2397:6: warning: context imbalance in 'sk_free_unlock_clone' - unexpected unlock
net/core/sock.c:4034:13: warning: context imbalance in 'proto_seq_start' - wrong count at exit
net/core/sock.c:4046:13: warning: context imbalance in 'proto_seq_stop' - wrong count at exit
john@localhost:~/mnt_sda4/john/mkp-scsi>

static void proto_seq_stop(struct seq_file *seq, void *v)
	__releases(proto_list_mutex)
{
	mutex_unlock(&proto_list_mutex);
}

That seems similar to the queue_limits_commit_update problem, but with
no inlining involved.

Changing q->limits_lock to a semaphore seems to make the issue go away;
but how to annotate queue_limits_set() is tricky regardless, as it
acquires and then releases silently.

Anyway, changing the code, below, for sparse when it seems somewhat
broken/unreliable may not be the best approach.

Thanks,
John

> Reported-by: John Garry <john.g.garry@oracle.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-settings.c   | 40 +++++++++++++++++-----------------------
>  include/linux/blkdev.h | 32 +++++++++++++++++++++++++++++---
>  2 files changed, 46 insertions(+), 26 deletions(-)
>
> [quoted patch trimmed; the full diff appears at the end of this page]
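For reference, this is the shape sparse trips over. A minimal stand-alone sketch (hypothetical demo code, not from the thread; the mutex_unlock() prototype quoted in the comment is the real one from include/linux/mutex.h):

```c
#include <linux/mutex.h>

static DEFINE_MUTEX(demo_lock);

/*
 * __releases(demo_lock) tells sparse the lock count must go 1 -> 0
 * inside this function. But the prototype sparse sees for the unlock
 * is plain "extern void mutex_unlock(struct mutex *lock);" with no
 * __releases(lock) attached, so from sparse's point of view nothing
 * is ever released here and it reports "wrong count at exit".
 */
static void demo_leave(void)
	__releases(demo_lock)
{
	mutex_unlock(&demo_lock);
}
```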
On Fri, Apr 05, 2024 at 01:31:10PM +0100, John Garry wrote:
> Anyway, changing the code, below, for sparse when it seems somewhat
> broken/unreliable may not be the best approach.

Ok, let's skip this and I'll report a bug to the sparse maintainers
(unless you want to do that, in which case I'll happily leave it to
you).
On 05/04/2024 15:38, Christoph Hellwig wrote:
> On Fri, Apr 05, 2024 at 01:31:10PM +0100, John Garry wrote:
>> Anyway, changing the code, below, for sparse when it seems somewhat
>> broken/unreliable may not be the best approach.
> Ok, let's skip this and I'll report a bug to the sparse maintainers
> (unless you want to do that, in which case I'll happily leave it to
> you).

I can look at this issue a bit more and report a bug if I can't find
the real cause of the problem.
On 05/04/2024 15:38, Christoph Hellwig wrote:
> On Fri, Apr 05, 2024 at 01:31:10PM +0100, John Garry wrote:
>> Anyway, changing the code, below, for sparse when it seems somewhat
>> broken/unreliable may not be the best approach.
> Ok, let's skip this and I'll report a bug to the sparse maintainers
> (unless you want to do that, in which case I'll happily leave it to
> you).

This actually looks like a kernel issue - that being that the mutex API
is not annotated for lock checking. For a reference, see this:

https://lore.kernel.org/lkml/cover.1579893447.git.jbi.octave@gmail.com/T/#mbb8bda6c0a7ca7ce19f46df976a8e3b489745488

As a quick hack to prove it, you can try this for building blk-settings.c:

---->8----
diff --git a/block/blk-settings.c b/block/blk-settings.c
index cdbaef159c4b..c9da5549f3c2 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -277,6 +277,7 @@ EXPORT_SYMBOL_GPL(queue_limits_commit_update);
  * Returns 0 if successful, else a negative error code.
  */
 int queue_limits_set(struct request_queue *q, struct queue_limits *lim)
+__acquires(q->limits_lock)
 {
 	mutex_lock(&q->limits_lock);
 	return queue_limits_commit_update(q, lim);
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 67edc4ca2bee..af5d3e20553b 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -143,7 +143,7 @@ do { \
 } while (0)
 
 #else
-extern void mutex_lock(struct mutex *lock);
+extern __lockfunc void mutex_lock(struct mutex *lock) __acquires(lock);
 extern int __must_check mutex_lock_interruptible(struct mutex *lock);
 extern int __must_check mutex_lock_killable(struct mutex *lock);
 extern void mutex_lock_io(struct mutex *lock);
@@ -162,7 +162,7 @@ extern void mutex_lock_io(struct mutex *lock);
  * Returns 1 if the mutex has been acquired successfully, and 0 on contention.
  */
 extern int mutex_trylock(struct mutex *lock);
-extern void mutex_unlock(struct mutex *lock);
+extern __lockfunc void mutex_unlock(struct mutex *lock) __releases(lock);
 
 extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
----8<----

I would need to investigate further for any progress in adding that
lock checking to the mutex API, but it did not look promising from that
patchset. For now I suppose you can either:
a. remove the current annotation.
b. change to a spinlock - I don't think that anything requiring
scheduling is happening when updating the limits, but would need to
audit to be sure.

BTW, as for lock checking for inline functions in the header, I think
that there is no checking there.
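For background, sparse's lock tracking is driven by a `context` attribute on function declarations. The kernel-side macros look roughly like this (a sketch from memory of include/linux/compiler_types.h; verify against the tree you are on):

```c
#ifdef __CHECKER__
/* context(lock, entry, exit): the lock count sparse expects when
 * entering and when leaving the annotated function */
# define __acquires(x)	__attribute__((context(x, 0, 1)))	/* 0 -> 1 */
# define __releases(x)	__attribute__((context(x, 1, 0)))	/* 1 -> 0 */
# define __must_hold(x)	__attribute__((context(x, 1, 1)))	/* held throughout */
#else
# define __acquires(x)
# define __releases(x)
# define __must_hold(x)
#endif
```

The spinlock API carries these attributes on its out-of-line prototypes (e.g. _raw_spin_lock() in include/linux/spinlock_api_smp.h is declared with __acquires(lock)), which is why spinlock-based code generally checks cleanly while mutex-based code does not: mutex_lock() and mutex_unlock() are declared with no context information at all.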
On Fri, Apr 05, 2024 at 05:43:24PM +0100, John Garry wrote:
> This actually looks like a kernel issue - that being that the mutex API
> is not annotated for lock checking.

Oh. Yeah, that would explain the weird behavior.

> I would need to investigate further for any progress in adding that
> lock checking to the mutex API, but it did not look promising from that
> patchset. For now I suppose you can either:
> a. remove the current annotation.

I can send a patch for that.

> b. change to a spinlock - I don't think that anything requiring
> scheduling is happening when updating the limits, but would need to
> audit to be sure.

With SCSI we'll hold it over the new ->device_configure, which must
be able to sleep.
On 05/04/2024 18:13, Christoph Hellwig wrote:
> On Fri, Apr 05, 2024 at 05:43:24PM +0100, John Garry wrote:
>> This actually looks like a kernel issue - that being that the mutex
>> API is not annotated for lock checking.
>
> Oh. Yeah, that would explain the weird behavior.
>
>> I would need to investigate further for any progress in adding that
>> lock checking to the mutex API, but it did not look promising from
>> that patchset. For now I suppose you can either:
>> a. remove the current annotation.
>
> I can send a patch for that.

A reminder on this one. I can send a patch if you like.

>> b. change to a spinlock - I don't think that anything requiring
>> scheduling is happening when updating the limits, but would need to
>> audit to be sure.
>
> With SCSI we'll hold it over the new ->device_configure, which must
> be able to sleep.
On Thu, May 09, 2024 at 10:49:29AM +0100, John Garry wrote:
> On 05/04/2024 18:13, Christoph Hellwig wrote:
>> On Fri, Apr 05, 2024 at 05:43:24PM +0100, John Garry wrote:
>>> This actually looks like a kernel issue - that being that the mutex
>>> API is not annotated for lock checking.
>>
>> Oh. Yeah, that would explain the weird behavior.
>>
>>> [...]
>>> a. remove the current annotation.
>>
>> I can send a patch for that.
>
> A reminder on this one.
>
> I can send a patch if you like.

Yes, please. Sorry, I expected you to just do that.
On 09/05/2024 13:58, Christoph Hellwig wrote:
>> A reminder on this one.
>>
>> I can send a patch if you like.
> Yes, please.

ok, fine.

But, uggh, now I see more C=1 warnings on Jens' 6.9 branch. It's quite
late to be sending fixes for those (for 6.9)...

I'll send them anyway.
On 5/9/24 7:50 AM, John Garry wrote:
> On 09/05/2024 13:58, Christoph Hellwig wrote:
>>> A reminder on this one.
>>>
>>> I can send a patch if you like.
>> Yes, please.
>
> ok, fine.
>
> But, uggh, now I see more C=1 warnings on Jens' 6.9 branch. It's quite
> late to be sending fixes for those (for 6.9)...
>
> I'll send them anyway.

Send it for 6.10.
On 09/05/2024 15:00, Jens Axboe wrote:
>> But, uggh, now I see more C=1 warnings on Jens' 6.9 branch. It's quite
>> late to be sending fixes for those (for 6.9)...
>>
>> I'll send them anyway.
> Send it for 6.10.

ok, fine.

JFYI, here's what I found on the 6.10 queue in block/:

block/bdev.c:377:17: warning: symbol 'blockdev_mnt' was not declared. Should it be static?
block/blk-settings.c:263:9: warning: context imbalance in 'queue_limits_commit_update' - wrong count at exit
block/blk-cgroup.c:811:5: warning: context imbalance in 'blkg_conf_prep' - wrong count at exit
block/blk-cgroup.c:941:9: warning: context imbalance in 'blkg_conf_exit' - different lock contexts for basic block
block/blk-iocost.c:732:9: warning: context imbalance in 'iocg_lock' - wrong count at exit
block/blk-iocost.c:743:28: warning: context imbalance in 'iocg_unlock' - unexpected unlock
block/blk-zoned.c:576:30: warning: context imbalance in 'disk_get_and_lock_zone_wplug' - wrong count at exit
block/blk-zoned.c: note: in included file (through include/linux/blkdev.h):
./include/linux/bio.h:592:9: warning: context imbalance in 'blk_zone_wplug_handle_write' - unexpected unlock
block/blk-zoned.c:1721:31: warning: context imbalance in 'blk_revalidate_seq_zone' - unexpected unlock
block/bfq-iosched.c:5498:9: warning: context imbalance in 'bfq_exit_icq' - different lock contexts for basic block

Actually most of these pre-date v6.9 anyway, apart from the zoned stuff.
And the bdev.c static warning is an outstanding patch, which I replied to.

At a glance, some of those pre-v6.9 issues look non-trivial to fix,
which may be the reason that they have not been fixed...

John
On 5/10/24 00:04, John Garry wrote:
> On 09/05/2024 15:00, Jens Axboe wrote:
>>> But, uggh, now I see more C=1 warnings on Jens' 6.9 branch. It's quite
>>> late to be sending fixes for those (for 6.9)...
>>>
>>> I'll send them anyway.
>> Send it for 6.10.
>
> ok, fine.
>
> JFYI, here's what I found on the 6.10 queue in block/:
>
> block/bdev.c:377:17: warning: symbol 'blockdev_mnt' was not declared.
> Should it be static?
> block/blk-settings.c:263:9: warning: context imbalance in
> 'queue_limits_commit_update' - wrong count at exit
> [remaining warnings snipped; see the full list in the previous message]
>
> Actually most of these pre-date v6.9 anyway, apart from the zoned stuff.
> And the bdev.c static warning is an outstanding patch, which I replied to.

I will have a look at the zone stuff. This is all from the new addition
of zone write plugging, so all my bad (I did run with lockdep but did
not compile test with sparse).

> At a glance, some of those pre-v6.9 issues look non-trivial to fix,
> which may be the reason that they have not been fixed...

Yeah. The context imbalance warnings are sometimes hard to address
depending on the code. Let's see.
On 10/05/2024 02:40, Damien Le Moal wrote:

Hi Damien,

>> block/bdev.c:377:17: warning: symbol 'blockdev_mnt' was not declared.
>> Should it be static?
>> [remaining warnings snipped; see the full list earlier in the thread]
>>
>> Actually most of these pre-date v6.9 anyway, apart from the zoned stuff.
>> And the bdev.c static warning is an outstanding patch, which I replied to.
> I will have a look at the zone stuff. This is all from the new addition
> of zone write plugging, so all my bad (I did run with lockdep but did
> not compile test with sparse).

Can you confirm that you looked to solve these zoned device sparse
warnings and that they are difficult to solve?

>> At a glance, some of those pre-v6.9 issues look non-trivial to fix,
>> which may be the reason that they have not been fixed...
> Yeah. The context imbalance warnings are sometimes hard to address
> depending on the code. Let's see.

I am looking at the other sparse warnings in block/ now...

John
On 6/12/24 17:37, John Garry wrote:
> On 10/05/2024 02:40, Damien Le Moal wrote:
>
> Hi Damien,
>
>>> [warning list snipped; see the full list earlier in the thread]
>>>
>>> Actually most of these pre-date v6.9 anyway, apart from the zoned stuff.
>>> And the bdev.c static warning is an outstanding patch, which I replied to.
>> I will have a look at the zone stuff. This is all from the new addition
>> of zone write plugging, so all my bad (I did run with lockdep but did
>> not compile test with sparse).
>
> Can you confirm that you looked to solve these zoned device sparse
> warnings and that they are difficult to solve?

Yes, I had a look but failed to see any way to remove these. Annotations
did not help, at best only changing these warnings into other warnings.
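For context on why the zoned warnings are hard: the one conditional-locking idiom sparse can follow is __cond_lock(). A sketch below - the __cond_lock() definition and the raw_spin_trylock() wrapper are quoted from memory of the real kernel headers, while lookup_foo_locked() is a purely hypothetical illustration:

```c
/* Under __CHECKER__ (from memory, see include/linux/compiler.h):
 *
 *   #define __cond_lock(x, c)  ((c) ? ({ __acquire(x); 1; }) : 0)
 *
 * i.e. sparse is told lock x is taken iff expression c is non-zero.
 * The spinlock API uses it so trylock can be tracked:
 *
 *   #define raw_spin_trylock(lock) __cond_lock(lock, _raw_spin_trylock(lock))
 */

/* Hypothetical "lookup and lock" helper annotated like a trylock:
 * returns non-zero with table->lock held when the key is found. */
int __lookup_foo_locked(struct foo_table *table, u64 key);
#define lookup_foo_locked(table, key) \
	__cond_lock(&(table)->lock, __lookup_foo_locked(table, key))
```

Note that this only works because the caller can name the lock in the macro. disk_get_and_lock_zone_wplug() does not fit that shape - the spinlock it takes lives inside the wplug it returns, so there is no lock expression to name at the call site - which is consistent with the observation that annotating it just turns one warning into another.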
diff --git a/block/blk-settings.c b/block/blk-settings.c
index cdbaef159c4bc3..9ef52b80965dc1 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -239,31 +239,20 @@ int blk_set_default_limits(struct queue_limits *lim)
 	return blk_validate_limits(lim);
 }
 
-/**
- * queue_limits_commit_update - commit an atomic update of queue limits
- * @q:	queue to update
- * @lim:	limits to apply
- *
- * Apply the limits in @lim that were obtained from queue_limits_start_update()
- * and updated by the caller to @q.
- *
- * Returns 0 if successful, else a negative error code.
- */
-int queue_limits_commit_update(struct request_queue *q,
+int __queue_limits_commit_update(struct request_queue *q,
 		struct queue_limits *lim)
-	__releases(q->limits_lock)
 {
-	int error = blk_validate_limits(lim);
-
-	if (!error) {
-		q->limits = *lim;
-		if (q->disk)
-			blk_apply_bdi_limits(q->disk->bdi, lim);
-	}
-	mutex_unlock(&q->limits_lock);
-	return error;
+	int error;
+
+	error = blk_validate_limits(lim);
+	if (error)
+		return error;
+	q->limits = *lim;
+	if (q->disk)
+		blk_apply_bdi_limits(q->disk->bdi, lim);
+	return 0;
 }
-EXPORT_SYMBOL_GPL(queue_limits_commit_update);
+EXPORT_SYMBOL_GPL(__queue_limits_commit_update);
 
 /**
  * queue_limits_set - apply queue limits to queue
@@ -278,8 +267,13 @@ EXPORT_SYMBOL_GPL(queue_limits_commit_update);
  */
 int queue_limits_set(struct request_queue *q, struct queue_limits *lim)
 {
+	int error;
+
 	mutex_lock(&q->limits_lock);
-	return queue_limits_commit_update(q, lim);
+	error = __queue_limits_commit_update(q, lim);
+	mutex_unlock(&q->limits_lock);
+
+	return error;
 }
 EXPORT_SYMBOL_GPL(queue_limits_set);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c3e8f7cf96be9e..99f1d2fcec4a2a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -869,6 +869,15 @@ static inline unsigned int blk_chunk_sectors_left(sector_t offset,
 	return chunk_sectors - (offset & (chunk_sectors - 1));
 }
 
+/*
+ * Note: we want queue_limits_start_update to be inline to avoid passing a huge
+ * struct by value. This in turn requires the part of queue_limits_commit_update
+ * that unlocks the mutex to be inline as well to not confuse the sparse lock
+ * context tracking. Never use __queue_limits_commit_update directly.
+ */
+int __queue_limits_commit_update(struct request_queue *q,
+		struct queue_limits *lim);
+
 /**
  * queue_limits_start_update - start an atomic update of queue limits
  * @q:	queue to update
@@ -883,13 +892,30 @@ static inline unsigned int blk_chunk_sectors_left(sector_t offset,
  */
 static inline struct queue_limits
 queue_limits_start_update(struct request_queue *q)
-	__acquires(q->limits_lock)
 {
 	mutex_lock(&q->limits_lock);
 	return q->limits;
 }
-int queue_limits_commit_update(struct request_queue *q,
-		struct queue_limits *lim);
+
+/**
+ * queue_limits_commit_update - commit an atomic update of queue limits
+ * @q:	queue to update
+ * @lim:	limits to apply
+ *
+ * Apply the limits in @lim that were obtained from queue_limits_start_update()
+ * and updated by the caller to @q.
+ *
+ * Returns 0 if successful, else a negative error code.
+ */
+static inline int queue_limits_commit_update(struct request_queue *q,
+		struct queue_limits *lim)
+{
+	int error = __queue_limits_commit_update(q, lim);
+
+	mutex_unlock(&q->limits_lock);
+	return error;
+}
+
 int queue_limits_set(struct request_queue *q, struct queue_limits *lim);
 
 /*
The sparse lock context tracking warns about the mutex unlock in
queue_limits_commit_update despite the __releases annotation:

block/blk-settings.c:263:9: warning: context imbalance in
'queue_limits_commit_update' - wrong count at exit

As far as I can tell that is because the sparse lock tracking code is
busted for inline functions. Work around it by splitting an inline
wrapper out of queue_limits_commit_update and doing the unlock there.

Reported-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-settings.c   | 40 +++++++++++++++++-----------------------
 include/linux/blkdev.h | 32 +++++++++++++++++++++++++++++---
 2 files changed, 46 insertions(+), 26 deletions(-)
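For orientation, the calling convention the patch preserves looks like this: queue_limits_start_update() takes q->limits_lock and hands back a copy of the limits, and queue_limits_commit_update() validates, applies, and drops the lock on every path. A hypothetical caller sketch (not from the patch; the field and value chosen are illustrative only):

```c
static int example_update_limits(struct request_queue *q)
{
	/* takes q->limits_lock and returns a snapshot of q->limits */
	struct queue_limits lim = queue_limits_start_update(q);

	lim.max_hw_sectors = 2048;	/* illustrative field/value only */

	/*
	 * Validates and applies the new limits, and releases
	 * q->limits_lock on both the success and error paths.
	 */
	return queue_limits_commit_update(q, &lim);
}
```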