
block/genhd: use atomic_t for disk_event->block

Message ID 20210601110145.113365-1-hare@suse.de (mailing list archive)
State New, archived

Commit Message

Hannes Reinecke June 1, 2021, 11:01 a.m. UTC
__disk_unblock_events() will call queue_delayed_work() with a '0' delay
argument while holding a spin lock. This might cause the queued work
item to be executed immediately and to deadlock in disk_check_events(),
waiting for that same lock to be released.

This patch converts the 'block' counter into an atomic variable, so we
no longer need to hold a spinlock when scheduling the workqueue function.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/genhd.c | 36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)
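
To make the conversion concrete, here is a minimal userspace sketch of the
blocked-counter idiom the patch adopts: the first blocker (depth 0 -> 1)
cancels the pending poll work, and the last unblocker (depth back to 0)
reschedules it. This is only an illustration: C11 atomics stand in for the
kernel's atomic_t, puts() stands in for the workqueue calls, and every name
in it is made up.

/* Build: cc -std=c11 -o blockdemo blockdemo.c */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int block;	/* event blocking depth, like ev->block */

static void block_events(void)
{
	/* atomic_fetch_add() returns the old value, so old == 0 is the
	 * 0 -> 1 transition, i.e. atomic_inc_return(&ev->block) == 1
	 * in the patch: only the first blocker cancels the work. */
	if (atomic_fetch_add(&block, 1) == 0)
		puts("first blocker: cancel pending poll work");
}

static void unblock_events(void)
{
	/* old == 1 means the depth just dropped to 0: last unblocker. */
	if (atomic_fetch_add(&block, -1) == 1)
		puts("last unblocker: reschedule poll work");
}

int main(void)
{
	block_events();		/* 0 -> 1: cancels */
	block_events();		/* nested block: no cancel */
	unblock_events();	/* 2 -> 1: still blocked */
	unblock_events();	/* 1 -> 0: reschedules */
	return 0;
}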

Comments

Ming Lei June 1, 2021, 1:25 p.m. UTC | #1
On Tue, Jun 01, 2021 at 01:01:45PM +0200, Hannes Reinecke wrote:
> __disk_unblock_events() will call queue_delayed_work() with a '0' delay
> argument while holding a spin lock. This might cause the queued work
> item to be executed immediately and to deadlock in disk_check_events(),
> waiting for that same lock to be released.

Do you have a lockdep warning for this 'deadlock'?

'Executed immediately' doesn't mean the work fn is run in the current
task context; it is actually run in a wq worker (task) context.
__queue_work() usually wakes up a wq worker to run the work fn, so
there shouldn't be the 'deadlock' you mention.

Thanks, 
Ming
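
Ming's point is easy to reproduce outside the kernel: handing work to a
worker while holding a lock does not deadlock by itself, because the work
fn runs in the worker's context; the caller deadlocks only if it waits for
that work while still holding a lock the work fn needs. A hedged pthread
sketch, with hypothetical names and not actual kernel code:

/* Build: cc -o wqdemo wqdemo.c -lpthread */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *work_fn(void *arg)
{
	(void)arg;
	/* Worker context: simply blocks until the caller unlocks. */
	pthread_mutex_lock(&lock);
	puts("work fn ran in worker context");
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t worker;

	pthread_mutex_lock(&lock);
	/* Analogue of queue_delayed_work(wq, &work, 0) under ev->lock:
	 * the work is kicked off, but runs on the worker, not here. */
	pthread_create(&worker, NULL, work_fn, NULL);
	/* A pthread_join() at this point, before unlocking, would be a
	 * real deadlock -- compare cancel_delayed_work_sync() called
	 * while holding a lock that the work fn takes. */
	pthread_mutex_unlock(&lock);
	pthread_join(&worker, NULL);
	return 0;
}
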
Hannes Reinecke June 1, 2021, 1:41 p.m. UTC | #2
On 6/1/21 3:25 PM, Ming Lei wrote:
> On Tue, Jun 01, 2021 at 01:01:45PM +0200, Hannes Reinecke wrote:
>> __disk_unblock_events() will call queue_delayed_work() with a '0' delay
>> argument while holding a spin lock. This might cause the queued work
>> item to be executed immediately and to deadlock in disk_check_events(),
>> waiting for that same lock to be released.
> 
> Do you have a lockdep warning for this 'deadlock'?
> 
> 'Executed immediately' doesn't mean the work fn is run in the current
> task context; it is actually run in a wq worker (task) context.
> __queue_work() usually wakes up a wq worker to run the work fn, so
> there shouldn't be the 'deadlock' you mention.
> 

That's what I thought, too, but then we got a customer report about a
stuck installation, with this kernel message:

> [  990.305908] INFO: task init:1 blocked for more than 491 seconds.
> [  990.311904]       Not tainted 5.3.18-22-default #1
> [  990.316682] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  990.324483] init            D    0     1      0 0x00000000
> [  990.329950] Call Trace:
> [  990.332396]  __schedule+0x28a/0x6d0
> [  990.335876]  ? work_busy+0x80/0x80
> [  990.339267]  schedule+0x2f/0xa0
> [  990.342399]  schedule_timeout+0x1dd/0x300
> [  990.346399]  ? check_preempt_curr+0x29/0x80
> [  990.350569]  ? ttwu_do_wakeup+0x19/0x150
> [  990.354480]  ? work_busy+0x80/0x80
> [  990.357869]  wait_for_completion+0xba/0x140
> [  990.362040]  ? wake_up_q+0xa0/0xa0
> [  990.365430]  __flush_work+0x177/0x1d0
> [  990.369080]  ? worker_detach_from_pool+0xa0/0xa0
> [  990.373682]  __cancel_work_timer+0x12b/0x1b0
> [  990.377940]  ? exact_lock+0xd/0x20
> [  990.381329]  ? kobj_lookup+0x113/0x160
> [  990.385067]  disk_block_events+0x78/0x90
> [  990.388979]  __blkdev_get+0x6d/0x7e0
> [  990.392542]  ? blkdev_get_by_dev+0x40/0x40
> [  990.396627]  do_dentry_open+0x1ea/0x380
> [  990.400450]  path_openat+0x2fc/0x1520
> [  990.404103]  do_filp_open+0x9b/0x110
> [  990.407667]  ? kmem_cache_alloc+0x3d/0x250
> [  990.411749]  ? do_sys_open+0x1bd/0x250
> [  990.415486]  do_sys_open+0x1bd/0x250
> [  990.419052]  do_syscall_64+0x5b/0x1e0
> [  990.422701]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

The hang does vanish with this patch.

Cheers,

Hannes

Patch

diff --git a/block/genhd.c b/block/genhd.c
index 9f8cb7beaad1..07e70f0c9c25 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1379,7 +1379,7 @@  struct disk_events {
 	spinlock_t		lock;
 
 	struct mutex		block_mutex;	/* protects blocking */
-	int			block;		/* event blocking depth */
+	atomic_t		block;		/* event blocking depth */
 	unsigned int		pending;	/* events already sent out */
 	unsigned int		clearing;	/* events being cleared */
 
@@ -1439,8 +1439,6 @@  static unsigned long disk_events_poll_jiffies(struct gendisk *disk)
 void disk_block_events(struct gendisk *disk)
 {
 	struct disk_events *ev = disk->ev;
-	unsigned long flags;
-	bool cancel;
 
 	if (!ev)
 		return;
@@ -1451,11 +1449,7 @@  void disk_block_events(struct gendisk *disk)
 	 */
 	mutex_lock(&ev->block_mutex);
 
-	spin_lock_irqsave(&ev->lock, flags);
-	cancel = !ev->block++;
-	spin_unlock_irqrestore(&ev->lock, flags);
-
-	if (cancel)
+	if (atomic_inc_return(&ev->block) == 1)
 		cancel_delayed_work_sync(&disk->ev->dwork);
 
 	mutex_unlock(&ev->block_mutex);
@@ -1467,23 +1461,19 @@  static void __disk_unblock_events(struct gendisk *disk, bool check_now)
 	unsigned long intv;
 	unsigned long flags;
 
+	if (WARN_ON_ONCE(atomic_read(&ev->block) <= 0))
+		return;
+	if (atomic_dec_return(&ev->block))
+		return;
 	spin_lock_irqsave(&ev->lock, flags);
-
-	if (WARN_ON_ONCE(ev->block <= 0))
-		goto out_unlock;
-
-	if (--ev->block)
-		goto out_unlock;
-
 	intv = disk_events_poll_jiffies(disk);
+	spin_unlock_irqrestore(&ev->lock, flags);
 	if (check_now)
 		queue_delayed_work(system_freezable_power_efficient_wq,
 				&ev->dwork, 0);
 	else if (intv)
 		queue_delayed_work(system_freezable_power_efficient_wq,
 				&ev->dwork, intv);
-out_unlock:
-	spin_unlock_irqrestore(&ev->lock, flags);
 }
 
 /**
@@ -1523,10 +1513,10 @@  void disk_flush_events(struct gendisk *disk, unsigned int mask)
 
 	spin_lock_irq(&ev->lock);
 	ev->clearing |= mask;
-	if (!ev->block)
+	spin_unlock_irq(&ev->lock);
+	if (!atomic_read(&ev->block))
 		mod_delayed_work(system_freezable_power_efficient_wq,
 				&ev->dwork, 0);
-	spin_unlock_irq(&ev->lock);
 }
 
 /**
@@ -1638,11 +1628,11 @@  static void disk_check_events(struct disk_events *ev,
 	*clearing_ptr &= ~clearing;
 
 	intv = disk_events_poll_jiffies(disk);
-	if (!ev->block && intv)
+	spin_unlock_irq(&ev->lock);
+	if (!atomic_read(&ev->block) && intv)
 		queue_delayed_work(system_freezable_power_efficient_wq,
 				&ev->dwork, intv);
 
-	spin_unlock_irq(&ev->lock);
 
 	/*
 	 * Tell userland about new events.  Only the events listed in
@@ -1807,7 +1797,7 @@  static void disk_alloc_events(struct gendisk *disk)
 	ev->disk = disk;
 	spin_lock_init(&ev->lock);
 	mutex_init(&ev->block_mutex);
-	ev->block = 1;
+	atomic_set(&ev->block, 1);
 	ev->poll_msecs = -1;
 	INIT_DELAYED_WORK(&ev->dwork, disk_events_workfn);
 
@@ -1851,6 +1841,6 @@  static void disk_del_events(struct gendisk *disk)
 static void disk_release_events(struct gendisk *disk)
 {
 	/* the block count should be 1 from disk_del_events() */
-	WARN_ON_ONCE(disk->ev && disk->ev->block != 1);
+	WARN_ON_ONCE(disk->ev && atomic_read(&disk->ev->block) != 1);
 	kfree(disk->ev);
 }
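
For reference, the control flow of the patched __disk_unblock_events() can
be approximated in userspace as follows: the depth is decremented without
any lock, only the transition back to zero proceeds, the poll interval is
read under a brief lock, and the work is queued outside it. Again a sketch
only, with C11 atomics and a plain mutex standing in for atomic_t and the
ev->lock spinlock, and hypothetical names throughout.

/* Build: cc -std=c11 -o unblockdemo unblockdemo.c -lpthread */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int block = 1;	/* starts blocked, as in disk_alloc_events() */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long poll_msecs = 2000;	/* stands in for disk_events_poll_jiffies() */

static void unblock_events(void)
{
	long intv;

	/* atomic_fetch_add(-1) returns the old value; old != 1 means
	 * the depth is still positive (someone else is blocking) or
	 * has underflowed (the kernel version warns in that case). */
	if (atomic_fetch_add(&block, -1) != 1)
		return;

	pthread_mutex_lock(&lock);	/* ev->lock now only guards intv */
	intv = poll_msecs;
	pthread_mutex_unlock(&lock);

	if (intv)	/* queue_delayed_work() runs outside the lock */
		printf("reschedule poll work in %ld ms\n", intv);
}

int main(void)
{
	unblock_events();	/* 1 -> 0: reschedules the poll work */
	return 0;
}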