Message ID | 20200908075047.5140-1-haifeng.zhao@intel.com (mailing list archive)
State      | Not Applicable, archived
Series     | Revert "block: revert back to synchronous request_queue removal"
On Tue, Sep 08, 2020 at 03:50:48AM -0400, Ethan Zhao wrote:
> From: Ethan Zhao <Haifeng.Zhao@intel.com>
>
> Commit e8c7d14ac6c3 ("block: revert back to synchronous request_queue
> removal") introduced a panic on NVMe hotplug as follows (hit after just
> two NVMe SSD hotplugs on stable 5.9-rc2):

I'm pretty sure Ming has already fixed this by not calling put_device
from the RCU callback, which is the real issue here.  And even if that
wasn't the case, we generally try to fix bugs instead of resorting to a
blind revert without much analysis.

>
> BUG: sleeping function called from invalid context at block/genhd.c:1563
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/30
> INFO: lockdep is turned off.
> CPU: 30 PID: 0 Comm: swapper/30 Tainted: G S W 5.9.0-RC2 #3
> Hardware name: Intel Corporation xxxxxxxx
> Call Trace:
>  <IRQ>
>  dump_stack+0x9d/0xe0
>  ___might_sleep.cold.79+0x17f/0x1af
>  disk_release+0x26/0x200
>  device_release+0x6d/0x1c0
>  kobject_put+0x14d/0x430
>  hd_struct_free+0xfb/0x260
>  percpu_ref_switch_to_atomic_rcu+0x3d1/0x580
>  ? rcu_nocb_unlock_irqrestore+0xb6/0xf0
>  ? trace_hardirqs_on+0x20/0x1b5
>  ? rcu_do_batch+0x3ff/0xb50
>  rcu_do_batch+0x47c/0xb50
>  ? rcu_accelerate_cbs+0xa9/0x740
>  ? rcu_spawn_one_nocb_kthread+0x3d0/0x3d0
>  rcu_core+0x945/0xd90
>  ? __do_softirq+0x182/0xac0
>  __do_softirq+0x1ca/0xac0
>  asm_call_on_stack+0xf/0x20
>  </IRQ>
>  do_softirq_own_stack+0x7f/0x90
>  irq_exit_rcu+0x1e3/0x230
>  sysvec_apic_timer_interrupt+0x48/0xb0
>  asm_sysvec_apic_timer_interrupt+0x12/0x20
> RIP: 0010:cpuidle_enter_state+0x116/0xe90
> Code: 00 31 ff e8 ac c8 a4 fe 80 7c 24 10 00 74 12 9c 58 f6 c4 02
> 0f 85 7e 08 00 00 31 ff e8 43 ca be fe e8 ae a3 d5 fe fb 45 85 ed
> <0f> 88 4e 05 00 00 4d 63 f5 49 83 fe 09 0f 87 29 0b 00 00 4b 8d 04
> RSP: 0018:ff110001040dfd78 EFLAGS: 00000206
> RAX: 0000000000000007 RBX: ffd1fff7b1a01e00 RCX: 000000000000001f
> RDX: 0000000000000000 RSI: 0000000040000000 RDI: ffffffffb7c5c0f2
> RBP: ffffffffb9a416a0 R08: 0000000000000000 R09: 0000000000000000
> R10: ff110001040d0007 R11: ffe21c002081a000 R12: 0000000000000003
> R13: 0000000000000003 R14: 0000000000000138
> ... ...
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> PGD 0
> Oops: 0010 [#1] SMP KASAN NOPTI
> CPU: 30 PID: 500 Comm: irq/124-pciehp Tainted: G S W 5.9.0-RC2 #3
> Hardware name: Intel Corporation xxxxxxxx
> RIP: 0010:0x0
> Code: Bad RIP value.
> RSP: 0018:ff110007d5ba75e8 EFLAGS: 00010096
> RAX: 0000000000000000 RBX: ff110001040d0000 RCX: ff110007d5ba7668
> RDX: 0000000000000009 RSI: ff110001040d0000 RDI: ff110008119f59c0
> RBP: ff110008119f59c0 R08: fffffbfff73f7b4d R09: fffffbfff73f7b4d
> R10: ffffffffb9fbda67 R11: fffffbfff73f7b4c R12: 0000000000000000
> R13: ff110007d5ba7668 R14: ff110001040d0000 R15: ff110008119f59c0
> FS: 0000000000000000(0000) GS:ff11000811800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffffffffd6 CR3: 00000007cea16002 CR4: 0000000000771ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
>  ttwu_do_activate+0xd3/0x160
>  try_to_wake_up+0x652/0x1850
>  ? migrate_swap_stop+0xad0/0xad0
>  ? lock_contended+0xd70/0xd70
>  ? rcu_read_unlock+0x50/0x50
>  ? rcu_do_batch+0x3ff/0xb50
>  swake_up_locked+0x85/0x1c0
>  complete+0x4d/0x70
>  rcu_do_batch+0x47c/0xb50
>  ? rcu_spawn_one_nocb_kthread+0x3d0/0x3d0
>  rcu_core+0x945/0xd90
>  ? __do_softirq+0x182/0xac0
>  __do_softirq+0x1ca/0xac0
>  irq_exit_rcu+0x1e3/0x230
>  sysvec_apic_timer_interrupt+0x48/0xb0
>  asm_sysvec_apic_timer_interrupt+0x12/0x20
> RIP: 0010:_raw_spin_unlock_irqrestore+0x40/0x50
> Code: e8 35 ad 36 fe 48 89 ef e8 cd ce 37 fe f6 c7 02 75 11 53 9d
> e8 91 1f 5c fe 65 ff 0d ba af c2 47 5b 5d c3 e8 d2 22 5c fe 53 9d
> <eb> ed 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53
> RSP: 0018:ff110007d5ba79d0 EFLAGS: 00000293
> RAX: 0000000000000007 RBX: 0000000000000293 RCX: ffffffffb67710d4
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffb83f41ce
> RBP: ffffffffbb77e740 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffffffbb77e743 R11: fffffbfff76efce8 R12: 000000000000198e
> R13: ff11001031d7a0b0 R14: ffffffffbb77e740 R15: ffffffffbb77e788
>  ? do_raw_spin_unlock+0x54/0x230
>  ? _raw_spin_unlock_irqrestore+0x3e/0x50
>  dma_debug_device_change+0x150/0x5e0
>  notifier_call_chain+0x90/0x160
>  __blocking_notifier_call_chain+0x6d/0xa0
>  device_release_driver_internal+0x37d/0x490
>  pci_stop_bus_device+0x123/0x190
>  pci_stop_and_remove_bus_device+0xe/0x20
>  pciehp_unconfigure_device+0x17e/0x330
>  ? pciehp_configure_device+0x3e0/0x3e0
>  ? trace_hardirqs_on+0x20/0x1b5
>  pciehp_disable_slot+0x101/0x360
>  ? pme_is_native.cold.2+0x29/0x29
>  pciehp_handle_presence_or_link_change+0x1ac/0xee0
>  ? pciehp_handle_disable_request+0x110/0x110
>  pciehp_ist.cold.11+0x39/0x54
>  ? pciehp_set_indicators+0x190/0x190
>  ? alloc_desc+0x510/0xa30
>  ? irq_set_affinity_notifier+0x380/0x380
>  ? pciehp_set_indicators+0x190/0x190
>  ? irq_thread+0x137/0x420
>  irq_thread_fn+0x86/0x150
>  irq_thread+0x21f/0x420
>  ? irq_forced_thread_fn+0x170/0x170
>  ? irq_thread_check_affinity+0x210/0x210
>  ? __kthread_parkme+0x52/0x1a0
>  ? lockdep_hardirqs_on_prepare+0x33e/0x4e0
>  ? _raw_spin_unlock_irqrestore+0x3e/0x50
>  ? trace_hardirqs_on+0x20/0x1b5
>  ? wake_threads_waitq+0x40/0x40
>  ? __kthread_parkme+0xd1/0x1a0
>  ? irq_thread_check_affinity+0x210/0x210
>  kthread+0x36a/0x430
>  ? kthread_create_worker_on_cpu+0xc0/0xc0
>  ret_from_fork+0x1f/0x30
> ... ...
> CR2: 0000000000000000
> ---[ end trace cedc4047ef91d2ec ]---
>
> It seems scheduling happened within hardware interrupt context. After
> reverting this patch, a stable 5.9-rc4 build was tested with more than
> 20 NVMe SSD hotplug cycles and no panic was found.
>
> This reverts commit e8c7d14ac6c37c173ec606907d38802b00302988.
>
> Tested-by: Shanshan Zhang <ShanshanX.Zhang@intel.com>
> Signed-off-by: Ethan Zhao <Haifeng.Zhao@intel.com>
> ---
>  block/blk-core.c       |  8 --------
>  block/blk-sysfs.c      | 43 +++++++++++++++++++++---------------------
>  block/genhd.c          | 17 -----------------
>  include/linux/blkdev.h |  2 ++
>  4 files changed, 23 insertions(+), 47 deletions(-)

---end quoted text---
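The pattern Christoph refers to, in which the last reference is never dropped from an RCU callback, looks roughly like the sketch below: the RCU callback runs in softirq context, so it only hands the object off to a workqueue, and the final put_device() happens in process context where the release method is allowed to sleep. The names (struct gadget, gadget_free_rcu(), gadget_free_work()) are illustrative only; this is not the actual block-layer code or the referenced fix.

/*
 * Sketch: defer the final put out of the RCU callback into a work item.
 */
#include <linux/device.h>
#include <linux/rcupdate.h>
#include <linux/workqueue.h>

struct gadget {
	struct device dev;		/* ->release() may sleep */
	struct rcu_head rcu;
	struct work_struct free_work;
};

static void gadget_free_work(struct work_struct *work)
{
	struct gadget *g = container_of(work, struct gadget, free_work);

	/* Process context: safe even if the release method sleeps. */
	put_device(&g->dev);
}

static void gadget_free_rcu(struct rcu_head *head)
{
	struct gadget *g = container_of(head, struct gadget, rcu);

	/*
	 * Softirq context: this may be the last reference, so do not call
	 * put_device() here.  Hand the object off to a workqueue instead.
	 */
	INIT_WORK(&g->free_work, gadget_free_work);
	schedule_work(&g->free_work);
}

/* Callers (any context) retire the object after an RCU grace period. */
static void gadget_retire(struct gadget *g)
{
	call_rcu(&g->rcu, gadget_free_rcu);
}

This is exactly the constraint the splat above documents: disk_release() may sleep, but the final kobject_put() was reached from percpu_ref_switch_to_atomic_rcu(), i.e. from an RCU callback.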
On 2020-09-08 00:50, Ethan Zhao wrote:
> From: Ethan Zhao <Haifeng.Zhao@intel.com>
>
> Commit e8c7d14ac6c3 ("block: revert back to synchronous request_queue
> removal") introduced a panic on NVMe hotplug as follows (hit after just
> two NVMe SSD hotplugs on stable 5.9-rc2):

Please fix this in the NVMe driver instead of reverting commit
e8c7d14ac6c3.

Thanks,

Bart.
Ming, Christoph,

    Could you point out the patch aimed at fixing this issue? I would like
to try it. This issue has blocked my other PCI patch development and
verification work. I am not a block/NVMe expert and don't want to be
trapped in other subsystems' bugs, so I just reported it for a quick fix
by the experts.

Thanks,
Ethan

-----Original Message-----
From: Christoph Hellwig <hch@infradead.org>
Sent: Tuesday, September 8, 2020 10:21 PM
To: Zhao, Haifeng <haifeng.zhao@intel.com>
Cc: axboe@kernel.dk; bhelgaas@google.com; linux-block@vger.kernel.org; linux-kernel@vger.kernel.org; linux-pci@vger.kernel.org; mcgrof@kernel.org; Zhang, ShanshanX <shanshanx.zhang@intel.com>; Jia, Pei P <pei.p.jia@intel.com>; Ming Lei <ming.lei@redhat.com>
Subject: Re: [PATCH] Revert "block: revert back to synchronous request_queue removal"

On Tue, Sep 08, 2020 at 03:50:48AM -0400, Ethan Zhao wrote:
> From: Ethan Zhao <Haifeng.Zhao@intel.com>
>
> Commit e8c7d14ac6c3 ("block: revert back to synchronous request_queue
> removal") introduced a panic on NVMe hotplug as follows (hit after just
> two NVMe SSD hotplugs on stable 5.9-rc2):

I'm pretty sure Ming has already fixed this by not calling put_device
from the RCU callback, which is the real issue here.  And even if that
wasn't the case, we generally try to fix bugs instead of resorting to a
blind revert without much analysis.
Hello Haifeng,

On Wed, Sep 09, 2020 at 02:11:20AM +0000, Zhao, Haifeng wrote:
> Ming, Christoph,
>     Could you point out the patch aimed at fixing this issue? I would like
> to try it. This issue has blocked my other PCI patch development and
> verification work. I am not a block/NVMe expert and don't want to be
> trapped in other subsystems' bugs, so I just reported it for a quick fix
> by the experts.
>
Please try the following patch:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cafe01ef8fcb248583038e1be071383530fe355a

Thanks,
Ming
Ming,

Got it, will try it and give you feedback.

Thanks,

-----Original Message-----
From: Ming Lei <ming.lei@redhat.com>
Sent: Wednesday, September 9, 2020 10:39 AM
To: Zhao, Haifeng <haifeng.zhao@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>; axboe@kernel.dk; bhelgaas@google.com; linux-block@vger.kernel.org; linux-kernel@vger.kernel.org; linux-pci@vger.kernel.org; mcgrof@kernel.org; Zhang, ShanshanX <shanshanx.zhang@intel.com>; Jia, Pei P <pei.p.jia@intel.com>
Subject: Re: [PATCH] Revert "block: revert back to synchronous request_queue removal"

Hello Haifeng,

On Wed, Sep 09, 2020 at 02:11:20AM +0000, Zhao, Haifeng wrote:
> Ming, Christoph,
>     Could you point out the patch aimed at fixing this issue? I would like
> to try it. This issue has blocked my other PCI patch development and
> verification work. I am not a block/NVMe expert and don't want to be
> trapped in other subsystems' bugs, so I just reported it for a quick fix
> by the experts.
>
Please try the following patch:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cafe01ef8fcb248583038e1be071383530fe355a

Thanks,
Ming
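When re-testing hotplug against the referenced fix, it is worth keeping CONFIG_DEBUG_ATOMIC_SLEEP and lockdep enabled: the "sleeping function called from invalid context" report at the top of this thread is produced by exactly the kind of might_sleep() annotation that commit e8c7d14ac6c3 added to the block release paths, and it turns a wrong-context final put into an immediate, attributable splat rather than a later use-after-free. A minimal illustration of that annotation pattern follows; example_release() and example_teardown_lock are made-up names, not kernel APIs.

#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(example_teardown_lock);	/* hypothetical */

static void example_release(struct device *dev)
{
	/* Cannot be called from atomic context. */
	might_sleep();

	mutex_lock(&example_teardown_lock);	/* sleeping is legal here */
	/* ... free the resources associated with @dev ... */
	mutex_unlock(&example_teardown_lock);
}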
diff --git a/block/blk-core.c b/block/blk-core.c
index 10c08ac50697..1b18a0ef5db1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -325,9 +325,6 @@ EXPORT_SYMBOL_GPL(blk_clear_pm_only);
  *
  * Decrements the refcount of the request_queue kobject. When this reaches 0
  * we'll have blk_release_queue() called.
- *
- * Context: Any context, but the last reference must not be dropped from
- *	    atomic context.
  */
 void blk_put_queue(struct request_queue *q)
 {
@@ -360,14 +357,9 @@ EXPORT_SYMBOL_GPL(blk_set_queue_dying);
  *
  * Mark @q DYING, drain all pending requests, mark @q DEAD, destroy and
  * put it. All future requests will be failed immediately with -ENODEV.
- *
- * Context: can sleep
  */
 void blk_cleanup_queue(struct request_queue *q)
 {
-	/* cannot be called from atomic context */
-	might_sleep();
-
 	WARN_ON_ONCE(blk_queue_registered(q));
 
 	/* mark @q DYING, no new request or merges will be allowed afterwards */
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 7dda709f3ccb..eb347cbe0f93 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -901,32 +901,22 @@ static void blk_exit_queue(struct request_queue *q)
 	bdi_put(q->backing_dev_info);
 }
 
+
 /**
- * blk_release_queue - releases all allocated resources of the request_queue
- * @kobj: pointer to a kobject, whose container is a request_queue
- *
- * This function releases all allocated resources of the request queue.
- *
- * The struct request_queue refcount is incremented with blk_get_queue() and
- * decremented with blk_put_queue(). Once the refcount reaches 0 this function
- * is called.
- *
- * For drivers that have a request_queue on a gendisk and added with
- * __device_add_disk() the refcount to request_queue will reach 0 with
- * the last put_disk() called by the driver. For drivers which don't use
- * __device_add_disk() this happens with blk_cleanup_queue().
+ * __blk_release_queue - release a request queue
+ * @work: pointer to the release_work member of the request queue to be released
  *
- * Drivers exist which depend on the release of the request_queue to be
- * synchronous, it should not be deferred.
- *
- * Context: can sleep
+ * Description:
+ *     This function is called when a block device is being unregistered. The
+ *     process of releasing a request queue starts with blk_cleanup_queue, which
+ *     set the appropriate flags and then calls blk_put_queue, that decrements
+ *     the reference counter of the request queue. Once the reference counter
+ *     of the request queue reaches zero, blk_release_queue is called to release
+ *     all allocated resources of the request queue.
  */
-static void blk_release_queue(struct kobject *kobj)
+static void __blk_release_queue(struct work_struct *work)
 {
-	struct request_queue *q =
-		container_of(kobj, struct request_queue, kobj);
-
-	might_sleep();
+	struct request_queue *q = container_of(work, typeof(*q), release_work);
 
 	if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags))
 		blk_stat_remove_callback(q, q->poll_cb);
@@ -958,6 +948,15 @@ static void blk_release_queue(struct kobject *kobj)
 	call_rcu(&q->rcu_head, blk_free_queue_rcu);
 }
 
+static void blk_release_queue(struct kobject *kobj)
+{
+	struct request_queue *q =
+		container_of(kobj, struct request_queue, kobj);
+
+	INIT_WORK(&q->release_work, __blk_release_queue);
+	schedule_work(&q->release_work);
+}
+
 static const struct sysfs_ops queue_sysfs_ops = {
 	.show	= queue_attr_show,
 	.store	= queue_attr_store,
diff --git a/block/genhd.c b/block/genhd.c
index 99c64641c314..7e2edf388c8a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -887,19 +887,12 @@ static void invalidate_partition(struct gendisk *disk, int partno)
  * The final removal of the struct gendisk happens when its refcount reaches 0
  * with put_disk(), which should be called after del_gendisk(), if
  * __device_add_disk() was used.
- *
- * Drivers exist which depend on the release of the gendisk to be synchronous,
- * it should not be deferred.
- *
- * Context: can sleep
  */
 void del_gendisk(struct gendisk *disk)
 {
 	struct disk_part_iter piter;
 	struct hd_struct *part;
 
-	might_sleep();
-
 	blk_integrity_del(disk);
 	disk_del_events(disk);
 
@@ -1553,15 +1546,11 @@ int disk_expand_part_tbl(struct gendisk *disk, int partno)
  * drivers we also call blk_put_queue() for them, and we expect the
  * request_queue refcount to reach 0 at this point, and so the request_queue
  * will also be freed prior to the disk.
- *
- * Context: can sleep
  */
 static void disk_release(struct device *dev)
 {
 	struct gendisk *disk = dev_to_disk(dev);
 
-	might_sleep();
-
 	blk_free_devt(dev->devt);
 	disk_release_events(disk);
 	kfree(disk->random);
@@ -1806,9 +1795,6 @@ EXPORT_SYMBOL(get_disk_and_module);
  *
  * This decrements the refcount for the struct gendisk. When this reaches 0
  * we'll have disk_release() called.
- *
- * Context: Any context, but the last reference must not be dropped from
- *	    atomic context.
  */
 void put_disk(struct gendisk *disk)
 {
@@ -1823,9 +1809,6 @@ EXPORT_SYMBOL(put_disk);
  *
  * This is a counterpart of get_disk_and_module() and thus also of
  * get_gendisk().
- *
- * Context: Any context, but the last reference must not be dropped from
- *	    atomic context.
  */
 void put_disk_and_module(struct gendisk *disk)
 {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bb5636cc17b9..59fe9de342e0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -583,6 +583,8 @@ struct request_queue {
 
 	size_t			cmd_size;
 
+	struct work_struct	release_work;
+
 #define BLK_MAX_WRITE_HINTS	5
 	u64			write_hints[BLK_MAX_WRITE_HINTS];
 };
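For context, the mechanism this revert restores is the usual "defer the kobject release to a workqueue" pattern: the ->release() callback may be invoked from whatever context drops the last reference, so it only queues a work item, and the sleeping teardown runs later in process context. A generic sketch of the pattern, using illustrative names (struct widget) rather than the request_queue code itself:

#include <linux/kobject.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct widget {
	struct kobject kobj;
	struct work_struct release_work;
};

static void widget_release_workfn(struct work_struct *work)
{
	struct widget *w = container_of(work, struct widget, release_work);

	/* Process context: teardown that sleeps is fine here. */
	kfree(w);
}

static void widget_release(struct kobject *kobj)
{
	struct widget *w = container_of(kobj, struct widget, kobj);

	/* May run from atomic context; defer the real teardown. */
	INIT_WORK(&w->release_work, widget_release_workfn);
	schedule_work(&w->release_work);
}

/* Registered by the owner via kobject_init_and_add(). */
static struct kobj_type widget_ktype = {
	.release = widget_release,
};

The trade-off, spelled out in the comments this revert deletes, is that teardown becomes asynchronous, while some drivers depend on the request_queue and gendisk being released synchronously; that expectation is why commit e8c7d14ac6c3 moved back to the synchronous scheme in the first place.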