scsi: take module reference during async scan

Message ID	20200907154745.20145-1-thenzl@redhat.com (mailing list archive)
State	Changes Requested
Headers	show Return-Path: <SRS0=wipa=CQ=vger.kernel.org=linux-scsi-owner@kernel.org> From: Tomas Henzl <thenzl@redhat.com> To: linux-scsi@vger.kernel.org Subject: [PATCH] scsi: take module reference during async scan Date: Mon, 7 Sep 2020 17:47:45 +0200 Message-Id: <20200907154745.20145-1-thenzl@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk
Series	scsi: take module reference during async scan \| expand scsi: take module reference during async scan

Tomas Henzl Sept. 7, 2020, 3:47 p.m. UTC

During an async scan the driver shost->hostt structures are used,
that may cause issues when the driver is removed at that time.
As protection take the module reference.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
---
 drivers/scsi/scsi_scan.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Bart Van Assche Sept. 7, 2020, 4:57 p.m. UTC | #1

On 2020-09-07 08:47, Tomas Henzl wrote:
> During an async scan the driver shost->hostt structures are used,
> that may cause issues when the driver is removed at that time.
> As protection take the module reference.
> 
> Signed-off-by: Tomas Henzl <thenzl@redhat.com>
> ---
>  drivers/scsi/scsi_scan.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index f2437a757..c9cc0862c 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1825,6 +1825,8 @@ static void do_scan_async(void *_data, async_cookie_t c)
>  
>  	do_scsi_scan_host(shost);
>  	scsi_finish_async_scan(data);
> +
> +	module_put(shost->hostt->module);
>  }
>  
>  /**
> @@ -1848,6 +1850,12 @@ void scsi_scan_host(struct Scsi_Host *shost)
>  		return;
>  	}
>  
> +	/* protection against surprise driver removal
> +	 * module_put is called from do_scan_async
> +	 */
> +	if (!try_module_get(shost->hostt->module))
> +		return;
> +
>  	/* register with the async subsystem so wait_for_device_probe()
>  	 * will flush this work
>  	 */

Shouldn't scsi_autopm_put_host(shost) be called if try_module_get() fails?

Please also update the following comment in scsi_scan_host():

	/* scsi_autopm_put_host(shost) is called in scsi_finish_async_scan() */

Thanks,

Bart.

James Bottomley Sept. 7, 2020, 5:46 p.m. UTC | #2

On Mon, 2020-09-07 at 17:47 +0200, Tomas Henzl wrote:
> During an async scan the driver shost->hostt structures are used,
> that may cause issues when the driver is removed at that time.
> As protection take the module reference.

Can I just ask what issues?  Today, our module model is that
scsi_device_get() bumps the module refcount and therefore makes the
module ineligible to be removed.  scsi_host_get() doesn't do this
because the way the host model is supposed to be coded, we can call
remove at any time but the module won't get freed until the last put of
the host.  I can see we have a potential problem with
scsi_forget_host() racing with the async scan thread ... is that what
you see? What's supposed to happen is that scsi_device_get() starts
failing as soon as the module begins it's exit routine, so if a scan is
in progress, it can't add any new devices ... in theory this means that
the list is stable for scsi_forget_host(), so knowing how that
assumption is breaking would be useful.

James

Tomas Henzl Sept. 7, 2020, 8:09 p.m. UTC | #3

On 9/7/20 7:46 PM, James Bottomley wrote:
> On Mon, 2020-09-07 at 17:47 +0200, Tomas Henzl wrote:
>> During an async scan the driver shost->hostt structures are used,
>> that may cause issues when the driver is removed at that time.
>> As protection take the module reference.
> 
> Can I just ask what issues?  Today, our module model is that
> scsi_device_get() bumps the module refcount and therefore makes the
> module ineligible to be removed.  scsi_host_get() doesn't do this
> because the way the host model is supposed to be coded, we can call
> remove at any time but the module won't get freed until the last put of
> the host.  I can see we have a potential problem with
> scsi_forget_host() racing with the async scan thread ... is that what
> you see? What's supposed to happen is that scsi_device_get() starts
> failing as soon as the module begins it's exit routine, so if a scan is
> in progress, it can't add any new devices ... in theory this means that
> the list is stable for scsi_forget_host(), so knowing how that
> assumption is breaking would be useful.

I think that the problem is that async scan uses callbacks to the module
and when the module is being removed during scan it is not protected.

modprobe mpt3sas && rmmod  mpt3sas

[  370.031614] INFO: task rmmod:3120 blocked for more than 120 seconds.
[  370.037967]       Not tainted 4.18.0-193.el8.x86_64 #1
[  370.043105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  370.050931] rmmod           D    0  3120   2460 0x00004080
[  370.056414] Call Trace:
[  370.058889]  ? __schedule+0x24f/0x650
[  370.062554]  schedule+0x2f/0xa0
[  370.065738]  async_synchronize_cookie_domain+0xad/0x140
[  370.070983]  ? finish_wait+0x80/0x80
[  370.074580]  __x64_sys_delete_module+0x166/0x280
[  370.079198]  do_syscall_64+0x5b/0x1a0
[  370.082876]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  370.087946] RIP: 0033:0x7f6de460a7db
[  370.091534] Code: Bad RIP value.
[  370.094777] RSP: 002b:00007ffe9971e798 EFLAGS: 00000206 ORIG_RAX:
00000000000000b0
[  370.102341] RAX: ffffffffffffffda RBX: 00005592370d37b0 RCX:
00007f6de460a7db
[  370.109481] RDX: 000000000000000a RSI: 0000000000000800 RDI:
00005592370d3818
[  370.116606] RBP: 0000000000000000 R08: 00007ffe9971d711 R09:
0000000000000000
[  370.123748] R10: 00007f6de467c8e0 R11: 0000000000000206 R12:
00007ffe9971e9c0
[  370.130888] R13: 00007ffe99720333 R14: 00005592370d32a0 R15:
00005592370d37b0


> 
> James
>

James Bottomley Sept. 7, 2020, 8:24 p.m. UTC | #4

On Mon, 2020-09-07 at 22:09 +0200, Tomas Henzl wrote:
> On 9/7/20 7:46 PM, James Bottomley wrote:
> > On Mon, 2020-09-07 at 17:47 +0200, Tomas Henzl wrote:
> > > During an async scan the driver shost->hostt structures are used,
> > > that may cause issues when the driver is removed at that time.
> > > As protection take the module reference.
> > 
> > Can I just ask what issues?  Today, our module model is that
> > scsi_device_get() bumps the module refcount and therefore makes the
> > module ineligible to be removed.  scsi_host_get() doesn't do this
> > because the way the host model is supposed to be coded, we can call
> > remove at any time but the module won't get freed until the last
> > put of the host.  I can see we have a potential problem with
> > scsi_forget_host() racing with the async scan thread ... is that
> > what you see? What's supposed to happen is that scsi_device_get()
> > starts failing as soon as the module begins it's exit routine, so
> > if a scan is in progress, it can't add any new devices ... in
> > theory this means that the list is stable for scsi_forget_host(),
> > so knowing how that assumption is breaking would be useful.
> 
> I think that the problem is that async scan uses callbacks to the
> module and when the module is being removed during scan it is not
> protected.

As I said above: the module shouldn't be freed until the scans are
completed or aborted ... I don't think we have a use after free
problem.  What you show below seems to be a deadlock:

> modprobe mpt3sas && rmmod  mpt3sas
> 
> [  370.031614] INFO: task rmmod:3120 blocked for more than 120
> seconds.
> [  370.037967]       Not tainted 4.18.0-193.el8.x86_64 #1
> [  370.043105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  370.050931] rmmod           D    0  3120   2460 0x00004080
> [  370.056414] Call Trace:
> [  370.058889]  ? __schedule+0x24f/0x650
> [  370.062554]  schedule+0x2f/0xa0
> [  370.065738]  async_synchronize_cookie_domain+0xad/0x140
> [  370.070983]  ? finish_wait+0x80/0x80
> [  370.074580]  __x64_sys_delete_module+0x166/0x280
> [  370.079198]  do_syscall_64+0x5b/0x1a0
> [  370.082876]  entry_SYSCALL_64_after_hwframe+0x65/0xca
> [  370.087946] RIP: 0033:0x7f6de460a7db
> [  370.091534] Code: Bad RIP value.
> [  370.094777] RSP: 002b:00007ffe9971e798 EFLAGS: 00000206 ORIG_RAX:
> 00000000000000b0
> [  370.102341] RAX: ffffffffffffffda RBX: 00005592370d37b0 RCX:
> 00007f6de460a7db
> [  370.109481] RDX: 000000000000000a RSI: 0000000000000800 RDI:
> 00005592370d3818
> [  370.116606] RBP: 0000000000000000 R08: 00007ffe9971d711 R09:
> 0000000000000000
> [  370.123748] R10: 00007f6de467c8e0 R11: 0000000000000206 R12:
> 00007ffe9971e9c0
> [  370.130888] R13: 00007ffe99720333 R14: 00005592370d32a0 R15:
> 00005592370d37b0

This seems to be showing something different: I think the
async_synchronize_full() in delete_module is where we're stuck. That
seems to indicate something has just stopped inside the async scan code
... likely due to something reacting badly to scsi_device_get()
failing.

James

Tomas Henzl Sept. 7, 2020, 9:02 p.m. UTC | #5

On 9/7/20 10:24 PM, James Bottomley wrote:
> On Mon, 2020-09-07 at 22:09 +0200, Tomas Henzl wrote:
>> On 9/7/20 7:46 PM, James Bottomley wrote:
>>> On Mon, 2020-09-07 at 17:47 +0200, Tomas Henzl wrote:
>>>> During an async scan the driver shost->hostt structures are used,
>>>> that may cause issues when the driver is removed at that time.
>>>> As protection take the module reference.
>>>
>>> Can I just ask what issues?  Today, our module model is that
>>> scsi_device_get() bumps the module refcount and therefore makes the
>>> module ineligible to be removed.  scsi_host_get() doesn't do this
>>> because the way the host model is supposed to be coded, we can call
>>> remove at any time but the module won't get freed until the last
>>> put of the host.  I can see we have a potential problem with
>>> scsi_forget_host() racing with the async scan thread ... is that
>>> what you see? What's supposed to happen is that scsi_device_get()
>>> starts failing as soon as the module begins it's exit routine, so
>>> if a scan is in progress, it can't add any new devices ... in
>>> theory this means that the list is stable for scsi_forget_host(),
>>> so knowing how that assumption is breaking would be useful.
>>
>> I think that the problem is that async scan uses callbacks to the
>> module and when the module is being removed during scan it is not
>> protected.
> 
> As I said above: the module shouldn't be freed until the scans are
> completed or aborted ... I don't think we have a use after free
> problem.  What you show below seems to be a deadlock:
> 
>> modprobe mpt3sas && rmmod  mpt3sas
>>
>> [  370.031614] INFO: task rmmod:3120 blocked for more than 120
>> seconds.
>> [  370.037967]       Not tainted 4.18.0-193.el8.x86_64 #1
>> [  370.043105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [  370.050931] rmmod           D    0  3120   2460 0x00004080
>> [  370.056414] Call Trace:
>> [  370.058889]  ? __schedule+0x24f/0x650
>> [  370.062554]  schedule+0x2f/0xa0
>> [  370.065738]  async_synchronize_cookie_domain+0xad/0x140
>> [  370.070983]  ? finish_wait+0x80/0x80
>> [  370.074580]  __x64_sys_delete_module+0x166/0x280
>> [  370.079198]  do_syscall_64+0x5b/0x1a0
>> [  370.082876]  entry_SYSCALL_64_after_hwframe+0x65/0xca
>> [  370.087946] RIP: 0033:0x7f6de460a7db
>> [  370.091534] Code: Bad RIP value.
>> [  370.094777] RSP: 002b:00007ffe9971e798 EFLAGS: 00000206 ORIG_RAX:
>> 00000000000000b0
>> [  370.102341] RAX: ffffffffffffffda RBX: 00005592370d37b0 RCX:
>> 00007f6de460a7db
>> [  370.109481] RDX: 000000000000000a RSI: 0000000000000800 RDI:
>> 00005592370d3818
>> [  370.116606] RBP: 0000000000000000 R08: 00007ffe9971d711 R09:
>> 0000000000000000
>> [  370.123748] R10: 00007f6de467c8e0 R11: 0000000000000206 R12:
>> 00007ffe9971e9c0
>> [  370.130888] R13: 00007ffe99720333 R14: 00005592370d32a0 R15:
>> 00005592370d37b0
> 
> This seems to be showing something different: I think the
> async_synchronize_full() in delete_module is where we're stuck. That
> seems to indicate something has just stopped inside the async scan code
> ... likely due to something reacting badly to scsi_device_get()
> failing.
We may be protected by the async_synchronize_full waiting for probably
the  do_scan_async to end and that protects us from use after free - all
that seems to resolve after a longer time and the driver is removed in
the end.
Maybe the driver could react better to when its exit function is called
but what is wrong with keeping an additional module reference during the
scan process, the driver's exit function can't then be called from
module removal code at any time and there is no weird behavior?


> 
> James
>

Tomas Henzl Sept. 7, 2020, 9:12 p.m. UTC | #6

On 9/7/20 6:57 PM, Bart Van Assche wrote:
> On 2020-09-07 08:47, Tomas Henzl wrote:
>> During an async scan the driver shost->hostt structures are used,
>> that may cause issues when the driver is removed at that time.
>> As protection take the module reference.
>>
>> Signed-off-by: Tomas Henzl <thenzl@redhat.com>
>> ---
>>  drivers/scsi/scsi_scan.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
>> index f2437a757..c9cc0862c 100644
>> --- a/drivers/scsi/scsi_scan.c
>> +++ b/drivers/scsi/scsi_scan.c
>> @@ -1825,6 +1825,8 @@ static void do_scan_async(void *_data, async_cookie_t c)
>>  
>>  	do_scsi_scan_host(shost);
>>  	scsi_finish_async_scan(data);
>> +
>> +	module_put(shost->hostt->module);
>>  }
>>  
>>  /**
>> @@ -1848,6 +1850,12 @@ void scsi_scan_host(struct Scsi_Host *shost)
>>  		return;
>>  	}
>>  
>> +	/* protection against surprise driver removal
>> +	 * module_put is called from do_scan_async
>> +	 */
>> +	if (!try_module_get(shost->hostt->module))
>> +		return;
>> +
>>  	/* register with the async subsystem so wait_for_device_probe()
>>  	 * will flush this work
>>  	 */
> 
> Shouldn't scsi_autopm_put_host(shost) be called if try_module_get() fails?

Thanks. Yes it should, I'll post a V2 if James agrees to this patch in
general in a parallel thread.


> 
> Please also update the following comment in scsi_scan_host():
> 
> 	/* scsi_autopm_put_host(shost) is called in scsi_finish_async_scan() */
It's late here so I'm tired and I miss something but how should I update
it ?


Thanks,
Tomas
> 
> Thanks,
> 
> Bart.
> 
>

Douglas Gilbert Sept. 7, 2020, 9:32 p.m. UTC | #7

On 2020-09-07 1:46 p.m., James Bottomley wrote:
> On Mon, 2020-09-07 at 17:47 +0200, Tomas Henzl wrote:
>> During an async scan the driver shost->hostt structures are used,
>> that may cause issues when the driver is removed at that time.
>> As protection take the module reference.
> 
> Can I just ask what issues?  Today, our module model is that
> scsi_device_get() bumps the module refcount and therefore makes the
> module ineligible to be removed.  scsi_host_get() doesn't do this
> because the way the host model is supposed to be coded, we can call
> remove at any time but the module won't get freed until the last put of
> the host.  I can see we have a potential problem with
> scsi_forget_host() racing with the async scan thread ... is that what
> you see? What's supposed to happen is that scsi_device_get() starts
> failing as soon as the module begins it's exit routine, so if a scan is
> in progress, it can't add any new devices ... in theory this means that
> the list is stable for scsi_forget_host(), so knowing how that
> assumption is breaking would be useful.

James,
If you think it is bullet-proof try using CONFIG_DEBUG_TEST_DRIVER_REMOVE=y .
John Garry reported that:

  # insmod scsi_debug.ko

Gave errors like this:

[  140.115244] debugfs: Directory 'sde' with parent 'block' already present!
[  140.376426] debugfs: Directory 'sde' with parent 'block' already present!
[  140.420613] sd 3:0:0:0: [sde] tag#40 access beyond end of device
[  140.426655] blk_update_request: I/O error, dev sde, sector 15984 op 
0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[  140.437319] sd 3:0:0:0: [sde] tag#41 access beyond end of device
[  140.443368] blk_update_request: I/O error, dev sde, sector 15984 op 
0x0:(READ) flags 0x0 phys_seg 1 prio class 0
...

Which wasn't the scsi_debug driver directly as it doesn't use debugfs. So
I suspect something is rotten in the mid-level.

When I tried to replicate John's config I couldn't even boot my Ubuntu
20.04 based system (with a MKP kernel). Seemed to fail/lockup before any
kernel prints came out to the serial port (yes, still useful), perhaps in
initrd. I'm guessing another, non-SCSI module caused the lockup. So I
gave up and turned off that config setting.

Doug Gilbert

James Bottomley Sept. 7, 2020, 10:02 p.m. UTC | #8

On Mon, 2020-09-07 at 23:02 +0200, Tomas Henzl wrote:
> On 9/7/20 10:24 PM, James Bottomley wrote:
> > On Mon, 2020-09-07 at 22:09 +0200, Tomas Henzl wrote:
> > > On 9/7/20 7:46 PM, James Bottomley wrote:
> > > > On Mon, 2020-09-07 at 17:47 +0200, Tomas Henzl wrote:
> > > > > During an async scan the driver shost->hostt structures are
> > > > > used, that may cause issues when the driver is removed at
> > > > > that time. As protection take the module reference.
> > > > 
> > > > Can I just ask what issues?  Today, our module model is that
> > > > scsi_device_get() bumps the module refcount and therefore makes
> > > > the module ineligible to be removed.  scsi_host_get() doesn't
> > > > do this because the way the host model is supposed to be coded,
> > > > we can call remove at any time but the module won't get freed
> > > > until the last put of the host.  I can see we have a potential
> > > > problem with scsi_forget_host() racing with the async scan
> > > > thread ... is that what you see? What's supposed to happen is
> > > > that scsi_device_get() starts failing as soon as the module
> > > > begins it's exit routine, so if a scan is in progress, it can't
> > > > add any new devices ... in theory this means that the list is
> > > > stable for scsi_forget_host(), so knowing how that assumption
> > > > is breaking would be useful.
> > > 
> > > I think that the problem is that async scan uses callbacks to the
> > > module and when the module is being removed during scan it is not
> > > protected.
> > 
> > As I said above: the module shouldn't be freed until the scans are
> > completed or aborted ... I don't think we have a use after free
> > problem.  What you show below seems to be a deadlock:
> > 
> > > modprobe mpt3sas && rmmod  mpt3sas
> > > 
> > > [  370.031614] INFO: task rmmod:3120 blocked for more than 120
> > > seconds.
> > > [  370.037967]       Not tainted 4.18.0-193.el8.x86_64 #1
> > > [  370.043105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables this message.
> > > [  370.050931] rmmod           D    0  3120   2460 0x00004080
> > > [  370.056414] Call Trace:
> > > [  370.058889]  ? __schedule+0x24f/0x650
> > > [  370.062554]  schedule+0x2f/0xa0
> > > [  370.065738]  async_synchronize_cookie_domain+0xad/0x140
> > > [  370.070983]  ? finish_wait+0x80/0x80
> > > [  370.074580]  __x64_sys_delete_module+0x166/0x280
> > > [  370.079198]  do_syscall_64+0x5b/0x1a0
> > > [  370.082876]  entry_SYSCALL_64_after_hwframe+0x65/0xca
> > > [  370.087946] RIP: 0033:0x7f6de460a7db
> > > [  370.091534] Code: Bad RIP value.
> > > [  370.094777] RSP: 002b:00007ffe9971e798 EFLAGS: 00000206
> > > ORIG_RAX:
> > > 00000000000000b0
> > > [  370.102341] RAX: ffffffffffffffda RBX: 00005592370d37b0 RCX:
> > > 00007f6de460a7db
> > > [  370.109481] RDX: 000000000000000a RSI: 0000000000000800 RDI:
> > > 00005592370d3818
> > > [  370.116606] RBP: 0000000000000000 R08: 00007ffe9971d711 R09:
> > > 0000000000000000
> > > [  370.123748] R10: 00007f6de467c8e0 R11: 0000000000000206 R12:
> > > 00007ffe9971e9c0
> > > [  370.130888] R13: 00007ffe99720333 R14: 00005592370d32a0 R15:
> > > 00005592370d37b0
> > 
> > This seems to be showing something different: I think the
> > async_synchronize_full() in delete_module is where we're stuck.
> > That seems to indicate something has just stopped inside the async
> > scan code ... likely due to something reacting badly to
> > scsi_device_get() failing.
> 
> We may be protected by the async_synchronize_full waiting for
> probably the  do_scan_async to end and that protects us from use
> after free - all that seems to resolve after a longer time and the
> driver is removed in the end.

OK, so the above isn't actually a deadlock?  I was just assuming that
because 120s seems rather a long time for a SAS scan.  If it actually
eventually returns everything seems to be working correctly ... unless
it's still taking longer than an actual scan would?

> Maybe the driver could react better to when its exit function is
> called but what is wrong with keeping an additional module reference
> during the scan process, the driver's exit function can't then be
> called from module removal code at any time and there is no weird
> behavior?

Well it alters the behaviour in two ways: firstly because now you're
forced to wait for an entire host scan to complete once you start it,
you can't cancel it as you can today by removing the module; and
secondly it will be a behaviour change:  Today you can call rmmod at
any time until something pins the host either by opening a tape or
mounting a disk at which point delete_modul() fails with -EBUSY.  After
the patch you propose it will also fail with -EBUSY from the moment
scanning starts until the moment it finishes.  I'm not convinced
anything would actually notice either of these, but it is a behaviour
change.

James

James Bottomley Sept. 7, 2020, 10:56 p.m. UTC | #9

On Mon, 2020-09-07 at 17:32 -0400, Douglas Gilbert wrote:
> On 2020-09-07 1:46 p.m., James Bottomley wrote:
> > On Mon, 2020-09-07 at 17:47 +0200, Tomas Henzl wrote:
> > > During an async scan the driver shost->hostt structures are used,
> > > that may cause issues when the driver is removed at that time.
> > > As protection take the module reference.
> > 
> > Can I just ask what issues?  Today, our module model is that
> > scsi_device_get() bumps the module refcount and therefore makes the
> > module ineligible to be removed.  scsi_host_get() doesn't do this
> > because the way the host model is supposed to be coded, we can call
> > remove at any time but the module won't get freed until the last
> > put of the host.  I can see we have a potential problem with
> > scsi_forget_host() racing with the async scan thread ... is that
> > what you see? What's supposed to happen is that scsi_device_get()
> > starts failing as soon as the module begins it's exit routine, so
> > if a scan is in progress, it can't add any new devices ... in
> > theory this means that the list is stable for scsi_forget_host(),
> > so knowing how that assumption is breaking would be useful.
> 
> James,
> If you think it is bullet-proof try using 

I'm not saying it's got no bugs, just that the above is the way it's
supposed to work.

> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y .

The problem with this option is it basically gives you a thundering
herd of removal reinsertions ... trying to do it for a single driver
(or set of drivers) is likely a better way to get actionable debugging
information.

> John Garry reported that:
> 
>   # insmod scsi_debug.ko
> 
> Gave errors like this:
> 
> [  140.115244] debugfs: Directory 'sde' with parent 'block' already
> present!
> [  140.376426] debugfs: Directory 'sde' with parent 'block' already
> present!
> [  140.420613] sd 3:0:0:0: [sde] tag#40 access beyond end of device
> [  140.426655] blk_update_request: I/O error, dev sde, sector 15984
> op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [  140.437319] sd 3:0:0:0: [sde] tag#41 access beyond end of device
> [  140.443368] blk_update_request: I/O error, dev sde, sector 15984
> op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> ...
> 
> Which wasn't the scsi_debug driver directly as it doesn't use
> debugfs. So I suspect something is rotten in the mid-level.
> 
> When I tried to replicate John's config I couldn't even boot my
> Ubuntu 20.04 based system (with a MKP kernel). Seemed to fail/lockup
> before any kernel prints came out to the serial port (yes, still
> useful), perhaps in initrd. I'm guessing another, non-SCSI module
> caused the lockup. So I gave up and turned off that config setting.

If that can be distilled down to a better test case, I can look into
it.

James

Tomas Henzl Sept. 8, 2020, 8:22 a.m. UTC | #10

On 9/8/20 12:02 AM, James Bottomley wrote:
> On Mon, 2020-09-07 at 23:02 +0200, Tomas Henzl wrote:
>> On 9/7/20 10:24 PM, James Bottomley wrote:
>>> On Mon, 2020-09-07 at 22:09 +0200, Tomas Henzl wrote:
>>>> On 9/7/20 7:46 PM, James Bottomley wrote:
>>>>> On Mon, 2020-09-07 at 17:47 +0200, Tomas Henzl wrote:
>>>>>> During an async scan the driver shost->hostt structures are
>>>>>> used, that may cause issues when the driver is removed at
>>>>>> that time. As protection take the module reference.
>>>>>
>>>>> Can I just ask what issues?  Today, our module model is that
>>>>> scsi_device_get() bumps the module refcount and therefore makes
>>>>> the module ineligible to be removed.  scsi_host_get() doesn't
>>>>> do this because the way the host model is supposed to be coded,
>>>>> we can call remove at any time but the module won't get freed
>>>>> until the last put of the host.  I can see we have a potential
>>>>> problem with scsi_forget_host() racing with the async scan
>>>>> thread ... is that what you see? What's supposed to happen is
>>>>> that scsi_device_get() starts failing as soon as the module
>>>>> begins it's exit routine, so if a scan is in progress, it can't
>>>>> add any new devices ... in theory this means that the list is
>>>>> stable for scsi_forget_host(), so knowing how that assumption
>>>>> is breaking would be useful.
>>>>
>>>> I think that the problem is that async scan uses callbacks to the
>>>> module and when the module is being removed during scan it is not
>>>> protected.
>>>
>>> As I said above: the module shouldn't be freed until the scans are
>>> completed or aborted ... I don't think we have a use after free
>>> problem.  What you show below seems to be a deadlock:
>>>
>>>> modprobe mpt3sas && rmmod  mpt3sas
>>>>
>>>> [  370.031614] INFO: task rmmod:3120 blocked for more than 120
>>>> seconds.
>>>> [  370.037967]       Not tainted 4.18.0-193.el8.x86_64 #1
>>>> [  370.043105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>> disables this message.
>>>> [  370.050931] rmmod           D    0  3120   2460 0x00004080
>>>> [  370.056414] Call Trace:
>>>> [  370.058889]  ? __schedule+0x24f/0x650
>>>> [  370.062554]  schedule+0x2f/0xa0
>>>> [  370.065738]  async_synchronize_cookie_domain+0xad/0x140
>>>> [  370.070983]  ? finish_wait+0x80/0x80
>>>> [  370.074580]  __x64_sys_delete_module+0x166/0x280
>>>> [  370.079198]  do_syscall_64+0x5b/0x1a0
>>>> [  370.082876]  entry_SYSCALL_64_after_hwframe+0x65/0xca
>>>> [  370.087946] RIP: 0033:0x7f6de460a7db
>>>> [  370.091534] Code: Bad RIP value.
>>>> [  370.094777] RSP: 002b:00007ffe9971e798 EFLAGS: 00000206
>>>> ORIG_RAX:
>>>> 00000000000000b0
>>>> [  370.102341] RAX: ffffffffffffffda RBX: 00005592370d37b0 RCX:
>>>> 00007f6de460a7db
>>>> [  370.109481] RDX: 000000000000000a RSI: 0000000000000800 RDI:
>>>> 00005592370d3818
>>>> [  370.116606] RBP: 0000000000000000 R08: 00007ffe9971d711 R09:
>>>> 0000000000000000
>>>> [  370.123748] R10: 00007f6de467c8e0 R11: 0000000000000206 R12:
>>>> 00007ffe9971e9c0
>>>> [  370.130888] R13: 00007ffe99720333 R14: 00005592370d32a0 R15:
>>>> 00005592370d37b0
>>>
>>> This seems to be showing something different: I think the
>>> async_synchronize_full() in delete_module is where we're stuck.
>>> That seems to indicate something has just stopped inside the async
>>> scan code ... likely due to something reacting badly to
>>> scsi_device_get() failing.
>>
>> We may be protected by the async_synchronize_full waiting for
>> probably the  do_scan_async to end and that protects us from use
>> after free - all that seems to resolve after a longer time and the
>> driver is removed in the end.
> 
> OK, so the above isn't actually a deadlock?  I was just assuming that
> because 120s seems rather a long time for a SAS scan.  If it actually
> eventually returns everything seems to be working correctly ... unless
> it's still taking longer than an actual scan would?

The scan on this machine takes few seconds, the rmmod returns after
~300sec, that's not good.

> 
>> Maybe the driver could react better to when its exit function is
>> called but what is wrong with keeping an additional module reference
>> during the scan process, the driver's exit function can't then be
>> called from module removal code at any time and there is no weird
>> behavior?
> 
> Well it alters the behaviour in two ways: firstly because now you're
> forced to wait for an entire host scan to complete once you start it,
> you can't cancel it as you can today by removing the module; and
> secondly it will be a behaviour change:  Today you can call rmmod at
> any time until something pins the host either by opening a tape or
> mounting a disk at which point delete_modul() fails with -EBUSY.  After
> the patch you propose it will also fail with -EBUSY from the moment
> scanning starts until the moment it finishes.  I'm not convinced
> anything would actually notice either of these, but it is a behaviour
> change.

A changed behavior also makes me doubt about this patch so let's look
into it. Some drivers may shorten the scanning time but many just wait
till scanning ends - internal mpts3sas driver
in 'scsih_remove' has this:
while (ioc->is_driver_loading)
	ssleep(1);
also smartpqi in 'pqi_wait_until_scan_finished' seems to be just
waiting. (Not tested that and I also haven't looked at other drivers.)
When scan starts the rmmod may fail also today when at that moment
scsi_device_get took the module reference - we change with this patch
the probability (only) from not likely to for sure.

If we want fix this we could instead review all drivers. A fix in a
called function can't be complete, in this case for example driver
removal may start (unlikely)before it's scan_start is called .
I still think that the scan code should protect the driver as it is
easier compared to a protection in every driver.

tomash


> 
> James

>

John Garry Sept. 8, 2020, 2:04 p.m. UTC | #11

On 07/09/2020 22:32, Douglas Gilbert wrote:

Hi Doug,

>   # insmod scsi_debug.ko
> 
> Gave errors like this:
> 
> [  140.115244] debugfs: Directory 'sde' with parent 'block' already 
> present!

As an aside, I thought that this issue was fixed...

> [  140.376426] debugfs: Directory 'sde' with parent 'block' already 
> present!
> [  140.420613] sd 3:0:0:0: [sde] tag#40 access beyond end of device
> [  140.426655] blk_update_request: I/O error, dev sde, sector 15984 op 
> 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [  140.437319] sd 3:0:0:0: [sde] tag#41 access beyond end of device
> [  140.443368] blk_update_request: I/O error, dev sde, sector 15984 op 
> 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> ...
> 
> Which wasn't the scsi_debug driver directly as it doesn't use debugfs. So
> I suspect something is rotten in the mid-level.
> 
> When I tried to replicate John's config I couldn't even boot my Ubuntu
> 20.04 based system (with a MKP kernel). Seemed to fail/lockup before any
> kernel prints came out to the serial port (yes, still useful), perhaps in
> initrd. I'm guessing another, non-SCSI module caused the lockup. So I
> gave up and turned off that config setting.

You can also try this hack locally (without enabling that config), if 
you like:

--->8---

--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -496,6 +496,12 @@ static int really_probe(struct device *dev, struct 
device_driver *drv)
         int local_trigger_count = atomic_read(&deferred_trigger_count);
         bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
                            !drv->suppress_bind_attrs;

+       if (strcmp(drv->name, "sd") == 0)
+               test_remove = 1;
+       else if (strcmp(drv->name, "scsi_debug") == 0)
+               test_remove = 1;

---8<---

Cheers,
john

scsi: take module reference during async scan

Commit Message

Comments

Patch