
scsi: check for device state in __scsi_remove_target()

Message ID 1513171297-58020-1-git-send-email-hare@suse.de
State Accepted

Commit Message

Hannes Reinecke Dec. 13, 2017, 1:21 p.m. UTC
As it turned out, get_device() doesn't use kref_get_unless_zero(),
so we will always get a device pointer back, even for a device that
is already being deleted. So we need to check the device state in
__scsi_remove_target() to avoid tripping over deleted objects.

Fixes: fbce4d9 ("scsi: fixup kernel warning during rmmod()")

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/scsi_sysfs.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
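The core point of the commit message can be shown with a minimal userspace sketch. It assumes a simplified, single-threaded reference count, and all `model_*` names are hypothetical stand-ins rather than kernel APIs. `model_get_device()` mirrors how get_device() takes its reference through kobject_get() and kref_get(), which do not refuse a count of zero. Depending on configuration, the kernel's refcount_t may warn about an increment from zero, but the pointer is still returned. `model_get_device_unless_zero()` is a hypothetical checked variant with kref_get_unless_zero()-style semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified single-threaded model of a kref; NOT the kernel's
 * refcount_t-based implementation (no atomics, no warnings). */
struct model_kref {
	unsigned int refcount;
};

/* Stand-in for struct device: only the embedded reference count. */
struct model_device {
	struct model_kref kref;
};

/* Like kref_get(): increments unconditionally, even from zero. */
static void model_kref_get(struct model_kref *k)
{
	k->refcount++;
}

/* Like kref_get_unless_zero(): refuses to revive an object whose
 * count has already dropped to zero. */
static bool model_kref_get_unless_zero(struct model_kref *k)
{
	if (k->refcount == 0)
		return false;
	k->refcount++;
	return true;
}

/* Like kref_put() without a release callback. Dropping a reference
 * that was never taken is a bug. Returns true when the count hits 0. */
static bool model_kref_put(struct model_kref *k)
{
	assert(k->refcount > 0);
	return --k->refcount == 0;
}

/* Like get_device(): the result is NULL only when the input is NULL,
 * so "!get_device(dev)" can never detect a dying device. */
static struct model_device *model_get_device(struct model_device *dev)
{
	if (dev)
		model_kref_get(&dev->kref);
	return dev;
}

/* Hypothetical checked variant (not an existing kernel API) of the
 * kind discussed in the thread: returns NULL for a dying device. */
static struct model_device *
model_get_device_unless_zero(struct model_device *dev)
{
	if (!dev || !model_kref_get_unless_zero(&dev->kref))
		return NULL;
	return dev;
}
```

Given a device whose count has already dropped to zero, `model_get_device()` returns the pointer and revives the count to 1. The checked variant returns NULL and leaves the count at zero. This is why the `!get_device()` test in the original loop could never filter out a dying device.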

Comments

Bart Van Assche Dec. 13, 2017, 10:23 p.m. UTC | #1
On Wed, 2017-12-13 at 14:21 +0100, Hannes Reinecke wrote:
> As it turned out device_get() doesn't use kref_get_unless_zero(),
> so we will be always getting a device pointer.
> So we need to check for the device state in __scsi_remove_target()
> to avoid tripping over deleted objects.
> 
> Fixes: fbce4d9 ("scsi: fixup kernel warning during rmmod()")

How about adding Reported-by: Jason Yan? See also
https://www.spinics.net/lists/linux-scsi/msg115295.html

Anyway:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Jason Yan Dec. 14, 2017, 8:05 a.m. UTC | #2
On 2017/12/14 6:23, Bart Van Assche wrote:
> On Wed, 2017-12-13 at 14:21 +0100, Hannes Reinecke wrote:
>> As it turned out device_get() doesn't use kref_get_unless_zero(),
>> so we will be always getting a device pointer.
>> So we need to check for the device state in __scsi_remove_target()
>> to avoid tripping over deleted objects.
>>
>> Fixes: fbce4d9 ("scsi: fixup kernel warning during rmmod()")
>
> How about adding Reported-by: Jason Yan? See also
> https://www.spinics.net/lists/linux-scsi/msg115295.html
>
> Anyway:
>
> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
>

This seems to be the same as my patch. So how do we plan to fix this
issue: pick up this approach, or the approach James Bottomley suggested?
I have sent a patch to change get_device(), but Greg does not seem
to like that approach.
Hannes Reinecke Dec. 14, 2017, 9:02 a.m. UTC | #3
On 12/14/2017 09:05 AM, Jason Yan wrote:
> This seems to be the same as my patch. So how do we plan to fix this
> issue: pick up this approach, or the approach James Bottomley suggested?
> I have sent a patch to change get_device(), but Greg does not seem
> to like that approach.
> 
This is actually a real regression, which can be trivially triggered by,
e.g., logging out from two connections to an iSCSI target
(our QA tripped across that one).
So I'd rather have it fixed reasonably soon.

While changing get_device() is IMO the 'correct' solution, it surely
warrants a broader discussion, plus one would need to audit all callers
to check the return value. If we were going down that route, we should
probably add a __must_check to get_device(), too.
But again, this will probably drag out for quite some time, and I'd
prefer to have the fix in the meantime.

Cheers,

Hannes
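Hannes's __must_check suggestion can be sketched as follows. The sketch assumes a hypothetical checked getter that returns NULL for a dying device; this getter is not an existing kernel API. In the kernel, __must_check is defined in terms of the compiler's warn_unused_result attribute. The `must_check` macro below is a userspace stand-in, and all `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's __must_check annotation. */
#define must_check __attribute__((warn_unused_result))

struct model_device {
	unsigned int refcount; /* simplified, non-atomic */
};

/* Hypothetical checked get: NULL means the device is already dying.
 * Ignoring the return value triggers a compiler warning. */
static must_check struct model_device *
model_get_device_checked(struct model_device *dev)
{
	if (!dev || dev->refcount == 0)
		return NULL;
	dev->refcount++;
	return dev;
}

/* An audited caller: it branches on the result before using it. */
static bool model_try_use(struct model_device *dev)
{
	struct model_device *ref = model_get_device_checked(dev);

	if (!ref)
		return false; /* skip dying devices instead of using them */
	assert(ref == dev); /* a successful get returns the same object */
	/* ... work with ref, then drop the reference again ... */
	ref->refcount--;
	return true;
}
```

With the annotation, a caller that writes `model_get_device_checked(dev);` and discards the result gets a compiler warning. That warning is what would drive the audit of every get_device() caller mentioned above.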
Ewan Milne Dec. 14, 2017, 10:10 p.m. UTC | #4
On Thu, 2017-12-14 at 10:02 +0100, Hannes Reinecke wrote:
> This is actually a real regression, which can be trivially triggered by,
> e.g., logging out from two connections to an iSCSI target
> (our QA tripped across that one).
> So I'd rather have it fixed reasonably soon.

We have two reproducible test cases; this patch fixes one of them,
which was a continually oscillating FC target port with a short dev_loss_tmo.
I'm still waiting for a report on the iSCSI test. The code looks good.
We need to get some kind of fix for this sooner rather than later.

Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Ewan Milne Dec. 18, 2017, 2:38 p.m. UTC | #5
On Thu, 2017-12-14 at 17:10 -0500, Ewan D. Milne wrote:
> We have two reproducible test cases; this patch fixes one of them,
> which was a continually oscillating FC target port with a short dev_loss_tmo.
> I'm still waiting for a report on the iSCSI test. The code looks good.
> We need to get some kind of fix for this sooner rather than later.
> 
> Reviewed-by: Ewan D. Milne <emilne@redhat.com>

Report here is that Hannes's patch fixes our failing iSCSI test also.
Martin/James, can we get this in please?
Martin K. Petersen Dec. 19, 2017, 3:37 a.m. UTC | #6
Hannes,

> As it turned out device_get() doesn't use kref_get_unless_zero(),
> so we will be always getting a device pointer.
> So we need to check for the device state in __scsi_remove_target()
> to avoid tripping over deleted objects.

Applied to 4.15/scsi-fixes. Thanks!
Bart Van Assche Jan. 16, 2018, 4:11 p.m. UTC | #7
On Mon, 2017-12-18 at 22:37 -0500, Martin K. Petersen wrote:
> Hannes,
> 
> > As it turned out device_get() doesn't use kref_get_unless_zero(),
> > so we will be always getting a device pointer.
> > So we need to check for the device state in __scsi_remove_target()
> > to avoid tripping over deleted objects.
> 
> Applied to 4.15/scsi-fixes. Thanks!

Hello Martin,

Since that patch fixes an issue that was introduced in kernel v4.14 but did
not have a "Cc: stable" tag, should this patch be sent to Greg for inclusion
in the kernel v4.14.x series?

Thanks,

Bart.
Martin K. Petersen Jan. 17, 2018, 4:39 a.m. UTC | #8
Bart,

>> Applied to 4.15/scsi-fixes. Thanks!
>
> Since that patch fixes an issue that was introduced in kernel v4.14
> but did not have a "Cc: stable" tag, should this patch be sent to Greg
> for inclusion in the kernel v4.14.x series?

Yes. Hannes?

Patch

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index cbc0fe2..a04678b 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1411,7 +1411,10 @@  static void __scsi_remove_target(struct scsi_target *starget)
 		 * check.
 		 */
 		if (sdev->channel != starget->channel ||
-		    sdev->id != starget->id ||
+		    sdev->id != starget->id)
+			continue;
+		if (sdev->sdev_state == SDEV_DEL ||
+		    sdev->sdev_state == SDEV_CANCEL ||
 		    !get_device(&sdev->sdev_gendev))
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
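The effect of the patch on the device filter can be modelled in userspace as shown below. This sketch makes several simplifying assumptions. An array replaces the shost->__devices list, there is no host_lock, and scsi_remove_device() plus put_device() are reduced to a state change and a reference drop. The `model_*` and `MODEL_*` names are hypothetical. The states mirror a subset of the kernel's SDEV_RUNNING, SDEV_CANCEL and SDEV_DEL.

```c
#include <assert.h>
#include <stddef.h>

/* A subset of the kernel's scsi_device states, renamed for the model. */
enum model_sdev_state { MODEL_SDEV_RUNNING, MODEL_SDEV_CANCEL, MODEL_SDEV_DEL };

struct model_sdev {
	unsigned int channel, id;
	enum model_sdev_state state;
	unsigned int refcount; /* simplified, non-atomic */
	int removed;           /* set by the model's "remove" step */
};

/* Always succeeds, mirroring get_device(). */
static struct model_sdev *model_get(struct model_sdev *sdev)
{
	sdev->refcount++;
	return sdev;
}

/* Mirrors the patched filter in __scsi_remove_target(): match the
 * target first, then skip devices already on their way out, and only
 * then take a reference. Returns the number of devices "removed".
 * Locking and the real list walk are not modelled. */
static size_t model_remove_target(struct model_sdev *devs, size_t n,
				  unsigned int channel, unsigned int id)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_sdev *sdev = &devs[i];

		if (sdev->channel != channel || sdev->id != id)
			continue;
		if (sdev->state == MODEL_SDEV_DEL ||
		    sdev->state == MODEL_SDEV_CANCEL ||
		    !model_get(sdev))
			continue;
		/* Stand-in for scsi_remove_device() + put_device(). */
		sdev->state = MODEL_SDEV_DEL;
		sdev->removed = 1;
		assert(sdev->refcount > 0);
		sdev->refcount--;
		removed++;
	}
	return removed;
}
```

Calling `model_remove_target()` twice on the same target removes the running device on the first call and finds nothing to do on the second. Devices already in the deleted or cancelled state are skipped before any reference is taken. The unpatched condition instead relied on `!model_get()`, which never fails.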