Message ID | 1513171297-58020-1-git-send-email-hare@suse.de (mailing list archive) |
---|---|
State | Accepted |
On Wed, 2017-12-13 at 14:21 +0100, Hannes Reinecke wrote:
> As it turned out device_get() doesn't use kref_get_unless_zero(),
> so we will be always getting a device pointer.
> So we need to check for the device state in __scsi_remove_target()
> to avoid tripping over deleted objects.
>
> Fixes: fbce4d9 ("scsi: fixup kernel warning during rmmod()")

How about adding Reported-by: Jason Yan? See also
https://www.spinics.net/lists/linux-scsi/msg115295.html

Anyway:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

On 2017/12/14 6:23, Bart Van Assche wrote:
> On Wed, 2017-12-13 at 14:21 +0100, Hannes Reinecke wrote:
>> As it turned out device_get() doesn't use kref_get_unless_zero(),
>> so we will be always getting a device pointer.
>> So we need to check for the device state in __scsi_remove_target()
>> to avoid tripping over deleted objects.
>>
>> Fixes: fbce4d9 ("scsi: fixup kernel warning during rmmod()")
>
> How about adding Reported-by: Jason Yan? See also
> https://www.spinics.net/lists/linux-scsi/msg115295.html
>
> Anyway:
>
> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
>

Seems the same as my patch. So how do we plan to fix this issue,
pick this approach up or the approach James Bottomley suggested?
I have sent a patch to change get_device() but Greg does not seem to
like this approach.

On 12/14/2017 09:05 AM, Jason Yan wrote:
>
> On 2017/12/14 6:23, Bart Van Assche wrote:
>> On Wed, 2017-12-13 at 14:21 +0100, Hannes Reinecke wrote:
>>> As it turned out device_get() doesn't use kref_get_unless_zero(),
>>> so we will be always getting a device pointer.
>>> So we need to check for the device state in __scsi_remove_target()
>>> to avoid tripping over deleted objects.
>>>
>>> Fixes: fbce4d9 ("scsi: fixup kernel warning during rmmod()")
>>
>> How about adding Reported-by: Jason Yan? See also
>> https://www.spinics.net/lists/linux-scsi/msg115295.html
>>
>> Anyway:
>>
>> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
>>
>
> Seems the same as my patch. So how do we plan to fix this issue,
> pick this approach up or the approach James Bottomley suggested?
> I have sent a patch to change get_device() but Greg does not seem to
> like this approach.
>
This is actually a real regression, which can be trivially exercised by
e.g. logging out from two connections to an iSCSI target.
(Our QA tripped across that one).
So I'd rather have it fixed reasonably soon.

While 'get_device' is IMO the 'correct' solution it surely warrants a
broader discussion, plus one would need to audit all callers to check
the return value. If we were going down that route we should probably
add a __must_check to get_device(), too.
But again, this will probably drag out for quite some time, and I'd
prefer to have the fix in the meantime.

Cheers,

Hannes

On Thu, 2017-12-14 at 10:02 +0100, Hannes Reinecke wrote:
> On 12/14/2017 09:05 AM, Jason Yan wrote:
> >
> > On 2017/12/14 6:23, Bart Van Assche wrote:
> >> On Wed, 2017-12-13 at 14:21 +0100, Hannes Reinecke wrote:
> >>> As it turned out device_get() doesn't use kref_get_unless_zero(),
> >>> so we will be always getting a device pointer.
> >>> So we need to check for the device state in __scsi_remove_target()
> >>> to avoid tripping over deleted objects.
> >>>
> >>> Fixes: fbce4d9 ("scsi: fixup kernel warning during rmmod()")
> >>
> >> How about adding Reported-by: Jason Yan? See also
> >> https://www.spinics.net/lists/linux-scsi/msg115295.html
> >>
> >> Anyway:
> >>
> >> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
> >>
> >
> > Seems the same as my patch. So how do we plan to fix this issue,
> > pick this approach up or the approach James Bottomley suggested?
> > I have sent a patch to change get_device() but Greg does not seem to
> > like this approach.
> >
> This is actually a real regression, which can be trivially exercised by
> e.g. logging out from two connections to an iSCSI target.
> (Our QA tripped across that one).
> So I'd rather have it fixed reasonably soon.
>
> While 'get_device' is IMO the 'correct' solution it surely warrants a
> broader discussion, plus one would need to audit all callers to check
> the return value. If we were going down that route we should probably
> add a __must_check to get_device(), too.
> But again, this will probably drag out for quite some time, and I'd
> prefer to have the fix in the meantime.
>
> Cheers,
>
> Hannes

We have 2 reproducible test cases, this patch fixes one of them,
which was a continually oscillating FC target port w/short dev_loss_tmo.
I'm still waiting for a report on the iSCSI test. The code looks good.
We need to get some kind of fix for this sooner rather than later.

Reviewed-by: Ewan D. Milne <emilne@redhat.com>

On Thu, 2017-12-14 at 17:10 -0500, Ewan D. Milne wrote:
> On Thu, 2017-12-14 at 10:02 +0100, Hannes Reinecke wrote:
> > On 12/14/2017 09:05 AM, Jason Yan wrote:
> > >
> > > On 2017/12/14 6:23, Bart Van Assche wrote:
> > >> On Wed, 2017-12-13 at 14:21 +0100, Hannes Reinecke wrote:
> > >>> As it turned out device_get() doesn't use kref_get_unless_zero(),
> > >>> so we will be always getting a device pointer.
> > >>> So we need to check for the device state in __scsi_remove_target()
> > >>> to avoid tripping over deleted objects.
> > >>>
> > >>> Fixes: fbce4d9 ("scsi: fixup kernel warning during rmmod()")
> > >>
> > >> How about adding Reported-by: Jason Yan? See also
> > >> https://www.spinics.net/lists/linux-scsi/msg115295.html
> > >>
> > >> Anyway:
> > >>
> > >> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
> > >>
> > >
> > > Seems the same as my patch. So how do we plan to fix this issue,
> > > pick this approach up or the approach James Bottomley suggested?
> > > I have sent a patch to change get_device() but Greg does not seem to
> > > like this approach.
> > >
> > This is actually a real regression, which can be trivially exercised by
> > e.g. logging out from two connections to an iSCSI target.
> > (Our QA tripped across that one).
> > So I'd rather have it fixed reasonably soon.
> >
> > While 'get_device' is IMO the 'correct' solution it surely warrants a
> > broader discussion, plus one would need to audit all callers to check
> > the return value. If we were going down that route we should probably
> > add a __must_check to get_device(), too.
> > But again, this will probably drag out for quite some time, and I'd
> > prefer to have the fix in the meantime.
> >
> > Cheers,
> >
> > Hannes
>
> We have 2 reproducible test cases, this patch fixes one of them,
> which was a continually oscillating FC target port w/short dev_loss_tmo.
> I'm still waiting for a report on the iSCSI test. The code looks good.
> We need to get some kind of fix for this sooner rather than later.
>
> Reviewed-by: Ewan D. Milne <emilne@redhat.com>

Report here is that Hannes's patch fixes our failing iSCSI test also.

Martin/James, can we get this in please?

Hannes,

> As it turned out device_get() doesn't use kref_get_unless_zero(),
> so we will be always getting a device pointer.
> So we need to check for the device state in __scsi_remove_target()
> to avoid tripping over deleted objects.

Applied to 4.15/scsi-fixes. Thanks!

On Mon, 2017-12-18 at 22:37 -0500, Martin K. Petersen wrote:
> Hannes,
>
> > As it turned out device_get() doesn't use kref_get_unless_zero(),
> > so we will be always getting a device pointer.
> > So we need to check for the device state in __scsi_remove_target()
> > to avoid tripping over deleted objects.
>
> Applied to 4.15/scsi-fixes. Thanks!

Hello Martin,

Since that patch fixes an issue that was introduced in kernel v4.14
but did not have a "Cc: stable" tag, should this patch be sent to Greg
for inclusion in the kernel v4.14.x series?

Thanks,

Bart.

Bart,

>> Applied to 4.15/scsi-fixes. Thanks!
>
> Since that patch fixes an issue that was introduced in kernel v4.14
> but did not have a "Cc: stable" tag, should this patch be sent to Greg
> for inclusion in the kernel v4.14.x series?

Yes. Hannes?

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index cbc0fe2..a04678b 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1411,7 +1411,10 @@ static void __scsi_remove_target(struct scsi_target *starget)
 		 * check.
 		 */
 		if (sdev->channel != starget->channel ||
-		    sdev->id != starget->id ||
+		    sdev->id != starget->id)
+			continue;
+		if (sdev->sdev_state == SDEV_DEL ||
+		    sdev->sdev_state == SDEV_CANCEL ||
 		    !get_device(&sdev->sdev_gendev))
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);

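The two-step filter in the hunk above can be modelled in userspace roughly as follows. This is only a sketch: `toy_sdev`, `toy_state`, `toy_should_remove()` and `toy_filter_demo()` are hypothetical names standing in for the SCSI structures, and the reference acquisition via get_device() that the real loop also performs is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the SDEV_* states named in the diff. */
enum toy_state { TOY_RUNNING, TOY_CANCEL, TOY_DEL };

/* Hypothetical, stripped-down model of a SCSI device entry. */
struct toy_sdev {
	unsigned int channel;
	unsigned int id;
	enum toy_state state;
};

/* Returns true when the removal loop would go on to remove this device. */
static bool toy_should_remove(const struct toy_sdev *sdev,
			      unsigned int channel, unsigned int id)
{
	/* First check: the device belongs to a different target. */
	if (sdev->channel != channel || sdev->id != id)
		return false;
	/* Second check: the device is already being torn down. */
	if (sdev->state == TOY_DEL || sdev->state == TOY_CANCEL)
		return false;
	return true;
}

/* Small self-check covering a live, a cancelled and a foreign device. */
static void toy_filter_demo(void)
{
	struct toy_sdev live = { 0, 3, TOY_RUNNING };
	struct toy_sdev cancelled = { 0, 3, TOY_CANCEL };
	struct toy_sdev foreign = { 1, 3, TOY_RUNNING };

	assert(toy_should_remove(&live, 0, 3));
	assert(!toy_should_remove(&cancelled, 0, 3));
	assert(!toy_should_remove(&foreign, 0, 3));
}
```

Splitting the condition in two keeps the target match separate from the state test, so a device in a deleted or cancelled state is skipped before any attempt to take a reference on it.
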
As it turned out device_get() doesn't use kref_get_unless_zero(),
so we will be always getting a device pointer.
So we need to check for the device state in __scsi_remove_target()
to avoid tripping over deleted objects.

Fixes: fbce4d9 ("scsi: fixup kernel warning during rmmod()")
Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/scsi_sysfs.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
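
The distinction the commit message draws (it writes device_get(); the diff calls get_device()) can be illustrated with a small userspace model. Everything here is an assumption made for illustration: `toy_dev`, `toy_get()`, `toy_get_unless_zero()` and `toy_refcount_demo()` are hypothetical names, and the plain integer counter only mimics the behaviour of an unconditional get versus a get-unless-zero, not the kernel's actual refcount implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a refcounted kernel object. */
struct toy_dev {
	int refcount;	/* 0 means the last reference has been dropped */
};

/* Unconditional get: always hands back the pointer, even for a dead object. */
static struct toy_dev *toy_get(struct toy_dev *dev)
{
	if (dev)
		dev->refcount++;
	return dev;
}

/* Conditional get: refuses to revive an object whose count reached zero. */
static struct toy_dev *toy_get_unless_zero(struct toy_dev *dev)
{
	if (!dev || dev->refcount == 0)
		return NULL;
	dev->refcount++;
	return dev;
}

/* Small self-check showing where the two helpers differ. */
static void toy_refcount_demo(void)
{
	struct toy_dev dead = { .refcount = 0 };

	/* The conditional variant reports failure on the dead object... */
	assert(toy_get_unless_zero(&dead) == NULL);
	/* ...while the unconditional variant still returns a pointer. */
	assert(toy_get(&dead) == &dead);
}
```

With the unconditional variant a NULL test on the return value says nothing about whether the object is still alive, which is why the patch checks the device state explicitly instead.
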