Message ID | HK0P153MB0273A1B109CE3A33F0984D34BFDE0@HK0P153MB0273.APCP153.PROD.OUTLOOK.COM (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | SCSI low level driver: how to prevent I/O upon hibernation? | expand |
Hello Dexuan, On Fri, Apr 10, 2020 at 1:44 PM Dexuan Cui <decui@microsoft.com> wrote: > > Hi all, > Can you please recommend the standard way to prevent the upper layer SCSI > driver from submitting new I/O requests when the system is doing hibernation? > > Actually I already asked the question on 5/30 last year: > https://marc.info/?l=linux-scsi&m=155918927116283&w=2 > and I thought all the sdevs are suspended and resumed automatically in > drivers/scsi/scsi_pm.c, and the low level SCSI adapter driver (i.e. hv_storvsc) > only needs to suspend/resume the state of the adapter itself. However, it looks > this is not true, because today I got such a panic in a v5.6 Linux VM running on > Hyper-V: the 'suspend' part of the hibernation process finished without any > issue, but when the VM was trying to resume back from the 'new' kernel to the > 'old' kernel, these events happened: > > 1. the new kernel loaded the saved state from disk to memory. > > 2. the new kernel quiesced the devices, including the SCSI DVD device > controlled by the hv_storvsc low level SCSI driver, i.e. > drivers/scsi/storvsc_drv.c: storvsc_suspend() was called and the related vmbus > ringbuffer was freed. > > 3. However, disk_events_workfn() -> ... -> cdrom_check_events() -> ... > -> scsi_queue_rq() -> ... -> storvsc_queuecommand() was still trying to > submit I/O commands to the freed vmbus ringbuffer, and as a result, a NULL > pointer dereference panic happened. Last time I replied to you in above link: "scsi_device_quiesce() has been called by scsi_dev_type_suspend() to prevent any non-pm request from entering queue." That meant no any normal FS request can enter scsi queue after suspend, however request with BLK_MQ_REQ_PREEMPT is still allowed to be queued to LLD after suspend. So you can't free related vmbus ringbuffer cause BLK_MQ_REQ_PREEMPT request is still to be handled. Thanks, Ming Lei
> From: Ming Lei <tom.leiming@gmail.com> > Sent: Friday, April 10, 2020 12:43 AM > To: Dexuan Cui <decui@microsoft.com> > > Hello Dexuan, > > On Fri, Apr 10, 2020 at 1:44 PM Dexuan Cui <decui@microsoft.com> wrote: > > > > Hi all, > > Can you please recommend the standard way to prevent the upper layer SCSI > > driver from submitting new I/O requests when the system is doing > > hibernation? > > > > Actually I already asked the question on 5/30 last year ... > > and I thought all the sdevs are suspended and resumed automatically in > > drivers/scsi/scsi_pm.c, and the low level SCSI adapter driver (i.e. hv_storvsc) > > only needs to suspend/resume the state of the adapter itself. However, it > > looks > > this is not true, because today I got such a panic in a v5.6 Linux VM running > > on Hyper-V: the 'suspend' part of the hibernation process finished without > > any issue, but when the VM was trying to resume back from the 'new' > > kernel to the 'old' kernel, these events happened: > > > > 1. the new kernel loaded the saved state from disk to memory. > > > > 2. the new kernel quiesced the devices, including the SCSI DVD device > > controlled by the hv_storvsc low level SCSI driver, i.e. > > drivers/scsi/storvsc_drv.c: storvsc_suspend() was called and the related > > vmbus ringbuffer was freed. > > > > 3. However, disk_events_workfn() -> ... -> cdrom_check_events() -> ... > > -> scsi_queue_rq() -> ... -> storvsc_queuecommand() was still trying to > > submit I/O commands to the freed vmbus ringbuffer, and as a result, a NULL > > pointer dereference panic happened. > > Last time I replied to you in above link: > > "scsi_device_quiesce() has been called by scsi_dev_type_suspend() to prevent > any non-pm request from entering queue." > > That meant no any normal FS request can enter scsi queue after suspend, > however request with BLK_MQ_REQ_PREEMPT is still allowed to be queued > to LLD after suspend. > > So you can't free related vmbus ringbuffer cause BLK_MQ_REQ_PREEMPT > request is still to be handled. > > Thanks, > Ming Lei Actually I think I found the APIs with the help of Long Li: scsi_host_block and scsi_host_unblock(). The new APIs were just added on 2/28. :-) Unluckily scsi_host_block() doesn't allow a state transition from SDEV_QUIESCE to SDEV_BLOCK and returns -EINVAL for that, so I made the below patch and it looks hibernation can work reliably for me now. Please let me know if the change to scsi_device_set_state() is OK. If the patch looks good, I plan to split and post it sometime next week. diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 47835c4b4ee0..06c260f6cdae 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2284,6 +2284,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state) switch (oldstate) { case SDEV_RUNNING: case SDEV_CREATED_BLOCK: + case SDEV_QUIESCE: case SDEV_OFFLINE: break; default: diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index fb41636519ee..fd51d2f03778 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1948,6 +1948,11 @@ static int storvsc_suspend(struct hv_device *hv_dev) struct storvsc_device *stor_device = hv_get_drvdata(hv_dev); struct Scsi_Host *host = stor_device->host; struct hv_host_device *host_dev = shost_priv(host); + int ret; + + ret = scsi_host_block(host); + if (ret) + return ret; storvsc_wait_to_drain(stor_device); @@ -1968,10 +1973,15 @@ static int storvsc_suspend(struct hv_device *hv_dev) static int storvsc_resume(struct hv_device *hv_dev) { + struct storvsc_device *stor_device = hv_get_drvdata(hv_dev); + struct Scsi_Host *host = stor_device->host; int ret; ret = storvsc_connect_to_vsp(hv_dev, storvsc_ringbuffer_size, hv_dev_is_fc(hv_dev)); + if (!ret) + ret = scsi_host_unblock(host, SDEV_RUNNING); + return ret; } Thanks, -- Dexuan
On 2020-04-10 23:01, Dexuan Cui wrote:
> Please let me know if the change to scsi_device_set_state() is OK.
Hadn't Ming Lei already root-caused this issue for you? From his e-mail:
"So you can't free related vmbus ringbuffer cause BLK_MQ_REQ_PREEMPT
request is still to be handled."
Please follow that advice.
Thanks,
Bart.
> From: Bart Van Assche <bvanassche@acm.org> > Sent: Saturday, April 11, 2020 8:03 AM > > On 2020-04-10 23:01, Dexuan Cui wrote: > > Please let me know if the change to scsi_device_set_state() is OK. > > Hadn't Ming Lei already root-caused this issue for you? From his e-mail: > "So you can't free related vmbus ringbuffer cause BLK_MQ_REQ_PREEMPT > request is still to be handled." > > Please follow that advice. > > Bart. Hi Bart, Ming, I agree Ming has root-caused the issue, but it looks the advice can not apply to the hibernation scenario. :-) Sorry for my lack of knowledge of the complex SCSI subsystems -- could you please elaborate on what a low level SCSI device driver (like hv_storvsc) should do to safely save/restore the device state upon hibernation? The nature of "free related vmbus ringbuffer" in hv_storvsc is that: the driver can not handle any I/O after the device is quiesced in software_resume() -> load_image_and_restore() -> hibernation_restore() -> dpm_suspend_start() -> ... -> storvsc_suspend(). BTW, after the SCSI device is quiesced, the hibernation's resume path also quiesces other devices, disables non-boot CPUs, and finally jumps to the old kernel's entry point where the old kernel was suspended, and the old kernel will resume back. My intuition is that the upper level SCSI layer should provide an API to flush any pending I/O and block any new I/O after a SCSI device is "quiesced"? -- it looks scsi_host_block()/scsi_host_unblock() are such APIs, which are already used by drivers/scsi/aacraid/linit.c: aac_suspend()/aac_resume(). That's why I proposed the patch of the same thing for hv_storvsc, and it looks the patch works for me: without the patch I can easily hit the panic I reported in the first email; with the patch, I have successfully done more than 30 rounds of hibernation without the panic. However, it looks you implied my intuition is wrong and it's *expected* that the upper level SCSI layer can still submit I/O requests with the BLK_MQ_REQ_PREEMPT flag after the SCSI device is "quiesced"? If this is the case, then how is hv_storvsc supposed to handle the I/O after the SCSI device is quiesced? I can keep the related vmbus ringbuffer, but the real issue is: the driver is unable to handle any I/O at all since the vmbus connection to the Hyper-V host is disconnected soon, after the SCSI device is quiesced. Should hv_storvsc return an error for such I/O, or block such I/O until the SCSI device is resumed? -- These don't look good to me, and I really think the upper level SCSI layer should provide an API to block any new I/O after a SCSI device is "quiesced" -- again, can you please clarify if scsi_host_block()/scsi_host_unblock() are such APIs? Looking forward to your replies! Thanks, -- Dexuan
> From: linux-hyperv-owner@vger.kernel.org > <linux-hyperv-owner@vger.kernel.org> On Behalf Of Dexuan Cui > Sent: Saturday, April 11, 2020 10:20 AM > To: Bart Van Assche <bvanassche@acm.org>; Ming Lei > <tom.leiming@gmail.com>; Martin K. Petersen <martin.petersen@oracle.com> > Cc: linux-scsi@vger.kernel.org; Christoph Hellwig <hch@lst.de>; > linux-hyperv@vger.kernel.org; Long Li <longli@microsoft.com>; vkuznets > <vkuznets@redhat.com>; Michael Kelley <mikelley@microsoft.com>; KY > Srinivasan <kys@microsoft.com>; Olaf Hering <olaf@aepfle.de>; Stephen > Hemminger <sthemmin@microsoft.com>; linux-kernel@vger.kernel.org > Subject: RE: SCSI low level driver: how to prevent I/O upon hibernation? > > > From: Bart Van Assche <bvanassche@acm.org> > > Sent: Saturday, April 11, 2020 8:03 AM > > > > On 2020-04-10 23:01, Dexuan Cui wrote: > > > Please let me know if the change to scsi_device_set_state() is OK. > > > > Hadn't Ming Lei already root-caused this issue for you? From his e-mail: > > "So you can't free related vmbus ringbuffer cause BLK_MQ_REQ_PREEMPT > > request is still to be handled." > > > > Please follow that advice. > > > > Bart. > > Hi Bart, Ming, > I agree Ming has root-caused the issue, but it looks the advice can not > apply to the hibernation scenario. :-) > > Sorry for my lack of knowledge of the complex SCSI subsystems -- could > you please elaborate on what a low level SCSI device driver (like hv_storvsc) > should do to safely save/restore the device state upon hibernation? > > The nature of "free related vmbus ringbuffer" in hv_storvsc is that: the > driver can not handle any I/O after the device is quiesced in > software_resume() -> load_image_and_restore() -> hibernation_restore() > -> dpm_suspend_start() -> ... -> storvsc_suspend(). BTW, after the SCSI > device is quiesced, the hibernation's resume path also quiesces other > devices, disables non-boot CPUs, and finally jumps to the old kernel's > entry point where the old kernel was suspended, and the old kernel will > resume back. > > My intuition is that the upper level SCSI layer should provide an API to > flush any pending I/O and block any new I/O after a SCSI device is > "quiesced"? -- it looks scsi_host_block()/scsi_host_unblock() are such > APIs, which are already used by > drivers/scsi/aacraid/linit.c: aac_suspend()/aac_resume(). > > That's why I proposed the patch of the same thing for hv_storvsc, and > it looks the patch works for me: without the patch I can easily hit the > panic I reported in the first email; with the patch, I have successfully > done more than 30 rounds of hibernation without the panic. > > However, it looks you implied my intuition is wrong and it's *expected* > that the upper level SCSI layer can still submit I/O requests with the > BLK_MQ_REQ_PREEMPT flag after the SCSI device is "quiesced"? > > If this is the case, then how is hv_storvsc supposed to handle the I/O > after the SCSI device is quiesced? I can keep the related vmbus ringbuffer, > but the real issue is: the driver is unable to handle any I/O at all since the > vmbus connection to the Hyper-V host is disconnected soon, after > the SCSI device is quiesced. Should hv_storvsc return an error for such I/O, > or block such I/O until the SCSI device is resumed? -- These don't look > good to me, and I really think the upper level SCSI layer should provide > an API to block any new I/O after a SCSI device is "quiesced" -- again, can > you please clarify if scsi_host_block()/scsi_host_unblock() are such APIs? > > Looking forward to your replies! > > Thanks, > -- Dexuan It looks scsi_host_block() and scsi_host_unblock() are just the APIs I need. I have run my hibernation test with the APIs more than 1000 rounds and the VM is still running fine without the panic I reported previously when I didn't use the APIs. FYI: the aacraid driver is the first user of the APIs: commit 3d3ca53b163914c1397289d0c2ee6d2f52362dcc Author: Hannes Reinecke <hare@suse.de> Date: Fri Feb 28 08:53:14 2020 +0100 scsi: aacraid: use scsi_host_(block,unblock) to block I/O Use scsi_host_block() and scsi_host_unblock() instead of scsi_block_requests()/scsi_unblock_requests() to block and unblock I/O. This has the advantage that the block layer will stop sending I/O to the adapter instead of having the SCSI midlayer requeueing I/O internally. Link: https://lore.kernel.org/r/20200228075318.91255-10-hare@suse.de Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Balsundar P < Balsundar.P@microchip.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> BTW, I suspect aac_suspend() -> scsi_host_block() is not relly working because scsi_host_block() doesn't allow a state transition from SDEV_QUIESCE to SDEV_BLOCK and returns -EINVAL for that. aac_suspend() should check the return value of scsi_host_block(), and my change to scsi_device_set_state() should be needed to avoid the -EINVAL error. If there is no objection, I plan to send some patches later. Thanks, -- Dexuan
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index fb41636519ee..dcfb0a820977 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1949,6 +1949,8 @@ static int storvsc_suspend(struct hv_device *hv_dev) struct Scsi_Host *host = stor_device->host; struct hv_host_device *host_dev = shost_priv(host); + scsi_block_requests(host); + storvsc_wait_to_drain(stor_device); drain_workqueue(host_dev->handle_error_wq); @@ -1968,10 +1970,14 @@ static int storvsc_suspend(struct hv_device *hv_dev) static int storvsc_resume(struct hv_device *hv_dev) { + struct storvsc_device *stor_device = hv_get_drvdata(hv_dev); + struct Scsi_Host *host = stor_device->host; int ret; ret = storvsc_connect_to_vsp(hv_dev, storvsc_ringbuffer_size, hv_dev_is_fc(hv_dev)); + if (!ret) + scsi_unblock_requests(host); return ret; }