Message ID | 20240820172256.903251-1-philipchen@chromium.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] virtio_pmem: Check device status before requesting flush | expand |
On 8/20/24 10:22 AM, Philip Chen wrote: > If a pmem device is in a bad status, the driver side could wait for > host ack forever in virtio_pmem_flush(), causing the system to hang. > > So add a status check in the beginning of virtio_pmem_flush() to return > early if the device is not activated. > > Signed-off-by: Philip Chen <philipchen@chromium.org> > --- > > v2: > - Remove change id from the patch description > - Add more details to the patch description > > drivers/nvdimm/nd_virtio.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c > index 35c8fbbba10e..97addba06539 100644 > --- a/drivers/nvdimm/nd_virtio.c > +++ b/drivers/nvdimm/nd_virtio.c > @@ -44,6 +44,15 @@ static int virtio_pmem_flush(struct nd_region *nd_region) > unsigned long flags; > int err, err1; > > + /* > + * Don't bother to submit the request to the device if the device is > + * not acticated. s/acticated/activated/ > + */ > + if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) { > + dev_info(&vdev->dev, "virtio pmem device needs a reset\n"); > + return -EIO; > + } > + > might_sleep(); > req_data = kmalloc(sizeof(*req_data), GFP_KERNEL); > if (!req_data)
Hi, On Tue, Aug 20, 2024 at 1:01 PM Dave Jiang <dave.jiang@intel.com> wrote: > > > > On 8/20/24 10:22 AM, Philip Chen wrote: > > If a pmem device is in a bad status, the driver side could wait for > > host ack forever in virtio_pmem_flush(), causing the system to hang. > > > > So add a status check in the beginning of virtio_pmem_flush() to return > > early if the device is not activated. > > > > Signed-off-by: Philip Chen <philipchen@chromium.org> > > --- > > > > v2: > > - Remove change id from the patch description > > - Add more details to the patch description > > > > drivers/nvdimm/nd_virtio.c | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c > > index 35c8fbbba10e..97addba06539 100644 > > --- a/drivers/nvdimm/nd_virtio.c > > +++ b/drivers/nvdimm/nd_virtio.c > > @@ -44,6 +44,15 @@ static int virtio_pmem_flush(struct nd_region *nd_region) > > unsigned long flags; > > int err, err1; > > > > + /* > > + * Don't bother to submit the request to the device if the device is > > + * not acticated. > > s/acticated/activated/ Thanks for the review. I'll fix this typo in v3. In addition to this typo, does anyone have any other concerns? > > > + */ > > + if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) { > > + dev_info(&vdev->dev, "virtio pmem device needs a reset\n"); > > + return -EIO; > > + } > > + > > might_sleep(); > > req_data = kmalloc(sizeof(*req_data), GFP_KERNEL); > > if (!req_data)
Philip Chen wrote: > Hi, > > On Tue, Aug 20, 2024 at 1:01 PM Dave Jiang <dave.jiang@intel.com> wrote: > > > > > > > > On 8/20/24 10:22 AM, Philip Chen wrote: > > > If a pmem device is in a bad status, the driver side could wait for > > > host ack forever in virtio_pmem_flush(), causing the system to hang. > > > > > > So add a status check in the beginning of virtio_pmem_flush() to return > > > early if the device is not activated. > > > > > > Signed-off-by: Philip Chen <philipchen@chromium.org> > > > --- > > > > > > v2: > > > - Remove change id from the patch description > > > - Add more details to the patch description > > > > > > drivers/nvdimm/nd_virtio.c | 9 +++++++++ > > > 1 file changed, 9 insertions(+) > > > > > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c > > > index 35c8fbbba10e..97addba06539 100644 > > > --- a/drivers/nvdimm/nd_virtio.c > > > +++ b/drivers/nvdimm/nd_virtio.c > > > @@ -44,6 +44,15 @@ static int virtio_pmem_flush(struct nd_region *nd_region) > > > unsigned long flags; > > > int err, err1; > > > > > > + /* > > > + * Don't bother to submit the request to the device if the device is > > > + * not acticated. > > > > s/acticated/activated/ > > Thanks for the review. > I'll fix this typo in v3. > > In addition to this typo, does anyone have any other concerns? I'm not super familiar with the virtio-pmem workings and the needs reset flag is barely used. Did you actually experience this hang? How was this found? What is the user visible issue and how critical is it? Thanks, Ira > > > > > > + */ > > > + if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) { > > > + dev_info(&vdev->dev, "virtio pmem device needs a reset\n"); > > > + return -EIO; > > > + } > > > + > > > might_sleep(); > > > req_data = kmalloc(sizeof(*req_data), GFP_KERNEL); > > > if (!req_data)
Hi On Wed, Aug 21, 2024 at 1:37 PM Ira Weiny <ira.weiny@intel.com> wrote: > > Philip Chen wrote: > > Hi, > > > > On Tue, Aug 20, 2024 at 1:01 PM Dave Jiang <dave.jiang@intel.com> wrote: > > > > > > > > > > > > On 8/20/24 10:22 AM, Philip Chen wrote: > > > > If a pmem device is in a bad status, the driver side could wait for > > > > host ack forever in virtio_pmem_flush(), causing the system to hang. > > > > > > > > So add a status check in the beginning of virtio_pmem_flush() to return > > > > early if the device is not activated. > > > > > > > > Signed-off-by: Philip Chen <philipchen@chromium.org> > > > > --- > > > > > > > > v2: > > > > - Remove change id from the patch description > > > > - Add more details to the patch description > > > > > > > > drivers/nvdimm/nd_virtio.c | 9 +++++++++ > > > > 1 file changed, 9 insertions(+) > > > > > > > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c > > > > index 35c8fbbba10e..97addba06539 100644 > > > > --- a/drivers/nvdimm/nd_virtio.c > > > > +++ b/drivers/nvdimm/nd_virtio.c > > > > @@ -44,6 +44,15 @@ static int virtio_pmem_flush(struct nd_region *nd_region) > > > > unsigned long flags; > > > > int err, err1; > > > > > > > > + /* > > > > + * Don't bother to submit the request to the device if the device is > > > > + * not acticated. > > > > > > s/acticated/activated/ > > > > Thanks for the review. > > I'll fix this typo in v3. > > > > In addition to this typo, does anyone have any other concerns? > > I'm not super familiar with the virtio-pmem workings and the needs reset > flag is barely used. > > Did you actually experience this hang? How was this found? What is the > user visible issue and how critical is it? Yes, I experienced the problem when trying to enable hibernation for a VM. In the typical hibernation flow, the kernel would try to: (1) freeze the processes (2) freeze the devices (3) take a snapshot of the memory (4) thaw the devices (5) write the snapshot to the storage (6) power off the system (or perform platform-specific action) In my case, I see VMM fail to re-activate the virtio_pmem device in (4). (And therefore the virtio_pmem device side sets VIRTIO_CONFIG_S_NEEDS_RESET.) As a result, when the kernel tries to power off the virtio_pmem device in (6), the system would hang in virtio_pmem_flush() if this patch is not added. To fix the root cause of this issue, I sent another patch to add freeze/restore PM callbacks to the virtio_pmem driver: https://lore.kernel.org/lkml/20240815004617.2325269-1-philipchen@chromium.org/ (Please also help review that patch.) However, I think this patch is still helpful since the system shouldn't hang in virtio_pmem_flush() regardless of the device state. > > Thanks, > Ira > > > > > > > > > > + */ > > > > + if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) { > > > > + dev_info(&vdev->dev, "virtio pmem device needs a reset\n"); > > > > + return -EIO; > > > > + } > > > > + > > > > might_sleep(); > > > > req_data = kmalloc(sizeof(*req_data), GFP_KERNEL); > > > > if (!req_data) > >
diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c index 35c8fbbba10e..97addba06539 100644 --- a/drivers/nvdimm/nd_virtio.c +++ b/drivers/nvdimm/nd_virtio.c @@ -44,6 +44,15 @@ static int virtio_pmem_flush(struct nd_region *nd_region) unsigned long flags; int err, err1; + /* + * Don't bother to submit the request to the device if the device is + * not acticated. + */ + if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) { + dev_info(&vdev->dev, "virtio pmem device needs a reset\n"); + return -EIO; + } + might_sleep(); req_data = kmalloc(sizeof(*req_data), GFP_KERNEL); if (!req_data)
If a pmem device is in a bad status, the driver side could wait for host ack forever in virtio_pmem_flush(), causing the system to hang. So add a status check in the beginning of virtio_pmem_flush() to return early if the device is not activated. Signed-off-by: Philip Chen <philipchen@chromium.org> --- v2: - Remove change id from the patch description - Add more details to the patch description drivers/nvdimm/nd_virtio.c | 9 +++++++++ 1 file changed, 9 insertions(+)