Message ID | ae9f20dc8873f2027f7b3c5d2aaa0bdfe06850b8.1554756534.git.alifm@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fio-ccw fixes for kernel stacktraces | expand |
On Mon, 8 Apr 2019 17:05:33 -0400 Farhan Ali <alifm@linux.ibm.com> wrote: > When releasing the vfio-ccw mdev, we currently do not release > any existing channel program and it's pinned pages. This can s/it's/its/ > lead to the following warning: > > [1038876.561565] WARNING: CPU: 2 PID: 144727 at drivers/vfio/vfio_iommu_type1.c:1494 vfio_sanity_check_pfn_list+0x40/0x70 [vfio_iommu_type1] > > .... > > 1038876.561921] Call Trace: > [1038876.561935] ([<00000009897fb870>] 0x9897fb870) > [1038876.561949] [<000003ff8013bf62>] vfio_iommu_type1_detach_group+0xda/0x2f0 [vfio_iommu_type1] > [1038876.561965] [<000003ff8007b634>] __vfio_group_unset_container+0x64/0x190 [vfio] > [1038876.561978] [<000003ff8007b87e>] vfio_group_put_external_user+0x26/0x38 [vfio] > [1038876.562024] [<000003ff806fc608>] kvm_vfio_group_put_external_user+0x40/0x60 [kvm] > [1038876.562045] [<000003ff806fcb9e>] kvm_vfio_destroy+0x5e/0xd0 [kvm] > [1038876.562065] [<000003ff806f63fc>] kvm_put_kvm+0x2a4/0x3d0 [kvm] > [1038876.562083] [<000003ff806f655e>] kvm_vm_release+0x36/0x48 [kvm] > [1038876.562098] [<00000000003c2dc4>] __fput+0x144/0x228 > [1038876.562113] [<000000000016ee82>] task_work_run+0x8a/0xd8 > [1038876.562125] [<000000000014c7a8>] do_exit+0x5d8/0xd90 > [1038876.562140] [<000000000014d084>] do_group_exit+0xc4/0xc8 > [1038876.562155] [<000000000015c046>] get_signal+0x9ae/0xa68 > [1038876.562169] [<0000000000108d66>] do_signal+0x66/0x768 > [1038876.562185] [<0000000000b9e37e>] system_call+0x1ea/0x2d8 > [1038876.562195] 2 locks held by qemu-system-s39/144727: > [1038876.562205] #0: 00000000537abaf9 (&container->group_lock){++++}, at: __vfio_group_unset_container+0x3c/0x190 [vfio] > [1038876.562230] #1: 00000000670008b5 (&iommu->lock){+.+.}, at: vfio_iommu_type1_detach_group+0x36/0x2f0 [vfio_iommu_type1] > [1038876.562250] Last Breaking-Event-Address: > [1038876.562262] [<000003ff8013aa24>] vfio_sanity_check_pfn_list+0x3c/0x70 [vfio_iommu_type1] > [1038876.562272] irq event stamp: 4236481 > [1038876.562287] hardirqs last enabled at (4236489): [<00000000001cee7a>] console_unlock+0x6d2/0x740 > [1038876.562299] hardirqs last disabled at (4236496): [<00000000001ce87e>] console_unlock+0xd6/0x740 > [1038876.562311] softirqs last enabled at (4234162): [<0000000000b9fa1e>] __do_softirq+0x556/0x598 > [1038876.562325] softirqs last disabled at (4234153): [<000000000014e4cc>] irq_exit+0xac/0x108 > [1038876.562337] ---[ end trace 6c96d467b1c3ca06 ]--- > > Similarly we do not free the channel program when we are removing > the vfio-ccw device. Let's fix this by resetting the device and freeing > the channel program and pinned pages in the release path. For the remove > path we can just quiesce the device, since in the remove path the mediated > device is going away for good and so we don't need to do a full reset. > > Signed-off-by: Farhan Ali <alifm@linux.ibm.com> > --- > drivers/s390/cio/vfio_ccw_ops.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c > index ec2f796c..aacd528 100644 > --- a/drivers/s390/cio/vfio_ccw_ops.c > +++ b/drivers/s390/cio/vfio_ccw_ops.c > @@ -133,11 +133,12 @@ static int vfio_ccw_mdev_remove(struct mdev_device *mdev) > > if ((private->state != VFIO_CCW_STATE_NOT_OPER) && > (private->state != VFIO_CCW_STATE_STANDBY)) { > - if (!vfio_ccw_mdev_reset(mdev)) > + if (!vfio_ccw_sch_quiesce(private->sch)) > private->state = VFIO_CCW_STATE_STANDBY; > /* The state will be NOT_OPER on error. */ > } > > + cp_free(&private->cp); > private->mdev = NULL; > atomic_inc(&private->avail); > > @@ -171,6 +172,14 @@ static void vfio_ccw_mdev_release(struct mdev_device *mdev) > dev_get_drvdata(mdev_parent_dev(mdev)); > int i; > > + if ((private->state != VFIO_CCW_STATE_NOT_OPER) && > + (private->state != VFIO_CCW_STATE_STANDBY)) { > + if (!vfio_ccw_mdev_reset(mdev)) > + private->state = VFIO_CCW_STATE_STANDBY; > + /* The state will be NOT_OPER on error. */ > + } > + > + cp_free(&private->cp); > vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY, > &private->nb); > Looks good to me, would love a review/ack.
On 04/11/2019 12:27 PM, Cornelia Huck wrote: > On Mon, 8 Apr 2019 17:05:33 -0400 > Farhan Ali<alifm@linux.ibm.com> wrote: > >> When releasing the vfio-ccw mdev, we currently do not release >> any existing channel program and it's pinned pages. This can > s/it's/its/ > Will fix. Thanks for taking a look at the patches. Thanks Farhan
On Thu, 11 Apr 2019 16:39:32 -0400 Farhan Ali <alifm@linux.ibm.com> wrote: > On 04/11/2019 12:27 PM, Cornelia Huck wrote: > > On Mon, 8 Apr 2019 17:05:33 -0400 > > Farhan Ali<alifm@linux.ibm.com> wrote: > > > >> When releasing the vfio-ccw mdev, we currently do not release > >> any existing channel program and it's pinned pages. This can > > s/it's/its/ > > > > Will fix. > > Thanks for taking a look at the patches. NP, I really want to queue them soon :) Just give me a quick hint about dependencies: Do I need anything else before I can queue patch 1 and 3? I've lost track a bit...
On 04/12/2019 04:12 AM, Cornelia Huck wrote: > On Thu, 11 Apr 2019 16:39:32 -0400 > Farhan Ali <alifm@linux.ibm.com> wrote: > >> On 04/11/2019 12:27 PM, Cornelia Huck wrote: >>> On Mon, 8 Apr 2019 17:05:33 -0400 >>> Farhan Ali<alifm@linux.ibm.com> wrote: >>> >>>> When releasing the vfio-ccw mdev, we currently do not release >>>> any existing channel program and it's pinned pages. This can >>> s/it's/its/ >>> >> >> Will fix. >> >> Thanks for taking a look at the patches. > > NP, I really want to queue them soon :) > > Just give me a quick hint about dependencies: Do I need anything else > before I can queue patch 1 and 3? I've lost track a bit... > > Patch 1 can be queued without any problems and it has no dependencies. Patch 3 would depend on your [PATCH v4 1/6] vfio-ccw: make it safe to access channel programs. Since I want to call cp_free without any problems. I know you are waiting for acks/r-bs for patch 3, so if no one has problems then we can go ahead and queue it. Thanks Farhan
On 4/8/19 5:05 PM, Farhan Ali wrote: > When releasing the vfio-ccw mdev, we currently do not release > any existing channel program and it's pinned pages. This can > lead to the following warning: > > [1038876.561565] WARNING: CPU: 2 PID: 144727 at drivers/vfio/vfio_iommu_type1.c:1494 vfio_sanity_check_pfn_list+0x40/0x70 [vfio_iommu_type1] > > .... > > 1038876.561921] Call Trace: > [1038876.561935] ([<00000009897fb870>] 0x9897fb870) > [1038876.561949] [<000003ff8013bf62>] vfio_iommu_type1_detach_group+0xda/0x2f0 [vfio_iommu_type1] > [1038876.561965] [<000003ff8007b634>] __vfio_group_unset_container+0x64/0x190 [vfio] > [1038876.561978] [<000003ff8007b87e>] vfio_group_put_external_user+0x26/0x38 [vfio] > [1038876.562024] [<000003ff806fc608>] kvm_vfio_group_put_external_user+0x40/0x60 [kvm] > [1038876.562045] [<000003ff806fcb9e>] kvm_vfio_destroy+0x5e/0xd0 [kvm] > [1038876.562065] [<000003ff806f63fc>] kvm_put_kvm+0x2a4/0x3d0 [kvm] > [1038876.562083] [<000003ff806f655e>] kvm_vm_release+0x36/0x48 [kvm] > [1038876.562098] [<00000000003c2dc4>] __fput+0x144/0x228 > [1038876.562113] [<000000000016ee82>] task_work_run+0x8a/0xd8 > [1038876.562125] [<000000000014c7a8>] do_exit+0x5d8/0xd90 > [1038876.562140] [<000000000014d084>] do_group_exit+0xc4/0xc8 > [1038876.562155] [<000000000015c046>] get_signal+0x9ae/0xa68 > [1038876.562169] [<0000000000108d66>] do_signal+0x66/0x768 > [1038876.562185] [<0000000000b9e37e>] system_call+0x1ea/0x2d8 > [1038876.562195] 2 locks held by qemu-system-s39/144727: > [1038876.562205] #0: 00000000537abaf9 (&container->group_lock){++++}, at: __vfio_group_unset_container+0x3c/0x190 [vfio] > [1038876.562230] #1: 00000000670008b5 (&iommu->lock){+.+.}, at: vfio_iommu_type1_detach_group+0x36/0x2f0 [vfio_iommu_type1] > [1038876.562250] Last Breaking-Event-Address: > [1038876.562262] [<000003ff8013aa24>] vfio_sanity_check_pfn_list+0x3c/0x70 [vfio_iommu_type1] > [1038876.562272] irq event stamp: 4236481 > [1038876.562287] hardirqs last enabled at (4236489): [<00000000001cee7a>] console_unlock+0x6d2/0x740 > [1038876.562299] hardirqs last disabled at (4236496): [<00000000001ce87e>] console_unlock+0xd6/0x740 > [1038876.562311] softirqs last enabled at (4234162): [<0000000000b9fa1e>] __do_softirq+0x556/0x598 > [1038876.562325] softirqs last disabled at (4234153): [<000000000014e4cc>] irq_exit+0xac/0x108 > [1038876.562337] ---[ end trace 6c96d467b1c3ca06 ]--- > > Similarly we do not free the channel program when we are removing > the vfio-ccw device. Let's fix this by resetting the device and freeing > the channel program and pinned pages in the release path. For the remove > path we can just quiesce the device, since in the remove path the mediated > device is going away for good and so we don't need to do a full reset. > > Signed-off-by: Farhan Ali <alifm@linux.ibm.com> > --- > drivers/s390/cio/vfio_ccw_ops.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c > index ec2f796c..aacd528 100644 > --- a/drivers/s390/cio/vfio_ccw_ops.c > +++ b/drivers/s390/cio/vfio_ccw_ops.c > @@ -133,11 +133,12 @@ static int vfio_ccw_mdev_remove(struct mdev_device *mdev) > > if ((private->state != VFIO_CCW_STATE_NOT_OPER) && > (private->state != VFIO_CCW_STATE_STANDBY)) { > - if (!vfio_ccw_mdev_reset(mdev)) > + if (!vfio_ccw_sch_quiesce(private->sch)) > private->state = VFIO_CCW_STATE_STANDBY; > /* The state will be NOT_OPER on error. */ > } > > + cp_free(&private->cp); > private->mdev = NULL; > atomic_inc(&private->avail); > > @@ -171,6 +172,14 @@ static void vfio_ccw_mdev_release(struct mdev_device *mdev) > dev_get_drvdata(mdev_parent_dev(mdev)); > int i; > > + if ((private->state != VFIO_CCW_STATE_NOT_OPER) && > + (private->state != VFIO_CCW_STATE_STANDBY)) { > + if (!vfio_ccw_mdev_reset(mdev)) > + private->state = VFIO_CCW_STATE_STANDBY; > + /* The state will be NOT_OPER on error. */ > + } > + > + cp_free(&private->cp); > vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY, > &private->nb); > > This seems sane to me. Acked-by: Eric Farman <farman@linux.ibm.com>
On 4/12/19 10:13 AM, Farhan Ali wrote: > > > On 04/12/2019 04:12 AM, Cornelia Huck wrote: >> On Thu, 11 Apr 2019 16:39:32 -0400 >> Farhan Ali <alifm@linux.ibm.com> wrote: >> >>> On 04/11/2019 12:27 PM, Cornelia Huck wrote: >>>> On Mon, 8 Apr 2019 17:05:33 -0400 >>>> Farhan Ali<alifm@linux.ibm.com> wrote: >>>>> When releasing the vfio-ccw mdev, we currently do not release >>>>> any existing channel program and it's pinned pages. This can >>>> s/it's/its/ >>> >>> Will fix. >>> >>> Thanks for taking a look at the patches. >> >> NP, I really want to queue them soon :) >> >> Just give me a quick hint about dependencies: Do I need anything else >> before I can queue patch 1 and 3? I've lost track a bit... >> >> > > Patch 1 can be queued without any problems and it has no dependencies. > > Patch 3 would depend on your [PATCH v4 1/6] vfio-ccw: make it safe to > access channel programs. Since I want to call cp_free without any problems. I'm sorry; I'm working as a LIFO queue this afternoon. At least I reviewed patch 1 of that series. :) > > I know you are waiting for acks/r-bs for patch 3, so if no one has > problems then we can go ahead and queue it. > > > Thanks > Farhan >
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c index ec2f796c..aacd528 100644 --- a/drivers/s390/cio/vfio_ccw_ops.c +++ b/drivers/s390/cio/vfio_ccw_ops.c @@ -133,11 +133,12 @@ static int vfio_ccw_mdev_remove(struct mdev_device *mdev) if ((private->state != VFIO_CCW_STATE_NOT_OPER) && (private->state != VFIO_CCW_STATE_STANDBY)) { - if (!vfio_ccw_mdev_reset(mdev)) + if (!vfio_ccw_sch_quiesce(private->sch)) private->state = VFIO_CCW_STATE_STANDBY; /* The state will be NOT_OPER on error. */ } + cp_free(&private->cp); private->mdev = NULL; atomic_inc(&private->avail); @@ -171,6 +172,14 @@ static void vfio_ccw_mdev_release(struct mdev_device *mdev) dev_get_drvdata(mdev_parent_dev(mdev)); int i; + if ((private->state != VFIO_CCW_STATE_NOT_OPER) && + (private->state != VFIO_CCW_STATE_STANDBY)) { + if (!vfio_ccw_mdev_reset(mdev)) + private->state = VFIO_CCW_STATE_STANDBY; + /* The state will be NOT_OPER on error. */ + } + + cp_free(&private->cp); vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY, &private->nb);
When releasing the vfio-ccw mdev, we currently do not release any existing channel program and it's pinned pages. This can lead to the following warning: [1038876.561565] WARNING: CPU: 2 PID: 144727 at drivers/vfio/vfio_iommu_type1.c:1494 vfio_sanity_check_pfn_list+0x40/0x70 [vfio_iommu_type1] .... 1038876.561921] Call Trace: [1038876.561935] ([<00000009897fb870>] 0x9897fb870) [1038876.561949] [<000003ff8013bf62>] vfio_iommu_type1_detach_group+0xda/0x2f0 [vfio_iommu_type1] [1038876.561965] [<000003ff8007b634>] __vfio_group_unset_container+0x64/0x190 [vfio] [1038876.561978] [<000003ff8007b87e>] vfio_group_put_external_user+0x26/0x38 [vfio] [1038876.562024] [<000003ff806fc608>] kvm_vfio_group_put_external_user+0x40/0x60 [kvm] [1038876.562045] [<000003ff806fcb9e>] kvm_vfio_destroy+0x5e/0xd0 [kvm] [1038876.562065] [<000003ff806f63fc>] kvm_put_kvm+0x2a4/0x3d0 [kvm] [1038876.562083] [<000003ff806f655e>] kvm_vm_release+0x36/0x48 [kvm] [1038876.562098] [<00000000003c2dc4>] __fput+0x144/0x228 [1038876.562113] [<000000000016ee82>] task_work_run+0x8a/0xd8 [1038876.562125] [<000000000014c7a8>] do_exit+0x5d8/0xd90 [1038876.562140] [<000000000014d084>] do_group_exit+0xc4/0xc8 [1038876.562155] [<000000000015c046>] get_signal+0x9ae/0xa68 [1038876.562169] [<0000000000108d66>] do_signal+0x66/0x768 [1038876.562185] [<0000000000b9e37e>] system_call+0x1ea/0x2d8 [1038876.562195] 2 locks held by qemu-system-s39/144727: [1038876.562205] #0: 00000000537abaf9 (&container->group_lock){++++}, at: __vfio_group_unset_container+0x3c/0x190 [vfio] [1038876.562230] #1: 00000000670008b5 (&iommu->lock){+.+.}, at: vfio_iommu_type1_detach_group+0x36/0x2f0 [vfio_iommu_type1] [1038876.562250] Last Breaking-Event-Address: [1038876.562262] [<000003ff8013aa24>] vfio_sanity_check_pfn_list+0x3c/0x70 [vfio_iommu_type1] [1038876.562272] irq event stamp: 4236481 [1038876.562287] hardirqs last enabled at (4236489): [<00000000001cee7a>] console_unlock+0x6d2/0x740 [1038876.562299] hardirqs last disabled at (4236496): [<00000000001ce87e>] console_unlock+0xd6/0x740 [1038876.562311] softirqs last enabled at (4234162): [<0000000000b9fa1e>] __do_softirq+0x556/0x598 [1038876.562325] softirqs last disabled at (4234153): [<000000000014e4cc>] irq_exit+0xac/0x108 [1038876.562337] ---[ end trace 6c96d467b1c3ca06 ]--- Similarly we do not free the channel program when we are removing the vfio-ccw device. Let's fix this by resetting the device and freeing the channel program and pinned pages in the release path. For the remove path we can just quiesce the device, since in the remove path the mediated device is going away for good and so we don't need to do a full reset. Signed-off-by: Farhan Ali <alifm@linux.ibm.com> --- drivers/s390/cio/vfio_ccw_ops.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)