Message ID | 20201022171209.19494-2-akrowiak@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | s390/vfio-ap: dynamic configuration support | expand |
Hi Tony,
I love your patch! Perhaps something to improve:
[auto build test WARNING on s390/features]
[also build test WARNING on linus/master kvms390/next linux/master v5.9 next-20201022]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
base: https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-allyesconfig (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/572c94c40a76754d49f07e4e383097d2db132f8c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
git checkout 572c94c40a76754d49f07e4e383097d2db132f8c
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> drivers/s390/crypto/vfio_ap_ops.c:1119:5: warning: no previous prototype for 'vfio_ap_mdev_reset_queue' [-Wmissing-prototypes]
1119 | int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
| ^~~~~~~~~~~~~~~~~~~~~~~~
vim +/vfio_ap_mdev_reset_queue +1119 drivers/s390/crypto/vfio_ap_ops.c
258287c994de8f Tony Krowiak 2018-09-25 1118
ec89b55e3bce7c Pierre Morel 2019-05-21 @1119 int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
46a7263d4746a2 Tony Krowiak 2018-09-25 1120 unsigned int retry)
46a7263d4746a2 Tony Krowiak 2018-09-25 1121 {
46a7263d4746a2 Tony Krowiak 2018-09-25 1122 struct ap_queue_status status;
ec89b55e3bce7c Pierre Morel 2019-05-21 1123 int retry2 = 2;
ec89b55e3bce7c Pierre Morel 2019-05-21 1124 int apqn = AP_MKQID(apid, apqi);
46a7263d4746a2 Tony Krowiak 2018-09-25 1125
46a7263d4746a2 Tony Krowiak 2018-09-25 1126 do {
ec89b55e3bce7c Pierre Morel 2019-05-21 1127 status = ap_zapq(apqn);
46a7263d4746a2 Tony Krowiak 2018-09-25 1128 switch (status.response_code) {
46a7263d4746a2 Tony Krowiak 2018-09-25 1129 case AP_RESPONSE_NORMAL:
ec89b55e3bce7c Pierre Morel 2019-05-21 1130 while (!status.queue_empty && retry2--) {
ec89b55e3bce7c Pierre Morel 2019-05-21 1131 msleep(20);
ec89b55e3bce7c Pierre Morel 2019-05-21 1132 status = ap_tapq(apqn, NULL);
ec89b55e3bce7c Pierre Morel 2019-05-21 1133 }
024cdcdbf3cf99 Halil Pasic 2019-09-03 1134 WARN_ON_ONCE(retry2 <= 0);
46a7263d4746a2 Tony Krowiak 2018-09-25 1135 return 0;
46a7263d4746a2 Tony Krowiak 2018-09-25 1136 case AP_RESPONSE_RESET_IN_PROGRESS:
46a7263d4746a2 Tony Krowiak 2018-09-25 1137 case AP_RESPONSE_BUSY:
46a7263d4746a2 Tony Krowiak 2018-09-25 1138 msleep(20);
46a7263d4746a2 Tony Krowiak 2018-09-25 1139 break;
46a7263d4746a2 Tony Krowiak 2018-09-25 1140 default:
46a7263d4746a2 Tony Krowiak 2018-09-25 1141 /* things are really broken, give up */
46a7263d4746a2 Tony Krowiak 2018-09-25 1142 return -EIO;
46a7263d4746a2 Tony Krowiak 2018-09-25 1143 }
46a7263d4746a2 Tony Krowiak 2018-09-25 1144 } while (retry--);
46a7263d4746a2 Tony Krowiak 2018-09-25 1145
46a7263d4746a2 Tony Krowiak 2018-09-25 1146 return -EBUSY;
46a7263d4746a2 Tony Krowiak 2018-09-25 1147 }
46a7263d4746a2 Tony Krowiak 2018-09-25 1148
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On 10/22/20 3:44 PM, kernel test robot wrote: > Hi Tony, > > I love your patch! Perhaps something to improve: > > [auto build test WARNING on s390/features] > [also build test WARNING on linus/master kvms390/next linux/master v5.9 next-20201022] > [If your patch is applied to the wrong git tree, kindly drop us a note. > And when submitting patch, we suggest to use '--base' as documented in > https://git-scm.com/docs/git-format-patch] > > url: https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543 > base: https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features > config: s390-allyesconfig (attached as .config) > compiler: s390-linux-gcc (GCC) 9.3.0 > reproduce (this is a W=1 build): > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # https://github.com/0day-ci/linux/commit/572c94c40a76754d49f07e4e383097d2db132f8c > git remote add linux-review https://github.com/0day-ci/linux > git fetch --no-tags linux-review Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543 > git checkout 572c94c40a76754d49f07e4e383097d2db132f8c > # save the attached .config to linux build tree > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390 > > If you fix the issue, kindly add following tag as appropriate > Reported-by: kernel test robot <lkp@intel.com> > > All warnings (new ones prefixed by >>): > >>> drivers/s390/crypto/vfio_ap_ops.c:1119:5: warning: no previous prototype for 'vfio_ap_mdev_reset_queue' [-Wmissing-prototypes] > 1119 | int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, > | ^~~~~~~~~~~~~~~~~~~~~~~~ This function needs to be made static because it is no longer defined in the header file. > > vim +/vfio_ap_mdev_reset_queue +1119 drivers/s390/crypto/vfio_ap_ops.c > > 258287c994de8f Tony Krowiak 2018-09-25 1118 > ec89b55e3bce7c Pierre Morel 2019-05-21 @1119 int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, > 46a7263d4746a2 Tony Krowiak 2018-09-25 1120 unsigned int retry) > 46a7263d4746a2 Tony Krowiak 2018-09-25 1121 { > 46a7263d4746a2 Tony Krowiak 2018-09-25 1122 struct ap_queue_status status; > ec89b55e3bce7c Pierre Morel 2019-05-21 1123 int retry2 = 2; > ec89b55e3bce7c Pierre Morel 2019-05-21 1124 int apqn = AP_MKQID(apid, apqi); > 46a7263d4746a2 Tony Krowiak 2018-09-25 1125 > 46a7263d4746a2 Tony Krowiak 2018-09-25 1126 do { > ec89b55e3bce7c Pierre Morel 2019-05-21 1127 status = ap_zapq(apqn); > 46a7263d4746a2 Tony Krowiak 2018-09-25 1128 switch (status.response_code) { > 46a7263d4746a2 Tony Krowiak 2018-09-25 1129 case AP_RESPONSE_NORMAL: > ec89b55e3bce7c Pierre Morel 2019-05-21 1130 while (!status.queue_empty && retry2--) { > ec89b55e3bce7c Pierre Morel 2019-05-21 1131 msleep(20); > ec89b55e3bce7c Pierre Morel 2019-05-21 1132 status = ap_tapq(apqn, NULL); > ec89b55e3bce7c Pierre Morel 2019-05-21 1133 } > 024cdcdbf3cf99 Halil Pasic 2019-09-03 1134 WARN_ON_ONCE(retry2 <= 0); > 46a7263d4746a2 Tony Krowiak 2018-09-25 1135 return 0; > 46a7263d4746a2 Tony Krowiak 2018-09-25 1136 case AP_RESPONSE_RESET_IN_PROGRESS: > 46a7263d4746a2 Tony Krowiak 2018-09-25 1137 case AP_RESPONSE_BUSY: > 46a7263d4746a2 Tony Krowiak 2018-09-25 1138 msleep(20); > 46a7263d4746a2 Tony Krowiak 2018-09-25 1139 break; > 46a7263d4746a2 Tony Krowiak 2018-09-25 1140 default: > 46a7263d4746a2 Tony Krowiak 2018-09-25 1141 /* things are really broken, give up */ > 46a7263d4746a2 Tony Krowiak 2018-09-25 1142 return -EIO; > 46a7263d4746a2 Tony Krowiak 2018-09-25 1143 } > 46a7263d4746a2 Tony Krowiak 2018-09-25 1144 } while (retry--); > 46a7263d4746a2 Tony Krowiak 2018-09-25 1145 > 46a7263d4746a2 Tony Krowiak 2018-09-25 1146 return -EBUSY; > 46a7263d4746a2 Tony Krowiak 2018-09-25 1147 } > 46a7263d4746a2 Tony Krowiak 2018-09-25 1148 > > --- > 0-DAY CI Kernel Test Service, Intel Corporation > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Thu, 22 Oct 2020 13:11:56 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > The queues assigned to a matrix mediated device are currently reset when: > > * The VFIO_DEVICE_RESET ioctl is invoked > * The mdev fd is closed by userspace (QEMU) > * The mdev is removed from sysfs. What about the situation when vfio_ap_mdev_group_notifier() is called to tell us that our pointer to KVM is about to become invalid? Do we need to clean up the IRQ stuff there? > > Immediately after the reset of a queue, a call is made to disable > interrupts for the queue. This is entirely unnecessary because the reset of > a queue disables interrupts, so this will be removed. Makes sense. > > Since interrupt processing may have been enabled by the guest, it may also > be necessary to clean up the resources used for interrupt processing. Part > of the cleanup operation requires a reference to KVM, so a check is also > being added to ensure the reference to KVM exists. The reason is because > the release callback - invoked when userspace closes the mdev fd - removes > the reference to KVM. When the remove callback - called when the mdev is > removed from sysfs - is subsequently invoked, there will be no reference to > KVM when the cleanup is performed. Please see below in the code. > > This patch will also do a bit of refactoring due to the fact that the > remove callback, implemented in vfio_ap_drv.c, disables the queue after > resetting it. Instead of the remove callback making a call into the > vfio_ap_ops.c to clean up the resources used for interrupt processing, > let's move the probe and remove callbacks into the vfio_ap_ops.c > file keep all code related to managing queues in a single file. > It would have been helpful to split out the refactoring as a separate patch. This way it is harder to review the code that got moved, because it is intermingled with the changes that intend to change behavior. > Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> > --- > drivers/s390/crypto/vfio_ap_drv.c | 45 +------------------ > drivers/s390/crypto/vfio_ap_ops.c | 63 +++++++++++++++++++-------- > drivers/s390/crypto/vfio_ap_private.h | 7 +-- > 3 files changed, 52 insertions(+), 63 deletions(-) > > diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c > index be2520cc010b..73bd073fd5d3 100644 > --- a/drivers/s390/crypto/vfio_ap_drv.c > +++ b/drivers/s390/crypto/vfio_ap_drv.c > @@ -43,47 +43,6 @@ static struct ap_device_id ap_queue_ids[] = { > > MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids); > > -/** > - * vfio_ap_queue_dev_probe: > - * > - * Allocate a vfio_ap_queue structure and associate it > - * with the device as driver_data. > - */ > -static int vfio_ap_queue_dev_probe(struct ap_device *apdev) > -{ > - struct vfio_ap_queue *q; > - > - q = kzalloc(sizeof(*q), GFP_KERNEL); > - if (!q) > - return -ENOMEM; > - dev_set_drvdata(&apdev->device, q); > - q->apqn = to_ap_queue(&apdev->device)->qid; > - q->saved_isc = VFIO_AP_ISC_INVALID; > - return 0; > -} > - > -/** > - * vfio_ap_queue_dev_remove: > - * > - * Takes the matrix lock to avoid actions on this device while removing > - * Free the associated vfio_ap_queue structure > - */ > -static void vfio_ap_queue_dev_remove(struct ap_device *apdev) > -{ > - struct vfio_ap_queue *q; > - int apid, apqi; > - > - mutex_lock(&matrix_dev->lock); > - q = dev_get_drvdata(&apdev->device); > - dev_set_drvdata(&apdev->device, NULL); > - apid = AP_QID_CARD(q->apqn); > - apqi = AP_QID_QUEUE(q->apqn); > - vfio_ap_mdev_reset_queue(apid, apqi, 1); > - vfio_ap_irq_disable(q); > - kfree(q); > - mutex_unlock(&matrix_dev->lock); > -} > - > static void vfio_ap_matrix_dev_release(struct device *dev) > { > struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev); > @@ -186,8 +145,8 @@ static int __init vfio_ap_init(void) > return ret; > > memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv)); > - vfio_ap_drv.probe = vfio_ap_queue_dev_probe; > - vfio_ap_drv.remove = vfio_ap_queue_dev_remove; > + vfio_ap_drv.probe = vfio_ap_mdev_probe_queue; > + vfio_ap_drv.remove = vfio_ap_mdev_remove_queue; > vfio_ap_drv.ids = ap_queue_ids; > > ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME); > diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c > index e0bde8518745..c471832f0a30 100644 > --- a/drivers/s390/crypto/vfio_ap_ops.c > +++ b/drivers/s390/crypto/vfio_ap_ops.c > @@ -119,7 +119,8 @@ static void vfio_ap_wait_for_irqclear(int apqn) > */ > static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) > { > - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev) > + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev && > + q->matrix_mdev->kvm) Here is the check that the kvm reference exists, you mentioned in the cover letter. You make only the gisc_unregister depend on it, because that's what is going to explode. But I'm actually wondering if "KVM is gone but we still haven't cleaned up our aqic resources" is valid. I argue that it is not. The two resources we manage are the gisc registration and the pinned page. I argue that it makes on sense to keep what was the guests page pinned, if here is no guest associated (we don't have KVM). I assume the cleanup is supposed to be atomic from the perspective of other threads/contexts, so I expect the cleanup either to be fully done or not not entered the critical section. So !kvm && (q->saved_isc != VFIO_AP_ISC_INVALID || q->saved_pfn) is a bug. Isn't it? In that sense this change would only hide the actual problem. Is the scenario we are talking about something that can happen, or is this just about programming defensively? In any case, I don't think this is a good idea. We can be defensive about it, but we have to do it differently. > kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc); > if (q->saved_pfn && q->matrix_mdev) > vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), > @@ -144,7 +145,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) > * Returns if ap_aqic function failed with invalid, deconfigured or > * checkstopped AP. > */ > -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q) > +static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q) > { > struct ap_qirq_ctrl aqic_gisa = {}; > struct ap_queue_status status; > @@ -297,6 +298,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu) > if (!q) > goto out_unlock; > > + q->matrix_mdev = matrix_mdev; What is the purpose of this? Doesn't the preceding vfio_ap_get_queue() already set q->matrix_mdev? > status = vcpu->run->s.regs.gprs[1]; > > /* If IR bit(16) is set we enable the interrupt */ > @@ -1114,20 +1116,6 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb, > return NOTIFY_OK; > } > > -static void vfio_ap_irq_disable_apqn(int apqn) > -{ > - struct device *dev; > - struct vfio_ap_queue *q; > - > - dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL, > - &apqn, match_apqn); > - if (dev) { > - q = dev_get_drvdata(dev); > - vfio_ap_irq_disable(q); > - put_device(dev); > - } > -} > - > int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, > unsigned int retry) > { > @@ -1162,6 +1150,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) > { > int ret; > int rc = 0; > + struct vfio_ap_queue *q; > unsigned long apid, apqi; > struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); > > @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) > */ > if (ret) > rc = ret; > - vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi)); > + q = vfio_ap_get_queue(matrix_mdev, > + AP_MKQID(apid, apqi)); > + if (q) > + vfio_ap_free_aqic_resources(q); Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't think so. I mean does the current code (and vfio_ap_mdev_reset_queue() in particular guarantee that the reset is actually done when we arrive here)? BTW, I think we have a similar problem with the current code as well. Under what circumstances do we expect !q? If we don't, then we need to complain one way or another. I believe that each time we call vfio_ap_mdev_reset_queue(), we will also want to call vfio_ap_free_aqic_resources(q) to clean up our aqic resources associated with the queue -- if any. So I would really prefer having a function that does both. > } > } > > @@ -1302,3 +1294,40 @@ void vfio_ap_mdev_unregister(void) > { > mdev_unregister_device(&matrix_dev->device); > } > + > +int vfio_ap_mdev_probe_queue(struct ap_device *apdev) > +{ > + struct vfio_ap_queue *q; > + struct ap_queue *queue; > + > + queue = to_ap_queue(&apdev->device); > + > + q = kzalloc(sizeof(*q), GFP_KERNEL); > + if (!q) > + return -ENOMEM; > + > + dev_set_drvdata(&queue->ap_dev.device, q); > + q->apqn = queue->qid; > + q->saved_isc = VFIO_AP_ISC_INVALID; > + > + return 0; > +} > + > +void vfio_ap_mdev_remove_queue(struct ap_device *apdev) > +{ > + struct vfio_ap_queue *q; > + struct ap_queue *queue; > + int apid, apqi; > + > + queue = to_ap_queue(&apdev->device); What is the benefit of rewriting this? You introduced queue just to do queue->ap_dev to get to the apdev you have in hand in the first place. > + > + mutex_lock(&matrix_dev->lock); > + q = dev_get_drvdata(&queue->ap_dev.device); > + dev_set_drvdata(&queue->ap_dev.device, NULL); > + apid = AP_QID_CARD(q->apqn); > + apqi = AP_QID_QUEUE(q->apqn); > + vfio_ap_mdev_reset_queue(apid, apqi, 1); > + vfio_ap_free_aqic_resources(q); > + kfree(q); > + mutex_unlock(&matrix_dev->lock); > +} > diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h > index f46dde56b464..d9003de4fbad 100644 > --- a/drivers/s390/crypto/vfio_ap_private.h > +++ b/drivers/s390/crypto/vfio_ap_private.h > @@ -90,8 +90,6 @@ struct ap_matrix_mdev { > > extern int vfio_ap_mdev_register(void); > extern void vfio_ap_mdev_unregister(void); > -int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, > - unsigned int retry); > > struct vfio_ap_queue { > struct ap_matrix_mdev *matrix_mdev; > @@ -100,5 +98,8 @@ struct vfio_ap_queue { > #define VFIO_AP_ISC_INVALID 0xff > unsigned char saved_isc; > }; > -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q); > + > +int vfio_ap_mdev_probe_queue(struct ap_device *queue); > +void vfio_ap_mdev_remove_queue(struct ap_device *queue); > + > #endif /* _VFIO_AP_PRIVATE_H_ */
On 10/27/20 2:48 AM, Halil Pasic wrote: > On Thu, 22 Oct 2020 13:11:56 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> The queues assigned to a matrix mediated device are currently reset when: >> >> * The VFIO_DEVICE_RESET ioctl is invoked >> * The mdev fd is closed by userspace (QEMU) >> * The mdev is removed from sysfs. > What about the situation when vfio_ap_mdev_group_notifier() is called to > tell us that our pointer to KVM is about to become invalid? Do we need to > clean up the IRQ stuff there? After reading this question, I decided to do some tracing using printk's and learned that the vfio_ap_mdev_group_notifier() function does not get called when the guest is shutdown. The reason for this is because the vfio_ap_mdev_release() function, which is called before the KVM pointer is invalidated, unregisters the group notifier. I took a look at some of the other drivers that register a group notifier in the mdev_parent_ops.open callback and each unregistered the notifier in the mdev_parent_ops.release callback. So, to answer your question, there is no need to cleanup the IRQ stuff in the vfio_ap_mdev_group_notifier() function since it will not get called when the KVM pointer is invalidated. The cleanup should be done in the vfio_ap_mdev_release() function that gets called when the mdev fd is closed. > >> Immediately after the reset of a queue, a call is made to disable >> interrupts for the queue. This is entirely unnecessary because the reset of >> a queue disables interrupts, so this will be removed. > Makes sense. > >> Since interrupt processing may have been enabled by the guest, it may also >> be necessary to clean up the resources used for interrupt processing. Part >> of the cleanup operation requires a reference to KVM, so a check is also >> being added to ensure the reference to KVM exists. The reason is because >> the release callback - invoked when userspace closes the mdev fd - removes >> the reference to KVM. When the remove callback - called when the mdev is >> removed from sysfs - is subsequently invoked, there will be no reference to >> KVM when the cleanup is performed. > Please see below in the code. > >> This patch will also do a bit of refactoring due to the fact that the >> remove callback, implemented in vfio_ap_drv.c, disables the queue after >> resetting it. Instead of the remove callback making a call into the >> vfio_ap_ops.c to clean up the resources used for interrupt processing, >> let's move the probe and remove callbacks into the vfio_ap_ops.c >> file keep all code related to managing queues in a single file. >> > It would have been helpful to split out the refactoring as a separate > patch. This way it is harder to review the code that got moved, because > it is intermingled with the changes that intend to change behavior. I suppose I can do that. > >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> >> --- >> drivers/s390/crypto/vfio_ap_drv.c | 45 +------------------ >> drivers/s390/crypto/vfio_ap_ops.c | 63 +++++++++++++++++++-------- >> drivers/s390/crypto/vfio_ap_private.h | 7 +-- >> 3 files changed, 52 insertions(+), 63 deletions(-) >> >> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c >> index be2520cc010b..73bd073fd5d3 100644 >> --- a/drivers/s390/crypto/vfio_ap_drv.c >> +++ b/drivers/s390/crypto/vfio_ap_drv.c >> @@ -43,47 +43,6 @@ static struct ap_device_id ap_queue_ids[] = { >> >> MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids); >> >> -/** >> - * vfio_ap_queue_dev_probe: >> - * >> - * Allocate a vfio_ap_queue structure and associate it >> - * with the device as driver_data. >> - */ >> -static int vfio_ap_queue_dev_probe(struct ap_device *apdev) >> -{ >> - struct vfio_ap_queue *q; >> - >> - q = kzalloc(sizeof(*q), GFP_KERNEL); >> - if (!q) >> - return -ENOMEM; >> - dev_set_drvdata(&apdev->device, q); >> - q->apqn = to_ap_queue(&apdev->device)->qid; >> - q->saved_isc = VFIO_AP_ISC_INVALID; >> - return 0; >> -} >> - >> -/** >> - * vfio_ap_queue_dev_remove: >> - * >> - * Takes the matrix lock to avoid actions on this device while removing >> - * Free the associated vfio_ap_queue structure >> - */ >> -static void vfio_ap_queue_dev_remove(struct ap_device *apdev) >> -{ >> - struct vfio_ap_queue *q; >> - int apid, apqi; >> - >> - mutex_lock(&matrix_dev->lock); >> - q = dev_get_drvdata(&apdev->device); >> - dev_set_drvdata(&apdev->device, NULL); >> - apid = AP_QID_CARD(q->apqn); >> - apqi = AP_QID_QUEUE(q->apqn); >> - vfio_ap_mdev_reset_queue(apid, apqi, 1); >> - vfio_ap_irq_disable(q); >> - kfree(q); >> - mutex_unlock(&matrix_dev->lock); >> -} >> - >> static void vfio_ap_matrix_dev_release(struct device *dev) >> { >> struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev); >> @@ -186,8 +145,8 @@ static int __init vfio_ap_init(void) >> return ret; >> >> memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv)); >> - vfio_ap_drv.probe = vfio_ap_queue_dev_probe; >> - vfio_ap_drv.remove = vfio_ap_queue_dev_remove; >> + vfio_ap_drv.probe = vfio_ap_mdev_probe_queue; >> + vfio_ap_drv.remove = vfio_ap_mdev_remove_queue; >> vfio_ap_drv.ids = ap_queue_ids; >> >> ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME); >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c >> index e0bde8518745..c471832f0a30 100644 >> --- a/drivers/s390/crypto/vfio_ap_ops.c >> +++ b/drivers/s390/crypto/vfio_ap_ops.c >> @@ -119,7 +119,8 @@ static void vfio_ap_wait_for_irqclear(int apqn) >> */ >> static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) >> { >> - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev) >> + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev && >> + q->matrix_mdev->kvm) > Here is the check that the kvm reference exists, you mentioned in the > cover letter. You make only the gisc_unregister depend on it, because > that's what is going to explode. > > But I'm actually wondering if "KVM is gone but we still haven't cleaned > up our aqic resources" is valid. I argue that it is not. The two > resources we manage are the gisc registration and the pinned page. I > argue that it makes on sense to keep what was the guests page pinned, > if here is no guest associated (we don't have KVM). > > I assume the cleanup is supposed to be atomic from the perspective of > other threads/contexts, so I expect the cleanup either to be fully done > or not not entered the critical section. > > So !kvm && (q->saved_isc != VFIO_AP_ISC_INVALID || q->saved_pfn) is a > bug. Isn't it? > > In that sense this change would only hide the actual problem. > > Is the scenario we are talking about something that can happen, or is > this just about programming defensively? > > In any case, I don't think this is a good idea. We can be defensive > about it, but we have to do it differently. > > >> kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc); >> if (q->saved_pfn && q->matrix_mdev) >> vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), >> @@ -144,7 +145,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) >> * Returns if ap_aqic function failed with invalid, deconfigured or >> * checkstopped AP. >> */ >> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q) >> +static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q) >> { >> struct ap_qirq_ctrl aqic_gisa = {}; >> struct ap_queue_status status; >> @@ -297,6 +298,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu) >> if (!q) >> goto out_unlock; >> >> + q->matrix_mdev = matrix_mdev; > What is the purpose of this? Doesn't the preceding vfio_ap_get_queue() > already set q->matrix_mdev? You are correct, it shall be removed. > >> status = vcpu->run->s.regs.gprs[1]; >> >> /* If IR bit(16) is set we enable the interrupt */ >> @@ -1114,20 +1116,6 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb, >> return NOTIFY_OK; >> } >> >> -static void vfio_ap_irq_disable_apqn(int apqn) >> -{ >> - struct device *dev; >> - struct vfio_ap_queue *q; >> - >> - dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL, >> - &apqn, match_apqn); >> - if (dev) { >> - q = dev_get_drvdata(dev); >> - vfio_ap_irq_disable(q); >> - put_device(dev); >> - } >> -} >> - >> int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, >> unsigned int retry) >> { >> @@ -1162,6 +1150,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) >> { >> int ret; >> int rc = 0; >> + struct vfio_ap_queue *q; >> unsigned long apid, apqi; >> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); >> >> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) >> */ >> if (ret) >> rc = ret; >> - vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi)); >> + q = vfio_ap_get_queue(matrix_mdev, >> + AP_MKQID(apid, apqi)); >> + if (q) >> + vfio_ap_free_aqic_resources(q); > Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't > think so. I mean does the current code (and vfio_ap_mdev_reset_queue() > in particular guarantee that the reset is actually done when we arrive > here)? BTW, I think we have a similar problem with the current code as > well. If the return code from the vfio_ap_mdev_reset_queue() function is zero, then yes, we are guaranteed the reset was done and the queue is empty. The function returns a non-zero return code if the reset fails or the queue the reset did not complete within a given amount of time, so maybe we shouldn't free AQIC resources when we get a non-zero return code from the reset function? There are three occasions when the vfio_ap_mdev_reset_queues() is called: 1. When the VFIO_DEVICE_RESET ioctl is invoked from userspace (i.e., when the guest is started) 2. When the mdev fd is closed (vfio_ap_mdev_release()) 3. When the mdev is removed (vfio_ap_mdev_remove()) The IRQ resources are initialized when the PQAP(AQIC) is intercepted to enable interrupts. This would occur after the guest boots and the AP bus initializes. So, 1 would presumably occur before that happens. I couldn't find anywhere in the AP bus or zcrypt code where a PQAP(AQIC) is executed to disable interrupts, so my assumption is that IRQ disablement is accomplished by a reset on the guest. I'll have to ask Harald about that. So, 2 would occur when the guest is about to terminate and 3 would occur only after the guest is terminated. In any case, it seems that IRQ resources should be cleaned up. Maybe it would be more appropriate to do that in the vfio_ap_mdev_release() and vfio_ap_mdev_remove() functions themselves? > > Under what circumstances do we expect !q? If we don't, then we need to > complain one way or another. In the current code (i.e., prior to introducing the subsequent hot plug patches), an APQN can not be assigned to an mdev unless it references a queue device bound to the vfio_ap device driver; however, there is nothing preventing a queue device from getting unbound while the guest is running (one of the problems mostly resolved by this series). In that case, q would be NULL. > > I believe that each time we call vfio_ap_mdev_reset_queue(), we will > also want to call vfio_ap_free_aqic_resources(q) to clean up our aqic > resources associated with the queue -- if any. So I would really prefer > having a function that does both. As stated above, I don't believe PQAP(AQIC) is ever called by the AP bus or zcrypt to disable IRQs, but I could be wrong about that so I'll verify with Harald. If that is the case, then it would make sense to free IRQ resources when a queue completes. I can either add a function that does both and call it instead of vfio_ap_mdev_reset_queue(). What say you? > >> } >> } >> >> @@ -1302,3 +1294,40 @@ void vfio_ap_mdev_unregister(void) >> { >> mdev_unregister_device(&matrix_dev->device); >> } >> + >> +int vfio_ap_mdev_probe_queue(struct ap_device *apdev) >> +{ >> + struct vfio_ap_queue *q; >> + struct ap_queue *queue; >> + >> + queue = to_ap_queue(&apdev->device); >> + >> + q = kzalloc(sizeof(*q), GFP_KERNEL); >> + if (!q) >> + return -ENOMEM; >> + >> + dev_set_drvdata(&queue->ap_dev.device, q); >> + q->apqn = queue->qid; >> + q->saved_isc = VFIO_AP_ISC_INVALID; >> + >> + return 0; >> +} >> + >> +void vfio_ap_mdev_remove_queue(struct ap_device *apdev) >> +{ >> + struct vfio_ap_queue *q; >> + struct ap_queue *queue; >> + int apid, apqi; >> + >> + queue = to_ap_queue(&apdev->device); > What is the benefit of rewriting this? You introduced > queue just to do queue->ap_dev to get to the apdev you > have in hand in the first place. I'm not quite sure what you're asking. This function is the callback function specified via the function pointer specified via the remove field of the struct ap_driver when the vfio_ap device driver is registered with the AP bus. That callback function takes a struct ap_device as a parameter. What am I missing here? > >> + >> + mutex_lock(&matrix_dev->lock); >> + q = dev_get_drvdata(&queue->ap_dev.device); >> + dev_set_drvdata(&queue->ap_dev.device, NULL); >> + apid = AP_QID_CARD(q->apqn); >> + apqi = AP_QID_QUEUE(q->apqn); >> + vfio_ap_mdev_reset_queue(apid, apqi, 1); >> + vfio_ap_free_aqic_resources(q); >> + kfree(q); >> + mutex_unlock(&matrix_dev->lock); >> +} >> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h >> index f46dde56b464..d9003de4fbad 100644 >> --- a/drivers/s390/crypto/vfio_ap_private.h >> +++ b/drivers/s390/crypto/vfio_ap_private.h >> @@ -90,8 +90,6 @@ struct ap_matrix_mdev { >> >> extern int vfio_ap_mdev_register(void); >> extern void vfio_ap_mdev_unregister(void); >> -int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, >> - unsigned int retry); >> >> struct vfio_ap_queue { >> struct ap_matrix_mdev *matrix_mdev; >> @@ -100,5 +98,8 @@ struct vfio_ap_queue { >> #define VFIO_AP_ISC_INVALID 0xff >> unsigned char saved_isc; >> }; >> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q); >> + >> +int vfio_ap_mdev_probe_queue(struct ap_device *queue); >> +void vfio_ap_mdev_remove_queue(struct ap_device *queue); >> + >> #endif /* _VFIO_AP_PRIVATE_H_ */
On 10/29/20 7:29 PM, Tony Krowiak wrote: > > > On 10/27/20 2:48 AM, Halil Pasic wrote: >> On Thu, 22 Oct 2020 13:11:56 -0400 >> Tony Krowiak <akrowiak@linux.ibm.com> wrote: >> >>> The queues assigned to a matrix mediated device are currently reset >>> when: >>> >>> * The VFIO_DEVICE_RESET ioctl is invoked >>> * The mdev fd is closed by userspace (QEMU) >>> * The mdev is removed from sysfs. >> What about the situation when vfio_ap_mdev_group_notifier() is called to >> tell us that our pointer to KVM is about to become invalid? Do we >> need to >> clean up the IRQ stuff there? > > After reading this question, I decided to do some tracing using > printk's and learned that the vfio_ap_mdev_group_notifier() > function does not get called when the guest is shutdown. The reason > for this is because the vfio_ap_mdev_release() function, which is called > before the KVM pointer is invalidated, unregisters the group notifier. > > I took a look at some of the other drivers that register a group > notifier in the mdev_parent_ops.open callback and each unregistered > the notifier in the mdev_parent_ops.release callback. > > So, to answer your question, there is no need to cleanup the IRQ > stuff in the vfio_ap_mdev_group_notifier() function since it will > not get called when the KVM pointer is invalidated. The cleanup > should be done in the vfio_ap_mdev_release() function that gets > called when the mdev fd is closed. > >> >>> Immediately after the reset of a queue, a call is made to disable >>> interrupts for the queue. This is entirely unnecessary because the >>> reset of >>> a queue disables interrupts, so this will be removed. >> Makes sense. >> >>> Since interrupt processing may have been enabled by the guest, it >>> may also >>> be necessary to clean up the resources used for interrupt >>> processing. Part >>> of the cleanup operation requires a reference to KVM, so a check is >>> also >>> being added to ensure the reference to KVM exists. The reason is >>> because >>> the release callback - invoked when userspace closes the mdev fd - >>> removes >>> the reference to KVM. When the remove callback - called when the >>> mdev is >>> removed from sysfs - is subsequently invoked, there will be no >>> reference to >>> KVM when the cleanup is performed. >> Please see below in the code. >> >>> This patch will also do a bit of refactoring due to the fact that the >>> remove callback, implemented in vfio_ap_drv.c, disables the queue after >>> resetting it. Instead of the remove callback making a call into the >>> vfio_ap_ops.c to clean up the resources used for interrupt processing, >>> let's move the probe and remove callbacks into the vfio_ap_ops.c >>> file keep all code related to managing queues in a single file. >>> >> It would have been helpful to split out the refactoring as a separate >> patch. This way it is harder to review the code that got moved, because >> it is intermingled with the changes that intend to change behavior. > > I suppose I can do that. > >>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> >>> --- >>> drivers/s390/crypto/vfio_ap_drv.c | 45 +------------------ >>> drivers/s390/crypto/vfio_ap_ops.c | 63 >>> +++++++++++++++++++-------- >>> drivers/s390/crypto/vfio_ap_private.h | 7 +-- >>> 3 files changed, 52 insertions(+), 63 deletions(-) >>> >>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c >>> b/drivers/s390/crypto/vfio_ap_drv.c >>> index be2520cc010b..73bd073fd5d3 100644 >>> --- a/drivers/s390/crypto/vfio_ap_drv.c >>> +++ b/drivers/s390/crypto/vfio_ap_drv.c >>> @@ -43,47 +43,6 @@ static struct ap_device_id ap_queue_ids[] = { >>> MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids); >>> -/** >>> - * vfio_ap_queue_dev_probe: >>> - * >>> - * Allocate a vfio_ap_queue structure and associate it >>> - * with the device as driver_data. >>> - */ >>> -static int vfio_ap_queue_dev_probe(struct ap_device *apdev) >>> -{ >>> - struct vfio_ap_queue *q; >>> - >>> - q = kzalloc(sizeof(*q), GFP_KERNEL); >>> - if (!q) >>> - return -ENOMEM; >>> - dev_set_drvdata(&apdev->device, q); >>> - q->apqn = to_ap_queue(&apdev->device)->qid; >>> - q->saved_isc = VFIO_AP_ISC_INVALID; >>> - return 0; >>> -} >>> - >>> -/** >>> - * vfio_ap_queue_dev_remove: >>> - * >>> - * Takes the matrix lock to avoid actions on this device while >>> removing >>> - * Free the associated vfio_ap_queue structure >>> - */ >>> -static void vfio_ap_queue_dev_remove(struct ap_device *apdev) >>> -{ >>> - struct vfio_ap_queue *q; >>> - int apid, apqi; >>> - >>> - mutex_lock(&matrix_dev->lock); >>> - q = dev_get_drvdata(&apdev->device); >>> - dev_set_drvdata(&apdev->device, NULL); >>> - apid = AP_QID_CARD(q->apqn); >>> - apqi = AP_QID_QUEUE(q->apqn); >>> - vfio_ap_mdev_reset_queue(apid, apqi, 1); >>> - vfio_ap_irq_disable(q); >>> - kfree(q); >>> - mutex_unlock(&matrix_dev->lock); >>> -} >>> - >>> static void vfio_ap_matrix_dev_release(struct device *dev) >>> { >>> struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev); >>> @@ -186,8 +145,8 @@ static int __init vfio_ap_init(void) >>> return ret; >>> memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv)); >>> - vfio_ap_drv.probe = vfio_ap_queue_dev_probe; >>> - vfio_ap_drv.remove = vfio_ap_queue_dev_remove; >>> + vfio_ap_drv.probe = vfio_ap_mdev_probe_queue; >>> + vfio_ap_drv.remove = vfio_ap_mdev_remove_queue; >>> vfio_ap_drv.ids = ap_queue_ids; >>> ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, >>> VFIO_AP_DRV_NAME); >>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c >>> b/drivers/s390/crypto/vfio_ap_ops.c >>> index e0bde8518745..c471832f0a30 100644 >>> --- a/drivers/s390/crypto/vfio_ap_ops.c >>> +++ b/drivers/s390/crypto/vfio_ap_ops.c >>> @@ -119,7 +119,8 @@ static void vfio_ap_wait_for_irqclear(int apqn) >>> */ >>> static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) >>> { >>> - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev) >>> + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev && >>> + q->matrix_mdev->kvm) >> Here is the check that the kvm reference exists, you mentioned in the >> cover letter. You make only the gisc_unregister depend on it, because >> that's what is going to explode. >> >> But I'm actually wondering if "KVM is gone but we still haven't cleaned >> up our aqic resources" is valid. I argue that it is not. The two >> resources we manage are the gisc registration and the pinned page. I >> argue that it makes on sense to keep what was the guests page pinned, >> if here is no guest associated (we don't have KVM). >> >> I assume the cleanup is supposed to be atomic from the perspective of >> other threads/contexts, so I expect the cleanup either to be fully done >> or not not entered the critical section. >> >> So !kvm && (q->saved_isc != VFIO_AP_ISC_INVALID || q->saved_pfn) is a >> bug. Isn't it? >> >> In that sense this change would only hide the actual problem. >> >> Is the scenario we are talking about something that can happen, or is >> this just about programming defensively? >> >> In any case, I don't think this is a good idea. We can be defensive >> about it, but we have to do it differently. >> >> >>> kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc); >>> if (q->saved_pfn && q->matrix_mdev) >>> vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), >>> @@ -144,7 +145,7 @@ static void vfio_ap_free_aqic_resources(struct >>> vfio_ap_queue *q) >>> * Returns if ap_aqic function failed with invalid, deconfigured or >>> * checkstopped AP. >>> */ >>> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q) >>> +static struct ap_queue_status vfio_ap_irq_disable(struct >>> vfio_ap_queue *q) >>> { >>> struct ap_qirq_ctrl aqic_gisa = {}; >>> struct ap_queue_status status; >>> @@ -297,6 +298,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu) >>> if (!q) >>> goto out_unlock; >>> + q->matrix_mdev = matrix_mdev; >> What is the purpose of this? Doesn't the preceding vfio_ap_get_queue() >> already set q->matrix_mdev? > > You are correct, it shall be removed. > >> >>> status = vcpu->run->s.regs.gprs[1]; >>> /* If IR bit(16) is set we enable the interrupt */ >>> @@ -1114,20 +1116,6 @@ static int vfio_ap_mdev_group_notifier(struct >>> notifier_block *nb, >>> return NOTIFY_OK; >>> } >>> -static void vfio_ap_irq_disable_apqn(int apqn) >>> -{ >>> - struct device *dev; >>> - struct vfio_ap_queue *q; >>> - >>> - dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL, >>> - &apqn, match_apqn); >>> - if (dev) { >>> - q = dev_get_drvdata(dev); >>> - vfio_ap_irq_disable(q); >>> - put_device(dev); >>> - } >>> -} >>> - >>> int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, >>> unsigned int retry) >>> { >>> @@ -1162,6 +1150,7 @@ static int vfio_ap_mdev_reset_queues(struct >>> mdev_device *mdev) >>> { >>> int ret; >>> int rc = 0; >>> + struct vfio_ap_queue *q; >>> unsigned long apid, apqi; >>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); >>> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct >>> mdev_device *mdev) >>> */ >>> if (ret) >>> rc = ret; >>> - vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi)); >>> + q = vfio_ap_get_queue(matrix_mdev, >>> + AP_MKQID(apid, apqi)); >>> + if (q) >>> + vfio_ap_free_aqic_resources(q); >> Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't >> think so. I mean does the current code (and vfio_ap_mdev_reset_queue() >> in particular guarantee that the reset is actually done when we arrive >> here)? BTW, I think we have a similar problem with the current code as >> well. > > If the return code from the vfio_ap_mdev_reset_queue() function > is zero, then yes, we are guaranteed the reset was done and the > queue is empty. The function returns a non-zero return code if > the reset fails or the queue the reset did not complete within a given > amount of time, so maybe we shouldn't free AQIC resources when > we get a non-zero return code from the reset function? > > There are three occasions when the vfio_ap_mdev_reset_queues() > is called: > 1. When the VFIO_DEVICE_RESET ioctl is invoked from userspace > (i.e., when the guest is started) > 2. When the mdev fd is closed (vfio_ap_mdev_release()) > 3. When the mdev is removed (vfio_ap_mdev_remove()) > > The IRQ resources are initialized when the PQAP(AQIC) > is intercepted to enable interrupts. This would occur after > the guest boots and the AP bus initializes. So, 1 would > presumably occur before that happens. I couldn't find > anywhere in the AP bus or zcrypt code where a PQAP(AQIC) > is executed to disable interrupts, so my assumption is > that IRQ disablement is accomplished by a reset on > the guest. I'll have to ask Harald about that. So, 2 would > occur when the guest is about to terminate and 3 > would occur only after the guest is terminated. In any > case, it seems that IRQ resources should be cleaned up. > Maybe it would be more appropriate to do that in the > vfio_ap_mdev_release() and vfio_ap_mdev_remove() > functions themselves? After further review, I've come to the conclusion it makes sense to cleanup the IRQ resources only in the vfio_ap_mdev_release() function for the following reasons: 1. The KVM pointer should still be available because it is not invalidated until after the release callback is invoked. 2. The release callback is part of normal guest termination, so interrupt processing will presumably no longer be necessary for the guest. 3. The zcrypt drivers on the guest do not disable interrupt processing via the PQAP(AQIC) instruction (I verified this with Harald), so there is no opportunity to clean up IRQ on interception of IRQ disable. 4. It makes no sense to clean up IRQ resources in the vfio_ap_mdev_remove() function because the function disallows removal of the mdev when the KVM pointer is still valid because it is assumed the guest is still running at that point > >> >> Under what circumstances do we expect !q? If we don't, then we need to >> complain one way or another. > > In the current code (i.e., prior to introducing the subsequent hot > plug patches), an APQN can not be assigned to an mdev unless it > references a queue device bound to the vfio_ap device driver; however, > there is nothing preventing a queue device from getting unbound > while the guest is running (one of the problems mostly resolved by this > series). In that case, q would be NULL. > >> >> I believe that each time we call vfio_ap_mdev_reset_queue(), we will >> also want to call vfio_ap_free_aqic_resources(q) to clean up our aqic >> resources associated with the queue -- if any. So I would really prefer >> having a function that does both. > > As stated above, I don't believe PQAP(AQIC) is ever called by > the AP bus or zcrypt to disable IRQs, but I could be wrong about > that so I'll verify with Harald. If that is the case, then it would > make sense to free IRQ resources when a queue completes. > I can either add a function that does both and call it instead of > vfio_ap_mdev_reset_queue(). What say you? > >> >>> } >>> } >>> @@ -1302,3 +1294,40 @@ void vfio_ap_mdev_unregister(void) >>> { >>> mdev_unregister_device(&matrix_dev->device); >>> } >>> + >>> +int vfio_ap_mdev_probe_queue(struct ap_device *apdev) >>> +{ >>> + struct vfio_ap_queue *q; >>> + struct ap_queue *queue; >>> + >>> + queue = to_ap_queue(&apdev->device); >>> + >>> + q = kzalloc(sizeof(*q), GFP_KERNEL); >>> + if (!q) >>> + return -ENOMEM; >>> + >>> + dev_set_drvdata(&queue->ap_dev.device, q); >>> + q->apqn = queue->qid; >>> + q->saved_isc = VFIO_AP_ISC_INVALID; >>> + >>> + return 0; >>> +} >>> + >>> +void vfio_ap_mdev_remove_queue(struct ap_device *apdev) >>> +{ >>> + struct vfio_ap_queue *q; >>> + struct ap_queue *queue; >>> + int apid, apqi; >>> + >>> + queue = to_ap_queue(&apdev->device); >> What is the benefit of rewriting this? You introduced >> queue just to do queue->ap_dev to get to the apdev you >> have in hand in the first place. > > I'm not quite sure what you're asking. This function is > the callback function specified via the function pointer > specified via the remove field of the struct ap_driver > when the vfio_ap device driver is registered with the > AP bus. That callback function takes a struct ap_device > as a parameter. What am I missing here? > >> >>> + >>> + mutex_lock(&matrix_dev->lock); >>> + q = dev_get_drvdata(&queue->ap_dev.device); >>> + dev_set_drvdata(&queue->ap_dev.device, NULL); >>> + apid = AP_QID_CARD(q->apqn); >>> + apqi = AP_QID_QUEUE(q->apqn); >>> + vfio_ap_mdev_reset_queue(apid, apqi, 1); >>> + vfio_ap_free_aqic_resources(q); >>> + kfree(q); >>> + mutex_unlock(&matrix_dev->lock); >>> +} >>> diff --git a/drivers/s390/crypto/vfio_ap_private.h >>> b/drivers/s390/crypto/vfio_ap_private.h >>> index f46dde56b464..d9003de4fbad 100644 >>> --- a/drivers/s390/crypto/vfio_ap_private.h >>> +++ b/drivers/s390/crypto/vfio_ap_private.h >>> @@ -90,8 +90,6 @@ struct ap_matrix_mdev { >>> extern int vfio_ap_mdev_register(void); >>> extern void vfio_ap_mdev_unregister(void); >>> -int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, >>> - unsigned int retry); >>> struct vfio_ap_queue { >>> struct ap_matrix_mdev *matrix_mdev; >>> @@ -100,5 +98,8 @@ struct vfio_ap_queue { >>> #define VFIO_AP_ISC_INVALID 0xff >>> unsigned char saved_isc; >>> }; >>> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q); >>> + >>> +int vfio_ap_mdev_probe_queue(struct ap_device *queue); >>> +void vfio_ap_mdev_remove_queue(struct ap_device *queue); >>> + >>> #endif /* _VFIO_AP_PRIVATE_H_ */ >
On Thu, 29 Oct 2020 19:29:35 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > On 10/27/20 2:48 AM, Halil Pasic wrote: > > On Thu, 22 Oct 2020 13:11:56 -0400 > > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > > > >> The queues assigned to a matrix mediated device are currently reset when: > >> > >> * The VFIO_DEVICE_RESET ioctl is invoked > >> * The mdev fd is closed by userspace (QEMU) > >> * The mdev is removed from sysfs. > > What about the situation when vfio_ap_mdev_group_notifier() is called to > > tell us that our pointer to KVM is about to become invalid? Do we need to > > clean up the IRQ stuff there? > > After reading this question, I decided to do some tracing using > printk's and learned that the vfio_ap_mdev_group_notifier() > function does not get called when the guest is shutdown. The reason > for this is because the vfio_ap_mdev_release() function, which is called > before the KVM pointer is invalidated, unregisters the group notifier. > > I took a look at some of the other drivers that register a group > notifier in the mdev_parent_ops.open callback and each unregistered > the notifier in the mdev_parent_ops.release callback. > > So, to answer your question, there is no need to cleanup the IRQ > stuff in the vfio_ap_mdev_group_notifier() function since it will > not get called when the KVM pointer is invalidated. The cleanup > should be done in the vfio_ap_mdev_release() function that gets > called when the mdev fd is closed. You say if vfio_ap_mdev_group_notifier() is called to tell us that KVM going away, then it is a bug? If that is the case, I would like that reflected in the code! By that I mean at logging an error at least (if not BUG_ON). Regards, Halil
On Thu, 29 Oct 2020 19:29:35 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) > >> */ > >> if (ret) > >> rc = ret; > >> - vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi)); > >> + q = vfio_ap_get_queue(matrix_mdev, > >> + AP_MKQID(apid, apqi)); > >> + if (q) > >> + vfio_ap_free_aqic_resources(q); > > Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't > > think so. I mean does the current code (and vfio_ap_mdev_reset_queue() > > in particular guarantee that the reset is actually done when we arrive > > here)? BTW, I think we have a similar problem with the current code as > > well. > > If the return code from the vfio_ap_mdev_reset_queue() function > is zero, then yes, we are guaranteed the reset was done and the > queue is empty. I've read up on this and I disagree. We should discuss this offline. > The function returns a non-zero return code if > the reset fails or the queue the reset did not complete within a given > amount of time, so maybe we shouldn't free AQIC resources when > we get a non-zero return code from the reset function? > If the queue is gone, or broken, it won't produce interrupts or poke the notifier bit, and we should clean up the AQIC resources. > There are three occasions when the vfio_ap_mdev_reset_queues() > is called: > 1. When the VFIO_DEVICE_RESET ioctl is invoked from userspace > (i.e., when the guest is started) > 2. When the mdev fd is closed (vfio_ap_mdev_release()) > 3. When the mdev is removed (vfio_ap_mdev_remove()) > > The IRQ resources are initialized when the PQAP(AQIC) > is intercepted to enable interrupts. This would occur after > the guest boots and the AP bus initializes. So, 1 would > presumably occur before that happens. I couldn't find > anywhere in the AP bus or zcrypt code where a PQAP(AQIC) > is executed to disable interrupts, so my assumption is > that IRQ disablement is accomplished by a reset on > the guest. I'll have to ask Harald about that. So, 2 would > occur when the guest is about to terminate and 3 > would occur only after the guest is terminated. In any > case, it seems that IRQ resources should be cleaned up. > Maybe it would be more appropriate to do that in the > vfio_ap_mdev_release() and vfio_ap_mdev_remove() > functions themselves? I'm a bit confused. But I think you are wrong. What happens when the guest reIPLs? I guess the subsystem reset should also do the VFIO_DEVICE_RESET ioctl, and that has to reset the queues and disable the interrupts. Or? Regards, Halil
On Thu, 29 Oct 2020 19:29:35 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) > >> */ > >> if (ret) > >> rc = ret; > >> - vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi)); > >> + q = vfio_ap_get_queue(matrix_mdev, > >> + AP_MKQID(apid, apqi)); > >> + if (q) > >> + vfio_ap_free_aqic_resources(q); [..] > > > > Under what circumstances do we expect !q? If we don't, then we need to > > complain one way or another. > > In the current code (i.e., prior to introducing the subsequent hot > plug patches), an APQN can not be assigned to an mdev unless it > references a queue device bound to the vfio_ap device driver; however, > there is nothing preventing a queue device from getting unbound > while the guest is running (one of the problems mostly resolved by this > series). In that case, q would be NULL. But if the queue does not belong to us any more it does not make sense call vfio_ap_mdev_reset_queue() on it's APQN, or? I think we should have if(!q) continue; at the very beginning of the loop body, or we want to be sure that q is not null.
On Thu, 29 Oct 2020 19:29:35 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> +void vfio_ap_mdev_remove_queue(struct ap_device *apdev) > >> +{ > >> + struct vfio_ap_queue *q; > >> + struct ap_queue *queue; > >> + int apid, apqi; > >> + > >> + queue = to_ap_queue(&apdev->device); > > What is the benefit of rewriting this? You introduced > > queue just to do queue->ap_dev to get to the apdev you > > have in hand in the first place. > > I'm not quite sure what you're asking. This function is > the callback function specified via the function pointer > specified via the remove field of the struct ap_driver > when the vfio_ap device driver is registered with the > AP bus. That callback function takes a struct ap_device > as a parameter. What am I missing here? Please compare the removed function vfio_ap_queue_dev_remove() with the added function vfio_ap_mdev_remove_queue() line by line. It should become clear. Regards, Halil
On 10/30/20 1:42 PM, Halil Pasic wrote: > On Thu, 29 Oct 2020 19:29:35 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >>>> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) >>>> */ >>>> if (ret) >>>> rc = ret; >>>> - vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi)); >>>> + q = vfio_ap_get_queue(matrix_mdev, >>>> + AP_MKQID(apid, apqi)); >>>> + if (q) >>>> + vfio_ap_free_aqic_resources(q); >>> Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't >>> think so. I mean does the current code (and vfio_ap_mdev_reset_queue() >>> in particular guarantee that the reset is actually done when we arrive >>> here)? BTW, I think we have a similar problem with the current code as >>> well. >> If the return code from the vfio_ap_mdev_reset_queue() function >> is zero, then yes, we are guaranteed the reset was done and the >> queue is empty. > I've read up on this and I disagree. We should discuss this offline. Maybe you are confusing things here; my statement is specific to the return code from the vfio_ap_mdev_reset_queue() function, not the response code from the PQAP(ZAPQ) instruction. The vfio_ap_mdev_reset_queue() function issues the PQAP(ZAPQ) instruction and if the status response code is 0 indicating the reset was successfully initiated, it waits for the queue to empty. When the queue is empty, it returns 0 to indicate the queue is reset. If the queue does not become empty after a period of time, it will issue a warning (WARN_ON_ONCE) and return 0. In that case, I suppose there is no guarantee the reset was done, so maybe a change needs to be made there such as a non-zero return code. > >> The function returns a non-zero return code if >> the reset fails or the queue the reset did not complete within a given >> amount of time, so maybe we shouldn't free AQIC resources when >> we get a non-zero return code from the reset function? >> > If the queue is gone, or broken, it won't produce interrupts or poke the > notifier bit, and we should clean up the AQIC resources. True, which is what the code provided by this patch does; however, the AQIC resources should be cleaned up only if the KVM pointer is not NULL for reasons discussed elsewhere. > > >> There are three occasions when the vfio_ap_mdev_reset_queues() >> is called: >> 1. When the VFIO_DEVICE_RESET ioctl is invoked from userspace >> (i.e., when the guest is started) >> 2. When the mdev fd is closed (vfio_ap_mdev_release()) >> 3. When the mdev is removed (vfio_ap_mdev_remove()) >> >> The IRQ resources are initialized when the PQAP(AQIC) >> is intercepted to enable interrupts. This would occur after >> the guest boots and the AP bus initializes. So, 1 would >> presumably occur before that happens. I couldn't find >> anywhere in the AP bus or zcrypt code where a PQAP(AQIC) >> is executed to disable interrupts, so my assumption is >> that IRQ disablement is accomplished by a reset on >> the guest. I'll have to ask Harald about that. So, 2 would >> occur when the guest is about to terminate and 3 >> would occur only after the guest is terminated. In any >> case, it seems that IRQ resources should be cleaned up. >> Maybe it would be more appropriate to do that in the >> vfio_ap_mdev_release() and vfio_ap_mdev_remove() >> functions themselves? > I'm a bit confused. But I think you are wrong. What happens when the > guest reIPLs? I guess the subsystem reset should also do the > VFIO_DEVICE_RESET ioctl, and that has to reset the queues and disable > the interrupts. Or? What did I say that is wrong? I think you are referring to my statement about the VFIO_DEVICE_RESET ioctl. I am not knowledgeable about all of the circumstances under which the VFIO_DEVICE_RESET ioctl is invoked, but I know for a fact that it is invoked when the guest is started as I've verified that via tracing. On the other hand, I suspect you are correct in assuming it is also invoked on a subsystem reset from the guest, so that also argues for cleaning up the IRQ resources after a reset as long as the KVM pointer is valid. > > Regards, > Halil >
On 10/30/20 1:27 PM, Halil Pasic wrote: > On Thu, 29 Oct 2020 19:29:35 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> On 10/27/20 2:48 AM, Halil Pasic wrote: >>> On Thu, 22 Oct 2020 13:11:56 -0400 >>> Tony Krowiak <akrowiak@linux.ibm.com> wrote: >>> >>>> The queues assigned to a matrix mediated device are currently reset when: >>>> >>>> * The VFIO_DEVICE_RESET ioctl is invoked >>>> * The mdev fd is closed by userspace (QEMU) >>>> * The mdev is removed from sysfs. >>> What about the situation when vfio_ap_mdev_group_notifier() is called to >>> tell us that our pointer to KVM is about to become invalid? Do we need to >>> clean up the IRQ stuff there? >> After reading this question, I decided to do some tracing using >> printk's and learned that the vfio_ap_mdev_group_notifier() >> function does not get called when the guest is shutdown. The reason >> for this is because the vfio_ap_mdev_release() function, which is called >> before the KVM pointer is invalidated, unregisters the group notifier. >> >> I took a look at some of the other drivers that register a group >> notifier in the mdev_parent_ops.open callback and each unregistered >> the notifier in the mdev_parent_ops.release callback. >> >> So, to answer your question, there is no need to cleanup the IRQ >> stuff in the vfio_ap_mdev_group_notifier() function since it will >> not get called when the KVM pointer is invalidated. The cleanup >> should be done in the vfio_ap_mdev_release() function that gets >> called when the mdev fd is closed. > You say if vfio_ap_mdev_group_notifier() is called to tell us > that KVM going away, then it is a bug? If the notifier gets called after the notifier is unregistered then yes, I would say that is a bug; however, my tracing showed that the notifier does not get called precisely because it is unregistered in the release callback. > > If that is the case, I would like that reflected in the code! By that I > mean at logging an error at least (if not BUG_ON). I do not know whether or not there are other circumstances under which the notifier can get invoked before the release callback to make notification that the KVM pointer has been invalidated, so I don't think this would be appropriate. I think we should just process the call by setting the matrix_mdev->kvm pointer to NULL and decrement the reference count to kvm. Maybe someone from the VFIO team can provide some better insight. > > Regards, > Halil
On 10/30/20 1:54 PM, Halil Pasic wrote: > On Thu, 29 Oct 2020 19:29:35 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >>>> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) >>>> */ >>>> if (ret) >>>> rc = ret; >>>> - vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi)); >>>> + q = vfio_ap_get_queue(matrix_mdev, >>>> + AP_MKQID(apid, apqi)); >>>> + if (q) >>>> + vfio_ap_free_aqic_resources(q); > [..] > >>> Under what circumstances do we expect !q? If we don't, then we need to >>> complain one way or another. >> In the current code (i.e., prior to introducing the subsequent hot >> plug patches), an APQN can not be assigned to an mdev unless it >> references a queue device bound to the vfio_ap device driver; however, >> there is nothing preventing a queue device from getting unbound >> while the guest is running (one of the problems mostly resolved by this >> series). In that case, q would be NULL. > But if the queue does not belong to us any more it does not make sense > call vfio_ap_mdev_reset_queue() on it's APQN, or? This is precisely why we prevent a queue from being taken away from vfio_ap (the in-use callback) when its APQN is assigned to an mdev in this patch series. On the other hand, this is a very good point. > > I think we should have > > if(!q) > continue; > at the very beginning of the loop body, or we want to be sure that q is > not null. I agree, I'll go ahead and make this change. > >
On 10/30/20 4:53 PM, Tony Krowiak wrote: > > > On 10/30/20 1:54 PM, Halil Pasic wrote: >> On Thu, 29 Oct 2020 19:29:35 -0400 >> Tony Krowiak <akrowiak@linux.ibm.com> wrote: >> >>>>> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct >>>>> mdev_device *mdev) >>>>> */ >>>>> if (ret) >>>>> rc = ret; >>>>> - vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi)); >>>>> + q = vfio_ap_get_queue(matrix_mdev, >>>>> + AP_MKQID(apid, apqi)); >>>>> + if (q) >>>>> + vfio_ap_free_aqic_resources(q); >> [..] >> >>>> Under what circumstances do we expect !q? If we don't, then we need to >>>> complain one way or another. >>> In the current code (i.e., prior to introducing the subsequent hot >>> plug patches), an APQN can not be assigned to an mdev unless it >>> references a queue device bound to the vfio_ap device driver; however, >>> there is nothing preventing a queue device from getting unbound >>> while the guest is running (one of the problems mostly resolved by this >>> series). In that case, q would be NULL. >> But if the queue does not belong to us any more it does not make sense >> call vfio_ap_mdev_reset_queue() on it's APQN, or? > > This is precisely why we prevent a queue from being taken away > from vfio_ap (the in-use callback) when its APQN is assigned to an > mdev in this patch series. On the other hand, this is a very good > point. > >> >> I think we should have >> >> if(!q) >> continue; >> at the very beginning of the loop body, or we want to be sure that q is >> not null. > > I agree, I'll go ahead and make this change. After thinking about this a bit more, I don't think it makes sense to make this change in this patch. For the current implementation, it is incumbent upon the system administrator to ensure that a queue device is not unbound from the vfio_ap device driver if its APQN is assigned to an mdev, so the assumption here is that any APQN assigned to the mdev is (or was) bound to the vfio_ap driver. If it was erroneously unbound while in use by a guest, then both the guest and possibly the zcrypt driver will have simultaneous access (one of the things fixed by this patch series). In that case, I think it ought to be reset regardless of whether it is bound to vfio_ap or not. Having said that, I think it makes sense to make the change you recommend in patch 03/14. In that patch, the vfio_ap_queue object is retrieved from the matrix_mdev. Since these queue objects are linked only when the queue device is probed and unlinked when the the queue device is removed and a queue device can not get bound to another driver while its APQN is assigned to an mdev, it would make perfect sense to forego reset of a queue when its APQN is assigned to an mdev. > > > > >> >
On 10/30/20 1:56 PM, Halil Pasic wrote: > On Thu, 29 Oct 2020 19:29:35 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >>>> +void vfio_ap_mdev_remove_queue(struct ap_device *apdev) >>>> +{ >>>> + struct vfio_ap_queue *q; >>>> + struct ap_queue *queue; >>>> + int apid, apqi; >>>> + >>>> + queue = to_ap_queue(&apdev->device); >>> What is the benefit of rewriting this? You introduced >>> queue just to do queue->ap_dev to get to the apdev you >>> have in hand in the first place. >> I'm not quite sure what you're asking. This function is >> the callback function specified via the function pointer >> specified via the remove field of the struct ap_driver >> when the vfio_ap device driver is registered with the >> AP bus. That callback function takes a struct ap_device >> as a parameter. What am I missing here? > Please compare the removed function vfio_ap_queue_dev_remove() with the > added function vfio_ap_mdev_remove_queue() line by line. It should > become clear. Got it. You are one sharp cookie, I'll fix this. > > Regards, > Halil
On Fri, 30 Oct 2020 16:37:04 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > On 10/30/20 1:42 PM, Halil Pasic wrote: > > On Thu, 29 Oct 2020 19:29:35 -0400 > > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > > > >>>> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) > >>>> */ > >>>> if (ret) > >>>> rc = ret; > >>>> - vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi)); > >>>> + q = vfio_ap_get_queue(matrix_mdev, > >>>> + AP_MKQID(apid, apqi)); > >>>> + if (q) > >>>> + vfio_ap_free_aqic_resources(q); > >>> Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't > >>> think so. I mean does the current code (and vfio_ap_mdev_reset_queue() > >>> in particular guarantee that the reset is actually done when we arrive > >>> here)? BTW, I think we have a similar problem with the current code as > >>> well. > >> If the return code from the vfio_ap_mdev_reset_queue() function > >> is zero, then yes, we are guaranteed the reset was done and the > >> queue is empty. > > I've read up on this and I disagree. We should discuss this offline. > > Maybe you are confusing things here; my statement is specific to the return > code from the vfio_ap_mdev_reset_queue() function, not the response code > from the PQAP(ZAPQ) instruction. The vfio_ap_mdev_reset_queue() > function issues the PQAP(ZAPQ) instruction and if the status response code > is 0 indicating the reset was successfully initiated, it waits for the > queue to empty. When the queue is empty, it returns 0 to indicate > the queue is reset. > If the queue does not become empty after a period of > time, > it will issue a warning (WARN_ON_ONCE) and return 0. In that case, I suppose > there is no guarantee the reset was done, so maybe a change needs to be > made there such as a non-zero return code. > I've overlooked the wait for empty. Maybe that return 0 had a part in it. I now remember me insisting on having the wait code added when the interrupt support was in the make. Sorry! If we have given up on out of retries retries, we are in trouble anyway. > > > >> The function returns a non-zero return code if > >> the reset fails or the queue the reset did not complete within a given > >> amount of time, so maybe we shouldn't free AQIC resources when > >> we get a non-zero return code from the reset function? > >> > > If the queue is gone, or broken, it won't produce interrupts or poke the > > notifier bit, and we should clean up the AQIC resources. > > True, which is what the code provided by this patch does; however, > the AQIC resources should be cleaned up only if the KVM pointer is > not NULL for reasons discussed elsewhere. Yes, but these should be cleaned up before the KVM pointer becomes null. We don't want to keep the page with the notifier byte pinned forever, or?
On 10/30/20 11:43 PM, Halil Pasic wrote: > On Fri, 30 Oct 2020 16:37:04 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> On 10/30/20 1:42 PM, Halil Pasic wrote: >>> On Thu, 29 Oct 2020 19:29:35 -0400 >>> Tony Krowiak <akrowiak@linux.ibm.com> wrote: >>> >>>>>> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) >>>>>> */ >>>>>> if (ret) >>>>>> rc = ret; >>>>>> - vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi)); >>>>>> + q = vfio_ap_get_queue(matrix_mdev, >>>>>> + AP_MKQID(apid, apqi)); >>>>>> + if (q) >>>>>> + vfio_ap_free_aqic_resources(q); >>>>> Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't >>>>> think so. I mean does the current code (and vfio_ap_mdev_reset_queue() >>>>> in particular guarantee that the reset is actually done when we arrive >>>>> here)? BTW, I think we have a similar problem with the current code as >>>>> well. >>>> If the return code from the vfio_ap_mdev_reset_queue() function >>>> is zero, then yes, we are guaranteed the reset was done and the >>>> queue is empty. >>> I've read up on this and I disagree. We should discuss this offline. >> Maybe you are confusing things here; my statement is specific to the return >> code from the vfio_ap_mdev_reset_queue() function, not the response code >> from the PQAP(ZAPQ) instruction. The vfio_ap_mdev_reset_queue() >> function issues the PQAP(ZAPQ) instruction and if the status response code >> is 0 indicating the reset was successfully initiated, it waits for the >> queue to empty. When the queue is empty, it returns 0 to indicate >> the queue is reset. >> If the queue does not become empty after a period of >> time, >> it will issue a warning (WARN_ON_ONCE) and return 0. In that case, I suppose >> there is no guarantee the reset was done, so maybe a change needs to be >> made there such as a non-zero return code. >> > I've overlooked the wait for empty. Maybe that return 0 had a part in > it. I now remember me insisting on having the wait code added when the > interrupt support was in the make. Sorry! > > If we have given up on out of retries retries, we are in trouble anyway. > >>> >>>> The function returns a non-zero return code if >>>> the reset fails or the queue the reset did not complete within a given >>>> amount of time, so maybe we shouldn't free AQIC resources when >>>> we get a non-zero return code from the reset function? >>>> >>> If the queue is gone, or broken, it won't produce interrupts or poke the >>> notifier bit, and we should clean up the AQIC resources. >> True, which is what the code provided by this patch does; however, >> the AQIC resources should be cleaned up only if the KVM pointer is >> not NULL for reasons discussed elsewhere. > Yes, but these should be cleaned up before the KVM pointer becomes > null. We don't want to keep the page with the notifier byte pinned > forever, or? No, we do not want to keep the page forever. I probably should have been clearer. There are times we do a reset - e.g., on remove of the mdev - at which time there should be no KVM pointer, or else the remove will not be allowed. Of course, we won't do the reset either, so I guess you can ignore my comment. If there is no KVM pointer yet a page remains pinned, something bad happened.
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c index be2520cc010b..73bd073fd5d3 100644 --- a/drivers/s390/crypto/vfio_ap_drv.c +++ b/drivers/s390/crypto/vfio_ap_drv.c @@ -43,47 +43,6 @@ static struct ap_device_id ap_queue_ids[] = { MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids); -/** - * vfio_ap_queue_dev_probe: - * - * Allocate a vfio_ap_queue structure and associate it - * with the device as driver_data. - */ -static int vfio_ap_queue_dev_probe(struct ap_device *apdev) -{ - struct vfio_ap_queue *q; - - q = kzalloc(sizeof(*q), GFP_KERNEL); - if (!q) - return -ENOMEM; - dev_set_drvdata(&apdev->device, q); - q->apqn = to_ap_queue(&apdev->device)->qid; - q->saved_isc = VFIO_AP_ISC_INVALID; - return 0; -} - -/** - * vfio_ap_queue_dev_remove: - * - * Takes the matrix lock to avoid actions on this device while removing - * Free the associated vfio_ap_queue structure - */ -static void vfio_ap_queue_dev_remove(struct ap_device *apdev) -{ - struct vfio_ap_queue *q; - int apid, apqi; - - mutex_lock(&matrix_dev->lock); - q = dev_get_drvdata(&apdev->device); - dev_set_drvdata(&apdev->device, NULL); - apid = AP_QID_CARD(q->apqn); - apqi = AP_QID_QUEUE(q->apqn); - vfio_ap_mdev_reset_queue(apid, apqi, 1); - vfio_ap_irq_disable(q); - kfree(q); - mutex_unlock(&matrix_dev->lock); -} - static void vfio_ap_matrix_dev_release(struct device *dev) { struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev); @@ -186,8 +145,8 @@ static int __init vfio_ap_init(void) return ret; memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv)); - vfio_ap_drv.probe = vfio_ap_queue_dev_probe; - vfio_ap_drv.remove = vfio_ap_queue_dev_remove; + vfio_ap_drv.probe = vfio_ap_mdev_probe_queue; + vfio_ap_drv.remove = vfio_ap_mdev_remove_queue; vfio_ap_drv.ids = ap_queue_ids; ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME); diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index e0bde8518745..c471832f0a30 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -119,7 +119,8 @@ static void vfio_ap_wait_for_irqclear(int apqn) */ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) { - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev) + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev && + q->matrix_mdev->kvm) kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc); if (q->saved_pfn && q->matrix_mdev) vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), @@ -144,7 +145,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) * Returns if ap_aqic function failed with invalid, deconfigured or * checkstopped AP. */ -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q) +static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q) { struct ap_qirq_ctrl aqic_gisa = {}; struct ap_queue_status status; @@ -297,6 +298,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu) if (!q) goto out_unlock; + q->matrix_mdev = matrix_mdev; status = vcpu->run->s.regs.gprs[1]; /* If IR bit(16) is set we enable the interrupt */ @@ -1114,20 +1116,6 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb, return NOTIFY_OK; } -static void vfio_ap_irq_disable_apqn(int apqn) -{ - struct device *dev; - struct vfio_ap_queue *q; - - dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL, - &apqn, match_apqn); - if (dev) { - q = dev_get_drvdata(dev); - vfio_ap_irq_disable(q); - put_device(dev); - } -} - int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, unsigned int retry) { @@ -1162,6 +1150,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) { int ret; int rc = 0; + struct vfio_ap_queue *q; unsigned long apid, apqi; struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) */ if (ret) rc = ret; - vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi)); + q = vfio_ap_get_queue(matrix_mdev, + AP_MKQID(apid, apqi)); + if (q) + vfio_ap_free_aqic_resources(q); } } @@ -1302,3 +1294,40 @@ void vfio_ap_mdev_unregister(void) { mdev_unregister_device(&matrix_dev->device); } + +int vfio_ap_mdev_probe_queue(struct ap_device *apdev) +{ + struct vfio_ap_queue *q; + struct ap_queue *queue; + + queue = to_ap_queue(&apdev->device); + + q = kzalloc(sizeof(*q), GFP_KERNEL); + if (!q) + return -ENOMEM; + + dev_set_drvdata(&queue->ap_dev.device, q); + q->apqn = queue->qid; + q->saved_isc = VFIO_AP_ISC_INVALID; + + return 0; +} + +void vfio_ap_mdev_remove_queue(struct ap_device *apdev) +{ + struct vfio_ap_queue *q; + struct ap_queue *queue; + int apid, apqi; + + queue = to_ap_queue(&apdev->device); + + mutex_lock(&matrix_dev->lock); + q = dev_get_drvdata(&queue->ap_dev.device); + dev_set_drvdata(&queue->ap_dev.device, NULL); + apid = AP_QID_CARD(q->apqn); + apqi = AP_QID_QUEUE(q->apqn); + vfio_ap_mdev_reset_queue(apid, apqi, 1); + vfio_ap_free_aqic_resources(q); + kfree(q); + mutex_unlock(&matrix_dev->lock); +} diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h index f46dde56b464..d9003de4fbad 100644 --- a/drivers/s390/crypto/vfio_ap_private.h +++ b/drivers/s390/crypto/vfio_ap_private.h @@ -90,8 +90,6 @@ struct ap_matrix_mdev { extern int vfio_ap_mdev_register(void); extern void vfio_ap_mdev_unregister(void); -int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, - unsigned int retry); struct vfio_ap_queue { struct ap_matrix_mdev *matrix_mdev; @@ -100,5 +98,8 @@ struct vfio_ap_queue { #define VFIO_AP_ISC_INVALID 0xff unsigned char saved_isc; }; -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q); + +int vfio_ap_mdev_probe_queue(struct ap_device *queue); +void vfio_ap_mdev_remove_queue(struct ap_device *queue); + #endif /* _VFIO_AP_PRIVATE_H_ */
The queues assigned to a matrix mediated device are currently reset when: * The VFIO_DEVICE_RESET ioctl is invoked * The mdev fd is closed by userspace (QEMU) * The mdev is removed from sysfs. Immediately after the reset of a queue, a call is made to disable interrupts for the queue. This is entirely unnecessary because the reset of a queue disables interrupts, so this will be removed. Since interrupt processing may have been enabled by the guest, it may also be necessary to clean up the resources used for interrupt processing. Part of the cleanup operation requires a reference to KVM, so a check is also being added to ensure the reference to KVM exists. The reason is because the release callback - invoked when userspace closes the mdev fd - removes the reference to KVM. When the remove callback - called when the mdev is removed from sysfs - is subsequently invoked, there will be no reference to KVM when the cleanup is performed. This patch will also do a bit of refactoring due to the fact that the remove callback, implemented in vfio_ap_drv.c, disables the queue after resetting it. Instead of the remove callback making a call into the vfio_ap_ops.c to clean up the resources used for interrupt processing, let's move the probe and remove callbacks into the vfio_ap_ops.c file keep all code related to managing queues in a single file. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> --- drivers/s390/crypto/vfio_ap_drv.c | 45 +------------------ drivers/s390/crypto/vfio_ap_ops.c | 63 +++++++++++++++++++-------- drivers/s390/crypto/vfio_ap_private.h | 7 +-- 3 files changed, 52 insertions(+), 63 deletions(-)