Message ID | 20180306182128.23281-7-decui@microsoft.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
> -----Original Message----- > From: Dexuan Cui > Sent: Tuesday, March 6, 2018 10:22 AM > To: bhelgaas@google.com; linux-pci@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; > Stephen Hemminger <sthemmin@microsoft.com>; olaf@aepfle.de; apw@canonical.com; > jasowang@redhat.com > Cc: linux-kernel@vger.kernel.org; driverdev-devel@linuxdriverproject.org; Haiyang Zhang > <haiyangz@microsoft.com>; vkuznets@redhat.com; marcelo.cerri@canonical.com; Michael > Kelley (EOSG) <Michael.H.Kelley@microsoft.com>; Dexuan Cui <decui@microsoft.com>; > stable@vger.kernel.org; Jack Morgenstein <jackm@mellanox.com> > Subject: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg() > > 1. With the patch "x86/vector/msi: Switch to global reservation mode" > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU > Hyper-V VM with SR-IOV. This is because when we reach hv_compose_msi_msg() > by request_irq() -> request_threaded_irq() -> __setup_irq()->irq_startup() > -> __irq_startup() -> irq_domain_activate_irq() -> ... -> > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is > disabled in __setup_irq(). > > Fix this by polling the channel. > > 2. If the host is ejecting the VF device before we reach > hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg() > forever, because at this time the host doesn't respond to the > CREATE_INTERRUPT request. This issue also happens to old kernels like > v4.14, v4.13, etc. > > Fix this by polling the channel for the PCI_EJECT message and > hpdev->state, and by checking the PCI vendor ID. > > Note: actually the above issues also happen to a SMP VM, if > "hbus->hdev->channel->target_cpu == smp_processor_id()" is true. > > Signed-off-by: Dexuan Cui <decui@microsoft.com> > Tested-by: Adrian Suhov <v-adsuho@microsoft.com> > Tested-by: Chris Valean <v-chvale@microsoft.com> > Cc: stable@vger.kernel.org > Cc: Stephen Hemminger <sthemmin@microsoft.com> > Cc: K. Y. Srinivasan <kys@microsoft.com> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > Cc: Jack Morgenstein <jackm@mellanox.com> > --- > drivers/pci/host/pci-hyperv.c | 58 ++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 57 insertions(+), 1 deletion(-) > Reviewed-by: Michael Kelley <mikelley@microsoft.com>
On Tue, Mar 06, 2018 at 06:21:56PM +0000, Dexuan Cui wrote: > 1. With the patch "x86/vector/msi: Switch to global reservation mode" > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU > Hyper-V VM with SR-IOV. This is because when we reach hv_compose_msi_msg() > by request_irq() -> request_threaded_irq() -> __setup_irq()->irq_startup() > -> __irq_startup() -> irq_domain_activate_irq() -> ... -> > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is > disabled in __setup_irq(). > > Fix this by polling the channel. > > 2. If the host is ejecting the VF device before we reach > hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg() > forever, because at this time the host doesn't respond to the > CREATE_INTERRUPT request. This issue also happens to old kernels like > v4.14, v4.13, etc. If you are fixing a problem you should report what commit you are fixing with a Fixes: tag and add a CC: stable@vger.kernel.org to the commit log to send it to stable kernels to which it should be applied; mentioning kernel versions in the commit log is useless and should be omitted. Side note: you should not have stable@vger.kernel.org in the email addresses CC list you are sending the patches to (you mark patches for stable by adding an appropriate CC tag in the commit log). Here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/stable-kernel-rules.rst?h=v4.16-rc4 Last but not least, most of the patches in this series do not justify sending them to stable kernels at all so you should remove the corresponding tag from the patches. Thanks, Lorenzo > Fix this by polling the channel for the PCI_EJECT message and > hpdev->state, and by checking the PCI vendor ID. > > Note: actually the above issues also happen to a SMP VM, if > "hbus->hdev->channel->target_cpu == smp_processor_id()" is true. > > Signed-off-by: Dexuan Cui <decui@microsoft.com> > Tested-by: Adrian Suhov <v-adsuho@microsoft.com> > Tested-by: Chris Valean <v-chvale@microsoft.com> > Cc: stable@vger.kernel.org > Cc: Stephen Hemminger <sthemmin@microsoft.com> > Cc: K. Y. Srinivasan <kys@microsoft.com> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > Cc: Jack Morgenstein <jackm@mellanox.com> > --- > drivers/pci/host/pci-hyperv.c | 58 ++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 57 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c > index 265ba11e53e2..50cdefe3f6d3 100644 > --- a/drivers/pci/host/pci-hyperv.c > +++ b/drivers/pci/host/pci-hyperv.c > @@ -521,6 +521,8 @@ struct hv_pci_compl { > s32 completion_status; > }; > > +static void hv_pci_onchannelcallback(void *context); > + > /** > * hv_pci_generic_compl() - Invoked for a completion packet > * @context: Set up by the sender of the packet. > @@ -665,6 +667,31 @@ static void _hv_pcifront_read_config(struct hv_pci_dev *hpdev, int where, > } > } > > +static u16 hv_pcifront_get_vendor_id(struct hv_pci_dev *hpdev) > +{ > + u16 ret; > + unsigned long flags; > + void __iomem *addr = hpdev->hbus->cfg_addr + CFG_PAGE_OFFSET + > + PCI_VENDOR_ID; > + > + spin_lock_irqsave(&hpdev->hbus->config_lock, flags); > + > + /* Choose the function to be read. (See comment above) */ > + writel(hpdev->desc.win_slot.slot, hpdev->hbus->cfg_addr); > + /* Make sure the function was chosen before we start reading. */ > + mb(); > + /* Read from that function's config space. */ > + ret = readw(addr); > + /* > + * mb() is not required here, because the spin_unlock_irqrestore() > + * is a barrier. > + */ > + > + spin_unlock_irqrestore(&hpdev->hbus->config_lock, flags); > + > + return ret; > +} > + > /** > * _hv_pcifront_write_config() - Internal PCI config write > * @hpdev: The PCI driver's representation of the device > @@ -1107,8 +1134,37 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) > * Since this function is called with IRQ locks held, can't > * do normal wait for completion; instead poll. > */ > - while (!try_wait_for_completion(&comp.comp_pkt.host_event)) > + while (!try_wait_for_completion(&comp.comp_pkt.host_event)) { > + /* 0xFFFF means an invalid PCI VENDOR ID. */ > + if (hv_pcifront_get_vendor_id(hpdev) == 0xFFFF) { > + dev_err_once(&hbus->hdev->device, > + "the device has gone\n"); > + goto free_int_desc; > + } > + > + /* > + * When the higher level interrupt code calls us with > + * interrupt disabled, we must poll the channel by calling > + * the channel callback directly when channel->target_cpu is > + * the current CPU. When the higher level interrupt code > + * calls us with interrupt enabled, let's add the > + * local_bh_disable()/enable() to avoid race. > + */ > + local_bh_disable(); > + > + if (hbus->hdev->channel->target_cpu == smp_processor_id()) > + hv_pci_onchannelcallback(hbus); > + > + local_bh_enable(); > + > + if (hpdev->state == hv_pcichild_ejecting) { > + dev_err_once(&hbus->hdev->device, > + "the device is being ejected\n"); > + goto free_int_desc; > + } > + > udelay(100); > + } > > if (comp.comp_pkt.completion_status < 0) { > dev_err(&hbus->hdev->device, > -- > 2.7.4
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> > Sent: Wednesday, March 7, 2018 04:35 > On Tue, Mar 06, 2018 at 06:21:56PM +0000, Dexuan Cui wrote: > > 1. With the patch "x86/vector/msi: Switch to global reservation mode" > > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU > > Hyper-V VM with SR-IOV. This is because when we reach > hv_compose_msi_msg() > > by request_irq() -> request_threaded_irq() -> __setup_irq()->irq_startup() > > -> __irq_startup() -> irq_domain_activate_irq() -> ... -> > > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is > > disabled in __setup_irq(). > > > > Fix this by polling the channel. > > > > 2. If the host is ejecting the VF device before we reach > > hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg() > > forever, because at this time the host doesn't respond to the > > CREATE_INTERRUPT request. This issue also happens to old kernels like > > v4.14, v4.13, etc. > > If you are fixing a problem you should report what commit you are fixing > with a Fixes: tag and add a CC: stable@vger.kernel.org to the commit log > to send it to stable kernels to which it should be applied; mentioning > kernel versions in the commit log is useless and should be omitted. Hi Lorenzo, Thanks for your comments! This patch does have a "Cc: stable@vger.kernel.org" in the sign-off area. :-) Here the patch is made to resolve 2 issues: #1 is triggered by the x86 global reservation mode (4900be8360) patch. 4900be8360 in itself is good. It's just that drivers/pci/host/pci-hyperv.c should be fixed. #2 is a longstanding issue since the first day the pci-hyperv driver was accepted into the kernel. So IMO actually we don't really need to add a Fixes: tag, which is usually used to specify a specific commit that introduces a bug that is being fixed. > Side note: you should not have stable@vger.kernel.org in the email > addresses CC list you are sending the patches to (you mark patches for > stable by adding an appropriate CC tag in the commit log). Sorry, I didn't know this, but actually I didn't add stable@vger.kernel.org manually. Instead I used "git send-email" to send this patchset, and it told me "The Cc list above has been expanded by additional addresses found in the patch commit message." I didn't find a way to disable this behavior of "git send-email" by checking its manual and googling it. This is strange. > Here: > > git.kernel.org/.../Documentation/process/stable-kernel-rules.rst > > Last but not least, most of the patches in this series do not justify > sending them to stable kernels at all so you should remove the > corresponding tag from the patches. I hope at least these 2 patches can go into the stable kernels: [PATCH v3 3/6] PCI: hv: serialize the present/eject work items [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg() Especially the second one, which fixes a real hang issue for UP virtual machines running v4.15 and newer. And, IMO the patches are small enough (<100 lines) , but definitely the maintainers make the final call. > > Thanks, > Lorenzo > > > Fix this by polling the channel for the PCI_EJECT message and > > hpdev->state, and by checking the PCI vendor ID. > > > > Note: actually the above issues also happen to a SMP VM, if > > "hbus->hdev->channel->target_cpu == smp_processor_id()" is true. > > > > Signed-off-by: Dexuan Cui <decui@microsoft.com> > > Tested-by: Adrian Suhov <v-adsuho@microsoft.com> > > Tested-by: Chris Valean <v-chvale@microsoft.com> > > Cc: stable@vger.kernel.org > > Cc: Stephen Hemminger <sthemmin@microsoft.com> > > Cc: K. Y. Srinivasan <kys@microsoft.com> > > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > Cc: Jack Morgenstein <jackm@mellanox.com> > > --- > > drivers/pci/host/pci-hyperv.c | 58 Thanks, -- Dexuan
> -----Original Message----- > From: Dexuan Cui > Sent: Tuesday, March 6, 2018 1:22 PM > To: bhelgaas@google.com; linux-pci@vger.kernel.org; KY Srinivasan > <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>; > olaf@aepfle.de; apw@canonical.com; jasowang@redhat.com > Cc: linux-kernel@vger.kernel.org; driverdev-devel@linuxdriverproject.org; > Haiyang Zhang <haiyangz@microsoft.com>; vkuznets@redhat.com; > marcelo.cerri@canonical.com; Michael Kelley (EOSG) > <Michael.H.Kelley@microsoft.com>; Dexuan Cui <decui@microsoft.com>; > stable@vger.kernel.org; Jack Morgenstein <jackm@mellanox.com> > Subject: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg() > > 1. With the patch "x86/vector/msi: Switch to global reservation mode" > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU > Hyper-V VM with SR-IOV. This is because when we reach > hv_compose_msi_msg() by request_irq() -> request_threaded_irq() -> > __setup_irq()->irq_startup() -> __irq_startup() -> irq_domain_activate_irq() - > > ... -> > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is disabled in > __setup_irq(). > > Fix this by polling the channel. > > 2. If the host is ejecting the VF device before we reach hv_compose_msi_msg(), > in a UP VM, we can hang in hv_compose_msi_msg() forever, because at this > time the host doesn't respond to the CREATE_INTERRUPT request. This issue > also happens to old kernels like v4.14, v4.13, etc. > > Fix this by polling the channel for the PCI_EJECT message and > hpdev->state, and by checking the PCI vendor ID. > > Note: actually the above issues also happen to a SMP VM, if "hbus->hdev- > >channel->target_cpu == smp_processor_id()" is true. > > Signed-off-by: Dexuan Cui <decui@microsoft.com> > Tested-by: Adrian Suhov <v-adsuho@microsoft.com> > Tested-by: Chris Valean <v-chvale@microsoft.com> > Cc: stable@vger.kernel.org > Cc: Stephen Hemminger <sthemmin@microsoft.com> > Cc: K. Y. Srinivasan <kys@microsoft.com> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > Cc: Jack Morgenstein <jackm@mellanox.com> > --- Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
> From: Dexuan Cui > Sent: Wednesday, March 7, 2018 13:40 > To: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> > Cc: bhelgaas@google.com; linux-pci@vger.kernel.org; KY Srinivasan > <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>; > olaf@aepfle.de; apw@canonical.com; jasowang@redhat.com; linux- > kernel@vger.kernel.org; driverdev-devel@linuxdriverproject.org; Haiyang > Zhang <haiyangz@microsoft.com>; vkuznets@redhat.com; > marcelo.cerri@canonical.com; Michael Kelley (EOSG) > <Michael.H.Kelley@microsoft.com>; stable@vger.kernel.org; Jack > Morgenstein <jackm@mellanox.com> > Subject: RE: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg() > > > From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> > > Sent: Wednesday, March 7, 2018 04:35 > > On Tue, Mar 06, 2018 at 06:21:56PM +0000, Dexuan Cui wrote: > > > 1. With the patch "x86/vector/msi: Switch to global reservation mode" > > > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU > > > Hyper-V VM with SR-IOV. This is because when we reach > > hv_compose_msi_msg() > > > by request_irq() -> request_threaded_irq() -> __setup_irq()->irq_startup() > > > -> __irq_startup() -> irq_domain_activate_irq() -> ... -> > > > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is > > > disabled in __setup_irq(). > > > > > > Fix this by polling the channel. > > > > > > 2. If the host is ejecting the VF device before we reach > > > hv_compose_msi_msg(), in a UP VM, we can hang in > hv_compose_msi_msg() > > > forever, because at this time the host doesn't respond to the > > > CREATE_INTERRUPT request. This issue also happens to old kernels like > > > v4.14, v4.13, etc. > > > > If you are fixing a problem you should report what commit you are fixing > > with a Fixes: tag and add a CC: stable@vger.kernel.org to the commit log > > to send it to stable kernels to which it should be applied; mentioning > > kernel versions in the commit log is useless and should be omitted. > > Hi Lorenzo, > Thanks for your comments! > This patch does have a "Cc: stable@vger.kernel.org" in the sign-off area. :-) > > Here the patch is made to resolve 2 issues: > #1 is triggered by the x86 global reservation mode (4900be8360) patch. > 4900be8360 in itself is good. It's just that drivers/pci/host/pci-hyperv.c > should be fixed. > > #2 is a longstanding issue since the first day the pci-hyperv driver was > accepted into the kernel. > > So IMO actually we don't really need to add a Fixes: tag, which is usually > used to specify a specific commit that introduces a bug that is being fixed. > > > Side note: you should not have stable@vger.kernel.org in the email > > addresses CC list you are sending the patches to (you mark patches for > > stable by adding an appropriate CC tag in the commit log). > > Sorry, I didn't know this, but actually I didn't add stable@vger.kernel.org > manually. Instead I used "git send-email" to send this patchset, and it told > me "The Cc list above has been expanded by additional addresses found > in the patch commit message." > > I didn't find a way to disable this behavior of "git send-email" by checking > its manual and googling it. This is strange. > > > Here: > > > > git.kernel.org/.../Documentation/process/stable-kernel-rules.rst > > > > Last but not least, most of the patches in this series do not justify > > sending them to stable kernels at all so you should remove the > > corresponding tag from the patches. > > I hope at least these 2 patches can go into the stable kernels: > [PATCH v3 3/6] PCI: hv: serialize the present/eject work items > [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg() > Especially the second one, which fixes a real hang issue for UP virtual > machines running v4.15 and newer. > And, IMO the patches are small enough (<100 lines) , but definitely > the maintainers make the final call. > > > > > Thanks, > > Lorenzo > > > > > Fix this by polling the channel for the PCI_EJECT message and > > > hpdev->state, and by checking the PCI vendor ID. > > > > > > Note: actually the above issues also happen to a SMP VM, if > > > "hbus->hdev->channel->target_cpu == smp_processor_id()" is true. > > > > > > Signed-off-by: Dexuan Cui <decui@microsoft.com> > > > Tested-by: Adrian Suhov <v-adsuho@microsoft.com> > > > Tested-by: Chris Valean <v-chvale@microsoft.com> > > > Cc: stable@vger.kernel.org > > > Cc: Stephen Hemminger <sthemmin@microsoft.com> > > > Cc: K. Y. Srinivasan <kys@microsoft.com> > > > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > > Cc: Jack Morgenstein <jackm@mellanox.com> > > > --- > > > drivers/pci/host/pci-hyperv.c | 58 > > > Thanks, > -- Dexuan Hi Lorenzo, Bjorn, and all, Do you need more ACKs? Currently Michael and Haiyang reviewed and ack'd the patchset. Should I send a v4 that just removes the "CC: stable@vger.kernel.org" tag for patches 1, 2, 4 and 5? I tend to avoid a v4 as I supppose it would be easier if you just remove the tags if you belive it's necessary (IMHO all the 6 paches are not big and it would be great if we can have all of them in the old stable kernels, but I respect your decision). Please let me know if I missed something when addressing the comments, and if I should send a v4. Thanks! -- Dexuan
On Tue, Mar 13, 2018 at 06:23:39PM +0000, Dexuan Cui wrote: > > From: Dexuan Cui > > Sent: Wednesday, March 7, 2018 13:40 > > To: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> > > Cc: bhelgaas@google.com; linux-pci@vger.kernel.org; KY Srinivasan > > <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>; > > olaf@aepfle.de; apw@canonical.com; jasowang@redhat.com; linux- > > kernel@vger.kernel.org; driverdev-devel@linuxdriverproject.org; Haiyang > > Zhang <haiyangz@microsoft.com>; vkuznets@redhat.com; > > marcelo.cerri@canonical.com; Michael Kelley (EOSG) > > <Michael.H.Kelley@microsoft.com>; stable@vger.kernel.org; Jack > > Morgenstein <jackm@mellanox.com> > > Subject: RE: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg() > > > > > From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> > > > Sent: Wednesday, March 7, 2018 04:35 > > > On Tue, Mar 06, 2018 at 06:21:56PM +0000, Dexuan Cui wrote: > > > > 1. With the patch "x86/vector/msi: Switch to global reservation mode" > > > > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU > > > > Hyper-V VM with SR-IOV. This is because when we reach > > > hv_compose_msi_msg() > > > > by request_irq() -> request_threaded_irq() -> __setup_irq()->irq_startup() > > > > -> __irq_startup() -> irq_domain_activate_irq() -> ... -> > > > > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is > > > > disabled in __setup_irq(). > > > > > > > > Fix this by polling the channel. > > > > > > > > 2. If the host is ejecting the VF device before we reach > > > > hv_compose_msi_msg(), in a UP VM, we can hang in > > hv_compose_msi_msg() > > > > forever, because at this time the host doesn't respond to the > > > > CREATE_INTERRUPT request. This issue also happens to old kernels like > > > > v4.14, v4.13, etc. > > > > > > If you are fixing a problem you should report what commit you are fixing > > > with a Fixes: tag and add a CC: stable@vger.kernel.org to the commit log > > > to send it to stable kernels to which it should be applied; mentioning > > > kernel versions in the commit log is useless and should be omitted. > > > > Hi Lorenzo, > > Thanks for your comments! > > This patch does have a "Cc: stable@vger.kernel.org" in the sign-off area. :-) > > > > Here the patch is made to resolve 2 issues: > > #1 is triggered by the x86 global reservation mode (4900be8360) patch. > > 4900be8360 in itself is good. It's just that drivers/pci/host/pci-hyperv.c > > should be fixed. > > > > #2 is a longstanding issue since the first day the pci-hyperv driver was > > accepted into the kernel. > > > > So IMO actually we don't really need to add a Fixes: tag, which is usually > > used to specify a specific commit that introduces a bug that is being fixed. > > > > > Side note: you should not have stable@vger.kernel.org in the email > > > addresses CC list you are sending the patches to (you mark patches for > > > stable by adding an appropriate CC tag in the commit log). > > > > Sorry, I didn't know this, but actually I didn't add stable@vger.kernel.org > > manually. Instead I used "git send-email" to send this patchset, and it told > > me "The Cc list above has been expanded by additional addresses found > > in the patch commit message." > > > > I didn't find a way to disable this behavior of "git send-email" by checking > > its manual and googling it. This is strange. > > > > > Here: > > > > > > git.kernel.org/.../Documentation/process/stable-kernel-rules.rst > > > > > > Last but not least, most of the patches in this series do not justify > > > sending them to stable kernels at all so you should remove the > > > corresponding tag from the patches. > > > > I hope at least these 2 patches can go into the stable kernels: > > [PATCH v3 3/6] PCI: hv: serialize the present/eject work items > > [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg() > > Especially the second one, which fixes a real hang issue for UP virtual > > machines running v4.15 and newer. > > And, IMO the patches are small enough (<100 lines) , but definitely > > the maintainers make the final call. > > > > > > > > Thanks, > > > Lorenzo > > > > > > > Fix this by polling the channel for the PCI_EJECT message and > > > > hpdev->state, and by checking the PCI vendor ID. > > > > > > > > Note: actually the above issues also happen to a SMP VM, if > > > > "hbus->hdev->channel->target_cpu == smp_processor_id()" is true. > > > > > > > > Signed-off-by: Dexuan Cui <decui@microsoft.com> > > > > Tested-by: Adrian Suhov <v-adsuho@microsoft.com> > > > > Tested-by: Chris Valean <v-chvale@microsoft.com> > > > > Cc: stable@vger.kernel.org > > > > Cc: Stephen Hemminger <sthemmin@microsoft.com> > > > > Cc: K. Y. Srinivasan <kys@microsoft.com> > > > > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > > > Cc: Jack Morgenstein <jackm@mellanox.com> > > > > --- > > > > drivers/pci/host/pci-hyperv.c | 58 > > > > > > Thanks, > > -- Dexuan > > Hi Lorenzo, Bjorn, and all, > Do you need more ACKs? Currently Michael and Haiyang reviewed and ack'd > the patchset. > > Should I send a v4 that just removes the "CC: stable@vger.kernel.org" tag > for patches 1, 2, 4 and 5? I tend to avoid a v4 as I supppose it would be > easier if you just remove the tags if you belive it's necessary (IMHO all the > 6 paches are not big and it would be great if we can have all of them in > the old stable kernels, but I respect your decision). > > Please let me know if I missed something when addressing the comments, > and if I should send a v4. I will have a look tomorrow, thank you. Lorenzo
On Tue, Mar 13, 2018 at 06:23:39PM +0000, Dexuan Cui wrote: [...] > Hi Lorenzo, Bjorn, and all, > Do you need more ACKs? Currently Michael and Haiyang reviewed and ack'd > the patchset. > > Should I send a v4 that just removes the "CC: stable@vger.kernel.org" tag > for patches 1, 2, 4 and 5? I tend to avoid a v4 as I supppose it would be > easier if you just remove the tags if you belive it's necessary (IMHO all the > 6 paches are not big and it would be great if we can have all of them in > the old stable kernels, but I respect your decision). I think you need a v4 since for patches that are actually fixing a bug I want a Fixes: tag added and I want them to be applicable independently of other patches in the series. Send them in a separate series if you prefer - I just want to make sure they apply independently. As for marking patches for stable kernels, I do not think it is right to send cosmetic churn to stable kernels anyway, at least that's my reading of the policy. You can easily post a v4 with patches reshuffled - I will apply them accordingly. Before re-posting please read this to improve formatting (I can do it for you but while at it you can do it yourself): https://marc.info/?l=linux-pci&m=150905742808166&w=2 Thanks, Lorenzo
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> > Sent: Wednesday, March 14, 2018 04:50 > On Tue, Mar 13, 2018 at 06:23:39PM +0000, Dexuan Cui wrote: > > [...] > > > Hi Lorenzo, Bjorn, and all, > > Do you need more ACKs? Currently Michael and Haiyang reviewed and ack'd > > the patchset. > > > > Should I send a v4 that just removes the "CC: stable@vger.kernel.org" tag > > for patches 1, 2, 4 and 5? I tend to avoid a v4 as I supppose it would be > > easier if you just remove the tags if you belive it's necessary (IMHO all the > > 6 paches are not big and it would be great if we can have all of them in > > the old stable kernels, but I respect your decision). > > I think you need a v4 since for patches that are actually fixing a bug I > want a Fixes: tag added and I want them to be applicable independently > of other patches in the series. Send them in a separate series if you > prefer - I just want to make sure they apply independently. Ok, I'll send 2 series: one for [6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg() [3/6] PCI: hv: serialize the present/eject work items These fix real issues reported by Mellanox when they tested SR-IOV with VMs runnning on Hyper-V. I'll add stable tags for them. The other series will cover these 4 patces which don't need to go in stable: [5/6] PCI: hv: hv_pci_devices_present(): only queue a new work when necessary [4/6] PCI: hv: remove hbus->enum_sem [2/6] PCI: hv: hv_eject_device_work(): remove the bogus test [1/6] PCI: hv: fix a comment typo in _hv_pcifront_read_config() > As for marking patches for stable kernels, I do not think it is right > to send cosmetic churn to stable kernels anyway, at least that's my > reading of the policy. > > You can easily post a v4 with patches reshuffled - I will apply them > accordingly. OK. > Before re-posting please read this to improve formatting (I can do it > for you but while at it you can do it yourself): > https://marc.info/?l=linux-pci&m=150905742808166&w=2 Will read. Thanks! -- Dexuan
diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c index 265ba11e53e2..50cdefe3f6d3 100644 --- a/drivers/pci/host/pci-hyperv.c +++ b/drivers/pci/host/pci-hyperv.c @@ -521,6 +521,8 @@ struct hv_pci_compl { s32 completion_status; }; +static void hv_pci_onchannelcallback(void *context); + /** * hv_pci_generic_compl() - Invoked for a completion packet * @context: Set up by the sender of the packet. @@ -665,6 +667,31 @@ static void _hv_pcifront_read_config(struct hv_pci_dev *hpdev, int where, } } +static u16 hv_pcifront_get_vendor_id(struct hv_pci_dev *hpdev) +{ + u16 ret; + unsigned long flags; + void __iomem *addr = hpdev->hbus->cfg_addr + CFG_PAGE_OFFSET + + PCI_VENDOR_ID; + + spin_lock_irqsave(&hpdev->hbus->config_lock, flags); + + /* Choose the function to be read. (See comment above) */ + writel(hpdev->desc.win_slot.slot, hpdev->hbus->cfg_addr); + /* Make sure the function was chosen before we start reading. */ + mb(); + /* Read from that function's config space. */ + ret = readw(addr); + /* + * mb() is not required here, because the spin_unlock_irqrestore() + * is a barrier. + */ + + spin_unlock_irqrestore(&hpdev->hbus->config_lock, flags); + + return ret; +} + /** * _hv_pcifront_write_config() - Internal PCI config write * @hpdev: The PCI driver's representation of the device @@ -1107,8 +1134,37 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) * Since this function is called with IRQ locks held, can't * do normal wait for completion; instead poll. */ - while (!try_wait_for_completion(&comp.comp_pkt.host_event)) + while (!try_wait_for_completion(&comp.comp_pkt.host_event)) { + /* 0xFFFF means an invalid PCI VENDOR ID. */ + if (hv_pcifront_get_vendor_id(hpdev) == 0xFFFF) { + dev_err_once(&hbus->hdev->device, + "the device has gone\n"); + goto free_int_desc; + } + + /* + * When the higher level interrupt code calls us with + * interrupt disabled, we must poll the channel by calling + * the channel callback directly when channel->target_cpu is + * the current CPU. When the higher level interrupt code + * calls us with interrupt enabled, let's add the + * local_bh_disable()/enable() to avoid race. + */ + local_bh_disable(); + + if (hbus->hdev->channel->target_cpu == smp_processor_id()) + hv_pci_onchannelcallback(hbus); + + local_bh_enable(); + + if (hpdev->state == hv_pcichild_ejecting) { + dev_err_once(&hbus->hdev->device, + "the device is being ejected\n"); + goto free_int_desc; + } + udelay(100); + } if (comp.comp_pkt.completion_status < 0) { dev_err(&hbus->hdev->device,