Message ID | 501BB4EF.7080909@jp.fujitsu.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Fri, Aug 03, 2012 at 08:24:31PM +0900, Takao Indoh wrote: > Hi all, > > This patch adds kernel parameter "reset_pcie_devices" which resets PCIe > devices at boot time to address DMA problem on kdump with iommu. When > this parameter is specified, a hot reset is triggered on each PCIe root > port and downstream port to reset its downstream endpoint. Hi Takao, Why not use existing "reset_devices" parameter instead of introducing a new one? Thanks Vivek > > Background: > A kdump problem about DMA has been discussed for a long time. That is, > when a kernel is switched to the kdump kernel DMA derived from first > kernel affects second kernel. Recently this problem surfaces when iommu > is used for PCI passthrough on KVM guest. In the case of the machine I > use, when intel_iommu=on is specified, DMAR error is detected in kdump > kernel and PCI SERR is also detected. Finally kdump fails because some > devices does not work correctly. > > The root cause is that ongoing DMA from first kernel causes DMAR fault > because page table of DMAR is initialized while kdump kernel is booting > up. Therefore to address this problem DMA needs to be stopped before DMAR > is initialized at kdump kernel boot time. By this patch, PCIe devices > are reset by hot reset and its DMA is stopped when reset_pcie_devices is > specified. One problem of this solution is that VGA is reset and the > monitor blacks out when the link between the port and VGA controller was > reset. So this patch does not reset the port whose child endpoint is VGA > device. > > Any comments would be appreciated. > > Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com> > --- > Documentation/kernel-parameters.txt | 4 + > drivers/pci/quirks.c | 59 ++++++++++++++++++++++++++ > 2 files changed, 63 insertions(+) > > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt > index e714a02..e694e9f 100644 > --- a/Documentation/kernel-parameters.txt > +++ b/Documentation/kernel-parameters.txt > @@ -2489,6 +2489,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. > reset_devices [KNL] Force drivers to reset the underlying device > during initialization. > > + reset_pcie_devices > + [PCIE] Reset PCIe endpoint at boot time by sending a > + hot reset to root port and downstream port > + > resume= [SWSUSP] > Specify the partition device for software suspend > Format: > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index 5155317..7f7fc02 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -32,6 +32,65 @@ > #include "pci.h" > > /* > + * Reset PCIe endpoint by sending a hot reset to root port and downstream port > + */ > +unsigned int reset_pcie_devices; > +EXPORT_SYMBOL(reset_pcie_devices); > +static int __init set_reset_pcie_devices(char *str) > +{ > + reset_pcie_devices = 1; > + return 1; > +} > +__setup("reset_pcie_devices", set_reset_pcie_devices); > + > +static void __devinit quirk_pcie_device_reset(struct pci_dev *dev) > +{ > + struct pci_bus *subordinate; > + struct pci_dev *child; > + u16 ctrl; > + > + if (!reset_pcie_devices || !pci_is_pcie(dev) || !dev->subordinate || > + ((dev->pcie_type != PCI_EXP_TYPE_ROOT_PORT) && > + (dev->pcie_type != PCI_EXP_TYPE_DOWNSTREAM))) > + return; > + > + subordinate = dev->subordinate; > + list_for_each_entry(child, &subordinate->devices, bus_list) { > + if ((child->pcie_type == PCI_EXP_TYPE_UPSTREAM) || > + (child->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE) || > + ((child->class >> 16) == PCI_BASE_CLASS_DISPLAY)) > + /* Don't reset switch, bridge, VGA device */ > + return; > + } > + > + dev_info(&dev->dev, "Reset Secondary bus\n"); > + > + list_for_each_entry(child, &subordinate->devices, bus_list) { > + dev_info(&child->dev, "save state\n"); > + pci_save_state(child); > + } > + > + /* Assert Secondary Bus Reset */ > + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl); > + ctrl |= PCI_BRIDGE_CTL_BUS_RESET; > + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl); > + > + msleep(2); > + > + /* De-assert Secondary Bus Reset */ > + ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET; > + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl); > + > + msleep(200); > + > + list_for_each_entry(child, &subordinate->devices, bus_list) { > + dev_info(&child->dev, "restore state\n"); > + pci_restore_state(child); > + } > +} > +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, quirk_pcie_device_reset); > + > +/* > * Decoding should be disabled for a PCI device during BAR sizing to avoid > * conflict. But doing so may cause problems on host bridge and perhaps other > * key system devices. For devices that need to have mmio decoding always-on, -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/03/2012 07:24 AM, Takao Indoh wrote: > Hi all, > > This patch adds kernel parameter "reset_pcie_devices" which resets PCIe > devices at boot time to address DMA problem on kdump with iommu. When > this parameter is specified, a hot reset is triggered on each PCIe root > port and downstream port to reset its downstream endpoint. > > Background: > A kdump problem about DMA has been discussed for a long time. That is, > when a kernel is switched to the kdump kernel DMA derived from first > kernel affects second kernel. Recently this problem surfaces when iommu > is used for PCI passthrough on KVM guest. In the case of the machine I > use, when intel_iommu=on is specified, DMAR error is detected in kdump > kernel and PCI SERR is also detected. Finally kdump fails because some > devices does not work correctly. > > The root cause is that ongoing DMA from first kernel causes DMAR fault > because page table of DMAR is initialized while kdump kernel is booting > up. Therefore to address this problem DMA needs to be stopped before DMAR > is initialized at kdump kernel boot time. By this patch, PCIe devices > are reset by hot reset and its DMA is stopped when reset_pcie_devices is > specified. One problem of this solution is that VGA is reset and the > monitor blacks out when the link between the port and VGA controller was > reset. So this patch does not reset the port whose child endpoint is VGA > device. > > Any comments would be appreciated. > > Signed-off-by: Takao Indoh<indou.takao@jp.fujitsu.com> > --- Have you considered something less disruptive such as clearing the Master Enable in each endpoint's PCI Command Register ? That should prevent DMA transactions from endpoints during the kdump and kexec, and when the driver for the device gets reconfigured, Master Enable will be set back on, but after the driver has had the opportunity to do a device-specific reset. - Don > Documentation/kernel-parameters.txt | 4 + > drivers/pci/quirks.c | 59 ++++++++++++++++++++++++++ > 2 files changed, 63 insertions(+) > > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt > index e714a02..e694e9f 100644 > --- a/Documentation/kernel-parameters.txt > +++ b/Documentation/kernel-parameters.txt > @@ -2489,6 +2489,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. > reset_devices [KNL] Force drivers to reset the underlying device > during initialization. > > + reset_pcie_devices > + [PCIE] Reset PCIe endpoint at boot time by sending a > + hot reset to root port and downstream port > + > resume= [SWSUSP] > Specify the partition device for software suspend > Format: > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index 5155317..7f7fc02 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -32,6 +32,65 @@ > #include "pci.h" > > /* > + * Reset PCIe endpoint by sending a hot reset to root port and downstream port > + */ > +unsigned int reset_pcie_devices; > +EXPORT_SYMBOL(reset_pcie_devices); > +static int __init set_reset_pcie_devices(char *str) > +{ > + reset_pcie_devices = 1; > + return 1; > +} > +__setup("reset_pcie_devices", set_reset_pcie_devices); > + > +static void __devinit quirk_pcie_device_reset(struct pci_dev *dev) > +{ > + struct pci_bus *subordinate; > + struct pci_dev *child; > + u16 ctrl; > + > + if (!reset_pcie_devices || !pci_is_pcie(dev) || !dev->subordinate || > + ((dev->pcie_type != PCI_EXP_TYPE_ROOT_PORT)&& > + (dev->pcie_type != PCI_EXP_TYPE_DOWNSTREAM))) > + return; > + > + subordinate = dev->subordinate; > + list_for_each_entry(child,&subordinate->devices, bus_list) { > + if ((child->pcie_type == PCI_EXP_TYPE_UPSTREAM) || > + (child->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE) || > + ((child->class>> 16) == PCI_BASE_CLASS_DISPLAY)) > + /* Don't reset switch, bridge, VGA device */ > + return; > + } > + > + dev_info(&dev->dev, "Reset Secondary bus\n"); > + > + list_for_each_entry(child,&subordinate->devices, bus_list) { > + dev_info(&child->dev, "save state\n"); > + pci_save_state(child); > + } > + > + /* Assert Secondary Bus Reset */ > + pci_read_config_word(dev, PCI_BRIDGE_CONTROL,&ctrl); > + ctrl |= PCI_BRIDGE_CTL_BUS_RESET; > + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl); > + > + msleep(2); > + > + /* De-assert Secondary Bus Reset */ > + ctrl&= ~PCI_BRIDGE_CTL_BUS_RESET; > + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl); > + > + msleep(200); > + > + list_for_each_entry(child,&subordinate->devices, bus_list) { > + dev_info(&child->dev, "restore state\n"); > + pci_restore_state(child); > + } > +} > +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, quirk_pcie_device_reset); > + > +/* > * Decoding should be disabled for a PCI device during BAR sizing to avoid > * conflict. But doing so may cause problems on host bridge and perhaps other > * key system devices. For devices that need to have mmio decoding always-on, > > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Vivek, (2012/08/03 20:46), Vivek Goyal wrote: > On Fri, Aug 03, 2012 at 08:24:31PM +0900, Takao Indoh wrote: >> Hi all, >> >> This patch adds kernel parameter "reset_pcie_devices" which resets PCIe >> devices at boot time to address DMA problem on kdump with iommu. When >> this parameter is specified, a hot reset is triggered on each PCIe root >> port and downstream port to reset its downstream endpoint. > > Hi Takao, > > Why not use existing "reset_devices" parameter instead of introducing > a new one? "reset_devices" is used for each driver to reset their own device, and this patch resets all devices forcibly, so I thought they were different things. Thanks, Takao Indoh > > Thanks > Vivek > >> >> Background: >> A kdump problem about DMA has been discussed for a long time. That is, >> when a kernel is switched to the kdump kernel DMA derived from first >> kernel affects second kernel. Recently this problem surfaces when iommu >> is used for PCI passthrough on KVM guest. In the case of the machine I >> use, when intel_iommu=on is specified, DMAR error is detected in kdump >> kernel and PCI SERR is also detected. Finally kdump fails because some >> devices does not work correctly. >> >> The root cause is that ongoing DMA from first kernel causes DMAR fault >> because page table of DMAR is initialized while kdump kernel is booting >> up. Therefore to address this problem DMA needs to be stopped before DMAR >> is initialized at kdump kernel boot time. By this patch, PCIe devices >> are reset by hot reset and its DMA is stopped when reset_pcie_devices is >> specified. One problem of this solution is that VGA is reset and the >> monitor blacks out when the link between the port and VGA controller was >> reset. So this patch does not reset the port whose child endpoint is VGA >> device. >> >> Any comments would be appreciated. >> >> Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com> >> --- >> Documentation/kernel-parameters.txt | 4 + >> drivers/pci/quirks.c | 59 ++++++++++++++++++++++++++ >> 2 files changed, 63 insertions(+) >> >> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt >> index e714a02..e694e9f 100644 >> --- a/Documentation/kernel-parameters.txt >> +++ b/Documentation/kernel-parameters.txt >> @@ -2489,6 +2489,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. >> reset_devices [KNL] Force drivers to reset the underlying device >> during initialization. >> >> + reset_pcie_devices >> + [PCIE] Reset PCIe endpoint at boot time by sending a >> + hot reset to root port and downstream port >> + >> resume= [SWSUSP] >> Specify the partition device for software suspend >> Format: >> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c >> index 5155317..7f7fc02 100644 >> --- a/drivers/pci/quirks.c >> +++ b/drivers/pci/quirks.c >> @@ -32,6 +32,65 @@ >> #include "pci.h" >> >> /* >> + * Reset PCIe endpoint by sending a hot reset to root port and downstream port >> + */ >> +unsigned int reset_pcie_devices; >> +EXPORT_SYMBOL(reset_pcie_devices); >> +static int __init set_reset_pcie_devices(char *str) >> +{ >> + reset_pcie_devices = 1; >> + return 1; >> +} >> +__setup("reset_pcie_devices", set_reset_pcie_devices); >> + >> +static void __devinit quirk_pcie_device_reset(struct pci_dev *dev) >> +{ >> + struct pci_bus *subordinate; >> + struct pci_dev *child; >> + u16 ctrl; >> + >> + if (!reset_pcie_devices || !pci_is_pcie(dev) || !dev->subordinate || >> + ((dev->pcie_type != PCI_EXP_TYPE_ROOT_PORT) && >> + (dev->pcie_type != PCI_EXP_TYPE_DOWNSTREAM))) >> + return; >> + >> + subordinate = dev->subordinate; >> + list_for_each_entry(child, &subordinate->devices, bus_list) { >> + if ((child->pcie_type == PCI_EXP_TYPE_UPSTREAM) || >> + (child->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE) || >> + ((child->class >> 16) == PCI_BASE_CLASS_DISPLAY)) >> + /* Don't reset switch, bridge, VGA device */ >> + return; >> + } >> + >> + dev_info(&dev->dev, "Reset Secondary bus\n"); >> + >> + list_for_each_entry(child, &subordinate->devices, bus_list) { >> + dev_info(&child->dev, "save state\n"); >> + pci_save_state(child); >> + } >> + >> + /* Assert Secondary Bus Reset */ >> + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl); >> + ctrl |= PCI_BRIDGE_CTL_BUS_RESET; >> + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl); >> + >> + msleep(2); >> + >> + /* De-assert Secondary Bus Reset */ >> + ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET; >> + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl); >> + >> + msleep(200); >> + >> + list_for_each_entry(child, &subordinate->devices, bus_list) { >> + dev_info(&child->dev, "restore state\n"); >> + pci_restore_state(child); >> + } >> +} >> +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, quirk_pcie_device_reset); >> + >> +/* >> * Decoding should be disabled for a PCI device during BAR sizing to avoid >> * conflict. But doing so may cause problems on host bridge and perhaps other >> * key system devices. For devices that need to have mmio decoding always-on, > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Aug 06, 2012 at 01:30:47PM +0900, Takao Indoh wrote: > Hi Vivek, > > (2012/08/03 20:46), Vivek Goyal wrote: > > On Fri, Aug 03, 2012 at 08:24:31PM +0900, Takao Indoh wrote: > >> Hi all, > >> > >> This patch adds kernel parameter "reset_pcie_devices" which resets PCIe > >> devices at boot time to address DMA problem on kdump with iommu. When > >> this parameter is specified, a hot reset is triggered on each PCIe root > >> port and downstream port to reset its downstream endpoint. > > > > Hi Takao, > > > > Why not use existing "reset_devices" parameter instead of introducing > > a new one? > > "reset_devices" is used for each driver to reset their own device, and > this patch resets all devices forcibly, so I thought they were different > things. Yes reset_devices currently is used for driver to reset its device. I thought one could very well extend its reach to reset pci express devices at bus level. Having them separate is not going to be much useful from kdump perspective. We will end up passing both reset_devices and reset_pcie_devices to second kernel whill lead to bus level reset as well as device level reset. Ideal situation would be that somehow detect that bus level reset has been done and skip device level reset (assuming bus level reset obviates the need of device level reset, please correct me if that's not the case). After pcie reset, can we store the state in a variable and drivers can use that variable to check if PCIe level reset was done or not. If yes, skip device level reset (Assuming driver knows that device is on a PCIe slot). In that case we will not have to introduce new kernel command line, and also avoid double reset? Thanks Vivek -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
(2012/08/07 5:39), Vivek Goyal wrote: > On Mon, Aug 06, 2012 at 01:30:47PM +0900, Takao Indoh wrote: >> Hi Vivek, >> >> (2012/08/03 20:46), Vivek Goyal wrote: >>> On Fri, Aug 03, 2012 at 08:24:31PM +0900, Takao Indoh wrote: >>>> Hi all, >>>> >>>> This patch adds kernel parameter "reset_pcie_devices" which resets PCIe >>>> devices at boot time to address DMA problem on kdump with iommu. When >>>> this parameter is specified, a hot reset is triggered on each PCIe root >>>> port and downstream port to reset its downstream endpoint. >>> >>> Hi Takao, >>> >>> Why not use existing "reset_devices" parameter instead of introducing >>> a new one? >> >> "reset_devices" is used for each driver to reset their own device, and >> this patch resets all devices forcibly, so I thought they were different >> things. > > Yes reset_devices currently is used for driver to reset its device. I > thought one could very well extend its reach to reset pci express devices > at bus level. > > Having them separate is not going to be much useful from kdump > perspective. We will end up passing both reset_devices and > reset_pcie_devices to second kernel whill lead to bus level reset as well > as device level reset. > > Ideal situation would be that somehow detect that bus level reset has been > done and skip device level reset (assuming bus level reset obviates the > need of device level reset, please correct me if that's not the case). > > After pcie reset, can we store the state in a variable and drivers can > use that variable to check if PCIe level reset was done or not. If yes, > skip device level reset (Assuming driver knows that device is on a > PCIe slot). > > In that case we will not have to introduce new kernel command line, and > also avoid double reset? Actually I'm not sure whether the driver does not need to do their reset after bus level reset, but I agree with you, now I'm thinking that using reset_devices is better rather than adding narrow one which is limited to PCI express, otherwise we may have to add new parameter every time when adding new reset method, such as reset_pcie_devices, reset_pci_legacy_devices, etc. Thanks, Takao Indoh -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
(2012/08/07 5:39), Vivek Goyal wrote: > On Mon, Aug 06, 2012 at 01:30:47PM +0900, Takao Indoh wrote: >> Hi Vivek, >> >> (2012/08/03 20:46), Vivek Goyal wrote: >>> On Fri, Aug 03, 2012 at 08:24:31PM +0900, Takao Indoh wrote: >>>> Hi all, >>>> >>>> This patch adds kernel parameter "reset_pcie_devices" which resets PCIe >>>> devices at boot time to address DMA problem on kdump with iommu. When >>>> this parameter is specified, a hot reset is triggered on each PCIe root >>>> port and downstream port to reset its downstream endpoint. >>> >>> Hi Takao, >>> >>> Why not use existing "reset_devices" parameter instead of introducing >>> a new one? >> >> "reset_devices" is used for each driver to reset their own device, and >> this patch resets all devices forcibly, so I thought they were different >> things. > > Yes reset_devices currently is used for driver to reset its device. I > thought one could very well extend its reach to reset pci express devices > at bus level. > > Having them separate is not going to be much useful from kdump > perspective. We will end up passing both reset_devices and > reset_pcie_devices to second kernel whill lead to bus level reset as well > as device level reset. > > Ideal situation would be that somehow detect that bus level reset has been > done and skip device level reset (assuming bus level reset obviates the > need of device level reset, please correct me if that's not the case). > > After pcie reset, can we store the state in a variable and drivers can > use that variable to check if PCIe level reset was done or not. If yes, > skip device level reset (Assuming driver knows that device is on a > PCIe slot). > > In that case we will not have to introduce new kernel command line, and > also avoid double reset? I found a problem when testing my patch on some machines. Originally there are two problems in kdump kernel when iommu is enabled; DMAR error and PCI SERR. I thought they are fixed by my patch, but I noticed that PCI SERR is still detected after applying the patch. It seems that something happens when Interrupt Remapping is initialized in kdump kernel. Therefore resetting devices has to be done before enable_IR() is called. I have three ideas for it. (i) Resetting devices in 1st kernel(panic kernel) We can reset devices before jumping into 2nd kernel. Of course it may be dangerous to scan pci device tree and call PCI functions in panic'd kernel. Beforehand we need to collect device information so that only minimal code could run on panic. (ii) Resetting devices in purgatory It seems to be be appropriate place to do this, but I'm not sure where I can save/restore PCI config when resetting devices in purgatory. (iii) Resetting devices in 2nd kernel(kdump kernel) Important point is to do reset before enable_IR() is called as I wrote above. I think I should add new function to do reset into arch/x86/pci/early.c and call it in setup_arch like early_dump_pci_devices() or early_quirks(). Any comments? Thanks, Takao Indoh -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: linux-pci-owner@vger.kernel.org > [mailto:linux-pci-owner@vger.kernel.org] On Behalf Of Takao Indoh > Sent: Wednesday, September 05, 2012 8:10 PM > To: vgoyal@redhat.com > Cc: kexec@lists.infradead.org; linux-kernel@vger.kernel.org; > linux-pci@vger.kernel.org; bhelgaas@google.com; hbabu@us.ibm.com; Ishii, > Hironobu/?? ??; martin.wilck@ts.fujitsu.com > Subject: Re: [RFC][PATCH] Reset PCIe devices to address DMA problem on kdump > with iommu > > (2012/08/07 5:39), Vivek Goyal wrote: > > On Mon, Aug 06, 2012 at 01:30:47PM +0900, Takao Indoh wrote: > >> Hi Vivek, > >> > >> (2012/08/03 20:46), Vivek Goyal wrote: > >>> On Fri, Aug 03, 2012 at 08:24:31PM +0900, Takao Indoh wrote: > >>>> Hi all, > >>>> > >>>> This patch adds kernel parameter "reset_pcie_devices" which resets > PCIe > >>>> devices at boot time to address DMA problem on kdump with iommu. When > >>>> this parameter is specified, a hot reset is triggered on each PCIe > root > >>>> port and downstream port to reset its downstream endpoint. > >>> > >>> Hi Takao, > >>> > >>> Why not use existing "reset_devices" parameter instead of introducing > >>> a new one? > >> > >> "reset_devices" is used for each driver to reset their own device, and > >> this patch resets all devices forcibly, so I thought they were different > >> things. > > > > Yes reset_devices currently is used for driver to reset its device. I > > thought one could very well extend its reach to reset pci express devices > > at bus level. > > > > Having them separate is not going to be much useful from kdump > > perspective. We will end up passing both reset_devices and > > reset_pcie_devices to second kernel whill lead to bus level reset as well > > as device level reset. > > > > Ideal situation would be that somehow detect that bus level reset has > been > > done and skip device level reset (assuming bus level reset obviates the > > need of device level reset, please correct me if that's not the case). > > > > After pcie reset, can we store the state in a variable and drivers can > > use that variable to check if PCIe level reset was done or not. If yes, > > skip device level reset (Assuming driver knows that device is on a > > PCIe slot). > > > > In that case we will not have to introduce new kernel command line, and > > also avoid double reset? > > I found a problem when testing my patch on some machines. > > Originally there are two problems in kdump kernel when iommu is enabled; > DMAR error and PCI SERR. I thought they are fixed by my patch, but I > noticed that PCI SERR is still detected after applying the patch. It > seems that something happens when Interrupt Remapping is initialized in > kdump kernel. I'm not sure, but I guess the PCI SERR might be caused as follows. - The 1st kernel enables interrupt remapping. The MSI(-X) address and data registers of PCI devices are programmed in remappable format. - At the beginning of 2nd kernel, interrupt remapping is still active. And then it is disabled by enable_IR() function for initialization. - PCI device generate an interrupt. At this moment, interrupt remapping is not enabled yet. On the other hand, MSI(-X) address and data of this interrupt is in remappable format because those are programmed by 1st kernel. I guess this might be a cause of PCI SERR. I guess clearing command register or disabling MSI before interrupt remapping initialization in 2nd kernel might solve the PCI SERR problem. Regards, Kenji Kaneshige > > Therefore resetting devices has to be done before enable_IR() is > called. I have three ideas for it. > > (i) Resetting devices in 1st kernel(panic kernel) > We can reset devices before jumping into 2nd kernel. Of course it may > be dangerous to scan pci device tree and call PCI functions in panic'd > kernel. Beforehand we need to collect device information so that only > minimal code could run on panic. > > (ii) Resetting devices in purgatory > It seems to be be appropriate place to do this, but I'm not sure > where I can save/restore PCI config when resetting devices in > purgatory. > > (iii) Resetting devices in 2nd kernel(kdump kernel) > Important point is to do reset before enable_IR() is called as I wrote > above. I think I should add new function to do reset into > arch/x86/pci/early.c and call it in setup_arch like > early_dump_pci_devices() or early_quirks(). > > Any comments? > > Thanks, > Takao Indoh > > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
(2012/09/10 11:34), Kaneshige, Kenji wrote: >> -----Original Message----- >> From: linux-pci-owner@vger.kernel.org >> [mailto:linux-pci-owner@vger.kernel.org] On Behalf Of Takao Indoh >> Sent: Wednesday, September 05, 2012 8:10 PM >> To: vgoyal@redhat.com >> Cc: kexec@lists.infradead.org; linux-kernel@vger.kernel.org; >> linux-pci@vger.kernel.org; bhelgaas@google.com; hbabu@us.ibm.com; Ishii, >> Hironobu/?? ??; martin.wilck@ts.fujitsu.com >> Subject: Re: [RFC][PATCH] Reset PCIe devices to address DMA problem on kdump >> with iommu >> >> (2012/08/07 5:39), Vivek Goyal wrote: >>> On Mon, Aug 06, 2012 at 01:30:47PM +0900, Takao Indoh wrote: >>>> Hi Vivek, >>>> >>>> (2012/08/03 20:46), Vivek Goyal wrote: >>>>> On Fri, Aug 03, 2012 at 08:24:31PM +0900, Takao Indoh wrote: >>>>>> Hi all, >>>>>> >>>>>> This patch adds kernel parameter "reset_pcie_devices" which resets >> PCIe >>>>>> devices at boot time to address DMA problem on kdump with iommu. When >>>>>> this parameter is specified, a hot reset is triggered on each PCIe >> root >>>>>> port and downstream port to reset its downstream endpoint. >>>>> >>>>> Hi Takao, >>>>> >>>>> Why not use existing "reset_devices" parameter instead of introducing >>>>> a new one? >>>> >>>> "reset_devices" is used for each driver to reset their own device, and >>>> this patch resets all devices forcibly, so I thought they were different >>>> things. >>> >>> Yes reset_devices currently is used for driver to reset its device. I >>> thought one could very well extend its reach to reset pci express devices >>> at bus level. >>> >>> Having them separate is not going to be much useful from kdump >>> perspective. We will end up passing both reset_devices and >>> reset_pcie_devices to second kernel whill lead to bus level reset as well >>> as device level reset. >>> >>> Ideal situation would be that somehow detect that bus level reset has >> been >>> done and skip device level reset (assuming bus level reset obviates the >>> need of device level reset, please correct me if that's not the case). >>> >>> After pcie reset, can we store the state in a variable and drivers can >>> use that variable to check if PCIe level reset was done or not. If yes, >>> skip device level reset (Assuming driver knows that device is on a >>> PCIe slot). >>> >>> In that case we will not have to introduce new kernel command line, and >>> also avoid double reset? >> >> I found a problem when testing my patch on some machines. >> >> Originally there are two problems in kdump kernel when iommu is enabled; >> DMAR error and PCI SERR. I thought they are fixed by my patch, but I >> noticed that PCI SERR is still detected after applying the patch. It >> seems that something happens when Interrupt Remapping is initialized in >> kdump kernel. > > I'm not sure, but I guess the PCI SERR might be caused as follows. > > - The 1st kernel enables interrupt remapping. The MSI(-X) address and > data registers of PCI devices are programmed in remappable format. > > - At the beginning of 2nd kernel, interrupt remapping is still active. > And then it is disabled by enable_IR() function for initialization. > > - PCI device generate an interrupt. At this moment, interrupt remapping > is not enabled yet. On the other hand, MSI(-X) address and data of this > interrupt is in remappable format because those are programmed by 1st > kernel. I guess this might be a cause of PCI SERR. > > I guess clearing command register or disabling MSI before interrupt > remapping initialization in 2nd kernel might solve the PCI SERR problem. Thank you for your comment. That makes sense. Though I think clearing bus master bit is enough, do you think I need to clear all command register bit not only bus master? Thanks, Takao Indoh > Regards, > Kenji Kaneshige > > >> >> Therefore resetting devices has to be done before enable_IR() is >> called. I have three ideas for it. >> >> (i) Resetting devices in 1st kernel(panic kernel) >> We can reset devices before jumping into 2nd kernel. Of course it may >> be dangerous to scan pci device tree and call PCI functions in panic'd >> kernel. Beforehand we need to collect device information so that only >> minimal code could run on panic. >> >> (ii) Resetting devices in purgatory >> It seems to be be appropriate place to do this, but I'm not sure >> where I can save/restore PCI config when resetting devices in >> purgatory. >> >> (iii) Resetting devices in 2nd kernel(kdump kernel) >> Important point is to do reset before enable_IR() is called as I wrote >> above. I think I should add new function to do reset into >> arch/x86/pci/early.c and call it in setup_arch like >> early_dump_pci_devices() or early_quirks(). >> >> Any comments? >> >> Thanks, >> Takao Indoh >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-pci" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 05, 2012 at 08:09:58PM +0900, Takao Indoh wrote: > (2012/08/07 5:39), Vivek Goyal wrote: > >On Mon, Aug 06, 2012 at 01:30:47PM +0900, Takao Indoh wrote: > >>Hi Vivek, > >> > >>(2012/08/03 20:46), Vivek Goyal wrote: > >>>On Fri, Aug 03, 2012 at 08:24:31PM +0900, Takao Indoh wrote: > >>>>Hi all, > >>>> > >>>>This patch adds kernel parameter "reset_pcie_devices" which resets PCIe > >>>>devices at boot time to address DMA problem on kdump with iommu. When > >>>>this parameter is specified, a hot reset is triggered on each PCIe root > >>>>port and downstream port to reset its downstream endpoint. > >>> > >>>Hi Takao, > >>> > >>>Why not use existing "reset_devices" parameter instead of introducing > >>>a new one? > >> > >>"reset_devices" is used for each driver to reset their own device, and > >>this patch resets all devices forcibly, so I thought they were different > >>things. > > > >Yes reset_devices currently is used for driver to reset its device. I > >thought one could very well extend its reach to reset pci express devices > >at bus level. > > > >Having them separate is not going to be much useful from kdump > >perspective. We will end up passing both reset_devices and > >reset_pcie_devices to second kernel whill lead to bus level reset as well > >as device level reset. > > > >Ideal situation would be that somehow detect that bus level reset has been > >done and skip device level reset (assuming bus level reset obviates the > >need of device level reset, please correct me if that's not the case). > > > >After pcie reset, can we store the state in a variable and drivers can > >use that variable to check if PCIe level reset was done or not. If yes, > >skip device level reset (Assuming driver knows that device is on a > >PCIe slot). > > > >In that case we will not have to introduce new kernel command line, and > >also avoid double reset? > I found a problem when testing my patch on some machines. > > Originally there are two problems in kdump kernel when iommu is enabled; > DMAR error and PCI SERR. I thought they are fixed by my patch, but I > noticed that PCI SERR is still detected after applying the patch. It > seems that something happens when Interrupt Remapping is initialized in > kdump kernel. > > Therefore resetting devices has to be done before enable_IR() is > called. I have three ideas for it. > > (i) Resetting devices in 1st kernel(panic kernel) > We can reset devices before jumping into 2nd kernel. Of course it may > be dangerous to scan pci device tree and call PCI functions in panic'd > kernel. Beforehand we need to collect device information so that only > minimal code could run on panic. > > (ii) Resetting devices in purgatory > It seems to be be appropriate place to do this, but I'm not sure > where I can save/restore PCI config when resetting devices in > purgatory. > > (iii) Resetting devices in 2nd kernel(kdump kernel) > Important point is to do reset before enable_IR() is called as I wrote > above. I think I should add new function to do reset into > arch/x86/pci/early.c and call it in setup_arch like > early_dump_pci_devices() or early_quirks(). I would not claim that I understand hte PCI SERR issue. But whatever resettings needs to happen, should happen early in second kernel. Doing it in first kernel is not a good idea as it is crashed kernel and we want to as little as possible. Doing it in purgatory is not a good idea either as purgatory does not konw anything about kernel as such. We don't want to bloat purgatory with reset code and embedding the device knowledge there. Keeping it in second kernel makes sense so that code remains with kernel and can be maintained there. Thanks Vivek -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
(2012/09/10 23:36), Vivek Goyal wrote: > On Wed, Sep 05, 2012 at 08:09:58PM +0900, Takao Indoh wrote: >> (2012/08/07 5:39), Vivek Goyal wrote: >>> On Mon, Aug 06, 2012 at 01:30:47PM +0900, Takao Indoh wrote: >>>> Hi Vivek, >>>> >>>> (2012/08/03 20:46), Vivek Goyal wrote: >>>>> On Fri, Aug 03, 2012 at 08:24:31PM +0900, Takao Indoh wrote: >>>>>> Hi all, >>>>>> >>>>>> This patch adds kernel parameter "reset_pcie_devices" which resets PCIe >>>>>> devices at boot time to address DMA problem on kdump with iommu. When >>>>>> this parameter is specified, a hot reset is triggered on each PCIe root >>>>>> port and downstream port to reset its downstream endpoint. >>>>> >>>>> Hi Takao, >>>>> >>>>> Why not use existing "reset_devices" parameter instead of introducing >>>>> a new one? >>>> >>>> "reset_devices" is used for each driver to reset their own device, and >>>> this patch resets all devices forcibly, so I thought they were different >>>> things. >>> >>> Yes reset_devices currently is used for driver to reset its device. I >>> thought one could very well extend its reach to reset pci express devices >>> at bus level. >>> >>> Having them separate is not going to be much useful from kdump >>> perspective. We will end up passing both reset_devices and >>> reset_pcie_devices to second kernel whill lead to bus level reset as well >>> as device level reset. >>> >>> Ideal situation would be that somehow detect that bus level reset has been >>> done and skip device level reset (assuming bus level reset obviates the >>> need of device level reset, please correct me if that's not the case). >>> >>> After pcie reset, can we store the state in a variable and drivers can >>> use that variable to check if PCIe level reset was done or not. If yes, >>> skip device level reset (Assuming driver knows that device is on a >>> PCIe slot). >>> >>> In that case we will not have to introduce new kernel command line, and >>> also avoid double reset? >> I found a problem when testing my patch on some machines. >> >> Originally there are two problems in kdump kernel when iommu is enabled; >> DMAR error and PCI SERR. I thought they are fixed by my patch, but I >> noticed that PCI SERR is still detected after applying the patch. It >> seems that something happens when Interrupt Remapping is initialized in >> kdump kernel. >> >> Therefore resetting devices has to be done before enable_IR() is >> called. I have three ideas for it. >> >> (i) Resetting devices in 1st kernel(panic kernel) >> We can reset devices before jumping into 2nd kernel. Of course it may >> be dangerous to scan pci device tree and call PCI functions in panic'd >> kernel. Beforehand we need to collect device information so that only >> minimal code could run on panic. >> >> (ii) Resetting devices in purgatory >> It seems to be be appropriate place to do this, but I'm not sure >> where I can save/restore PCI config when resetting devices in >> purgatory. >> >> (iii) Resetting devices in 2nd kernel(kdump kernel) >> Important point is to do reset before enable_IR() is called as I wrote >> above. I think I should add new function to do reset into >> arch/x86/pci/early.c and call it in setup_arch like >> early_dump_pci_devices() or early_quirks(). > > I would not claim that I understand hte PCI SERR issue. But whatever > resettings needs to happen, should happen early in second kernel. > > Doing it in first kernel is not a good idea as it is crashed kernel and > we want to as little as possible. > > Doing it in purgatory is not a good idea either as purgatory does not > konw anything about kernel as such. We don't want to bloat purgatory > with reset code and embedding the device knowledge there. > > Keeping it in second kernel makes sense so that code remains with kernel > and can be maintained there. I'll post new patch which clears bus master bit and resets devices in second kernel. As to the boot parameter to enable this function, you suggested using reset_devices. I found that on a certain platform resetting devices caused PCIe error due to a hardware bug. Therefore I think we need new parameter apart from reset_devices to disable this function on such a machine. Thanks, Takao Indoh -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Takao Indoh [mailto:indou.takao@jp.fujitsu.com] > Sent: Monday, September 10, 2012 3:35 PM > To: Kaneshige, Kenji/?? ?? > Cc: vgoyal@redhat.com; kexec@lists.infradead.org; > linux-kernel@vger.kernel.org; linux-pci@vger.kernel.org; > bhelgaas@google.com; hbabu@us.ibm.com; Ishii, Hironobu/?? ??; > martin.wilck@ts.fujitsu.com > Subject: Re: [RFC][PATCH] Reset PCIe devices to address DMA problem on kdump > with iommu > > (2012/09/10 11:34), Kaneshige, Kenji wrote: > >> -----Original Message----- > >> From: linux-pci-owner@vger.kernel.org > >> [mailto:linux-pci-owner@vger.kernel.org] On Behalf Of Takao Indoh > >> Sent: Wednesday, September 05, 2012 8:10 PM > >> To: vgoyal@redhat.com > >> Cc: kexec@lists.infradead.org; linux-kernel@vger.kernel.org; > >> linux-pci@vger.kernel.org; bhelgaas@google.com; hbabu@us.ibm.com; > Ishii, > >> Hironobu/?? ??; martin.wilck@ts.fujitsu.com > >> Subject: Re: [RFC][PATCH] Reset PCIe devices to address DMA problem on > kdump > >> with iommu > >> > >> (2012/08/07 5:39), Vivek Goyal wrote: > >>> On Mon, Aug 06, 2012 at 01:30:47PM +0900, Takao Indoh wrote: > >>>> Hi Vivek, > >>>> > >>>> (2012/08/03 20:46), Vivek Goyal wrote: > >>>>> On Fri, Aug 03, 2012 at 08:24:31PM +0900, Takao Indoh wrote: > >>>>>> Hi all, > >>>>>> > >>>>>> This patch adds kernel parameter "reset_pcie_devices" which resets > >> PCIe > >>>>>> devices at boot time to address DMA problem on kdump with iommu. > When > >>>>>> this parameter is specified, a hot reset is triggered on each PCIe > >> root > >>>>>> port and downstream port to reset its downstream endpoint. > >>>>> > >>>>> Hi Takao, > >>>>> > >>>>> Why not use existing "reset_devices" parameter instead of introducing > >>>>> a new one? > >>>> > >>>> "reset_devices" is used for each driver to reset their own device, > and > >>>> this patch resets all devices forcibly, so I thought they were different > >>>> things. > >>> > >>> Yes reset_devices currently is used for driver to reset its device. > I > >>> thought one could very well extend its reach to reset pci express devices > >>> at bus level. > >>> > >>> Having them separate is not going to be much useful from kdump > >>> perspective. We will end up passing both reset_devices and > >>> reset_pcie_devices to second kernel whill lead to bus level reset as > well > >>> as device level reset. > >>> > >>> Ideal situation would be that somehow detect that bus level reset has > >> been > >>> done and skip device level reset (assuming bus level reset obviates > the > >>> need of device level reset, please correct me if that's not the case). > >>> > >>> After pcie reset, can we store the state in a variable and drivers can > >>> use that variable to check if PCIe level reset was done or not. If yes, > >>> skip device level reset (Assuming driver knows that device is on a > >>> PCIe slot). > >>> > >>> In that case we will not have to introduce new kernel command line, > and > >>> also avoid double reset? > >> > >> I found a problem when testing my patch on some machines. > >> > >> Originally there are two problems in kdump kernel when iommu is enabled; > >> DMAR error and PCI SERR. I thought they are fixed by my patch, but I > >> noticed that PCI SERR is still detected after applying the patch. It > >> seems that something happens when Interrupt Remapping is initialized > in > >> kdump kernel. > > > > I'm not sure, but I guess the PCI SERR might be caused as follows. > > > > - The 1st kernel enables interrupt remapping. The MSI(-X) address and > > data registers of PCI devices are programmed in remappable format. > > > > - At the beginning of 2nd kernel, interrupt remapping is still active. > > And then it is disabled by enable_IR() function for initialization. > > > > - PCI device generate an interrupt. At this moment, interrupt remapping > > is not enabled yet. On the other hand, MSI(-X) address and data of > this > > interrupt is in remappable format because those are programmed by 1st > > kernel. I guess this might be a cause of PCI SERR. > > > > I guess clearing command register or disabling MSI before interrupt > > remapping initialization in 2nd kernel might solve the PCI SERR problem. > > Thank you for your comment. That makes sense. > > Though I think clearing bus master bit is enough, do you think I need to > clear > all command register bit not only bus master? I thought clearing bus master bit. But we might also need to set the disable INTx bit. It would not be needed if we can reset devices before interrupt remapping initialization. I guess resetting devices before interrupt remapping initialization would need complex code change and be difficult to implement. If so, clearing bus master bit would be one of the alternative solution against PCI SERR issue. Regards, Kenji Kaneshige > > Thanks, > Takao Indoh > > > > Regards, > > Kenji Kaneshige > > > > > >> > >> Therefore resetting devices has to be done before enable_IR() is > >> called. I have three ideas for it. > >> > >> (i) Resetting devices in 1st kernel(panic kernel) > >> We can reset devices before jumping into 2nd kernel. Of course it > may > >> be dangerous to scan pci device tree and call PCI functions in panic'd > >> kernel. Beforehand we need to collect device information so that only > >> minimal code could run on panic. > >> > >> (ii) Resetting devices in purgatory > >> It seems to be be appropriate place to do this, but I'm not sure > >> where I can save/restore PCI config when resetting devices in > >> purgatory. > >> > >> (iii) Resetting devices in 2nd kernel(kdump kernel) > >> Important point is to do reset before enable_IR() is called as I wrote > >> above. I think I should add new function to do reset into > >> arch/x86/pci/early.c and call it in setup_arch like > >> early_dump_pci_devices() or early_quirks(). > >> > >> Any comments? > >> > >> Thanks, > >> Takao Indoh > >> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-pci" > in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 11, 2012 at 07:32:35PM +0900, Takao Indoh wrote: [..] > I'll post new patch which clears bus master bit and resets devices in > second kernel. > > As to the boot parameter to enable this function, you suggested using > reset_devices. I found that on a certain platform resetting devices > caused PCIe error due to a hardware bug. Therefore I think we need > new parameter apart from reset_devices to disable this function on > such a machine. Can you explain a bit more how the error happens. I still don't think that because of a bug in a platform somewhere we should be introducing a separate command line parameter and not reuse the exisiting one. Also you have not explained what's the bug and how a new parameter will avoid the bug. Thanks Vivek -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
(2012/09/11 23:43), Vivek Goyal wrote: > On Tue, Sep 11, 2012 at 07:32:35PM +0900, Takao Indoh wrote: > > [..] >> I'll post new patch which clears bus master bit and resets devices in >> second kernel. >> >> As to the boot parameter to enable this function, you suggested using >> reset_devices. I found that on a certain platform resetting devices >> caused PCIe error due to a hardware bug. Therefore I think we need >> new parameter apart from reset_devices to disable this function on >> such a machine. > > Can you explain a bit more how the error happens. I still don't think > that because of a bug in a platform somewhere we should be introducing > a separate command line parameter and not reuse the exisiting one. Also > you have not explained what's the bug and how a new parameter will > avoid the bug. The bug I mentioned is that ACS Violation occurs at PCIe switch when reading PCI configuration after device reset. I got information that this violation is caused by PCIe switch bug. The machine becomes fatal status by this error. The reason why I try to introduce new parameter is that I want to avoid regression by this patch. Let's say this patch was included in kernel and its reset function was enabled by reset_devices as you said. AFAIK reset_devices is always needed for kdump, so it means that devices are always reset at kdump boot time. It causes a regression that system always becomes abnormal status when we run kdump on the machine which has a bug I mentioned. To avoid this regression, I want to separate reset_devices from this reset function. Or how about this? - if user specify reset_devices, devices are reset by this patch, as you said. - To avoid a regression I said, add new parameter like "pci=noreset". If this parameter is specified, the reset function I add is disabled and we can avoid regression. Thanks, Takao Indoh -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 12, 2012 at 06:00:55PM +0900, Takao Indoh wrote: > (2012/09/11 23:43), Vivek Goyal wrote: > >On Tue, Sep 11, 2012 at 07:32:35PM +0900, Takao Indoh wrote: > > > >[..] > >>I'll post new patch which clears bus master bit and resets devices in > >>second kernel. > >> > >>As to the boot parameter to enable this function, you suggested using > >>reset_devices. I found that on a certain platform resetting devices > >>caused PCIe error due to a hardware bug. Therefore I think we need > >>new parameter apart from reset_devices to disable this function on > >>such a machine. > > > >Can you explain a bit more how the error happens. I still don't think > >that because of a bug in a platform somewhere we should be introducing > >a separate command line parameter and not reuse the exisiting one. Also > >you have not explained what's the bug and how a new parameter will > >avoid the bug. > > The bug I mentioned is that ACS Violation occurs at PCIe switch when > reading PCI configuration after device reset. I got information that > this violation is caused by PCIe switch bug. The machine becomes fatal > status by this error. > > The reason why I try to introduce new parameter is that I want to avoid > regression by this patch. Let's say this patch was included in kernel > and its reset function was enabled by reset_devices as you said. AFAIK > reset_devices is always needed for kdump, so it means that devices are > always reset at kdump boot time. It causes a regression that system > always becomes abnormal status when we run kdump on the machine which has > a bug I mentioned. > > To avoid this regression, I want to separate reset_devices from this > reset function. Or how about this? > - if user specify reset_devices, devices are reset by this patch, as you > said. > - To avoid a regression I said, add new parameter like "pci=noreset". > If this parameter is specified, the reset function I add is disabled > and we can avoid regression. Can we identify that particular switch in code and not reset it in code. Introducing new paramenters to avoid bugs really feels odd. Also, what was the conclusion to avoid double reset. I am assuming that we don't want to do bus level reset as well as driver level reset based on reset_devices. Thanks Vivek -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> As to the boot parameter to enable this function, you suggested using > reset_devices. I found that on a certain platform resetting devices > caused PCIe error due to a hardware bug. Therefore I think we need > new parameter apart from reset_devices to disable this function on > such a machine. Wouldn't a DMI quirk be better for this? That way the second kernel internally would know not to do this. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
(2012/09/15 5:03), Konrad Rzeszutek Wilk wrote: >> As to the boot parameter to enable this function, you suggested using >> reset_devices. I found that on a certain platform resetting devices >> caused PCIe error due to a hardware bug. Therefore I think we need >> new parameter apart from reset_devices to disable this function on >> such a machine. > > Wouldn't a DMI quirk be better for this? That way the second kernel > internally would know not to do this. That sounds good. I'll try it, thanks. Thanks, Takao Indoh -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 09/14/2012 04:03 PM, Konrad Rzeszutek Wilk wrote: >> As to the boot parameter to enable this function, you suggested using >> reset_devices. I found that on a certain platform resetting devices >> caused PCIe error due to a hardware bug. Therefore I think we need >> new parameter apart from reset_devices to disable this function on >> such a machine. > > Wouldn't a DMI quirk be better for this? That way the second kernel > internally would know not to do this. > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html a DMI quirk for a PCIe switch? pci-quirk maybe? there is a hook in per-device pci reset code path, which could be setup for such a case. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
(2012/09/22 2:57), Don Dutile wrote: > On 09/14/2012 04:03 PM, Konrad Rzeszutek Wilk wrote: >>> As to the boot parameter to enable this function, you suggested using >>> reset_devices. I found that on a certain platform resetting devices >>> caused PCIe error due to a hardware bug. Therefore I think we need >>> new parameter apart from reset_devices to disable this function on >>> such a machine. >> >> Wouldn't a DMI quirk be better for this? That way the second kernel >> internally would know not to do this. >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-pci" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > a DMI quirk for a PCIe switch? pci-quirk maybe? there is a hook > in per-device pci reset code path, which could be setup for such > a case. Yep, I think early-quirk is better. Thanks, Takao Indoh -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
(2012/09/15 0:48), Vivek Goyal wrote: > On Wed, Sep 12, 2012 at 06:00:55PM +0900, Takao Indoh wrote: >> (2012/09/11 23:43), Vivek Goyal wrote: >>> On Tue, Sep 11, 2012 at 07:32:35PM +0900, Takao Indoh wrote: >>> >>> [..] >>>> I'll post new patch which clears bus master bit and resets devices in >>>> second kernel. >>>> >>>> As to the boot parameter to enable this function, you suggested using >>>> reset_devices. I found that on a certain platform resetting devices >>>> caused PCIe error due to a hardware bug. Therefore I think we need >>>> new parameter apart from reset_devices to disable this function on >>>> such a machine. >>> >>> Can you explain a bit more how the error happens. I still don't think >>> that because of a bug in a platform somewhere we should be introducing >>> a separate command line parameter and not reuse the exisiting one. Also >>> you have not explained what's the bug and how a new parameter will >>> avoid the bug. >> >> The bug I mentioned is that ACS Violation occurs at PCIe switch when >> reading PCI configuration after device reset. I got information that >> this violation is caused by PCIe switch bug. The machine becomes fatal >> status by this error. >> >> The reason why I try to introduce new parameter is that I want to avoid >> regression by this patch. Let's say this patch was included in kernel >> and its reset function was enabled by reset_devices as you said. AFAIK >> reset_devices is always needed for kdump, so it means that devices are >> always reset at kdump boot time. It causes a regression that system >> always becomes abnormal status when we run kdump on the machine which has >> a bug I mentioned. >> >> To avoid this regression, I want to separate reset_devices from this >> reset function. Or how about this? >> - if user specify reset_devices, devices are reset by this patch, as you >> said. >> - To avoid a regression I said, add new parameter like "pci=noreset". >> If this parameter is specified, the reset function I add is disabled >> and we can avoid regression. > > Can we identify that particular switch in code and not reset it in code. > Introducing new paramenters to avoid bugs really feels odd. Maybe we can do it using PCI quirk or DMI quirk as Konrad and Don said. But I'm still thinking whether I can add boot parameter or something to disable reset function so that we can use it as workaround untill we add quirk when we find such a buggy hardware. > Also, what was the conclusion to avoid double reset. I am assuming that > we don't want to do bus level reset as well as driver level reset based > on reset_devices. I don't have a good idea to do it yet. Maybe we need to write bus:slot:func information to somewhere when we reset device. Thanks, Takao Indoh -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index e714a02..e694e9f 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2489,6 +2489,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted. reset_devices [KNL] Force drivers to reset the underlying device during initialization. + reset_pcie_devices + [PCIE] Reset PCIe endpoint at boot time by sending a + hot reset to root port and downstream port + resume= [SWSUSP] Specify the partition device for software suspend Format: diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 5155317..7f7fc02 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -32,6 +32,65 @@ #include "pci.h" /* + * Reset PCIe endpoint by sending a hot reset to root port and downstream port + */ +unsigned int reset_pcie_devices; +EXPORT_SYMBOL(reset_pcie_devices); +static int __init set_reset_pcie_devices(char *str) +{ + reset_pcie_devices = 1; + return 1; +} +__setup("reset_pcie_devices", set_reset_pcie_devices); + +static void __devinit quirk_pcie_device_reset(struct pci_dev *dev) +{ + struct pci_bus *subordinate; + struct pci_dev *child; + u16 ctrl; + + if (!reset_pcie_devices || !pci_is_pcie(dev) || !dev->subordinate || + ((dev->pcie_type != PCI_EXP_TYPE_ROOT_PORT) && + (dev->pcie_type != PCI_EXP_TYPE_DOWNSTREAM))) + return; + + subordinate = dev->subordinate; + list_for_each_entry(child, &subordinate->devices, bus_list) { + if ((child->pcie_type == PCI_EXP_TYPE_UPSTREAM) || + (child->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE) || + ((child->class >> 16) == PCI_BASE_CLASS_DISPLAY)) + /* Don't reset switch, bridge, VGA device */ + return; + } + + dev_info(&dev->dev, "Reset Secondary bus\n"); + + list_for_each_entry(child, &subordinate->devices, bus_list) { + dev_info(&child->dev, "save state\n"); + pci_save_state(child); + } + + /* Assert Secondary Bus Reset */ + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl); + ctrl |= PCI_BRIDGE_CTL_BUS_RESET; + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl); + + msleep(2); + + /* De-assert Secondary Bus Reset */ + ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET; + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl); + + msleep(200); + + list_for_each_entry(child, &subordinate->devices, bus_list) { + dev_info(&child->dev, "restore state\n"); + pci_restore_state(child); + } +} +DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, quirk_pcie_device_reset); + +/* * Decoding should be disabled for a PCI device during BAR sizing to avoid * conflict. But doing so may cause problems on host bridge and perhaps other * key system devices. For devices that need to have mmio decoding always-on,
Hi all, This patch adds kernel parameter "reset_pcie_devices" which resets PCIe devices at boot time to address DMA problem on kdump with iommu. When this parameter is specified, a hot reset is triggered on each PCIe root port and downstream port to reset its downstream endpoint. Background: A kdump problem about DMA has been discussed for a long time. That is, when a kernel is switched to the kdump kernel DMA derived from first kernel affects second kernel. Recently this problem surfaces when iommu is used for PCI passthrough on KVM guest. In the case of the machine I use, when intel_iommu=on is specified, DMAR error is detected in kdump kernel and PCI SERR is also detected. Finally kdump fails because some devices does not work correctly. The root cause is that ongoing DMA from first kernel causes DMAR fault because page table of DMAR is initialized while kdump kernel is booting up. Therefore to address this problem DMA needs to be stopped before DMAR is initialized at kdump kernel boot time. By this patch, PCIe devices are reset by hot reset and its DMA is stopped when reset_pcie_devices is specified. One problem of this solution is that VGA is reset and the monitor blacks out when the link between the port and VGA controller was reset. So this patch does not reset the port whose child endpoint is VGA device. Any comments would be appreciated. Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com> --- Documentation/kernel-parameters.txt | 4 + drivers/pci/quirks.c | 59 ++++++++++++++++++++++++++ 2 files changed, 63 insertions(+) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html