Message ID | 34355440-197d-44c7-b27d-535267d1161c@codeaurora.org (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Friday 30 September 2016, Sinan Kaya wrote: > On 9/29/2016 2:00 PM, Ondrej Zary wrote: > >> The previous two patches were in the right direction. > >> > >> > Can we also get the same output from 4.6 kernel with the attached > >> > patch for the same machine you sent these? > > > > Here it is. > > > >> > Something about SCI still doesn't feel right. > >> > > >> > The IRQ assignment fails if the penalty is greater than > >> > PIRQ_PENALTY_ISA_ALWAYS. This will happen if BIOS tells us to use an > >> > IRQ and same IRQ is in use by the SCI. > > Thanks, I reverted penalize_sci function and dropped patch #1. Can you try > this again? It seems to work, at least on one machine.
On 2016-09-30 02:44, Ondrej Zary wrote: > On Friday 30 September 2016, Sinan Kaya wrote: >> On 9/29/2016 2:00 PM, Ondrej Zary wrote: >> >> The previous two patches were in the right direction. >> >> >> >> > Can we also get the same output from 4.6 kernel with the attached >> >> > patch for the same machine you sent these? >> > >> > Here it is. >> > >> >> > Something about SCI still doesn't feel right. >> >> > >> >> > The IRQ assignment fails if the penalty is greater than >> >> > PIRQ_PENALTY_ISA_ALWAYS. This will happen if BIOS tells us to use an >> >> > IRQ and same IRQ is in use by the SCI. >> >> Thanks, I reverted penalize_sci function and dropped patch #1. Can you >> try >> this again? > > It seems to work, at least on one machine. Ok, that comfirms my suspicion. We are having trouble detecting sci interrupt type and we end up penalizing the wrong value. Can you try your other machines too? I need to do some research now. -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday 30 September 2016 15:14:42 okaya@codeaurora.org wrote: > On 2016-09-30 02:44, Ondrej Zary wrote: > > On Friday 30 September 2016, Sinan Kaya wrote: > >> On 9/29/2016 2:00 PM, Ondrej Zary wrote: > >> >> The previous two patches were in the right direction. > >> >> > >> >> > Can we also get the same output from 4.6 kernel with the attached > >> >> > patch for the same machine you sent these? > >> > > >> > Here it is. > >> > > >> >> > Something about SCI still doesn't feel right. > >> >> > > >> >> > The IRQ assignment fails if the penalty is greater than > >> >> > PIRQ_PENALTY_ISA_ALWAYS. This will happen if BIOS tells us to use > >> >> > an IRQ and same IRQ is in use by the SCI. > >> > >> Thanks, I reverted penalize_sci function and dropped patch #1. Can you > >> try > >> this again? > > > > It seems to work, at least on one machine. > > Ok, that comfirms my suspicion. We are having trouble detecting sci > interrupt type and we end up penalizing the wrong value. > > Can you try your other machines too? Works on the 2nd one too. > I need to do some research now.
Rafael, Bjorn; On 9/30/2016 11:56 AM, Ondrej Zary wrote: >>> It seems to work, at least on one machine. >> > >> > Ok, that comfirms my suspicion. We are having trouble detecting sci >> > interrupt type and we end up penalizing the wrong value. >> > >> > Can you try your other machines too? > Works on the 2nd one too. > >> > I need to do some research now. Thanks for the confirmation. I need to gather some opinion from Rafael and Bjorn. The patch provided only handles SCI in the ISA IRQ range (<16) I looked at the ACPI spec. SCI interrupt is a two byte field: "SCI_INT byte 2 offset 46 System vector the SCI interrupt is wired to in 8259 mode. On systems that do not contain the 8259, this field contains the Global System interrupt number of the SCI interrupt. OSPM is required to treat the ACPI SCI interrupt as a sharable, level, active low interrupt." Since we have a race condition between the time the IRQ is registered via the ISA API and the time it gets to the interrupt driver since irq_get_type function is not matching what SCI API indicated. how do we feel about increasing the ISA IRQ range to 256 so that we are safe for all SCI interrupts? Do you have any other recommendation?
On Fri, Sep 30, 2016 at 9:30 PM, Sinan Kaya <okaya@codeaurora.org> wrote: > Rafael, Bjorn; > > On 9/30/2016 11:56 AM, Ondrej Zary wrote: >>>> It seems to work, at least on one machine. >>> > >>> > Ok, that comfirms my suspicion. We are having trouble detecting sci >>> > interrupt type and we end up penalizing the wrong value. >>> > >>> > Can you try your other machines too? >> Works on the 2nd one too. >> >>> > I need to do some research now. > > Thanks for the confirmation. I need to gather some opinion from Rafael and Bjorn. > > The patch provided only handles SCI in the ISA IRQ range (<16) > > I looked at the ACPI spec. SCI interrupt is a two byte field: > > "SCI_INT byte 2 offset 46 > > System vector the SCI interrupt is wired to in 8259 mode. On > systems that do not contain the 8259, this field contains the > Global System interrupt number of the SCI interrupt. OSPM is > required to treat the ACPI SCI interrupt as a sharable, level, > active low interrupt." > > Since we have a race condition between the time the IRQ is registered > via the ISA API and the time it gets to the interrupt driver > since irq_get_type function is not matching what SCI API indicated. > > how do we feel about increasing the ISA IRQ range to 256 so that > we are safe for all SCI interrupts? I'm not sure how this is related to the problem at hand. Care to elaborate? And why exactly was that race condition not present before your changes? Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 9/30/2016 3:39 PM, Rafael J. Wysocki wrote: >> how do we feel about increasing the ISA IRQ range to 256 so that >> > we are safe for all SCI interrupts? > I'm not sure how this is related to the problem at hand. Care to elaborate? > Sure, let me explain. The code maintains a static array per IRQ number (acpi_irq_penalty) to assign penalties for each IRQ so that during the Link Object allocation, the code tries to choose an IRQ with less penalty that has not been used by another PCI/IRQ if possible. In the kernels prior to my change, the type of the SCI interrupt is given to the ACPI PCI Link driver via this function. +void acpi_penalize_sci_irq(int irq, int trigger, int polarity) +{ + if (irq >= 0 && irq < ARRAY_SIZE(acpi_irq_penalty)) { + if (trigger != ACPI_MADT_TRIGGER_LEVEL || + polarity != ACPI_MADT_POLARITY_ACTIVE_LOW) + acpi_irq_penalty[irq] += PIRQ_PENALTY_ISA_ALWAYS; + else + acpi_irq_penalty[irq] += PIRQ_PENALTY_PCI_USING; + } +} This function says given the trigger and polarity, if it is ACTIVE_LOW and LEVEL then increment the penalty by PCI_USING. It also does some ACPI parameter checking and assigns ISA_ALWAYS if the parameters are not compatible so that this IRQ is never used for PCI. ISA_AKWAYS is the maximum penalty that can be assigned. > And why exactly was that race condition not present before your changes? The size of acpi_irq_penalty array used to be 256. This number puts an artificial limit on the IRQ numbers that can be assigned to PCI. ACPI spec does not put any limits on the PCI numbers and extended IRQ resources can be used for this purpose. During my patch, three things happened: 1. acpi_irq_penalty got renamed to acpi_isa_irq_penalty and the array size was reduced to 16 from 256. 2. Since we have no idea about the range of PCI interrupts that can be assigned through ACPI, we tried to refactor the code so that PCI interrupts are calculated by the time API is called by walking the PCI Link list here. static int acpi_irq_pci_sharing_penalty(int irq) We don't have to preallocate a fixed size array this way and can support any PCI numbers that we want. While refactoring the code, we said we could take the next step and get rid of the acpi_penalize_sci_irq function and calculate the penalty dynamically similar to acpi_irq_pci_sharing_penalty function. The new way to determine if SCI interrupt parameters are compatible is as follows: /* * Penalize IRQ used by ACPI SCI. If ACPI SCI pin attributes conflict * with PCI IRQ attributes, mark ACPI SCI as ISA_ALWAYS so it won't be * use for PCI IRQs. */ if (irq == acpi_gbl_FADT.sci_interrupt) { u32 type = irq_get_trigger_type(irq) & IRQ_TYPE_SENSE_MASK; if (type != IRQ_TYPE_LEVEL_LOW) penalty += PIRQ_PENALTY_ISA_ALWAYS; else penalty += PIRQ_PENALTY_PCI_USING; } This relies on irq_get_trigger_type function to return the correct type. I now remember that Bjorn indicated the race condition possibility in this thread here. https://lkml.org/lkml/2016/3/8/640 > > > This looks racy, because we test irq_get_trigger_type() without any > > > kind of locking, and later acpi_register_gsi() calls > > > irq_create_fwspec_mapping(), which looks like it sets the new trigger > > > type. But I don't know how to fix that. > > > > Right, if that pci link allocation code can be executed concurrent, then you > > might end up with problem, but isn't that a problem even without > > irq_get_trigger_type()? > Yes. It's not a new problem, I just noticed it since we're thinking > more about the details of what's happening here. and this is exactly what we have hit in this thread. The value irq_get_trigger_type(irq) returns does not match what was given with acpi_penalize_sci_irq and we end up penalizing the PCI IRQ for no reason. Since the issue is specific to SCI interrupts and SCI interrupts cannot be greater than 256, my proposal is 1. restore the code for the SCI penalty like in my last email 2. increase the array size back to 256 3. use the acpi_irq_pci_sharing_penalty only if IRQ number is greater than 256 and we know that IRQ numbers are not bounded and can go all the way up to 65535. I hope it makes sense now. I tend to skip details sometimes. Feel free to send more questions.
On Fri, Sep 30, 2016 at 10:24 PM, Sinan Kaya <okaya@codeaurora.org> wrote: > On 9/30/2016 3:39 PM, Rafael J. Wysocki wrote: >>> how do we feel about increasing the ISA IRQ range to 256 so that >>> > we are safe for all SCI interrupts? >> I'm not sure how this is related to the problem at hand. Care to elaborate? >> > > Sure, let me explain. > [cut] > > I hope it makes sense now. I tend to skip details sometimes. Feel free to > send more questions. Thanks for the information! IIUC, basically, what you are proposing would be to restore the old penalizing method for IRQs in the 0-255 range and use the new approach for the rest, right? What's the drawback, if any? Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 9/30/2016 5:04 PM, Rafael J. Wysocki wrote: >> > >> > I hope it makes sense now. I tend to skip details sometimes. Feel free to >> > send more questions. > Thanks for the information! > > IIUC, basically, what you are proposing would be to restore the old > penalizing method for IRQs in the 0-255 range and use the new approach > for the rest, right? Correct. > > What's the drawback, if any? I don't see any drawback to be honest. The reason why I got rid of these ISA API functions was to remove some of the x86ism from ACPI code. This was Bjorn's request to clean it up and we failed in two places so far. 1. acpi_irq_penalty_init 2. acpi_penalize_sci_irq and we ended up reverting both of these changes. The reverts are all because of the fact that these APIs are called asynchronously without any coordination with the PCI Link object or the interrupt controller driver about when it is a good time to be called. During this debug, I learnt that acpi_penalize_isa_irq gets called before even ACPI gets to initialize and relies on static IRQ array for keeping the penalties. > > Thanks, > Rafael >
On Fri, Sep 30, 2016 at 11:14 PM, Sinan Kaya <okaya@codeaurora.org> wrote: > On 9/30/2016 5:04 PM, Rafael J. Wysocki wrote: >>> > >>> > I hope it makes sense now. I tend to skip details sometimes. Feel free to >>> > send more questions. >> Thanks for the information! >> >> IIUC, basically, what you are proposing would be to restore the old >> penalizing method for IRQs in the 0-255 range and use the new approach >> for the rest, right? > > Correct. > >> >> What's the drawback, if any? > > I don't see any drawback to be honest. I'd go for it then, if Bjorn doesn't hate it. > The reason why I got rid of these ISA > API functions was to remove some of the x86ism from ACPI code. This was Bjorn's > request to clean it up and we failed in two places so far. > > 1. acpi_irq_penalty_init > 2. acpi_penalize_sci_irq > > and we ended up reverting both of these changes. The reverts are all because of the > fact that these APIs are called asynchronously without any coordination with the > PCI Link object or the interrupt controller driver about when it is a good time to be > called. > > During this debug, I learnt that acpi_penalize_isa_irq gets called before even ACPI > gets to initialize and relies on static IRQ array for keeping the penalties. So maybe add some comments to that code to explain why things are arranged the way they are. We may need/want to revisit it at one point and such comments will be very useful then. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 9/30/2016 5:27 PM, Rafael J. Wysocki wrote: > On Fri, Sep 30, 2016 at 11:14 PM, Sinan Kaya <okaya@codeaurora.org> wrote: >> On 9/30/2016 5:04 PM, Rafael J. Wysocki wrote: >>>>> >>>>> I hope it makes sense now. I tend to skip details sometimes. Feel free to >>>>> send more questions. >>> Thanks for the information! >>> >>> IIUC, basically, what you are proposing would be to restore the old >>> penalizing method for IRQs in the 0-255 range and use the new approach >>> for the rest, right? >> >> Correct. >> >>> >>> What's the drawback, if any? >> >> I don't see any drawback to be honest. > > I'd go for it then, if Bjorn doesn't hate it. OK. I'll prep something. Bjorn? > >> The reason why I got rid of these ISA >> API functions was to remove some of the x86ism from ACPI code. This was Bjorn's >> request to clean it up and we failed in two places so far. >> >> 1. acpi_irq_penalty_init >> 2. acpi_penalize_sci_irq >> >> and we ended up reverting both of these changes. The reverts are all because of the >> fact that these APIs are called asynchronously without any coordination with the >> PCI Link object or the interrupt controller driver about when it is a good time to be >> called. >> >> During this debug, I learnt that acpi_penalize_isa_irq gets called before even ACPI >> gets to initialize and relies on static IRQ array for keeping the penalties. > > So maybe add some comments to that code to explain why things are > arranged the way they are. We may need/want to revisit it at one > point and such comments will be very useful then. Yeah, I think this code definitely needs some documentation. It ended up being one of the most fragile places in the kernel. This is my 3rd attempt to rearrange the code. > > Thanks, > Rafael > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
On 9/30/2016 5:27 PM, Rafael J. Wysocki wrote: >>> >> What's the drawback, if any? >> > >> > I don't see any drawback to be honest. > I'd go for it then, if Bjorn doesn't hate it. > I posted a follow up patch a minute ago. [PATCH 1/3] Revert "ACPI,PCI,IRQ: reduce static IRQ array size to 16" [PATCH 2/3] ACPI, PCI IRQ: add PCI_USING penalty for ISA interrupts [PATCH 3/3] Revert "ACPI,PCI,IRQ: remove SCI penalize function" Can we have some testing coverage? and eventually have tested-by?
On Saturday 01 October 2016 19:49:17 Sinan Kaya wrote: > On 9/30/2016 5:27 PM, Rafael J. Wysocki wrote: > >>> >> What's the drawback, if any? > >> > > >> > I don't see any drawback to be honest. > > > > I'd go for it then, if Bjorn doesn't hate it. > > I posted a follow up patch a minute ago. > > [PATCH 1/3] Revert "ACPI,PCI,IRQ: reduce static IRQ array size to 16" > [PATCH 2/3] ACPI, PCI IRQ: add PCI_USING penalty for ISA interrupts > [PATCH 3/3] Revert "ACPI,PCI,IRQ: remove SCI penalize function" > > Can we have some testing coverage? and eventually have tested-by? Works on two affected machines. More tests tomorrow.
On 10/2/2016 12:53 PM, Ondrej Zary wrote: >> Can we have some testing coverage? and eventually have tested-by? > Works on two affected machines. More tests tomorrow. Thanks, appreciate the feedback. Looking forward to hear your other test results.
On Monday 03 October 2016, Sinan Kaya wrote: > On 10/2/2016 12:53 PM, Ondrej Zary wrote: > >> Can we have some testing coverage? and eventually have tested-by? > > > > Works on two affected machines. More tests tomorrow. > > Thanks, appreciate the feedback. Looking forward to hear your other test > results. The other two affected machines are working too.
Hi Ondrej, On 10/3/2016 3:25 AM, Ondrej Zary wrote: > On Monday 03 October 2016, Sinan Kaya wrote: >> On 10/2/2016 12:53 PM, Ondrej Zary wrote: >>>> Can we have some testing coverage? and eventually have tested-by? >>> >>> Works on two affected machines. More tests tomorrow. >> >> Thanks, appreciate the feedback. Looking forward to hear your other test >> results. > > The other two affected machines are working too. > Can I have your "tested by" before I repost the new version with updated commit message? Sinan
On Saturday 01 October 2016 19:49:17 Sinan Kaya wrote: > On 9/30/2016 5:27 PM, Rafael J. Wysocki wrote: > >>> >> What's the drawback, if any? > >> > > >> > I don't see any drawback to be honest. > > > > I'd go for it then, if Bjorn doesn't hate it. > > I posted a follow up patch a minute ago. > > [PATCH 1/3] Revert "ACPI,PCI,IRQ: reduce static IRQ array size to 16" > [PATCH 2/3] ACPI, PCI IRQ: add PCI_USING penalty for ISA interrupts > [PATCH 3/3] Revert "ACPI,PCI,IRQ: remove SCI penalize function" > > Can we have some testing coverage? and eventually have tested-by? Fixes the problem on all 4 machines. These patches also need to go into 4.7-stable and 4.8-stable. Tested-by: Ondrej Zary <linux@rainbow-software.org>
diff --git a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c index c983bf7..a212709 100644 --- a/drivers/acpi/pci_link.c +++ b/drivers/acpi/pci_link.c @@ -619,6 +619,10 @@ static int acpi_pci_link_allocate(struct acpi_pci_link *link) acpi_device_bid(link->device)); return -ENODEV; } else { + if (link->irq.active < ACPI_MAX_ISA_IRQS) + acpi_isa_irq_penalty[link->irq.active] += + PIRQ_PENALTY_PCI_USING; + printk(KERN_WARNING PREFIX "%s [%s] enabled at IRQ %d\n", acpi_device_name(link->device), acpi_device_bid(link->device), link->irq.active);