Message ID | 1458224359-32665-5-git-send-email-jonathanh@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, 17 Mar 2016, Jon Hunter wrote: > Setting the interrupt type for private peripheral interrupts (PPIs) may > not be supported by a given GIC because it is IMPLEMENTATION DEFINED > whether this is allowed. There is no way to know if setting the type is > supported for a given GIC and so the value written is read back to > verify it matches the desired configuration. If it does not match then > an error is return. > > There are cases where the interrupt configuration read from firmware > (such as a device-tree blob), has been incorrect and hence > gic_configure_irq() has returned an error. This error has gone > undetected because the error code returned was ignored but the interrupt > still worked fine because the configuration for the interrupt could not > be overwritten. > > Given that this has done undetected and we should only fail to set the > type for PPIs whose configuration cannot be changed anyway, don't return > an error and simply WARN if this fails. This will allows us to fix up any > places in the kernel where we should be checking the return status and > maintain back compatibility with firmware images that may have incorrect > interrupt configurations. Though silently returning 0 is really the wrong thing to do. You can add the warn, but why do you want to return success? Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 17/03/16 14:51, Thomas Gleixner wrote: > On Thu, 17 Mar 2016, Jon Hunter wrote: > >> Setting the interrupt type for private peripheral interrupts (PPIs) may >> not be supported by a given GIC because it is IMPLEMENTATION DEFINED >> whether this is allowed. There is no way to know if setting the type is >> supported for a given GIC and so the value written is read back to >> verify it matches the desired configuration. If it does not match then >> an error is return. >> >> There are cases where the interrupt configuration read from firmware >> (such as a device-tree blob), has been incorrect and hence >> gic_configure_irq() has returned an error. This error has gone >> undetected because the error code returned was ignored but the interrupt >> still worked fine because the configuration for the interrupt could not >> be overwritten. >> >> Given that this has done undetected and we should only fail to set the >> type for PPIs whose configuration cannot be changed anyway, don't return >> an error and simply WARN if this fails. This will allows us to fix up any >> places in the kernel where we should be checking the return status and >> maintain back compatibility with firmware images that may have incorrect >> interrupt configurations. > > Though silently returning 0 is really the wrong thing to do. You can add the > warn, but why do you want to return success? Yes that would be the correct thing to do I agree. However, the problem is that if we do this, then after the patch "irqdomain: Don't set type when mapping an IRQ" is applied, we may break interrupts for some existing device-tree binaries that have bad configuration (such as omap4 and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it is a back compatibility issue. If you are wondering why these interrupts break after "irqdomain: Don't set type when mapping an IRQ", it is because today irq_create_fwspec_mapping() does not check the return code from setting the type, but if we defer setting the type until __setup_irq() which does check the return code, then all of a sudden interrupts that were working (even with bad configurations) start to fail. The reason why I opted not to return an error code from gic_configure_irq() is it really can't fail. The failure being reported does not prevent the interrupt from working, but tells you your configuration does not match the hardware setting which you cannot overwrite. So to maintain back compatibility and avoid any silent errors, I opted to make it a WARN and not return an error. If people are ok with potentially breaking interrupts for device-tree binaries with bad settings, then I am ok to return an error here. Cheers Jon -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 17, 2016 at 03:04:01PM +0000, Jon Hunter wrote: > > On 17/03/16 14:51, Thomas Gleixner wrote: > > On Thu, 17 Mar 2016, Jon Hunter wrote: > > > >> Setting the interrupt type for private peripheral interrupts (PPIs) may > >> not be supported by a given GIC because it is IMPLEMENTATION DEFINED > >> whether this is allowed. There is no way to know if setting the type is > >> supported for a given GIC and so the value written is read back to > >> verify it matches the desired configuration. If it does not match then > >> an error is return. > >> > >> There are cases where the interrupt configuration read from firmware > >> (such as a device-tree blob), has been incorrect and hence > >> gic_configure_irq() has returned an error. This error has gone > >> undetected because the error code returned was ignored but the interrupt > >> still worked fine because the configuration for the interrupt could not > >> be overwritten. > >> > >> Given that this has done undetected and we should only fail to set the > >> type for PPIs whose configuration cannot be changed anyway, don't return > >> an error and simply WARN if this fails. This will allows us to fix up any > >> places in the kernel where we should be checking the return status and > >> maintain back compatibility with firmware images that may have incorrect > >> interrupt configurations. > > > > Though silently returning 0 is really the wrong thing to do. You can add the > > warn, but why do you want to return success? > > Yes that would be the correct thing to do I agree. However, the problem > is that if we do this, then after the patch "irqdomain: Don't set type > when mapping an IRQ" is applied, we may break interrupts for some > existing device-tree binaries that have bad configuration (such as omap4 > and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it > is a back compatibility issue. This sounds like a textbook case for adding a boolean dt property. If "can-set-ppi-type" is absent (old DT blobs and new blobs without the ability), warn and return zero. If it's present, the driver can set the type, returning errors as encountered. thx, Jason. -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 17/03/16 15:18, Jason Cooper wrote: > On Thu, Mar 17, 2016 at 03:04:01PM +0000, Jon Hunter wrote: >> >> On 17/03/16 14:51, Thomas Gleixner wrote: >>> On Thu, 17 Mar 2016, Jon Hunter wrote: >>> >>>> Setting the interrupt type for private peripheral interrupts (PPIs) may >>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED >>>> whether this is allowed. There is no way to know if setting the type is >>>> supported for a given GIC and so the value written is read back to >>>> verify it matches the desired configuration. If it does not match then >>>> an error is return. >>>> >>>> There are cases where the interrupt configuration read from firmware >>>> (such as a device-tree blob), has been incorrect and hence >>>> gic_configure_irq() has returned an error. This error has gone >>>> undetected because the error code returned was ignored but the interrupt >>>> still worked fine because the configuration for the interrupt could not >>>> be overwritten. >>>> >>>> Given that this has done undetected and we should only fail to set the >>>> type for PPIs whose configuration cannot be changed anyway, don't return >>>> an error and simply WARN if this fails. This will allows us to fix up any >>>> places in the kernel where we should be checking the return status and >>>> maintain back compatibility with firmware images that may have incorrect >>>> interrupt configurations. >>> >>> Though silently returning 0 is really the wrong thing to do. You can add the >>> warn, but why do you want to return success? >> >> Yes that would be the correct thing to do I agree. However, the problem >> is that if we do this, then after the patch "irqdomain: Don't set type >> when mapping an IRQ" is applied, we may break interrupts for some >> existing device-tree binaries that have bad configuration (such as omap4 >> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it >> is a back compatibility issue. > > This sounds like a textbook case for adding a boolean dt property. If > "can-set-ppi-type" is absent (old DT blobs and new blobs without the > ability), warn and return zero. If it's present, the driver can set the > type, returning errors as encountered. True. However, if we did have this "can-set-ppi-type" property set for a device, it really should never fail (unless someone specified it incorrectly). So I am trying to understand the value in adding a new DT property. Please note that gic_configure_irq() never used to return an error and only when adding support for setting the type of PPIs was this added. However, given that this has gone unnoticed and does not have a real functional impact on the device behaviour, I wonder now if this function should return an error? Yes, ideally, it should, but does it still make sense? Cheers Jon -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Mar 17, 2016 at 5:20 PM, Jon Hunter <jonathanh@nvidia.com> wrote: > On 17/03/16 15:18, Jason Cooper wrote: >> On Thu, Mar 17, 2016 at 03:04:01PM +0000, Jon Hunter wrote: >>> On 17/03/16 14:51, Thomas Gleixner wrote: >>>> On Thu, 17 Mar 2016, Jon Hunter wrote: >>>>> Setting the interrupt type for private peripheral interrupts (PPIs) may >>>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED >>>>> whether this is allowed. There is no way to know if setting the type is >>>>> supported for a given GIC and so the value written is read back to >>>>> verify it matches the desired configuration. If it does not match then >>>>> an error is return. >>>>> >>>>> There are cases where the interrupt configuration read from firmware >>>>> (such as a device-tree blob), has been incorrect and hence >>>>> gic_configure_irq() has returned an error. This error has gone >>>>> undetected because the error code returned was ignored but the interrupt >>>>> still worked fine because the configuration for the interrupt could not >>>>> be overwritten. >>>>> >>>>> Given that this has done undetected and we should only fail to set the >>>>> type for PPIs whose configuration cannot be changed anyway, don't return >>>>> an error and simply WARN if this fails. This will allows us to fix up any >>>>> places in the kernel where we should be checking the return status and >>>>> maintain back compatibility with firmware images that may have incorrect >>>>> interrupt configurations. >>>> >>>> Though silently returning 0 is really the wrong thing to do. You can add the >>>> warn, but why do you want to return success? >>> >>> Yes that would be the correct thing to do I agree. However, the problem >>> is that if we do this, then after the patch "irqdomain: Don't set type >>> when mapping an IRQ" is applied, we may break interrupts for some >>> existing device-tree binaries that have bad configuration (such as omap4 >>> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it >>> is a back compatibility issue. Indeed (also for sh73a0 and r8a7779). >> This sounds like a textbook case for adding a boolean dt property. If >> "can-set-ppi-type" is absent (old DT blobs and new blobs without the >> ability), warn and return zero. If it's present, the driver can set the >> type, returning errors as encountered. > > True. However, if we did have this "can-set-ppi-type" property set for a > device, it really should never fail (unless someone specified it > incorrectly). So I am trying to understand the value in adding a new DT > property. Do we really want to add properties that basically indicate that a description in DT is correct? Alternatively, it can be fixed in the kernel in a DT quirk (if SoC == xxx then fix TWD). > Please note that gic_configure_irq() never used to return an error and > only when adding support for setting the type of PPIs was this added. > However, given that this has gone unnoticed and does not have a real > functional impact on the device behaviour, I wonder now if this function > should return an error? Yes, ideally, it should, but does it still make > sense? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 18/03/16 09:20, Geert Uytterhoeven wrote: > On Thu, Mar 17, 2016 at 5:20 PM, Jon Hunter <jonathanh@nvidia.com> wrote: >> On 17/03/16 15:18, Jason Cooper wrote: >>> On Thu, Mar 17, 2016 at 03:04:01PM +0000, Jon Hunter wrote: >>>> On 17/03/16 14:51, Thomas Gleixner wrote: >>>>> On Thu, 17 Mar 2016, Jon Hunter wrote: >>>>>> Setting the interrupt type for private peripheral interrupts (PPIs) may >>>>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED >>>>>> whether this is allowed. There is no way to know if setting the type is >>>>>> supported for a given GIC and so the value written is read back to >>>>>> verify it matches the desired configuration. If it does not match then >>>>>> an error is return. >>>>>> >>>>>> There are cases where the interrupt configuration read from firmware >>>>>> (such as a device-tree blob), has been incorrect and hence >>>>>> gic_configure_irq() has returned an error. This error has gone >>>>>> undetected because the error code returned was ignored but the interrupt >>>>>> still worked fine because the configuration for the interrupt could not >>>>>> be overwritten. >>>>>> >>>>>> Given that this has done undetected and we should only fail to set the >>>>>> type for PPIs whose configuration cannot be changed anyway, don't return >>>>>> an error and simply WARN if this fails. This will allows us to fix up any >>>>>> places in the kernel where we should be checking the return status and >>>>>> maintain back compatibility with firmware images that may have incorrect >>>>>> interrupt configurations. >>>>> >>>>> Though silently returning 0 is really the wrong thing to do. You can add the >>>>> warn, but why do you want to return success? >>>> >>>> Yes that would be the correct thing to do I agree. However, the problem >>>> is that if we do this, then after the patch "irqdomain: Don't set type >>>> when mapping an IRQ" is applied, we may break interrupts for some >>>> existing device-tree binaries that have bad configuration (such as omap4 >>>> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it >>>> is a back compatibility issue. > > Indeed (also for sh73a0 and r8a7779). Thanks. I was wondering if there are others. Do you know what the correct setting should be? Ie. should it be IRQ_TYPE_EDGE_RISING as well? I can then include this with OMAP and Tegra. >>> This sounds like a textbook case for adding a boolean dt property. If >>> "can-set-ppi-type" is absent (old DT blobs and new blobs without the >>> ability), warn and return zero. If it's present, the driver can set the >>> type, returning errors as encountered. >> >> True. However, if we did have this "can-set-ppi-type" property set for a >> device, it really should never fail (unless someone specified it >> incorrectly). So I am trying to understand the value in adding a new DT >> property. > > Do we really want to add properties that basically indicate that a description > in DT is correct? > > Alternatively, it can be fixed in the kernel in a DT quirk (if SoC == xxx then > fix TWD). I am not sure I fully understand your proposal, but please note that it may not be just limited to the TWD (although this does appear to be the one client that is wrong for a lot of SoCs). PPIs are also used for the armv7/8 timers as well. The problem is that we have a lot of SoCs with twd-timers and I have no way to test all of these to know which could be a problem. So I thought that warning would be a good first step to fixing them. However, I am still trying to see the real value in returning an error in this case. May be I am the only one with that perspective? Cheers Jon -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jon, On Fri, Mar 18, 2016 at 10:54 AM, Jon Hunter <jonathanh@nvidia.com> wrote: > On 18/03/16 09:20, Geert Uytterhoeven wrote: >> On Thu, Mar 17, 2016 at 5:20 PM, Jon Hunter <jonathanh@nvidia.com> wrote: >>> On 17/03/16 15:18, Jason Cooper wrote: >>>> On Thu, Mar 17, 2016 at 03:04:01PM +0000, Jon Hunter wrote: >>>>> On 17/03/16 14:51, Thomas Gleixner wrote: >>>>>> On Thu, 17 Mar 2016, Jon Hunter wrote: >>>>>>> Setting the interrupt type for private peripheral interrupts (PPIs) may >>>>>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED >>>>>>> whether this is allowed. There is no way to know if setting the type is >>>>>>> supported for a given GIC and so the value written is read back to >>>>>>> verify it matches the desired configuration. If it does not match then >>>>>>> an error is return. >>>>>>> >>>>>>> There are cases where the interrupt configuration read from firmware >>>>>>> (such as a device-tree blob), has been incorrect and hence >>>>>>> gic_configure_irq() has returned an error. This error has gone >>>>>>> undetected because the error code returned was ignored but the interrupt >>>>>>> still worked fine because the configuration for the interrupt could not >>>>>>> be overwritten. >>>>>>> >>>>>>> Given that this has done undetected and we should only fail to set the >>>>>>> type for PPIs whose configuration cannot be changed anyway, don't return >>>>>>> an error and simply WARN if this fails. This will allows us to fix up any >>>>>>> places in the kernel where we should be checking the return status and >>>>>>> maintain back compatibility with firmware images that may have incorrect >>>>>>> interrupt configurations. >>>>>> >>>>>> Though silently returning 0 is really the wrong thing to do. You can add the >>>>>> warn, but why do you want to return success? >>>>> >>>>> Yes that would be the correct thing to do I agree. However, the problem >>>>> is that if we do this, then after the patch "irqdomain: Don't set type >>>>> when mapping an IRQ" is applied, we may break interrupts for some >>>>> existing device-tree binaries that have bad configuration (such as omap4 >>>>> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it >>>>> is a back compatibility issue. >> >> Indeed (also for sh73a0 and r8a7779). > > Thanks. I was wondering if there are others. Do you know what the > correct setting should be? Ie. should it be IRQ_TYPE_EDGE_RISING as > well? I can then include this with OMAP and Tegra. The warnings went away when using IRQ_TYPE_EDGE_RISING. Patches sent. >>>> This sounds like a textbook case for adding a boolean dt property. If >>>> "can-set-ppi-type" is absent (old DT blobs and new blobs without the >>>> ability), warn and return zero. If it's present, the driver can set the >>>> type, returning errors as encountered. >>> >>> True. However, if we did have this "can-set-ppi-type" property set for a >>> device, it really should never fail (unless someone specified it >>> incorrectly). So I am trying to understand the value in adding a new DT >>> property. >> >> Do we really want to add properties that basically indicate that a description >> in DT is correct? >> >> Alternatively, it can be fixed in the kernel in a DT quirk (if SoC == xxx then >> fix TWD). > > I am not sure I fully understand your proposal, but please note that it The kernel can modify the DT in early startup code if it detects that the TWD's interrupt type is wrong. > may not be just limited to the TWD (although this does appear to be the > one client that is wrong for a lot of SoCs). PPIs are also used for the > armv7/8 timers as well. True. > The problem is that we have a lot of SoCs with twd-timers and I have no > way to test all of these to know which could be a problem. So I thought > that warning would be a good first step to fixing them. Definitely. > However, I am still trying to see the real value in returning an error > in this case. May be I am the only one with that perspective? Given the above, we cannot start returning an error until all problems are fixed. Even after that, we have to care about stable DT ABIs. In this case it's not really an ABI change, but a real buggy description in DTS. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 17 Mar 2016 15:04:01 +0000 Jon Hunter <jonathanh@nvidia.com> wrote: > > On 17/03/16 14:51, Thomas Gleixner wrote: > > On Thu, 17 Mar 2016, Jon Hunter wrote: > > > >> Setting the interrupt type for private peripheral interrupts (PPIs) may > >> not be supported by a given GIC because it is IMPLEMENTATION DEFINED > >> whether this is allowed. There is no way to know if setting the type is > >> supported for a given GIC and so the value written is read back to > >> verify it matches the desired configuration. If it does not match then > >> an error is return. > >> > >> There are cases where the interrupt configuration read from firmware > >> (such as a device-tree blob), has been incorrect and hence > >> gic_configure_irq() has returned an error. This error has gone > >> undetected because the error code returned was ignored but the interrupt > >> still worked fine because the configuration for the interrupt could not > >> be overwritten. > >> > >> Given that this has done undetected and we should only fail to set the > >> type for PPIs whose configuration cannot be changed anyway, don't return > >> an error and simply WARN if this fails. This will allows us to fix up any > >> places in the kernel where we should be checking the return status and > >> maintain back compatibility with firmware images that may have incorrect > >> interrupt configurations. > > > > Though silently returning 0 is really the wrong thing to do. You can add the > > warn, but why do you want to return success? > > Yes that would be the correct thing to do I agree. However, the problem > is that if we do this, then after the patch "irqdomain: Don't set type > when mapping an IRQ" is applied, we may break interrupts for some > existing device-tree binaries that have bad configuration (such as omap4 > and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it > is a back compatibility issue. > > If you are wondering why these interrupts break after "irqdomain: Don't > set type when mapping an IRQ", it is because today > irq_create_fwspec_mapping() does not check the return code from setting > the type, but if we defer setting the type until __setup_irq() which > does check the return code, then all of a sudden interrupts that were > working (even with bad configurations) start to fail. > > The reason why I opted not to return an error code from > gic_configure_irq() is it really can't fail. The failure being reported > does not prevent the interrupt from working, but tells you your > configuration does not match the hardware setting which you cannot > overwrite. > > So to maintain back compatibility and avoid any silent errors, I opted > to make it a WARN and not return an error. > > If people are ok with potentially breaking interrupts for device-tree > binaries with bad settings, then I am ok to return an error here. I think we need to phase things. Let's start with warning people for a few kernel releases. Actively maintained platforms will quickly address the issue (fixing their DT). As I see it, this issue seems rather widespread (even kvmtool outputs a DT with the wrong triggering information). Once we've fixed the bulk of the platforms and virtual environments, we can start thinking about making it fail harder. Thanks, M.
Hi Mark, On 09/04/16 11:58, Marc Zyngier wrote: > On Thu, 17 Mar 2016 15:04:01 +0000 > Jon Hunter <jonathanh@nvidia.com> wrote: > >> >> On 17/03/16 14:51, Thomas Gleixner wrote: >>> On Thu, 17 Mar 2016, Jon Hunter wrote: >>> >>>> Setting the interrupt type for private peripheral interrupts (PPIs) may >>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED >>>> whether this is allowed. There is no way to know if setting the type is >>>> supported for a given GIC and so the value written is read back to >>>> verify it matches the desired configuration. If it does not match then >>>> an error is return. >>>> >>>> There are cases where the interrupt configuration read from firmware >>>> (such as a device-tree blob), has been incorrect and hence >>>> gic_configure_irq() has returned an error. This error has gone >>>> undetected because the error code returned was ignored but the interrupt >>>> still worked fine because the configuration for the interrupt could not >>>> be overwritten. >>>> >>>> Given that this has done undetected and we should only fail to set the >>>> type for PPIs whose configuration cannot be changed anyway, don't return >>>> an error and simply WARN if this fails. This will allows us to fix up any >>>> places in the kernel where we should be checking the return status and >>>> maintain back compatibility with firmware images that may have incorrect >>>> interrupt configurations. >>> >>> Though silently returning 0 is really the wrong thing to do. You can add the >>> warn, but why do you want to return success? >> >> Yes that would be the correct thing to do I agree. However, the problem >> is that if we do this, then after the patch "irqdomain: Don't set type >> when mapping an IRQ" is applied, we may break interrupts for some >> existing device-tree binaries that have bad configuration (such as omap4 >> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it >> is a back compatibility issue. >> >> If you are wondering why these interrupts break after "irqdomain: Don't >> set type when mapping an IRQ", it is because today >> irq_create_fwspec_mapping() does not check the return code from setting >> the type, but if we defer setting the type until __setup_irq() which >> does check the return code, then all of a sudden interrupts that were >> working (even with bad configurations) start to fail. >> >> The reason why I opted not to return an error code from >> gic_configure_irq() is it really can't fail. The failure being reported >> does not prevent the interrupt from working, but tells you your >> configuration does not match the hardware setting which you cannot >> overwrite. >> >> So to maintain back compatibility and avoid any silent errors, I opted >> to make it a WARN and not return an error. >> >> If people are ok with potentially breaking interrupts for device-tree >> binaries with bad settings, then I am ok to return an error here. > > I think we need to phase things. Let's start with warning people for a > few kernel releases. Actively maintained platforms will quickly address > the issue (fixing their DT). As I see it, this issue seems rather > widespread (even kvmtool outputs a DT with the wrong triggering > information). > > Once we've fixed the bulk of the platforms and virtual environments, we > can start thinking about making it fail harder. Ok, so are you OK with this patch as-is? If so, can I add your ACK? Cheers Jon -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/04/16 16:31, Jon Hunter wrote: > Hi Mark, > > On 09/04/16 11:58, Marc Zyngier wrote: >> On Thu, 17 Mar 2016 15:04:01 +0000 >> Jon Hunter <jonathanh@nvidia.com> wrote: >> >>> >>> On 17/03/16 14:51, Thomas Gleixner wrote: >>>> On Thu, 17 Mar 2016, Jon Hunter wrote: >>>> >>>>> Setting the interrupt type for private peripheral interrupts (PPIs) may >>>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED >>>>> whether this is allowed. There is no way to know if setting the type is >>>>> supported for a given GIC and so the value written is read back to >>>>> verify it matches the desired configuration. If it does not match then >>>>> an error is return. >>>>> >>>>> There are cases where the interrupt configuration read from firmware >>>>> (such as a device-tree blob), has been incorrect and hence >>>>> gic_configure_irq() has returned an error. This error has gone >>>>> undetected because the error code returned was ignored but the interrupt >>>>> still worked fine because the configuration for the interrupt could not >>>>> be overwritten. >>>>> >>>>> Given that this has done undetected and we should only fail to set the >>>>> type for PPIs whose configuration cannot be changed anyway, don't return >>>>> an error and simply WARN if this fails. This will allows us to fix up any >>>>> places in the kernel where we should be checking the return status and >>>>> maintain back compatibility with firmware images that may have incorrect >>>>> interrupt configurations. >>>> >>>> Though silently returning 0 is really the wrong thing to do. You can add the >>>> warn, but why do you want to return success? >>> >>> Yes that would be the correct thing to do I agree. However, the problem >>> is that if we do this, then after the patch "irqdomain: Don't set type >>> when mapping an IRQ" is applied, we may break interrupts for some >>> existing device-tree binaries that have bad configuration (such as omap4 >>> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it >>> is a back compatibility issue. >>> >>> If you are wondering why these interrupts break after "irqdomain: Don't >>> set type when mapping an IRQ", it is because today >>> irq_create_fwspec_mapping() does not check the return code from setting >>> the type, but if we defer setting the type until __setup_irq() which >>> does check the return code, then all of a sudden interrupts that were >>> working (even with bad configurations) start to fail. >>> >>> The reason why I opted not to return an error code from >>> gic_configure_irq() is it really can't fail. The failure being reported >>> does not prevent the interrupt from working, but tells you your >>> configuration does not match the hardware setting which you cannot >>> overwrite. >>> >>> So to maintain back compatibility and avoid any silent errors, I opted >>> to make it a WARN and not return an error. >>> >>> If people are ok with potentially breaking interrupts for device-tree >>> binaries with bad settings, then I am ok to return an error here. >> >> I think we need to phase things. Let's start with warning people for a >> few kernel releases. Actively maintained platforms will quickly address >> the issue (fixing their DT). As I see it, this issue seems rather >> widespread (even kvmtool outputs a DT with the wrong triggering >> information). >> >> Once we've fixed the bulk of the platforms and virtual environments, we >> can start thinking about making it fail harder. > > Ok, so are you OK with this patch as-is? If so, can I add your ACK? It depends where you plan to handle the error. Ideally, I'd keep on returning the error (because that's the right thing to do), and move the WARN_ON() into the core code. We'd keep on ignoring the error as we're doing today, but we'd scream about it. After a couple of releases, we'd turn the WARN_ON into a hard fail. Thoughts? M.
On 11/04/16 16:39, Marc Zyngier wrote: > On 11/04/16 16:31, Jon Hunter wrote: >> On 09/04/16 11:58, Marc Zyngier wrote: [snip] >>> I think we need to phase things. Let's start with warning people for a >>> few kernel releases. Actively maintained platforms will quickly address >>> the issue (fixing their DT). As I see it, this issue seems rather >>> widespread (even kvmtool outputs a DT with the wrong triggering >>> information). >>> >>> Once we've fixed the bulk of the platforms and virtual environments, we >>> can start thinking about making it fail harder. >> >> Ok, so are you OK with this patch as-is? If so, can I add your ACK? > > It depends where you plan to handle the error. Ideally, I'd keep on > returning the error (because that's the right thing to do), and move the > WARN_ON() into the core code. We'd keep on ignoring the error as we're > doing today, but we'd scream about it. > > After a couple of releases, we'd turn the WARN_ON into a hard fail. > > Thoughts? I agree that would be best/ideal, but looking at it, I don't believe it is possible and this is why I have not done that so far. If we were to add the WARN to the core code, then we would need to add a warning everywhere __irq_set_trigger() is called. One of the places it is called today is from __setup_irq() and today this does the right thing and handle any error returned. The problem is that in irq_create_fwspec_mapping() we have never checked the return code from irq_set_irq_type() (which calls __irq_set_trigger()) or attempted to handle any errors. So the problem is that depending on the path through which the type is programmed, errors may or may not be detected. This is the actual headache :-( Given that this problem so far only pertains to GIC PPI interrupts and that it is a not a catastrophic error (interrupts still work fine), I was thinking we add the warning to the GIC driver. May be a less severe change would be to only return an error if configuring an SPI fails and if it is a PPI then simply WARN and carry-on as we assume we cannot change it. I hope this summarises the issue a bit further. Cheers Jon -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/04/16 09:50, Jon Hunter wrote: > > On 11/04/16 16:39, Marc Zyngier wrote: >> On 11/04/16 16:31, Jon Hunter wrote: >>> On 09/04/16 11:58, Marc Zyngier wrote: > > [snip] > >>>> I think we need to phase things. Let's start with warning people for a >>>> few kernel releases. Actively maintained platforms will quickly address >>>> the issue (fixing their DT). As I see it, this issue seems rather >>>> widespread (even kvmtool outputs a DT with the wrong triggering >>>> information). >>>> >>>> Once we've fixed the bulk of the platforms and virtual environments, we >>>> can start thinking about making it fail harder. >>> >>> Ok, so are you OK with this patch as-is? If so, can I add your ACK? >> >> It depends where you plan to handle the error. Ideally, I'd keep on >> returning the error (because that's the right thing to do), and move the >> WARN_ON() into the core code. We'd keep on ignoring the error as we're >> doing today, but we'd scream about it. >> >> After a couple of releases, we'd turn the WARN_ON into a hard fail. >> >> Thoughts? > > I agree that would be best/ideal, but looking at it, I don't believe it > is possible and this is why I have not done that so far. > > If we were to add the WARN to the core code, then we would need to add a > warning everywhere __irq_set_trigger() is called. One of the places it > is called today is from __setup_irq() and today this does the right > thing and handle any error returned. The problem is that in > irq_create_fwspec_mapping() we have never checked the return code from > irq_set_irq_type() (which calls __irq_set_trigger()) or attempted to > handle any errors. So the problem is that depending on the path through > which the type is programmed, errors may or may not be detected. This is > the actual headache :-( > > Given that this problem so far only pertains to GIC PPI interrupts and > that it is a not a catastrophic error (interrupts still work fine), I > was thinking we add the warning to the GIC driver. > > May be a less severe change would be to only return an error if > configuring an SPI fails and if it is a PPI then simply WARN and > carry-on as we assume we cannot change it. I'd take that. Limiting it to PPIs would be a minimal change, and the warning would hopefully make people realize their DT is wrong. Failing to program an SPI is really not expected, and should definitely explode. Thanks, M.
diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c index ffff5a45f1e3..423a345d7b75 100644 --- a/drivers/irqchip/irq-gic-common.c +++ b/drivers/irqchip/irq-gic-common.c @@ -32,13 +32,12 @@ void gic_enable_quirks(u32 iidr, const struct gic_quirk *quirks, } } -int gic_configure_irq(unsigned int irq, unsigned int type, +void gic_configure_irq(unsigned int irq, unsigned int type, void __iomem *base, void (*sync_access)(void)) { u32 confmask = 0x2 << ((irq % 16) * 2); u32 confoff = (irq / 16) * 4; u32 val, oldval; - int ret = 0; /* * Read current configuration register, and insert the config @@ -52,21 +51,17 @@ int gic_configure_irq(unsigned int irq, unsigned int type, /* If the current configuration is the same, then we are done */ if (val == oldval) - return 0; + return; /* * Write back the new configuration, and possibly re-enable - * the interrupt. If we fail to write a new configuration, - * return an error. + * the interrupt. WARN if we fail to write a new configuration. */ writel_relaxed(val, base + GIC_DIST_CONFIG + confoff); - if (readl_relaxed(base + GIC_DIST_CONFIG + confoff) != val) - ret = -EINVAL; + WARN_ON(readl_relaxed(base + GIC_DIST_CONFIG + confoff) != val); if (sync_access) sync_access(); - - return ret; } void __init gic_dist_config(void __iomem *base, int gic_irqs, diff --git a/drivers/irqchip/irq-gic-common.h b/drivers/irqchip/irq-gic-common.h index fff697db8e22..73dee3bc6bba 100644 --- a/drivers/irqchip/irq-gic-common.h +++ b/drivers/irqchip/irq-gic-common.h @@ -27,7 +27,7 @@ struct gic_quirk { u32 mask; }; -int gic_configure_irq(unsigned int irq, unsigned int type, +void gic_configure_irq(unsigned int irq, unsigned int type, void __iomem *base, void (*sync_access)(void)); void gic_dist_config(void __iomem *base, int gic_irqs, void (*sync_access)(void)); diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 5b7d3c2129d8..c569c466fa31 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -310,7 +310,9 @@ static int gic_set_type(struct irq_data *d, unsigned int type) rwp_wait = gic_dist_wait_for_rwp; } - return gic_configure_irq(irq, type, base, rwp_wait); + gic_configure_irq(irq, type, base, rwp_wait); + + return 0; } static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index 282344b95ec2..6c555a2c5315 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -279,7 +279,9 @@ static int gic_set_type(struct irq_data *d, unsigned int type) type != IRQ_TYPE_EDGE_RISING) return -EINVAL; - return gic_configure_irq(gicirq, type, base, NULL); + gic_configure_irq(gicirq, type, base, NULL); + + return 0; } static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu) diff --git a/drivers/irqchip/irq-hip04.c b/drivers/irqchip/irq-hip04.c index 9688d2e2a636..12be5906e91a 100644 --- a/drivers/irqchip/irq-hip04.c +++ b/drivers/irqchip/irq-hip04.c @@ -120,7 +120,6 @@ static int hip04_irq_set_type(struct irq_data *d, unsigned int type) { void __iomem *base = hip04_dist_base(d); unsigned int irq = hip04_irq(d); - int ret; /* Interrupt configuration for SGIs can't be changed */ if (irq < 16) @@ -133,11 +132,11 @@ static int hip04_irq_set_type(struct irq_data *d, unsigned int type) raw_spin_lock(&irq_controller_lock); - ret = gic_configure_irq(irq, type, base, NULL); + gic_configure_irq(irq, type, base, NULL); raw_spin_unlock(&irq_controller_lock); - return ret; + return 0; } #ifdef CONFIG_SMP
Setting the interrupt type for private peripheral interrupts (PPIs) may not be supported by a given GIC because it is IMPLEMENTATION DEFINED whether this is allowed. There is no way to know if setting the type is supported for a given GIC and so the value written is read back to verify it matches the desired configuration. If it does not match then an error is return. There are cases where the interrupt configuration read from firmware (such as a device-tree blob), has been incorrect and hence gic_configure_irq() has returned an error. This error has gone undetected because the error code returned was ignored but the interrupt still worked fine because the configuration for the interrupt could not be overwritten. Given that this has done undetected and we should only fail to set the type for PPIs whose configuration cannot be changed anyway, don't return an error and simply WARN if this fails. This will allows us to fix up any places in the kernel where we should be checking the return status and maintain back compatibility with firmware images that may have incorrect interrupt configurations. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> --- drivers/irqchip/irq-gic-common.c | 13 ++++--------- drivers/irqchip/irq-gic-common.h | 2 +- drivers/irqchip/irq-gic-v3.c | 4 +++- drivers/irqchip/irq-gic.c | 4 +++- drivers/irqchip/irq-hip04.c | 5 ++--- 5 files changed, 13 insertions(+), 15 deletions(-)