diff mbox

[04/15] irqchip/gic: WARN if setting the interrupt type fails

Message ID 1458224359-32665-5-git-send-email-jonathanh@nvidia.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jon Hunter March 17, 2016, 2:19 p.m. UTC
Setting the interrupt type for private peripheral interrupts (PPIs) may
not be supported by a given GIC because it is IMPLEMENTATION DEFINED
whether this is allowed. There is no way to know if setting the type is
supported for a given GIC and so the value written is read back to
verify it matches the desired configuration. If it does not match then
an error is return.

There are cases where the interrupt configuration read from firmware
(such as a device-tree blob), has been incorrect and hence
gic_configure_irq() has returned an error. This error has gone
undetected because the error code returned was ignored but the interrupt
still worked fine because the configuration for the interrupt could not
be overwritten.

Given that this has done undetected and we should only fail to set the
type for PPIs whose configuration cannot be changed anyway, don't return
an error and simply WARN if this fails. This will allows us to fix up any
places in the kernel where we should be checking the return status and
maintain back compatibility with firmware images that may have incorrect
interrupt configurations.

Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
---
 drivers/irqchip/irq-gic-common.c | 13 ++++---------
 drivers/irqchip/irq-gic-common.h |  2 +-
 drivers/irqchip/irq-gic-v3.c     |  4 +++-
 drivers/irqchip/irq-gic.c        |  4 +++-
 drivers/irqchip/irq-hip04.c      |  5 ++---
 5 files changed, 13 insertions(+), 15 deletions(-)

Comments

Thomas Gleixner March 17, 2016, 2:51 p.m. UTC | #1
On Thu, 17 Mar 2016, Jon Hunter wrote:

> Setting the interrupt type for private peripheral interrupts (PPIs) may
> not be supported by a given GIC because it is IMPLEMENTATION DEFINED
> whether this is allowed. There is no way to know if setting the type is
> supported for a given GIC and so the value written is read back to
> verify it matches the desired configuration. If it does not match then
> an error is return.
> 
> There are cases where the interrupt configuration read from firmware
> (such as a device-tree blob), has been incorrect and hence
> gic_configure_irq() has returned an error. This error has gone
> undetected because the error code returned was ignored but the interrupt
> still worked fine because the configuration for the interrupt could not
> be overwritten.
> 
> Given that this has done undetected and we should only fail to set the
> type for PPIs whose configuration cannot be changed anyway, don't return
> an error and simply WARN if this fails. This will allows us to fix up any
> places in the kernel where we should be checking the return status and
> maintain back compatibility with firmware images that may have incorrect
> interrupt configurations.

Though silently returning 0 is really the wrong thing to do. You can add the
warn, but why do you want to return success?

Thanks,

	tglx

 
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jon Hunter March 17, 2016, 3:04 p.m. UTC | #2
On 17/03/16 14:51, Thomas Gleixner wrote:
> On Thu, 17 Mar 2016, Jon Hunter wrote:
> 
>> Setting the interrupt type for private peripheral interrupts (PPIs) may
>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED
>> whether this is allowed. There is no way to know if setting the type is
>> supported for a given GIC and so the value written is read back to
>> verify it matches the desired configuration. If it does not match then
>> an error is return.
>>
>> There are cases where the interrupt configuration read from firmware
>> (such as a device-tree blob), has been incorrect and hence
>> gic_configure_irq() has returned an error. This error has gone
>> undetected because the error code returned was ignored but the interrupt
>> still worked fine because the configuration for the interrupt could not
>> be overwritten.
>>
>> Given that this has done undetected and we should only fail to set the
>> type for PPIs whose configuration cannot be changed anyway, don't return
>> an error and simply WARN if this fails. This will allows us to fix up any
>> places in the kernel where we should be checking the return status and
>> maintain back compatibility with firmware images that may have incorrect
>> interrupt configurations.
> 
> Though silently returning 0 is really the wrong thing to do. You can add the
> warn, but why do you want to return success?

Yes that would be the correct thing to do I agree. However, the problem
is that if we do this, then after the patch "irqdomain: Don't set type
when mapping an IRQ" is applied, we may break interrupts for some
existing device-tree binaries that have bad configuration (such as omap4
and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it
is a back compatibility issue.

If you are wondering why these interrupts break after "irqdomain: Don't
set type when mapping an IRQ", it is because today
irq_create_fwspec_mapping() does not check the return code from setting
the type, but if we defer setting the type until __setup_irq() which
does check the return code, then all of a sudden interrupts that were
working (even with bad configurations) start to fail.

The reason why I opted not to return an error code from
gic_configure_irq() is it really can't fail. The failure being reported
does not prevent the interrupt from working, but tells you your
configuration does not match the hardware setting which you cannot
overwrite.

So to maintain back compatibility and avoid any silent errors, I opted
to make it a WARN and not return an error.

If people are ok with potentially breaking interrupts for device-tree
binaries with bad settings, then I am ok to return an error here.

Cheers
Jon



--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Cooper March 17, 2016, 3:18 p.m. UTC | #3
On Thu, Mar 17, 2016 at 03:04:01PM +0000, Jon Hunter wrote:
> 
> On 17/03/16 14:51, Thomas Gleixner wrote:
> > On Thu, 17 Mar 2016, Jon Hunter wrote:
> > 
> >> Setting the interrupt type for private peripheral interrupts (PPIs) may
> >> not be supported by a given GIC because it is IMPLEMENTATION DEFINED
> >> whether this is allowed. There is no way to know if setting the type is
> >> supported for a given GIC and so the value written is read back to
> >> verify it matches the desired configuration. If it does not match then
> >> an error is return.
> >>
> >> There are cases where the interrupt configuration read from firmware
> >> (such as a device-tree blob), has been incorrect and hence
> >> gic_configure_irq() has returned an error. This error has gone
> >> undetected because the error code returned was ignored but the interrupt
> >> still worked fine because the configuration for the interrupt could not
> >> be overwritten.
> >>
> >> Given that this has done undetected and we should only fail to set the
> >> type for PPIs whose configuration cannot be changed anyway, don't return
> >> an error and simply WARN if this fails. This will allows us to fix up any
> >> places in the kernel where we should be checking the return status and
> >> maintain back compatibility with firmware images that may have incorrect
> >> interrupt configurations.
> > 
> > Though silently returning 0 is really the wrong thing to do. You can add the
> > warn, but why do you want to return success?
> 
> Yes that would be the correct thing to do I agree. However, the problem
> is that if we do this, then after the patch "irqdomain: Don't set type
> when mapping an IRQ" is applied, we may break interrupts for some
> existing device-tree binaries that have bad configuration (such as omap4
> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it
> is a back compatibility issue.

This sounds like a textbook case for adding a boolean dt property.  If
"can-set-ppi-type" is absent (old DT blobs and new blobs without the
ability), warn and return zero.  If it's present, the driver can set the
type, returning errors as encountered.

thx,

Jason.
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jon Hunter March 17, 2016, 4:20 p.m. UTC | #4
On 17/03/16 15:18, Jason Cooper wrote:
> On Thu, Mar 17, 2016 at 03:04:01PM +0000, Jon Hunter wrote:
>>
>> On 17/03/16 14:51, Thomas Gleixner wrote:
>>> On Thu, 17 Mar 2016, Jon Hunter wrote:
>>>
>>>> Setting the interrupt type for private peripheral interrupts (PPIs) may
>>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED
>>>> whether this is allowed. There is no way to know if setting the type is
>>>> supported for a given GIC and so the value written is read back to
>>>> verify it matches the desired configuration. If it does not match then
>>>> an error is return.
>>>>
>>>> There are cases where the interrupt configuration read from firmware
>>>> (such as a device-tree blob), has been incorrect and hence
>>>> gic_configure_irq() has returned an error. This error has gone
>>>> undetected because the error code returned was ignored but the interrupt
>>>> still worked fine because the configuration for the interrupt could not
>>>> be overwritten.
>>>>
>>>> Given that this has done undetected and we should only fail to set the
>>>> type for PPIs whose configuration cannot be changed anyway, don't return
>>>> an error and simply WARN if this fails. This will allows us to fix up any
>>>> places in the kernel where we should be checking the return status and
>>>> maintain back compatibility with firmware images that may have incorrect
>>>> interrupt configurations.
>>>
>>> Though silently returning 0 is really the wrong thing to do. You can add the
>>> warn, but why do you want to return success?
>>
>> Yes that would be the correct thing to do I agree. However, the problem
>> is that if we do this, then after the patch "irqdomain: Don't set type
>> when mapping an IRQ" is applied, we may break interrupts for some
>> existing device-tree binaries that have bad configuration (such as omap4
>> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it
>> is a back compatibility issue.
> 
> This sounds like a textbook case for adding a boolean dt property.  If
> "can-set-ppi-type" is absent (old DT blobs and new blobs without the
> ability), warn and return zero.  If it's present, the driver can set the
> type, returning errors as encountered.

True. However, if we did have this "can-set-ppi-type" property set for a
device, it really should never fail (unless someone specified it
incorrectly). So I am trying to understand the value in adding a new DT
property.

Please note that gic_configure_irq() never used to return an error and
only when adding support for setting the type of PPIs was this added.
However, given that this has gone unnoticed and does not have a real
functional impact on the device behaviour, I wonder now if this function
should return an error? Yes, ideally, it should, but does it still make
sense?

Cheers
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Geert Uytterhoeven March 18, 2016, 9:20 a.m. UTC | #5
On Thu, Mar 17, 2016 at 5:20 PM, Jon Hunter <jonathanh@nvidia.com> wrote:
> On 17/03/16 15:18, Jason Cooper wrote:
>> On Thu, Mar 17, 2016 at 03:04:01PM +0000, Jon Hunter wrote:
>>> On 17/03/16 14:51, Thomas Gleixner wrote:
>>>> On Thu, 17 Mar 2016, Jon Hunter wrote:
>>>>> Setting the interrupt type for private peripheral interrupts (PPIs) may
>>>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED
>>>>> whether this is allowed. There is no way to know if setting the type is
>>>>> supported for a given GIC and so the value written is read back to
>>>>> verify it matches the desired configuration. If it does not match then
>>>>> an error is return.
>>>>>
>>>>> There are cases where the interrupt configuration read from firmware
>>>>> (such as a device-tree blob), has been incorrect and hence
>>>>> gic_configure_irq() has returned an error. This error has gone
>>>>> undetected because the error code returned was ignored but the interrupt
>>>>> still worked fine because the configuration for the interrupt could not
>>>>> be overwritten.
>>>>>
>>>>> Given that this has done undetected and we should only fail to set the
>>>>> type for PPIs whose configuration cannot be changed anyway, don't return
>>>>> an error and simply WARN if this fails. This will allows us to fix up any
>>>>> places in the kernel where we should be checking the return status and
>>>>> maintain back compatibility with firmware images that may have incorrect
>>>>> interrupt configurations.
>>>>
>>>> Though silently returning 0 is really the wrong thing to do. You can add the
>>>> warn, but why do you want to return success?
>>>
>>> Yes that would be the correct thing to do I agree. However, the problem
>>> is that if we do this, then after the patch "irqdomain: Don't set type
>>> when mapping an IRQ" is applied, we may break interrupts for some
>>> existing device-tree binaries that have bad configuration (such as omap4
>>> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it
>>> is a back compatibility issue.

Indeed (also for sh73a0 and r8a7779).

>> This sounds like a textbook case for adding a boolean dt property.  If
>> "can-set-ppi-type" is absent (old DT blobs and new blobs without the
>> ability), warn and return zero.  If it's present, the driver can set the
>> type, returning errors as encountered.
>
> True. However, if we did have this "can-set-ppi-type" property set for a
> device, it really should never fail (unless someone specified it
> incorrectly). So I am trying to understand the value in adding a new DT
> property.

Do we really want to add properties that basically indicate that a description
in DT is correct?

Alternatively, it can be fixed in the kernel in a DT quirk (if SoC == xxx then
fix TWD).

> Please note that gic_configure_irq() never used to return an error and
> only when adding support for setting the type of PPIs was this added.
> However, given that this has gone unnoticed and does not have a real
> functional impact on the device behaviour, I wonder now if this function
> should return an error? Yes, ideally, it should, but does it still make
> sense?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jon Hunter March 18, 2016, 9:54 a.m. UTC | #6
On 18/03/16 09:20, Geert Uytterhoeven wrote:
> On Thu, Mar 17, 2016 at 5:20 PM, Jon Hunter <jonathanh@nvidia.com> wrote:
>> On 17/03/16 15:18, Jason Cooper wrote:
>>> On Thu, Mar 17, 2016 at 03:04:01PM +0000, Jon Hunter wrote:
>>>> On 17/03/16 14:51, Thomas Gleixner wrote:
>>>>> On Thu, 17 Mar 2016, Jon Hunter wrote:
>>>>>> Setting the interrupt type for private peripheral interrupts (PPIs) may
>>>>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED
>>>>>> whether this is allowed. There is no way to know if setting the type is
>>>>>> supported for a given GIC and so the value written is read back to
>>>>>> verify it matches the desired configuration. If it does not match then
>>>>>> an error is return.
>>>>>>
>>>>>> There are cases where the interrupt configuration read from firmware
>>>>>> (such as a device-tree blob), has been incorrect and hence
>>>>>> gic_configure_irq() has returned an error. This error has gone
>>>>>> undetected because the error code returned was ignored but the interrupt
>>>>>> still worked fine because the configuration for the interrupt could not
>>>>>> be overwritten.
>>>>>>
>>>>>> Given that this has done undetected and we should only fail to set the
>>>>>> type for PPIs whose configuration cannot be changed anyway, don't return
>>>>>> an error and simply WARN if this fails. This will allows us to fix up any
>>>>>> places in the kernel where we should be checking the return status and
>>>>>> maintain back compatibility with firmware images that may have incorrect
>>>>>> interrupt configurations.
>>>>>
>>>>> Though silently returning 0 is really the wrong thing to do. You can add the
>>>>> warn, but why do you want to return success?
>>>>
>>>> Yes that would be the correct thing to do I agree. However, the problem
>>>> is that if we do this, then after the patch "irqdomain: Don't set type
>>>> when mapping an IRQ" is applied, we may break interrupts for some
>>>> existing device-tree binaries that have bad configuration (such as omap4
>>>> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it
>>>> is a back compatibility issue.
> 
> Indeed (also for sh73a0 and r8a7779).

Thanks. I was wondering if there are others. Do you know what the
correct setting should be? Ie. should it be IRQ_TYPE_EDGE_RISING as
well? I can then include this with OMAP and Tegra.

>>> This sounds like a textbook case for adding a boolean dt property.  If
>>> "can-set-ppi-type" is absent (old DT blobs and new blobs without the
>>> ability), warn and return zero.  If it's present, the driver can set the
>>> type, returning errors as encountered.
>>
>> True. However, if we did have this "can-set-ppi-type" property set for a
>> device, it really should never fail (unless someone specified it
>> incorrectly). So I am trying to understand the value in adding a new DT
>> property.
> 
> Do we really want to add properties that basically indicate that a description
> in DT is correct?
> 
> Alternatively, it can be fixed in the kernel in a DT quirk (if SoC == xxx then
> fix TWD).

I am not sure I fully understand your proposal, but please note that it
may not be just limited to the TWD (although this does appear to be the
one client that is wrong for a lot of SoCs). PPIs are also used for the
armv7/8 timers as well.

The problem is that we have a lot of SoCs with twd-timers and I have no
way to test all of these to know which could be a problem. So I thought
that warning would be a good first step to fixing them.

However, I am still trying to see the real value in returning an error
in this case. May be I am the only one with that perspective?

Cheers
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Geert Uytterhoeven March 18, 2016, 10:22 a.m. UTC | #7
Hi Jon,

On Fri, Mar 18, 2016 at 10:54 AM, Jon Hunter <jonathanh@nvidia.com> wrote:
> On 18/03/16 09:20, Geert Uytterhoeven wrote:
>> On Thu, Mar 17, 2016 at 5:20 PM, Jon Hunter <jonathanh@nvidia.com> wrote:
>>> On 17/03/16 15:18, Jason Cooper wrote:
>>>> On Thu, Mar 17, 2016 at 03:04:01PM +0000, Jon Hunter wrote:
>>>>> On 17/03/16 14:51, Thomas Gleixner wrote:
>>>>>> On Thu, 17 Mar 2016, Jon Hunter wrote:
>>>>>>> Setting the interrupt type for private peripheral interrupts (PPIs) may
>>>>>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED
>>>>>>> whether this is allowed. There is no way to know if setting the type is
>>>>>>> supported for a given GIC and so the value written is read back to
>>>>>>> verify it matches the desired configuration. If it does not match then
>>>>>>> an error is return.
>>>>>>>
>>>>>>> There are cases where the interrupt configuration read from firmware
>>>>>>> (such as a device-tree blob), has been incorrect and hence
>>>>>>> gic_configure_irq() has returned an error. This error has gone
>>>>>>> undetected because the error code returned was ignored but the interrupt
>>>>>>> still worked fine because the configuration for the interrupt could not
>>>>>>> be overwritten.
>>>>>>>
>>>>>>> Given that this has done undetected and we should only fail to set the
>>>>>>> type for PPIs whose configuration cannot be changed anyway, don't return
>>>>>>> an error and simply WARN if this fails. This will allows us to fix up any
>>>>>>> places in the kernel where we should be checking the return status and
>>>>>>> maintain back compatibility with firmware images that may have incorrect
>>>>>>> interrupt configurations.
>>>>>>
>>>>>> Though silently returning 0 is really the wrong thing to do. You can add the
>>>>>> warn, but why do you want to return success?
>>>>>
>>>>> Yes that would be the correct thing to do I agree. However, the problem
>>>>> is that if we do this, then after the patch "irqdomain: Don't set type
>>>>> when mapping an IRQ" is applied, we may break interrupts for some
>>>>> existing device-tree binaries that have bad configuration (such as omap4
>>>>> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it
>>>>> is a back compatibility issue.
>>
>> Indeed (also for sh73a0 and r8a7779).
>
> Thanks. I was wondering if there are others. Do you know what the
> correct setting should be? Ie. should it be IRQ_TYPE_EDGE_RISING as
> well? I can then include this with OMAP and Tegra.

The warnings went away when using IRQ_TYPE_EDGE_RISING.
Patches sent.

>>>> This sounds like a textbook case for adding a boolean dt property.  If
>>>> "can-set-ppi-type" is absent (old DT blobs and new blobs without the
>>>> ability), warn and return zero.  If it's present, the driver can set the
>>>> type, returning errors as encountered.
>>>
>>> True. However, if we did have this "can-set-ppi-type" property set for a
>>> device, it really should never fail (unless someone specified it
>>> incorrectly). So I am trying to understand the value in adding a new DT
>>> property.
>>
>> Do we really want to add properties that basically indicate that a description
>> in DT is correct?
>>
>> Alternatively, it can be fixed in the kernel in a DT quirk (if SoC == xxx then
>> fix TWD).
>
> I am not sure I fully understand your proposal, but please note that it

The kernel can modify the DT in early startup code if it detects that the
TWD's interrupt type is wrong.

> may not be just limited to the TWD (although this does appear to be the
> one client that is wrong for a lot of SoCs). PPIs are also used for the
> armv7/8 timers as well.

True.

> The problem is that we have a lot of SoCs with twd-timers and I have no
> way to test all of these to know which could be a problem. So I thought
> that warning would be a good first step to fixing them.

Definitely.

> However, I am still trying to see the real value in returning an error
> in this case. May be I am the only one with that perspective?

Given the above, we cannot start returning an error until all problems are
fixed.

Even after that, we have to care about stable DT ABIs. In this case it's not
really an ABI change, but a real buggy description in DTS.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Zyngier April 9, 2016, 10:58 a.m. UTC | #8
On Thu, 17 Mar 2016 15:04:01 +0000
Jon Hunter <jonathanh@nvidia.com> wrote:

> 
> On 17/03/16 14:51, Thomas Gleixner wrote:
> > On Thu, 17 Mar 2016, Jon Hunter wrote:
> > 
> >> Setting the interrupt type for private peripheral interrupts (PPIs) may
> >> not be supported by a given GIC because it is IMPLEMENTATION DEFINED
> >> whether this is allowed. There is no way to know if setting the type is
> >> supported for a given GIC and so the value written is read back to
> >> verify it matches the desired configuration. If it does not match then
> >> an error is return.
> >>
> >> There are cases where the interrupt configuration read from firmware
> >> (such as a device-tree blob), has been incorrect and hence
> >> gic_configure_irq() has returned an error. This error has gone
> >> undetected because the error code returned was ignored but the interrupt
> >> still worked fine because the configuration for the interrupt could not
> >> be overwritten.
> >>
> >> Given that this has done undetected and we should only fail to set the
> >> type for PPIs whose configuration cannot be changed anyway, don't return
> >> an error and simply WARN if this fails. This will allows us to fix up any
> >> places in the kernel where we should be checking the return status and
> >> maintain back compatibility with firmware images that may have incorrect
> >> interrupt configurations.
> > 
> > Though silently returning 0 is really the wrong thing to do. You can add the
> > warn, but why do you want to return success?
> 
> Yes that would be the correct thing to do I agree. However, the problem
> is that if we do this, then after the patch "irqdomain: Don't set type
> when mapping an IRQ" is applied, we may break interrupts for some
> existing device-tree binaries that have bad configuration (such as omap4
> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it
> is a back compatibility issue.
> 
> If you are wondering why these interrupts break after "irqdomain: Don't
> set type when mapping an IRQ", it is because today
> irq_create_fwspec_mapping() does not check the return code from setting
> the type, but if we defer setting the type until __setup_irq() which
> does check the return code, then all of a sudden interrupts that were
> working (even with bad configurations) start to fail.
> 
> The reason why I opted not to return an error code from
> gic_configure_irq() is it really can't fail. The failure being reported
> does not prevent the interrupt from working, but tells you your
> configuration does not match the hardware setting which you cannot
> overwrite.
> 
> So to maintain back compatibility and avoid any silent errors, I opted
> to make it a WARN and not return an error.
> 
> If people are ok with potentially breaking interrupts for device-tree
> binaries with bad settings, then I am ok to return an error here.

I think we need to phase things. Let's start with warning people for a
few kernel releases. Actively maintained platforms will quickly address
the issue (fixing their DT). As I see it, this issue seems rather
widespread (even kvmtool outputs a DT with the wrong triggering
information).

Once we've fixed the bulk of the platforms and virtual environments, we
can start thinking about making it fail harder.

Thanks,

	M.
Jon Hunter April 11, 2016, 3:31 p.m. UTC | #9
Hi Mark,

On 09/04/16 11:58, Marc Zyngier wrote:
> On Thu, 17 Mar 2016 15:04:01 +0000
> Jon Hunter <jonathanh@nvidia.com> wrote:
> 
>>
>> On 17/03/16 14:51, Thomas Gleixner wrote:
>>> On Thu, 17 Mar 2016, Jon Hunter wrote:
>>>
>>>> Setting the interrupt type for private peripheral interrupts (PPIs) may
>>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED
>>>> whether this is allowed. There is no way to know if setting the type is
>>>> supported for a given GIC and so the value written is read back to
>>>> verify it matches the desired configuration. If it does not match then
>>>> an error is return.
>>>>
>>>> There are cases where the interrupt configuration read from firmware
>>>> (such as a device-tree blob), has been incorrect and hence
>>>> gic_configure_irq() has returned an error. This error has gone
>>>> undetected because the error code returned was ignored but the interrupt
>>>> still worked fine because the configuration for the interrupt could not
>>>> be overwritten.
>>>>
>>>> Given that this has done undetected and we should only fail to set the
>>>> type for PPIs whose configuration cannot be changed anyway, don't return
>>>> an error and simply WARN if this fails. This will allows us to fix up any
>>>> places in the kernel where we should be checking the return status and
>>>> maintain back compatibility with firmware images that may have incorrect
>>>> interrupt configurations.
>>>
>>> Though silently returning 0 is really the wrong thing to do. You can add the
>>> warn, but why do you want to return success?
>>
>> Yes that would be the correct thing to do I agree. However, the problem
>> is that if we do this, then after the patch "irqdomain: Don't set type
>> when mapping an IRQ" is applied, we may break interrupts for some
>> existing device-tree binaries that have bad configuration (such as omap4
>> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it
>> is a back compatibility issue.
>>
>> If you are wondering why these interrupts break after "irqdomain: Don't
>> set type when mapping an IRQ", it is because today
>> irq_create_fwspec_mapping() does not check the return code from setting
>> the type, but if we defer setting the type until __setup_irq() which
>> does check the return code, then all of a sudden interrupts that were
>> working (even with bad configurations) start to fail.
>>
>> The reason why I opted not to return an error code from
>> gic_configure_irq() is it really can't fail. The failure being reported
>> does not prevent the interrupt from working, but tells you your
>> configuration does not match the hardware setting which you cannot
>> overwrite.
>>
>> So to maintain back compatibility and avoid any silent errors, I opted
>> to make it a WARN and not return an error.
>>
>> If people are ok with potentially breaking interrupts for device-tree
>> binaries with bad settings, then I am ok to return an error here.
> 
> I think we need to phase things. Let's start with warning people for a
> few kernel releases. Actively maintained platforms will quickly address
> the issue (fixing their DT). As I see it, this issue seems rather
> widespread (even kvmtool outputs a DT with the wrong triggering
> information).
> 
> Once we've fixed the bulk of the platforms and virtual environments, we
> can start thinking about making it fail harder.

Ok, so are you OK with this patch as-is? If so, can I add your ACK?

Cheers
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Zyngier April 11, 2016, 3:39 p.m. UTC | #10
On 11/04/16 16:31, Jon Hunter wrote:
> Hi Mark,
> 
> On 09/04/16 11:58, Marc Zyngier wrote:
>> On Thu, 17 Mar 2016 15:04:01 +0000
>> Jon Hunter <jonathanh@nvidia.com> wrote:
>>
>>>
>>> On 17/03/16 14:51, Thomas Gleixner wrote:
>>>> On Thu, 17 Mar 2016, Jon Hunter wrote:
>>>>
>>>>> Setting the interrupt type for private peripheral interrupts (PPIs) may
>>>>> not be supported by a given GIC because it is IMPLEMENTATION DEFINED
>>>>> whether this is allowed. There is no way to know if setting the type is
>>>>> supported for a given GIC and so the value written is read back to
>>>>> verify it matches the desired configuration. If it does not match then
>>>>> an error is return.
>>>>>
>>>>> There are cases where the interrupt configuration read from firmware
>>>>> (such as a device-tree blob), has been incorrect and hence
>>>>> gic_configure_irq() has returned an error. This error has gone
>>>>> undetected because the error code returned was ignored but the interrupt
>>>>> still worked fine because the configuration for the interrupt could not
>>>>> be overwritten.
>>>>>
>>>>> Given that this has done undetected and we should only fail to set the
>>>>> type for PPIs whose configuration cannot be changed anyway, don't return
>>>>> an error and simply WARN if this fails. This will allows us to fix up any
>>>>> places in the kernel where we should be checking the return status and
>>>>> maintain back compatibility with firmware images that may have incorrect
>>>>> interrupt configurations.
>>>>
>>>> Though silently returning 0 is really the wrong thing to do. You can add the
>>>> warn, but why do you want to return success?
>>>
>>> Yes that would be the correct thing to do I agree. However, the problem
>>> is that if we do this, then after the patch "irqdomain: Don't set type
>>> when mapping an IRQ" is applied, we may break interrupts for some
>>> existing device-tree binaries that have bad configuration (such as omap4
>>> and tegra20/30 ... see patches 1 and 2) that have gone unnoticed. So it
>>> is a back compatibility issue.
>>>
>>> If you are wondering why these interrupts break after "irqdomain: Don't
>>> set type when mapping an IRQ", it is because today
>>> irq_create_fwspec_mapping() does not check the return code from setting
>>> the type, but if we defer setting the type until __setup_irq() which
>>> does check the return code, then all of a sudden interrupts that were
>>> working (even with bad configurations) start to fail.
>>>
>>> The reason why I opted not to return an error code from
>>> gic_configure_irq() is it really can't fail. The failure being reported
>>> does not prevent the interrupt from working, but tells you your
>>> configuration does not match the hardware setting which you cannot
>>> overwrite.
>>>
>>> So to maintain back compatibility and avoid any silent errors, I opted
>>> to make it a WARN and not return an error.
>>>
>>> If people are ok with potentially breaking interrupts for device-tree
>>> binaries with bad settings, then I am ok to return an error here.
>>
>> I think we need to phase things. Let's start with warning people for a
>> few kernel releases. Actively maintained platforms will quickly address
>> the issue (fixing their DT). As I see it, this issue seems rather
>> widespread (even kvmtool outputs a DT with the wrong triggering
>> information).
>>
>> Once we've fixed the bulk of the platforms and virtual environments, we
>> can start thinking about making it fail harder.
> 
> Ok, so are you OK with this patch as-is? If so, can I add your ACK?

It depends where you plan to handle the error. Ideally, I'd keep on
returning the error (because that's the right thing to do), and move the
WARN_ON() into the core code. We'd keep on ignoring the error as we're
doing today, but we'd scream about it.

After a couple of releases, we'd turn the WARN_ON into a hard fail.

Thoughts?

	M.
Jon Hunter April 12, 2016, 8:50 a.m. UTC | #11
On 11/04/16 16:39, Marc Zyngier wrote:
> On 11/04/16 16:31, Jon Hunter wrote:
>> On 09/04/16 11:58, Marc Zyngier wrote:

[snip]

>>> I think we need to phase things. Let's start with warning people for a
>>> few kernel releases. Actively maintained platforms will quickly address
>>> the issue (fixing their DT). As I see it, this issue seems rather
>>> widespread (even kvmtool outputs a DT with the wrong triggering
>>> information).
>>>
>>> Once we've fixed the bulk of the platforms and virtual environments, we
>>> can start thinking about making it fail harder.
>>
>> Ok, so are you OK with this patch as-is? If so, can I add your ACK?
> 
> It depends where you plan to handle the error. Ideally, I'd keep on
> returning the error (because that's the right thing to do), and move the
> WARN_ON() into the core code. We'd keep on ignoring the error as we're
> doing today, but we'd scream about it.
> 
> After a couple of releases, we'd turn the WARN_ON into a hard fail.
> 
> Thoughts?

I agree that would be best/ideal, but looking at it, I don't believe it
is possible and this is why I have not done that so far.

If we were to add the WARN to the core code, then we would need to add a
warning everywhere __irq_set_trigger() is called. One of the places it
is called today is from __setup_irq() and today this does the right
thing and handle any error returned. The problem is that in
irq_create_fwspec_mapping() we have never checked the return code from
irq_set_irq_type() (which calls __irq_set_trigger()) or attempted to
handle any errors. So the problem is that depending on the path through
which the type is programmed, errors may or may not be detected. This is
the actual headache :-(

Given that this problem so far only pertains to GIC PPI interrupts and
that it is a not a catastrophic error (interrupts still work fine), I
was thinking we add the warning to the GIC driver.

May be a less severe change would be to only return an error if
configuring an SPI fails and if it is a PPI then simply WARN and
carry-on as we assume we cannot change it.

I hope this summarises the issue a bit further.

Cheers
Jon




--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Zyngier April 12, 2016, 10:14 a.m. UTC | #12
On 12/04/16 09:50, Jon Hunter wrote:
> 
> On 11/04/16 16:39, Marc Zyngier wrote:
>> On 11/04/16 16:31, Jon Hunter wrote:
>>> On 09/04/16 11:58, Marc Zyngier wrote:
> 
> [snip]
> 
>>>> I think we need to phase things. Let's start with warning people for a
>>>> few kernel releases. Actively maintained platforms will quickly address
>>>> the issue (fixing their DT). As I see it, this issue seems rather
>>>> widespread (even kvmtool outputs a DT with the wrong triggering
>>>> information).
>>>>
>>>> Once we've fixed the bulk of the platforms and virtual environments, we
>>>> can start thinking about making it fail harder.
>>>
>>> Ok, so are you OK with this patch as-is? If so, can I add your ACK?
>>
>> It depends where you plan to handle the error. Ideally, I'd keep on
>> returning the error (because that's the right thing to do), and move the
>> WARN_ON() into the core code. We'd keep on ignoring the error as we're
>> doing today, but we'd scream about it.
>>
>> After a couple of releases, we'd turn the WARN_ON into a hard fail.
>>
>> Thoughts?
> 
> I agree that would be best/ideal, but looking at it, I don't believe it
> is possible and this is why I have not done that so far.
> 
> If we were to add the WARN to the core code, then we would need to add a
> warning everywhere __irq_set_trigger() is called. One of the places it
> is called today is from __setup_irq() and today this does the right
> thing and handle any error returned. The problem is that in
> irq_create_fwspec_mapping() we have never checked the return code from
> irq_set_irq_type() (which calls __irq_set_trigger()) or attempted to
> handle any errors. So the problem is that depending on the path through
> which the type is programmed, errors may or may not be detected. This is
> the actual headache :-(
> 
> Given that this problem so far only pertains to GIC PPI interrupts and
> that it is a not a catastrophic error (interrupts still work fine), I
> was thinking we add the warning to the GIC driver.
> 
> May be a less severe change would be to only return an error if
> configuring an SPI fails and if it is a PPI then simply WARN and
> carry-on as we assume we cannot change it.

I'd take that. Limiting it to PPIs would be a minimal change, and the
warning would hopefully make people realize their DT is wrong. Failing
to program an SPI is really not expected, and should definitely explode.

Thanks,

	M.
diff mbox

Patch

diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
index ffff5a45f1e3..423a345d7b75 100644
--- a/drivers/irqchip/irq-gic-common.c
+++ b/drivers/irqchip/irq-gic-common.c
@@ -32,13 +32,12 @@  void gic_enable_quirks(u32 iidr, const struct gic_quirk *quirks,
 	}
 }
 
-int gic_configure_irq(unsigned int irq, unsigned int type,
+void gic_configure_irq(unsigned int irq, unsigned int type,
 		       void __iomem *base, void (*sync_access)(void))
 {
 	u32 confmask = 0x2 << ((irq % 16) * 2);
 	u32 confoff = (irq / 16) * 4;
 	u32 val, oldval;
-	int ret = 0;
 
 	/*
 	 * Read current configuration register, and insert the config
@@ -52,21 +51,17 @@  int gic_configure_irq(unsigned int irq, unsigned int type,
 
 	/* If the current configuration is the same, then we are done */
 	if (val == oldval)
-		return 0;
+		return;
 
 	/*
 	 * Write back the new configuration, and possibly re-enable
-	 * the interrupt. If we fail to write a new configuration,
-	 * return an error.
+	 * the interrupt. WARN if we fail to write a new configuration.
 	 */
 	writel_relaxed(val, base + GIC_DIST_CONFIG + confoff);
-	if (readl_relaxed(base + GIC_DIST_CONFIG + confoff) != val)
-		ret = -EINVAL;
+	WARN_ON(readl_relaxed(base + GIC_DIST_CONFIG + confoff) != val);
 
 	if (sync_access)
 		sync_access();
-
-	return ret;
 }
 
 void __init gic_dist_config(void __iomem *base, int gic_irqs,
diff --git a/drivers/irqchip/irq-gic-common.h b/drivers/irqchip/irq-gic-common.h
index fff697db8e22..73dee3bc6bba 100644
--- a/drivers/irqchip/irq-gic-common.h
+++ b/drivers/irqchip/irq-gic-common.h
@@ -27,7 +27,7 @@  struct gic_quirk {
 	u32 mask;
 };
 
-int gic_configure_irq(unsigned int irq, unsigned int type,
+void gic_configure_irq(unsigned int irq, unsigned int type,
                        void __iomem *base, void (*sync_access)(void));
 void gic_dist_config(void __iomem *base, int gic_irqs,
 		     void (*sync_access)(void));
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 5b7d3c2129d8..c569c466fa31 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -310,7 +310,9 @@  static int gic_set_type(struct irq_data *d, unsigned int type)
 		rwp_wait = gic_dist_wait_for_rwp;
 	}
 
-	return gic_configure_irq(irq, type, base, rwp_wait);
+	gic_configure_irq(irq, type, base, rwp_wait);
+
+	return 0;
 }
 
 static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 282344b95ec2..6c555a2c5315 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -279,7 +279,9 @@  static int gic_set_type(struct irq_data *d, unsigned int type)
 			    type != IRQ_TYPE_EDGE_RISING)
 		return -EINVAL;
 
-	return gic_configure_irq(gicirq, type, base, NULL);
+	gic_configure_irq(gicirq, type, base, NULL);
+
+	return 0;
 }
 
 static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
diff --git a/drivers/irqchip/irq-hip04.c b/drivers/irqchip/irq-hip04.c
index 9688d2e2a636..12be5906e91a 100644
--- a/drivers/irqchip/irq-hip04.c
+++ b/drivers/irqchip/irq-hip04.c
@@ -120,7 +120,6 @@  static int hip04_irq_set_type(struct irq_data *d, unsigned int type)
 {
 	void __iomem *base = hip04_dist_base(d);
 	unsigned int irq = hip04_irq(d);
-	int ret;
 
 	/* Interrupt configuration for SGIs can't be changed */
 	if (irq < 16)
@@ -133,11 +132,11 @@  static int hip04_irq_set_type(struct irq_data *d, unsigned int type)
 
 	raw_spin_lock(&irq_controller_lock);
 
-	ret = gic_configure_irq(irq, type, base, NULL);
+	gic_configure_irq(irq, type, base, NULL);
 
 	raw_spin_unlock(&irq_controller_lock);
 
-	return ret;
+	return 0;
 }
 
 #ifdef CONFIG_SMP