diff mbox series

arch: arm: vgic-v3: fix GICD_ISACTIVER range

Message ID 20191107033709.17575-1-peng.fan@nxp.com (mailing list archive)
State Superseded
Headers show
Series arch: arm: vgic-v3: fix GICD_ISACTIVER range | expand

Commit Message

Peng Fan Nov. 7, 2019, 3:19 a.m. UTC
The end should be GICD_ISACTIVERN not GICD_ISACTIVER.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
---
 xen/arch/arm/vgic-v3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Stefano Stabellini Nov. 8, 2019, 7:24 p.m. UTC | #1
On Thu, 7 Nov 2019, Peng Fan wrote:
> The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
> 
> Signed-off-by: Peng Fan <peng.fan@nxp.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

Juergen, I think this fix should be in the release (and also
backported to stable trees.)



> ---
>  xen/arch/arm/vgic-v3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> index 422b94f902..e802f2055a 100644
> --- a/xen/arch/arm/vgic-v3.c
> +++ b/xen/arch/arm/vgic-v3.c
> @@ -706,7 +706,7 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v,
>          goto read_as_zero;
>  
>      /* Read the active status of an IRQ via GICD/GICR is not supported */
> -    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVER):
> +    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN):
>      case VRANGE32(GICD_ICACTIVER, GICD_ICACTIVERN):
>          goto read_as_zero;
>  
> -- 
> 2.16.4
>
Julien Grall Nov. 8, 2019, 10:44 p.m. UTC | #2
Hi,

Sorry for the formatting.

On Sat, 9 Nov 2019, 04:27 Stefano Stabellini, <sstabellini@kernel.org>
wrote:

> On Thu, 7 Nov 2019, Peng Fan wrote:
> > The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
> >
> > Signed-off-by: Peng Fan <peng.fan@nxp.com>
>
> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>

To be honest, I am not sure the code is correct. A read to those registers
should tell you the list of interrupts active. As we always return 0, this
will not return the correct state of the GIC.

I know that returning the list of actives interrupts is complicated with
the old vGIC, but I don't think silently ignoring it is a good idea.

The question here is why the guest accessed those registers? What is it
trying to figure out?



> Juergen, I think this fix should be in the release (and also
> backported to stable trees.)
>

Without an understanding of the problem, I disagree with this request (see
above).

As an aside, the range ISPENDR  has the same issue.

Cheers,




>
>
> > ---
> >  xen/arch/arm/vgic-v3.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> > index 422b94f902..e802f2055a 100644
> > --- a/xen/arch/arm/vgic-v3.c
> > +++ b/xen/arch/arm/vgic-v3.c
> > @@ -706,7 +706,7 @@ static int __vgic_v3_distr_common_mmio_read(const
> char *name, struct vcpu *v,
> >          goto read_as_zero;
> >
> >      /* Read the active status of an IRQ via GICD/GICR is not supported
> */
> > -    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVER):
> > +    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN):
> >      case VRANGE32(GICD_ICACTIVER, GICD_ICACTIVERN):
> >          goto read_as_zero;
> >
> > --
> > 2.16.4
> >
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
Stefano Stabellini Nov. 11, 2019, 7:01 p.m. UTC | #3
On Sat, 9 Nov 2019, Julien Grall wrote:
> On Sat, 9 Nov 2019, 04:27 Stefano Stabellini, <sstabellini@kernel.org> wrote:
>       On Thu, 7 Nov 2019, Peng Fan wrote:
>       > The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
>       >
>       > Signed-off-by: Peng Fan <peng.fan@nxp.com>
> 
>       Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> 
> 
> To be honest, I am not sure the code is correct. A read to those registers should tell you the list of interrupts active. As we always
> return 0, this will not return the correct state of the GIC.
> 
> I know that returning the list of actives interrupts is complicated with the old vGIC, but I don't think silently ignoring it is a good
> idea.
> The question here is why the guest accessed those registers? What is it trying to figure out?

We are not going to solve the general problem at this stage. At the
moment the code:

- ignore the first register only
- print an error and return an IO_ABORT error for the other regs

For the inconsistency alone the second option is undesirable. Also it
doesn't match the write implementation, which does the same thing for
all the GICD_ISACTIVER* regs instead of having a special treatment for
the first one only. It looks like a typo in the original patch to me.

The proposed patch switches the behavior to:

- silently ignore all the GICD_ISACTIVER* regs (as proposed)

is an improvement.


>       Juergen, I think this fix should be in the release (and also
>       backported to stable trees.)
> 
> 
> Without an understanding of the problem, I disagree with this request (see above).
> 
> As an aside, the range ISPENDR  has the same issue.

You meant GICD_ICPENDR, right? Yep, that one is suffering from the same
typo mistake too.

 
>       > ---
>       >  xen/arch/arm/vgic-v3.c | 2 +-
>       >  1 file changed, 1 insertion(+), 1 deletion(-)
>       >
>       > diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
>       > index 422b94f902..e802f2055a 100644
>       > --- a/xen/arch/arm/vgic-v3.c
>       > +++ b/xen/arch/arm/vgic-v3.c
>       > @@ -706,7 +706,7 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v,
>       >          goto read_as_zero;
>       > 
>       >      /* Read the active status of an IRQ via GICD/GICR is not supported */
>       > -    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVER):
>       > +    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN):
>       >      case VRANGE32(GICD_ICACTIVER, GICD_ICACTIVERN):
>       >          goto read_as_zero;
>       > 
>       > --
>       > 2.16.4
>       >
> 
>       _______________________________________________
>       Xen-devel mailing list
>       Xen-devel@lists.xenproject.org
>       https://lists.xenproject.org/mailman/listinfo/xen-devel
> 
> 
>
Peng Fan Nov. 12, 2019, 4:42 a.m. UTC | #4
Hi Julien,

Inline marked with [Peng Fan]

From: Julien Grall <julien.grall.oss@gmail.com> 
Sent: 2019年11月9日 6:44
To: Stefano Stabellini <sstabellini@kernel.org>; Andre Przywara <andre.przywara@arm.com>
Cc: Peng Fan <peng.fan@nxp.com>; Jürgen Groß <jgross@suse.com>; julien.grall@arm.com; xen-devel@lists.xen.org
Subject: Re: [Xen-devel] [PATCH] arch: arm: vgic-v3: fix GICD_ISACTIVER range

Hi,

Sorry for the formatting.
On Sat, 9 Nov 2019, 04:27 Stefano Stabellini, <mailto:sstabellini@kernel.org> wrote:
On Thu, 7 Nov 2019, Peng Fan wrote:
> The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
> 
> Signed-off-by: Peng Fan <mailto:peng.fan@nxp.com>

Reviewed-by: Stefano Stabellini <mailto:sstabellini@kernel.org>

To be honest, I am not sure the code is correct. A read to those registers should tell you the list of interrupts active. As we always return 0, this will not return the correct state of the GIC.

I know that returning the list of actives interrupts is complicated with the old vGIC, but I don't think silently ignoring it is a good idea.

The question here is why the guest accessed those registers? What is it trying to figure out?

[Peng Fan] I am running Linux 5.4 kernel dom0, gic_peek_irq triggers abort.



Juergen, I think this fix should be in the release (and also
backported to stable trees.)

Without an understanding of the problem, I disagree with this request (see above).

As an aside, the range ISPENDR  has the same issue.

[Peng Fan] Should I include this change in v2? Or develop new method to fix the issue?
But at least dom0 abort when boot.

Thanks,
Peng.

Cheers,






> ---
>  xen/arch/arm/vgic-v3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> index 422b94f902..e802f2055a 100644
> --- a/xen/arch/arm/vgic-v3.c
> +++ b/xen/arch/arm/vgic-v3.c
> @@ -706,7 +706,7 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v,
>          goto read_as_zero;

>      /* Read the active status of an IRQ via GICD/GICR is not supported */
> -    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVER):
> +    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN):
>      case VRANGE32(GICD_ICACTIVER, GICD_ICACTIVERN):
>          goto read_as_zero;

> -- 
> 2.16.4
>
Andre Przywara Nov. 12, 2019, 12:46 p.m. UTC | #5
On Mon, 11 Nov 2019 11:01:07 -0800 (PST)
Stefano Stabellini <sstabellini@kernel.org> wrote:

Hi,

> On Sat, 9 Nov 2019, Julien Grall wrote:
> > On Sat, 9 Nov 2019, 04:27 Stefano Stabellini, <sstabellini@kernel.org> wrote:
> >       On Thu, 7 Nov 2019, Peng Fan wrote:  
> >       > The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
> >       >
> >       > Signed-off-by: Peng Fan <peng.fan@nxp.com>  
> > 
> >       Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> > 
> > 
> > To be honest, I am not sure the code is correct. A read to those registers should tell you the list of interrupts active. As we always
> > return 0, this will not return the correct state of the GIC.
> > 
> > I know that returning the list of actives interrupts is complicated with the old vGIC, but I don't think silently ignoring it is a good
> > idea.
> > The question here is why the guest accessed those registers? What is it trying to figure out?

I see Linux querying the active state (IRQCHIP_STATE_ACTIVE) at two relevant points for ARM:
- In kernel/irq/manage.c, in __synchronize_hardirq().
- In KVM's arch timer emulation code.

I think the latter is of no concern (yet), but the first might actually trigger. At the moment it's beyond me what it actually does, but maybe some IRQ changes (RT, threaded IRQs?) trigger this now?
 
> We are not going to solve the general problem at this stage. At the
> moment the code:
> 
> - ignore the first register only
> - print an error and return an IO_ABORT error for the other regs
> 
> For the inconsistency alone the second option is undesirable. Also it
> doesn't match the write implementation, which does the same thing for
> all the GICD_ISACTIVER* regs instead of having a special treatment for
> the first one only. It looks like a typo in the original patch to me.
> 
> The proposed patch switches the behavior to:
> 
> - silently ignore all the GICD_ISACTIVER* regs (as proposed)
> 
> is an improvement.

Yeah, I agree. Getting the actual active state of a virtual IRQ is at least expensive, as you have to bring all VCPUs back, to sync the LR content back into something Xen can access.
This is an architectural property of the GIC virtualisation, as normally the acknowledge happens without exiting, also the EOI, so Xen cannot know which state an IRQ is in while the VCPU is running. Think: Schrödinger ;-)

Regarding this patch: The original code looks indeed like a typo to me: On the read side both ISACTIVERx and ICACTIVERx behave the same, so they should be handled the same.
And returning 0 is probably the best approximation we can do at the moment. The other solution is to add GICv3 support to the new VGIC ;-), as this at least solves the case when we deliberately inject an active IRQ. We could extend this logic to find out which IRQs in this block *could* possibly be active, then bring those VCPUs back to Xen.

Cheers,
Andre.

> >       Juergen, I think this fix should be in the release (and also
> >       backported to stable trees.)
> > 
> > 
> > Without an understanding of the problem, I disagree with this request (see above).
> > 
> > As an aside, the range ISPENDR  has the same issue.  
> 
> You meant GICD_ICPENDR, right? Yep, that one is suffering from the same
> typo mistake too.
> 
>  
> >       > ---
> >       >  xen/arch/arm/vgic-v3.c | 2 +-
> >       >  1 file changed, 1 insertion(+), 1 deletion(-)
> >       >
> >       > diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> >       > index 422b94f902..e802f2055a 100644
> >       > --- a/xen/arch/arm/vgic-v3.c
> >       > +++ b/xen/arch/arm/vgic-v3.c
> >       > @@ -706,7 +706,7 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v,
> >       >          goto read_as_zero;
> >       > 
> >       >      /* Read the active status of an IRQ via GICD/GICR is not supported */
> >       > -    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVER):
> >       > +    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN):
> >       >      case VRANGE32(GICD_ICACTIVER, GICD_ICACTIVERN):
> >       >          goto read_as_zero;
> >       > 
> >       > --
> >       > 2.16.4
> >       >  
> > 
> >       _______________________________________________
> >       Xen-devel mailing list
> >       Xen-devel@lists.xenproject.org
> >       https://lists.xenproject.org/mailman/listinfo/xen-devel
> > 
> > 
> >
Stefano Stabellini Nov. 12, 2019, 5:35 p.m. UTC | #6
On Tue, 12 Nov 2019, Peng Fan wrote:
> Hi Julien,
> 
> Inline marked with [Peng Fan]

Please use plain text emails on xen-devel (and other open source
development mailing lists.)


> From: Julien Grall <julien.grall.oss@gmail.com> 
> Sent: 2019年11月9日 6:44
> To: Stefano Stabellini <sstabellini@kernel.org>; Andre Przywara <andre.przywara@arm.com>
> Cc: Peng Fan <peng.fan@nxp.com>; Jürgen Groß <jgross@suse.com>; julien.grall@arm.com; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH] arch: arm: vgic-v3: fix GICD_ISACTIVER range
> 
> Hi,
> 
> Sorry for the formatting.
> On Sat, 9 Nov 2019, 04:27 Stefano Stabellini, <mailto:sstabellini@kernel.org> wrote:
> On Thu, 7 Nov 2019, Peng Fan wrote:
> > The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
> > 
> > Signed-off-by: Peng Fan <mailto:peng.fan@nxp.com>
> 
> Reviewed-by: Stefano Stabellini <mailto:sstabellini@kernel.org>
> 
> To be honest, I am not sure the code is correct. A read to those registers should tell you the list of interrupts active. As we always return 0, this will not return the correct state of the GIC.
> 
> I know that returning the list of actives interrupts is complicated with the old vGIC, but I don't think silently ignoring it is a good idea.
> 
> The question here is why the guest accessed those registers? What is it trying to figure out?
> 
> [Peng Fan] I am running Linux 5.4 kernel dom0, gic_peek_irq triggers abort.
> 
> 
> 
> Juergen, I think this fix should be in the release (and also
> backported to stable trees.)
> 
> Without an understanding of the problem, I disagree with this request (see above).
> 
> As an aside, the range ISPENDR  has the same issue.
> 
> [Peng Fan] Should I include this change in v2? Or develop new method to fix the issue?
> But at least dom0 abort when boot.

Also considering Andre's reply, yes, please send another patch to fix
for ISPENDR too. It doesn't have to be the same patch.

Thank you!
 
 
> 
> 
> 
> > ---
> >  xen/arch/arm/vgic-v3.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> > index 422b94f902..e802f2055a 100644
> > --- a/xen/arch/arm/vgic-v3.c
> > +++ b/xen/arch/arm/vgic-v3.c
> > @@ -706,7 +706,7 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v,
> >          goto read_as_zero;
> >  
> >      /* Read the active status of an IRQ via GICD/GICR is not supported */
> > -    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVER):
> > +    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN):
> >      case VRANGE32(GICD_ICACTIVER, GICD_ICACTIVERN):
> >          goto read_as_zero;
> >  
> > -- 
> > 2.16.4
> > 
> 
> _______________________________________________
> Xen-devel mailing list
> mailto:Xen-devel@lists.xenproject.org
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.xenproject.org%2Fmailman%2Flistinfo%2Fxen-devel&data=02%7C01%7Cpeng.fan%40nxp.com%7C33f2e907cdc84ed0a48608d7649d359e%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637088498678782239&sdata=G3FA2vefr56FeUX5QVZQwSzG22nfv1m%2F0fKIDOnfuFQ%3D&reserved=0
>
Julien Grall Nov. 13, 2019, 1:08 a.m. UTC | #7
On Tue, 12 Nov 2019, 04:01 Stefano Stabellini, <sstabellini@kernel.org>
wrote:

> On Sat, 9 Nov 2019, Julien Grall wrote:
> > On Sat, 9 Nov 2019, 04:27 Stefano Stabellini, <sstabellini@kernel.org>
> wrote:
> >       On Thu, 7 Nov 2019, Peng Fan wrote:
> >       > The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
> >       >
> >       > Signed-off-by: Peng Fan <peng.fan@nxp.com>
> >
> >       Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> >
> >
> > To be honest, I am not sure the code is correct. A read to those
> registers should tell you the list of interrupts active. As we always
> > return 0, this will not return the correct state of the GIC.
> >
> > I know that returning the list of actives interrupts is complicated with
> the old vGIC, but I don't think silently ignoring it is a good
> > idea.
> > The question here is why the guest accessed those registers? What is it
> trying to figure out?
>
> We are not going to solve the general problem at this stage. At the
> moment the code:
>
> - ignore the first register only
> - print an error and return an IO_ABORT error for the other regs
>
> For the inconsistency alone the second option is undesirable. Also it
> doesn't match the write implementation, which does the same thing for
> all the GICD_ISACTIVER* regs instead of having a special treatment for
> the first one only. It looks like a typo in the original patch to me.
>
> The proposed patch switches the behavior to:
>
> - silently ignore all the GICD_ISACTIVER* regs (as proposed)


> is an improvement.
>

Peng mentioned that Linux is accessing it, so the worst thing we can do is
lying to the guest (as you suggest here). I would definitely not call that
an improvement.

In the current state, it is a Nack. If there were a warning, then I would
be more inclined to see this patch going through.

Cheers,
Andre Przywara Nov. 13, 2019, 1:58 a.m. UTC | #8
On 13/11/2019 01:08, Julien Grall wrote:

Hi,

> On Tue, 12 Nov 2019, 04:01 Stefano Stabellini, <sstabellini@kernel.org
> <mailto:sstabellini@kernel.org>> wrote:
> 
>     On Sat, 9 Nov 2019, Julien Grall wrote:
>     > On Sat, 9 Nov 2019, 04:27 Stefano Stabellini,
>     <sstabellini@kernel.org <mailto:sstabellini@kernel.org>> wrote:
>     >       On Thu, 7 Nov 2019, Peng Fan wrote:
>     >       > The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
>     >       >
>     >       > Signed-off-by: Peng Fan <peng.fan@nxp.com
>     <mailto:peng.fan@nxp.com>>
>     >
>     >       Reviewed-by: Stefano Stabellini <sstabellini@kernel.org
>     <mailto:sstabellini@kernel.org>>
>     >
>     >
>     > To be honest, I am not sure the code is correct. A read to those
>     registers should tell you the list of interrupts active. As we always
>     > return 0, this will not return the correct state of the GIC.
>     >
>     > I know that returning the list of actives interrupts is
>     complicated with the old vGIC, but I don't think silently ignoring
>     it is a good
>     > idea.
>     > The question here is why the guest accessed those registers? What
>     is it trying to figure out?
> 
>     We are not going to solve the general problem at this stage. At the
>     moment the code:
> 
>     - ignore the first register only
>     - print an error and return an IO_ABORT error for the other regs
> 
>     For the inconsistency alone the second option is undesirable. Also it
>     doesn't match the write implementation, which does the same thing for
>     all the GICD_ISACTIVER* regs instead of having a special treatment for
>     the first one only. It looks like a typo in the original patch to me.
> 
>     The proposed patch switches the behavior to:
> 
>     - silently ignore all the GICD_ISACTIVER* regs (as proposed)
> 
> 
>     is an improvement.
> 
> 
> Peng mentioned that Linux is accessing it, so the worst thing we can do
> is lying to the guest (as you suggest here). I would definitely not call
> that an improvement.

The ISACTIVER range is wrong in the description, it covers only one
register, not multiple. This is obviously a typo, since it's correct in
both GICv2 and in the high level switch/case in GICv3. Reading from
outside of any range will inject an abort into the guest, which runs in
kernel space. This will probably result in a guest crash. I would
consider not crashing an improvement.

About "lying" to the guest: Typically an IRQ is just active for a very
short time, so 0 is a very good answer, actually. The old VGIC in KVM
did exactly the same:
        vgic_reg_access(mmio, NULL, offset,
                        ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);

The proper solution would be:
1) Track the state of the active bit when we can observe it, so when the
guest exits with an active IRQ. The new VGIC does that.
2) Kick out all VCPUs that have IRQs in that given rank, and sample the
active bit from the LRs. Sounds pretty horrible, and chances are very
high you will get all 0s there.

So if I compare "fix those two typos and preserve the state that the Xen
VGIC has been in for years" to "create a lot of racy code for a rare
corner case", the first one surely wins.
That doesn't mean we should never try it, but surely this fix needs to
go in meanwhile.

> In the current state, it is a Nack. If there were a warning, then I
> would be more inclined to see this patch going through.

Do you mean a warning that we are about to lie to the guest? That sounds
pretty useless, since nobody can do anything about it. Plus we have
already those warnings on writing to these registers, and this always
confuses people and triggered pointless bug reports.

I think the old VGIC has bigger problems ;-)

Cheers,
Andre
Julien Grall Nov. 13, 2019, 2:03 a.m. UTC | #9
On Wed, 13 Nov 2019, 10:55 André Przywara, <andre.przywara@arm.com> wrote:

> On 13/11/2019 01:08, Julien Grall wrote:
>
> Hi,
>
> > On Tue, 12 Nov 2019, 04:01 Stefano Stabellini, <sstabellini@kernel.org
> > <mailto:sstabellini@kernel.org>> wrote:
> >
> >     On Sat, 9 Nov 2019, Julien Grall wrote:
> >     > On Sat, 9 Nov 2019, 04:27 Stefano Stabellini,
> >     <sstabellini@kernel.org <mailto:sstabellini@kernel.org>> wrote:
> >     >       On Thu, 7 Nov 2019, Peng Fan wrote:
> >     >       > The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
> >     >       >
> >     >       > Signed-off-by: Peng Fan <peng.fan@nxp.com
> >     <mailto:peng.fan@nxp.com>>
> >     >
> >     >       Reviewed-by: Stefano Stabellini <sstabellini@kernel.org
> >     <mailto:sstabellini@kernel.org>>
> >     >
> >     >
> >     > To be honest, I am not sure the code is correct. A read to those
> >     registers should tell you the list of interrupts active. As we always
> >     > return 0, this will not return the correct state of the GIC.
> >     >
> >     > I know that returning the list of actives interrupts is
> >     complicated with the old vGIC, but I don't think silently ignoring
> >     it is a good
> >     > idea.
> >     > The question here is why the guest accessed those registers? What
> >     is it trying to figure out?
> >
> >     We are not going to solve the general problem at this stage. At the
> >     moment the code:
> >
> >     - ignore the first register only
> >     - print an error and return an IO_ABORT error for the other regs
> >
> >     For the inconsistency alone the second option is undesirable. Also it
> >     doesn't match the write implementation, which does the same thing for
> >     all the GICD_ISACTIVER* regs instead of having a special treatment
> for
> >     the first one only. It looks like a typo in the original patch to me.
> >
> >     The proposed patch switches the behavior to:
> >
> >     - silently ignore all the GICD_ISACTIVER* regs (as proposed)
> >
> >
> >     is an improvement.
> >
> >
> > Peng mentioned that Linux is accessing it, so the worst thing we can do
> > is lying to the guest (as you suggest here). I would definitely not call
> > that an improvement.
>
> The ISACTIVER range is wrong in the description, it covers only one
> register, not multiple. This is obviously a typo, since it's correct in
> both GICv2 and in the high level switch/case in GICv3. Reading from
> outside of any range will inject an abort into the guest, which runs in
> kernel space. This will probably result in a guest crash. I would
> consider not crashing an improvement.
>

It is not. Neither the current approach to silently doing it.


> About "lying" to the guest: Typically an IRQ is just active for a very
> short time, so 0 is a very good answer, actually.


So why does Linux is checking it? What will happen if there were actually
an active interrupt but don't report it?

The old VGIC in KVM
> did exactly the same:
>         vgic_reg_access(mmio, NULL, offset,
>                         ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>
> The proper solution would be:
> 1) Track the state of the active bit when we can observe it, so when the
> guest exits with an active IRQ. The new VGIC does that.
> 2) Kick out all VCPUs that have IRQs in that given rank, and sample the
> active bit from the LRs. Sounds pretty horrible, and chances are very
> high you will get all 0s there.
>
> So if I compare "fix those two typos and preserve the state that the Xen
> VGIC has been in for years" to "create a lot of racy code for a rare
> corner case", the first one surely wins.
> That doesn't mean we should never try it, but surely this fix needs to
> go in meanwhile.
>

I don't believe this patch to go in is the correct solution not from a
technical PoV but to get things properly fixed.


> > In the current state, it is a Nack. If there were a warning, then I
> > would be more inclined to see this patch going through.
>
> Do you mean a warning that we are about to lie to the guest? That sounds
> pretty useless, since nobody can do anything about it. Plus we have
> already those warnings on writing to these registers, and this always
> confuses people and triggered pointless bug reports.
>

Well, the warning has the benefits to annoy people. If we do it silently,
then we don't encourage to fix it.


> I think the old VGIC has bigger problems ;-)
>

I agree, but nobody seems to be willing to fix it... My only leverage here
is pushing for a warning to annoy the user.

So I maintain my request for a warning.

Cheers,


> Cheers,
> Andre
>
Julien Grall Nov. 14, 2019, 6:58 a.m. UTC | #10
On Tue, 12 Nov 2019, 11:45 Peng Fan, <peng.fan@nxp.com> wrote:

> Hi Julien,
>
> Inline marked with [Peng Fan]
>
> From: Julien Grall <julien.grall.oss@gmail.com>
> Sent: 2019年11月9日 6:44
> To: Stefano Stabellini <sstabellini@kernel.org>; Andre Przywara <
> andre.przywara@arm.com>
> Cc: Peng Fan <peng.fan@nxp.com>; Jürgen Groß <jgross@suse.com>;
> julien.grall@arm.com; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH] arch: arm: vgic-v3: fix GICD_ISACTIVER
> range
>
> Hi,
>
> Sorry for the formatting.
> On Sat, 9 Nov 2019, 04:27 Stefano Stabellini, <mailto:
> sstabellini@kernel.org> wrote:
> On Thu, 7 Nov 2019, Peng Fan wrote:
> > The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
> >
> > Signed-off-by: Peng Fan <mailto:peng.fan@nxp.com>
>
> Reviewed-by: Stefano Stabellini <mailto:sstabellini@kernel.org>
>
> To be honest, I am not sure the code is correct. A read to those registers
> should tell you the list of interrupts active. As we always return 0, this
> will not return the correct state of the GIC.
>
> I know that returning the list of actives interrupts is complicated with
> the old vGIC, but I don't think silently ignoring it is a good idea.
>
> The question here is why the guest accessed those registers? What is it
> trying to figure out?
>
> [Peng Fan] I am running Linux 5.4 kernel dom0, gic_peek_irq triggers abort.
>

Do you have a call stack trace for this?

Cheers,
Peng Fan Nov. 14, 2019, 9:04 a.m. UTC | #11
> Do you have a call stack trace for this?

(XEN) d0v1: vGICD: unhandled read r1 offset 0x000324
(XEN) traps.c:1994:d0v1 HSR=0x93810006 pc=0xffff8000104f2bb4 gva=0xffff800010010324 gpa=0x00000051a00324
[    1.780564] Unhandled fault at 0xffff800010010324
[    1.785771] Mem abort info:
[    1.788899]   ESR = 0x96000000
[    1.792320]   EC = 0x25: DABT (current EL), IL = 32 bits
[    1.798196]   SET = 0, FnV = 0
[    1.801615]   EA = 0, S1PTW = 0
[    1.805124] Data abort info:
[    1.808350]   ISV = 0, ISS = 0x00000000
[    1.812620]   CM = 0, WnR = 0
[    1.815943] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000a17f6000
[    1.823344] [ffff800010010324] pgd=000000095ffff003, pud=000000095fffe003, pmd=000000095fffd003, pte=00e8000051a00707
[    1.835016] Internal error: ttbr address size fault: 96000000 [#1] PREEMPT SMP
[    1.842983] Modules linked in:
[    1.846403] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc5-03191-g20646f88034a-dirty #1774
[    1.855980] Hardware name: Freescale i.MX8QM MEK DOM0 (DT)
[    1.862059] pstate: a0000085 (NzCv daIf -PAN -UAO)
[    1.867374] pc : gic_peek_irq+0x6c/0xb0
[    1.871638] lr : gic_peek_irq+0x28/0xb0
[    1.875899] sp : ffff80001002ba80
[    1.879599] x29: ffff80001002ba80 x28: 0000000000000000
[    1.885483] x27: 0000000000000000 x26: ffff800011800514
[    1.891369] x25: 0000000000000000 x24: 0000000000000000
[    1.897246] x23: ffff0008dc3e5400 x22: ffff0008dc3e5458
[    1.903129] x21: ffff800011d798c8 x20: 0000000000000001
[    1.909011] x19: ffff80001002bb07 x18: 0000000000000004
[    1.914893] x17: ffff8000112243c0 x16: 00000000deadbeef
[    1.920780] x15: ffff0008dc01b490 x14: ffff0008dbcde5c0
[    1.926658] x13: ffffffffffffff00 x12: ffffffffffffffff
[    1.932540] x11: 0000000000000008 x10: ffffffffffffffff
[    1.938422] x9 : c1c1c1c1c1c1c1c1 x8 : 0000000000000000
[    1.944304] x7 : ffffffffffffffff x6 : ffff0008dc3e5458
[    1.950186] x5 : ffff800011d798c8 x4 : 0000000000000008
[    1.956069] x3 : 0000000000000324 x2 : fffffffffffffd03
[    1.961951] x1 : ffff800010010324 x0 : 0000000000000300
[    1.967835] Call trace:
[    1.970588]  gic_peek_irq+0x6c/0xb0
[    1.974481]  gic_irq_get_irqchip_state+0x70/0xa0
[    1.979607]  __synchronize_hardirq+0xd8/0xe0
[    1.984343]  __free_irq+0x13c/0x2b8
[    1.988232]  free_irq+0x30/0x70
[    1.991749]  devm_irq_release+0x14/0x20
[    1.996019]  release_nodes+0x1b0/0x220
[    2.000189]  devres_release_all+0x34/0x50
[    2.004652]  really_probe+0x1b8/0x308
[    2.008728]  driver_probe_device+0x54/0xe8
[    2.013281]  device_driver_attach+0x6c/0x78
[    2.017929]  __driver_attach+0x54/0xd0
[    2.022106]  bus_for_each_dev+0x70/0xc0
[    2.026372]  driver_attach+0x20/0x28
[    2.030359]  bus_add_driver+0x178/0x1d8
[    2.034630]  driver_register+0x60/0x110
[    2.038900]  __platform_driver_register+0x44/0x50
[    2.044121]  fec_driver_init+0x18/0x20
[    2.048294]  do_one_initcall+0x74/0x1b0
[    2.052564]  kernel_init_freeable+0x194/0x22c
[    2.057403]  kernel_init+0x10/0x100
[    2.061293]  ret_from_fork+0x10/0x18
[    2.065279] Code: 53057c63 d37e6863 8b204063 8b030021 (b9400021)
[    2.072024] ---[ end trace 62c8d1d1d10ae3de ]---
[    2.077158] note: swapper/0[1] exited with preempt_count 1
[    2.083296] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    2.091659] SMP: stopping secondary CPUs
[    2.096037] Kernel Offset: disabled
[    2.099914] CPU features: 0x0002,2000200c
[    2.104371] Memory Limit: none
[    2.107788] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Regards,
Peng.
Julien Grall Nov. 19, 2019, 8:18 p.m. UTC | #12
Hi Andre,

On 12/11/2019 12:46, Andre Przywara wrote:
> On Mon, 11 Nov 2019 11:01:07 -0800 (PST)
> Stefano Stabellini <sstabellini@kernel.org> wrote:
>> On Sat, 9 Nov 2019, Julien Grall wrote:
>>> On Sat, 9 Nov 2019, 04:27 Stefano Stabellini, <sstabellini@kernel.org> wrote:
>>>        On Thu, 7 Nov 2019, Peng Fan wrote:
>>>        > The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
>>>        >
>>>        > Signed-off-by: Peng Fan <peng.fan@nxp.com>
>>>
>>>        Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>>
>>>
>>> To be honest, I am not sure the code is correct. A read to those registers should tell you the list of interrupts active. As we always
>>> return 0, this will not return the correct state of the GIC.
>>>
>>> I know that returning the list of actives interrupts is complicated with the old vGIC, but I don't think silently ignoring it is a good
>>> idea.
>>> The question here is why the guest accessed those registers? What is it trying to figure out?
> 
> I see Linux querying the active state (IRQCHIP_STATE_ACTIVE) at two relevant points for ARM:
> - In kernel/irq/manage.c, in __synchronize_hardirq().
> - In KVM's arch timer emulation code.
> 
> I think the latter is of no concern (yet), but the first might actually trigger. At the moment it's beyond me what it actually does, but maybe some IRQ changes (RT, threaded IRQs?) trigger this now?\
It happens I am sitting right next to Marc now, so I had a chat with him 
about this :). Let me try to summarize the discussion here.

__synchronize_hardirq() is used to ensure that all active interrupts 
have been handled before continuing. When sync_chip == false, we only 
ensure that all in progress interrupts (from Linux PoV) are handled 
before continue.

sync_chip == true will additionally ensure that any interrupt that were 
acknowledge but not yet marked as in progress by the kernel are also 
handled. The assumption is this is called after the interrupt has been 
masked/disabled.

The call to __synchronize_hardirq() in free_irq() (as reported by Peng 
stack trace) was introduced recently (see [1]). It is not entirely clear 
whether this would affect anyone using Linux 5.4 or just a limited 
subset of users.

Anyhow, this is a genuine bug and I think returning 0 is only papering 
over the bug with no long-term resolution. As Marc pointed out, Even the 
old vGIC in KVM was not spec compliant and thankfully this was fixed in 
the new vGIC.

As I said in a different sub-thread, I would reluctanly be ok to see 
returning 0 as long as we have add a warning for *every* access. 
Long-term, the current vGIC should really get killed.

Cheers,

[1] 62e0468650c30f0298822c580f382b16328119f6 "genirq: Add optional 
hardware synchronization for shutdown"
Stefano Stabellini Nov. 21, 2019, 6:50 p.m. UTC | #13
On Tue, 19 Nov 2019, Julien Grall wrote:
> Hi Andre,
> 
> On 12/11/2019 12:46, Andre Przywara wrote:
> > On Mon, 11 Nov 2019 11:01:07 -0800 (PST)
> > Stefano Stabellini <sstabellini@kernel.org> wrote:
> > > On Sat, 9 Nov 2019, Julien Grall wrote:
> > > > On Sat, 9 Nov 2019, 04:27 Stefano Stabellini, <sstabellini@kernel.org>
> > > > wrote:
> > > >        On Thu, 7 Nov 2019, Peng Fan wrote:
> > > >        > The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
> > > >        >
> > > >        > Signed-off-by: Peng Fan <peng.fan@nxp.com>
> > > > 
> > > >        Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> > > > 
> > > > 
> > > > To be honest, I am not sure the code is correct. A read to those
> > > > registers should tell you the list of interrupts active. As we always
> > > > return 0, this will not return the correct state of the GIC.
> > > > 
> > > > I know that returning the list of actives interrupts is complicated with
> > > > the old vGIC, but I don't think silently ignoring it is a good
> > > > idea.
> > > > The question here is why the guest accessed those registers? What is it
> > > > trying to figure out?
> > 
> > I see Linux querying the active state (IRQCHIP_STATE_ACTIVE) at two relevant
> > points for ARM:
> > - In kernel/irq/manage.c, in __synchronize_hardirq().
> > - In KVM's arch timer emulation code.
> > 
> > I think the latter is of no concern (yet), but the first might actually
> > trigger. At the moment it's beyond me what it actually does, but maybe some
> > IRQ changes (RT, threaded IRQs?) trigger this now?\
> It happens I am sitting right next to Marc now, so I had a chat with him about
> this :). Let me try to summarize the discussion here.
> 
> __synchronize_hardirq() is used to ensure that all active interrupts have been
> handled before continuing. When sync_chip == false, we only ensure that all in
> progress interrupts (from Linux PoV) are handled before continue.
> 
> sync_chip == true will additionally ensure that any interrupt that were
> acknowledge but not yet marked as in progress by the kernel are also handled.
> The assumption is this is called after the interrupt has been masked/disabled.
> 
> The call to __synchronize_hardirq() in free_irq() (as reported by Peng stack
> trace) was introduced recently (see [1]). It is not entirely clear whether
> this would affect anyone using Linux 5.4 or just a limited subset of users.
> 
> Anyhow, this is a genuine bug and I think returning 0 is only papering over
> the bug with no long-term resolution. As Marc pointed out, Even the old vGIC
> in KVM was not spec compliant and thankfully this was fixed in the new vGIC.
> 
> As I said in a different sub-thread, I would reluctanly be ok to see returning
> 0 as long as we have add a warning for *every* access. Long-term, the current
> vGIC should really get killed.

I appreciate your intention of raising awareness and getting help in
fixing things in the community, which is the right thing to do. However,
I am doubtful of the usefulness of a warning to achieve the goal. Maybe
it would be best to curate an "open issues" section somewhere, even just
as an email after every release or as part of the release notes, or as a
jira ticket, or a wikipage, or a document under docs/.

Actually, an "open issues" document under docs/ could be a good idea for
this and other similar items.

A warning is a blunt instrument that lacks the subtleties necessary to
raise the attention in the right way and achieve the goal of getting
help and contributions. Especially it has the risk of causing problems
and confusion with less knowledgeable users.  Maybe a dprintk warning
message (only DEBUG builds) could avoid some of the issues, while still
gaining attention of savvy developers who could understand what it means.
But I think that the "open issues" document would be more effective.
Julien Grall Nov. 21, 2019, 7:25 p.m. UTC | #14
Hi Stefano,

On 21/11/2019 18:50, Stefano Stabellini wrote:
> On Tue, 19 Nov 2019, Julien Grall wrote:
>> Hi Andre,
>>
>> On 12/11/2019 12:46, Andre Przywara wrote:
>>> On Mon, 11 Nov 2019 11:01:07 -0800 (PST)
>>> Stefano Stabellini <sstabellini@kernel.org> wrote:
>>>> On Sat, 9 Nov 2019, Julien Grall wrote:
>>>>> On Sat, 9 Nov 2019, 04:27 Stefano Stabellini, <sstabellini@kernel.org>
>>>>> wrote:
>>>>>         On Thu, 7 Nov 2019, Peng Fan wrote:
>>>>>         > The end should be GICD_ISACTIVERN not GICD_ISACTIVER.
>>>>>         >
>>>>>         > Signed-off-by: Peng Fan <peng.fan@nxp.com>
>>>>>
>>>>>         Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>>>>>
>>>>>
>>>>> To be honest, I am not sure the code is correct. A read to those
>>>>> registers should tell you the list of interrupts active. As we always
>>>>> return 0, this will not return the correct state of the GIC.
>>>>>
>>>>> I know that returning the list of actives interrupts is complicated with
>>>>> the old vGIC, but I don't think silently ignoring it is a good
>>>>> idea.
>>>>> The question here is why the guest accessed those registers? What is it
>>>>> trying to figure out?
>>>
>>> I see Linux querying the active state (IRQCHIP_STATE_ACTIVE) at two relevant
>>> points for ARM:
>>> - In kernel/irq/manage.c, in __synchronize_hardirq().
>>> - In KVM's arch timer emulation code.
>>>
>>> I think the latter is of no concern (yet), but the first might actually
>>> trigger. At the moment it's beyond me what it actually does, but maybe some
>>> IRQ changes (RT, threaded IRQs?) trigger this now?\
>> It happens I am sitting right next to Marc now, so I had a chat with him about
>> this :). Let me try to summarize the discussion here.
>>
>> __synchronize_hardirq() is used to ensure that all active interrupts have been
>> handled before continuing. When sync_chip == false, we only ensure that all in
>> progress interrupts (from Linux PoV) are handled before continue.
>>
>> sync_chip == true will additionally ensure that any interrupt that were
>> acknowledge but not yet marked as in progress by the kernel are also handled.
>> The assumption is this is called after the interrupt has been masked/disabled.
>>
>> The call to __synchronize_hardirq() in free_irq() (as reported by Peng stack
>> trace) was introduced recently (see [1]). It is not entirely clear whether
>> this would affect anyone using Linux 5.4 or just a limited subset of users.
>>
>> Anyhow, this is a genuine bug and I think returning 0 is only papering over
>> the bug with no long-term resolution. As Marc pointed out, Even the old vGIC
>> in KVM was not spec compliant and thankfully this was fixed in the new vGIC.
>>
>> As I said in a different sub-thread, I would reluctanly be ok to see returning
>> 0 as long as we have add a warning for *every* access. Long-term, the current
>> vGIC should really get killed.
> 
> I appreciate your intention of raising awareness and getting help in
> fixing things in the community, which is the right thing to do. However,
> I am doubtful of the usefulness of a warning to achieve the goal. Maybe
> it would be best to curate an "open issues" section somewhere, even just
> as an email after every release or as part of the release notes, or as a
> jira ticket, or a wikipage, or a document under docs/.

The state of the brokeness of the vGIC have been documented numerous 
time on the ML (see [1]) and on JIRA (see XEN-92). And there are 
probably more not reported (Marc mentionned a new one during the week). 
Yet no-one ever looked at finishing the new vGIC.

So allow me to doubt that the documentation is going to help here...

> 
> Actually, an "open issues" document under docs/ could be a good idea for
> this and other similar items.
> 
> A warning is a blunt instrument that lacks the subtleties necessary to
> raise the attention in the right way and achieve the goal of getting
> help and contributions. Especially it has the risk of causing problems
> and confusion with less knowledgeable users.  Maybe a dprintk warning
> message (only DEBUG builds) could avoid some of the issues, while still
> gaining attention of savvy developers who could understand what it means.
> But I think that the "open issues" document would be more effective.

See above, this is not the first time the state of the vGIC was raised. 
While I agree this is a blunt instruction, all the other options had no 
effect so far. So with this solution maybe someone will soon realized 
there are effort to put in the core architecture of Xen.

Now, if you suggest me a plan (a person, a date of completion...) for 
fixing properly the vGIC then I would be more willing to accept the 
workaround suggested here.

Cheers,

[1] 
https://lists.xenproject.org/archives/html/xen-devel/2018-02/msg00784.html
Stefano Stabellini Nov. 21, 2019, 10:37 p.m. UTC | #15
On Thu, 21 Nov 2019, Julien Grall wrote:
> > > As I said in a different sub-thread, I would reluctanly be ok to see
> > > returning
> > > 0 as long as we have add a warning for *every* access. Long-term, the
> > > current
> > > vGIC should really get killed.

After speaking with Julien and reading more about the registers in
question and the Linux changes in 5.4, I changed my mind and agree with
the warning.

Peng, would you be willing to submit a new update of the patch, adding a
warning also? You can reuse the words used in the warning from
vgic_v2_distr_mmio_write, something like:

  printk(XENLOG_G_ERR "%pv: vGICD: unhandled read from ISACTIVER%d\n",
         v, gicd_reg - GICD_ISACTIVER);


Juergen, this is an easy-to-understand typo fix, plus a new warning. It
is important to be able to run Linux 5.4 (see
https://marc.info/?l=xen-devel&m=157372234429588). I think it should go
in 4.13.
diff mbox series

Patch

diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 422b94f902..e802f2055a 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -706,7 +706,7 @@  static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v,
         goto read_as_zero;
 
     /* Read the active status of an IRQ via GICD/GICR is not supported */
-    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVER):
+    case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN):
     case VRANGE32(GICD_ICACTIVER, GICD_ICACTIVERN):
         goto read_as_zero;