| Message ID | 20211027082533.1406015-1-andr2000@gmail.com (mailing list archive) |
|---|---|
| State | Superseded |
| Series | xen/arm: fix SBDF calculation for vPCI MMIO handlers |
On Wed, Oct 27, 2021 at 11:25:33AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> While in vPCI MMIO trap handlers for the guest PCI host bridge it is not
> enough for SBDF translation to simply call VPCI_ECAM_BDF(info->gpa), as
> the base address may not be aligned in a way that the translation
> always works.

I think I've already raised this during review [0]. But this is only a
problem if you change the current value of GUEST_VPCI_ECAM_BASE AFAICT,
as the current value has bits [0,27] clear.

I assume this is a problem for the hardware domain that needs to trap
random base addresses as present on hardware, but that code hasn't been
committed yet.

If that's indeed the case, please expand the commit message to contain
this information.

Thanks, Roger.

[0] https://lore.kernel.org/xen-devel/YWlnc3b0sj4akUWa@MacBook-Air-de-Roger.local/
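As background on why the alignment matters: in ECAM, the BDF of an access is
encoded in bits 12-27 of the offset from the window base, so a mask-based
VPCI_ECAM_BDF() only yields the right BDF from a raw guest address when the
base itself has bits 0-27 clear. A rough sketch of the decoding (standard
ECAM bit layout; the actual macro in Xen may differ):

    /*
     * ECAM offset layout: bus[27:20] dev[19:15] fn[14:12] reg[11:0].
     * Decoding must operate on the offset from the window base; any
     * set bits 0-27 of an unsubtracted base leak into the decoded BDF.
     */
    #define ECAM_BUS(off)  (((off) >> 20) & 0xff)
    #define ECAM_DEV(off)  (((off) >> 15) & 0x1f)
    #define ECAM_FN(off)   (((off) >> 12) & 0x07)

    /* e.g. off = info->gpa - GUEST_VPCI_ECAM_BASE; */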
Hi, Roger!

On 27.10.21 11:59, Roger Pau Monné wrote:
> On Wed, Oct 27, 2021 at 11:25:33AM +0300, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> While in vPCI MMIO trap handlers for the guest PCI host bridge it is not
>> enough for SBDF translation to simply call VPCI_ECAM_BDF(info->gpa), as
>> the base address may not be aligned in a way that the translation
>> always works.
> I think I've already raised this during review [0]. But this is only a
> problem if you change the current value of GUEST_VPCI_ECAM_BASE
> AFAICT, as the current value has bits [0,27] clear.
Exactly, so we were just lucky not to hit this before.
>
> I assume this is a problem for the hardware domain that needs to trap
> random base addresses as present on hardware, but that code hasn't
> been committed yet.
Yes, I am facing this on the real HW when Dom0's access is trapped
and the base is not taken into account. So, I have a patch for the
future upstream which subtracts the relevant base from the gpa,
e.g. either GUEST_VPCI_ECAM_BASE or bridge->cfg->phys_addr.
>
> If that's indeed the case, please expand the commit message to contain
> this information.
I can only mention "the current value of GUEST_VPCI_ECAM_BASE has
bits [0,27] clear" as of now, because Dom0 traps are not yet there.
>
> Thanks, Roger.
>
> [0] https://lore.kernel.org/xen-devel/YWlnc3b0sj4akUWa@MacBook-Air-de-Roger.local/

Thank you,
Oleksandr

P.S. Sorry I failed to mark this patch as a fix for 4.16 and explain
why it is a good candidate for 4.16 inclusion.
On Wed, Oct 27, 2021 at 09:04:39AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Roger!
>
> On 27.10.21 11:59, Roger Pau Monné wrote:
> > I think I've already raised this during review [0]. But this is only a
> > problem if you change the current value of GUEST_VPCI_ECAM_BASE
> > AFAICT, as the current value has bits [0,27] clear.
> Exactly, so we were just lucky not to hit this before.
> >
> > I assume this is a problem for the hardware domain that needs to trap
> > random base addresses as present on hardware, but that code hasn't
> > been committed yet.
> Yes, I am facing this on the real HW when Dom0's access is trapped
> and the base is not taken into account. So, I have a patch for the
> future upstream which subtracts the relevant base from the gpa,
> e.g. either GUEST_VPCI_ECAM_BASE or bridge->cfg->phys_addr.
> >
> > If that's indeed the case, please expand the commit message to contain
> > this information.
> I can only mention "the current value of GUEST_VPCI_ECAM_BASE has
> bits [0,27] clear" as of now, because Dom0 traps are not yet there.

Indeed, I would rather mention both cases, ie:

"This change is not strictly required given the current value of
GUEST_VPCI_ECAM_BASE which has bits 0 to 27 clear, but could cause
issues if such value is changed, or when handlers for dom0 ECAM
regions are added as those will be mapped over existing hardware
regions that could use non-aligned base addresses."

Regards, Roger.
On 27.10.21 12:23, Roger Pau Monné wrote:
> On Wed, Oct 27, 2021 at 09:04:39AM +0000, Oleksandr Andrushchenko wrote:
>> I can only mention "the current value of GUEST_VPCI_ECAM_BASE has
>> bits [0,27] clear" as of now, because Dom0 traps are not yet there.
> Indeed, I would rather mention both cases, ie:
>
> "This change is not strictly required given the current value of
> GUEST_VPCI_ECAM_BASE which has bits 0 to 27 clear, but could cause
> issues if such value is changed, or when handlers for dom0 ECAM
> regions are added as those will be mapped over existing hardware
> regions that could use non-aligned base addresses."
Sounds good, will add.
>
> Regards, Roger.

Thank you,
Oleksandr
Oleksandr Andrushchenko writes ("Re: [PATCH] xen/arm: fix SBDF calculation for vPCI MMIO handlers"): > P.S. Sorry I failed to mark this patch as a fix for 4.16 and explain why it is > a good candidate for 4.16 inclusion Hello :-). Um, can you explain what the practical impact is of not taking this patch for 4.16 ? As I understand it vpci for ARM is non-functional in 4.16 and this is not expected to change ? So there would be no benefit to users, and taking the patch would add small but nonzero risk ? I think according to the freeze policy I set this can go in if the maintainers feel it is a "straightforward bugfix", but provided it goes in by the end of this coming Friday. After that it will need a release-ack and right now, unless I am mistaken (which may well be the case) it can just as well wait ? Thanks, Ian.
Hi Oleksandr,

On 27/10/2021 09:25, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> While in vPCI MMIO trap handlers for the guest PCI host bridge it is not
> enough for SBDF translation to simply call VPCI_ECAM_BDF(info->gpa), as
> the base address may not be aligned in a way that the translation
> always works. If not adjusted with respect to the base address it may
> not be able to properly convert SBDF and crashes:
>
> (XEN) vpci_mmio_read 0000:65:1a.0 reg 8bc gpa e65d08bc

I can't find a printk() that may output this message. Where does this
come from?

Anyway, IIUC the guest physical address is 0xe65d08bc which, if I am
not mistaken, doesn't belong to the range advertised for
GUEST_VPCI_ECAM.

IMHO, the stack trace should come from upstream Xen or needs some
information to explain how this was reproduced.

> (XEN) Data Abort Trap. Syndrome=0x6
> (XEN) Walking Hypervisor VA 0x467a28bc on CPU0 via TTBR 0x00000000481d5000

I can understand that if we don't subtract GUEST_VPCI_ECAM, we would
(in theory) not get the correct BDF. But... I don't understand how this
would result in a data abort in the hypervisor.

In fact, I think the vPCI code should be resilient enough to not crash
if we pass the wrong BDF.

When there is a data abort in Xen, you should get a stack trace from
where this comes from. Can you paste it here?

> Fix this by adjusting the gpa with respect to the host bridge base address
> in a way as it is done for x86.
>
> Fixes: d59168dc05a5 ("xen/arm: Enable the existing x86 virtual PCI support for ARM")
>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
>  xen/arch/arm/vpci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
> index 8f40a0dec6d2..23f45386f4b3 100644
> --- a/xen/arch/arm/vpci.c
> +++ b/xen/arch/arm/vpci.c
> @@ -24,7 +24,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
>      unsigned long data;
>
>      /* We ignore segment part and always handle segment 0 */
> -    sbdf.sbdf = VPCI_ECAM_BDF(info->gpa);
> +    sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE);

Looking at the rest of the series, it seems that
1) the issue is latent as the bits 0-27 are clear
2) this will need to be modified to take into account dom0.

So I would prefer if the base address is passed differently (maybe in
priv?) from the start. This will avoid extra modification that you
already plan to have in a follow-up series.

>      if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
>                          1U << info->dabt.size, &data) )
> @@ -44,7 +44,7 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
>      pci_sbdf_t sbdf;
>
>      /* We ignore segment part and always handle segment 0 */
> -    sbdf.sbdf = VPCI_ECAM_BDF(info->gpa);
> +    sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE);
>
>      return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
>                             1U << info->dabt.size, r);

Cheers,
Hi, Julien!

On 27.10.21 20:35, Julien Grall wrote:
> I can't find a printk() that may output this message. Where does this
> come from?
That was a debug print. I shouldn't have used that in the patch
description, but probably after "---" to better explain what's
happening.
> Anyway, IIUC the guest physical address is 0xe65d08bc which, if I am
> not mistaken, doesn't belong to the range advertised for
> GUEST_VPCI_ECAM.
This is from dom0 I am working on now.
> I can understand that if we don't subtract GUEST_VPCI_ECAM, we would
> (in theory) not get the correct BDF. But... I don't understand how
> this would result in a data abort in the hypervisor.
>
> In fact, I think the vPCI code should be resilient enough to not crash
> if we pass the wrong BDF.
Well, there is no (?) easy way to validate SBDF. And this could be a
problem if we have a misbehaving guest which may force Xen to access
the memory beyond that of the PCI host bridge.
> When there is a data abort in Xen, you should get a stack trace from
> where this comes from. Can you paste it here?

(XEN) Data Abort Trap. Syndrome=0x6
(XEN) Walking Hypervisor VA 0x467a28bc on CPU0 via TTBR 0x00000000481d5000
(XEN) 0TH[0x0] = 0x00000000481d4f7f
(XEN) 1ST[0x1] = 0x00000000481d2f7f
(XEN) 2ND[0x33] = 0x0000000000000000
(XEN) CPU0: Unexpected Trap: Data Abort
(XEN) ----[ Xen-4.16-unstable arm64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) PC: 000000000026d3d4 pci_generic_config_read+0x88/0x9c
(XEN) LR: 000000000026d36c
(XEN) SP: 000080007ff97c00
(XEN) CPSR: 0000000060400249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN) X0: 00000000467a28bc X1: 00000000065d08bc X2: 00000000000008bc
(XEN) X3: 000000000000000c X4: 000080007ffc6fd0 X5: 0000000000000000
(XEN) X6: 0000000000000014 X7: ffff800011a58000 X8: ffff0000225a0380
(XEN) X9: 0000000000000000 X10: 0101010101010101 X11: 0000000000000028
(XEN) X12: 0101010101010101 X13: 0000000000000020 X14: ffffffffffffffff
(XEN) X15: 0000000000000001 X16: ffff800010da6708 X17: 0000000000000020
(XEN) X18: 0000000000000002 X19: 0000000000000004 X20: 000080007ff97c5c
(XEN) X21: 00000000000008bc X22: 00000000000008bc X23: 0000000000000004
(XEN) X24: 0000000000000000 X25: 00000000000008bc X26: 00000000000065d0
(XEN) X27: 000080007ffb9010 X28: 0000000000000000 FP: 000080007ff97c00
(XEN)
(XEN) VTCR_EL2: 00000000800a3558
(XEN) VTTBR_EL2: 00010000bffba000
(XEN)
(XEN) SCTLR_EL2: 0000000030cd183d
(XEN) HCR_EL2: 00000000807c663f
(XEN) TTBR0_EL2: 00000000481d5000
(XEN)
(XEN) ESR_EL2: 0000000096000006
(XEN) HPFAR_EL2: 0000000000e65d00
(XEN) FAR_EL2: 00000000467a28bc
(XEN)
[snip]
(XEN) Xen call trace:
(XEN)    [<000000000026d3d4>] pci_generic_config_read+0x88/0x9c (PC)
(XEN)    [<000000000026d36c>] pci_generic_config_read+0x20/0x9c (LR)
(XEN)    [<000000000026d2c8>] pci-access.c#pci_config_read+0x60/0x84
(XEN)    [<000000000026d4a8>] pci_conf_read32+0x10/0x18
(XEN)    [<000000000024dcf8>] vpci.c#vpci_read_hw+0x48/0xb8
(XEN)    [<000000000024e3e4>] vpci_read+0xac/0x24c
(XEN)    [<000000000024e934>] vpci_ecam_read+0x78/0xa8
(XEN)    [<000000000026e368>] vpci.c#vpci_mmio_read+0x44/0x7c
(XEN)    [<0000000000275054>] try_handle_mmio+0x1ec/0x264
(XEN)    [<000000000027ea50>] traps.c#do_trap_stage2_abort_guest+0x18c/0x2d8
(XEN)    [<000000000027f088>] do_trap_guest_sync+0xf0/0x618
(XEN)    [<0000000000269c58>] entry.o#guest_sync_slowpath+0xa4/0xd4
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) CPU0: Unexpected Trap: Data Abort
(XEN) ****************************************

> Looking at the rest of the series, it seems that
> 1) the issue is latent as the bits 0-27 are clear
> 2) this will need to be modified to take into account dom0.
>
> So I would prefer if the base address is passed differently (maybe in
> priv?) from the start. This will avoid extra modification that you
> already plan to have in a follow-up series.
I was thinking about the same, but the future code will use priv for
other purposes:

static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
                          register_t *r, void *p)
{
    struct pci_host_bridge *bridge = p;
    pci_sbdf_t sbdf;
    /* data is needed to prevent a pointer cast on 32bit */
    unsigned long data;

    if ( bridge )
    {
        sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - bridge->cfg->phys_addr);
        sbdf.seg = bridge->segment;
    }
    else
        sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE);

Thank you,
Oleksandr
On 28/10/2021 13:09, Oleksandr Andrushchenko wrote:
> Hi, Julien!

Hello,

> On 27.10.21 20:35, Julien Grall wrote:
>> In fact, I think the vPCI code should be resilient enough to not
>> crash if we pass the wrong BDF.
> Well, there is no (?) easy way to validate SBDF.

AFAICT pci_ecam_map_bus() is already doing some validation for the bus
number. So...

> And this could be a problem if we have a misbehaving guest which may
> force Xen to access the memory beyond that of the PCI host bridge.
>
>> When there is a data abort in Xen, you should get a stack trace from
>> where this comes from. Can you paste it here?
> (XEN) Data Abort Trap. Syndrome=0x6
> (XEN) Walking Hypervisor VA 0x467a28bc on CPU0 via TTBR 0x00000000481d5000
> (XEN) 0TH[0x0] = 0x00000000481d4f7f
> (XEN) 1ST[0x1] = 0x00000000481d2f7f
> (XEN) 2ND[0x33] = 0x0000000000000000
> (XEN) CPU0: Unexpected Trap: Data Abort

... I am getting quite confused why this is crashing. Are we validating
the access correctly?

> [snip]
>
>> So I would prefer if the base address is passed differently (maybe in
>> priv?) from the start. This will avoid extra modification that you
>> already plan to have in a follow-up series.
> I was thinking about the same, but the future code will use priv for
> other purposes:
>
> static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
>                           register_t *r, void *p)
> {
>     struct pci_host_bridge *bridge = p;
>     pci_sbdf_t sbdf;
>     /* data is needed to prevent a pointer cast on 32bit */
>     unsigned long data;
>
>     if ( bridge )
>     {
>         sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - bridge->cfg->phys_addr);
>         sbdf.seg = bridge->segment;
>     }
>     else
>         sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE);

Is it the only place you are going to use bridge? If so, then I think
we can simply have a structure that would contain phys_addr and
segment.

This would be included in the bridge for dom0 and for guests this
could be a static global variable for now.

Cheers,
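A minimal sketch of the structure suggested here; the two fields come
from the thread, while the type name and wiring are assumptions:

    /* Hypothetical type holding just what the MMIO handlers need. */
    struct vpci_ecam_window {
        paddr_t phys_addr;    /* base of the trapped ECAM region */
        uint16_t segment;     /* PCI segment the window belongs to */
    };

    /*
     * dom0: one instance embedded per host bridge; guests: a single
     * static instance, e.g. { GUEST_VPCI_ECAM_BASE, 0 }, passed as
     * the priv pointer when registering the MMIO handler.
     */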
On Thu, Oct 28, 2021 at 12:09:23PM +0000, Oleksandr Andrushchenko wrote:
> On 27.10.21 20:35, Julien Grall wrote:
> > In fact, I think the vPCI code should be resilient enough to not
> > crash if we pass the wrong BDF.
> Well, there is no (?) easy way to validate SBDF. And this could be a
> problem if we have a misbehaving guest which may force Xen to access
> the memory beyond that of the PCI host bridge.

How could that be? The ECAM region exposed to the guest should be the
same as the physical one for dom0?

And for domUs you really need to fix vpci_{read,write} to not
passthrough accesses not explicitly handled.

> > When there is a data abort in Xen, you should get a stack trace from
> > where this comes from. Can you paste it here?
> [snip]

Are you exposing an ECAM region to the guest bigger than the underlying
one, and that's why you get crashes? (because you get out of the
hardware range)

I would assume physical accesses to the ECAM area reported by the
hardware don't trigger traps?

Roger.
On 28.10.21 16:22, Julien Grall wrote:
> On 28/10/2021 13:09, Oleksandr Andrushchenko wrote:
>> Well, there is no (?) easy way to validate SBDF.
>
> AFAICT pci_ecam_map_bus() is already doing some validation for the bus
> number. So...
what it does is not enough as...

    if ( busn < cfg->busn_start || busn > cfg->busn_end )
        return NULL;

    busn -= cfg->busn_start;
    base = cfg->win + (busn << ops->bus_shift);

    return base + (PCI_DEVFN2(sbdf.bdf) << devfn_shift) + where;

this can still overrun.

> ... I am getting quite confused why this is crashing. Are we
> validating the access correctly?
See above. If provided with a big enough SBDF we can end up getting out
of the window.

> Is it the only place you are going to use bridge? If so, then I think
> we can simply have a structure that would contain phys_addr and
> segment.
>
> This would be included in the bridge for dom0 and for guests this
> could be a static global variable for now.
Hm. I don't think a global is any better than using
info->gpa - GUEST_VPCI_ECAM_BASE. But I am fine with the structure:
please let me know your preference, so I can have an acceptable fix.

Thank you,
Oleksandr
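For reference, one shape the extra validation could take inside
pci_ecam_map_bus(); a sketch only, assuming struct pci_config_window
exposes the size of the mapped window as cfg->size:

    /*
     * Sketch: reject any access whose final address would fall outside
     * the mapped config window, rather than relying on the bus-number
     * check alone.
     */
    void __iomem *addr;

    if ( busn < cfg->busn_start || busn > cfg->busn_end )
        return NULL;

    busn -= cfg->busn_start;
    addr = cfg->win + (busn << ops->bus_shift) +
           (PCI_DEVFN2(sbdf.bdf) << devfn_shift) + where;

    if ( addr >= cfg->win + cfg->size )
        return NULL;

    return addr;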
On 28.10.21 16:36, Roger Pau Monné wrote:
> On Thu, Oct 28, 2021 at 12:09:23PM +0000, Oleksandr Andrushchenko wrote:
>> Well, there is no (?) easy way to validate SBDF. And this could be a
>> problem if we have a misbehaving guest which may force Xen to access
>> the memory beyond that of the PCI host bridge.
> How could that be? The ECAM region exposed to the guest should be the
> same as the physical one for dom0?
Ok, I have a Designware PCI host which has 2 ECAM regions (I am
starting to implement the driver for it, so I can be wrong here):
- Root Complex ECAM area ("dbi"), it is something like 0x3000 bytes long
- "Client" ECAM area ("config")
So from Dom0 POV we have 2 ECAM regions and for the guest we always
emulate a single big region:

/*
 * 256 MB is reserved for VPCI configuration space based on calculation
 * 256 buses x 32 devices x 8 functions x 4 KB = 256 MB
 */
#define GUEST_VPCI_ECAM_BASE    xen_mk_ullong(0x10000000)
#define GUEST_VPCI_ECAM_SIZE    xen_mk_ullong(0x10000000)

So, we have the base address and size of the emulated ECAM space not
connected to the real host bridge.

> And for domUs you really need to fix vpci_{read,write} to not
> passthrough accesses not explicitly handled.
Do you mean that we need to validate SBDFs there? This can be tricky if
we have a use-case when a PCI device being passed through is not put at
0000:00:0.0, but requested to be, for example, 0000:0d:0.0. So, we need
to go over the list of virtual devices and see if the SBDF the guest is
trying to access is a valid SBDF. Is this what you mean?

> Are you exposing an ECAM region to the guest bigger than the
> underlying one, and that's why you get crashes? (because you get out
> of the hardware range)
Please see above.

> I would assume physical accesses to the ECAM area reported by the
> hardware don't trigger traps?
No.

Thank you,
Oleksandr
Hi,

On 28/10/2021 15:16, Oleksandr Andrushchenko wrote:
> On 28.10.21 16:22, Julien Grall wrote:
>> AFAICT pci_ecam_map_bus() is already doing some validation for the
>> bus number. So...
> what it does is not enough as...
>     if ( busn < cfg->busn_start || busn > cfg->busn_end )
>         return NULL;
>
>     busn -= cfg->busn_start;
>     base = cfg->win + (busn << ops->bus_shift);
>
>     return base + (PCI_DEVFN2(sbdf.bdf) << devfn_shift) + where;
> this can still overrun.

Thanks, I guessed this was not enough... What I am trying to understand
is *why* this is not enough *and* whether we need to add more
validation.

>> ... I am getting quite confused why this is crashing. Are we
>> validating the access correctly?
> See above. If provided with a big enough SBDF we can end up getting
> out of the window.

Shouldn't we validate that we are still in the window?

>> Is it the only place you are going to use bridge? If so, then I think
>> we can simply have a structure that would contain phys_addr and
>> segment.
>>
>> This would be included in the bridge for dom0 and for guests this
>> could be a static global variable for now.
> Hm. I don't think a global is any better than using
> info->gpa - GUEST_VPCI_ECAM_BASE. But I am fine with the structure:
> please let me know your preference, so I can have an acceptable fix.

The difference is you don't duplicate the same check in two places.
Alternatively, I would be happy to consider a helper that is used in
both places.

Cheers,
On 28.10.21 17:28, Julien Grall wrote:
> Thanks, I guessed this was not enough... What I am trying to
> understand is *why* this is not enough *and* whether we need to add
> more validation.
>
>> See above. If provided with a big enough SBDF we can end up getting
>> out of the window.
>
> Shouldn't we validate that we are still in the window?
Yes, this will work if the check is implemented to respect the bridge's
config window.

> The difference is you don't duplicate the same check in two places.
> Alternatively, I would be happy to consider a helper that is used in
> both places.
But then we duplicate data inside "struct pci_host_bridge *bridge"
because we need to allocate (have it embedded) such a structure per
bridge, e.g. we have

    bridge->segment
    bridge->cfg->phys_addr

and then we add a structure which contains the same:

    bridge->new_struct.segment == bridge->segment
    bridge->new_struct.phys_addr == bridge->cfg->phys_addr

This is so that bridge->new_struct can be passed as a private to
register_mmio_handler. And the above seems to be not so bright compared
to just passing bridge as private and using
info->gpa - GUEST_VPCI_ECAM_BASE... So, I would stay with the simpler

    if ( bridge )
    {
        sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - bridge->cfg->phys_addr);
        sbdf.seg = bridge->segment;
    }
    else
        sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE);

Thank you,
Oleksandr
Hi, On 28/10/2021 16:27, Oleksandr Andrushchenko wrote: > bridge as private and using info->gpa - GUEST_VPCI_ECAM_BASE... > So, I would stay with simpler > > if ( bridge ) > { > sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - bridge->cfg->phys_addr); > sbdf.seg = bridge->segment; > } > else > sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE); I am fine with that so long as this is part of a helper (maybe vpci_sbdf_from_addr()) rather than duplicated. Cheers,
Julien Grall writes ("Re: [PATCH] xen/arm: fix SBDF calculation for vPCI MMIO handlers"): > On 28/10/2021 16:27, Oleksandr Andrushchenko wrote: > > bridge as private and using info->gpa - GUEST_VPCI_ECAM_BASE... > > So, I would stay with simpler > > > > if ( bridge ) > > { > > sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - bridge->cfg->phys_addr); > > sbdf.seg = bridge->segment; > > } > > else > > sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE); > > I am fine with that so long this is part of an helper (maybe > vpci_sbdf_from_addr()) rather than duplicated. There are a number of patches that I'm getting CC'd on related to ARM and vpci (according to the Subject). Are these targeted for 4.16 ? Most of them don't have 4.16 Subject tags. Am I getting these in my capacity as a REST maintainer ? I am finding it difficult to see the wood for the trees. It would be really helpful if these vpci fixes were collected together into a series. Thanks, Ian. (writing as Xen 4.16 Release Manager)
On Thu, Oct 28, 2021 at 02:23:34PM +0000, Oleksandr Andrushchenko wrote: > > > On 28.10.21 16:36, Roger Pau Monné wrote: > > On Thu, Oct 28, 2021 at 12:09:23PM +0000, Oleksandr Andrushchenko wrote: > >> Hi, Julien! > >> > >> On 27.10.21 20:35, Julien Grall wrote: > >>> Hi Oleksandr, > >>> > >>> On 27/10/2021 09:25, Oleksandr Andrushchenko wrote: > >>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> > >>>> > >>>> While in vPCI MMIO trap handlers for the guest PCI host bridge it is not > >>>> enough for SBDF translation to simply call VPCI_ECAM_BDF(info->gpa) as > >>>> the base address may not be aligned in the way that the translation > >>>> always work. If not adjusted with respect to the base address it may not be > >>>> able to properly convert SBDF and crashes: > >>>> > >>>> (XEN) vpci_mmio_read 0000:65:1a.0 reg 8bc gpa e65d08bc > >>> I can't find a printk() that may output this message. Where does this comes from? > >> That was a debug print. I shouldn't have used that in the patch description, but > >> probably after "---" to better explain what's happening > >>> Anyway, IIUC the guest physical address is 0xe65d08bc which, if I am not mistaken, doesn't belong to the range advertised for GUEST_VPCI_ECAM. > >> This is from dom0 I am working on now. > >>> IMHO, the stack trace should come from usptream Xen or need some information to explain how this was reproduced. > >>> > >>>> (XEN) Data Abort Trap. Syndrome=0x6 > >>>> (XEN) Walking Hypervisor VA 0x467a28bc on CPU0 via TTBR 0x00000000481d5000 > >>> I can understnad that if we don't substract GUEST_VPCI_ECAM, we would (in theory) not get the correct BDF. But... I don't understand how this would result to a data abort in the hypervisor. > >>> > >>> In fact, I think the vPCI code should be resilient enough to not crash if we pass the wrong BDF. > >> Well, there is no (?) easy way to validate SBDF. And this could be a problem if we have a misbehaving > >> guest which may force Xen to access the memory beyond that of PCI host bridge > > How could that be? The ECAM region exposed to the guest you should be > > the same as the physical one for dom0? > Ok, I have a Designware PCI hist which has 2 ECAM regions (I am starting to > implement the driver for it, so I can be wrong here): > - Root Complex ECAM area ("dbi"), it is something like 0x3000 bytes long > - "Client" ECAM area ("config") > So from Dom0 POV we have 2 ECAM regions and for the guest > we always emulate a single big region: You need support for multiple ECAM regions. That's how we do it on x86 PVH dom0. See register_vpci_mmcfg_handler and related machinery. > /* > * 256 MB is reserved for VPCI configuration space based on calculation > * 256 buses x 32 devices x 8 functions x 4 KB = 256 MB > */ > #define GUEST_VPCI_ECAM_BASE xen_mk_ullong(0x10000000) > #define GUEST_VPCI_ECAM_SIZE xen_mk_ullong(0x10000000) > > So, we have the base address and size of the emulated ECAM space > not connected to the real host bridge > > > > And for domUs you really need to fix vpci_{read,write} to not > > passthrough accesses not explicitly handled. > Do you mean that we need to validate SBDFs there? > This can be tricky if we have a use-case when a PCI device being > passed through if not put at 0000:00:0.0, but requested to be, for > example, 0000:0d:0.0. So, we need to go over the list of virtual > devices and see if SBDF the guest is trying to access is a valid SBDF. > Is this what you mean? No, you need to prevent accesses to registers not explicitly handled by vpci. 
Ie: do not forward unhandled accesses to vpci_{read,write}_hw(). Regards, Roger.
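A sketch of the policy Roger describes might look as follows. vpci_find_register() and the struct field layout are assumptions standing in for whatever lookup the real code would grow; is_hardware_domain() and vpci_read_hw() are the existing primitives named in the thread:

/* Sketch only: service registered ranges, never forward the rest for domUs. */
static uint32_t vpci_read_sketch(const struct pci_dev *pdev, unsigned int reg,
                                 unsigned int size)
{
    /* Hypothetical helper returning the handler covering [reg, reg + size). */
    const struct vpci_register *r = vpci_find_register(pdev->vpci, reg, size);

    if ( r )
        return r->read(pdev, reg, r->private);

    /* Only the hardware domain keeps raw access to unhandled registers. */
    if ( is_hardware_domain(pdev->domain) )
        return vpci_read_hw(pdev->sbdf, reg, size);

    return ~(uint32_t)0; /* domU reads of unhandled registers see all ones */
}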
On 28.10.21 19:03, Roger Pau Monné wrote: > On Thu, Oct 28, 2021 at 02:23:34PM +0000, Oleksandr Andrushchenko wrote: >> >> On 28.10.21 16:36, Roger Pau Monné wrote: >>> On Thu, Oct 28, 2021 at 12:09:23PM +0000, Oleksandr Andrushchenko wrote: >>>> Hi, Julien! >>>> >>>> On 27.10.21 20:35, Julien Grall wrote: >>>>> Hi Oleksandr, >>>>> >>>>> On 27/10/2021 09:25, Oleksandr Andrushchenko wrote: >>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> >>>>>> >>>>>> While in vPCI MMIO trap handlers for the guest PCI host bridge it is not >>>>>> enough for SBDF translation to simply call VPCI_ECAM_BDF(info->gpa) as >>>>>> the base address may not be aligned in the way that the translation >>>>>> always work. If not adjusted with respect to the base address it may not be >>>>>> able to properly convert SBDF and crashes: >>>>>> >>>>>> (XEN) vpci_mmio_read 0000:65:1a.0 reg 8bc gpa e65d08bc >>>>> I can't find a printk() that may output this message. Where does this comes from? >>>> That was a debug print. I shouldn't have used that in the patch description, but >>>> probably after "---" to better explain what's happening >>>>> Anyway, IIUC the guest physical address is 0xe65d08bc which, if I am not mistaken, doesn't belong to the range advertised for GUEST_VPCI_ECAM. >>>> This is from dom0 I am working on now. >>>>> IMHO, the stack trace should come from usptream Xen or need some information to explain how this was reproduced. >>>>> >>>>>> (XEN) Data Abort Trap. Syndrome=0x6 >>>>>> (XEN) Walking Hypervisor VA 0x467a28bc on CPU0 via TTBR 0x00000000481d5000 >>>>> I can understnad that if we don't substract GUEST_VPCI_ECAM, we would (in theory) not get the correct BDF. But... I don't understand how this would result to a data abort in the hypervisor. >>>>> >>>>> In fact, I think the vPCI code should be resilient enough to not crash if we pass the wrong BDF. >>>> Well, there is no (?) easy way to validate SBDF. And this could be a problem if we have a misbehaving >>>> guest which may force Xen to access the memory beyond that of PCI host bridge >>> How could that be? The ECAM region exposed to the guest you should be >>> the same as the physical one for dom0? >> Ok, I have a Designware PCI hist which has 2 ECAM regions (I am starting to >> implement the driver for it, so I can be wrong here): >> - Root Complex ECAM area ("dbi"), it is something like 0x3000 bytes long >> - "Client" ECAM area ("config") >> So from Dom0 POV we have 2 ECAM regions and for the guest >> we always emulate a single big region: > You need support for multiple ECAM regions. That's how we do it on x86 > PVH dom0. See register_vpci_mmcfg_handler and related machinery. Is it common for a PCI host bridge to have multiple ECAM regions? Currently on Arm we were about to support "pci-host-ecam-generic" [1], e.g. generic ECAM host bridge which normally (?) has a single ECAM region [2]. But the host bridge I want to support has multiple, so strictly speaking it is not the one that we implement. Arm folks, do we want this generalization at this moment to align with x86 with this respect? We can live with the current approach and when I have my driver implemented I can send patches to make that generalization. 
> >> /* >> * 256 MB is reserved for VPCI configuration space based on calculation >> * 256 buses x 32 devices x 8 functions x 4 KB = 256 MB >> */ >> #define GUEST_VPCI_ECAM_BASE xen_mk_ullong(0x10000000) >> #define GUEST_VPCI_ECAM_SIZE xen_mk_ullong(0x10000000) >> >> So, we have the base address and size of the emulated ECAM space >> not connected to the real host bridge >>> And for domUs you really need to fix vpci_{read,write} to not >>> passthrough accesses not explicitly handled. >> Do you mean that we need to validate SBDFs there? >> This can be tricky if we have a use-case when a PCI device being >> passed through if not put at 0000:00:0.0, but requested to be, for >> example, 0000:0d:0.0. So, we need to go over the list of virtual >> devices and see if SBDF the guest is trying to access is a valid SBDF. >> Is this what you mean? > No, you need to prevent accesses to registers not explicitly handled > by vpci. Ie: do not forward unhandled accesses to > vpci_{read,wrie}_hw). I see, so those which have no handlers are not passed to the hardware. I need to see how to do that > > Regards, Roger. Thank you, Oleksandr [1] https://elixir.bootlin.com/linux/v5.15-rc7/source/drivers/pci/controller/pci-host-generic.c [2] https://elixir.bootlin.com/linux/v5.15-rc7/source/drivers/pci/controller/pci-host-common.c#L23
On 28.10.21 18:35, Julien Grall wrote: > Hi, > > On 28/10/2021 16:27, Oleksandr Andrushchenko wrote: >> bridge as private and using info->gpa - GUEST_VPCI_ECAM_BASE... >> So, I would stay with simpler >> >> if ( bridge ) >> { >> sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - bridge->cfg->phys_addr); >> sbdf.seg = bridge->segment; >> } >> else >> sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE); > > I am fine with that so long this is part of an helper (maybe vpci_sbdf_from_addr()) rather than duplicated. > Ok, I will implement a helper then: pci_sbdf_t vpci_sbdf_from_gpa(const struct pci_host_bridge *bridge, paddr_t gpa); > Cheers, > Thank you, Oleksandr
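Based on the snippet and the signature above, the agreed helper would presumably end up as something like the following sketch (final naming and placement may of course differ):

static pci_sbdf_t vpci_sbdf_from_gpa(const struct pci_host_bridge *bridge,
                                     paddr_t gpa)
{
    pci_sbdf_t sbdf;

    if ( bridge )
    {
        /* Dom0: adjust by the bridge's physical ECAM base. */
        sbdf.sbdf = VPCI_ECAM_BDF(gpa - bridge->cfg->phys_addr);
        sbdf.seg = bridge->segment;
    }
    else
        /* Guest: a single emulated ECAM window, segment implicitly 0. */
        sbdf.sbdf = VPCI_ECAM_BDF(gpa - GUEST_VPCI_ECAM_BASE);

    return sbdf;
}

Both vpci_mmio_read() and vpci_mmio_write() could then simply do sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);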
On Thu, Oct 28, 2021 at 05:55:25PM +0000, Oleksandr Andrushchenko wrote: > > > On 28.10.21 19:03, Roger Pau Monné wrote: > > On Thu, Oct 28, 2021 at 02:23:34PM +0000, Oleksandr Andrushchenko wrote: > >> > >> On 28.10.21 16:36, Roger Pau Monné wrote: > >>> On Thu, Oct 28, 2021 at 12:09:23PM +0000, Oleksandr Andrushchenko wrote: > >>>> Hi, Julien! > >>>> > >>>> On 27.10.21 20:35, Julien Grall wrote: > >>>>> Hi Oleksandr, > >>>>> > >>>>> On 27/10/2021 09:25, Oleksandr Andrushchenko wrote: > >>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> > >>>>>> > >>>>>> While in vPCI MMIO trap handlers for the guest PCI host bridge it is not > >>>>>> enough for SBDF translation to simply call VPCI_ECAM_BDF(info->gpa) as > >>>>>> the base address may not be aligned in the way that the translation > >>>>>> always work. If not adjusted with respect to the base address it may not be > >>>>>> able to properly convert SBDF and crashes: > >>>>>> > >>>>>> (XEN) vpci_mmio_read 0000:65:1a.0 reg 8bc gpa e65d08bc > >>>>> I can't find a printk() that may output this message. Where does this comes from? > >>>> That was a debug print. I shouldn't have used that in the patch description, but > >>>> probably after "---" to better explain what's happening > >>>>> Anyway, IIUC the guest physical address is 0xe65d08bc which, if I am not mistaken, doesn't belong to the range advertised for GUEST_VPCI_ECAM. > >>>> This is from dom0 I am working on now. > >>>>> IMHO, the stack trace should come from usptream Xen or need some information to explain how this was reproduced. > >>>>> > >>>>>> (XEN) Data Abort Trap. Syndrome=0x6 > >>>>>> (XEN) Walking Hypervisor VA 0x467a28bc on CPU0 via TTBR 0x00000000481d5000 > >>>>> I can understnad that if we don't substract GUEST_VPCI_ECAM, we would (in theory) not get the correct BDF. But... I don't understand how this would result to a data abort in the hypervisor. > >>>>> > >>>>> In fact, I think the vPCI code should be resilient enough to not crash if we pass the wrong BDF. > >>>> Well, there is no (?) easy way to validate SBDF. And this could be a problem if we have a misbehaving > >>>> guest which may force Xen to access the memory beyond that of PCI host bridge > >>> How could that be? The ECAM region exposed to the guest you should be > >>> the same as the physical one for dom0? > >> Ok, I have a Designware PCI hist which has 2 ECAM regions (I am starting to > >> implement the driver for it, so I can be wrong here): > >> - Root Complex ECAM area ("dbi"), it is something like 0x3000 bytes long > >> - "Client" ECAM area ("config") > >> So from Dom0 POV we have 2 ECAM regions and for the guest > >> we always emulate a single big region: > > You need support for multiple ECAM regions. That's how we do it on x86 > > PVH dom0. See register_vpci_mmcfg_handler and related machinery. > Is it common for a PCI host bridge to have multiple ECAM regions? > Currently on Arm we were about to support "pci-host-ecam-generic" [1], > e.g. generic ECAM host bridge which normally (?) has a single ECAM > region [2]. But the host bridge I want to support has multiple, so > strictly speaking it is not the one that we implement. It's possible on x86 to have multiple ECAM regions, whether that means multiple host bridges, or host bridges having multiple ECAM regions is unknown to me. 
It's all reported in the MCFG ACPI table (see PCI Firmware document for the detailed description of MCFG) using the "Configuration Space Base Address Allocation Structure", and there can be multiple of those structures. > Arm folks, do we want this generalization at this moment to align with x86 > with this respect? > > We can live with the current approach and when I have my driver implemented > I can send patches to make that generalization. > > > >> /* > >> * 256 MB is reserved for VPCI configuration space based on calculation > >> * 256 buses x 32 devices x 8 functions x 4 KB = 256 MB > >> */ > >> #define GUEST_VPCI_ECAM_BASE xen_mk_ullong(0x10000000) > >> #define GUEST_VPCI_ECAM_SIZE xen_mk_ullong(0x10000000) > >> > >> So, we have the base address and size of the emulated ECAM space > >> not connected to the real host bridge > >>> And for domUs you really need to fix vpci_{read,write} to not > >>> passthrough accesses not explicitly handled. > >> Do you mean that we need to validate SBDFs there? > >> This can be tricky if we have a use-case when a PCI device being > >> passed through if not put at 0000:00:0.0, but requested to be, for > >> example, 0000:0d:0.0. So, we need to go over the list of virtual > >> devices and see if SBDF the guest is trying to access is a valid SBDF. > >> Is this what you mean? > > No, you need to prevent accesses to registers not explicitly handled > > by vpci. Ie: do not forward unhandled accesses to > > vpci_{read,wrie}_hw). > I see, so those which have no handlers are not passed to the hardware. > I need to see how to do that Indeed. Without fixing that passthrough to domUs is completely unsafe, as you allow domUs full access to registers not explicitly handled by current vPCI code. Regards, Roger.
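For illustration, multiple ECAM windows on Arm could be wired up roughly as below, loosely mirroring what register_vpci_mmcfg_handler() does per MCFG entry on x86. register_mmio_handler() and the vpci_mmio_{read,write} handlers exist today; the nr_ecam/ecam[] fields in struct pci_host_bridge are assumptions made purely for the sake of the sketch:

static const struct mmio_handler_ops vpci_mmio_handler = {
    .read  = vpci_mmio_read,
    .write = vpci_mmio_write,
};

void domain_vpci_register_bridge(struct domain *d,
                                 struct pci_host_bridge *bridge)
{
    unsigned int i;

    /* One trap handler per ECAM window, with the bridge passed as priv. */
    for ( i = 0; i < bridge->nr_ecam; i++ )
        register_mmio_handler(d, &vpci_mmio_handler,
                              bridge->ecam[i].phys_addr,
                              bridge->ecam[i].size, bridge);
}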
Hi Ian, On 28/10/2021 16:54, Ian Jackson wrote: > Julien Grall writes ("Re: [PATCH] xen/arm: fix SBDF calculation for vPCI MMIO handlers"): >> On 28/10/2021 16:27, Oleksandr Andrushchenko wrote: >>> bridge as private and using info->gpa - GUEST_VPCI_ECAM_BASE... >>> So, I would stay with simpler >>> >>> if ( bridge ) >>> { >>> sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - bridge->cfg->phys_addr); >>> sbdf.seg = bridge->segment; >>> } >>> else >>> sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE); >> >> I am fine with that so long this is part of an helper (maybe >> vpci_sbdf_from_addr()) rather than duplicated. > > There are a number of patches that I'm getting CC'd on related to ARM > and vpci (according to the Subject). Are these targeted for 4.16 ? > Most of them don't have 4.16 Subject tags. Oleksandr wants this patch to be included for 4.16 but forgot to tag it properly. Cheers,
On 29.10.21 10:33, Roger Pau Monné wrote: > On Thu, Oct 28, 2021 at 05:55:25PM +0000, Oleksandr Andrushchenko wrote: >> >> On 28.10.21 19:03, Roger Pau Monné wrote: >>> On Thu, Oct 28, 2021 at 02:23:34PM +0000, Oleksandr Andrushchenko wrote: >>>> On 28.10.21 16:36, Roger Pau Monné wrote: >>>>> On Thu, Oct 28, 2021 at 12:09:23PM +0000, Oleksandr Andrushchenko wrote: >>>>>> Hi, Julien! >>>>>> >>>>>> On 27.10.21 20:35, Julien Grall wrote: >>>>>>> Hi Oleksandr, >>>>>>> >>>>>>> On 27/10/2021 09:25, Oleksandr Andrushchenko wrote: >>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> >>>>>>>> >>>>>>>> While in vPCI MMIO trap handlers for the guest PCI host bridge it is not >>>>>>>> enough for SBDF translation to simply call VPCI_ECAM_BDF(info->gpa) as >>>>>>>> the base address may not be aligned in the way that the translation >>>>>>>> always work. If not adjusted with respect to the base address it may not be >>>>>>>> able to properly convert SBDF and crashes: >>>>>>>> >>>>>>>> (XEN) vpci_mmio_read 0000:65:1a.0 reg 8bc gpa e65d08bc >>>>>>> I can't find a printk() that may output this message. Where does this comes from? >>>>>> That was a debug print. I shouldn't have used that in the patch description, but >>>>>> probably after "---" to better explain what's happening >>>>>>> Anyway, IIUC the guest physical address is 0xe65d08bc which, if I am not mistaken, doesn't belong to the range advertised for GUEST_VPCI_ECAM. >>>>>> This is from dom0 I am working on now. >>>>>>> IMHO, the stack trace should come from usptream Xen or need some information to explain how this was reproduced. >>>>>>> >>>>>>>> (XEN) Data Abort Trap. Syndrome=0x6 >>>>>>>> (XEN) Walking Hypervisor VA 0x467a28bc on CPU0 via TTBR 0x00000000481d5000 >>>>>>> I can understnad that if we don't substract GUEST_VPCI_ECAM, we would (in theory) not get the correct BDF. But... I don't understand how this would result to a data abort in the hypervisor. >>>>>>> >>>>>>> In fact, I think the vPCI code should be resilient enough to not crash if we pass the wrong BDF. >>>>>> Well, there is no (?) easy way to validate SBDF. And this could be a problem if we have a misbehaving >>>>>> guest which may force Xen to access the memory beyond that of PCI host bridge >>>>> How could that be? The ECAM region exposed to the guest you should be >>>>> the same as the physical one for dom0? >>>> Ok, I have a Designware PCI hist which has 2 ECAM regions (I am starting to >>>> implement the driver for it, so I can be wrong here): >>>> - Root Complex ECAM area ("dbi"), it is something like 0x3000 bytes long >>>> - "Client" ECAM area ("config") >>>> So from Dom0 POV we have 2 ECAM regions and for the guest >>>> we always emulate a single big region: >>> You need support for multiple ECAM regions. That's how we do it on x86 >>> PVH dom0. See register_vpci_mmcfg_handler and related machinery. >> Is it common for a PCI host bridge to have multiple ECAM regions? >> Currently on Arm we were about to support "pci-host-ecam-generic" [1], >> e.g. generic ECAM host bridge which normally (?) has a single ECAM >> region [2]. But the host bridge I want to support has multiple, so >> strictly speaking it is not the one that we implement. > It's possible on x86 to have multiple ECAM regions, whether that means > multiple host bridges, or host bridges having multiple ECAM regions is > unknown to me. 
It's all reported in the MCFG ACPI table (see PCI > Firmware document for the detailed description of MCFG) using the > "Configuration Space Base Address Allocation Structure", and there can > be multiple of those structures. As we are currently supporting a generic ECAM host bridge which has a single ECAM region, I think the existing code we have and are about to upstream is ok as is for now. I own a bridge which has 2 ECAM regions, so I will work towards adding its support soon. > >> Arm folks, do we want this generalization at this moment to align with x86 >> with this respect? >> >> We can live with the current approach and when I have my driver implemented >> I can send patches to make that generalization. >>>> /* >>>> * 256 MB is reserved for VPCI configuration space based on calculation >>>> * 256 buses x 32 devices x 8 functions x 4 KB = 256 MB >>>> */ >>>> #define GUEST_VPCI_ECAM_BASE xen_mk_ullong(0x10000000) >>>> #define GUEST_VPCI_ECAM_SIZE xen_mk_ullong(0x10000000) >>>> >>>> So, we have the base address and size of the emulated ECAM space >>>> not connected to the real host bridge >>>>> And for domUs you really need to fix vpci_{read,write} to not >>>>> passthrough accesses not explicitly handled. >>>> Do you mean that we need to validate SBDFs there? >>>> This can be tricky if we have a use-case when a PCI device being >>>> passed through if not put at 0000:00:0.0, but requested to be, for >>>> example, 0000:0d:0.0. So, we need to go over the list of virtual >>>> devices and see if SBDF the guest is trying to access is a valid SBDF. >>>> Is this what you mean? >>> No, you need to prevent accesses to registers not explicitly handled >>> by vpci. Ie: do not forward unhandled accesses to >>> vpci_{read,wrie}_hw). >> I see, so those which have no handlers are not passed to the hardware. >> I need to see how to do that > Indeed. Without fixing that passthrough to domUs is completely unsafe, > as you allow domUs full access to registers not explicitly handled by > current vPCI code. Well, my understanding is: we can let the guest access whatever registers it wants with the following exceptions: - "special" registers we already trap in vPCI, e.g. command, BARs - we must not let the guest go out of the configuration space of a specific PCI device, e.g. prevent it from accessing configuration spaces of other devices. The rest of the accesses seem ok to me, as: - we do not really want to have handlers and emulate all possible registers - we do not want the guest to fail if it accesses a valid register which we do not emulate. > > Regards, Roger. > Thanks, Oleksandr
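The 256 MB figure in the quoted comment follows directly from the ECAM offset layout; a minimal compile-checkable restatement:

#include <assert.h>
#include <stdint.h>

/* ECAM offset layout behind the "256 buses x 32 devices x 8 functions x 4 KB"
 * calculation quoted above. */
#define ECAM_OFFSET(bus, dev, fn, reg) \
    (((uint64_t)(bus) << 20) | ((uint64_t)(dev) << 15) | \
     ((uint64_t)(fn) << 12) | (uint64_t)(reg))

int main(void)
{
    /* The last byte of config space: bus 255, device 31, function 7,
     * register 0xfff sits at offset 0x0fffffff == 256 MB - 1,
     * i.e. exactly GUEST_VPCI_ECAM_SIZE - 1. */
    assert(ECAM_OFFSET(255, 31, 7, 0xfff) == 0x0fffffffULL);
    return 0;
}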
Julien Grall writes ("Re: [PATCH] xen/arm: fix SBDF calculation for vPCI MMIO handlers"): > On 28/10/2021 16:54, Ian Jackson wrote: > > There are a number of patches that I'm getting CC'd on related to ARM > > and vpci (according to the Subject). Are these targeted for 4.16 ? > > Most of them don't have 4.16 Subject tags. > > Oleksandr wants this patch to be included for 4.16 but forgot to tag it > properly. Oh yes. However, 1. I also wrote this: > > I am finding it difficult to see the wood for the trees. > > It would be really helpful if these vpci fixes were collected > > together into a series. Can someone please confirm whether this is the only vpci-related patch that ought to be on my radar for 4.16 ? 2. I have not had a reply to my question on Wednesday in <24953.34635.645112.279110@mariner.uk.xensource.com>: Um, can you explain what the practical impact is of not taking this patch for 4.16 ? As I understand it vpci for ARM is non-functional in 4.16 and this is not expected to change ? So there would be no benefit to users, and taking the patch would add small but nonzero risk ? I need this information to decide whether a release-ack is appropriate. Note that we are in code freeze so all patches, including bugfixes, need my ack. Thanks, Ian.
On Mon, 1 Nov 2021, Ian Jackson wrote: > Julien Grall writes ("Re: [PATCH] xen/arm: fix SBDF calculation for vPCI MMIO handlers"): > > On 28/10/2021 16:54, Ian Jackson wrote: > > > There are a number of patches that I'm getting CC'd on related to ARM > > > and vpci (according to the Subject). Are these targeted for 4.16 ? > > > Most of them don't have 4.16 Subject tags. > > > > Oleksandr wants this patch to be included for 4.16 but forgot to tag it > > properly. > > Oh yes. However, > > 1. I also wrote this: > > > > I am finding it difficult to see the wood for the trees. > > > It would be really helpful if these vpci fixes were collected > > > together into a series. > > Can someone please confirm whether this is the only vpci-related patch > that ought to be on my radar for 4.16 ? > > 2. I have not had a reply to my question on Wednesday in > <24953.34635.645112.279110@mariner.uk.xensource.com>: > > Um, can you explain what the practical impact is of not taking this > patch for 4.16 ? As I understand it vpci for ARM is non-functional in > 4.16 and this is not expected to change ? So there would be no > benefit to users, and taking the patch would add small but nonzero > risk ? > > I need this information to decide whether a release-ack is > appropriate. > > Note that we are in code freeze so all patches, including bugfixes, > need my ack. Hi Ian, This patch [1] is a straightforward 2 lines fix for vpci on ARM. There is no risk for the release as the source file affected only builds when CONFIG_HAS_VPCI is enabled, and it is currently disabled on ARM. At the same time, as we know vpci is not complete in 4.16 anyway, so the counter argument is that we don't need to fix it. Given how trivial the fix is, and that it cannot break the build or runtime, I would take it. Cheers, Stefano [1] https://marc.info/?l=xen-devel&m=163532307715435
On 01.11.21 23:06, Stefano Stabellini wrote: > On Mon, 1 Nov 2021, Ian Jackson wrote: >> Julien Grall writes ("Re: [PATCH] xen/arm: fix SBDF calculation for vPCI MMIO handlers"): >>> On 28/10/2021 16:54, Ian Jackson wrote: >>>> There are a number of patches that I'm getting CC'd on related to ARM >>>> and vpci (according to the Subject). Are these targeted for 4.16 ? >>>> Most of them don't have 4.16 Subject tags. >>> Oleksandr wants this patch to be included for 4.16 but forgot to tag it >>> properly. >> Oh yes. However, >> >> 1. I also wrote this: >> >>>> I am finding it difficult to see the wood for the trees. >>>> It would be really helpful if these vpci fixes were collected >>>> together into a series. >> Can someone please confirm whether this is the only vpci-related patch >> that ought to be on my radar for 4.16 ? >> >> 2. I have not had a reply to my question on Wednesday in >> <24953.34635.645112.279110@mariner.uk.xensource.com>: >> >> Um, can you explain what the practical impact is of not taking this >> patch for 4.16 ? As I understand it vpci for ARM is non-functional in >> 4.16 and this is not expected to change ? So there would be no >> benefit to users, and taking the patch would add small but nonzero >> risk ? >> >> I need this information to decide whether a release-ack is >> appropriate. >> >> Note that we are in code freeze so all patches, including bugfixes, >> need my ack. > Hi Ian, > > This patch [1] is a straightforward 2 lines fix for vpci on ARM. There > is no risk for the release as the source file affected only builds when > CONFIG_HAS_VPCI is enabled, and it is currently disabled on ARM. > > At the same time, as we know vpci is not complete in 4.16 anyway, so the > counter argument is that we don't need to fix it. > > Given how trivial the fix is, and that it cannot break the build or > runtime, I would take it. Thank you, I can re-send the patch with the updated commit message (Julien), but I still have no R-b's for the patch, so not sure if it is worth it > > Cheers, > > Stefano > > > [1] https://marc.info/?l=xen-devel&m=163532307715435 >
Hi Oleksandr, On 2021/11/1 14:14, Oleksandr Andrushchenko wrote: > > > On 29.10.21 10:33, Roger Pau Monné wrote: >> On Thu, Oct 28, 2021 at 05:55:25PM +0000, Oleksandr Andrushchenko wrote: >>> >>> On 28.10.21 19:03, Roger Pau Monné wrote: >>>> On Thu, Oct 28, 2021 at 02:23:34PM +0000, Oleksandr Andrushchenko wrote: >>>>> On 28.10.21 16:36, Roger Pau Monné wrote: >>>>>> On Thu, Oct 28, 2021 at 12:09:23PM +0000, Oleksandr Andrushchenko wrote: >>>>>>> Hi, Julien! >>>>>>> >>>>>>> On 27.10.21 20:35, Julien Grall wrote: >>>>>>>> Hi Oleksandr, >>>>>>>> >>>>>>>> On 27/10/2021 09:25, Oleksandr Andrushchenko wrote: >>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> >>>>>>>>> >>>>>>>>> While in vPCI MMIO trap handlers for the guest PCI host bridge it is not >>>>>>>>> enough for SBDF translation to simply call VPCI_ECAM_BDF(info->gpa) as >>>>>>>>> the base address may not be aligned in the way that the translation >>>>>>>>> always work. If not adjusted with respect to the base address it may not be >>>>>>>>> able to properly convert SBDF and crashes: >>>>>>>>> >>>>>>>>> (XEN) vpci_mmio_read 0000:65:1a.0 reg 8bc gpa e65d08bc >>>>>>>> I can't find a printk() that may output this message. Where does this comes from? >>>>>>> That was a debug print. I shouldn't have used that in the patch description, but >>>>>>> probably after "---" to better explain what's happening >>>>>>>> Anyway, IIUC the guest physical address is 0xe65d08bc which, if I am not mistaken, doesn't belong to the range advertised for GUEST_VPCI_ECAM. >>>>>>> This is from dom0 I am working on now. >>>>>>>> IMHO, the stack trace should come from usptream Xen or need some information to explain how this was reproduced. >>>>>>>> >>>>>>>>> (XEN) Data Abort Trap. Syndrome=0x6 >>>>>>>>> (XEN) Walking Hypervisor VA 0x467a28bc on CPU0 via TTBR 0x00000000481d5000 >>>>>>>> I can understnad that if we don't substract GUEST_VPCI_ECAM, we would (in theory) not get the correct BDF. But... I don't understand how this would result to a data abort in the hypervisor. >>>>>>>> >>>>>>>> In fact, I think the vPCI code should be resilient enough to not crash if we pass the wrong BDF. >>>>>>> Well, there is no (?) easy way to validate SBDF. And this could be a problem if we have a misbehaving >>>>>>> guest which may force Xen to access the memory beyond that of PCI host bridge >>>>>> How could that be? The ECAM region exposed to the guest you should be >>>>>> the same as the physical one for dom0? >>>>> Ok, I have a Designware PCI hist which has 2 ECAM regions (I am starting to >>>>> implement the driver for it, so I can be wrong here): >>>>> - Root Complex ECAM area ("dbi"), it is something like 0x3000 bytes long >>>>> - "Client" ECAM area ("config") >>>>> So from Dom0 POV we have 2 ECAM regions and for the guest >>>>> we always emulate a single big region: >>>> You need support for multiple ECAM regions. That's how we do it on x86 >>>> PVH dom0. See register_vpci_mmcfg_handler and related machinery. >>> Is it common for a PCI host bridge to have multiple ECAM regions? >>> Currently on Arm we were about to support "pci-host-ecam-generic" [1], >>> e.g. generic ECAM host bridge which normally (?) has a single ECAM >>> region [2]. But the host bridge I want to support has multiple, so >>> strictly speaking it is not the one that we implement. >> It's possible on x86 to have multiple ECAM regions, whether that means >> multiple host bridges, or host bridges having multiple ECAM regions is >> unknown to me. 
It's all reported in the MCFG ACPI table (see PCI >> Firmware document for the detailed description of MCFG) using the >> "Configuration Space Base Address Allocation Structure", and there can >> be multiple of those structures. > As we are currently supporting generic ECAM host bridge which > has a single ECAM region I think the existing code we have and > about to upstream is ok as is for now. > I own a bridge which has 2 ECAM regions, so I will work towards > adding its support soon. >> >>> Arm folks, do we want this generalization at this moment to align with x86 >>> with this respect? >>> >>> We can live with the current approach and when I have my driver implemented >>> I can send patches to make that generalization. >>>>> /* >>>>> * 256 MB is reserved for VPCI configuration space based on calculation >>>>> * 256 buses x 32 devices x 8 functions x 4 KB = 256 MB >>>>> */ >>>>> #define GUEST_VPCI_ECAM_BASE xen_mk_ullong(0x10000000) >>>>> #define GUEST_VPCI_ECAM_SIZE xen_mk_ullong(0x10000000) >>>>> >>>>> So, we have the base address and size of the emulated ECAM space >>>>> not connected to the real host bridge >>>>>> And for domUs you really need to fix vpci_{read,write} to not >>>>>> passthrough accesses not explicitly handled. >>>>> Do you mean that we need to validate SBDFs there? >>>>> This can be tricky if we have a use-case when a PCI device being >>>>> passed through if not put at 0000:00:0.0, but requested to be, for >>>>> example, 0000:0d:0.0. So, we need to go over the list of virtual >>>>> devices and see if SBDF the guest is trying to access is a valid SBDF. >>>>> Is this what you mean? >>>> No, you need to prevent accesses to registers not explicitly handled >>>> by vpci. Ie: do not forward unhandled accesses to >>>> vpci_{read,wrie}_hw). >>> I see, so those which have no handlers are not passed to the hardware. >>> I need to see how to do that >> Indeed. Without fixing that passthrough to domUs is completely unsafe, >> as you allow domUs full access to registers not explicitly handled by >> current vPCI code. > Well, my understanding is: we can let the guest access whatever > registers it wants with the following exceptions: > - "special" registers we already trap in vPCI, e.g. command, BARs > - we must not let the guest go out of the configuration space of a > specific PCI device, e.g. prevent it from accessing configuration > spaces of other devices. > The rest accesses seem to be ok to me as we do not really want: > - have handlers and emulate all possible registers > - we do not want the guest to fail if it accesses a valid register which > we do not emulate. I am trying to review your patch; please point out if there is anything wrong. IIUC, vPCI only emulates some registers and forwards unhandled accesses to the physical device configuration space (if the accesses pass validation)? Does that make the context inconsistent in the physical device's configuration space? For example, one register in the physical device config space is related to another register, but we just emulate only one in vPCI? >> >> Regards, Roger. >> > Thanks, > Oleksandr >
Hi, On 02.11.21 09:37, Wei Chen wrote: > Hi Oleksandr, > > On 2021/11/1 14:14, Oleksandr Andrushchenko wrote: >> >> >> On 29.10.21 10:33, Roger Pau Monné wrote: >>> On Thu, Oct 28, 2021 at 05:55:25PM +0000, Oleksandr Andrushchenko wrote: >>>> >>>> On 28.10.21 19:03, Roger Pau Monné wrote: >>>>> On Thu, Oct 28, 2021 at 02:23:34PM +0000, Oleksandr Andrushchenko wrote: >>>>>> On 28.10.21 16:36, Roger Pau Monné wrote: >>>>>>> On Thu, Oct 28, 2021 at 12:09:23PM +0000, Oleksandr Andrushchenko wrote: >>>>>>>> Hi, Julien! >>>>>>>> >>>>>>>> On 27.10.21 20:35, Julien Grall wrote: >>>>>>>>> Hi Oleksandr, >>>>>>>>> >>>>>>>>> On 27/10/2021 09:25, Oleksandr Andrushchenko wrote: >>>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> >>>>>>>>>> >>>>>>>>>> While in vPCI MMIO trap handlers for the guest PCI host bridge it is not >>>>>>>>>> enough for SBDF translation to simply call VPCI_ECAM_BDF(info->gpa) as >>>>>>>>>> the base address may not be aligned in the way that the translation >>>>>>>>>> always work. If not adjusted with respect to the base address it may not be >>>>>>>>>> able to properly convert SBDF and crashes: >>>>>>>>>> >>>>>>>>>> (XEN) vpci_mmio_read 0000:65:1a.0 reg 8bc gpa e65d08bc >>>>>>>>> I can't find a printk() that may output this message. Where does this comes from? >>>>>>>> That was a debug print. I shouldn't have used that in the patch description, but >>>>>>>> probably after "---" to better explain what's happening >>>>>>>>> Anyway, IIUC the guest physical address is 0xe65d08bc which, if I am not mistaken, doesn't belong to the range advertised for GUEST_VPCI_ECAM. >>>>>>>> This is from dom0 I am working on now. >>>>>>>>> IMHO, the stack trace should come from usptream Xen or need some information to explain how this was reproduced. >>>>>>>>> >>>>>>>>>> (XEN) Data Abort Trap. Syndrome=0x6 >>>>>>>>>> (XEN) Walking Hypervisor VA 0x467a28bc on CPU0 via TTBR 0x00000000481d5000 >>>>>>>>> I can understnad that if we don't substract GUEST_VPCI_ECAM, we would (in theory) not get the correct BDF. But... I don't understand how this would result to a data abort in the hypervisor. >>>>>>>>> >>>>>>>>> In fact, I think the vPCI code should be resilient enough to not crash if we pass the wrong BDF. >>>>>>>> Well, there is no (?) easy way to validate SBDF. And this could be a problem if we have a misbehaving >>>>>>>> guest which may force Xen to access the memory beyond that of PCI host bridge >>>>>>> How could that be? The ECAM region exposed to the guest you should be >>>>>>> the same as the physical one for dom0? >>>>>> Ok, I have a Designware PCI hist which has 2 ECAM regions (I am starting to >>>>>> implement the driver for it, so I can be wrong here): >>>>>> - Root Complex ECAM area ("dbi"), it is something like 0x3000 bytes long >>>>>> - "Client" ECAM area ("config") >>>>>> So from Dom0 POV we have 2 ECAM regions and for the guest >>>>>> we always emulate a single big region: >>>>> You need support for multiple ECAM regions. That's how we do it on x86 >>>>> PVH dom0. See register_vpci_mmcfg_handler and related machinery. >>>> Is it common for a PCI host bridge to have multiple ECAM regions? >>>> Currently on Arm we were about to support "pci-host-ecam-generic" [1], >>>> e.g. generic ECAM host bridge which normally (?) has a single ECAM >>>> region [2]. But the host bridge I want to support has multiple, so >>>> strictly speaking it is not the one that we implement. 
>>> It's possible on x86 to have multiple ECAM regions, whether that means >>> multiple host bridges, or host bridges having multiple ECAM regions is >>> unknown to me. It's all reported in the MCFG ACPI table (see PCI >>> Firmware document for the detailed description of MCFG) using the >>> "Configuration Space Base Address Allocation Structure", and there can >>> be multiple of those structures. >> As we are currently supporting generic ECAM host bridge which >> has a single ECAM region I think the existing code we have and >> about to upstream is ok as is for now. >> I own a bridge which has 2 ECAM regions, so I will work towards >> adding its support soon. >>> >>>> Arm folks, do we want this generalization at this moment to align with x86 >>>> with this respect? >>>> >>>> We can live with the current approach and when I have my driver implemented >>>> I can send patches to make that generalization. >>>>>> /* >>>>>> * 256 MB is reserved for VPCI configuration space based on calculation >>>>>> * 256 buses x 32 devices x 8 functions x 4 KB = 256 MB >>>>>> */ >>>>>> #define GUEST_VPCI_ECAM_BASE xen_mk_ullong(0x10000000) >>>>>> #define GUEST_VPCI_ECAM_SIZE xen_mk_ullong(0x10000000) >>>>>> >>>>>> So, we have the base address and size of the emulated ECAM space >>>>>> not connected to the real host bridge >>>>>>> And for domUs you really need to fix vpci_{read,write} to not >>>>>>> passthrough accesses not explicitly handled. >>>>>> Do you mean that we need to validate SBDFs there? >>>>>> This can be tricky if we have a use-case when a PCI device being >>>>>> passed through if not put at 0000:00:0.0, but requested to be, for >>>>>> example, 0000:0d:0.0. So, we need to go over the list of virtual >>>>>> devices and see if SBDF the guest is trying to access is a valid SBDF. >>>>>> Is this what you mean? >>>>> No, you need to prevent accesses to registers not explicitly handled >>>>> by vpci. Ie: do not forward unhandled accesses to >>>>> vpci_{read,wrie}_hw). >>>> I see, so those which have no handlers are not passed to the hardware. >>>> I need to see how to do that >>> Indeed. Without fixing that passthrough to domUs is completely unsafe, >>> as you allow domUs full access to registers not explicitly handled by >>> current vPCI code. >> Well, my understanding is: we can let the guest access whatever >> registers it wants with the following exceptions: >> - "special" registers we already trap in vPCI, e.g. command, BARs >> - we must not let the guest go out of the configuration space of a >> specific PCI device, e.g. prevent it from accessing configuration >> spaces of other devices. >> The rest accesses seem to be ok to me as we do not really want: >> - have handlers and emulate all possible registers >> - we do not want the guest to fail if it accesses a valid register which >> we do not emulate. > > I am tring to review your patch, please point out if there is anything > wrong. IIUC, vPCI only emulates some registers, and forward unhandled > accesses to physical device configuration space (if the accesses passed the validate.)? Right > Does that make the context inconsistent in physical device's configuration space? It is always consistent for the hardware domain and some parts of it are emulated for guests > For example, one register in physical device > config space is related to another register. But we just emulate > only one in vPCI? So, we trap for all domains and emulate for guests, e.g. hardware domain's view on the registers is consistent. 
For guests we emulate: - PCI_COMMAND - not to allow INTx as we do not support that on Arm - BARs - we emulate guest's view on these according to the memory spaces of the emulated host bridge, so the real BARs still have physical values, but guests see emulated ones Hope this helps > > >>> >>> Regards, Roger. >>> >> Thanks, >> Oleksandr >>
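A sketch of how such per-register emulation is typically hooked up, modelled on what xen/drivers/vpci/header.c does with vpci_add_register(); the guest_* handler names are assumptions standing in for the Arm-specific handlers described above:

static int init_guest_header(struct pci_dev *pdev)
{
    int rc;

    /* Trap PCI_COMMAND so that e.g. INTx cannot be enabled by the guest. */
    rc = vpci_add_register(pdev->vpci, guest_cmd_read, guest_cmd_write,
                           PCI_COMMAND, 2, pdev);
    if ( rc )
        return rc;

    /* BAR0: the guest sees addresses within the emulated bridge's windows,
     * while the physical BAR keeps its real value. */
    return vpci_add_register(pdev->vpci, guest_bar_read, guest_bar_write,
                             PCI_BASE_ADDRESS_0, 4, pdev);
}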
Hi Oleksandr, > -----Original Message----- > From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@epam.com> > Sent: 2021年11月2日 15:47 > To: Wei Chen <Wei.Chen@arm.com>; Roger Pau Monné <roger.pau@citrix.com> > Cc: Julien Grall <julien@xen.org>; Bertrand Marquis > <Bertrand.Marquis@arm.com>; sstabellini@kernel.org; Rahul Singh > <Rahul.Singh@arm.com>; xen-devel@lists.xenproject.org > Subject: Re: [PATCH] xen/arm: fix SBDF calculation for vPCI MMIO handlers > > Hi, > > On 02.11.21 09:37, Wei Chen wrote: > > Hi Oleksandr, > > > > On 2021/11/1 14:14, Oleksandr Andrushchenko wrote: > >> > >> > >> On 29.10.21 10:33, Roger Pau Monné wrote: > >>> On Thu, Oct 28, 2021 at 05:55:25PM +0000, Oleksandr Andrushchenko > wrote: > >>>> > >>>> On 28.10.21 19:03, Roger Pau Monné wrote: > >>>>> On Thu, Oct 28, 2021 at 02:23:34PM +0000, Oleksandr Andrushchenko > wrote: > >>>>>> On 28.10.21 16:36, Roger Pau Monné wrote: > >>>>>>> On Thu, Oct 28, 2021 at 12:09:23PM +0000, Oleksandr Andrushchenko > wrote: > >>>>>>>> Hi, Julien! > >>>>>>>> > >>>>>>>> On 27.10.21 20:35, Julien Grall wrote: > >>>>>>>>> Hi Oleksandr, > >>>>>>>>> > >>>>>>>>> On 27/10/2021 09:25, Oleksandr Andrushchenko wrote: > >>>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> > >>>>>>>>>> > >>>>>>>>>> While in vPCI MMIO trap handlers for the guest PCI host bridge > it is not > >>>>>>>>>> enough for SBDF translation to simply call VPCI_ECAM_BDF(info- > >gpa) as > >>>>>>>>>> the base address may not be aligned in the way that the > translation > >>>>>>>>>> always work. If not adjusted with respect to the base address > it may not be > >>>>>>>>>> able to properly convert SBDF and crashes: > >>>>>>>>>> > >>>>>>>>>> (XEN) vpci_mmio_read 0000:65:1a.0 reg 8bc gpa e65d08bc > >>>>>>>>> I can't find a printk() that may output this message. Where does > this comes from? > >>>>>>>> That was a debug print. I shouldn't have used that in the patch > description, but > >>>>>>>> probably after "---" to better explain what's happening > >>>>>>>>> Anyway, IIUC the guest physical address is 0xe65d08bc which, if > I am not mistaken, doesn't belong to the range advertised for > GUEST_VPCI_ECAM. > >>>>>>>> This is from dom0 I am working on now. > >>>>>>>>> IMHO, the stack trace should come from usptream Xen or need some > information to explain how this was reproduced. > >>>>>>>>> > >>>>>>>>>> (XEN) Data Abort Trap. Syndrome=0x6 > >>>>>>>>>> (XEN) Walking Hypervisor VA 0x467a28bc on CPU0 via TTBR > 0x00000000481d5000 > >>>>>>>>> I can understnad that if we don't substract GUEST_VPCI_ECAM, we > would (in theory) not get the correct BDF. But... I don't understand how > this would result to a data abort in the hypervisor. > >>>>>>>>> > >>>>>>>>> In fact, I think the vPCI code should be resilient enough to not > crash if we pass the wrong BDF. > >>>>>>>> Well, there is no (?) easy way to validate SBDF. And this could > be a problem if we have a misbehaving > >>>>>>>> guest which may force Xen to access the memory beyond that of PCI > host bridge > >>>>>>> How could that be? The ECAM region exposed to the guest you should > be > >>>>>>> the same as the physical one for dom0? 
> >>>>>> Ok, I have a Designware PCI hist which has 2 ECAM regions (I am > starting to > >>>>>> implement the driver for it, so I can be wrong here): > >>>>>> - Root Complex ECAM area ("dbi"), it is something like 0x3000 bytes > long > >>>>>> - "Client" ECAM area ("config") > >>>>>> So from Dom0 POV we have 2 ECAM regions and for the guest > >>>>>> we always emulate a single big region: > >>>>> You need support for multiple ECAM regions. That's how we do it on > x86 > >>>>> PVH dom0. See register_vpci_mmcfg_handler and related machinery. > >>>> Is it common for a PCI host bridge to have multiple ECAM regions? > >>>> Currently on Arm we were about to support "pci-host-ecam-generic" [1], > >>>> e.g. generic ECAM host bridge which normally (?) has a single ECAM > >>>> region [2]. But the host bridge I want to support has multiple, so > >>>> strictly speaking it is not the one that we implement. > >>> It's possible on x86 to have multiple ECAM regions, whether that means > >>> multiple host bridges, or host bridges having multiple ECAM regions is > >>> unknown to me. It's all reported in the MCFG ACPI table (see PCI > >>> Firmware document for the detailed description of MCFG) using the > >>> "Configuration Space Base Address Allocation Structure", and there can > >>> be multiple of those structures. > >> As we are currently supporting generic ECAM host bridge which > >> has a single ECAM region I think the existing code we have and > >> about to upstream is ok as is for now. > >> I own a bridge which has 2 ECAM regions, so I will work towards > >> adding its support soon. > >>> > >>>> Arm folks, do we want this generalization at this moment to align > with x86 > >>>> with this respect? > >>>> > >>>> We can live with the current approach and when I have my driver > implemented > >>>> I can send patches to make that generalization. > >>>>>> /* > >>>>>> * 256 MB is reserved for VPCI configuration space based on > calculation > >>>>>> * 256 buses x 32 devices x 8 functions x 4 KB = 256 MB > >>>>>> */ > >>>>>> #define GUEST_VPCI_ECAM_BASE xen_mk_ullong(0x10000000) > >>>>>> #define GUEST_VPCI_ECAM_SIZE xen_mk_ullong(0x10000000) > >>>>>> > >>>>>> So, we have the base address and size of the emulated ECAM space > >>>>>> not connected to the real host bridge > >>>>>>> And for domUs you really need to fix vpci_{read,write} to not > >>>>>>> passthrough accesses not explicitly handled. > >>>>>> Do you mean that we need to validate SBDFs there? > >>>>>> This can be tricky if we have a use-case when a PCI device being > >>>>>> passed through if not put at 0000:00:0.0, but requested to be, for > >>>>>> example, 0000:0d:0.0. So, we need to go over the list of virtual > >>>>>> devices and see if SBDF the guest is trying to access is a valid > SBDF. > >>>>>> Is this what you mean? > >>>>> No, you need to prevent accesses to registers not explicitly handled > >>>>> by vpci. Ie: do not forward unhandled accesses to > >>>>> vpci_{read,wrie}_hw). > >>>> I see, so those which have no handlers are not passed to the hardware. > >>>> I need to see how to do that > >>> Indeed. Without fixing that passthrough to domUs is completely unsafe, > >>> as you allow domUs full access to registers not explicitly handled by > >>> current vPCI code. > >> Well, my understanding is: we can let the guest access whatever > >> registers it wants with the following exceptions: > >> - "special" registers we already trap in vPCI, e.g. 
command, BARs > >> - we must not let the guest go out of the configuration space of a > >> specific PCI device, e.g. prevent it from accessing configuration > >> spaces of other devices. > >> The rest accesses seem to be ok to me as we do not really want: > >> - have handlers and emulate all possible registers > >> - we do not want the guest to fail if it accesses a valid register > which > >> we do not emulate. > > > > I am tring to review your patch, please point out if there is anything > > wrong. IIUC, vPCI only emulates some registers, and forward unhandled > > accesses to physical device configuration space (if the accesses passed > the validate.)? > Right > > Does that make the context inconsistent in physical device's > configuration space? > It is always consistent for the hardware domain and some parts of it are > emulated > for guests > > For example, one register in physical device > > config space is related to another register. But we just emulate > > only one in vPCI? > So, we trap for all domains and emulate for guests, e.g. hardware domain's > view on the > registers is consistent. For guests we emulate: > - PCI_COMMAND - not to allow INTx as we do not support that on Arm > - BARs - we emulate guest's view on these according to the memory spaces > of the emulated host bridge, so the real BARs still have physical values, > but > guests see emulated ones > > Hope this helps Thanks, it's very helpful! > > > > > >>> > >>> Regards, Roger. > >>> > >> Thanks, > >> Oleksandr > >>
On Mon, Nov 01, 2021 at 06:14:40AM +0000, Oleksandr Andrushchenko wrote: > > > On 29.10.21 10:33, Roger Pau Monné wrote: > > On Thu, Oct 28, 2021 at 05:55:25PM +0000, Oleksandr Andrushchenko wrote: > >> > >> On 28.10.21 19:03, Roger Pau Monné wrote: > >>> On Thu, Oct 28, 2021 at 02:23:34PM +0000, Oleksandr Andrushchenko wrote: > >>>> On 28.10.21 16:36, Roger Pau Monné wrote: > >>>>> And for domUs you really need to fix vpci_{read,write} to not > >>>>> passthrough accesses not explicitly handled. > >>>> Do you mean that we need to validate SBDFs there? > >>>> This can be tricky if we have a use-case when a PCI device being > >>>> passed through if not put at 0000:00:0.0, but requested to be, for > >>>> example, 0000:0d:0.0. So, we need to go over the list of virtual > >>>> devices and see if SBDF the guest is trying to access is a valid SBDF. > >>>> Is this what you mean? > >>> No, you need to prevent accesses to registers not explicitly handled > >>> by vpci. Ie: do not forward unhandled accesses to > >>> vpci_{read,wrie}_hw). > >> I see, so those which have no handlers are not passed to the hardware. > >> I need to see how to do that > > Indeed. Without fixing that passthrough to domUs is completely unsafe, > > as you allow domUs full access to registers not explicitly handled by > > current vPCI code. > Well, my understanding is: we can let the guest access whatever > registers it wants with the following exceptions: > - "special" registers we already trap in vPCI, e.g. command, BARs > - we must not let the guest go out of the configuration space of a > specific PCI device, e.g. prevent it from accessing configuration > spaces of other devices. > The rest accesses seem to be ok to me as we do not really want: > - have handlers and emulate all possible registers > - we do not want the guest to fail if it accesses a valid register which > we do not emulate. IMO that's not good from a security PoV. Xen needs to be sure that every register a guest accesses is not going to cause the system to malfunction, so Xen needs to keep a list of the registers it's safe for a guest to access. For example, we should only expose the PCI capabilities that we know are safe for a guest to use, ie: MSI and MSI-X initially. The rest of the capabilities should be blocked from guest access, unless we audit them and declare them safe for a guest to access. As a reference you might want to look at the approach currently used by QEMU in order to do PCI passthrough. A very limited set of PCI capabilities known to be safe for untrusted access is exposed to the guest, and registers need to be explicitly handled or else access is rejected. We need a fairly similar model in vPCI or else none of this will be safe for unprivileged access. Regards, Roger.
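For comparison, the QEMU-style model Roger mentions boils down to an allowlist walk of the capability chain; a sketch (illustrative only, not existing vPCI code) using Xen's pci_conf_read8() and the standard PCI_CAP_* constants:

static bool cap_allowed(uint8_t id)
{
    /* Only capabilities audited as safe for unprivileged use. */
    return id == PCI_CAP_ID_MSI || id == PCI_CAP_ID_MSIX;
}

/* Walk the chain starting from pci_conf_read8(sbdf, PCI_CAPABILITY_LIST)
 * and return the next guest-visible capability, or 0 to end the list. */
static uint8_t next_allowed_cap(pci_sbdf_t sbdf, uint8_t pos)
{
    while ( pos )
    {
        uint8_t id = pci_conf_read8(sbdf, pos + PCI_CAP_LIST_ID);

        if ( cap_allowed(id) )
            return pos;
        /* Low two bits of the next pointer are reserved, mask them off. */
        pos = pci_conf_read8(sbdf, pos + PCI_CAP_LIST_NEXT) & ~3;
    }

    return 0; /* terminate the guest-visible capability list */
}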
On 02.11.21 10:48, Roger Pau Monné wrote: > On Mon, Nov 01, 2021 at 06:14:40AM +0000, Oleksandr Andrushchenko wrote: >> >> On 29.10.21 10:33, Roger Pau Monné wrote: >>> On Thu, Oct 28, 2021 at 05:55:25PM +0000, Oleksandr Andrushchenko wrote: >>>> On 28.10.21 19:03, Roger Pau Monné wrote: >>>>> On Thu, Oct 28, 2021 at 02:23:34PM +0000, Oleksandr Andrushchenko wrote: >>>>>> On 28.10.21 16:36, Roger Pau Monné wrote: >>>>>>> And for domUs you really need to fix vpci_{read,write} to not >>>>>>> passthrough accesses not explicitly handled. >>>>>> Do you mean that we need to validate SBDFs there? >>>>>> This can be tricky if we have a use-case when a PCI device being >>>>>> passed through if not put at 0000:00:0.0, but requested to be, for >>>>>> example, 0000:0d:0.0. So, we need to go over the list of virtual >>>>>> devices and see if SBDF the guest is trying to access is a valid SBDF. >>>>>> Is this what you mean? >>>>> No, you need to prevent accesses to registers not explicitly handled >>>>> by vpci. Ie: do not forward unhandled accesses to >>>>> vpci_{read,wrie}_hw). >>>> I see, so those which have no handlers are not passed to the hardware. >>>> I need to see how to do that >>> Indeed. Without fixing that passthrough to domUs is completely unsafe, >>> as you allow domUs full access to registers not explicitly handled by >>> current vPCI code. >> Well, my understanding is: we can let the guest access whatever >> registers it wants with the following exceptions: >> - "special" registers we already trap in vPCI, e.g. command, BARs >> - we must not let the guest go out of the configuration space of a >> specific PCI device, e.g. prevent it from accessing configuration >> spaces of other devices. >> The rest accesses seem to be ok to me as we do not really want: >> - have handlers and emulate all possible registers >> - we do not want the guest to fail if it accesses a valid register which >> we do not emulate. > IMO that's not good from a security PoV. Xen needs to be sure that > every registers a guest accesses is not going to cause the system to > malfunction, so Xen needs to keep a list of the registers it's safe > for a guest to access. > > For example we should only expose the PCI capabilities that we know > are safe for a guest to use, ie: MSI and MSI-X initially. The rest of > the capabilities should be blocked from guest access, unless we audit > them and declare safe for a guest to access. > > As a reference you might want to look at the approach currently used > by QEMU in order to do PCI passthrough. A very limited set of PCI > capabilities known to be safe for untrusted access are exposed to the > guest, and registers need to be explicitly handled or else access is > rejected. We need a fairly similar model in vPCI or else none of this > will be safe for unprivileged access. I do agree with this. But at the moment we only emulate some of them, so in the future we will need to revisit the emulation and put many more registers under Xen's control. > > Regards, Roger.
On Tue, Nov 02, 2021 at 09:07:56AM +0000, Oleksandr Andrushchenko wrote: > > > On 02.11.21 10:48, Roger Pau Monné wrote: > > On Mon, Nov 01, 2021 at 06:14:40AM +0000, Oleksandr Andrushchenko wrote: > >> > >> On 29.10.21 10:33, Roger Pau Monné wrote: > >>> On Thu, Oct 28, 2021 at 05:55:25PM +0000, Oleksandr Andrushchenko wrote: > >>>> On 28.10.21 19:03, Roger Pau Monné wrote: > >>>>> On Thu, Oct 28, 2021 at 02:23:34PM +0000, Oleksandr Andrushchenko wrote: > >>>>>> On 28.10.21 16:36, Roger Pau Monné wrote: > >>>>>>> And for domUs you really need to fix vpci_{read,write} to not > >>>>>>> passthrough accesses not explicitly handled. > >>>>>> Do you mean that we need to validate SBDFs there? > >>>>>> This can be tricky if we have a use-case when a PCI device being > >>>>>> passed through if not put at 0000:00:0.0, but requested to be, for > >>>>>> example, 0000:0d:0.0. So, we need to go over the list of virtual > >>>>>> devices and see if SBDF the guest is trying to access is a valid SBDF. > >>>>>> Is this what you mean? > >>>>> No, you need to prevent accesses to registers not explicitly handled > >>>>> by vpci. Ie: do not forward unhandled accesses to > >>>>> vpci_{read,wrie}_hw). > >>>> I see, so those which have no handlers are not passed to the hardware. > >>>> I need to see how to do that > >>> Indeed. Without fixing that passthrough to domUs is completely unsafe, > >>> as you allow domUs full access to registers not explicitly handled by > >>> current vPCI code. > >> Well, my understanding is: we can let the guest access whatever > >> registers it wants with the following exceptions: > >> - "special" registers we already trap in vPCI, e.g. command, BARs > >> - we must not let the guest go out of the configuration space of a > >> specific PCI device, e.g. prevent it from accessing configuration > >> spaces of other devices. > >> The rest accesses seem to be ok to me as we do not really want: > >> - have handlers and emulate all possible registers > >> - we do not want the guest to fail if it accesses a valid register which > >> we do not emulate. > > IMO that's not good from a security PoV. Xen needs to be sure that > > every registers a guest accesses is not going to cause the system to > > malfunction, so Xen needs to keep a list of the registers it's safe > > for a guest to access. > > > > For example we should only expose the PCI capabilities that we know > > are safe for a guest to use, ie: MSI and MSI-X initially. The rest of > > the capabilities should be blocked from guest access, unless we audit > > them and declare safe for a guest to access. > > > > As a reference you might want to look at the approach currently used > > by QEMU in order to do PCI passthrough. A very limited set of PCI > > capabilities known to be safe for untrusted access are exposed to the > > guest, and registers need to be explicitly handled or else access is > > rejected. We need a fairly similar model in vPCI or else none of this > > will be safe for unprivileged access. > I do agree with this. But at the moment we only emulate some of them, > so in the future we will need revisiting the emulation and put many > more registers under Xen's control Indeed. That's my main point - there's still a lot of work to do internally in vPCI in order to be safe for unprivileged guests to use. Thanks, Roger.
Hi Oleksandr,

On 02/11/2021 07:16, Oleksandr Andrushchenko wrote:
>
>
> On 01.11.21 23:06, Stefano Stabellini wrote:
>> On Mon, 1 Nov 2021, Ian Jackson wrote:
>>> Julien Grall writes ("Re: [PATCH] xen/arm: fix SBDF calculation for vPCI MMIO handlers"):
>>>> On 28/10/2021 16:54, Ian Jackson wrote:
>>>>> There are a number of patches that I'm getting CC'd on related to ARM
>>>>> and vpci (according to the Subject). Are these targeted for 4.16?
>>>>> Most of them don't have 4.16 Subject tags.
>>>> Oleksandr wants this patch to be included for 4.16 but forgot to tag it
>>>> properly.
>>> Oh yes. However,
>>>
>>> 1. I also wrote this:
>>>
>>>>> I am finding it difficult to see the wood for the trees.
>>>>> It would be really helpful if these vpci fixes were collected
>>>>> together into a series.
>>> Can someone please confirm whether this is the only vpci-related patch
>>> that ought to be on my radar for 4.16?
>>>
>>> 2. I have not had a reply to my question on Wednesday in
>>> <24953.34635.645112.279110@mariner.uk.xensource.com>:
>>>
>>> Um, can you explain what the practical impact is of not taking this
>>> patch for 4.16? As I understand it, vpci for ARM is non-functional in
>>> 4.16 and this is not expected to change? So there would be no
>>> benefit to users, and taking the patch would add a small but nonzero
>>> risk?
>>>
>>> I need this information to decide whether a release-ack is
>>> appropriate.
>>>
>>> Note that we are in code freeze, so all patches, including bugfixes,
>>> need my ack.
>> Hi Ian,
>>
>> This patch [1] is a straightforward 2-line fix for vpci on ARM. There
>> is no risk for the release, as the source file affected only builds when
>> CONFIG_HAS_VPCI is enabled, and it is currently disabled on ARM.
>>
>> At the same time, as we know, vpci is not complete in 4.16 anyway, so the
>> counter argument is that we don't need to fix it.
>>
>> Given how trivial the fix is, and that it cannot break the build or
>> runtime, I would take it.
> Thank you,
> I can re-send the patch with the updated commit message (Julien),
> but I still have no R-b's for the patch, so I am not sure if it is worth it.

I can't speak for the others. In my case, I didn't give my reviewed-by because the commit message needs to be updated. If you resend, I will have another look.

Cheers,
On 02.11.21 11:32, Julien Grall wrote:
> Hi Oleksandr,
>
> On 02/11/2021 07:16, Oleksandr Andrushchenko wrote:
>>
>>
>> On 01.11.21 23:06, Stefano Stabellini wrote:
>>> On Mon, 1 Nov 2021, Ian Jackson wrote:
>>>> Julien Grall writes ("Re: [PATCH] xen/arm: fix SBDF calculation for vPCI MMIO handlers"):
>>>>> On 28/10/2021 16:54, Ian Jackson wrote:
>>>>>> There are a number of patches that I'm getting CC'd on related to ARM
>>>>>> and vpci (according to the Subject). Are these targeted for 4.16?
>>>>>> Most of them don't have 4.16 Subject tags.
>>>>> Oleksandr wants this patch to be included for 4.16 but forgot to tag it
>>>>> properly.
>>>> Oh yes. However,
>>>>
>>>> 1. I also wrote this:
>>>>
>>>>>> I am finding it difficult to see the wood for the trees.
>>>>>> It would be really helpful if these vpci fixes were collected
>>>>>> together into a series.
>>>> Can someone please confirm whether this is the only vpci-related patch
>>>> that ought to be on my radar for 4.16?
>>>>
>>>> 2. I have not had a reply to my question on Wednesday in
>>>> <24953.34635.645112.279110@mariner.uk.xensource.com>:
>>>>
>>>> Um, can you explain what the practical impact is of not taking this
>>>> patch for 4.16? As I understand it, vpci for ARM is non-functional in
>>>> 4.16 and this is not expected to change? So there would be no
>>>> benefit to users, and taking the patch would add a small but nonzero
>>>> risk?
>>>>
>>>> I need this information to decide whether a release-ack is
>>>> appropriate.
>>>>
>>>> Note that we are in code freeze, so all patches, including bugfixes,
>>>> need my ack.
>>> Hi Ian,
>>>
>>> This patch [1] is a straightforward 2-line fix for vpci on ARM. There
>>> is no risk for the release, as the source file affected only builds when
>>> CONFIG_HAS_VPCI is enabled, and it is currently disabled on ARM.
>>>
>>> At the same time, as we know, vpci is not complete in 4.16 anyway, so the
>>> counter argument is that we don't need to fix it.
>>>
>>> Given how trivial the fix is, and that it cannot break the build or
>>> runtime, I would take it.
>> Thank you,
>> I can re-send the patch with the updated commit message (Julien),
>> but I still have no R-b's for the patch, so I am not sure if it is worth it.
>
> I can't speak for the others. In my case, I didn't give my reviewed-by because the commit message needs to be updated. If you resend, I will have another look.
Sure
>
> Cheers,
>
Thanks,
Oleksandr
Stefano Stabellini writes ("Re: [PATCH] xen/arm: fix SBDF calculation for vPCI MMIO handlers [and 2 more messages]"):
> This patch [1] is a straightforward 2-line fix for vpci on ARM. There
> is no risk for the release, as the source file affected only builds when
> CONFIG_HAS_VPCI is enabled, and it is currently disabled on ARM.
>
> At the same time, as we know, vpci is not complete in 4.16 anyway, so the
> counter argument is that we don't need to fix it.
>
> Given how trivial the fix is, and that it cannot break the build or
> runtime, I would take it.

Thanks, that is helpful.

Release-Acked-by: Ian Jackson <iwj@xenproject.org>
diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 8f40a0dec6d2..23f45386f4b3 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -24,7 +24,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
     unsigned long data;
 
     /* We ignore segment part and always handle segment 0 */
-    sbdf.sbdf = VPCI_ECAM_BDF(info->gpa);
+    sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE);
 
     if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
                         1U << info->dabt.size, &data) )
@@ -44,7 +44,7 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
     pci_sbdf_t sbdf;
 
     /* We ignore segment part and always handle segment 0 */
-    sbdf.sbdf = VPCI_ECAM_BDF(info->gpa);
+    sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE);
 
     return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
                            1U << info->dabt.size, r);
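A small standalone demo of why the subtraction matters. The VPCI_ECAM_BDF definition below is reproduced from the expected ECAM layout (BDF in address bits [12,27]: bus [20,27], device [15,19], function [12,14]) and may differ in detail from the tree; the base address is purely hypothetical, chosen to have a bit inside [0,27] set:

/*
 * Standalone demo; VPCI_ECAM_BDF is assumed to mask bits [12,27] and
 * shift by 12. GUEST_VPCI_ECAM_BASE itself has bits [0,27] clear,
 * which is why the bug stays latent with the current value.
 */
#include <stdio.h>

#define VPCI_ECAM_BDF(addr) (((addr) & 0x0ffff000UL) >> 12)

int main(void)
{
    unsigned long base = 0x10800000UL; /* bit 23 set, i.e. inside [0,27] */
    unsigned long gpa  = base;         /* guest touches 00:00.0, register 0 */

    /* Without subtracting the base, bit 23 of the base leaks into the BDF. */
    printf("no subtraction:   BDF = %#lx (bus 8, wrong)\n",
           VPCI_ECAM_BDF(gpa));
    /* Subtracting the base yields the expected 00:00.0. */
    printf("with subtraction: BDF = %#lx (bus 0, correct)\n",
           VPCI_ECAM_BDF(gpa - base));

    return 0;
}

The register offset (ECAM_REG_OFFSET, address bits [0,11]) is unaffected as long as the base is at least 4K-aligned, which is presumably why the patch only adjusts the BDF extraction.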