diff mbox series

[v8,08/13] vpci/header: program p2m with guest BAR view

Message ID 20230720003205.1828537-9-volodymyr_babchuk@epam.com (mailing list archive)
State Superseded
Headers show
Series PCI devices passthrough on Arm, part 3 | expand

Commit Message

Volodymyr Babchuk July 20, 2023, 12:32 a.m. UTC
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Take into account guest's BAR view and program its p2m accordingly:
gfn is guest's view of the BAR and mfn is the physical BAR value as set
up by the PCI bus driver in the hardware domain.
This way hardware domain sees physical BAR values and guest sees
emulated ones.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v5:
- remove debug print in map_range callback
- remove "identity" from the debug print
Since v4:
- moved start_{gfn|mfn} calculation into map_range
- pass vpci_bar in the map_data instead of start_{gfn|mfn}
- s/guest_addr/guest_reg
Since v3:
- updated comment (Roger)
- removed gfn_add(map->start_gfn, rc); which is wrong
- use v->domain instead of v->vpci.pdev->domain
- removed odd e.g. in comment
- s/d%d/%pd in altered code
- use gdprintk for map/unmap logs
Since v2:
- improve readability for data.start_gfn and restructure ?: construct
Since v1:
 - s/MSI/MSI-X in comments
---
 xen/drivers/vpci/header.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

Comments

Roger Pau Monné July 21, 2023, 1:05 p.m. UTC | #1
On Thu, Jul 20, 2023 at 12:32:33AM +0000, Volodymyr Babchuk wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Take into account guest's BAR view and program its p2m accordingly:
> gfn is guest's view of the BAR and mfn is the physical BAR value as set
> up by the PCI bus driver in the hardware domain.

Who sets that value should be left out of the commit message.  On x86
PCI BARs are positioned by the firmware usually.

> This way hardware domain sees physical BAR values and guest sees
> emulated ones.

This last sentence is kind of confusing, I would maybe write:

"Hardware domain continues getting the BARs identity mapped, while for
domUs the BARs are mapped at the requested guest address without
modifying the BAR address in the device PCI config space."

I'm afraid you are missing changes in modify_bars():  the overlaps for
domU should be checked against the guest address of the BAR, not the
host one.  So you need to adjust the code in modify_bars() to use the
newly introduced guest_reg when checking for overlaps in the domU case
(and when populating the rangesets).

> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v5:
> - remove debug print in map_range callback
> - remove "identity" from the debug print
> Since v4:
> - moved start_{gfn|mfn} calculation into map_range
> - pass vpci_bar in the map_data instead of start_{gfn|mfn}
> - s/guest_addr/guest_reg
> Since v3:
> - updated comment (Roger)
> - removed gfn_add(map->start_gfn, rc); which is wrong
> - use v->domain instead of v->vpci.pdev->domain
> - removed odd e.g. in comment
> - s/d%d/%pd in altered code
> - use gdprintk for map/unmap logs
> Since v2:
> - improve readability for data.start_gfn and restructure ?: construct
> Since v1:
>  - s/MSI/MSI-X in comments
> ---
>  xen/drivers/vpci/header.c | 24 ++++++++++++++++++++----
>  1 file changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index eb07fa0bb2..e1a448b674 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -30,6 +30,7 @@
>  
>  struct map_data {
>      struct domain *d;
> +    const struct vpci_bar *bar;
>      bool map;
>  };
>  
> @@ -41,8 +42,21 @@ static int cf_check map_range(
>  
>      for ( ; ; )
>      {
> +        /* Start address of the BAR as seen by the guest. */
> +        gfn_t start_gfn = _gfn(PFN_DOWN(is_hardware_domain(map->d)
> +                                        ? map->bar->addr
> +                                        : map->bar->guest_reg));
> +        /* Physical start address of the BAR. */
> +        mfn_t start_mfn = _mfn(PFN_DOWN(map->bar->addr));
>          unsigned long size = e - s + 1;
>  
> +        /*
> +         * Ranges to be mapped don't always start at the BAR start address, as
> +         * there can be holes or partially consumed ranges. Account for the
> +         * offset of the current address from the BAR start.
> +         */
> +        start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn));

The rangeset for guests should contain the guest address,
not the physical position of the BAR, so the logic here will be
slightly different (as you will need to adjust the mfn parameter of
{,un}map_mmio_regions() instead).

That's so you can do overlap checking in the guest address space, as
it's where the mappings will be created.

> +
>          /*
>           * ARM TODOs:
>           * - On ARM whether the memory is prefetchable or not should be passed
> @@ -52,8 +66,8 @@ static int cf_check map_range(
>           * - {un}map_mmio_regions doesn't support preemption.
>           */
>  
> -        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
> -                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
> +        rc = map->map ? map_mmio_regions(map->d, start_gfn, size, _mfn(s))
> +                      : unmap_mmio_regions(map->d, start_gfn, size, _mfn(s));
>          if ( rc == 0 )
>          {
>              *c += size;
> @@ -62,8 +76,8 @@ static int cf_check map_range(
>          if ( rc < 0 )
>          {
>              printk(XENLOG_G_WARNING
> -                   "Failed to identity %smap [%lx, %lx] for d%d: %d\n",
> -                   map->map ? "" : "un", s, e, map->d->domain_id, rc);
> +                   "Failed to %smap [%lx, %lx] for %pd: %d\n",
> +                   map->map ? "" : "un", s, e, map->d, rc);

I would also print the gfn -> mfn values if it's no longer an identity
map.

>              break;
>          }
>          ASSERT(rc < size);
> @@ -165,6 +179,7 @@ bool vpci_process_pending(struct vcpu *v)
>              if ( rangeset_is_empty(bar->mem) )
>                  continue;
>  
> +            data.bar = bar;

Please init the .bar field at declaration, like it's done for the rest
of the field.  It doesn't matter if the BAR turns out to be empty
(same below).

Thanks, Roger.
Jan Beulich July 24, 2023, 10:30 a.m. UTC | #2
On 21.07.2023 15:05, Roger Pau Monné wrote:
> On Thu, Jul 20, 2023 at 12:32:33AM +0000, Volodymyr Babchuk wrote:
>> @@ -62,8 +76,8 @@ static int cf_check map_range(
>>          if ( rc < 0 )
>>          {
>>              printk(XENLOG_G_WARNING
>> -                   "Failed to identity %smap [%lx, %lx] for d%d: %d\n",
>> -                   map->map ? "" : "un", s, e, map->d->domain_id, rc);
>> +                   "Failed to %smap [%lx, %lx] for %pd: %d\n",
>> +                   map->map ? "" : "un", s, e, map->d, rc);
> 
> I would also print the gfn -> mfn values if it's no longer an identity
> map.

And also the actual range - it's not [s,e] anymore.

Jan
Jan Beulich July 24, 2023, 10:43 a.m. UTC | #3
On 20.07.2023 02:32, Volodymyr Babchuk wrote:
> @@ -52,8 +66,8 @@ static int cf_check map_range(
>           * - {un}map_mmio_regions doesn't support preemption.
>           */
>  
> -        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
> -                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
> +        rc = map->map ? map_mmio_regions(map->d, start_gfn, size, _mfn(s))
> +                      : unmap_mmio_regions(map->d, start_gfn, size, _mfn(s));

Aiui this is the first direct exposure of these functions to DomU-s;
so far all calls were Xen-internal or from a domctl. There are a
couple of Arm TODOs listed in the comment ahead, but I'm not sure
that's all what is lacking here, and it's unclear whether this can
sensibly be left as a follow-on activity (at the very least known
open issues need mentioning as TODOs).

For example the x86 function truncates an unsigned long local
variable to (signed) int in its main return statement. This may for
the moment still be only a theoretical issue, but will need dealing
with sooner or later, I think.

Furthermore this yet again allows DomU-s to fiddle with their p2m.
To a degree this is unavoidable, I suppose. But some thought may
need putting into this anyway. Aiui on real hardware if a BAR is
placed over RAM, behavior is simply undefined. Once the BAR is
moved away though, behavior will become defined again: The RAM will
"reappear" in case the earlier undefined-ness made it disappear. I
don't know how the Arm variants of the functions behave, but on x86
the RAM pages will disappear from the guest's p2m upon putting a
BAR there, but they won't reappear upon unmapping of the BAR.

Luckily at least preemption looks to be handled in a satisfactory
manner already.

Jan
Roger Pau Monné July 24, 2023, 1:16 p.m. UTC | #4
On Mon, Jul 24, 2023 at 12:43:26PM +0200, Jan Beulich wrote:
> On 20.07.2023 02:32, Volodymyr Babchuk wrote:
> > @@ -52,8 +66,8 @@ static int cf_check map_range(
> >           * - {un}map_mmio_regions doesn't support preemption.
> >           */
> >  
> > -        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
> > -                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
> > +        rc = map->map ? map_mmio_regions(map->d, start_gfn, size, _mfn(s))
> > +                      : unmap_mmio_regions(map->d, start_gfn, size, _mfn(s));
> 
> Aiui this is the first direct exposure of these functions to DomU-s;

I guess it depends on how direct you consider exposure from
XEN_DOMCTL_memory_mapping hypercall, as that's what gets called by
QEMU also in order to set up BAR mappings.

> so far all calls were Xen-internal or from a domctl. There are a
> couple of Arm TODOs listed in the comment ahead, but I'm not sure
> that's all what is lacking here, and it's unclear whether this can
> sensibly be left as a follow-on activity (at the very least known
> open issues need mentioning as TODOs).
> 
> For example the x86 function truncates an unsigned long local
> variable to (signed) int in its main return statement. This may for
> the moment still be only a theoretical issue, but will need dealing
> with sooner or later, I think.

One bit that we need to add is the iomem_access_permitted() plus the
xsm_iomem_mapping() checks to map_range().

> Furthermore this yet again allows DomU-s to fiddle with their p2m.
> To a degree this is unavoidable, I suppose. But some thought may
> need putting into this anyway. Aiui on real hardware if a BAR is
> placed over RAM, behavior is simply undefined. Once the BAR is
> moved away though, behavior will become defined again: The RAM will
> "reappear" in case the earlier undefined-ness made it disappear. I
> don't know how the Arm variants of the functions behave, but on x86
> the RAM pages will disappear from the guest's p2m upon putting a
> BAR there, but they won't reappear upon unmapping of the BAR.

Yeah, that's unfortunate, but I'm afraid it's the same behavior when
using QEMU, so I wouldn't consider it strictly a regression from the
current handling that we do for BARs when doing PCI passthrough.

Furthermore, I don't see any easy way to deal with this so that RAM
can be re-added when the BAR is re positioned to not overlap a RAM
region.

> Luckily at least preemption looks to be handled in a satisfactory
> manner already.

I spend quite a lot of time trying to make sure this was at least
attempted to solve properly.

Thanks, Roger.
Jan Beulich July 24, 2023, 1:31 p.m. UTC | #5
On 24.07.2023 15:16, Roger Pau Monné wrote:
> On Mon, Jul 24, 2023 at 12:43:26PM +0200, Jan Beulich wrote:
>> On 20.07.2023 02:32, Volodymyr Babchuk wrote:
>>> @@ -52,8 +66,8 @@ static int cf_check map_range(
>>>           * - {un}map_mmio_regions doesn't support preemption.
>>>           */
>>>  
>>> -        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
>>> -                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
>>> +        rc = map->map ? map_mmio_regions(map->d, start_gfn, size, _mfn(s))
>>> +                      : unmap_mmio_regions(map->d, start_gfn, size, _mfn(s));
>>
>> Aiui this is the first direct exposure of these functions to DomU-s;
> 
> I guess it depends on how direct you consider exposure from
> XEN_DOMCTL_memory_mapping hypercall, as that's what gets called by
> QEMU also in order to set up BAR mappings.

Fair point - it is one of the few domctls not covered by XSA-77.

>> so far all calls were Xen-internal or from a domctl. There are a
>> couple of Arm TODOs listed in the comment ahead, but I'm not sure
>> that's all what is lacking here, and it's unclear whether this can
>> sensibly be left as a follow-on activity (at the very least known
>> open issues need mentioning as TODOs).
>>
>> For example the x86 function truncates an unsigned long local
>> variable to (signed) int in its main return statement. This may for
>> the moment still be only a theoretical issue, but will need dealing
>> with sooner or later, I think.
> 
> One bit that we need to add is the iomem_access_permitted() plus the
> xsm_iomem_mapping() checks to map_range().

The former would just be reassurance, wouldn't it? Assigning a PCI
device surely implies granting access to all its BARs (minus the
MSI-X page(s), if any). The latter would of course be more
"interesting", as XSM could in principle interject.

Jan
Roger Pau Monné July 24, 2023, 1:42 p.m. UTC | #6
On Mon, Jul 24, 2023 at 03:31:56PM +0200, Jan Beulich wrote:
> On 24.07.2023 15:16, Roger Pau Monné wrote:
> > On Mon, Jul 24, 2023 at 12:43:26PM +0200, Jan Beulich wrote:
> >> On 20.07.2023 02:32, Volodymyr Babchuk wrote:
> >>> @@ -52,8 +66,8 @@ static int cf_check map_range(
> >>>           * - {un}map_mmio_regions doesn't support preemption.
> >>>           */
> >>>  
> >>> -        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
> >>> -                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
> >>> +        rc = map->map ? map_mmio_regions(map->d, start_gfn, size, _mfn(s))
> >>> +                      : unmap_mmio_regions(map->d, start_gfn, size, _mfn(s));
> >>
> >> Aiui this is the first direct exposure of these functions to DomU-s;
> > 
> > I guess it depends on how direct you consider exposure from
> > XEN_DOMCTL_memory_mapping hypercall, as that's what gets called by
> > QEMU also in order to set up BAR mappings.
> 
> Fair point - it is one of the few domctls not covered by XSA-77.
> 
> >> so far all calls were Xen-internal or from a domctl. There are a
> >> couple of Arm TODOs listed in the comment ahead, but I'm not sure
> >> that's all what is lacking here, and it's unclear whether this can
> >> sensibly be left as a follow-on activity (at the very least known
> >> open issues need mentioning as TODOs).
> >>
> >> For example the x86 function truncates an unsigned long local
> >> variable to (signed) int in its main return statement. This may for
> >> the moment still be only a theoretical issue, but will need dealing
> >> with sooner or later, I think.
> > 
> > One bit that we need to add is the iomem_access_permitted() plus the
> > xsm_iomem_mapping() checks to map_range().
> 
> The former would just be reassurance, wouldn't it? Assigning a PCI
> device surely implies granting access to all its BARs (minus the
> MSI-X page(s), if any).

Indeed.  But for consistency we need to match the same checks that are
done in XEN_DOMCTL_memory_mapping.

> The latter would of course be more
> "interesting", as XSM could in principle interject.

Yes, we need both.  In fact I'm just writing a patch to add them
straight away.

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index eb07fa0bb2..e1a448b674 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -30,6 +30,7 @@ 
 
 struct map_data {
     struct domain *d;
+    const struct vpci_bar *bar;
     bool map;
 };
 
@@ -41,8 +42,21 @@  static int cf_check map_range(
 
     for ( ; ; )
     {
+        /* Start address of the BAR as seen by the guest. */
+        gfn_t start_gfn = _gfn(PFN_DOWN(is_hardware_domain(map->d)
+                                        ? map->bar->addr
+                                        : map->bar->guest_reg));
+        /* Physical start address of the BAR. */
+        mfn_t start_mfn = _mfn(PFN_DOWN(map->bar->addr));
         unsigned long size = e - s + 1;
 
+        /*
+         * Ranges to be mapped don't always start at the BAR start address, as
+         * there can be holes or partially consumed ranges. Account for the
+         * offset of the current address from the BAR start.
+         */
+        start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn));
+
         /*
          * ARM TODOs:
          * - On ARM whether the memory is prefetchable or not should be passed
@@ -52,8 +66,8 @@  static int cf_check map_range(
          * - {un}map_mmio_regions doesn't support preemption.
          */
 
-        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
-                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
+        rc = map->map ? map_mmio_regions(map->d, start_gfn, size, _mfn(s))
+                      : unmap_mmio_regions(map->d, start_gfn, size, _mfn(s));
         if ( rc == 0 )
         {
             *c += size;
@@ -62,8 +76,8 @@  static int cf_check map_range(
         if ( rc < 0 )
         {
             printk(XENLOG_G_WARNING
-                   "Failed to identity %smap [%lx, %lx] for d%d: %d\n",
-                   map->map ? "" : "un", s, e, map->d->domain_id, rc);
+                   "Failed to %smap [%lx, %lx] for %pd: %d\n",
+                   map->map ? "" : "un", s, e, map->d, rc);
             break;
         }
         ASSERT(rc < size);
@@ -165,6 +179,7 @@  bool vpci_process_pending(struct vcpu *v)
             if ( rangeset_is_empty(bar->mem) )
                 continue;
 
+            data.bar = bar;
             rc = rangeset_consume_ranges(bar->mem, map_range, &data);
 
             if ( rc == -ERESTART )
@@ -228,6 +243,7 @@  static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
         if ( rangeset_is_empty(bar->mem) )
             continue;
 
+        data.bar = bar;
         while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
                                               &data)) == -ERESTART )
         {