diff mbox series

[RFC,XEN,2/6] vpci: accept BAR writes if dom0 is PVH

Message ID 20230312075455.450187-3-ray.huang@amd.com (mailing list archive)
State New, archived
Headers show
Series Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 | expand

Commit Message

Huang Rui March 12, 2023, 7:54 a.m. UTC
From: Chen Jiqian <Jiqian.Chen@amd.com>

When dom0 is PVH and we want to passthrough gpu to guest,
we should allow BAR writes even through BAR is mapped. If
not, the value of BARs are not initialized when guest firstly
start.

Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 xen/drivers/vpci/header.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christian König March 13, 2023, 7:23 a.m. UTC | #1
Am 12.03.23 um 08:54 schrieb Huang Rui:
> From: Chen Jiqian <Jiqian.Chen@amd.com>
>
> When dom0 is PVH and we want to passthrough gpu to guest,
> we should allow BAR writes even through BAR is mapped. If
> not, the value of BARs are not initialized when guest firstly
> start.
>
> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>   xen/drivers/vpci/header.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index ec2e978a4e..918d11fbce 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -392,7 +392,7 @@ static void cf_check bar_write(
>        * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
>        * writes as long as the BAR is not mapped into the p2m.
>        */
> -    if ( bar->enabled )
> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )

Checkpath.pl gives here:

ERROR: space prohibited after that open parenthesis '('
#115: FILE: xen/drivers/vpci/header.c:395:
+    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )

Christian.


>       {
>           /* If the value written is the current one avoid printing a warning. */
>           if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
Christian König March 13, 2023, 7:26 a.m. UTC | #2
Am 13.03.23 um 08:23 schrieb Christian König:
>
>
> Am 12.03.23 um 08:54 schrieb Huang Rui:
>> From: Chen Jiqian <Jiqian.Chen@amd.com>
>>
>> When dom0 is PVH and we want to passthrough gpu to guest,
>> we should allow BAR writes even through BAR is mapped. If
>> not, the value of BARs are not initialized when guest firstly
>> start.
>>
>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> ---
>>   xen/drivers/vpci/header.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index ec2e978a4e..918d11fbce 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -392,7 +392,7 @@ static void cf_check bar_write(
>>        * Xen only cares whether the BAR is mapped into the p2m, so 
>> allow BAR
>>        * writes as long as the BAR is not mapped into the p2m.
>>        */
>> -    if ( bar->enabled )
>> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & 
>> PCI_COMMAND_MEMORY )
>
> Checkpath.pl gives here:
>
> ERROR: space prohibited after that open parenthesis '('
> #115: FILE: xen/drivers/vpci/header.c:395:
> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )

But I should probably mention that I'm not 100% sure if this code base 
uses kernel coding style!

Christian.

>
> Christian.
>
>
>>       {
>>           /* If the value written is the current one avoid printing a 
>> warning. */
>>           if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
>
Jan Beulich March 13, 2023, 8:46 a.m. UTC | #3
On 13.03.2023 08:26, Christian König wrote:
> Am 13.03.23 um 08:23 schrieb Christian König:
>> Am 12.03.23 um 08:54 schrieb Huang Rui:
>>> --- a/xen/drivers/vpci/header.c
>>> +++ b/xen/drivers/vpci/header.c
>>> @@ -392,7 +392,7 @@ static void cf_check bar_write(
>>>        * Xen only cares whether the BAR is mapped into the p2m, so 
>>> allow BAR
>>>        * writes as long as the BAR is not mapped into the p2m.
>>>        */
>>> -    if ( bar->enabled )
>>> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & 
>>> PCI_COMMAND_MEMORY )
>>
>> Checkpath.pl gives here:
>>
>> ERROR: space prohibited after that open parenthesis '('
>> #115: FILE: xen/drivers/vpci/header.c:395:
>> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
> 
> But I should probably mention that I'm not 100% sure if this code base 
> uses kernel coding style!

It doesn't - see ./CODING_STYLE.

Jan
Huang Rui March 13, 2023, 8:55 a.m. UTC | #4
On Mon, Mar 13, 2023 at 03:26:09PM +0800, Koenig, Christian wrote:
> Am 13.03.23 um 08:23 schrieb Christian König:
> >
> >
> > Am 12.03.23 um 08:54 schrieb Huang Rui:
> >> From: Chen Jiqian <Jiqian.Chen@amd.com>
> >>
> >> When dom0 is PVH and we want to passthrough gpu to guest,
> >> we should allow BAR writes even through BAR is mapped. If
> >> not, the value of BARs are not initialized when guest firstly
> >> start.
> >>
> >> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
> >> Signed-off-by: Huang Rui <ray.huang@amd.com>
> >> ---
> >>   xen/drivers/vpci/header.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> >> index ec2e978a4e..918d11fbce 100644
> >> --- a/xen/drivers/vpci/header.c
> >> +++ b/xen/drivers/vpci/header.c
> >> @@ -392,7 +392,7 @@ static void cf_check bar_write(
> >>        * Xen only cares whether the BAR is mapped into the p2m, so 
> >> allow BAR
> >>        * writes as long as the BAR is not mapped into the p2m.
> >>        */
> >> -    if ( bar->enabled )
> >> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & 
> >> PCI_COMMAND_MEMORY )
> >
> > Checkpath.pl gives here:
> >
> > ERROR: space prohibited after that open parenthesis '('
> > #115: FILE: xen/drivers/vpci/header.c:395:
> > +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
> 
> But I should probably mention that I'm not 100% sure if this code base 
> uses kernel coding style!
> 

I noticed that actully Xen's coding style was different with Linux kernel.

Thanks,
Ray
Jan Beulich March 14, 2023, 4:02 p.m. UTC | #5
On 12.03.2023 08:54, Huang Rui wrote:
> From: Chen Jiqian <Jiqian.Chen@amd.com>
> 
> When dom0 is PVH and we want to passthrough gpu to guest,
> we should allow BAR writes even through BAR is mapped. If
> not, the value of BARs are not initialized when guest firstly
> start.

From this it doesn't become clear why a GPU would be special in this
regard, or what (if any) prior bug there was. Are you suggesting ...

> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -392,7 +392,7 @@ static void cf_check bar_write(
>       * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
>       * writes as long as the BAR is not mapped into the p2m.
>       */
> -    if ( bar->enabled )
> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
>      {
>          /* If the value written is the current one avoid printing a warning. */
>          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )

... bar->enabled doesn't properly reflect the necessary state? It
generally shouldn't be necessary to look at the physical device's
state here.

Furthermore when you make a change in a case like this, the
accompanying comment also needs updating (which might have clarified
what, if anything, has been wrong).

Jan
Stefano Stabellini March 14, 2023, 11:42 p.m. UTC | #6
On Mon, 13 Mar 2023, Christian König wrote:
> Am 13.03.23 um 08:23 schrieb Christian König:
> > Am 12.03.23 um 08:54 schrieb Huang Rui:
> > > From: Chen Jiqian <Jiqian.Chen@amd.com>
> > > 
> > > When dom0 is PVH and we want to passthrough gpu to guest,
> > > we should allow BAR writes even through BAR is mapped. If
> > > not, the value of BARs are not initialized when guest firstly
> > > start.
> > > 
> > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
> > > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > > ---
> > >   xen/drivers/vpci/header.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> > > index ec2e978a4e..918d11fbce 100644
> > > --- a/xen/drivers/vpci/header.c
> > > +++ b/xen/drivers/vpci/header.c
> > > @@ -392,7 +392,7 @@ static void cf_check bar_write(
> > >        * Xen only cares whether the BAR is mapped into the p2m, so allow
> > > BAR
> > >        * writes as long as the BAR is not mapped into the p2m.
> > >        */
> > > -    if ( bar->enabled )
> > > +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
> > 
> > Checkpath.pl gives here:
> > 
> > ERROR: space prohibited after that open parenthesis '('
> > #115: FILE: xen/drivers/vpci/header.c:395:
> > +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
> 
> But I should probably mention that I'm not 100% sure if this code base uses
> kernel coding style!

Hi Christian,

Thanks for taking a look at these patches! For better or for worse Xen
follows a different coding style from the Linux kernel (see CODING_STYLE
under xen.git).  In Xen we use:

    if ( rc != 0 ) 
    {
        return rc;
    }
Huang Rui March 21, 2023, 9:36 a.m. UTC | #7
Hi Jan,

On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote:
> On 12.03.2023 08:54, Huang Rui wrote:
> > From: Chen Jiqian <Jiqian.Chen@amd.com>
> > 
> > When dom0 is PVH and we want to passthrough gpu to guest,
> > we should allow BAR writes even through BAR is mapped. If
> > not, the value of BARs are not initialized when guest firstly
> > start.
> 
> From this it doesn't become clear why a GPU would be special in this
> regard, or what (if any) prior bug there was. Are you suggesting ...
> 

You're right. This is in fact a buggy we encountered while we start the
guest domU.

> > --- a/xen/drivers/vpci/header.c
> > +++ b/xen/drivers/vpci/header.c
> > @@ -392,7 +392,7 @@ static void cf_check bar_write(
> >       * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
> >       * writes as long as the BAR is not mapped into the p2m.
> >       */
> > -    if ( bar->enabled )
> > +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
> >      {
> >          /* If the value written is the current one avoid printing a warning. */
> >          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
> 
> ... bar->enabled doesn't properly reflect the necessary state? It
> generally shouldn't be necessary to look at the physical device's
> state here.
> 
> Furthermore when you make a change in a case like this, the
> accompanying comment also needs updating (which might have clarified
> what, if anything, has been wrong).
> 

That is the problem that we start domU at the first time, the enable flag
will be set while the passthrough device would like to write the real pcie
bar on the host. And yes, it's temporary workaround, we should figure out
the root cause.

Thanks,
Ray
Jan Beulich March 21, 2023, 9:41 a.m. UTC | #8
On 21.03.2023 10:36, Huang Rui wrote:
> On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote:
>> On 12.03.2023 08:54, Huang Rui wrote:
>>> --- a/xen/drivers/vpci/header.c
>>> +++ b/xen/drivers/vpci/header.c
>>> @@ -392,7 +392,7 @@ static void cf_check bar_write(
>>>       * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
>>>       * writes as long as the BAR is not mapped into the p2m.
>>>       */
>>> -    if ( bar->enabled )
>>> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
>>>      {
>>>          /* If the value written is the current one avoid printing a warning. */
>>>          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
>>
>> ... bar->enabled doesn't properly reflect the necessary state? It
>> generally shouldn't be necessary to look at the physical device's
>> state here.
>>
>> Furthermore when you make a change in a case like this, the
>> accompanying comment also needs updating (which might have clarified
>> what, if anything, has been wrong).
>>
> 
> That is the problem that we start domU at the first time, the enable flag
> will be set while the passthrough device would like to write the real pcie
> bar on the host.

A pass-through device (i.e. one already owned by a DomU) should never
be allowed to write to the real BAR. But it's not clear whether I'm not
misinterpreting what you said ...

> And yes, it's temporary workaround, we should figure out
> the root cause.

Right, that's the only way to approach this, imo.

Jan
Huang Rui March 21, 2023, 10:14 a.m. UTC | #9
On Tue, Mar 21, 2023 at 05:41:57PM +0800, Jan Beulich wrote:
> On 21.03.2023 10:36, Huang Rui wrote:
> > On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote:
> >> On 12.03.2023 08:54, Huang Rui wrote:
> >>> --- a/xen/drivers/vpci/header.c
> >>> +++ b/xen/drivers/vpci/header.c
> >>> @@ -392,7 +392,7 @@ static void cf_check bar_write(
> >>>       * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
> >>>       * writes as long as the BAR is not mapped into the p2m.
> >>>       */
> >>> -    if ( bar->enabled )
> >>> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
> >>>      {
> >>>          /* If the value written is the current one avoid printing a warning. */
> >>>          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
> >>
> >> ... bar->enabled doesn't properly reflect the necessary state? It
> >> generally shouldn't be necessary to look at the physical device's
> >> state here.
> >>
> >> Furthermore when you make a change in a case like this, the
> >> accompanying comment also needs updating (which might have clarified
> >> what, if anything, has been wrong).
> >>
> > 
> > That is the problem that we start domU at the first time, the enable flag
> > will be set while the passthrough device would like to write the real pcie
> > bar on the host.
> 
> A pass-through device (i.e. one already owned by a DomU) should never
> be allowed to write to the real BAR. But it's not clear whether I'm not
> misinterpreting what you said ...
> 

OK. Thanks to clarify this. May I know how does a passthrough device modify
pci bar with correct behavior on Xen?

Thanks,
Ray

> > And yes, it's temporary workaround, we should figure out
> > the root cause.
> 
> Right, that's the only way to approach this, imo.
> 
> Jan
Jan Beulich March 21, 2023, 10:20 a.m. UTC | #10
On 21.03.2023 11:14, Huang Rui wrote:
> On Tue, Mar 21, 2023 at 05:41:57PM +0800, Jan Beulich wrote:
>> On 21.03.2023 10:36, Huang Rui wrote:
>>> On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote:
>>>> On 12.03.2023 08:54, Huang Rui wrote:
>>>>> --- a/xen/drivers/vpci/header.c
>>>>> +++ b/xen/drivers/vpci/header.c
>>>>> @@ -392,7 +392,7 @@ static void cf_check bar_write(
>>>>>       * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
>>>>>       * writes as long as the BAR is not mapped into the p2m.
>>>>>       */
>>>>> -    if ( bar->enabled )
>>>>> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
>>>>>      {
>>>>>          /* If the value written is the current one avoid printing a warning. */
>>>>>          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
>>>>
>>>> ... bar->enabled doesn't properly reflect the necessary state? It
>>>> generally shouldn't be necessary to look at the physical device's
>>>> state here.
>>>>
>>>> Furthermore when you make a change in a case like this, the
>>>> accompanying comment also needs updating (which might have clarified
>>>> what, if anything, has been wrong).
>>>>
>>>
>>> That is the problem that we start domU at the first time, the enable flag
>>> will be set while the passthrough device would like to write the real pcie
>>> bar on the host.
>>
>> A pass-through device (i.e. one already owned by a DomU) should never
>> be allowed to write to the real BAR. But it's not clear whether I'm not
>> misinterpreting what you said ...
>>
> 
> OK. Thanks to clarify this. May I know how does a passthrough device modify
> pci bar with correct behavior on Xen?

A pass-through device may write to the virtual BAR, changing where in its
own memory space the MMIO range appears. But it cannot (and may not) alter
where in host memory space the (physical) MMIO range appears.

Jan
Huang Rui March 21, 2023, 11:49 a.m. UTC | #11
On Tue, Mar 21, 2023 at 06:20:03PM +0800, Jan Beulich wrote:
> On 21.03.2023 11:14, Huang Rui wrote:
> > On Tue, Mar 21, 2023 at 05:41:57PM +0800, Jan Beulich wrote:
> >> On 21.03.2023 10:36, Huang Rui wrote:
> >>> On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote:
> >>>> On 12.03.2023 08:54, Huang Rui wrote:
> >>>>> --- a/xen/drivers/vpci/header.c
> >>>>> +++ b/xen/drivers/vpci/header.c
> >>>>> @@ -392,7 +392,7 @@ static void cf_check bar_write(
> >>>>>       * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
> >>>>>       * writes as long as the BAR is not mapped into the p2m.
> >>>>>       */
> >>>>> -    if ( bar->enabled )
> >>>>> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
> >>>>>      {
> >>>>>          /* If the value written is the current one avoid printing a warning. */
> >>>>>          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
> >>>>
> >>>> ... bar->enabled doesn't properly reflect the necessary state? It
> >>>> generally shouldn't be necessary to look at the physical device's
> >>>> state here.
> >>>>
> >>>> Furthermore when you make a change in a case like this, the
> >>>> accompanying comment also needs updating (which might have clarified
> >>>> what, if anything, has been wrong).
> >>>>
> >>>
> >>> That is the problem that we start domU at the first time, the enable flag
> >>> will be set while the passthrough device would like to write the real pcie
> >>> bar on the host.
> >>
> >> A pass-through device (i.e. one already owned by a DomU) should never
> >> be allowed to write to the real BAR. But it's not clear whether I'm not
> >> misinterpreting what you said ...
> >>
> > 
> > OK. Thanks to clarify this. May I know how does a passthrough device modify
> > pci bar with correct behavior on Xen?
> 
> A pass-through device may write to the virtual BAR, changing where in its
> own memory space the MMIO range appears. But it cannot (and may not) alter
> where in host memory space the (physical) MMIO range appears.
> 

Thanks, but we found if dom0 is PV domain, the passthrough device will
access this function to write the real bar.

Thanks,
Ray
Roger Pau Monné March 21, 2023, 12:20 p.m. UTC | #12
On Tue, Mar 21, 2023 at 07:49:26PM +0800, Huang Rui wrote:
> On Tue, Mar 21, 2023 at 06:20:03PM +0800, Jan Beulich wrote:
> > On 21.03.2023 11:14, Huang Rui wrote:
> > > On Tue, Mar 21, 2023 at 05:41:57PM +0800, Jan Beulich wrote:
> > >> On 21.03.2023 10:36, Huang Rui wrote:
> > >>> On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote:
> > >>>> On 12.03.2023 08:54, Huang Rui wrote:
> > >>>>> --- a/xen/drivers/vpci/header.c
> > >>>>> +++ b/xen/drivers/vpci/header.c
> > >>>>> @@ -392,7 +392,7 @@ static void cf_check bar_write(
> > >>>>>       * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
> > >>>>>       * writes as long as the BAR is not mapped into the p2m.
> > >>>>>       */
> > >>>>> -    if ( bar->enabled )
> > >>>>> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
> > >>>>>      {
> > >>>>>          /* If the value written is the current one avoid printing a warning. */
> > >>>>>          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
> > >>>>
> > >>>> ... bar->enabled doesn't properly reflect the necessary state? It
> > >>>> generally shouldn't be necessary to look at the physical device's
> > >>>> state here.
> > >>>>
> > >>>> Furthermore when you make a change in a case like this, the
> > >>>> accompanying comment also needs updating (which might have clarified
> > >>>> what, if anything, has been wrong).
> > >>>>
> > >>>
> > >>> That is the problem that we start domU at the first time, the enable flag
> > >>> will be set while the passthrough device would like to write the real pcie
> > >>> bar on the host.
> > >>
> > >> A pass-through device (i.e. one already owned by a DomU) should never
> > >> be allowed to write to the real BAR. But it's not clear whether I'm not
> > >> misinterpreting what you said ...
> > >>
> > > 
> > > OK. Thanks to clarify this. May I know how does a passthrough device modify
> > > pci bar with correct behavior on Xen?
> > 
> > A pass-through device may write to the virtual BAR, changing where in its
> > own memory space the MMIO range appears. But it cannot (and may not) alter
> > where in host memory space the (physical) MMIO range appears.
> > 
> 
> Thanks, but we found if dom0 is PV domain, the passthrough device will
> access this function to write the real bar.

I'm very confused now, are you trying to use vPCI with HVM domains?

As I understood it you are attempting to enable PCI passthrough for
HVM guests from a PVH dom0, but now you say your dom0 is PV?

Thanks, Roger.
Jan Beulich March 21, 2023, 12:25 p.m. UTC | #13
On 21.03.2023 13:20, Roger Pau Monné wrote:
> On Tue, Mar 21, 2023 at 07:49:26PM +0800, Huang Rui wrote:
>> On Tue, Mar 21, 2023 at 06:20:03PM +0800, Jan Beulich wrote:
>>> On 21.03.2023 11:14, Huang Rui wrote:
>>>> On Tue, Mar 21, 2023 at 05:41:57PM +0800, Jan Beulich wrote:
>>>>> On 21.03.2023 10:36, Huang Rui wrote:
>>>>>> On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote:
>>>>>>> On 12.03.2023 08:54, Huang Rui wrote:
>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>> @@ -392,7 +392,7 @@ static void cf_check bar_write(
>>>>>>>>       * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
>>>>>>>>       * writes as long as the BAR is not mapped into the p2m.
>>>>>>>>       */
>>>>>>>> -    if ( bar->enabled )
>>>>>>>> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
>>>>>>>>      {
>>>>>>>>          /* If the value written is the current one avoid printing a warning. */
>>>>>>>>          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
>>>>>>>
>>>>>>> ... bar->enabled doesn't properly reflect the necessary state? It
>>>>>>> generally shouldn't be necessary to look at the physical device's
>>>>>>> state here.
>>>>>>>
>>>>>>> Furthermore when you make a change in a case like this, the
>>>>>>> accompanying comment also needs updating (which might have clarified
>>>>>>> what, if anything, has been wrong).
>>>>>>>
>>>>>>
>>>>>> That is the problem that we start domU at the first time, the enable flag
>>>>>> will be set while the passthrough device would like to write the real pcie
>>>>>> bar on the host.
>>>>>
>>>>> A pass-through device (i.e. one already owned by a DomU) should never
>>>>> be allowed to write to the real BAR. But it's not clear whether I'm not
>>>>> misinterpreting what you said ...
>>>>>
>>>>
>>>> OK. Thanks to clarify this. May I know how does a passthrough device modify
>>>> pci bar with correct behavior on Xen?
>>>
>>> A pass-through device may write to the virtual BAR, changing where in its
>>> own memory space the MMIO range appears. But it cannot (and may not) alter
>>> where in host memory space the (physical) MMIO range appears.
>>>
>>
>> Thanks, but we found if dom0 is PV domain, the passthrough device will
>> access this function to write the real bar.
> 
> I'm very confused now, are you trying to use vPCI with HVM domains?
> 
> As I understood it you are attempting to enable PCI passthrough for
> HVM guests from a PVH dom0, but now you say your dom0 is PV?

I didn't read it like this. Instead my way of understanding the reply
is that they try to mimic on PVH Dom0 what they observe on PV Dom0.

Jan
Jan Beulich March 21, 2023, 12:27 p.m. UTC | #14
On 21.03.2023 12:49, Huang Rui wrote:
> Thanks, but we found if dom0 is PV domain, the passthrough device will
> access this function to write the real bar.

Can you please be quite a bit more detailed about this? The specific code
paths taken (in upstream software) to result in such would of of interest.

Jan
Huang Rui March 21, 2023, 12:59 p.m. UTC | #15
On Tue, Mar 21, 2023 at 08:20:53PM +0800, Roger Pau Monné wrote:
> On Tue, Mar 21, 2023 at 07:49:26PM +0800, Huang Rui wrote:
> > On Tue, Mar 21, 2023 at 06:20:03PM +0800, Jan Beulich wrote:
> > > On 21.03.2023 11:14, Huang Rui wrote:
> > > > On Tue, Mar 21, 2023 at 05:41:57PM +0800, Jan Beulich wrote:
> > > >> On 21.03.2023 10:36, Huang Rui wrote:
> > > >>> On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote:
> > > >>>> On 12.03.2023 08:54, Huang Rui wrote:
> > > >>>>> --- a/xen/drivers/vpci/header.c
> > > >>>>> +++ b/xen/drivers/vpci/header.c
> > > >>>>> @@ -392,7 +392,7 @@ static void cf_check bar_write(
> > > >>>>>       * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
> > > >>>>>       * writes as long as the BAR is not mapped into the p2m.
> > > >>>>>       */
> > > >>>>> -    if ( bar->enabled )
> > > >>>>> +    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
> > > >>>>>      {
> > > >>>>>          /* If the value written is the current one avoid printing a warning. */
> > > >>>>>          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
> > > >>>>
> > > >>>> ... bar->enabled doesn't properly reflect the necessary state? It
> > > >>>> generally shouldn't be necessary to look at the physical device's
> > > >>>> state here.
> > > >>>>
> > > >>>> Furthermore when you make a change in a case like this, the
> > > >>>> accompanying comment also needs updating (which might have clarified
> > > >>>> what, if anything, has been wrong).
> > > >>>>
> > > >>>
> > > >>> That is the problem that we start domU at the first time, the enable flag
> > > >>> will be set while the passthrough device would like to write the real pcie
> > > >>> bar on the host.
> > > >>
> > > >> A pass-through device (i.e. one already owned by a DomU) should never
> > > >> be allowed to write to the real BAR. But it's not clear whether I'm not
> > > >> misinterpreting what you said ...
> > > >>
> > > > 
> > > > OK. Thanks to clarify this. May I know how does a passthrough device modify
> > > > pci bar with correct behavior on Xen?
> > > 
> > > A pass-through device may write to the virtual BAR, changing where in its
> > > own memory space the MMIO range appears. But it cannot (and may not) alter
> > > where in host memory space the (physical) MMIO range appears.
> > > 
> > 
> > Thanks, but we found if dom0 is PV domain, the passthrough device will
> > access this function to write the real bar.
> 
> I'm very confused now, are you trying to use vPCI with HVM domains?

We are using QEMU for passthrough at this moment.

> 
> As I understood it you are attempting to enable PCI passthrough for
> HVM guests from a PVH dom0, but now you say your dom0 is PV?
> 

Ah, sorry to make you confused, you're right. I am using PVH dom0 + HVM
domU. But we are comparing passthrough function on PV dom0 + HVM domU as a
reference.

Thanks,
Ray
Huang Rui March 21, 2023, 1:03 p.m. UTC | #16
On Tue, Mar 21, 2023 at 08:27:21PM +0800, Jan Beulich wrote:
> On 21.03.2023 12:49, Huang Rui wrote:
> > Thanks, but we found if dom0 is PV domain, the passthrough device will
> > access this function to write the real bar.
> 
> Can you please be quite a bit more detailed about this? The specific code
> paths taken (in upstream software) to result in such would of of interest.
> 

yes, please wait for a moment. let me capture a trace dump in my side.

Thanks,
Ray
Huang Rui March 22, 2023, 7:28 a.m. UTC | #17
On Tue, Mar 21, 2023 at 09:03:58PM +0800, Huang Rui wrote:
> On Tue, Mar 21, 2023 at 08:27:21PM +0800, Jan Beulich wrote:
> > On 21.03.2023 12:49, Huang Rui wrote:
> > > Thanks, but we found if dom0 is PV domain, the passthrough device will
> > > access this function to write the real bar.
> > 
> > Can you please be quite a bit more detailed about this? The specific code
> > paths taken (in upstream software) to result in such would of of interest.
> > 
> 
> yes, please wait for a moment. let me capture a trace dump in my side.
> 

Sorry, we are wrong that if Xen PV dom0, bar_write() won't be called,
please ignore above information.

While xen is on initialization on PVH dom0, it will add all PCI devices in
the real bus including 0000:03:00.0 (VGA device: GPU) and 0000:03:00.1
(Audio device).

Audio is another function in the pcie device, but we won't use it here. So
we will remove it after that.

Please see below xl dmesg:

(XEN) PCI add device 0000:03:00.0
(XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0
(XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0
(XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0
(XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0
(XEN) PCI add device 0000:03:00.1
(XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
(XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
(XEN) PCI add device 0000:04:00.0

...

(XEN) PCI add device 0000:07:00.7
(XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0010058 unimplemented
(XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0011020 unimplemented
(XEN) PCI remove device 0000:03:00.1

We run below script to remove audio

echo -n "1" > /sys/bus/pci/devices/0000:03:00.1/remove

(XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029b unimplemented
(XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029a unimplemented

Then we will run "xl pci-assignable-add 03:00.0" to assign GPU as
passthrough. At this moment, the real bar is trying to be written.

(XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1
(XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1
(XEN) Xen WARN at drivers/vpci/header.c:408
(XEN) ----[ Xen-4.18-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    8
(XEN) RIP:    e008:[<ffff82d040263cb9>] drivers/vpci/header.c#bar_write+0xc0/0x1ce
(XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d0v7)
(XEN) rax: ffff8303fc36d06c   rbx: ffff8303f90468b0   rcx: 0000000000000010
(XEN) rdx: 0000000000000002   rsi: ffff8303fc36a020   rdi: ffff8303fc36a018
(XEN) rbp: ffff8303fc367c18   rsp: ffff8303fc367be8   r8:  0000000000000001
(XEN) r9:  ffff8303fc36a010   r10: 0000000000000001   r11: 0000000000000001
(XEN) r12: 00000000d0700000   r13: ffff8303fc6d9230   r14: ffff8303fc6d9270
(XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000003506e0
(XEN) cr3: 00000003fc3c4000   cr2: 00007f180f6371e8
(XEN) fsb: 00007fce655edbc0   gsb: ffff88822f3c0000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d040263cb9> (drivers/vpci/header.c#bar_write+0xc0/0x1ce):
(XEN)  b6 53 14 f6 c2 02 74 02 <0f> 0b 48 8b 03 45 84 ff 0f 85 ec 00 00 00 48 b9
(XEN) Xen stack trace from rsp=ffff8303fc367be8:
(XEN)    00000024fc367bf8 ffff8303f9046a50 0000000000000000 0000000000000004
(XEN)    0000000000000004 0000000000000024 ffff8303fc367ca0 ffff82d040263683
(XEN)    00000300fc367ca0 d070000003003501 00000024d0700000 ffff8303fc6d9230
(XEN)    0000000000000000 0000000000000000 0000002400000004 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000004 00000000d0700000
(XEN)    0000000000000024 0000000000000000 ffff82d040404bc0 ffff8303fc367cd0
(XEN)    ffff82d0402c60a8 0000030000000001 ffff8303fc367d88 0000000000000000
(XEN)    ffff8303fc610800 ffff8303fc367d30 ffff82d0402c54da ffff8303fc367ce0
(XEN)    ffff8303fc367fff 0000000000000004 ffff830300000004 00000000d0700000
(XEN)    ffff8303fc610800 ffff8303fc367d88 0000000000000001 0000000000000000
(XEN)    0000000000000000 ffff8303fc367d58 ffff82d0402c5570 0000000000000004
(XEN)    ffff8304065ea000 ffff8303fc367e28 ffff8303fc367dd0 ffff82d0402b5357
(XEN)    0000000000000cfc ffff8303fc621000 0000000000000000 0000000000000000
(XEN)    0000000000000cfc 00000000d0700000 0000000400000001 0001000000000000
(XEN)    0000000000000004 0000000000000004 0000000000000000 ffff8303fc367e44
(XEN)    ffff8304065ea000 ffff8303fc367e10 ffff82d0402b56d6 0000000000000000
(XEN)    ffff8303fc367e44 0000000000000004 0000000000000cfc ffff8304065e6000
(XEN)    0000000000000000 ffff8303fc367e30 ffff82d0402b6bcc ffff8303fc367e44
(XEN)    0000000000000001 ffff8303fc367e70 ffff82d0402c5e80 d070000040203490
(XEN)    000000000000007b ffff8303fc367ef8 ffff8304065e6000 ffff8304065ea000
(XEN) Xen call trace:
(XEN)    [<ffff82d040263cb9>] R drivers/vpci/header.c#bar_write+0xc0/0x1ce
(XEN)    [<ffff82d040263683>] F vpci_write+0x123/0x26c
(XEN)    [<ffff82d0402c60a8>] F arch/x86/hvm/io.c#vpci_portio_write+0xa0/0xa7
(XEN)    [<ffff82d0402c54da>] F hvm_process_io_intercept+0x203/0x26f
(XEN)    [<ffff82d0402c5570>] F hvm_io_intercept+0x2a/0x4c
(XEN)    [<ffff82d0402b5357>] F arch/x86/hvm/emulate.c#hvmemul_do_io+0x29b/0x5eb
(XEN)    [<ffff82d0402b56d6>] F arch/x86/hvm/emulate.c#hvmemul_do_io_buffer+0x2f/0x6a
(XEN)    [<ffff82d0402b6bcc>] F hvmemul_do_pio_buffer+0x33/0x35
(XEN)    [<ffff82d0402c5e80>] F handle_pio+0x70/0x1b7
(XEN)    [<ffff82d04029dc7f>] F svm_vmexit_handler+0x10ba/0x18aa
(XEN)    [<ffff82d0402034e5>] F svm_stgi_label+0x8/0x18
(XEN)
(XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1
(XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1

Thanks,
Ray
Jan Beulich March 22, 2023, 7:45 a.m. UTC | #18
On 22.03.2023 08:28, Huang Rui wrote:
> On Tue, Mar 21, 2023 at 09:03:58PM +0800, Huang Rui wrote:
>> On Tue, Mar 21, 2023 at 08:27:21PM +0800, Jan Beulich wrote:
>>> On 21.03.2023 12:49, Huang Rui wrote:
>>>> Thanks, but we found if dom0 is PV domain, the passthrough device will
>>>> access this function to write the real bar.
>>>
>>> Can you please be quite a bit more detailed about this? The specific code
>>> paths taken (in upstream software) to result in such would of of interest.
>>>
>>
>> yes, please wait for a moment. let me capture a trace dump in my side.
>>
> 
> Sorry, we are wrong that if Xen PV dom0, bar_write() won't be called,
> please ignore above information.
> 
> While xen is on initialization on PVH dom0, it will add all PCI devices in
> the real bus including 0000:03:00.0 (VGA device: GPU) and 0000:03:00.1
> (Audio device).
> 
> Audio is another function in the pcie device, but we won't use it here. So
> we will remove it after that.
> 
> Please see below xl dmesg:
> 
> (XEN) PCI add device 0000:03:00.0
> (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0
> (XEN) PCI add device 0000:03:00.1
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) PCI add device 0000:04:00.0
> 
> ...
> 
> (XEN) PCI add device 0000:07:00.7
> (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0010058 unimplemented
> (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0011020 unimplemented
> (XEN) PCI remove device 0000:03:00.1
> 
> We run below script to remove audio
> 
> echo -n "1" > /sys/bus/pci/devices/0000:03:00.1/remove

Why would you do that? Aiui this is a preparatory step to hot-unplug
the device, which surely you don't mean to do. (But this is largely
unrelated to the issue at hand; I'm merely curious.)

> (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029b unimplemented
> (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029a unimplemented
> 
> Then we will run "xl pci-assignable-add 03:00.0" to assign GPU as
> passthrough. At this moment, the real bar is trying to be written.

How do you conclude it's the "real" BAR? And where is this attempt
coming from? We refuse BAR updates for enabled BARs for a reason,
so possibly there's code elsewhere which needs adjusting.

> (XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1
> (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1
> (XEN) Xen WARN at drivers/vpci/header.c:408

None of these exist in upstream code. Therefore, for the output you
supply to be meaningful, we also need to know what code changes you
made (which then tells us by how much line numbers have shifted, and
what e.g. the WARN_ON() condition is - it clearly isn't tied to
bar->enabled being true alone, or else there would have been a 2nd
instance at the bottom, unless of course you've stripped that).

Jan

> (XEN) ----[ Xen-4.18-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    8
> (XEN) RIP:    e008:[<ffff82d040263cb9>] drivers/vpci/header.c#bar_write+0xc0/0x1ce
> (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d0v7)
> (XEN) rax: ffff8303fc36d06c   rbx: ffff8303f90468b0   rcx: 0000000000000010
> (XEN) rdx: 0000000000000002   rsi: ffff8303fc36a020   rdi: ffff8303fc36a018
> (XEN) rbp: ffff8303fc367c18   rsp: ffff8303fc367be8   r8:  0000000000000001
> (XEN) r9:  ffff8303fc36a010   r10: 0000000000000001   r11: 0000000000000001
> (XEN) r12: 00000000d0700000   r13: ffff8303fc6d9230   r14: ffff8303fc6d9270
> (XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000003506e0
> (XEN) cr3: 00000003fc3c4000   cr2: 00007f180f6371e8
> (XEN) fsb: 00007fce655edbc0   gsb: ffff88822f3c0000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen code around <ffff82d040263cb9> (drivers/vpci/header.c#bar_write+0xc0/0x1ce):
> (XEN)  b6 53 14 f6 c2 02 74 02 <0f> 0b 48 8b 03 45 84 ff 0f 85 ec 00 00 00 48 b9
> (XEN) Xen stack trace from rsp=ffff8303fc367be8:
> (XEN)    00000024fc367bf8 ffff8303f9046a50 0000000000000000 0000000000000004
> (XEN)    0000000000000004 0000000000000024 ffff8303fc367ca0 ffff82d040263683
> (XEN)    00000300fc367ca0 d070000003003501 00000024d0700000 ffff8303fc6d9230
> (XEN)    0000000000000000 0000000000000000 0000002400000004 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000004 00000000d0700000
> (XEN)    0000000000000024 0000000000000000 ffff82d040404bc0 ffff8303fc367cd0
> (XEN)    ffff82d0402c60a8 0000030000000001 ffff8303fc367d88 0000000000000000
> (XEN)    ffff8303fc610800 ffff8303fc367d30 ffff82d0402c54da ffff8303fc367ce0
> (XEN)    ffff8303fc367fff 0000000000000004 ffff830300000004 00000000d0700000
> (XEN)    ffff8303fc610800 ffff8303fc367d88 0000000000000001 0000000000000000
> (XEN)    0000000000000000 ffff8303fc367d58 ffff82d0402c5570 0000000000000004
> (XEN)    ffff8304065ea000 ffff8303fc367e28 ffff8303fc367dd0 ffff82d0402b5357
> (XEN)    0000000000000cfc ffff8303fc621000 0000000000000000 0000000000000000
> (XEN)    0000000000000cfc 00000000d0700000 0000000400000001 0001000000000000
> (XEN)    0000000000000004 0000000000000004 0000000000000000 ffff8303fc367e44
> (XEN)    ffff8304065ea000 ffff8303fc367e10 ffff82d0402b56d6 0000000000000000
> (XEN)    ffff8303fc367e44 0000000000000004 0000000000000cfc ffff8304065e6000
> (XEN)    0000000000000000 ffff8303fc367e30 ffff82d0402b6bcc ffff8303fc367e44
> (XEN)    0000000000000001 ffff8303fc367e70 ffff82d0402c5e80 d070000040203490
> (XEN)    000000000000007b ffff8303fc367ef8 ffff8304065e6000 ffff8304065ea000
> (XEN) Xen call trace:
> (XEN)    [<ffff82d040263cb9>] R drivers/vpci/header.c#bar_write+0xc0/0x1ce
> (XEN)    [<ffff82d040263683>] F vpci_write+0x123/0x26c
> (XEN)    [<ffff82d0402c60a8>] F arch/x86/hvm/io.c#vpci_portio_write+0xa0/0xa7
> (XEN)    [<ffff82d0402c54da>] F hvm_process_io_intercept+0x203/0x26f
> (XEN)    [<ffff82d0402c5570>] F hvm_io_intercept+0x2a/0x4c
> (XEN)    [<ffff82d0402b5357>] F arch/x86/hvm/emulate.c#hvmemul_do_io+0x29b/0x5eb
> (XEN)    [<ffff82d0402b56d6>] F arch/x86/hvm/emulate.c#hvmemul_do_io_buffer+0x2f/0x6a
> (XEN)    [<ffff82d0402b6bcc>] F hvmemul_do_pio_buffer+0x33/0x35
> (XEN)    [<ffff82d0402c5e80>] F handle_pio+0x70/0x1b7
> (XEN)    [<ffff82d04029dc7f>] F svm_vmexit_handler+0x10ba/0x18aa
> (XEN)    [<ffff82d0402034e5>] F svm_stgi_label+0x8/0x18
> (XEN)
> (XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1
> (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1
> 
> Thanks,
> Ray
Roger Pau Monné March 22, 2023, 9:34 a.m. UTC | #19
On Wed, Mar 22, 2023 at 03:28:58PM +0800, Huang Rui wrote:
> On Tue, Mar 21, 2023 at 09:03:58PM +0800, Huang Rui wrote:
> > On Tue, Mar 21, 2023 at 08:27:21PM +0800, Jan Beulich wrote:
> > > On 21.03.2023 12:49, Huang Rui wrote:
> > > > Thanks, but we found if dom0 is PV domain, the passthrough device will
> > > > access this function to write the real bar.
> > > 
> > > Can you please be quite a bit more detailed about this? The specific code
> > > paths taken (in upstream software) to result in such would of of interest.
> > > 
> > 
> > yes, please wait for a moment. let me capture a trace dump in my side.
> > 
> 
> Sorry, we are wrong that if Xen PV dom0, bar_write() won't be called,
> please ignore above information.
> 
> While xen is on initialization on PVH dom0, it will add all PCI devices in
> the real bus including 0000:03:00.0 (VGA device: GPU) and 0000:03:00.1
> (Audio device).
> 
> Audio is another function in the pcie device, but we won't use it here. So
> we will remove it after that.
> 
> Please see below xl dmesg:
> 
> (XEN) PCI add device 0000:03:00.0
> (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0
> (XEN) PCI add device 0000:03:00.1
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> (XEN) PCI add device 0000:04:00.0
> 
> ...
> 
> (XEN) PCI add device 0000:07:00.7
> (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0010058 unimplemented
> (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0011020 unimplemented
> (XEN) PCI remove device 0000:03:00.1
> 
> We run below script to remove audio
> 
> echo -n "1" > /sys/bus/pci/devices/0000:03:00.1/remove
> 
> (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029b unimplemented
> (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029a unimplemented
> 
> Then we will run "xl pci-assignable-add 03:00.0" to assign GPU as
> passthrough. At this moment, the real bar is trying to be written.
> 
> (XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1
> (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1
> (XEN) Xen WARN at drivers/vpci/header.c:408
> (XEN) ----[ Xen-4.18-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    8
> (XEN) RIP:    e008:[<ffff82d040263cb9>] drivers/vpci/header.c#bar_write+0xc0/0x1ce
> (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d0v7)
> (XEN) rax: ffff8303fc36d06c   rbx: ffff8303f90468b0   rcx: 0000000000000010
> (XEN) rdx: 0000000000000002   rsi: ffff8303fc36a020   rdi: ffff8303fc36a018
> (XEN) rbp: ffff8303fc367c18   rsp: ffff8303fc367be8   r8:  0000000000000001
> (XEN) r9:  ffff8303fc36a010   r10: 0000000000000001   r11: 0000000000000001
> (XEN) r12: 00000000d0700000   r13: ffff8303fc6d9230   r14: ffff8303fc6d9270
> (XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000003506e0
> (XEN) cr3: 00000003fc3c4000   cr2: 00007f180f6371e8
> (XEN) fsb: 00007fce655edbc0   gsb: ffff88822f3c0000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen code around <ffff82d040263cb9> (drivers/vpci/header.c#bar_write+0xc0/0x1ce):
> (XEN)  b6 53 14 f6 c2 02 74 02 <0f> 0b 48 8b 03 45 84 ff 0f 85 ec 00 00 00 48 b9
> (XEN) Xen stack trace from rsp=ffff8303fc367be8:
> (XEN)    00000024fc367bf8 ffff8303f9046a50 0000000000000000 0000000000000004
> (XEN)    0000000000000004 0000000000000024 ffff8303fc367ca0 ffff82d040263683
> (XEN)    00000300fc367ca0 d070000003003501 00000024d0700000 ffff8303fc6d9230
> (XEN)    0000000000000000 0000000000000000 0000002400000004 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000004 00000000d0700000
> (XEN)    0000000000000024 0000000000000000 ffff82d040404bc0 ffff8303fc367cd0
> (XEN)    ffff82d0402c60a8 0000030000000001 ffff8303fc367d88 0000000000000000
> (XEN)    ffff8303fc610800 ffff8303fc367d30 ffff82d0402c54da ffff8303fc367ce0
> (XEN)    ffff8303fc367fff 0000000000000004 ffff830300000004 00000000d0700000
> (XEN)    ffff8303fc610800 ffff8303fc367d88 0000000000000001 0000000000000000
> (XEN)    0000000000000000 ffff8303fc367d58 ffff82d0402c5570 0000000000000004
> (XEN)    ffff8304065ea000 ffff8303fc367e28 ffff8303fc367dd0 ffff82d0402b5357
> (XEN)    0000000000000cfc ffff8303fc621000 0000000000000000 0000000000000000
> (XEN)    0000000000000cfc 00000000d0700000 0000000400000001 0001000000000000
> (XEN)    0000000000000004 0000000000000004 0000000000000000 ffff8303fc367e44
> (XEN)    ffff8304065ea000 ffff8303fc367e10 ffff82d0402b56d6 0000000000000000
> (XEN)    ffff8303fc367e44 0000000000000004 0000000000000cfc ffff8304065e6000
> (XEN)    0000000000000000 ffff8303fc367e30 ffff82d0402b6bcc ffff8303fc367e44
> (XEN)    0000000000000001 ffff8303fc367e70 ffff82d0402c5e80 d070000040203490
> (XEN)    000000000000007b ffff8303fc367ef8 ffff8304065e6000 ffff8304065ea000
> (XEN) Xen call trace:
> (XEN)    [<ffff82d040263cb9>] R drivers/vpci/header.c#bar_write+0xc0/0x1ce
> (XEN)    [<ffff82d040263683>] F vpci_write+0x123/0x26c
> (XEN)    [<ffff82d0402c60a8>] F arch/x86/hvm/io.c#vpci_portio_write+0xa0/0xa7
> (XEN)    [<ffff82d0402c54da>] F hvm_process_io_intercept+0x203/0x26f
> (XEN)    [<ffff82d0402c5570>] F hvm_io_intercept+0x2a/0x4c
> (XEN)    [<ffff82d0402b5357>] F arch/x86/hvm/emulate.c#hvmemul_do_io+0x29b/0x5eb
> (XEN)    [<ffff82d0402b56d6>] F arch/x86/hvm/emulate.c#hvmemul_do_io_buffer+0x2f/0x6a
> (XEN)    [<ffff82d0402b6bcc>] F hvmemul_do_pio_buffer+0x33/0x35
> (XEN)    [<ffff82d0402c5e80>] F handle_pio+0x70/0x1b7
> (XEN)    [<ffff82d04029dc7f>] F svm_vmexit_handler+0x10ba/0x18aa
> (XEN)    [<ffff82d0402034e5>] F svm_stgi_label+0x8/0x18
> (XEN)
> (XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1
> (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1

As said by Jan, it's hard to figure out where are the printks placed without a
diff of your changes.

So far the above seems to be expected, as we currently don't handle BAR
register writes with memory decoding enabled.

Given the change proposed in this patch, can you check whether `bar->enabled ==
true` but the PCI command register has the memory decoding bit unset?

If so it would mean Xen state got out-of-sync with the hardware state, and we
would need to figure out where it happened.  Is there any backdoor in the AMD
GPU that allows to disable memory decoding without using the PCI command
register?

Regards, Roger.
Huang Rui March 22, 2023, 12:33 p.m. UTC | #20
On Wed, Mar 22, 2023 at 05:34:41PM +0800, Roger Pau Monné wrote:
> On Wed, Mar 22, 2023 at 03:28:58PM +0800, Huang Rui wrote:
> > On Tue, Mar 21, 2023 at 09:03:58PM +0800, Huang Rui wrote:
> > > On Tue, Mar 21, 2023 at 08:27:21PM +0800, Jan Beulich wrote:
> > > > On 21.03.2023 12:49, Huang Rui wrote:
> > > > > Thanks, but we found if dom0 is PV domain, the passthrough device will
> > > > > access this function to write the real bar.
> > > > 
> > > > Can you please be quite a bit more detailed about this? The specific code
> > > > paths taken (in upstream software) to result in such would of of interest.
> > > > 
> > > 
> > > yes, please wait for a moment. let me capture a trace dump in my side.
> > > 
> > 
> > Sorry, we are wrong that if Xen PV dom0, bar_write() won't be called,
> > please ignore above information.
> > 
> > While xen is on initialization on PVH dom0, it will add all PCI devices in
> > the real bus including 0000:03:00.0 (VGA device: GPU) and 0000:03:00.1
> > (Audio device).
> > 
> > Audio is another function in the pcie device, but we won't use it here. So
> > we will remove it after that.
> > 
> > Please see below xl dmesg:
> > 
> > (XEN) PCI add device 0000:03:00.0
> > (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0
> > (XEN) PCI add device 0000:03:00.1
> > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0
> > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0
> > (XEN) PCI add device 0000:04:00.0
> > 
> > ...
> > 
> > (XEN) PCI add device 0000:07:00.7
> > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0010058 unimplemented
> > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0011020 unimplemented
> > (XEN) PCI remove device 0000:03:00.1
> > 
> > We run below script to remove audio
> > 
> > echo -n "1" > /sys/bus/pci/devices/0000:03:00.1/remove
> > 
> > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029b unimplemented
> > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029a unimplemented
> > 
> > Then we will run "xl pci-assignable-add 03:00.0" to assign GPU as
> > passthrough. At this moment, the real bar is trying to be written.
> > 
> > (XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1
> > (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1
> > (XEN) Xen WARN at drivers/vpci/header.c:408
> > (XEN) ----[ Xen-4.18-unstable  x86_64  debug=y  Not tainted ]----
> > (XEN) CPU:    8
> > (XEN) RIP:    e008:[<ffff82d040263cb9>] drivers/vpci/header.c#bar_write+0xc0/0x1ce
> > (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d0v7)
> > (XEN) rax: ffff8303fc36d06c   rbx: ffff8303f90468b0   rcx: 0000000000000010
> > (XEN) rdx: 0000000000000002   rsi: ffff8303fc36a020   rdi: ffff8303fc36a018
> > (XEN) rbp: ffff8303fc367c18   rsp: ffff8303fc367be8   r8:  0000000000000001
> > (XEN) r9:  ffff8303fc36a010   r10: 0000000000000001   r11: 0000000000000001
> > (XEN) r12: 00000000d0700000   r13: ffff8303fc6d9230   r14: ffff8303fc6d9270
> > (XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000003506e0
> > (XEN) cr3: 00000003fc3c4000   cr2: 00007f180f6371e8
> > (XEN) fsb: 00007fce655edbc0   gsb: ffff88822f3c0000   gss: 0000000000000000
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > (XEN) Xen code around <ffff82d040263cb9> (drivers/vpci/header.c#bar_write+0xc0/0x1ce):
> > (XEN)  b6 53 14 f6 c2 02 74 02 <0f> 0b 48 8b 03 45 84 ff 0f 85 ec 00 00 00 48 b9
> > (XEN) Xen stack trace from rsp=ffff8303fc367be8:
> > (XEN)    00000024fc367bf8 ffff8303f9046a50 0000000000000000 0000000000000004
> > (XEN)    0000000000000004 0000000000000024 ffff8303fc367ca0 ffff82d040263683
> > (XEN)    00000300fc367ca0 d070000003003501 00000024d0700000 ffff8303fc6d9230
> > (XEN)    0000000000000000 0000000000000000 0000002400000004 0000000000000000
> > (XEN)    0000000000000000 0000000000000000 0000000000000004 00000000d0700000
> > (XEN)    0000000000000024 0000000000000000 ffff82d040404bc0 ffff8303fc367cd0
> > (XEN)    ffff82d0402c60a8 0000030000000001 ffff8303fc367d88 0000000000000000
> > (XEN)    ffff8303fc610800 ffff8303fc367d30 ffff82d0402c54da ffff8303fc367ce0
> > (XEN)    ffff8303fc367fff 0000000000000004 ffff830300000004 00000000d0700000
> > (XEN)    ffff8303fc610800 ffff8303fc367d88 0000000000000001 0000000000000000
> > (XEN)    0000000000000000 ffff8303fc367d58 ffff82d0402c5570 0000000000000004
> > (XEN)    ffff8304065ea000 ffff8303fc367e28 ffff8303fc367dd0 ffff82d0402b5357
> > (XEN)    0000000000000cfc ffff8303fc621000 0000000000000000 0000000000000000
> > (XEN)    0000000000000cfc 00000000d0700000 0000000400000001 0001000000000000
> > (XEN)    0000000000000004 0000000000000004 0000000000000000 ffff8303fc367e44
> > (XEN)    ffff8304065ea000 ffff8303fc367e10 ffff82d0402b56d6 0000000000000000
> > (XEN)    ffff8303fc367e44 0000000000000004 0000000000000cfc ffff8304065e6000
> > (XEN)    0000000000000000 ffff8303fc367e30 ffff82d0402b6bcc ffff8303fc367e44
> > (XEN)    0000000000000001 ffff8303fc367e70 ffff82d0402c5e80 d070000040203490
> > (XEN)    000000000000007b ffff8303fc367ef8 ffff8304065e6000 ffff8304065ea000
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d040263cb9>] R drivers/vpci/header.c#bar_write+0xc0/0x1ce
> > (XEN)    [<ffff82d040263683>] F vpci_write+0x123/0x26c
> > (XEN)    [<ffff82d0402c60a8>] F arch/x86/hvm/io.c#vpci_portio_write+0xa0/0xa7
> > (XEN)    [<ffff82d0402c54da>] F hvm_process_io_intercept+0x203/0x26f
> > (XEN)    [<ffff82d0402c5570>] F hvm_io_intercept+0x2a/0x4c
> > (XEN)    [<ffff82d0402b5357>] F arch/x86/hvm/emulate.c#hvmemul_do_io+0x29b/0x5eb
> > (XEN)    [<ffff82d0402b56d6>] F arch/x86/hvm/emulate.c#hvmemul_do_io_buffer+0x2f/0x6a
> > (XEN)    [<ffff82d0402b6bcc>] F hvmemul_do_pio_buffer+0x33/0x35
> > (XEN)    [<ffff82d0402c5e80>] F handle_pio+0x70/0x1b7
> > (XEN)    [<ffff82d04029dc7f>] F svm_vmexit_handler+0x10ba/0x18aa
> > (XEN)    [<ffff82d0402034e5>] F svm_stgi_label+0x8/0x18
> > (XEN)
> > (XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1
> > (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1
> 

Hi Jan, Roger,

> As said by Jan, it's hard to figure out where are the printks placed without a
> diff of your changes.

I attached the diff of my prints below, and I want to figure out why the
Bar_write() is called while we use pci-assignable-add to assign passthrough device in PVH dom0.


diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 918d11fbce..35447aff2a 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -388,12 +388,14 @@ static void cf_check bar_write(
     else
         val &= PCI_BASE_ADDRESS_MEM_MASK;
 
+    gprintk(XENLOG_WARNING, "%s Ray line %d %pp bar->enabled %d\n", __func__, __LINE__, &pdev->sbdf , bar->enabled);
     /*
      * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
      * writes as long as the BAR is not mapped into the p2m.
      */
     if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
     {
+        gprintk(XENLOG_WARNING, "%s Ray line %d %pp bar->enabled %d\n", __func__, __LINE__, &pdev->sbdf , bar->enabled);
         /* If the value written is the current one avoid printing a warning. */
         if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
             gprintk(XENLOG_WARNING,
@@ -401,7 +403,9 @@ static void cf_check bar_write(
                     &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
         return;
     }
-
+    gprintk(XENLOG_WARNING, "%s Ray line %d %pp bar->enabled %d\n", __func__, __LINE__, &pdev->sbdf , bar->enabled);
+    if (bar->enabled)
+	WARN_ON(1);
 
     /*
      * Update the cached address, so that when memory decoding is enabled

> 
> So far the above seems to be expected, as we currently don't handle BAR
> register writes with memory decoding enabled.
> 
> Given the change proposed in this patch, can you check whether `bar->enabled ==
> true` but the PCI command register has the memory decoding bit unset?


I traced that while we do pci-assignable-add, we will follow below trace to
bind the passthrough device.

pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign()

Then kernel xen-pciback driver want to add virtual configuration spaces. In
this phase, the bar_write() in xen hypervisor will be called. I still need
a bit more time to figure the exact reason. May I know where the
xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor?

[  309.719049] xen_pciback: wants to seize 0000:03:00.0
[  462.911251] pciback 0000:03:00.0: xen_pciback: probing...
[  462.911256] pciback 0000:03:00.0: xen_pciback: seizing device
[  462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc
[  462.911261] pciback 0000:03:00.0: xen_pciback: initializing...
[  462.911263] pciback 0000:03:00.0: xen_pciback: initializing config
[  462.911265] pciback 0000:03:00.0: xen-pciback: initializing virtual configuration space
[  462.911268] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x00
[  462.911271] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x02
[  462.911284] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x04
[  462.911286] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3c
[  462.911289] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3d
[  462.911291] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0c
[  462.911294] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0d
[  462.911296] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0f
[  462.911301] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x10
[  462.911306] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x14
[  462.911309] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x18
[  462.911313] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x1c
[  462.911317] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x20
[  462.911321] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x24
[  462.911325] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x30
[  462.911358] pciback 0000:03:00.0: Found capability 0x1 at 0x50
[  462.911361] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x50
[  462.911363] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x52
[  462.911368] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x54
[  462.911371] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x56
[  462.911373] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x57
[  462.911386] pciback 0000:03:00.0: Found capability 0x5 at 0xa0
[  462.911388] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa0
[  462.911391] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa2
[  462.911405] pciback 0000:03:00.0: xen_pciback: enabling device
[  462.911412] pciback 0000:03:00.0: enabling device (0006 -> 0007)
[  462.911658] Already setup the GSI :28
[  462.911668] Already map the GSI :28 and IRQ: 115
[  462.911684] pciback 0000:03:00.0: xen_pciback: save state of device
[  462.912154] pciback 0000:03:00.0: xen_pciback: resetting (FLR, D3, etc) the device
[  463.954998] pciback 0000:03:00.0: xen_pciback: reset device


> 
> If so it would mean Xen state got out-of-sync with the hardware state, and we
> would need to figure out where it happened.  Is there any backdoor in the AMD
> GPU that allows to disable memory decoding without using the PCI command
> register?
> 

I don't think we have any backdoor.

Thanks,
Ray
Jan Beulich March 22, 2023, 12:48 p.m. UTC | #21
On 22.03.2023 13:33, Huang Rui wrote:
> I traced that while we do pci-assignable-add, we will follow below trace to
> bind the passthrough device.
> 
> pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign()
> 
> Then kernel xen-pciback driver want to add virtual configuration spaces. In
> this phase, the bar_write() in xen hypervisor will be called. I still need
> a bit more time to figure the exact reason. May I know where the
> xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor?

Any config space access would. And I might guess ...

> [  309.719049] xen_pciback: wants to seize 0000:03:00.0
> [  462.911251] pciback 0000:03:00.0: xen_pciback: probing...
> [  462.911256] pciback 0000:03:00.0: xen_pciback: seizing device
> [  462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc
> [  462.911261] pciback 0000:03:00.0: xen_pciback: initializing...
> [  462.911263] pciback 0000:03:00.0: xen_pciback: initializing config
> [  462.911265] pciback 0000:03:00.0: xen-pciback: initializing virtual configuration space
> [  462.911268] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x00
> [  462.911271] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x02
> [  462.911284] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x04
> [  462.911286] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3c
> [  462.911289] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3d
> [  462.911291] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0c
> [  462.911294] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0d
> [  462.911296] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0f
> [  462.911301] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x10
> [  462.911306] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x14
> [  462.911309] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x18
> [  462.911313] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x1c
> [  462.911317] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x20
> [  462.911321] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x24
> [  462.911325] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x30
> [  462.911358] pciback 0000:03:00.0: Found capability 0x1 at 0x50
> [  462.911361] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x50
> [  462.911363] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x52
> [  462.911368] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x54
> [  462.911371] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x56
> [  462.911373] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x57
> [  462.911386] pciback 0000:03:00.0: Found capability 0x5 at 0xa0
> [  462.911388] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa0
> [  462.911391] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa2
> [  462.911405] pciback 0000:03:00.0: xen_pciback: enabling device
> [  462.911412] pciback 0000:03:00.0: enabling device (0006 -> 0007)
> [  462.911658] Already setup the GSI :28
> [  462.911668] Already map the GSI :28 and IRQ: 115
> [  462.911684] pciback 0000:03:00.0: xen_pciback: save state of device
> [  462.912154] pciback 0000:03:00.0: xen_pciback: resetting (FLR, D3, etc) the device
> [  463.954998] pciback 0000:03:00.0: xen_pciback: reset device

... it is actually the reset here, saving and then restoring config space.
If e.g. that restore was done "blindly" (i.e. simply writing fields low to
high), then memory decode would be re-enabled before the BARs are written.

Jan
Huang Rui March 23, 2023, 10:26 a.m. UTC | #22
On Wed, Mar 22, 2023 at 08:48:30PM +0800, Jan Beulich wrote:
> On 22.03.2023 13:33, Huang Rui wrote:
> > I traced that while we do pci-assignable-add, we will follow below trace to
> > bind the passthrough device.
> > 
> > pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign()
> > 
> > Then kernel xen-pciback driver want to add virtual configuration spaces. In
> > this phase, the bar_write() in xen hypervisor will be called. I still need
> > a bit more time to figure the exact reason. May I know where the
> > xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor?
> 
> Any config space access would. And I might guess ...
> 
> > [  309.719049] xen_pciback: wants to seize 0000:03:00.0
> > [  462.911251] pciback 0000:03:00.0: xen_pciback: probing...
> > [  462.911256] pciback 0000:03:00.0: xen_pciback: seizing device
> > [  462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc
> > [  462.911261] pciback 0000:03:00.0: xen_pciback: initializing...
> > [  462.911263] pciback 0000:03:00.0: xen_pciback: initializing config
> > [  462.911265] pciback 0000:03:00.0: xen-pciback: initializing virtual configuration space
> > [  462.911268] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x00
> > [  462.911271] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x02
> > [  462.911284] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x04
> > [  462.911286] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3c
> > [  462.911289] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3d
> > [  462.911291] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0c
> > [  462.911294] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0d
> > [  462.911296] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0f
> > [  462.911301] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x10
> > [  462.911306] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x14
> > [  462.911309] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x18
> > [  462.911313] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x1c
> > [  462.911317] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x20
> > [  462.911321] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x24
> > [  462.911325] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x30
> > [  462.911358] pciback 0000:03:00.0: Found capability 0x1 at 0x50
> > [  462.911361] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x50
> > [  462.911363] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x52
> > [  462.911368] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x54
> > [  462.911371] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x56
> > [  462.911373] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x57
> > [  462.911386] pciback 0000:03:00.0: Found capability 0x5 at 0xa0
> > [  462.911388] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa0
> > [  462.911391] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa2
> > [  462.911405] pciback 0000:03:00.0: xen_pciback: enabling device
> > [  462.911412] pciback 0000:03:00.0: enabling device (0006 -> 0007)
> > [  462.911658] Already setup the GSI :28
> > [  462.911668] Already map the GSI :28 and IRQ: 115
> > [  462.911684] pciback 0000:03:00.0: xen_pciback: save state of device
> > [  462.912154] pciback 0000:03:00.0: xen_pciback: resetting (FLR, D3, etc) the device
> > [  463.954998] pciback 0000:03:00.0: xen_pciback: reset device
> 
> ... it is actually the reset here, saving and then restoring config space.
> If e.g. that restore was done "blindly" (i.e. simply writing fields low to
> high), then memory decode would be re-enabled before the BARs are written.
> 

Yes, we confirm the problem is while the xen-pciback driver initializes
passthrough device with pcistub_init_device() -> pci_restore_state() ->
pci_restore_config_space() -> pci_restore_config_space_range() ->
pci_restore_config_dword() -> pci_write_config_dword(), the pci config
write will trigger io interrupt to bar_write() in the xen, then bar->enable
is set, the write is not actually allowed.

May I know whether this behavior (restore) is expected? Or it should not
reset the device.

Thanks,
Ray
Roger Pau Monné March 23, 2023, 10:43 a.m. UTC | #23
On Wed, Mar 22, 2023 at 01:48:30PM +0100, Jan Beulich wrote:
> On 22.03.2023 13:33, Huang Rui wrote:
> > I traced that while we do pci-assignable-add, we will follow below trace to
> > bind the passthrough device.
> > 
> > pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign()
> > 
> > Then kernel xen-pciback driver want to add virtual configuration spaces. In
> > this phase, the bar_write() in xen hypervisor will be called. I still need
> > a bit more time to figure the exact reason. May I know where the
> > xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor?
> 
> Any config space access would. And I might guess ...
> 
> > [  309.719049] xen_pciback: wants to seize 0000:03:00.0
> > [  462.911251] pciback 0000:03:00.0: xen_pciback: probing...
> > [  462.911256] pciback 0000:03:00.0: xen_pciback: seizing device
> > [  462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc
> > [  462.911261] pciback 0000:03:00.0: xen_pciback: initializing...
> > [  462.911263] pciback 0000:03:00.0: xen_pciback: initializing config
> > [  462.911265] pciback 0000:03:00.0: xen-pciback: initializing virtual configuration space
> > [  462.911268] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x00
> > [  462.911271] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x02
> > [  462.911284] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x04
> > [  462.911286] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3c
> > [  462.911289] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3d
> > [  462.911291] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0c
> > [  462.911294] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0d
> > [  462.911296] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0f
> > [  462.911301] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x10
> > [  462.911306] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x14
> > [  462.911309] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x18
> > [  462.911313] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x1c
> > [  462.911317] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x20
> > [  462.911321] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x24
> > [  462.911325] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x30
> > [  462.911358] pciback 0000:03:00.0: Found capability 0x1 at 0x50
> > [  462.911361] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x50
> > [  462.911363] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x52
> > [  462.911368] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x54
> > [  462.911371] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x56
> > [  462.911373] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x57
> > [  462.911386] pciback 0000:03:00.0: Found capability 0x5 at 0xa0
> > [  462.911388] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa0
> > [  462.911391] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa2
> > [  462.911405] pciback 0000:03:00.0: xen_pciback: enabling device
> > [  462.911412] pciback 0000:03:00.0: enabling device (0006 -> 0007)
> > [  462.911658] Already setup the GSI :28
> > [  462.911668] Already map the GSI :28 and IRQ: 115
> > [  462.911684] pciback 0000:03:00.0: xen_pciback: save state of device
> > [  462.912154] pciback 0000:03:00.0: xen_pciback: resetting (FLR, D3, etc) the device
> > [  463.954998] pciback 0000:03:00.0: xen_pciback: reset device
> 
> ... it is actually the reset here, saving and then restoring config space.
> If e.g. that restore was done "blindly" (i.e. simply writing fields low to
> high), then memory decode would be re-enabled before the BARs are written.

The problem is also that we don't tell vPCI that the device has been
reset, so the current cached state in pdev->vpci is all out of date
with the real device state.

I didn't hit this on my test because the device I was using had no
reset support.

I don't think it's feasible for Xen to detect all the possible reset
methods dom0 might use, as some of those are device specific for
example.

We would have to introduce a new hypercall that clears all vPCI device
state, PHYSDEVOP_pci_device_reset for example.  This will involve
adding proper cleanup functions, as the current code in
vpci_remove_device() only deals with allocated memory (because so far
devices where not deassigned) but we now also need to make sure
MSI(-X) interrupts are torn down and freed, and will also require
removing any mappings of BARs into the dom0 physmap.

Thanks, Roger.
Huang Rui March 23, 2023, 1:34 p.m. UTC | #24
On Thu, Mar 23, 2023 at 06:43:53PM +0800, Roger Pau Monné wrote:
> On Wed, Mar 22, 2023 at 01:48:30PM +0100, Jan Beulich wrote:
> > On 22.03.2023 13:33, Huang Rui wrote:
> > > I traced that while we do pci-assignable-add, we will follow below trace to
> > > bind the passthrough device.
> > > 
> > > pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign()
> > > 
> > > Then kernel xen-pciback driver want to add virtual configuration spaces. In
> > > this phase, the bar_write() in xen hypervisor will be called. I still need
> > > a bit more time to figure the exact reason. May I know where the
> > > xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor?
> > 
> > Any config space access would. And I might guess ...
> > 
> > > [  309.719049] xen_pciback: wants to seize 0000:03:00.0
> > > [  462.911251] pciback 0000:03:00.0: xen_pciback: probing...
> > > [  462.911256] pciback 0000:03:00.0: xen_pciback: seizing device
> > > [  462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc
> > > [  462.911261] pciback 0000:03:00.0: xen_pciback: initializing...
> > > [  462.911263] pciback 0000:03:00.0: xen_pciback: initializing config
> > > [  462.911265] pciback 0000:03:00.0: xen-pciback: initializing virtual configuration space
> > > [  462.911268] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x00
> > > [  462.911271] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x02
> > > [  462.911284] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x04
> > > [  462.911286] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3c
> > > [  462.911289] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3d
> > > [  462.911291] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0c
> > > [  462.911294] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0d
> > > [  462.911296] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0f
> > > [  462.911301] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x10
> > > [  462.911306] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x14
> > > [  462.911309] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x18
> > > [  462.911313] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x1c
> > > [  462.911317] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x20
> > > [  462.911321] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x24
> > > [  462.911325] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x30
> > > [  462.911358] pciback 0000:03:00.0: Found capability 0x1 at 0x50
> > > [  462.911361] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x50
> > > [  462.911363] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x52
> > > [  462.911368] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x54
> > > [  462.911371] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x56
> > > [  462.911373] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x57
> > > [  462.911386] pciback 0000:03:00.0: Found capability 0x5 at 0xa0
> > > [  462.911388] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa0
> > > [  462.911391] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa2
> > > [  462.911405] pciback 0000:03:00.0: xen_pciback: enabling device
> > > [  462.911412] pciback 0000:03:00.0: enabling device (0006 -> 0007)
> > > [  462.911658] Already setup the GSI :28
> > > [  462.911668] Already map the GSI :28 and IRQ: 115
> > > [  462.911684] pciback 0000:03:00.0: xen_pciback: save state of device
> > > [  462.912154] pciback 0000:03:00.0: xen_pciback: resetting (FLR, D3, etc) the device
> > > [  463.954998] pciback 0000:03:00.0: xen_pciback: reset device
> > 
> > ... it is actually the reset here, saving and then restoring config space.
> > If e.g. that restore was done "blindly" (i.e. simply writing fields low to
> > high), then memory decode would be re-enabled before the BARs are written.
> 
> The problem is also that we don't tell vPCI that the device has been
> reset, so the current cached state in pdev->vpci is all out of date
> with the real device state.
> 
> I didn't hit this on my test because the device I was using had no
> reset support.
> 
> I don't think it's feasible for Xen to detect all the possible reset
> methods dom0 might use, as some of those are device specific for
> example.

OK.

> 
> We would have to introduce a new hypercall that clears all vPCI device
> state, PHYSDEVOP_pci_device_reset for example.  This will involve
> adding proper cleanup functions, as the current code in
> vpci_remove_device() only deals with allocated memory (because so far
> devices where not deassigned) but we now also need to make sure
> MSI(-X) interrupts are torn down and freed, and will also require
> removing any mappings of BARs into the dom0 physmap.
> 

Thanks for the suggestion. Let me make the new PHYSDEVOP_pci_device_reset
in the next version instead of current workaround.

The MSI(-X) interrupts doesn't work in our platform, I don't figure the
root cause yet. Could you please elaborate where we should require removing
any mappings of BARs into the dom0 physmap here?

Thanks,
Ray
Jan Beulich March 23, 2023, 2:16 p.m. UTC | #25
On 23.03.2023 11:26, Huang Rui wrote:
> On Wed, Mar 22, 2023 at 08:48:30PM +0800, Jan Beulich wrote:
>> On 22.03.2023 13:33, Huang Rui wrote:
>>> I traced that while we do pci-assignable-add, we will follow below trace to
>>> bind the passthrough device.
>>>
>>> pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign()
>>>
>>> Then kernel xen-pciback driver want to add virtual configuration spaces. In
>>> this phase, the bar_write() in xen hypervisor will be called. I still need
>>> a bit more time to figure the exact reason. May I know where the
>>> xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor?
>>
>> Any config space access would. And I might guess ...
>>
>>> [  309.719049] xen_pciback: wants to seize 0000:03:00.0
>>> [  462.911251] pciback 0000:03:00.0: xen_pciback: probing...
>>> [  462.911256] pciback 0000:03:00.0: xen_pciback: seizing device
>>> [  462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc
>>> [  462.911261] pciback 0000:03:00.0: xen_pciback: initializing...
>>> [  462.911263] pciback 0000:03:00.0: xen_pciback: initializing config
>>> [  462.911265] pciback 0000:03:00.0: xen-pciback: initializing virtual configuration space
>>> [  462.911268] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x00
>>> [  462.911271] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x02
>>> [  462.911284] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x04
>>> [  462.911286] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3c
>>> [  462.911289] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3d
>>> [  462.911291] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0c
>>> [  462.911294] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0d
>>> [  462.911296] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0f
>>> [  462.911301] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x10
>>> [  462.911306] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x14
>>> [  462.911309] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x18
>>> [  462.911313] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x1c
>>> [  462.911317] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x20
>>> [  462.911321] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x24
>>> [  462.911325] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x30
>>> [  462.911358] pciback 0000:03:00.0: Found capability 0x1 at 0x50
>>> [  462.911361] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x50
>>> [  462.911363] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x52
>>> [  462.911368] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x54
>>> [  462.911371] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x56
>>> [  462.911373] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x57
>>> [  462.911386] pciback 0000:03:00.0: Found capability 0x5 at 0xa0
>>> [  462.911388] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa0
>>> [  462.911391] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa2
>>> [  462.911405] pciback 0000:03:00.0: xen_pciback: enabling device
>>> [  462.911412] pciback 0000:03:00.0: enabling device (0006 -> 0007)
>>> [  462.911658] Already setup the GSI :28
>>> [  462.911668] Already map the GSI :28 and IRQ: 115
>>> [  462.911684] pciback 0000:03:00.0: xen_pciback: save state of device
>>> [  462.912154] pciback 0000:03:00.0: xen_pciback: resetting (FLR, D3, etc) the device
>>> [  463.954998] pciback 0000:03:00.0: xen_pciback: reset device
>>
>> ... it is actually the reset here, saving and then restoring config space.
>> If e.g. that restore was done "blindly" (i.e. simply writing fields low to
>> high), then memory decode would be re-enabled before the BARs are written.
>>
> 
> Yes, we confirm the problem is while the xen-pciback driver initializes
> passthrough device with pcistub_init_device() -> pci_restore_state() ->
> pci_restore_config_space() -> pci_restore_config_space_range() ->
> pci_restore_config_dword() -> pci_write_config_dword(), the pci config
> write will trigger io interrupt to bar_write() in the xen, then bar->enable
> is set, the write is not actually allowed.
> 
> May I know whether this behavior (restore) is expected? Or it should not
> reset the device.

The reset is expected. To expand slightly on Roger's reply: The reset we're
unaware of has likely indeed brought bar->enable and command register state
out of sync. For everything else see Roger's response.

Jan
Roger Pau Monné March 23, 2023, 4:23 p.m. UTC | #26
On Thu, Mar 23, 2023 at 09:34:40PM +0800, Huang Rui wrote:
> On Thu, Mar 23, 2023 at 06:43:53PM +0800, Roger Pau Monné wrote:
> > On Wed, Mar 22, 2023 at 01:48:30PM +0100, Jan Beulich wrote:
> > > On 22.03.2023 13:33, Huang Rui wrote:
> > > > I traced that while we do pci-assignable-add, we will follow below trace to
> > > > bind the passthrough device.
> > > > 
> > > > pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign()
> > > > 
> > > > Then kernel xen-pciback driver want to add virtual configuration spaces. In
> > > > this phase, the bar_write() in xen hypervisor will be called. I still need
> > > > a bit more time to figure the exact reason. May I know where the
> > > > xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor?
> > > 
> > > Any config space access would. And I might guess ...
> > > 
> > > > [  309.719049] xen_pciback: wants to seize 0000:03:00.0
> > > > [  462.911251] pciback 0000:03:00.0: xen_pciback: probing...
> > > > [  462.911256] pciback 0000:03:00.0: xen_pciback: seizing device
> > > > [  462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc
> > > > [  462.911261] pciback 0000:03:00.0: xen_pciback: initializing...
> > > > [  462.911263] pciback 0000:03:00.0: xen_pciback: initializing config
> > > > [  462.911265] pciback 0000:03:00.0: xen-pciback: initializing virtual configuration space
> > > > [  462.911268] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x00
> > > > [  462.911271] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x02
> > > > [  462.911284] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x04
> > > > [  462.911286] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3c
> > > > [  462.911289] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3d
> > > > [  462.911291] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0c
> > > > [  462.911294] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0d
> > > > [  462.911296] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0f
> > > > [  462.911301] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x10
> > > > [  462.911306] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x14
> > > > [  462.911309] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x18
> > > > [  462.911313] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x1c
> > > > [  462.911317] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x20
> > > > [  462.911321] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x24
> > > > [  462.911325] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x30
> > > > [  462.911358] pciback 0000:03:00.0: Found capability 0x1 at 0x50
> > > > [  462.911361] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x50
> > > > [  462.911363] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x52
> > > > [  462.911368] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x54
> > > > [  462.911371] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x56
> > > > [  462.911373] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x57
> > > > [  462.911386] pciback 0000:03:00.0: Found capability 0x5 at 0xa0
> > > > [  462.911388] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa0
> > > > [  462.911391] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa2
> > > > [  462.911405] pciback 0000:03:00.0: xen_pciback: enabling device
> > > > [  462.911412] pciback 0000:03:00.0: enabling device (0006 -> 0007)
> > > > [  462.911658] Already setup the GSI :28
> > > > [  462.911668] Already map the GSI :28 and IRQ: 115
> > > > [  462.911684] pciback 0000:03:00.0: xen_pciback: save state of device
> > > > [  462.912154] pciback 0000:03:00.0: xen_pciback: resetting (FLR, D3, etc) the device
> > > > [  463.954998] pciback 0000:03:00.0: xen_pciback: reset device
> > > 
> > > ... it is actually the reset here, saving and then restoring config space.
> > > If e.g. that restore was done "blindly" (i.e. simply writing fields low to
> > > high), then memory decode would be re-enabled before the BARs are written.
> > 
> > The problem is also that we don't tell vPCI that the device has been
> > reset, so the current cached state in pdev->vpci is all out of date
> > with the real device state.
> > 
> > I didn't hit this on my test because the device I was using had no
> > reset support.
> > 
> > I don't think it's feasible for Xen to detect all the possible reset
> > methods dom0 might use, as some of those are device specific for
> > example.
> 
> OK.
> 
> > 
> > We would have to introduce a new hypercall that clears all vPCI device
> > state, PHYSDEVOP_pci_device_reset for example.  This will involve
> > adding proper cleanup functions, as the current code in
> > vpci_remove_device() only deals with allocated memory (because so far
> > devices where not deassigned) but we now also need to make sure
> > MSI(-X) interrupts are torn down and freed, and will also require
> > removing any mappings of BARs into the dom0 physmap.
> > 
> 
> Thanks for the suggestion. Let me make the new PHYSDEVOP_pci_device_reset
> in the next version instead of current workaround.
> 
> The MSI(-X) interrupts doesn't work in our platform, I don't figure the
> root cause yet.

Do MSI-X interrupts work when the device is in use by dom0 (both Pv
and PVH)?

> Could you please elaborate where we should require removing
> any mappings of BARs into the dom0 physmap here?

I think you can just use `modify_bars(pdev, 0, 0)`, as that will
effectively remove any BARs from the memory map.  That should also
take care of preemption, so you should be good to go.

Regards, Roger.
Huang Rui March 24, 2023, 4:37 a.m. UTC | #27
On Fri, Mar 24, 2023 at 12:23:39AM +0800, Roger Pau Monné wrote:
> On Thu, Mar 23, 2023 at 09:34:40PM +0800, Huang Rui wrote:
> > On Thu, Mar 23, 2023 at 06:43:53PM +0800, Roger Pau Monné wrote:
> > > On Wed, Mar 22, 2023 at 01:48:30PM +0100, Jan Beulich wrote:
> > > > On 22.03.2023 13:33, Huang Rui wrote:
> > > > > I traced that while we do pci-assignable-add, we will follow below trace to
> > > > > bind the passthrough device.
> > > > > 
> > > > > pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign()
> > > > > 
> > > > > Then kernel xen-pciback driver want to add virtual configuration spaces. In
> > > > > this phase, the bar_write() in xen hypervisor will be called. I still need
> > > > > a bit more time to figure the exact reason. May I know where the
> > > > > xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor?
> > > > 
> > > > Any config space access would. And I might guess ...
> > > > 
> > > > > [  309.719049] xen_pciback: wants to seize 0000:03:00.0
> > > > > [  462.911251] pciback 0000:03:00.0: xen_pciback: probing...
> > > > > [  462.911256] pciback 0000:03:00.0: xen_pciback: seizing device
> > > > > [  462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc
> > > > > [  462.911261] pciback 0000:03:00.0: xen_pciback: initializing...
> > > > > [  462.911263] pciback 0000:03:00.0: xen_pciback: initializing config
> > > > > [  462.911265] pciback 0000:03:00.0: xen-pciback: initializing virtual configuration space
> > > > > [  462.911268] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x00
> > > > > [  462.911271] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x02
> > > > > [  462.911284] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x04
> > > > > [  462.911286] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3c
> > > > > [  462.911289] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3d
> > > > > [  462.911291] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0c
> > > > > [  462.911294] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0d
> > > > > [  462.911296] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0f
> > > > > [  462.911301] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x10
> > > > > [  462.911306] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x14
> > > > > [  462.911309] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x18
> > > > > [  462.911313] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x1c
> > > > > [  462.911317] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x20
> > > > > [  462.911321] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x24
> > > > > [  462.911325] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x30
> > > > > [  462.911358] pciback 0000:03:00.0: Found capability 0x1 at 0x50
> > > > > [  462.911361] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x50
> > > > > [  462.911363] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x52
> > > > > [  462.911368] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x54
> > > > > [  462.911371] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x56
> > > > > [  462.911373] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x57
> > > > > [  462.911386] pciback 0000:03:00.0: Found capability 0x5 at 0xa0
> > > > > [  462.911388] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa0
> > > > > [  462.911391] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa2
> > > > > [  462.911405] pciback 0000:03:00.0: xen_pciback: enabling device
> > > > > [  462.911412] pciback 0000:03:00.0: enabling device (0006 -> 0007)
> > > > > [  462.911658] Already setup the GSI :28
> > > > > [  462.911668] Already map the GSI :28 and IRQ: 115
> > > > > [  462.911684] pciback 0000:03:00.0: xen_pciback: save state of device
> > > > > [  462.912154] pciback 0000:03:00.0: xen_pciback: resetting (FLR, D3, etc) the device
> > > > > [  463.954998] pciback 0000:03:00.0: xen_pciback: reset device
> > > > 
> > > > ... it is actually the reset here, saving and then restoring config space.
> > > > If e.g. that restore was done "blindly" (i.e. simply writing fields low to
> > > > high), then memory decode would be re-enabled before the BARs are written.
> > > 
> > > The problem is also that we don't tell vPCI that the device has been
> > > reset, so the current cached state in pdev->vpci is all out of date
> > > with the real device state.
> > > 
> > > I didn't hit this on my test because the device I was using had no
> > > reset support.
> > > 
> > > I don't think it's feasible for Xen to detect all the possible reset
> > > methods dom0 might use, as some of those are device specific for
> > > example.
> > 
> > OK.
> > 
> > > 
> > > We would have to introduce a new hypercall that clears all vPCI device
> > > state, PHYSDEVOP_pci_device_reset for example.  This will involve
> > > adding proper cleanup functions, as the current code in
> > > vpci_remove_device() only deals with allocated memory (because so far
> > > devices where not deassigned) but we now also need to make sure
> > > MSI(-X) interrupts are torn down and freed, and will also require
> > > removing any mappings of BARs into the dom0 physmap.
> > > 
> > 
> > Thanks for the suggestion. Let me make the new PHYSDEVOP_pci_device_reset
> > in the next version instead of current workaround.
> > 
> > The MSI(-X) interrupts doesn't work in our platform, I don't figure the
> > root cause yet.
> 
> Do MSI-X interrupts work when the device is in use by dom0 (both Pv
> and PVH)?

Yes, dom0 works well. But they don't work on passthrough devices in domU
whatever with PV or PVH. So I would like to implement the gsi firstly, then
continue checking the MSI(-X) issues.

> 
> > Could you please elaborate where we should require removing
> > any mappings of BARs into the dom0 physmap here?
> 
> I think you can just use `modify_bars(pdev, 0, 0)`, as that will
> effectively remove any BARs from the memory map.  That should also
> take care of preemption, so you should be good to go.
> 

Thanks,
Ray
diff mbox series

Patch

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index ec2e978a4e..918d11fbce 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -392,7 +392,7 @@  static void cf_check bar_write(
      * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
      * writes as long as the BAR is not mapped into the p2m.
      */
-    if ( bar->enabled )
+    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
     {
         /* If the value written is the current one avoid printing a warning. */
         if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )