[RFC,XEN,v3,2/3] x86/pvh: Add (un)map_pirq and setup_gsi for PVH dom0

Message ID 20231210164009.1551147-3-Jiqian.Chen@amd.com (mailing list archive)
State Superseded
Series Support device passthrough when dom0 is PVH on Xen

Commit Message

Chen, Jiqian Dec. 10, 2023, 4:40 p.m. UTC
If Xen runs with a PVH dom0 and an HVM domU, a pirq is mapped for
a passthrough device by using its gsi; see
xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not
allowed, because currd is the PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails the has_pirq check.
So, allow PHYSDEVOP_map_pirq when currd is dom0, regardless of
whether dom0 has the X86_EMU_USE_PIRQ flag, and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq.

What's more, in a PVH dom0 the gsis don't get registered, but
the gsi of a passthrough device must be configured before it
can be mapped into an HVM domU.
So, add PHYSDEVOP_setup_gsi for PVH dom0, because PVH dom0
will set up the gsi when assigning a device for passthrough.

Co-developed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
 xen/arch/x86/hvm/hypercall.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Roger Pau Monné Dec. 11, 2023, 3:31 p.m. UTC | #1
On Mon, Dec 11, 2023 at 12:40:08AM +0800, Jiqian Chen wrote:
> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
> a passthrough device by using gsi, see
> xen_pt_realize->xc_physdev_map_pirq and
> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
> is not allowed because currd is PVH dom0 and PVH has no
> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
> So, allow PHYSDEVOP_map_pirq when currd is dom0 no matter if
> dom0 has X86_EMU_USE_PIRQ flag and also allow
> PHYSDEVOP_unmap_pirq for the failed path to unmap pirq.
> 
> What's more, in PVH dom0, the gsis don't get registered, but
> the gsi of a passthrough device must be configured for it to
> be able to be mapped into a hvm domU.
> So, add PHYSDEVOP_setup_gsi for PVH dom0, because PVH dom0
> will setup gsi during assigning a device to passthrough.
> 
> Co-developed-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
>  xen/arch/x86/hvm/hypercall.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
> index 6ad5b4d5f1..621d789bd3 100644
> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>      switch ( cmd )
>      {
> +    case PHYSDEVOP_setup_gsi:

I think given the new approach on the Linux side patches, where
pciback will configure the interrupt, there's no need to expose
setup_gsi anymore?

>      case PHYSDEVOP_map_pirq:
>      case PHYSDEVOP_unmap_pirq:
> +        if ( is_hardware_domain(currd) )
> +            break;

Also Jan already pointed this out in v2: this hypercall needs to be
limited so a PVH dom0 cannot execute it against itself.  IOW: refuse
the hypercall if DOMID_SELF or the passed domid matches the current
domain domid.

Thanks, Roger.
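
The restriction requested in the review above can be sketched as a small standalone model. This is not Xen code: the helper name toy_pirq_op_permitted and its parameters are hypothetical, domid_t is a local stand-in typedef, and the DOMID_SELF value is assumed to match Xen's public headers. It models only the PVH hardware-domain path and ignores the existing has_pirq path for other HVM domains.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles without Xen's headers. */
typedef uint16_t domid_t;
#define DOMID_SELF ((domid_t)0x7FF0U) /* assumed to match Xen's public xen.h */

/*
 * Hypothetical policy check: a PVH hardware domain may issue
 * (un)map_pirq only on behalf of another domain, never for itself.
 * Returns true if the operation would be let through.
 */
static bool toy_pirq_op_permitted(bool caller_is_hwdom, domid_t caller_id,
                                  domid_t target)
{
    if ( !caller_is_hwdom )
        return false; /* this model covers the hardware domain only */

    /* Refuse both spellings of "myself": DOMID_SELF and the caller's own id. */
    if ( target == DOMID_SELF || target == caller_id )
        return false;

    return true;
}
```

In this model, a dom0 with domid 0 targeting domid 1 is let through, while a target of DOMID_SELF or 0 is refused.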
Chen, Jiqian Dec. 12, 2023, 6:49 a.m. UTC | #2
On 2023/12/11 23:31, Roger Pau Monné wrote:
> On Mon, Dec 11, 2023 at 12:40:08AM +0800, Jiqian Chen wrote:
>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>> a passthrough device by using gsi, see
>> xen_pt_realize->xc_physdev_map_pirq and
>> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
>> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>> is not allowed because currd is PVH dom0 and PVH has no
>> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>> So, allow PHYSDEVOP_map_pirq when currd is dom0 no matter if
>> dom0 has X86_EMU_USE_PIRQ flag and also allow
>> PHYSDEVOP_unmap_pirq for the failed path to unmap pirq.
>>
>> What's more, in PVH dom0, the gsis don't get registered, but
>> the gsi of a passthrough device must be configured for it to
>> be able to be mapped into a hvm domU.
>> So, add PHYSDEVOP_setup_gsi for PVH dom0, because PVH dom0
>> will setup gsi during assigning a device to passthrough.
>>
>> Co-developed-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>>  xen/arch/x86/hvm/hypercall.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>> index 6ad5b4d5f1..621d789bd3 100644
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  
>>      switch ( cmd )
>>      {
>> +    case PHYSDEVOP_setup_gsi:
> 
> I think given the new approach on the Linux side patches, where
> pciback will configure the interrupt, there's no need to expose
> setup_gsi anymore?
The latest patch (the second patch of v3 on the kernel side) does setup_gsi and map_pirq for the passthrough device in pciback, so we need this and the cases below.

> 
>>      case PHYSDEVOP_map_pirq:
>>      case PHYSDEVOP_unmap_pirq:
>> +        if ( is_hardware_domain(currd) )
>> +            break;
> 
> Also Jan already pointed this out in v2: this hypercall needs to be
> limited so a PVH dom0 cannot execute it against itself.  IOW: refuse
> the hypercall if DOMID_SELF or the passed domid matches the current
> domain domid.
Yes, I remember Jan's suggestion, but since the latest patch (the second patch of v3 on the kernel side) has changed the implementation so that it does setup_gsi and map_pirq for dom0 itself, I didn't add the DOMID_SELF check.

> 
> Thanks, Roger.
Jan Beulich Dec. 12, 2023, 9:30 a.m. UTC | #3
On 12.12.2023 07:49, Chen, Jiqian wrote:
> On 2023/12/11 23:31, Roger Pau Monné wrote:
>> On Mon, Dec 11, 2023 at 12:40:08AM +0800, Jiqian Chen wrote:
>>> --- a/xen/arch/x86/hvm/hypercall.c
>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>> @@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>  
>>>      switch ( cmd )
>>>      {
>>> +    case PHYSDEVOP_setup_gsi:
>>
>> I think given the new approach on the Linux side patches, where
>> pciback will configure the interrupt, there's no need to expose
>> setup_gsi anymore?
> The latest patch(the second patch of v3 on kernel side) does setup_gsi and map_pirq for passthrough device in pciback, so we need this and below.
> 
>>
>>>      case PHYSDEVOP_map_pirq:
>>>      case PHYSDEVOP_unmap_pirq:
>>> +        if ( is_hardware_domain(currd) )
>>> +            break;
>>
>> Also Jan already pointed this out in v2: this hypercall needs to be
>> limited so a PVH dom0 cannot execute it against itself.  IOW: refuse
>> the hypercall if DOMID_SELF or the passed domid matches the current
>> domain domid.
> Yes, I remember Jan's suggestion, but since the latest patch(the second patch of v3 on kernel side) has change the implementation, it does setup_gsi and map_pirq for dom0 itself, so I didn't add the DOMID_SELF check.

And why exactly would it do specifically the map_pirq? (Even the setup_gsi
looks questionable to me, but there might be reasons there.)

Jan
Chen, Jiqian Dec. 13, 2023, 2:47 a.m. UTC | #4
On 2023/12/12 17:30, Jan Beulich wrote:
> On 12.12.2023 07:49, Chen, Jiqian wrote:
>> On 2023/12/11 23:31, Roger Pau Monné wrote:
>>> On Mon, Dec 11, 2023 at 12:40:08AM +0800, Jiqian Chen wrote:
>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>> @@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>  
>>>>      switch ( cmd )
>>>>      {
>>>> +    case PHYSDEVOP_setup_gsi:
>>>
>>> I think given the new approach on the Linux side patches, where
>>> pciback will configure the interrupt, there's no need to expose
>>> setup_gsi anymore?
>> The latest patch(the second patch of v3 on kernel side) does setup_gsi and map_pirq for passthrough device in pciback, so we need this and below.
>>
>>>
>>>>      case PHYSDEVOP_map_pirq:
>>>>      case PHYSDEVOP_unmap_pirq:
>>>> +        if ( is_hardware_domain(currd) )
>>>> +            break;
>>>
>>> Also Jan already pointed this out in v2: this hypercall needs to be
>>> limited so a PVH dom0 cannot execute it against itself.  IOW: refuse
>>> the hypercall if DOMID_SELF or the passed domid matches the current
>>> domain domid.
>> Yes, I remember Jan's suggestion, but since the latest patch(the second patch of v3 on kernel side) has change the implementation, it does setup_gsi and map_pirq for dom0 itself, so I didn't add the DOMID_SELF check.
> 
> And why exactly would it do specifically the map_pirq? (Even the setup_gsi
> looks questionable to me, but there might be reasons there.)
Map_pirq is to solve the check failure problem (pci_add_dm_done -> xc_domain_irq_permission -> XEN_DOMCTL_irq_permission -> pirq_access_permitted -> domain_pirq_to_irq, which returns irq 0).
Setup_gsi is needed because the gsi is never unmasked, so the gsi is never registered (vioapic_hwdom_map_gsi -> mp_register_gsi is never called).

> 
> Jan
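
The failure chain described in the reply above can be illustrated with a toy model. Everything here is an assumption made for illustration: the toy_* names, the fixed table size, and the simplified logic are not Xen's implementation. The model only shows why a caller with no pIRQ mapping gets 0 back from the lookup, and why a "map" that also grants access makes the check pass.

```c
#include <stdbool.h>

#define TOY_NR_IRQS 64 /* arbitrary size for this model */

/* Toy domain state: pirq_to_irq[p] == 0 means "pIRQ p is not mapped". */
struct toy_domain {
    int  pirq_to_irq[TOY_NR_IRQS];
    bool irq_allowed[TOY_NR_IRQS]; /* host IRQs this domain may access */
};

/* Lookup: returns the host IRQ behind a pIRQ, or 0 if there is none. */
static int toy_pirq_to_irq(const struct toy_domain *d, int pirq)
{
    if ( pirq < 0 || pirq >= TOY_NR_IRQS )
        return 0;
    return d->pirq_to_irq[pirq];
}

/* Check: the caller must resolve the pIRQ and hold access to the IRQ. */
static int toy_pirq_access_permitted(const struct toy_domain *caller, int pirq)
{
    int irq = toy_pirq_to_irq(caller, pirq);

    return (irq > 0 && irq < TOY_NR_IRQS && caller->irq_allowed[irq]) ? irq : 0;
}

/* Workaround: "map" records pirq -> irq and, as a side effect, grants access. */
static int toy_map_pirq(struct toy_domain *d, int pirq, int irq)
{
    if ( pirq < 0 || pirq >= TOY_NR_IRQS || irq <= 0 || irq >= TOY_NR_IRQS )
        return -1;
    d->pirq_to_irq[pirq] = irq;
    d->irq_allowed[irq] = true;
    return 0;
}
```

A zero-initialised struct toy_domain stands in for a PVH dom0 that never used pIRQs: the check yields 0 until toy_map_pirq() has been called for that pIRQ.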
Jan Beulich Dec. 13, 2023, 7:03 a.m. UTC | #5
On 13.12.2023 03:47, Chen, Jiqian wrote:
> On 2023/12/12 17:30, Jan Beulich wrote:
>> On 12.12.2023 07:49, Chen, Jiqian wrote:
>>> On 2023/12/11 23:31, Roger Pau Monné wrote:
>>>> On Mon, Dec 11, 2023 at 12:40:08AM +0800, Jiqian Chen wrote:
>>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>>> @@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>  
>>>>>      switch ( cmd )
>>>>>      {
>>>>> +    case PHYSDEVOP_setup_gsi:
>>>>
>>>> I think given the new approach on the Linux side patches, where
>>>> pciback will configure the interrupt, there's no need to expose
>>>> setup_gsi anymore?
>>> The latest patch(the second patch of v3 on kernel side) does setup_gsi and map_pirq for passthrough device in pciback, so we need this and below.
>>>
>>>>
>>>>>      case PHYSDEVOP_map_pirq:
>>>>>      case PHYSDEVOP_unmap_pirq:
>>>>> +        if ( is_hardware_domain(currd) )
>>>>> +            break;
>>>>
>>>> Also Jan already pointed this out in v2: this hypercall needs to be
>>>> limited so a PVH dom0 cannot execute it against itself.  IOW: refuse
>>>> the hypercall if DOMID_SELF or the passed domid matches the current
>>>> domain domid.
>>> Yes, I remember Jan's suggestion, but since the latest patch(the second patch of v3 on kernel side) has change the implementation, it does setup_gsi and map_pirq for dom0 itself, so I didn't add the DOMID_SELF check.
>>
>> And why exactly would it do specifically the map_pirq? (Even the setup_gsi
>> looks questionable to me, but there might be reasons there.)
> Map_pirq is to solve the check failure problem. (pci_add_dm_done-> xc_domain_irq_permission-> XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0)
> Setup_gsi is because the gsi is never be unmasked, so the gsi is never be registered( vioapic_hwdom_map_gsi-> mp_register_gsi is never be called).

And it was previously made pretty clear by Roger, I think, that doing a "map"
just for the purpose of granting permission is, well, at best a temporary
workaround in the early development phase. If there's presently no hypercall
to _only_ grant permission to IRQ, we need to add one. In fact "map" would
likely better not have done two things at a time from the very beginning ...

Jan
Chen, Jiqian Dec. 14, 2023, 8:55 a.m. UTC | #6
On 2023/12/13 15:03, Jan Beulich wrote:
> On 13.12.2023 03:47, Chen, Jiqian wrote:
>> On 2023/12/12 17:30, Jan Beulich wrote:
>>> On 12.12.2023 07:49, Chen, Jiqian wrote:
>>>> On 2023/12/11 23:31, Roger Pau Monné wrote:
>>>>> On Mon, Dec 11, 2023 at 12:40:08AM +0800, Jiqian Chen wrote:
>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>>>> @@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>  
>>>>>>      switch ( cmd )
>>>>>>      {
>>>>>> +    case PHYSDEVOP_setup_gsi:
>>>>>
>>>>> I think given the new approach on the Linux side patches, where
>>>>> pciback will configure the interrupt, there's no need to expose
>>>>> setup_gsi anymore?
>>>> The latest patch(the second patch of v3 on kernel side) does setup_gsi and map_pirq for passthrough device in pciback, so we need this and below.
>>>>
>>>>>
>>>>>>      case PHYSDEVOP_map_pirq:
>>>>>>      case PHYSDEVOP_unmap_pirq:
>>>>>> +        if ( is_hardware_domain(currd) )
>>>>>> +            break;
>>>>>
>>>>> Also Jan already pointed this out in v2: this hypercall needs to be
>>>>> limited so a PVH dom0 cannot execute it against itself.  IOW: refuse
>>>>> the hypercall if DOMID_SELF or the passed domid matches the current
>>>>> domain domid.
>>>> Yes, I remember Jan's suggestion, but since the latest patch(the second patch of v3 on kernel side) has change the implementation, it does setup_gsi and map_pirq for dom0 itself, so I didn't add the DOMID_SELF check.
>>>
>>> And why exactly would it do specifically the map_pirq? (Even the setup_gsi
>>> looks questionable to me, but there might be reasons there.)
>> Map_pirq is to solve the check failure problem. (pci_add_dm_done-> xc_domain_irq_permission-> XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0)
>> Setup_gsi is because the gsi is never be unmasked, so the gsi is never be registered( vioapic_hwdom_map_gsi-> mp_register_gsi is never be called).
> 
> And it was previously made pretty clear by Roger, I think, that doing a "map"
> just for the purpose of granting permission is, well, at best a temporary
> workaround in the early development phase. If there's presently no hypercall
> to _only_ grant permission to IRQ, we need to add one.
Could you please describe it in detail? Do you mean adding a new hypercall to grant irq access to dom0 or to domU?
It seems XEN_DOMCTL_irq_permission is the hypercall by which dom0 grants irq access to domU (see XEN_DOMCTL_irq_permission -> irq_permit_access), so there seems to be no need to add a hypercall to grant irq access.
The failure here (XEN_DOMCTL_irq_permission -> pirq_access_permitted -> domain_pirq_to_irq returns irq 0) is because the PVH dom0 doesn't use PIRQs, so we can't get the irq from the pirq if "current" is the PVH dom0.
So, it seems the logic of XEN_DOMCTL_irq_permission is not suitable for a PVH dom0? Maybe it needs to get the irq directly from the caller (domU) instead of "current" if "current" has no PIRQ flag?

> In fact "map" would likely better not have done two things at a time from the very beginning ...
> 
> Jan
Jan Beulich Dec. 14, 2023, 9:17 a.m. UTC | #7
On 14.12.2023 09:55, Chen, Jiqian wrote:
> On 2023/12/13 15:03, Jan Beulich wrote:
>> On 13.12.2023 03:47, Chen, Jiqian wrote:
>>> On 2023/12/12 17:30, Jan Beulich wrote:
>>>> On 12.12.2023 07:49, Chen, Jiqian wrote:
>>>>> On 2023/12/11 23:31, Roger Pau Monné wrote:
>>>>>> On Mon, Dec 11, 2023 at 12:40:08AM +0800, Jiqian Chen wrote:
>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>>>>> @@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>>  
>>>>>>>      switch ( cmd )
>>>>>>>      {
>>>>>>> +    case PHYSDEVOP_setup_gsi:
>>>>>>
>>>>>> I think given the new approach on the Linux side patches, where
>>>>>> pciback will configure the interrupt, there's no need to expose
>>>>>> setup_gsi anymore?
>>>>> The latest patch(the second patch of v3 on kernel side) does setup_gsi and map_pirq for passthrough device in pciback, so we need this and below.
>>>>>
>>>>>>
>>>>>>>      case PHYSDEVOP_map_pirq:
>>>>>>>      case PHYSDEVOP_unmap_pirq:
>>>>>>> +        if ( is_hardware_domain(currd) )
>>>>>>> +            break;
>>>>>>
>>>>>> Also Jan already pointed this out in v2: this hypercall needs to be
>>>>>> limited so a PVH dom0 cannot execute it against itself.  IOW: refuse
>>>>>> the hypercall if DOMID_SELF or the passed domid matches the current
>>>>>> domain domid.
>>>>> Yes, I remember Jan's suggestion, but since the latest patch(the second patch of v3 on kernel side) has change the implementation, it does setup_gsi and map_pirq for dom0 itself, so I didn't add the DOMID_SELF check.
>>>>
>>>> And why exactly would it do specifically the map_pirq? (Even the setup_gsi
>>>> looks questionable to me, but there might be reasons there.)
>>> Map_pirq is to solve the check failure problem. (pci_add_dm_done-> xc_domain_irq_permission-> XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0)
>>> Setup_gsi is because the gsi is never be unmasked, so the gsi is never be registered( vioapic_hwdom_map_gsi-> mp_register_gsi is never be called).
>>
>> And it was previously made pretty clear by Roger, I think, that doing a "map"
>> just for the purpose of granting permission is, well, at best a temporary
>> workaround in the early development phase. If there's presently no hypercall
>> to _only_ grant permission to IRQ, we need to add one.
> Could you please describe it in detail? Do you mean to add a new hypercall to grant irq access for dom0 or domU?
> It seems XEN_DOMCTL_irq_permission is the hypercall to grant irq access from dom0 to domU(see XEN_DOMCTL_irq_permission-> irq_permit_access). There is no need to add hypercall to grant irq access.

Hmm, yes and no. May I turn your attention to
https://lists.xen.org/archives/html/xen-devel/2023-07/msg02056.html
and its earlier version
https://lists.xen.org/archives/html/xen-devel/2023-05/msg00301.html
(it's imo a shame that this series continues to be stuck)?

Both make pretty clear that without pIRQ, this domctl cannot be used in
its present shape anyway, for ...

> We failed here (XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0) is because the PVH dom0 didn't use PIRQ, so we can't get irq from pirq if "current" is PVH dom0.

... this very reason. Addressing this one way or another is a necessary
part of making passthrough work with PVH Dom0. So _effectively_ there is
no hypercall allowing PVH Dom0 to grant IRQ permission.

> So, it seems the logic of XEN_DOMCTL_irq_permission is not suitable when PVH dom0?

That's my view, yes.

> Maybe it directly needs to get irq from the caller(domU) instead of "current" if the "current" has no PIRQ flag?

I don't think the IRQ mapping in the DomU is necessary to be known here.
What we want to grant is access to a host resource. That host resource is
therefore all that should need specifying for the operation to be carried
out. It just so happens that a PV Dom0 would specify the host IRQ by way
of supplying its own equivalent pIRQ.

Things are more "interesting" for MSI, though: The (Xen) IRQ may not be
known early enough. There wants to be a way of indicating that when such
an IRQ is created, permission should be granted to the domain that is
going to use that IRQ (by way of being assigned the respective device).
(This aspect may be part of why "map" presently also grants permission,
yet I continue to think that was wrong from the start. The more that
access there is [likely needlessly] granted to the domain requesting the
mapping, just for it to then further grant access to the DomU.)

Jan
Roger Pau Monné Dec. 14, 2023, 9:55 a.m. UTC | #8
On Thu, Dec 14, 2023 at 08:55:45AM +0000, Chen, Jiqian wrote:
> On 2023/12/13 15:03, Jan Beulich wrote:
> > On 13.12.2023 03:47, Chen, Jiqian wrote:
> >> On 2023/12/12 17:30, Jan Beulich wrote:
> >>> On 12.12.2023 07:49, Chen, Jiqian wrote:
> >>>> On 2023/12/11 23:31, Roger Pau Monné wrote:
> >>>>> On Mon, Dec 11, 2023 at 12:40:08AM +0800, Jiqian Chen wrote:
> >>>>>> --- a/xen/arch/x86/hvm/hypercall.c
> >>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
> >>>>>> @@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >>>>>>  
> >>>>>>      switch ( cmd )
> >>>>>>      {
> >>>>>> +    case PHYSDEVOP_setup_gsi:
> >>>>>
> >>>>> I think given the new approach on the Linux side patches, where
> >>>>> pciback will configure the interrupt, there's no need to expose
> >>>>> setup_gsi anymore?
> >>>> The latest patch(the second patch of v3 on kernel side) does setup_gsi and map_pirq for passthrough device in pciback, so we need this and below.
> >>>>
> >>>>>
> >>>>>>      case PHYSDEVOP_map_pirq:
> >>>>>>      case PHYSDEVOP_unmap_pirq:
> >>>>>> +        if ( is_hardware_domain(currd) )
> >>>>>> +            break;
> >>>>>
> >>>>> Also Jan already pointed this out in v2: this hypercall needs to be
> >>>>> limited so a PVH dom0 cannot execute it against itself.  IOW: refuse
> >>>>> the hypercall if DOMID_SELF or the passed domid matches the current
> >>>>> domain domid.
> >>>> Yes, I remember Jan's suggestion, but since the latest patch(the second patch of v3 on kernel side) has change the implementation, it does setup_gsi and map_pirq for dom0 itself, so I didn't add the DOMID_SELF check.
> >>>
> >>> And why exactly would it do specifically the map_pirq? (Even the setup_gsi
> >>> looks questionable to me, but there might be reasons there.)
> >> Map_pirq is to solve the check failure problem. (pci_add_dm_done-> xc_domain_irq_permission-> XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0)
> >> Setup_gsi is because the gsi is never be unmasked, so the gsi is never be registered( vioapic_hwdom_map_gsi-> mp_register_gsi is never be called).
> > 
> > And it was previously made pretty clear by Roger, I think, that doing a "map"
> > just for the purpose of granting permission is, well, at best a temporary
> > workaround in the early development phase. If there's presently no hypercall
> > to _only_ grant permission to IRQ, we need to add one.
> Could you please describe it in detail? Do you mean to add a new hypercall to grant irq access for dom0 or domU?
> It seems XEN_DOMCTL_irq_permission is the hypercall to grant irq access from dom0 to domU(see XEN_DOMCTL_irq_permission-> irq_permit_access). There is no need to add hypercall to grant irq access.
> We failed here (XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0) is because the PVH dom0 didn't use PIRQ, so we can't get irq from pirq if "current" is PVH dom0.

One way to bodge this would be to detect whether the caller of
XEN_DOMCTL_irq_permission is a PV or an HVM domain, and in case of HVM
assume the pirq field is a GSI.  I'm unsure however how that will work
with non-x86 architectures.

It would be better to introduce a new XEN_DOMCTL_gsi_permission, or
maybe XEN_DOMCTL_intr_permission that can take a struct we can use to
accommodate GSIs and other arch specific interrupt identifiers.

I'm also wondering whether the hypercall should be in a stable
interface so it could be easily used from QEMU if needed.

> So, it seems the logic of XEN_DOMCTL_irq_permission is not suitable when PVH dom0? Maybe it directly needs to get irq from the caller(domU) instead of "current" if the "current" has no PIRQ flag?

Hm, I'm kind of confused by this last sentence, as you mention "the
caller(domU)".  The caller of XEN_DOMCTL_irq_permission will always be
dom0 or the hardware domain.

Thanks, Roger.
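
One possible shape for the interface suggested above, an operation that takes a struct able to carry a GSI or another arch-specific interrupt identifier, is sketched below. All names (toy_intr_type, toy_intr_permission, toy_target) and the field layout are hypothetical; no such interface is defined in the thread, and the thread leaves open how MSI would fit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identifier kinds; none of these names exist in Xen. */
enum toy_intr_type {
    TOY_INTR_GSI,   /* x86 global system interrupt */
    TOY_INTR_OTHER, /* placeholder for another arch-specific identifier */
};

/* Hypothetical argument struct: a type tag plus a union of identifiers. */
struct toy_intr_permission {
    uint8_t  type;         /* one of enum toy_intr_type */
    uint8_t  allow_access; /* 1 = grant, 0 = revoke */
    uint16_t pad;          /* must be zero, leaves room for extension */
    union {
        uint32_t gsi;
        uint32_t arch_id;
    } u;
};

#define TOY_NR_GSIS 256 /* arbitrary limit for this model */

/* Toy target domain: only tracks which GSIs it may access. */
struct toy_target {
    bool gsi_allowed[TOY_NR_GSIS];
};

/* Grants or revokes access by host identifier, with no mapping involved. */
static int toy_intr_permission(struct toy_target *t,
                               const struct toy_intr_permission *op)
{
    if ( op->pad != 0 )
        return -1;

    switch ( op->type )
    {
    case TOY_INTR_GSI:
        if ( op->u.gsi >= TOY_NR_GSIS )
            return -1;
        t->gsi_allowed[op->u.gsi] = (op->allow_access != 0);
        return 0;

    default:
        return -1; /* other identifier kinds are not modelled here */
    }
}
```

The point of the tagged union in this sketch is that the caller names the host resource directly, so the operation does not depend on the caller having a pIRQ mapping of its own.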
Jan Beulich Dec. 14, 2023, 9:58 a.m. UTC | #9
On 14.12.2023 10:55, Roger Pau Monné wrote:
> On Thu, Dec 14, 2023 at 08:55:45AM +0000, Chen, Jiqian wrote:
>> On 2023/12/13 15:03, Jan Beulich wrote:
>>> On 13.12.2023 03:47, Chen, Jiqian wrote:
>>>> On 2023/12/12 17:30, Jan Beulich wrote:
>>>>> On 12.12.2023 07:49, Chen, Jiqian wrote:
>>>>>> On 2023/12/11 23:31, Roger Pau Monné wrote:
>>>>>>> On Mon, Dec 11, 2023 at 12:40:08AM +0800, Jiqian Chen wrote:
>>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>>>>>> @@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>>>  
>>>>>>>>      switch ( cmd )
>>>>>>>>      {
>>>>>>>> +    case PHYSDEVOP_setup_gsi:
>>>>>>>
>>>>>>> I think given the new approach on the Linux side patches, where
>>>>>>> pciback will configure the interrupt, there's no need to expose
>>>>>>> setup_gsi anymore?
>>>>>> The latest patch(the second patch of v3 on kernel side) does setup_gsi and map_pirq for passthrough device in pciback, so we need this and below.
>>>>>>
>>>>>>>
>>>>>>>>      case PHYSDEVOP_map_pirq:
>>>>>>>>      case PHYSDEVOP_unmap_pirq:
>>>>>>>> +        if ( is_hardware_domain(currd) )
>>>>>>>> +            break;
>>>>>>>
>>>>>>> Also Jan already pointed this out in v2: this hypercall needs to be
>>>>>>> limited so a PVH dom0 cannot execute it against itself.  IOW: refuse
>>>>>>> the hypercall if DOMID_SELF or the passed domid matches the current
>>>>>>> domain domid.
>>>>>> Yes, I remember Jan's suggestion, but since the latest patch(the second patch of v3 on kernel side) has change the implementation, it does setup_gsi and map_pirq for dom0 itself, so I didn't add the DOMID_SELF check.
>>>>>
>>>>> And why exactly would it do specifically the map_pirq? (Even the setup_gsi
>>>>> looks questionable to me, but there might be reasons there.)
>>>> Map_pirq is to solve the check failure problem. (pci_add_dm_done-> xc_domain_irq_permission-> XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0)
>>>> Setup_gsi is because the gsi is never be unmasked, so the gsi is never be registered( vioapic_hwdom_map_gsi-> mp_register_gsi is never be called).
>>>
>>> And it was previously made pretty clear by Roger, I think, that doing a "map"
>>> just for the purpose of granting permission is, well, at best a temporary
>>> workaround in the early development phase. If there's presently no hypercall
>>> to _only_ grant permission to IRQ, we need to add one.
>> Could you please describe it in detail? Do you mean to add a new hypercall to grant irq access for dom0 or domU?
>> It seems XEN_DOMCTL_irq_permission is the hypercall to grant irq access from dom0 to domU(see XEN_DOMCTL_irq_permission-> irq_permit_access). There is no need to add hypercall to grant irq access.
>> We failed here (XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0) is because the PVH dom0 didn't use PIRQ, so we can't get irq from pirq if "current" is PVH dom0.
> 
> One way to bodge this would be to detect whether the caller of
> XEN_DOMCTL_irq_permission is a PV or an HVM domain, and in case of HVM
> assume the pirq field is a GSI.  I'm unsure however how that will work
> with non-x86 architectures.
> 
> It would  be better to introduce a new XEN_DOMCTL_gsi_permission, or
> maybe XEN_DOMCTL_intr_permission that can take a struct we can use to
> accommodate GSIs and other arch specific interrupt identifiers.

How would you see MSI being handled then?

Jan
Roger Pau Monné Dec. 14, 2023, 10:06 a.m. UTC | #10
On Thu, Dec 14, 2023 at 10:58:24AM +0100, Jan Beulich wrote:
> On 14.12.2023 10:55, Roger Pau Monné wrote:
> > On Thu, Dec 14, 2023 at 08:55:45AM +0000, Chen, Jiqian wrote:
> >> On 2023/12/13 15:03, Jan Beulich wrote:
> >>> On 13.12.2023 03:47, Chen, Jiqian wrote:
> >>>> On 2023/12/12 17:30, Jan Beulich wrote:
> >>>>> On 12.12.2023 07:49, Chen, Jiqian wrote:
> >>>>>> On 2023/12/11 23:31, Roger Pau Monné wrote:
> >>>>>>> On Mon, Dec 11, 2023 at 12:40:08AM +0800, Jiqian Chen wrote:
> >>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
> >>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
> >>>>>>>> @@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >>>>>>>>  
> >>>>>>>>      switch ( cmd )
> >>>>>>>>      {
> >>>>>>>> +    case PHYSDEVOP_setup_gsi:
> >>>>>>>
> >>>>>>> I think given the new approach on the Linux side patches, where
> >>>>>>> pciback will configure the interrupt, there's no need to expose
> >>>>>>> setup_gsi anymore?
> >>>>>> The latest patch(the second patch of v3 on kernel side) does setup_gsi and map_pirq for passthrough device in pciback, so we need this and below.
> >>>>>>
> >>>>>>>
> >>>>>>>>      case PHYSDEVOP_map_pirq:
> >>>>>>>>      case PHYSDEVOP_unmap_pirq:
> >>>>>>>> +        if ( is_hardware_domain(currd) )
> >>>>>>>> +            break;
> >>>>>>>
> >>>>>>> Also Jan already pointed this out in v2: this hypercall needs to be
> >>>>>>> limited so a PVH dom0 cannot execute it against itself.  IOW: refuse
> >>>>>>> the hypercall if DOMID_SELF or the passed domid matches the current
> >>>>>>> domain domid.
> >>>>>> Yes, I remember Jan's suggestion, but since the latest patch(the second patch of v3 on kernel side) has change the implementation, it does setup_gsi and map_pirq for dom0 itself, so I didn't add the DOMID_SELF check.
> >>>>>
> >>>>> And why exactly would it do specifically the map_pirq? (Even the setup_gsi
> >>>>> looks questionable to me, but there might be reasons there.)
> >>>> Map_pirq is to solve the check failure problem. (pci_add_dm_done-> xc_domain_irq_permission-> XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0)
> >>>> Setup_gsi is because the gsi is never be unmasked, so the gsi is never be registered( vioapic_hwdom_map_gsi-> mp_register_gsi is never be called).
> >>>
> >>> And it was previously made pretty clear by Roger, I think, that doing a "map"
> >>> just for the purpose of granting permission is, well, at best a temporary
> >>> workaround in the early development phase. If there's presently no hypercall
> >>> to _only_ grant permission to IRQ, we need to add one.
> >> Could you please describe it in detail? Do you mean to add a new hypercall to grant irq access for dom0 or domU?
> >> It seems XEN_DOMCTL_irq_permission is the hypercall to grant irq access from dom0 to domU(see XEN_DOMCTL_irq_permission-> irq_permit_access). There is no need to add hypercall to grant irq access.
> >> We failed here (XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0) is because the PVH dom0 didn't use PIRQ, so we can't get irq from pirq if "current" is PVH dom0.
> > 
> > One way to bodge this would be to detect whether the caller of
> > XEN_DOMCTL_irq_permission is a PV or an HVM domain, and in case of HVM
> > assume the pirq field is a GSI.  I'm unsure however how that will work
> > with non-x86 architectures.
> > 
> > It would  be better to introduce a new XEN_DOMCTL_gsi_permission, or
> > maybe XEN_DOMCTL_intr_permission that can take a struct we can use to
> > accommodate GSIs and other arch specific interrupt identifiers.
> 
> How would you see MSI being handled then?

I wasn't really accounting for MSI here, as MSI is not handled by
XEN_DOMCTL_irq_permission now either.  My plan long term was to
introduce a new hypercall (part of dm_ops possibly) in order to be
able to bind MSI directly without having to 'map' it first.

Roger.
Stefano Stabellini Dec. 14, 2023, 10:49 p.m. UTC | #11
On Thu, 14 Dec 2023, Roger Pau Monné wrote:
> On Thu, Dec 14, 2023 at 10:58:24AM +0100, Jan Beulich wrote:
> > On 14.12.2023 10:55, Roger Pau Monné wrote:
> > > On Thu, Dec 14, 2023 at 08:55:45AM +0000, Chen, Jiqian wrote:
> > >> On 2023/12/13 15:03, Jan Beulich wrote:
> > >>> On 13.12.2023 03:47, Chen, Jiqian wrote:
> > >>>> On 2023/12/12 17:30, Jan Beulich wrote:
> > >>>>> On 12.12.2023 07:49, Chen, Jiqian wrote:
> > >>>>>> On 2023/12/11 23:31, Roger Pau Monné wrote:
> > >>>>>>> On Mon, Dec 11, 2023 at 12:40:08AM +0800, Jiqian Chen wrote:
> > >>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
> > >>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
> > >>>>>>>> @@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> > >>>>>>>>  
> > >>>>>>>>      switch ( cmd )
> > >>>>>>>>      {
> > >>>>>>>> +    case PHYSDEVOP_setup_gsi:
> > >>>>>>>
> > >>>>>>> I think given the new approach on the Linux side patches, where
> > >>>>>>> pciback will configure the interrupt, there's no need to expose
> > >>>>>>> setup_gsi anymore?
> > >>>>>> The latest patch(the second patch of v3 on kernel side) does setup_gsi and map_pirq for passthrough device in pciback, so we need this and below.
> > >>>>>>
> > >>>>>>>
> > >>>>>>>>      case PHYSDEVOP_map_pirq:
> > >>>>>>>>      case PHYSDEVOP_unmap_pirq:
> > >>>>>>>> +        if ( is_hardware_domain(currd) )
> > >>>>>>>> +            break;
> > >>>>>>>
> > >>>>>>> Also Jan already pointed this out in v2: this hypercall needs to be
> > >>>>>>> limited so a PVH dom0 cannot execute it against itself.  IOW: refuse
> > >>>>>>> the hypercall if DOMID_SELF or the passed domid matches the current
> > >>>>>>> domain domid.
> > >>>>>> Yes, I remember Jan's suggestion, but since the latest patch(the second patch of v3 on kernel side) has change the implementation, it does setup_gsi and map_pirq for dom0 itself, so I didn't add the DOMID_SELF check.
> > >>>>>
> > >>>>> And why exactly would it do specifically the map_pirq? (Even the setup_gsi
> > >>>>> looks questionable to me, but there might be reasons there.)
> > >>>> Map_pirq is to solve the check failure problem. (pci_add_dm_done-> xc_domain_irq_permission-> XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0)
> > >>>> Setup_gsi is because the gsi is never be unmasked, so the gsi is never be registered( vioapic_hwdom_map_gsi-> mp_register_gsi is never be called).
> > >>>
> > >>> And it was previously made pretty clear by Roger, I think, that doing a "map"
> > >>> just for the purpose of granting permission is, well, at best a temporary
> > >>> workaround in the early development phase. If there's presently no hypercall
> > >>> to _only_ grant permission to IRQ, we need to add one.
> > >> Could you please describe it in detail? Do you mean to add a new hypercall to grant irq access for dom0 or domU?
> > >> It seems XEN_DOMCTL_irq_permission is the hypercall to grant irq access from dom0 to domU(see XEN_DOMCTL_irq_permission-> irq_permit_access). There is no need to add hypercall to grant irq access.
> > >> We failed here (XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0) is because the PVH dom0 didn't use PIRQ, so we can't get irq from pirq if "current" is PVH dom0.
> > > 
> > > One way to bodge this would be to detect whether the caller of
> > > XEN_DOMCTL_irq_permission is a PV or an HVM domain, and in case of HVM
> > > assume the pirq field is a GSI.  I'm unsure however how that will work
> > > with non-x86 architectures.

PIRQ is an x86-only concept. We have event channels but no PIRQs on ARM.
I expect RISC-V will be the same.


> > > It would  be better to introduce a new XEN_DOMCTL_gsi_permission, or

"GSI" is another x86-only concept.

So actually the best name was indeed XEN_DOMCTL_irq_permission, given
that it is using the more arch-neutral "irq" terminology.

Perhaps it was always a mistake to pass PIRQs to
XEN_DOMCTL_irq_permission and we should always have passed the real
interrupt number (GSI on x86, SPI on ARM).

So your "bodge" is actually kind of OK in my opinion. Basically everyone
else (x86 HVM/PVH, ARM, RISC-V, probably PPC too) will use
XEN_DOMCTL_irq_permission with hardware interrupt numbers (GSIs, SPIs,
etc.); the only special case is x86 PV. x86 PV is the odd one out.

Given that DOMCTL is an unstable interface anyway, I feel OK making
changes to it, even better if backward compatible.
Chen, Jiqian Dec. 15, 2023, 7:20 a.m. UTC | #12
On 2023/12/15 06:49, Stefano Stabellini wrote:
> On Thu, 14 Dec 2023, Roger Pau Monné wrote:
>> On Thu, Dec 14, 2023 at 10:58:24AM +0100, Jan Beulich wrote:
>>> On 14.12.2023 10:55, Roger Pau Monné wrote:
>>>> On Thu, Dec 14, 2023 at 08:55:45AM +0000, Chen, Jiqian wrote:
>>>>> On 2023/12/13 15:03, Jan Beulich wrote:
>>>>>> On 13.12.2023 03:47, Chen, Jiqian wrote:
>>>>>>> On 2023/12/12 17:30, Jan Beulich wrote:
>>>>>>>> On 12.12.2023 07:49, Chen, Jiqian wrote:
>>>>>>>>> On 2023/12/11 23:31, Roger Pau Monné wrote:
>>>>>>>>>> On Mon, Dec 11, 2023 at 12:40:08AM +0800, Jiqian Chen wrote:
>>>>>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
>>>>>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>>>>>>>>>> @@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>>>>>>  
>>>>>>>>>>>      switch ( cmd )
>>>>>>>>>>>      {
>>>>>>>>>>> +    case PHYSDEVOP_setup_gsi:
>>>>>>>>>>
>>>>>>>>>> I think given the new approach on the Linux side patches, where
>>>>>>>>>> pciback will configure the interrupt, there's no need to expose
>>>>>>>>>> setup_gsi anymore?
>>>>>>>>> The latest patch(the second patch of v3 on kernel side) does setup_gsi and map_pirq for passthrough device in pciback, so we need this and below.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>      case PHYSDEVOP_map_pirq:
>>>>>>>>>>>      case PHYSDEVOP_unmap_pirq:
>>>>>>>>>>> +        if ( is_hardware_domain(currd) )
>>>>>>>>>>> +            break;
>>>>>>>>>>
>>>>>>>>>> Also Jan already pointed this out in v2: this hypercall needs to be
>>>>>>>>>> limited so a PVH dom0 cannot execute it against itself.  IOW: refuse
>>>>>>>>>> the hypercall if DOMID_SELF or the passed domid matches the current
>>>>>>>>>> domain domid.
>>>>>>>>> Yes, I remember Jan's suggestion, but since the latest patch(the second patch of v3 on kernel side) has change the implementation, it does setup_gsi and map_pirq for dom0 itself, so I didn't add the DOMID_SELF check.
>>>>>>>>
>>>>>>>> And why exactly would it do specifically the map_pirq? (Even the setup_gsi
>>>>>>>> looks questionable to me, but there might be reasons there.)
>>>>>>> Map_pirq is to solve the check failure problem. (pci_add_dm_done-> xc_domain_irq_permission-> XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0)
>>>>>>> Setup_gsi is because the gsi is never be unmasked, so the gsi is never be registered( vioapic_hwdom_map_gsi-> mp_register_gsi is never be called).
>>>>>>
>>>>>> And it was previously made pretty clear by Roger, I think, that doing a "map"
>>>>>> just for the purpose of granting permission is, well, at best a temporary
>>>>>> workaround in the early development phase. If there's presently no hypercall
>>>>>> to _only_ grant permission to IRQ, we need to add one.
>>>>> Could you please describe it in detail? Do you mean to add a new hypercall to grant irq access for dom0 or domU?
>>>>> It seems XEN_DOMCTL_irq_permission is the hypercall to grant irq access from dom0 to domU(see XEN_DOMCTL_irq_permission-> irq_permit_access). There is no need to add hypercall to grant irq access.
>>>>> We failed here (XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0) is because the PVH dom0 didn't use PIRQ, so we can't get irq from pirq if "current" is PVH dom0.
>>>>
>>>> One way to bodge this would be to detect whether the caller of
>>>> XEN_DOMCTL_irq_permission is a PV or an HVM domain, and in case of HVM
>>>> assume the pirq field is a GSI.  I'm unsure however how that will work
>>>> with non-x86 architectures.
> 
> PIRQ is an x86-only concept. We have event channels but no PIRQs on ARM.
> I expect RISC-V will be the same.
> 
> 
>>>> It would  be better to introduce a new XEN_DOMCTL_gsi_permission, or
> 
> "GSI" is another x86-only concept.
> 
> So actually the best name was indeed XEN_DOMCTL_irq_permission, given
> that it is using the more arch-neutral "irq" terminology.
> 
> Perhaps it was always a mistake to pass PIRQs to
> XEN_DOMCTL_irq_permission and we should always have passed the real
> interrupt number (GSI on x86, SPI on ARM).
> 
> So your "bodge" is actually kind of OK in my opinion. Basically everyone
> else (x86 HVM/PVH, ARM, RISC-V, probably PPC too) will use
> XEN_DOMCTL_irq_permission with hardware interrupt numbers (GSIs, SPIs,
> etc.), the only special case is x86 PV. It is x86 PV the odd one.
> 
> Given that DOMCTL is an unstable interface anyway, I feel OK making
> changes to it, even better if backward compatible.
I am trying to understand your discussion about modifying XEN_DOMCTL_irq_permission. Do you mean that at the xl level the gsi should be passed in instead of the pirq, and that a check on the caller's domain type should then be added to XEN_DOMCTL_irq_permission, as in the implementation below?
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index d3507d13a029..f665d17afbf5 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
         goto out_no_irq;
     }
     if ((fscanf(f, "%u", &irq) == 1) && irq) {
+        int gsi = irq;
         r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
         if (r < 0) {
             LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
@@ -1494,7 +1495,7 @@ static void pci_add_dm_done(libxl__egc *egc,
             rc = ERROR_FAIL;
             goto out;
         }
-        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+        r = xc_domain_irq_permission(ctx->xch, domid, gsi, 1);
         if (r < 0) {
             LOGED(ERROR, domainid,
                   "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index f5a71ee5f78d..782c4a7a70a4 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -658,7 +658,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             ret = -EINVAL;
             break;
         }
-        irq = pirq_access_permitted(current->domain, pirq);
+
+        if ( is_hvm_domain(current->domain) )
+            irq = pirq;
+        else
+            irq = pirq_access_permitted(current->domain, pirq);
+
         if ( !irq || xsm_irq_permission(XSM_HOOK, d, irq, allow) )
             ret = -EPERM;
         else if ( allow )
Roger Pau Monné Dec. 15, 2023, 8:21 a.m. UTC | #13
On Thu, Dec 14, 2023 at 02:49:18PM -0800, Stefano Stabellini wrote:
> On Thu, 14 Dec 2023, Roger Pau Monné wrote:
> > On Thu, Dec 14, 2023 at 10:58:24AM +0100, Jan Beulich wrote:
> > > On 14.12.2023 10:55, Roger Pau Monné wrote:
> > > > On Thu, Dec 14, 2023 at 08:55:45AM +0000, Chen, Jiqian wrote:
> > > >> On 2023/12/13 15:03, Jan Beulich wrote:
> > > >>> On 13.12.2023 03:47, Chen, Jiqian wrote:
> > > >>>> On 2023/12/12 17:30, Jan Beulich wrote:
> > > >>>>> On 12.12.2023 07:49, Chen, Jiqian wrote:
> > > >>>>>> On 2023/12/11 23:31, Roger Pau Monné wrote:
> > > >>>>>>> On Mon, Dec 11, 2023 at 12:40:08AM +0800, Jiqian Chen wrote:
> > > >>>>>>>> --- a/xen/arch/x86/hvm/hypercall.c
> > > >>>>>>>> +++ b/xen/arch/x86/hvm/hypercall.c
> > > >>>>>>>> @@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> > > >>>>>>>>  
> > > >>>>>>>>      switch ( cmd )
> > > >>>>>>>>      {
> > > >>>>>>>> +    case PHYSDEVOP_setup_gsi:
> > > >>>>>>>
> > > >>>>>>> I think given the new approach on the Linux side patches, where
> > > >>>>>>> pciback will configure the interrupt, there's no need to expose
> > > >>>>>>> setup_gsi anymore?
> > > >>>>>> The latest patch(the second patch of v3 on kernel side) does setup_gsi and map_pirq for passthrough device in pciback, so we need this and below.
> > > >>>>>>
> > > >>>>>>>
> > > >>>>>>>>      case PHYSDEVOP_map_pirq:
> > > >>>>>>>>      case PHYSDEVOP_unmap_pirq:
> > > >>>>>>>> +        if ( is_hardware_domain(currd) )
> > > >>>>>>>> +            break;
> > > >>>>>>>
> > > >>>>>>> Also Jan already pointed this out in v2: this hypercall needs to be
> > > >>>>>>> limited so a PVH dom0 cannot execute it against itself.  IOW: refuse
> > > >>>>>>> the hypercall if DOMID_SELF or the passed domid matches the current
> > > >>>>>>> domain domid.
> > > >>>>>> Yes, I remember Jan's suggestion, but since the latest patch(the second patch of v3 on kernel side) has change the implementation, it does setup_gsi and map_pirq for dom0 itself, so I didn't add the DOMID_SELF check.
> > > >>>>>
> > > >>>>> And why exactly would it do specifically the map_pirq? (Even the setup_gsi
> > > >>>>> looks questionable to me, but there might be reasons there.)
> > > >>>> Map_pirq is to solve the check failure problem. (pci_add_dm_done-> xc_domain_irq_permission-> XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0)
> > > >>>> Setup_gsi is because the gsi is never be unmasked, so the gsi is never be registered( vioapic_hwdom_map_gsi-> mp_register_gsi is never be called).
> > > >>>
> > > >>> And it was previously made pretty clear by Roger, I think, that doing a "map"
> > > >>> just for the purpose of granting permission is, well, at best a temporary
> > > >>> workaround in the early development phase. If there's presently no hypercall
> > > >>> to _only_ grant permission to IRQ, we need to add one.
> > > >> Could you please describe it in detail? Do you mean to add a new hypercall to grant irq access for dom0 or domU?
> > > >> It seems XEN_DOMCTL_irq_permission is the hypercall to grant irq access from dom0 to domU(see XEN_DOMCTL_irq_permission-> irq_permit_access). There is no need to add hypercall to grant irq access.
> > > >> We failed here (XEN_DOMCTL_irq_permission-> pirq_access_permitted->domain_pirq_to_irq->return irq is 0) is because the PVH dom0 didn't use PIRQ, so we can't get irq from pirq if "current" is PVH dom0.
> > > > 
> > > > One way to bodge this would be to detect whether the caller of
> > > > XEN_DOMCTL_irq_permission is a PV or an HVM domain, and in case of HVM
> > > > assume the pirq field is a GSI.  I'm unsure however how that will work
> > > > with non-x86 architectures.
> 
> PIRQ is an x86-only concept. We have event channels but no PIRQs on ARM.
> I expect RISC-V will be the same.
> 
> 
> > > > It would  be better to introduce a new XEN_DOMCTL_gsi_permission, or
> 
> "GSI" is another x86-only concept.

Yes, that hypercall would be x86-specific.

> So actually the best name was indeed XEN_DOMCTL_irq_permission, given
> that it is using the more arch-neutral "irq" terminology.
> 
> Perhaps it was always a mistake to pass PIRQs to
> XEN_DOMCTL_irq_permission and we should always have passed the real
> interrupt number (GSI on x86, SPI on ARM).

I really don't know much about Arm, but don't you also have LPIs, and
would need to add some kind of type field to
xen_domctl_irq_permission?

> So your "bodge" is actually kind of OK in my opinion. Basically everyone
> else (x86 HVM/PVH, ARM, RISC-V, probably PPC too) will use
> XEN_DOMCTL_irq_permission with hardware interrupt numbers (GSIs, SPIs,
> etc.), the only special case is x86 PV. It is x86 PV the odd one.

x86 PV could also pass the GSI if we wanted to change the interface
uniformly.  AFAICT the hypercall is only used by libxl, so would
likely be fine to change.

> Given that DOMCTL is an unstable interface anyway, I feel OK making
> changes to it, even better if backward compatible.

Me calling this a 'bodge' was mostly because I think it would be nice
to take the opportunity to move the hypercall to a stable
interface.

Thanks, Roger.
Jan Beulich Dec. 15, 2023, 8:24 a.m. UTC | #14
On 14.12.2023 23:49, Stefano Stabellini wrote:
> On Thu, 14 Dec 2023, Roger Pau Monné wrote:
>> On Thu, Dec 14, 2023 at 10:58:24AM +0100, Jan Beulich wrote:
>>> On 14.12.2023 10:55, Roger Pau Monné wrote:
>>>> One way to bodge this would be to detect whether the caller of
>>>> XEN_DOMCTL_irq_permission is a PV or an HVM domain, and in case of HVM
>>>> assume the pirq field is a GSI.  I'm unsure however how that will work
>>>> with non-x86 architectures.
> 
> PIRQ is an x86-only concept. We have event channels but no PIRQs on ARM.
> I expect RISC-V will be the same.
> 
> 
>>>> It would  be better to introduce a new XEN_DOMCTL_gsi_permission, or
> 
> "GSI" is another x86-only concept.

Just to mention it - going through the ACPI spec, this looks to be an
arch-neutral ACPI term. It is also used in places which to me look
pretty Arm-centric.

Jan

> So actually the best name was indeed XEN_DOMCTL_irq_permission, given
> that it is using the more arch-neutral "irq" terminology.
> 
> Perhaps it was always a mistake to pass PIRQs to
> XEN_DOMCTL_irq_permission and we should always have passed the real
> interrupt number (GSI on x86, SPI on ARM).
> 
> So your "bodge" is actually kind of OK in my opinion. Basically everyone
> else (x86 HVM/PVH, ARM, RISC-V, probably PPC too) will use
> XEN_DOMCTL_irq_permission with hardware interrupt numbers (GSIs, SPIs,
> etc.), the only special case is x86 PV. It is x86 PV the odd one.
> 
> Given that DOMCTL is an unstable interface anyway, I feel OK making
> changes to it, even better if backward compatible.
Roger Pau Monné Dec. 15, 2023, 8:29 a.m. UTC | #15
On Fri, Dec 15, 2023 at 07:20:24AM +0000, Chen, Jiqian wrote:
> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> index d3507d13a029..f665d17afbf5 100644
> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>          goto out_no_irq;
>      }
>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
> +        int gsi = irq;
>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>          if (r < 0) {
>              LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
> @@ -1494,7 +1495,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>              rc = ERROR_FAIL;
>              goto out;
>          }
> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
> +        r = xc_domain_irq_permission(ctx->xch, domid, gsi, 1);
>          if (r < 0) {
>              LOGED(ERROR, domainid,
>                    "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index f5a71ee5f78d..782c4a7a70a4 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -658,7 +658,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>              ret = -EINVAL;
>              break;
>          }
> -        irq = pirq_access_permitted(current->domain, pirq);
> +
> +        if ( is_hvm_domain(current->domain) )
> +            irq = pirq;
> +        else
> +            irq = pirq_access_permitted(current->domain, pirq);

You are dropping an irq_access_permitted() check here for the HVM
case, as pirq_access_permitted() translates from pirq to irq and also
checks for permissions.

This would need to be something along the lines of:

irq = 0;
if ( is_hvm_domain(current->domain) &&
     irq_access_permitted(current->domain, pirq) )
    irq = pirq;
else
    irq = pirq_access_permitted(current->domain, pirq);

And then I wonder whether it wouldn't be best to uniformly use a GSI
for both PV and HVM.
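
A minimal, self-contained sketch of the check described above, as a toy model rather than the Xen implementation. Every `toy_` name, the fixed-size tables and the 64-entry limit are assumptions made for illustration; the real `irq_access_permitted()` and `pirq_access_permitted()` operate on Xen's own domain structures. The sketch also keeps the HVM and PV paths strictly separate, which is an assumption about the intent of the snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_IRQS 64  /* arbitrary size for the toy tables */

enum toy_guest_type { TOY_GUEST_PV, TOY_GUEST_HVM };

/* Hypothetical stand-in for the calling domain; not Xen's struct domain. */
struct toy_domain {
    enum toy_guest_type type;
    bool irq_allowed[TOY_NR_IRQS];  /* models the caller's IRQ permissions */
    int pirq_to_irq[TOY_NR_IRQS];   /* 0 means "no mapping for this pirq" */
};

/* Models irq_access_permitted(): a pure permission check on an IRQ. */
static bool toy_irq_access_permitted(const struct toy_domain *d,
                                     unsigned int irq)
{
    return irq < TOY_NR_IRQS && d->irq_allowed[irq];
}

/*
 * Models pirq_access_permitted(): translate pirq to irq, then check
 * permission.  Returns the irq on success and 0 otherwise.
 */
static int toy_pirq_access_permitted(const struct toy_domain *d,
                                     unsigned int pirq)
{
    int irq;

    if ( pirq >= TOY_NR_IRQS )
        return 0;

    irq = d->pirq_to_irq[pirq];

    return (irq > 0 && toy_irq_access_permitted(d, (unsigned int)irq))
           ? irq : 0;
}

/*
 * For an HVM/PVH caller the field is taken to be a GSI already, so only
 * the permission check applies.  For a PV caller the field is a pirq and
 * must be translated first.  Returns 0 when access is not permitted.
 */
int toy_resolve_irq(const struct toy_domain *caller, unsigned int pirq)
{
    assert(caller != NULL);

    if ( caller->type == TOY_GUEST_HVM )
        return toy_irq_access_permitted(caller, pirq) ? (int)pirq : 0;

    return toy_pirq_access_permitted(caller, pirq);
}
```

In this model an HVM caller without permission on the GSI gets 0, so the subsequent `!irq` test would still yield -EPERM, which is the property the plain `irq = pirq` assignment loses.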

Thanks, Roger.
Roger Pau Monné Dec. 15, 2023, 8:40 a.m. UTC | #16
On Fri, Dec 15, 2023 at 09:24:22AM +0100, Jan Beulich wrote:
> On 14.12.2023 23:49, Stefano Stabellini wrote:
> > On Thu, 14 Dec 2023, Roger Pau Monné wrote:
> >> On Thu, Dec 14, 2023 at 10:58:24AM +0100, Jan Beulich wrote:
> >>> On 14.12.2023 10:55, Roger Pau Monné wrote:
> >>>> One way to bodge this would be to detect whether the caller of
> >>>> XEN_DOMCTL_irq_permission is a PV or an HVM domain, and in case of HVM
> >>>> assume the pirq field is a GSI.  I'm unsure however how that will work
> >>>> with non-x86 architectures.
> > 
> > PIRQ is an x86-only concept. We have event channels but no PIRQs on ARM.
> > I expect RISC-V will be the same.
> > 
> > 
> >>>> It would  be better to introduce a new XEN_DOMCTL_gsi_permission, or
> > 
> > "GSI" is another x86-only concept.
> 
> Just to mention it - going through the ACPI spec, this looks to be an
> arch-neutral ACPI term. It is also used in places which to me look
> pretty Arm-centric.

Oh, indeed, they have retrofitted GSI(V?) for Arm also, as a way to have a
"flat" uniform interrupt space.  So I guess Arm would also need the
GSI type, unless the translation from GSI to SPI or whatever platform
interrupt type is done by the guest and Xen is completely agnostic to
GSIs (if that's even possible).

Thanks, Roger.
Stefano Stabellini Dec. 15, 2023, 9:01 p.m. UTC | #17
On Fri, 15 Dec 2023, Roger Pau Monné wrote:
> On Fri, Dec 15, 2023 at 09:24:22AM +0100, Jan Beulich wrote:
> > On 14.12.2023 23:49, Stefano Stabellini wrote:
> > > On Thu, 14 Dec 2023, Roger Pau Monné wrote:
> > >> On Thu, Dec 14, 2023 at 10:58:24AM +0100, Jan Beulich wrote:
> > >>> On 14.12.2023 10:55, Roger Pau Monné wrote:
> > >>>> One way to bodge this would be to detect whether the caller of
> > >>>> XEN_DOMCTL_irq_permission is a PV or an HVM domain, and in case of HVM
> > >>>> assume the pirq field is a GSI.  I'm unsure however how that will work
> > >>>> with non-x86 architectures.
> > > 
> > > PIRQ is an x86-only concept. We have event channels but no PIRQs on ARM.
> > > I expect RISC-V will be the same.
> > > 
> > > 
> > >>>> It would  be better to introduce a new XEN_DOMCTL_gsi_permission, or
> > > 
> > > "GSI" is another x86-only concept.
> > 
> > Just to mention it - going through the ACPI spec, this looks to be an
> > arch-neutral ACPI term. It is also used in places which to me look
> > pretty Arm-centric.
> 
> Oh, indeed, they have retrofitted GSI(V?) for Arm also, as a way to have a
> "flat" uniform interrupt space.

Interesting, and I am not surprised. (I don't usually work with ACPI on
ARM because none of our boards come with ACPI, they are all Device
Tree.)


> So I guess Arm would also need the
> GSI type, unless the translation from GSI to SPI or whatever platform
> interrupt type is done by the guest and Xen is completely agnostic to
> GSIs (if that's even possible).

I am guessing that GSIs on ARM must be mapped 1:1 to SPIs, otherwise we
would have severe inconsistencies between ACPI and DeviceTree booting,
and some boards support both.

Also to answer your question about LPIs: those are MSIs on ARM.
Chen, Jiqian Dec. 18, 2023, 3:25 a.m. UTC | #18
On 2023/12/15 16:29, Roger Pau Monné wrote:
> On Fri, Dec 15, 2023 at 07:20:24AM +0000, Chen, Jiqian wrote:
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index d3507d13a029..f665d17afbf5 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>>          goto out_no_irq;
>>      }
>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>> +        int gsi = irq;
>>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>>          if (r < 0) {
>>              LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
>> @@ -1494,7 +1495,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>>              rc = ERROR_FAIL;
>>              goto out;
>>          }
>> -        r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>> +        r = xc_domain_irq_permission(ctx->xch, domid, gsi, 1);
>>          if (r < 0) {
>>              LOGED(ERROR, domainid,
>>                    "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
>> index f5a71ee5f78d..782c4a7a70a4 100644
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -658,7 +658,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>              ret = -EINVAL;
>>              break;
>>          }
>> -        irq = pirq_access_permitted(current->domain, pirq);
>> +
>> +        if ( is_hvm_domain(current->domain) )
>> +            irq = pirq;
>> +        else
>> +            irq = pirq_access_permitted(current->domain, pirq);
> 
> You are dropping an irq_access_permitted() check here for the HVM
> case, as pirq_access_permitted() translates from pirq to irq and also
> checks for permissions.
> 
> This would need to be something along the lines of:
> 
> irq = 0;
> if ( is_hvm_domain(current->domain) &&
>      irq_access_permitted(current->domain, pirq) )
Oh, yes, this check should be added.

>     irq = pirq;
> else
>     irq = pirq_access_permitted(current->domain, pirq);
> 
> And then I wonder whether it wouldn't be best to uniformly use a GSI
> for both PV and HVM.
Looking only at the values (in PV it seems that gsi == pirq == irq numerically), it seems the gsi could also be used uniformly for PV.
The code here would then be:
if ( irq_access_permitted(current->domain, pirq) )
    irq = pirq;
else
{
    ret = -EPERM;
    break;
}
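
As a rough illustration of the uniform variant, the following self-contained toy model treats the field as a GSI for every caller and refuses the request when the caller itself has no access. The `toy_` names, the boolean table and the 64-entry bound are hypothetical and only stand in for Xen's permission tracking; nothing here is taken from the actual domctl code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_GSIS 64  /* arbitrary bound for the toy model */

/* Hypothetical per-domain permission table; not Xen's struct domain. */
struct toy_dom {
    bool gsi_allowed[TOY_NR_GSIS];
};

/* Pure permission check, modelling irq_access_permitted() on a GSI. */
static bool toy_gsi_permitted(const struct toy_dom *d, unsigned int gsi)
{
    return gsi < TOY_NR_GSIS && d->gsi_allowed[gsi];
}

/*
 * Grant or revoke access to 'gsi' for 'target' on behalf of 'caller'.
 * The caller must itself hold access to the GSI, regardless of whether
 * it is a PV or an HVM domain.  Returns 0 on success, -EINVAL for an
 * out-of-range GSI and -EPERM when the caller lacks access.
 */
int toy_gsi_permission(const struct toy_dom *caller, struct toy_dom *target,
                       unsigned int gsi, bool allow)
{
    assert(caller != NULL && target != NULL);

    if ( gsi >= TOY_NR_GSIS )
        return -EINVAL;

    if ( !toy_gsi_permitted(caller, gsi) )
        return -EPERM;

    target->gsi_allowed[gsi] = allow;

    return 0;
}
```

The model has no pirq translation at all, so it only makes sense if the toolstack passes the GSI rather than the pirq returned by the map operation.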

> 
> Thanks, Roger.
Patch

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 6ad5b4d5f1..621d789bd3 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -72,8 +72,11 @@  long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd )
     {
+    case PHYSDEVOP_setup_gsi:
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        if ( is_hardware_domain(currd) )
+            break;
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq: