diff mbox series

[XEN,v10,3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

Message ID 20240617090035.839640-4-Jiqian.Chen@amd.com (mailing list archive)
State Superseded
Headers show
Series Support device passthrough when dom0 is PVH on Xen | expand

Commit Message

Chen, Jiqian June 17, 2024, 9 a.m. UTC
The gsi of a passthrough device must be configured for it to be
able to be mapped into a hvm domU.
But When dom0 is PVH, the gsis don't get registered, it causes
the info of apic, pin and irq not be added into irq_2_pin list,
and the handler of irq_desc is not set, then when passthrough a
device, setting ioapic affinity and vector will fail.

To fix above problem, on Linux kernel side, a new code will
need to call PHYSDEVOP_setup_gsi for passthrough devices to
register gsi when dom0 is PVH.

So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
purpose.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
The code link that will call this hypercall on linux kernel side is as follows:
https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Jan Beulich June 17, 2024, 2:52 p.m. UTC | #1
On 17.06.2024 11:00, Jiqian Chen wrote:
> The gsi of a passthrough device must be configured for it to be
> able to be mapped into a hvm domU.
> But When dom0 is PVH, the gsis don't get registered, it causes
> the info of apic, pin and irq not be added into irq_2_pin list,
> and the handler of irq_desc is not set, then when passthrough a
> device, setting ioapic affinity and vector will fail.
> 
> To fix above problem, on Linux kernel side, a new code will
> need to call PHYSDEVOP_setup_gsi for passthrough devices to
> register gsi when dom0 is PVH.
> 
> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
> purpose.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
> The code link that will call this hypercall on linux kernel side is as follows:
> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/

One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
one here:
"As to GSIs not being registered: If that's not a problem for Dom0's own
 operation, I think it'll also want/need explaining why what is sufficient for
 Dom0 alone isn't sufficient when pass-through comes into play."

Jan
Chen, Jiqian June 18, 2024, 6:57 a.m. UTC | #2
On 2024/6/17 22:52, Jan Beulich wrote:
> On 17.06.2024 11:00, Jiqian Chen wrote:
>> The gsi of a passthrough device must be configured for it to be
>> able to be mapped into a hvm domU.
>> But When dom0 is PVH, the gsis don't get registered, it causes
>> the info of apic, pin and irq not be added into irq_2_pin list,
>> and the handler of irq_desc is not set, then when passthrough a
>> device, setting ioapic affinity and vector will fail.
>>
>> To fix above problem, on Linux kernel side, a new code will
>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>> register gsi when dom0 is PVH.
>>
>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>> purpose.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>> The code link that will call this hypercall on linux kernel side is as follows:
>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
> 
> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
> one here:
> "As to GSIs not being registered: If that's not a problem for Dom0's own
>  operation, I think it'll also want/need explaining why what is sufficient for
>  Dom0 alone isn't sufficient when pass-through comes into play."
I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
" it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
What description do you want me to add?

> 
> Jan
>
Jan Beulich June 18, 2024, 8:55 a.m. UTC | #3
On 18.06.2024 08:57, Chen, Jiqian wrote:
> On 2024/6/17 22:52, Jan Beulich wrote:
>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>> The gsi of a passthrough device must be configured for it to be
>>> able to be mapped into a hvm domU.
>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>> and the handler of irq_desc is not set, then when passthrough a
>>> device, setting ioapic affinity and vector will fail.
>>>
>>> To fix above problem, on Linux kernel side, a new code will
>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>> register gsi when dom0 is PVH.
>>>
>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>> purpose.
>>>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> ---
>>> The code link that will call this hypercall on linux kernel side is as follows:
>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>
>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>> one here:
>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>  operation, I think it'll also want/need explaining why what is sufficient for
>>  Dom0 alone isn't sufficient when pass-through comes into play."
> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
> What description do you want me to add?

What I'd first like to have clarification on (i.e. before putting it in
the description one way or another): How come Dom0 alone gets away fine
without making the call, yet for passthrough-to-DomU it's needed? Is it
perhaps that it just so happened that for Dom0 things have been working
on systems where it was tested, but the call should in principle have been
there in this case, too [1]? That (to me at least) would make quite a
difference for both this patch's description and us accepting it.

Jan

[1] Alternative e.g. being that because of other actions PVH Dom0 takes,
like the IO-APIC RTE programming it does for IRQs it wants to use for
itself, the necessary information is already suitably conveyed to Xen in
that case. In such a case imo it's relevant to mention in the description.
Not the least because iirc the pciback driver sets up a fake IRQ handler
in such cases, which ought to lead to similar IO-APIC RTE programming, at
which point the question would again arise why the hypercall needs
exposing.
Chen, Jiqian June 19, 2024, 7:53 a.m. UTC | #4
On 2024/6/18 16:55, Jan Beulich wrote:
> On 18.06.2024 08:57, Chen, Jiqian wrote:
>> On 2024/6/17 22:52, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>> The gsi of a passthrough device must be configured for it to be
>>>> able to be mapped into a hvm domU.
>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>> and the handler of irq_desc is not set, then when passthrough a
>>>> device, setting ioapic affinity and vector will fail.
>>>>
>>>> To fix above problem, on Linux kernel side, a new code will
>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>> register gsi when dom0 is PVH.
>>>>
>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>> purpose.
>>>>
>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>> ---
>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>
>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>> one here:
>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>> What description do you want me to add?
> 
> What I'd first like to have clarification on (i.e. before putting it in
> the description one way or another): How come Dom0 alone gets away fine
> without making the call, yet for passthrough-to-DomU it's needed? Is it
> perhaps that it just so happened that for Dom0 things have been working
> on systems where it was tested, but the call should in principle have been
> there in this case, too [1]? That (to me at least) would make quite a
> difference for both this patch's description and us accepting it.
Oh, I think I know what's your concern now. Thanks.
First question, why gsi of device can work on PVH dom0:
Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
Second question, why gsi of passthrough can't work on PVH dom0:
Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.

> 
> Jan
> 
> [1] Alternative e.g. being that because of other actions PVH Dom0 takes,
> like the IO-APIC RTE programming it does for IRQs it wants to use for
> itself, the necessary information is already suitably conveyed to Xen in
> that case. In such a case imo it's relevant to mention in the description.
> Not the least because iirc the pciback driver sets up a fake IRQ handler
> in such cases, which ought to lead to similar IO-APIC RTE programming, at
> which point the question would again arise why the hypercall needs
> exposing.
Jan Beulich June 19, 2024, 8:06 a.m. UTC | #5
On 19.06.2024 09:53, Chen, Jiqian wrote:
> On 2024/6/18 16:55, Jan Beulich wrote:
>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>> The gsi of a passthrough device must be configured for it to be
>>>>> able to be mapped into a hvm domU.
>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>> device, setting ioapic affinity and vector will fail.
>>>>>
>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>> register gsi when dom0 is PVH.
>>>>>
>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>> purpose.
>>>>>
>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>> ---
>>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>>
>>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>>> one here:
>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>>> What description do you want me to add?
>>
>> What I'd first like to have clarification on (i.e. before putting it in
>> the description one way or another): How come Dom0 alone gets away fine
>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>> perhaps that it just so happened that for Dom0 things have been working
>> on systems where it was tested, but the call should in principle have been
>> there in this case, too [1]? That (to me at least) would make quite a
>> difference for both this patch's description and us accepting it.
> Oh, I think I know what's your concern now. Thanks.
> First question, why gsi of device can work on PVH dom0:
> Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
> Second question, why gsi of passthrough can't work on PVH dom0:
> Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.

And why exactly would the fake IRQ handler not be set up by pciback? Its
setting up ought to lead to those same IO-APIC RTE writes that Xen
intercepts.

In any event, imo a summary of the above wants to be part of the patch
description.

Jan
Chen, Jiqian June 19, 2024, 8:51 a.m. UTC | #6
On 2024/6/19 16:06, Jan Beulich wrote:
> On 19.06.2024 09:53, Chen, Jiqian wrote:
>> On 2024/6/18 16:55, Jan Beulich wrote:
>>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>> The gsi of a passthrough device must be configured for it to be
>>>>>> able to be mapped into a hvm domU.
>>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>>> device, setting ioapic affinity and vector will fail.
>>>>>>
>>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>>> register gsi when dom0 is PVH.
>>>>>>
>>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>>> purpose.
>>>>>>
>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>> ---
>>>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>>>
>>>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>>>> one here:
>>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>>>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>>>> What description do you want me to add?
>>>
>>> What I'd first like to have clarification on (i.e. before putting it in
>>> the description one way or another): How come Dom0 alone gets away fine
>>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>>> perhaps that it just so happened that for Dom0 things have been working
>>> on systems where it was tested, but the call should in principle have been
>>> there in this case, too [1]? That (to me at least) would make quite a
>>> difference for both this patch's description and us accepting it.
>> Oh, I think I know what's your concern now. Thanks.
>> First question, why gsi of device can work on PVH dom0:
>> Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
>> Second question, why gsi of passthrough can't work on PVH dom0:
>> Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.
> 
> And why exactly would the fake IRQ handler not be set up by pciback? Its
> setting up ought to lead to those same IO-APIC RTE writes that Xen
> intercepts.
Because isr_on is not set, when xen_pcibk_control_isr is called, it will return due to " !dev_data->isr_on". So that fake IRQ handler aren't installed.
And it seems isr_on is set through driver sysfs " irq_handler_state" for a level device that is to be shared with guest and the IRQ is shared with the initial domain.

> 
> In any event, imo a summary of the above wants to be part of the patch
> description.
OK, will add into the commit message in next version.

> 
> Jan
Jan Beulich June 19, 2024, 9:49 a.m. UTC | #7
On 19.06.2024 10:51, Chen, Jiqian wrote:
> On 2024/6/19 16:06, Jan Beulich wrote:
>> On 19.06.2024 09:53, Chen, Jiqian wrote:
>>> On 2024/6/18 16:55, Jan Beulich wrote:
>>>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>> The gsi of a passthrough device must be configured for it to be
>>>>>>> able to be mapped into a hvm domU.
>>>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>>>> device, setting ioapic affinity and vector will fail.
>>>>>>>
>>>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>>>> register gsi when dom0 is PVH.
>>>>>>>
>>>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>>>> purpose.
>>>>>>>
>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>>> ---
>>>>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>>>>
>>>>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>>>>> one here:
>>>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>>>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>>>>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>>>>> What description do you want me to add?
>>>>
>>>> What I'd first like to have clarification on (i.e. before putting it in
>>>> the description one way or another): How come Dom0 alone gets away fine
>>>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>>>> perhaps that it just so happened that for Dom0 things have been working
>>>> on systems where it was tested, but the call should in principle have been
>>>> there in this case, too [1]? That (to me at least) would make quite a
>>>> difference for both this patch's description and us accepting it.
>>> Oh, I think I know what's your concern now. Thanks.
>>> First question, why gsi of device can work on PVH dom0:
>>> Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
>>> Second question, why gsi of passthrough can't work on PVH dom0:
>>> Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.
>>
>> And why exactly would the fake IRQ handler not be set up by pciback? Its
>> setting up ought to lead to those same IO-APIC RTE writes that Xen
>> intercepts.
> Because isr_on is not set, when xen_pcibk_control_isr is called, it will return due to " !dev_data->isr_on". So that fake IRQ handler aren't installed.

I'm afraid I don't follow you here. Quoting from the function:

	enable =  dev_data->enable_intx;

	/* Asked to disable, but ISR isn't runnig */
	if (!enable && !dev_data->isr_on)
		return;

I.e. we bail if the request was to _disable_ and there is no ISR.

I can't exclude though that command_write()'s logic to set ->enable_intx
is insufficient. But in the common case one would surely expect at least
one of PCI_COMMAND_MEMORY and PCI_COMMAND_IO to be set first by a guest.
IOW at some point I'd expect xen_pcibk_control_isr() to be called with
the second argument 0 and with ->enable_intx set.

> And it seems isr_on is set through driver sysfs " irq_handler_state" for a level device that is to be shared with guest and the IRQ is shared with the initial domain.

The sysfs interface is, according to my reading of the description
of the commit introducing it, merely for debugging / recovery purposes.
(It also looks to me as if this was partly broken: If one would use it,
thus clearing ->isr_on, a subsequent disable request would take exactly
that early bailing path quoted above, with nothing removing the IRQ
handler.)

That description also talks about both an edge vs level distinction in
behavior and one for shared vs non-shared, but neither in that commit
nor in present code I can spot any respective checks. Otherwise I could
understand that there are cases where the necessary information isn't
propagated to Xen.

Jan
Chen, Jiqian June 19, 2024, 10:10 a.m. UTC | #8
On 2024/6/19 17:49, Jan Beulich wrote:
> On 19.06.2024 10:51, Chen, Jiqian wrote:
>> On 2024/6/19 16:06, Jan Beulich wrote:
>>> On 19.06.2024 09:53, Chen, Jiqian wrote:
>>>> On 2024/6/18 16:55, Jan Beulich wrote:
>>>>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>>>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>> The gsi of a passthrough device must be configured for it to be
>>>>>>>> able to be mapped into a hvm domU.
>>>>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>>>>> device, setting ioapic affinity and vector will fail.
>>>>>>>>
>>>>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>>>>> register gsi when dom0 is PVH.
>>>>>>>>
>>>>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>>>>> purpose.
>>>>>>>>
>>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>>>> ---
>>>>>>>> The code link that will call this hypercall on linux kernel side is as follows:
>>>>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-Jiqian.Chen@amd.com/
>>>>>>>
>>>>>>> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
>>>>>>> one here:
>>>>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>>>>  operation, I think it'll also want/need explaining why what is sufficient for
>>>>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>>>>> I have modified the commit message to describe why GSIs are not registered can cause passthrough not work, according to this v9 comment.
>>>>>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, and the handler of irq_desc is not set, then when passthrough a device, setting ioapic affinity and vector will fail."
>>>>>> What description do you want me to add?
>>>>>
>>>>> What I'd first like to have clarification on (i.e. before putting it in
>>>>> the description one way or another): How come Dom0 alone gets away fine
>>>>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>>>>> perhaps that it just so happened that for Dom0 things have been working
>>>>> on systems where it was tested, but the call should in principle have been
>>>>> there in this case, too [1]? That (to me at least) would make quite a
>>>>> difference for both this patch's description and us accepting it.
>>>> Oh, I think I know what's your concern now. Thanks.
>>>> First question, why gsi of device can work on PVH dom0:
>>>> Because when probe a driver to a normal device, it will call linux kernel side:pci_device_probe-> request_threaded_irq-> irq_startup-> __unmask_ioapic-> io_apic_write, then trap into xen side hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept-> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi. So that the gsi can be registered.
>>>> Second question, why gsi of passthrough can't work on PVH dom0:
>>>> Because when assign a device to be passthrough, it uses pciback to probe the device, and it calls pcistub_probe, but in all callstack of pcistub_probe, it doesn't unmask the gsi, and we can see on Xen side, the function vioapic_hwdom_map_gsi-> mp_register_gsi will be called only when the gsi is unmasked, so that the gsi can't work for passthrough device.
>>>
>>> And why exactly would the fake IRQ handler not be set up by pciback? Its
>>> setting up ought to lead to those same IO-APIC RTE writes that Xen
>>> intercepts.
>> Because isr_on is not set, when xen_pcibk_control_isr is called, it will return due to " !dev_data->isr_on". So that fake IRQ handler aren't installed.
> 
> I'm afraid I don't follow you here. Quoting from the function:
> 
> 	enable =  dev_data->enable_intx;
> 
> 	/* Asked to disable, but ISR isn't runnig */
> 	if (!enable && !dev_data->isr_on)
> 		return;
> 
> I.e. we bail if the request was to _disable_ and there is no ISR.
I mean, after debugging the pcistub_probe callstack:
pcistub_seize-> pcistub_init_device-> xen_pcibk_reset_device-> xen_pcibk_control_isr(dev, 1 /*reset device*/)
and in xen_pcibk_control_isr code:
	if (reset) {
		dev_data->enable_intx = 0;
		dev_data->ack_intr = 0;
	}
	enable =  dev_data->enable_intx;

	/* Asked to disable, but ISR isn't runnig */
	if (!enable && !dev_data->isr_on)
		return;
"reset" is 1, so "dev_date->enable_intx" is set 0, then "enable" is 0, and then arrive the second "if" check, "enable" is 0 and "dev_date->isr_on" is also 0, so it return here.
It can't reach following code to install irq handle.

> 
> I can't exclude though that command_write()'s logic to set ->enable_intx
> is insufficient. But in the common case one would surely expect at least
> one of PCI_COMMAND_MEMORY and PCI_COMMAND_IO to be set first by a guest.
> IOW at some point I'd expect xen_pcibk_control_isr() to be called with
> the second argument 0 and with ->enable_intx set.
I only see xen_pcibk_control_isr(dev, 0) is called in xen_pcibk_do_one_op, but xen_pcibk_do_one_op is not called during assigning a device to be passthrough.

> 
>> And it seems isr_on is set through driver sysfs " irq_handler_state" for a level device that is to be shared with guest and the IRQ is shared with the initial domain.
> 
> The sysfs interface is, according to my reading of the description
> of the commit introducing it, merely for debugging / recovery purposes.
> (It also looks to me as if this was partly broken: If one would use it,
> thus clearing ->isr_on, a subsequent disable request would take exactly
> that early bailing path quoted above, with nothing removing the IRQ
> handler.)
> 
> That description also talks about both an edge vs level distinction in
> behavior and one for shared vs non-shared, but neither in that commit
> nor in present code I can spot any respective checks. Otherwise I could
> understand that there are cases where the necessary information isn't
> propagated to Xen.
> 
> Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 03ada3c880bd..cfe82d0f96ed 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@  long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -ENOSYS;
         break;
 
+    case PHYSDEVOP_setup_gsi:
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove: