diff mbox series

[v4] x86/xen: Add support for HVMOP_set_evtchn_upcall_vector

Message ID 20220729070416.23306-1-jane.malalane@citrix.com (mailing list archive)
State Accepted
Commit b1c3497e604ddccea5af4071831ed0e4680fb35e
Headers show
Series [v4] x86/xen: Add support for HVMOP_set_evtchn_upcall_vector | expand

Commit Message

Jane Malalane July 29, 2022, 7:04 a.m. UTC
Implement support for the HVMOP_set_evtchn_upcall_vector hypercall in
order to set the per-vCPU event channel vector callback on Linux and
use it in preference of HVM_PARAM_CALLBACK_IRQ.

If the per-VCPU vector setup is successful on BSP, use this method
for the APs. If not, fallback to the global vector-type callback.

Also register callback_irq at per-vCPU event channel setup to trick
toolstack to think the domain is enlightened.

Suggested-by: "Roger Pau Monné" <roger.pau@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
CC: Juergen Gross <jgross@suse.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: x86@kernel.org
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Maximilian Heyne <mheyne@amazon.de>
CC: xen-devel@lists.xenproject.org

v4:
 * amend code comment

v3:
 * comment style
 * add comment on toolstack trick
 * remove unnecessary variable and function call
 * surround x86-specific code with #ifdef

v2:
 * remove no_vector_callback
 * make xen_have_vector_callback a bool
 * rename xen_ack_upcall to xen_percpu_upcall
 * fail to bring CPU up on init instead of crashing kernel
 * add and use xen_set_upcall_vector where suitable
 * xen_setup_upcall_vector -> xen_init_setup_upcall_vector for clarity
---
 arch/x86/include/asm/xen/cpuid.h   |  2 ++
 arch/x86/include/asm/xen/events.h  |  3 ++-
 arch/x86/xen/enlighten.c           |  2 +-
 arch/x86/xen/enlighten_hvm.c       | 24 ++++++++++++-----
 arch/x86/xen/suspend_hvm.c         | 10 ++++++-
 drivers/xen/events/events_base.c   | 53 +++++++++++++++++++++++++++++++++-----
 include/xen/hvm.h                  |  2 ++
 include/xen/interface/hvm/hvm_op.h | 19 ++++++++++++++
 8 files changed, 100 insertions(+), 15 deletions(-)

Comments

Jürgen Groß Aug. 14, 2022, 8:37 a.m. UTC | #1
On 29.07.22 09:04, Jane Malalane wrote:
> Implement support for the HVMOP_set_evtchn_upcall_vector hypercall in
> order to set the per-vCPU event channel vector callback on Linux and
> use it in preference of HVM_PARAM_CALLBACK_IRQ.
> 
> If the per-VCPU vector setup is successful on BSP, use this method
> for the APs. If not, fallback to the global vector-type callback.
> 
> Also register callback_irq at per-vCPU event channel setup to trick
> toolstack to think the domain is enlightened.
> 
> Suggested-by: "Roger Pau Monné" <roger.pau@citrix.com>
> Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Pushed to xen/tip.git for-linus-6.0


Juergen
Jan Beulich Nov. 3, 2022, 1:38 p.m. UTC | #2
On 29.07.2022 09:04, Jane Malalane wrote:
> @@ -125,6 +130,9 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_xen_hvm_callback)
>  {
>  	struct pt_regs *old_regs = set_irq_regs(regs);
>  
> +	if (xen_percpu_upcall)
> +		ack_APIC_irq();
> +
>  	inc_irq_stat(irq_hv_callback_count);
>  
>  	xen_hvm_evtchn_do_upcall();
> @@ -168,6 +176,15 @@ static int xen_cpu_up_prepare_hvm(unsigned int cpu)
>  	if (!xen_have_vector_callback)
>  		return 0;
>  
> +	if (xen_percpu_upcall) {
> +		rc = xen_set_upcall_vector(cpu);

From all I can tell at least for APs this happens before setup_local_apic().
With there being APIC interaction in this operation mode, as seen e.g. in
the earlier hunk above, I think this is logically wrong. And it leads to
apic_pending_intr_clear() issuing its warning: The vector registration, as
an intentional side effect, marks the vector as pending. Unless IRQs were
enabled at any point between the registration and the check, there's
simply no way for the corresponding IRR bit to be dealt with (by
propagating to ISR when the interrupt is delivered, and then being cleared
from ISR by EOI).

As a note to x86 maintainers: This comment there

	/*
	 * If the ISR map is not empty. ACK the APIC and run another round
	 * to verify whether a pending IRR has been unblocked and turned
	 * into a ISR.
	 */

is pretty clearly wrong - with IRQs disabled there's no way for a pending
IRR bit to be "unblocked and turned into a ISR" one. And even if IRQs
were enabled TPR would still prevent the handling of any bits potentially
set in the 16....31 range.

Jan
Jan Beulich Nov. 3, 2022, 3:41 p.m. UTC | #3
On 03.11.2022 14:38, Jan Beulich wrote:
> On 29.07.2022 09:04, Jane Malalane wrote:
>> @@ -125,6 +130,9 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_xen_hvm_callback)
>>  {
>>  	struct pt_regs *old_regs = set_irq_regs(regs);
>>  
>> +	if (xen_percpu_upcall)
>> +		ack_APIC_irq();
>> +
>>  	inc_irq_stat(irq_hv_callback_count);
>>  
>>  	xen_hvm_evtchn_do_upcall();
>> @@ -168,6 +176,15 @@ static int xen_cpu_up_prepare_hvm(unsigned int cpu)
>>  	if (!xen_have_vector_callback)
>>  		return 0;
>>  
>> +	if (xen_percpu_upcall) {
>> +		rc = xen_set_upcall_vector(cpu);
> 
> From all I can tell at least for APs this happens before setup_local_apic().
> With there being APIC interaction in this operation mode, as seen e.g. in
> the earlier hunk above, I think this is logically wrong. And it leads to
> apic_pending_intr_clear() issuing its warning: The vector registration, as
> an intentional side effect, marks the vector as pending. Unless IRQs were
> enabled at any point between the registration and the check, there's
> simply no way for the corresponding IRR bit to be dealt with (by
> propagating to ISR when the interrupt is delivered, and then being cleared
> from ISR by EOI).

With Roger's help I now have a pointer to osstest also exposing the issue:

http://logs.test-lab.xenproject.org/osstest/logs/174592/test-amd64-amd64-xl-pvhv2-intel/huxelrebe0---var-log-xen-console-guest-debian.guest.osstest.log.gz

Jan
Jan Beulich Nov. 8, 2022, 4:26 p.m. UTC | #4
On 03.11.2022 16:41, Jan Beulich wrote:
> On 03.11.2022 14:38, Jan Beulich wrote:
>> On 29.07.2022 09:04, Jane Malalane wrote:
>>> @@ -125,6 +130,9 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_xen_hvm_callback)
>>>  {
>>>  	struct pt_regs *old_regs = set_irq_regs(regs);
>>>  
>>> +	if (xen_percpu_upcall)
>>> +		ack_APIC_irq();
>>> +
>>>  	inc_irq_stat(irq_hv_callback_count);
>>>  
>>>  	xen_hvm_evtchn_do_upcall();
>>> @@ -168,6 +176,15 @@ static int xen_cpu_up_prepare_hvm(unsigned int cpu)
>>>  	if (!xen_have_vector_callback)
>>>  		return 0;
>>>  
>>> +	if (xen_percpu_upcall) {
>>> +		rc = xen_set_upcall_vector(cpu);
>>
>> From all I can tell at least for APs this happens before setup_local_apic().
>> With there being APIC interaction in this operation mode, as seen e.g. in
>> the earlier hunk above, I think this is logically wrong. And it leads to
>> apic_pending_intr_clear() issuing its warning: The vector registration, as
>> an intentional side effect, marks the vector as pending. Unless IRQs were
>> enabled at any point between the registration and the check, there's
>> simply no way for the corresponding IRR bit to be dealt with (by
>> propagating to ISR when the interrupt is delivered, and then being cleared
>> from ISR by EOI).
> 
> With Roger's help I now have a pointer to osstest also exposing the issue:
> 
> http://logs.test-lab.xenproject.org/osstest/logs/174592/test-amd64-amd64-xl-pvhv2-intel/huxelrebe0---var-log-xen-console-guest-debian.guest.osstest.log.gz

I've noticed only now that my mail to Jane bounced, and I'm now told
she's no longer in her role at Citrix. Since I don't expect to have time
to investigate an appropriate solution here, may I ask whether one of
the two of you could look into this, being the maintainers of this code?

Thanks, Jan
Jürgen Groß Nov. 11, 2022, 9:01 a.m. UTC | #5
On 08.11.22 17:26, Jan Beulich wrote:
> On 03.11.2022 16:41, Jan Beulich wrote:
>> On 03.11.2022 14:38, Jan Beulich wrote:
>>> On 29.07.2022 09:04, Jane Malalane wrote:
>>>> @@ -125,6 +130,9 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_xen_hvm_callback)
>>>>   {
>>>>   	struct pt_regs *old_regs = set_irq_regs(regs);
>>>>   
>>>> +	if (xen_percpu_upcall)
>>>> +		ack_APIC_irq();
>>>> +
>>>>   	inc_irq_stat(irq_hv_callback_count);
>>>>   
>>>>   	xen_hvm_evtchn_do_upcall();
>>>> @@ -168,6 +176,15 @@ static int xen_cpu_up_prepare_hvm(unsigned int cpu)
>>>>   	if (!xen_have_vector_callback)
>>>>   		return 0;
>>>>   
>>>> +	if (xen_percpu_upcall) {
>>>> +		rc = xen_set_upcall_vector(cpu);
>>>
>>>  From all I can tell at least for APs this happens before setup_local_apic().
>>> With there being APIC interaction in this operation mode, as seen e.g. in
>>> the earlier hunk above, I think this is logically wrong. And it leads to
>>> apic_pending_intr_clear() issuing its warning: The vector registration, as
>>> an intentional side effect, marks the vector as pending. Unless IRQs were
>>> enabled at any point between the registration and the check, there's
>>> simply no way for the corresponding IRR bit to be dealt with (by
>>> propagating to ISR when the interrupt is delivered, and then being cleared
>>> from ISR by EOI).
>>
>> With Roger's help I now have a pointer to osstest also exposing the issue:
>>
>> http://logs.test-lab.xenproject.org/osstest/logs/174592/test-amd64-amd64-xl-pvhv2-intel/huxelrebe0---var-log-xen-console-guest-debian.guest.osstest.log.gz
> 
> I've noticed only now that my mail to Jane bounced, and I'm now told
> she's no longer in her role at Citrix. Since I don't expect to have time
> to investigate an appropriate solution here, may I ask whether one of
> the two of you could look into this, being the maintainers of this code?

I think the correct way to handle this would be:

- rename CPUHP_AP_ARM_XEN_STARTING to CPUHP_AP_XEN_STARTING
- move the xen_set_upcall_vector() call to a new hotplug callback
   registered for CPUHP_AP_XEN_STARTING (this can be done even
   conditionally only if xen_percpu_upcall is set)

Writing a patch now ...


Juergen
Jürgen Groß Nov. 11, 2022, 12:44 p.m. UTC | #6
On 11.11.22 10:01, Juergen Gross wrote:
> On 08.11.22 17:26, Jan Beulich wrote:
>> On 03.11.2022 16:41, Jan Beulich wrote:
>>> On 03.11.2022 14:38, Jan Beulich wrote:
>>>> On 29.07.2022 09:04, Jane Malalane wrote:
>>>>> @@ -125,6 +130,9 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_xen_hvm_callback)
>>>>>   {
>>>>>       struct pt_regs *old_regs = set_irq_regs(regs);
>>>>> +    if (xen_percpu_upcall)
>>>>> +        ack_APIC_irq();
>>>>> +
>>>>>       inc_irq_stat(irq_hv_callback_count);
>>>>>       xen_hvm_evtchn_do_upcall();
>>>>> @@ -168,6 +176,15 @@ static int xen_cpu_up_prepare_hvm(unsigned int cpu)
>>>>>       if (!xen_have_vector_callback)
>>>>>           return 0;
>>>>> +    if (xen_percpu_upcall) {
>>>>> +        rc = xen_set_upcall_vector(cpu);
>>>>
>>>>  From all I can tell at least for APs this happens before setup_local_apic().
>>>> With there being APIC interaction in this operation mode, as seen e.g. in
>>>> the earlier hunk above, I think this is logically wrong. And it leads to
>>>> apic_pending_intr_clear() issuing its warning: The vector registration, as
>>>> an intentional side effect, marks the vector as pending. Unless IRQs were
>>>> enabled at any point between the registration and the check, there's
>>>> simply no way for the corresponding IRR bit to be dealt with (by
>>>> propagating to ISR when the interrupt is delivered, and then being cleared
>>>> from ISR by EOI).
>>>
>>> With Roger's help I now have a pointer to osstest also exposing the issue:
>>>
>>> http://logs.test-lab.xenproject.org/osstest/logs/174592/test-amd64-amd64-xl-pvhv2-intel/huxelrebe0---var-log-xen-console-guest-debian.guest.osstest.log.gz
>>
>> I've noticed only now that my mail to Jane bounced, and I'm now told
>> she's no longer in her role at Citrix. Since I don't expect to have time
>> to investigate an appropriate solution here, may I ask whether one of
>> the two of you could look into this, being the maintainers of this code?
> 
> I think the correct way to handle this would be:
> 
> - rename CPUHP_AP_ARM_XEN_STARTING to CPUHP_AP_XEN_STARTING
> - move the xen_set_upcall_vector() call to a new hotplug callback
>    registered for CPUHP_AP_XEN_STARTING (this can be done even
>    conditionally only if xen_percpu_upcall is set)
> 
> Writing a patch now ...

For the APs this is working as expected.

The boot processor seems to be harder to fix. The related message is being
issued even with interrupts being on when setup_local_APIC() is called.

I've tried to register the callback only after the setup_local_APIC() call,
but this results in a system hang when the APs are started.

Any ideas?


Juergen
Jan Beulich Nov. 11, 2022, 1:17 p.m. UTC | #7
On 11.11.2022 13:44, Juergen Gross wrote:
> On 11.11.22 10:01, Juergen Gross wrote:
>> On 08.11.22 17:26, Jan Beulich wrote:
>>> On 03.11.2022 16:41, Jan Beulich wrote:
>>>> On 03.11.2022 14:38, Jan Beulich wrote:
>>>>> On 29.07.2022 09:04, Jane Malalane wrote:
>>>>>> @@ -125,6 +130,9 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_xen_hvm_callback)
>>>>>>   {
>>>>>>       struct pt_regs *old_regs = set_irq_regs(regs);
>>>>>> +    if (xen_percpu_upcall)
>>>>>> +        ack_APIC_irq();
>>>>>> +
>>>>>>       inc_irq_stat(irq_hv_callback_count);
>>>>>>       xen_hvm_evtchn_do_upcall();
>>>>>> @@ -168,6 +176,15 @@ static int xen_cpu_up_prepare_hvm(unsigned int cpu)
>>>>>>       if (!xen_have_vector_callback)
>>>>>>           return 0;
>>>>>> +    if (xen_percpu_upcall) {
>>>>>> +        rc = xen_set_upcall_vector(cpu);
>>>>>
>>>>>  From all I can tell at least for APs this happens before setup_local_apic().
>>>>> With there being APIC interaction in this operation mode, as seen e.g. in
>>>>> the earlier hunk above, I think this is logically wrong. And it leads to
>>>>> apic_pending_intr_clear() issuing its warning: The vector registration, as
>>>>> an intentional side effect, marks the vector as pending. Unless IRQs were
>>>>> enabled at any point between the registration and the check, there's
>>>>> simply no way for the corresponding IRR bit to be dealt with (by
>>>>> propagating to ISR when the interrupt is delivered, and then being cleared
>>>>> from ISR by EOI).
>>>>
>>>> With Roger's help I now have a pointer to osstest also exposing the issue:
>>>>
>>>> http://logs.test-lab.xenproject.org/osstest/logs/174592/test-amd64-amd64-xl-pvhv2-intel/huxelrebe0---var-log-xen-console-guest-debian.guest.osstest.log.gz
>>>
>>> I've noticed only now that my mail to Jane bounced, and I'm now told
>>> she's no longer in her role at Citrix. Since I don't expect to have time
>>> to investigate an appropriate solution here, may I ask whether one of
>>> the two of you could look into this, being the maintainers of this code?
>>
>> I think the correct way to handle this would be:
>>
>> - rename CPUHP_AP_ARM_XEN_STARTING to CPUHP_AP_XEN_STARTING
>> - move the xen_set_upcall_vector() call to a new hotplug callback
>>    registered for CPUHP_AP_XEN_STARTING (this can be done even
>>    conditionally only if xen_percpu_upcall is set)
>>
>> Writing a patch now ...
> 
> For the APs this is working as expected.
> 
> The boot processor seems to be harder to fix. The related message is being
> issued even with interrupts being on when setup_local_APIC() is called.

Hmm, puzzling: I don't recall having seen the message for the BSP. Which
made me assume (without having actually checked) that ...

> I've tried to register the callback only after the setup_local_APIC() call,

... it's already happening afterwards in that case.

> but this results in a system hang when the APs are started.
> 
> Any ideas?

Not really, to be honest.

Jan
Jürgen Groß Nov. 11, 2022, 2:50 p.m. UTC | #8
On 11.11.22 14:17, Jan Beulich wrote:
> On 11.11.2022 13:44, Juergen Gross wrote:
>> On 11.11.22 10:01, Juergen Gross wrote:
>>> On 08.11.22 17:26, Jan Beulich wrote:
>>>> On 03.11.2022 16:41, Jan Beulich wrote:
>>>>> On 03.11.2022 14:38, Jan Beulich wrote:
>>>>>> On 29.07.2022 09:04, Jane Malalane wrote:
>>>>>>> @@ -125,6 +130,9 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_xen_hvm_callback)
>>>>>>>    {
>>>>>>>        struct pt_regs *old_regs = set_irq_regs(regs);
>>>>>>> +    if (xen_percpu_upcall)
>>>>>>> +        ack_APIC_irq();
>>>>>>> +
>>>>>>>        inc_irq_stat(irq_hv_callback_count);
>>>>>>>        xen_hvm_evtchn_do_upcall();
>>>>>>> @@ -168,6 +176,15 @@ static int xen_cpu_up_prepare_hvm(unsigned int cpu)
>>>>>>>        if (!xen_have_vector_callback)
>>>>>>>            return 0;
>>>>>>> +    if (xen_percpu_upcall) {
>>>>>>> +        rc = xen_set_upcall_vector(cpu);
>>>>>>
>>>>>>   From all I can tell at least for APs this happens before setup_local_apic().
>>>>>> With there being APIC interaction in this operation mode, as seen e.g. in
>>>>>> the earlier hunk above, I think this is logically wrong. And it leads to
>>>>>> apic_pending_intr_clear() issuing its warning: The vector registration, as
>>>>>> an intentional side effect, marks the vector as pending. Unless IRQs were
>>>>>> enabled at any point between the registration and the check, there's
>>>>>> simply no way for the corresponding IRR bit to be dealt with (by
>>>>>> propagating to ISR when the interrupt is delivered, and then being cleared
>>>>>> from ISR by EOI).
>>>>>
>>>>> With Roger's help I now have a pointer to osstest also exposing the issue:
>>>>>
>>>>> http://logs.test-lab.xenproject.org/osstest/logs/174592/test-amd64-amd64-xl-pvhv2-intel/huxelrebe0---var-log-xen-console-guest-debian.guest.osstest.log.gz
>>>>
>>>> I've noticed only now that my mail to Jane bounced, and I'm now told
>>>> she's no longer in her role at Citrix. Since I don't expect to have time
>>>> to investigate an appropriate solution here, may I ask whether one of
>>>> the two of you could look into this, being the maintainers of this code?
>>>
>>> I think the correct way to handle this would be:
>>>
>>> - rename CPUHP_AP_ARM_XEN_STARTING to CPUHP_AP_XEN_STARTING
>>> - move the xen_set_upcall_vector() call to a new hotplug callback
>>>     registered for CPUHP_AP_XEN_STARTING (this can be done even
>>>     conditionally only if xen_percpu_upcall is set)
>>>
>>> Writing a patch now ...
>>
>> For the APs this is working as expected.
>>
>> The boot processor seems to be harder to fix. The related message is being
>> issued even with interrupts being on when setup_local_APIC() is called.
> 
> Hmm, puzzling: I don't recall having seen the message for the BSP. Which
> made me assume (without having actually checked) that ...
> 
>> I've tried to register the callback only after the setup_local_APIC() call,
> 
> ... it's already happening afterwards in that case.
> 
>> but this results in a system hang when the APs are started.
>>
>> Any ideas?
> 
> Not really, to be honest.

I might be wrong here, but is a bit set in IRR plus interrupts enabled
enough to make the kernel happy? The local APIC isn't enabled yet when
apic_pending_intr_clear() is being called, so IMHO the hypervisor will
never propagate the bit to ISR.

I didn't find any specific information in the SDM regarding "accepting
an interrupt" of a disabled local APIC, but maybe I didn't find the
relevant part of the manual.


Juergen
Jan Beulich Nov. 14, 2022, 9:26 a.m. UTC | #9
(shrinking Cc list to just xen-devel@)

On 11.11.2022 15:50, Juergen Gross wrote:
> On 11.11.22 14:17, Jan Beulich wrote:
>> On 11.11.2022 13:44, Juergen Gross wrote:
>>> On 11.11.22 10:01, Juergen Gross wrote:
>>>> Writing a patch now ...
>>>
>>> For the APs this is working as expected.
>>>
>>> The boot processor seems to be harder to fix. The related message is being
>>> issued even with interrupts being on when setup_local_APIC() is called.
>>
>> Hmm, puzzling: I don't recall having seen the message for the BSP. Which
>> made me assume (without having actually checked) that ...
>>
>>> I've tried to register the callback only after the setup_local_APIC() call,
>>
>> ... it's already happening afterwards in that case.
>>
>>> but this results in a system hang when the APs are started.
>>>
>>> Any ideas?
>>
>> Not really, to be honest.
> 
> I might be wrong here, but is a bit set in IRR plus interrupts enabled
> enough to make the kernel happy?

If you add in PPR, then yes.

> The local APIC isn't enabled yet when
> apic_pending_intr_clear() is being called, so IMHO the hypervisor will
> never propagate the bit to ISR.

What would suffice is an interrupts-enabled window between the hypercall
and apic_pending_intr_clear(), like is occurring e.g. in
timer_irq_works() (which is what I was guessing might avoid the issue
on the BSP).

As an aside - it may be the hypervisor or hardware, depending on APIC
virtualization capabilities of the latter.

> I didn't find any specific information in the SDM regarding "accepting
> an interrupt" of a disabled local APIC, but maybe I didn't find the
> relevant part of the manual.

Indeed much needs to be inferred from how things have been written for
all the years, rather than being explicitly said. This specific aspect
is probably worse, because you can't really infer the behavior from
anything that's written anywhere (afaict; or maybe like you I haven't
been able to spot relevant text). The section that's looks to be
supposed to have this information is "Local APIC State After It Has
Been Software Disabled", but behavior as to IRR is only explicitly
described there for things already "in progress" when the enable bit
is cleared. I'm inclined to infer that no such processing would occur
afterwards anymore, until the enable bit was set again.

Which raises the question whether in the hypervisor a call to
vlapic_enabled() isn't actually missing from hvm_assert_evtchn_irq()
before calling vlapic_set_irq(). If so, a Linux side change would
likely be unnecessary. The problem then is that if an upcall was
already pending, it would never be delivered to the vCPU (since
vcpu_mark_events_pending() is a no-op when the pending flag is
already set, even subsequently arising causes would result in no
further signaling). IOW adding the check there would likely need to
be accompanied by a further change elsewhere - e.g. adding of a
(conditional) call to hvm_assert_evtchn_irq() (or directly to
vlapic_set_irq()) when the software-enable bit is set by the guest,
much like we already call pt_may_unmask_irq() there.

Andrew, Roger - do you have any thoughts here?

Similarly I wonder whether that call to hvm_assert_evtchn_irq() is
actually necessary when evtchn_upcall_pending is "false".

Jan
diff mbox series

Patch

diff --git a/arch/x86/include/asm/xen/cpuid.h b/arch/x86/include/asm/xen/cpuid.h
index 78e667a31d6c..6daa9b0c8d11 100644
--- a/arch/x86/include/asm/xen/cpuid.h
+++ b/arch/x86/include/asm/xen/cpuid.h
@@ -107,6 +107,8 @@ 
  * ID field from 8 to 15 bits, allowing to target APIC IDs up 32768.
  */
 #define XEN_HVM_CPUID_EXT_DEST_ID      (1u << 5)
+/* Per-vCPU event channel upcalls */
+#define XEN_HVM_CPUID_UPCALL_VECTOR    (1u << 6)
 
 /*
  * Leaf 6 (0x40000x05)
diff --git a/arch/x86/include/asm/xen/events.h b/arch/x86/include/asm/xen/events.h
index 068d9b067c83..62bdceb594f1 100644
--- a/arch/x86/include/asm/xen/events.h
+++ b/arch/x86/include/asm/xen/events.h
@@ -23,7 +23,7 @@  static inline int xen_irqs_disabled(struct pt_regs *regs)
 /* No need for a barrier -- XCHG is a barrier on x86. */
 #define xchg_xen_ulong(ptr, val) xchg((ptr), (val))
 
-extern int xen_have_vector_callback;
+extern bool xen_have_vector_callback;
 
 /*
  * Events delivered via platform PCI interrupts are always
@@ -34,4 +34,5 @@  static inline bool xen_support_evtchn_rebind(void)
 	return (!xen_hvm_domain() || xen_have_vector_callback);
 }
 
+extern bool xen_percpu_upcall;
 #endif /* _ASM_X86_XEN_EVENTS_H */
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 30c6e986a6cd..b8db2148c07d 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -51,7 +51,7 @@  EXPORT_SYMBOL_GPL(xen_start_info);
 
 struct shared_info xen_dummy_shared_info;
 
-__read_mostly int xen_have_vector_callback;
+__read_mostly bool xen_have_vector_callback = true;
 EXPORT_SYMBOL_GPL(xen_have_vector_callback);
 
 /*
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 8b71b1dd7639..198d3cd3e9a5 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -7,6 +7,8 @@ 
 
 #include <xen/features.h>
 #include <xen/events.h>
+#include <xen/hvm.h>
+#include <xen/interface/hvm/hvm_op.h>
 #include <xen/interface/memory.h>
 
 #include <asm/apic.h>
@@ -30,6 +32,9 @@ 
 
 static unsigned long shared_info_pfn;
 
+__ro_after_init bool xen_percpu_upcall;
+EXPORT_SYMBOL_GPL(xen_percpu_upcall);
+
 void xen_hvm_init_shared_info(void)
 {
 	struct xen_add_to_physmap xatp;
@@ -125,6 +130,9 @@  DEFINE_IDTENTRY_SYSVEC(sysvec_xen_hvm_callback)
 {
 	struct pt_regs *old_regs = set_irq_regs(regs);
 
+	if (xen_percpu_upcall)
+		ack_APIC_irq();
+
 	inc_irq_stat(irq_hv_callback_count);
 
 	xen_hvm_evtchn_do_upcall();
@@ -168,6 +176,15 @@  static int xen_cpu_up_prepare_hvm(unsigned int cpu)
 	if (!xen_have_vector_callback)
 		return 0;
 
+	if (xen_percpu_upcall) {
+		rc = xen_set_upcall_vector(cpu);
+		if (rc) {
+			WARN(1, "HVMOP_set_evtchn_upcall_vector"
+			     " for CPU %d failed: %d\n", cpu, rc);
+			return rc;
+		}
+	}
+
 	if (xen_feature(XENFEAT_hvm_safe_pvclock))
 		xen_setup_timer(cpu);
 
@@ -188,8 +205,6 @@  static int xen_cpu_dead_hvm(unsigned int cpu)
 	return 0;
 }
 
-static bool no_vector_callback __initdata;
-
 static void __init xen_hvm_guest_init(void)
 {
 	if (xen_pv_domain())
@@ -211,9 +226,6 @@  static void __init xen_hvm_guest_init(void)
 
 	xen_panic_handler_init();
 
-	if (!no_vector_callback && xen_feature(XENFEAT_hvm_callback_vector))
-		xen_have_vector_callback = 1;
-
 	xen_hvm_smp_init();
 	WARN_ON(xen_cpuhp_setup(xen_cpu_up_prepare_hvm, xen_cpu_dead_hvm));
 	xen_unplug_emulated_devices();
@@ -239,7 +251,7 @@  early_param("xen_nopv", xen_parse_nopv);
 
 static __init int xen_parse_no_vector_callback(char *arg)
 {
-	no_vector_callback = true;
+	xen_have_vector_callback = false;
 	return 0;
 }
 early_param("xen_no_vector_callback", xen_parse_no_vector_callback);
diff --git a/arch/x86/xen/suspend_hvm.c b/arch/x86/xen/suspend_hvm.c
index 9d548b0c772f..0c4f7554b7cc 100644
--- a/arch/x86/xen/suspend_hvm.c
+++ b/arch/x86/xen/suspend_hvm.c
@@ -5,6 +5,7 @@ 
 #include <xen/hvm.h>
 #include <xen/features.h>
 #include <xen/interface/features.h>
+#include <xen/events.h>
 
 #include "xen-ops.h"
 
@@ -14,6 +15,13 @@  void xen_hvm_post_suspend(int suspend_cancelled)
 		xen_hvm_init_shared_info();
 		xen_vcpu_restore();
 	}
-	xen_setup_callback_vector();
+	if (xen_percpu_upcall) {
+		unsigned int cpu;
+
+		for_each_online_cpu(cpu)
+			BUG_ON(xen_set_upcall_vector(cpu));
+	} else {
+		xen_setup_callback_vector();
+	}
 	xen_unplug_emulated_devices();
 }
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 46d9295d9a6e..206d4b466e44 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -45,6 +45,7 @@ 
 #include <asm/irq.h>
 #include <asm/io_apic.h>
 #include <asm/i8259.h>
+#include <asm/xen/cpuid.h>
 #include <asm/xen/pci.h>
 #endif
 #include <asm/sync_bitops.h>
@@ -2183,6 +2184,7 @@  static struct irq_chip xen_percpu_chip __read_mostly = {
 	.irq_ack		= ack_dynirq,
 };
 
+#ifdef CONFIG_X86
 #ifdef CONFIG_XEN_PVHVM
 /* Vector callbacks are better than PCI interrupts to receive event
  * channel notifications because we can receive vector callbacks on any
@@ -2195,11 +2197,48 @@  void xen_setup_callback_vector(void)
 		callback_via = HVM_CALLBACK_VECTOR(HYPERVISOR_CALLBACK_VECTOR);
 		if (xen_set_callback_via(callback_via)) {
 			pr_err("Request for Xen HVM callback vector failed\n");
-			xen_have_vector_callback = 0;
+			xen_have_vector_callback = false;
 		}
 	}
 }
 
+/*
+ * Setup per-vCPU vector-type callbacks. If this setup is unavailable,
+ * fallback to the global vector-type callback.
+ */
+static __init void xen_init_setup_upcall_vector(void)
+{
+	if (!xen_have_vector_callback)
+		return;
+
+	if ((cpuid_eax(xen_cpuid_base() + 4) & XEN_HVM_CPUID_UPCALL_VECTOR) &&
+	    !xen_set_upcall_vector(0))
+		xen_percpu_upcall = true;
+	else if (xen_feature(XENFEAT_hvm_callback_vector))
+		xen_setup_callback_vector();
+	else
+		xen_have_vector_callback = false;
+}
+
+int xen_set_upcall_vector(unsigned int cpu)
+{
+	int rc;
+	xen_hvm_evtchn_upcall_vector_t op = {
+		.vector = HYPERVISOR_CALLBACK_VECTOR,
+		.vcpu = per_cpu(xen_vcpu_id, cpu),
+	};
+
+	rc = HYPERVISOR_hvm_op(HVMOP_set_evtchn_upcall_vector, &op);
+	if (rc)
+		return rc;
+
+	/* Trick toolstack to think we are enlightened. */
+	if (!cpu)
+		rc = xen_set_callback_via(1);
+
+	return rc;
+}
+
 static __init void xen_alloc_callback_vector(void)
 {
 	if (!xen_have_vector_callback)
@@ -2210,8 +2249,11 @@  static __init void xen_alloc_callback_vector(void)
 }
 #else
 void xen_setup_callback_vector(void) {}
+static inline void xen_init_setup_upcall_vector(void) {}
+int xen_set_upcall_vector(unsigned int cpu) {}
 static inline void xen_alloc_callback_vector(void) {}
-#endif
+#endif /* CONFIG_XEN_PVHVM */
+#endif /* CONFIG_X86 */
 
 bool xen_fifo_events = true;
 module_param_named(fifo_events, xen_fifo_events, bool, 0);
@@ -2271,10 +2313,9 @@  void __init xen_init_IRQ(void)
 		if (xen_initial_domain())
 			pci_xen_initial_domain();
 	}
-	if (xen_feature(XENFEAT_hvm_callback_vector)) {
-		xen_setup_callback_vector();
-		xen_alloc_callback_vector();
-	}
+	xen_init_setup_upcall_vector();
+	xen_alloc_callback_vector();
+
 
 	if (xen_hvm_domain()) {
 		native_init_IRQ();
diff --git a/include/xen/hvm.h b/include/xen/hvm.h
index b7fd7fc9ad41..8da7a6747058 100644
--- a/include/xen/hvm.h
+++ b/include/xen/hvm.h
@@ -60,4 +60,6 @@  static inline int hvm_get_parameter(int idx, uint64_t *value)
 
 void xen_setup_callback_vector(void);
 
+int xen_set_upcall_vector(unsigned int cpu);
+
 #endif /* XEN_HVM_H__ */
diff --git a/include/xen/interface/hvm/hvm_op.h b/include/xen/interface/hvm/hvm_op.h
index f3097e79bb03..03134bf3cec1 100644
--- a/include/xen/interface/hvm/hvm_op.h
+++ b/include/xen/interface/hvm/hvm_op.h
@@ -46,4 +46,23 @@  struct xen_hvm_get_mem_type {
 };
 DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_get_mem_type);
 
+#if defined(__i386__) || defined(__x86_64__)
+
+/*
+ * HVMOP_set_evtchn_upcall_vector: Set a <vector> that should be used for event
+ *                                 channel upcalls on the specified <vcpu>. If set,
+ *                                 this vector will be used in preference to the
+ *                                 domain global callback via (see
+ *                                 HVM_PARAM_CALLBACK_IRQ).
+ */
+#define HVMOP_set_evtchn_upcall_vector 23
+struct xen_hvm_evtchn_upcall_vector {
+    uint32_t vcpu;
+    uint8_t vector;
+};
+typedef struct xen_hvm_evtchn_upcall_vector xen_hvm_evtchn_upcall_vector_t;
+DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_evtchn_upcall_vector_t);
+
+#endif /* defined(__i386__) || defined(__x86_64__) */
+
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */