[v4,18/21] KVM: ARM64: Add PMU overflow interrupt routing

Message ID	565DB021.3020901@huawei.com (mailing list archive)
State	New, archived
Headers
Return-Path: <kvm-owner@kernel.org>
Message-ID: <565DB021.3020901@huawei.com>
Date: Tue, 1 Dec 2015 22:35:13 +0800
From: Shannon Zhao <zhaoshenglong@huawei.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.4.0
MIME-Version: 1.0
To: Marc Zyngier <marc.zyngier@arm.com>
CC: <kvmarm@lists.cs.columbia.edu>, <linux-arm-kernel@lists.infradead.org>, <kvm@vger.kernel.org>, <christoffer.dall@linaro.org>, <will.deacon@arm.com>, <alex.bennee@linaro.org>, <wei@redhat.com>, <cov@codeaurora.org>, <shannon.zhao@linaro.org>, <peter.huangpeng@huawei.com>
Subject: Re: [PATCH v4 18/21] KVM: ARM64: Add PMU overflow interrupt routing
References: <1446186123-11548-1-git-send-email-zhaoshenglong@huawei.com> <1446186123-11548-19-git-send-email-zhaoshenglong@huawei.com> <20151130182258.684c9df6@arm.com>
In-Reply-To: <20151130182258.684c9df6@arm.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: kvm-owner@vger.kernel.org
Precedence: bulk

Shannon Zhao Dec. 1, 2015, 2:35 p.m. UTC

On 2015/12/1 2:22, Marc Zyngier wrote:
> On Fri, 30 Oct 2015 14:22:00 +0800
> Shannon Zhao <zhaoshenglong@huawei.com> wrote:
> 
>> From: Shannon Zhao <shannon.zhao@linaro.org>
>>
>> When calling perf_event_create_kernel_counter to create perf_event,
>> assign an overflow handler. Then when perf event overflows, set
>> irq_pending and call kvm_vcpu_kick() to sync the interrupt.
>>
>> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
>> ---
>>  arch/arm/kvm/arm.c    |  4 +++
>>  include/kvm/arm_pmu.h |  4 +++
>>  virt/kvm/arm/pmu.c    | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  3 files changed, 83 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 78b2869..9c0fec4 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -28,6 +28,7 @@
>>  #include <linux/sched.h>
>>  #include <linux/kvm.h>
>>  #include <trace/events/kvm.h>
>> +#include <kvm/arm_pmu.h>
>>  
>>  #define CREATE_TRACE_POINTS
>>  #include "trace.h"
>> @@ -551,6 +552,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  
>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>>  			local_irq_enable();
>> +			kvm_pmu_sync_hwstate(vcpu);
> 
> This is very weird. Are you only injecting interrupts when a signal is
> pending? I don't understand how this works...
> 
>>  			kvm_vgic_sync_hwstate(vcpu);
>>  			preempt_enable();
>>  			kvm_timer_sync_hwstate(vcpu);
>> @@ -598,6 +600,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  		kvm_guest_exit();
>>  		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>  
>> +		kvm_pmu_post_sync_hwstate(vcpu);
>> +
>>  		kvm_vgic_sync_hwstate(vcpu);
>>  
>>  		preempt_enable();
>> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
>> index acd025a..5e7f943 100644
>> --- a/include/kvm/arm_pmu.h
>> +++ b/include/kvm/arm_pmu.h
>> @@ -39,6 +39,8 @@ struct kvm_pmu {
>>  };
>>  
>>  #ifdef CONFIG_KVM_ARM_PMU
>> +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
>> +void kvm_pmu_post_sync_hwstate(struct kvm_vcpu *vcpu);
> 
> Please follow the current terminology: _flush_ on VM entry, _sync_ on
> VM exit.
> 

Hi Marc,

Is the patch below the right way to do this?

Marc Zyngier Dec. 1, 2015, 2:50 p.m. UTC | #1

On 01/12/15 14:35, Shannon Zhao wrote:
> Hi Marc,
> 
> Is below patch the right way for this?
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 78b2869..84008d1 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -28,6 +28,7 @@
>  #include <linux/sched.h>
>  #include <linux/kvm.h>
>  #include <trace/events/kvm.h>
> +#include <kvm/arm_pmu.h>
> 
>  #define CREATE_TRACE_POINTS
>  #include "trace.h"
> @@ -531,6 +532,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
> struct kvm_run *run)
>                  */
>                 kvm_timer_flush_hwstate(vcpu);
> 
> +               kvm_pmu_flush_hwstate(vcpu);
> +
>                 /*
>                  * Preparing the interrupts to be injected also
>                  * involves poking the GIC, which must be done in a
> @@ -554,6 +557,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
> struct kvm_run *run)
>                         kvm_vgic_sync_hwstate(vcpu);
>                         preempt_enable();
>                         kvm_timer_sync_hwstate(vcpu);
> +                       kvm_pmu_sync_hwstate(vcpu);
>                         continue;
>                 }
> 
> @@ -604,6 +608,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
> struct kvm_run *run)
> 
>                 kvm_timer_sync_hwstate(vcpu);
> 
> +               kvm_pmu_sync_hwstate(vcpu);
> +
>                 ret = handle_exit(vcpu, run, ret);
>         }

Yeah, that's more like it!
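
The flush-on-entry / sync-on-exit ordering can be pictured with a small user-space sketch. This is only an illustration: struct toy_vcpu and the toy_* functions are hypothetical stand-ins, not KVM code, and the trace string exists purely to record the call order.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a vcpu; it only records the order of calls. */
struct toy_vcpu {
	char trace[32];
};

static void toy_trace(struct toy_vcpu *v, const char *step)
{
	/* Append one step name, leaving room for the terminator. */
	strncat(v->trace, step, sizeof(v->trace) - strlen(v->trace) - 1);
}

/* "flush": push software state towards the (virtual) hardware, before entry. */
static void toy_flush(struct toy_vcpu *v) { toy_trace(v, "F"); }

/* Stand-in for actually running the guest. */
static void toy_enter_guest(struct toy_vcpu *v) { toy_trace(v, "G"); }

/* "sync": pull state back from the (virtual) hardware, after exit. */
static void toy_sync(struct toy_vcpu *v) { toy_trace(v, "S"); }

/* One iteration of a run loop: flush, run the guest, sync. */
static void toy_run_once(struct toy_vcpu *v)
{
	toy_flush(v);
	toy_enter_guest(v);
	toy_sync(v);
}
```

Calling toy_run_once() twice on a zero-initialised toy_vcpu leaves the trace reading flush, guest, sync, flush, guest, sync, which is the naming convention requested in the review.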

> 
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index 47bbd43..edfe4e5 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -41,6 +41,8 @@ struct kvm_pmu {
>  };
> 
>  #ifdef CONFIG_KVM_ARM_PMU
> +void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu);
> +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
>  unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32
> select_idx);
>  void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val);
>  void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val, bool
> all_enable);
> @@ -51,6 +53,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu
> *vcpu, u32 data,
>                                     u32 select_idx);
>  void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val);
>  #else
> +void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) {}
> +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) {}
>  unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32
> select_idx)
>  {
>         return 0;
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 15cac45..9aad2f7 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -21,6 +21,7 @@
>  #include <linux/perf_event.h>
>  #include <asm/kvm_emulate.h>
>  #include <kvm/arm_pmu.h>
> +#include <kvm/arm_vgic.h>
> 
>  /**
>   * kvm_pmu_get_counter_value - get PMU counter value
> @@ -79,6 +80,78 @@ static void kvm_pmu_stop_counter(struct kvm_pmc *pmc)
>  }
> 
>  /**
> + * kvm_pmu_flush_hwstate - flush pmu state to cpu
> + * @vcpu: The vcpu pointer
> + *
> + * Inject virtual PMU IRQ if IRQ is pending for this cpu.
> + */
> +void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu)
> +{
> +       struct kvm_pmu *pmu = &vcpu->arch.pmu;
> +       u32 overflow;
> +
> +       if (!vcpu_mode_is_32bit(vcpu))
> +               overflow = vcpu_sys_reg(vcpu, PMOVSSET_EL0);
> +       else
> +               overflow = vcpu_cp15(vcpu, c9_PMOVSSET);
> +
> +       if ((pmu->irq_pending || overflow != 0) && (pmu->irq_num != -1))
> +               kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> pmu->irq_num, 1);
> +
> +       pmu->irq_pending = false;

Now, we get to the critical point. Why do you need to keep this shadow
state for the interrupt?

The way I see it, you should set the line high when the overflow has
been registered, and set it low when the overflow condition has been
cleared by the guest. And nothing else.

> +}
> +
> +/**
> + * kvm_pmu_sync_hwstate - sync pmu state for cpu
> + * @vcpu: The vcpu pointer
> + *
> + * Inject virtual PMU IRQ if IRQ is pending for this cpu when back from
> guest.
> + */
> +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
> +{
> +       struct kvm_pmu *pmu = &vcpu->arch.pmu;
> +
> +       if (pmu->irq_pending && (pmu->irq_num != -1))
> +               kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> pmu->irq_num, 1);
> +
> +       pmu->irq_pending = false;
> +}
> 

Why do you have to do it twice??

Thanks,

	M.

Shannon Zhao Dec. 1, 2015, 3:13 p.m. UTC | #2

On 2015/12/1 22:50, Marc Zyngier wrote:
> Now, we get to the critical point. Why do you need to keep this shadow
> state for the interrupt?
>
The reason is that when the guest clears the overflow register, it traps
to KVM, which calls kvm_pmu_sync_hwstate() as you see above. At this moment,
the overflow register still shows an overflow (that is, some bit is still 1).
So we need a flag to mark that we have already injected this interrupt.
And if a new overflow happens while the guest is handling the first one,
pmu->irq_pending will be set to true by kvm_pmu_perf_overflow(), and we
then need to inject this new interrupt, right?


Marc Zyngier Dec. 1, 2015, 3:41 p.m. UTC | #3

On 01/12/15 15:13, Shannon Zhao wrote:
> The reason is that when the guest clears the overflow register, it traps
> to KVM, which calls kvm_pmu_sync_hwstate() as you see above. At this moment,
> the overflow register still shows an overflow (that is, some bit is still 1).
> So we need a flag to mark that we have already injected this interrupt.
> And if a new overflow happens while the guest is handling the first one,
> pmu->irq_pending will be set to true by kvm_pmu_perf_overflow(), and we
> then need to inject this new interrupt, right?

I don't think so. This is a level interrupt, so the level should stay
high as long as the guest hasn't cleared all possible sources for that
interrupt.

For your example, the guest writes to PMOVSCLR to clear the overflow
caused by a given counter. If the status is now 0, the interrupt line
drops. If the status is still non zero, the line stays high. And I
believe that writing a 1 to PMOVSSET would actually trigger an
interrupt, or keep the line high if it is already high.

In essence, do not try to maintain side state. I've been bitten.
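
As a rough illustration of the level-triggered behaviour described above, here is a user-space sketch in which the line level is derived directly from the overflow status, with no extra flag. The struct toy_pmu type and the toy_* helpers are hypothetical names made up for this example, and the model deliberately ignores the interrupt-enable mask that real hardware also applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the guest-visible overflow status register. */
struct toy_pmu {
	uint32_t ovs;	/* one bit per counter, 1 = overflowed */
};

/* Write to the "set" view (PMOVSSET-like): bits written as 1 are set. */
static void toy_write_ovsset(struct toy_pmu *pmu, uint32_t val)
{
	pmu->ovs |= val;
}

/* Write to the "clear" view (PMOVSCLR-like): bits written as 1 are cleared. */
static void toy_write_ovsclr(struct toy_pmu *pmu, uint32_t val)
{
	pmu->ovs &= ~val;
}

/* Level-triggered line: high while any overflow bit remains set. */
static bool toy_irq_level(const struct toy_pmu *pmu)
{
	return pmu->ovs != 0;
}
```

With two counters overflowed, clearing only one of them leaves toy_irq_level() returning true; the line drops only once the last bit is cleared.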

Thanks,

	M.

Shannon Zhao Dec. 1, 2015, 4:26 p.m. UTC | #4

On 2015/12/1 23:41, Marc Zyngier wrote:
>> The reason is that when the guest clears the overflow register, it traps
>> to KVM, which calls kvm_pmu_sync_hwstate() as you see above. At this moment,
>> the overflow register still shows an overflow (that is, some bit is still 1).
>> So we need a flag to mark that we have already injected this interrupt.
>> And if a new overflow happens while the guest is handling the first one,
>> pmu->irq_pending will be set to true by kvm_pmu_perf_overflow(), and we
>> then need to inject this new interrupt, right?
> I don't think so. This is a level interrupt, so the level should stay
> high as long as the guest hasn't cleared all possible sources for that
> interrupt.
>
> For your example, the guest writes to PMOVSCLR to clear the overflow
> caused by a given counter. If the status is now 0, the interrupt line
> drops. If the status is still non zero, the line stays high. And I
> believe that writing a 1 to PMOVSSET would actually trigger an
> interrupt, or keep it high if it has already high.
>
Right, writing 1 to PMOVSSET will trigger an interrupt.

> In essence, do not try to maintain side state. I've been bitten.

So on VM entry, we check whether PMOVSSET is zero. If it is not, we call
kvm_vgic_inject_irq to set the level high; if it is, we set the level low.
On VM exit, it seems there is nothing to do.

Marc Zyngier Dec. 1, 2015, 4:57 p.m. UTC | #5

On 01/12/15 16:26, Shannon Zhao wrote:
> 
> 
> On 2015/12/1 23:41, Marc Zyngier wrote:
>>> The reason is that when the guest clears the overflow register, it traps
>>> to KVM, which calls kvm_pmu_sync_hwstate() as you see above. At this moment,
>>> the overflow register still shows an overflow (that is, some bit is still 1).
>>> So we need a flag to mark that we have already injected this interrupt.
>>> And if a new overflow happens while the guest is handling the first one,
>>> pmu->irq_pending will be set to true by kvm_pmu_perf_overflow(), and we
>>> then need to inject this new interrupt, right?
>> I don't think so. This is a level interrupt, so the level should stay
>> high as long as the guest hasn't cleared all possible sources for that
>> interrupt.
>>
>> For your example, the guest writes to PMOVSCLR to clear the overflow
>> caused by a given counter. If the status is now 0, the interrupt line
>> drops. If the status is still non zero, the line stays high. And I
>> believe that writing a 1 to PMOVSSET would actually trigger an
>> interrupt, or keep it high if it has already high.
>>
> Right, writing 1 to PMOVSSET will trigger an interrupt.
> 
>> In essence, do not try to maintain side state. I've been bitten.
> 
> So on VM entry, it check if PMOVSSET is zero. If not, call 
> kvm_vgic_inject_irq to set the level high. If so, set the level low.
> On VM exit, it seems there is nothing to do.

It is even simpler than that:

- When you get an overflow, you inject an interrupt with the level set to 1.
- When the overflow register gets cleared, you inject the same interrupt
with the level set to 0.

I don't think you need to do anything else, and the world switch should
be left untouched.
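
A minimal sketch of these two rules, assuming a mock interrupt controller in place of the vgic: struct toy_gic, struct toy_pmu and the toy_* functions are hypothetical, and toy_inject_irq() merely stands in for kvm_vgic_inject_irq() by recording the requested level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock interrupt controller: remembers the last level and counts calls. */
struct toy_gic {
	bool line;
	int calls;
};

/* Hypothetical PMU state: only the overflow status, no shadow flag. */
struct toy_pmu {
	uint32_t ovs;
	struct toy_gic *gic;
};

/* Stand-in for kvm_vgic_inject_irq(); records the requested level. */
static void toy_inject_irq(struct toy_gic *gic, bool level)
{
	gic->line = level;
	gic->calls++;
}

/* Rule 1: counter idx (0..31) overflowed, so inject with level 1. */
static void toy_overflow(struct toy_pmu *pmu, unsigned int idx)
{
	pmu->ovs |= UINT32_C(1) << idx;
	toy_inject_irq(pmu->gic, true);
}

/* Rule 2: guest clears bits; inject level 0 once the register is empty. */
static void toy_guest_clear(struct toy_pmu *pmu, uint32_t mask)
{
	pmu->ovs &= ~mask;
	if (pmu->ovs == 0)
		toy_inject_irq(pmu->gic, false);
}
```

Nothing in this sketch runs on guest entry or exit; the level changes only at the two events themselves.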

Thanks,

	M.

Shannon Zhao Dec. 2, 2015, 2:40 a.m. UTC | #6

On 2015/12/2 0:57, Marc Zyngier wrote:
> On 01/12/15 16:26, Shannon Zhao wrote:
>>
>>
>> On 2015/12/1 23:41, Marc Zyngier wrote:
>>>> The reason is that when the guest clears the overflow register, it traps
>>>> to KVM, which calls kvm_pmu_sync_hwstate() as you see above. At this moment,
>>>> the overflow register still shows an overflow (that is, some bit is still 1).
>>>> So we need a flag to mark that we have already injected this interrupt.
>>>> And if a new overflow happens while the guest is handling the first one,
>>>> pmu->irq_pending will be set to true by kvm_pmu_perf_overflow(), and we
>>>> then need to inject this new interrupt, right?
>>> I don't think so. This is a level interrupt, so the level should stay
>>> high as long as the guest hasn't cleared all possible sources for that
>>> interrupt.
>>>
>>> For your example, the guest writes to PMOVSCLR to clear the overflow
>>> caused by a given counter. If the status is now 0, the interrupt line
>>> drops. If the status is still non zero, the line stays high. And I
>>> believe that writing a 1 to PMOVSSET would actually trigger an
>>> interrupt, or keep it high if it has already high.
>>>
>> Right, writing 1 to PMOVSSET will trigger an interrupt.
>>
>>> In essence, do not try to maintain side state. I've been bitten.
>>
>> So on VM entry, it check if PMOVSSET is zero. If not, call 
>> kvm_vgic_inject_irq to set the level high. If so, set the level low.
>> On VM exit, it seems there is nothing to do.
> 
> It is even simpler than that:
> 
> - When you get an overflow, you inject an interrupt with the level set to 1.
> - When the overflow register gets cleared, you inject the same interrupt
> with the level set to 0.
> 
> I don't think you need to do anything else, and the world switch should
> be left untouched.
> 

On 2015/7/17 23:28, Christoffer Dall wrote:
>> +		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
>> +					    pmu->irq_num, 1);
> what context is this overflow handler function?  kvm_vgic_inject_irq
> grabs a mutex, so it can sleep...
>
> from a quick glance at the perf core code, it looks like this is in
> interrupt context, so that call to kvm_vgic_inject_irq looks bad.
>

But as Christoffer said before, it's not good to call
kvm_vgic_inject_irq directly in interrupt context. So if we just kick
the vcpu here and call kvm_vgic_inject_irq on VM entry, is this fine?
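
To make the question concrete, here is a user-space sketch of that split, using C11 atomics. All names (struct toy_vcpu, toy_overflow_handler, toy_flush, toy_inject_irq) are hypothetical; kick_requested only stands in for kvm_vcpu_kick(), and toy_inject_irq() for kvm_vgic_inject_irq().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_vcpu {
	atomic_uint_fast32_t ovs;	/* overflow status, written from "IRQ" context */
	atomic_bool kick_requested;	/* stand-in for kvm_vcpu_kick() */
	bool line;			/* level last handed to the mock controller */
};

/* In the kernel this path may sleep, so it must not run in IRQ context. */
static void toy_inject_irq(struct toy_vcpu *v, bool level)
{
	v->line = level;
}

/* Overflow handler (interrupt context): only record the bit and kick. */
static void toy_overflow_handler(struct toy_vcpu *v, unsigned int idx)
{
	atomic_fetch_or(&v->ovs, (uint_fast32_t)1 << idx);
	atomic_store(&v->kick_requested, true);
}

/* VM entry (process context): safe place to call the injection function. */
static void toy_flush(struct toy_vcpu *v)
{
	atomic_store(&v->kick_requested, false);
	toy_inject_irq(v, atomic_load(&v->ovs) != 0);
}
```

After toy_overflow_handler() the line is still low and only the kick is pending; the level goes high on the next toy_flush().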

Thanks,

Marc Zyngier Dec. 2, 2015, 8:45 a.m. UTC | #7

On 02/12/15 02:40, Shannon Zhao wrote:
> 
> 
> On 2015/12/2 0:57, Marc Zyngier wrote:
>> On 01/12/15 16:26, Shannon Zhao wrote:
>>>
>>>
>>> On 2015/12/1 23:41, Marc Zyngier wrote:
>>>>> The reason is that when the guest clears the overflow register, it traps
>>>>> to KVM, which calls kvm_pmu_sync_hwstate() as you see above. At this moment,
>>>>> the overflow register still shows an overflow (that is, some bit is still 1).
>>>>> So we need a flag to mark that we have already injected this interrupt.
>>>>> And if a new overflow happens while the guest is handling the first one,
>>>>> pmu->irq_pending will be set to true by kvm_pmu_perf_overflow(), and we
>>>>> then need to inject this new interrupt, right?
>>>> I don't think so. This is a level interrupt, so the level should stay
>>>> high as long as the guest hasn't cleared all possible sources for that
>>>> interrupt.
>>>>
>>>> For your example, the guest writes to PMOVSCLR to clear the overflow
>>>> caused by a given counter. If the status is now 0, the interrupt line
>>>> drops. If the status is still non zero, the line stays high. And I
>>>> believe that writing a 1 to PMOVSSET would actually trigger an
>>>> interrupt, or keep it high if it has already high.
>>>>
>>> Right, writing 1 to PMOVSSET will trigger an interrupt.
>>>
>>>> In essence, do not try to maintain side state. I've been bitten.
>>>
>>> So on VM entry, it check if PMOVSSET is zero. If not, call 
>>> kvm_vgic_inject_irq to set the level high. If so, set the level low.
>>> On VM exit, it seems there is nothing to do.
>>
>> It is even simpler than that:
>>
>> - When you get an overflow, you inject an interrupt with the level set to 1.
>> - When the overflow register gets cleared, you inject the same interrupt
>> with the level set to 0.
>>
>> I don't think you need to do anything else, and the world switch should
>> be left untouched.
>>
> 
> On 2015/7/17 23:28, Christoffer Dall wrote:>> > +		
> kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
>>>> +					    pmu->irq_num, 1);
>> what context is this overflow handler function?  kvm_vgic_inject_irq
>> grabs a mutex, so it can sleep...
>>
>> from a quick glance at the perf core code, it looks like this is in
>> interrupt context, so that call to kvm_vgic_inject_irq looks bad.
>>
> 
> But as Christoffer said before, it's not good to call
> kvm_vgic_inject_irq directly in interrupt context. So if we just kick
> the vcpu here and call kvm_vgic_inject_irq on VM entry, is this fine?

Possibly. I'm slightly worried that inject_irq itself is going to kick
the vcpu again for no good reason. I guess we'll find out (and maybe
we'll add a kvm_vgic_inject_irq_no_kick_please() helper...).

Thanks,

	M.

Shannon Zhao Dec. 2, 2015, 9:49 a.m. UTC | #8

On 2015/12/2 16:45, Marc Zyngier wrote:
> On 02/12/15 02:40, Shannon Zhao wrote:
>> > 
>> > 
>> > On 2015/12/2 0:57, Marc Zyngier wrote:
>>> >> On 01/12/15 16:26, Shannon Zhao wrote:
>>>> >>>
>>>> >>>
>>>> >>> On 2015/12/1 23:41, Marc Zyngier wrote:
>>>>>> >>>>> The reason is that when guest clear the overflow register, it will trap
>>>>>>> >>>>>> to kvm and call kvm_pmu_sync_hwstate() as you see above. At this moment,
>>>>>>> >>>>>> the overflow register is still overflowed(that is some bit is still 1).
>>>>>>> >>>>>> So We need to use some flag to mark we already inject this interrupt.
>>>>>>> >>>>>> And if during guest handling the overflow, there is a new overflow
>>>>>>> >>>>>> happening, the pmu->irq_pending will be set ture by
>>>>>>> >>>>>> kvm_pmu_perf_overflow(), then it needs to inject this new interrupt, right?
>>>>> >>>> I don't think so. This is a level interrupt, so the level should stay
>>>>> >>>> high as long as the guest hasn't cleared all possible sources for that
>>>>> >>>> interrupt.
>>>>> >>>>
>>>>> >>>> For your example, the guest writes to PMOVSCLR to clear the overflow
>>>>> >>>> caused by a given counter. If the status is now 0, the interrupt line
>>>>> >>>> drops. If the status is still non zero, the line stays high. And I
>>>>> >>>> believe that writing a 1 to PMOVSSET would actually trigger an
>>>>> >>>> interrupt, or keep it high if it has already high.
>>>>> >>>>
>>>> >>> Right, writing 1 to PMOVSSET will trigger an interrupt.
>>>> >>>
>>>>> >>>> In essence, do not try to maintain side state. I've been bitten.
>>>> >>>
>>>> >>> So on VM entry, it check if PMOVSSET is zero. If not, call 
>>>> >>> kvm_vgic_inject_irq to set the level high. If so, set the level low.
>>>> >>> On VM exit, it seems there is nothing to do.
>>> >>
>>> >> It is even simpler than that:
>>> >>
>>> >> - When you get an overflow, you inject an interrupt with the level set to 1.
>>> >> - When the overflow register gets cleared, you inject the same interrupt
>>> >> with the level set to 0.
>>> >>
>>> >> I don't think you need to do anything else, and the world switch should
>>> >> be left untouched.
>>> >>
>> > 
>> > On 2015/7/17 23:28, Christoffer Dall wrote:>> > +		
>> > kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
>>>>> >>>> +					    pmu->irq_num, 1);
>>> >> what context is this overflow handler function?  kvm_vgic_inject_irq
>>> >> grabs a mutex, so it can sleep...
>>> >>
>>> >> from a quick glance at the perf core code, it looks like this is in
>>> >> interrupt context, so that call to kvm_vgic_inject_irq looks bad.
>>> >>
>> > 
>> > But as Christoffer said before, it's not good to call
>> > kvm_vgic_inject_irq directly in interrupt context. So if we just kick
>> > the vcpu here and call kvm_vgic_inject_irq on VM entry, is this fine?
> Possibly. I'm slightly worried that inject_irq itself is going to kick
> the vcpu again for no good reason. 
Yes, this will introduce an extra kick. What's the impact of kicking a
kicked vcpu?

> I guess we'll find out (and maybe
> we'll add a kvm_vgic_inject_irq_no_kick_please() helper...).
And add a "bool kick" parameter to vgic_update_irq_pending?

Marc Zyngier Dec. 2, 2015, 10:22 a.m. UTC | #9

On 02/12/15 09:49, Shannon Zhao wrote:
> 
> 
> On 2015/12/2 16:45, Marc Zyngier wrote:
>> On 02/12/15 02:40, Shannon Zhao wrote:
>>>>
>>>>
>>>> On 2015/12/2 0:57, Marc Zyngier wrote:
>>>>>> On 01/12/15 16:26, Shannon Zhao wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2015/12/1 23:41, Marc Zyngier wrote:
>>>>>>>>>>>> The reason is that when guest clear the overflow register, it will trap
>>>>>>>>>>>>>> to kvm and call kvm_pmu_sync_hwstate() as you see above. At this moment,
>>>>>>>>>>>>>> the overflow register is still overflowed(that is some bit is still 1).
>>>>>>>>>>>>>> So We need to use some flag to mark we already inject this interrupt.
>>>>>>>>>>>>>> And if during guest handling the overflow, there is a new overflow
>>>>>>>>>>>>>> happening, the pmu->irq_pending will be set ture by
>>>>>>>>>>>>>> kvm_pmu_perf_overflow(), then it needs to inject this new interrupt, right?
>>>>>>>>>> I don't think so. This is a level interrupt, so the level should stay
>>>>>>>>>> high as long as the guest hasn't cleared all possible sources for that
>>>>>>>>>> interrupt.
>>>>>>>>>>
>>>>>>>>>> For your example, the guest writes to PMOVSCLR to clear the overflow
>>>>>>>>>> caused by a given counter. If the status is now 0, the interrupt line
>>>>>>>>>> drops. If the status is still non zero, the line stays high. And I
>>>>>>>>>> believe that writing a 1 to PMOVSSET would actually trigger an
>>>>>>>>>> interrupt, or keep it high if it has already high.
>>>>>>>>>>
>>>>>>>> Right, writing 1 to PMOVSSET will trigger an interrupt.
>>>>>>>>
>>>>>>>>>> In essence, do not try to maintain side state. I've been bitten.
>>>>>>>>
>>>>>>>> So on VM entry, it check if PMOVSSET is zero. If not, call 
>>>>>>>> kvm_vgic_inject_irq to set the level high. If so, set the level low.
>>>>>>>> On VM exit, it seems there is nothing to do.
>>>>>>
>>>>>> It is even simpler than that:
>>>>>>
>>>>>> - When you get an overflow, you inject an interrupt with the level set to 1.
>>>>>> - When the overflow register gets cleared, you inject the same interrupt
>>>>>> with the level set to 0.
>>>>>>
>>>>>> I don't think you need to do anything else, and the world switch should
>>>>>> be left untouched.
>>>>>>
>>>>
>>>> On 2015/7/17 23:28, Christoffer Dall wrote:>> > +		
>>>> kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
>>>>>>>>>> +					    pmu->irq_num, 1);
>>>>>> what context is this overflow handler function?  kvm_vgic_inject_irq
>>>>>> grabs a mutex, so it can sleep...
>>>>>>
>>>>>> from a quick glance at the perf core code, it looks like this is in
>>>>>> interrupt context, so that call to kvm_vgic_inject_irq looks bad.
>>>>>>
>>>>
>>>> But as Christoffer said before, it's not good to call
>>>> kvm_vgic_inject_irq directly in interrupt context. So if we just kick
>>>> the vcpu here and call kvm_vgic_inject_irq on VM entry, is this fine?
>> Possibly. I'm slightly worried that inject_irq itself is going to kick
>> the vcpu again for no good reason. 
> Yes, this will introduce a extra kick. What's the impact of kicking a
> kicked vcpu?

As long as you only kick yourself, it shouldn't be much (trying to
decipher vcpu_kick).

>> I guess we'll find out (and maybe
>> we'll add a kvm_vgic_inject_irq_no_kick_please() helper...).
> And add a parameter "bool kick" for vgic_update_irq_pending ?

Given that we're completely rewriting the thing, I'd rather not add more
hacks to it if we can avoid it.

Give it a go, and we'll find out!

Thanks,

	M.

Christoffer Dall Dec. 2, 2015, 4:27 p.m. UTC | #10

On Wed, Dec 02, 2015 at 10:22:04AM +0000, Marc Zyngier wrote:
> On 02/12/15 09:49, Shannon Zhao wrote:
> > 
> > 
> > On 2015/12/2 16:45, Marc Zyngier wrote:
> >> On 02/12/15 02:40, Shannon Zhao wrote:
> >>>>
> >>>>
> >>>> On 2015/12/2 0:57, Marc Zyngier wrote:
> >>>>>> On 01/12/15 16:26, Shannon Zhao wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2015/12/1 23:41, Marc Zyngier wrote:
> >>>>>>>>>>>> The reason is that when guest clear the overflow register, it will trap
> >>>>>>>>>>>>>> to kvm and call kvm_pmu_sync_hwstate() as you see above. At this moment,
> >>>>>>>>>>>>>> the overflow register is still overflowed(that is some bit is still 1).
> >>>>>>>>>>>>>> So We need to use some flag to mark we already inject this interrupt.
> >>>>>>>>>>>>>> And if during guest handling the overflow, there is a new overflow
> >>>>>>>>>>>>>> happening, the pmu->irq_pending will be set ture by
> >>>>>>>>>>>>>> kvm_pmu_perf_overflow(), then it needs to inject this new interrupt, right?
> >>>>>>>>>> I don't think so. This is a level interrupt, so the level should stay
> >>>>>>>>>> high as long as the guest hasn't cleared all possible sources for that
> >>>>>>>>>> interrupt.
> >>>>>>>>>>
> >>>>>>>>>> For your example, the guest writes to PMOVSCLR to clear the overflow
> >>>>>>>>>> caused by a given counter. If the status is now 0, the interrupt line
> >>>>>>>>>> drops. If the status is still non zero, the line stays high. And I
> >>>>>>>>>> believe that writing a 1 to PMOVSSET would actually trigger an
> >>>>>>>>>> interrupt, or keep it high if it has already high.
> >>>>>>>>>>
> >>>>>>>> Right, writing 1 to PMOVSSET will trigger an interrupt.
> >>>>>>>>
> >>>>>>>>>> In essence, do not try to maintain side state. I've been bitten.
> >>>>>>>>
> >>>>>>>> So on VM entry, it check if PMOVSSET is zero. If not, call 
> >>>>>>>> kvm_vgic_inject_irq to set the level high. If so, set the level low.
> >>>>>>>> On VM exit, it seems there is nothing to do.
> >>>>>>
> >>>>>> It is even simpler than that:
> >>>>>>
> >>>>>> - When you get an overflow, you inject an interrupt with the level set to 1.
> >>>>>> - When the overflow register gets cleared, you inject the same interrupt
> >>>>>> with the level set to 0.
> >>>>>>
> >>>>>> I don't think you need to do anything else, and the world switch should
> >>>>>> be left untouched.
> >>>>>>
> >>>>
> >>>> On 2015/7/17 23:28, Christoffer Dall wrote:>> > +		
> >>>> kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> >>>>>>>>>> +					    pmu->irq_num, 1);
> >>>>>> what context is this overflow handler function?  kvm_vgic_inject_irq
> >>>>>> grabs a mutex, so it can sleep...
> >>>>>>
> >>>>>> from a quick glance at the perf core code, it looks like this is in
> >>>>>> interrupt context, so that call to kvm_vgic_inject_irq looks bad.
> >>>>>>
> >>>>
> >>>> But as Christoffer said before, it's not good to call
> >>>> kvm_vgic_inject_irq directly in interrupt context. So if we just kick
> >>>> the vcpu here and call kvm_vgic_inject_irq on VM entry, is this fine?
> >> Possibly. I'm slightly worried that inject_irq itself is going to kick
> >> the vcpu again for no good reason. 
> > Yes, this will introduce a extra kick. What's the impact of kicking a
> > kicked vcpu?
> 
> As long as you only kick yourself, it shouldn't be much (trying to
> decipher vcpu_kick).
> 

The behavior of vcpu_kick really depends on a number of things:

 - If you're kicking yourself, nothing happens.
 - If you're kicking a sleeping vcpu, wake it up.
 - If you're kicking a running vcpu, send it a physical IPI.
 - If the vcpu is not running and not sleeping (so still runnable),
   don't do anything; just wait until it gets scheduled.

-Christoffer
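
The four cases listed above can be written down as a small decision
function. This is a sketch for illustration only, not the kernel's
kvm_vcpu_kick(): the enums, struct model_vcpu and model_kick() are
hypothetical names, and "self" is assumed to be the vcpu on whose
behalf the caller is running (NULL if there is none).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states and actions, named for this example only. */
enum vcpu_state  { VCPU_SLEEPING, VCPU_RUNNING, VCPU_RUNNABLE };
enum kick_action { KICK_NOTHING, KICK_WAKE_UP, KICK_SEND_IPI };

struct model_vcpu {
	enum vcpu_state state;
};

/*
 * Return what a kick of 'target' would do when issued by 'self'.
 * 'self' may be NULL when the caller is not running as any vcpu.
 */
static enum kick_action model_kick(const struct model_vcpu *self,
				   const struct model_vcpu *target)
{
	assert(target != NULL);

	/* Kicking yourself: nothing happens. */
	if (self == target)
		return KICK_NOTHING;

	switch (target->state) {
	case VCPU_SLEEPING:
		return KICK_WAKE_UP;   /* wake the sleeping vcpu */
	case VCPU_RUNNING:
		return KICK_SEND_IPI;  /* force it out of the guest */
	case VCPU_RUNNABLE:
		return KICK_NOTHING;   /* it will be scheduled anyway */
	}
	return KICK_NOTHING;
}
```

In this model, the extra kick discussed earlier in the thread is the
"kicking yourself" case, provided the injection on VM entry is done by
the vcpu that is about to run.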
