
ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.

Message ID 1376480832-18705-1-git-send-email-pranavkumar@linaro.org
State New, archived

Commit Message

Pranavkumar Sawargaonkar Aug. 14, 2013, 11:47 a.m. UTC
Systems with a large external L3-cache (a few MB) might have dirty
content belonging to guest pages sitting in that cache. To tackle this,
we need to flush such dirty content from the d-cache so that the guest
sees the correct contents of its pages when the guest MMU is disabled.

This patch fixes coherent_icache_guest_page() for hosts with an external L3-cache.

Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
---
 arch/arm64/include/asm/kvm_mmu.h |    2 ++
 1 file changed, 2 insertions(+)

Comments

Marc Zyngier Aug. 14, 2013, 12:04 p.m. UTC | #1
Hi Pranav,

On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
> Systems with large external L3-cache (few MBs), might have dirty
> content belonging to the guest page in L3-cache. To tackle this,
> we need to flush such dirty content from d-cache so that guest
> will see correct contents of guest page when guest MMU is disabled.
>
> The patch fixes coherent_icache_guest_page() for external L3-cache.
>
> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> ---
>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h
> b/arch/arm64/include/asm/kvm_mmu.h
> index efe609c..5129038 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -123,6 +123,8 @@ static inline void
> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>  	if (!icache_is_aliasing()) {		/* PIPT */
>  		unsigned long hva = gfn_to_hva(kvm, gfn);
>  		flush_icache_range(hva, hva + PAGE_SIZE);
> +		/* Flush d-cache for systems with external caches. */
> +		__flush_dcache_area((void *) hva, PAGE_SIZE);
>  	} else if (!icache_is_aivivt()) {	/* non ASID-tagged VIVT */
>  		/* any kind of VIPT cache */
>  		__flush_icache_all();

[adding Will to the discussion as we talked about this in the past]

That's definitely an issue, but I'm not sure the fix is to hit the data 
cache on each page mapping. It looks overkill.

Wouldn't it be enough to let userspace do the cache cleaning? kvmtool 
knows which bits of the guest memory have been touched, and can do a "DC 
CVAC" on this region.

The alternative is to do it in the kernel before running any vcpu - but 
that's not very nice either (we would have to clean the whole of the guest 
memory, which makes a full dcache clean more appealing).

         M.
Anup Patel Aug. 14, 2013, 2:22 p.m. UTC | #2
On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> Hi Pranav,
>
> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>> Systems with large external L3-cache (few MBs), might have dirty
>> content belonging to the guest page in L3-cache. To tackle this,
>> we need to flush such dirty content from d-cache so that guest
>> will see correct contents of guest page when guest MMU is disabled.
>>
>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>
>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>> ---
>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>> b/arch/arm64/include/asm/kvm_mmu.h
>> index efe609c..5129038 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -123,6 +123,8 @@ static inline void
>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>       if (!icache_is_aliasing()) {            /* PIPT */
>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>               flush_icache_range(hva, hva + PAGE_SIZE);
>> +             /* Flush d-cache for systems with external caches. */
>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT */
>>               /* any kind of VIPT cache */
>>               __flush_icache_all();
>
> [adding Will to the discussion as we talked about this in the past]
>
> That's definitely an issue, but I'm not sure the fix is to hit the data
> cache on each page mapping. It looks overkill.
>
> Wouldn't it be enough to let userspace do the cache cleaning? kvmtools
> knows which bits of the guest memory have been touched, and can do a "DC
> DVAC" on this region.

It seems a bit unnatural to have cache cleaning in user-space. I am sure
other architectures don't have such cache cleaning in user-space for KVM.

>
> The alternative is do it in the kernel before running any vcpu - but
> that's not very nice either (have to clean the whole of the guest
> memory, which makes a full dcache clean more appealing).

Actually, cleaning the full d-cache by set/way upon the first run of a
VCPU was our second option, but the current approach seemed very simple,
hence we went for it.

If more people vote for a full d-cache clean upon the first run of a VCPU,
then we should revise this patch.
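
For comparison, the set/way alternative mentioned above would amount to a
one-time clean before the first VCPU ever runs; a rough sketch (the
"dcache_clean" flag is hypothetical, and flush_cache_all() was the arm64
set/way clean+invalidate helper at the time):

/*
 * Rough sketch only: full d-cache clean by set/way, once per VM,
 * before the first VCPU run.
 */
static void kvm_flush_dcache_on_first_run(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm = vcpu->kvm;

	if (!kvm->arch.dcache_clean) {		/* hypothetical flag */
		flush_cache_all();		/* clean+invalidate by set/way */
		kvm->arch.dcache_clean = true;
	}
}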

>
>          M.
> --
> Fast, cheap, reliable. Pick two.

--Anup
Sudeep KarkadaNagesha Aug. 14, 2013, 2:23 p.m. UTC | #3
On 14/08/13 12:47, Pranavkumar Sawargaonkar wrote:
> Systems with large external L3-cache (few MBs), might have dirty
> content belonging to the guest page in L3-cache. To tackle this,
> we need to flush such dirty content from d-cache so that guest
> will see correct contents of guest page when guest MMU is disabled.
> 
> The patch fixes coherent_icache_guest_page() for external L3-cache.
> 

I don't understand KVM but the commit message caught my attention.

You are referring to cleaning the external/outer L3 cache, but you are
using the internal cache maintenance APIs. Shouldn't you be using the
outer_cache APIs if you really intend to do outer cache maintenance?
Is the external/outer L3 cache not unified?

Or did you mean to flush the d-cache to the point of coherency (external L3
in your case)? If so, that's not clear from the commit log.

Regards,
Sudeep

> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> ---
>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index efe609c..5129038 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -123,6 +123,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>  	if (!icache_is_aliasing()) {		/* PIPT */
>  		unsigned long hva = gfn_to_hva(kvm, gfn);
>  		flush_icache_range(hva, hva + PAGE_SIZE);
> +		/* Flush d-cache for systems with external caches. */
> +		__flush_dcache_area((void *) hva, PAGE_SIZE);
>  	} else if (!icache_is_aivivt()) {	/* non ASID-tagged VIVT */
>  		/* any kind of VIPT cache */
>  		__flush_icache_all();
>
Anup Patel Aug. 14, 2013, 2:35 p.m. UTC | #4
On Wed, Aug 14, 2013 at 7:53 PM, Sudeep KarkadaNagesha
<Sudeep.KarkadaNagesha@arm.com> wrote:
> On 14/08/13 12:47, Pranavkumar Sawargaonkar wrote:
>> Systems with large external L3-cache (few MBs), might have dirty
>> content belonging to the guest page in L3-cache. To tackle this,
>> we need to flush such dirty content from d-cache so that guest
>> will see correct contents of guest page when guest MMU is disabled.
>>
>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>
>
> I don't understand KVM but the commit message caught my attention.
>
> You are referring to clean the external/outer L3 cache but using
> internal cache maintenance APIs. You should be using outer_cache APIs if
> you really intend to do outer cache maintenance ?
> Is the external/outer L3 cache not unified ?
>
> Or did you mean to flush d-cache to the point of coherency(external L3
> in your case)? If so it's not clear from the commit log.

Yes, we mean to flush the d-cache to the point of coherency (which in our
case is the external L3-cache).

The patch description is more in terms of KVM virtualization, but we
can certainly add more details.

>
> Regards,
> Sudeep
>
>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>> ---
>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index efe609c..5129038 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -123,6 +123,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>       if (!icache_is_aliasing()) {            /* PIPT */
>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>               flush_icache_range(hva, hva + PAGE_SIZE);
>> +             /* Flush d-cache for systems with external caches. */
>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT */
>>               /* any kind of VIPT cache */
>>               __flush_icache_all();
>>
>

Regards,
Anup
Alexander Graf Aug. 14, 2013, 3:06 p.m. UTC | #5
On 14.08.2013, at 16:22, Anup Patel wrote:

> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> Hi Pranav,
>> 
>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>> Systems with large external L3-cache (few MBs), might have dirty
>>> content belonging to the guest page in L3-cache. To tackle this,
>>> we need to flush such dirty content from d-cache so that guest
>>> will see correct contents of guest page when guest MMU is disabled.
>>> 
>>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>> 
>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>> ---
>>> arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>> 1 file changed, 2 insertions(+)
>>> 
>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>> b/arch/arm64/include/asm/kvm_mmu.h
>>> index efe609c..5129038 100644
>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>> @@ -123,6 +123,8 @@ static inline void
>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>      if (!icache_is_aliasing()) {            /* PIPT */
>>>              unsigned long hva = gfn_to_hva(kvm, gfn);
>>>              flush_icache_range(hva, hva + PAGE_SIZE);
>>> +             /* Flush d-cache for systems with external caches. */
>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>      } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT */
>>>              /* any kind of VIPT cache */
>>>              __flush_icache_all();
>> 
>> [adding Will to the discussion as we talked about this in the past]
>> 
>> That's definitely an issue, but I'm not sure the fix is to hit the data
>> cache on each page mapping. It looks overkill.
>> 
>> Wouldn't it be enough to let userspace do the cache cleaning? kvmtools
>> knows which bits of the guest memory have been touched, and can do a "DC
>> DVAC" on this region.
> 
> It seems a bit unnatural to have cache cleaning is user-space. I am sure
> other architectures don't have such cache cleaning in user-space for KVM.

Not sure I understand at which point you really need to make things coherent here. When you assign a new ASID because there could be stale cache entries left from an old ASID run? Can't you just flush all caches when you overrun your ASID allocator?

> 
>> 
>> The alternative is do it in the kernel before running any vcpu - but
>> that's not very nice either (have to clean the whole of the guest
>> memory, which makes a full dcache clean more appealing).
> 
> Actually, cleaning full d-cache by set/way upon first run of VCPU was
> our second option but current approach seemed very simple hence
> we went for this.
> 
> If more people vote for full d-cache clean upon first run of VCPU then
> we should revise this patch.

Is there any difference in performance between the 2 approaches?


Alex
Marc Zyngier Aug. 14, 2013, 3:23 p.m. UTC | #6
On 2013-08-14 15:22, Anup Patel wrote:
> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com> 
> wrote:
>> Hi Pranav,
>>
>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>> Systems with large external L3-cache (few MBs), might have dirty
>>> content belonging to the guest page in L3-cache. To tackle this,
>>> we need to flush such dirty content from d-cache so that guest
>>> will see correct contents of guest page when guest MMU is disabled.
>>>
>>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>>
>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>> ---
>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>> b/arch/arm64/include/asm/kvm_mmu.h
>>> index efe609c..5129038 100644
>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>> @@ -123,6 +123,8 @@ static inline void
>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>> +             /* Flush d-cache for systems with external caches. */
>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged 
>>> VIVT */
>>>               /* any kind of VIPT cache */
>>>               __flush_icache_all();
>>
>> [adding Will to the discussion as we talked about this in the past]
>>
>> That's definitely an issue, but I'm not sure the fix is to hit the 
>> data
>> cache on each page mapping. It looks overkill.
>>
>> Wouldn't it be enough to let userspace do the cache cleaning? 
>> kvmtools
>> knows which bits of the guest memory have been touched, and can do a 
>> "DC
>> DVAC" on this region.
>
> It seems a bit unnatural to have cache cleaning is user-space. I am 
> sure
> other architectures don't have such cache cleaning in user-space for 
> KVM.

Well, we have it on AArch64. Why would we blindly nuke the whole cache 
if we can do the right thing, efficiently, on the right range?

>> The alternative is do it in the kernel before running any vcpu - but
>> that's not very nice either (have to clean the whole of the guest
>> memory, which makes a full dcache clean more appealing).
>
> Actually, cleaning full d-cache by set/way upon first run of VCPU was
> our second option but current approach seemed very simple hence
> we went for this.
>
> If more people vote for full d-cache clean upon first run of VCPU 
> then
> we should revise this patch.

That sounds a lot better than randomly flushing the dcache for no good 
reason. Most of the time, your MMU will be ON, and only the initial 
text/data loaded from userspace is at risk.

But the userspace version definitely sounds better, and I'd like you to 
evaluate it before going the kernel way.

Thanks,

         M.
Marc Zyngier Aug. 14, 2013, 3:34 p.m. UTC | #7
On 2013-08-14 16:06, Alexander Graf wrote:
> On 14.08.2013, at 16:22, Anup Patel wrote:
>
>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com> 
>> wrote:
>>> Hi Pranav,
>>>
>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>> Systems with large external L3-cache (few MBs), might have dirty
>>>> content belonging to the guest page in L3-cache. To tackle this,
>>>> we need to flush such dirty content from d-cache so that guest
>>>> will see correct contents of guest page when guest MMU is 
>>>> disabled.
>>>>
>>>> The patch fixes coherent_icache_guest_page() for external 
>>>> L3-cache.
>>>>
>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>>> ---
>>>> arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>> 1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>> index efe609c..5129038 100644
>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>> @@ -123,6 +123,8 @@ static inline void
>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>      if (!icache_is_aliasing()) {            /* PIPT */
>>>>              unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>              flush_icache_range(hva, hva + PAGE_SIZE);
>>>> +             /* Flush d-cache for systems with external caches. 
>>>> */
>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>      } else if (!icache_is_aivivt()) {       /* non ASID-tagged 
>>>> VIVT */
>>>>              /* any kind of VIPT cache */
>>>>              __flush_icache_all();
>>>
>>> [adding Will to the discussion as we talked about this in the past]
>>>
>>> That's definitely an issue, but I'm not sure the fix is to hit the 
>>> data
>>> cache on each page mapping. It looks overkill.
>>>
>>> Wouldn't it be enough to let userspace do the cache cleaning? 
>>> kvmtools
>>> knows which bits of the guest memory have been touched, and can do 
>>> a "DC
>>> DVAC" on this region.
>>
>> It seems a bit unnatural to have cache cleaning is user-space. I am 
>> sure
>> other architectures don't have such cache cleaning in user-space for 
>> KVM.
>
> Not sure I understand at which point you really need to make things
> coherent here. When you assign a new ASID because there could be 
> stale
> cache entries left from an old ASID run? Can't you just flush all
> caches when you overrun your ASID allocator?

First, a bit of terminology:
- ASIDs are used for userspace. KVM doesn't deal with those.
- VMIDs are used for VMs. That's what KVM deals with.

The issue here is that when the MMU is disabled, the access goes 
straight to RAM, bypassing the caches altogether (OK not completely 
true, but for the sake of the argument, that's close enough).

When userspace loads the kernel into memory, the kernel is not flushed 
to RAM, and may sit in the L3 cache if the cache is big enough. You 
end up executing garbage... My proposed fix is to let kvmtool do the 
flushing, as we have userspace cache management operations for this 
exact purpose.

>>
>>>
>>> The alternative is do it in the kernel before running any vcpu - 
>>> but
>>> that's not very nice either (have to clean the whole of the guest
>>> memory, which makes a full dcache clean more appealing).
>>
>> Actually, cleaning full d-cache by set/way upon first run of VCPU 
>> was
>> our second option but current approach seemed very simple hence
>> we went for this.
>>
>> If more people vote for full d-cache clean upon first run of VCPU 
>> then
>> we should revise this patch.
>
> Is there any difference in performance between the 2 approaches?

dcache clean on each and every page being mapped in stage-2, versus a 
one-time penalty. Should be significant indeed.

But again, I'd like to see figures. What is the impact of each of the 3 
methods for various memory and cache sizes?

         M.
Peter Maydell Aug. 14, 2013, 3:35 p.m. UTC | #8
On 14 August 2013 16:23, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 2013-08-14 15:22, Anup Patel wrote:
>> It seems a bit unnatural to have cache cleaning is user-space. I am
>> sure
>> other architectures don't have such cache cleaning in user-space for
>> KVM.
>
> Well, we have it on AArch64. Why would we blindly nuke the whole cache
> if we can do the right thing, efficiently, on the right range?

When exactly would userspace have to care about the cache?
This patch isn't exactly clear about the circumstances. I
think you'd need a really strong reason for not dealing with
this in the kernel -- in general userspace KVM tools don't
otherwise have to deal with cache maintenance at all.

-- PMM
Anup Patel Aug. 14, 2013, 3:36 p.m. UTC | #9
On Wed, Aug 14, 2013 at 8:53 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 2013-08-14 15:22, Anup Patel wrote:
>>
>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com>
>> wrote:
>>>
>>> Hi Pranav,
>>>
>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>>
>>>> Systems with large external L3-cache (few MBs), might have dirty
>>>> content belonging to the guest page in L3-cache. To tackle this,
>>>> we need to flush such dirty content from d-cache so that guest
>>>> will see correct contents of guest page when guest MMU is disabled.
>>>>
>>>> The patch fixes coherent_icache_guest_page() for external L3-cache.
>>>>
>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>>> ---
>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>>>> b/arch/arm64/include/asm/kvm_mmu.h
>>>> index efe609c..5129038 100644
>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>> @@ -123,6 +123,8 @@ static inline void
>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>       if (!icache_is_aliasing()) {            /* PIPT */
>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
>>>> +             /* Flush d-cache for systems with external caches. */
>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>       } else if (!icache_is_aivivt()) {       /* non ASID-tagged VIVT */
>>>>               /* any kind of VIPT cache */
>>>>               __flush_icache_all();
>>>
>>>
>>> [adding Will to the discussion as we talked about this in the past]
>>>
>>> That's definitely an issue, but I'm not sure the fix is to hit the data
>>> cache on each page mapping. It looks overkill.
>>>
>>> Wouldn't it be enough to let userspace do the cache cleaning? kvmtools
>>> knows which bits of the guest memory have been touched, and can do a "DC
>>> DVAC" on this region.
>>
>>
>> It seems a bit unnatural to have cache cleaning is user-space. I am sure
>> other architectures don't have such cache cleaning in user-space for KVM.
>
>
> Well, we have it on AArch64. Why would we blindly nuke the whole cache if we
> can do the right thing, efficiently, on the right range?

Flushing from user-space would be better only if the target virtual address
range is small. If user-space loads several images at different locations,
or large images, in guest RAM, then flushing the entire d-cache by
set/way would be more efficient.

>
>
>>> The alternative is do it in the kernel before running any vcpu - but
>>> that's not very nice either (have to clean the whole of the guest
>>> memory, which makes a full dcache clean more appealing).
>>
>>
>> Actually, cleaning full d-cache by set/way upon first run of VCPU was
>> our second option but current approach seemed very simple hence
>> we went for this.
>>
>> If more people vote for full d-cache clean upon first run of VCPU then
>> we should revise this patch.
>
>
> That sounds a lot better than randomly flushing the dcache for no good
> reason. Most of the time, your MMU will be ON, and only the initial
> text/data loaded from userspace is at risk.
>
> But the userspace version sounds definitely better, and I'd like you to
> evaluate it before going the kernel way.

Considering the penalty, I like the approach of flushing the d-cache by
set/way upon the first run of a VCPU, because it is just a one-time penalty.

>
> Thanks,
>
>
>         M.
> --
> Fast, cheap, reliable. Pick two.

--Anup
Peter Maydell Aug. 14, 2013, 3:41 p.m. UTC | #10
On 14 August 2013 16:34, Marc Zyngier <maz@misterjones.org> wrote:
> When userspace loads the kernel into memory, the kernel is not flushed
> to RAM, and may sit in the L3 cache if the cache is big enough. You
> end-up executing garbage... My proposed fix is to let kvmtool do the
> flushing, as we have userspace cache management operations for this
> exact purpose.

Why does this issue only apply to the loaded kernel and not to
the zero bytes in the rest of RAM? I know executing zeroes
isn't a very useful thing to do, but it should be a well-defined
thing.

-- PMM
Marc Zyngier Aug. 14, 2013, 3:49 p.m. UTC | #11
On 2013-08-14 16:35, Peter Maydell wrote:
> On 14 August 2013 16:23, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 2013-08-14 15:22, Anup Patel wrote:
>>> It seems a bit unnatural to have cache cleaning is user-space. I am
>>> sure
>>> other architectures don't have such cache cleaning in user-space 
>>> for
>>> KVM.
>>
>> Well, we have it on AArch64. Why would we blindly nuke the whole 
>> cache
>> if we can do the right thing, efficiently, on the right range?
>
> When exactly would userspace have to care about the cache?

Only for the initial payload, I'd expect. Unless I've missed something 
more crucial?

> This patch isn't exactly clear about the circumstances. I
> think you'd need a really strong reason for not dealing with
> this in the kernel -- in general userspace KVM tools don't
> otherwise have to deal with cache maintenance at all.

I believe we *could* do it in the kernel, just at the expense of a lot 
more CPU cycles.

A possible alternative would be to use HCR.DC, but I need to have a 
look and see what it breaks...
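
For reference, HCR_EL2.DC ("default cacheable") makes the data accesses of
a guest whose stage-1 MMU is off behave as cacheable, which would sidestep
the mismatch entirely. A sketch of the idea, assuming a per-vcpu copy of
HCR_EL2 (no claim that this is the final approach):

	/*
	 * Sketch only: MMU-off guest data accesses default to cacheable,
	 * so the guest sees whatever sits in the cache hierarchy.
	 * Side effects on stage-1 behaviour still need evaluating.
	 */
	vcpu->arch.hcr_el2 |= HCR_DC;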

         M.
Marc Zyngier Aug. 14, 2013, 3:57 p.m. UTC | #12
On 2013-08-14 16:41, Peter Maydell wrote:
> On 14 August 2013 16:34, Marc Zyngier <maz@misterjones.org> wrote:
>> When userspace loads the kernel into memory, the kernel is not 
>> flushed
>> to RAM, and may sit in the L3 cache if the cache is big enough. You
>> end-up executing garbage... My proposed fix is to let kvmtool do the
>> flushing, as we have userspace cache management operations for this
>> exact purpose.
>
> Why does this issue only apply to the loaded kernel and not to
> the zero bytes in the rest of RAM? I know executing zeroes
> isn't a very useful thing to do but it should be a well defined
> thing.

Good point, and I'm not quite sure just yet. Probably we get a zeroed, 
clean page?

Anup, can you elaborate on how your L3 cache behaves?

Thanks,

         M.
Anup Patel Aug. 14, 2013, 4:36 p.m. UTC | #13
On Wed, Aug 14, 2013 at 9:27 PM, Marc Zyngier <maz@misterjones.org> wrote:
> On 2013-08-14 16:41, Peter Maydell wrote:
>> On 14 August 2013 16:34, Marc Zyngier <maz@misterjones.org> wrote:
>>> When userspace loads the kernel into memory, the kernel is not
>>> flushed
>>> to RAM, and may sit in the L3 cache if the cache is big enough. You
>>> end-up executing garbage... My proposed fix is to let kvmtool do the
>>> flushing, as we have userspace cache management operations for this
>>> exact purpose.
>>
>> Why does this issue only apply to the loaded kernel and not to
>> the zero bytes in the rest of RAM? I know executing zeroes
>> isn't a very useful thing to do but it should be a well defined
>> thing.
>
> Good point, and not quite sure just yet. Probably we get a zeroed,
> clean page?

This would apply to zeroed pages too if some Guest OS expects
zeroed-out portions in Guest RAM.

>
> Anup, can you elaborate on how your L3 cache behaves?

The L3-cache is transparent to AArch64 CPUs. It is only bypassed
when caching is disabled or when accessing non-cacheable pages.
Currently, we cannot disclose more details about line allocation and
replacement policy because the APM X-Gene specs are not released.

The issue pointed out by this patch would also apply to the L2-cache
(i.e. with the L3-cache disabled), but it usually does not manifest because
the L2-cache is not too big (a few hundred KB), so dirty lines don't
stay in it for a long time.

>
> Thanks,
>
>          M.
> --
> Who you jivin' with that Cosmik Debris?

Regards,
Anup
Christoffer Dall Aug. 14, 2013, 5:34 p.m. UTC | #14
On Wed, Aug 14, 2013 at 04:49:24PM +0100, Marc Zyngier wrote:
> On 2013-08-14 16:35, Peter Maydell wrote:
> > On 14 August 2013 16:23, Marc Zyngier <marc.zyngier@arm.com> wrote:
> >> On 2013-08-14 15:22, Anup Patel wrote:
> >>> It seems a bit unnatural to have cache cleaning is user-space. I am
> >>> sure
> >>> other architectures don't have such cache cleaning in user-space 
> >>> for
> >>> KVM.
> >>
> >> Well, we have it on AArch64. Why would we blindly nuke the whole 
> >> cache
> >> if we can do the right thing, efficiently, on the right range?
> >
> > When exactly would userspace have to care about the cache?
> 
> Only for the initial payload, I'd expect. Unless I've missed something 
> more crucial?

What happens if the page is swapped out: is the kernel guaranteed to
flush it all the way to RAM when it swaps it back in, or would KVM have
to take care of that?

> 
> > This patch isn't exactly clear about the circumstances. I
> > think you'd need a really strong reason for not dealing with
> > this in the kernel -- in general userspace KVM tools don't
> > otherwise have to deal with cache maintenance at all.
> 
> I believe we *could* do it in the kernel, just at the expense of a lot 
> more CPU cycles.
> 
> A possible alternative would be to use HCR.DC, but I need to have a 
> look and see what it breaks...
> 
Could we flush the cache when we fault in pages only if the guest has
the MMU disabled, and trap when the guest disables the MMU and flush the
whole outer dcache at that point in time?
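
A minimal sketch of the first half of that idea, gating the clean on the
guest's SCTLR_EL1 state (the helper name and accessor are illustrative,
not existing code):

/* Illustrative: has the guest enabled both its MMU and d-cache? */
static bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
{
	const u64 mask = (1UL << 2) | (1UL << 0);  /* SCTLR_EL1.C | SCTLR_EL1.M */

	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & mask) == mask;
}

/* ...then, in the stage-2 fault path: */
	if (!vcpu_has_cache_enabled(vcpu))
		__flush_dcache_area((void *)hva, PAGE_SIZE);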

-Christoffer
Christoffer Dall Aug. 14, 2013, 5:37 p.m. UTC | #15
On Wed, Aug 14, 2013 at 05:17:12PM +0530, Pranavkumar Sawargaonkar wrote:
> Systems with large external L3-cache (few MBs), might have dirty
> content belonging to the guest page in L3-cache. To tackle this,
> we need to flush such dirty content from d-cache so that guest
> will see correct contents of guest page when guest MMU is disabled.
> 
> The patch fixes coherent_icache_guest_page() for external L3-cache.
> 
> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> ---
>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index efe609c..5129038 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -123,6 +123,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>  	if (!icache_is_aliasing()) {		/* PIPT */
>  		unsigned long hva = gfn_to_hva(kvm, gfn);
>  		flush_icache_range(hva, hva + PAGE_SIZE);
> +		/* Flush d-cache for systems with external caches. */

This comment is nowhere near explanatory enough for someone who comes by
later and tries to figure out why we're flushing the dcache on every
page we swap in under the given circumstances.  Yes, you can do git blame,
until you modify the line for some other reason and it just becomes a
pain.

> +		__flush_dcache_area((void *) hva, PAGE_SIZE);

eh, why is this only relevant for a non-aliasing icache?  In fact, this
does not seem to be icache related at all, but rather instruction-stream
to dcache related, which would warrant either a rename of the function
to something more generic or a separate function.
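
One possible shape of such a split, purely illustrative:

/*
 * Illustrative sketch: make the d-cache maintenance unconditional and
 * generic, separate from the icache-aliasing decision.
 */
static inline void coherent_guest_page(struct kvm *kvm, gfn_t gfn)
{
	unsigned long hva = gfn_to_hva(kvm, gfn);

	/* Data loaded by userspace must reach the PoC before a guest
	 * running with MMU/caches off reads it. */
	__flush_dcache_area((void *)hva, PAGE_SIZE);

	if (!icache_is_aliasing())		/* PIPT */
		flush_icache_range(hva, hva + PAGE_SIZE);
	else if (!icache_is_aivivt())		/* any kind of VIPT cache */
		__flush_icache_all();
}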

>  	} else if (!icache_is_aivivt()) {	/* non ASID-tagged VIVT */
>  		/* any kind of VIPT cache */
>  		__flush_icache_all();
> -- 
> 1.7.9.5
> 
Marc Zyngier Aug. 15, 2013, 4:44 a.m. UTC | #16
On 2013-08-14 18:34, Christoffer Dall wrote:
> On Wed, Aug 14, 2013 at 04:49:24PM +0100, Marc Zyngier wrote:
>> On 2013-08-14 16:35, Peter Maydell wrote:
>> > On 14 August 2013 16:23, Marc Zyngier <marc.zyngier@arm.com> 
>> wrote:
>> >> On 2013-08-14 15:22, Anup Patel wrote:
>> >>> It seems a bit unnatural to have cache cleaning is user-space. I 
>> am
>> >>> sure
>> >>> other architectures don't have such cache cleaning in user-space
>> >>> for
>> >>> KVM.
>> >>
>> >> Well, we have it on AArch64. Why would we blindly nuke the whole
>> >> cache
>> >> if we can do the right thing, efficiently, on the right range?
>> >
>> > When exactly would userspace have to care about the cache?
>>
>> Only for the initial payload, I'd expect. Unless I've missed 
>> something
>> more crucial?
>
> What happens if the page is swapped out, is the kernel guaranteed to
> flush it all the way to RAM when it swaps it back in, or would KVM 
> have
> to take care of that?

I'd expect the kernel to deal with it; otherwise you'd hit the wrong 
data each time you swap in a page in any userspace process.

>>
>> > This patch isn't exactly clear about the circumstances. I
>> > think you'd need a really strong reason for not dealing with
>> > this in the kernel -- in general userspace KVM tools don't
>> > otherwise have to deal with cache maintenance at all.
>>
>> I believe we *could* do it in the kernel, just at the expense of a 
>> lot
>> more CPU cycles.
>>
>> A possible alternative would be to use HCR.DC, but I need to have a
>> look and see what it breaks...
>>
> Could we flush the cache when we fault in pages only if the guest has
> the MMU disabled and trap if the guest disabled the MMU and flush the
> whole outer dcache at that point in time?

We don't need to trap the disabling of the MMU. If the guest does that, 
it *must* have flushed its cache to RAM already. Otherwise, it is 
utterly broken, virtualization or not.

What the guest doesn't expect is the initial data to sit in the cache 
while it hasn't set the MMU on just yet. I may have a patch for that.

         M.
Christoffer Dall Aug. 15, 2013, 4:58 p.m. UTC | #17
On Thu, Aug 15, 2013 at 05:44:52AM +0100, Marc Zyngier wrote:
> On 2013-08-14 18:34, Christoffer Dall wrote:
> >On Wed, Aug 14, 2013 at 04:49:24PM +0100, Marc Zyngier wrote:
> >>On 2013-08-14 16:35, Peter Maydell wrote:
> >>> On 14 August 2013 16:23, Marc Zyngier <marc.zyngier@arm.com>
> >>wrote:
> >>>> On 2013-08-14 15:22, Anup Patel wrote:
> >>>>> It seems a bit unnatural to have cache cleaning is
> >>user-space. I am
> >>>>> sure
> >>>>> other architectures don't have such cache cleaning in user-space
> >>>>> for
> >>>>> KVM.
> >>>>
> >>>> Well, we have it on AArch64. Why would we blindly nuke the whole
> >>>> cache
> >>>> if we can do the right thing, efficiently, on the right range?
> >>>
> >>> When exactly would userspace have to care about the cache?
> >>
> >>Only for the initial payload, I'd expect. Unless I've missed
> >>something
> >>more crucial?
> >
> >What happens if the page is swapped out, is the kernel guaranteed to
> >flush it all the way to RAM when it swaps it back in, or would KVM
> >have
> >to take care of that?
> 
> I'd expect the kernel to deal with it, otherwise you'd hit the wrong
> data each time you swap in a page in any userspace process.
> 

Unless the kernel only does that for pages that are mapped executable,
which it cannot know for VMs.

Admittedly I haven't looked at the code recently.  Do you know?

> >>
> >>> This patch isn't exactly clear about the circumstances. I
> >>> think you'd need a really strong reason for not dealing with
> >>> this in the kernel -- in general userspace KVM tools don't
> >>> otherwise have to deal with cache maintenance at all.
> >>
> >>I believe we *could* do it in the kernel, just at the expense of
> >>a lot
> >>more CPU cycles.
> >>
> >>A possible alternative would be to use HCR.DC, but I need to have a
> >>look and see what it breaks...
> >>
> >Could we flush the cache when we fault in pages only if the guest has
> >the MMU disabled and trap if the guest disabled the MMU and flush the
> >whole outer dcache at that point in time?
> 
> We don't need to trap the disabling of the MMU. If the guest does
> that, it *must* have flushed its cache to RAM already. Otherwise, it
> is utterly broken, virtualization or not.
> 
> What the guest doesn't expect is the initial data to sit in the
> cache while it hasn't set the MMU on just yet. I may have a patch
> for that.
> 

Otherwise, we can at least do this only when faulting in pages while the
MMU is off, correct?

Is it completely inconceivable that user space starts the guest and, when
some device is poked, loads something else into memory that is going to be
executed after the VM has been started but before the MMU is turned on?
If not, then doing a single flush before starting the VM won't cut it.

-Christoffer

Patch

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index efe609c..5129038 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -123,6 +123,8 @@  static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
 	if (!icache_is_aliasing()) {		/* PIPT */
 		unsigned long hva = gfn_to_hva(kvm, gfn);
 		flush_icache_range(hva, hva + PAGE_SIZE);
+		/* Flush d-cache for systems with external caches. */
+		__flush_dcache_area((void *) hva, PAGE_SIZE);
 	} else if (!icache_is_aivivt()) {	/* non ASID-tagged VIVT */
 		/* any kind of VIPT cache */
 		__flush_icache_all();