[06/12] KVM: MMU: flush tlb if the spte can be locklessly modified

Message ID 1375189330-24066-7-git-send-email-xiaoguangrong@linux.vnet.ibm.com (mailing list archive)
State New, archived

Commit Message

Xiao Guangrong July 30, 2013, 1:02 p.m. UTC
Relax the TLB flush condition, since we will write-protect the spte outside of
mmu-lock. Note that lockless write-protection only changes a writable spte to
read-only, and an spte can be writable only if both SPTE_HOST_WRITEABLE and
SPTE_MMU_WRITEABLE are set (both are tested by spte_is_locklessly_modifiable()).
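
As a rough illustration of the two predicates named above, here is a minimal
userspace sketch. The function bodies, the PT_WRITABLE_MASK name and all bit
positions are assumptions made for this example; they are not copied from
arch/x86/kvm/mmu.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are illustrative assumptions, not taken from mmu.c. */
#define PT_WRITABLE_MASK     (1ULL << 1)   /* hardware-visible W bit */
#define SPTE_HOST_WRITEABLE  (1ULL << 10)  /* host mapping permits writes */
#define SPTE_MMU_WRITEABLE   (1ULL << 11)  /* MMU permits writes */

/* True if the hardware-visible W bit is set in the spte. */
static inline bool is_writable_pte(uint64_t spte)
{
	return (spte & PT_WRITABLE_MASK) != 0;
}

/* True only if both software "writeable" bits are set. */
static inline bool spte_is_locklessly_modifiable(uint64_t spte)
{
	const uint64_t mask = SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE;

	return (spte & mask) == mask;
}
```

In this sketch an spte with both software bits set but W clear is still
reported as locklessly modifiable, which is the state the new flush condition
is meant to catch.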

This patch avoids the following race:

      VCPU 0                         VCPU 1
lockless write protection:
      set spte.w = 0
                                 lock mmu-lock

                                 write-protect the spte to sync the shadow page,
                                 see spte.w = 0, so skip the TLB flush

                                 unlock mmu-lock

                                 !!! At this point, the shadow page can still be
                                     writable due to the stale TLB entry
      flush all TLBs
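
The flush decision in the race above can be modelled in a few lines. This is
a single-threaded userspace sketch, not kernel code: need_flush_old(),
need_flush_new(), spte_seen_by_vcpu1(), SPTE_LOCKLESS_MASK and the bit
positions are all hypothetical names and values chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions (assumptions, not taken from mmu.c). */
#define PT_WRITABLE_MASK     (1ULL << 1)
#define SPTE_HOST_WRITEABLE  (1ULL << 10)
#define SPTE_MMU_WRITEABLE   (1ULL << 11)
/* Hypothetical helper mask: both software "writeable" bits. */
#define SPTE_LOCKLESS_MASK   (SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE)

/* Condition before the patch: flush only on a W 1 -> 0 transition. */
static inline bool need_flush_old(uint64_t old_spte, uint64_t new_spte)
{
	return (old_spte & PT_WRITABLE_MASK) && !(new_spte & PT_WRITABLE_MASK);
}

/* Condition after the patch: flush if the old spte could have been writable. */
static inline bool need_flush_new(uint64_t old_spte, uint64_t new_spte)
{
	return (old_spte & SPTE_LOCKLESS_MASK) == SPTE_LOCKLESS_MASK &&
	       !(new_spte & PT_WRITABLE_MASK);
}

/*
 * The spte VCPU 1 reads after VCPU 0 has cleared W locklessly:
 * W is already 0, but both software bits are still set.
 */
static inline uint64_t spte_seen_by_vcpu1(void)
{
	return SPTE_LOCKLESS_MASK;
}
```

With old_spte = spte_seen_by_vcpu1() and a read-only new_spte, the old
condition reports no flush while the new one does, which matches the window
shown in the diagram.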

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Gleb Natapov Aug. 28, 2013, 7:23 a.m. UTC | #1
On Tue, Jul 30, 2013 at 09:02:04PM +0800, Xiao Guangrong wrote:
> Relax the TLB flush condition, since we will write-protect the spte outside of
> mmu-lock. Note that lockless write-protection only changes a writable spte to
> read-only, and an spte can be writable only if both SPTE_HOST_WRITEABLE and
> SPTE_MMU_WRITEABLE are set (both are tested by spte_is_locklessly_modifiable()).
> 
> This patch avoids the following race:
> 
>       VCPU 0                         VCPU 1
> lockless write protection:
>       set spte.w = 0
>                                  lock mmu-lock
> 
>                                  write-protect the spte to sync the shadow page,
>                                  see spte.w = 0, so skip the TLB flush
> 
>                                  unlock mmu-lock
> 
>                                  !!! At this point, the shadow page can still be
>                                      writable due to the stale TLB entry
>       flush all TLBs
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 58283bf..5a40564 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -600,7 +600,8 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
>  	 * we always atomicly update it, see the comments in
>  	 * spte_has_volatile_bits().
>  	 */
> -	if (is_writable_pte(old_spte) && !is_writable_pte(new_spte))
> +	if (spte_is_locklessly_modifiable(old_spte) &&
> +	      !is_writable_pte(new_spte))
>  		ret = true;
This will needlessly flush TLBs when dirty logging is not in use (the common
case) and the old spte is not writable. Can you estimate how serious the
performance hit is?

>  
>  	if (!shadow_accessed_mask)
> -- 
> 1.8.1.4

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiao Guangrong Aug. 28, 2013, 7:50 a.m. UTC | #2
On 08/28/2013 03:23 PM, Gleb Natapov wrote:
> On Tue, Jul 30, 2013 at 09:02:04PM +0800, Xiao Guangrong wrote:
>> Relax the TLB flush condition, since we will write-protect the spte outside of
>> mmu-lock. Note that lockless write-protection only changes a writable spte to
>> read-only, and an spte can be writable only if both SPTE_HOST_WRITEABLE and
>> SPTE_MMU_WRITEABLE are set (both are tested by spte_is_locklessly_modifiable()).
>>
>> This patch avoids the following race:
>>
>>       VCPU 0                         VCPU 1
>> lockless write protection:
>>       set spte.w = 0
>>                                  lock mmu-lock
>>
>>                                  write-protect the spte to sync the shadow page,
>>                                  see spte.w = 0, so skip the TLB flush
>>
>>                                  unlock mmu-lock
>>
>>                                  !!! At this point, the shadow page can still be
>>                                      writable due to the stale TLB entry
>>       flush all TLBs
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/mmu.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 58283bf..5a40564 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -600,7 +600,8 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
>>  	 * we always atomicly update it, see the comments in
>>  	 * spte_has_volatile_bits().
>>  	 */
>> -	if (is_writable_pte(old_spte) && !is_writable_pte(new_spte))
>> +	if (spte_is_locklessly_modifiable(old_spte) &&
>> +	      !is_writable_pte(new_spte))
>>  		ret = true;
> This will needlessly flush TLBs when dirty logging is not in use (the common
> case) and the old spte is not writable. Can you estimate how serious the
> performance hit is?

If no write-protection has been caused by dirty logging, the spte is always
writable when both SPTE_HOST_WRITEABLE and SPTE_MMU_WRITEABLE are set. In other
words, spte_is_locklessly_modifiable(old_spte) is equivalent to
is_writable_pte(old_spte) in the common case.

Two cases can cause an unnecessary TLB flush:
1) The guest takes a read fault on an spte write-protected by dirty logging,
   and the fault is fixed with a read-only host pfn.
   This is really rare, since a read access to a read-only spte cannot
   trigger a #PF.

2) The guest requires write-protection caused by syncing a shadow page.
   This is only needed if EPT is disabled, and in most cases the guest has
   many sptes that need to be write-protected. An unnecessary TLB flush is
   rare here too.
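
To make the equivalence claim concrete, the sketch below enumerates every
combination of the three bits and counts where the two predicates disagree.
It is a userspace model only: make_spte(), spte_is_valid(),
count_disagreements(), PT_WRITABLE_MASK and the bit positions are
hypothetical, and the validity rule (W set implies both software bits set) is
taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions (assumptions, not taken from mmu.c). */
#define PT_WRITABLE_MASK     (1ULL << 1)
#define SPTE_HOST_WRITEABLE  (1ULL << 10)
#define SPTE_MMU_WRITEABLE   (1ULL << 11)

/* Build an spte value from the three flags of interest. */
static inline uint64_t make_spte(bool w, bool host, bool mmu)
{
	return (w ? PT_WRITABLE_MASK : 0) |
	       (host ? SPTE_HOST_WRITEABLE : 0) |
	       (mmu ? SPTE_MMU_WRITEABLE : 0);
}

/* An spte may have W set only if both software bits are set. */
static inline bool spte_is_valid(uint64_t spte)
{
	const uint64_t mask = SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE;

	return !(spte & PT_WRITABLE_MASK) || (spte & mask) == mask;
}

/* Count valid encodings where "writable" and "locklessly modifiable" differ. */
static inline int count_disagreements(void)
{
	const uint64_t mask = SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE;
	int n = 0;

	for (int bits = 0; bits < 8; bits++) {
		uint64_t spte = make_spte(bits & 1, bits & 2, bits & 4);
		bool writable = (spte & PT_WRITABLE_MASK) != 0;
		bool lockless = (spte & mask) == mask;

		if (spte_is_valid(spte) && writable != lockless)
			n++;
	}
	return n;
}
```

Under these assumptions the predicates differ for exactly one valid encoding:
W clear with both software bits set, i.e. an spte that has been
write-protected while remaining locklessly modifiable.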


Patch

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 58283bf..5a40564 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -600,7 +600,8 @@  static bool mmu_spte_update(u64 *sptep, u64 new_spte)
 	 * we always atomicly update it, see the comments in
 	 * spte_has_volatile_bits().
 	 */
-	if (is_writable_pte(old_spte) && !is_writable_pte(new_spte))
+	if (spte_is_locklessly_modifiable(old_spte) &&
+	      !is_writable_pte(new_spte))
 		ret = true;
 
 	if (!shadow_accessed_mask)