[v2,0/3] Fix several bugs in KVM stage 2 translation

Message ID	20201201201034.116760-1-wangyanan55@huawei.com (mailing list archive)
Headers	show Return-Path: <SRS0=r2Hb=FF=lists.infradead.org=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1118D20644 From: Yanan Wang <wangyanan55@huawei.com> To: <linux-kernel@vger.kernel.org>, <linux-arm-kernel@lists.infradead.org>, Marc Zyngier <maz@kernel.org>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, James Morse <james.morse@arm.com>, "Julien Thierry" <julien.thierry.kdev@gmail.com>, Suzuki K Poulose <suzuki.poulose@arm.com>, Gavin Shan <gshan@redhat.com>, Quentin Perret <qperret@google.com> Subject: [PATCH v2 0/3] Fix several bugs in KVM stage 2 translation Date: Wed, 2 Dec 2020 04:10:31 +0800 Message-ID: <20201201201034.116760-1-wangyanan55@huawei.com> MIME-Version: 1.0 Precedence: list Cc: lushenming@huawei.com, jiangkunkun@huawei.com, Yanan Wang <wangyanan55@huawei.com>, yezengruan@huawei.com, wangjingyi11@huawei.com, yuzenghui@huawei.com, wanghaibin.wang@huawei.com, zhukeqian1@huawei.com Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org
Series	Fix several bugs in KVM stage 2 translation \| expand [v2,0/3] Fix several bugs in KVM stage 2 translation [v2,1/3] KVM: arm64: Fix possible memory leak in kvm stage2 [v2,2/3] KVM: arm64: Fix handling of merging tables into a block entry [v2,3/3] KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()

Yanan Wang Dec. 1, 2020, 8:10 p.m. UTC

When installing a new pte entry or updating an old valid entry in stage 2
translation, we use get_page()/put_page() to record page_count of the page-table
pages. PATCH 1/3 aims to fix incorrect use of get_page()/put_page() in stage 2,
which might make page-table pages unable to be freed when unmapping a range.

When dirty logging of a guest with hugepages is finished, we should merge tables
back into a block entry if adjustment of huge mapping is found necessary.
In addition to installing the block entry, we should not only free the non-huge
page-table pages but also invalidate all the TLB entries of non-huge mappings for
the block. PATCH 2/3 adds enough TLBI when merging tables into a block entry.

The rewrite of page-table code and fault handling add two different handlers
for "just relaxing permissions" and "map by stage2 page-table walk", that's
good improvement. Yet, in function user_mem_abort(), conditions where we choose
the above two fault handlers are not strictly distinguished. This will causes
guest errors such as infinite-loop (soft lockup will occur in result), because of
calling the inappropriate fault handler. So, a solution that can strictly
distinguish conditions is introduced in PATCH 3/3.

Changes from v1:
 * In PATCH 1/3, introduce a more concise fix.
 * In PATCH 2/3, using full S2 TLB invalidation when merging tables into
   a block entry.

Yanan Wang (3):
  KVM: arm64: Fix possible memory leak in kvm stage2
  KVM: arm64: Fix handling of merging tables into a block entry
  KVM: arm64: Add usage of stage 2 fault lookup level in
    user_mem_abort()

 arch/arm64/include/asm/esr.h         |  1 +
 arch/arm64/include/asm/kvm_emulate.h |  5 +++++
 arch/arm64/kvm/hyp/pgtable.c         | 11 ++++++++++-
 arch/arm64/kvm/mmu.c                 | 11 +++++++++--
 4 files changed, 25 insertions(+), 3 deletions(-)

Will Deacon Dec. 1, 2020, 8:59 p.m. UTC | #1

On Wed, Dec 02, 2020 at 04:10:31AM +0800, Yanan Wang wrote:
> When installing a new pte entry or updating an old valid entry in stage 2
> translation, we use get_page()/put_page() to record page_count of the page-table
> pages. PATCH 1/3 aims to fix incorrect use of get_page()/put_page() in stage 2,
> which might make page-table pages unable to be freed when unmapping a range.
> 
> When dirty logging of a guest with hugepages is finished, we should merge tables
> back into a block entry if adjustment of huge mapping is found necessary.
> In addition to installing the block entry, we should not only free the non-huge
> page-table pages but also invalidate all the TLB entries of non-huge mappings for
> the block. PATCH 2/3 adds enough TLBI when merging tables into a block entry.
> 
> The rewrite of page-table code and fault handling add two different handlers
> for "just relaxing permissions" and "map by stage2 page-table walk", that's
> good improvement. Yet, in function user_mem_abort(), conditions where we choose
> the above two fault handlers are not strictly distinguished. This will causes
> guest errors such as infinite-loop (soft lockup will occur in result), because of
> calling the inappropriate fault handler. So, a solution that can strictly
> distinguish conditions is introduced in PATCH 3/3.

For the series:

Acked-by: Will Deacon <will@kernel.org>

Thanks for reporting these, helping me to understand the issues and then
spinning a v2 so promptly.

Will

Yanan Wang Dec. 2, 2020, noon UTC | #2

Hi Will, Marc,
On 2020/12/2 4:59, Will Deacon wrote:
> On Wed, Dec 02, 2020 at 04:10:31AM +0800, Yanan Wang wrote:
>> When installing a new pte entry or updating an old valid entry in stage 2
>> translation, we use get_page()/put_page() to record page_count of the page-table
>> pages. PATCH 1/3 aims to fix incorrect use of get_page()/put_page() in stage 2,
>> which might make page-table pages unable to be freed when unmapping a range.
>>
>> When dirty logging of a guest with hugepages is finished, we should merge tables
>> back into a block entry if adjustment of huge mapping is found necessary.
>> In addition to installing the block entry, we should not only free the non-huge
>> page-table pages but also invalidate all the TLB entries of non-huge mappings for
>> the block. PATCH 2/3 adds enough TLBI when merging tables into a block entry.
>>
>> The rewrite of page-table code and fault handling add two different handlers
>> for "just relaxing permissions" and "map by stage2 page-table walk", that's
>> good improvement. Yet, in function user_mem_abort(), conditions where we choose
>> the above two fault handlers are not strictly distinguished. This will causes
>> guest errors such as infinite-loop (soft lockup will occur in result), because of
>> calling the inappropriate fault handler. So, a solution that can strictly
>> distinguish conditions is introduced in PATCH 3/3.
> For the series:
>
> Acked-by: Will Deacon <will@kernel.org>
>
> Thanks for reporting these, helping me to understand the issues and then
> spinning a v2 so promptly.
>
> Will
> .

Thanks for the help and suggestions.

BTW: there are two more things below that I want to talk about.

1.  Recently, I have been focusing on the ARMv8.4-TTRem feature which is 
aimed at changing block size in stage 2 mapping.

I have a plan to implement this feature for stage 2 translation when 
splitting a block into tables or merging tables into a block.

This feature supports changing block size without performing 
*break-before-make*, which might have some improvement on performance.

What do you think about this?

2. Given that the issues we discussed before were found in practice when 
guest state changes from dirty logging to dirty logging canceled.

I could add a test file testing on this case to selftests/ or kvm unit 
tests/, if it's necessary.

Yanan

Marc Zyngier Dec. 2, 2020, 12:23 p.m. UTC | #3

Hi Yanan,

[...]

> BTW: there are two more things below that I want to talk about.
> 
> 1.  Recently, I have been focusing on the ARMv8.4-TTRem feature which
> is aimed at changing block size in stage 2 mapping.
> 
> I have a plan to implement this feature for stage 2 translation when
> splitting a block into tables or merging tables into a block.
> 
> This feature supports changing block size without performing
> *break-before-make*, which might have some improvement on performance.
> 
> What do you think about this?

It would be interesting if you can demonstrate some significant
performance improvements compared to the same workload with BBM.

I'm not completely convinced this would change much, given that
it is only when moving from a table to a block mapping that you
can elide BBM when the support level is 1 or 2. As far as I can
tell, this only happens in the "stop logging" case.

Is that something that happens often enough to justify the added
complexity? Having to handle TLB Conflict Abort is annoying, for
example.

> 2. Given that the issues we discussed before were found in practice
> when guest state changes from dirty logging to dirty logging canceled.
> 
> I could add a test file testing on this case to selftests/ or kvm unit
> tests/, if it's necessary.

That would be awesome, and I'd be very grateful if you did. It is the
second time we break this exact case, and having a reliable way to
verify it would definitely help.

Thanks,

         M.

Marc Zyngier Dec. 2, 2020, 12:24 p.m. UTC | #4

On Wed, 2 Dec 2020 04:10:31 +0800, Yanan Wang wrote:
> When installing a new pte entry or updating an old valid entry in stage 2
> translation, we use get_page()/put_page() to record page_count of the page-table
> pages. PATCH 1/3 aims to fix incorrect use of get_page()/put_page() in stage 2,
> which might make page-table pages unable to be freed when unmapping a range.
> 
> When dirty logging of a guest with hugepages is finished, we should merge tables
> back into a block entry if adjustment of huge mapping is found necessary.
> In addition to installing the block entry, we should not only free the non-huge
> page-table pages but also invalidate all the TLB entries of non-huge mappings for
> the block. PATCH 2/3 adds enough TLBI when merging tables into a block entry.
> 
> [...]

Applied to kvm-arm64/fixes-5.10, thanks!

[1/3] KVM: arm64: Fix memory leak on stage2 update of a valid PTE
      commit: 5c646b7e1d8bcb12317426287c516dfa4c5171c2
[2/3] KVM: arm64: Fix handling of merging tables into a block entry
      commit: 3a0b870e3448302ca2ba703bea1b79b61c3f33c6
[3/3] KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()
      commit: 7d894834a305568a0168c55d4729216f5f8cb4e6

Cheers,

	M.

Yanan Wang Dec. 2, 2020, 12:50 p.m. UTC | #5

On 2020/12/2 20:23, Marc Zyngier wrote:
> Hi Yanan,
>
> [...]
>
>> BTW: there are two more things below that I want to talk about.
>>
>> 1.  Recently, I have been focusing on the ARMv8.4-TTRem feature which
>> is aimed at changing block size in stage 2 mapping.
>>
>> I have a plan to implement this feature for stage 2 translation when
>> splitting a block into tables or merging tables into a block.
>>
>> This feature supports changing block size without performing
>> *break-before-make*, which might have some improvement on performance.
>>
>> What do you think about this?
>
> It would be interesting if you can demonstrate some significant
> performance improvements compared to the same workload with BBM.
>
> I'm not completely convinced this would change much, given that
> it is only when moving from a table to a block mapping that you
> can elide BBM when the support level is 1 or 2. As far as I can
> tell, this only happens in the "stop logging" case.
>
> Is that something that happens often enough to justify the added
> complexity? Having to handle TLB Conflict Abort is annoying, for
> example.

I will take more consideration about the necessity  and maybe some tests

on the performance will be made later.


Thanks,


Yanan

>
>> 2. Given that the issues we discussed before were found in practice
>> when guest state changes from dirty logging to dirty logging canceled.
>>
>> I could add a test file testing on this case to selftests/ or kvm unit
>> tests/, if it's necessary.
>
> That would be awesome, and I'd be very grateful if you did. It is the
> second time we break this exact case, and having a reliable way to
> verify it would definitely help.
>
> Thanks,
>
>         M.

[v2,0/3] Fix several bugs in KVM stage 2 translation

Message

Comments