
[v1,00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2

Message ID: 20221206135930.3277585-1-ryan.roberts@arm.com

Ryan Roberts Dec. 6, 2022, 1:59 p.m. UTC
(Apologies: I'm resending this series because, on my first attempt, the cover
letter went to everyone but the patches themselves went only to me.)

This is my first upstream feature submission so please go easy ;-)

Add support for FEAT_LPA2 to KVM for both hypervisor stage 1 (for the
nvhe/protected modes) and the VM stage 2 translation tables. FEAT_LPA2 enables
52-bit PAs and VAs for the 4KB and 16KB granules (note that this is already
supported for the 64KB granule via the FEAT_LPA and FEAT_LVA extensions).

The series does not include support for FEAT_LPA2 in the kernel stage 1. This
support is provided separately by Ard Biesheuvel's series at [1]. Although my
series does not rely on Ard's work and I'm posting the patches based on top of
v6.1-rc6, I have tested with and without Ard's changes and provide the results
below. The testing highlighted some pre-existing bugs in mainline and I have
submitted fixes for these separately at [2], [3], [4]. You can find the tree
without Ard's changes at [5] and a tree with Ard's changes at [6].

The series is broken up into 3 phases: update the TLBI routines, support 52-bit
output addresses, and support 52-bit input addresses.

Update TLBI Routines: The update to the range-based TLBI instructions is
technically not needed for the KVM support, because as far as I can see KVM only
uses the non-range-based TLBI instructions. But I've done both parts because I
thought it was sensible to do all the updates together - the range-based changes
will, I think, be needed by Ard's series. See the commit messages for details.
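One concrete wrinkle the later signed-level change creates for the TLBI paths
can be sketched as follows. This is my own illustrative helper, not the kernel's
code; the encoding follows my reading of the Armv8.4 TTL hint, where the 4-bit
field packs a 2-bit granule ID above a 2-bit level, and only last-levels 1-3 are
representable, so level 0 (and the new LPA2 level -1) must degrade to "no hint":

```c
/*
 * Sketch of the Armv8.4 TTL ("translation table level") hint carried in
 * TLBI VA* operations. tg is the granule ID (1=4KB, 2=16KB, 3=64KB).
 * Only levels 1-3 can be encoded; anything else returns 0b0000,
 * meaning "no level information, walk caches for all levels".
 */
static inline unsigned int tlbi_ttl(unsigned int tg, int level)
{
	if (level < 1 || level > 3)
		return 0;	/* level -1 and 0 are not encodable */
	return (tg << 2) | (unsigned int)level;
}
```

With signed levels, a walk starting at level -1 simply opts out of the hint
rather than encoding a bogus value.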

Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
HW advertises support for LPA2 independently for stage 1 and stage 2, and
therefore it's possible to have it for one but not the other. I've assumed there
is a valid use case here: if stage 1 is not supported but stage 2 is, KVM can
still use LPA2 at stage 2 to create a 52-bit IPA space (which could then be
consumed by a 64KB-page guest kernel with the help of FEAT_LPA). Because of this
independence, and because the kvm pgtable library is used for both stage 1 and
stage 2 tables, the library now has to remember the in-use format on a
per-page-table basis. To do this, I had to rework some functions to take a
`struct kvm_pgtable *` parameter, and as a result there is a noisy patch adding
this parameter.
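To illustrate the descriptor format difference (a standalone sketch with my own
names and masks, not the kernel's macros): classic 64KB-granule LPA folds
PA[51:48] into pte[15:12], while LPA2 extends the in-place field to PA[49:12]
and folds PA[51:50] into pte[9:8] (the old shareability bits, which move to
TCR under LPA2):

```c
#include <stdint.h>

typedef uint64_t kvm_pte_t;

/* Bits [h:l] set, inclusive. */
#define GENMASK64(h, l) (((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

static inline uint64_t pte_to_phys(kvm_pte_t pte, int lpa2)
{
	if (lpa2)	/* 4KB granule: PA[49:12] in place, PA[51:50] at pte[9:8] */
		return (pte & GENMASK64(49, 12)) |
		       ((pte & GENMASK64(9, 8)) << 42);
	/* 64KB granule with FEAT_LPA: PA[47:16] in place, PA[51:48] at pte[15:12] */
	return (pte & GENMASK64(47, 16)) |
	       ((pte & GENMASK64(15, 12)) << 36);
}

static inline kvm_pte_t phys_to_pte(uint64_t pa, int lpa2)
{
	if (lpa2)
		return (pa & GENMASK64(49, 12)) |
		       ((pa & GENMASK64(51, 50)) >> 42);
	return (pa & GENMASK64(47, 16)) |
	       ((pa & GENMASK64(51, 48)) >> 36);
}
```

Since the two packings disagree, a single pgtable library really does need to
know which format each table uses before it can convert in either direction.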

Support 52-bit Input Addresses: The main difficulty here is that at stage 1
with 4KB pages, a 52-bit IA requires an extra level of lookup, and that level is
called '-1'. (Although stage 2 can use concatenated pages at the first level,
and therefore still only needs 4 levels, the kvm pgtable library deals with both
stage 1 and stage 2 tables.) So there is another noisy patch converting all
level variables to signed.
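The level arithmetic that makes level '-1' fall out can be sketched like this
(a hypothetical helper of mine, not kernel code). Each table level resolves
(PAGE_SHIFT - 3) bits of IA, and levels are numbered so the leaf level is
always 3; with 4KB pages and a 52-bit IA that is ceil((52 - 12) / 9) = 5
levels, so the walk starts at 4 - 5 = -1:

```c
/*
 * Start level for a stage 1 walk of ia_bits input-address bits with a
 * granule of (1 << page_shift) bytes. Can legitimately return -1 under
 * LPA2, which is why level variables must be signed.
 */
static inline int stage1_start_level(int ia_bits, int page_shift)
{
	int bits_per_level = page_shift - 3;	/* 8-byte descriptors */
	int levels = (ia_bits - page_shift + bits_per_level - 1) /
		     bits_per_level;		/* ceiling division */

	return 4 - levels;
}
```

An unsigned level variable would silently wrap this -1 to a huge value, which
is exactly the class of bug the signedness conversion patch is avoiding.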

This is all tested on the FVP, using a test harness I put together, which does a
host + guest boot test for 180 configurations, built from all the (valid)
combinations of various FVP, host kernel and guest kernel parameters:

 - hw_pa:		[48, lpa, lpa2]
 - hw_va:		[48, 52]
 - kvm_mode:		[vhe, nvhe, protected]
 - host_page_size:	[4KB, 16KB, 64KB]
 - host_pa:		[48, 52]
 - host_va:		[48, 52]
 - host_load_addr:	[low, high]
 - guest_page_size:	[64KB]
 - guest_pa:		[52]
 - guest_va:		[52]
 - guest_load_addr:	[low, high]

I provide the results for the tree at [5] (which doesn't contain Ard's series),
where all tests pass except the 12 cases where a 4KB or 16KB host kernel is
attempting to boot in high memory - these are expected to fail due to the host
kernel not supporting LPA2. When running the tests against the tree at [6]
(which does contain Ard's series plus a few minor fixes), all tests pass.

(I'm omitting guest_page_size, guest_va and guest_pa below since it is always
64KB/52/52).

+-------+-------+-----------+----------------+---------+---------+----------------+-----------------+-------+
| hw_pa | hw_va | kvm_mode  | host_page_size | host_va | host_pa | host_load_addr | guest_load_addr | pass  |
+-------+-------+-----------+----------------+---------+---------+----------------+-----------------+-------+
|  48   |  48   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  48   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  48   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  48   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  52   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  52   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  48   |  52   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  48   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  48   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  48   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   48    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   48    |   52    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   48    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      low       |      high       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      high      |       low       | True  |
|  lpa  |  52   |    vhe    |      64k       |   52    |   52    |      high      |      high       | True  |
|  lpa  |  52   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |      high       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |       low       | True  |
|  lpa  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |      high       | True  |
|  lpa  |  52   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   48    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   48    |   52    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   48    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      low       |      high       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      high      |       low       | True  |
|  lpa  |  52   | protected |      64k       |   52    |   52    |      high      |      high       | True  |
| lpa2  |  52   |    vhe    |       4k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |       4k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |    vhe    |       4k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |    vhe    |      16k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      16k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |    vhe    |      16k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |    vhe    |      64k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   48    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   48    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   48    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   48    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      high      |       low       | True  |
| lpa2  |  52   |    vhe    |      64k       |   52    |   52    |      high      |      high       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |   nvhe    |       4k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |   nvhe    |      16k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   |   nvhe    |      16k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   48    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   48    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |       low       | True  |
| lpa2  |  52   |   nvhe    |      64k       |   52    |   52    |      high      |      high       | True  |
| lpa2  |  52   | protected |       4k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |       4k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   | protected |       4k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   | protected |      16k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |      16k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      high      |       low       | False |
| lpa2  |  52   | protected |      16k       |   52    |   52    |      high      |      high       | False |
| lpa2  |  52   | protected |      64k       |   48    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   48    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   48    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   48    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   48    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   48    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      low       |       low       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      low       |      high       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      high      |       low       | True  |
| lpa2  |  52   | protected |      64k       |   52    |   52    |      high      |      high       | True  |
+-------+-------+-----------+----------------+---------+---------+----------------+-----------------+-------+

[1] https://lore.kernel.org/linux-arm-kernel/20221124123932.2648991-1-ardb@kernel.org
[2] https://lore.kernel.org/kvmarm/20221027120945.29679-1-ryan.roberts@arm.com
[3] https://lore.kernel.org/kvmarm/20221103150507.32948-1-ryan.roberts@arm.com
[4] https://lore.kernel.org/kvmarm/20221205114031.3972780-1-ryan.roberts@arm.com
[5] https://gitlab.arm.com/linux-arm/linux-rr/-/tree/features/lpa2/kvm_lkml-v1
[6] https://gitlab.arm.com/linux-arm/linux-rr/-/tree/features/lpa2/ardb_arm64-4k-lpa2_plus_kvm_2022-12-01

Thanks,
Ryan


Anshuman Khandual (1):
  arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]

Ryan Roberts (11):
  arm64/mm: Update tlb invalidation routines for FEAT_LPA2
  KVM: arm64: Add new (V)TCR_EL2 field definitions for FEAT_LPA2
  KVM: arm64: Plumbing to enable multiple pgtable formats
  KVM: arm64: Maintain page-table format info in struct kvm_pgtable
  KVM: arm64: Use LPA2 page-tables for stage2 if HW supports it
  KVM: arm64: Use LPA2 page-tables for hyp stage1 if HW supports it
  KVM: arm64: Insert PS field at TCR_EL2 assembly time
  KVM: arm64: Convert translation level parameter to s8
  KVM: arm64: Rework logic to en/decode VTCR_EL2.{SL0, SL2} fields
  KVM: arm64: Support upto 5 levels of translation in kvm_pgtable
  KVM: arm64: Allow guests with >48-bit IPA size on FEAT_LPA2 systems

 arch/arm64/include/asm/kvm_arm.h        |  79 +++---
 arch/arm64/include/asm/kvm_emulate.h    |  14 +-
 arch/arm64/include/asm/kvm_pgtable.h    | 131 +++++++--
 arch/arm64/include/asm/kvm_pkvm.h       |   5 +-
 arch/arm64/include/asm/pgtable-prot.h   |   6 +
 arch/arm64/include/asm/stage2_pgtable.h |  13 +-
 arch/arm64/include/asm/sysreg.h         |   5 +
 arch/arm64/include/asm/tlb.h            |  15 +-
 arch/arm64/include/asm/tlbflush.h       |  83 ++++--
 arch/arm64/kvm/arm.c                    |   5 +
 arch/arm64/kvm/hyp/nvhe/hyp-init.S      |   4 -
 arch/arm64/kvm/hyp/nvhe/mem_protect.c   |  21 +-
 arch/arm64/kvm/hyp/nvhe/setup.c         |  28 +-
 arch/arm64/kvm/hyp/pgtable.c            | 354 +++++++++++++++---------
 arch/arm64/kvm/mmu.c                    |  15 +-
 arch/arm64/kvm/reset.c                  |  11 +-
 16 files changed, 525 insertions(+), 264 deletions(-)

--
2.25.1

Comments

Oliver Upton Dec. 15, 2022, 12:52 a.m. UTC | #1
On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
> (Apologies: I'm resending this series because, on my first attempt, the cover
> letter went to everyone but the patches themselves went only to me.)
> 
> This is my first upstream feature submission so please go easy ;-)

Welcome :)

> Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
> HW advertises support for LPA2 independently for stage 1 and stage 2, and
> therefore it's possible to have it for one but not the other. I've assumed there
> is a valid use case here: if stage 1 is not supported but stage 2 is, KVM can
> still use LPA2 at stage 2 to create a 52-bit IPA space (which could then be
> consumed by a 64KB-page guest kernel with the help of FEAT_LPA). Because of this
> independence, and because the kvm pgtable library is used for both stage 1 and
> stage 2 tables, the library now has to remember the in-use format on a
> per-page-table basis. To do this, I had to rework some functions to take a
> `struct kvm_pgtable *` parameter, and as a result there is a noisy patch adding
> this parameter.

Mismatch between the translation stages is an interesting problem...

Given that userspace is responsible for setting up the IPA space, I
can't really think of a strong use case for 52 bit IPAs with a 48 bit
VA. Sure, the VMM could construct a sparse IPA space or remap the same
HVA at multiple IPAs to artificially saturate the address space, but
neither seems terribly compelling.

Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
!LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
its guest.

Marc, is there any real reason for this or is it just a byproduct of how
LPA support was added to KVM?

> Support 52-bit Input Addresses: The main difficulty here is that at stage 1
> with 4KB pages, a 52-bit IA requires an extra level of lookup, and that level is
> called '-1'. (Although stage 2 can use concatenated pages at the first level,
> and therefore still only needs 4 levels, the kvm pgtable library deals with both
> stage 1 and stage 2 tables.) So there is another noisy patch converting all
> level variables to signed.
> 
> This is all tested on the FVP, using a test harness I put together, which does a
> host + guest boot test for 180 configurations, built from all the (valid)
> combinations of various FVP, host kernel and guest kernel parameters:
> 
>  - hw_pa:		[48, lpa, lpa2]
>  - hw_va:		[48, 52]
>  - kvm_mode:		[vhe, nvhe, protected]
>  - host_page_size:	[4KB, 16KB, 64KB]
>  - host_pa:		[48, 52]
>  - host_va:		[48, 52]
>  - host_load_addr:	[low, high]
>  - guest_page_size:	[64KB]
>  - guest_pa:		[52]
>  - guest_va:		[52]
>  - guest_load_addr:	[low, high]

Wow, what a matrix!

In a later revision of this series it might be good to add support for
LPA2 guests in KVM selftests. We currently constrain the IPA size to
48 bits on !64K kernels.

I'll have a deeper look at this series in the coming days.

--
Thanks,
Oliver
Ryan Roberts Dec. 15, 2022, 9:33 a.m. UTC | #2
On 15/12/2022 00:52, Oliver Upton wrote:
> On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
>> (Apologies: I'm resending this series because, on my first attempt, the cover
>> letter went to everyone but the patches themselves went only to me.)
>>
>> This is my first upstream feature submission so please go easy ;-)
> 
> Welcome :)
> 
>> Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
>> HW advertises support for LPA2 independently for stage 1 and stage 2, and
>> therefore it's possible to have it for one but not the other. I've assumed there
>> is a valid use case here: if stage 1 is not supported but stage 2 is, KVM can
>> still use LPA2 at stage 2 to create a 52-bit IPA space (which could then be
>> consumed by a 64KB-page guest kernel with the help of FEAT_LPA). Because of this
>> independence, and because the kvm pgtable library is used for both stage 1 and
>> stage 2 tables, the library now has to remember the in-use format on a
>> per-page-table basis. To do this, I had to rework some functions to take a
>> `struct kvm_pgtable *` parameter, and as a result there is a noisy patch adding
>> this parameter.
> 
> Mismatch between the translation stages is an interesting problem...
> 
> Given that userspace is responsible for setting up the IPA space, I
> can't really think of a strong use case for 52 bit IPAs with a 48 bit
> VA. Sure, the VMM could construct a sparse IPA space or remap the same
> HVA at multiple IPAs to artificially saturate the address space, but
> neither seems terribly compelling.
> 
> Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
> !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
> its guest.

I guess a simpler approach would be to only use LPA2 if it's supported by both
stage 1 and stage 2. Then the code could just use a static key in the few
required places. However, there is also a place where kvm_pgtable walks the user
space s1 page table that is constructed by the kernel. For that to keep working,
the kernel would need to decide whether to use LPA2 based on the same criteria,
but it feels odd to have the kernel depend on LPA2 support at stage 2. I'll wait
for your fuller review.

> 
> Marc, is there any real reason for this or is it just a byproduct of how
> LPA support was added to KVM?
> 
>> Support 52-bit Input Addresses: The main difficulty here is that at stage 1 for
>> 4KB pages, 52-bit IA requires a extra level of lookup, and that level is called
>> '-1'. (Although stage 2 can use concatenated pages at the first level, and
>> therefore still only uses 4 levels, the kvm pgtable library deals with both
>> stage 1 and stage 2 tables). So there is another noisy patch to convert all
>> level variables to signed.
>>
>> This is all tested on the FVP, using a test harness I put together, which does a
>> host + guest boot test for 180 configurations, built from all the (valid)
>> combinations of various FVP, host kernel and guest kernel parameters:
>>
>>  - hw_pa:		[48, lpa, lpa2]
>>  - hw_va:		[48, 52]
>>  - kvm_mode:		[vhe, nvhe, protected]
>>  - host_page_size:	[4KB, 16KB, 64KB]
>>  - host_pa:		[48, 52]
>>  - host_va:		[48, 52]
>>  - host_load_addr:	[low, high]
>>  - guest_page_size:	[64KB]
>>  - guest_pa:		[52]
>>  - guest_va:		[52]
>>  - guest_load_addr:	[low, high]
> 
> Wow, what a matrix!
> 
> In a later revision of this series it might be good to add support for
> LPA2 guests in KVM selftests. We currently constrain the IPA size to
> 48bits on !64K kernels.

Ahh - I did have a quick look at kselftests and kvm-unit-tests, but they looked
hard-coded for 48-bit IPA and like quite an effort to rework. I guess if they
already support a 52-bit IPA for 64K kernels then I missed something. I'll take
another look and aim to get some tests implemented for a future revision.
> 
> I'll have a deeper look at this series in the coming days.

Thanks!

> 
> --
> Thanks,
> Oliver
Marc Zyngier Dec. 15, 2022, 9:35 a.m. UTC | #3
On Thu, 15 Dec 2022 00:52:28 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
> > (appologies, I'm resending this series as I managed to send the cover letter to
> > all but the following patches only to myself on first attempt).
> > 
> > This is my first upstream feature submission so please go easy ;-)
> 
> Welcome :)
> 
> > Support 52-bit Output Addresses: FEAT_LPA2 changes the format of
> > the PTEs. The HW advertises support for LPA2 independently for
> > stage 1 and stage 2, and therefore its possible to have it for one
> > and not the other. I've assumed that there is a valid case for
> > this if stage 1 is not supported but stage 2 is, KVM could still
> > then use LPA2 at stage 2 to create a 52 bit IPA space (which could
> > then be consumed by a 64KB page guest kernel with the help of
> > FEAT_LPA). Because of this independence and the fact that the kvm
> > pgtable library is used for both stage 1 and stage 2 tables, this
> > means the library now has to remember the in-use format on a
> > per-page-table basis. To do this, I had to rework some functions
> > to take a `struct kvm_pgtable *` parameter, and as a result, there
> > is a noisy patch to add this parameter.
> 
> Mismatch between the translation stages is an interesting problem...
> 
> Given that userspace is responsible for setting up the IPA space, I
> can't really think of a strong use case for 52 bit IPAs with a 48 bit
> VA. Sure, the VMM could construct a sparse IPA space or remap the same
> HVA at multiple IPAs to artificially saturate the address space, but
> neither seems terribly compelling.
> 
> Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
> !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
> its guest.
> 
> Marc, is there any real reason for this or is it just a byproduct of how
> LPA support was added to KVM?

My recollection is hazy, but LPA came first, and LVA only landed much
later (because the two features were made independent in the
architecture, something that was later abandoned for LPA2, which
implies large VAs as well).

So yes, the VMM can place memory wherever it wants in the 52bit IPA
space, even if its own VA space is limited to 48 bits. And it doesn't
have to be memory, by the way. You could place all the emulated MMIO
above the 48bit limit, for example, and that doesn't require any trick
other than the HW supporting 52bit PAs, and VTCR_EL2 being correctly
configured.
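
(As a concrete illustration of the VTCR_EL2 point - a hedged user-space
sketch, not KVM's code: the IPA size is set via T0SZ, as IPA = 64 - T0SZ,
and the maximum output address size via PS. The shifts and PS encodings
below follow the VTCR_EL2 layout in the Arm ARM:)

```c
#include <stdint.h>

#define VTCR_T0SZ_SHIFT	0
#define VTCR_PS_SHIFT	16

/*
 * Return the T0SZ and PS fields of VTCR_EL2 for a given IPA size.
 * PS encodings: 0=32, 1=36, 2=40, 3=42, 4=44, 5=48, 6=52 bits; pick
 * the smallest value that covers the requested IPA size.
 */
static uint64_t vtcr_ipa_fields(unsigned int ipa_bits)
{
	static const unsigned int ps_bits[] = { 32, 36, 40, 42, 44, 48, 52 };
	uint64_t ps = 0;

	while (ps < 6 && ps_bits[ps] < ipa_bits)
		ps++;

	return ((64ULL - ipa_bits) << VTCR_T0SZ_SHIFT) |
	       (ps << VTCR_PS_SHIFT);
}
```

So a 52-bit IPA space means T0SZ = 12 and PS = 0b110, regardless of where
the VMM actually places memory (or emulated MMIO) within it.
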

Thanks,

	M.
Oliver Upton Dec. 15, 2022, 6:12 p.m. UTC | #4
On Thu, Dec 15, 2022 at 09:33:17AM +0000, Ryan Roberts wrote:
> On 15/12/2022 00:52, Oliver Upton wrote:
> > On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
> >> (appologies, I'm resending this series as I managed to send the cover letter to
> >> all but the following patches only to myself on first attempt).
> >>
> >> This is my first upstream feature submission so please go easy ;-)
> > 
> > Welcome :)
> > 
> >> Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
> >> HW advertises support for LPA2 independently for stage 1 and stage 2, and
> >> therefore its possible to have it for one and not the other. I've assumed that
> >> there is a valid case for this if stage 1 is not supported but stage 2 is, KVM
> >> could still then use LPA2 at stage 2 to create a 52 bit IPA space (which could
> >> then be consumed by a 64KB page guest kernel with the help of FEAT_LPA). Because
> >> of this independence and the fact that the kvm pgtable library is used for both
> >> stage 1 and stage 2 tables, this means the library now has to remember the
> >> in-use format on a per-page-table basis. To do this, I had to rework some
> >> functions to take a `struct kvm_pgtable *` parameter, and as a result, there is
> >> a noisy patch to add this parameter.
> > 
> > Mismatch between the translation stages is an interesting problem...
> > 
> > Given that userspace is responsible for setting up the IPA space, I
> > can't really think of a strong use case for 52 bit IPAs with a 48 bit
> > VA. Sure, the VMM could construct a sparse IPA space or remap the same
> > HVA at multiple IPAs to artificially saturate the address space, but
> > neither seems terribly compelling.
> > 
> > Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
> > !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
> > its guest.
> 
> I guess a simpler approach would be to only use LPA2 if its supported by both
> stage1 and stage2. Then the code could just use a static key in the few required
> places.

Ah, you caught on quick to what I was thinking :-)

What I'm groaning about in particular is the changes to the TLB
invalidation path, as it feels like a static key is warranted there.
Nonetheless, it is all a bit of a mess depending on LPA2 support in both
the kernel and KVM.

--
Thanks,
Oliver
Oliver Upton Dec. 20, 2022, 6:28 p.m. UTC | #5
On Thu, Dec 15, 2022 at 06:12:14PM +0000, Oliver Upton wrote:
> On Thu, Dec 15, 2022 at 09:33:17AM +0000, Ryan Roberts wrote:
> > On 15/12/2022 00:52, Oliver Upton wrote:
> > > On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
> > >> (appologies, I'm resending this series as I managed to send the cover letter to
> > >> all but the following patches only to myself on first attempt).
> > >>
> > >> This is my first upstream feature submission so please go easy ;-)
> > > 
> > > Welcome :)
> > > 
> > >> Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
> > >> HW advertises support for LPA2 independently for stage 1 and stage 2, and
> > >> therefore its possible to have it for one and not the other. I've assumed that
> > >> there is a valid case for this if stage 1 is not supported but stage 2 is, KVM
> > >> could still then use LPA2 at stage 2 to create a 52 bit IPA space (which could
> > >> then be consumed by a 64KB page guest kernel with the help of FEAT_LPA). Because
> > >> of this independence and the fact that the kvm pgtable library is used for both
> > >> stage 1 and stage 2 tables, this means the library now has to remember the
> > >> in-use format on a per-page-table basis. To do this, I had to rework some
> > >> functions to take a `struct kvm_pgtable *` parameter, and as a result, there is
> > >> a noisy patch to add this parameter.
> > > 
> > > Mismatch between the translation stages is an interesting problem...
> > > 
> > > Given that userspace is responsible for setting up the IPA space, I
> > > can't really think of a strong use case for 52 bit IPAs with a 48 bit
> > > VA. Sure, the VMM could construct a sparse IPA space or remap the same
> > > HVA at multiple IPAs to artificially saturate the address space, but
> > > neither seems terribly compelling.
> > > 
> > > Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
> > > !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
> > > its guest.
> > 
> > I guess a simpler approach would be to only use LPA2 if its supported by both
> > stage1 and stage2. Then the code could just use a static key in the few required
> > places.
> 
> Ah, you caught on quick to what I was thinking :-)

Just wanted to revisit this...

Ryan, you say that it is possible for hardware to support LPA2 for a
single stage of translation. Are you basing that statement on something
in the Arm ARM or the fact that there are two different enumerations
for stage-1 and stage-2?

In my cursory search I wasn't able to find anything that would suggest
it is possible for only a single stage to implement the feature. The one
possibility I can think of is the NV case, where the L0 hypervisor for
some reason does not support LPA2 in its emulated stage-2 but still
advertises support for LPA2 at stage-1. I'd say that's quite a stupid
L0, but I should really hold my tongue until KVM actually does NV ;-)

I want to make sure there is a strong sense of what LPA2 means in terms
of the architecture to inform how we use it in KVM.

--
Thanks,
Oliver
Ryan Roberts Feb. 20, 2023, 2:17 p.m. UTC | #6
Hi Oliver,

Apologies for having gone quiet on this. I came back to this work today only to
notice that you sent the below response on the 20th Dec but it did not get
picked up by my mail client somehow (although I'm sure it was operator error). I
just spotted it on lore.kernel.org.

I'm planning to post a second version soon-ish, with all your comments
addressed. I think everything except the below is pretty clear and straightforward.


On 20/12/2022 18:28, Oliver Upton wrote:
> On Thu, Dec 15, 2022 at 06:12:14PM +0000, Oliver Upton wrote:
>> On Thu, Dec 15, 2022 at 09:33:17AM +0000, Ryan Roberts wrote:
>>> On 15/12/2022 00:52, Oliver Upton wrote:
>>>> On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
>>>>> (appologies, I'm resending this series as I managed to send the cover letter to
>>>>> all but the following patches only to myself on first attempt).
>>>>>
>>>>> This is my first upstream feature submission so please go easy ;-)
>>>>
>>>> Welcome :)
>>>>
>>>>> Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
>>>>> HW advertises support for LPA2 independently for stage 1 and stage 2, and
>>>>> therefore its possible to have it for one and not the other. I've assumed that
>>>>> there is a valid case for this if stage 1 is not supported but stage 2 is, KVM
>>>>> could still then use LPA2 at stage 2 to create a 52 bit IPA space (which could
>>>>> then be consumed by a 64KB page guest kernel with the help of FEAT_LPA). Because
>>>>> of this independence and the fact that the kvm pgtable library is used for both
>>>>> stage 1 and stage 2 tables, this means the library now has to remember the
>>>>> in-use format on a per-page-table basis. To do this, I had to rework some
>>>>> functions to take a `struct kvm_pgtable *` parameter, and as a result, there is
>>>>> a noisy patch to add this parameter.
>>>>
>>>> Mismatch between the translation stages is an interesting problem...
>>>>
>>>> Given that userspace is responsible for setting up the IPA space, I
>>>> can't really think of a strong use case for 52 bit IPAs with a 48 bit
>>>> VA. Sure, the VMM could construct a sparse IPA space or remap the same
>>>> HVA at multiple IPAs to artificially saturate the address space, but
>>>> neither seems terribly compelling.
>>>>
>>>> Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
>>>> !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
>>>> its guest.
>>>
>>> I guess a simpler approach would be to only use LPA2 if its supported by both
>>> stage1 and stage2. Then the code could just use a static key in the few required
>>> places.
>>
>> Ah, you caught on quick to what I was thinking :-)
> 
> Just wanted to revisit this...
> 
> Ryan, you say that it is possible for hardware to support LPA2 for a
> single stage of translation. Are you basing that statement on something
> in the Arm ARM or the fact that there are two different enumerations
> for stage-1 and stage-2?

It's based on there being 2 separate enumerations. I've dug into this with our
architecture folks; while it is clearly possible for the HW (or L0 hyp) to
present an ID register that says one stage supports LPA2 and the other doesn't,
the real intention behind having the 2 fields separated out is for an L0 hyp to
be able to limit the stage 2 granule sizes that it advertises to guest
hypervisors. There are no anticipated use cases where the HW or an L0 hypervisor
might want to advertise support for LPA2 in one stage and not the other.

So on that basis, it sounds to me like we should just test for LPA2 support in
both stages and require both to be supported. That simplifies things
significantly - I can just use a static key to globally flip between pte
formats, and a bunch of the noisy refactoring disappears.
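
(To illustrate what "flip between pte formats" would mean for the output
address bits - a hedged sketch with a plain parameter standing in for the
static key. The bit positions follow the Arm ARM for 4KB granules: without
LPA2 the OA occupies pte[47:12]; with LPA2, OA[49:12] stays in place and
OA[51:50] moves down to pte[9:8]. None of this is the actual patch:)

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: encode a physical address into the OA bits of a 4KB-granule
 * descriptor. In KVM the lpa2 decision would be a global static key
 * rather than a parameter.
 */
static uint64_t pte_from_phys(uint64_t pa, bool lpa2)
{
	if (!lpa2)
		return pa & 0x0000fffffffff000ULL;	/* OA[47:12] */

	return (pa & 0x0003fffffffff000ULL) |		/* OA[49:12] */
	       ((pa >> 42) & 0x300);			/* OA[51:50] -> pte[9:8] */
}
```
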

> 
> In my cursory search I wasn't able to find anything that would suggest
> it is possible for only a single stage to implement the feature. The one
> possibility I can think of is the NV case, where the L0 hypervisor for
> some reason does not support LPA2 in its emulated stage-2 but still
> advertises support for LPA2 at stage-1. I'd say that's quite a stupid
> L0, but I should really hold my tongue until KVM actually does NV ;-)
> 
> I want to make sure there is a strong sense of what LPA2 means in terms
> of the architecture to inform how we use it in KVM.
> 
> --
> Thanks,
> Oliver
> 

Thanks,
Ryan
Oliver Upton Feb. 22, 2023, 8:42 p.m. UTC | #7
Hi Ryan,

On Mon, Feb 20, 2023 at 02:17:30PM +0000, Ryan Roberts wrote:
> Hi Oliver,
> 
> Apologies for having gone quiet on this. I came back to this work today only to
> notice that you sent the below response on the 20th Dec but it did not get
> picked up by my mail client somehow (although I'm sure it was operator error). I
> just spotted it on lore.kernel.org.

Huh, sounds like the arm mail server is not a fan of me... Alex reported
my messages arriving in spam as well. I'll let you decide what that
means about what I have to say :)

> I'm planning to post a second version soon-ish, with all your comments
> addressed. I think everything except the below is pretty clear and straight forward.

Great!

> On 20/12/2022 18:28, Oliver Upton wrote:
> > Ryan, you say that it is possible for hardware to support LPA2 for a
> > single stage of translation. Are you basing that statement on something
> > in the Arm ARM or the fact that there are two different enumerations
> > for stage-1 and stage-2?
> 
> It's based on there being 2 separate enumerations. I've dug into this with our
> architecture folks; while it is clearly possible for the HW (or L0 hyp) to
> present an ID register that says one stage supports LPA2 and the other doesn't,
> the real intention behind having the 2 fields separated out is for an L0 hyp to
> be able to limit the stage 2 granule sizes that it advertises to guest
> hypervisors. There are no anticipated use cases where the HW or an L0 hypervisor
> might want to advertise support for LPA2 in one stage and not the other.

Yep, this is exactly what I was getting at. My impression of the stage-2
enumerations was that they solely exist for choking down the supported
granule size, I was quite surprised to see LPA2 show up in both fields
independently.

> So on that basis, it sounds to me like we should just test for LPA2 support in
> both stages and require both to be supported. That simplifies things
> significantly - I can just use a static key to globally flip between pte
> formats, and a bunch of the noisy refactoring disappears.

Whoever wants to take advantage of split support is welcome to share
their use case and upstream the patches. Otherwise, I think the simpler
approach to enlightening KVM of LPA2 reduces friction on actually
getting the initial enablement done.
Catalin Marinas Feb. 23, 2023, 9:53 a.m. UTC | #8
On Wed, Feb 22, 2023 at 08:42:37PM +0000, Oliver Upton wrote:
> On Mon, Feb 20, 2023 at 02:17:30PM +0000, Ryan Roberts wrote:
> > Apologies for having gone quiet on this. I came back to this work today only to
> > notice that you sent the below response on the 20th Dec but it did not get
> > picked up by my mail client somehow (although I'm sure it was operator error). I
> > just spotted it on lore.kernel.org.
> 
> Huh, sounds like the arm mail server is not a fan of me... Alex reported
> my messages arriving in spam as well. I'll let you decide what that
> means about what I have to say :)

Nothing personal ;) - for me, most linux.dev emails ended up in the
Outlook spam folder. For the benefit of the Arm people on this thread:
since I added linux.dev to Outlook's safe senders list, I haven't seen
any of these emails in spam (well, until IT tweaks some filters again;
in the past I was struggling to get google.com emails and the safe
senders list did not make any difference).