Message ID | 20221206135930.3277585-1-ryan.roberts@arm.com (mailing list archive)
---|---
Series | KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
> (apologies, I'm resending this series as I managed to send the cover letter to
> all but the following patches only to myself on first attempt).
>
> This is my first upstream feature submission so please go easy ;-)

Welcome :)

> Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
> HW advertises support for LPA2 independently for stage 1 and stage 2, and
> therefore it's possible to have it for one and not the other. I've assumed that
> there is a valid case for this: if stage 1 is not supported but stage 2 is, KVM
> could still then use LPA2 at stage 2 to create a 52 bit IPA space (which could
> then be consumed by a 64KB page guest kernel with the help of FEAT_LPA). Because
> of this independence and the fact that the kvm pgtable library is used for both
> stage 1 and stage 2 tables, the library now has to remember the in-use format
> on a per-page-table basis. To do this, I had to rework some functions to take a
> `struct kvm_pgtable *` parameter, and as a result there is a noisy patch to add
> this parameter.

Mismatch between the translation stages is an interesting problem...

Given that userspace is responsible for setting up the IPA space, I
can't really think of a strong use case for 52 bit IPAs with a 48 bit
VA. Sure, the VMM could construct a sparse IPA space or remap the same
HVA at multiple IPAs to artificially saturate the address space, but
neither seems terribly compelling.

Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
!LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
its guest.

Marc, is there any real reason for this or is it just a byproduct of how
LPA support was added to KVM?

> Support 52-bit Input Addresses: The main difficulty here is that at stage 1 for
> 4KB pages, 52-bit IA requires an extra level of lookup, and that level is called
> '-1'. (Although stage 2 can use concatenated pages at the first level, and
> therefore still only uses 4 levels, the kvm pgtable library deals with both
> stage 1 and stage 2 tables). So there is another noisy patch to convert all
> level variables to signed.
>
> This is all tested on the FVP, using a test harness I put together, which does
> a host + guest boot test for 180 configurations, built from all the (valid)
> combinations of various FVP, host kernel and guest kernel parameters:
>
> - hw_pa: [48, lpa, lpa2]
> - hw_va: [48, 52]
> - kvm_mode: [vhe, nvhe, protected]
> - host_page_size: [4KB, 16KB, 64KB]
> - host_pa: [48, 52]
> - host_va: [48, 52]
> - host_load_addr: [low, high]
> - guest_page_size: [64KB]
> - guest_pa: [52]
> - guest_va: [52]
> - guest_load_addr: [low, high]

Wow, what a matrix!

In a later revision of this series it might be good to add support for
LPA2 guests in KVM selftests. We currently constrain the IPA size to
48 bits on !64K kernels.

I'll have a deeper look at this series in the coming days.

--
Thanks,
Oliver
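A minimal, self-contained sketch of the two changes the cover letter above describes (illustrative helpers, not the series' actual code; the descriptor field positions are per the Arm ARM formats for the 4KB/16KB granules): how FEAT_LPA2 repacks the output-address bits in a descriptor, and why a 52-bit input-address space with 4KB pages forces the stage 1 walk to start at level -1.

    #include <stdint.h>
    #include <stdio.h>

    #define GENMASK_ULL(h, l)	((~0ULL >> (63 - (h))) & (~0ULL << (l)))

    /* Classic format (48-bit OA): descriptor bits [47:12] hold OA[47:12]. */
    static uint64_t pte_to_phys_classic(uint64_t pte)
    {
    	return pte & GENMASK_ULL(47, 12);
    }

    /*
     * FEAT_LPA2 format for the 4KB/16KB granules: descriptor bits [49:12]
     * hold OA[49:12], and OA[51:50] move into bits [9:8], which previously
     * held the shareability field. (The 64KB granule keeps the FEAT_LPA
     * scheme, with OA[51:48] in bits [15:12].)
     */
    static uint64_t pte_to_phys_lpa2(uint64_t pte)
    {
    	uint64_t pa = pte & GENMASK_ULL(49, 12);

    	pa |= (pte & GENMASK_ULL(9, 8)) << (50 - 8);
    	return pa;
    }

    /*
     * With a 4KB granule each level resolves 9 bits of IA on top of the
     * 12-bit page offset, so a 52-bit IA needs 5 levels and the stage 1
     * walk has to start one level above level 0, i.e. level -1. Hence the
     * need for signed level variables.
     */
    static int start_level_4k(int ia_bits)
    {
    	int levels = (ia_bits - 12 + 8) / 9;	/* DIV_ROUND_UP(ia_bits - 12, 9) */

    	return 4 - levels;
    }

    int main(void)
    {
    	printf("start level for 48-bit IA: %d\n", start_level_4k(48));	/* 0 */
    	printf("start level for 52-bit IA: %d\n", start_level_4k(52));	/* -1 */
    	return 0;
    }

Stage 2 avoids the extra level by concatenating tables at the start level, which is why the IPA walk still fits in 4 levels and only the signed-level handling, not a fifth level, is needed on that side.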
On 15/12/2022 00:52, Oliver Upton wrote:
> On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
>> [...]
>> Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs. The
>> HW advertises support for LPA2 independently for stage 1 and stage 2, and
>> therefore it's possible to have it for one and not the other.
>> [...]
>
> Mismatch between the translation stages is an interesting problem...
>
> Given that userspace is responsible for setting up the IPA space, I
> can't really think of a strong use case for 52 bit IPAs with a 48 bit
> VA. Sure, the VMM could construct a sparse IPA space or remap the same
> HVA at multiple IPAs to artificially saturate the address space, but
> neither seems terribly compelling.
>
> Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
> !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
> its guest.

I guess a simpler approach would be to only use LPA2 if it's supported by both
stage 1 and stage 2. Then the code could just use a static key in the few
required places.

However, there is also a place where kvm_pgtable walks the user space s1 page
table that is constructed by the kernel. For this to keep working, the kernel
would need to decide whether to use LPA2 based on the same criteria. But it
feels odd to have the kernel depend on LPA2 support at stage 2.

I'll wait for your fuller review.

> Marc, is there any real reason for this or is it just a byproduct of how
> LPA support was added to KVM?
>
>> Support 52-bit Input Addresses: [...]
>>
>> This is all tested on the FVP, using a test harness I put together, which does
>> a host + guest boot test for 180 configurations, built from all the (valid)
>> combinations of various FVP, host kernel and guest kernel parameters:
>> [...]
>
> Wow, what a matrix!
>
> In a later revision of this series it might be good to add support for
> LPA2 guests in KVM selftests. We currently constrain the IPA size to
> 48 bits on !64K kernels.
Ahh - I did have a quick look at kselftests and kvm-unit-tests, but they looked
hard-coded for 48-bit IPA and it seemed like quite an effort to rework. I guess
if it already supports 52-bit IPA for 64K kernels then I missed something. I'll
take another look and aim to get some tests implemented for a future revision.

> I'll have a deeper look at this series in the coming days.

Thanks!

> --
> Thanks,
> Oliver
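For context on how userspace ends up with a given IPA size in the first place, here is a hedged sketch (not selftest code from this series) using the existing KVM_CAP_ARM_VM_IPA_SIZE / KVM_VM_TYPE_ARM_IPA_SIZE ABI to probe the host's limit and request the largest supported IPA space at VM creation:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
    	int kvm = open("/dev/kvm", O_RDWR);
    	if (kvm < 0) {
    		perror("open /dev/kvm");
    		return 1;
    	}

    	/* Maximum IPA bits the host supports; 0 means only the 40-bit default. */
    	int max_ipa = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE);
    	printf("max IPA bits: %d\n", max_ipa);

    	/* Pass the requested IPA size via the machine type of KVM_CREATE_VM. */
    	unsigned long type = 0;
    	if (max_ipa >= 52)
    		type = KVM_VM_TYPE_ARM_IPA_SIZE(52);
    	else if (max_ipa)
    		type = KVM_VM_TYPE_ARM_IPA_SIZE(max_ipa);

    	int vm = ioctl(kvm, KVM_CREATE_VM, type);
    	if (vm < 0) {
    		perror("KVM_CREATE_VM");
    		return 1;
    	}

    	return 0;
    }

This is where the selftests' current 48-bit cap on !64K kernels comes from: whatever size is requested here has to be one the kernel will accept for the host's page size.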
On Thu, 15 Dec 2022 00:52:28 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
> > [...]
> > Support 52-bit Output Addresses: FEAT_LPA2 changes the format of
> > the PTEs. The HW advertises support for LPA2 independently for
> > stage 1 and stage 2, and therefore it's possible to have it for one
> > and not the other.
> > [...]
>
> Mismatch between the translation stages is an interesting problem...
>
> Given that userspace is responsible for setting up the IPA space, I
> can't really think of a strong use case for 52 bit IPAs with a 48 bit
> VA. Sure, the VMM could construct a sparse IPA space or remap the same
> HVA at multiple IPAs to artificially saturate the address space, but
> neither seems terribly compelling.
>
> Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
> !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
> its guest.
>
> Marc, is there any real reason for this or is it just a byproduct of how
> LPA support was added to KVM?

My recollection is hazy, but LPA came first, and LVA only landed much
later (because the two features were made independent in the
architecture, something that was later abandoned for LPA2, which
implies large VAs as well).

So yes, the VMM can place memory wherever it wants in the 52-bit IPA
space, even if its own VA space is limited to 48 bits.

And it doesn't have to be memory, by the way. You could place all the
emulated MMIO above the 48-bit limit, for example, and that doesn't
require any trick other than the HW supporting 52-bit PAs, and VTCR_EL2
being correctly configured.

Thanks,

	M.
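To illustrate that point (my own sketch, not from the series): once the VM has been created with a 52-bit IPA space, the VMM can register backing memory, or leave MMIO holes, above the 48-bit boundary even though the HVAs it passes in sit well below 2^48. The slot number and 2MiB size below are arbitrary.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/kvm.h>

    int main(void)
    {
    	int kvm = open("/dev/kvm", O_RDWR);
    	/* Assumes the host supports 52-bit IPAs (see the probing sketch above). */
    	int vm = ioctl(kvm, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(52));
    	size_t size = 2UL << 20;	/* arbitrary 2MiB of backing memory */
    	void *hva = mmap(NULL, size, PROT_READ | PROT_WRITE,
    			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    	struct kvm_userspace_memory_region region = {
    		.slot = 0,
    		.guest_phys_addr = 1ULL << 48,		/* IPA above the 48-bit limit */
    		.memory_size = size,
    		.userspace_addr = (unsigned long)hva,	/* HVA still below 2^48 */
    	};

    	if (kvm < 0 || vm < 0 || hva == MAP_FAILED ||
    	    ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region) < 0) {
    		perror("setup");
    		return 1;
    	}

    	return 0;
    }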
On Thu, Dec 15, 2022 at 09:33:17AM +0000, Ryan Roberts wrote:
> On 15/12/2022 00:52, Oliver Upton wrote:
> > [...]
> >
> > Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
> > !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
> > its guest.
>
> I guess a simpler approach would be to only use LPA2 if it's supported by both
> stage 1 and stage 2. Then the code could just use a static key in the few
> required places.

Ah, you caught on quick to what I was thinking :-)

What I'm groaning about in particular is the changes to the TLB
invalidation path, as it feels like a static key is warranted there.

Nonetheless, it is all a bit of a mess depending on LPA2 support in both
the kernel and KVM.

--
Thanks,
Oliver
On Thu, Dec 15, 2022 at 06:12:14PM +0000, Oliver Upton wrote:
> On Thu, Dec 15, 2022 at 09:33:17AM +0000, Ryan Roberts wrote:
> > [...]
> >
> > I guess a simpler approach would be to only use LPA2 if it's supported by both
> > stage 1 and stage 2. Then the code could just use a static key in the few
> > required places.
>
> Ah, you caught on quick to what I was thinking :-)

Just wanted to revisit this...

Ryan, you say that it is possible for hardware to support LPA2 for a
single stage of translation. Are you basing that statement on something
in the Arm ARM or the fact that there are two different enumerations
for stage-1 and stage-2?

In my cursory search I wasn't able to find anything that would suggest
it is possible for only a single stage to implement the feature. The one
possibility I can think of is the NV case, where the L0 hypervisor for
some reason does not support LPA2 in its emulated stage-2 but still
advertises support for LPA2 at stage-1. I'd say that's quite a stupid
L0, but I should really hold my tongue until KVM actually does NV ;-)

I want to make sure there is a strong sense of what LPA2 means in terms
of the architecture to inform how we use it in KVM.

--
Thanks,
Oliver
Hi Oliver,

Apologies for having gone quiet on this. I came back to this work today only to
notice that you sent the below response on the 20th Dec, but it did not get
picked up by my mail client somehow (although I'm sure it was operator error). I
just spotted it on lore.kernel.org.

I'm planning to post a second version soon-ish, with all your comments
addressed. I think everything except the below is pretty clear and
straightforward.

On 20/12/2022 18:28, Oliver Upton wrote:
> On Thu, Dec 15, 2022 at 06:12:14PM +0000, Oliver Upton wrote:
>> [...]
>
> Just wanted to revisit this...
>
> Ryan, you say that it is possible for hardware to support LPA2 for a
> single stage of translation. Are you basing that statement on something
> in the Arm ARM or the fact that there are two different enumerations
> for stage-1 and stage-2?

It's based on there being 2 separate enumerations. I've dug into this with our
architecture folks; while it is clearly possible for the HW (or L0 hyp) to
present an ID register that says one stage supports LPA2 and the other doesn't,
the real intention behind having the 2 fields separated out is for an L0 hyp to
be able to limit the stage2 granule sizes that it advertises to guest
hypervisors. There are no anticipated use cases where HW or L0 hypervisor might
want to advertise support for LPA2 in one stage and not the other.
So on that basis, it sounds to me like we should just test for LPA2 support in
both stages and require both to be supported. That simplifies things
significantly - I can just use a static key to globally flip between pte
formats, and a bunch of the noisy refactoring disappears.

> In my cursory search I wasn't able to find anything that would suggest
> it is possible for only a single stage to implement the feature. The one
> possibility I can think of is the NV case, where the L0 hypervisor for
> some reason does not support LPA2 in its emulated stage-2 but still
> advertises support for LPA2 at stage-1. I'd say that's quite a stupid
> L0, but I should really hold my tongue until KVM actually does NV ;-)
>
> I want to make sure there is a strong sense of what LPA2 means in terms
> of the architecture to inform how we use it in KVM.
>
> --
> Thanks,
> Oliver

Thanks,
Ryan
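A rough sketch of what that direction could look like (my illustration, not the actual v2 patches; the key and helper names are made up, and the ID_AA64MMFR0_EL1 field positions and values are as I read the Arm ARM): require the LPA2 encodings in both the stage 1 and stage 2 4KB-granule fields, then flip one static key that the pte format helpers test.

    #include <linux/bits.h>
    #include <linux/jump_label.h>
    #include <asm/cpufeature.h>
    #include <asm/sysreg.h>

    /* Illustrative key name; flipped once during init, tested in hot paths. */
    DEFINE_STATIC_KEY_FALSE(kvm_lpa2_enabled);

    /*
     * ID_AA64MMFR0_EL1 fields for the 4KB granule: TGRAN4 (bits [31:28]) == 1
     * advertises 52-bit support for stage 1, TGRAN4_2 (bits [43:40]) == 3
     * advertises it for stage 2 (ignoring, for brevity, the TGRAN4_2 == 0
     * "same as stage 1" encoding).
     */
    #define TGRAN4_SHIFT		28
    #define TGRAN4_52_BIT		1
    #define TGRAN4_2_SHIFT		40
    #define TGRAN4_2_52_BIT		3

    static bool system_supports_lpa2_both_stages(void)
    {
    	u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);

    	return cpuid_feature_extract_unsigned_field(mmfr0, TGRAN4_SHIFT) ==
    			TGRAN4_52_BIT &&
    	       cpuid_feature_extract_unsigned_field(mmfr0, TGRAN4_2_SHIFT) ==
    			TGRAN4_2_52_BIT;
    }

    static void kvm_lpa2_init(void)
    {
    	if (system_supports_lpa2_both_stages())
    		static_branch_enable(&kvm_lpa2_enabled);
    }

    /* One global switch between the two pte formats instead of per-pgtable state. */
    static u64 kvm_phys_to_pte(u64 pa)
    {
    	if (static_branch_unlikely(&kvm_lpa2_enabled))
    		return (pa & GENMASK_ULL(49, 12)) |
    		       ((pa >> 42) & GENMASK_ULL(9, 8));

    	return pa & GENMASK_ULL(47, 12);
    }

With a single global key there is no longer any need for the kvm_pgtable library to remember a per-page-table format, which is what makes the noisy `struct kvm_pgtable *` plumbing unnecessary.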
Hi Ryan,

On Mon, Feb 20, 2023 at 02:17:30PM +0000, Ryan Roberts wrote:
> Hi Oliver,
>
> Apologies for having gone quiet on this. I came back to this work today only to
> notice that you sent the below response on the 20th Dec, but it did not get
> picked up by my mail client somehow (although I'm sure it was operator error). I
> just spotted it on lore.kernel.org.

Huh, sounds like the arm mail server is not a fan of me... Alex reported
my messages arriving in spam as well. I'll let you decide what that
means about what I have to say :)

> I'm planning to post a second version soon-ish, with all your comments
> addressed. I think everything except the below is pretty clear and
> straightforward.

Great!

> On 20/12/2022 18:28, Oliver Upton wrote:
> > Ryan, you say that it is possible for hardware to support LPA2 for a
> > single stage of translation. Are you basing that statement on something
> > in the Arm ARM or the fact that there are two different enumerations
> > for stage-1 and stage-2?
>
> It's based on there being 2 separate enumerations. I've dug into this with our
> architecture folks; while it is clearly possible for the HW (or L0 hyp) to
> present an ID register that says one stage supports LPA2 and the other doesn't,
> the real intention behind having the 2 fields separated out is for an L0 hyp to
> be able to limit the stage2 granule sizes that it advertises to guest
> hypervisors. There are no anticipated use cases where HW or L0 hypervisor might
> want to advertise support for LPA2 in one stage and not the other.

Yep, this is exactly what I was getting at. My impression of the stage-2
enumerations was that they solely exist for choking down the supported
granule size, so I was quite surprised to see LPA2 show up in both fields
independently.

> So on that basis, it sounds to me like we should just test for LPA2 support in
> both stages and require both to be supported. That simplifies things
> significantly - I can just use a static key to globally flip between pte
> formats, and a bunch of the noisy refactoring disappears.

Whoever wants to take advantage of split support is welcome to share
their use case and upstream the patches. Otherwise, I think the simpler
approach to enlightening KVM of LPA2 reduces friction on actually
getting the initial enablement done.
On Wed, Feb 22, 2023 at 08:42:37PM +0000, Oliver Upton wrote:
> On Mon, Feb 20, 2023 at 02:17:30PM +0000, Ryan Roberts wrote:
> > Apologies for having gone quiet on this. I came back to this work today only to
> > notice that you sent the below response on the 20th Dec, but it did not get
> > picked up by my mail client somehow (although I'm sure it was operator error). I
> > just spotted it on lore.kernel.org.
>
> Huh, sounds like the arm mail server is not a fan of me... Alex reported
> my messages arriving in spam as well. I'll let you decide what that
> means about what I have to say :)

Nothing personal ;). For me, most linux.dev emails ended up in the Outlook
spam folder. For the benefit of the Arm people on this thread: since I added
linux.dev to Outlook's safe senders list, I haven't seen any of these emails
in spam (well, until IT tweaks some filters again; in the past I was
struggling to get google.com emails and the safe senders list did not make
any difference).