Message ID | 20230327082118.2177-2-nrb@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | s390x: Add support for running guests without MSO/MSL | expand |
On 3/27/23 10:21, Nico Boehr wrote: > This is to prepare for running guests without MSO/MSL, which is > currently not possible. > > We already have code in sie64a to setup a guest primary ASCE before > entering SIE, so we can in theory switch to the page tables which > translate gpa to hpa. > > But the host is running in primary space mode already, so changing the > primary ASCE before entering SIE will also affect the host's code and > data. > > To make this switch useful, the host should run in a different address > space mode. Hence, set up and change to home address space mode before > installing the guest ASCE. > > The home space ASCE is just copied over from the primary space ASCE, so > no functional change is intended, also for tests that want to use > MSO/MSL. If a test intends to use a different primary space ASCE, it can > now just set the guest.asce in the save_area. > [...] > + /* set up home address space to match primary space */ > + old_cr13 = stctg(13); > + lctlg(13, stctg(1)); > + > + /* switch to home space so guest tables can be different from host */ > + psw_mask_set_bits(PSW_MASK_HOME); > + > + /* also handle all interruptions in home space while in SIE */ > + lowcore.pgm_new_psw.mask |= PSW_MASK_DAT_HOME; > + lowcore.ext_new_psw.mask |= PSW_MASK_DAT_HOME; > + lowcore.io_new_psw.mask |= PSW_MASK_DAT_HOME; We didn't enable DAT in these two cases as far as I can see so this is superfluous or we should change the mmu code. Also it's missing the svc and machine check. The whole bit manipulation thing looks a bit crude. It might make more sense to drop into real mode for a few instructions and have a dedicated storage location for an extended PSW mask and an interrupt ASCE as part of the interrupt call code instead. Opinions? > + mb(); > + > while (vm->sblk->icptcode == 0) { > sie64a(vm->sblk, &vm->save_area); > sie_handle_validity(vm); > @@ -60,6 +75,17 @@ void sie(struct vm *vm) > vm->save_area.guest.grs[14] = vm->sblk->gg14; > vm->save_area.guest.grs[15] = vm->sblk->gg15; > > + lowcore.pgm_new_psw.mask &= ~PSW_MASK_HOME; > + lowcore.ext_new_psw.mask &= ~PSW_MASK_HOME; > + lowcore.io_new_psw.mask &= ~PSW_MASK_HOME; > + mb(); > + > + psw_mask_clear_bits(PSW_MASK_HOME); > + > + /* restore the old CR 13 */ > + lctlg(13, old_cr13); > + > + > if (vm->sblk->sdf == 2) > memcpy(vm->save_area.guest.grs, vm->sblk->pv_grregs, > sizeof(vm->save_area.guest.grs)); > diff --git a/lib/s390x/sie.h b/lib/s390x/sie.h > index 147cb0f2a556..0b00fb709776 100644 > --- a/lib/s390x/sie.h > +++ b/lib/s390x/sie.h > @@ -284,5 +284,6 @@ void sie_handle_validity(struct vm *vm); > void sie_guest_sca_create(struct vm *vm); > void sie_guest_create(struct vm *vm, uint64_t guest_mem, uint64_t guest_mem_len); > void sie_guest_destroy(struct vm *vm); > +bool sie_had_pgm_int(struct vm *vm); > > #endif /* _S390X_SIE_H_ */
Quoting Janosch Frank (2023-03-28 16:13:04) > On 3/27/23 10:21, Nico Boehr wrote: > > This is to prepare for running guests without MSO/MSL, which is > > currently not possible. > > > > We already have code in sie64a to setup a guest primary ASCE before > > entering SIE, so we can in theory switch to the page tables which > > translate gpa to hpa. > > > > But the host is running in primary space mode already, so changing the > > primary ASCE before entering SIE will also affect the host's code and > > data. > > > > To make this switch useful, the host should run in a different address > > space mode. Hence, set up and change to home address space mode before > > installing the guest ASCE. > > > > The home space ASCE is just copied over from the primary space ASCE, so > > no functional change is intended, also for tests that want to use > > MSO/MSL. If a test intends to use a different primary space ASCE, it can > > now just set the guest.asce in the save_area. > > > [...] > > + /* set up home address space to match primary space */ > > + old_cr13 = stctg(13); > > + lctlg(13, stctg(1)); > > + > > + /* switch to home space so guest tables can be different from host */ > > + psw_mask_set_bits(PSW_MASK_HOME); > > + > > + /* also handle all interruptions in home space while in SIE */ > > + lowcore.pgm_new_psw.mask |= PSW_MASK_DAT_HOME; > > > + lowcore.ext_new_psw.mask |= PSW_MASK_DAT_HOME; > > + lowcore.io_new_psw.mask |= PSW_MASK_DAT_HOME; > We didn't enable DAT in these two cases as far as I can see so this is > superfluous or we should change the mmu code. Also it's missing the svc > and machine check. Right. Is there a particular reason why we only run DAT on for PGM ints? > The whole bit manipulation thing looks a bit crude. It might make more > sense to drop into real mode for a few instructions and have a dedicated > storage location for an extended PSW mask and an interrupt ASCE as part > of the interrupt call code instead. > > Opinions? Maybe I don't get it, but I personally don't quite see the advantage. It seems to me this would make things much more complicated just to avoid a few simple bitops. It maybe also depends on how many new_psws we have to touch. If it's really just the PGM, the current solution seems simple enough. But if others also prefer Janosch's suggestion, I am happy to implement it.
On Wed, 29 Mar 2023 14:50:50 +0200 Nico Boehr <nrb@linux.ibm.com> wrote: > Quoting Janosch Frank (2023-03-28 16:13:04) > > On 3/27/23 10:21, Nico Boehr wrote: > > > This is to prepare for running guests without MSO/MSL, which is > > > currently not possible. > > > > > > We already have code in sie64a to setup a guest primary ASCE before > > > entering SIE, so we can in theory switch to the page tables which > > > translate gpa to hpa. > > > > > > But the host is running in primary space mode already, so changing the > > > primary ASCE before entering SIE will also affect the host's code and > > > data. > > > > > > To make this switch useful, the host should run in a different address > > > space mode. Hence, set up and change to home address space mode before > > > installing the guest ASCE. > > > > > > The home space ASCE is just copied over from the primary space ASCE, so > > > no functional change is intended, also for tests that want to use > > > MSO/MSL. If a test intends to use a different primary space ASCE, it can > > > now just set the guest.asce in the save_area. > > > > > [...] > > > + /* set up home address space to match primary space */ > > > + old_cr13 = stctg(13); > > > + lctlg(13, stctg(1)); > > > + > > > + /* switch to home space so guest tables can be different from host */ > > > + psw_mask_set_bits(PSW_MASK_HOME); > > > + > > > + /* also handle all interruptions in home space while in SIE */ > > > + lowcore.pgm_new_psw.mask |= PSW_MASK_DAT_HOME; > > > > > + lowcore.ext_new_psw.mask |= PSW_MASK_DAT_HOME; > > > + lowcore.io_new_psw.mask |= PSW_MASK_DAT_HOME; > > We didn't enable DAT in these two cases as far as I can see so this is > > superfluous or we should change the mmu code. Also it's missing the svc > > and machine check. > > Right. Is there a particular reason why we only run DAT on for PGM ints? a fixup handler for PGM it might need to run with DAT on (e.g. to access data that is not identity mapped), whereas for other interrupts it's not needed (at least not yet ;) ) > > > The whole bit manipulation thing looks a bit crude. It might make more > > sense to drop into real mode for a few instructions and have a dedicated > > storage location for an extended PSW mask and an interrupt ASCE as part > > of the interrupt call code instead. > > > > Opinions? > > Maybe I don't get it, but I personally don't quite see the advantage. It seems > to me this would make things much more complicated just to avoid a few simple > bitops. > > It maybe also depends on how many new_psws we have to touch. If it's really just > the PGM, the current solution seems simple enough. > > But if others also prefer Janosch's suggestion, I am happy to implement it.
On 3/29/23 15:00, Claudio Imbrenda wrote: > On Wed, 29 Mar 2023 14:50:50 +0200 > Nico Boehr <nrb@linux.ibm.com> wrote: > >> Quoting Janosch Frank (2023-03-28 16:13:04) >>> On 3/27/23 10:21, Nico Boehr wrote: >>>> This is to prepare for running guests without MSO/MSL, which is >>>> currently not possible. >>>> >>>> We already have code in sie64a to setup a guest primary ASCE before >>>> entering SIE, so we can in theory switch to the page tables which >>>> translate gpa to hpa. >>>> >>>> But the host is running in primary space mode already, so changing the >>>> primary ASCE before entering SIE will also affect the host's code and >>>> data. >>>> >>>> To make this switch useful, the host should run in a different address >>>> space mode. Hence, set up and change to home address space mode before >>>> installing the guest ASCE. >>>> >>>> The home space ASCE is just copied over from the primary space ASCE, so >>>> no functional change is intended, also for tests that want to use >>>> MSO/MSL. If a test intends to use a different primary space ASCE, it can >>>> now just set the guest.asce in the save_area. >>>> >>> [...] >>>> + /* set up home address space to match primary space */ >>>> + old_cr13 = stctg(13); >>>> + lctlg(13, stctg(1)); >>>> + >>>> + /* switch to home space so guest tables can be different from host */ >>>> + psw_mask_set_bits(PSW_MASK_HOME); >>>> + >>>> + /* also handle all interruptions in home space while in SIE */ >>>> + lowcore.pgm_new_psw.mask |= PSW_MASK_DAT_HOME; >>> >>>> + lowcore.ext_new_psw.mask |= PSW_MASK_DAT_HOME; >>>> + lowcore.io_new_psw.mask |= PSW_MASK_DAT_HOME; >>> We didn't enable DAT in these two cases as far as I can see so this is >>> superfluous or we should change the mmu code. Also it's missing the svc >>> and machine check. >> >> Right. Is there a particular reason why we only run DAT on for PGM ints? > > a fixup handler for PGM it might need to run with DAT on (e.g. to > access data that is not identity mapped), whereas for other interrupts > it's not needed (at least not yet ;) ) At the time where the mmu code was written, the other handlers were very basic or a direct abort (like the IO IRQ that was introduced by Pierre). But I'd rather have them all behave the same so they are at least consistent. If we don't introduce a location where we load the PSW from, then add a function that sets the masks for all IRQs and also convert the mmu enablement to use it. Something to the likes of: void irq_new_set_mask(bool dat, uint8_t as) { loop over psws { - Remove dat and as bits from new PSW - Or in the new dat + as bits } }
Quoting Claudio Imbrenda (2023-03-29 15:00:32) > On Wed, 29 Mar 2023 14:50:50 +0200 > Nico Boehr <nrb@linux.ibm.com> wrote: > > > Quoting Janosch Frank (2023-03-28 16:13:04) > > > On 3/27/23 10:21, Nico Boehr wrote: > > > > This is to prepare for running guests without MSO/MSL, which is > > > > currently not possible. > > > > > > > > We already have code in sie64a to setup a guest primary ASCE before > > > > entering SIE, so we can in theory switch to the page tables which > > > > translate gpa to hpa. > > > > > > > > But the host is running in primary space mode already, so changing the > > > > primary ASCE before entering SIE will also affect the host's code and > > > > data. > > > > > > > > To make this switch useful, the host should run in a different address > > > > space mode. Hence, set up and change to home address space mode before > > > > installing the guest ASCE. > > > > > > > > The home space ASCE is just copied over from the primary space ASCE, so > > > > no functional change is intended, also for tests that want to use > > > > MSO/MSL. If a test intends to use a different primary space ASCE, it can > > > > now just set the guest.asce in the save_area. > > > > > > > [...] > > > > + /* set up home address space to match primary space */ > > > > + old_cr13 = stctg(13); > > > > + lctlg(13, stctg(1)); > > > > + > > > > + /* switch to home space so guest tables can be different from host */ > > > > + psw_mask_set_bits(PSW_MASK_HOME); > > > > + > > > > + /* also handle all interruptions in home space while in SIE */ > > > > + lowcore.pgm_new_psw.mask |= PSW_MASK_DAT_HOME; > > > > > > > + lowcore.ext_new_psw.mask |= PSW_MASK_DAT_HOME; > > > > + lowcore.io_new_psw.mask |= PSW_MASK_DAT_HOME; > > > We didn't enable DAT in these two cases as far as I can see so this is > > > superfluous or we should change the mmu code. Also it's missing the svc > > > and machine check. > > > > Right. Is there a particular reason why we only run DAT on for PGM ints? > > a fixup handler for PGM it might need to run with DAT on (e.g. to > access data that is not identity mapped), whereas for other interrupts > it's not needed (at least not yet ;) ) Makes sense. Since one can register a cleanup for io and ext, too, I think we need to fix the mmu init for these cases while at it. Will do that in v2.
diff --git a/lib/s390x/asm/arch_def.h b/lib/s390x/asm/arch_def.h index bb26e008cc68..dd95e27abf48 100644 --- a/lib/s390x/asm/arch_def.h +++ b/lib/s390x/asm/arch_def.h @@ -67,6 +67,8 @@ struct cpu { #define AS_HOME 3 #define PSW_MASK_DAT 0x0400000000000000UL +#define PSW_MASK_DAT_HOME 0x0400C00000000000UL +#define PSW_MASK_HOME 0x0000C00000000000UL #define PSW_MASK_IO 0x0200000000000000UL #define PSW_MASK_EXT 0x0100000000000000UL #define PSW_MASK_KEY 0x00F0000000000000UL diff --git a/lib/s390x/sie.c b/lib/s390x/sie.c index 9241b4b4a512..22141ded1a90 100644 --- a/lib/s390x/sie.c +++ b/lib/s390x/sie.c @@ -46,6 +46,8 @@ void sie_handle_validity(struct vm *vm) void sie(struct vm *vm) { + uint64_t old_cr13; + if (vm->sblk->sdf == 2) memcpy(vm->sblk->pv_grregs, vm->save_area.guest.grs, sizeof(vm->save_area.guest.grs)); @@ -53,6 +55,19 @@ void sie(struct vm *vm) /* Reset icptcode so we don't trip over it below */ vm->sblk->icptcode = 0; + /* set up home address space to match primary space */ + old_cr13 = stctg(13); + lctlg(13, stctg(1)); + + /* switch to home space so guest tables can be different from host */ + psw_mask_set_bits(PSW_MASK_HOME); + + /* also handle all interruptions in home space while in SIE */ + lowcore.pgm_new_psw.mask |= PSW_MASK_DAT_HOME; + lowcore.ext_new_psw.mask |= PSW_MASK_DAT_HOME; + lowcore.io_new_psw.mask |= PSW_MASK_DAT_HOME; + mb(); + while (vm->sblk->icptcode == 0) { sie64a(vm->sblk, &vm->save_area); sie_handle_validity(vm); @@ -60,6 +75,17 @@ void sie(struct vm *vm) vm->save_area.guest.grs[14] = vm->sblk->gg14; vm->save_area.guest.grs[15] = vm->sblk->gg15; + lowcore.pgm_new_psw.mask &= ~PSW_MASK_HOME; + lowcore.ext_new_psw.mask &= ~PSW_MASK_HOME; + lowcore.io_new_psw.mask &= ~PSW_MASK_HOME; + mb(); + + psw_mask_clear_bits(PSW_MASK_HOME); + + /* restore the old CR 13 */ + lctlg(13, old_cr13); + + if (vm->sblk->sdf == 2) memcpy(vm->save_area.guest.grs, vm->sblk->pv_grregs, sizeof(vm->save_area.guest.grs)); diff --git a/lib/s390x/sie.h b/lib/s390x/sie.h index 147cb0f2a556..0b00fb709776 100644 --- a/lib/s390x/sie.h +++ b/lib/s390x/sie.h @@ -284,5 +284,6 @@ void sie_handle_validity(struct vm *vm); void sie_guest_sca_create(struct vm *vm); void sie_guest_create(struct vm *vm, uint64_t guest_mem, uint64_t guest_mem_len); void sie_guest_destroy(struct vm *vm); +bool sie_had_pgm_int(struct vm *vm); #endif /* _S390X_SIE_H_ */
This is to prepare for running guests without MSO/MSL, which is currently not possible. We already have code in sie64a to setup a guest primary ASCE before entering SIE, so we can in theory switch to the page tables which translate gpa to hpa. But the host is running in primary space mode already, so changing the primary ASCE before entering SIE will also affect the host's code and data. To make this switch useful, the host should run in a different address space mode. Hence, set up and change to home address space mode before installing the guest ASCE. The home space ASCE is just copied over from the primary space ASCE, so no functional change is intended, also for tests that want to use MSO/MSL. If a test intends to use a different primary space ASCE, it can now just set the guest.asce in the save_area. Signed-off-by: Nico Boehr <nrb@linux.ibm.com> --- lib/s390x/asm/arch_def.h | 2 ++ lib/s390x/sie.c | 26 ++++++++++++++++++++++++++ lib/s390x/sie.h | 1 + 3 files changed, 29 insertions(+)