Message ID | 20220322202825.418232-1-sstabellini@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Series | [v2] xen/arm: set CPSR Z bit when creating aarch32 guests |
On 22.03.2022 21:28, Stefano Stabellini wrote:
> From: Stefano Stabellini <stefano.stabellini@xilinx.com>
>
> The first 32 bytes of zImage are NOPs. When CONFIG_EFI is enabled in the
> kernel, certain versions of Linux will use an UNPREDICTABLE NOP
> encoding, sometimes resulting in an unbootable kernel. Whether the
> resulting kernel is bootable or not depends on the processor. See commit
> a92882a4d270 in the Linux kernel for all the details.

Is this a problem only under Xen or also on bare hardware? In the latter
case I'd be even more inclined to require this issue to be dealt with in
the kernels, rather than working around it by an ABI change in Xen.

> All kernel releases starting from Linux 4.9 without commit a92882a4d270
> are affected.
>
> Fortunately there is a simple workaround: setting the "Z" bit in CPSR
> makes it so those invalid NOP instructions are never executed. That is
> because the instruction is conditional (not equal). So, on QEMU at
> least, the instruction will end up being ignored and not generate an
> exception. Setting the "Z" bit makes those kernel versions bootable
> again and it is harmless in the other cases.

I'm afraid such an ABI change being harmless needs to be not just
claimed, but proven. There could certainly be reasons this is safe, e.g.
the same path being taken on bare hardware, and the state of the bit not
being specified there. Yet even in the presence of such a specification
it cannot be excluded that non-standard (something XTF-like, for example)
uses might have grown a dependency on the Xen ABI specification.

Jan
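To make the mechanism in the quoted commit message concrete: an A32 instruction carries its condition code in bits [31:28], and an NE ("not equal") instruction only executes when the CPSR Z flag is clear. The sketch below models that check in plain C. It is an illustration only, not code from Xen or Linux; `insn_cond` and `cond_passes` are hypothetical helper names introduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPSR condition flags (architectural bit positions). */
#define PSR_N (UINT32_C(1) << 31)
#define PSR_Z (UINT32_C(1) << 30)
#define PSR_C (UINT32_C(1) << 29)
#define PSR_V (UINT32_C(1) << 28)

/* A few named condition codes; the rest are handled by value below. */
enum { COND_EQ = 0x0, COND_NE = 0x1, COND_AL = 0xE };

/* Hypothetical helper: extract the condition field of an A32 instruction. */
static inline uint32_t insn_cond(uint32_t insn)
{
    return insn >> 28;
}

/* Hypothetical helper: would an instruction with this condition execute? */
static bool cond_passes(uint32_t cond, uint32_t cpsr)
{
    bool n = (cpsr & PSR_N) != 0;
    bool z = (cpsr & PSR_Z) != 0;
    bool c = (cpsr & PSR_C) != 0;
    bool v = (cpsr & PSR_V) != 0;

    switch (cond & 0xF) {
    case 0x0: return z;             /* EQ */
    case 0x1: return !z;            /* NE */
    case 0x2: return c;             /* CS */
    case 0x3: return !c;            /* CC */
    case 0x4: return n;             /* MI */
    case 0x5: return !n;            /* PL */
    case 0x6: return v;             /* VS */
    case 0x7: return !v;            /* VC */
    case 0x8: return c && !z;       /* HI */
    case 0x9: return !c || z;       /* LS */
    case 0xA: return n == v;        /* GE */
    case 0xB: return n != v;        /* LT */
    case 0xC: return !z && n == v;  /* GT */
    case 0xD: return z || n != v;   /* LE */
    default:  return true;          /* AL (0xE) and the unconditional space (0xF) */
    }
}
```

With this model, `cond_passes(COND_NE, PSR_Z)` is false, so an NE-conditional instruction is skipped when the guest starts with Z set, whereas `cond_passes(COND_NE, 0)` is true and the instruction would be executed.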
Hi Stefano,

> On 22 Mar 2022, at 21:28, Stefano Stabellini <sstabellini@kernel.org> wrote:
>
> From: Stefano Stabellini <stefano.stabellini@xilinx.com>
>
> The first 32 bytes of zImage are NOPs. When CONFIG_EFI is enabled in the
> kernel, certain versions of Linux will use an UNPREDICTABLE NOP
> encoding, sometimes resulting in an unbootable kernel. Whether the
> resulting kernel is bootable or not depends on the processor. See commit
> a92882a4d270 in the Linux kernel for all the details.
>
> All kernel releases starting from Linux 4.9 without commit a92882a4d270
> are affected.

Can you confirm whether those kernels are also affected when started natively?

>
> Fortunately there is a simple workaround: setting the "Z" bit in CPSR
> makes it so those invalid NOP instructions are never executed. That is
> because the instruction is conditional (not equal). So, on QEMU at
> least, the instruction will end up being ignored and not generate an
> exception. Setting the "Z" bit makes those kernel versions bootable
> again and it is harmless in the other cases.

I agree with Jan here. This will never be set or should not be expected
to be set by anyone when started.
It feels to me that we are introducing a hack for a temporary issue in
Linux which will make us deviate from the behaviour that could be
expected on native hardware.

Could you give more details on how blocking this is?
Is the kernel update with the fix available on any of the affected distributions?

Depending on the answers I think we could for example have a config around
this to flag it as a workaround for a specific guest issue so that this is only
activated when needed.

Cheers
Bertrand

>
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> ---
> Changes in v2:
> - improve commit message
> - add in-code comment
> - move PSR_Z to the beginning
> ---
>  xen/include/public/arch-arm.h | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
> index 94b31511dd..81cee95f14 100644
> --- a/xen/include/public/arch-arm.h
> +++ b/xen/include/public/arch-arm.h
> @@ -361,6 +361,7 @@ typedef uint64_t xen_callback_t;
>  #define PSR_DBG_MASK (1<<9) /* arm64: Debug Exception mask */
>  #define PSR_IT_MASK (0x0600fc00) /* Thumb If-Then Mask */
>  #define PSR_JAZELLE (1<<24) /* Jazelle Mode */
> +#define PSR_Z (1<<30) /* Zero condition flag */
>
>  /* 32 bit modes */
>  #define PSR_MODE_USR 0x10
> @@ -383,7 +384,12 @@ typedef uint64_t xen_callback_t;
>  #define PSR_MODE_EL1t 0x04
>  #define PSR_MODE_EL0t 0x00
>
> -#define PSR_GUEST32_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
> +/*
> + * We set PSR_Z to be able to boot Linux kernel versions with an invalid
> + * encoding of the first 8 NOP instructions. See commit a92882a4d270 in
> + * Linux.
> + */
> +#define PSR_GUEST32_INIT (PSR_Z|PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
>  #define PSR_GUEST64_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_EL1h)
>
>  #define SCTLR_GUEST_INIT xen_mk_ullong(0x00c50078)
> --
> 2.25.1
>
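One way to picture the "config around this" idea is a switch that decides whether Z is part of the initial CPSR. The sketch below is purely hypothetical: the thread does not say whether such a switch would be a build-time option or a per-guest setting, and `guest32_initial_cpsr` and its `zimage_nop_workaround` parameter are names invented for illustration. The mask and mode values are the architectural CPSR bit positions, restated here so the fragment is self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CPSR bits, restated locally for this sketch. */
#define PSR_MODE_SVC  UINT32_C(0x13)
#define PSR_FIQ_MASK  (UINT32_C(1) << 6)
#define PSR_IRQ_MASK  (UINT32_C(1) << 7)
#define PSR_ABT_MASK  (UINT32_C(1) << 8)
#define PSR_Z         (UINT32_C(1) << 30)

/*
 * Hypothetical helper: build the initial CPSR for a 32-bit guest and
 * set Z only when the workaround has been requested.
 */
static uint32_t guest32_initial_cpsr(bool zimage_nop_workaround)
{
    uint32_t cpsr = PSR_ABT_MASK | PSR_FIQ_MASK | PSR_IRQ_MASK | PSR_MODE_SVC;

    if (zimage_nop_workaround)
        cpsr |= PSR_Z;

    return cpsr;
}
```

The trade-off is that a guest only boots when someone remembers to turn the switch on, whereas the patch as posted sets Z unconditionally.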
On 22/03/2022 20:28, Stefano Stabellini wrote:
> From: Stefano Stabellini <stefano.stabellini@xilinx.com>
>
> The first 32 bytes of zImage are NOPs. When CONFIG_EFI is enabled in the
> kernel, certain versions of Linux will use an UNPREDICTABLE NOP
> encoding, sometimes resulting in an unbootable kernel. Whether the
> resulting kernel is bootable or not depends on the processor. See commit
> a92882a4d270 in the Linux kernel for all the details.
>
> All kernel releases starting from Linux 4.9 without commit a92882a4d270
> are affected.
>
> Fortunately there is a simple workaround: setting the "Z" bit in CPSR
> makes it so those invalid NOP instructions are never executed. That is
> because the instruction is conditional (not equal). So, on QEMU at
> least, the instruction will end up being ignored and not generate an
> exception. Setting the "Z" bit makes those kernel versions bootable
> again and it is harmless in the other cases.
>
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>

A discussion relevant to this came up with XTF/ARM.

There is not currently a written ABI for the start state of vCPUs, and
there needs to be. I know x86 is in poor shape too, but we do at
least have some scraps of docs littered around and a plan to write some
proper Sphinx docs.

(A separate conversation was about booting from plain ELF files. Linux
ARM zImage is entirely undocumented for 32bit, and discussions with RMK
suggest that we've got bugs, while 64bit has insufficient documentation
to demonstrate that our logic is correct.)

In particular...

> ---
> Changes in v2:
> - improve commit message
> - add in-code comment
> - move PSR_Z to the beginning
> ---
>  xen/include/public/arch-arm.h | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
> index 94b31511dd..81cee95f14 100644
> --- a/xen/include/public/arch-arm.h
> +++ b/xen/include/public/arch-arm.h
> @@ -361,6 +361,7 @@ typedef uint64_t xen_callback_t;
>  #define PSR_DBG_MASK (1<<9) /* arm64: Debug Exception mask */
>  #define PSR_IT_MASK (0x0600fc00) /* Thumb If-Then Mask */
>  #define PSR_JAZELLE (1<<24) /* Jazelle Mode */
> +#define PSR_Z (1<<30) /* Zero condition flag */
>
>  /* 32 bit modes */
>  #define PSR_MODE_USR 0x10
> @@ -383,7 +384,12 @@ typedef uint64_t xen_callback_t;
>  #define PSR_MODE_EL1t 0x04
>  #define PSR_MODE_EL0t 0x00
>
> -#define PSR_GUEST32_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
> +/*
> + * We set PSR_Z to be able to boot Linux kernel versions with an invalid
> + * encoding of the first 8 NOP instructions. See commit a92882a4d270 in
> + * Linux.
> + */
> +#define PSR_GUEST32_INIT (PSR_Z|PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)

... this change is either breaking the ABI, or demonstrates that these
values must not be in a public header file to begin with.

~Andrew
Hi Andrew,

On 23/03/2022 12:36, Andrew Cooper wrote:
> On 22/03/2022 20:28, Stefano Stabellini wrote:
>> From: Stefano Stabellini <stefano.stabellini@xilinx.com>
>>
>> The first 32 bytes of zImage are NOPs. When CONFIG_EFI is enabled in the
>> kernel, certain versions of Linux will use an UNPREDICTABLE NOP
>> encoding, sometimes resulting in an unbootable kernel. Whether the
>> resulting kernel is bootable or not depends on the processor. See commit
>> a92882a4d270 in the Linux kernel for all the details.
>>
>> All kernel releases starting from Linux 4.9 without commit a92882a4d270
>> are affected.
>>
>> Fortunately there is a simple workaround: setting the "Z" bit in CPSR
>> makes it so those invalid NOP instructions are never executed. That is
>> because the instruction is conditional (not equal). So, on QEMU at
>> least, the instruction will end up being ignored and not generate an
>> exception. Setting the "Z" bit makes those kernel versions bootable
>> again and it is harmless in the other cases.
>>
>> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
>
> A discussion relevant to this came up with XTF/ARM.
>
> There is not currently a written ABI for the start state of vCPUs, and
> there needs to be. I know x86 is in poor shape too, but we do at
> least have some scraps of docs littered around and a plan to write some
> proper Sphinx docs.
>
> (A separate conversation was about booting from plain ELF files. Linux
> ARM zImage is entirely undocumented for 32bit, and discussions with RMK
> suggest that we've got bugs

Do you mind providing more details on what the bugs would be here?

> , while 64bit has insufficient documentation
> to demonstrate that our logic is correct.)

Did you actually read
https://github.com/torvalds/linux/blob/master/Documentation/arm64/booting.rst?

>
> In particular...
>
>> ---
>> Changes in v2:
>> - improve commit message
>> - add in-code comment
>> - move PSR_Z to the beginning
>> ---
>>  xen/include/public/arch-arm.h | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
>> index 94b31511dd..81cee95f14 100644
>> --- a/xen/include/public/arch-arm.h
>> +++ b/xen/include/public/arch-arm.h
>> @@ -361,6 +361,7 @@ typedef uint64_t xen_callback_t;
>>  #define PSR_DBG_MASK (1<<9) /* arm64: Debug Exception mask */
>>  #define PSR_IT_MASK (0x0600fc00) /* Thumb If-Then Mask */
>>  #define PSR_JAZELLE (1<<24) /* Jazelle Mode */
>> +#define PSR_Z (1<<30) /* Zero condition flag */
>>
>>  /* 32 bit modes */
>>  #define PSR_MODE_USR 0x10
>> @@ -383,7 +384,12 @@ typedef uint64_t xen_callback_t;
>>  #define PSR_MODE_EL1t 0x04
>>  #define PSR_MODE_EL0t 0x00
>>
>> -#define PSR_GUEST32_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
>> +/*
>> + * We set PSR_Z to be able to boot Linux kernel versions with an invalid
>> + * encoding of the first 8 NOP instructions. See commit a92882a4d270 in
>> + * Linux.
>> + */
>> +#define PSR_GUEST32_INIT (PSR_Z|PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
>
> ... this change is either breaking the ABI, or demonstrates that these
> values must not be in a public header file to begin with.

PSR_GUEST32_INIT is only exposed to the toolstack (see the ifdef above).
It is defined in arch-arm.h because that makes it easier to keep the
value in sync.

This is not part of the ABI and therefore we are free to change the
value in any way we want.

Setting Z is a convenient way to handle the Linux issue without making
it too invasive.

Cheers,
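The point about the ifdef can be illustrated with a small sketch. It assumes the guard in question follows the usual `__XEN__` / `__XEN_TOOLS__` pattern of Xen public headers (the guard itself is not visible in the quoted hunk), and it simulates a toolstack build by defining `__XEN_TOOLS__` in the source, which a real build would normally do on the compiler command line. `guest32_init_visible` is a hypothetical helper that exists only for this demonstration.

```c
#include <stdint.h>

/* Simulate a toolstack build; guest code would define neither macro. */
#define __XEN_TOOLS__ 1

/* Assumed guard: only the hypervisor and the toolstack see the value. */
#if defined(__XEN__) || defined(__XEN_TOOLS__)
#define PSR_MODE_SVC     UINT32_C(0x13)
#define PSR_FIQ_MASK     (UINT32_C(1) << 6)
#define PSR_IRQ_MASK     (UINT32_C(1) << 7)
#define PSR_ABT_MASK     (UINT32_C(1) << 8)
#define PSR_Z            (UINT32_C(1) << 30)
#define PSR_GUEST32_INIT (PSR_Z | PSR_ABT_MASK | PSR_FIQ_MASK | \
                          PSR_IRQ_MASK | PSR_MODE_SVC)
#endif

/* Hypothetical helper: report whether this build can see the macro. */
static int guest32_init_visible(void)
{
#ifdef PSR_GUEST32_INIT
    return 1;
#else
    return 0;
#endif
}
```

Removing the `#define __XEN_TOOLS__ 1` line makes `guest32_init_visible()` return 0, which is the situation a guest kernel including the header would be in.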
On Wed, 23 Mar 2022, Bertrand Marquis wrote:
> > On 22 Mar 2022, at 21:28, Stefano Stabellini <sstabellini@kernel.org> wrote:
> >
> > From: Stefano Stabellini <stefano.stabellini@xilinx.com>
> >
> > The first 32 bytes of zImage are NOPs. When CONFIG_EFI is enabled in the
> > kernel, certain versions of Linux will use an UNPREDICTABLE NOP
> > encoding, sometimes resulting in an unbootable kernel. Whether the
> > resulting kernel is bootable or not depends on the processor. See commit
> > a92882a4d270 in the Linux kernel for all the details.
> >
> > All kernel releases starting from Linux 4.9 without commit a92882a4d270
> > are affected.
>
> Can you confirm whether those kernels are also affected when started natively?

Theoretically yes, but in practice only booting on Xen is affected
because:

- the issue cannot happen when booting from u-boot because u-boot sets
  the "Z" bit
- the issue cannot happen when booting with QEMU -kernel because it also
  sets "Z"
- older bootloaders on native skip the first 32 bytes of the start
  address, which also masks this problem

Thus, in practice, I have no idea how one could reproduce the problem on
native.

This info is in the commit message of a92882a4d270 on Linux and in-code
comments in the kernel.

> > Fortunately there is a simple workaround: setting the "Z" bit in CPSR
> > makes it so those invalid NOP instructions are never executed. That is
> > because the instruction is conditional (not equal). So, on QEMU at
> > least, the instruction will end up being ignored and not generate an
> > exception. Setting the "Z" bit makes those kernel versions bootable
> > again and it is harmless in the other cases.
>
> I agree with Jan here. This will never be set or should not be expected
> to be set by anyone when started.
> It feels to me that we are introducing a hack for a temporary issue in
> Linux which will make us deviate from the behaviour that could be
> expected on native hardware.
>
> Could you give more details on how blocking this is?

Without this change, none of the Debian arm32 kernels boot on Xen after
Jessie (on QEMU).

> Is the kernel update with the fix available on any of the affected distributions?

None that I could find. I tried Debian Buster, Debian Bullseye, Debian
testing and the latest Alpine Linux. Happy to try more if you give me a
download link or two.

> Depending on the answers I think we could for example have a config around
> this to flag it as a workaround for a specific guest issue so that this is only
> activated when needed.

Also note that this alternative workaround also solves the problem,
however it has other drawbacks as Julien described:
[1] https://marc.info/?l=xen-devel&m=164774063802402

My take on this is the following. PSR_GUEST32_INIT is not part of the
ABI so this cannot be considered an ABI change.

But in any case, given that without this change (or another change [1])
most of the kernels out there don't work, is there a point in discussing
ABI breakages? Basically nothing works right now :-D

I think it makes sense to think about whether this change could cause a
kernel that used to boot not to boot anymore. However, I don't think it
is possible because:

- we only support zImage on arm32 and "Z" works well with it
- both u-boot and qemu -kernel set "Z" so we would already know if
  something broke

> > Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> > ---
> > Changes in v2:
> > - improve commit message
> > - add in-code comment
> > - move PSR_Z to the beginning
> > ---
> >  xen/include/public/arch-arm.h | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
> > index 94b31511dd..81cee95f14 100644
> > --- a/xen/include/public/arch-arm.h
> > +++ b/xen/include/public/arch-arm.h
> > @@ -361,6 +361,7 @@ typedef uint64_t xen_callback_t;
> >  #define PSR_DBG_MASK (1<<9) /* arm64: Debug Exception mask */
> >  #define PSR_IT_MASK (0x0600fc00) /* Thumb If-Then Mask */
> >  #define PSR_JAZELLE (1<<24) /* Jazelle Mode */
> > +#define PSR_Z (1<<30) /* Zero condition flag */
> >
> >  /* 32 bit modes */
> >  #define PSR_MODE_USR 0x10
> > @@ -383,7 +384,12 @@ typedef uint64_t xen_callback_t;
> >  #define PSR_MODE_EL1t 0x04
> >  #define PSR_MODE_EL0t 0x00
> >
> > -#define PSR_GUEST32_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
> > +/*
> > + * We set PSR_Z to be able to boot Linux kernel versions with an invalid
> > + * encoding of the first 8 NOP instructions. See commit a92882a4d270 in
> > + * Linux.
> > + */
> > +#define PSR_GUEST32_INIT (PSR_Z|PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
> >  #define PSR_GUEST64_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_EL1h)
> >
> >  #define SCTLR_GUEST_INIT xen_mk_ullong(0x00c50078)
> > --
> > 2.25.1
> >
>
Hi Stefano,

Thanks a lot for the detailed answers.

> On 24 Mar 2022, at 03:05, Stefano Stabellini <sstabellini@kernel.org> wrote:
>
> On Wed, 23 Mar 2022, Bertrand Marquis wrote:
>>> On 22 Mar 2022, at 21:28, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>>
>>> From: Stefano Stabellini <stefano.stabellini@xilinx.com>
>>>
>>> The first 32 bytes of zImage are NOPs. When CONFIG_EFI is enabled in the
>>> kernel, certain versions of Linux will use an UNPREDICTABLE NOP
>>> encoding, sometimes resulting in an unbootable kernel. Whether the
>>> resulting kernel is bootable or not depends on the processor. See commit
>>> a92882a4d270 in the Linux kernel for all the details.
>>>
>>> All kernel releases starting from Linux 4.9 without commit a92882a4d270
>>> are affected.
>>
>> Can you confirm whether those kernels are also affected when started natively?
>
> Theoretically yes, but in practice only booting on Xen is affected
> because:
>
> - the issue cannot happen when booting from u-boot because u-boot sets
>   the "Z" bit
> - the issue cannot happen when booting with QEMU -kernel because it also
>   sets "Z"
> - older bootloaders on native skip the first 32 bytes of the start
>   address, which also masks this problem
>
> Thus, in practice, I have no idea how one could reproduce the problem on
> native.
>
> This info is in the commit message of a92882a4d270 on Linux and in-code
> comments in the kernel.

If u-boot sets it, we can consider that we have behaviour equivalent
to a native boot.

Could you add something to the comment in your patch to state that the
Z flag is also set by u-boot?

This will help in the future to remember why it is OK to have that, as
the current comment could make one think that this is something only
done by Xen.

>
>
>>> Fortunately there is a simple workaround: setting the "Z" bit in CPSR
>>> makes it so those invalid NOP instructions are never executed. That is
>>> because the instruction is conditional (not equal). So, on QEMU at
>>> least, the instruction will end up being ignored and not generate an
>>> exception. Setting the "Z" bit makes those kernel versions bootable
>>> again and it is harmless in the other cases.
>>
>> I agree with Jan here. This will never be set or should not be expected
>> to be set by anyone when started.
>> It feels to me that we are introducing a hack for a temporary issue in
>> Linux which will make us deviate from the behaviour that could be
>> expected on native hardware.
>>
>> Could you give more details on how blocking this is?
>
> Without this change, none of the Debian arm32 kernels boot on Xen after
> Jessie (on QEMU).

Ok

>
>
>> Is the kernel update with the fix available on any of the affected distributions?
>
> None that I could find. I tried Debian Buster, Debian Bullseye, Debian
> testing and the latest Alpine Linux. Happy to try more if you give me a
> download link or two.

I think the list is long enough to justify the change.

>
>
>> Depending on the answers I think we could for example have a config around
>> this to flag it as a workaround for a specific guest issue so that this is only
>> activated when needed.
>
> Also note that this alternative workaround also solves the problem,
> however it has other drawbacks as Julien described:
> [1] https://marc.info/?l=xen-devel&m=164774063802402

Definitely setting the Z bit is better I think.

>
>
> My take on this is the following. PSR_GUEST32_INIT is not part of the
> ABI so this cannot be considered an ABI change.
>
> But in any case, given that without this change (or another change [1])
> most of the kernels out there don't work, is there a point in discussing
> ABI breakages? Basically nothing works right now :-D
>
> I think it makes sense to think about whether this change could cause a
> kernel that used to boot not to boot anymore. However, I don't think it
> is possible because:
>
> - we only support zImage on arm32 and "Z" works well with it
> - both u-boot and qemu -kernel set "Z" so we would already know if
>   something broke
>

Agreed, so please add that both in the comment and in the commit message.

Cheers
Bertrand

>
>
>>> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
>>> ---
>>> Changes in v2:
>>> - improve commit message
>>> - add in-code comment
>>> - move PSR_Z to the beginning
>>> ---
>>>  xen/include/public/arch-arm.h | 8 +++++++-
>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
>>> index 94b31511dd..81cee95f14 100644
>>> --- a/xen/include/public/arch-arm.h
>>> +++ b/xen/include/public/arch-arm.h
>>> @@ -361,6 +361,7 @@ typedef uint64_t xen_callback_t;
>>>  #define PSR_DBG_MASK (1<<9) /* arm64: Debug Exception mask */
>>>  #define PSR_IT_MASK (0x0600fc00) /* Thumb If-Then Mask */
>>>  #define PSR_JAZELLE (1<<24) /* Jazelle Mode */
>>> +#define PSR_Z (1<<30) /* Zero condition flag */
>>>
>>>  /* 32 bit modes */
>>>  #define PSR_MODE_USR 0x10
>>> @@ -383,7 +384,12 @@ typedef uint64_t xen_callback_t;
>>>  #define PSR_MODE_EL1t 0x04
>>>  #define PSR_MODE_EL0t 0x00
>>>
>>> -#define PSR_GUEST32_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
>>> +/*
>>> + * We set PSR_Z to be able to boot Linux kernel versions with an invalid
>>> + * encoding of the first 8 NOP instructions. See commit a92882a4d270 in
>>> + * Linux.
>>> + */
>>> +#define PSR_GUEST32_INIT (PSR_Z|PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
>>>  #define PSR_GUEST64_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_EL1h)
>>>
>>>  #define SCTLR_GUEST_INIT xen_mk_ullong(0x00c50078)
>>> --
>>> 2.25.1
>>>
>>
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 94b31511dd..81cee95f14 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -361,6 +361,7 @@ typedef uint64_t xen_callback_t;
 #define PSR_DBG_MASK (1<<9) /* arm64: Debug Exception mask */
 #define PSR_IT_MASK (0x0600fc00) /* Thumb If-Then Mask */
 #define PSR_JAZELLE (1<<24) /* Jazelle Mode */
+#define PSR_Z (1<<30) /* Zero condition flag */
 
 /* 32 bit modes */
 #define PSR_MODE_USR 0x10
@@ -383,7 +384,12 @@ typedef uint64_t xen_callback_t;
 #define PSR_MODE_EL1t 0x04
 #define PSR_MODE_EL0t 0x00
 
-#define PSR_GUEST32_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
+/*
+ * We set PSR_Z to be able to boot Linux kernel versions with an invalid
+ * encoding of the first 8 NOP instructions. See commit a92882a4d270 in
+ * Linux.
+ */
+#define PSR_GUEST32_INIT (PSR_Z|PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
 #define PSR_GUEST64_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_EL1h)
 
 #define SCTLR_GUEST_INIT xen_mk_ullong(0x00c50078)
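For readers who want to see what the one-line change does to the actual register value, here is a small sketch that builds the old and new initial CPSR values and decodes them. The mask and mode values for FIQ, IRQ, ABT and SVC are not shown in the hunks above; they are taken from the architectural CPSR layout and restated locally. `PSR_GUEST32_INIT_OLD`, `PSR_GUEST32_INIT_NEW`, `psr_mode` and `psr_changed_bits` are names made up for this illustration and do not exist in the Xen tree.

```c
#include <stdint.h>

/* Architectural CPSR bits, restated locally for this sketch. */
#define PSR_MODE_MASK UINT32_C(0x1f)
#define PSR_MODE_SVC  UINT32_C(0x13)
#define PSR_FIQ_MASK  (UINT32_C(1) << 6)
#define PSR_IRQ_MASK  (UINT32_C(1) << 7)
#define PSR_ABT_MASK  (UINT32_C(1) << 8)
#define PSR_Z         (UINT32_C(1) << 30)

/* Value before the patch (illustrative name). */
#define PSR_GUEST32_INIT_OLD \
    (PSR_ABT_MASK | PSR_FIQ_MASK | PSR_IRQ_MASK | PSR_MODE_SVC)

/* Value after the patch (illustrative name). */
#define PSR_GUEST32_INIT_NEW (PSR_Z | PSR_GUEST32_INIT_OLD)

/* Hypothetical helper: extract the processor mode field. */
static uint32_t psr_mode(uint32_t psr)
{
    return psr & PSR_MODE_MASK;
}

/* Hypothetical helper: which bits differ between two PSR values? */
static uint32_t psr_changed_bits(uint32_t before, uint32_t after)
{
    return before ^ after;
}
```

Under these definitions the old value is 0x000001d3 and the new one is 0x400001d3: the mode stays SVC, the A, I and F masks stay set, and `psr_changed_bits` reports bit 30 (Z) as the only difference.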