Message ID | 20230109095948.2471205-1-ardb@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | efi: tpm: Avoid READ_ONCE() for accessing the event log | expand |
On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote: > Nathan reports that recent kernels built with LTO will crash when doing > EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a > misaligned load from the TPM event log, which is annotated with > READ_ONCE(), and under LTO, this gets translated into a LDAR instruction > which does not tolerate misaligned accesses. Interesting, that's a funny change in behaviour. READ_ONCE() of an unaligned address is pretty sketchy, but if this ends up tripping lots of folks up then I suppose we could use a plain load and a DMB LD as an alternative. It's likely to be more expensive in the LDAPR case, though. > Interestingly, this does not happen when booting the same kernel > straight from the UEFI shell, and so the fact that the event log may > appear misaligned in memory may be caused by a bug in GRUB or SHIM. > > However, using READ_ONCE() to access firmware tables is slightly unusual > in any case, and here, we only need to ensure that 'event' is not > dereferenced again after it gets unmapped, so a compiler barrier should > be sufficient, and works around the reported issue. > > Cc: <stable@vger.kernel.org> > Cc: Peter Jones <pjones@redhat.com> > Cc: Jarkko Sakkinen <jarkko@kernel.org> > Cc: Matthew Garrett <mjg59@srcf.ucam.org> > Reported-by: Nathan Chancellor <nathan@kernel.org> > Link: https://github.com/ClangBuiltLinux/linux/issues/1782 > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > --- > include/linux/tpm_eventlog.h | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h > index 20c0ff54b7a0d313..0abcc85904cba874 100644 > --- a/include/linux/tpm_eventlog.h > +++ b/include/linux/tpm_eventlog.h > @@ -198,8 +198,10 @@ static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev > * The loop below will unmap these fields if the log is larger than > * one page, so save them here for reference: > */ > - count = READ_ONCE(event->count); > - event_type = READ_ONCE(event->event_type); > + count = event->count; > + event_type = event->event_type; > + > + barrier(); It would be handy to have a comment here, but when I started thinking about what that would say, it occurred to me that the unmap operation should already have a barrier inside it due to the TLB invalidation, so I'm not sure why this is needed at all. Will
On Mon, 9 Jan 2023 at 16:11, Will Deacon <will@kernel.org> wrote: > > On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote: > > Nathan reports that recent kernels built with LTO will crash when doing > > EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a > > misaligned load from the TPM event log, which is annotated with > > READ_ONCE(), and under LTO, this gets translated into a LDAR instruction > > which does not tolerate misaligned accesses. > > Interesting, that's a funny change in behaviour. READ_ONCE() of an unaligned > address is pretty sketchy, but if this ends up tripping lots of folks up > then I suppose we could use a plain load and a DMB LD as an alternative. > It's likely to be more expensive in the LDAPR case, though. > Yeah, I am not suggesting that we change READ_ONCE(), but this case was definitely not taken into account at the time. > > Interestingly, this does not happen when booting the same kernel > > straight from the UEFI shell, and so the fact that the event log may > > appear misaligned in memory may be caused by a bug in GRUB or SHIM. > > > > However, using READ_ONCE() to access firmware tables is slightly unusual > > in any case, and here, we only need to ensure that 'event' is not > > dereferenced again after it gets unmapped, so a compiler barrier should > > be sufficient, and works around the reported issue. > > > > Cc: <stable@vger.kernel.org> > > Cc: Peter Jones <pjones@redhat.com> > > Cc: Jarkko Sakkinen <jarkko@kernel.org> > > Cc: Matthew Garrett <mjg59@srcf.ucam.org> > > Reported-by: Nathan Chancellor <nathan@kernel.org> > > Link: https://github.com/ClangBuiltLinux/linux/issues/1782 > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > --- > > include/linux/tpm_eventlog.h | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h > > index 20c0ff54b7a0d313..0abcc85904cba874 100644 > > --- a/include/linux/tpm_eventlog.h > > +++ b/include/linux/tpm_eventlog.h > > @@ -198,8 +198,10 @@ static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev > > * The loop below will unmap these fields if the log is larger than > > * one page, so save them here for reference: > > */ > > - count = READ_ONCE(event->count); > > - event_type = READ_ONCE(event->event_type); > > + count = event->count; > > + event_type = event->event_type; > > + > > + barrier(); > > It would be handy to have a comment here, but when I started thinking about > what that would say, it occurred to me that the unmap operation should > already have a barrier inside it due to the TLB invalidation, so I'm not > sure why this is needed at all. > This is purely to prevent the compiler from accessing count or event_type by reloading it from the event pointer, in case it runs out of registers. Perhaps this is unlikely to occur, given that the kernel uses -fno-strict-aliasing, and so any store occurring after this READ_ONCE() could potentially affect the result of accessing event->count or event->event_type.
On Mon, Jan 09, 2023 at 04:20:34PM +0100, Ard Biesheuvel wrote: > On Mon, 9 Jan 2023 at 16:11, Will Deacon <will@kernel.org> wrote: > > > > On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote: > > > Nathan reports that recent kernels built with LTO will crash when doing > > > EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a > > > misaligned load from the TPM event log, which is annotated with > > > READ_ONCE(), and under LTO, this gets translated into a LDAR instruction > > > which does not tolerate misaligned accesses. > > > > Interesting, that's a funny change in behaviour. READ_ONCE() of an unaligned > > address is pretty sketchy, but if this ends up tripping lots of folks up > > then I suppose we could use a plain load and a DMB LD as an alternative. > > It's likely to be more expensive in the LDAPR case, though. > > > > Yeah, I am not suggesting that we change READ_ONCE(), but this case > was definitely not taken into account at the time. Indeed, and it looks like the architecture added SCTLR_ELx.nAA to toggle this behaviour, although it was only added in 8.4 with FEAT_LSE2. > > > Interestingly, this does not happen when booting the same kernel > > > straight from the UEFI shell, and so the fact that the event log may > > > appear misaligned in memory may be caused by a bug in GRUB or SHIM. > > > > > > However, using READ_ONCE() to access firmware tables is slightly unusual > > > in any case, and here, we only need to ensure that 'event' is not > > > dereferenced again after it gets unmapped, so a compiler barrier should > > > be sufficient, and works around the reported issue. > > > > > > Cc: <stable@vger.kernel.org> > > > Cc: Peter Jones <pjones@redhat.com> > > > Cc: Jarkko Sakkinen <jarkko@kernel.org> > > > Cc: Matthew Garrett <mjg59@srcf.ucam.org> > > > Reported-by: Nathan Chancellor <nathan@kernel.org> > > > Link: https://github.com/ClangBuiltLinux/linux/issues/1782 > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > > --- > > > include/linux/tpm_eventlog.h | 6 ++++-- > > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > > > diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h > > > index 20c0ff54b7a0d313..0abcc85904cba874 100644 > > > --- a/include/linux/tpm_eventlog.h > > > +++ b/include/linux/tpm_eventlog.h > > > @@ -198,8 +198,10 @@ static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev > > > * The loop below will unmap these fields if the log is larger than > > > * one page, so save them here for reference: > > > */ > > > - count = READ_ONCE(event->count); > > > - event_type = READ_ONCE(event->event_type); > > > + count = event->count; > > > + event_type = event->event_type; > > > + > > > + barrier(); > > > > It would be handy to have a comment here, but when I started thinking about > > what that would say, it occurred to me that the unmap operation should > > already have a barrier inside it due to the TLB invalidation, so I'm not > > sure why this is needed at all. > > > > This is purely to prevent the compiler from accessing count or > event_type by reloading it from the event pointer, in case it runs out > of registers. But that reload would only be a problem if the event has been unmapped, no? Given that the unmapping code has a barrier() and the unmapped page is not explicitly referenced, then I don't see the issue. Will
On Mon, 9 Jan 2023 at 16:34, Will Deacon <will@kernel.org> wrote: > > On Mon, Jan 09, 2023 at 04:20:34PM +0100, Ard Biesheuvel wrote: > > On Mon, 9 Jan 2023 at 16:11, Will Deacon <will@kernel.org> wrote: > > > > > > On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote: > > > > Nathan reports that recent kernels built with LTO will crash when doing > > > > EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a > > > > misaligned load from the TPM event log, which is annotated with > > > > READ_ONCE(), and under LTO, this gets translated into a LDAR instruction > > > > which does not tolerate misaligned accesses. > > > > > > Interesting, that's a funny change in behaviour. READ_ONCE() of an unaligned > > > address is pretty sketchy, but if this ends up tripping lots of folks up > > > then I suppose we could use a plain load and a DMB LD as an alternative. > > > It's likely to be more expensive in the LDAPR case, though. > > > > > > > Yeah, I am not suggesting that we change READ_ONCE(), but this case > > was definitely not taken into account at the time. > > Indeed, and it looks like the architecture added SCTLR_ELx.nAA to toggle > this behaviour, although it was only added in 8.4 with FEAT_LSE2. > > > > > Interestingly, this does not happen when booting the same kernel > > > > straight from the UEFI shell, and so the fact that the event log may > > > > appear misaligned in memory may be caused by a bug in GRUB or SHIM. > > > > > > > > However, using READ_ONCE() to access firmware tables is slightly unusual > > > > in any case, and here, we only need to ensure that 'event' is not > > > > dereferenced again after it gets unmapped, so a compiler barrier should > > > > be sufficient, and works around the reported issue. > > > > > > > > Cc: <stable@vger.kernel.org> > > > > Cc: Peter Jones <pjones@redhat.com> > > > > Cc: Jarkko Sakkinen <jarkko@kernel.org> > > > > Cc: Matthew Garrett <mjg59@srcf.ucam.org> > > > > Reported-by: Nathan Chancellor <nathan@kernel.org> > > > > Link: https://github.com/ClangBuiltLinux/linux/issues/1782 > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > > > --- > > > > include/linux/tpm_eventlog.h | 6 ++++-- > > > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h > > > > index 20c0ff54b7a0d313..0abcc85904cba874 100644 > > > > --- a/include/linux/tpm_eventlog.h > > > > +++ b/include/linux/tpm_eventlog.h > > > > @@ -198,8 +198,10 @@ static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev > > > > * The loop below will unmap these fields if the log is larger than > > > > * one page, so save them here for reference: > > > > */ > > > > - count = READ_ONCE(event->count); > > > > - event_type = READ_ONCE(event->event_type); > > > > + count = event->count; > > > > + event_type = event->event_type; > > > > + > > > > + barrier(); > > > > > > It would be handy to have a comment here, but when I started thinking about > > > what that would say, it occurred to me that the unmap operation should > > > already have a barrier inside it due to the TLB invalidation, so I'm not > > > sure why this is needed at all. > > > > > > > This is purely to prevent the compiler from accessing count or > > event_type by reloading it from the event pointer, in case it runs out > > of registers. > > But that reload would only be a problem if the event has been unmapped, no? > Given that the unmapping code has a barrier() and the unmapped page is not > explicitly referenced, then I don't see the issue. > Fair point. Looking at the history, it was I who suggested the READ_ONCE() here in addition to the changes to lift the explicit event dereferences out of the loop, but it indeed seems unlikely that there is any way the compiler could decide that it can dereference event again to grab these quantities. https://lore.kernel.org/all/20190826153028.32639-1-pjones@redhat.com/T/#u So i'll drop the barrier() from this patch. Thanks, Ard.
On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote: > Nathan reports that recent kernels built with LTO will crash when doing > EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a > misaligned load from the TPM event log, which is annotated with > READ_ONCE(), and under LTO, this gets translated into a LDAR instruction > which does not tolerate misaligned accesses. > > Interestingly, this does not happen when booting the same kernel > straight from the UEFI shell, and so the fact that the event log may > appear misaligned in memory may be caused by a bug in GRUB or SHIM. > > However, using READ_ONCE() to access firmware tables is slightly unusual > in any case, and here, we only need to ensure that 'event' is not > dereferenced again after it gets unmapped, so a compiler barrier should > be sufficient, and works around the reported issue. > > Cc: <stable@vger.kernel.org> > Cc: Peter Jones <pjones@redhat.com> > Cc: Jarkko Sakkinen <jarkko@kernel.org> > Cc: Matthew Garrett <mjg59@srcf.ucam.org> > Reported-by: Nathan Chancellor <nathan@kernel.org> > Link: https://github.com/ClangBuiltLinux/linux/issues/1782 > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Based on the thread, I tested this patch without barrier() and my machine boots up just fine now with an LTO kernel. Thanks a lot for the analysis and fix! Tested-by: Nathan Chancellor <nathan@kernel.org> > --- > include/linux/tpm_eventlog.h | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h > index 20c0ff54b7a0d313..0abcc85904cba874 100644 > --- a/include/linux/tpm_eventlog.h > +++ b/include/linux/tpm_eventlog.h > @@ -198,8 +198,10 @@ static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev > * The loop below will unmap these fields if the log is larger than > * one page, so save them here for reference: > */ > - count = READ_ONCE(event->count); > - event_type = READ_ONCE(event->event_type); > + count = event->count; > + event_type = event->event_type; > + > + barrier(); > > /* Verify that it's the log header */ > if (event_header->pcr_idx != 0 || > -- > 2.39.0 >
On Mon, 9 Jan 2023 at 18:48, Nathan Chancellor <nathan@kernel.org> wrote: > > On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote: > > Nathan reports that recent kernels built with LTO will crash when doing > > EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a > > misaligned load from the TPM event log, which is annotated with > > READ_ONCE(), and under LTO, this gets translated into a LDAR instruction > > which does not tolerate misaligned accesses. > > > > Interestingly, this does not happen when booting the same kernel > > straight from the UEFI shell, and so the fact that the event log may > > appear misaligned in memory may be caused by a bug in GRUB or SHIM. > > > > However, using READ_ONCE() to access firmware tables is slightly unusual > > in any case, and here, we only need to ensure that 'event' is not > > dereferenced again after it gets unmapped, so a compiler barrier should > > be sufficient, and works around the reported issue. > > > > Cc: <stable@vger.kernel.org> > > Cc: Peter Jones <pjones@redhat.com> > > Cc: Jarkko Sakkinen <jarkko@kernel.org> > > Cc: Matthew Garrett <mjg59@srcf.ucam.org> > > Reported-by: Nathan Chancellor <nathan@kernel.org> > > Link: https://github.com/ClangBuiltLinux/linux/issues/1782 > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > Based on the thread, I tested this patch without barrier() and my > machine boots up just fine now with an LTO kernel. Thanks a lot for the > analysis and fix! > > Tested-by: Nathan Chancellor <nathan@kernel.org> > Thanks. I've queued this up as a EFI fix.
On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote: > Nathan reports that recent kernels built with LTO will crash when doing > EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a > misaligned load from the TPM event log, which is annotated with > READ_ONCE(), and under LTO, this gets translated into a LDAR instruction > which does not tolerate misaligned accesses. > > Interestingly, this does not happen when booting the same kernel > straight from the UEFI shell, and so the fact that the event log may > appear misaligned in memory may be caused by a bug in GRUB or SHIM. > > However, using READ_ONCE() to access firmware tables is slightly unusual > in any case, and here, we only need to ensure that 'event' is not > dereferenced again after it gets unmapped, so a compiler barrier should > be sufficient, and works around the reported issue. > > Cc: <stable@vger.kernel.org> > Cc: Peter Jones <pjones@redhat.com> > Cc: Jarkko Sakkinen <jarkko@kernel.org> > Cc: Matthew Garrett <mjg59@srcf.ucam.org> > Reported-by: Nathan Chancellor <nathan@kernel.org> > Link: https://github.com/ClangBuiltLinux/linux/issues/1782 > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > --- > include/linux/tpm_eventlog.h | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h > index 20c0ff54b7a0d313..0abcc85904cba874 100644 > --- a/include/linux/tpm_eventlog.h > +++ b/include/linux/tpm_eventlog.h > @@ -198,8 +198,10 @@ static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev > * The loop below will unmap these fields if the log is larger than > * one page, so save them here for reference: > */ > - count = READ_ONCE(event->count); > - event_type = READ_ONCE(event->event_type); > + count = event->count; > + event_type = event->event_type; > + > + barrier(); > > /* Verify that it's the log header */ > if (event_header->pcr_idx != 0 || > -- > 2.39.0 > Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> BR, Jarkko
diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h index 20c0ff54b7a0d313..0abcc85904cba874 100644 --- a/include/linux/tpm_eventlog.h +++ b/include/linux/tpm_eventlog.h @@ -198,8 +198,10 @@ static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev * The loop below will unmap these fields if the log is larger than * one page, so save them here for reference: */ - count = READ_ONCE(event->count); - event_type = READ_ONCE(event->event_type); + count = event->count; + event_type = event->event_type; + + barrier(); /* Verify that it's the log header */ if (event_header->pcr_idx != 0 ||
Nathan reports that recent kernels built with LTO will crash when doing EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a misaligned load from the TPM event log, which is annotated with READ_ONCE(), and under LTO, this gets translated into a LDAR instruction which does not tolerate misaligned accesses. Interestingly, this does not happen when booting the same kernel straight from the UEFI shell, and so the fact that the event log may appear misaligned in memory may be caused by a bug in GRUB or SHIM. However, using READ_ONCE() to access firmware tables is slightly unusual in any case, and here, we only need to ensure that 'event' is not dereferenced again after it gets unmapped, so a compiler barrier should be sufficient, and works around the reported issue. Cc: <stable@vger.kernel.org> Cc: Peter Jones <pjones@redhat.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Reported-by: Nathan Chancellor <nathan@kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/1782 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> --- include/linux/tpm_eventlog.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)