Message ID | 20200630081921.13443-1-ardb@kernel.org (mailing list archive) |
---|---|
State | Mainlined |
Commit | f7b93d42945cc71e1346dd5ae07c59061d56745e |
Headers | show |
Series | arm64/alternatives: use subsections for replacement sequences | expand |
On Tue, Jun 30, 2020 at 10:19:21AM +0200, Ard Biesheuvel wrote: > When building very large kernels, the logic that emits replacement > sequences for alternatives fails when relative branches are present > in the code that is emitted into the .altinstr_replacement section > and patched in at the original site and fixed up. The reason is that > the linker will insert veneers if relative branches go out of range, > and due to the relative distance of the .altinstr_replacement from > the .text section where its branch targets usually live, veneers > may be emitted at the end of the .altinstr_replacement section, with > the relative branches in the sequence pointed at the veneers instead > of the actual target. > > The alternatives patching logic will attempt to fix up the branch to > point to its original target, which will be the veneer in this case, > but given that the patch site is likely to be far away as well, it > will be out of range and so patching will fail. There are other cases > where these veneers are problematic, e.g., when the target of the > branch is in .text while the patch site is in .init.text, in which > case putting the replacement sequence inside .text may not help either. > > So let's use subsections to emit the replacement code as closely as > possible to the patch site, to ensure that veneers are only likely to > be emitted if they are required at the patch site as well, in which > case they will be in range for the replacement sequence both before > and after it is transported to the patch site. > > This will prevent alternative sequences in non-init code from being > released from memory after boot, but this is tolerable given that the > entire section is only 512 KB on an allyesconfig build (which weighs in > at 500+ MB for the entire Image). Also, note that modules today carry > the replacement sequences in non-init sections as well, and any of > those that target init code will be emitted into init sections after > this change. > > This fixes an early crash when booting an allyesconfig kernel on a > system where any of the alternatives sequences containing relative > branches are activated at boot (e.g., ARM64_HAS_PAN on TX2) > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > Cc: James Morse <james.morse@arm.com> > Cc: Andre Przywara <andre.przywara@arm.com> > Cc: Dave P Martin <dave.martin@arm.com> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > --- > arch/arm64/include/asm/alternative.h | 16 ++++++++-------- > arch/arm64/kernel/vmlinux.lds.S | 3 --- > 2 files changed, 8 insertions(+), 11 deletions(-) > > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h > index 5e5dc05d63a0..12f0eb56a1cc 100644 > --- a/arch/arm64/include/asm/alternative.h > +++ b/arch/arm64/include/asm/alternative.h > @@ -73,11 +73,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > ".pushsection .altinstructions,\"a\"\n" \ > ALTINSTR_ENTRY(feature) \ > ".popsection\n" \ > - ".pushsection .altinstr_replacement, \"a\"\n" \ > + ".subsection 1\n" \ This uses subsections in existing sections. Could that interfere with existing (or future) uses of subsections? (I've not checked whether there actually are such uses. I'm also assuming that clobbering the invoker's idea of what section is .previous doesn't matter.) Another wrinkle: the replacement code now becomes executable, whereas I think it was previously in rodata. I'm not sure how much this matters, but it might be a source of gadgets. A different option would be to add an explicitly veneered branch macro for use in alternatives, maybe adrp+add+br. For BTI compatility, we'd need a bti j or equivalent at the destination, which might or might not be easy to achieve -- mind you, I think we theoretically need that anyway for veneers to work properly in all cases. Because we would define the exact instruction sequence, the alternatives code could probably replace it with a direct branch if the actual destination is close enough. The downside is that it wouldn't be a single instruction any more, and there would be some overhead for conditional branches if we replace the unneeded insns with NOPs. [...] Cheers ---Dave
On Wed, 1 Jul 2020 at 19:01, Dave P Martin <dave.martin@arm.com> wrote: > > On Tue, Jun 30, 2020 at 10:19:21AM +0200, Ard Biesheuvel wrote: > > When building very large kernels, the logic that emits replacement > > sequences for alternatives fails when relative branches are present > > in the code that is emitted into the .altinstr_replacement section > > and patched in at the original site and fixed up. The reason is that > > the linker will insert veneers if relative branches go out of range, > > and due to the relative distance of the .altinstr_replacement from > > the .text section where its branch targets usually live, veneers > > may be emitted at the end of the .altinstr_replacement section, with > > the relative branches in the sequence pointed at the veneers instead > > of the actual target. > > > > The alternatives patching logic will attempt to fix up the branch to > > point to its original target, which will be the veneer in this case, > > but given that the patch site is likely to be far away as well, it > > will be out of range and so patching will fail. There are other cases > > where these veneers are problematic, e.g., when the target of the > > branch is in .text while the patch site is in .init.text, in which > > case putting the replacement sequence inside .text may not help either. > > > > So let's use subsections to emit the replacement code as closely as > > possible to the patch site, to ensure that veneers are only likely to > > be emitted if they are required at the patch site as well, in which > > case they will be in range for the replacement sequence both before > > and after it is transported to the patch site. > > > > This will prevent alternative sequences in non-init code from being > > released from memory after boot, but this is tolerable given that the > > entire section is only 512 KB on an allyesconfig build (which weighs in > > at 500+ MB for the entire Image). Also, note that modules today carry > > the replacement sequences in non-init sections as well, and any of > > those that target init code will be emitted into init sections after > > this change. > > > > This fixes an early crash when booting an allyesconfig kernel on a > > system where any of the alternatives sequences containing relative > > branches are activated at boot (e.g., ARM64_HAS_PAN on TX2) > > > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > > Cc: James Morse <james.morse@arm.com> > > Cc: Andre Przywara <andre.przywara@arm.com> > > Cc: Dave P Martin <dave.martin@arm.com> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > --- > > arch/arm64/include/asm/alternative.h | 16 ++++++++-------- > > arch/arm64/kernel/vmlinux.lds.S | 3 --- > > 2 files changed, 8 insertions(+), 11 deletions(-) > > > > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h > > index 5e5dc05d63a0..12f0eb56a1cc 100644 > > --- a/arch/arm64/include/asm/alternative.h > > +++ b/arch/arm64/include/asm/alternative.h > > @@ -73,11 +73,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > > ".pushsection .altinstructions,\"a\"\n" \ > > ALTINSTR_ENTRY(feature) \ > > ".popsection\n" \ > > - ".pushsection .altinstr_replacement, \"a\"\n" \ > > + ".subsection 1\n" \ > > This uses subsections in existing sections. Could that interfere with > existing (or future) uses of subsections? (I've not checked whether > there actually are such uses. I'm also assuming that clobbering the > invoker's idea of what section is .previous doesn't matter.) > It shouldn't matter, really. You can use different indexes for the subsection, but since the execution never flows from one subsection into the next, all that matters is that they are 'somewhere else' As for the use of .previous - the idea is that this does not affect the contents of the section stack, which I think makes sense. We could use '.pushsection .text, 1' as well to enter another subsection in .text, but that means we keep the .text vs .init.text issue that this patch solves. > Another wrinkle: the replacement code now becomes executable, whereas > I think it was previously in rodata. I'm not sure how much this > matters, but it might be a source of gadgets. > True. Perhaps we need to get rid of relative branches in alternative sequences entirely - see below. > > A different option would be to add an explicitly veneered branch macro > for use in alternatives, maybe adrp+add+br. For BTI compatility, we'd > need a bti j or equivalent at the destination, which might or might not > be easy to achieve -- mind you, I think we theoretically need that > anyway for veneers to work properly in all cases. > > Because we would define the exact instruction sequence, the > alternatives code could probably replace it with a direct branch if the > actual destination is close enough. The downside is that it wouldn't > be a single instruction any more, and there would be some overhead for > conditional branches if we replace the unneeded insns with NOPs. > Yeah, this becomes quite hairy very quickly, especially because now you need to allocate a spare register each time you do this. One other option is to simply disallow branches in the alternative sequences: I spotted three occurrences [0] that are quite easily fixed, by inverting the condition so that the relative branch is emitted in place, and the alternative sequence is just NOPs.
On Wed, 1 Jul 2020 at 19:30, Ard Biesheuvel <ardb@kernel.org> wrote: > > On Wed, 1 Jul 2020 at 19:01, Dave P Martin <dave.martin@arm.com> wrote: > > > > On Tue, Jun 30, 2020 at 10:19:21AM +0200, Ard Biesheuvel wrote: > > > When building very large kernels, the logic that emits replacement > > > sequences for alternatives fails when relative branches are present > > > in the code that is emitted into the .altinstr_replacement section > > > and patched in at the original site and fixed up. The reason is that > > > the linker will insert veneers if relative branches go out of range, > > > and due to the relative distance of the .altinstr_replacement from > > > the .text section where its branch targets usually live, veneers > > > may be emitted at the end of the .altinstr_replacement section, with > > > the relative branches in the sequence pointed at the veneers instead > > > of the actual target. > > > > > > The alternatives patching logic will attempt to fix up the branch to > > > point to its original target, which will be the veneer in this case, > > > but given that the patch site is likely to be far away as well, it > > > will be out of range and so patching will fail. There are other cases > > > where these veneers are problematic, e.g., when the target of the > > > branch is in .text while the patch site is in .init.text, in which > > > case putting the replacement sequence inside .text may not help either. > > > > > > So let's use subsections to emit the replacement code as closely as > > > possible to the patch site, to ensure that veneers are only likely to > > > be emitted if they are required at the patch site as well, in which > > > case they will be in range for the replacement sequence both before > > > and after it is transported to the patch site. > > > > > > This will prevent alternative sequences in non-init code from being > > > released from memory after boot, but this is tolerable given that the > > > entire section is only 512 KB on an allyesconfig build (which weighs in > > > at 500+ MB for the entire Image). Also, note that modules today carry > > > the replacement sequences in non-init sections as well, and any of > > > those that target init code will be emitted into init sections after > > > this change. > > > > > > This fixes an early crash when booting an allyesconfig kernel on a > > > system where any of the alternatives sequences containing relative > > > branches are activated at boot (e.g., ARM64_HAS_PAN on TX2) > > > > > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > > > Cc: James Morse <james.morse@arm.com> > > > Cc: Andre Przywara <andre.przywara@arm.com> > > > Cc: Dave P Martin <dave.martin@arm.com> > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > > --- > > > arch/arm64/include/asm/alternative.h | 16 ++++++++-------- > > > arch/arm64/kernel/vmlinux.lds.S | 3 --- > > > 2 files changed, 8 insertions(+), 11 deletions(-) > > > > > > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h > > > index 5e5dc05d63a0..12f0eb56a1cc 100644 > > > --- a/arch/arm64/include/asm/alternative.h > > > +++ b/arch/arm64/include/asm/alternative.h > > > @@ -73,11 +73,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > > > ".pushsection .altinstructions,\"a\"\n" \ > > > ALTINSTR_ENTRY(feature) \ > > > ".popsection\n" \ > > > - ".pushsection .altinstr_replacement, \"a\"\n" \ > > > + ".subsection 1\n" \ > > > > This uses subsections in existing sections. Could that interfere with > > existing (or future) uses of subsections? (I've not checked whether > > there actually are such uses. I'm also assuming that clobbering the > > invoker's idea of what section is .previous doesn't matter.) > > > > It shouldn't matter, really. You can use different indexes for the > subsection, but since the execution never flows from one subsection > into the next, all that matters is that they are 'somewhere else' > > As for the use of .previous - the idea is that this does not affect > the contents of the section stack, which I think makes sense. We could > use '.pushsection .text, 1' as well to enter another subsection in > .text, but that means we keep the .text vs .init.text issue that this > patch solves. > > > Another wrinkle: the replacement code now becomes executable, whereas > > I think it was previously in rodata. I'm not sure how much this > > matters, but it might be a source of gadgets. > > > > True. Perhaps we need to get rid of relative branches in alternative > sequences entirely - see below. > > > > > A different option would be to add an explicitly veneered branch macro > > for use in alternatives, maybe adrp+add+br. For BTI compatility, we'd > > need a bti j or equivalent at the destination, which might or might not > > be easy to achieve -- mind you, I think we theoretically need that > > anyway for veneers to work properly in all cases. > > > > Because we would define the exact instruction sequence, the > > alternatives code could probably replace it with a direct branch if the > > actual destination is close enough. The downside is that it wouldn't > > be a single instruction any more, and there would be some overhead for > > conditional branches if we replace the unneeded insns with NOPs. > > > > Yeah, this becomes quite hairy very quickly, especially because now > you need to allocate a spare register each time you do this. > > One other option is to simply disallow branches in the alternative > sequences: I spotted three occurrences [0] that are quite easily > fixed, by inverting the condition so that the relative branch is > emitted in place, and the alternative sequence is just NOPs. [0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm64-alt-branches
On Tue, Jun 30, 2020 at 10:19:21AM +0200, Ard Biesheuvel wrote: > When building very large kernels, the logic that emits replacement > sequences for alternatives fails when relative branches are present > in the code that is emitted into the .altinstr_replacement section > and patched in at the original site and fixed up. The reason is that > the linker will insert veneers if relative branches go out of range, > and due to the relative distance of the .altinstr_replacement from > the .text section where its branch targets usually live, veneers > may be emitted at the end of the .altinstr_replacement section, with > the relative branches in the sequence pointed at the veneers instead > of the actual target. > > The alternatives patching logic will attempt to fix up the branch to > point to its original target, which will be the veneer in this case, > but given that the patch site is likely to be far away as well, it > will be out of range and so patching will fail. There are other cases > where these veneers are problematic, e.g., when the target of the > branch is in .text while the patch site is in .init.text, in which > case putting the replacement sequence inside .text may not help either. > > So let's use subsections to emit the replacement code as closely as > possible to the patch site, to ensure that veneers are only likely to > be emitted if they are required at the patch site as well, in which > case they will be in range for the replacement sequence both before > and after it is transported to the patch site. > > This will prevent alternative sequences in non-init code from being > released from memory after boot, but this is tolerable given that the > entire section is only 512 KB on an allyesconfig build (which weighs in > at 500+ MB for the entire Image). Also, note that modules today carry > the replacement sequences in non-init sections as well, and any of > those that target init code will be emitted into init sections after > this change. > > This fixes an early crash when booting an allyesconfig kernel on a > system where any of the alternatives sequences containing relative > branches are activated at boot (e.g., ARM64_HAS_PAN on TX2) Given that this fixes a runtime failure with large kernel images (and potentially modules), I'll take this as a fix for 5.8 and we can rework it later on depending on how the discussion here winds up. Cheers. Will
On Tue, 30 Jun 2020 10:19:21 +0200, Ard Biesheuvel wrote: > When building very large kernels, the logic that emits replacement > sequences for alternatives fails when relative branches are present > in the code that is emitted into the .altinstr_replacement section > and patched in at the original site and fixed up. The reason is that > the linker will insert veneers if relative branches go out of range, > and due to the relative distance of the .altinstr_replacement from > the .text section where its branch targets usually live, veneers > may be emitted at the end of the .altinstr_replacement section, with > the relative branches in the sequence pointed at the veneers instead > of the actual target. > > [...] Applied to arm64 (for-next/fixes), thanks! [1/1] arm64/alternatives: use subsections for replacement sequences https://git.kernel.org/arm64/c/f7b93d42945c Cheers,
On Wed, Jul 01, 2020 at 07:32:07PM +0200, Ard Biesheuvel wrote: > On Wed, 1 Jul 2020 at 19:30, Ard Biesheuvel <ardb@kernel.org> wrote: > > > > On Wed, 1 Jul 2020 at 19:01, Dave P Martin <dave.martin@arm.com> wrote: > > > > > > On Tue, Jun 30, 2020 at 10:19:21AM +0200, Ard Biesheuvel wrote: > > > > When building very large kernels, the logic that emits replacement > > > > sequences for alternatives fails when relative branches are present > > > > in the code that is emitted into the .altinstr_replacement section > > > > and patched in at the original site and fixed up. The reason is that > > > > the linker will insert veneers if relative branches go out of range, > > > > and due to the relative distance of the .altinstr_replacement from > > > > the .text section where its branch targets usually live, veneers > > > > may be emitted at the end of the .altinstr_replacement section, with > > > > the relative branches in the sequence pointed at the veneers instead > > > > of the actual target. > > > > > > > > The alternatives patching logic will attempt to fix up the branch to > > > > point to its original target, which will be the veneer in this case, > > > > but given that the patch site is likely to be far away as well, it > > > > will be out of range and so patching will fail. There are other cases > > > > where these veneers are problematic, e.g., when the target of the > > > > branch is in .text while the patch site is in .init.text, in which > > > > case putting the replacement sequence inside .text may not help either. > > > > > > > > So let's use subsections to emit the replacement code as closely as > > > > possible to the patch site, to ensure that veneers are only likely to > > > > be emitted if they are required at the patch site as well, in which > > > > case they will be in range for the replacement sequence both before > > > > and after it is transported to the patch site. > > > > > > > > This will prevent alternative sequences in non-init code from being > > > > released from memory after boot, but this is tolerable given that the > > > > entire section is only 512 KB on an allyesconfig build (which weighs in > > > > at 500+ MB for the entire Image). Also, note that modules today carry > > > > the replacement sequences in non-init sections as well, and any of > > > > those that target init code will be emitted into init sections after > > > > this change. > > > > > > > > This fixes an early crash when booting an allyesconfig kernel on a > > > > system where any of the alternatives sequences containing relative > > > > branches are activated at boot (e.g., ARM64_HAS_PAN on TX2) > > > > > > > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > > > > Cc: James Morse <james.morse@arm.com> > > > > Cc: Andre Przywara <andre.przywara@arm.com> > > > > Cc: Dave P Martin <dave.martin@arm.com> > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > > > --- > > > > arch/arm64/include/asm/alternative.h | 16 ++++++++-------- > > > > arch/arm64/kernel/vmlinux.lds.S | 3 --- > > > > 2 files changed, 8 insertions(+), 11 deletions(-) > > > > > > > > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h > > > > index 5e5dc05d63a0..12f0eb56a1cc 100644 > > > > --- a/arch/arm64/include/asm/alternative.h > > > > +++ b/arch/arm64/include/asm/alternative.h > > > > @@ -73,11 +73,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > > > > ".pushsection .altinstructions,\"a\"\n" \ > > > > ALTINSTR_ENTRY(feature) \ > > > > ".popsection\n" \ > > > > - ".pushsection .altinstr_replacement, \"a\"\n" \ > > > > + ".subsection 1\n" \ > > > > > > This uses subsections in existing sections. Could that interfere with > > > existing (or future) uses of subsections? (I've not checked whether > > > there actually are such uses. I'm also assuming that clobbering the > > > invoker's idea of what section is .previous doesn't matter.) > > > > > > > It shouldn't matter, really. You can use different indexes for the > > subsection, but since the execution never flows from one subsection > > into the next, all that matters is that they are 'somewhere else' > > > > As for the use of .previous - the idea is that this does not affect > > the contents of the section stack, which I think makes sense. We could > > use '.pushsection .text, 1' as well to enter another subsection in > > .text, but that means we keep the .text vs .init.text issue that this > > patch solves. The following works: .pushsection junk .previous .subsection foo // ... .popsection though the gas documentation is not very clear about the relationship between .previous and the section stack directives. In fact, each stack slot has its own notion of .previous. If this trick is too uncertain, we can probably do without it though. Relying on .previous after invoking a macro is ill-advised anyway, and I haven't seen this issue come up in practice. Since foo is just a number, maybe we could just pick a randomish value, similarly to the way we "manage" asm local labels generated by macros, leaving small-integer values reserved for top-level code? This might help prevent surprises later on. In general it's reasonable to use subsections to consolidate things that should be kept close but otherwise contiguous in the output object, such as collecting together entries for a jump table. If a .S file and macros it calls are both using subsection 1, the macros might spit out garbage into the middle of the table the .S file is trying to build for example. However, I don't see any obvious evidence of that kind of thing in arch/arm64, and nothing in core code (no surprise there, this is asm). > > > > > Another wrinkle: the replacement code now becomes executable, whereas > > > I think it was previously in rodata. I'm not sure how much this > > > matters, but it might be a source of gadgets. > > > > > > > True. Perhaps we need to get rid of relative branches in alternative > > sequences entirely - see below. > > > > > > > > A different option would be to add an explicitly veneered branch macro > > > for use in alternatives, maybe adrp+add+br. For BTI compatility, we'd > > > need a bti j or equivalent at the destination, which might or might not > > > be easy to achieve -- mind you, I think we theoretically need that > > > anyway for veneers to work properly in all cases. > > > > > > Because we would define the exact instruction sequence, the > > > alternatives code could probably replace it with a direct branch if the > > > actual destination is close enough. The downside is that it wouldn't > > > be a single instruction any more, and there would be some overhead for > > > conditional branches if we replace the unneeded insns with NOPs. > > > > > > > Yeah, this becomes quite hairy very quickly, especially because now > > you need to allocate a spare register each time you do this. which is not ideal, I agree. Do we anticipate any truly out-of-range branches, or are we assuming the kernel text + modules area is always compact enough that direct branching always works? > > > > One other option is to simply disallow branches in the alternative > > sequences: I spotted three occurrences [0] that are quite easily > > fixed, by inverting the condition so that the relative branch is > > emitted in place, and the alternative sequence is just NOPs. > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm64-alt-branches That might be a nice simplification if there's no significant performance impact. Cheers ---Dave
On Mon, 6 Jul 2020 at 18:50, Dave Martin <Dave.Martin@arm.com> wrote: > > On Wed, Jul 01, 2020 at 07:32:07PM +0200, Ard Biesheuvel wrote: > > On Wed, 1 Jul 2020 at 19:30, Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > On Wed, 1 Jul 2020 at 19:01, Dave P Martin <dave.martin@arm.com> wrote: > > > > > > > > On Tue, Jun 30, 2020 at 10:19:21AM +0200, Ard Biesheuvel wrote: > > > > > When building very large kernels, the logic that emits replacement > > > > > sequences for alternatives fails when relative branches are present > > > > > in the code that is emitted into the .altinstr_replacement section > > > > > and patched in at the original site and fixed up. The reason is that > > > > > the linker will insert veneers if relative branches go out of range, > > > > > and due to the relative distance of the .altinstr_replacement from > > > > > the .text section where its branch targets usually live, veneers > > > > > may be emitted at the end of the .altinstr_replacement section, with > > > > > the relative branches in the sequence pointed at the veneers instead > > > > > of the actual target. > > > > > > > > > > The alternatives patching logic will attempt to fix up the branch to > > > > > point to its original target, which will be the veneer in this case, > > > > > but given that the patch site is likely to be far away as well, it > > > > > will be out of range and so patching will fail. There are other cases > > > > > where these veneers are problematic, e.g., when the target of the > > > > > branch is in .text while the patch site is in .init.text, in which > > > > > case putting the replacement sequence inside .text may not help either. > > > > > > > > > > So let's use subsections to emit the replacement code as closely as > > > > > possible to the patch site, to ensure that veneers are only likely to > > > > > be emitted if they are required at the patch site as well, in which > > > > > case they will be in range for the replacement sequence both before > > > > > and after it is transported to the patch site. > > > > > > > > > > This will prevent alternative sequences in non-init code from being > > > > > released from memory after boot, but this is tolerable given that the > > > > > entire section is only 512 KB on an allyesconfig build (which weighs in > > > > > at 500+ MB for the entire Image). Also, note that modules today carry > > > > > the replacement sequences in non-init sections as well, and any of > > > > > those that target init code will be emitted into init sections after > > > > > this change. > > > > > > > > > > This fixes an early crash when booting an allyesconfig kernel on a > > > > > system where any of the alternatives sequences containing relative > > > > > branches are activated at boot (e.g., ARM64_HAS_PAN on TX2) > > > > > > > > > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > > > > > Cc: James Morse <james.morse@arm.com> > > > > > Cc: Andre Przywara <andre.przywara@arm.com> > > > > > Cc: Dave P Martin <dave.martin@arm.com> > > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > > > > --- > > > > > arch/arm64/include/asm/alternative.h | 16 ++++++++-------- > > > > > arch/arm64/kernel/vmlinux.lds.S | 3 --- > > > > > 2 files changed, 8 insertions(+), 11 deletions(-) > > > > > > > > > > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h > > > > > index 5e5dc05d63a0..12f0eb56a1cc 100644 > > > > > --- a/arch/arm64/include/asm/alternative.h > > > > > +++ b/arch/arm64/include/asm/alternative.h > > > > > @@ -73,11 +73,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > > > > > ".pushsection .altinstructions,\"a\"\n" \ > > > > > ALTINSTR_ENTRY(feature) \ > > > > > ".popsection\n" \ > > > > > - ".pushsection .altinstr_replacement, \"a\"\n" \ > > > > > + ".subsection 1\n" \ > > > > > > > > This uses subsections in existing sections. Could that interfere with > > > > existing (or future) uses of subsections? (I've not checked whether > > > > there actually are such uses. I'm also assuming that clobbering the > > > > invoker's idea of what section is .previous doesn't matter.) > > > > > > > > > > It shouldn't matter, really. You can use different indexes for the > > > subsection, but since the execution never flows from one subsection > > > into the next, all that matters is that they are 'somewhere else' > > > > > > As for the use of .previous - the idea is that this does not affect > > > the contents of the section stack, which I think makes sense. We could > > > use '.pushsection .text, 1' as well to enter another subsection in > > > .text, but that means we keep the .text vs .init.text issue that this > > > patch solves. > > The following works: > > .pushsection junk > .previous > .subsection foo > // ... > > .popsection > > though the gas documentation is not very clear about the relationship > between .previous and the section stack directives. In fact, each stack > slot has its own notion of .previous. If this trick is too uncertain, > we can probably do without it though. Relying on .previous after > invoking a macro is ill-advised anyway, and I haven't seen this issue > come up in practice. > There is some mention about .previous only affecting the slot at the top of the stack, but it is rather vague. > Since foo is just a number, maybe we could just pick a randomish value, > similarly to the way we "manage" asm local labels generated by macros, > leaving small-integer values reserved for top-level code? This might > help prevent surprises later on. > > In general it's reasonable to use subsections to consolidate things > that should be kept close but otherwise contiguous in the output > object, such as collecting together entries for a jump table. If a .S > file and macros it calls are both using subsection 1, the macros might > spit out garbage into the middle of the table the .S file is trying to > build for example. However, I don't see any obvious evidence of that > kind of thing in arch/arm64, and nothing in core code (no surprise > there, this is asm). > All uses of subsections I am aware just use it to emit code out of line, with a label at the start and a branch at the end, and the only reason is to remove it from the hot path. For that kind of use, the only requirement for the subsection index is that it is != 0. If you are doing smart things with subsections and expect some consistency in how they are organized at the end of the object file, you might care about the subsection index, but I don't see a reason to do so here. > > > > > > > Another wrinkle: the replacement code now becomes executable, whereas > > > > I think it was previously in rodata. I'm not sure how much this > > > > matters, but it might be a source of gadgets. > > > > > > > > > > True. Perhaps we need to get rid of relative branches in alternative > > > sequences entirely - see below. > > > > > > > > > > > A different option would be to add an explicitly veneered branch macro > > > > for use in alternatives, maybe adrp+add+br. For BTI compatility, we'd > > > > need a bti j or equivalent at the destination, which might or might not > > > > be easy to achieve -- mind you, I think we theoretically need that > > > > anyway for veneers to work properly in all cases. > > > > > > > > Because we would define the exact instruction sequence, the > > > > alternatives code could probably replace it with a direct branch if the > > > > actual destination is close enough. The downside is that it wouldn't > > > > be a single instruction any more, and there would be some overhead for > > > > conditional branches if we replace the unneeded insns with NOPs. > > > > > > > > > > Yeah, this becomes quite hairy very quickly, especially because now > > > you need to allocate a spare register each time you do this. > > which is not ideal, I agree. > > Do we anticipate any truly out-of-range branches, or are we assuming the > kernel text + modules area is always compact enough that direct > branching always works? > There are two realistic failure modes, afaict: - *Really* large kernels, such as allyesconfig, which is kind of artificial but still relevant for validation purposes - Occurrences where the module region is almost exhausted, to the point where the next module's non-init segment no longer fits, but the init section does. In this case, any alternative sequences with branches that need to be patched into the init text section may be out of range for their targets (which could be actual functions or PLTs that were emitted into the non-init section) > > > > > > One other option is to simply disallow branches in the alternative > > > sequences: I spotted three occurrences [0] that are quite easily > > > fixed, by inverting the condition so that the relative branch is > > > emitted in place, and the alternative sequence is just NOPs. > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm64-alt-branches > > That might be a nice simplification if there's no significant > performance impact. > Actually, the PAN code could be improved a bit more - I just sent a patch - but in general, disallowing such branches would be a nice simplification but I am not sure we can easily enforce it at build time.
On Mon, Jul 06, 2020 at 07:04:44PM +0300, Ard Biesheuvel wrote: > On Mon, 6 Jul 2020 at 18:50, Dave Martin <Dave.Martin@arm.com> wrote: > > > > On Wed, Jul 01, 2020 at 07:32:07PM +0200, Ard Biesheuvel wrote: > > > On Wed, 1 Jul 2020 at 19:30, Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > > > On Wed, 1 Jul 2020 at 19:01, Dave P Martin <dave.martin@arm.com> wrote: > > > > > > > > > > On Tue, Jun 30, 2020 at 10:19:21AM +0200, Ard Biesheuvel wrote: > > > > > > When building very large kernels, the logic that emits replacement > > > > > > sequences for alternatives fails when relative branches are present > > > > > > in the code that is emitted into the .altinstr_replacement section > > > > > > and patched in at the original site and fixed up. The reason is that > > > > > > the linker will insert veneers if relative branches go out of range, > > > > > > and due to the relative distance of the .altinstr_replacement from > > > > > > the .text section where its branch targets usually live, veneers > > > > > > may be emitted at the end of the .altinstr_replacement section, with > > > > > > the relative branches in the sequence pointed at the veneers instead > > > > > > of the actual target. > > > > > > > > > > > > The alternatives patching logic will attempt to fix up the branch to > > > > > > point to its original target, which will be the veneer in this case, > > > > > > but given that the patch site is likely to be far away as well, it > > > > > > will be out of range and so patching will fail. There are other cases > > > > > > where these veneers are problematic, e.g., when the target of the > > > > > > branch is in .text while the patch site is in .init.text, in which > > > > > > case putting the replacement sequence inside .text may not help either. > > > > > > > > > > > > So let's use subsections to emit the replacement code as closely as > > > > > > possible to the patch site, to ensure that veneers are only likely to > > > > > > be emitted if they are required at the patch site as well, in which > > > > > > case they will be in range for the replacement sequence both before > > > > > > and after it is transported to the patch site. > > > > > > > > > > > > This will prevent alternative sequences in non-init code from being > > > > > > released from memory after boot, but this is tolerable given that the > > > > > > entire section is only 512 KB on an allyesconfig build (which weighs in > > > > > > at 500+ MB for the entire Image). Also, note that modules today carry > > > > > > the replacement sequences in non-init sections as well, and any of > > > > > > those that target init code will be emitted into init sections after > > > > > > this change. > > > > > > > > > > > > This fixes an early crash when booting an allyesconfig kernel on a > > > > > > system where any of the alternatives sequences containing relative > > > > > > branches are activated at boot (e.g., ARM64_HAS_PAN on TX2) > > > > > > > > > > > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > > > > > > Cc: James Morse <james.morse@arm.com> > > > > > > Cc: Andre Przywara <andre.przywara@arm.com> > > > > > > Cc: Dave P Martin <dave.martin@arm.com> > > > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > > > > > --- > > > > > > arch/arm64/include/asm/alternative.h | 16 ++++++++-------- > > > > > > arch/arm64/kernel/vmlinux.lds.S | 3 --- > > > > > > 2 files changed, 8 insertions(+), 11 deletions(-) > > > > > > > > > > > > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h > > > > > > index 5e5dc05d63a0..12f0eb56a1cc 100644 > > > > > > --- a/arch/arm64/include/asm/alternative.h > > > > > > +++ b/arch/arm64/include/asm/alternative.h > > > > > > @@ -73,11 +73,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > > > > > > ".pushsection .altinstructions,\"a\"\n" \ > > > > > > ALTINSTR_ENTRY(feature) \ > > > > > > ".popsection\n" \ > > > > > > - ".pushsection .altinstr_replacement, \"a\"\n" \ > > > > > > + ".subsection 1\n" \ > > > > > > > > > > This uses subsections in existing sections. Could that interfere with > > > > > existing (or future) uses of subsections? (I've not checked whether > > > > > there actually are such uses. I'm also assuming that clobbering the > > > > > invoker's idea of what section is .previous doesn't matter.) > > > > > > > > > > > > > It shouldn't matter, really. You can use different indexes for the > > > > subsection, but since the execution never flows from one subsection > > > > into the next, all that matters is that they are 'somewhere else' > > > > > > > > As for the use of .previous - the idea is that this does not affect > > > > the contents of the section stack, which I think makes sense. We could > > > > use '.pushsection .text, 1' as well to enter another subsection in > > > > .text, but that means we keep the .text vs .init.text issue that this > > > > patch solves. > > > > The following works: > > > > .pushsection junk > > .previous > > .subsection foo > > // ... > > > > .popsection > > > > though the gas documentation is not very clear about the relationship > > between .previous and the section stack directives. In fact, each stack > > slot has its own notion of .previous. If this trick is too uncertain, > > we can probably do without it though. Relying on .previous after > > invoking a macro is ill-advised anyway, and I haven't seen this issue > > come up in practice. > > > > There is some mention about .previous only affecting the slot at the > top of the stack, but it is rather vague. > > > Since foo is just a number, maybe we could just pick a randomish value, > > similarly to the way we "manage" asm local labels generated by macros, > > leaving small-integer values reserved for top-level code? This might > > help prevent surprises later on. > > > > In general it's reasonable to use subsections to consolidate things > > that should be kept close but otherwise contiguous in the output > > object, such as collecting together entries for a jump table. If a .S > > file and macros it calls are both using subsection 1, the macros might > > spit out garbage into the middle of the table the .S file is trying to > > build for example. However, I don't see any obvious evidence of that > > kind of thing in arch/arm64, and nothing in core code (no surprise > > there, this is asm). > > > > All uses of subsections I am aware just use it to emit code out of > line, with a label at the start and a branch at the end, and the only > reason is to remove it from the hot path. For that kind of use, the > only requirement for the subsection index is that it is != 0. > > If you are doing smart things with subsections and expect some > consistency in how they are organized at the end of the object file, > you might care about the subsection index, but I don't see a reason to > do so here. If we used .subsection 77, say, then a conflict would be highly unlikely. But I'd agree that this might just cause more confusion in the long run (if it ever matters). > > > > > > > > > Another wrinkle: the replacement code now becomes executable, whereas > > > > > I think it was previously in rodata. I'm not sure how much this > > > > > matters, but it might be a source of gadgets. > > > > > > > > > > > > > True. Perhaps we need to get rid of relative branches in alternative > > > > sequences entirely - see below. > > > > > > > > > > > > > > A different option would be to add an explicitly veneered branch macro > > > > > for use in alternatives, maybe adrp+add+br. For BTI compatility, we'd > > > > > need a bti j or equivalent at the destination, which might or might not > > > > > be easy to achieve -- mind you, I think we theoretically need that > > > > > anyway for veneers to work properly in all cases. > > > > > > > > > > Because we would define the exact instruction sequence, the > > > > > alternatives code could probably replace it with a direct branch if the > > > > > actual destination is close enough. The downside is that it wouldn't > > > > > be a single instruction any more, and there would be some overhead for > > > > > conditional branches if we replace the unneeded insns with NOPs. > > > > > > > > > > > > > Yeah, this becomes quite hairy very quickly, especially because now > > > > you need to allocate a spare register each time you do this. > > > > which is not ideal, I agree. > > > > Do we anticipate any truly out-of-range branches, or are we assuming the > > kernel text + modules area is always compact enough that direct > > branching always works? > > > > There are two realistic failure modes, afaict: > - *Really* large kernels, such as allyesconfig, which is kind of > artificial but still relevant for validation purposes > - Occurrences where the module region is almost exhausted, to the > point where the next module's non-init segment no longer fits, but the > init section does. In this case, any alternative sequences with > branches that need to be patched into the init text section may be out > of range for their targets (which could be actual functions or PLTs > that were emitted into the non-init section) > > > > > > > > > > One other option is to simply disallow branches in the alternative > > > > sequences: I spotted three occurrences [0] that are quite easily > > > > fixed, by inverting the condition so that the relative branch is > > > > emitted in place, and the alternative sequence is just NOPs. > > > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm64-alt-branches > > > > That might be a nice simplification if there's no significant > > performance impact. > > > > Actually, the PAN code could be improved a bit more - I just sent a > patch - but in general, disallowing such branches would be a nice > simplification but I am not sure we can easily enforce it at build > time. We could scan for problem relocs when linking, but that would introduce an extra build step. So I'd guess that wouldn't be popular just for this. Cheers ---Dave
Hi, I think this patch breaks pseudo-NMIs. I bisected the following splat to this commit: [ 0.002103] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000130 [ 0.002124] Mem abort info: [ 0.002124] ESR = 0x96000004 [ 0.002135] EC = 0x25: DABT (current EL), IL = 32 bits [ 0.002135] SET = 0, FnV = 0 [ 0.002146] EA = 0, S1PTW = 0 [ 0.002146] Data abort info: [ 0.002154] ISV = 0, ISS = 0x00000004 [ 0.002156] CM = 0, WnR = 0 [ 0.002164] [0000000000000130] user address but active_mm is swapper [ 0.002167] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 0.002174] Modules linked in: [ 0.002184] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.0-rc1-00024-gf7b93d42945c-dirty #135 [ 0.002188] Hardware name: FVP Base (DT) [ 0.002194] pstate: 000003c9 (nzcv DAIF -PAN -UAO BTYPE=--) [ 0.002204] pc : work_pending+0x32c/0x348 [ 0.002210] lr : el1_irq+0xcc/0x180 [ 0.002214] sp : ffffd7c5dced3d90 [ 0.002220] pmr_save: 000000f0 [ 0.002224] x29: ffffd7c5dced3ec0 x28: ffffd7c5dcee2e00 [ 0.002231] x27: ffffd7c5dcee2e00 x26: ffff2842a2ce9000 [ 0.002242] x25: ffff800010000000 x24: 0000000000000001 [ 0.002244] x23: 3c2dd225ee77b65f x22: 2576d4c31e698712 [ 0.002254] x21: 0000000000000402 x20: 00000000000000f0 [ 0.002264] x19: ffffd7c5dced3d90 x18: 0000000000000020 [ 0.002274] x17: 0000000066f2d860 x16: 0000000000000001 [ 0.002274] x15: 0000000000000000 x14: 0000000000000000 [ 0.002284] x13: 00000000fa83b2da x12: 0000000000000000 [ 0.002295] x11: 0000000000000000 x10: 3c2dd225ee77b65f [ 0.002306] x9 : 2576d4c31e698712 x8 : ffffd7c5dcee3bd8 [ 0.002314] x7 : 000000000000ba5e x6 : 00000000fffedb08 [ 0.002316] x5 : 00000000ffffffff x4 : ffff2842a2ce9000 [ 0.002327] x3 : 00000000000000f0 x2 : 00000000000000f0 [ 0.002338] x1 : ffff2842a2ce9000 x0 : 00000000000000f0 [ 0.002344] Call trace: [ 0.002348] work_pending+0x32c/0x348 [ 0.002354] do_idle+0x230/0x30c [ 0.002364] cpu_startup_entry+0x24/0x80 [ 0.002370] rest_init+0xd8/0xe8 [ 0.002380] arch_call_rest_init+0x10/0x1c [ 0.002384] start_kernel+0x4b8/0x4f0 [ 0.002394] Code: d5033f9f d518d03f d503201f d503201f (a9440801) [ 0.002404] ---[ end trace 08b34e49c88e0252 ]--- [ 0.002413] Kernel panic - not syncing: Attempted to kill the idle task! [ 0.002414] SMP: stopping secondary CPUs [ 0.002434] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- The config I used is defconfig + pseudo-NMIs enabled (CONFIG_ARM64_PSEUDO_NMI=y or, in menuconfig, Kernel features -> Support for NMI-like interrupts). The compiler that I used: $ aarch64-linux-gnu-gcc --version aarch64-linux-gnu-gcc (GCC) 10.1.0 Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. The kernel was run on the model. Command line to launch the model: $MODEL --application cluster0.*=$PAYLOAD --config-file $CONFIG \ -C bp.pl011_uart0.out_file=$LOGFILE Config file: bp.dram_size=8 bp.pl011_uart0.unbuffered_output=1 bp.pl011_uart0.untimed_fifos=true bp.refcounter.non_arch_start_at_default=1 bp.secure_memory=0 bp.smsc_91c111.enabled=0 bp.ve_sysregs.mmbSiteDefault=0 cache_state_modelled=false cluster0.NUM_CORES=4 cluster0.amu_has_external_interface=1 cluster0.cpu0.CONFIG64=1 cluster0.cpu0.crypto_sha3=1 cluster0.cpu0.crypto_sha512=1 cluster0.cpu0.crypto_sm3=1 cluster0.cpu0.crypto_sm4=1 cluster0.cpu0.enable_crc32=1 cluster0.has_16k_granule=1 cluster0.has_4k_granule=1 cluster0.has_64k_granule=1 cluster0.has_amu=1 cluster0.has_arm_v8-1=1 cluster0.has_arm_v8-2=1 cluster0.has_arm_v8-3=1 cluster0.has_arm_v8-4=1 cluster0.has_arm_v8-5=1 cluster0.has_branch_target_exception=1 cluster0.has_ccidx=1 cluster0.has_data_alignment_flag=1 cluster0.has_fp16=1 cluster0.has_large_system_ext=1 cluster0.has_large_va=1 cluster0.has_ldm_stm_ordering_control=1 cluster0.has_lrcpc=1 cluster0.has_mpam=1 cluster0.has_ras=1 cluster0.has_ras_double_fault=1 cluster0.has_rndr=1 cluster0.has_el3=1 # (bool , init-time) default = '1' : Implements EL3 gic_distributor.GITS_BASER0-type=1 gic_distributor.ITS-count=1 gic_distributor.ITS-hardware-collection-count=1 gic_distributor.direct-lpi-support=1 pctl.startup=0.*.*.* I wrapped the kernel with bootwrapper, with this patch on top to clear SCR_EL3.FIQ and make pseudo-NMIs work: diff --git a/arch/aarch64/boot.S b/arch/aarch64/boot.S index 74705cded338..21e72f336057 100644 --- a/arch/aarch64/boot.S +++ b/arch/aarch64/boot.S @@ -51,6 +51,7 @@ _start: #ifndef KERNEL_32 orr x0, x0, #(1 << 10) // 64-bit EL2 #endif + orr x0, x0, #(1 << 2) // Route FIQs to EL3 msr scr_el3, x0 msr cptr_el3, xzr // Disable copro. traps to EL3 diff --git a/arch/aarch64/include/asm/gic-v3.h b/arch/aarch64/include/asm/gic-v3.h index e743c02ffe5e..ea7862745872 100644 --- a/arch/aarch64/include/asm/gic-v3.h +++ b/arch/aarch64/include/asm/gic-v3.h @@ -32,4 +32,9 @@ static inline void gic_write_icc_ctlr(uint32_t val) asm volatile ("msr " ICC_CTLR_EL3 ", %0" : : "r" (val)); } +static inline void gic_write_icc_pmr(uint64_t val) +{ + asm volatile ("msr " ICC_PMR_EL1 ", %0" : : "r" (val)); +} + #endif diff --git a/gic-v3.c b/gic-v3.c index c2f66bae4c35..5aeb51a1a7be 100644 --- a/gic-v3.c +++ b/gic-v3.c @@ -122,6 +122,9 @@ void gic_secure_init(void) gic_write_icc_sre(sre); isb(); + gic_write_icc_pmr(0xff); + isb(); + gic_write_icc_ctlr(0); isb(); } Bootwrapper command line: ./configure \ --host=aarch64-linux-gnu \ --with-kernel-dir=${HOST_KERNEL_DIR} \ --with-dtb=${HOST_KERNEL_DIR}/arch/arm64/boot/dts/arm/base_gicv3.dtb \ --with-cmdline="earlycon=pl011,0x1c090000 console=ttyAMA0 irqchip.gicv3_pseudo_nmi=1" \ --enable-gicv3 \ --enable-psci \ --with-initrd=${HOST_BUILDROOT_DIR}/output/images/rootfs.cpio.gz I was also able to reproduce it on my rockpro64, but I was using the pseudo-NMI series that allows them to be used on the board [1] and a modified dtb to remove the big cores. I can post the log + whatever else is need to reproduce it, if anyone thinks that would help. [1] https://lists.infradead.org/pipermail/linux-arm-kernel/2020-June/580143.html Thanks, Alex On 6/30/20 9:19 AM, Ard Biesheuvel wrote: > When building very large kernels, the logic that emits replacement > sequences for alternatives fails when relative branches are present > in the code that is emitted into the .altinstr_replacement section > and patched in at the original site and fixed up. The reason is that > the linker will insert veneers if relative branches go out of range, > and due to the relative distance of the .altinstr_replacement from > the .text section where its branch targets usually live, veneers > may be emitted at the end of the .altinstr_replacement section, with > the relative branches in the sequence pointed at the veneers instead > of the actual target. > > The alternatives patching logic will attempt to fix up the branch to > point to its original target, which will be the veneer in this case, > but given that the patch site is likely to be far away as well, it > will be out of range and so patching will fail. There are other cases > where these veneers are problematic, e.g., when the target of the > branch is in .text while the patch site is in .init.text, in which > case putting the replacement sequence inside .text may not help either. > > So let's use subsections to emit the replacement code as closely as > possible to the patch site, to ensure that veneers are only likely to > be emitted if they are required at the patch site as well, in which > case they will be in range for the replacement sequence both before > and after it is transported to the patch site. > > This will prevent alternative sequences in non-init code from being > released from memory after boot, but this is tolerable given that the > entire section is only 512 KB on an allyesconfig build (which weighs in > at 500+ MB for the entire Image). Also, note that modules today carry > the replacement sequences in non-init sections as well, and any of > those that target init code will be emitted into init sections after > this change. > > This fixes an early crash when booting an allyesconfig kernel on a > system where any of the alternatives sequences containing relative > branches are activated at boot (e.g., ARM64_HAS_PAN on TX2) > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > Cc: James Morse <james.morse@arm.com> > Cc: Andre Przywara <andre.przywara@arm.com> > Cc: Dave P Martin <dave.martin@arm.com> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > --- > arch/arm64/include/asm/alternative.h | 16 ++++++++-------- > arch/arm64/kernel/vmlinux.lds.S | 3 --- > 2 files changed, 8 insertions(+), 11 deletions(-) > > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h > index 5e5dc05d63a0..12f0eb56a1cc 100644 > --- a/arch/arm64/include/asm/alternative.h > +++ b/arch/arm64/include/asm/alternative.h > @@ -73,11 +73,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > ".pushsection .altinstructions,\"a\"\n" \ > ALTINSTR_ENTRY(feature) \ > ".popsection\n" \ > - ".pushsection .altinstr_replacement, \"a\"\n" \ > + ".subsection 1\n" \ > "663:\n\t" \ > newinstr "\n" \ > "664:\n\t" \ > - ".popsection\n\t" \ > + ".previous\n\t" \ > ".org . - (664b-663b) + (662b-661b)\n\t" \ > ".org . - (662b-661b) + (664b-663b)\n" \ > ".endif\n" > @@ -117,9 +117,9 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > 662: .pushsection .altinstructions, "a" > altinstruction_entry 661b, 663f, \cap, 662b-661b, 664f-663f > .popsection > - .pushsection .altinstr_replacement, "ax" > + .subsection 1 > 663: \insn2 > -664: .popsection > +664: .previous > .org . - (664b-663b) + (662b-661b) > .org . - (662b-661b) + (664b-663b) > .endif > @@ -160,7 +160,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > .pushsection .altinstructions, "a" > altinstruction_entry 663f, 661f, \cap, 664f-663f, 662f-661f > .popsection > - .pushsection .altinstr_replacement, "ax" > + .subsection 1 > .align 2 /* So GAS knows label 661 is suitably aligned */ > 661: > .endm > @@ -179,9 +179,9 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > .macro alternative_else > 662: > .if .Lasm_alt_mode==0 > - .pushsection .altinstr_replacement, "ax" > + .subsection 1 > .else > - .popsection > + .previous > .endif > 663: > .endm > @@ -192,7 +192,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > .macro alternative_endif > 664: > .if .Lasm_alt_mode==0 > - .popsection > + .previous > .endif > .org . - (664b-663b) + (662b-661b) > .org . - (662b-661b) + (664b-663b) > diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S > index 6827da7f3aa5..5423ffe0a987 100644 > --- a/arch/arm64/kernel/vmlinux.lds.S > +++ b/arch/arm64/kernel/vmlinux.lds.S > @@ -165,9 +165,6 @@ SECTIONS > *(.altinstructions) > __alt_instructions_end = .; > } > - .altinstr_replacement : { > - *(.altinstr_replacement) > - } > > . = ALIGN(SEGMENT_ALIGN); > __inittext_end = .;
Hi, I should post the entire boot log: [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd0f0] [ 0.000000] Linux version 5.8.0-rc1-00024-gf7b93d42945c-dirty (alex@monolith) (aarch64-linux-gnu-gcc (GCC) 10.1.0, GNU ld (GNU Binutils) 2.34) #135 SMP PREEMPT Thu Jul 9 11:28:55 BST 2020 [ 0.000000] Machine model: FVP Base [ 0.000000] earlycon: pl11 at MMIO 0x000000001c090000 (options '') [ 0.000000] printk: bootconsole [pl11] enabled [ 0.000000] efi: UEFI not found. [ 0.000000] [Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader! [ 0.000000] cma: Reserved 32 MiB at 0x00000000fe000000 [ 0.000000] NUMA: No NUMA configuration found [ 0.000000] NUMA: Faking a node at [mem 0x0000000080000000-0x00000008ffffffff] [ 0.000000] NUMA: NODE_DATA [mem 0x8ff7f1100-0x8ff7f2fff] [ 0.000000] Zone ranges: [ 0.000000] DMA [mem 0x0000000080000000-0x00000000bfffffff] [ 0.000000] DMA32 [mem 0x00000000c0000000-0x00000000ffffffff] [ 0.000000] Normal [mem 0x0000000100000000-0x00000008ffffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000080000000-0x00000000ffffffff] [ 0.000000] node 0: [mem 0x0000000880000000-0x00000008ffffffff] [ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000008ffffffff] [ 0.000000] psci: probing for conduit method from DT. [ 0.000000] psci: Using PSCI v0.1 Function IDs from DT [ 0.000000] percpu: Embedded 23 pages/cpu s53912 r8192 d32104 u94208 [ 0.000000] Detected PIPT I-cache on CPU0 [ 0.000000] CPU features: detected: GIC system register CPU interface [ 0.000000] CPU features: detected: Virtualization Host Extensions [ 0.000000] CPU features: detected: Hardware dirty bit management [ 0.000000] CPU features: detected: Speculative Store Bypassing Safe (SSBS) [ 0.000000] CPU features: detected: Address authentication (IMP DEF algorithm) [ 0.000000] CPU features: detected: IRQ priority masking [ 0.000000] CPU features: detected: Branch Target Identification [ 0.000000] alternatives: patching kernel code [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 1032192 [ 0.000000] Policy zone: Normal [ 0.000000] Kernel command line: earlycon=pl011,0x1c090000 console=ttyAMA0 irqchip.gicv3_pseudo_nmi=1 [ 0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear) [ 0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear) [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.000000] software IO TLB: mapped [mem 0xbbfff000-0xbffff000] (64MB) [ 0.000000] Memory: 3896200K/4194304K available (13884K kernel code, 2170K rwdata, 7440K rodata, 5568K init, 481K bss, 265336K reserved, 32768K cma-reserved) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 [ 0.000000] rcu: Preemptible hierarchical RCU implementation. [ 0.000000] rcu: RCU event tracing is enabled. [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4. [ 0.000000] Trampoline variant of Tasks RCU enabled. [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. [ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4 [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 [ 0.000000] GICv3: GIC: Using split EOI/Deactivate mode [ 0.000000] GICv3: 224 SPIs implemented [ 0.000000] GICv3: 0 Extended SPIs implemented [ 0.000000] GICv3: Distributor has no Range Selector support [ 0.000000] GICv3: 16 PPIs implemented [ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x000000002f100000 [ 0.000000] ITS [mem 0x2f020000-0x2f03ffff] [ 0.000000] ITS@0x000000002f020000: allocated 8192 Devices @8fac40000 (indirect, esz 8, psz 64K, shr 1) [ 0.000000] ITS@0x000000002f020000: allocated 8192 Virtual CPUs @8fac50000 (indirect, esz 8, psz 64K, shr 1) [ 0.000000] ITS@0x000000002f020000: allocated 8192 Interrupt Collections @8fac60000 (flat, esz 8, psz 64K, shr 1) [ 0.000000] GICv3: using LPI property table @0x00000008fac70000 [ 0.000000] GICv3: CPU0: using allocated LPI pending table @0x00000008fac80000 [ 0.000000] GICv3: Relaxing ICC_PMR_EL1 synchronisation [ 0.000000] random: get_random_bytes called from start_kernel+0x314/0x4f0 with crng_init=0 [ 0.000000] arch_timer: cp15 timer(s) running at 1000.00MHz (phys). [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns [ 0.000004] sched_clock: 56 bits at 1000MHz, resolution 1ns, wraps every 4398046511103ns [ 0.000184] sp804: timer clock not found: -517 [ 0.000206] sp804: arm,sp804 clock not found: -2 [ 0.000214] Failed to initialize '/smb@8000000/motherboard/iofpga@300000000/timer@110000': -2 [ 0.000245] sp804: timer clock not found: -517 [ 0.000266] sp804: arm,sp804 clock not found: -2 [ 0.000277] Failed to initialize '/smb@8000000/motherboard/iofpga@300000000/timer@120000': -2 [ 0.000374] Console: colour dummy device 80x25 [ 0.000416] Calibrating delay loop (skipped), value calculated using timer frequency.. 2000.00 BogoMIPS (lpj=4000000) [ 0.000427] pid_max: default: 32768 minimum: 301 [ 0.000484] LSM: Security Framework initializing [ 0.000514] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear) [ 0.000524] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear) [ 0.001124] rcu: Hierarchical SRCU implementation. [ 0.001206] Platform MSI: its@2f020000 domain created [ 0.001244] PCI/MSI: /interrupt-controller@2f000000/its@2f020000 domain created [ 0.001284] fsl-mc MSI: /interrupt-controller@2f000000/its@2f020000 domain created [ 0.001815] EFI services will not be available. [ 0.001910] smp: Bringing up secondary CPUs ... [ 0.002064] Detected PIPT I-cache on CPU1 [ 0.002082] GICv3: CPU1: found redistributor 1 region 0:0x000000002f120000 [ 0.002084] GICv3: CPU1: using allocated LPI pending table @0x00000008fac90000 [ 0.002094] CPU1: Booted secondary processor 0x0000000001 [0x410fd0f0] [ 0.002103] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000130 [ 0.002124] Mem abort info: [ 0.002124] ESR = 0x96000004 [ 0.002135] EC = 0x25: DABT (current EL), IL = 32 bits [ 0.002135] SET = 0, FnV = 0 [ 0.002146] EA = 0, S1PTW = 0 [ 0.002146] Data abort info: [ 0.002154] ISV = 0, ISS = 0x00000004 [ 0.002156] CM = 0, WnR = 0 [ 0.002164] [0000000000000130] user address but active_mm is swapper [ 0.002167] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 0.002174] Modules linked in: [ 0.002184] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.0-rc1-00024-gf7b93d42945c-dirty #135 [ 0.002188] Hardware name: FVP Base (DT) [ 0.002194] pstate: 000003c9 (nzcv DAIF -PAN -UAO BTYPE=--) [ 0.002204] pc : work_pending+0x32c/0x348 [ 0.002210] lr : el1_irq+0xcc/0x180 [ 0.002214] sp : ffffd7c5dced3d90 [ 0.002220] pmr_save: 000000f0 [ 0.002224] x29: ffffd7c5dced3ec0 x28: ffffd7c5dcee2e00 [ 0.002231] x27: ffffd7c5dcee2e00 x26: ffff2842a2ce9000 [ 0.002242] x25: ffff800010000000 x24: 0000000000000001 [ 0.002244] x23: 3c2dd225ee77b65f x22: 2576d4c31e698712 [ 0.002254] x21: 0000000000000402 x20: 00000000000000f0 [ 0.002264] x19: ffffd7c5dced3d90 x18: 0000000000000020 [ 0.002274] x17: 0000000066f2d860 x16: 0000000000000001 [ 0.002274] x15: 0000000000000000 x14: 0000000000000000 [ 0.002284] x13: 00000000fa83b2da x12: 0000000000000000 [ 0.002295] x11: 0000000000000000 x10: 3c2dd225ee77b65f [ 0.002306] x9 : 2576d4c31e698712 x8 : ffffd7c5dcee3bd8 [ 0.002314] x7 : 000000000000ba5e x6 : 00000000fffedb08 [ 0.002316] x5 : 00000000ffffffff x4 : ffff2842a2ce9000 [ 0.002327] x3 : 00000000000000f0 x2 : 00000000000000f0 [ 0.002338] x1 : ffff2842a2ce9000 x0 : 00000000000000f0 [ 0.002344] Call trace: [ 0.002348] work_pending+0x32c/0x348 [ 0.002354] do_idle+0x230/0x30c [ 0.002364] cpu_startup_entry+0x24/0x80 [ 0.002370] rest_init+0xd8/0xe8 [ 0.002380] arch_call_rest_init+0x10/0x1c [ 0.002384] start_kernel+0x4b8/0x4f0 [ 0.002394] Code: d5033f9f d518d03f d503201f d503201f (a9440801) [ 0.002404] ---[ end trace 08b34e49c88e0252 ]--- [ 0.002413] Kernel panic - not syncing: Attempted to kill the idle task! [ 0.002414] SMP: stopping secondary CPUs [ 0.002434] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- If it helps, I tried it with caches enabled and the logs were identical (checked with diff). Thanks, Alex On 7/9/20 11:57 AM, Alexandru Elisei wrote: > Hi, > > I think this patch breaks pseudo-NMIs. I bisected the following splat to this commit: > > [ 0.002103] Unable to handle kernel NULL pointer dereference at virtual address > 0000000000000130 > [ 0.002124] Mem abort info: > [ 0.002124] ESR = 0x96000004 > [ 0.002135] EC = 0x25: DABT (current EL), IL = 32 bits > [ 0.002135] SET = 0, FnV = 0 > [ 0.002146] EA = 0, S1PTW = 0 > [ 0.002146] Data abort info: > [ 0.002154] ISV = 0, ISS = 0x00000004 > [ 0.002156] CM = 0, WnR = 0 > [ 0.002164] [0000000000000130] user address but active_mm is swapper > [ 0.002167] Internal error: Oops: 96000004 [#1] PREEMPT SMP > [ 0.002174] Modules linked in: > [ 0.002184] CPU: 0 PID: 0 Comm: swapper/0 Not tainted > 5.8.0-rc1-00024-gf7b93d42945c-dirty #135 > [ 0.002188] Hardware name: FVP Base (DT) > [ 0.002194] pstate: 000003c9 (nzcv DAIF -PAN -UAO BTYPE=--) > [ 0.002204] pc : work_pending+0x32c/0x348 > [ 0.002210] lr : el1_irq+0xcc/0x180 > [ 0.002214] sp : ffffd7c5dced3d90 > [ 0.002220] pmr_save: 000000f0 > [ 0.002224] x29: ffffd7c5dced3ec0 x28: ffffd7c5dcee2e00 > [ 0.002231] x27: ffffd7c5dcee2e00 x26: ffff2842a2ce9000 > [ 0.002242] x25: ffff800010000000 x24: 0000000000000001 > [ 0.002244] x23: 3c2dd225ee77b65f x22: 2576d4c31e698712 > [ 0.002254] x21: 0000000000000402 x20: 00000000000000f0 > [ 0.002264] x19: ffffd7c5dced3d90 x18: 0000000000000020 > [ 0.002274] x17: 0000000066f2d860 x16: 0000000000000001 > [ 0.002274] x15: 0000000000000000 x14: 0000000000000000 > [ 0.002284] x13: 00000000fa83b2da x12: 0000000000000000 > [ 0.002295] x11: 0000000000000000 x10: 3c2dd225ee77b65f > [ 0.002306] x9 : 2576d4c31e698712 x8 : ffffd7c5dcee3bd8 > [ 0.002314] x7 : 000000000000ba5e x6 : 00000000fffedb08 > [ 0.002316] x5 : 00000000ffffffff x4 : ffff2842a2ce9000 > [ 0.002327] x3 : 00000000000000f0 x2 : 00000000000000f0 > [ 0.002338] x1 : ffff2842a2ce9000 x0 : 00000000000000f0 > [ 0.002344] Call trace: > [ 0.002348] work_pending+0x32c/0x348 > [ 0.002354] do_idle+0x230/0x30c > [ 0.002364] cpu_startup_entry+0x24/0x80 > [ 0.002370] rest_init+0xd8/0xe8 > [ 0.002380] arch_call_rest_init+0x10/0x1c > [ 0.002384] start_kernel+0x4b8/0x4f0 > [ 0.002394] Code: d5033f9f d518d03f d503201f d503201f (a9440801) > [ 0.002404] ---[ end trace 08b34e49c88e0252 ]--- > [ 0.002413] Kernel panic - not syncing: Attempted to kill the idle task! > [ 0.002414] SMP: stopping secondary CPUs > [ 0.002434] ---[ end Kernel panic - not syncing: Attempted to kill the idle > task! ]--- > > The config I used is defconfig + pseudo-NMIs enabled (CONFIG_ARM64_PSEUDO_NMI=y > or, in menuconfig, Kernel features -> Support for NMI-like interrupts). The > compiler that I used: > > $ aarch64-linux-gnu-gcc --version > aarch64-linux-gnu-gcc (GCC) 10.1.0 > Copyright (C) 2020 Free Software Foundation, Inc. > This is free software; see the source for copying conditions. There is NO > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > > The kernel was run on the model. Command line to launch the model: > > $MODEL --application cluster0.*=$PAYLOAD --config-file $CONFIG \ > -C bp.pl011_uart0.out_file=$LOGFILE > > Config file: > > bp.dram_size=8 > bp.pl011_uart0.unbuffered_output=1 > bp.pl011_uart0.untimed_fifos=true > bp.refcounter.non_arch_start_at_default=1 > bp.secure_memory=0 > bp.smsc_91c111.enabled=0 > bp.ve_sysregs.mmbSiteDefault=0 > cache_state_modelled=false > cluster0.NUM_CORES=4 > cluster0.amu_has_external_interface=1 > cluster0.cpu0.CONFIG64=1 > cluster0.cpu0.crypto_sha3=1 > cluster0.cpu0.crypto_sha512=1 > cluster0.cpu0.crypto_sm3=1 > cluster0.cpu0.crypto_sm4=1 > cluster0.cpu0.enable_crc32=1 > cluster0.has_16k_granule=1 > cluster0.has_4k_granule=1 > cluster0.has_64k_granule=1 > cluster0.has_amu=1 > cluster0.has_arm_v8-1=1 > cluster0.has_arm_v8-2=1 > cluster0.has_arm_v8-3=1 > cluster0.has_arm_v8-4=1 > cluster0.has_arm_v8-5=1 > cluster0.has_branch_target_exception=1 > cluster0.has_ccidx=1 > cluster0.has_data_alignment_flag=1 > cluster0.has_fp16=1 > cluster0.has_large_system_ext=1 > cluster0.has_large_va=1 > cluster0.has_ldm_stm_ordering_control=1 > cluster0.has_lrcpc=1 > cluster0.has_mpam=1 > cluster0.has_ras=1 > cluster0.has_ras_double_fault=1 > cluster0.has_rndr=1 > cluster0.has_el3=1 # (bool , init-time) > default = '1' : Implements EL3 > gic_distributor.GITS_BASER0-type=1 > gic_distributor.ITS-count=1 > gic_distributor.ITS-hardware-collection-count=1 > gic_distributor.direct-lpi-support=1 > pctl.startup=0.*.*.* > > I wrapped the kernel with bootwrapper, with this patch on top to clear SCR_EL3.FIQ > and make pseudo-NMIs work: > > diff --git a/arch/aarch64/boot.S b/arch/aarch64/boot.S > index 74705cded338..21e72f336057 100644 > --- a/arch/aarch64/boot.S > +++ b/arch/aarch64/boot.S > @@ -51,6 +51,7 @@ _start: > #ifndef KERNEL_32 > orr x0, x0, #(1 << 10) // 64-bit EL2 > #endif > + orr x0, x0, #(1 << 2) // Route FIQs to EL3 > msr scr_el3, x0 > > msr cptr_el3, xzr // Disable copro. traps to EL3 > diff --git a/arch/aarch64/include/asm/gic-v3.h b/arch/aarch64/include/asm/gic-v3.h > index e743c02ffe5e..ea7862745872 100644 > --- a/arch/aarch64/include/asm/gic-v3.h > +++ b/arch/aarch64/include/asm/gic-v3.h > @@ -32,4 +32,9 @@ static inline void gic_write_icc_ctlr(uint32_t val) > asm volatile ("msr " ICC_CTLR_EL3 ", %0" : : "r" (val)); > } > > +static inline void gic_write_icc_pmr(uint64_t val) > +{ > + asm volatile ("msr " ICC_PMR_EL1 ", %0" : : "r" (val)); > +} > + > #endif > diff --git a/gic-v3.c b/gic-v3.c > index c2f66bae4c35..5aeb51a1a7be 100644 > --- a/gic-v3.c > +++ b/gic-v3.c > @@ -122,6 +122,9 @@ void gic_secure_init(void) > gic_write_icc_sre(sre); > isb(); > > + gic_write_icc_pmr(0xff); > + isb(); > + > gic_write_icc_ctlr(0); > isb(); > } > > Bootwrapper command line: > > ./configure \ > --host=aarch64-linux-gnu \ > --with-kernel-dir=${HOST_KERNEL_DIR} \ > --with-dtb=${HOST_KERNEL_DIR}/arch/arm64/boot/dts/arm/base_gicv3.dtb \ > --with-cmdline="earlycon=pl011,0x1c090000 console=ttyAMA0 > irqchip.gicv3_pseudo_nmi=1" \ > --enable-gicv3 \ > --enable-psci \ > --with-initrd=${HOST_BUILDROOT_DIR}/output/images/rootfs.cpio.gz > > I was also able to reproduce it on my rockpro64, but I was using the pseudo-NMI > series that allows them to be used on the board [1] and a modified dtb to remove > the big cores. I can post the log + whatever else is need to reproduce it, if > anyone thinks that would help. > > [1] https://lists.infradead.org/pipermail/linux-arm-kernel/2020-June/580143.html > > Thanks, > > Alex > > On 6/30/20 9:19 AM, Ard Biesheuvel wrote: >> When building very large kernels, the logic that emits replacement >> sequences for alternatives fails when relative branches are present >> in the code that is emitted into the .altinstr_replacement section >> and patched in at the original site and fixed up. The reason is that >> the linker will insert veneers if relative branches go out of range, >> and due to the relative distance of the .altinstr_replacement from >> the .text section where its branch targets usually live, veneers >> may be emitted at the end of the .altinstr_replacement section, with >> the relative branches in the sequence pointed at the veneers instead >> of the actual target. >> >> The alternatives patching logic will attempt to fix up the branch to >> point to its original target, which will be the veneer in this case, >> but given that the patch site is likely to be far away as well, it >> will be out of range and so patching will fail. There are other cases >> where these veneers are problematic, e.g., when the target of the >> branch is in .text while the patch site is in .init.text, in which >> case putting the replacement sequence inside .text may not help either. >> >> So let's use subsections to emit the replacement code as closely as >> possible to the patch site, to ensure that veneers are only likely to >> be emitted if they are required at the patch site as well, in which >> case they will be in range for the replacement sequence both before >> and after it is transported to the patch site. >> >> This will prevent alternative sequences in non-init code from being >> released from memory after boot, but this is tolerable given that the >> entire section is only 512 KB on an allyesconfig build (which weighs in >> at 500+ MB for the entire Image). Also, note that modules today carry >> the replacement sequences in non-init sections as well, and any of >> those that target init code will be emitted into init sections after >> this change. >> >> This fixes an early crash when booting an allyesconfig kernel on a >> system where any of the alternatives sequences containing relative >> branches are activated at boot (e.g., ARM64_HAS_PAN on TX2) >> >> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> >> Cc: James Morse <james.morse@arm.com> >> Cc: Andre Przywara <andre.przywara@arm.com> >> Cc: Dave P Martin <dave.martin@arm.com> >> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> >> --- >> arch/arm64/include/asm/alternative.h | 16 ++++++++-------- >> arch/arm64/kernel/vmlinux.lds.S | 3 --- >> 2 files changed, 8 insertions(+), 11 deletions(-) >> >> diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h >> index 5e5dc05d63a0..12f0eb56a1cc 100644 >> --- a/arch/arm64/include/asm/alternative.h >> +++ b/arch/arm64/include/asm/alternative.h >> @@ -73,11 +73,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { } >> ".pushsection .altinstructions,\"a\"\n" \ >> ALTINSTR_ENTRY(feature) \ >> ".popsection\n" \ >> - ".pushsection .altinstr_replacement, \"a\"\n" \ >> + ".subsection 1\n" \ >> "663:\n\t" \ >> newinstr "\n" \ >> "664:\n\t" \ >> - ".popsection\n\t" \ >> + ".previous\n\t" \ >> ".org . - (664b-663b) + (662b-661b)\n\t" \ >> ".org . - (662b-661b) + (664b-663b)\n" \ >> ".endif\n" >> @@ -117,9 +117,9 @@ static inline void apply_alternatives_module(void *start, size_t length) { } >> 662: .pushsection .altinstructions, "a" >> altinstruction_entry 661b, 663f, \cap, 662b-661b, 664f-663f >> .popsection >> - .pushsection .altinstr_replacement, "ax" >> + .subsection 1 >> 663: \insn2 >> -664: .popsection >> +664: .previous >> .org . - (664b-663b) + (662b-661b) >> .org . - (662b-661b) + (664b-663b) >> .endif >> @@ -160,7 +160,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { } >> .pushsection .altinstructions, "a" >> altinstruction_entry 663f, 661f, \cap, 664f-663f, 662f-661f >> .popsection >> - .pushsection .altinstr_replacement, "ax" >> + .subsection 1 >> .align 2 /* So GAS knows label 661 is suitably aligned */ >> 661: >> .endm >> @@ -179,9 +179,9 @@ static inline void apply_alternatives_module(void *start, size_t length) { } >> .macro alternative_else >> 662: >> .if .Lasm_alt_mode==0 >> - .pushsection .altinstr_replacement, "ax" >> + .subsection 1 >> .else >> - .popsection >> + .previous >> .endif >> 663: >> .endm >> @@ -192,7 +192,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { } >> .macro alternative_endif >> 664: >> .if .Lasm_alt_mode==0 >> - .popsection >> + .previous >> .endif >> .org . - (664b-663b) + (662b-661b) >> .org . - (662b-661b) + (664b-663b) >> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S >> index 6827da7f3aa5..5423ffe0a987 100644 >> --- a/arch/arm64/kernel/vmlinux.lds.S >> +++ b/arch/arm64/kernel/vmlinux.lds.S >> @@ -165,9 +165,6 @@ SECTIONS >> *(.altinstructions) >> __alt_instructions_end = .; >> } >> - .altinstr_replacement : { >> - *(.altinstr_replacement) >> - } >> >> . = ALIGN(SEGMENT_ALIGN); >> __inittext_end = .; > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Thu, 9 Jul 2020 at 14:11, Alexandru Elisei <alexandru.elisei@arm.com> wrote: > > Hi, > Hi Alex, Apologies for the breakage. > I should post the entire boot log: ... > [ 0.002204] pc : work_pending+0x32c/0x348 This suggests that you ended executing directly from the alternatives replacement section that gets appended to the end of work_pending (and therefore gets mistaken for being part of it) It appears that the following code in alternatives.c static bool branch_insn_requires_update(struct alt_instr *alt, unsigned long pc) { unsigned long replptr; if (kernel_text_address(pc)) return true; returns true inadvertently for the branch in this piece of code in entry.S alternative_if ARM64_HAS_IRQ_PRIO_MASKING ldr x20, [sp, #S_PMR_SAVE] msr_s SYS_ICC_PMR_EL1, x20 mrs_s x21, SYS_ICC_CTLR_EL1 tbz x21, #6, .L__skip_pmr_sync\@ // Check for ICC_CTLR_EL1.PMHE dsb sy // Ensure priority change is seen by redistributor .L__skip_pmr_sync\@: due to the fact that kernel_text_address() has no way of distinguishing branches inside the subsection from branches that require updating. So the alternatives patching code dutifully updates the tbz opcode and points it to its original target in the subsection. This is going to be rather tricky to fix, unless we special case tbz/cbz branches and other branches with limited range that would never have worked before anyway. For now, better to just revert it and revisit it later. > [ 0.002210] lr : el1_irq+0xcc/0x180 > [ 0.002214] sp : ffffd7c5dced3d90 > [ 0.002220] pmr_save: 000000f0 > [ 0.002224] x29: ffffd7c5dced3ec0 x28: ffffd7c5dcee2e00 > [ 0.002231] x27: ffffd7c5dcee2e00 x26: ffff2842a2ce9000 > [ 0.002242] x25: ffff800010000000 x24: 0000000000000001 > [ 0.002244] x23: 3c2dd225ee77b65f x22: 2576d4c31e698712 > [ 0.002254] x21: 0000000000000402 x20: 00000000000000f0 > [ 0.002264] x19: ffffd7c5dced3d90 x18: 0000000000000020 > [ 0.002274] x17: 0000000066f2d860 x16: 0000000000000001 > [ 0.002274] x15: 0000000000000000 x14: 0000000000000000 > [ 0.002284] x13: 00000000fa83b2da x12: 0000000000000000 > [ 0.002295] x11: 0000000000000000 x10: 3c2dd225ee77b65f > [ 0.002306] x9 : 2576d4c31e698712 x8 : ffffd7c5dcee3bd8 > [ 0.002314] x7 : 000000000000ba5e x6 : 00000000fffedb08 > [ 0.002316] x5 : 00000000ffffffff x4 : ffff2842a2ce9000 > [ 0.002327] x3 : 00000000000000f0 x2 : 00000000000000f0 > [ 0.002338] x1 : ffff2842a2ce9000 x0 : 00000000000000f0 > [ 0.002344] Call trace: > [ 0.002348] work_pending+0x32c/0x348 > [ 0.002354] do_idle+0x230/0x30c > [ 0.002364] cpu_startup_entry+0x24/0x80 > [ 0.002370] rest_init+0xd8/0xe8 > [ 0.002380] arch_call_rest_init+0x10/0x1c > [ 0.002384] start_kernel+0x4b8/0x4f0 > [ 0.002394] Code: d5033f9f d518d03f d503201f d503201f (a9440801) > [ 0.002404] ---[ end trace 08b34e49c88e0252 ]--- > [ 0.002413] Kernel panic - not syncing: Attempted to kill the idle task! > [ 0.002414] SMP: stopping secondary CPUs > [ 0.002434] ---[ end Kernel panic - not syncing: Attempted to kill the idle > task! ]--- > > If it helps, I tried it with caches enabled and the logs were identical (checked > with diff). > > Thanks, > > Alex > > On 7/9/20 11:57 AM, Alexandru Elisei wrote: > > Hi, > > > > I think this patch breaks pseudo-NMIs. I bisected the following splat to this commit: > > > > [ 0.002103] Unable to handle kernel NULL pointer dereference at virtual address > > 0000000000000130 > > [ 0.002124] Mem abort info: > > [ 0.002124] ESR = 0x96000004 > > [ 0.002135] EC = 0x25: DABT (current EL), IL = 32 bits > > [ 0.002135] SET = 0, FnV = 0 > > [ 0.002146] EA = 0, S1PTW = 0 > > [ 0.002146] Data abort info: > > [ 0.002154] ISV = 0, ISS = 0x00000004 > > [ 0.002156] CM = 0, WnR = 0 > > [ 0.002164] [0000000000000130] user address but active_mm is swapper > > [ 0.002167] Internal error: Oops: 96000004 [#1] PREEMPT SMP > > [ 0.002174] Modules linked in: > > [ 0.002184] CPU: 0 PID: 0 Comm: swapper/0 Not tainted > > 5.8.0-rc1-00024-gf7b93d42945c-dirty #135 > > [ 0.002188] Hardware name: FVP Base (DT) > > [ 0.002194] pstate: 000003c9 (nzcv DAIF -PAN -UAO BTYPE=--) > > [ 0.002204] pc : work_pending+0x32c/0x348 > > [ 0.002210] lr : el1_irq+0xcc/0x180 > > [ 0.002214] sp : ffffd7c5dced3d90 > > [ 0.002220] pmr_save: 000000f0 > > [ 0.002224] x29: ffffd7c5dced3ec0 x28: ffffd7c5dcee2e00 > > [ 0.002231] x27: ffffd7c5dcee2e00 x26: ffff2842a2ce9000 > > [ 0.002242] x25: ffff800010000000 x24: 0000000000000001 > > [ 0.002244] x23: 3c2dd225ee77b65f x22: 2576d4c31e698712 > > [ 0.002254] x21: 0000000000000402 x20: 00000000000000f0 > > [ 0.002264] x19: ffffd7c5dced3d90 x18: 0000000000000020 > > [ 0.002274] x17: 0000000066f2d860 x16: 0000000000000001 > > [ 0.002274] x15: 0000000000000000 x14: 0000000000000000 > > [ 0.002284] x13: 00000000fa83b2da x12: 0000000000000000 > > [ 0.002295] x11: 0000000000000000 x10: 3c2dd225ee77b65f > > [ 0.002306] x9 : 2576d4c31e698712 x8 : ffffd7c5dcee3bd8 > > [ 0.002314] x7 : 000000000000ba5e x6 : 00000000fffedb08 > > [ 0.002316] x5 : 00000000ffffffff x4 : ffff2842a2ce9000 > > [ 0.002327] x3 : 00000000000000f0 x2 : 00000000000000f0 > > [ 0.002338] x1 : ffff2842a2ce9000 x0 : 00000000000000f0 > > [ 0.002344] Call trace: > > [ 0.002348] work_pending+0x32c/0x348 > > [ 0.002354] do_idle+0x230/0x30c > > [ 0.002364] cpu_startup_entry+0x24/0x80 > > [ 0.002370] rest_init+0xd8/0xe8 > > [ 0.002380] arch_call_rest_init+0x10/0x1c > > [ 0.002384] start_kernel+0x4b8/0x4f0 > > [ 0.002394] Code: d5033f9f d518d03f d503201f d503201f (a9440801) > > [ 0.002404] ---[ end trace 08b34e49c88e0252 ]--- > > [ 0.002413] Kernel panic - not syncing: Attempted to kill the idle task! > > [ 0.002414] SMP: stopping secondary CPUs > > [ 0.002434] ---[ end Kernel panic - not syncing: Attempted to kill the idle > > task! ]--- > > > > The config I used is defconfig + pseudo-NMIs enabled (CONFIG_ARM64_PSEUDO_NMI=y > > or, in menuconfig, Kernel features -> Support for NMI-like interrupts). The > > compiler that I used: > > > > $ aarch64-linux-gnu-gcc --version > > aarch64-linux-gnu-gcc (GCC) 10.1.0 > > Copyright (C) 2020 Free Software Foundation, Inc. > > This is free software; see the source for copying conditions. There is NO > > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > > > > The kernel was run on the model. Command line to launch the model: > > > > $MODEL --application cluster0.*=$PAYLOAD --config-file $CONFIG \ > > -C bp.pl011_uart0.out_file=$LOGFILE > > > > Config file: > > > > bp.dram_size=8 > > bp.pl011_uart0.unbuffered_output=1 > > bp.pl011_uart0.untimed_fifos=true > > bp.refcounter.non_arch_start_at_default=1 > > bp.secure_memory=0 > > bp.smsc_91c111.enabled=0 > > bp.ve_sysregs.mmbSiteDefault=0 > > cache_state_modelled=false > > cluster0.NUM_CORES=4 > > cluster0.amu_has_external_interface=1 > > cluster0.cpu0.CONFIG64=1 > > cluster0.cpu0.crypto_sha3=1 > > cluster0.cpu0.crypto_sha512=1 > > cluster0.cpu0.crypto_sm3=1 > > cluster0.cpu0.crypto_sm4=1 > > cluster0.cpu0.enable_crc32=1 > > cluster0.has_16k_granule=1 > > cluster0.has_4k_granule=1 > > cluster0.has_64k_granule=1 > > cluster0.has_amu=1 > > cluster0.has_arm_v8-1=1 > > cluster0.has_arm_v8-2=1 > > cluster0.has_arm_v8-3=1 > > cluster0.has_arm_v8-4=1 > > cluster0.has_arm_v8-5=1 > > cluster0.has_branch_target_exception=1 > > cluster0.has_ccidx=1 > > cluster0.has_data_alignment_flag=1 > > cluster0.has_fp16=1 > > cluster0.has_large_system_ext=1 > > cluster0.has_large_va=1 > > cluster0.has_ldm_stm_ordering_control=1 > > cluster0.has_lrcpc=1 > > cluster0.has_mpam=1 > > cluster0.has_ras=1 > > cluster0.has_ras_double_fault=1 > > cluster0.has_rndr=1 > > cluster0.has_el3=1 # (bool , init-time) > > default = '1' : Implements EL3 > > gic_distributor.GITS_BASER0-type=1 > > gic_distributor.ITS-count=1 > > gic_distributor.ITS-hardware-collection-count=1 > > gic_distributor.direct-lpi-support=1 > > pctl.startup=0.*.*.* > > > > I wrapped the kernel with bootwrapper, with this patch on top to clear SCR_EL3.FIQ > > and make pseudo-NMIs work: > > > > diff --git a/arch/aarch64/boot.S b/arch/aarch64/boot.S > > index 74705cded338..21e72f336057 100644 > > --- a/arch/aarch64/boot.S > > +++ b/arch/aarch64/boot.S > > @@ -51,6 +51,7 @@ _start: > > #ifndef KERNEL_32 > > orr x0, x0, #(1 << 10) // 64-bit EL2 > > #endif > > + orr x0, x0, #(1 << 2) // Route FIQs to EL3 > > msr scr_el3, x0 > > > > msr cptr_el3, xzr // Disable copro. traps to EL3 > > diff --git a/arch/aarch64/include/asm/gic-v3.h b/arch/aarch64/include/asm/gic-v3.h > > index e743c02ffe5e..ea7862745872 100644 > > --- a/arch/aarch64/include/asm/gic-v3.h > > +++ b/arch/aarch64/include/asm/gic-v3.h > > @@ -32,4 +32,9 @@ static inline void gic_write_icc_ctlr(uint32_t val) > > asm volatile ("msr " ICC_CTLR_EL3 ", %0" : : "r" (val)); > > } > > > > +static inline void gic_write_icc_pmr(uint64_t val) > > +{ > > + asm volatile ("msr " ICC_PMR_EL1 ", %0" : : "r" (val)); > > +} > > + > > #endif > > diff --git a/gic-v3.c b/gic-v3.c > > index c2f66bae4c35..5aeb51a1a7be 100644 > > --- a/gic-v3.c > > +++ b/gic-v3.c > > @@ -122,6 +122,9 @@ void gic_secure_init(void) > > gic_write_icc_sre(sre); > > isb(); > > > > + gic_write_icc_pmr(0xff); > > + isb(); > > + > > gic_write_icc_ctlr(0); > > isb(); > > } > > > > Bootwrapper command line: > > > > ./configure \ > > --host=aarch64-linux-gnu \ > > --with-kernel-dir=${HOST_KERNEL_DIR} \ > > --with-dtb=${HOST_KERNEL_DIR}/arch/arm64/boot/dts/arm/base_gicv3.dtb \ > > --with-cmdline="earlycon=pl011,0x1c090000 console=ttyAMA0 > > irqchip.gicv3_pseudo_nmi=1" \ > > --enable-gicv3 \ > > --enable-psci \ > > --with-initrd=${HOST_BUILDROOT_DIR}/output/images/rootfs.cpio.gz > > > > I was also able to reproduce it on my rockpro64, but I was using the pseudo-NMI > > series that allows them to be used on the board [1] and a modified dtb to remove > > the big cores. I can post the log + whatever else is need to reproduce it, if > > anyone thinks that would help. > > > > [1] https://lists.infradead.org/pipermail/linux-arm-kernel/2020-June/580143.html > > > > Thanks, > > > > Alex > > > > On 6/30/20 9:19 AM, Ard Biesheuvel wrote: > >> When building very large kernels, the logic that emits replacement > >> sequences for alternatives fails when relative branches are present > >> in the code that is emitted into the .altinstr_replacement section > >> and patched in at the original site and fixed up. The reason is that > >> the linker will insert veneers if relative branches go out of range, > >> and due to the relative distance of the .altinstr_replacement from > >> the .text section where its branch targets usually live, veneers > >> may be emitted at the end of the .altinstr_replacement section, with > >> the relative branches in the sequence pointed at the veneers instead > >> of the actual target. > >> > >> The alternatives patching logic will attempt to fix up the branch to > >> point to its original target, which will be the veneer in this case, > >> but given that the patch site is likely to be far away as well, it > >> will be out of range and so patching will fail. There are other cases > >> where these veneers are problematic, e.g., when the target of the > >> branch is in .text while the patch site is in .init.text, in which > >> case putting the replacement sequence inside .text may not help either. > >> > >> So let's use subsections to emit the replacement code as closely as > >> possible to the patch site, to ensure that veneers are only likely to > >> be emitted if they are required at the patch site as well, in which > >> case they will be in range for the replacement sequence both before > >> and after it is transported to the patch site. > >> > >> This will prevent alternative sequences in non-init code from being > >> released from memory after boot, but this is tolerable given that the > >> entire section is only 512 KB on an allyesconfig build (which weighs in > >> at 500+ MB for the entire Image). Also, note that modules today carry > >> the replacement sequences in non-init sections as well, and any of > >> those that target init code will be emitted into init sections after > >> this change. > >> > >> This fixes an early crash when booting an allyesconfig kernel on a > >> system where any of the alternatives sequences containing relative > >> branches are activated at boot (e.g., ARM64_HAS_PAN on TX2) > >> > >> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > >> Cc: James Morse <james.morse@arm.com> > >> Cc: Andre Przywara <andre.przywara@arm.com> > >> Cc: Dave P Martin <dave.martin@arm.com> > >> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > >> --- > >> arch/arm64/include/asm/alternative.h | 16 ++++++++-------- > >> arch/arm64/kernel/vmlinux.lds.S | 3 --- > >> 2 files changed, 8 insertions(+), 11 deletions(-) > >> > >> diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h > >> index 5e5dc05d63a0..12f0eb56a1cc 100644 > >> --- a/arch/arm64/include/asm/alternative.h > >> +++ b/arch/arm64/include/asm/alternative.h > >> @@ -73,11 +73,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > >> ".pushsection .altinstructions,\"a\"\n" \ > >> ALTINSTR_ENTRY(feature) \ > >> ".popsection\n" \ > >> - ".pushsection .altinstr_replacement, \"a\"\n" \ > >> + ".subsection 1\n" \ > >> "663:\n\t" \ > >> newinstr "\n" \ > >> "664:\n\t" \ > >> - ".popsection\n\t" \ > >> + ".previous\n\t" \ > >> ".org . - (664b-663b) + (662b-661b)\n\t" \ > >> ".org . - (662b-661b) + (664b-663b)\n" \ > >> ".endif\n" > >> @@ -117,9 +117,9 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > >> 662: .pushsection .altinstructions, "a" > >> altinstruction_entry 661b, 663f, \cap, 662b-661b, 664f-663f > >> .popsection > >> - .pushsection .altinstr_replacement, "ax" > >> + .subsection 1 > >> 663: \insn2 > >> -664: .popsection > >> +664: .previous > >> .org . - (664b-663b) + (662b-661b) > >> .org . - (662b-661b) + (664b-663b) > >> .endif > >> @@ -160,7 +160,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > >> .pushsection .altinstructions, "a" > >> altinstruction_entry 663f, 661f, \cap, 664f-663f, 662f-661f > >> .popsection > >> - .pushsection .altinstr_replacement, "ax" > >> + .subsection 1 > >> .align 2 /* So GAS knows label 661 is suitably aligned */ > >> 661: > >> .endm > >> @@ -179,9 +179,9 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > >> .macro alternative_else > >> 662: > >> .if .Lasm_alt_mode==0 > >> - .pushsection .altinstr_replacement, "ax" > >> + .subsection 1 > >> .else > >> - .popsection > >> + .previous > >> .endif > >> 663: > >> .endm > >> @@ -192,7 +192,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > >> .macro alternative_endif > >> 664: > >> .if .Lasm_alt_mode==0 > >> - .popsection > >> + .previous > >> .endif > >> .org . - (664b-663b) + (662b-661b) > >> .org . - (662b-661b) + (664b-663b) > >> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S > >> index 6827da7f3aa5..5423ffe0a987 100644 > >> --- a/arch/arm64/kernel/vmlinux.lds.S > >> +++ b/arch/arm64/kernel/vmlinux.lds.S > >> @@ -165,9 +165,6 @@ SECTIONS > >> *(.altinstructions) > >> __alt_instructions_end = .; > >> } > >> - .altinstr_replacement : { > >> - *(.altinstr_replacement) > >> - } > >> > >> . = ALIGN(SEGMENT_ALIGN); > >> __inittext_end = .; > > _______________________________________________ > > linux-arm-kernel mailing list > > linux-arm-kernel@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Thu, 9 Jul 2020 at 15:31, Ard Biesheuvel <ardb@kernel.org> wrote: > > On Thu, 9 Jul 2020 at 14:11, Alexandru Elisei <alexandru.elisei@arm.com> wrote: > > > > Hi, > > > > Hi Alex, > > Apologies for the breakage. > > > > I should post the entire boot log: > ... > > [ 0.002204] pc : work_pending+0x32c/0x348 > > This suggests that you ended executing directly from the alternatives > replacement section that gets appended to the end of work_pending (and > therefore gets mistaken for being part of it) > > It appears that the following code in alternatives.c > > static bool branch_insn_requires_update(struct alt_instr *alt, unsigned long pc) > { > unsigned long replptr; > > if (kernel_text_address(pc)) > return true; > > returns true inadvertently for the branch in this piece of code in entry.S > > alternative_if ARM64_HAS_IRQ_PRIO_MASKING > ldr x20, [sp, #S_PMR_SAVE] > msr_s SYS_ICC_PMR_EL1, x20 > mrs_s x21, SYS_ICC_CTLR_EL1 > tbz x21, #6, .L__skip_pmr_sync\@ // Check for ICC_CTLR_EL1.PMHE > dsb sy // Ensure priority change is seen by redistributor > .L__skip_pmr_sync\@: > > > due to the fact that kernel_text_address() has no way of > distinguishing branches inside the subsection from branches that > require updating. So the alternatives patching code dutifully updates > the tbz opcode and points it to its original target in the subsection. > > This is going to be rather tricky to fix, unless we special case > tbz/cbz branches and other branches with limited range that would > never have worked before anyway. > > For now, better to just revert it and revisit it later. > ... unless we decide to fix up all branches pointing outside the replacement sequence, which is not an entirely unreasonable thing to do: diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c index d1757ef1b1e7..7c205f9202a3 100644 --- a/arch/arm64/kernel/alternative.c +++ b/arch/arm64/kernel/alternative.c @@ -45,18 +45,11 @@ { unsigned long replptr; - if (kernel_text_address(pc)) - return true; - replptr = (unsigned long)ALT_REPL_PTR(alt); if (pc >= replptr && pc <= (replptr + alt->alt_len)) return false; - /* - * Branching into *another* alternate sequence is doomed, and - * we're not even trying to fix it up. - */ - BUG(); + return true; }
On Thu, Jul 09, 2020 at 03:39:53PM +0300, Ard Biesheuvel wrote: > On Thu, 9 Jul 2020 at 15:31, Ard Biesheuvel <ardb@kernel.org> wrote: > > It appears that the following code in alternatives.c > > > > static bool branch_insn_requires_update(struct alt_instr *alt, unsigned long pc) > > { > > unsigned long replptr; > > > > if (kernel_text_address(pc)) > > return true; > > > > returns true inadvertently for the branch in this piece of code in entry.S > > > > alternative_if ARM64_HAS_IRQ_PRIO_MASKING > > ldr x20, [sp, #S_PMR_SAVE] > > msr_s SYS_ICC_PMR_EL1, x20 > > mrs_s x21, SYS_ICC_CTLR_EL1 > > tbz x21, #6, .L__skip_pmr_sync\@ // Check for ICC_CTLR_EL1.PMHE > > dsb sy // Ensure priority change is seen by redistributor > > .L__skip_pmr_sync\@: > > > > > > due to the fact that kernel_text_address() has no way of > > distinguishing branches inside the subsection from branches that > > require updating. So the alternatives patching code dutifully updates > > the tbz opcode and points it to its original target in the subsection. > > > > This is going to be rather tricky to fix, unless we special case > > tbz/cbz branches and other branches with limited range that would > > never have worked before anyway. > > > > For now, better to just revert it and revisit it later. > > > > ... unless we decide to fix up all branches pointing outside the > replacement sequence, which is not an entirely unreasonable thing to > do: > > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c > index d1757ef1b1e7..7c205f9202a3 100644 > --- a/arch/arm64/kernel/alternative.c > +++ b/arch/arm64/kernel/alternative.c > @@ -45,18 +45,11 @@ > { > unsigned long replptr; > > - if (kernel_text_address(pc)) > - return true; > - > replptr = (unsigned long)ALT_REPL_PTR(alt); > if (pc >= replptr && pc <= (replptr + alt->alt_len)) > return false; > > - /* > - * Branching into *another* alternate sequence is doomed, and > - * we're not even trying to fix it up. > - */ > - BUG(); > + return true; That looks better than the revert to me. Alex -- can you give it a spin with your setup, please? Will
Hi Ard, Thank you so much for your quick reply! On 7/9/20 1:39 PM, Ard Biesheuvel wrote: > On Thu, 9 Jul 2020 at 15:31, Ard Biesheuvel <ardb@kernel.org> wrote: >> On Thu, 9 Jul 2020 at 14:11, Alexandru Elisei <alexandru.elisei@arm.com> wrote: >>> Hi, >>> >> Hi Alex, >> >> Apologies for the breakage. >> >> >>> I should post the entire boot log: >> ... >>> [ 0.002204] pc : work_pending+0x32c/0x348 >> This suggests that you ended executing directly from the alternatives >> replacement section that gets appended to the end of work_pending (and >> therefore gets mistaken for being part of it) >> >> It appears that the following code in alternatives.c >> >> static bool branch_insn_requires_update(struct alt_instr *alt, unsigned long pc) >> { >> unsigned long replptr; >> >> if (kernel_text_address(pc)) >> return true; >> >> returns true inadvertently for the branch in this piece of code in entry.S >> >> alternative_if ARM64_HAS_IRQ_PRIO_MASKING >> ldr x20, [sp, #S_PMR_SAVE] >> msr_s SYS_ICC_PMR_EL1, x20 >> mrs_s x21, SYS_ICC_CTLR_EL1 >> tbz x21, #6, .L__skip_pmr_sync\@ // Check for ICC_CTLR_EL1.PMHE >> dsb sy // Ensure priority change is seen by redistributor >> .L__skip_pmr_sync\@: >> >> >> due to the fact that kernel_text_address() has no way of >> distinguishing branches inside the subsection from branches that >> require updating. So the alternatives patching code dutifully updates >> the tbz opcode and points it to its original target in the subsection. >> >> This is going to be rather tricky to fix, unless we special case >> tbz/cbz branches and other branches with limited range that would >> never have worked before anyway. >> >> For now, better to just revert it and revisit it later. >> > ... unless we decide to fix up all branches pointing outside the > replacement sequence, which is not an entirely unreasonable thing to > do: > > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c > index d1757ef1b1e7..7c205f9202a3 100644 > --- a/arch/arm64/kernel/alternative.c > +++ b/arch/arm64/kernel/alternative.c > @@ -45,18 +45,11 @@ > { > unsigned long replptr; > > - if (kernel_text_address(pc)) > - return true; > - > replptr = (unsigned long)ALT_REPL_PTR(alt); > if (pc >= replptr && pc <= (replptr + alt->alt_len)) > return false; > > - /* > - * Branching into *another* alternate sequence is doomed, and > - * we're not even trying to fix it up. > - */ > - BUG(); > + return true; > } Both fixes work for me. I've been running some tests on my rockpro64 with your patch reverted, so that definitely fixes the issue. With the above diff applied, I was able to boot and run some PMU tests using NMIs. Thanks, Alex
diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h index 5e5dc05d63a0..12f0eb56a1cc 100644 --- a/arch/arm64/include/asm/alternative.h +++ b/arch/arm64/include/asm/alternative.h @@ -73,11 +73,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { } ".pushsection .altinstructions,\"a\"\n" \ ALTINSTR_ENTRY(feature) \ ".popsection\n" \ - ".pushsection .altinstr_replacement, \"a\"\n" \ + ".subsection 1\n" \ "663:\n\t" \ newinstr "\n" \ "664:\n\t" \ - ".popsection\n\t" \ + ".previous\n\t" \ ".org . - (664b-663b) + (662b-661b)\n\t" \ ".org . - (662b-661b) + (664b-663b)\n" \ ".endif\n" @@ -117,9 +117,9 @@ static inline void apply_alternatives_module(void *start, size_t length) { } 662: .pushsection .altinstructions, "a" altinstruction_entry 661b, 663f, \cap, 662b-661b, 664f-663f .popsection - .pushsection .altinstr_replacement, "ax" + .subsection 1 663: \insn2 -664: .popsection +664: .previous .org . - (664b-663b) + (662b-661b) .org . - (662b-661b) + (664b-663b) .endif @@ -160,7 +160,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { } .pushsection .altinstructions, "a" altinstruction_entry 663f, 661f, \cap, 664f-663f, 662f-661f .popsection - .pushsection .altinstr_replacement, "ax" + .subsection 1 .align 2 /* So GAS knows label 661 is suitably aligned */ 661: .endm @@ -179,9 +179,9 @@ static inline void apply_alternatives_module(void *start, size_t length) { } .macro alternative_else 662: .if .Lasm_alt_mode==0 - .pushsection .altinstr_replacement, "ax" + .subsection 1 .else - .popsection + .previous .endif 663: .endm @@ -192,7 +192,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { } .macro alternative_endif 664: .if .Lasm_alt_mode==0 - .popsection + .previous .endif .org . - (664b-663b) + (662b-661b) .org . - (662b-661b) + (664b-663b) diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S index 6827da7f3aa5..5423ffe0a987 100644 --- a/arch/arm64/kernel/vmlinux.lds.S +++ b/arch/arm64/kernel/vmlinux.lds.S @@ -165,9 +165,6 @@ SECTIONS *(.altinstructions) __alt_instructions_end = .; } - .altinstr_replacement : { - *(.altinstr_replacement) - } . = ALIGN(SEGMENT_ALIGN); __inittext_end = .;
When building very large kernels, the logic that emits replacement sequences for alternatives fails when relative branches are present in the code that is emitted into the .altinstr_replacement section and patched in at the original site and fixed up. The reason is that the linker will insert veneers if relative branches go out of range, and due to the relative distance of the .altinstr_replacement from the .text section where its branch targets usually live, veneers may be emitted at the end of the .altinstr_replacement section, with the relative branches in the sequence pointed at the veneers instead of the actual target. The alternatives patching logic will attempt to fix up the branch to point to its original target, which will be the veneer in this case, but given that the patch site is likely to be far away as well, it will be out of range and so patching will fail. There are other cases where these veneers are problematic, e.g., when the target of the branch is in .text while the patch site is in .init.text, in which case putting the replacement sequence inside .text may not help either. So let's use subsections to emit the replacement code as closely as possible to the patch site, to ensure that veneers are only likely to be emitted if they are required at the patch site as well, in which case they will be in range for the replacement sequence both before and after it is transported to the patch site. This will prevent alternative sequences in non-init code from being released from memory after boot, but this is tolerable given that the entire section is only 512 KB on an allyesconfig build (which weighs in at 500+ MB for the entire Image). Also, note that modules today carry the replacement sequences in non-init sections as well, and any of those that target init code will be emitted into init sections after this change. This fixes an early crash when booting an allyesconfig kernel on a system where any of the alternatives sequences containing relative branches are activated at boot (e.g., ARM64_HAS_PAN on TX2) Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Dave P Martin <dave.martin@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> --- arch/arm64/include/asm/alternative.h | 16 ++++++++-------- arch/arm64/kernel/vmlinux.lds.S | 3 --- 2 files changed, 8 insertions(+), 11 deletions(-)