Message ID | 20200519121818.14511-4-will@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | arm64 sigreturn unwinding fixes |
On Tue, May 19, 2020 at 01:18:18PM +0100, Will Deacon wrote:
> Daniel reports that the .cfi_startproc is misplaced for the sigreturn
> trampoline, which causes LLVM's unwinder to misbehave:
>
> | I run into this with LLVM’s unwinder.
> | This combination was always broken.
>
> This prompted Dave to realise that our CFI directives are contradictory,
> as we specify both .cfi_signal_frame *and* .cfi_def_cfa, with the latter
> unconditionally identifying the interrupted context as opposed to the
> values in the sigcontext.
>
> Rework the CFI directives so that we only use .cfi_signal_frame, and
> include the "mysterious NOP" as part of the .cfi_{start,end}proc block.
>
> Cc: Tamas Zsoldos <tamas.zsoldos@arm.com>
> Reported-by: Dave Martin <dave.martin@arm.com>
> Reported-by: Daniel Kiss <daniel.kiss@arm.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kernel/vdso/sigreturn.S | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kernel/vdso/sigreturn.S b/arch/arm64/kernel/vdso/sigreturn.S
> index 7853fa9692f6..28b33f7d0604 100644
> --- a/arch/arm64/kernel/vdso/sigreturn.S
> +++ b/arch/arm64/kernel/vdso/sigreturn.S
> @@ -14,6 +14,9 @@
>
>  	.text
>
> +/* Ensure that the mysterious NOP can be associated with a function. */
> +	.cfi_startproc
> +	.cfi_signal_frame
>  /*
>   * This mysterious NOP is required for some unwinders that subtract one from
>   * the return address in order to identify the calling function.
> @@ -28,11 +31,6 @@
>   * is perfectly fine.
>   */
>  SYM_START(__kernel_rt_sigreturn, SYM_L_GLOBAL, SYM_A_ALIGN)
> -	.cfi_startproc
> -	.cfi_signal_frame
> -	.cfi_def_cfa x29, 0
> -	.cfi_offset x29, 0 * 8
> -	.cfi_offset x30, 1 * 8

Having thought about this again, I think it might be better to stick to
the original version.

If the signal handler is halfway through mungeing the sigcontext then
backtracing using sigcontext won't be reliable.

In any case, if something in the interrupted code caused the signal,
the backtrace of the old stack is likely to be more useful, and that's
what x29 will give us. If there's no old stack because we blew it
away, that's too bad.

Plus, in the absence of any spec that says exactly what
.cfi_signal_frame means*, we probably don't want to rock the boat.

Cheers
---Dave

[*] assumption, but the arch ABI and Dwarf specs are unlikely to cover
this, and Linux doesn't go in for specs.
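
As an aside, the rule expressed by the three directives under discussion can be modelled in a few lines. This is only a sketch under stated assumptions: `.cfi_def_cfa x29, 0` is read as "CFA = x29", and the two `.cfi_offset` lines as "the caller's x29 and x30 are stored at CFA+0 and CFA+8", i.e. in the frame record. The helper names (`unwind_step`, `backtrace`), the addresses and the stack contents are all invented for illustration and are not taken from the kernel or from any unwinder.

```python
# Toy model of the rule described by:
#   .cfi_def_cfa x29, 0      -> CFA = x29
#   .cfi_offset  x29, 0 * 8  -> caller's x29 is saved at CFA + 0
#   .cfi_offset  x30, 1 * 8  -> caller's x30 is saved at CFA + 8
# Every address and name below is hypothetical.

def unwind_step(memory, x29):
    """Apply the rule once and return (caller_x29, caller_x30)."""
    cfa = x29 + 0
    return memory[cfa + 0 * 8], memory[cfa + 1 * 8]

def backtrace(memory, x29):
    """Follow the chain of frame records until x29 becomes zero."""
    return_addresses = []
    while x29 != 0:
        x29, x30 = unwind_step(memory, x29)
        return_addresses.append(x30)
    return return_addresses

# Hypothetical stack: two frame records, each a (saved x29, saved x30) pair.
stack = {
    0x7000: 0x7100,    # frame record seen from the trampoline: saved x29
    0x7008: 0x400ABC,  # ... and saved x30
    0x7100: 0x0,       # next record: saved x29 == 0 terminates the chain
    0x7108: 0x400123,  # ... and saved x30
}

print([hex(addr) for addr in backtrace(stack, 0x7000)])
```

The point of the model is that this walk reads only the frame records reachable from x29 and never looks at the sigcontext, which is why a handler that is halfway through rewriting the sigcontext would not disturb it.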
On Tue, May 19, 2020 at 02:09:31PM +0100, Dave P Martin wrote:
> On Tue, May 19, 2020 at 01:18:18PM +0100, Will Deacon wrote:
> > [...]
> >  SYM_START(__kernel_rt_sigreturn, SYM_L_GLOBAL, SYM_A_ALIGN)
> > -	.cfi_startproc
> > -	.cfi_signal_frame
> > -	.cfi_def_cfa x29, 0
> > -	.cfi_offset x29, 0 * 8
> > -	.cfi_offset x30, 1 * 8
>
> Having thought about this again, I think it might be better to stick to
> the original version.
>
> If the signal handler is halfway through mungeing the sigcontext then
> backtracing using sigcontext won't be reliable.

I suppose, but then what does .cfi_signal_frame do? I'll see if I can
find something that uses it. The frame record is still sitting on the
stack, so it does feel redundant to say both '.cfi_signal_frame' and
'.cfi_def_cfa' (and other architectures, e.g. riscv don't do this).

But I'm also happy to play it safe if I can stick a comment in here
saying what it does.

> Plus, in the absence of any spec that says exactly what
> .cfi_signal_frame means*, we probably don't want to rock the boat.

The gas docs say:

	"Mark current function as signal trampoline."

which is really informative.

Will
On Tue, May 19, 2020 at 02:39:41PM +0100, Will Deacon wrote:
> On Tue, May 19, 2020 at 02:09:31PM +0100, Dave P Martin wrote:
> > On Tue, May 19, 2020 at 01:18:18PM +0100, Will Deacon wrote:
> > > [...]
> >
> > Having thought about this again, I think it might be better to stick to
> > the original version.
> >
> > If the signal handler is halfway through mungeing the sigcontext then
> > backtracing using sigcontext won't be reliable.
>
> I suppose, but then what does .cfi_signal_frame do? I'll see if I can
> find something that uses it. The frame record is still sitting on the
> stack, so it does feel redundant to say both '.cfi_signal_frame' and
> '.cfi_def_cfa' (and other architectures, e.g. riscv don't do this).
>
> But I'm also happy to play it safe if I can stick a comment in here
> saying what it does.
>
> > Plus, in the absence of any spec that says exactly what
> > .cfi_signal_frame means*, we probably don't want to rock the boat.
>
> The gas docs say:
>
> 	"Mark current function as signal trampoline."
>
> which is really informative.

Well, we've demonstrated that identifying the signal frame is a gross
bodge. The cfi annotation should provide a reliable way to identify the
signal frame, but I guess it was too poorly specified or came too late
to prevent the bodges from spreading.

Since this seems to be a nonstandard invention, I wouldn't hold out
much hope of finding a usable spec.

Of course, something might be using it now, so I guess we have to leave
it.

---Dave
On Tue, May 19, 2020 at 02:55:37PM +0100, Dave Martin wrote:
> On Tue, May 19, 2020 at 02:39:41PM +0100, Will Deacon wrote:
> > The gas docs say:
> >
> > 	"Mark current function as signal trampoline."
> >
> > which is really informative.
>
> Well, we've demonstrated that identifying the signal frame is a gross
> bodge. The cfi annotation should provide a reliable way to identify the
> signal frame, but I guess it was too poorly specified or came too late
> to prevent the bodges from spreading.
>
> Since this seems to be a nonstandard invention, I wouldn't hold out
> much hope of finding a usable spec.
>
> Of course, something might be using it now, so I guess we have to leave
> it.

I had a quick look at libstdc++ (the horror!) and it looks like the DWARF
backend in there /does/ make use of this information as part of the
_Unwind_GetIPInfo() function:

https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/baselib--unwind-getipinfo.html

*ip_before_insn is set to 1 or 0 depending on whether or not the PC
corresponds to a function annotated with .cfi_signal_frame. So I think the
code in libstdc++-v3/libsupc++/eh_personality.cc doesn't need the mysterious
NOP at all!

Unfortunately, it looks like the LLVM libc++ doesn't use this, and instead
calls _Unwind_GetIP():

https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/baselib--unwind-getip.html

and unconditionally subtracts 1 in libcxxabi/src/cxa_personality.cpp,
meaning that the NOP is necessary.

So, after giving myself a splitting headache, it looks like:

  1. We need the mysterious NOP for LLVM
  2. We could drop .cfi_signal_frame but it's harmless to keep it
  3. We need the .cfi_def_cfa directive to locate the frame record, as
     .cfi_signal_frame doesn't do very much at all.

Make sense? If so, I'll spin a v2 of the patches along with a comment trying
to summarise some of this.

Cheers,

Will
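
The difference between the two lookup strategies described in the message above can be illustrated with a small model. This is a sketch only, following the description given there rather than the actual libstdc++ or libc++abi sources: the `Fde` record, the address layout, `other_function` and all helper names are hypothetical, and real unwinders decide the signal-frame flag in a more involved way.

```python
from typing import NamedTuple, Optional, Sequence

class Fde(NamedTuple):
    """Hypothetical stand-in for one .cfi_startproc/.cfi_endproc block."""
    name: str
    start: int           # first address covered
    end: int             # one past the last address covered
    signal_frame: bool   # True if annotated with .cfi_signal_frame

def find_fde(fdes: Sequence[Fde], addr: int) -> Optional[Fde]:
    for fde in fdes:
        if fde.start <= addr < fde.end:
            return fde
    return None

def lookup_addr_getipinfo(fdes: Sequence[Fde], ip: int) -> int:
    """_Unwind_GetIPInfo()-style: skip the adjustment for signal frames."""
    fde = find_fde(fdes, ip)
    ip_before_insn = fde is not None and fde.signal_frame
    return ip if ip_before_insn else ip - 1

def lookup_addr_getip(ip: int) -> int:
    """_Unwind_GetIP()-style as described above: always subtract 1."""
    return ip - 1

def resolve(fdes: Sequence[Fde], addr: int) -> Optional[str]:
    fde = find_fde(fdes, addr)
    return fde.name if fde else None

TRAMPOLINE = 0x1004    # invented address of __kernel_rt_sigreturn
NOP = TRAMPOLINE - 4   # the "mysterious NOP" placed just before it
OTHER = Fde("other_function", 0x0F00, 0x1000, False)

# CFI block starting at the symbol versus starting at the NOP.
cfi_from_symbol = [OTHER, Fde("__kernel_rt_sigreturn", TRAMPOLINE, TRAMPOLINE + 8, True)]
cfi_from_nop = [OTHER, Fde("__kernel_rt_sigreturn", NOP, TRAMPOLINE + 8, True)]

return_address = TRAMPOLINE  # the handler returns to the first instruction
for label, fdes in (("CFI from symbol", cfi_from_symbol), ("CFI from NOP", cfi_from_nop)):
    print(label,
          "| GetIP:", resolve(fdes, lookup_addr_getip(return_address)),
          "| GetIPInfo:", resolve(fdes, lookup_addr_getipinfo(fdes, return_address)))
```

In this model the always-subtract strategy only finds the trampoline when the CFI block also covers the NOP, whereas the flag-aware strategy finds it in both layouts, which matches points 1 and 2 of the summary.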
> On 19 May 2020, at 15:55, Dave Martin <Dave.Martin@arm.com> wrote:
>
> On Tue, May 19, 2020 at 02:39:41PM +0100, Will Deacon wrote:
>> On Tue, May 19, 2020 at 02:09:31PM +0100, Dave P Martin wrote:
>>> On Tue, May 19, 2020 at 01:18:18PM +0100, Will Deacon wrote:
>>>> [...]
>>>> SYM_START(__kernel_rt_sigreturn, SYM_L_GLOBAL, SYM_A_ALIGN)
>>>> -	.cfi_startproc
>>>> -	.cfi_signal_frame
>>>> -	.cfi_def_cfa x29, 0
>>>> -	.cfi_offset x29, 0 * 8
>>>> -	.cfi_offset x30, 1 * 8

LLVM’s unwinder does not like this version of the CFI. It needs a bit more
information, the cfi_signal_frame is not used for finding the frame.

>>> Having thought about this again, I think it might be better to stick to
>>> the original version.
>>>
>>> If the signal handler is halfway through mungeing the sigcontext then
>>> backtracing using sigcontext won't be reliable.
>>
>> I suppose, but then what does .cfi_signal_frame do? I'll see if I can
>> find something that uses it. The frame record is still sitting on the
>> stack, so it does feel redundant to say both '.cfi_signal_frame' and
>> '.cfi_def_cfa' (and other architectures, e.g. riscv don't do this).
>>
>> But I'm also happy to play it safe if I can stick a comment in here
>> saying what it does.

Sounds good to me.

> [...]
On Tue, May 19, 2020 at 03:30:57PM +0000, Daniel Kiss wrote:
> > On 19 May 2020, at 15:55, Dave Martin <Dave.Martin@arm.com> wrote:
> > On Tue, May 19, 2020 at 02:39:41PM +0100, Will Deacon wrote:
> >> On Tue, May 19, 2020 at 02:09:31PM +0100, Dave P Martin wrote:
> >>> On Tue, May 19, 2020 at 01:18:18PM +0100, Will Deacon wrote:
> >>>> [...]
> >>>> SYM_START(__kernel_rt_sigreturn, SYM_L_GLOBAL, SYM_A_ALIGN)
> >>>> -	.cfi_startproc
> >>>> -	.cfi_signal_frame
> >>>> -	.cfi_def_cfa x29, 0
> >>>> -	.cfi_offset x29, 0 * 8
> >>>> -	.cfi_offset x30, 1 * 8
>
> LLVM’s unwinder does not like this version of the CFI. It needs a bit more information,
> the cfi_signal_frame is not used for finding the frame.

Thanks, Daniel. That is, at least, aligned with my current understanding of
how this is supposed to work. I'll send out a v2 in a bit.

Will
diff --git a/arch/arm64/kernel/vdso/sigreturn.S b/arch/arm64/kernel/vdso/sigreturn.S
index 7853fa9692f6..28b33f7d0604 100644
--- a/arch/arm64/kernel/vdso/sigreturn.S
+++ b/arch/arm64/kernel/vdso/sigreturn.S
@@ -14,6 +14,9 @@
 
 	.text
 
+/* Ensure that the mysterious NOP can be associated with a function. */
+	.cfi_startproc
+	.cfi_signal_frame
 /*
  * This mysterious NOP is required for some unwinders that subtract one from
  * the return address in order to identify the calling function.
@@ -28,11 +31,6 @@
  * is perfectly fine.
  */
 SYM_START(__kernel_rt_sigreturn, SYM_L_GLOBAL, SYM_A_ALIGN)
-	.cfi_startproc
-	.cfi_signal_frame
-	.cfi_def_cfa x29, 0
-	.cfi_offset x29, 0 * 8
-	.cfi_offset x30, 1 * 8
 	mov	x8, #__NR_rt_sigreturn
 	svc	#0
 	.cfi_endproc
Daniel reports that the .cfi_startproc is misplaced for the sigreturn
trampoline, which causes LLVM's unwinder to misbehave:

  | I run into this with LLVM’s unwinder.
  | This combination was always broken.

This prompted Dave to realise that our CFI directives are contradictory,
as we specify both .cfi_signal_frame *and* .cfi_def_cfa, with the latter
unconditionally identifying the interrupted context as opposed to the
values in the sigcontext.

Rework the CFI directives so that we only use .cfi_signal_frame, and
include the "mysterious NOP" as part of the .cfi_{start,end}proc block.

Cc: Tamas Zsoldos <tamas.zsoldos@arm.com>
Reported-by: Dave Martin <dave.martin@arm.com>
Reported-by: Daniel Kiss <daniel.kiss@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/vdso/sigreturn.S | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)