Message ID | 20220407202518.19780-1-madvenka@linux.microsoft.com (mailing list archive) |
---|---|
Headers | show |
Series | arm64: livepatch: Use DWARF Call Frame Information for frame pointer validation | expand |
On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote: > The solution > ============ > > The goal here is to use the absolute minimum CFI needed to compute the FP at > every instruction address. The unwinder can compute the FP in each frame, > compare the actual FP with the computed one and validate the actual FP. > > Objtool is enhanced to parse the CFI, extract just the rules required, > encode them in compact data structures and create special sections for > the rules. The unwinder uses the special sections to find the rules for > a given instruction address and compute the FP. > > Objtool can be invoked as follows: > > objtool dwarf generate <object-file> Hi Madhaven, This is quite interesting. And it's exactly the kind of crazy idea I can appreciate ;-) Some initial thoughts: 1) I have some concerns about DWARF's reliability, especially considering a) inline asm, b) regular asm, and c) the kernel's tendency to push compilers to their limits. BUT, supplementing the frame pointer unwinding with DWARF, rather than just relying on DWARF alone, does help a LOT. I guess the hope is that cross-checking two "mostly reliable" things against each other (frame pointers and DWARF) will give a reliable result ;-) In a general sense, I've never looked at DWARF's reliability, even for just normal C code. It would be good to have some way of knowing that DWARF looks mostly sane for both GCC and Clang. For example, maybe somehow cross-checking it with objtool's knowledge. And then of course we'd have to hope that it stays bug-free in future compilers. I'd also be somewhat concerned about assembly. Since there's nothing ensuring the unwind hints are valid, and will stay valid over time, I wonder how likely it would be for that to break, and what the implications would be. Most likely I guess it would break silently, but then get caught by the frame pointer cross-checking. So a broken hint might not get noticed for a long time, but at least it (hopefully) wouldn't break reliable unwinding. Also, inline asm can sometimes do stack hacks like "push;do_something;pop" which isn't visible to the toolchain. But again, hopefully the frame pointer checking would fail and mark it unreliable. So I do have some worries about DWARF, but the fact that it's getting "fact checked" by frame pointers might be sufficient. 2) If I understand correctly, objtool is converting parts of DWARF to a new format which can then be read by the kernel. In that case, please don't call it DWARF as that will cause a lot of confusion. There are actually several similarities between your new format and ORC, which is also an objtool-created DWARF alternative. It would be interesting to see if they could be combined somehow. 3) Objtool has become an integral part of x86-64, due to security and performance features and toolchain workarounds. Not *all* of its features require the full "branch validation" which follows all code paths -- and was the hardest part to get right -- but several features *do* need that: stack validation, ORC, uaccess validation, noinstr validation. Objtool has been picking up a lot of steam (and features) lately, with more features currently in active development. And lately there have been renewed patches for porting it to powerpc and arm64 (and rumors of s390). If arm64 ever wants one of those features -- particularly a "branch validation" based feature -- I think it would make more sense to just do the stack validation in objtool, rather than the DWARF supplementation approach. Just to give an idea of what objtool already supports and how useful it has become for x86, here's an excerpt from some documentation I've been working on, since I'm in the middle of rewriting the interface to make it more modular. This is a list of all its current features: Features -------- Objtool has the following features: - Stack unwinding metadata validation -- useful for helping to ensure stack traces are reliable for live patching - ORC unwinder metadata generation -- a faster and more precise alternative to frame pointer based unwinding - Retpoline validation -- ensures that all indirect calls go through retpoline thunks, for Spectre v2 mitigations - Retpoline call site annotation -- annotates all retpoline thunk call sites, enabling the kernel to patch them inline, to prevent "thunk funneling" for both security and performance reasons - Non-instrumentation validation -- validates non-instrumentable ("noinstr") code rules, preventing unexpected instrumentation in low-level C entry code - Static call annotation -- annotates static call sites, enabling the kernel to implement inline static calls, a faster alternative to some indirect branches - Uaccess validation -- validates uaccess rules for a proper safe implementation of Supervisor Mode Access Protection (SMAP) - Straight Line Speculation validation -- validates certain SLS mitigations - Indirect Branch Tracking validation -- validates Intel CET IBT rules to ensure that all functions referenced by function pointers have corresponding ENDBR instructions - Indirect Branch Tracking annotation -- annotates unused ENDBR instruction sites, enabling the kernel to "seal" them (replace them with NOPs) to further harden IBT - Function entry annotation -- annotates function entries, enabling kernel function tracing - Other toolchain hacks which will go unmentioned at this time...
On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote:
> [-- application/octet-stream is unsupported (use 'v' to view this part) --]
Your emails are unreadable :-(
Right; so not having seen the patches due to Madhaven's email being broken, I can perhaps less appreciated the crazy involved. On Thu, Apr 07, 2022 at 05:21:51PM -0700, Josh Poimboeuf wrote: > 2) > > If I understand correctly, objtool is converting parts of DWARF to a new > format which can then be read by the kernel. In that case, please don't > call it DWARF as that will cause a lot of confusion. > > There are actually several similarities between your new format and ORC, > which is also an objtool-created DWARF alternative. It would be > interesting to see if they could be combined somehow. What Josh said; please use/extend ORC. I really don't understand where all this crazy is coming from; why does objtool need to do something radically weird for ARM64? There are existing ARM64 patches for objtool; in fact they have recently been re-posted: https://lkml.kernel.org/r/20220407120141.43801-1-chenzhongjin@huawei.com The only tricky bit seems to be the whole jump-table issue. Using DWARF as input to deal with jump-tables should be possible -- exceedingly overkill, but possible I suppose. Mandating DWARF sucks though, compile times are so much worse with DWARVES on :/ Once objtool can properly follow/validate ARM64 code, it should be fairly straight forward to have it generate ORC data just like it does on x86_64.
On Fri, Apr 08, 2022 at 12:55:11PM +0200, Peter Zijlstra wrote: > On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote: > > > [-- application/octet-stream is unsupported (use 'v' to view this part) --] > > Your emails are unreadable :-( List copy is OK, so perhaps it's due to how Josh bounced them..
On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote: > The solution > ============ > > The goal here is to use the absolute minimum CFI needed to compute the FP at > every instruction address. The unwinder can compute the FP in each frame, > compare the actual FP with the computed one and validate the actual FP. > > Objtool is enhanced to parse the CFI, extract just the rules required, > encode them in compact data structures and create special sections for > the rules. The unwinder uses the special sections to find the rules for > a given instruction address and compute the FP. > > Objtool can be invoked as follows: > > objtool dwarf generate <object-file> > > The version of the DWARF standard supported in this work is version 4. The > official documentation for this version is here: > > https://dwarfstd.org/doc/DWARF4.pdf > > Section 6.4 contains the description of the CFI. The problem is of course that DWARF is only available for compiler generated code and doesn't cover assembly code, of which is there is always lots. I suppose you can go add DWARF annotations to all the assembly, but IIRC those are pretty terrible. We were *REALLY* happy to delete all that nasty from the x86 code. On top of that, AFAIK compilers don't generally consider DWARF generation to be a correctness issue. For them it's debug info and having it be correct is nice but not required. So using it as input for something that's required to be correct, seems unfortunate.
On Fri, Apr 08, 2022 at 01:54:28PM +0200, Peter Zijlstra wrote: > On Fri, Apr 08, 2022 at 12:55:11PM +0200, Peter Zijlstra wrote: > > On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote: > > > > > [-- application/octet-stream is unsupported (use 'v' to view this part) --] > > > > Your emails are unreadable :-( > > List copy is OK, so perhaps it's due to how Josh bounced them.. Corporate email Mimecast fail when I bounced them, sorry :-/
On 4/8/22 05:55, Peter Zijlstra wrote: > On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote: > >> [-- application/octet-stream is unsupported (use 'v' to view this part) --] > > Your emails are unreadable :-( I am not sure why the emails are unreadable. Any suggestions? Should I resend? Please let me know. Sorry about this. Madhavan
On Sun, Apr 10, 2022 at 12:47:46PM -0500, Madhavan T. Venkataraman wrote: > > > On 4/8/22 05:55, Peter Zijlstra wrote: > > On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote: > > > >> [-- application/octet-stream is unsupported (use 'v' to view this part) --] > > > > Your emails are unreadable :-( > > I am not sure why the emails are unreadable. Any suggestions? Should I resend? Please let me know. > Sorry about this. That was actually my (company's) fault when I bounced the patches to Peter.
Hi Josh, On 4/7/22 19:21, Josh Poimboeuf wrote: > On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote: >> The solution >> ============ >> >> The goal here is to use the absolute minimum CFI needed to compute the FP at >> every instruction address. The unwinder can compute the FP in each frame, >> compare the actual FP with the computed one and validate the actual FP. >> >> Objtool is enhanced to parse the CFI, extract just the rules required, >> encode them in compact data structures and create special sections for >> the rules. The unwinder uses the special sections to find the rules for >> a given instruction address and compute the FP. >> >> Objtool can be invoked as follows: >> >> objtool dwarf generate <object-file> > > Hi Madhaven, > > This is quite interesting. And it's exactly the kind of crazy idea I > can appreciate ;-) > A little crazy is a good thing sometimes. > Some initial thoughts: > > > 1) > > I have some concerns about DWARF's reliability, especially considering > a) inline asm, b) regular asm, and c) the kernel's tendency to push > compilers to their limits. > I am thinking of implementing a DWARF verifier to make sure that the DWARF information in .debug_frame is correct. I am still in the process of designing this. I will keep you posted on that. This should address (a) and (c). As for (b), the compiler does not generate any DWARF rules for ASM code. DWARF annotations are a PITA to maintain. So, in my current design regular ASM functions are considered unreliable from an unwind perspective except the places that have unwind hints. Unwind hints are only needed for places that tend to occur frequently in stack traces. So, it would be just a handful of unwind hints which can be maintained. As you know, ASM functions come in two flavors - SYM_CODE() functions and SYM_FUNC() functions. SYM_CODE functions are, by definition, unreliable from an unwind perspective because they don't follow ABI rules and they don't set up any frame pointer. In my reliable stack trace patch series, I have a patch based on the opinion of the reviewers to mark these functions so the unwinder can recognize them and declare the stack trace unreliable. So, the only ASM functions that matter in (b) are the SYM_FUNC functions. For now, I have considered them to be unreliable. But I will analyze those functions to see if any of them can occur frequently in stack traces. If some of these functions can occur frequently in stack traces, I need to address them. I will see if unwind hints are a good fit. I will get back to you on this. > BUT, supplementing the frame pointer unwinding with DWARF, rather than > just relying on DWARF alone, does help a LOT. > Yes. > I guess the hope is that cross-checking two "mostly reliable" things > against each other (frame pointers and DWARF) will give a reliable > result ;-) > Yes! > In a general sense, I've never looked at DWARF's reliability, even for > just normal C code. It would be good to have some way of knowing that > DWARF looks mostly sane for both GCC and Clang. For example, maybe > somehow cross-checking it with objtool's knowledge. And then of course > we'd have to hope that it stays bug-free in future compilers. > This is a valid point. So far, I find that gcc generates reliable DWARF information. But there are two bugs in what Clang generates. I have added workarounds in my parser to compensate. So, I think a DWARF verifier is an option that architectures can use. At this point, I don't want to mandate a verifier on every architecture. But that is a discussion that we can have once I have a verifier ready. > I'd also be somewhat concerned about assembly. Since there's nothing > ensuring the unwind hints are valid, and will stay valid over time, I > wonder how likely it would be for that to break, and what the > implications would be. Most likely I guess it would break silently, but > then get caught by the frame pointer cross-checking. So a broken hint > might not get noticed for a long time, but at least it (hopefully) > wouldn't break reliable unwinding. > Yes. That is my thinking as well. When the unwinder checks the actual FP with the computed FP, any mismatch will be treated as unreliable code for unwind. So, apart from some retries during the livepatch process, this is most probably not a problem. Now, I set a flag for an unwind hint so that the unwinder knows that it is processing an unwind hint. I could generate a warning if an unwind hint does not result in a reliable unwind of the frame. This would bring the broken hint to people's attention. > Also, inline asm can sometimes do stack hacks like > "push;do_something;pop" which isn't visible to the toolchain. But > again, hopefully the frame pointer checking would fail and mark it > unreliable. > > So I do have some worries about DWARF, but the fact that it's getting > "fact checked" by frame pointers might be sufficient. > Exactly. > > 2) > > If I understand correctly, objtool is converting parts of DWARF to a new > format which can then be read by the kernel. In that case, please don't > call it DWARF as that will cause a lot of confusion. > OK. I will rename it. > There are actually several similarities between your new format and ORC, > which is also an objtool-created DWARF alternative. It would be > interesting to see if they could be combined somehow. > I will certainly look into it. So, if I decide to merge the two, I might want to make a minor change to the ORC structure. Would that be OK with you? > > 3) > > Objtool has become an integral part of x86-64, due to security and > performance features and toolchain workarounds. > > Not *all* of its features require the full "branch validation" which > follows all code paths -- and was the hardest part to get right -- but > several features *do* need that: stack validation, ORC, uaccess > validation, noinstr validation. > > Objtool has been picking up a lot of steam (and features) lately, with > more features currently in active development. And lately there have > been renewed patches for porting it to powerpc and arm64 (and rumors of > s390). > > If arm64 ever wants one of those features -- particularly a "branch > validation" based feature -- I think it would make more sense to just do > the stack validation in objtool, rather than the DWARF supplementation > approach. > First off, I think that objtool does a great job for X64. I only want to implement frame pointer validation in a different way. All the other features of objtool (listed below) are great. I have admired the amount of work you guys have put into the X64 part. These are the reasons why I tried the DWARF based method: - My implementation is largely architecture independent. There are a couple of minor pieces that are architecture-specific, but they are minor in nature. So, if an architecture wanted to support the livepatch feature but did not want to do a heavy weight objtool implementation, then it has an option. There has been some debate about whether static analysis should be mandated for livepatch. My patch series is an attempt to provide an option. - To get an objtool static analysis implementation working for an architecture as reliably as X64 and getting it reviewed and upstreamed can take years. It took years for X64, am I right? I mean, it has been quite a while since the original patch series for arm64 was posted. There have been only one or two minor comments so far. I am sure arm64 linux users would very much want to have livepatch available ASAP to be able to install security fixes without downtime. This is an immediate need. - No software is bug free. So, even if static analysis is implemented for an architecture, it would be good to have another method of verifying the unwind rules generated from the static analysis. DWARF can provide that additional verification. > Just to give an idea of what objtool already supports and how useful it > has become for x86, here's an excerpt from some documentation I've been > working on, since I'm in the middle of rewriting the interface to make > it more modular. This is a list of all its current features: > > > Features > -------- > > Objtool has the following features: > > > - Stack unwinding metadata validation -- useful for helping to ensure > stack traces are reliable for live patching > > - ORC unwinder metadata generation -- a faster and more precise > alternative to frame pointer based unwinding > > - Retpoline validation -- ensures that all indirect calls go through > retpoline thunks, for Spectre v2 mitigations > > - Retpoline call site annotation -- annotates all retpoline thunk call > sites, enabling the kernel to patch them inline, to prevent "thunk > funneling" for both security and performance reasons > > - Non-instrumentation validation -- validates non-instrumentable > ("noinstr") code rules, preventing unexpected instrumentation in > low-level C entry code > > - Static call annotation -- annotates static call sites, enabling the > kernel to implement inline static calls, a faster alternative to some > indirect branches > > - Uaccess validation -- validates uaccess rules for a proper safe > implementation of Supervisor Mode Access Protection (SMAP) > > - Straight Line Speculation validation -- validates certain SLS > mitigations > > - Indirect Branch Tracking validation -- validates Intel CET IBT rules > to ensure that all functions referenced by function pointers have > corresponding ENDBR instructions > > - Indirect Branch Tracking annotation -- annotates unused ENDBR > instruction sites, enabling the kernel to "seal" them (replace them > with NOPs) to further harden IBT > > - Function entry annotation -- annotates function entries, enabling > kernel function tracing > > - Other toolchain hacks which will go unmentioned at this time... > I completely agree. So, it is just frame pointer validation for livepatch I am trying to look at. Thanks! Madhavan
On 4/8/22 06:41, Peter Zijlstra wrote: > > Right; so not having seen the patches due to Madhaven's email being > broken, I can perhaps less appreciated the crazy involved. > Crazy like a fox. > On Thu, Apr 07, 2022 at 05:21:51PM -0700, Josh Poimboeuf wrote: >> 2) >> >> If I understand correctly, objtool is converting parts of DWARF to a new >> format which can then be read by the kernel. In that case, please don't >> call it DWARF as that will cause a lot of confusion. >> >> There are actually several similarities between your new format and ORC, >> which is also an objtool-created DWARF alternative. It would be >> interesting to see if they could be combined somehow. > > What Josh said; please use/extend ORC. > Yes. I am looking into it. > I really don't understand where all this crazy is coming from; why does > objtool need to do something radically weird for ARM64? > > There are existing ARM64 patches for objtool; in fact they have recently > been re-posted: > > https://lkml.kernel.org/r/20220407120141.43801-1-chenzhongjin@huawei.com > > The only tricky bit seems to be the whole jump-table issue. Using DWARF > as input to deal with jump-tables should be possible -- exceedingly > overkill, but possible I suppose. Mandating DWARF sucks though, compile > times are so much worse with DWARVES on :/ > > Once objtool can properly follow/validate ARM64 code, it should be > fairly straight forward to have it generate ORC data just like it does > on x86_64. > My reasons for attempting the DWARF based implementation: - My implementation is largely architecture independent. There are a couple of minor pieces that are architecture-specific, but they are minor in nature. So, if an architecture wanted to support the livepatch feature but did not want to do a heavy weight objtool implementation, then it has an option. There has been some debate about whether static analysis should be mandated for livepatch. My patch series is an attempt to provide an option. - To get an objtool static analysis implementation working for an architecture as reliably as X64 and getting it reviewed and upstreamed can take years. It took years for X64, am I right? I mean, it has been quite a while since the original patch series for arm64 was posted. There have been only one or two minor comments so far. I am sure arm64 linux users would very much want to have livepatch available ASAP to be able to install security fixes without downtime. This is an immediate need. - No software is bug free. So, even if static analysis is implemented for an architecture, it would be good to have another method of verifying the unwind rules generated from the static analysis. DWARF can provide that additional verification. Madhavan
On 4/8/22 07:06, Peter Zijlstra wrote: > On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote: >> The solution >> ============ >> >> The goal here is to use the absolute minimum CFI needed to compute the FP at >> every instruction address. The unwinder can compute the FP in each frame, >> compare the actual FP with the computed one and validate the actual FP. >> >> Objtool is enhanced to parse the CFI, extract just the rules required, >> encode them in compact data structures and create special sections for >> the rules. The unwinder uses the special sections to find the rules for >> a given instruction address and compute the FP. >> >> Objtool can be invoked as follows: >> >> objtool dwarf generate <object-file> >> >> The version of the DWARF standard supported in this work is version 4. The >> official documentation for this version is here: >> >> https://dwarfstd.org/doc/DWARF4.pdf >> >> Section 6.4 contains the description of the CFI. > > The problem is of course that DWARF is only available for compiler > generated code and doesn't cover assembly code, of which is there is > always lots. > Yes. But assembly functions are of two types: SYM_CODE_*() functions SYM_FUNC_*() functions SYM_CODE functions are, by definition, special functions that don't follow any ABI rules. They don't set up a frame. Based on the opinion of ARM64 experts, these need to be recognized by the unwinder and, if they are present in a stack trace, the stack trace must be considered unreliable. I have, in fact, submitted a patch to implement that. So, only SYM_FUNC*() functions are relevant for this part. I will look into these for arm64 and check if any of them can occur frequently in stack traces. If any of them is likely to occur frequently in stack traces, I must address them. If there are only a few such functions, unwind hints may be sufficient. I will get back to you on this. > I suppose you can go add DWARF annotations to all the assembly, but IIRC > those are pretty terrible. We were *REALLY* happy to delete all that > nasty from the x86 code. > DWARF annotations are a PITA to maintain. I will never recommend that! > On top of that, AFAIK compilers don't generally consider DWARF > generation to be a correctness issue. For them it's debug info and > having it be correct is nice but not required. So using it as input for > something that's required to be correct, seems unfortunate. It is only debug info. But if that info can be verified, then it is usable for livepatch purposes. I am thinking of implementing a verifier since DWARF reliability is a valid concern. I will keep you posted. Thanks! Madhavan
Hi Madhaven, Sorry I sent the last email as HTML. This is a plain text resend. On 2022/4/12 1:18, Madhavan T. Venkataraman wrote: >> In a general sense, I've never looked at DWARF's reliability, even for >> just normal C code. It would be good to have some way of knowing that >> DWARF looks mostly sane for both GCC and Clang. For example, maybe >> somehow cross-checking it with objtool's knowledge. And then of course >> we'd have to hope that it stays bug-free in future compilers. >> > > This is a valid point. So far, I find that gcc generates reliable DWARF information. > But there are two bugs in what Clang generates. I have added workarounds in my > parser to compensate. > > So, I think a DWARF verifier is an option that architectures can use. At this point, > I don't want to mandate a verifier on every architecture. But that is a discussion > that we can have once I have a verifier ready. > I'm concerning that depending on compilers to generate correct information can become a trouble because we linux kernel side can rarely fix what compilers make. That's also why the gcc plugin idea was objected in the objtool migration. If your parser can solve this it sounds more doable. >> I'd also be somewhat concerned about assembly. Since there's nothing >> ensuring the unwind hints are valid, and will stay valid over time, I >> wonder how likely it would be for that to break, and what the >> implications would be. Most likely I guess it would break silently, but >> then get caught by the frame pointer cross-checking. So a broken hint >> might not get noticed for a long time, but at least it (hopefully) >> wouldn't break reliable unwinding. >> > > Yes. That is my thinking as well. When the unwinder checks the actual FP with the > computed FP, any mismatch will be treated as unreliable code for unwind. So, > apart from some retries during the livepatch process, this is most probably not > a problem. > > Now, I set a flag for an unwind hint so that the unwinder knows that it is > processing an unwind hint. I could generate a warning if an unwind hint does not > result in a reliable unwind of the frame. This would bring the broken hint > to people's attention. > > >> Also, inline asm can sometimes do stack hacks like >> "push;do_something;pop" which isn't visible to the toolchain. But >> again, hopefully the frame pointer checking would fail and mark it >> unreliable. >> >> So I do have some worries about DWARF, but the fact that it's getting >> "fact checked" by frame pointers might be sufficient. >> > > Exactly. > I'm wondering how much functions will give a unreliable result because any unreliable function shows in stack trace will cause livepatch fail/retry. IIUC all unmarked assembly functions will considered unreliable and cause problem. It can be a burden to mark all of them. > - No software is bug free. So, even if static analysis is implemented for an architecture, > it would be good to have another method of verifying the unwind rules generated from > the static analysis. DWARF can provide that additional verification. > I'm wondering how much functions will give a unreliable result because any unreliable function shows in stack trace will cause livepatch fail/retry. IIUC all unmarked assembly functions will considered unreliable and cause problem. It can be a burden to mark all of them. > > So, it is just frame pointer validation for livepatch I am trying to look at. > My support reason for FP with validation is that it provides a guarantee for FP unwinder. FP and ORC use absolute and relative for stack unwind to unwind stack respectively, however FP has been considered unreliable. Is there any feature depends on FP? If so it can be more persuasive. Also this patch is much more completed than migration for objtool. It would be nice if this could be put into use quickly. The objtool-arm64 is less than half done, but I'm going to relies as much as possible on current objtool components, so no more feasibility validation is required. By the way, I was thinking about a corner case, because arm64 CALL instruction won't push LR onto stack atomically as x86. Before push LR, FP to save frame there still can be some instructions such as bti, paciasp. If an irq happens here, the stack frame is not constructed so the FP unwinder will omit this function and provides a wrong stack trace to livepatch. It's just a guess and I have not built the test case. But I think it's a defect on arm64 that FP unwinder can't work properly on prologue and epilogue. Do you have any idea about this? Thanks for your time, Chen
On 4/12/22 02:30, Chen Zhongjin wrote: > Hi Madhaven, > > On 2022/4/12 1:18, Madhavan T. Venkataraman wrote: >> >> This is a valid point. So far, I find that gcc generates reliable DWARF information. >> But there are two bugs in what Clang generates. I have added workarounds in my >> parser to compensate. >> >> So, I think a DWARF verifier is an option that architectures can use. At this point, >> I don't want to mandate a verifier on every architecture. But that is a discussion >> that we can have once I have a verifier ready. > I'm concerning that depending on compilers to generate correct information can become > a trouble because we linux kernel side can rarely fix what compilers make. That's also why > the gcc plugin idea was objected in the objtool migration. > > If your parser can solve this it sounds more doable. > So far, I find that gcc generates reliable DWARF information. Clang did have two bugs for which the parser compensates. So, what is needed is a DWARF verifier which can find buggy DWARF information. I am working on it. Having said that, I think the DWARF information for the stack and frame pointer is a *lot* simpler than the other debug stuff. So, there may be a couple of existing bugs that need to be discovered and fixed. But I think the likelihood of bugs appearing in this area in the future is low but it can happen. I agree that there needs to be a way to discover and flag the bugs when they do appear. But compiler bugs can also affect objtool and cause problems in it. If I am not mistaken, they ran into this multiple times on the X86 side and had to get fixes done. Josh knows better. Compiler bugs and optimizations have always been problematic from this perspective and they will potentially continue to be. >> >> Yes. That is my thinking as well. When the unwinder checks the actual FP with the >> computed FP, any mismatch will be treated as unreliable code for unwind. So, >> apart from some retries during the livepatch process, this is most probably not >> a problem. >> >> Now, I set a flag for an unwind hint so that the unwinder knows that it is >> processing an unwind hint. I could generate a warning if an unwind hint does not >> result in a reliable unwind of the frame. This would bring the broken hint >> to people's attention. >> >> >> Also, inline asm can sometimes do stack hacks like >> "push;do_something;pop" which isn't visible to the toolchain. But >> again, hopefully the frame pointer checking would fail and mark it >> unreliable. >> > I'm wondering how much functions will give a unreliable result because any unreliable > function shows in stack trace will cause livepatch fail/retry. IIUC all unmarked assembly > functions will considered unreliable and cause problem. It can be a burden to mark all > of them. It is not a burden to mark all of them. For instance, I have submitted a patch where I mark all the SYM_CODE*() functions by overriding the SYM_CODE_START()/END() macros in arm64. So, the changes are very small and self-contained. A good part of the assembly functions are defined with SYM_CODE_*(). These, by definition, are low-level functions that do not follow ABI rules. IIRC, Objtool does not perform static analysis on these today. These need to be recognized by the unwinder in the kernel and handled. Josh, please correct me if I am wrong. So, this is a problem even if we had static analysis in Objtool for arm64. As for functions defined with SYM_FUNC_*(), they are supposed to have proper frame setup and teardown. But most of them do not appear to have a proper frame pointer prolog and epilog today in arm64. Some of these will probably never have an FP prolog or epilog because they are high performance functions or specialized functions and the extra overhead may be unacceptable. Some of the SYM_FUNC*() functions are leaf functions that don't need an FP prolog or epilog. With static analysis, Objtool will flag all such functions. Either a proper FP prolog and epilog have to be introduced in the code or the functions need to be flagged as unreliable from an unwind perspective. If any of these functions occurs frequently in stack traces, then, either a proper FP prolog and epilog have to be introduced in the function code. Or, unwind hints have to be placed at strategic points. In either case, there is a maintenance burden although developers may prefer one over the other on a case-by-case basis. The DWARF situation is the same. For the frequently occurring assembly functions, unwind hints need to be defined. Currently, I have undertaken to study the SYM_FUNC*() functions in arm64 to see if I can determine which ones belong to this category. Also, I am going to be doing targeted livepatch testing to see if any of these functions will cause many retries during the livepatch process. If they don't, then this is not a problem. >> - No software is bug free. So, even if static analysis is implemented for an architecture, >> it would be good to have another method of verifying the unwind rules generated from >> the static analysis. DWARF can provide that additional verification. > To me verifying ORC with DWARF a little odd, cuz they are running with different unwind > mechanism. For normal scenario which calling convention is obeyed, ORC can give a > promised reliable stack trace, while when it easily involve bug in assembly codes, > DWARF also can't work. > I am not sure I follow. With both DWARF and ORC, stack pointer and frame pointer offsets are recorded for every instruction address. These offsets have to be the same regardless of which one you use. The only difference is that for an assembly function that has a proper FP prolog and epilog, Objtool static analysis is able to generate those offsets. With DWARF, the compiler does not generate any offsets for assembly functions. I have to rely on unwind hints. BTW, I only need to do this for functions that occur frequently in stack traces. > My support for FP with validation is that it provides a guarantee for FP unwinder. FP and ORC > use absolute and relative for stack unwind to unwind stack respectively, however FP has been > considered unreliable. Is there any feature depends on FP? If so it can be more persuasive. > Yes. Static analysis makes sure that functions are following ABI rules. So, it provides a static guarantee. And that is great. However, with DWARF, even if some functions don't follow ABI rules, a reliable unwinder can still be provided as long as the DWARF information generated by the compiler is correct. For instance, let us say that the compiler generates code for a function with a call to another function before the FP has been setup properly. If the DWARF information is correctly generated, the unwinder can see that a stack trace involving the called function is unreliable. Also, hypothetically, if a buggy kernel function corrupts the frame pointer or the stack pointer, dynamic validation can catch it. > > Also this patch is much more completed than migration for objtool. It would be > nice if this could be put into use quickly. The objtool-arm64 is less than half done, but I'm going > to relies as much as possible on current objtool components, so no more feasibility validation > is required. > The approach in your patch series is certainly feasible. I don't deny that at all. And, believe me, I would like the community to take interest in it and review it. If I get a chance, I will also participate in that review. As I mentioned in a previous email, my attempt is to come up with a largely architecture independent solution to the FP validation problem with a quicker time to market. That is all. > By the way, I was thinking about a corner case, because arm64 CALL instruction won't push LR > onto stack atomically as x86. Before push LR, FP to save frame there still can be some instructions > such as bti, paciasp. If an irq happens here, the stack frame is not constructed > so the FP unwinder will omit this function and provides a wrong stack trace to livepatch. > > With DWARF, the unwinder will see that there are no DWARF rules associated with those PCs that occur before the FP is completely setup. It will mark the stack trace as unreliable. So, these cases are already handled as I have explained in my cover letter. > It's just a guess and I have not built the test case. But I think it's a defect on arm64 that FP > unwinder can't work properly on prologue and epilogue. Do you have any idea about this? > There is no defect. The frame pointer prolog can have multiple instructions before the frame is set up. Any interrupt or exception happening on those instructions will have an unreliable stack trace by definition. A reliable unwinder must be able to recognize that case and mark the stack trace as unreliable. That is all. Thanks for your comments. Madhavan
Hi Josh, Peter, I have decided to accept your comment that using the compiler generated DWARF info is probably not reliable. I can write a DWARF verifier. But I decided that if I can write a DWARF verifier, I can use that same code to generate the same information as DWARF CFI. So, I am going to implement this in the traditional way as you wanted. Fortunately, I only need to decode a small subset of the instruction formats (just the ones that affect the SP and FP) and branch and return instructions to be able to compute the SP and FP offsets at every instruction address. I only need a small part of the static analysis code. In other words, I am removing the crazy from my patch series. I am still retaining the idea of dynamic FP validation rather than static validation. IMO, that will be able to handle even non-ABI compliant functions and FP corruptions from buggy kernel functions. Huawei has resubmitted the objtool patch set. When the full blown static analysis is implemented by them and accepted by the community, my changes can be easily merged with theirs. So, my solution is not a competing solution. In fact, it will provide an additional level of robustness. I will make these changes, test and send out version 2. Thanks for your comments. Madhavan On 4/7/22 19:21, Josh Poimboeuf wrote: > On Thu, Apr 07, 2022 at 03:25:09PM -0500, madvenka@linux.microsoft.com wrote: >> The solution >> ============ >> >> The goal here is to use the absolute minimum CFI needed to compute the FP at >> every instruction address. The unwinder can compute the FP in each frame, >> compare the actual FP with the computed one and validate the actual FP. >> >> Objtool is enhanced to parse the CFI, extract just the rules required, >> encode them in compact data structures and create special sections for >> the rules. The unwinder uses the special sections to find the rules for >> a given instruction address and compute the FP. >> >> Objtool can be invoked as follows: >> >> objtool dwarf generate <object-file> > > Hi Madhaven, > > This is quite interesting. And it's exactly the kind of crazy idea I > can appreciate ;-) > > Some initial thoughts: > > > 1) > > I have some concerns about DWARF's reliability, especially considering > a) inline asm, b) regular asm, and c) the kernel's tendency to push > compilers to their limits. > > BUT, supplementing the frame pointer unwinding with DWARF, rather than > just relying on DWARF alone, does help a LOT. > > I guess the hope is that cross-checking two "mostly reliable" things > against each other (frame pointers and DWARF) will give a reliable > result ;-) > > In a general sense, I've never looked at DWARF's reliability, even for > just normal C code. It would be good to have some way of knowing that > DWARF looks mostly sane for both GCC and Clang. For example, maybe > somehow cross-checking it with objtool's knowledge. And then of course > we'd have to hope that it stays bug-free in future compilers. > > I'd also be somewhat concerned about assembly. Since there's nothing > ensuring the unwind hints are valid, and will stay valid over time, I > wonder how likely it would be for that to break, and what the > implications would be. Most likely I guess it would break silently, but > then get caught by the frame pointer cross-checking. So a broken hint > might not get noticed for a long time, but at least it (hopefully) > wouldn't break reliable unwinding. > > Also, inline asm can sometimes do stack hacks like > "push;do_something;pop" which isn't visible to the toolchain. But > again, hopefully the frame pointer checking would fail and mark it > unreliable. > > So I do have some worries about DWARF, but the fact that it's getting > "fact checked" by frame pointers might be sufficient. > > > 2) > > If I understand correctly, objtool is converting parts of DWARF to a new > format which can then be read by the kernel. In that case, please don't > call it DWARF as that will cause a lot of confusion. > > There are actually several similarities between your new format and ORC, > which is also an objtool-created DWARF alternative. It would be > interesting to see if they could be combined somehow. > > > 3) > > Objtool has become an integral part of x86-64, due to security and > performance features and toolchain workarounds. > > Not *all* of its features require the full "branch validation" which > follows all code paths -- and was the hardest part to get right -- but > several features *do* need that: stack validation, ORC, uaccess > validation, noinstr validation. > > Objtool has been picking up a lot of steam (and features) lately, with > more features currently in active development. And lately there have > been renewed patches for porting it to powerpc and arm64 (and rumors of > s390). > > If arm64 ever wants one of those features -- particularly a "branch > validation" based feature -- I think it would make more sense to just do > the stack validation in objtool, rather than the DWARF supplementation > approach. > > Just to give an idea of what objtool already supports and how useful it > has become for x86, here's an excerpt from some documentation I've been > working on, since I'm in the middle of rewriting the interface to make > it more modular. This is a list of all its current features: > > > Features > -------- > > Objtool has the following features: > > > - Stack unwinding metadata validation -- useful for helping to ensure > stack traces are reliable for live patching > > - ORC unwinder metadata generation -- a faster and more precise > alternative to frame pointer based unwinding > > - Retpoline validation -- ensures that all indirect calls go through > retpoline thunks, for Spectre v2 mitigations > > - Retpoline call site annotation -- annotates all retpoline thunk call > sites, enabling the kernel to patch them inline, to prevent "thunk > funneling" for both security and performance reasons > > - Non-instrumentation validation -- validates non-instrumentable > ("noinstr") code rules, preventing unexpected instrumentation in > low-level C entry code > > - Static call annotation -- annotates static call sites, enabling the > kernel to implement inline static calls, a faster alternative to some > indirect branches > > - Uaccess validation -- validates uaccess rules for a proper safe > implementation of Supervisor Mode Access Protection (SMAP) > > - Straight Line Speculation validation -- validates certain SLS > mitigations > > - Indirect Branch Tracking validation -- validates Intel CET IBT rules > to ensure that all functions referenced by function pointers have > corresponding ENDBR instructions > > - Indirect Branch Tracking annotation -- annotates unused ENDBR > instruction sites, enabling the kernel to "seal" them (replace them > with NOPs) to further harden IBT > > - Function entry annotation -- annotates function entries, enabling > kernel function tracing > > - Other toolchain hacks which will go unmentioned at this time... >
On Tue, Apr 12, 2022 at 04:32:22PM +0800, Chen Zhongjin wrote: > By the way, I was thinking about a corner case, because arm64 CALL > instruction won't push LR onto stack atomically as x86. Before push LR, FP > to save frame there still can be some instructions such as bti, paciasp. If > an irq happens here, the stack frame is not constructed so the FP unwinder > will omit this function and provides a wrong stack trace to livepatch. > > It's just a guess and I have not built the test case. But I think it's a > defect on arm64 that FP unwinder can't work properly on prologue and > epilogue. Do you have any idea about this? x86 has similar issues with frame pointers, if for example preemption or page fault exception occurs in a leaf function, or in a function prologue or epilogue, before or after the frame pointer setup. This issue is solved by the "reliable" unwinder which detects irqs/exceptions on the stack and reports the stack as unreliable.
On Mon, Apr 11, 2022 at 12:18:13PM -0500, Madhavan T. Venkataraman wrote: > > There are actually several similarities between your new format and ORC, > > which is also an objtool-created DWARF alternative. It would be > > interesting to see if they could be combined somehow. > > > > I will certainly look into it. So, if I decide to merge the two, I might want > to make a minor change to the ORC structure. Would that be OK with you? Yes, in fact I would expect it, since ORC is quite x86-specific at the moment. So it would need some abstractions to make it more multi-arch friendly.
Hi Josh, IIUC, ORC on x86 can make reliable stack unwind for this scenario because objtool validates BP state. I'm thinking that on arm64 there's no guarantee that LR will be pushed onto stack. When we meet similar scenario on arm64, we should recover (LR, FP) on pt_regs and continue to unwind the stack. And this is reliable only after we validate (LR, FP). So should we track LR on arm64 additionally as track BP on x86? Or can we just treat (LR, FP) as a pair? because as I know they are always set up together. On 2022/4/16 8:56, Josh Poimboeuf wrote: > On Tue, Apr 12, 2022 at 04:32:22PM +0800, Chen Zhongjin wrote: >> By the way, I was thinking about a corner case, because arm64 CALL >> instruction won't push LR onto stack atomically as x86. Before push LR, FP >> to save frame there still can be some instructions such as bti, paciasp. If >> an irq happens here, the stack frame is not constructed so the FP unwinder >> will omit this function and provides a wrong stack trace to livepatch. >> >> It's just a guess and I have not built the test case. But I think it's a >> defect on arm64 that FP unwinder can't work properly on prologue and >> epilogue. Do you have any idea about this? > > x86 has similar issues with frame pointers, if for example preemption or > page fault exception occurs in a leaf function, or in a function > prologue or epilogue, before or after the frame pointer setup. > > This issue is solved by the "reliable" unwinder which detects > irqs/exceptions on the stack and reports the stack as unreliable. >
On Mon, Apr 18, 2022 at 08:28:33PM +0800, Chen Zhongjin wrote: > Hi Josh, > > IIUC, ORC on x86 can make reliable stack unwind for this scenario > because objtool validates BP state. > > I'm thinking that on arm64 there's no guarantee that LR will be pushed > onto stack. When we meet similar scenario on arm64, we should recover > (LR, FP) on pt_regs and continue to unwind the stack. And this is > reliable only after we validate (LR, FP). > > So should we track LR on arm64 additionally as track BP on x86? Or can > we just treat (LR, FP) as a pair? because as I know they are always set > up together. Does the arm64 unwinder have a way to detect kernel pt_regs on the stack? If so, the simplest solution is to mark all stacks with kernel regs as unreliable. That's what the x86 FP unwinder does.
On 4/18/22 11:11, Josh Poimboeuf wrote: > On Mon, Apr 18, 2022 at 08:28:33PM +0800, Chen Zhongjin wrote: >> Hi Josh, >> >> IIUC, ORC on x86 can make reliable stack unwind for this scenario >> because objtool validates BP state. >> >> I'm thinking that on arm64 there's no guarantee that LR will be pushed >> onto stack. When we meet similar scenario on arm64, we should recover >> (LR, FP) on pt_regs and continue to unwind the stack. And this is >> reliable only after we validate (LR, FP). >> >> So should we track LR on arm64 additionally as track BP on x86? Or can >> we just treat (LR, FP) as a pair? because as I know they are always set >> up together. > > Does the arm64 unwinder have a way to detect kernel pt_regs on the > stack? If so, the simplest solution is to mark all stacks with kernel > regs as unreliable. That's what the x86 FP unwinder does. > AFAICT, only the task pt_regs can be detected. For detecting the other pt_regs, we would have to set a bit in the FP. IIRC, I had a proposal where I set the LSB in the FP stored on the stack. The arm64 folks did not like that approach as it would be indistinguishable from a corrupted FP, however unlikely the corruption may be. Unwind hints can be used for these cases to unwind reliably through them. That is probably the current thinking. Mark Rutland can confirm. Madhavan