Message ID | 20210507205513.640780-1-dianders@chromium.org (mailing list archive) |
---|---|
Headers | show |
Series | arm64: perf: Make compat tracing better | expand |
Catalin / Will, On Fri, May 7, 2021 at 1:55 PM Douglas Anderson <dianders@chromium.org> wrote: > > The goal for this series is to improve "perf" behavior when 32-bit > userspace code is involved. This turns out to be fairly important for > Chrome OS which still runs 32-bit userspace for the time being (long > story there). > > I won't repeat everything said in the individual patches since since > they are wordy enough as it is. > > Please enjoy and I hope this isn't too ugly/hacky for inclusion in > mainline. > > Thanks to Nick Desaulniers for his early review of these patches and > to Ricky for the super early prototype that some of this is based on. > > > Douglas Anderson (3): > arm64: perf: perf_callchain_user() compat support for > clang/non-APCS-gcc-arm > arm64: perf: Improve compat perf_callchain_user() for clang leaf > functions > arm64: perf: Add a config option saying 32-bit thumb code uses R11 for > FP > > arch/arm64/Kconfig | 12 ++ > arch/arm64/kernel/perf_callchain.c | 329 +++++++++++++++++++++++++---- > 2 files changed, 305 insertions(+), 36 deletions(-) While this isn't massively urgent, I'd like to confirm that this is on your plate to eventually review and/or land. If it's not, do you have any suggestions for how I should proceed? Thanks! -Doug
Hi Doug, Thanks for posting this, and sorry for the delay in getting to it. On Fri, May 07, 2021 at 01:55:10PM -0700, Douglas Anderson wrote: > The goal for this series is to improve "perf" behavior when 32-bit > userspace code is involved. This turns out to be fairly important for > Chrome OS which still runs 32-bit userspace for the time being (long > story there). Watch out, your days are numbered! See [1]. > I won't repeat everything said in the individual patches since since > they are wordy enough as it is. > > Please enjoy and I hope this isn't too ugly/hacky for inclusion in > mainline. > > Thanks to Nick Desaulniers for his early review of these patches and > to Ricky for the super early prototype that some of this is based on. I can see that you've put a lot of effort into this, but I'm not thrilled with the prospect of maintaining these heuristics in the kernel. The callchain behaviour is directly visible to userspace, and all we'll be able to do is throw more heuristics at it if faced with any regression reports. Every assumption made about userspace behaviour results in diminishing returns where some set of programs no longer fall into the "supported" bucket and, on balance, I don't think the trade-off is worth it. If we were to do this in the kernel, then I'd like to see a spec for how frame-pointer based unwinding should work for Thumb and have it agreed upon and implemented by both GCC and LLVM. That way, we can implement the unwinder according to that spec and file bug reports against the compiler if it goes wrong. In lieu of that, I think we must defer to userspace to unwind using DWARF. Perf supports this via PERF_SAMPLE_STACK_USER and PERF_SAMPLE_REGS_USER, which allows libunwind to be used to create the callchain. You haven't mentioned that here, so I'd be interested to know why not. Finally, you've probably noticed that our unwinding code for compat tasks is basically identical to the code in arch/arm/. If the functionality is going to be extended, it should be done there first and then we will follow to be compatible. Cheers, Will [1] https://lore.kernel.org/lkml/20210602164719.31777-20-will@kernel.org/T/#u
Hi, On Wed, Jun 2, 2021 at 10:56 AM Will Deacon <will@kernel.org> wrote: > > Hi Doug, > > Thanks for posting this, and sorry for the delay in getting to it. > > On Fri, May 07, 2021 at 01:55:10PM -0700, Douglas Anderson wrote: > > The goal for this series is to improve "perf" behavior when 32-bit > > userspace code is involved. This turns out to be fairly important for > > Chrome OS which still runs 32-bit userspace for the time being (long > > story there). > > Watch out, your days are numbered! See [1]. Yeah, folks on the Chrome OS team are aware and we're trying our darndest to move away. It's been an unfortunate set of circumstances that has kept us on 32-bit this long. :( BTW: I like your suggestion of "retirement" as a solution to dealing with this problem, but I'm not quite ready to retire yet. > > I won't repeat everything said in the individual patches since since > > they are wordy enough as it is. > > > > Please enjoy and I hope this isn't too ugly/hacky for inclusion in > > mainline. > > > > Thanks to Nick Desaulniers for his early review of these patches and > > to Ricky for the super early prototype that some of this is based on. > > I can see that you've put a lot of effort into this, but I'm not thrilled > with the prospect of maintaining these heuristics in the kernel. The > callchain behaviour is directly visible to userspace, and all we'll be able > to do is throw more heuristics at it if faced with any regression reports. > Every assumption made about userspace behaviour results in diminishing > returns where some set of programs no longer fall into the "supported" > bucket and, on balance, I don't think the trade-off is worth it. > > If we were to do this in the kernel, then I'd like to see a spec for how > frame-pointer based unwinding should work for Thumb and have it agreed > upon and implemented by both GCC and LLVM. That way, we can implement > the unwinder according to that spec and file bug reports against the > compiler if it goes wrong. Given how long this has been going on, I'd somewhat guess that getting this implemented in GCC and LLVM is 1+ year out. Presumably Chrome OS will be transitioned off 32-bit ARM by then. > In lieu of that, I think we must defer to userspace to unwind using DWARF. > Perf supports this via PERF_SAMPLE_STACK_USER and PERF_SAMPLE_REGS_USER, > which allows libunwind to be used to create the callchain. You haven't > mentioned that here, so I'd be interested to know why not. Good point. So I guess I didn't mention it because: a) I really know very little about perf. I got roped in this because I understand stack unwinding, not because I know how to use perf well. :-P So I personally have no idea how to set this up. b) In the little bit of reading I did about this, people seemed to say that using libunwind for perf sampling was just too slow / too much overhead. > Finally, you've probably noticed that our unwinding code for compat tasks > is basically identical to the code in arch/arm/. If the functionality is > going to be extended, it should be done there first and then we will follow > to be compatible. That's fair. I doubt that submitting patches to this area of code for arm32 would be enjoyable, so I'll pass if it's all the same. Given your feedback, I think it's fair to consider ${SUBJECT} patch abandoned then. I'll see if people want to land it as a private patch in the Chrome OS tree for the time being until we can more fully abandon arm32 support or until the ARM teams working on gcc and clang come up with a standard that we can support more properly. In the meantime, if anyone cares to pick this patch up and move forward, feel free to do so with my blessing. -Doug