Message ID | 20221106100316.2803176-1-chenguokai17@mails.ucas.ac.cn (mailing list archive) |
---|---|
Headers | show |
Series | Add OPTPROBES feature on RISCV | expand |
Chen Guokai <chenguokai17@mails.ucas.ac.cn> writes: > From: Liao Chang <liaoclark@163.com> > > From: Liao Chang <liaochang1@huawei.com> > > Add jump optimization support for RISC-V. Thanks for working on this! I have some comments on the series, but I'll do that on a per-patch basis. Have you run the series on real hardware, or just qemu? Cheers, Björn
Hi Björn, Thanks for your great review! Some explanations below. > 2022年11月8日 00:54,Björn Töpel <bjorn@kernel.org> 写道: > > Have you run the series on real hardware, or just qemu? Currently only qemu tests are made, I will try to test it on a FPGA real hardware soon. > AFAIU, the algorithm only tracks registers that are *in use*. You are > already scanning the whole function (next patch). What about the caller > saved registers that are *not* used by the function in the probe range? > Can those, potentially unused, regs be used? Great missing part! I have made a static analyzation right upon receiving this mail. The result shows that this newly purposed idea reaches about the same success rate on my test set (rv64 defconf with RVI only) while when combined, they can reach a higher success rate, 1/3 above their baseline. A patch that includes this strategy will be sent soon. > >> +static void arch_find_register(unsigned long start, unsigned long end, > > Nit; When I see "arch_" I think it's functionality that can be > overridden per-arch. This is not the case, but just a helper for RV. It can be explained from two aspects. First, it can be extended to most RISC archs, which can be extracted into the common flow of Kprobe. Second, it is indeed a internal helper for now, so I will correct the name in the next version. >> static void find_free_registers(struct kprobe *kp, struct optimized_kprobe *op, >> - int *rd1, int *rd2) >> + int *rd, int *ra) > > Nit; Please get rid of this code churn, just name the parameters > correctly on introduction in the previous patch. Will be fixed. >> + *rd = ((kw | ow) == 1UL) ? 0 : __builtin_ctzl((kw | ow) & ~1UL); >> + *ra = (kw == 1UL) ? 0 : __builtin_ctzl(kw & ~1UL); > > Hmm, __builtin_ctzl is undefined for 0, right? Can that be triggered > here? Will be fixed. Regards, Guokai Chen
在 2022/11/8 19:04, Xim 写道: > Hi Björn, > > Thanks for your great review! Some explanations below. > >> 2022年11月8日 00:54,Björn Töpel <bjorn@kernel.org> 写道: >> >> Have you run the series on real hardware, or just qemu? > > Currently only qemu tests are made, I will try to test it on a FPGA real hardware soon. > >> AFAIU, the algorithm only tracks registers that are *in use*. You are >> already scanning the whole function (next patch). What about the caller >> saved registers that are *not* used by the function in the probe range? >> Can those, potentially unused, regs be used? > > Great missing part! I have made a static analyzation right upon receiving this mail. > The result shows that this newly purposed idea reaches about the same > success rate on my test set (rv64 defconf with RVI only) while when combined, > they can reach a higher success rate, 1/3 above their baseline. A patch that > includes this strategy will be sent soon. >> >>> +static void arch_find_register(unsigned long start, unsigned long end, >> >> Nit; When I see "arch_" I think it's functionality that can be >> overridden per-arch. This is not the case, but just a helper for RV. > > It can be explained from two aspects. First, it can be extended to most RISC > archs, which can be extracted into the common flow of Kprobe. Second, it is indeed > a internal helper for now, so I will correct the name in the next version. > >>> static void find_free_registers(struct kprobe *kp, struct optimized_kprobe *op, >>> - int *rd1, int *rd2) >>> + int *rd, int *ra) >> >> Nit; Please get rid of this code churn, just name the parameters >> correctly on introduction in the previous patch. > > Will be fixed. > >>> + *rd = ((kw | ow) == 1UL) ? 0 : __builtin_ctzl((kw | ow) & ~1UL); >>> + *ra = (kw == 1UL) ? 0 : __builtin_ctzl(kw & ~1UL); >> >> Hmm, __builtin_ctzl is undefined for 0, right? Can that be triggered >> here? This corner case has been taken into account, look these condition parts, if kw == 1UL this expression will return 0 directly, no chance to invoke __builtin_ctzl. Thanks. > > Will be fixed. > > Regards, > Guokai Chen >
"liaochang (A)" <liaochang1@huawei.com> writes: >>>> + *rd = ((kw | ow) == 1UL) ? 0 : __builtin_ctzl((kw | ow) & ~1UL); >>>> + *ra = (kw == 1UL) ? 0 : __builtin_ctzl(kw & ~1UL); >>> >>> Hmm, __builtin_ctzl is undefined for 0, right? Can that be triggered >>> here? > > This corner case has been taken into account, look these condition parts, > if kw == 1UL this expression will return 0 directly, no chance to invoke __builtin_ctzl. Indeed! Thanks for making that clear! Looking forward to the next revision! Björn