Message ID | 20211025200852.3002369-7-kaleshsingh@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | tracing: Extend histogram triggers expression parsing | expand |
On Mon, 25 Oct 2021 13:08:38 -0700 Kalesh Singh <kaleshsingh@google.com> wrote: > == Results == > > Divisor is a power of 2 (divisor == 32): > > test_hist_field_div_not_optimized | 8,717,091 cpu-cycles > test_hist_field_div_optimized | 1,643,137 cpu-cycles > > If the divisor is a power of 2, the optimized version is ~5.3x faster. > > Divisor is not a power of 2 (divisor == 33): > > test_hist_field_div_not_optimized | 4,444,324 cpu-cycles > test_hist_field_div_optimized | 5,497,958 cpu-cycles To optimize this even more, if the divisor is constant, we could make a separate function to not do the branch, and just shift or divide. And even if it is not a power of 2, for constants, we could implement a multiplication and shift, and guarantee an accuracy up to a defined max. If div is a constant, then we can calculate the mult and shift, and max dividend. Let's use 20 for shift. // This works best for small divisors if (div > max_div) { // only do a real division return; } shift = 20; mult = ((1 << shift) + div - 1) / div; delta = mult * div - (1 << shift); if (!delta) { /* div is a power of 2 */ max = -1; return; } max = (1 << shift) / delta; We would of course need to use 64 bit operations (maybe only do this for 64 bit machines). And perhaps even use bigger shift values to get a bigger max. Then we could do: if (val1 < max) return (val1 * mult) >> shift; -- Steve
On Tue, Oct 26, 2021 at 12:14 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Mon, 25 Oct 2021 13:08:38 -0700 > Kalesh Singh <kaleshsingh@google.com> wrote: > > > == Results == > > > > Divisor is a power of 2 (divisor == 32): > > > > test_hist_field_div_not_optimized | 8,717,091 cpu-cycles > > test_hist_field_div_optimized | 1,643,137 cpu-cycles > > > > If the divisor is a power of 2, the optimized version is ~5.3x faster. > > > > Divisor is not a power of 2 (divisor == 33): > > > > test_hist_field_div_not_optimized | 4,444,324 cpu-cycles > > test_hist_field_div_optimized | 5,497,958 cpu-cycles > > To optimize this even more, if the divisor is constant, we could make a > separate function to not do the branch, and just shift or divide. Ack. I can update to use separate functions for the constant divisors. > > And even if it is not a power of 2, for constants, we could implement a > multiplication and shift, and guarantee an accuracy up to a defined max. > > > If div is a constant, then we can calculate the mult and shift, and max > dividend. Let's use 20 for shift. > > // This works best for small divisors > if (div > max_div) { > // only do a real division > return; > } > shift = 20; > mult = ((1 << shift) + div - 1) / div; > delta = mult * div - (1 << shift); > if (!delta) { > /* div is a power of 2 */ > max = -1; > return; > } > max = (1 << shift) / delta; I'm still trying to digest the above algorithm. But doesn't this add 2 extra divisions? What am I missing here? Thanks, Kalesh > > We would of course need to use 64 bit operations (maybe only do this for 64 > bit machines). And perhaps even use bigger shift values to get a bigger max. > > Then we could do: > > if (val1 < max) > return (val1 * mult) >> shift; > > -- Steve
On Tue, 26 Oct 2021 16:39:13 -0700 Kalesh Singh <kaleshsingh@google.com> wrote: > > // This works best for small divisors > > if (div > max_div) { > > // only do a real division > > return; > > } > > shift = 20; > > mult = ((1 << shift) + div - 1) / div; > > delta = mult * div - (1 << shift); > > if (!delta) { > > /* div is a power of 2 */ > > max = -1; > > return; > > } > > max = (1 << shift) / delta; > > I'm still trying to digest the above algorithm. mult = (2^20 + div - 1) / div; The "div - 1" is to round up. Basically, it's doing: X / div = X * (2^20 / div) / 2^20 If div is constant, the 2^20 / div is constant, and the "2^20" is the same as a shift. So multiplier is 2^20 / div, and the shift is 20. But because there's rounding errors it is only accurate up to the difference of: delta = mult * div / 2^20 That is if mult is a power of two, then there would be no rounding errors, and the delta is zero, making the max infinite: max = 2^20 / delta as delta goes to zero. > But doesn't this add 2 extra divisions? What am I missing here? The above is only done at parsing not during the trace, where we care about. > > > > > > We would of course need to use 64 bit operations (maybe only do this for 64 > > bit machines). And perhaps even use bigger shift values to get a bigger max. > > > > Then we could do: > > > > if (val1 < max) > > return (val1 * mult) >> shift; This is done at the time of recording. Actually, it would be: if (val1 < max) return (val1 * mult) >> shift; else return val1 / div; -- Steve
On Tue, Oct 26, 2021 at 5:18 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Tue, 26 Oct 2021 16:39:13 -0700 > Kalesh Singh <kaleshsingh@google.com> wrote: > > > > // This works best for small divisors > > > if (div > max_div) { > > > // only do a real division > > > return; > > > } > > > shift = 20; > > > mult = ((1 << shift) + div - 1) / div; > > > delta = mult * div - (1 << shift); > > > if (!delta) { > > > /* div is a power of 2 */ > > > max = -1; > > > return; > > > } > > > max = (1 << shift) / delta; > > > > I'm still trying to digest the above algorithm. > > mult = (2^20 + div - 1) / div; > > The "div - 1" is to round up. > > Basically, it's doing: X / div = X * (2^20 / div) / 2^20 > > If div is constant, the 2^20 / div is constant, and the "2^20" is the > same as a shift. > > So multiplier is 2^20 / div, and the shift is 20. > > But because there's rounding errors it is only accurate up to the > difference of: > > delta = mult * div / 2^20 > > That is if mult is a power of two, then there would be no rounding > errors, and the delta is zero, making the max infinite: > > max = 2^20 / delta as delta goes to zero. > > > But doesn't this add 2 extra divisions? What am I missing here? > > The above is only done at parsing not during the trace, where we care > about. Hi Steve, Thanks for the explanation, this cleared it up for me. - Kalesh > > > > > > > > > > We would of course need to use 64 bit operations (maybe only do this for 64 > > > bit machines). And perhaps even use bigger shift values to get a bigger max. > > > > > > Then we could do: > > > > > > if (val1 < max) > > > return (val1 * mult) >> shift; > > This is done at the time of recording. > > Actually, it would be: > > if (val1 < max) > return (val1 * mult) >> shift; > else > return val1 / div; > > -- Steve
On Tue, 26 Oct 2021 18:09:22 -0700 Kalesh Singh <kaleshsingh@google.com> wrote: > > delta = mult * div / 2^20 > > > > That is if mult is a power of two, then there would be no rounding > > errors, and the delta is zero, making the max infinite: That should have been (as shown in the algorithm) delta = mult * div - 2 ^ 20 As mult is 2^20 / div; and the above should end up zero if there's no rounding issues, as it would be: delta = (2^20 / div) * div - 2^20 -- Steve
On Tue, Oct 26, 2021 at 6:15 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Tue, 26 Oct 2021 18:09:22 -0700 > Kalesh Singh <kaleshsingh@google.com> wrote: > > > > delta = mult * div / 2^20 > > > > > > That is if mult is a power of two, then there would be no rounding > > > errors, and the delta is zero, making the max infinite: > > That should have been (as shown in the algorithm) > > delta = mult * div - 2 ^ 20 > > As mult is 2^20 / div; and the above should end up zero if there's no > rounding issues, as it would be: > > delta = (2^20 / div) * div - 2^20 Good catch. We're checking if we get back the exact value. And IIUC max_div is an arbitrary value we decide on that's <= 2^shift? Is there a rule of thumb for choosing this? Thanks, Kalesh > > -- Steve
On Tue, 26 Oct 2021 18:31:21 -0700 Kalesh Singh <kaleshsingh@google.com> wrote: > And IIUC max_div is an arbitrary value we decide on that's <= 2^shift? > Is there a rule of thumb for choosing this? The way I came up with the max was to figure out at what point is it no longer guaranteed to be accurate. That is, what number can make the mult/shift no longer match the division. If we have some number div that is not a power of two. At some point: (X * mult) >> shift != X / div Now I simply picked max = 1 << shift / (mult * div - (1 << shift)) Because that will always be within the precision of the actual number. But I believe we can make max bigger, but because that deals with truncation, it's not simple math. That is, the above X / div is truncated and not the real number. I'm sure there's an algorithm somewhere that can give as the real max. -- Steve
On Tue, 26 Oct 2021 22:21:23 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> I'm sure there's an algorithm somewhere that can give as the real max.
You got me playing with this more ;-)
OK, I added the rounding in the wrong place. I found that we can make
the max_div to be the same as the shift! The bigger the shift, the
bigger the max!
mult = (1 << shift) / div;
max_div = (1 << shift)
But the rounding needs to be with the mult / shift:
return (val * mult + ((1 << shift) - 1)) >> shift;
When val goes pass 1 << shift, then the error will be off by more than
one.
-- Steve
On Tue, Oct 26, 2021 at 8:16 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Tue, 26 Oct 2021 22:21:23 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > I'm sure there's an algorithm somewhere that can give as the real max. > > You got me playing with this more ;-) > > OK, I added the rounding in the wrong place. I found that we can make > the max_div to be the same as the shift! The bigger the shift, the > bigger the max! Nice! :) > > mult = (1 << shift) / div; > max_div = (1 << shift) > > But the rounding needs to be with the mult / shift: > > return (val * mult + ((1 << shift) - 1)) >> shift; > > > When val goes pass 1 << shift, then the error will be off by more than > one. Did you mean, val should be such that when we do the (val * mult) we only get rounding errors less than (1 << shift)? I think we also need to flip the delta now since we round down initially: delta = (1 << shift) - (mult * div) Thanks, Kalesh > > -- Steve
On Tue, 26 Oct 2021 21:04:29 -0700 Kalesh Singh <kaleshsingh@google.com> wrote: > On Tue, Oct 26, 2021 at 8:16 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > > > On Tue, 26 Oct 2021 22:21:23 -0400 > > Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > I'm sure there's an algorithm somewhere that can give as the real max. > > > > You got me playing with this more ;-) > > > > OK, I added the rounding in the wrong place. I found that we can make > > the max_div to be the same as the shift! The bigger the shift, the > > bigger the max! > > Nice! :) > > > > mult = (1 << shift) / div; > > max_div = (1 << shift) > > > > But the rounding needs to be with the mult / shift: > > > > return (val * mult + ((1 << shift) - 1)) >> shift; > > > > > > When val goes pass 1 << shift, then the error will be off by more than > > one. > Did you mean, val should be such that when we do the (val * mult) we > only get rounding errors less than (1 << shift)? We get rounding errors when val is greater than (1 << shift) because then it exposes the bits that are not shifted out. > > I think we also need to flip the delta now since we round down initially: > > delta = (1 << shift) - (mult * div) > Actually, we don't need the delta at all. Just what I showed above. Pick some arbitrary shift (let's say 20 as that seems to be commonly used, and works for 32 bit as well) and then we figure out the multiplier. mult = (1 << shift) / div; No delta needed. Our max is going to be 1 << shift, and then all we need is: if (val < (1 << shift)) return (val * mult + ((1 << shift) - 1)) >> shift; else return val / div; All we need to save to do the operation is the shift, the constant div and the calculated constant mult. -- Steve
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index db28bcf976f4..364cb3091789 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -304,6 +304,10 @@ static u64 hist_field_div(struct hist_field *hist_field, if (!val2) return -1; + /* Use shift if the divisor is a power of 2 */ + if (!(val2 & (val2 - 1))) + return val1 >> __ffs64(val2); + return div64_u64(val1, val2); }