Message ID | 20250217170934.457266-1-colin.i.king@gmail.com (mailing list archive) |
---|---|
State | New |
Series | [next] mm/mincore: improve performance by adding an unlikely hint |
On Mon, Feb 17, 2025 at 05:09:34PM +0000, Colin Ian King wrote:
> Adding an unlikely() hint on the masked start comparison error
> return path improves run-time performance of the mincore system call.
>
> Benchmarking on an i9-12900 shows an improvement of 7ns on mincore calls
> on a 256KB mmap'd region where 50% of the pages were resident.
>
> Results based on running 20 tests with turbo disabled (to reduce
> clock freq turbo changes), with 10 second run per test and comparing
> the number of mincore calls per second. The % standard deviation of
> the 20 tests was ~0.10%, so results are reliable.

I think you've elided _just_ enough information here that nobody can
judge whether your stats skills are any good ;-)  You've told us 7ns
(per call, presumably) and you've told us 0.10% standard deviation,
but you haven't told us how long the syscall takes, so nobody can tell
whether 7ns is within 0.10% or not ;-)

> Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
> ---
>  mm/mincore.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/mincore.c b/mm/mincore.c
> index d6bd19e520fc..832f29f46767 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -239,7 +239,7 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
>  	start = untagged_addr(start);
>
>  	/* Check the start address: needs to be page-aligned.. */
> -	if (start & ~PAGE_MASK)
> +	if (unlikely(start & ~PAGE_MASK))
>  		return -EINVAL;

We might get even more advantage by moving the EINVAL test before
untagged_addr() since we know that the tags are all in the high bits and
we don't need to have the test be dependent on the previous arithmetic.
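The suggestion to test alignment before untagged_addr() rests on the tag bits and the page-offset bits not overlapping. A minimal userspace sketch of that property follows. The 4 KiB page size, the top-byte tag mask, and every sketch_* name are assumptions made for illustration, not the kernel's definitions; the real untagged_addr() is architecture-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration only: 4 KiB pages, tag held in the top byte. */
#define SKETCH_PAGE_SIZE UINT64_C(4096)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_TAG_MASK  (UINT64_C(0xff) << 56)

/* Hypothetical stand-in for untagged_addr(): clears only the high tag bits. */
static inline uint64_t sketch_untag(uint64_t addr)
{
	return addr & ~SKETCH_TAG_MASK;
}

/* Returns nonzero when addr is not page-aligned (same test as the patch). */
static inline int sketch_misaligned(uint64_t addr)
{
	return (addr & ~SKETCH_PAGE_MASK) != 0;
}

/*
 * Asserts that, for one address, the alignment verdict is the same
 * whether it is computed before or after untagging.
 */
static inline void sketch_check_order_irrelevant(uint64_t addr)
{
	assert(sketch_misaligned(addr) == sketch_misaligned(sketch_untag(addr)));
}
```

Under these assumptions the alignment test reads only the low 12 bits, so it could run on the raw user-supplied value without waiting for the untagging arithmetic. Whether that reordering gives a measurable gain is not established in this thread.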
On 17/02/2025 17:58, Matthew Wilcox wrote:
> On Mon, Feb 17, 2025 at 05:09:34PM +0000, Colin Ian King wrote:
>> Adding an unlikely() hint on the masked start comparison error
>> return path improves run-time performance of the mincore system call.
>>
>> Benchmarking on an i9-12900 shows an improvement of 7ns on mincore calls
>> on a 256KB mmap'd region where 50% of the pages were resident.
>>
>> Results based on running 20 tests with turbo disabled (to reduce
>> clock freq turbo changes), with 10 second run per test and comparing
>> the number of mincore calls per second. The % standard deviation of
>> the 20 tests was ~0.10%, so results are reliable.
>
> I think you've elided _just_ enough information here that nobody can
> judge whether your stats skills are any good ;-)  You've told us 7ns
> (per call, presumably) and you've told us 0.10% standard deviation,
> but you haven't told us how long the syscall takes, so nobody can tell
> whether 7ns is within 0.10% or not ;-)

Ugh, my bad.

Improvement was from ~970 down to 963 ns, so small ~0.7% improvement.

Colin

>
>> Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
>> ---
>>  mm/mincore.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/mincore.c b/mm/mincore.c
>> index d6bd19e520fc..832f29f46767 100644
>> --- a/mm/mincore.c
>> +++ b/mm/mincore.c
>> @@ -239,7 +239,7 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
>>  	start = untagged_addr(start);
>>
>>  	/* Check the start address: needs to be page-aligned.. */
>> -	if (start & ~PAGE_MASK)
>> +	if (unlikely(start & ~PAGE_MASK))
>>  		return -EINVAL;
>
> We might get even more advantage by moving the EINVAL test before
> untagged_addr() since we know that the tags are all in the high bits and
> we don't need to have the test be dependent on the previous arithmetic.
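As a worked check of the figures quoted above, and assuming (the thread does not say) that the ~0.10% standard deviation applies equally to the before and after runs:

```latex
\[
\frac{970\,\text{ns} - 963\,\text{ns}}{970\,\text{ns}}
  = \frac{7}{970} \approx 0.72\%,
\qquad
\frac{0.72\%}{0.10\%} \approx 7
\]
```

So the reported gain is roughly seven times the quoted run-to-run standard deviation, which is the comparison the changelog needs to make explicit.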
On Mon, 17 Feb 2025 18:00:22 +0000 "Colin King (gmail)" <colin.i.king@gmail.com> wrote:

> On 17/02/2025 17:58, Matthew Wilcox wrote:
> > On Mon, Feb 17, 2025 at 05:09:34PM +0000, Colin Ian King wrote:
> >> Adding an unlikely() hint on the masked start comparison error
> >> return path improves run-time performance of the mincore system call.
> >>
> >> Benchmarking on an i9-12900 shows an improvement of 7ns on mincore calls
> >> on a 256KB mmap'd region where 50% of the pages were resident.
> >>
> >> Results based on running 20 tests with turbo disabled (to reduce
> >> clock freq turbo changes), with 10 second run per test and comparing
> >> the number of mincore calls per second. The % standard deviation of
> >> the 20 tests was ~0.10%, so results are reliable.
> >
> > I think you've elided _just_ enough information here that nobody can
> > judge whether your stats skills are any good ;-)  You've told us 7ns
> > (per call, presumably) and you've told us 0.10% standard deviation,
> > but you haven't told us how long the syscall takes, so nobody can tell
> > whether 7ns is within 0.10% or not ;-)
>
> Ugh, my bad.
>
> Improvement was from ~970 down to 963 ns, so small ~0.7% improvement.
>

It actually doesn't change the generated code:

hp2:/usr/src/25> diff -u mm/mincore.lst.old mm/mincore.lst
--- mm/mincore.lst.old	2025-02-17 19:11:34.093727411 -0800
+++ mm/mincore.lst	2025-02-17 19:12:59.797009056 -0800
@@ -1563,7 +1563,7 @@
 	start = untagged_addr(start);
 
 	/* Check the start address: needs to be page-aligned.. */
-	if (start & ~PAGE_MASK)
+	if (unlikely(start & ~PAGE_MASK))
      b27:	31 ff	xor    %edi,%edi
 	asm (ALTERNATIVE("",
      b29:	90	nop
On 18/02/2025 03:13, Andrew Morton wrote:
> On Mon, 17 Feb 2025 18:00:22 +0000 "Colin King (gmail)" <colin.i.king@gmail.com> wrote:
>
>> On 17/02/2025 17:58, Matthew Wilcox wrote:
>>> On Mon, Feb 17, 2025 at 05:09:34PM +0000, Colin Ian King wrote:
>>>> Adding an unlikely() hint on the masked start comparison error
>>>> return path improves run-time performance of the mincore system call.
>>>>
>>>> Benchmarking on an i9-12900 shows an improvement of 7ns on mincore calls
>>>> on a 256KB mmap'd region where 50% of the pages were resident.
>>>>
>>>> Results based on running 20 tests with turbo disabled (to reduce
>>>> clock freq turbo changes), with 10 second run per test and comparing
>>>> the number of mincore calls per second. The % standard deviation of
>>>> the 20 tests was ~0.10%, so results are reliable.
>>>
>>> I think you've elided _just_ enough information here that nobody can
>>> judge whether your stats skills are any good ;-)  You've told us 7ns
>>> (per call, presumably) and you've told us 0.10% standard deviation,
>>> but you haven't told us how long the syscall takes, so nobody can tell
>>> whether 7ns is within 0.10% or not ;-)
>>
>> Ugh, my bad.
>>
>> Improvement was from ~970 down to 963 ns, so small ~0.7% improvement.
>>
>
> It actually doesn't change the generated code:

I've compared the generated x86 object code using gcc 14.2.1 20240912
(Fedora 41), 14.2.0 (Debian 14.2.0-17) and 14.2.1 20250211 (Clear Linux),
and I get differences in the generated object code comparing old and
new, and the improvement on Clear Linux is more significant too because
it uses -O3. So I'm confident the change is generating improved object code.

>
> hp2:/usr/src/25> diff -u mm/mincore.lst.old mm/mincore.lst
> --- mm/mincore.lst.old	2025-02-17 19:11:34.093727411 -0800
> +++ mm/mincore.lst	2025-02-17 19:12:59.797009056 -0800
> @@ -1563,7 +1563,7 @@
>  	start = untagged_addr(start);
>
>  	/* Check the start address: needs to be page-aligned.. */
> -	if (start & ~PAGE_MASK)
> +	if (unlikely(start & ~PAGE_MASK))
>       b27:	31 ff	xor    %edi,%edi
>  	asm (ALTERNATIVE("",
>       b29:	90	nop
On Tue, 18 Feb 2025 14:16:20 +0000 "Colin King (gmail)" <colin.i.king@gmail.com> wrote:

> >> Improvement was from ~970 down to 963 ns, so small ~0.7% improvement.
> >>
> >
> > It actually doesn't change the generated code:
>
> I've compared the generated x86 object code using gcc 14.2.1 20240912
> (Fedora 41), 14.2.0 (Debian 14.2.0-17) and 14.2.1 20250211 (Clear Linux),
> and I get differences in the generated object code comparing old and
> new, and the improvement on Clear Linux is more significant too because
> it uses -O3. So I'm confident the change is generating improved object code.

I was using gcc-13.2.0.

Please resend, with a Matthew-friendly changelog?
diff --git a/mm/mincore.c b/mm/mincore.c
index d6bd19e520fc..832f29f46767 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -239,7 +239,7 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
 	start = untagged_addr(start);
 
 	/* Check the start address: needs to be page-aligned.. */
-	if (start & ~PAGE_MASK)
+	if (unlikely(start & ~PAGE_MASK))
 		return -EINVAL;
 
 	/* ..and we need to be passed a valid user-space range */
Adding an unlikely() hint on the masked start comparison error
return path improves run-time performance of the mincore system call.

Benchmarking on an i9-12900 shows an improvement of 7ns on mincore calls
on a 256KB mmap'd region where 50% of the pages were resident.

Results based on running 20 tests with turbo disabled (to reduce
clock freq turbo changes), with 10 second run per test and comparing
the number of mincore calls per second. The % standard deviation of
the 20 tests was ~0.10%, so results are reliable.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
---
 mm/mincore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
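For readers unfamiliar with the hint, the kernel's unlikely() is a thin wrapper around the compiler builtin __builtin_expect(). The userspace sketch below mirrors the shape of the patched check. The demo_* names and the 4 KiB page size are hypothetical, chosen only for illustration, and the function is not the kernel's mincore entry point.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same shape as the kernel macro: tell the compiler the condition is rarely true. */
#define demo_unlikely(x) __builtin_expect(!!(x), 0)

/* Assumed for illustration only: 4 KiB pages. */
#define DEMO_PAGE_SIZE UINT64_C(4096)

/* The low-bit mask trick below only works for power-of-two page sizes. */
static_assert((DEMO_PAGE_SIZE & (DEMO_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

/* Hypothetical check: 0 for a page-aligned start, -EINVAL otherwise. */
static int demo_check_start(uint64_t start)
{
	/* Nonzero low bits mean start is not on a page boundary. */
	if (demo_unlikely(start & (DEMO_PAGE_SIZE - 1)))
		return -EINVAL;
	return 0;
}
```

The hint never changes the result of the test; at most it influences how the compiler lays out the two branches. As the replies in this thread show, whether it changes the generated code at all can depend on the compiler version, with gcc-13.2.0 reported as producing identical output.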