Message ID | 20220708003053.158480-1-namit@vmware.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | x86/mm/tlb: ignore f->new_tlb_gen when zero | expand |
On 08.07.22 02:30, Nadav Amit wrote: > From: Nadav Amit <namit@vmware.com> > > Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when > possible") introduced an optimization of skipping the flush if the TLB > generation that is flushed (as provided in flush_tlb_info) was already > flushed. > > However, arch_tlbbatch_flush() does not provide any generation in > flush_tlb_info. As a result, try_to_unmap_one() would not perform any > TLB flushes. > > Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is > anyhow is an invalid generation value. > > In addition, add the missing unlikely() and jump to get tracing right. > > Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") > Reported-by: Hugh Dickins <hughd@google.com> > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > Cc: Andy Lutomirski <luto@kernel.org> > Signed-off-by: Nadav Amit <namit@vmware.com> > --- > arch/x86/mm/tlb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c > index d9314cc8b81f..d81b4084bb8a 100644 > --- a/arch/x86/mm/tlb.c > +++ b/arch/x86/mm/tlb.c > @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info) > return; > } > > - if (f->new_tlb_gen <= local_tlb_gen) { > + if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) { > /* > * The TLB is already up to date in respect to f->new_tlb_gen. > * While the core might be still behind mm_tlb_gen, checking > * mm_tlb_gen unnecessarily would have negative caching effects > * so avoid it. > */ > - return; > + goto done; Does this affect the performance numbers from aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible")?
On 7/7/22 17:30, Nadav Amit wrote: You might want to fix the clock on the system from which you sent this. I was really scratching my head trying to figure out how you got this patch out before Hugh's bug report. > From: Nadav Amit <namit@vmware.com> > > Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when > possible") introduced an optimization of skipping the flush if the TLB > generation that is flushed (as provided in flush_tlb_info) was already > flushed. > > However, arch_tlbbatch_flush() does not provide any generation in > flush_tlb_info. As a result, try_to_unmap_one() would not perform any > TLB flushes. > > Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is > anyhow is an invalid generation value. It is, but the check below uses 'f->end == TLB_FLUSH_ALL' as the marker for f->new_tlb_gen being invalid. Being consistent seems like a good idea on this stuff. > In addition, add the missing unlikely() and jump to get tracing right. There are currently five routes out of flush_tlb_func(): * Three early returns * One 'goto done' * One implicit return The tracing code doesn't get run for any of the early returns, but that's intentional because they don't *actually* flush the TLB. We don't want to record that flush_tlb_func() flushed the TLB when it didn't. There's another tracepoint on the TLB_REMOTE_SEND_IPI side to tell where the flushes were requested. That said, I think the if (unlikely(local_tlb_gen == mm_tlb_gen)) goto done; is arguably buggy, as is the 'goto done' in this hunk: > arch/x86/mm/tlb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c > index d9314cc8b81f..d81b4084bb8a 100644 > --- a/arch/x86/mm/tlb.c > +++ b/arch/x86/mm/tlb.c > @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info) > return; > } > > - if (f->new_tlb_gen <= local_tlb_gen) { > + if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) { > /* > * The TLB is already up to date in respect to f->new_tlb_gen. > * While the core might be still behind mm_tlb_gen, checking > * mm_tlb_gen unnecessarily would have negative caching effects > * so avoid it. > */ > - return; > + goto done; > } > > /* We might want to (eventually) think about doing something like the attached patch to make the skipped flushes explicit in the tracing and make the return paths out of this function more consistent.
On 7/8/22 04:40, David Hildenbrand wrote: >> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c >> index d9314cc8b81f..d81b4084bb8a 100644 >> --- a/arch/x86/mm/tlb.c >> +++ b/arch/x86/mm/tlb.c >> @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info) >> return; >> } >> >> - if (f->new_tlb_gen <= local_tlb_gen) { >> + if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) { >> /* >> * The TLB is already up to date in respect to f->new_tlb_gen. >> * While the core might be still behind mm_tlb_gen, checking >> * mm_tlb_gen unnecessarily would have negative caching effects >> * so avoid it. >> */ >> - return; >> + goto done; > Does this affect the performance numbers from aa44284960d5 ("x86/mm/tlb: > Avoid reading mm_tlb_gen when possible")? It depends on how many batched flushes that workload had. From the looks of it, they're all one page: madvise(addr + i, pgsize, MADV_DONTNEED); so there shouldn't be *much* batching in play. But, it wouldn't hurt to re-run them in either case.
On Jul 8, 2022, at 8:13 AM, Dave Hansen <dave.hansen@intel.com> wrote: > ⚠ External Email > > On 7/8/22 04:40, David Hildenbrand wrote: >>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c >>> index d9314cc8b81f..d81b4084bb8a 100644 >>> --- a/arch/x86/mm/tlb.c >>> +++ b/arch/x86/mm/tlb.c >>> @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info) >>> return; >>> } >>> >>> - if (f->new_tlb_gen <= local_tlb_gen) { >>> + if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) { >>> /* >>> * The TLB is already up to date in respect to f->new_tlb_gen. >>> * While the core might be still behind mm_tlb_gen, checking >>> * mm_tlb_gen unnecessarily would have negative caching effects >>> * so avoid it. >>> */ >>> - return; >>> + goto done; >> Does this affect the performance numbers from aa44284960d5 ("x86/mm/tlb: >> Avoid reading mm_tlb_gen when possible")? > > It depends on how many batched flushes that workload had. From the > looks of it, they're all one page: > > madvise(addr + i, pgsize, MADV_DONTNEED); > > so there shouldn't be *much* batching in play. But, it wouldn't hurt to > re-run them in either case. Just to clarify, since these things are confusing. There are two batching mechanisms. The common one is mmu_gather, which MADV_DONTNEED uses. This one is *not* the one that caused the breakage. The second one is the “unmap_batch”, which was only used by x86 until now. (I just saw patches for ARM, but I think they just exploit the interface in a way). The “unmap_batch” is used when you swap out. This was broken. Since the bug was not during MADV_DONTNEED there is no reason for the results to be any different. Famous last words?
On 7/8/22 09:54, Nadav Amit wrote: > Since the bug was not during MADV_DONTNEED there is no reason for the > results to be any different. > > Famous last words? Considering that your patch broke the kernel a way that surprised us all, I think caution is warranted. Re-running a microbenchmark that takes five minutes and stresses things a bit is the least you can do, I think.
On Jul 8, 2022, at 7:49 AM, Dave Hansen <dave.hansen@intel.com> wrote: > ⚠ External Email > > On 7/7/22 17:30, Nadav Amit wrote: > > You might want to fix the clock on the system from which you sent this. > I was really scratching my head trying to figure out how you got this > patch out before Hugh's bug report. > >> From: Nadav Amit <namit@vmware.com> >> >> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when >> possible") introduced an optimization of skipping the flush if the TLB >> generation that is flushed (as provided in flush_tlb_info) was already >> flushed. >> >> However, arch_tlbbatch_flush() does not provide any generation in >> flush_tlb_info. As a result, try_to_unmap_one() would not perform any >> TLB flushes. >> >> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is >> anyhow is an invalid generation value. > > It is, but the check below uses 'f->end == TLB_FLUSH_ALL' as the marker > for f->new_tlb_gen being invalid. Being consistent seems like a good > idea on this stuff. If we get a request to do a flush, regardless whether full or partial, that logically we have already done, there is not reason to do it. I therefore do not see a reason to look on f->end. I think that looking at the generation is very intuitive. If you want, I can add a constant such as TLB_GENERATION_INVALID. > >> In addition, add the missing unlikely() and jump to get tracing right. > > There are currently five routes out of flush_tlb_func(): > * Three early returns > * One 'goto done' > * One implicit return > > The tracing code doesn't get run for any of the early returns, but > that's intentional because they don't *actually* flush the TLB. We > don't want to record that flush_tlb_func() flushed the TLB when it > didn't. There's another tracepoint on the TLB_REMOTE_SEND_IPI side to > tell where the flushes were requested. > > That said, I think the > > if (unlikely(local_tlb_gen == mm_tlb_gen)) > goto done; > > is arguably buggy, as is the 'goto done' in this hunk: I was just trying to follow it for consistency. Will remove. > > We might want to (eventually) think about doing something like the > attached patch to make the skipped flushes explicit in the tracing and > make the return paths out of this function more consistent. That’s fine with me. I just recommend that you have a single tracing call in the function, since having too many ruins the generated code.
On Jul 8, 2022, at 10:01 AM, Dave Hansen <dave.hansen@intel.com> wrote: > ⚠ External Email > > On 7/8/22 09:54, Nadav Amit wrote: >> Since the bug was not during MADV_DONTNEED there is no reason for the >> results to be any different. >> >> Famous last words? > > Considering that your patch broke the kernel a way that surprised us > all, I think caution is warranted. Re-running a microbenchmark that > takes five minutes and stresses things a bit is the least you can do, I > think. I will send it later today. I was just pointing that the failing code-path is different than the one I measured.
On Jul 8, 2022, at 10:09 AM, Nadav Amit <namit@vmware.com> wrote: > On Jul 8, 2022, at 10:01 AM, Dave Hansen <dave.hansen@intel.com> wrote: > >> ⚠ External Email >> >> On 7/8/22 09:54, Nadav Amit wrote: >>> Since the bug was not during MADV_DONTNEED there is no reason for the >>> results to be any different. >>> >>> Famous last words? >> >> Considering that your patch broke the kernel a way that surprised us >> all, I think caution is warranted. Re-running a microbenchmark that >> takes five minutes and stresses things a bit is the least you can do, I >> think. > > I will send it later today. I was just pointing that the failing code-path > is different than the one I measured. It will take some more time, since 5.19 does not want to boot on my machine, and results from VMs are meaningless for this patch. I would look into this unrelated failure, unless you want results from 5.18. [ 6.303945] ------------[ cut here ]------------ [ 6.309209] kernel BUG at arch/x86/kernel/apic/apic.c:1598! [ 6.315537] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 6.321275] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc4TLB+ #5 [ 6.328760] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.9.1 12/04/2018 [ 6.337236] RIP: 0010:setup_local_APIC+0x31e/0x330 [ 6.342686] Code: 01 0f 85 05 ff ff ff 85 d2 7f 2b 48 8b 05 aa 37 4f 01 be 00 07 01 00 bf 50 03 00 00 48 8b 40 10 e8 37 99 fb 00 e9 04 ff ff ff <0f> 0b e8 5b 2d be 00 e9 ba 64 b8 0 [ 6.363818] RSP: 0000:ffffffff82603e88 EFLAGS: 00010246 [ 6.369752] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 6.377820] RDX: 0000000000000000 RSI: 00000000fffffeff RDI: 0000000000000020 [ 6.385888] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff82603da8 [ 6.393956] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000031 [ 6.402024] R13: 0000000000000000 R14: ffffffff82613118 R15: 0000000000000000 [ 6.410091] FS: 0000000000000000(0000) GS:ffff889fff600000(0000) knlGS:0000000000000000 [ 6.419250] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.425765] CR2: ffff88c07ffff000 CR3: 000000000260c001 CR4: 00000000000606f0 [ 6.433826] Call Trace: [ 6.436646] <TASK> [ 6.439077] ? _printk+0x53/0x6a [ 6.442777] apic_intr_mode_init+0xd2/0xf1 [ 6.447448] x86_late_time_init+0x1b/0x2b [ 6.452019] start_kernel+0x5d8/0x694 [ 6.456194] secondary_startup_64_no_verify+0xce/0xdb [ 6.461933] </TASK> [ 6.464463] Modules linked in: [ 6.467979] ---[ end trace 0000000000000000 ]--- [ 6.473243] RIP: 0010:setup_local_APIC+0x31e/0x330 [ 6.478704] Code: 01 0f 85 05 ff ff ff 85 d2 7f 2b 48 8b 05 aa 37 4f 01 be 00 07 01 00 bf 50 03 00 00 48 8b 40 10 e8 37 99 fb 00 e9 04 ff ff ff <0f> 0b e8 5b 2d be 00 e9 ba 64 b8 0 [ 6.499865] RSP: 0000:ffffffff82603e88 EFLAGS: 00010246 [ 6.505803] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 6.513887] RDX: 0000000000000000 RSI: 00000000fffffeff RDI: 0000000000000020 [ 6.521969] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff82603da8 [ 6.530053] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000031 [ 6.538136] R13: 0000000000000000 R14: ffffffff82613118 R15: 0000000000000000 [ 6.546218] FS: 0000000000000000(0000) GS:ffff889fff600000(0000) knlGS:0000000000000000 [ 6.555391] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.561919] CR2: ffff88c07ffff000 CR3: 000000000260c001 CR4: 00000000000606f0 [ 6.570003] Kernel panic - not syncing: Attempted to kill the idle task! [ 6.577591] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
On 7/8/22 10:04, Nadav Amit wrote: > On Jul 8, 2022, at 7:49 AM, Dave Hansen <dave.hansen@intel.com> wrote: >> On 7/7/22 17:30, Nadav Amit wrote: >> You might want to fix the clock on the system from which you sent this. >> I was really scratching my head trying to figure out how you got this >> patch out before Hugh's bug report. >> >>> From: Nadav Amit <namit@vmware.com> >>> >>> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when >>> possible") introduced an optimization of skipping the flush if the TLB >>> generation that is flushed (as provided in flush_tlb_info) was already >>> flushed. >>> >>> However, arch_tlbbatch_flush() does not provide any generation in >>> flush_tlb_info. As a result, try_to_unmap_one() would not perform any >>> TLB flushes. >>> >>> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is >>> anyhow is an invalid generation value. >> >> It is, but the check below uses 'f->end == TLB_FLUSH_ALL' as the marker >> for f->new_tlb_gen being invalid. Being consistent seems like a good >> idea on this stuff. > > If we get a request to do a flush, regardless whether full or partial, > that logically we have already done, there is not reason to do it. > > I therefore do not see a reason to look on f->end. I think that looking > at the generation is very intuitive. If you want, I can add a constant > such as TLB_GENERATION_INVALID. That's a good point. But, _my_ point was that there was only really one read site of f->new_tlb_gen in flush_tlb_func(). That site is guarded by the "f->end != TLB_FLUSH_ALL" check which prevented it from making the same error that your patch did. Whatever we do, it would be nice to have a *single* way to check for "does f->new_tlb_gen have an actual, valid bit of tlb gen data in it?" Using something like TLB_GENERATION_INVALID seems reasonable to me.
On Thu, 7 Jul 2022, Nadav Amit wrote: > From: Nadav Amit <namit@vmware.com> > > Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when > possible") introduced an optimization of skipping the flush if the TLB > generation that is flushed (as provided in flush_tlb_info) was already > flushed. > > However, arch_tlbbatch_flush() does not provide any generation in > flush_tlb_info. As a result, try_to_unmap_one() would not perform any > TLB flushes. > > Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is > anyhow is an invalid generation value. > > In addition, add the missing unlikely() and jump to get tracing right. > > Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") > Reported-by: Hugh Dickins <hughd@google.com> > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > Cc: Andy Lutomirski <luto@kernel.org> > Signed-off-by: Nadav Amit <namit@vmware.com> Thanks a lot for your rapid response and thinking it through (before I got around to any "nopcid" or "nopti" experiments). I've been testing this one for a few hours now, and no problems seen. I expect you'll be sending another version, maybe next week, meeting Dave's concerns; but wanted to reassure that you have correctly identified the issue and fixed it with this - thanks. Hugh > --- > arch/x86/mm/tlb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c > index d9314cc8b81f..d81b4084bb8a 100644 > --- a/arch/x86/mm/tlb.c > +++ b/arch/x86/mm/tlb.c > @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info) > return; > } > > - if (f->new_tlb_gen <= local_tlb_gen) { > + if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) { > /* > * The TLB is already up to date in respect to f->new_tlb_gen. > * While the core might be still behind mm_tlb_gen, checking > * mm_tlb_gen unnecessarily would have negative caching effects > * so avoid it. > */ > - return; > + goto done; > } > > /* > -- > 2.25.1
On Jul 8, 2022, at 12:21 PM, Hugh Dickins <hughd@google.com> wrote: > ⚠ External Email > > On Thu, 7 Jul 2022, Nadav Amit wrote: > >> From: Nadav Amit <namit@vmware.com> >> >> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when >> possible") introduced an optimization of skipping the flush if the TLB >> generation that is flushed (as provided in flush_tlb_info) was already >> flushed. >> >> However, arch_tlbbatch_flush() does not provide any generation in >> flush_tlb_info. As a result, try_to_unmap_one() would not perform any >> TLB flushes. >> >> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is >> anyhow is an invalid generation value. >> >> In addition, add the missing unlikely() and jump to get tracing right. >> >> Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") >> Reported-by: Hugh Dickins <hughd@google.com> >> Cc: Dave Hansen <dave.hansen@linux.intel.com> >> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> >> Cc: Andy Lutomirski <luto@kernel.org> >> Signed-off-by: Nadav Amit <namit@vmware.com> > > Thanks a lot for your rapid response and thinking it through > (before I got around to any "nopcid" or "nopti" experiments). > > I've been testing this one for a few hours now, and no problems seen. > I expect you'll be sending another version, maybe next week, meeting > Dave's concerns; but wanted to reassure that you have correctly > identified the issue and fixed it with this - thanks. Thanks, Hugh. Sorry again for my mistake. Can I please have your “Tested-by”?
On Fri, 8 Jul 2022, Nadav Amit wrote: > On Jul 8, 2022, at 12:21 PM, Hugh Dickins <hughd@google.com> wrote: > > > ⚠ External Email > > > > On Thu, 7 Jul 2022, Nadav Amit wrote: > > > >> From: Nadav Amit <namit@vmware.com> > >> > >> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when > >> possible") introduced an optimization of skipping the flush if the TLB > >> generation that is flushed (as provided in flush_tlb_info) was already > >> flushed. > >> > >> However, arch_tlbbatch_flush() does not provide any generation in > >> flush_tlb_info. As a result, try_to_unmap_one() would not perform any > >> TLB flushes. > >> > >> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is > >> anyhow is an invalid generation value. > >> > >> In addition, add the missing unlikely() and jump to get tracing right. > >> > >> Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") > >> Reported-by: Hugh Dickins <hughd@google.com> > >> Cc: Dave Hansen <dave.hansen@linux.intel.com> > >> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > >> Cc: Andy Lutomirski <luto@kernel.org> > >> Signed-off-by: Nadav Amit <namit@vmware.com> > > > > Thanks a lot for your rapid response and thinking it through > > (before I got around to any "nopcid" or "nopti" experiments). > > > > I've been testing this one for a few hours now, and no problems seen. > > I expect you'll be sending another version, maybe next week, meeting > > Dave's concerns; but wanted to reassure that you have correctly > > identified the issue and fixed it with this - thanks. > > Thanks, Hugh. Sorry again for my mistake. > > Can I please have your “Tested-by”? Sure, let me scrabble around in my box of tags, yes, here's one: Tested-by: Hugh Dickins <hughd@google.com>
On Jul 8, 2022, at 11:54 AM, Dave Hansen <dave.hansen@intel.com> wrote: > ⚠ External Email > > On 7/8/22 10:04, Nadav Amit wrote: >> On Jul 8, 2022, at 7:49 AM, Dave Hansen <dave.hansen@intel.com> wrote: >>> On 7/7/22 17:30, Nadav Amit wrote: >>> You might want to fix the clock on the system from which you sent this. >>> I was really scratching my head trying to figure out how you got this >>> patch out before Hugh's bug report. >>> >>>> From: Nadav Amit <namit@vmware.com> >>>> >>>> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when >>>> possible") introduced an optimization of skipping the flush if the TLB >>>> generation that is flushed (as provided in flush_tlb_info) was already >>>> flushed. >>>> >>>> However, arch_tlbbatch_flush() does not provide any generation in >>>> flush_tlb_info. As a result, try_to_unmap_one() would not perform any >>>> TLB flushes. >>>> >>>> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is >>>> anyhow is an invalid generation value. >>> >>> It is, but the check below uses 'f->end == TLB_FLUSH_ALL' as the marker >>> for f->new_tlb_gen being invalid. Being consistent seems like a good >>> idea on this stuff. >> >> If we get a request to do a flush, regardless whether full or partial, >> that logically we have already done, there is not reason to do it. >> >> I therefore do not see a reason to look on f->end. I think that looking >> at the generation is very intuitive. If you want, I can add a constant >> such as TLB_GENERATION_INVALID. > > That's a good point. > > But, _my_ point was that there was only really one read site of > f->new_tlb_gen in flush_tlb_func(). That site is guarded by the "f->end > != TLB_FLUSH_ALL" check which prevented it from making the same error > that your patch did. > > Whatever we do, it would be nice to have a *single* way to check for > "does f->new_tlb_gen have an actual, valid bit of tlb gen data in it?" > > Using something like TLB_GENERATION_INVALID seems reasonable to me. I am not sure that I fully understood what you meant. I understand you do want TLB_GENERATION_INVALID, and I think you ask for some assertions to regard to the expected relationship between TLB_GENERATION_INVALID and TLB_FLUSH_ALL. I think that in order to shorten the discussion, I’ll send v2 (very soon) and you will comment on the patch itself. I did run the tests again to measure performance as you asked for, and the results are similar (will-it-scale tlb_flush1/threads): 11% speedup with 45 cores. Without aa44284960d5: tasks,processes,processes_idle,threads,threads_idle,linear 0,0,100,0,100,0 1,156782,97.89,157024,97.92,157024 5,707879,89.60,322015,89.69,785120 10,1311968,79.21,490881,79.68,1570240 15,1498762,68.82,553958,69.71,2355360 20,1483057,58.45,598939,60.00,3140480 25,1428105,48.07,626179,50.46,3925600 30,1648992,37.77,643954,41.36,4710720 35,1861301,27.50,671570,32.63,5495840 40,2038278,17.17,669470,24.03,6280960 45,2140412,6.71,673633,15.27,7066080 With aa44284960d5 + pending patch: 0,0,100,0,100,0 1,157935,97.93,155351,97.93,157935 5,710450,89.60,324981,89.70,789675 10,1291935,79.21,498496,79.57,1579350 15,1515323,68.81,575821,69.68,2369025 20,1545172,58.46,625521,60.05,3158700 25,1501015,48.09,675856,50.62,3948375 30,1682308,37.80,705242,41.55,4738050 35,1850464,27.52,717724,32.33,5527725 40,2030726,17.17,734610,23.99,6317400 45,2136401,6.71,747257,16.07,7107075
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index d9314cc8b81f..d81b4084bb8a 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info) return; } - if (f->new_tlb_gen <= local_tlb_gen) { + if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) { /* * The TLB is already up to date in respect to f->new_tlb_gen. * While the core might be still behind mm_tlb_gen, checking * mm_tlb_gen unnecessarily would have negative caching effects * so avoid it. */ - return; + goto done; } /*