Message ID | 54b0b07a-c178-9ffe-b5af-088f3c21696c@kernel.dk (mailing list archive)
---|---
State | New
Series | mm: move FOLL_PIN debug accounting under CONFIG_DEBUG_VM
On 31.01.23 16:36, Jens Axboe wrote:
> Using FOLL_PIN for mapping user pages caused a performance regression of
> about 2.7%. Looking at profiles, we see:
>
> +2.71% [kernel.vmlinux] [k] mod_node_page_state
>
> which wasn't there before. The node page state counters are percpu, but
> with a very low threshold. On my setup, every 108th update ends up
> needing to punt to two atomic_long_add()'s, which is causing the above
> regression.
>
> As these counters are purely for debug purposes, move them under
> CONFIG_DEBUG_VM rather than do them unconditionally.
>
> Fixes: fd20d0c1852e ("block: convert bio_map_user_iov to use iov_iter_extract_pages")
> Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
> Link: https://lore.kernel.org/linux-block/f57ee72f-38e9-6afa-182f-2794638eadcb@kernel.dk/
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> ---
>
> I added Fixes tags, even though this is not a strict fix for those
> commits. But it does fix a performance regression introduced by those
> commits, and it's a useful hint for backporting.

I'd just mention them in the commit log instead, but I don't particularly
care here as long as the commit IDs are stable.

If still possible, I'd include this as a preparatory change for these
commits instead. Anyhow

Acked-by: David Hildenbrand <david@redhat.com>
On 1/31/23 8:48 AM, David Hildenbrand wrote:
> On 31.01.23 16:36, Jens Axboe wrote:
>> Using FOLL_PIN for mapping user pages caused a performance regression of
>> about 2.7%. Looking at profiles, we see:
>>
>> +2.71% [kernel.vmlinux] [k] mod_node_page_state
>>
>> which wasn't there before. The node page state counters are percpu, but
>> with a very low threshold. On my setup, every 108th update ends up
>> needing to punt to two atomic_long_add()'s, which is causing the above
>> regression.
>>
>> As these counters are purely for debug purposes, move them under
>> CONFIG_DEBUG_VM rather than do them unconditionally.
>>
>> Fixes: fd20d0c1852e ("block: convert bio_map_user_iov to use iov_iter_extract_pages")
>> Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
>> Link: https://lore.kernel.org/linux-block/f57ee72f-38e9-6afa-182f-2794638eadcb@kernel.dk/
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> I added Fixes tags, even though this is not a strict fix for those
>> commits. But it does fix a performance regression introduced by those
>> commits, and it's a useful hint for backporting.
>
> I'd just mention them in the commit log instead, but I don't
> particularly care here as long as the commit IDs are stable.

Sure, I can move that bit into the commit message.

> If still possible, I'd include this as a preparatory change for
> these commits instead. Anyhow

That would be preferable in general, but I don't think it's worth rebasing
the series just for that.

> Acked-by: David Hildenbrand <david@redhat.com>

Thanks! Will add in conjunction with updating the commit message.
On 1/31/23 07:36, Jens Axboe wrote:
> Using FOLL_PIN for mapping user pages caused a performance regression of
> about 2.7%. Looking at profiles, we see:
>
> +2.71% [kernel.vmlinux] [k] mod_node_page_state
>
> which wasn't there before. The node page state counters are percpu, but
> with a very low threshold. On my setup, every 108th update ends up
> needing to punt to two atomic_long_add()'s, which is causing the above
> regression.
>
> As these counters are purely for debug purposes, move them under
> CONFIG_DEBUG_VM rather than do them unconditionally.
>
> Fixes: fd20d0c1852e ("block: convert bio_map_user_iov to use iov_iter_extract_pages")
> Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
> Link: https://lore.kernel.org/linux-block/f57ee72f-38e9-6afa-182f-2794638eadcb@kernel.dk/
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>

Yes, that's exactly the right fix (in the absence of some magical
high-volume, high-performance per-cpu counters, anyway).

In fact, these counter operations were originally behind CONFIG_DEBUG_VM
in an early patchset, but the FOLL_PIN feature was new and potentially
flaky, and also not yet in the Direct IO fast path. So during code review
we decided to enable the debugging information unconditionally.

But now we can no longer afford the debug counters in release builds, and
we also no longer need them: the FOLL_PIN feature has settled in and had
enough soak time to be confident about it.

Over time, I'd started assuming that these counters were necessary for
release builds, and it took someone else to wake me up and point out the
obvious, so thanks! :)

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cd28a100d9e4..0153ec8a54ae 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -195,8 +195,10 @@ enum node_stat_item {
 	NR_WRITTEN,		/* page writings since bootup */
 	NR_THROTTLED_WRITTEN,	/* NR_WRITTEN while reclaim throttled */
 	NR_KERNEL_MISC_RECLAIMABLE,	/* reclaimable non-slab kernel pages */
+#ifdef CONFIG_DEBUG_VM
 	NR_FOLL_PIN_ACQUIRED,	/* via: pin_user_page(), gup flag: FOLL_PIN */
 	NR_FOLL_PIN_RELEASED,	/* pages returned via unpin_user_page() */
+#endif
 	NR_KERNEL_STACK_KB,	/* measured in KiB */
 #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
 	NR_KERNEL_SCS_KB,	/* measured in KiB */
diff --git a/mm/gup.c b/mm/gup.c
index f45a3a5be53a..41abb16286ec 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -168,7 +168,9 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 	 */
 	smp_mb__after_atomic();
 
+#ifdef CONFIG_DEBUG_VM
 	node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
+#endif
 
 	return folio;
 }
@@ -180,7 +182,9 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 {
 	if (flags & FOLL_PIN) {
+#ifdef CONFIG_DEBUG_VM
 		node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
+#endif
 		if (folio_test_large(folio))
 			atomic_sub(refs, folio_pincount_ptr(folio));
 		else
@@ -236,8 +240,9 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
 		} else {
 			folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
 		}
-
+#ifdef CONFIG_DEBUG_VM
 		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
+#endif
 	}
 
 	return 0;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1ea6a5ce1c41..5cbd9a1924bf 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1227,8 +1227,10 @@ const char * const vmstat_text[] = {
 	"nr_written",
 	"nr_throttled_written",
 	"nr_kernel_misc_reclaimable",
+#ifdef CONFIG_DEBUG_VM
 	"nr_foll_pin_acquired",
 	"nr_foll_pin_released",
+#endif
 	"nr_kernel_stack",
 #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
 	"nr_shadow_call_stack",
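[Editor's note: one conceivable follow-up cleanup, sketched here by the
editor and not part of the patch above, would hide the three inline
#ifdef blocks in mm/gup.c behind small helpers. The helper names are
hypothetical; node_stat_mod_folio() and the NR_FOLL_PIN_* items are the
ones touched by the diff, and the NR_FOLL_PIN_* references compile only
when CONFIG_DEBUG_VM is set, matching the #ifdef added to
enum node_stat_item.]

/*
 * Hypothetical helpers (editor's sketch): keep the FOLL_PIN debug
 * accounting in one place so the gup.c call sites stay free of
 * inline #ifdef blocks. Both compile to nothing without
 * CONFIG_DEBUG_VM.
 */
static inline void foll_pin_stat_acquired(struct folio *folio, int refs)
{
#ifdef CONFIG_DEBUG_VM
	node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
#endif
}

static inline void foll_pin_stat_released(struct folio *folio, int refs)
{
#ifdef CONFIG_DEBUG_VM
	node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
#endif
}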
Using FOLL_PIN for mapping user pages caused a performance regression of
about 2.7%. Looking at profiles, we see:

+2.71% [kernel.vmlinux] [k] mod_node_page_state

which wasn't there before. The node page state counters are percpu, but
with a very low threshold. On my setup, every 108th update ends up
needing to punt to two atomic_long_add()'s, which is causing the above
regression.

As these counters are purely for debug purposes, move them under
CONFIG_DEBUG_VM rather than do them unconditionally.

Fixes: fd20d0c1852e ("block: convert bio_map_user_iov to use iov_iter_extract_pages")
Fixes: 920756a3306a ("block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages")
Link: https://lore.kernel.org/linux-block/f57ee72f-38e9-6afa-182f-2794638eadcb@kernel.dk/
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

I added Fixes tags, even though this is not a strict fix for those
commits. But it does fix a performance regression introduced by those
commits, and it's a useful hint for backporting.

I'd prefer sticking this at the end of the iov-extract series that is
already pulled in, so it can go with those patches.
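[Editor's note: for readers unfamiliar with why mod_node_page_state()
shows up in the profile, here is a minimal user-space sketch, written by
the editor and not taken from the kernel, of the batching scheme the
commit message describes: each CPU accumulates updates in a small local
delta, and only a delta that crosses the (low) threshold is folded into
the shared atomic total. That fold is the expensive step; in the kernel
it punts to two atomic_long_add()'s, for the node and global totals. All
names and the threshold value below are made up for illustration.]

/*
 * Editor's sketch of a threshold-batched percpu-style counter.
 * Build with: cc -std=c11 sketch.c
 */
#include <stdatomic.h>
#include <stdio.h>

#define STAT_THRESHOLD 32		/* assumed small threshold */

static atomic_long total;		/* shared "node" counter */
static _Thread_local long cpu_delta;	/* per-thread batched delta */

static void mod_stat(long delta)
{
	cpu_delta += delta;
	if (cpu_delta > STAT_THRESHOLD || cpu_delta < -STAT_THRESHOLD) {
		/* the expensive step: fold the batch into the atomic */
		atomic_fetch_add(&total, cpu_delta);
		cpu_delta = 0;
	}
}

int main(void)
{
	for (int i = 0; i < 1000; i++)
		mod_stat(1);		/* every 33rd call hits the atomic */
	atomic_fetch_add(&total, cpu_delta);	/* flush the remainder */
	printf("total=%ld\n", atomic_load(&total));
	return 0;
}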