Message ID | 20220705123532.283-1-songmuchun@bytedance.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] mm: fix missing wake-up event for FSDAX pages | expand |
On Tue, 5 Jul 2022 20:35:32 +0800 Muchun Song <songmuchun@bytedance.com> wrote: > FSDAX page refcounts are 1-based, rather than 0-based: if refcount is > 1, then the page is freed. The FSDAX pages can be pinned through GUP, > then they will be unpinned via unpin_user_page() using a folio variant > to put the page, however, folio variants did not consider this special > case, the result will be to miss a wakeup event (like the user of > __fuse_dax_break_layouts()). Since FSDAX pages are only possible get > by GUP users, so fix GUP instead of folio_put() to lower overhead. > What are the user visible runtime effects of this bug?
On Tue, Jul 05, 2022 at 02:18:19PM -0700, Andrew Morton wrote: > On Tue, 5 Jul 2022 20:35:32 +0800 Muchun Song <songmuchun@bytedance.com> wrote: > > > FSDAX page refcounts are 1-based, rather than 0-based: if refcount is > > 1, then the page is freed. The FSDAX pages can be pinned through GUP, > > then they will be unpinned via unpin_user_page() using a folio variant > > to put the page, however, folio variants did not consider this special > > case, the result will be to miss a wakeup event (like the user of > > __fuse_dax_break_layouts()). Since FSDAX pages are only possible get > > by GUP users, so fix GUP instead of folio_put() to lower overhead. > > > > What are the user visible runtime effects of this bug? "missing wake up event" seems pretty obvious to me? Something goes to sleep waiting for a page to become unused, and is never woken.
On Wed, 6 Jul 2022 00:38:41 +0100 Matthew Wilcox <willy@infradead.org> wrote: > On Tue, Jul 05, 2022 at 02:18:19PM -0700, Andrew Morton wrote: > > On Tue, 5 Jul 2022 20:35:32 +0800 Muchun Song <songmuchun@bytedance.com> wrote: > > > > > FSDAX page refcounts are 1-based, rather than 0-based: if refcount is > > > 1, then the page is freed. The FSDAX pages can be pinned through GUP, > > > then they will be unpinned via unpin_user_page() using a folio variant > > > to put the page, however, folio variants did not consider this special > > > case, the result will be to miss a wakeup event (like the user of > > > __fuse_dax_break_layouts()). Since FSDAX pages are only possible get > > > by GUP users, so fix GUP instead of folio_put() to lower overhead. > > > > > > > What are the user visible runtime effects of this bug? > > "missing wake up event" seems pretty obvious to me? Something goes to > sleep waiting for a page to become unused, and is never woken. No, missed wakeups are often obscured by another wakeup coming in shortly afterwards. If this wakeup is not one of these, then are there reports from the softlockup detector? Do we have reports of processes permanently stuck in D state?
On Tue, Jul 05, 2022 at 04:47:10PM -0700, Andrew Morton wrote: > On Wed, 6 Jul 2022 00:38:41 +0100 Matthew Wilcox <willy@infradead.org> wrote: > > > On Tue, Jul 05, 2022 at 02:18:19PM -0700, Andrew Morton wrote: > > > On Tue, 5 Jul 2022 20:35:32 +0800 Muchun Song <songmuchun@bytedance.com> wrote: > > > > > > > FSDAX page refcounts are 1-based, rather than 0-based: if refcount is > > > > 1, then the page is freed. The FSDAX pages can be pinned through GUP, > > > > then they will be unpinned via unpin_user_page() using a folio variant > > > > to put the page, however, folio variants did not consider this special > > > > case, the result will be to miss a wakeup event (like the user of > > > > __fuse_dax_break_layouts()). Since FSDAX pages are only possible get > > > > by GUP users, so fix GUP instead of folio_put() to lower overhead. > > > > > > > > > > What are the user visible runtime effects of this bug? > > > > "missing wake up event" seems pretty obvious to me? Something goes to > > sleep waiting for a page to become unused, and is never woken. > > No, missed wakeups are often obscured by another wakeup coming in > shortly afterwards. > I need to clarify the task will never be woken up. > If this wakeup is not one of these, then are there reports from the > softlockup detector? > > Do we have reports of processes permanently stuck in D state? > No. The task is in an TASK_INTERRUPTIBLE state (see __fuse_dax_break_layouts). The hung task reporter only reports D task (TASK_UNINTERRUPTIBLE). Thanks. >
On Wed, 6 Jul 2022 10:47:32 +0800 Muchun Song <songmuchun@bytedance.com> wrote: > > If this wakeup is not one of these, then are there reports from the > > softlockup detector? > > > > Do we have reports of processes permanently stuck in D state? > > > > No. The task is in an TASK_INTERRUPTIBLE state (see __fuse_dax_break_layouts). > The hung task reporter only reports D task (TASK_UNINTERRUPTIBLE). Thanks, I updated the changelog a bit. : FSDAX page refcounts are 1-based, rather than 0-based: if refcount is : 1, then the page is freed. The FSDAX pages can be pinned through GUP, : then they will be unpinned via unpin_user_page() using a folio variant : to put the page, however, folio variants did not consider this special : case, the result will be to miss a wakeup event (like the user of : __fuse_dax_break_layouts()). This results in a task being permanently : stuck in TASK_INTERRUPTIBLE state. : : Since FSDAX pages are only possibly obtained by GUP users, so fix GUP : instead of folio_put() to lower overhead. I believe these details are helpful for -stable maintainers who are wondering why they were sent stuff. Also for maintainers of downstreeam older kernels who are scratching heads over some user bug report, trying to find a patch which might fix it - for this they want to see a description of the user-visible effects, for matching with that bug report.
On Tue, Jul 05, 2022 at 08:00:42PM -0700, Andrew Morton wrote: > On Wed, 6 Jul 2022 10:47:32 +0800 Muchun Song <songmuchun@bytedance.com> wrote: > > > > If this wakeup is not one of these, then are there reports from the > > > softlockup detector? > > > > > > Do we have reports of processes permanently stuck in D state? > > > > > > > No. The task is in an TASK_INTERRUPTIBLE state (see __fuse_dax_break_layouts). > > The hung task reporter only reports D task (TASK_UNINTERRUPTIBLE). > > Thanks, I updated the changelog a bit. > > : FSDAX page refcounts are 1-based, rather than 0-based: if refcount is > : 1, then the page is freed. The FSDAX pages can be pinned through GUP, > : then they will be unpinned via unpin_user_page() using a folio variant > : to put the page, however, folio variants did not consider this special > : case, the result will be to miss a wakeup event (like the user of > : __fuse_dax_break_layouts()). This results in a task being permanently > : stuck in TASK_INTERRUPTIBLE state. > : > : Since FSDAX pages are only possibly obtained by GUP users, so fix GUP > : instead of folio_put() to lower overhead. > > I believe these details are helpful for -stable maintainers who are > wondering why they were sent stuff. Also for maintainers of > downstreeam older kernels who are scratching heads over some user bug > report, trying to find a patch which might fix it - for this they want > to see a description of the user-visible effects, for matching with > that bug report. > Thanks Andrew, it's really helpful.
diff --git a/include/linux/mm.h b/include/linux/mm.h index 517f9deba56f..b324c9fa2940 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1157,23 +1157,27 @@ static inline bool is_zone_movable_page(const struct page *page) #if defined(CONFIG_ZONE_DEVICE) && defined(CONFIG_FS_DAX) DECLARE_STATIC_KEY_FALSE(devmap_managed_key); -bool __put_devmap_managed_page(struct page *page); -static inline bool put_devmap_managed_page(struct page *page) +bool __put_devmap_managed_page_refs(struct page *page, int refs); +static inline bool put_devmap_managed_page_refs(struct page *page, int refs) { if (!static_branch_unlikely(&devmap_managed_key)) return false; if (!is_zone_device_page(page)) return false; - return __put_devmap_managed_page(page); + return __put_devmap_managed_page_refs(page, refs); } - #else /* CONFIG_ZONE_DEVICE && CONFIG_FS_DAX */ -static inline bool put_devmap_managed_page(struct page *page) +static inline bool put_devmap_managed_page_refs(struct page *page, int refs) { return false; } #endif /* CONFIG_ZONE_DEVICE && CONFIG_FS_DAX */ +static inline bool put_devmap_managed_page(struct page *page) +{ + return put_devmap_managed_page_refs(page, 1); +} + /* 127: arbitrary random number, small enough to assemble well */ #define folio_ref_zero_or_close_to_overflow(folio) \ ((unsigned int) folio_ref_count(folio) + 127u <= 127u) diff --git a/mm/gup.c b/mm/gup.c index 4e1999402e71..965ba755023f 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -87,7 +87,8 @@ static inline struct folio *try_get_folio(struct page *page, int refs) * belongs to this folio. */ if (unlikely(page_folio(page) != folio)) { - folio_put_refs(folio, refs); + if (!put_devmap_managed_page_refs(&folio->page, refs)) + folio_put_refs(folio, refs); goto retry; } @@ -176,7 +177,8 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags) refs *= GUP_PIN_COUNTING_BIAS; } - folio_put_refs(folio, refs); + if (!put_devmap_managed_page_refs(&folio->page, refs)) + folio_put_refs(folio, refs); } /** diff --git a/mm/memremap.c b/mm/memremap.c index f0955785150f..58b20c3c300b 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -509,7 +509,7 @@ void free_zone_device_page(struct page *page) } #ifdef CONFIG_FS_DAX -bool __put_devmap_managed_page(struct page *page) +bool __put_devmap_managed_page_refs(struct page *page, int refs) { if (page->pgmap->type != MEMORY_DEVICE_FS_DAX) return false; @@ -519,9 +519,9 @@ bool __put_devmap_managed_page(struct page *page) * refcount is 1, then the page is free and the refcount is * stable because nobody holds a reference on the page. */ - if (page_ref_dec_return(page) == 1) + if (page_ref_sub_return(page, refs) == 1) wake_up_var(&page->_refcount); return true; } -EXPORT_SYMBOL(__put_devmap_managed_page); +EXPORT_SYMBOL(__put_devmap_managed_page_refs); #endif /* CONFIG_FS_DAX */
FSDAX page refcounts are 1-based, rather than 0-based: if refcount is 1, then the page is freed. The FSDAX pages can be pinned through GUP, then they will be unpinned via unpin_user_page() using a folio variant to put the page, however, folio variants did not consider this special case, the result will be to miss a wakeup event (like the user of __fuse_dax_break_layouts()). Since FSDAX pages are only possible get by GUP users, so fix GUP instead of folio_put() to lower overhead. Fixes: d8ddc099c6b3 ("mm/gup: Add gup_put_folio()") Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> --- v2: - Fix GUP instead of folio_put() suggested by Matthew. include/linux/mm.h | 14 +++++++++----- mm/gup.c | 6 ++++-- mm/memremap.c | 6 +++--- 3 files changed, 16 insertions(+), 10 deletions(-)