Message ID | 20211105204225.iIh99P9cn%akpm@linux-foundation.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [001/262] scripts/spelling.txt: add more spellings to spelling.txt | expand |
On Fri, Nov 05, 2021 at 01:42:25PM -0700, Andrew Morton wrote: > --- a/mm/filemap.c~mm-vmscan-throttle-reclaim-until-some-writeback-completes-if-congested > +++ a/mm/filemap.c > @@ -1612,6 +1612,7 @@ void end_page_writeback(struct page *pag > > smp_mb__after_atomic(); > wake_up_page(page, PG_writeback); > + acct_reclaim_writeback(page); > put_page(page); > } > EXPORT_SYMBOL(end_page_writeback); hmm? I think you based on some older version of Linus' tree that didn't have folios. This fixup patch was against an older fixup patch that you did, but maybe it's enough for Linus to apply ... diff --git a/mm/filemap.c b/mm/filemap.c index 6844c9816a86..daa0e23a6ee6 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1607,7 +1607,7 @@ void folio_end_writeback(struct folio *folio) smp_mb__after_atomic(); folio_wake(folio, PG_writeback); - acct_reclaim_writeback(folio_page(folio, 0)); + acct_reclaim_writeback(folio); folio_put(folio); } EXPORT_SYMBOL(folio_end_writeback); diff --git a/mm/internal.h b/mm/internal.h index 632c55c5a075..3b79a5c9427a 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -41,15 +41,15 @@ static inline void *folio_raw_mapping(struct folio *folio) return (void *)(mapping & ~PAGE_MAPPING_FLAGS); } -void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page, +void __acct_reclaim_writeback(pg_data_t *pgdat, struct folio *folio, int nr_throttled); -static inline void acct_reclaim_writeback(struct page *page) +static inline void acct_reclaim_writeback(struct folio *folio) { - pg_data_t *pgdat = page_pgdat(page); + pg_data_t *pgdat = folio_pgdat(folio); int nr_throttled = atomic_read(&pgdat->nr_writeback_throttled); if (nr_throttled) - __acct_reclaim_writeback(pgdat, page, nr_throttled); + __acct_reclaim_writeback(pgdat, folio, nr_throttled); } static inline void wake_throttle_isolated(pg_data_t *pgdat) diff --git a/mm/vmscan.c b/mm/vmscan.c index 59c07ee4220d..fb9584641ac7 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1085,12 +1085,12 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason) * pages to clean. If enough pages have been cleaned since throttling * started then wakeup the throttled tasks. */ -void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page, +void __acct_reclaim_writeback(pg_data_t *pgdat, struct folio *folio, int nr_throttled) { unsigned long nr_written; - inc_node_page_state(page, NR_THROTTLED_WRITTEN); + node_stat_add_folio(folio, NR_THROTTLED_WRITTEN); /* * This is an inaccurate read as the per-cpu deltas may not
On Fri, Nov 5, 2021 at 2:05 PM Matthew Wilcox <willy@infradead.org> wrote: > > hmm? I think you based on some older version of Linus' tree that didn't > have folios. Andrew these days actually maintains a base commit model exactly so that he doesn't end up rebasing during development. So the whole series is based on plain 5.15, and I'll take care of the conflict resolution. This workflow can result in more conflicts for me than what Andrew used to do ("send against current linus tip"), but it means that when conflicts happen, they get all the merge resolution help that git gives you, and hopefully what gets tested (over the months that it can be in -mm) is closer to what gets sent to me. Linus
On Sat, Nov 6, 2021 at 1:49 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > This workflow can result in more conflicts for me than what Andrew > used to do ("send against current linus tip"), but it means that when > conflicts happen, they get all the merge resolution help that git > gives you, and hopefully what gets tested (over the months that it can > be in -mm) is closer to what gets sent to me. .. and resolving the conflicts (none of which looked bad), I think that part of the resolution ends up doing very similar things to your fixup patch. So it looks all good. Famous last words. Linus
On 11/6/21 22:12, Linus Torvalds wrote: > On Sat, Nov 6, 2021 at 1:49 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> This workflow can result in more conflicts for me than what Andrew >> used to do ("send against current linus tip"), but it means that when >> conflicts happen, they get all the merge resolution help that git >> gives you, and hopefully what gets tested (over the months that it can >> be in -mm) is closer to what gets sent to me. > > .. and resolving the conflicts (none of which looked bad), I think > that part of the resolution ends up doing very similar things to your > fixup patch. If this needed resolution, didn't the resolution exist in -next already? > So it looks all good. > > Famous last words. > > Linus >
On Sat, 6 Nov 2021 22:13:34 +0100 Vlastimil Babka <vbabka@suse.cz> wrote: > On 11/6/21 22:12, Linus Torvalds wrote: > > On Sat, Nov 6, 2021 at 1:49 PM Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > >> > >> This workflow can result in more conflicts for me than what Andrew > >> used to do ("send against current linus tip"), but it means that when > >> conflicts happen, they get all the merge resolution help that git > >> gives you, and hopefully what gets tested (over the months that it can > >> be in -mm) is closer to what gets sent to me. > > > > .. and resolving the conflicts (none of which looked bad), I think > > that part of the resolution ends up doing very similar things to your > > fixup patch. > > If this needed resolution, didn't the resolution exist in -next already? Yes, but I had it queued after linux-next.patch so it got lost in the unholy mess that linux-next becomes during the merge window. I'm still figuring this out. In retrospect I should have moved this patch "mm/vmscan: throttle reclaim until some writeback completes if congested" to the post-linux-next section weeks ago, then waited for the prerequisites to be merged into mainline. That way the unaltered, tested patch would have smoothly slotted in late in the merge window.
On Sat, Nov 6, 2021 at 2:13 PM Vlastimil Babka <vbabka@suse.cz> wrote: > > If this needed resolution, didn't the resolution exist in -next already? Oh, I'm sure it was there in -next. But I just always do my own merge resolution anyway because I want to see what's going on. I don't look at other peoples resolutions, and I much prefer to actually look at the history itself in order to actually understand what the history and cause for the conflicts is (and what the proper resolution was). Of course, in many cases it's so trivial that there's not a lot to "understand", and most merge conflicts by far are not the kind that need a lot of thought. But just to clarify: I do actually like seeing people send their resolutions to me (possibly as an addendum to the pull request email, or possibly as a separate "resolved" branch). I don't use those to guide my resolution, but if there is any subtle issues at all, I will then compare the end results to verify that they agreed. Often any differences tend to be just whitespace or similar, but it can be interesting to see when there are meaningful semantic differences. Linus
On Sat, Nov 06, 2021 at 02:12:02PM -0700, Linus Torvalds wrote: > On Sat, Nov 6, 2021 at 1:49 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > This workflow can result in more conflicts for me than what Andrew > > used to do ("send against current linus tip"), but it means that when > > conflicts happen, they get all the merge resolution help that git > > gives you, and hopefully what gets tested (over the months that it can > > be in -mm) is closer to what gets sent to me. > > .. and resolving the conflicts (none of which looked bad), I think > that part of the resolution ends up doing very similar things to your > fixup patch. Reviewed what you did in the merge commit, looks good to me. And I've learned I need to run git log --cc instead of -p in order to see all changes to a file.
On Sat, Nov 6, 2021 at 3:46 PM Matthew Wilcox <willy@infradead.org> wrote: > > Reviewed what you did in the merge commit, looks good to me. And I've > learned I need to run git log --cc instead of -p in order to see all > changes to a file. Heh. If this is your first time using "--cc" (although it's the default for "git show", so you may have used it without being aware of it), it's very useful and powerful, but it's worth keeping in mind that it's also a lot more limited than the merge-time "git diff" output. At merge time, git has computed the shared state parenthood, and "git diff" knows about not only the current state, but also the state of both parents and the base state of the file (in a three-way merge kind of sense, although with recursive merges the "base state" may be much more complex than just a shared parent state). But "git log --cc" (and related "show commit" kind of things, like "git show" and friends) only sees the final result and the parent information. The full common parent and base state isn't there after-the-fact. That means that "git log --cc" doesn't have quite as much information to go by, and the "--cc" output can sometimes be a bit misleading. In particular, if there was a conflict, and the resolution ended up basically being "take one side where the conflict was", then "git log --cc" will not show the conflict resolution as a conflict at all - it will just think "ok, development was done on that branch, the other side was irrelevant". So "--cc" is very useful, and often shows that interesting sub-part of the merge where there were conflicts. But it's definitely somewhat limited, and can end up looking like there was no conflict at all even when there was something. Linus
--- a/include/linux/backing-dev.h~mm-vmscan-throttle-reclaim-until-some-writeback-completes-if-congested +++ a/include/linux/backing-dev.h @@ -154,7 +154,6 @@ static inline int wb_congested(struct bd } long congestion_wait(int sync, long timeout); -long wait_iff_congested(int sync, long timeout); static inline bool mapping_can_writeback(struct address_space *mapping) { --- a/include/linux/mmzone.h~mm-vmscan-throttle-reclaim-until-some-writeback-completes-if-congested +++ a/include/linux/mmzone.h @@ -199,6 +199,7 @@ enum node_stat_item { NR_VMSCAN_IMMEDIATE, /* Prioritise for reclaim when writeback ends */ NR_DIRTIED, /* page dirtyings since bootup */ NR_WRITTEN, /* page writings since bootup */ + NR_THROTTLED_WRITTEN, /* NR_WRITTEN while reclaim throttled */ NR_KERNEL_MISC_RECLAIMABLE, /* reclaimable non-slab kernel pages */ NR_FOLL_PIN_ACQUIRED, /* via: pin_user_page(), gup flag: FOLL_PIN */ NR_FOLL_PIN_RELEASED, /* pages returned via unpin_user_page() */ @@ -272,6 +273,11 @@ enum lru_list { NR_LRU_LISTS }; +enum vmscan_throttle_state { + VMSCAN_THROTTLE_WRITEBACK, + NR_VMSCAN_THROTTLE, +}; + #define for_each_lru(lru) for (lru = 0; lru < NR_LRU_LISTS; lru++) #define for_each_evictable_lru(lru) for (lru = 0; lru <= LRU_ACTIVE_FILE; lru++) @@ -841,6 +847,13 @@ typedef struct pglist_data { int node_id; wait_queue_head_t kswapd_wait; wait_queue_head_t pfmemalloc_wait; + + /* workqueues for throttling reclaim for different reasons. */ + wait_queue_head_t reclaim_wait[NR_VMSCAN_THROTTLE]; + + atomic_t nr_writeback_throttled;/* nr of writeback-throttled tasks */ + unsigned long nr_reclaim_start; /* nr pages written while throttled + * when throttling started. */ struct task_struct *kswapd; /* Protected by mem_hotplug_begin/end() */ int kswapd_order; --- a/include/trace/events/vmscan.h~mm-vmscan-throttle-reclaim-until-some-writeback-completes-if-congested +++ a/include/trace/events/vmscan.h @@ -27,6 +27,14 @@ {RECLAIM_WB_ASYNC, "RECLAIM_WB_ASYNC"} \ ) : "RECLAIM_WB_NONE" +#define _VMSCAN_THROTTLE_WRITEBACK (1 << VMSCAN_THROTTLE_WRITEBACK) + +#define show_throttle_flags(flags) \ + (flags) ? __print_flags(flags, "|", \ + {_VMSCAN_THROTTLE_WRITEBACK, "VMSCAN_THROTTLE_WRITEBACK"} \ + ) : "VMSCAN_THROTTLE_NONE" + + #define trace_reclaim_flags(file) ( \ (file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \ (RECLAIM_WB_ASYNC) \ @@ -454,6 +462,32 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_en TP_ARGS(nr_reclaimed) ); +TRACE_EVENT(mm_vmscan_throttled, + + TP_PROTO(int nid, int usec_timeout, int usec_delayed, int reason), + + TP_ARGS(nid, usec_timeout, usec_delayed, reason), + + TP_STRUCT__entry( + __field(int, nid) + __field(int, usec_timeout) + __field(int, usec_delayed) + __field(int, reason) + ), + + TP_fast_assign( + __entry->nid = nid; + __entry->usec_timeout = usec_timeout; + __entry->usec_delayed = usec_delayed; + __entry->reason = 1U << reason; + ), + + TP_printk("nid=%d usec_timeout=%d usect_delayed=%d reason=%s", + __entry->nid, + __entry->usec_timeout, + __entry->usec_delayed, + show_throttle_flags(__entry->reason)) +); #endif /* _TRACE_VMSCAN_H */ /* This part must be outside protection */ --- a/include/trace/events/writeback.h~mm-vmscan-throttle-reclaim-until-some-writeback-completes-if-congested +++ a/include/trace/events/writeback.h @@ -763,13 +763,6 @@ DEFINE_EVENT(writeback_congest_waited_te TP_ARGS(usec_timeout, usec_delayed) ); -DEFINE_EVENT(writeback_congest_waited_template, writeback_wait_iff_congested, - - TP_PROTO(unsigned int usec_timeout, unsigned int usec_delayed), - - TP_ARGS(usec_timeout, usec_delayed) -); - DECLARE_EVENT_CLASS(writeback_single_inode_template, TP_PROTO(struct inode *inode, --- a/mm/backing-dev.c~mm-vmscan-throttle-reclaim-until-some-writeback-completes-if-congested +++ a/mm/backing-dev.c @@ -1038,51 +1038,3 @@ long congestion_wait(int sync, long time return ret; } EXPORT_SYMBOL(congestion_wait); - -/** - * wait_iff_congested - Conditionally wait for a backing_dev to become uncongested or a pgdat to complete writes - * @sync: SYNC or ASYNC IO - * @timeout: timeout in jiffies - * - * In the event of a congested backing_dev (any backing_dev) this waits - * for up to @timeout jiffies for either a BDI to exit congestion of the - * given @sync queue or a write to complete. - * - * The return value is 0 if the sleep is for the full timeout. Otherwise, - * it is the number of jiffies that were still remaining when the function - * returned. return_value == timeout implies the function did not sleep. - */ -long wait_iff_congested(int sync, long timeout) -{ - long ret; - unsigned long start = jiffies; - DEFINE_WAIT(wait); - wait_queue_head_t *wqh = &congestion_wqh[sync]; - - /* - * If there is no congestion, yield if necessary instead - * of sleeping on the congestion queue - */ - if (atomic_read(&nr_wb_congested[sync]) == 0) { - cond_resched(); - - /* In case we scheduled, work out time remaining */ - ret = timeout - (jiffies - start); - if (ret < 0) - ret = 0; - - goto out; - } - - /* Sleep until uncongested or a write happens */ - prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE); - ret = io_schedule_timeout(timeout); - finish_wait(wqh, &wait); - -out: - trace_writeback_wait_iff_congested(jiffies_to_usecs(timeout), - jiffies_to_usecs(jiffies - start)); - - return ret; -} -EXPORT_SYMBOL(wait_iff_congested); --- a/mm/filemap.c~mm-vmscan-throttle-reclaim-until-some-writeback-completes-if-congested +++ a/mm/filemap.c @@ -1612,6 +1612,7 @@ void end_page_writeback(struct page *pag smp_mb__after_atomic(); wake_up_page(page, PG_writeback); + acct_reclaim_writeback(page); put_page(page); } EXPORT_SYMBOL(end_page_writeback); --- a/mm/internal.h~mm-vmscan-throttle-reclaim-until-some-writeback-completes-if-congested +++ a/mm/internal.h @@ -34,6 +34,17 @@ void page_writeback_init(void); +void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page, + int nr_throttled); +static inline void acct_reclaim_writeback(struct page *page) +{ + pg_data_t *pgdat = page_pgdat(page); + int nr_throttled = atomic_read(&pgdat->nr_writeback_throttled); + + if (nr_throttled) + __acct_reclaim_writeback(pgdat, page, nr_throttled); +} + vm_fault_t do_swap_page(struct vm_fault *vmf); void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma, --- a/mm/page_alloc.c~mm-vmscan-throttle-reclaim-until-some-writeback-completes-if-congested +++ a/mm/page_alloc.c @@ -7408,6 +7408,8 @@ static void pgdat_init_kcompactd(struct static void __meminit pgdat_init_internals(struct pglist_data *pgdat) { + int i; + pgdat_resize_init(pgdat); pgdat_init_split_queue(pgdat); @@ -7416,6 +7418,9 @@ static void __meminit pgdat_init_interna init_waitqueue_head(&pgdat->kswapd_wait); init_waitqueue_head(&pgdat->pfmemalloc_wait); + for (i = 0; i < NR_VMSCAN_THROTTLE; i++) + init_waitqueue_head(&pgdat->reclaim_wait[i]); + pgdat_page_ext_init(pgdat); lruvec_init(&pgdat->__lruvec); } --- a/mm/vmscan.c~mm-vmscan-throttle-reclaim-until-some-writeback-completes-if-congested +++ a/mm/vmscan.c @@ -1006,6 +1006,64 @@ static void handle_write_error(struct ad unlock_page(page); } +static void +reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason, + long timeout) +{ + wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason]; + long ret; + DEFINE_WAIT(wait); + + /* + * Do not throttle IO workers, kthreads other than kswapd or + * workqueues. They may be required for reclaim to make + * forward progress (e.g. journalling workqueues or kthreads). + */ + if (!current_is_kswapd() && + current->flags & (PF_IO_WORKER|PF_KTHREAD)) + return; + + if (atomic_inc_return(&pgdat->nr_writeback_throttled) == 1) { + WRITE_ONCE(pgdat->nr_reclaim_start, + node_page_state(pgdat, NR_THROTTLED_WRITTEN)); + } + + prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE); + ret = schedule_timeout(timeout); + finish_wait(wqh, &wait); + atomic_dec(&pgdat->nr_writeback_throttled); + + trace_mm_vmscan_throttled(pgdat->node_id, jiffies_to_usecs(timeout), + jiffies_to_usecs(timeout - ret), + reason); +} + +/* + * Account for pages written if tasks are throttled waiting on dirty + * pages to clean. If enough pages have been cleaned since throttling + * started then wakeup the throttled tasks. + */ +void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page, + int nr_throttled) +{ + unsigned long nr_written; + + inc_node_page_state(page, NR_THROTTLED_WRITTEN); + + /* + * This is an inaccurate read as the per-cpu deltas may not + * be synchronised. However, given that the system is + * writeback throttled, it is not worth taking the penalty + * of getting an accurate count. At worst, the throttle + * timeout guarantees forward progress. + */ + nr_written = node_page_state(pgdat, NR_THROTTLED_WRITTEN) - + READ_ONCE(pgdat->nr_reclaim_start); + + if (nr_written > SWAP_CLUSTER_MAX * nr_throttled) + wake_up(&pgdat->reclaim_wait[VMSCAN_THROTTLE_WRITEBACK]); +} + /* possible outcome of pageout() */ typedef enum { /* failed to write page out, page is locked */ @@ -1411,9 +1469,8 @@ retry: /* * The number of dirty pages determines if a node is marked - * reclaim_congested which affects wait_iff_congested. kswapd - * will stall and start writing pages if the tail of the LRU - * is all dirty unqueued pages. + * reclaim_congested. kswapd will stall and start writing + * pages if the tail of the LRU is all dirty unqueued pages. */ page_check_dirty_writeback(page, &dirty, &writeback); if (dirty || writeback) @@ -3179,19 +3236,19 @@ again: * If kswapd scans pages marked for immediate * reclaim and under writeback (nr_immediate), it * implies that pages are cycling through the LRU - * faster than they are written so also forcibly stall. + * faster than they are written so forcibly stall + * until some pages complete writeback. */ if (sc->nr.immediate) - congestion_wait(BLK_RW_ASYNC, HZ/10); + reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK, HZ/10); } /* - * Tag a node/memcg as congested if all the dirty pages - * scanned were backed by a congested BDI and - * wait_iff_congested will stall. + * Tag a node/memcg as congested if all the dirty pages were marked + * for writeback and immediate reclaim (counted in nr.congested). * * Legacy memcg will stall in page writeback so avoid forcibly - * stalling in wait_iff_congested(). + * stalling in reclaim_throttle(). */ if ((current_is_kswapd() || (cgroup_reclaim(sc) && writeback_throttling_sane(sc))) && @@ -3199,15 +3256,15 @@ again: set_bit(LRUVEC_CONGESTED, &target_lruvec->flags); /* - * Stall direct reclaim for IO completions if underlying BDIs - * and node is congested. Allow kswapd to continue until it + * Stall direct reclaim for IO completions if the lruvec is + * node is congested. Allow kswapd to continue until it * starts encountering unqueued dirty pages or cycling through * the LRU too quickly. */ if (!current_is_kswapd() && current_may_throttle() && !sc->hibernation_mode && test_bit(LRUVEC_CONGESTED, &target_lruvec->flags)) - wait_iff_congested(BLK_RW_ASYNC, HZ/10); + reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK, HZ/10); if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed, sc)) @@ -4285,6 +4342,7 @@ static int kswapd(void *p) WRITE_ONCE(pgdat->kswapd_order, 0); WRITE_ONCE(pgdat->kswapd_highest_zoneidx, MAX_NR_ZONES); + atomic_set(&pgdat->nr_writeback_throttled, 0); for ( ; ; ) { bool ret; --- a/mm/vmstat.c~mm-vmscan-throttle-reclaim-until-some-writeback-completes-if-congested +++ a/mm/vmstat.c @@ -1225,6 +1225,7 @@ const char * const vmstat_text[] = { "nr_vmscan_immediate_reclaim", "nr_dirtied", "nr_written", + "nr_throttled_written", "nr_kernel_misc_reclaimable", "nr_foll_pin_acquired", "nr_foll_pin_released",