Message ID | 20200519201912.1564477-5-bigeasy@linutronix.de (mailing list archive)
---|---
State | New, archived
On Tue, 19 May 2020 22:19:08 +0200 Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> From: Ingo Molnar <mingo@kernel.org>
>
> The various struct pagevec per CPU variables are protected by disabling
> either preemption or interrupts across the critical sections. Inside
> these sections spinlocks have to be acquired.
>
> These spinlocks are regular spinlock_t types which are converted to
> "sleeping" spinlocks on PREEMPT_RT enabled kernels. Obviously sleeping
> locks cannot be acquired in preemption or interrupt disabled sections.
>
> local locks provide a trivial way to substitute preempt and interrupt
> disable instances. On a non PREEMPT_RT enabled kernel local_lock() maps
> to preempt_disable() and local_lock_irq() to local_irq_disable().
>
> Add swapvec_lock to protect the per-CPU lru_add_pvec and
> lru_lazyfree_pvecs variables and rotate_lock to protect the per-CPU
> lru_rotate_pvecs variable.
>
> Change the relevant call sites to acquire these locks instead of using
> preempt_disable() / get_cpu() / get_cpu_var() and local_irq_disable() /
> local_irq_save().
>
> There is neither a functional change nor a change in the generated
> binary code for non PREEMPT_RT enabled non-debug kernels.
>
> When lockdep is enabled local locks have lockdep maps embedded. These
> allow lockdep to validate the protections, i.e. inappropriate usage of a
> preemption only protected section would result in a lockdep warning
> while the same problem would not be noticed with a plain
> preempt_disable() based protection.
>
> local locks also improve readability as they provide a named scope for
> the protections, while preempt/interrupt disable are opaque and scopeless.
>
> Finally local locks allow PREEMPT_RT to substitute them with real
> locking primitives to ensure the correctness of operation in a fully
> preemptible kernel.
> No functional change.
>
> ...
>
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -12,6 +12,7 @@
>  #include <linux/fs.h>
>  #include <linux/atomic.h>
>  #include <linux/page-flags.h>
> +#include <linux/locallock.h>

Could we please make these local_lock.h and local_lock_internal.h?  Making
the filenames different from everything else is just irritating!

> +	local_lock(swapvec_lock);

It's quite peculiar that these operations appear to be pass-by-value.
All other locking operations are pass-by-reference - spin_lock(&lock),
not spin_lock(lock).  This is what the eye expects to see and it's
simply more logical - calling code shouldn't have to "know" that the
locking operations are implemented as cpp macros.  And we'd be in a
mess if someone tried to convert these to real C functions.

Which prompts the question: why were all these operations implemented in
the preprocessor anyway?  afaict they could have been written in C.
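As a minimal sketch of the two points above (not the actual locallock.h code: the macro bodies only restate the changelog's claim that, on a non PREEMPT_RT kernel, local_lock() maps to preempt_disable() and local_lock_irq() to local_irq_disable(); the by-reference helper, its name and its type are hypothetical):

/*
 * Illustrative only, lockdep wiring omitted. A pass-by-value, macro-based
 * API can simply discard the lock name on !PREEMPT_RT:
 */
#define local_lock(lvar)		preempt_disable()
#define local_unlock(lvar)		preempt_enable()
#define local_lock_irq(lvar)		local_irq_disable()
#define local_unlock_irq(lvar)		local_irq_enable()

/*
 * A pass-by-reference spelling, matching spin_lock(&lock), would take a
 * pointer to a real object and could therefore also be implemented as a
 * plain C function (hypothetical name):
 */
static inline void local_lock_byref(struct local_lock *l)
{
	preempt_disable();	/* on PREEMPT_RT: acquire the per-CPU lock behind *l */
}

That is essentially the shape the objection argues for: the lock becomes a real object whose address is passed around, rather than a token that only the preprocessor ever sees.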
On Tue, May 19, 2020 at 04:58:37PM -0700, Andrew Morton wrote:
> On Tue, 19 May 2020 22:19:08 +0200 Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > +	local_lock(swapvec_lock);
>
> It's quite peculiar that these operations appear to be pass-by-value.
> All other locking operations are pass-by-reference - spin_lock(&lock),
> not spin_lock(lock).  This is what the eye expects to see and it's
> simply more logical - calling code shouldn't have to "know" that the
> locking operations are implemented as cpp macros.  And we'd be in a
> mess if someone tried to convert these to real C functions.

The funny thing is that the documentation gets this right:

+The mapping of local_lock to spinlock_t on PREEMPT_RT kernels has a few
+implications. For example, on a non-PREEMPT_RT kernel the following code
+sequence works as expected::
+
+  local_lock_irq(&local_lock);
+  raw_spin_lock(&lock);

but apparently the implementation changed without the documentation
matching.
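Spelled out, the documented sequence is pass-by-reference and nests a raw spinlock inside the local lock; a sketch of the full pairing, with comments reflecting the mapping the documentation describes (the lock names are just the placeholders from the excerpt above):

	local_lock_irq(&local_lock);	/* !PREEMPT_RT: local_irq_disable();
					 * PREEMPT_RT: acquire a per-CPU spinlock_t */
	raw_spin_lock(&lock);		/* a raw_spinlock_t really spins on both configs */

	/* ... critical section ... */

	raw_spin_unlock(&lock);
	local_unlock_irq(&local_lock);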
On Tue, May 19, 2020 at 10:19:08PM +0200, Sebastian Andrzej Siewior wrote:
> diff --git a/mm/swap.c b/mm/swap.c
> index bf9a79fed62d7..03c97d15fcd69 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -44,8 +44,14 @@
>  /* How many pages do we try to swap or page in/out together? */
>  int page_cluster;
>  
> -static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
> +
> +/* Protecting lru_rotate_pvecs */
> +static DEFINE_LOCAL_LOCK(rotate_lock);
>  static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
> +
> +/* Protecting the following struct pagevec */
> +DEFINE_LOCAL_LOCK(swapvec_lock);
> +static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
>  static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
>  static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
>  static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs);

So personally I'd prefer to have this look like:

struct lru_vecs {
	struct local_lock lock;
	struct pagevec add;
	struct pagevec rotate;
	struct pagevec deact_file;
	struct pagevec deact;
	struct pagevec lazyfree;
#ifdef CONFIG_SMP
	struct pagevec active;
#endif
};

DEFINE_PER_CPU(struct lru_pvec, lru_pvec);

or something, but I realize that is a lot of churn (although highly
automated), so I'll leave that to the mm folks.
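For illustration, a rough sketch of what one converted call site could look like under that layout, reusing __lru_cache_add() from the patch. The struct and variable names are taken from the suggestion above (whose DEFINE_PER_CPU line spells the type lru_pvec), and the by-reference local_lock() spelling plus the body are assumptions, not code from this series:

static void __lru_cache_add(struct page *page)
{
	struct lru_vecs *lru;

	local_lock(&lru_pvec.lock);		/* hypothetical by-reference API */
	lru = this_cpu_ptr(&lru_pvec);
	get_page(page);
	if (!pagevec_add(&lru->add, page) || PageCompound(page))
		__pagevec_lru_add(&lru->add);
	local_unlock(&lru_pvec.lock);
}

Keeping the lock in the same per-CPU structure as the pagevecs it protects is the main readability argument for the churn.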
diff --git a/include/linux/swap.h b/include/linux/swap.h
index e1bbf7a16b276..540b52c71bc95 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -12,6 +12,7 @@
 #include <linux/fs.h>
 #include <linux/atomic.h>
 #include <linux/page-flags.h>
+#include <linux/locallock.h>
 #include <asm/page.h>
 
 struct notifier_block;
@@ -328,6 +329,7 @@ extern unsigned long nr_free_pagecache_pages(void);
 
 /* linux/mm/swap.c */
+DECLARE_LOCAL_LOCK(swapvec_lock);
 extern void lru_cache_add(struct page *);
 extern void lru_cache_add_anon(struct page *page);
 extern void lru_cache_add_file(struct page *page);
diff --git a/mm/compaction.c b/mm/compaction.c
index 46f0fcc93081e..77972c8d4dead 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2243,15 +2243,14 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 	 * would succeed.
 	 */
 	if (cc->order > 0 && last_migrated_pfn) {
-		int cpu;
 		unsigned long current_block_start =
 			block_start_pfn(cc->migrate_pfn, cc->order);
 
 		if (last_migrated_pfn < current_block_start) {
-			cpu = get_cpu();
-			lru_add_drain_cpu(cpu);
+			local_lock(swapvec_lock);
+			lru_add_drain_cpu(smp_processor_id());
 			drain_local_pages(cc->zone);
-			put_cpu();
+			local_unlock(swapvec_lock);
 			/* No more flushing until we migrate again */
 			last_migrated_pfn = 0;
 		}
diff --git a/mm/swap.c b/mm/swap.c
index bf9a79fed62d7..03c97d15fcd69 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -44,8 +44,14 @@
 /* How many pages do we try to swap or page in/out together? */
 int page_cluster;
 
-static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
+
+/* Protecting lru_rotate_pvecs */
+static DEFINE_LOCAL_LOCK(rotate_lock);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
+
+/* Protecting the following struct pagevec */
+DEFINE_LOCAL_LOCK(swapvec_lock);
+static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs);
@@ -254,11 +260,11 @@ void rotate_reclaimable_page(struct page *page)
 		unsigned long flags;
 
 		get_page(page);
-		local_irq_save(flags);
+		local_lock_irqsave(rotate_lock, flags);
 		pvec = this_cpu_ptr(&lru_rotate_pvecs);
 		if (!pagevec_add(pvec, page) || PageCompound(page))
 			pagevec_move_tail(pvec);
-		local_irq_restore(flags);
+		local_unlock_irqrestore(rotate_lock, flags);
 	}
 }
 
@@ -308,12 +314,14 @@ void activate_page(struct page *page)
 {
 	page = compound_head(page);
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
-		struct pagevec *pvec = &get_cpu_var(activate_page_pvecs);
+		struct pagevec *pvec;
 
+		local_lock(swapvec_lock);
+		pvec = this_cpu_ptr(&activate_page_pvecs);
 		get_page(page);
 		if (!pagevec_add(pvec, page) || PageCompound(page))
 			pagevec_lru_move_fn(pvec, __activate_page, NULL);
-		put_cpu_var(activate_page_pvecs);
+		local_unlock(swapvec_lock);
 	}
 }
 
@@ -335,9 +343,12 @@ void activate_page(struct page *page)
 
 static void __lru_cache_activate_page(struct page *page)
 {
-	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
+	struct pagevec *pvec;
 	int i;
 
+	local_lock(swapvec_lock);
+	pvec = this_cpu_ptr(&lru_add_pvec);
+
 	/*
 	 * Search backwards on the optimistic assumption that the page being
 	 * activated has just been added to this pagevec. Note that only
@@ -357,7 +368,7 @@ static void __lru_cache_activate_page(struct page *page)
 		}
 	}
 
-	put_cpu_var(lru_add_pvec);
+	local_unlock(swapvec_lock);
 }
 
 /*
@@ -404,12 +415,14 @@ EXPORT_SYMBOL(mark_page_accessed);
 
 static void __lru_cache_add(struct page *page)
 {
-	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
+	struct pagevec *pvec;
 
+	local_lock(swapvec_lock);
+	pvec = this_cpu_ptr(&lru_add_pvec);
 	get_page(page);
 	if (!pagevec_add(pvec, page) || PageCompound(page))
 		__pagevec_lru_add(pvec);
-	put_cpu_var(lru_add_pvec);
+	local_unlock(swapvec_lock);
 }
 
 /**
@@ -603,9 +616,9 @@ void lru_add_drain_cpu(int cpu)
 		unsigned long flags;
 
 		/* No harm done if a racing interrupt already did this */
-		local_irq_save(flags);
+		local_lock_irqsave(rotate_lock, flags);
 		pagevec_move_tail(pvec);
-		local_irq_restore(flags);
+		local_unlock_irqrestore(rotate_lock, flags);
 	}
 
 	pvec = &per_cpu(lru_deactivate_file_pvecs, cpu);
@@ -641,11 +654,14 @@ void deactivate_file_page(struct page *page)
 		return;
 
 	if (likely(get_page_unless_zero(page))) {
-		struct pagevec *pvec = &get_cpu_var(lru_deactivate_file_pvecs);
+		struct pagevec *pvec;
+
+		local_lock(swapvec_lock);
+		pvec = this_cpu_ptr(&lru_deactivate_file_pvecs);
 
 		if (!pagevec_add(pvec, page) || PageCompound(page))
 			pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
-		put_cpu_var(lru_deactivate_file_pvecs);
+		local_unlock(swapvec_lock);
 	}
 }
 
@@ -660,12 +676,14 @@ void deactivate_file_page(struct page *page)
 void deactivate_page(struct page *page)
 {
 	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
-		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+		struct pagevec *pvec;
 
+		local_lock(swapvec_lock);
+		pvec = this_cpu_ptr(&lru_deactivate_pvecs);
 		get_page(page);
 		if (!pagevec_add(pvec, page) || PageCompound(page))
 			pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
-		put_cpu_var(lru_deactivate_pvecs);
+		local_unlock(swapvec_lock);
 	}
 }
 
@@ -680,19 +698,22 @@ void mark_page_lazyfree(struct page *page)
 {
 	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
 	    !PageSwapCache(page) && !PageUnevictable(page)) {
-		struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);
+		struct pagevec *pvec;
 
+		local_lock(swapvec_lock);
+		pvec = this_cpu_ptr(&lru_lazyfree_pvecs);
 		get_page(page);
 		if (!pagevec_add(pvec, page) || PageCompound(page))
 			pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
-		put_cpu_var(lru_lazyfree_pvecs);
+		local_unlock(swapvec_lock);
 	}
 }
 
 void lru_add_drain(void)
 {
-	lru_add_drain_cpu(get_cpu());
-	put_cpu();
+	local_lock(swapvec_lock);
+	lru_add_drain_cpu(smp_processor_id());
+	local_unlock(swapvec_lock);
 }
 
 #ifdef CONFIG_SMP
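Distilled, the conversion pattern applied at each call site in the diff above is the following (a sketch using swapvec_lock and lru_add_pvec; the interrupt-disabling sites follow the same shape with local_lock_irqsave(rotate_lock, flags) / local_unlock_irqrestore(rotate_lock, flags) replacing local_irq_save()/local_irq_restore()):

	/* before: protection implied by the per-CPU accessor */
	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);	/* disables preemption */
	/* ... operate on pvec ... */
	put_cpu_var(lru_add_pvec);

	/* after: the same protection on !PREEMPT_RT, but with a named scope */
	struct pagevec *pvec;

	local_lock(swapvec_lock);		/* !PREEMPT_RT: preempt_disable() */
	pvec = this_cpu_ptr(&lru_add_pvec);
	/* ... operate on pvec ... */
	local_unlock(swapvec_lock);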