From: "Srivatsa S. Bhat"
Subject: [PATCH v6 04/46] percpu_rwlock: Implement the core design of
 Per-CPU Reader-Writer Locks
Date: Mon, 18 Feb 2013 18:08:56 +0530
To: tglx@linutronix.de, peterz@infradead.org, tj@kernel.org,
 oleg@redhat.com, paulmck@linux.vnet.ibm.com, rusty@rustcorp.com.au,
 mingo@kernel.org, akpm@linux-foundation.org, namhyung@kernel.org
Cc: linux-arch@vger.kernel.org, linux@arm.linux.org.uk,
 nikunj@linux.vnet.ibm.com, linux-pm@vger.kernel.org, fweisbec@gmail.com,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 rostedt@goodmis.org, xiaoguangrong@linux.vnet.ibm.com, rjw@sisk.pl,
 sbw@mit.edu, wangyun@linux.vnet.ibm.com, srivatsa.bhat@linux.vnet.ibm.com,
 netdev@vger.kernel.org, vincent.guittot@linaro.org, walken@google.com,
 linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org
Message-ID: <20130218123856.26245.46705.stgit@srivatsabhat.in.ibm.com>
In-Reply-To: <20130218123714.26245.61816.stgit@srivatsabhat.in.ibm.com>
References: <20130218123714.26245.61816.stgit@srivatsabhat.in.ibm.com>
X-Patchwork-Submitter: "Srivatsa S. Bhat"
X-Patchwork-Id: 2157721

Using global rwlocks as the backend for per-CPU rwlocks helps us avoid many
lock-ordering related problems (unlike per-cpu locks). However, global
rwlocks lead to unnecessary cache-line bouncing even when there are no
writers present, which can slow down the system needlessly.

Per-cpu counters can help solve the cache-line bouncing problem. So we
actually use the best of both: per-cpu counters (no-waiting) at the reader
side in the fast-path, and global rwlocks in the slowpath.

[ Fastpath = no writer is active; Slowpath = a writer is active ]

IOW, the readers just increment/decrement their per-cpu refcounts (disabling
interrupts during the updates, if necessary) when no writer is active.
When a writer becomes active, he signals all readers to switch to global
rwlocks for the duration of his activity.
The readers switch over when it is safe for them (i.e., when they are about
to start a fresh, non-nested read-side critical section) and start using
(holding) the global rwlock for read in their subsequent critical sections.

The writer waits for every existing reader to switch, and then acquires the
global rwlock for write and enters his critical section. Later, the writer
signals all readers that he is done, and that they can go back to using their
per-cpu refcounts again.

Note that the lock-safety (despite the per-cpu scheme) comes from the fact
that the readers can *choose* _when_ to switch to rwlocks upon the writer's
signal. And the readers don't wait on anybody based on the per-cpu counters.
The only true synchronization that involves waiting at the reader-side in
this scheme is the one arising from the global rwlock, which is safe from
circular locking dependency issues.

Reader-writer locks and per-cpu counters are recursive, so they can be
used in a nested fashion in the reader-path, which makes per-CPU rwlocks also
recursive. Also, this design of switching the synchronization scheme ensures
that you can safely nest and use these locks in a very flexible manner.

I'm indebted to Michael Wang and Xiao Guangrong for their numerous thoughtful
suggestions and ideas, which inspired and influenced many of the decisions in
this as well as previous designs. Thanks a lot Michael and Xiao!

Cc: David Howells
Signed-off-by: Srivatsa S. Bhat
---

 lib/percpu-rwlock.c |  139 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 137 insertions(+), 2 deletions(-)

diff --git a/lib/percpu-rwlock.c b/lib/percpu-rwlock.c
index f938096..edefdea 100644
--- a/lib/percpu-rwlock.c
+++ b/lib/percpu-rwlock.c
@@ -27,6 +27,24 @@
 #include
 #include
+#include
+
+
+#define reader_yet_to_switch(pcpu_rwlock, cpu)				    \
+	(ACCESS_ONCE(per_cpu_ptr((pcpu_rwlock)->rw_state, cpu)->reader_refcnt))
+
+#define reader_percpu_nesting_depth(pcpu_rwlock)		  \
+	(__this_cpu_read((pcpu_rwlock)->rw_state->reader_refcnt))
+
+#define reader_uses_percpu_refcnt(pcpu_rwlock)				\
+				reader_percpu_nesting_depth(pcpu_rwlock)
+
+#define reader_nested_percpu(pcpu_rwlock)				\
+			(reader_percpu_nesting_depth(pcpu_rwlock) > 1)
+
+#define writer_active(pcpu_rwlock)					\
+	(__this_cpu_read((pcpu_rwlock)->rw_state->writer_signal))
+
 int __percpu_init_rwlock(struct percpu_rwlock *pcpu_rwlock,
 			 const char *name, struct lock_class_key *rwlock_key)
@@ -55,21 +73,138 @@ void percpu_free_rwlock(struct percpu_rwlock *pcpu_rwlock)

 void percpu_read_lock(struct percpu_rwlock *pcpu_rwlock)
 {
-	read_lock(&pcpu_rwlock->global_rwlock);
+	preempt_disable();
+
+	/*
+	 * Let the writer know that a reader is active, even before we choose
+	 * our reader-side synchronization scheme.
+	 */
+	this_cpu_inc(pcpu_rwlock->rw_state->reader_refcnt);
+
+	/*
+	 * If we are already using per-cpu refcounts, it is not safe to switch
+	 * the synchronization scheme. So continue using the refcounts.
+	 */
+	if (reader_nested_percpu(pcpu_rwlock))
+		return;
+
+	/*
+	 * The write to 'reader_refcnt' must be visible before we read
+	 * 'writer_signal'.
+	 */
+	smp_mb();
+
+	if (likely(!writer_active(pcpu_rwlock))) {
+		goto out;
+	} else {
+		/* Writer is active, so switch to global rwlock. */
+		read_lock(&pcpu_rwlock->global_rwlock);
+
+		/*
+		 * We might have raced with a writer going inactive before we
+		 * took the read-lock. So re-evaluate whether we still need to
+		 * hold the rwlock or if we can switch back to per-cpu
+		 * refcounts. (This also helps avoid heterogeneous nesting of
+		 * readers).
+		 */
+		if (writer_active(pcpu_rwlock)) {
+			/*
+			 * The above writer_active() check can get reordered
+			 * with this_cpu_dec() below, but this is OK, because
+			 * holding the rwlock is conservative.
+			 */
+			this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
+		} else {
+			read_unlock(&pcpu_rwlock->global_rwlock);
+		}
+	}
+
+out:
+	/* Prevent reordering of any subsequent reads/writes */
+	smp_mb();
 }

 void percpu_read_unlock(struct percpu_rwlock *pcpu_rwlock)
 {
-	read_unlock(&pcpu_rwlock->global_rwlock);
+	/*
+	 * We never allow heterogeneous nesting of readers. So it is trivial
+	 * to find out the kind of reader we are, and undo the operation
+	 * done by our corresponding percpu_read_lock().
+	 */
+
+	/* Try to fast-path: a nested percpu reader is the simplest case */
+	if (reader_nested_percpu(pcpu_rwlock)) {
+		this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
+		preempt_enable();
+		return;
+	}
+
+	/*
+	 * Now we are left with only 2 options: a non-nested percpu reader,
+	 * or a reader holding rwlock
+	 */
+	if (reader_uses_percpu_refcnt(pcpu_rwlock)) {
+		/*
+		 * Complete the critical section before decrementing the
+		 * refcnt. We can optimize this away if we are a nested
+		 * reader (the case above).
+		 */
+		smp_mb();
+		this_cpu_dec(pcpu_rwlock->rw_state->reader_refcnt);
+	} else {
+		read_unlock(&pcpu_rwlock->global_rwlock);
+	}
+
+	preempt_enable();
 }

 void percpu_write_lock(struct percpu_rwlock *pcpu_rwlock)
 {
+	unsigned int cpu;
+
+	/*
+	 * Tell all readers that a writer is becoming active, so that they
+	 * start switching over to the global rwlock.
+	 */
+	for_each_possible_cpu(cpu)
+		per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = true;
+
+	smp_mb();
+
+	/*
+	 * Wait for every reader to see the writer's signal and switch from
+	 * percpu refcounts to global rwlock.
+	 *
+	 * If a reader is still using percpu refcounts, wait for him to switch.
+	 * Else, we can safely go ahead, because either the reader has already
+	 * switched over, or the next reader that comes along on that CPU will
+	 * notice the writer's signal and will switch over to the rwlock.
+	 */
+
+	for_each_possible_cpu(cpu) {
+		while (reader_yet_to_switch(pcpu_rwlock, cpu))
+			cpu_relax();
+	}
+
+	smp_mb(); /* Complete the wait-for-readers, before taking the lock */
 	write_lock(&pcpu_rwlock->global_rwlock);
 }

 void percpu_write_unlock(struct percpu_rwlock *pcpu_rwlock)
 {
+	unsigned int cpu;
+
+	/* Complete the critical section before clearing ->writer_signal */
+	smp_mb();
+
+	/*
+	 * Inform all readers that we are done, so that they can switch back
+	 * to their per-cpu refcounts. (We don't need to wait for them to
+	 * see it).
+	 */
+	for_each_possible_cpu(cpu)
+		per_cpu_ptr(pcpu_rwlock->rw_state, cpu)->writer_signal = false;
+
 	write_unlock(&pcpu_rwlock->global_rwlock);
 }