From patchwork Fri Nov 24 13:26:21 2023
X-Patchwork-Submitter: David Hildenbrand <david@redhat.com>
X-Patchwork-Id: 13467668
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Linus Torvalds,
    Ryan Roberts, Matthew Wilcox, Hugh Dickins, Yin Fengwei, Yang Shi,
    Ying Huang, Zi Yan, Peter Zijlstra, Ingo Molnar, Will Deacon,
    Waiman Long, "Paul E. McKenney"
Subject: [PATCH WIP v1 16/20] atomic_seqcount: support a single exclusive writer in the absence of other writers
Date: Fri, 24 Nov 2023 14:26:21 +0100
Message-ID: <20231124132626.235350-17-david@redhat.com>
In-Reply-To: <20231124132626.235350-1-david@redhat.com>
References: <20231124132626.235350-1-david@redhat.com>
MIME-Version: 1.0

The current atomic seqcount requires that all writers use atomic RMW
operations in the critical section, which can result in quite some overhead
on some platforms. In the common case, there is only a single writer, and
ideally we'd be able to avoid atomic RMW operations in that case, to reduce
the overall number of atomic RMW operations on the fast path. So let's add
support for a single exclusive writer.
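For illustration only (this usage sketch is not part of the patch, and the
atomic counter it updates is just an example), a writer is expected to use
the extended API added below roughly like this, using atomic RMW operations
in the critical section only when it could not become the exclusive writer:

static void example_writer(raw_atomic_seqcount_t *seqcount, atomic_t *counter)
{
        bool exclusive;

        /* Try to become the single exclusive writer. */
        exclusive = raw_write_atomic_seqcount_begin(seqcount, true);
        if (exclusive)
                /* No concurrent writers: a plain, non-atomic update is fine. */
                atomic_set(counter, atomic_read(counter) + 1);
        else
                /* Shared writer: must use an atomic RMW operation. */
                atomic_inc(counter);
        raw_write_atomic_seqcount_end(seqcount, exclusive);
}

Readers are not affected and keep using the existing read-retry loop
(raw_read_atomic_seqcount_retry()).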
If there are no other writers, a writer can become the single exclusive
writer by using an atomic cmpxchg on the atomic seqcount. However, if there
is any concurrent writer (shared or exclusive), the writers become shared
and only have to wait for a single exclusive writer to finish. So shared
writers might be delayed a bit by the single exclusive writer, but they
don't starve, as they are guaranteed to make progress once the exclusive
writer finishes (and that writer ideally runs faster than any shared writer,
because it uses no atomic RMW operations in the critical section).

The exclusive path now effectively acts as a lock: if the trylock fails, we
fall back to the shared path. We need acquire-release semantics; these are
implied by the full memory barriers that we are already enforcing.

Instead of the atomic_long_add_return(), we could keep using an
atomic_long_add() + atomic_long_read(). But I suspect that doesn't really
matter. If it ever matters, it will be easy to optimize.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/atomic_seqcount.h | 101 ++++++++++++++++++++++++++------
 include/linux/rmap.h            |   5 +-
 2 files changed, 85 insertions(+), 21 deletions(-)

diff --git a/include/linux/atomic_seqcount.h b/include/linux/atomic_seqcount.h
index 109447b663a1..00286a9da221 100644
--- a/include/linux/atomic_seqcount.h
+++ b/include/linux/atomic_seqcount.h
@@ -8,8 +8,11 @@
 /*
  * raw_atomic_seqcount_t -- a reader-writer consistency mechanism with
- * lockless readers (read-only retry loops), and lockless writers.
- * The writers must use atomic RMW operations in the critical section.
+ * lockless readers (read-only retry loops), and (almost) lockless writers.
+ * Shared writers must use atomic RMW operations in the critical section,
+ * a single exclusive writer can avoid atomic RMW operations in the critical
+ * section. Shared writers will always have to wait for at most one exclusive
+ * writer to finish in order to make progress.
  *
  * This locking mechanism is applicable when all individual operations
  * performed by writers can be expressed using atomic RMW operations
@@ -38,9 +41,10 @@ typedef struct raw_atomic_seqcount {
 /* 65536 CPUs */
 #define ATOMIC_SEQCOUNT_SHARED_WRITERS_MAX      0x0000000000008000ul
 #define ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK     0x000000000000fffful
-#define ATOMIC_SEQCOUNT_WRITERS_MASK            0x000000000000fffful
+#define ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER        0x0000000000010000ul
+#define ATOMIC_SEQCOUNT_WRITERS_MASK            0x000000000001fffful
 /* We have 48bit for the actual sequence. */
-#define ATOMIC_SEQCOUNT_SEQUENCE_STEP           0x0000000000010000ul
+#define ATOMIC_SEQCOUNT_SEQUENCE_STEP           0x0000000000020000ul
 
 #else /* CONFIG_64BIT */
 
@@ -48,9 +52,10 @@ typedef struct raw_atomic_seqcount {
 /* 64 CPUs */
 #define ATOMIC_SEQCOUNT_SHARED_WRITERS_MAX      0x00000040ul
 #define ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK     0x0000007ful
-#define ATOMIC_SEQCOUNT_WRITERS_MASK            0x0000007ful
-/* We have 25bit for the actual sequence. */
-#define ATOMIC_SEQCOUNT_SEQUENCE_STEP           0x00000080ul
+#define ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER        0x00000080ul
+#define ATOMIC_SEQCOUNT_WRITERS_MASK            0x000000fful
+/* We have 24bit for the actual sequence. */
+#define ATOMIC_SEQCOUNT_SEQUENCE_STEP           0x00000100ul
 
 #endif /* CONFIG_64BIT */
 
@@ -126,44 +131,102 @@ static inline bool raw_read_atomic_seqcount_retry(raw_atomic_seqcount_t *s,
 /**
  * raw_write_seqcount_begin() - start a raw_seqcount_t write critical section
  * @s: Pointer to the raw_atomic_seqcount_t
+ * @try_exclusive: Whether to try becoming the exclusive writer.
  *
  * raw_write_seqcount_begin() opens the write critical section of the
  * given raw_seqcount_t. This function must not be used in interrupt context.
+ *
+ * Return: "true" when we are the exclusive writer and can avoid atomic RMW
+ *         operations in the critical section. Otherwise, we are a shared
+ *         writer and have to use atomic RMW operations in the critical
+ *         section. Will always return "false" if @try_exclusive is not "true".
  */
-static inline void raw_write_atomic_seqcount_begin(raw_atomic_seqcount_t *s)
+static inline bool raw_write_atomic_seqcount_begin(raw_atomic_seqcount_t *s,
+                                                   bool try_exclusive)
 {
+        unsigned long seqcount, seqcount_new;
+
         BUILD_BUG_ON(IS_ENABLED(CONFIG_PREEMPT_RT));
 #ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT
         DEBUG_LOCKS_WARN_ON(in_interrupt());
 #endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */
         preempt_disable();
-        atomic_long_add(ATOMIC_SEQCOUNT_SHARED_WRITER, &s->sequence);
-        /* Store the sequence before any store in the critical section. */
-        smp_mb__after_atomic();
+
+        /* If requested, can we just become the exclusive writer? */
+        if (!try_exclusive)
+                goto shared;
+
+        seqcount = atomic_long_read(&s->sequence);
+        if (unlikely(seqcount & ATOMIC_SEQCOUNT_WRITERS_MASK))
+                goto shared;
+
+        seqcount_new = seqcount | ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER;
+        /*
+         * Store the sequence before any store in the critical section. Further,
+         * this implies an acquire so loads within the critical section are
+         * not reordered to be outside the critical section.
+         */
+        if (atomic_long_try_cmpxchg(&s->sequence, &seqcount, seqcount_new))
+                return true;
+shared:
+        /*
+         * Indicate that there is a shared writer, and spin until the exclusive
+         * writer is done. This avoids writer starvation, because we'll always
+         * have to wait for at most one writer.
+         *
+         * We spin with preemption disabled to not reschedule to a reader that
+         * cannot make any progress either way.
+         *
+         * Store the sequence before any store in the critical section.
+         */
+        seqcount = atomic_long_add_return(ATOMIC_SEQCOUNT_SHARED_WRITER,
+                                          &s->sequence);
 #ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT
-        DEBUG_LOCKS_WARN_ON((atomic_long_read(&s->sequence) &
-                             ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK) >
+        DEBUG_LOCKS_WARN_ON((seqcount & ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK) >
                             ATOMIC_SEQCOUNT_SHARED_WRITERS_MAX);
 #endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */
+        if (likely(!(seqcount & ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER)))
+                return false;
+
+        while (atomic_long_read(&s->sequence) & ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER)
+                cpu_relax();
+        return false;
 }
 
 /**
  * raw_write_seqcount_end() - end a raw_seqcount_t write critical section
  * @s: Pointer to the raw_atomic_seqcount_t
+ * @exclusive: Return value of raw_write_atomic_seqcount_begin().
  *
  * raw_write_seqcount_end() closes the write critical section of the
  * given raw_seqcount_t.
  */
-static inline void raw_write_atomic_seqcount_end(raw_atomic_seqcount_t *s)
+static inline void raw_write_atomic_seqcount_end(raw_atomic_seqcount_t *s,
+                                                 bool exclusive)
 {
+        unsigned long val = ATOMIC_SEQCOUNT_SEQUENCE_STEP;
+
+        if (likely(exclusive)) {
+#ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT
+                DEBUG_LOCKS_WARN_ON(!(atomic_long_read(&s->sequence) &
+                                      ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER));
+#endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */
+                val -= ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER;
+        } else {
 #ifdef CONFIG_DEBUG_ATOMIC_SEQCOUNT
-        DEBUG_LOCKS_WARN_ON(!(atomic_long_read(&s->sequence) &
-                              ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK));
+                DEBUG_LOCKS_WARN_ON(!(atomic_long_read(&s->sequence) &
+                                      ATOMIC_SEQCOUNT_SHARED_WRITERS_MASK));
 #endif /* CONFIG_DEBUG_ATOMIC_SEQCOUNT */
-        /* Store the sequence after any store in the critical section. */
+                val -= ATOMIC_SEQCOUNT_SHARED_WRITER;
+        }
+        /*
+         * Store the sequence after any store in the critical section. For
+         * the exclusive path, this further implies a release, so loads
+         * within the critical section are not reordered to be outside the
+         * critical section.
+         */
         smp_mb__before_atomic();
-        atomic_long_add(ATOMIC_SEQCOUNT_SEQUENCE_STEP -
-                        ATOMIC_SEQCOUNT_SHARED_WRITER, &s->sequence);
+        atomic_long_add(val, &s->sequence);
         preempt_enable();
 }
 
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 76e6fb1dad5c..0758dddc5528 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -295,12 +295,13 @@ static inline void __folio_write_large_rmap_begin(struct folio *folio)
 {
         VM_WARN_ON_FOLIO(!folio_test_large_rmappable(folio), folio);
         VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
-        raw_write_atomic_seqcount_begin(&folio->_rmap_atomic_seqcount);
+        raw_write_atomic_seqcount_begin(&folio->_rmap_atomic_seqcount,
+                                        false);
 }
 
 static inline void __folio_write_large_rmap_end(struct folio *folio)
 {
-        raw_write_atomic_seqcount_end(&folio->_rmap_atomic_seqcount);
+        raw_write_atomic_seqcount_end(&folio->_rmap_atomic_seqcount, false);
 }
 
 void __folio_set_large_rmap_val(struct folio *folio, int count,
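A worked example (illustration only, not part of the patch) of why a single
atomic_long_add() is sufficient in raw_write_atomic_seqcount_end(): with the
64-bit constants above, an exclusive writer adds
val = ATOMIC_SEQCOUNT_SEQUENCE_STEP - ATOMIC_SEQCOUNT_EXCLUSIVE_WRITER
    = 0x20000 - 0x10000 = 0x10000. Because the exclusive-writer bit (bit 16)
is known to be set, that addition clears the bit and carries into bit 17,
advancing the sequence by exactly one step. A shared writer adds
val = ATOMIC_SEQCOUNT_SEQUENCE_STEP - ATOMIC_SEQCOUNT_SHARED_WRITER instead,
which drops one shared-writer count and likewise advances the sequence by one
step.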