From patchwork Sat Jan 8 16:43:46 2022
X-Patchwork-Submitter: Andy Lutomirski
X-Patchwork-Id: 12707532
From: Andy Lutomirski
To: Andrew Morton, Linux-MM
Cc: Nicholas Piggin, Anton Blanchard, Benjamin Herrenschmidt,
    Paul Mackerras, Randy Dunlap, linux-arch, x86@kernel.org,
    Rik van Riel, Dave Hansen, Peter Zijlstra, Nadav Amit,
    Mathieu Desnoyers, Andy Lutomirski
Subject: [PATCH 01/23] membarrier: Document why membarrier() works
Date: Sat, 8 Jan 2022 08:43:46 -0800
X-Mailer: git-send-email 2.33.1

We had a nice comment at the top of membarrier.c explaining why membarrier
worked in a handful of scenarios, but that consisted more of a list of
things not to forget than an actual description of the algorithm and why it
should be expected to work.

Add a comment explaining my understanding of the algorithm.  This exposes a
couple of implementation issues that I will hopefully fix up in subsequent
patches.

Cc: Mathieu Desnoyers
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Signed-off-by: Andy Lutomirski
Reviewed-by: Mathieu Desnoyers
---
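As a rough illustration of the IPI round-trip described in the new comment
below, here is a minimal userspace sketch. It is an analogy, not kernel code
and not part of the change: C11 seq_cst fences stand in for smp_mb(), and the
"request"/"done" flags are hypothetical stand-ins for the IPI request payload
and the completion indication.

/*
 * Two threads model membarrier()'s caller and one target CPU.
 * Build with: cc -pthread sketch.c
 */
#include <stdatomic.h>
#include <pthread.h>
#include <assert.h>

static atomic_int request;	/* stand-in for the IPI request payload (step 2) */
static atomic_int done;		/* stand-in for the completion indication (step 4) */
static int caller_data;		/* written by the caller before "membarrier()" */
static int target_data;	/* written by the target before the "IPI" arrives */

static void *target_cpu(void *arg)
{
	(void)arg;
	target_data = 1;				/* pre-IPI access on the target */
	while (!atomic_load_explicit(&request, memory_order_relaxed))
		;					/* wait for the "IPI" */
	atomic_thread_fence(memory_order_seq_cst);	/* step 3: smp_mb() in the handler */
	assert(caller_data == 1);	/* caller's pre-membarrier() store is now visible */
	atomic_store_explicit(&done, 1, memory_order_relaxed);	/* step 4: completion */
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, target_cpu, NULL);

	caller_data = 1;				/* pre-membarrier() access */
	atomic_thread_fence(memory_order_seq_cst);	/* step 1: smp_mb() */
	atomic_store_explicit(&request, 1, memory_order_relaxed);	/* step 2: "send the IPI" */
	while (!atomic_load_explicit(&done, memory_order_relaxed))
		;					/* wait for the IPI to finish */
	atomic_thread_fence(memory_order_seq_cst);	/* step 5: smp_mb() */
	assert(target_data == 1);	/* target's pre-IPI store is now visible */

	pthread_join(t, NULL);
	return 0;
}
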
 kernel/sched/membarrier.c | 60 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index b5add64d9698..30e964b9689d 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -7,8 +7,64 @@
 #include "sched.h"
 
 /*
- * For documentation purposes, here are some membarrier ordering
- * scenarios to keep in mind:
+ * The basic principle behind the regular memory barrier mode of
+ * membarrier() is as follows.  membarrier() is called in one thread.  It
+ * iterates over all CPUs, and, for each CPU, it either sends an IPI to
+ * that CPU or it does not.  If it sends an IPI, then we have the
+ * following sequence of events:
+ *
+ * 1. membarrier() does smp_mb().
+ * 2. membarrier() does a store (the IPI request payload) that is observed by
+ *    the target CPU.
+ * 3. The target CPU does smp_mb().
+ * 4. The target CPU does a store (the completion indication) that is observed
+ *    by membarrier()'s wait-for-IPIs-to-finish request.
+ * 5. membarrier() does smp_mb().
+ *
+ * So all pre-membarrier() local accesses are visible after the IPI on the
+ * target CPU and all pre-IPI remote accesses are visible after
+ * membarrier(). IOW membarrier() has synchronized both ways with the target
+ * CPU.
+ *
+ * (This has the caveat that membarrier() does not interrupt the CPU that it's
+ * running on at the time it sends the IPIs. However, if that is the CPU on
+ * which membarrier() starts and/or finishes, membarrier() does smp_mb() and,
+ * if not, then the scheduler's migration of membarrier() is a full barrier.)
+ *
+ * membarrier() skips sending an IPI only if membarrier() sees
+ * cpu_rq(cpu)->curr->mm != target mm.  The sequence of events is:
+ *
+ *           membarrier()            |          target CPU
+ * ---------------------------------------------------------------------
+ *                                   | 1. smp_mb()
+ *                                   | 2. set rq->curr->mm = other_mm
+ *                                   |    (by writing to ->curr or to ->mm)
+ * 3. smp_mb()                       |
+ * 4. read rq->curr->mm == other_mm  |
+ * 5. smp_mb()                       |
+ *                                   | 6. rq->curr->mm = target_mm
+ *                                   |    (by writing to ->curr or to ->mm)
+ *                                   | 7. smp_mb()
+ *                                   |
+ *
+ * All memory accesses on the target CPU prior to scheduling are visible
+ * to membarrier()'s caller after membarrier() returns due to steps 1, 2, 4
+ * and 5.
+ *
+ * All memory accesses by membarrier()'s caller prior to membarrier() are
+ * visible to the target CPU after scheduling due to steps 3, 4, 6, and 7.
+ *
+ * Note that tasks can change their ->mm, e.g. via kthread_use_mm().  So
+ * tasks that switch their ->mm must follow the same rules as the scheduler
+ * changing rq->curr, and the membarrier() code needs to do both dereferences
+ * carefully.
+ *
+ * GLOBAL_EXPEDITED support works the same way except that all references
+ * to rq->curr->mm are replaced with references to rq->membarrier_state.
+ *
+ *
+ * Specific examples of how this produces the documented properties of
+ * membarrier():
  *
  * A) Userspace thread execution after IPI vs membarrier's memory
  *    barrier before sending the IPI