From: Rik van Riel
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
    dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
    akpm@linux-foundation.org, linux-mm@kvack.org
Subject: [RFC PATCH v2 00/11] AMD broadcast TLB invalidation
Date: Sun, 22 Dec 2024 21:55:06 -0500
Message-ID: <20241223025751.3268975-1-riel@surriel.com>
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 13918426

Add support for broadcast TLB invalidation using AMD's INVLPGB
instruction.

This allows the kernel to invalidate TLB entries on remote CPUs without
needing to send IPIs, without having to wait for remote CPUs to handle
those interrupts, and with less interruption to what was running on
those CPUs.
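
For readers unfamiliar with the instruction, here is a rough userspace
sketch of how the INVLPGB register operands might be packed. The bit
layout reflects my reading of the AMD architecture manual (flags in the
low bits of rAX, page count and stride in ECX, ASID and PCID in EDX) and
should be checked against the manual. The struct and the helper name
invlpgb_pack() are made up for illustration and are not taken from this
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits in rAX. Assumed layout; verify against the AMD manual. */
#define INVLPGB_VA             (1ULL << 0) /* address in rAX is valid */
#define INVLPGB_PCID           (1ULL << 1) /* PCID in EDX is valid */
#define INVLPGB_ASID           (1ULL << 2) /* ASID in EDX is valid */
#define INVLPGB_INCLUDE_GLOBAL (1ULL << 3)
#define INVLPGB_FINAL_ONLY     (1ULL << 4)
#define INVLPGB_INCLUDE_NESTED (1ULL << 5)
#define INVLPGB_FLAG_MASK      0x3fULL

/* Hypothetical container for the three register operands. */
struct invlpgb_operands {
	uint64_t rax;
	uint32_t ecx;
	uint32_t edx;
};

/*
 * Pack the operands for one broadcast invalidation request.
 * extra_count is the number of pages beyond the first one;
 * pmd_stride selects a 2M stride instead of a 4k stride.
 */
static struct invlpgb_operands invlpgb_pack(uint16_t asid, uint16_t pcid,
					    uint64_t addr, uint16_t extra_count,
					    bool pmd_stride, uint64_t flags)
{
	struct invlpgb_operands op;

	assert((flags & ~INVLPGB_FLAG_MASK) == 0);

	/* Page-aligned address in the upper bits, flags in the low bits. */
	op.rax = (addr & ~0xfffULL) | flags;
	/* Bit 31 selects the stride, the low 16 bits hold the extra page count. */
	op.ecx = ((uint32_t)pmd_stride << 31) | extra_count;
	/* PCID is 12 bits wide and sits above the 16-bit ASID. */
	op.edx = ((uint32_t)(pcid & 0xfff) << 16) | asid;

	return op;
}
```

In the kernel these three values would be loaded into rAX, ECX and EDX
before executing INVLPGB, and a later TLBSYNC would wait for the
broadcast invalidations to complete.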

Because x86 PCID space is limited, and there are some very large
systems out there, broadcast TLB invalidation is only used for
processes that are active on 3 or more CPUs, with the threshold
being gradually increased the more the PCID space gets exhausted.

Combined with the removal of unnecessary lru_add_drain calls
(see https://lkml.org/lkml/2024/12/19/1388) this results in a
nice performance boost for the will-it-scale tlb_flush2_threads
test on an AMD Milan system with 36 cores:

- vanilla kernel:           527k loops/second
- lru_add_drain removal:    731k loops/second
- only INVLPGB:             527k loops/second
- lru_add_drain + INVLPGB: 1157k loops/second

Profiling with only the INVLPGB changes showed that while
TLB invalidation went down from 40% of the total CPU time
to only around 4% of CPU time, the contention simply moved
to the LRU lock.

Fixing both at the same time about doubles the number of
iterations per second for this case.

v2:
- Apply suggestions by Peter and Borislav (thank you!)
- Fix bug in arch_tlbbatch_flush, where we need to do both
  the TLBSYNC, and flush the CPUs that are in the cpumask.
- Some updates to comments and changelogs based on questions.
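
To make the CPU-count threshold mentioned earlier in this letter more
concrete, here is a minimal sketch of one way such a policy could
behave. Only the starting threshold of 3 CPUs comes from the text; the
doubling step per quarter of PCID space consumed, the function names,
and the parameters are assumptions made up for illustration, and the
actual patches may well do this differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Starting threshold from the cover letter: 3 or more CPUs. */
#define BASE_CPU_THRESHOLD 3u

/*
 * Hypothetical policy: double the threshold for every quarter of the
 * broadcast PCID space that is already in use.
 */
static unsigned int broadcast_cpu_threshold(unsigned int used_pcids,
					    unsigned int total_pcids)
{
	unsigned int quarters_used;

	assert(total_pcids > 0 && used_pcids <= total_pcids);

	quarters_used = (4 * used_pcids) / total_pcids; /* 0..4 */
	return BASE_CPU_THRESHOLD << quarters_used;
}

/* Decide whether a process should get a broadcast-capable PCID. */
static bool should_use_broadcast(unsigned int active_cpus,
				 unsigned int used_pcids,
				 unsigned int total_pcids)
{
	/* Nothing left to hand out: fall back to IPI-based flushing. */
	if (used_pcids >= total_pcids)
		return false;

	return active_cpus >= broadcast_cpu_threshold(used_pcids, total_pcids);
}
```

With an empty PCID space a process running on 3 CPUs qualifies; once a
quarter of the space is in use the same process would need to be active
on 6 CPUs under this made-up schedule.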