From patchwork Tue Dec 10 16:40:31 2024
X-Patchwork-Submitter: "Uladzislau Rezki (Sony)"
X-Patchwork-Id: 13901747
From: "Uladzislau Rezki (Sony)"
To: linux-mm@kvack.org, Andrew Morton, Vlastimil Babka
Cc: RCU, LKML, Uladzislau Rezki, Oleksiy Avramchenko
Subject: [RFC v1 1/5] rcu/kvfree: Temporary reclaim over call_rcu()
Date: Tue, 10 Dec 2024 17:40:31 +0100
Message-Id: <20241210164035.3391747-2-urezki@gmail.com>
In-Reply-To: <20241210164035.3391747-1-urezki@gmail.com>
References: <20241210164035.3391747-1-urezki@gmail.com>
This is the first step of a smooth migration of the main kvfree_rcu()
functionality to the SLAB. Therefore this patch:

 - adds temporary support for reclaiming freed objects over call_rcu();
 - disconnects the main functionality of the kvfree_rcu() API by routing
   it through call_rcu();
 - directly reclaims an object for the single-argument variant;
 - adds an rcu_barrier() call to kvfree_rcu_barrier().

Signed-off-by: Uladzislau Rezki (Sony)
---
 kernel/rcu/tree.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b1f883fcd918..ab24229dfa73 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2559,13 +2559,19 @@ static void rcu_do_batch(struct rcu_data *rdp)
 		debug_rcu_head_unqueue(rhp);
 
 		rcu_lock_acquire(&rcu_callback_map);
-		trace_rcu_invoke_callback(rcu_state.name, rhp);
 
 		f = rhp->func;
-		debug_rcu_head_callback(rhp);
-		WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
-		f(rhp);
+		/* This is temporary, it will be removed when migration is over. */
+		if (__is_kvfree_rcu_offset((unsigned long) f)) {
+			trace_rcu_invoke_kvfree_callback("", rhp, (unsigned long) f);
+			kvfree((void *) rhp - (unsigned long) f);
+		} else {
+			trace_rcu_invoke_callback(rcu_state.name, rhp);
+			debug_rcu_head_callback(rhp);
+			WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
+			f(rhp);
+		}
 		rcu_lock_release(&rcu_callback_map);
 
 		/*
@@ -3787,6 +3793,16 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
 	struct kfree_rcu_cpu *krcp;
 	bool success;
 
+	if (head) {
+		call_rcu(head, (rcu_callback_t) ((void *) head - ptr));
+	} else {
+		synchronize_rcu();
+		kvfree(ptr);
+	}
+
+	/* Disconnect the rest. */
+	return;
+
 	/*
 	 * Please note there is a limitation for the head-less
 	 * variant, that is why there is a clear rule for such
@@ -3871,6 +3887,9 @@ void kvfree_rcu_barrier(void)
 	bool queued;
 	int i, cpu;
 
+	/* Temporary. */
+	rcu_barrier();
+
 	/*
 	 * Firstly we detach objects and queue them over an RCU-batch
 	 * for all CPUs. Finally queued works are flushed for each CPU.
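For readers unfamiliar with the trick the temporary path relies on: the
double-argument kvfree_rcu() stores the offset of the embedded rcu_head
within the enclosing object in place of a real callback pointer, which is
what __is_kvfree_rcu_offset() detects in the hunk above. A minimal sketch;
"struct foo" and foo_release() are illustrative names, not part of this
series:

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct foo {
            int payload;
            struct rcu_head rh;     /* embedded rcu_head */
    };

    static void foo_release(struct foo *p)
    {
            /*
             * Queues &p->rh with a "callback" that is really
             * offsetof(struct foo, rh), i.e. (void *)&p->rh - (void *)p.
             */
            kvfree_rcu(p, rh);
    }

    /*
     * After a grace period the temporary branch in rcu_do_batch()
     * reverses the encoding and frees the whole object:
     *
     *      ptr = (void *)rhp - (unsigned long)f;
     *      kvfree(ptr);
     */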
From patchwork Tue Dec 10 16:40:32 2024
X-Patchwork-Submitter: "Uladzislau Rezki (Sony)"
X-Patchwork-Id: 13901748
From: "Uladzislau Rezki (Sony)"
To: linux-mm@kvack.org, Andrew Morton, Vlastimil Babka
Cc: RCU, LKML, Uladzislau Rezki, Oleksiy Avramchenko
Subject: [RFC v1 2/5] mm/slab: Copy main data structures of kvfree_rcu()
Date: Tue, 10 Dec 2024 17:40:32 +0100
Message-Id: <20241210164035.3391747-3-urezki@gmail.com>
In-Reply-To: <20241210164035.3391747-1-urezki@gmail.com>
References: <20241210164035.3391747-1-urezki@gmail.com>
This patch copies the main data structures of the kvfree_rcu() API from
kernel/rcu/tree.c into the slab_common.c file. Later on, they will be
removed from tree.c.

Signed-off-by: Uladzislau Rezki (Sony)
---
 mm/slab_common.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 893d32059915..a249fdb0d92e 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1338,3 +1338,98 @@ EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
 EXPORT_TRACEPOINT_SYMBOL(kfree);
 EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);
 
+/* Maximum number of jiffies to wait before draining a batch. */
+#define KFREE_DRAIN_JIFFIES (5 * HZ)
+#define KFREE_N_BATCHES 2
+#define FREE_N_CHANNELS 2
+
+/**
+ * struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers
+ * @list: List node. All blocks are linked between each other
+ * @gp_snap: Snapshot of RCU state for objects placed to this bulk
+ * @nr_records: Number of active pointers in the array
+ * @records: Array of the kvfree_rcu() pointers
+ */
+struct kvfree_rcu_bulk_data {
+	struct list_head list;
+	struct rcu_gp_oldstate gp_snap;
+	unsigned long nr_records;
+	void *records[] __counted_by(nr_records);
+};
+
+/*
+ * This macro defines how many entries the "records" array
+ * will contain. It is based on the fact that the size of
+ * kvfree_rcu_bulk_data structure becomes exactly one page.
+ */
+#define KVFREE_BULK_MAX_ENTR \
+	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
+
+/**
+ * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
+ * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
+ * @head_free: List of kfree_rcu() objects waiting for a grace period
+ * @head_free_gp_snap: Grace-period snapshot to check for attempted premature frees.
+ * @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period
+ * @krcp: Pointer to @kfree_rcu_cpu structure
+ */
+
+struct kfree_rcu_cpu_work {
+	struct rcu_work rcu_work;
+	struct rcu_head *head_free;
+	struct rcu_gp_oldstate head_free_gp_snap;
+	struct list_head bulk_head_free[FREE_N_CHANNELS];
+	struct kfree_rcu_cpu *krcp;
+};
+
+/**
+ * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
+ * @head: List of kfree_rcu() objects not yet waiting for a grace period
+ * @head_gp_snap: Snapshot of RCU state for objects placed to "@head"
+ * @bulk_head: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period
+ * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
+ * @lock: Synchronize access to this structure
+ * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
+ * @initialized: The @rcu_work fields have been initialized
+ * @head_count: Number of objects in rcu_head singular list
+ * @bulk_count: Number of objects in bulk-list
+ * @bkvcache:
+ *	A simple cache list that contains objects for reuse purpose.
+ *	In order to save some per-cpu space the list is singular.
+ *	Even though it is lockless an access has to be protected by the
+ *	per-cpu lock.
+ * @page_cache_work: A work to refill the cache when it is empty
+ * @backoff_page_cache_fill: Delay cache refills
+ * @work_in_progress: Indicates that page_cache_work is running
+ * @hrtimer: A hrtimer for scheduling a page_cache_work
+ * @nr_bkv_objs: number of allocated objects at @bkvcache.
+ *
+ * This is a per-CPU structure. The reason that it is not included in
+ * the rcu_data structure is to permit this code to be extracted from
+ * the RCU files. Such extraction could allow further optimization of
+ * the interactions with the slab allocators.
+ */
+struct kfree_rcu_cpu {
+	// Objects queued on a linked list
+	// through their rcu_head structures.
+	struct rcu_head *head;
+	unsigned long head_gp_snap;
+	atomic_t head_count;
+
+	// Objects queued on a bulk-list.
+	struct list_head bulk_head[FREE_N_CHANNELS];
+	atomic_t bulk_count[FREE_N_CHANNELS];
+
+	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
+	raw_spinlock_t lock;
+	struct delayed_work monitor_work;
+	bool initialized;
+
+	struct delayed_work page_cache_work;
+	atomic_t backoff_page_cache_fill;
+	atomic_t work_in_progress;
+	struct hrtimer hrtimer;
+
+	struct llist_head bkvcache;
+	int nr_bkv_objs;
+};

From patchwork Tue Dec 10 16:40:33 2024
X-Patchwork-Submitter: "Uladzislau Rezki (Sony)"
X-Patchwork-Id: 13901749
From: "Uladzislau Rezki (Sony)"
To: linux-mm@kvack.org, Andrew Morton, Vlastimil Babka
Cc: RCU, LKML, Uladzislau Rezki, Oleksiy Avramchenko
Subject: [RFC v1 3/5] mm/slab: Copy internal functions of kvfree_rcu()
Date: Tue, 10 Dec 2024 17:40:33 +0100
Message-Id: <20241210164035.3391747-4-urezki@gmail.com>
In-Reply-To: <20241210164035.3391747-1-urezki@gmail.com>
References: <20241210164035.3391747-1-urezki@gmail.com>
Copy the main functions of kvfree_rcu() from kernel/rcu/tree.c to the
slab_common.c file. In order to prevent compiler warnings about defined
but unused functions, the following ones:

  run_page_cache_worker()
  fill_page_cache_func()
  kfree_rcu_monitor()
  kfree_rcu_work()
  drain_page_cache()

are temporarily marked as "__maybe_unused" in the slab_common.c file.

Signed-off-by: Uladzislau Rezki (Sony)
---
 mm/slab_common.c | 507 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 507 insertions(+)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index a249fdb0d92e..e7e1d5b5f31b 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -28,7 +28,9 @@
 #include
 #include
 #include
+#include
+#include "../kernel/rcu/rcu.h"
 #include "internal.h"
 #include "slab.h"
@@ -1433,3 +1435,508 @@ struct kfree_rcu_cpu {
 	struct llist_head bkvcache;
 	int nr_bkv_objs;
 };
+
+/*
+ * This rcu parameter is runtime-read-only. It reflects
+ * a minimum allowed number of objects which can be cached
+ * per-CPU. Object size is equal to one page. This value
+ * can be changed at boot time.
+ */
+static int rcu_min_cached_objs = 5;
+module_param(rcu_min_cached_objs, int, 0444);
+
+// A page shrinker can ask for pages to be freed to make them
+// available for other parts of the system. This usually happens
+// under low memory conditions, and in that case we should also
+// defer page-cache filling for a short time period.
+//
+// The default value is 5 seconds, which is long enough to reduce
+// interference with the shrinker while it asks other systems to
+// drain their caches.
+static int rcu_delay_page_cache_fill_msec = 5000;
+module_param(rcu_delay_page_cache_fill_msec, int, 0444);
+
+static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
+	.lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
+};
+
+static __always_inline void
+debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead)
+{
+#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
+	int i;
+
+	for (i = 0; i < bhead->nr_records; i++)
+		debug_rcu_head_unqueue((struct rcu_head *)(bhead->records[i]));
+#endif
+}
+
+static inline struct kfree_rcu_cpu *
+krc_this_cpu_lock(unsigned long *flags)
+{
+	struct kfree_rcu_cpu *krcp;
+
+	local_irq_save(*flags);	// For safely calling this_cpu_ptr().
+	krcp = this_cpu_ptr(&krc);
+	raw_spin_lock(&krcp->lock);
+
+	return krcp;
+}
+
+static inline void
+krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
+{
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+}
+
+static inline struct kvfree_rcu_bulk_data *
+get_cached_bnode(struct kfree_rcu_cpu *krcp)
+{
+	if (!krcp->nr_bkv_objs)
+		return NULL;
+
+	WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs - 1);
+	return (struct kvfree_rcu_bulk_data *)
+		llist_del_first(&krcp->bkvcache);
+}
+
+static inline bool
+put_cached_bnode(struct kfree_rcu_cpu *krcp,
+	struct kvfree_rcu_bulk_data *bnode)
+{
+	// Check the limit.
+	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
+		return false;
+
+	llist_add((struct llist_node *) bnode, &krcp->bkvcache);
+	WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs + 1);
+	return true;
+}
+
+static int __maybe_unused
+drain_page_cache(struct kfree_rcu_cpu *krcp)
+{
+	unsigned long flags;
+	struct llist_node *page_list, *pos, *n;
+	int freed = 0;
+
+	if (!rcu_min_cached_objs)
+		return 0;
+
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+	page_list = llist_del_all(&krcp->bkvcache);
+	WRITE_ONCE(krcp->nr_bkv_objs, 0);
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+	llist_for_each_safe(pos, n, page_list) {
+		free_page((unsigned long)pos);
+		freed++;
+	}
+
+	return freed;
+}
+
+static void
+kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
+	struct kvfree_rcu_bulk_data *bnode, int idx)
+{
+	unsigned long flags;
+	int i;
+
+	if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) {
+		debug_rcu_bhead_unqueue(bnode);
+		rcu_lock_acquire(&rcu_callback_map);
+		if (idx == 0) { // kmalloc() / kfree().
+			trace_rcu_invoke_kfree_bulk_callback(
+				"slab", bnode->nr_records,
+				bnode->records);
+
+			kfree_bulk(bnode->nr_records, bnode->records);
+		} else { // vmalloc() / vfree().
+			for (i = 0; i < bnode->nr_records; i++) {
+				trace_rcu_invoke_kvfree_callback(
+					"slab", bnode->records[i], 0);
+
+				vfree(bnode->records[i]);
+			}
+		}
+		rcu_lock_release(&rcu_callback_map);
+	}
+
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+	if (put_cached_bnode(krcp, bnode))
+		bnode = NULL;
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+	if (bnode)
+		free_page((unsigned long) bnode);
+
+	cond_resched_tasks_rcu_qs();
+}
+
+static void
+kvfree_rcu_list(struct rcu_head *head)
+{
+	struct rcu_head *next;
+
+	for (; head; head = next) {
+		void *ptr = (void *) head->func;
+		unsigned long offset = (void *) head - ptr;
+
+		next = head->next;
+		debug_rcu_head_unqueue((struct rcu_head *)ptr);
+		rcu_lock_acquire(&rcu_callback_map);
+		trace_rcu_invoke_kvfree_callback("slab", head, offset);
+
+		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
+			kvfree(ptr);
+
+		rcu_lock_release(&rcu_callback_map);
+		cond_resched_tasks_rcu_qs();
+	}
+}
+
+/*
+ * This function is invoked in workqueue context after a grace period.
+ * It frees all the objects queued on ->bulk_head_free or ->head_free.
+ */
+static void __maybe_unused
+kfree_rcu_work(struct work_struct *work)
+{
+	unsigned long flags;
+	struct kvfree_rcu_bulk_data *bnode, *n;
+	struct list_head bulk_head[FREE_N_CHANNELS];
+	struct rcu_head *head;
+	struct kfree_rcu_cpu *krcp;
+	struct kfree_rcu_cpu_work *krwp;
+	struct rcu_gp_oldstate head_gp_snap;
+	int i;
+
+	krwp = container_of(to_rcu_work(work),
+		struct kfree_rcu_cpu_work, rcu_work);
+	krcp = krwp->krcp;
+
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+	// Channels 1 and 2.
+	for (i = 0; i < FREE_N_CHANNELS; i++)
+		list_replace_init(&krwp->bulk_head_free[i], &bulk_head[i]);
+
+	// Channel 3.
+	head = krwp->head_free;
+	krwp->head_free = NULL;
+	head_gp_snap = krwp->head_free_gp_snap;
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+	// Handle the first two channels.
+	for (i = 0; i < FREE_N_CHANNELS; i++) {
+		// Start from the tail page, so a GP is likely passed for it.
+		list_for_each_entry_safe(bnode, n, &bulk_head[i], list)
+			kvfree_rcu_bulk(krcp, bnode, i);
+	}
+
+	/*
+	 * This is used when the "bulk" path can not be used for the
+	 * double-argument of kvfree_rcu(). This happens when the
+	 * page-cache is empty, which means that objects are instead
+	 * queued on a linked list through their rcu_head structures.
+	 * This list is named "Channel 3".
+	 */
+	if (head && !WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&head_gp_snap)))
+		kvfree_rcu_list(head);
+}
+
+static bool
+need_offload_krc(struct kfree_rcu_cpu *krcp)
+{
+	int i;
+
+	for (i = 0; i < FREE_N_CHANNELS; i++)
+		if (!list_empty(&krcp->bulk_head[i]))
+			return true;
+
+	return !!READ_ONCE(krcp->head);
+}
+
+static bool
+need_wait_for_krwp_work(struct kfree_rcu_cpu_work *krwp)
+{
+	int i;
+
+	for (i = 0; i < FREE_N_CHANNELS; i++)
+		if (!list_empty(&krwp->bulk_head_free[i]))
+			return true;
+
+	return !!krwp->head_free;
+}
+
+static int krc_count(struct kfree_rcu_cpu *krcp)
+{
+	int sum = atomic_read(&krcp->head_count);
+	int i;
+
+	for (i = 0; i < FREE_N_CHANNELS; i++)
+		sum += atomic_read(&krcp->bulk_count[i]);
+
+	return sum;
+}
+
+static void
+schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
+{
+	long delay, delay_left;
+
+	delay = krc_count(krcp) >= KVFREE_BULK_MAX_ENTR ? 1:KFREE_DRAIN_JIFFIES;
+	if (delayed_work_pending(&krcp->monitor_work)) {
+		delay_left = krcp->monitor_work.timer.expires - jiffies;
+		if (delay < delay_left)
+			mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
+		return;
+	}
+	queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
+}
+
+static void
+kvfree_rcu_drain_ready(struct kfree_rcu_cpu *krcp)
+{
+	struct list_head bulk_ready[FREE_N_CHANNELS];
+	struct kvfree_rcu_bulk_data *bnode, *n;
+	struct rcu_head *head_ready = NULL;
+	unsigned long flags;
+	int i;
+
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+	for (i = 0; i < FREE_N_CHANNELS; i++) {
+		INIT_LIST_HEAD(&bulk_ready[i]);
+
+		list_for_each_entry_safe_reverse(bnode, n, &krcp->bulk_head[i], list) {
+			if (!poll_state_synchronize_rcu_full(&bnode->gp_snap))
+				break;
+
+			atomic_sub(bnode->nr_records, &krcp->bulk_count[i]);
+			list_move(&bnode->list, &bulk_ready[i]);
+		}
+	}
+
+	if (krcp->head && poll_state_synchronize_rcu(krcp->head_gp_snap)) {
+		head_ready = krcp->head;
+		atomic_set(&krcp->head_count, 0);
+		WRITE_ONCE(krcp->head, NULL);
+	}
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+	for (i = 0; i < FREE_N_CHANNELS; i++) {
+		list_for_each_entry_safe(bnode, n, &bulk_ready[i], list)
+			kvfree_rcu_bulk(krcp, bnode, i);
+	}
+
+	if (head_ready)
+		kvfree_rcu_list(head_ready);
+}
+
+/*
+ * Return: %true if a work is queued, %false otherwise.
+ */
+static bool
+kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp)
+{
+	unsigned long flags;
+	bool queued = false;
+	int i, j;
+
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+
+	// Attempt to start a new batch.
+	for (i = 0; i < KFREE_N_BATCHES; i++) {
+		struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]);
+
+		// Try to detach bulk_head or head and attach it, only when
+		// all channels are free. Any channel is not free means at krwp
+		// there is on-going rcu work to handle krwp's free business.
+		if (need_wait_for_krwp_work(krwp))
+			continue;
+
+		// kvfree_rcu_drain_ready() might handle this krcp, if so give up.
+		if (need_offload_krc(krcp)) {
+			// Channel 1 corresponds to the SLAB-pointer bulk path.
+			// Channel 2 corresponds to vmalloc-pointer bulk path.
+			for (j = 0; j < FREE_N_CHANNELS; j++) {
+				if (list_empty(&krwp->bulk_head_free[j])) {
+					atomic_set(&krcp->bulk_count[j], 0);
+					list_replace_init(&krcp->bulk_head[j],
+						&krwp->bulk_head_free[j]);
+				}
+			}
+
+			// Channel 3 corresponds to both SLAB and vmalloc
+			// objects queued on the linked list.
+			if (!krwp->head_free) {
+				krwp->head_free = krcp->head;
+				get_state_synchronize_rcu_full(&krwp->head_free_gp_snap);
+				atomic_set(&krcp->head_count, 0);
+				WRITE_ONCE(krcp->head, NULL);
+			}
+
+			// One work is per one batch, so there are three
+			// "free channels", the batch can handle. Break
+			// the loop since it is done with this CPU thus
+			// queuing an RCU work is _always_ success here.
+			queued = queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
+			WARN_ON_ONCE(!queued);
+			break;
+		}
+	}
+
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+	return queued;
+}
+
+/*
+ * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
+ */
+static void __maybe_unused
+kfree_rcu_monitor(struct work_struct *work)
+{
+	struct kfree_rcu_cpu *krcp = container_of(work,
+		struct kfree_rcu_cpu, monitor_work.work);
+
+	// Drain ready for reclaim.
+	kvfree_rcu_drain_ready(krcp);
+
+	// Queue a batch for a rest.
+	kvfree_rcu_queue_batch(krcp);
+
+	// If there is nothing to detach, it means that our job is
+	// successfully done here. In case of having at least one
+	// of the channels that is still busy we should rearm the
+	// work to repeat an attempt. Because previous batches are
+	// still in progress.
+	if (need_offload_krc(krcp))
+		schedule_delayed_monitor_work(krcp);
+}
+
+static enum hrtimer_restart
+schedule_page_work_fn(struct hrtimer *t)
+{
+	struct kfree_rcu_cpu *krcp =
+		container_of(t, struct kfree_rcu_cpu, hrtimer);
+
+	queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0);
+	return HRTIMER_NORESTART;
+}
+
+static void __maybe_unused
+fill_page_cache_func(struct work_struct *work)
+{
+	struct kvfree_rcu_bulk_data *bnode;
+	struct kfree_rcu_cpu *krcp =
+		container_of(work, struct kfree_rcu_cpu,
+			page_cache_work.work);
+	unsigned long flags;
+	int nr_pages;
+	bool pushed;
+	int i;
+
+	nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ?
+		1 : rcu_min_cached_objs;
+
+	for (i = READ_ONCE(krcp->nr_bkv_objs); i < nr_pages; i++) {
+		bnode = (struct kvfree_rcu_bulk_data *)
+			__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
+
+		if (!bnode)
+			break;
+
+		raw_spin_lock_irqsave(&krcp->lock, flags);
+		pushed = put_cached_bnode(krcp, bnode);
+		raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+		if (!pushed) {
+			free_page((unsigned long) bnode);
+			break;
+		}
+	}
+
+	atomic_set(&krcp->work_in_progress, 0);
+	atomic_set(&krcp->backoff_page_cache_fill, 0);
+}
+
+static void __maybe_unused
+run_page_cache_worker(struct kfree_rcu_cpu *krcp)
+{
+	// If cache disabled, bail out.
+	if (!rcu_min_cached_objs)
+		return;
+
+	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
+			!atomic_xchg(&krcp->work_in_progress, 1)) {
+		if (atomic_read(&krcp->backoff_page_cache_fill)) {
+			queue_delayed_work(system_unbound_wq,
+				&krcp->page_cache_work,
+				msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
+		} else {
+			hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+			krcp->hrtimer.function = schedule_page_work_fn;
+			hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);
+		}
+	}
+}
+
+// Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock()
+// state specified by flags. If can_alloc is true, the caller must
+// be schedulable and not be holding any locks or mutexes that might be
+// acquired by the memory allocator or anything that it might invoke.
+// Returns true if ptr was successfully recorded, else the caller must
+// use a fallback.
+static inline bool
+add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
+	unsigned long *flags, void *ptr, bool can_alloc)
+{
+	struct kvfree_rcu_bulk_data *bnode;
+	int idx;
+
+	*krcp = krc_this_cpu_lock(flags);
+	if (unlikely(!(*krcp)->initialized))
+		return false;
+
+	idx = !!is_vmalloc_addr(ptr);
+	bnode = list_first_entry_or_null(&(*krcp)->bulk_head[idx],
+		struct kvfree_rcu_bulk_data, list);
+
+	/* Check if a new block is required. */
+	if (!bnode || bnode->nr_records == KVFREE_BULK_MAX_ENTR) {
+		bnode = get_cached_bnode(*krcp);
+		if (!bnode && can_alloc) {
+			krc_this_cpu_unlock(*krcp, *flags);
+
+			// __GFP_NORETRY - allows a light-weight direct reclaim
+			// what is OK from minimizing of fallback hitting point of
+			// view. Apart of that it forbids any OOM invoking what is
+			// also beneficial since we are about to release memory soon.
+			//
+			// __GFP_NOMEMALLOC - prevents from consuming of all the
+			// memory reserves. Please note we have a fallback path.
+			//
+			// __GFP_NOWARN - it is supposed that an allocation can
+			// be failed under low memory or high memory pressure
+			// scenarios.
+			bnode = (struct kvfree_rcu_bulk_data *)
+				__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
+			raw_spin_lock_irqsave(&(*krcp)->lock, *flags);
+		}
+
+		if (!bnode)
+			return false;
+
+		// Initialize the new block and attach it.
+		bnode->nr_records = 0;
+		list_add(&bnode->list, &(*krcp)->bulk_head[idx]);
+	}
+
+	// Finally insert and update the GP for this page.
+	bnode->nr_records++;
+	bnode->records[bnode->nr_records - 1] = ptr;
+	get_state_synchronize_rcu_full(&bnode->gp_snap);
+	atomic_inc(&(*krcp)->bulk_count[idx]);
+
+	return true;
+}

From patchwork Tue Dec 10 16:40:34 2024
X-Patchwork-Submitter: "Uladzislau Rezki (Sony)"
X-Patchwork-Id: 13901750
From: "Uladzislau Rezki (Sony)"
To: linux-mm@kvack.org, Andrew Morton, Vlastimil Babka
Cc: RCU, LKML, Uladzislau Rezki, Oleksiy Avramchenko
Subject: [RFC v1 4/5] mm/slab: Copy a function of kvfree_rcu() initialization
Date: Tue, 10 Dec 2024 17:40:34 +0100
Message-Id: <20241210164035.3391747-5-urezki@gmail.com>
In-Reply-To: <20241210164035.3391747-1-urezki@gmail.com>
References: <20241210164035.3391747-1-urezki@gmail.com>
As a final step, the initialization of the kvfree_rcu() functionality is
copied into slab_common.c from the tree.c file, as well as the
shrinker-related code. The function is temporarily marked as
"__maybe_unused" to eliminate compiler warnings.

Signed-off-by: Uladzislau Rezki (Sony)
---
 mm/slab_common.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index e7e1d5b5f31b..cffc96bd279a 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1940,3 +1940,94 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
 
 	return true;
 }
+
+static unsigned long
+kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
+{
+	int cpu;
+	unsigned long count = 0;
+
+	/* Snapshot count of all CPUs */
+	for_each_possible_cpu(cpu) {
+		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
+
+		count += krc_count(krcp);
+		count += READ_ONCE(krcp->nr_bkv_objs);
+		atomic_set(&krcp->backoff_page_cache_fill, 1);
+	}
+
+	return count == 0 ? SHRINK_EMPTY : count;
+}
+
+static unsigned long
+kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+	int cpu, freed = 0;
+
+	for_each_possible_cpu(cpu) {
+		int count;
+		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
+
+		count = krc_count(krcp);
+		count += drain_page_cache(krcp);
+		kfree_rcu_monitor(&krcp->monitor_work.work);
+
+		sc->nr_to_scan -= count;
+		freed += count;
+
+		if (sc->nr_to_scan <= 0)
+			break;
+	}
+
+	return freed == 0 ? SHRINK_STOP : freed;
+}
+
+static void __init __maybe_unused
+kfree_rcu_batch_init(void)
+{
+	int cpu;
+	int i, j;
+	struct shrinker *kfree_rcu_shrinker;
+
+	/* Clamp it to [0:100] seconds interval. */
*/ + if (rcu_delay_page_cache_fill_msec < 0 || + rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) { + + rcu_delay_page_cache_fill_msec = + clamp(rcu_delay_page_cache_fill_msec, 0, + (int) (100 * MSEC_PER_SEC)); + + pr_info("Adjusting rcutree.rcu_delay_page_cache_fill_msec to %d ms.\n", + rcu_delay_page_cache_fill_msec); + } + + for_each_possible_cpu(cpu) { + struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); + + for (i = 0; i < KFREE_N_BATCHES; i++) { + INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work); + krcp->krw_arr[i].krcp = krcp; + + for (j = 0; j < FREE_N_CHANNELS; j++) + INIT_LIST_HEAD(&krcp->krw_arr[i].bulk_head_free[j]); + } + + for (i = 0; i < FREE_N_CHANNELS; i++) + INIT_LIST_HEAD(&krcp->bulk_head[i]); + + INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor); + INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func); + krcp->initialized = true; + } + + kfree_rcu_shrinker = shrinker_alloc(0, "rcu-slab-kfree"); + if (!kfree_rcu_shrinker) { + pr_err("Failed to allocate kfree_rcu() shrinker!\n"); + return; + } + + kfree_rcu_shrinker->count_objects = kfree_rcu_shrink_count; + kfree_rcu_shrinker->scan_objects = kfree_rcu_shrink_scan; + + shrinker_register(kfree_rcu_shrinker); +} From patchwork Tue Dec 10 16:40:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Uladzislau Rezki (Sony)" X-Patchwork-Id: 13901751 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBF03E77182 for ; Tue, 10 Dec 2024 16:40:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 726F86B0269; Tue, 10 Dec 2024 11:40:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6B1596B026A; Tue, 10 Dec 2024 11:40:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3F2656B026B; Tue, 10 Dec 2024 11:40:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 029BF6B0269 for ; Tue, 10 Dec 2024 11:40:47 -0500 (EST) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id ABCF4140916 for ; Tue, 10 Dec 2024 16:40:47 +0000 (UTC) X-FDA: 82879612704.26.0133D13 Received: from mail-lf1-f49.google.com (mail-lf1-f49.google.com [209.85.167.49]) by imf24.hostedemail.com (Postfix) with ESMTP id 98FD918001D for ; Tue, 10 Dec 2024 16:40:42 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=YDyD7YBl; spf=pass (imf24.hostedemail.com: domain of urezki@gmail.com designates 209.85.167.49 as permitted sender) smtp.mailfrom=urezki@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1733848822; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=gZyu20V0IT5sFAUs0VabPR/e3lkWhuOMoGZgqAXATFM=; b=MA3tcVSFDtwKpRu11Djt9RwNmulKIPn0/jxb+PpSL12FR0L6zp83BoYH9RQt4/d73UETaq QmVcbkXXbNKv0Fd89EJhaPf+6Lv+u/EyVtlP2Zx5hycjUfmiGBwumVKOzvg53ZWb1+s0Zl 
From: "Uladzislau Rezki (Sony)" To: linux-mm@kvack.org, Andrew Morton , Vlastimil Babka Cc: RCU , LKML , Uladzislau Rezki , Oleksiy Avramchenko Subject: [RFC v1 5/5] mm/slab: Move kvfree_rcu() into SLAB Date: Tue, 10 Dec 2024 17:40:35 +0100 Message-Id: <20241210164035.3391747-6-urezki@gmail.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20241210164035.3391747-1-urezki@gmail.com> References: <20241210164035.3391747-1-urezki@gmail.com> MIME-Version: 1.0
A final move of the kvfree_rcu() functionality into the slab_common.c file: - Rename kfree_rcu_batch_init() to kvfree_rcu_init(); - Invoke kvfree_rcu_init() from main.c after rcu_init(); - Move the rest of the functionality to the slab_common.c file; - Fully remove kvfree_rcu() from the kernel/rcu/tree.c file; - Remove the temporary solution for handling the freeing of ptrs. after a GP; - Remove "__maybe_unused" from the slab_common.c file; - Do not export the main functionality for the CONFIG_TINY_RCU case. Signed-off-by: Uladzislau Rezki (Sony) --- include/linux/slab.h | 1 + init/main.c | 1 + kernel/rcu/tree.c | 893 +------------------------------------------ mm/slab_common.c | 256 +++++++++++-- 4 files changed, 225 insertions(+), 926 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index b35e2db7eb0e..8a2d006119f8 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -1076,5 +1076,6 @@ unsigned int kmem_cache_size(struct kmem_cache *s); size_t kmalloc_size_roundup(size_t size); void __init kmem_cache_init_late(void); +void __init kvfree_rcu_init(void); #endif /* _LINUX_SLAB_H */ diff --git a/init/main.c b/init/main.c index c4778edae797..27d177784f3a 100644 --- a/init/main.c +++ b/init/main.c @@ -995,6 +995,7 @@ void start_kernel(void) workqueue_init_early(); rcu_init(); + kvfree_rcu_init(); /* Trace events are available after this */ trace_init(); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index ab24229dfa73..4c9c16945e3a 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -186,26 +186,6 @@ static int rcu_unlock_delay; module_param(rcu_unlock_delay, int, 0444); #endif -/* - * This rcu parameter is runtime-read-only. It reflects - * a minimum allowed number of objects which can be cached - * per-CPU. Object size is equal to one page.
This value - * can be changed at boot time. - */ -static int rcu_min_cached_objs = 5; -module_param(rcu_min_cached_objs, int, 0444); - -// A page shrinker can ask for pages to be freed to make them -// available for other parts of the system. This usually happens -// under low memory conditions, and in that case we should also -// defer page-cache filling for a short time period. -// -// The default value is 5 seconds, which is long enough to reduce -// interference with the shrinker while it asks other systems to -// drain their caches. -static int rcu_delay_page_cache_fill_msec = 5000; -module_param(rcu_delay_page_cache_fill_msec, int, 0444); - /* Retrieve RCU kthreads priority for rcutorture */ int rcu_get_gp_kthreads_prio(void) { @@ -2559,19 +2539,13 @@ static void rcu_do_batch(struct rcu_data *rdp) debug_rcu_head_unqueue(rhp); rcu_lock_acquire(&rcu_callback_map); + trace_rcu_invoke_callback(rcu_state.name, rhp); f = rhp->func; + debug_rcu_head_callback(rhp); + WRITE_ONCE(rhp->func, (rcu_callback_t)0L); + f(rhp); - /* This is temporary, it will be removed when migration is over. */ - if (__is_kvfree_rcu_offset((unsigned long) f)) { - trace_rcu_invoke_kvfree_callback("", rhp, (unsigned long) f); - kvfree((void *) rhp - (unsigned long) f); - } else { - trace_rcu_invoke_callback(rcu_state.name, rhp); - debug_rcu_head_callback(rhp); - WRITE_ONCE(rhp->func, (rcu_callback_t)0L); - f(rhp); - } rcu_lock_release(&rcu_callback_map); /* @@ -3197,815 +3171,6 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func) } EXPORT_SYMBOL_GPL(call_rcu); -/* Maximum number of jiffies to wait before draining a batch. */ -#define KFREE_DRAIN_JIFFIES (5 * HZ) -#define KFREE_N_BATCHES 2 -#define FREE_N_CHANNELS 2 - -/** - * struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers - * @list: List node. All blocks are linked between each other - * @gp_snap: Snapshot of RCU state for objects placed to this bulk - * @nr_records: Number of active pointers in the array - * @records: Array of the kvfree_rcu() pointers - */ -struct kvfree_rcu_bulk_data { - struct list_head list; - struct rcu_gp_oldstate gp_snap; - unsigned long nr_records; - void *records[] __counted_by(nr_records); -}; - -/* - * This macro defines how many entries the "records" array - * will contain. It is based on the fact that the size of - * kvfree_rcu_bulk_data structure becomes exactly one page. - */ -#define KVFREE_BULK_MAX_ENTR \ - ((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *)) - -/** - * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests - * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period - * @head_free: List of kfree_rcu() objects waiting for a grace period - * @head_free_gp_snap: Grace-period snapshot to check for attempted premature frees. 
- * @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period - * @krcp: Pointer to @kfree_rcu_cpu structure - */ - -struct kfree_rcu_cpu_work { - struct rcu_work rcu_work; - struct rcu_head *head_free; - struct rcu_gp_oldstate head_free_gp_snap; - struct list_head bulk_head_free[FREE_N_CHANNELS]; - struct kfree_rcu_cpu *krcp; -}; - -/** - * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period - * @head: List of kfree_rcu() objects not yet waiting for a grace period - * @head_gp_snap: Snapshot of RCU state for objects placed to "@head" - * @bulk_head: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period - * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period - * @lock: Synchronize access to this structure - * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES - * @initialized: The @rcu_work fields have been initialized - * @head_count: Number of objects in rcu_head singular list - * @bulk_count: Number of objects in bulk-list - * @bkvcache: - * A simple cache list that contains objects for reuse purpose. - * In order to save some per-cpu space the list is singular. - * Even though it is lockless an access has to be protected by the - * per-cpu lock. - * @page_cache_work: A work to refill the cache when it is empty - * @backoff_page_cache_fill: Delay cache refills - * @work_in_progress: Indicates that page_cache_work is running - * @hrtimer: A hrtimer for scheduling a page_cache_work - * @nr_bkv_objs: number of allocated objects at @bkvcache. - * - * This is a per-CPU structure. The reason that it is not included in - * the rcu_data structure is to permit this code to be extracted from - * the RCU files. Such extraction could allow further optimization of - * the interactions with the slab allocators. - */ -struct kfree_rcu_cpu { - // Objects queued on a linked list - // through their rcu_head structures. - struct rcu_head *head; - unsigned long head_gp_snap; - atomic_t head_count; - - // Objects queued on a bulk-list. - struct list_head bulk_head[FREE_N_CHANNELS]; - atomic_t bulk_count[FREE_N_CHANNELS]; - - struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES]; - raw_spinlock_t lock; - struct delayed_work monitor_work; - bool initialized; - - struct delayed_work page_cache_work; - atomic_t backoff_page_cache_fill; - atomic_t work_in_progress; - struct hrtimer hrtimer; - - struct llist_head bkvcache; - int nr_bkv_objs; -}; - -static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = { - .lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock), -}; - -static __always_inline void -debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead) -{ -#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD - int i; - - for (i = 0; i < bhead->nr_records; i++) - debug_rcu_head_unqueue((struct rcu_head *)(bhead->records[i])); -#endif -} - -static inline struct kfree_rcu_cpu * -krc_this_cpu_lock(unsigned long *flags) -{ - struct kfree_rcu_cpu *krcp; - - local_irq_save(*flags); // For safely calling this_cpu_ptr(). 
- krcp = this_cpu_ptr(&krc); - raw_spin_lock(&krcp->lock); - - return krcp; -} - -static inline void -krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags) -{ - raw_spin_unlock_irqrestore(&krcp->lock, flags); -} - -static inline struct kvfree_rcu_bulk_data * -get_cached_bnode(struct kfree_rcu_cpu *krcp) -{ - if (!krcp->nr_bkv_objs) - return NULL; - - WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs - 1); - return (struct kvfree_rcu_bulk_data *) - llist_del_first(&krcp->bkvcache); -} - -static inline bool -put_cached_bnode(struct kfree_rcu_cpu *krcp, - struct kvfree_rcu_bulk_data *bnode) -{ - // Check the limit. - if (krcp->nr_bkv_objs >= rcu_min_cached_objs) - return false; - - llist_add((struct llist_node *) bnode, &krcp->bkvcache); - WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs + 1); - return true; -} - -static int -drain_page_cache(struct kfree_rcu_cpu *krcp) -{ - unsigned long flags; - struct llist_node *page_list, *pos, *n; - int freed = 0; - - if (!rcu_min_cached_objs) - return 0; - - raw_spin_lock_irqsave(&krcp->lock, flags); - page_list = llist_del_all(&krcp->bkvcache); - WRITE_ONCE(krcp->nr_bkv_objs, 0); - raw_spin_unlock_irqrestore(&krcp->lock, flags); - - llist_for_each_safe(pos, n, page_list) { - free_page((unsigned long)pos); - freed++; - } - - return freed; -} - -static void -kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp, - struct kvfree_rcu_bulk_data *bnode, int idx) -{ - unsigned long flags; - int i; - - if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) { - debug_rcu_bhead_unqueue(bnode); - rcu_lock_acquire(&rcu_callback_map); - if (idx == 0) { // kmalloc() / kfree(). - trace_rcu_invoke_kfree_bulk_callback( - rcu_state.name, bnode->nr_records, - bnode->records); - - kfree_bulk(bnode->nr_records, bnode->records); - } else { // vmalloc() / vfree(). - for (i = 0; i < bnode->nr_records; i++) { - trace_rcu_invoke_kvfree_callback( - rcu_state.name, bnode->records[i], 0); - - vfree(bnode->records[i]); - } - } - rcu_lock_release(&rcu_callback_map); - } - - raw_spin_lock_irqsave(&krcp->lock, flags); - if (put_cached_bnode(krcp, bnode)) - bnode = NULL; - raw_spin_unlock_irqrestore(&krcp->lock, flags); - - if (bnode) - free_page((unsigned long) bnode); - - cond_resched_tasks_rcu_qs(); -} - -static void -kvfree_rcu_list(struct rcu_head *head) -{ - struct rcu_head *next; - - for (; head; head = next) { - void *ptr = (void *) head->func; - unsigned long offset = (void *) head - ptr; - - next = head->next; - debug_rcu_head_unqueue((struct rcu_head *)ptr); - rcu_lock_acquire(&rcu_callback_map); - trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset); - - if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) - kvfree(ptr); - - rcu_lock_release(&rcu_callback_map); - cond_resched_tasks_rcu_qs(); - } -} - -/* - * This function is invoked in workqueue context after a grace period. - * It frees all the objects queued on ->bulk_head_free or ->head_free. - */ -static void kfree_rcu_work(struct work_struct *work) -{ - unsigned long flags; - struct kvfree_rcu_bulk_data *bnode, *n; - struct list_head bulk_head[FREE_N_CHANNELS]; - struct rcu_head *head; - struct kfree_rcu_cpu *krcp; - struct kfree_rcu_cpu_work *krwp; - struct rcu_gp_oldstate head_gp_snap; - int i; - - krwp = container_of(to_rcu_work(work), - struct kfree_rcu_cpu_work, rcu_work); - krcp = krwp->krcp; - - raw_spin_lock_irqsave(&krcp->lock, flags); - // Channels 1 and 2. - for (i = 0; i < FREE_N_CHANNELS; i++) - list_replace_init(&krwp->bulk_head_free[i], &bulk_head[i]); - - // Channel 3. 
- head = krwp->head_free; - krwp->head_free = NULL; - head_gp_snap = krwp->head_free_gp_snap; - raw_spin_unlock_irqrestore(&krcp->lock, flags); - - // Handle the first two channels. - for (i = 0; i < FREE_N_CHANNELS; i++) { - // Start from the tail page, so a GP is likely passed for it. - list_for_each_entry_safe(bnode, n, &bulk_head[i], list) - kvfree_rcu_bulk(krcp, bnode, i); - } - - /* - * This is used when the "bulk" path can not be used for the - * double-argument of kvfree_rcu(). This happens when the - * page-cache is empty, which means that objects are instead - * queued on a linked list through their rcu_head structures. - * This list is named "Channel 3". - */ - if (head && !WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&head_gp_snap))) - kvfree_rcu_list(head); -} - -static bool -need_offload_krc(struct kfree_rcu_cpu *krcp) -{ - int i; - - for (i = 0; i < FREE_N_CHANNELS; i++) - if (!list_empty(&krcp->bulk_head[i])) - return true; - - return !!READ_ONCE(krcp->head); -} - -static bool -need_wait_for_krwp_work(struct kfree_rcu_cpu_work *krwp) -{ - int i; - - for (i = 0; i < FREE_N_CHANNELS; i++) - if (!list_empty(&krwp->bulk_head_free[i])) - return true; - - return !!krwp->head_free; -} - -static int krc_count(struct kfree_rcu_cpu *krcp) -{ - int sum = atomic_read(&krcp->head_count); - int i; - - for (i = 0; i < FREE_N_CHANNELS; i++) - sum += atomic_read(&krcp->bulk_count[i]); - - return sum; -} - -static void -schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp) -{ - long delay, delay_left; - - delay = krc_count(krcp) >= KVFREE_BULK_MAX_ENTR ? 1:KFREE_DRAIN_JIFFIES; - if (delayed_work_pending(&krcp->monitor_work)) { - delay_left = krcp->monitor_work.timer.expires - jiffies; - if (delay < delay_left) - mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay); - return; - } - queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay); -} - -static void -kvfree_rcu_drain_ready(struct kfree_rcu_cpu *krcp) -{ - struct list_head bulk_ready[FREE_N_CHANNELS]; - struct kvfree_rcu_bulk_data *bnode, *n; - struct rcu_head *head_ready = NULL; - unsigned long flags; - int i; - - raw_spin_lock_irqsave(&krcp->lock, flags); - for (i = 0; i < FREE_N_CHANNELS; i++) { - INIT_LIST_HEAD(&bulk_ready[i]); - - list_for_each_entry_safe_reverse(bnode, n, &krcp->bulk_head[i], list) { - if (!poll_state_synchronize_rcu_full(&bnode->gp_snap)) - break; - - atomic_sub(bnode->nr_records, &krcp->bulk_count[i]); - list_move(&bnode->list, &bulk_ready[i]); - } - } - - if (krcp->head && poll_state_synchronize_rcu(krcp->head_gp_snap)) { - head_ready = krcp->head; - atomic_set(&krcp->head_count, 0); - WRITE_ONCE(krcp->head, NULL); - } - raw_spin_unlock_irqrestore(&krcp->lock, flags); - - for (i = 0; i < FREE_N_CHANNELS; i++) { - list_for_each_entry_safe(bnode, n, &bulk_ready[i], list) - kvfree_rcu_bulk(krcp, bnode, i); - } - - if (head_ready) - kvfree_rcu_list(head_ready); -} - -/* - * Return: %true if a work is queued, %false otherwise. - */ -static bool -kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp) -{ - unsigned long flags; - bool queued = false; - int i, j; - - raw_spin_lock_irqsave(&krcp->lock, flags); - - // Attempt to start a new batch. - for (i = 0; i < KFREE_N_BATCHES; i++) { - struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]); - - // Try to detach bulk_head or head and attach it, only when - // all channels are free. Any channel is not free means at krwp - // there is on-going rcu work to handle krwp's free business. 
- if (need_wait_for_krwp_work(krwp)) - continue; - - // kvfree_rcu_drain_ready() might handle this krcp, if so give up. - if (need_offload_krc(krcp)) { - // Channel 1 corresponds to the SLAB-pointer bulk path. - // Channel 2 corresponds to vmalloc-pointer bulk path. - for (j = 0; j < FREE_N_CHANNELS; j++) { - if (list_empty(&krwp->bulk_head_free[j])) { - atomic_set(&krcp->bulk_count[j], 0); - list_replace_init(&krcp->bulk_head[j], - &krwp->bulk_head_free[j]); - } - } - - // Channel 3 corresponds to both SLAB and vmalloc - // objects queued on the linked list. - if (!krwp->head_free) { - krwp->head_free = krcp->head; - get_state_synchronize_rcu_full(&krwp->head_free_gp_snap); - atomic_set(&krcp->head_count, 0); - WRITE_ONCE(krcp->head, NULL); - } - - // One work is per one batch, so there are three - // "free channels", the batch can handle. Break - // the loop since it is done with this CPU thus - // queuing an RCU work is _always_ success here. - queued = queue_rcu_work(system_unbound_wq, &krwp->rcu_work); - WARN_ON_ONCE(!queued); - break; - } - } - - raw_spin_unlock_irqrestore(&krcp->lock, flags); - return queued; -} - -/* - * This function is invoked after the KFREE_DRAIN_JIFFIES timeout. - */ -static void kfree_rcu_monitor(struct work_struct *work) -{ - struct kfree_rcu_cpu *krcp = container_of(work, - struct kfree_rcu_cpu, monitor_work.work); - - // Drain ready for reclaim. - kvfree_rcu_drain_ready(krcp); - - // Queue a batch for a rest. - kvfree_rcu_queue_batch(krcp); - - // If there is nothing to detach, it means that our job is - // successfully done here. In case of having at least one - // of the channels that is still busy we should rearm the - // work to repeat an attempt. Because previous batches are - // still in progress. - if (need_offload_krc(krcp)) - schedule_delayed_monitor_work(krcp); -} - -static enum hrtimer_restart -schedule_page_work_fn(struct hrtimer *t) -{ - struct kfree_rcu_cpu *krcp = - container_of(t, struct kfree_rcu_cpu, hrtimer); - - queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0); - return HRTIMER_NORESTART; -} - -static void fill_page_cache_func(struct work_struct *work) -{ - struct kvfree_rcu_bulk_data *bnode; - struct kfree_rcu_cpu *krcp = - container_of(work, struct kfree_rcu_cpu, - page_cache_work.work); - unsigned long flags; - int nr_pages; - bool pushed; - int i; - - nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ? - 1 : rcu_min_cached_objs; - - for (i = READ_ONCE(krcp->nr_bkv_objs); i < nr_pages; i++) { - bnode = (struct kvfree_rcu_bulk_data *) - __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); - - if (!bnode) - break; - - raw_spin_lock_irqsave(&krcp->lock, flags); - pushed = put_cached_bnode(krcp, bnode); - raw_spin_unlock_irqrestore(&krcp->lock, flags); - - if (!pushed) { - free_page((unsigned long) bnode); - break; - } - } - - atomic_set(&krcp->work_in_progress, 0); - atomic_set(&krcp->backoff_page_cache_fill, 0); -} - -static void -run_page_cache_worker(struct kfree_rcu_cpu *krcp) -{ - // If cache disabled, bail out. 
- if (!rcu_min_cached_objs) - return; - - if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING && - !atomic_xchg(&krcp->work_in_progress, 1)) { - if (atomic_read(&krcp->backoff_page_cache_fill)) { - queue_delayed_work(system_unbound_wq, - &krcp->page_cache_work, - msecs_to_jiffies(rcu_delay_page_cache_fill_msec)); - } else { - hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); - krcp->hrtimer.function = schedule_page_work_fn; - hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL); - } - } -} - -// Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock() -// state specified by flags. If can_alloc is true, the caller must -// be schedulable and not be holding any locks or mutexes that might be -// acquired by the memory allocator or anything that it might invoke. -// Returns true if ptr was successfully recorded, else the caller must -// use a fallback. -static inline bool -add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, - unsigned long *flags, void *ptr, bool can_alloc) -{ - struct kvfree_rcu_bulk_data *bnode; - int idx; - - *krcp = krc_this_cpu_lock(flags); - if (unlikely(!(*krcp)->initialized)) - return false; - - idx = !!is_vmalloc_addr(ptr); - bnode = list_first_entry_or_null(&(*krcp)->bulk_head[idx], - struct kvfree_rcu_bulk_data, list); - - /* Check if a new block is required. */ - if (!bnode || bnode->nr_records == KVFREE_BULK_MAX_ENTR) { - bnode = get_cached_bnode(*krcp); - if (!bnode && can_alloc) { - krc_this_cpu_unlock(*krcp, *flags); - - // __GFP_NORETRY - allows a light-weight direct reclaim - // what is OK from minimizing of fallback hitting point of - // view. Apart of that it forbids any OOM invoking what is - // also beneficial since we are about to release memory soon. - // - // __GFP_NOMEMALLOC - prevents from consuming of all the - // memory reserves. Please note we have a fallback path. - // - // __GFP_NOWARN - it is supposed that an allocation can - // be failed under low memory or high memory pressure - // scenarios. - bnode = (struct kvfree_rcu_bulk_data *) - __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); - raw_spin_lock_irqsave(&(*krcp)->lock, *flags); - } - - if (!bnode) - return false; - - // Initialize the new block and attach it. - bnode->nr_records = 0; - list_add(&bnode->list, &(*krcp)->bulk_head[idx]); - } - - // Finally insert and update the GP for this page. - bnode->nr_records++; - bnode->records[bnode->nr_records - 1] = ptr; - get_state_synchronize_rcu_full(&bnode->gp_snap); - atomic_inc(&(*krcp)->bulk_count[idx]); - - return true; -} - -/* - * Queue a request for lazy invocation of the appropriate free routine - * after a grace period. Please note that three paths are maintained, - * two for the common case using arrays of pointers and a third one that - * is used only when the main paths cannot be used, for example, due to - * memory pressure. - * - * Each kvfree_call_rcu() request is added to a batch. The batch will be drained - * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will - * be free'd in workqueue context. This allows us to: batch requests together to - * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load. - */ -void kvfree_call_rcu(struct rcu_head *head, void *ptr) -{ - unsigned long flags; - struct kfree_rcu_cpu *krcp; - bool success; - - if (head) { - call_rcu(head, (rcu_callback_t) ((void *) head - ptr)); - } else { - synchronize_rcu(); - kvfree(ptr); - } - - /* Disconnect the rest. 
*/ - return; - - /* - * Please note there is a limitation for the head-less - * variant, that is why there is a clear rule for such - * objects: it can be used from might_sleep() context - * only. For other places please embed an rcu_head to - * your data. - */ - if (!head) - might_sleep(); - - // Queue the object but don't yet schedule the batch. - if (debug_rcu_head_queue(ptr)) { - // Probable double kfree_rcu(), just leak. - WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n", - __func__, head); - - // Mark as success and leave. - return; - } - - kasan_record_aux_stack_noalloc(ptr); - success = add_ptr_to_bulk_krc_lock(&krcp, &flags, ptr, !head); - if (!success) { - run_page_cache_worker(krcp); - - if (head == NULL) - // Inline if kvfree_rcu(one_arg) call. - goto unlock_return; - - head->func = ptr; - head->next = krcp->head; - WRITE_ONCE(krcp->head, head); - atomic_inc(&krcp->head_count); - - // Take a snapshot for this krcp. - krcp->head_gp_snap = get_state_synchronize_rcu(); - success = true; - } - - /* - * The kvfree_rcu() caller considers the pointer freed at this point - * and likely removes any references to it. Since the actual slab - * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore - * this object (no scanning or false positives reporting). - */ - kmemleak_ignore(ptr); - - // Set timer to drain after KFREE_DRAIN_JIFFIES. - if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING) - schedule_delayed_monitor_work(krcp); - -unlock_return: - krc_this_cpu_unlock(krcp, flags); - - /* - * Inline kvfree() after synchronize_rcu(). We can do - * it from might_sleep() context only, so the current - * CPU can pass the QS state. - */ - if (!success) { - debug_rcu_head_unqueue((struct rcu_head *) ptr); - synchronize_rcu(); - kvfree(ptr); - } -} -EXPORT_SYMBOL_GPL(kvfree_call_rcu); - -/** - * kvfree_rcu_barrier - Wait until all in-flight kvfree_rcu() complete. - * - * Note that a single argument of kvfree_rcu() call has a slow path that - * triggers synchronize_rcu() following by freeing a pointer. It is done - * before the return from the function. Therefore for any single-argument - * call that will result in a kfree() to a cache that is to be destroyed - * during module exit, it is developer's responsibility to ensure that all - * such calls have returned before the call to kmem_cache_destroy(). - */ -void kvfree_rcu_barrier(void) -{ - struct kfree_rcu_cpu_work *krwp; - struct kfree_rcu_cpu *krcp; - bool queued; - int i, cpu; - - /* Temporary. */ - rcu_barrier(); - - /* - * Firstly we detach objects and queue them over an RCU-batch - * for all CPUs. Finally queued works are flushed for each CPU. - * - * Please note. If there are outstanding batches for a particular - * CPU, those have to be finished first following by queuing a new. - */ - for_each_possible_cpu(cpu) { - krcp = per_cpu_ptr(&krc, cpu); - - /* - * Check if this CPU has any objects which have been queued for a - * new GP completion. If not(means nothing to detach), we are done - * with it. If any batch is pending/running for this "krcp", below - * per-cpu flush_rcu_work() waits its completion(see last step). - */ - if (!need_offload_krc(krcp)) - continue; - - while (1) { - /* - * If we are not able to queue a new RCU work it means: - * - batches for this CPU are still in flight which should - * be flushed first and then repeat; - * - no objects to detach, because of concurrency. - */ - queued = kvfree_rcu_queue_batch(krcp); - - /* - * Bail out, if there is no need to offload this "krcp" - * anymore. 
As noted earlier it can run concurrently. - */ - if (queued || !need_offload_krc(krcp)) - break; - - /* There are ongoing batches. */ - for (i = 0; i < KFREE_N_BATCHES; i++) { - krwp = &(krcp->krw_arr[i]); - flush_rcu_work(&krwp->rcu_work); - } - } - } - - /* - * Now we guarantee that all objects are flushed. - */ - for_each_possible_cpu(cpu) { - krcp = per_cpu_ptr(&krc, cpu); - - /* - * A monitor work can drain ready to reclaim objects - * directly. Wait its completion if running or pending. - */ - cancel_delayed_work_sync(&krcp->monitor_work); - - for (i = 0; i < KFREE_N_BATCHES; i++) { - krwp = &(krcp->krw_arr[i]); - flush_rcu_work(&krwp->rcu_work); - } - } -} -EXPORT_SYMBOL_GPL(kvfree_rcu_barrier); - -static unsigned long -kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) -{ - int cpu; - unsigned long count = 0; - - /* Snapshot count of all CPUs */ - for_each_possible_cpu(cpu) { - struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); - - count += krc_count(krcp); - count += READ_ONCE(krcp->nr_bkv_objs); - atomic_set(&krcp->backoff_page_cache_fill, 1); - } - - return count == 0 ? SHRINK_EMPTY : count; -} - -static unsigned long -kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) -{ - int cpu, freed = 0; - - for_each_possible_cpu(cpu) { - int count; - struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); - - count = krc_count(krcp); - count += drain_page_cache(krcp); - kfree_rcu_monitor(&krcp->monitor_work.work); - - sc->nr_to_scan -= count; - freed += count; - - if (sc->nr_to_scan <= 0) - break; - } - - return freed == 0 ? SHRINK_STOP : freed; -} - -void __init kfree_rcu_scheduler_running(void) -{ - int cpu; - - for_each_possible_cpu(cpu) { - struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); - - if (need_offload_krc(krcp)) - schedule_delayed_monitor_work(krcp); - } -} - /* * During early boot, any blocking grace-period wait automatically * implies a grace period. @@ -5665,62 +4830,12 @@ static void __init rcu_dump_rcu_node_tree(void) struct workqueue_struct *rcu_gp_wq; -static void __init kfree_rcu_batch_init(void) -{ - int cpu; - int i, j; - struct shrinker *kfree_rcu_shrinker; - - /* Clamp it to [0:100] seconds interval. 
*/ - if (rcu_delay_page_cache_fill_msec < 0 || - rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) { - - rcu_delay_page_cache_fill_msec = - clamp(rcu_delay_page_cache_fill_msec, 0, - (int) (100 * MSEC_PER_SEC)); - - pr_info("Adjusting rcutree.rcu_delay_page_cache_fill_msec to %d ms.\n", - rcu_delay_page_cache_fill_msec); - } - - for_each_possible_cpu(cpu) { - struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); - - for (i = 0; i < KFREE_N_BATCHES; i++) { - INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work); - krcp->krw_arr[i].krcp = krcp; - - for (j = 0; j < FREE_N_CHANNELS; j++) - INIT_LIST_HEAD(&krcp->krw_arr[i].bulk_head_free[j]); - } - - for (i = 0; i < FREE_N_CHANNELS; i++) - INIT_LIST_HEAD(&krcp->bulk_head[i]); - - INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor); - INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func); - krcp->initialized = true; - } - - kfree_rcu_shrinker = shrinker_alloc(0, "rcu-kfree"); - if (!kfree_rcu_shrinker) { - pr_err("Failed to allocate kfree_rcu() shrinker!\n"); - return; - } - - kfree_rcu_shrinker->count_objects = kfree_rcu_shrink_count; - kfree_rcu_shrinker->scan_objects = kfree_rcu_shrink_scan; - - shrinker_register(kfree_rcu_shrinker); -} - void __init rcu_init(void) { int cpu = smp_processor_id(); rcu_early_boot_tests(); - kfree_rcu_batch_init(); rcu_bootup_announce(); sanitize_kthread_prio(); rcu_init_geometry(); diff --git a/mm/slab_common.c b/mm/slab_common.c index cffc96bd279a..39de00e2cf88 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1513,7 +1513,7 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp, return true; } -static int __maybe_unused +static int drain_page_cache(struct kfree_rcu_cpu *krcp) { unsigned long flags; @@ -1600,7 +1600,7 @@ kvfree_rcu_list(struct rcu_head *head) * This function is invoked in workqueue context after a grace period. * It frees all the objects queued on ->bulk_head_free or ->head_free. */ -static void __maybe_unused +static void kfree_rcu_work(struct work_struct *work) { unsigned long flags; @@ -1793,7 +1793,7 @@ kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp) /* * This function is invoked after the KFREE_DRAIN_JIFFIES timeout. */ -static void __maybe_unused +static void kfree_rcu_monitor(struct work_struct *work) { struct kfree_rcu_cpu *krcp = container_of(work, @@ -1814,17 +1814,7 @@ kfree_rcu_monitor(struct work_struct *work) schedule_delayed_monitor_work(krcp); } -static enum hrtimer_restart -schedule_page_work_fn(struct hrtimer *t) -{ - struct kfree_rcu_cpu *krcp = - container_of(t, struct kfree_rcu_cpu, hrtimer); - - queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0); - return HRTIMER_NORESTART; -} - -static void __maybe_unused +static void fill_page_cache_func(struct work_struct *work) { struct kvfree_rcu_bulk_data *bnode; @@ -1860,27 +1850,6 @@ fill_page_cache_func(struct work_struct *work) atomic_set(&krcp->backoff_page_cache_fill, 0); } -static void __maybe_unused -run_page_cache_worker(struct kfree_rcu_cpu *krcp) -{ - // If cache disabled, bail out. 
- if (!rcu_min_cached_objs) - return; - - if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING && - !atomic_xchg(&krcp->work_in_progress, 1)) { - if (atomic_read(&krcp->backoff_page_cache_fill)) { - queue_delayed_work(system_unbound_wq, - &krcp->page_cache_work, - msecs_to_jiffies(rcu_delay_page_cache_fill_msec)); - } else { - hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); - krcp->hrtimer.function = schedule_page_work_fn; - hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL); - } - } -} - // Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock() // state specified by flags. If can_alloc is true, the caller must // be schedulable and not be holding any locks or mutexes that might be @@ -1941,6 +1910,219 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp, return true; } +#if !defined(CONFIG_TINY_RCU) + +static enum hrtimer_restart +schedule_page_work_fn(struct hrtimer *t) +{ + struct kfree_rcu_cpu *krcp = + container_of(t, struct kfree_rcu_cpu, hrtimer); + + queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0); + return HRTIMER_NORESTART; +} + +static void +run_page_cache_worker(struct kfree_rcu_cpu *krcp) +{ + // If cache disabled, bail out. + if (!rcu_min_cached_objs) + return; + + if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING && + !atomic_xchg(&krcp->work_in_progress, 1)) { + if (atomic_read(&krcp->backoff_page_cache_fill)) { + queue_delayed_work(system_unbound_wq, + &krcp->page_cache_work, + msecs_to_jiffies(rcu_delay_page_cache_fill_msec)); + } else { + hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + krcp->hrtimer.function = schedule_page_work_fn; + hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL); + } + } +} + +/* + * Queue a request for lazy invocation of the appropriate free routine + * after a grace period. Please note that three paths are maintained, + * two for the common case using arrays of pointers and a third one that + * is used only when the main paths cannot be used, for example, due to + * memory pressure. + * + * Each kvfree_call_rcu() request is added to a batch. The batch will be drained + * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will + * be free'd in workqueue context. This allows us to: batch requests together to + * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load. + */ +void kvfree_call_rcu(struct rcu_head *head, void *ptr) +{ + unsigned long flags; + struct kfree_rcu_cpu *krcp; + bool success; + + /* + * Please note there is a limitation for the head-less + * variant, that is why there is a clear rule for such + * objects: it can be used from might_sleep() context + * only. For other places please embed an rcu_head to + * your data. + */ + if (!head) + might_sleep(); + + // Queue the object but don't yet schedule the batch. + if (debug_rcu_head_queue(ptr)) { + // Probable double kfree_rcu(), just leak. + WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n", + __func__, head); + + // Mark as success and leave. + return; + } + + kasan_record_aux_stack_noalloc(ptr); + success = add_ptr_to_bulk_krc_lock(&krcp, &flags, ptr, !head); + if (!success) { + run_page_cache_worker(krcp); + + if (head == NULL) + // Inline if kvfree_rcu(one_arg) call. + goto unlock_return; + + head->func = ptr; + head->next = krcp->head; + WRITE_ONCE(krcp->head, head); + atomic_inc(&krcp->head_count); + + // Take a snapshot for this krcp. 
+ krcp->head_gp_snap = get_state_synchronize_rcu(); + success = true; + } + + /* + * The kvfree_rcu() caller considers the pointer freed at this point + * and likely removes any references to it. Since the actual slab + * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore + * this object (no scanning or false positives reporting). + */ + kmemleak_ignore(ptr); + + // Set timer to drain after KFREE_DRAIN_JIFFIES. + if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING) + schedule_delayed_monitor_work(krcp); + +unlock_return: + krc_this_cpu_unlock(krcp, flags); + + /* + * Inline kvfree() after synchronize_rcu(). We can do + * it from might_sleep() context only, so the current + * CPU can pass the QS state. + */ + if (!success) { + debug_rcu_head_unqueue((struct rcu_head *) ptr); + synchronize_rcu(); + kvfree(ptr); + } +} +EXPORT_SYMBOL_GPL(kvfree_call_rcu); + +void __init +kfree_rcu_scheduler_running(void) +{ + int cpu; + + for_each_possible_cpu(cpu) { + struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu); + + if (need_offload_krc(krcp)) + schedule_delayed_monitor_work(krcp); + } +} + +/** + * kvfree_rcu_barrier - Wait until all in-flight kvfree_rcu() complete. + * + * Note that a single argument of kvfree_rcu() call has a slow path that + * triggers synchronize_rcu() following by freeing a pointer. It is done + * before the return from the function. Therefore for any single-argument + * call that will result in a kfree() to a cache that is to be destroyed + * during module exit, it is developer's responsibility to ensure that all + * such calls have returned before the call to kmem_cache_destroy(). + */ +void kvfree_rcu_barrier(void) +{ + struct kfree_rcu_cpu_work *krwp; + struct kfree_rcu_cpu *krcp; + bool queued; + int i, cpu; + + /* + * Firstly we detach objects and queue them over an RCU-batch + * for all CPUs. Finally queued works are flushed for each CPU. + * + * Please note. If there are outstanding batches for a particular + * CPU, those have to be finished first following by queuing a new. + */ + for_each_possible_cpu(cpu) { + krcp = per_cpu_ptr(&krc, cpu); + + /* + * Check if this CPU has any objects which have been queued for a + * new GP completion. If not(means nothing to detach), we are done + * with it. If any batch is pending/running for this "krcp", below + * per-cpu flush_rcu_work() waits its completion(see last step). + */ + if (!need_offload_krc(krcp)) + continue; + + while (1) { + /* + * If we are not able to queue a new RCU work it means: + * - batches for this CPU are still in flight which should + * be flushed first and then repeat; + * - no objects to detach, because of concurrency. + */ + queued = kvfree_rcu_queue_batch(krcp); + + /* + * Bail out, if there is no need to offload this "krcp" + * anymore. As noted earlier it can run concurrently. + */ + if (queued || !need_offload_krc(krcp)) + break; + + /* There are ongoing batches. */ + for (i = 0; i < KFREE_N_BATCHES; i++) { + krwp = &(krcp->krw_arr[i]); + flush_rcu_work(&krwp->rcu_work); + } + } + } + + /* + * Now we guarantee that all objects are flushed. + */ + for_each_possible_cpu(cpu) { + krcp = per_cpu_ptr(&krc, cpu); + + /* + * A monitor work can drain ready to reclaim objects + * directly. Wait its completion if running or pending. 
+ */ + cancel_delayed_work_sync(&krcp->monitor_work); + + for (i = 0; i < KFREE_N_BATCHES; i++) { + krwp = &(krcp->krw_arr[i]); + flush_rcu_work(&krwp->rcu_work); + } + } +} +EXPORT_SYMBOL_GPL(kvfree_rcu_barrier); + +#endif /* #if !defined(CONFIG_TINY_RCU) */ + static unsigned long kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) { @@ -1982,8 +2164,8 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) return freed == 0 ? SHRINK_STOP : freed; } -static void __init __maybe_unused -kfree_rcu_batch_init(void) +void __init +kvfree_rcu_init(void) { int cpu; int i, j;
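For context on the API whose implementation the diff above relocates, here is a minimal, hypothetical sketch of how a caller is expected to use it. The names struct foo, release_foo() and release_foo_headless() are invented for illustration only; the sketch assumes the standard two-argument kvfree_rcu() macro with an embedded rcu_head, plus the single-pointer kvfree_rcu_mightsleep() variant that corresponds to the head-less path described in the kvfree_call_rcu() comments above. It is not code from this series.

#include <linux/slab.h>
#include <linux/rcupdate.h>

/* Hypothetical object; only the embedded rcu_head matters for the example. */
struct foo {
	int data;
	struct rcu_head rcu;
};

/*
 * Two-argument form: the object is queued on the per-CPU batch maintained
 * by kvfree_call_rcu()/kfree_rcu_monitor(), and the call returns
 * immediately. The actual kvfree() runs from workqueue context after a
 * grace period has elapsed.
 */
static void release_foo(struct foo *fp)
{
	kvfree_rcu(fp, rcu);
}

/*
 * Head-less form: no rcu_head is embedded, so the fallback path may end up
 * doing synchronize_rcu() followed by kvfree(), which is why this variant
 * may only be used from a context that is allowed to sleep.
 */
static void release_foo_headless(void *p)
{
	kvfree_rcu_mightsleep(p);
}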