From patchwork Tue Nov 19 15:34:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880151 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 901131CF7DB for ; Tue, 19 Nov 2024 15:36:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030582; cv=none; b=PvORG7NVhDRa979DeDYZELnvMz2oZXAXLBvd0U+gjk5mwb23MfL19jffYAKWXukDBmGA0gYPzjYSbnPG/VNHGA9IPf4jBw2TyyyYR6F6+u2Lu2H1xBOFn50GcZje3hTqRtecsDomAU7/UOVXQ4jmgSqal7IOSrkRp5rplZxz4Ck= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030582; c=relaxed/simple; bh=//juTrReR5RY2EXL4SOSTJ2NFApUk9waFhwjwwTnX5s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UiCFs40V5oV1gga0pcJeAcfLevah7RaYOBtUEs0kUi3Sx8i0U5cYKvY7BeEff6HCdCinR9osVFg7a/3Ibk2mvEg/IExX9zi08ExKghqWe8hc6XSq7ICaZB6hbwtBPy1XBQI7YtcQEdy9ZyOLdKJnNkbMn8c2Y4VW/a2+3TVLYOQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=B9NxGyMv; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="B9NxGyMv" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030579; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0gaKqKMKlmwETWNmgCjaiOWIlMUIKbL1+CgbCwZ0tnA=; b=B9NxGyMvPu5laG8kmc8nos1ko2+aOBw54wa1PDN+1NFaMku1IBdJ19hGgziMZnWPrjbtut 6w6JFsX/tlWEuJcZU+6O9ufmRiB626TljiqZmsZicbTLwn5Vt/QqgaByiT5xNYP8/JwAUj vrl64a6+pO79TMBzWhBuGyYe+6zj9Mw= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-533-9WS2Ju2dPyC3J89ZBnjLFA-1; Tue, 19 Nov 2024 10:36:15 -0500 X-MC-Unique: 9WS2Ju2dPyC3J89ZBnjLFA-1 X-Mimecast-MFC-AGG-ID: 9WS2Ju2dPyC3J89ZBnjLFA Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8A8FD1955F2B; Tue, 19 Nov 2024 15:36:07 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 70AC63003C80; Tue, 19 Nov 2024 15:35:51 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 01/15] objtool: Make validate_call() recognize indirect calls to pv_ops[] Date: Tue, 19 Nov 2024 16:34:48 +0100 Message-ID: <20241119153502.41361-2-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 call_dest_name() does not get passed the file pointer of validate_call(), which means its invocation of insn_reloc() will always return NULL. Make it take a file pointer. While at it, make sure call_dest_name() uses arch_dest_reloc_offset(), otherwise it gets the pv_ops[] offset wrong. Fabricating an intentional warning shows the change; previously: vmlinux.o: warning: objtool: __flush_tlb_all_noinstr+0x4: call to {dynamic}() leaves .noinstr.text section now: vmlinux.o: warning: objtool: __flush_tlb_all_noinstr+0x4: call to pv_ops[1]() leaves .noinstr.text section Signed-off-by: Valentin Schneider --- tools/objtool/check.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 6604f5d038aad..5f1d0f95fc04b 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -3448,7 +3448,7 @@ static inline bool func_uaccess_safe(struct symbol *func) return false; } -static inline const char *call_dest_name(struct instruction *insn) +static inline const char *call_dest_name(struct objtool_file *file, struct instruction *insn) { static char pvname[19]; struct reloc *reloc; @@ -3457,9 +3457,9 @@ static inline const char *call_dest_name(struct instruction *insn) if (insn_call_dest(insn)) return insn_call_dest(insn)->name; - reloc = insn_reloc(NULL, insn); + reloc = insn_reloc(file, insn); if (reloc && !strcmp(reloc->sym->name, "pv_ops")) { - idx = (reloc_addend(reloc) / sizeof(void *)); + idx = (arch_dest_reloc_offset(reloc_addend(reloc)) / sizeof(void *)); snprintf(pvname, sizeof(pvname), "pv_ops[%d]", idx); return pvname; } @@ -3538,17 +3538,20 @@ static int validate_call(struct objtool_file *file, { if (state->noinstr && state->instr <= 0 && !noinstr_call_dest(file, insn, insn_call_dest(insn))) { - WARN_INSN(insn, "call to %s() leaves .noinstr.text section", call_dest_name(insn)); + WARN_INSN(insn, "call to %s() leaves .noinstr.text section", + call_dest_name(file, insn)); return 1; } if (state->uaccess && !func_uaccess_safe(insn_call_dest(insn))) { - WARN_INSN(insn, "call to %s() with UACCESS enabled", call_dest_name(insn)); + WARN_INSN(insn, "call to %s() with UACCESS enabled", + call_dest_name(file, insn)); return 1; } if (state->df) { - WARN_INSN(insn, "call to %s() with DF set", call_dest_name(insn)); + WARN_INSN(insn, "call to %s() with DF set", + call_dest_name(file, insn)); return 1; } From patchwork Tue Nov 19 15:34:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880152 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 895301CC161 for ; Tue, 19 Nov 2024 15:36:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030603; cv=none; b=Du5H/jrOGi24BwNmq4D/oLnkzuVczE4muxz7bpzz3AYEB6MsWBG1/ZcrGHfTON3DX+7xscbi4KpRZZruGI6tLGkNKgTdNXRJmawF+GpjslYB/lW1thdWYTu29A9aPGfVZDIynhFrq2pOZAxsnrl7yTgcIcrok8Ub4+13ppapdOc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030603; c=relaxed/simple; bh=6INCw6UgC6cf6Vi+4TMlw8zTjMf5rZHOHFvuIHHprHg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ns/2N1j10WS3zxP4s8tVrmKWEqA6NzvBaFULT2aQR7RiQ0h0dhbH/K8mEtBLoHvSjN2s95bKwGTo53vMcJBUkFbTVVUQu95dRVaU0mxjGCLQjHdp6KEr93Mv0/fIt+7o+Vd5wJbNMeFjFlK+6+n3YY0y9rz45BCgNkXT1NiPQek= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=icBZySGT; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="icBZySGT" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030600; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XIn2epAo4ei/z52bV+OQ9JOU7pzaMsMZcOjGajJfL0A=; b=icBZySGTRPZc2I4JP+m8YNd4oDU1365AzdmKJIzkyymUcm3miXMYNV37s5mIPxmsByogUL Q1s4ib+vWjdABWd5UWrXEml2fF0SWd2bcMjiVxjYySJkfqpgoL5ovMEWqxO41QdaB8AtX7 ixs1h4u2PNAwArDa9Rosq6W7fli9W0o= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-182-C9wx2u1EM12YXIqHhclsig-1; Tue, 19 Nov 2024 10:36:36 -0500 X-MC-Unique: C9wx2u1EM12YXIqHhclsig-1 X-Mimecast-MFC-AGG-ID: C9wx2u1EM12YXIqHhclsig Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 0A1A61956046; Tue, 19 Nov 2024 15:36:24 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 06AD830001A0; Tue, 19 Nov 2024 15:36:07 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 02/15] objtool: Flesh out warning related to pv_ops[] calls Date: Tue, 19 Nov 2024 16:34:49 +0100 Message-ID: <20241119153502.41361-3-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 I had to look into objtool itself to understand what this warning was about; make it more explicit. Signed-off-by: Valentin Schneider --- tools/objtool/check.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 5f1d0f95fc04b..00e25492f5065 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -3486,7 +3486,7 @@ static bool pv_call_dest(struct objtool_file *file, struct instruction *insn) list_for_each_entry(target, &file->pv_ops[idx].targets, pv_target) { if (!target->sec->noinstr) { - WARN("pv_ops[%d]: %s", idx, target->name); + WARN("pv_ops[%d]: indirect call to %s() leaves .noinstr.text section", idx, target->name); file->pv_ops[idx].clean = false; } } From patchwork Tue Nov 19 15:34:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880153 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8725A1D0426 for ; Tue, 19 Nov 2024 15:36:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030613; cv=none; b=SaFxthL+ZEfwadzlj8ad6vh6wS1xlPvDdD5TvcmFsMzfaYAT2ijiRXPKk3Myhy/2RtTJMzUqRiAvbMNX0TA2Zv8HYNbVn7P/FOZDrwWkx8dx3aWNslLxrOqAB568LlwoG4TRPfojRs31+MLmxRBQCxj+/lwkQvk9G6V+a4Qj0Wg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030613; c=relaxed/simple; bh=f+M7JVmuzBY3THk0fyjhKmyngjuTaSG33xgNtD22Ej8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=J+XbWcMc0eUly2ooD+jKrSwpBef6yTKt7rV5S7WWPQNuhQgytVAicOGVqAltM3AYNdRQsez4n9qCVhdmXO/2WXZuhBEncAz/uAZGwY165KImCp12wQztjTYbZ/TK9tZ1xxaTviGqaMjO61EaNw9LWecWkC3w5YQYa8p67xA0rho= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=hHcMIhx3; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="hHcMIhx3" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030610; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vzSk9ZzswXx0ijJl+1BL1pcjLYKU11Wq/W2Kyfj2Pb8=; b=hHcMIhx3Q3dLUnVD29roCrjOtMcA+fEjzuBNuNsEhUX+fw4APhFGvWhEsGK4dBd+CiR1zz JGkhmVEpQOoY5u9Im+Fmnx727BjqySlwxKXBNfttu7TsoMrmmNTCXQmCcbxlL+XDyvzcNt 31IxqdXS8vWkScgNypTiFJCnxM9U9kc= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-301-mg-v22g6NKyICZb_as9TEg-1; Tue, 19 Nov 2024 10:36:46 -0500 X-MC-Unique: mg-v22g6NKyICZb_as9TEg-1 X-Mimecast-MFC-AGG-ID: mg-v22g6NKyICZb_as9TEg Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 7B5951955F34; Tue, 19 Nov 2024 15:36:40 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 77F8D3003B71; Tue, 19 Nov 2024 15:36:24 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 03/15] sched/clock: Make sched_clock_running __ro_after_init Date: Tue, 19 Nov 2024 16:34:50 +0100 Message-ID: <20241119153502.41361-4-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 sched_clock_running is only ever enabled in the __init functions sched_clock_init() and sched_clock_init_late(), and is never disabled. Mark it __ro_after_init. Signed-off-by: Valentin Schneider --- kernel/sched/clock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c index a09655b481402..200e5568b9894 100644 --- a/kernel/sched/clock.c +++ b/kernel/sched/clock.c @@ -66,7 +66,7 @@ notrace unsigned long long __weak sched_clock(void) } EXPORT_SYMBOL_GPL(sched_clock); -static DEFINE_STATIC_KEY_FALSE(sched_clock_running); +static DEFINE_STATIC_KEY_FALSE_RO(sched_clock_running); #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK /* From patchwork Tue Nov 19 15:34:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880154 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 99A031D1E72 for ; Tue, 19 Nov 2024 15:37:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030631; cv=none; b=rqQ/pEPqDuEdYK/TIE0u3A+2fSvzJtWe+Rbl2BqWxJsAklNC73T9Dytmytilzx/BJJnD+nRHz13WIG0R5RxS5NxJk7Y1KhiwtEps6e1ieWsrn2KnKfAYhhh5OJwQbB292ViuuDasH+WjCjebfwTepSXlom916F5rq//pJra4VaA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030631; c=relaxed/simple; bh=bTnrUBgUrau8YQv+XQ05EAUivqZb8FcVO7sSJMvjJKI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ayCW4IvKEN5FQbuox8HFBh+lpRGG0P0SXAxyO3VPfaCSYGUGaHknVYLutDKv4xqbPLhO/88MhZYIVjYexZlVZFMPYpIBnS6MLp8LnyJxcPAEO+DFKCB47AiBzFeFnMpJTDt4FWnIDS31LS1fbdf3R8iuD75qrClUCVZ+sSf700I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=hw1vnOoG; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="hw1vnOoG" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030627; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9/zjMAAzjZ63CvtK65oto4Glzya8ix3n4q04/5M16qM=; b=hw1vnOoGvihMnz5J0IjLUXQNS+DglJmSwpysWTGyYdip7IYJGFoqRJkQQ3H+Tppm9HYil4 tJU/uFtitqo03rbyfWv2vzIt6V4PF8kVXp650EfpvanJnRpNOlMq4eXtA/3gXMl9qHWdH3 5kGjaLlCj3NNZ5RAGo8uWxQzoN3ncxs= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-668-nbrc9rTzMP-Bp1uCFPcarg-1; Tue, 19 Nov 2024 10:37:04 -0500 X-MC-Unique: nbrc9rTzMP-Bp1uCFPcarg-1 X-Mimecast-MFC-AGG-ID: nbrc9rTzMP-Bp1uCFPcarg Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id C4EB41952D0D; Tue, 19 Nov 2024 15:36:56 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id EC7F330001A2; Tue, 19 Nov 2024 15:36:40 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: "Paul E . McKenney" , Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 04/15] rcu: Add a small-width RCU watching counter debug option Date: Tue, 19 Nov 2024 16:34:51 +0100 Message-ID: <20241119153502.41361-5-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 A later commit will reduce the size of the RCU watching counter to free up some bits for another purpose. Paul suggested adding a config option to test the extreme case where the counter is reduced to its minimum usable width for rcutorture to poke at, so do that. Make it only configurable under RCU_EXPERT. While at it, add a comment to explain the layout of context_tracking->state. Link: http://lore.kernel.org/r/4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop Suggested-by: Paul E. McKenney Signed-off-by: Valentin Schneider --- include/linux/context_tracking_state.h | 44 ++++++++++++++++++++++---- kernel/rcu/Kconfig.debug | 14 ++++++++ 2 files changed, 51 insertions(+), 7 deletions(-) diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index 7b8433d5a8efe..0b81248aa03e2 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -18,12 +18,6 @@ enum ctx_state { CT_STATE_MAX = 4, }; -/* Odd value for watching, else even. */ -#define CT_RCU_WATCHING CT_STATE_MAX - -#define CT_STATE_MASK (CT_STATE_MAX - 1) -#define CT_RCU_WATCHING_MASK (~CT_STATE_MASK) - struct context_tracking { #ifdef CONFIG_CONTEXT_TRACKING_USER /* @@ -44,9 +38,45 @@ struct context_tracking { #endif }; +/* + * We cram two different things within the same atomic variable: + * + * CT_RCU_WATCHING_START CT_STATE_START + * | | + * v v + * MSB [ RCU watching counter ][ context_state ] LSB + * ^ ^ + * | | + * CT_RCU_WATCHING_END CT_STATE_END + * + * Bits are used from the LSB upwards, so unused bits (if any) will always be in + * upper bits of the variable. + */ #ifdef CONFIG_CONTEXT_TRACKING +#define CT_SIZE (sizeof(((struct context_tracking *)0)->state) * BITS_PER_BYTE) + +#define CT_STATE_WIDTH bits_per(CT_STATE_MAX - 1) +#define CT_STATE_START 0 +#define CT_STATE_END (CT_STATE_START + CT_STATE_WIDTH - 1) + +#define CT_RCU_WATCHING_MAX_WIDTH (CT_SIZE - CT_STATE_WIDTH) +#define CT_RCU_WATCHING_WIDTH (IS_ENABLED(CONFIG_RCU_DYNTICKS_TORTURE) ? 2 : CT_RCU_WATCHING_MAX_WIDTH) +#define CT_RCU_WATCHING_START (CT_STATE_END + 1) +#define CT_RCU_WATCHING_END (CT_RCU_WATCHING_START + CT_RCU_WATCHING_WIDTH - 1) +#define CT_RCU_WATCHING BIT(CT_RCU_WATCHING_START) + +#define CT_STATE_MASK GENMASK(CT_STATE_END, CT_STATE_START) +#define CT_RCU_WATCHING_MASK GENMASK(CT_RCU_WATCHING_END, CT_RCU_WATCHING_START) + +#define CT_UNUSED_WIDTH (CT_RCU_WATCHING_MAX_WIDTH - CT_RCU_WATCHING_WIDTH) + +static_assert(CT_STATE_WIDTH + + CT_RCU_WATCHING_WIDTH + + CT_UNUSED_WIDTH == + CT_SIZE); + DECLARE_PER_CPU(struct context_tracking, context_tracking); -#endif +#endif /* CONFIG_CONTEXT_TRACKING */ #ifdef CONFIG_CONTEXT_TRACKING_USER static __always_inline int __ct_state(void) diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug index 9b0b52e1836fa..8dc505d841f8d 100644 --- a/kernel/rcu/Kconfig.debug +++ b/kernel/rcu/Kconfig.debug @@ -168,4 +168,18 @@ config RCU_STRICT_GRACE_PERIOD when looking for certain types of RCU usage bugs, for example, too-short RCU read-side critical sections. + +config RCU_DYNTICKS_TORTURE + bool "Minimize RCU dynticks counter size" + depends on RCU_EXPERT + default n + help + This option controls the width of the dynticks counter. + + Lower values will make overflows more frequent, which will increase + the likelihood of extending grace-periods. This option sets the width + to its minimum usable value. + + This has no value for production and is only for testing. + endmenu # "RCU Debugging" From patchwork Tue Nov 19 15:34:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880155 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6FBF51D0DF2 for ; Tue, 19 Nov 2024 15:37:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030643; cv=none; b=djGZhAx59IHqEX2bc0zUm7xy1ilQ+dKfY4G2e7wLKJtEkoa2AUEWm+arK+fVsplWBpyeYhobq+6Vf4a7QbIOYBbV3ItmY6zPZ1/Ki4dk8yKCE/pGlrjszO0ixA5P79IJwBTspl107vccqtvajAW8iNmlsS3JthMzaV+5hs13yPo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030643; c=relaxed/simple; bh=Qe5AsYkNVOEHTNs/w1sH/iiGUncnosxKlZrjavgQTWc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=iA1FygSbzogwLz74W73pDYvFgihSC0BZ4fw4cl3hr4mcM5MXhVC+Ok1CObTp6DgBF03+7SrnyHFvsqTFPHzc+JqNckt0J31crrtAznXQqOOc0G66jlGbCOxnsaly9hH+WXHyDpBwmAPFGNOTZSO3bIE8VghFsiu17u86A3/DttE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=BisaG8D+; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="BisaG8D+" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030640; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VTrWPsFjEj6kjl2SR0Q+Shqo2M3AIl3ViwWNRJPCXGk=; b=BisaG8D+qJ4qzomWh3FIvOEc8gBYwapgMa119jf2hrfTTeTsrIWhTTHS4sSEbQj4BGABHz wWhhv8aFGn4YT9izvAw9goQTNqwORSOb5u+mnbfU+yyrCHXY01dwtI05FQ0D5NAuyqLf2x GdwCbLWBqcbGG7CPFMmtuRrUdzIgZXQ= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-240-1AlVl6ygNXGoWGme969LHQ-1; Tue, 19 Nov 2024 10:37:17 -0500 X-MC-Unique: 1AlVl6ygNXGoWGme969LHQ-1 X-Mimecast-MFC-AGG-ID: 1AlVl6ygNXGoWGme969LHQ Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 188B01955DC3; Tue, 19 Nov 2024 15:37:12 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 3D90930001A2; Tue, 19 Nov 2024 15:36:57 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: "Paul E . McKenney" , Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 05/15] rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE Date: Tue, 19 Nov 2024 16:34:52 +0100 Message-ID: <20241119153502.41361-6-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 We now have an RCU_EXPERT config for testing small-sized RCU dynticks counter: CONFIG_RCU_DYNTICKS_TORTURE. Modify scenario TREE04 to exercise to use this config in order to test a ridiculously small counter (2 bits). Link: http://lore.kernel.org/r/4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop Suggested-by: Paul E. McKenney Signed-off-by: Valentin Schneider --- tools/testing/selftests/rcutorture/configs/rcu/TREE04 | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE04 b/tools/testing/selftests/rcutorture/configs/rcu/TREE04 index dc4985064b3ad..67caf4276bb01 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE04 +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE04 @@ -16,3 +16,4 @@ CONFIG_DEBUG_OBJECTS_RCU_HEAD=n CONFIG_RCU_EXPERT=y CONFIG_RCU_EQS_DEBUG=y CONFIG_RCU_LAZY=y +CONFIG_RCU_DYNTICKS_TORTURE=y From patchwork Tue Nov 19 15:34:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880156 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D95A41D0E0E for ; Tue, 19 Nov 2024 15:37:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030657; cv=none; b=dz4pNUjzXfMZdDsX6UrY3LDAAFE22ElQGHEII+uyQqPC/OfHngurMdNn5NhTclxNSJ/XjjngUVa8IepFpiaPDpch24Y9Nsis2wrFRz7HuvRSzXEqLioc1L8G/6iNLH+1szkiuvbgWhNjvc8NFkwSFAfbFXWXWlsBMVfcF16NICw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030657; c=relaxed/simple; bh=g4RPCjUX2beCsOhmA+cf1EgW2WlK56h+myfKo4lgSbM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TKCvlyY/79Z5BOsX6B707hAwbWj8FMOr3EHjAOwNT97Yo5/FvoihZuKCX/aSnemO8W42MypD3nOEmS6bGfuzsBBa+ilVoPeiXl3r6HfHc/U26KTsKqF8ylWhRjKE1HVAOFg/B9AyK2PfyWCEsm3TcLdvikmXkjtdobb5O1ctRgk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=YcW6xxsd; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="YcW6xxsd" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030655; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wuJGo/SWDYxjpRhP1lHMjQsEt0PrcPcPyU5QylcJ498=; b=YcW6xxsdzDfWLYOzoZGGCwbUDf9UYpmgJMe9dvif+3qY60OAqlrjoAFQbPLPhLtsuxWDFv V3ClElQbm0+A7xaUKR2ejAZKy26BqV2peT1z0HMQlsi5xUgvqXVDWY95s8iMFcS5xnpqgk 7eU410cUYUMfK+aYr8b5MatAymSlxTQ= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-427-4V10RU0zMVukkEJohwz2Cw-1; Tue, 19 Nov 2024 10:37:32 -0500 X-MC-Unique: 4V10RU0zMVukkEJohwz2Cw-1 X-Mimecast-MFC-AGG-ID: 4V10RU0zMVukkEJohwz2Cw Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4177119773FC; Tue, 19 Nov 2024 15:37:27 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 859B830001A0; Tue, 19 Nov 2024 15:37:12 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 06/15] jump_label: Add forceful jump label type Date: Tue, 19 Nov 2024 16:34:53 +0100 Message-ID: <20241119153502.41361-7-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Later commits will cause objtool to warn about non __ro_after_init static keys being used in .noinstr sections in order to safely defer instruction patching IPIs targeted at NOHZ_FULL CPUs. Two such keys currently exist: mds_idle_clear and __sched_clock_stable, which can both be modified at runtime. As discussed at LPC 2024 during the realtime micro-conference, modifying these specific keys incurs additional interference (SMT hotplug) or can even be downright incompatible with NOHZ_FULL (unstable TSC). Suppressing the IPI associated with modifying such keys is thus a minor concern wrt NOHZ_FULL interference, so add a jump type that will be leveraged by both the kernel (to know not to defer the IPI) and objtool (to know not to generate the aforementioned warning). Signed-off-by: Valentin Schneider --- include/linux/jump_label.h | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index f5a2727ca4a9a..93e729545b941 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -200,7 +200,8 @@ struct module; #define JUMP_TYPE_FALSE 0UL #define JUMP_TYPE_TRUE 1UL #define JUMP_TYPE_LINKED 2UL -#define JUMP_TYPE_MASK 3UL +#define JUMP_TYPE_FORCEFUL 4UL +#define JUMP_TYPE_MASK 7UL static __always_inline bool static_key_false(struct static_key *key) { @@ -244,12 +245,15 @@ extern enum jump_label_type jump_label_init_type(struct jump_entry *entry); * raw value, but have added a BUILD_BUG_ON() to catch any issues in * jump_label_init() see: kernel/jump_label.c. */ -#define STATIC_KEY_INIT_TRUE \ - { .enabled = { 1 }, \ - { .type = JUMP_TYPE_TRUE } } -#define STATIC_KEY_INIT_FALSE \ - { .enabled = { 0 }, \ - { .type = JUMP_TYPE_FALSE } } +#define __STATIC_KEY_INIT(_true, force) \ + { .enabled = { _true }, \ + { .type = (_true ? JUMP_TYPE_TRUE : JUMP_TYPE_FALSE) | \ + (force ? JUMP_TYPE_FORCEFUL : 0UL)} } + +#define STATIC_KEY_INIT_TRUE __STATIC_KEY_INIT(true, false) +#define STATIC_KEY_INIT_FALSE __STATIC_KEY_INIT(false, false) +#define STATIC_KEY_INIT_TRUE_FORCE __STATIC_KEY_INIT(true, true) +#define STATIC_KEY_INIT_FALSE_FORCE __STATIC_KEY_INIT(false, true) #else /* !CONFIG_JUMP_LABEL */ @@ -369,6 +373,8 @@ struct static_key_false { #define STATIC_KEY_TRUE_INIT (struct static_key_true) { .key = STATIC_KEY_INIT_TRUE, } #define STATIC_KEY_FALSE_INIT (struct static_key_false){ .key = STATIC_KEY_INIT_FALSE, } +#define STATIC_KEY_TRUE_FORCE_INIT (struct static_key_true) { .key = STATIC_KEY_INIT_TRUE_FORCE, } +#define STATIC_KEY_FALSE_FORCE_INIT (struct static_key_false){ .key = STATIC_KEY_INIT_FALSE_FORCE, } #define DEFINE_STATIC_KEY_TRUE(name) \ struct static_key_true name = STATIC_KEY_TRUE_INIT @@ -376,6 +382,9 @@ struct static_key_false { #define DEFINE_STATIC_KEY_TRUE_RO(name) \ struct static_key_true name __ro_after_init = STATIC_KEY_TRUE_INIT +#define DEFINE_STATIC_KEY_TRUE_FORCE(name) \ + struct static_key_true name = STATIC_KEY_TRUE_FORCE_INIT + #define DECLARE_STATIC_KEY_TRUE(name) \ extern struct static_key_true name @@ -385,6 +394,9 @@ struct static_key_false { #define DEFINE_STATIC_KEY_FALSE_RO(name) \ struct static_key_false name __ro_after_init = STATIC_KEY_FALSE_INIT +#define DEFINE_STATIC_KEY_FALSE_FORCE(name) \ + struct static_key_false name = STATIC_KEY_FALSE_FORCE_INIT + #define DECLARE_STATIC_KEY_FALSE(name) \ extern struct static_key_false name From patchwork Tue Nov 19 15:34:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880157 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3003478C76 for ; Tue, 19 Nov 2024 15:37:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030673; cv=none; b=P5em7ciRdrPFsuhx+6Bwyr1u0+BSSZ1IB1qQJKf4wTEYzEapd0i6MO+SCbhjPlfWgE7rS0okoBgEeryYzMmrWvBXLWmNCZopCDManrqHssmVYOjpgK46z6FnOsd4JUfjJbqn3YCw9JVlFjGm0cR3yUPWHUJpwKUDJks7hMY8puI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030673; c=relaxed/simple; bh=9pH7ZkeY4viWe7Wod5zfr5y5mOdJFAFbwB72hayvvGQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=b7csoCihCZ5lfkVI+D+kjaLKLTZQQvDWWZLdG1jyWFvbVQLpE4yWlg+0i9A/b+kbdUw8TXPSEkRRSCV9OWA6wUGCZICTi4FDz/ipFhQ/S1jBaCT4UieM9q1MGD0G+pSKrJZz2RUOI5UD4L/MQIJOt/YupqRGKHQSEcdCYyRrB7c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=WxYnmFsT; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="WxYnmFsT" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030671; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UFFvFWWnWyZvXw5qfQp0QkusU/cmiYrMNoaijrYT6Zw=; b=WxYnmFsTh1VKnOyjeivpXhvhNB8Z2uuwFr6HuiWlB6+/dz81FTDXTl+JCsCaeKhuX/vyNO pDCP68CXG+h9CtY+QjaEYg02XYwBWoSAdOJgD4og2cDMbD+TkIH8sIeKq464jgq5Vwnbev E8UZsu6EApDTAx3laFJJwMst0HA1SK8= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-594-8NjzI6AZPs6iJgNCQczRug-1; Tue, 19 Nov 2024 10:37:49 -0500 X-MC-Unique: 8NjzI6AZPs6iJgNCQczRug-1 X-Mimecast-MFC-AGG-ID: 8NjzI6AZPs6iJgNCQczRug Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id DF4C91979052; Tue, 19 Nov 2024 15:37:43 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id B133F30001A0; Tue, 19 Nov 2024 15:37:27 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 07/15] x86/speculation/mds: Make mds_idle_clear forceful Date: Tue, 19 Nov 2024 16:34:54 +0100 Message-ID: <20241119153502.41361-8-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Later commits will cause objtool to warn about non __ro_after_init static keys being used in .noinstr sections in order to safely defer instruction patching IPIs targeted at NOHZ_FULL CPUs. mds_idle_clear is used in .noinstr code, and can be modified at runtime (SMT hotplug). Suppressing the text_poke_sync() IPI has little benefits for this key, as hotplug implies eventually going through takedown_cpu() -> stop_machine_cpuslocked() which is going to cause interference on all online CPUs anyway. Mark it as forceful to let the kernel know to always send the text_poke_sync() IPI for it, and to let objtool know not to warn about it. Signed-off-by: Valentin Schneider --- arch/x86/kernel/cpu/bugs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 47a01d4028f60..dcc4d5d6e3b95 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -114,7 +114,7 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb); DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb); /* Control MDS CPU buffer clear before idling (halt, mwait) */ -DEFINE_STATIC_KEY_FALSE(mds_idle_clear); +DEFINE_STATIC_KEY_FALSE_FORCE(mds_idle_clear); EXPORT_SYMBOL_GPL(mds_idle_clear); /* From patchwork Tue Nov 19 15:34:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880158 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D9D2E1D0F51 for ; Tue, 19 Nov 2024 15:38:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030690; cv=none; b=bFW5K7BcL5S2dAZEza1sfzmHxvPiT7DNULLqEfwj62cwwG4AYd4LVuW5tWdsTda5JDYBy2PdPUk52xYkmqRDRvkptURrzqOc6/JPznJuWkF/PePYMwbBjaMBE8nqh5HK8z5NKYmLAk47tydhUvNIGY+reSr97TNrCw78JH+OUp8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030690; c=relaxed/simple; bh=WcoRXmH8EHSSUkrYncG38ywpgNOrSaCBNAPretltWvc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BIAp+9bxbQbG3j1E9dq0+va/roGvt4pCBm8SuNFHJlitreMAargO8fOGEYGg1roeYQ+mtx+0hdvnkh005ZaZ1ro8Ig3c3AYos2MCpbSF+xHilnxrPJdUvOtY8taASqKXxVIe8WosCkmIck2kBFjWqFSEbslvOEglYhedFi5Epqg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Ksdh/qUf; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Ksdh/qUf" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030688; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lpSGcS2OHkivSptNEqV1yoUV7L+1Tpt77Idg0XadxAw=; b=Ksdh/qUfMGAu3JF60FBMQQA8j5wcLeTDikdgTdcG/3VfJ7rqNBrEJ3w6b83/WBkylTaf9p GJrpI8XeDKN4Qci7qUdjy2EHfJycO42eIA1roL9NZsOV6FOnAKmEdbSABiOKL/AFzDT5yF EgBcphIs5Mu7KHFkPluWl44QWo8zcnk= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-209-w8WZReDNPhuTuNl53MRXLw-1; Tue, 19 Nov 2024 10:38:05 -0500 X-MC-Unique: w8WZReDNPhuTuNl53MRXLw-1 X-Mimecast-MFC-AGG-ID: w8WZReDNPhuTuNl53MRXLw Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8F25E1955F41; Tue, 19 Nov 2024 15:37:59 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 59F3B30001A2; Tue, 19 Nov 2024 15:37:44 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 08/15] sched/clock, x86: Make __sched_clock_stable forceful Date: Tue, 19 Nov 2024 16:34:55 +0100 Message-ID: <20241119153502.41361-9-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Later commits will cause objtool to warn about non __ro_after_init static keys being used in .noinstr sections in order to safely defer instruction patching IPIs targeted at NOHZ_FULL CPUs. __sched_clock_stable is used in .noinstr code, and can be modified at runtime (e.g. KVM module loading). Suppressing the text_poke_sync() IPI has little benefits for this key, as NOHZ_FULL is incompatible with an unstable TSC anyway. Mark it as forceful to let the kernel know to always send the text_poke_sync() IPI for it, and to let objtool know not to warn about it. Signed-off-by: Valentin Schneider --- kernel/sched/clock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c index 200e5568b9894..dc94b3717f5ce 100644 --- a/kernel/sched/clock.c +++ b/kernel/sched/clock.c @@ -76,7 +76,7 @@ static DEFINE_STATIC_KEY_FALSE_RO(sched_clock_running); * Similarly we start with __sched_clock_stable_early, thereby assuming we * will become stable, such that there's only a single 1 -> 0 transition. */ -static DEFINE_STATIC_KEY_FALSE(__sched_clock_stable); +static DEFINE_STATIC_KEY_FALSE_FORCE(__sched_clock_stable); static int __sched_clock_stable_early = 1; /* From patchwork Tue Nov 19 15:34:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880159 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EF5D11CCEDF for ; Tue, 19 Nov 2024 15:38:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030709; cv=none; b=Sd0EuKUAYw7j8cJ9AN2/CDXPQIjA30uPoBzVXkAD/6YAs5npQBqfNUtpuucUwg/QYZIeSTKSj8EMbCoCo91gj+LZSMi39H3wHPaJqPyK1P27o7CCRwmRPnzBt7blfee5e/SACYiaxnpPSpZ11d5iIe2ThIEp2dFePlOhwnqckpo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030709; c=relaxed/simple; bh=HfgzzhQB9Z8CbP30UBBQFA/8juWm49GyqBubPeAUIoY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GkGbZ+rbb/NpvZ5smpHsoeo6HDdQdX5r5f43vVn28zxcTSZQnXejHpTptedrHSLrWeSPnHRt6Xj69SvbAF7XPrED5Mbkph8+0SUw1Az4cmik6luQkxfYIC8VWxIWn1Imkw3KnftAyrRxTW0E0mSJM4bLgoICEdproT19joobT7g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Piap1QZd; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Piap1QZd" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030707; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bbbl+TXDo+t5kIfzQ14qJGOWt1RDAARb2PWsvxSVw1o=; b=Piap1QZd5jIiEORBIbGNTK9NLlHM+iO/r7rfCfPYmUrnS5q/iIUqSNekaAeetErXSBrrLI TZrOPmYympBWfD/D3NZhF3oM5bw7V7sIozSvXcJBCXZ1+U6yvd6O/t0KkDbN0hwfgF6Zu0 mACt5BPRwg4B9n2MlG805viGa6LWmLo= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-593-1WZ3p9jrPQW6G0U5yxmRQQ-1; Tue, 19 Nov 2024 10:38:24 -0500 X-MC-Unique: 1WZ3p9jrPQW6G0U5yxmRQQ-1 X-Mimecast-MFC-AGG-ID: 1WZ3p9jrPQW6G0U5yxmRQQ Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E53F01952D05; Tue, 19 Nov 2024 15:38:14 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id DDE233003B71; Tue, 19 Nov 2024 15:37:59 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Josh Poimboeuf , Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 09/15] objtool: Warn about non __ro_after_init static key usage in .noinstr Date: Tue, 19 Nov 2024 16:34:56 +0100 Message-ID: <20241119153502.41361-10-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Later commits will disallow runtime-mutable text in .noinstr sections in order to safely defer instruction patching IPIs. All static keys used in .noinstr sections have now been checked as being either flagged as __ro_after_init, or as forceful static keys. Any occurrence of this new warning would be the result of a code change that will need looking at. Suggested-by: Josh Poimboeuf Signed-off-by: Valentin Schneider --- offset_of(static_key.type) and JUMP_TYPE_FORCEFUL would need to be shoved into a somewhat standalone header file that could be included by objtool itself. --- tools/objtool/Documentation/objtool.txt | 13 ++++++++ tools/objtool/check.c | 41 +++++++++++++++++++++++++ tools/objtool/include/objtool/check.h | 1 + tools/objtool/include/objtool/special.h | 2 ++ tools/objtool/special.c | 3 ++ 5 files changed, 60 insertions(+) diff --git a/tools/objtool/Documentation/objtool.txt b/tools/objtool/Documentation/objtool.txt index 7c3ee959b63c7..06fa285873387 100644 --- a/tools/objtool/Documentation/objtool.txt +++ b/tools/objtool/Documentation/objtool.txt @@ -447,6 +447,19 @@ the objtool maintainers. names and does not use module_init() / module_exit() macros to create them. +13. file.o: warning: func()+0x2a: non __ro_after_init static key "key" in .noinstr section + + This means that the noinstr function func() uses a static key that can be + modified at runtime. This is not allowed as noinstr functions rely on + containing stable instructions after init. + + Check whether the static key in question can really be modified at runtime, + and if it is only enabled during init then mark it as __ro_after_init. If it + genuinely needs to be modified at runtime: + + 1) Directly rely on the underlying atomic count of they key in the noinstr + functions. + 2) Mark the static key as forceful. If the error doesn't seem to make sense, it could be a bug in objtool. Feel free to ask the objtool maintainer for help. diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 00e25492f5065..c1fb02c326839 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -2056,6 +2056,9 @@ static int add_special_section_alts(struct objtool_file *file) alt->next = orig_insn->alts; orig_insn->alts = alt; + if (special_alt->key_sym) + orig_insn->key_sym = special_alt->key_sym; + list_del(&special_alt->list); free(special_alt); } @@ -3605,6 +3608,41 @@ static int validate_return(struct symbol *func, struct instruction *insn, struct return 0; } +static bool static_key_is_forceful(struct symbol *key) +{ + if (!strcmp(key->sec->name, ".data")) { + unsigned long data_offset = key->sec->data->d_off; + unsigned long key_offset = key->sym.st_value; + char* data = key->sec->data->d_buf; + + /* + * offset_of(static_key.type) + * v + * v JUMP_TYPE_FORCEFUL + * v v + */ + return data[(key_offset + 8) - data_offset] & 0x4; + } + + return false; +} + +static int validate_static_key(struct instruction *insn, struct insn_state *state) +{ + if (state->noinstr && state->instr <= 0) { + if (static_key_is_forceful(insn->key_sym)) + return 0; + + if ((strcmp(insn->key_sym->sec->name, ".data..ro_after_init"))) { + WARN_INSN(insn, "non __ro_after_init static key \"%s\" in .noinstr section", + insn->key_sym->name); + return 1; + } + } + + return 0; +} + static struct instruction *next_insn_to_validate(struct objtool_file *file, struct instruction *insn) { @@ -3766,6 +3804,9 @@ static int validate_branch(struct objtool_file *file, struct symbol *func, if (handle_insn_ops(insn, next_insn, &state)) return 1; + if (insn->key_sym) + validate_static_key(insn, &state); + switch (insn->type) { case INSN_RETURN: diff --git a/tools/objtool/include/objtool/check.h b/tools/objtool/include/objtool/check.h index daa46f1f0965a..35dd21f8f41e1 100644 --- a/tools/objtool/include/objtool/check.h +++ b/tools/objtool/include/objtool/check.h @@ -77,6 +77,7 @@ struct instruction { struct symbol *sym; struct stack_op *stack_ops; struct cfi_state *cfi; + struct symbol *key_sym; }; static inline struct symbol *insn_func(struct instruction *insn) diff --git a/tools/objtool/include/objtool/special.h b/tools/objtool/include/objtool/special.h index 86d4af9c5aa9d..0e61f34fe3a28 100644 --- a/tools/objtool/include/objtool/special.h +++ b/tools/objtool/include/objtool/special.h @@ -27,6 +27,8 @@ struct special_alt { struct section *new_sec; unsigned long new_off; + struct symbol *key_sym; + unsigned int orig_len, new_len; /* group only */ }; diff --git a/tools/objtool/special.c b/tools/objtool/special.c index 097a69db82a0e..fefab2b471bf8 100644 --- a/tools/objtool/special.c +++ b/tools/objtool/special.c @@ -127,6 +127,9 @@ static int get_alt_entry(struct elf *elf, const struct special_entry *entry, return -1; } alt->key_addend = reloc_addend(key_reloc); + + reloc_to_sec_off(key_reloc, &sec, &offset); + alt->key_sym = find_symbol_by_offset(sec, offset & ~3); } return 0; From patchwork Tue Nov 19 15:34:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880160 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BDDD21D043A for ; Tue, 19 Nov 2024 15:38:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030728; cv=none; b=VSs7QMaLNqzPmBPvJNghHEEXq/amBA4S5ihHAmxGK6L/UiMx2+NZf65MVv6da0sSZz+Kut22w6s7SkjQQy0MkYxGmIxeOPRICCte+atM99NXLXY3gVbTeATE2tobokDSMrzSKrlOqZP/dMJEfa6gWxGnRaacqL+YQxvs7rbN0o8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030728; c=relaxed/simple; bh=soZGZa5eZvVq13Yv3+buGnph0yEGo0cZoyN46yk11Tc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=JWHKuJEesylIWR5wrXeA5ao0d3Jhl+av9dd2YsrNqOHW6H8/Y6GF8k8BLfo1FDwNZFpRLkf23MGvSKymmHgbczv6HxMyFfCXXG+EQ52woIbCylDINQPI4VgySZjKq0W+3LgJFa7fiODas8fgZE7hRnLDSrZq37ktvc1hpP9ybuQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=HIvWDcbz; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="HIvWDcbz" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030725; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=URgg3URVd8Hv8WB8p504Oe6mTtzB0H+jxnfnYfZO9Vc=; b=HIvWDcbzhnEiJtma39ziZA7zU9vx6h4AMm0GyQJRoJXMUP82K8DaH0T2liVvoFoxUQ6Joj RlgwHh52NsaOrVvdIyNc/dgmzDHvH4efrEXARXf9uAb7cINgFY5YMKBjVuB9fbJZ6jmz9E buQHtSLU631hkpSWa+JdJ5XZ4jhg3Nc= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-99-3C0haj_PPdCTmZEKdqoqww-1; Tue, 19 Nov 2024 10:38:43 -0500 X-MC-Unique: 3C0haj_PPdCTmZEKdqoqww-1 X-Mimecast-MFC-AGG-ID: 3C0haj_PPdCTmZEKdqoqww Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id DC0AB193EF53; Tue, 19 Nov 2024 15:38:32 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 613EB30001A2; Tue, 19 Nov 2024 15:38:15 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 10/15] x86/alternatives: Record text_poke's of JUMP_TYPE_FORCEFUL labels Date: Tue, 19 Nov 2024 16:34:57 +0100 Message-ID: <20241119153502.41361-11-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Forceful static keys are used in early entry code where it is unsafe to defer the sync_core() IPIs, and flagged as such via their ->type field. Record that information when creating a text_poke_loc. The text_poke_loc.old field is written to when first iterating a text_poke() entry, and as such can be (ab)used to store this information at the start of text_poke_bp_batch(). Signed-off-by: Valentin Schneider --- arch/x86/include/asm/text-patching.h | 12 ++++++++++-- arch/x86/kernel/alternative.c | 16 ++++++++++------ arch/x86/kernel/jump_label.c | 7 ++++--- 3 files changed, 24 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h index 6259f1937fe77..e34de36cab61e 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -38,9 +38,17 @@ extern void *text_poke_copy(void *addr, const void *opcode, size_t len); extern void *text_poke_copy_locked(void *addr, const void *opcode, size_t len, bool core_ok); extern void *text_poke_set(void *addr, int c, size_t len); extern int poke_int3_handler(struct pt_regs *regs); -extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate); +extern void __text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate, bool force_ipi); +static inline void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate) +{ + __text_poke_bp(addr, opcode, len, emulate, false); +} -extern void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate); +extern void __text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate, bool force_ipi); +static inline void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate) +{ + __text_poke_queue(addr, opcode, len, emulate, false); +} extern void text_poke_finish(void); #define INT3_INSN_SIZE 1 diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index d17518ca19b8b..954c4c0f7fc58 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -2098,7 +2098,10 @@ struct text_poke_loc { u8 opcode; const u8 text[POKE_MAX_OPCODE_SIZE]; /* see text_poke_bp_batch() */ - u8 old; + union { + u8 old; + u8 force_ipi; + }; }; struct bp_patching_desc { @@ -2385,7 +2388,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries } static void text_poke_loc_init(struct text_poke_loc *tp, void *addr, - const void *opcode, size_t len, const void *emulate) + const void *opcode, size_t len, const void *emulate, bool force_ipi) { struct insn insn; int ret, i = 0; @@ -2402,6 +2405,7 @@ static void text_poke_loc_init(struct text_poke_loc *tp, void *addr, tp->rel_addr = addr - (void *)_stext; tp->len = len; tp->opcode = insn.opcode.bytes[0]; + tp->force_ipi = force_ipi; if (is_jcc32(&insn)) { /* @@ -2493,14 +2497,14 @@ void text_poke_finish(void) text_poke_flush(NULL); } -void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate) +void __ref __text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate, bool force_ipi) { struct text_poke_loc *tp; text_poke_flush(addr); tp = &tp_vec[tp_vec_nr++]; - text_poke_loc_init(tp, addr, opcode, len, emulate); + text_poke_loc_init(tp, addr, opcode, len, emulate, force_ipi); } /** @@ -2514,10 +2518,10 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const voi * dynamically allocated memory. This function should be used when it is * not possible to allocate memory. */ -void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate) +void __ref __text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate, bool force_ipi) { struct text_poke_loc tp; - text_poke_loc_init(&tp, addr, opcode, len, emulate); + text_poke_loc_init(&tp, addr, opcode, len, emulate, force_ipi); text_poke_bp_batch(&tp, 1); } diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c index f5b8ef02d172c..e03a4f56b30fd 100644 --- a/arch/x86/kernel/jump_label.c +++ b/arch/x86/kernel/jump_label.c @@ -101,8 +101,8 @@ __jump_label_transform(struct jump_entry *entry, text_poke_early((void *)jump_entry_code(entry), jlp.code, jlp.size); return; } - - text_poke_bp((void *)jump_entry_code(entry), jlp.code, jlp.size, NULL); + __text_poke_bp((void *)jump_entry_code(entry), jlp.code, jlp.size, NULL, + jump_entry_key(entry)->type & JUMP_TYPE_FORCEFUL); } static void __ref jump_label_transform(struct jump_entry *entry, @@ -135,7 +135,8 @@ bool arch_jump_label_transform_queue(struct jump_entry *entry, mutex_lock(&text_mutex); jlp = __jump_label_patch(entry, type); - text_poke_queue((void *)jump_entry_code(entry), jlp.code, jlp.size, NULL); + __text_poke_queue((void *)jump_entry_code(entry), jlp.code, jlp.size, NULL, + jump_entry_key(entry)->type & JUMP_TYPE_FORCEFUL); mutex_unlock(&text_mutex); return true; } From patchwork Tue Nov 19 15:34:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880161 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E897E1D0B8C for ; Tue, 19 Nov 2024 15:38:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030741; cv=none; b=sJJXGrP2cU0gIeed5R1E/nDKq08n0VrqIUk5hkS4D4jfxfm2QWjIMsQIUL9W00jCbD7Lf5fk8g76OvP38IJBybwgGbIthSbYyvyxvYQ1mhyb8pnXciJNLrphiSyMYcrE/gITGnl9/Lu42+8R6ylIY3aErSujl/t3211528eJWSk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030741; c=relaxed/simple; bh=Ehnt0BxdQ77p1LjEevHZqtmZgJlHu3XHS7C6LkZB1xY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=e/SeDoETpzPwPJHMQg+FVEmaOgeH7y8OwhcNZXeZfYciWy/f0wlAm0xXWA9iXvj8vBCe3rmW3Ulv9E5ThR4ouz4kKZS22n9IGoO1q13JESkzPNNinskLs9Jyw+gZxsqXjEz5vF9ZygTRWsWw9womwARbNRMLplwZXbamJke6TFI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=VDW8lpR8; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="VDW8lpR8" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030738; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gBDFZv8Y0YCT5rYDqfeJn3PBG/oi2+BA7C4xITE08vE=; b=VDW8lpR8GrBIHpRLIQPuFjI6AMOKG/16UqCNNf7WnztkQc3Za+WL6/7lJZKwLn+aOM7uID u3nM81CdrZQvvuX2/wxjoTgJJ61mwCYf60+GZykMYSU5QaycDF0awnM0ReXzqOUJP4c86g UpgSA78VTJVVvvixergzBAsKdvt3r9w= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-146-YBGQR59uOBaV7Ovj9k6N-g-1; Tue, 19 Nov 2024 10:38:56 -0500 X-MC-Unique: YBGQR59uOBaV7Ovj9k6N-g-1 X-Mimecast-MFC-AGG-ID: YBGQR59uOBaV7Ovj9k6N-g Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 466291952D29; Tue, 19 Nov 2024 15:38:49 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 569B930001A2; Tue, 19 Nov 2024 15:38:33 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Nicolas Saenz Julienne , Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 11/15] context-tracking: Introduce work deferral infrastructure Date: Tue, 19 Nov 2024 16:34:58 +0100 Message-ID: <20241119153502.41361-12-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 smp_call_function() & friends have the unfortunate habit of sending IPIs to isolated, NOHZ_FULL, in-userspace CPUs, as they blindly target all online CPUs. Some callsites can be bent into doing the right, such as done by commit: cc9e303c91f5 ("x86/cpu: Disable frequency requests via aperfmperf IPI for nohz_full CPUs") Unfortunately, not all SMP callbacks can be omitted in this fashion. However, some of them only affect execution in kernelspace, which means they don't have to be executed *immediately* if the target CPU is in userspace: stashing the callback and executing it upon the next kernel entry would suffice. x86 kernel instruction patching or kernel TLB invalidation are prime examples of it. Reduce the RCU dynticks counter width to free up some bits to be used as a deferred callback bitmask. Add some build-time checks to validate that setup. Presence of CT_STATE_KERNEL in the ct_state prevents queuing deferred work. Later commits introduce the bit:callback mappings. Link: https://lore.kernel.org/all/20210929151723.162004989@infradead.org/ Signed-off-by: Nicolas Saenz Julienne Signed-off-by: Valentin Schneider --- arch/Kconfig | 9 ++++ arch/x86/Kconfig | 1 + arch/x86/include/asm/context_tracking_work.h | 14 ++++++ include/linux/context_tracking.h | 21 +++++++++ include/linux/context_tracking_state.h | 30 ++++++++----- include/linux/context_tracking_work.h | 26 +++++++++++ kernel/context_tracking.c | 46 +++++++++++++++++++- kernel/time/Kconfig | 5 +++ 8 files changed, 140 insertions(+), 12 deletions(-) create mode 100644 arch/x86/include/asm/context_tracking_work.h create mode 100644 include/linux/context_tracking_work.h diff --git a/arch/Kconfig b/arch/Kconfig index bd9f095d69fa0..e7f3f797a34a4 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -912,6 +912,15 @@ config HAVE_CONTEXT_TRACKING_USER_OFFSTACK - No use of instrumentation, unless instrumentation_begin() got called. +config HAVE_CONTEXT_TRACKING_WORK + bool + help + Architecture supports deferring work while not in kernel context. + This is especially useful on setups with isolated CPUs that might + want to avoid being interrupted to perform housekeeping tasks (for + ex. TLB invalidation or icache invalidation). The housekeeping + operations are performed upon re-entering the kernel. + config HAVE_TIF_NOHZ bool help diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 16354dfa6d965..c703376dd326b 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -213,6 +213,7 @@ config X86 select HAVE_CMPXCHG_LOCAL select HAVE_CONTEXT_TRACKING_USER if X86_64 select HAVE_CONTEXT_TRACKING_USER_OFFSTACK if HAVE_CONTEXT_TRACKING_USER + select HAVE_CONTEXT_TRACKING_WORK if X86_64 select HAVE_C_RECORDMCOUNT select HAVE_OBJTOOL_MCOUNT if HAVE_OBJTOOL select HAVE_OBJTOOL_NOP_MCOUNT if HAVE_OBJTOOL_MCOUNT diff --git a/arch/x86/include/asm/context_tracking_work.h b/arch/x86/include/asm/context_tracking_work.h new file mode 100644 index 0000000000000..5bc29e6b2ed38 --- /dev/null +++ b/arch/x86/include/asm/context_tracking_work.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_CONTEXT_TRACKING_WORK_H +#define _ASM_X86_CONTEXT_TRACKING_WORK_H + +static __always_inline void arch_context_tracking_work(int work) +{ + switch (work) { + case CONTEXT_WORK_n: + // Do work... + break; + } +} + +#endif diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index af9fe87a09225..16a2eb7525f1f 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -5,6 +5,7 @@ #include #include #include +#include #include #include @@ -137,6 +138,26 @@ static __always_inline unsigned long ct_state_inc(int incby) return raw_atomic_add_return(incby, this_cpu_ptr(&context_tracking.state)); } +#ifdef CONTEXT_TRACKING_WORK +static __always_inline unsigned long ct_state_inc_clear_work(int incby) +{ + struct context_tracking *ct = this_cpu_ptr(&context_tracking); + unsigned long new, old, state; + + state = arch_atomic_read(&ct->state); + do { + old = state; + new = old & ~CONTEXT_WORK_MASK; + new += incby; + state = arch_atomic_cmpxchg(&ct->state, old, new); + } while (old != state); + + return new; +} +#else +#define ct_state_inc_clear_work(x) ct_state_inc(x) +#endif + static __always_inline bool warn_rcu_enter(void) { bool ret = false; diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index 0b81248aa03e2..d1d37dbdf7195 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -5,6 +5,7 @@ #include #include #include +#include /* Offset to allow distinguishing irq vs. task-based idle entry/exit. */ #define CT_NESTING_IRQ_NONIDLE ((LONG_MAX / 2) + 1) @@ -39,16 +40,19 @@ struct context_tracking { }; /* - * We cram two different things within the same atomic variable: + * We cram up to three different things within the same atomic variable: * - * CT_RCU_WATCHING_START CT_STATE_START - * | | - * v v - * MSB [ RCU watching counter ][ context_state ] LSB - * ^ ^ - * | | - * CT_RCU_WATCHING_END CT_STATE_END + * CT_RCU_WATCHING_START CT_STATE_START + * | CT_WORK_START | + * | | | + * v v v + * MSB [ RCU watching counter ][ context work ][ context_state ] LSB + * ^ ^ ^ + * | | | + * | CT_WORK_END | + * CT_RCU_WATCHING_END CT_STATE_END * + * The [ context work ] region spans 0 bits if CONFIG_CONTEXT_WORK=n * Bits are used from the LSB upwards, so unused bits (if any) will always be in * upper bits of the variable. */ @@ -59,18 +63,24 @@ struct context_tracking { #define CT_STATE_START 0 #define CT_STATE_END (CT_STATE_START + CT_STATE_WIDTH - 1) -#define CT_RCU_WATCHING_MAX_WIDTH (CT_SIZE - CT_STATE_WIDTH) +#define CT_WORK_WIDTH (IS_ENABLED(CONFIG_CONTEXT_TRACKING_WORK) ? CONTEXT_WORK_MAX_OFFSET : 0) +#define CT_WORK_START (CT_STATE_END + 1) +#define CT_WORK_END (CT_WORK_START + CT_WORK_WIDTH - 1) + +#define CT_RCU_WATCHING_MAX_WIDTH (CT_SIZE - CT_WORK_WIDTH - CT_STATE_WIDTH) #define CT_RCU_WATCHING_WIDTH (IS_ENABLED(CONFIG_RCU_DYNTICKS_TORTURE) ? 2 : CT_RCU_WATCHING_MAX_WIDTH) -#define CT_RCU_WATCHING_START (CT_STATE_END + 1) +#define CT_RCU_WATCHING_START (CT_WORK_END + 1) #define CT_RCU_WATCHING_END (CT_RCU_WATCHING_START + CT_RCU_WATCHING_WIDTH - 1) #define CT_RCU_WATCHING BIT(CT_RCU_WATCHING_START) #define CT_STATE_MASK GENMASK(CT_STATE_END, CT_STATE_START) +#define CT_WORK_MASK GENMASK(CT_WORK_END, CT_WORK_START) #define CT_RCU_WATCHING_MASK GENMASK(CT_RCU_WATCHING_END, CT_RCU_WATCHING_START) #define CT_UNUSED_WIDTH (CT_RCU_WATCHING_MAX_WIDTH - CT_RCU_WATCHING_WIDTH) static_assert(CT_STATE_WIDTH + + CT_WORK_WIDTH + CT_RCU_WATCHING_WIDTH + CT_UNUSED_WIDTH == CT_SIZE); diff --git a/include/linux/context_tracking_work.h b/include/linux/context_tracking_work.h new file mode 100644 index 0000000000000..fb74db8876dd2 --- /dev/null +++ b/include/linux/context_tracking_work.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_CONTEXT_TRACKING_WORK_H +#define _LINUX_CONTEXT_TRACKING_WORK_H + +#include + +enum { + CONTEXT_WORK_n_OFFSET, + CONTEXT_WORK_MAX_OFFSET +}; + +enum ct_work { + CONTEXT_WORK_n = BIT(CONTEXT_WORK_n_OFFSET), + CONTEXT_WORK_MAX = BIT(CONTEXT_WORK_MAX_OFFSET) +}; + +#include + +#ifdef CONFIG_CONTEXT_TRACKING_WORK +extern bool ct_set_cpu_work(unsigned int cpu, unsigned int work); +#else +static inline bool +ct_set_cpu_work(unsigned int cpu, unsigned int work) { return false; } +#endif + +#endif diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c index 938c48952d265..37b094ea56fb6 100644 --- a/kernel/context_tracking.c +++ b/kernel/context_tracking.c @@ -72,6 +72,47 @@ static __always_inline void rcu_task_trace_heavyweight_exit(void) #endif /* #ifdef CONFIG_TASKS_TRACE_RCU */ } +#ifdef CONFIG_CONTEXT_TRACKING_WORK +static noinstr void ct_work_flush(unsigned long seq) +{ + int bit; + + seq = (seq & CT_WORK_MASK) >> CT_WORK_START; + + /* + * arch_context_tracking_work() must be noinstr, non-blocking, + * and NMI safe. + */ + for_each_set_bit(bit, &seq, CONTEXT_WORK_MAX) + arch_context_tracking_work(BIT(bit)); +} + +bool ct_set_cpu_work(unsigned int cpu, unsigned int work) +{ + struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu); + unsigned int old; + bool ret = false; + + preempt_disable(); + + old = atomic_read(&ct->state); + /* + * Try setting the work until either + * - the target CPU has entered kernelspace + * - the work has been set + */ + do { + ret = atomic_try_cmpxchg(&ct->state, &old, old | (work << CT_WORK_START)); + } while (!ret && ((old & CT_STATE_MASK) != CT_STATE_KERNEL)); + + preempt_enable(); + return ret; +} +#else +static __always_inline void ct_work_flush(unsigned long work) { } +static __always_inline void ct_work_clear(struct context_tracking *ct) { } +#endif + /* * Record entry into an extended quiescent state. This is only to be * called when not already in an extended quiescent state, that is, @@ -88,7 +129,7 @@ static noinstr void ct_kernel_exit_state(int offset) * next idle sojourn. */ rcu_task_trace_heavyweight_enter(); // Before CT state update! - seq = ct_state_inc(offset); + seq = ct_state_inc_clear_work(offset); // RCU is no longer watching. Better be in extended quiescent state! WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && (seq & CT_RCU_WATCHING)); } @@ -100,7 +141,7 @@ static noinstr void ct_kernel_exit_state(int offset) */ static noinstr void ct_kernel_enter_state(int offset) { - int seq; + unsigned long seq; /* * CPUs seeing atomic_add_return() must see prior idle sojourns, @@ -108,6 +149,7 @@ static noinstr void ct_kernel_enter_state(int offset) * critical section. */ seq = ct_state_inc(offset); + ct_work_flush(seq); // RCU is now watching. Better not be in an extended quiescent state! rcu_task_trace_heavyweight_exit(); // After CT state update! WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !(seq & CT_RCU_WATCHING)); diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig index 8ebb6d5a106be..04efc2b605823 100644 --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig @@ -186,6 +186,11 @@ config CONTEXT_TRACKING_USER_FORCE Say N otherwise, this option brings an overhead that you don't want in production. +config CONTEXT_TRACKING_WORK + bool + depends on HAVE_CONTEXT_TRACKING_WORK && CONTEXT_TRACKING_USER + default y + config NO_HZ bool "Old Idle dynticks config" help From patchwork Tue Nov 19 15:34:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880162 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B640B1D0BA0 for ; Tue, 19 Nov 2024 15:39:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030758; cv=none; b=s6QYdMsav6zhKZoq6LvYNUFG7SMJAbj6DLvlqXgiqkThb5GUVi/T7T+Lwpc6T7SIBNLZm7EdHbnD8dDzzgbBG2eSyX2XWguNy0uBSTwNeXktE8vd29pKG8CQrmRCbT8K41ma/l5hrFqyP87IT/wvpUYS/gpeBLcvP8YxVzQfF+g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030758; c=relaxed/simple; bh=cXoNA/ZSNgL3xQCJGLUfOCoVbT9uZZp4m09FWAiotH8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=O+bBgKxEhm9kv8pyLC6iAeA5C2Ah7w0mFv0NN7UJuX++MwgBy1cFsdCvSUBAiz3pN9V5dmjLlzfEIEr6EsVkvKLJ/L2PfSbobqpSgXkC+T+62WDu8XSr29EZPMD1GaxUz3/dWVvz3VXmIamKjuBMiVGFL2HY+9QEz//HiyfK9zY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=ZJovQOz/; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ZJovQOz/" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030755; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gHLzoO2lla+YiGjlL3mjjdCwCDcrDiVLGjLeux29GIU=; b=ZJovQOz/9dzf4VIxh9Q3noIPFwY43WnEdImzZxZ4VpQgkFEAjOUjbfGViUXRhdW/ExoX1Z Gi5WW+j1sV60GaWRKO70BwURFzRmZbPJe1nOiN64kyr9enykqzHLhXZM0EL0Hw8IZ6V/Gm 8o8mD7Mwoa2H8BSyYBfW7rFdsTDbJxI= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-467-Mc4eVWE3PlutRNpeiXlJfw-1; Tue, 19 Nov 2024 10:39:11 -0500 X-MC-Unique: Mc4eVWE3PlutRNpeiXlJfw-1 X-Mimecast-MFC-AGG-ID: Mc4eVWE3PlutRNpeiXlJfw Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 0E26E1955EA3; Tue, 19 Nov 2024 15:39:06 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id B578D3003B7E; Tue, 19 Nov 2024 15:38:49 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Peter Zijlstra , Nicolas Saenz Julienne , Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 12/15] context_tracking,x86: Defer kernel text patching IPIs Date: Tue, 19 Nov 2024 16:34:59 +0100 Message-ID: <20241119153502.41361-13-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 text_poke_bp_batch() sends IPIs to all online CPUs to synchronize them vs the newly patched instruction. CPUs that are executing in userspace do not need this synchronization to happen immediately, and this is actually harmful interference for NOHZ_FULL CPUs. As the synchronization IPIs are sent using a blocking call, returning from text_poke_bp_batch() implies all CPUs will observe the patched instruction(s), and this should be preserved even if the IPI is deferred. In other words, to safely defer this synchronization, any kernel instruction leading to the execution of the deferred instruction sync (ct_work_flush()) must *not* be mutable (patchable) at runtime. This means we must pay attention to mutable instructions in the early entry code: - alternatives - static keys - all sorts of probes (kprobes/ftrace/bpf/???) The early entry code leading to ct_work_flush() is noinstr, which gets rid of the probes. Alternatives are safe, because it's boot-time patching (before SMP is even brought up) which is before any IPI deferral can happen. This leaves us with static keys. Any static key used in early entry code should be only forever-enabled at boot time, IOW __ro_after_init (pretty much like alternatives). Exceptions are marked as forceful and will always generate an IPI when flipped. Objtool is now able to point at static keys that don't respect this, and all static keys used in early entry code have now been verified as behaving like so. Leverage the new context_tracking infrastructure to defer sync_core() IPIs to a target CPU's next kernel entry. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Nicolas Saenz Julienne Signed-off-by: Valentin Schneider --- arch/x86/include/asm/context_tracking_work.h | 6 ++-- arch/x86/include/asm/text-patching.h | 1 + arch/x86/kernel/alternative.c | 33 +++++++++++++++++--- arch/x86/kernel/kprobes/core.c | 4 +-- arch/x86/kernel/kprobes/opt.c | 4 +-- arch/x86/kernel/module.c | 2 +- include/linux/context_tracking_work.h | 4 +-- 7 files changed, 41 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/context_tracking_work.h b/arch/x86/include/asm/context_tracking_work.h index 5bc29e6b2ed38..2c66687ce00e2 100644 --- a/arch/x86/include/asm/context_tracking_work.h +++ b/arch/x86/include/asm/context_tracking_work.h @@ -2,11 +2,13 @@ #ifndef _ASM_X86_CONTEXT_TRACKING_WORK_H #define _ASM_X86_CONTEXT_TRACKING_WORK_H +#include + static __always_inline void arch_context_tracking_work(int work) { switch (work) { - case CONTEXT_WORK_n: - // Do work... + case CONTEXT_WORK_SYNC: + sync_core(); break; } } diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h index e34de36cab61e..37344e95f738f 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -33,6 +33,7 @@ extern void apply_relocation(u8 *buf, const u8 * const instr, size_t instrlen, u */ extern void *text_poke(void *addr, const void *opcode, size_t len); extern void text_poke_sync(void); +extern void text_poke_sync_deferrable(void); extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len); extern void *text_poke_copy(void *addr, const void *opcode, size_t len); extern void *text_poke_copy_locked(void *addr, const void *opcode, size_t len, bool core_ok); diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 954c4c0f7fc58..4ce224d927b03 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -2080,9 +2081,24 @@ static void do_sync_core(void *info) sync_core(); } +static bool do_sync_core_defer_cond(int cpu, void *info) +{ + return !ct_set_cpu_work(cpu, CONTEXT_WORK_SYNC); +} + +static void __text_poke_sync(smp_cond_func_t cond_func) +{ + on_each_cpu_cond(cond_func, do_sync_core, NULL, 1); +} + void text_poke_sync(void) { - on_each_cpu(do_sync_core, NULL, 1); + __text_poke_sync(NULL); +} + +void text_poke_sync_deferrable(void) +{ + __text_poke_sync(do_sync_core_defer_cond); } /* @@ -2257,6 +2273,8 @@ static int tp_vec_nr; static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries) { unsigned char int3 = INT3_INSN_OPCODE; + bool force_ipi = false; + void (*sync_fn)(void); unsigned int i; int do_sync; @@ -2291,11 +2309,18 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries * First step: add a int3 trap to the address that will be patched. */ for (i = 0; i < nr_entries; i++) { + /* + * Record that we need to send the IPI if at least one location + * in the batch requires it. + */ + force_ipi |= tp[i].force_ipi; tp[i].old = *(u8 *)text_poke_addr(&tp[i]); text_poke(text_poke_addr(&tp[i]), &int3, INT3_INSN_SIZE); } - text_poke_sync(); + sync_fn = force_ipi ? text_poke_sync : text_poke_sync_deferrable; + + sync_fn(); /* * Second step: update all but the first byte of the patched range. @@ -2357,7 +2382,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries * not necessary and we'd be safe even without it. But * better safe than sorry (plus there's not only Intel). */ - text_poke_sync(); + sync_fn(); } /* @@ -2378,7 +2403,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries } if (do_sync) - text_poke_sync(); + sync_fn(); /* * Remove and wait for refs to be zero. diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index 72e6a45e7ec24..c2fd2578fd5fc 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -817,7 +817,7 @@ void arch_arm_kprobe(struct kprobe *p) u8 int3 = INT3_INSN_OPCODE; text_poke(p->addr, &int3, 1); - text_poke_sync(); + text_poke_sync_deferrable(); perf_event_text_poke(p->addr, &p->opcode, 1, &int3, 1); } @@ -827,7 +827,7 @@ void arch_disarm_kprobe(struct kprobe *p) perf_event_text_poke(p->addr, &int3, 1, &p->opcode, 1); text_poke(p->addr, &p->opcode, 1); - text_poke_sync(); + text_poke_sync_deferrable(); } void arch_remove_kprobe(struct kprobe *p) diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c index 36d6809c6c9e1..b2ce4d9c3ba56 100644 --- a/arch/x86/kernel/kprobes/opt.c +++ b/arch/x86/kernel/kprobes/opt.c @@ -513,11 +513,11 @@ void arch_unoptimize_kprobe(struct optimized_kprobe *op) JMP32_INSN_SIZE - INT3_INSN_SIZE); text_poke(addr, new, INT3_INSN_SIZE); - text_poke_sync(); + text_poke_sync_deferrable(); text_poke(addr + INT3_INSN_SIZE, new + INT3_INSN_SIZE, JMP32_INSN_SIZE - INT3_INSN_SIZE); - text_poke_sync(); + text_poke_sync_deferrable(); perf_event_text_poke(op->kp.addr, old, JMP32_INSN_SIZE, new, JMP32_INSN_SIZE); } diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c index 837450b6e882f..00e71ad30c01d 100644 --- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@ -191,7 +191,7 @@ static int write_relocate_add(Elf64_Shdr *sechdrs, write, apply); if (!early) { - text_poke_sync(); + text_poke_sync_deferrable(); mutex_unlock(&text_mutex); } diff --git a/include/linux/context_tracking_work.h b/include/linux/context_tracking_work.h index fb74db8876dd2..13fc97b395030 100644 --- a/include/linux/context_tracking_work.h +++ b/include/linux/context_tracking_work.h @@ -5,12 +5,12 @@ #include enum { - CONTEXT_WORK_n_OFFSET, + CONTEXT_WORK_SYNC_OFFSET, CONTEXT_WORK_MAX_OFFSET }; enum ct_work { - CONTEXT_WORK_n = BIT(CONTEXT_WORK_n_OFFSET), + CONTEXT_WORK_SYNC = BIT(CONTEXT_WORK_SYNC_OFFSET), CONTEXT_WORK_MAX = BIT(CONTEXT_WORK_MAX_OFFSET) }; From patchwork Tue Nov 19 15:35:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880194 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3582B1D1E60 for ; Tue, 19 Nov 2024 15:39:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030771; cv=none; b=aOui87zRExo8GOnBRib72PEXEM8Ye0mT6jWKw2P7vH/ol0SYm1NRPDHMoiSilGxI8VhBvFf+IeRKwnpPixadxBxKIru0JkSx3BwHo0LQL0HRYOoer6K9Wco2NlZqVsaE2gi4uzgwaz80AWXuTmNhpNAAPFmWxyKPQyTAtRTFfLI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030771; c=relaxed/simple; bh=qZYlteLvCxXOvYUuzfcxn7fFpUiypvGlWfQxkKpgHw8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jqJy1dQxBPok9WYh/FVx0qV3qdWDXjoQ9NlxkHgYZlya+qNISZAdAEv5LphLltE2YsNOk+CrSMWvSgxHWgKGLieKFLSSawz4vvq3kCH/XSyikAa5rUDPkgDVm617ACIrlxeWosKAVFt6hxb5XNF/YtJMmf0a+kiB2SLM9Asef4g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=V0UoDJj+; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="V0UoDJj+" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030769; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XZhR8LJzRJ/NA3355q22R2um1DqV6KheeHOnZlXN8BQ=; b=V0UoDJj+mrLCzIuqd0TVI6rtcQMmNtLBqG7yksCmePkzi3Sy5tJgYeU27r+6MRHmgV/FZt B3r0mBHzyMzBYWwaOoyAEGMIhXACLHmGMB8Fyg7HJ6562o9ha1V5cUYPxXzpSc9AsukeWA DgJ84bicFE6Nxb0rlGAzlFnvhHwZKa8= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-620-EyVM_Q77Oy2qLzT--XqbYw-1; Tue, 19 Nov 2024 10:39:27 -0500 X-MC-Unique: EyVM_Q77Oy2qLzT--XqbYw-1 X-Mimecast-MFC-AGG-ID: EyVM_Q77Oy2qLzT--XqbYw Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 209A21955F2C; Tue, 19 Nov 2024 15:39:22 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 7E38F30001A0; Tue, 19 Nov 2024 15:39:06 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 13/15] context_tracking,x86: Add infrastructure to defer kernel TLBI Date: Tue, 19 Nov 2024 16:35:00 +0100 Message-ID: <20241119153502.41361-14-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Kernel TLB invalidation IPIs are a common source of interference on NOHZ_FULL CPUs. Given NOHZ_FULL CPUs executing in userspace are not accessing any kernel addresses, these invalidations do not need to happen immediately, and can be deferred until the next user->kernel transition. Add a minimal, noinstr-compliant variant of __flush_tlb_all() that doesn't try to leverage INVPCID. To achieve this: o Add a noinstr variant of native_write_cr4(), keeping native_write_cr4() as "only" __no_profile. o Make invalidate_user_asid() __always_inline XXX: what about paravirt? Should these be instead new operations made available through pv_ops.mmu.*? x86_64_start_kernel() uses __native_tlb_flush_global() regardless of paravirt, so I'm thinking the paravirt variants are more optimizations than hard requirements? Signed-off-by: Valentin Schneider --- arch/x86/include/asm/context_tracking_work.h | 4 +++ arch/x86/include/asm/special_insns.h | 1 + arch/x86/include/asm/tlbflush.h | 16 ++++++++++-- arch/x86/kernel/cpu/common.c | 6 ++++- arch/x86/mm/tlb.c | 26 ++++++++++++++++++-- include/linux/context_tracking_work.h | 2 ++ 6 files changed, 50 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/context_tracking_work.h b/arch/x86/include/asm/context_tracking_work.h index 2c66687ce00e2..9d4f021b5a45b 100644 --- a/arch/x86/include/asm/context_tracking_work.h +++ b/arch/x86/include/asm/context_tracking_work.h @@ -3,6 +3,7 @@ #define _ASM_X86_CONTEXT_TRACKING_WORK_H #include +#include static __always_inline void arch_context_tracking_work(int work) { @@ -10,6 +11,9 @@ static __always_inline void arch_context_tracking_work(int work) case CONTEXT_WORK_SYNC: sync_core(); break; + case CONTEXT_WORK_TLBI: + __flush_tlb_all_noinstr(); + break; } } diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index aec6e2d3aa1d5..b97157a69d48e 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -74,6 +74,7 @@ static inline unsigned long native_read_cr4(void) return val; } +void native_write_cr4_noinstr(unsigned long val); void native_write_cr4(unsigned long val); #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 69e79fff41b80..a653b5f47f0e6 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -18,6 +18,7 @@ DECLARE_PER_CPU(u64, tlbstate_untag_mask); void __flush_tlb_all(void); +void noinstr __flush_tlb_all_noinstr(void); #define TLB_FLUSH_ALL -1UL #define TLB_GENERATION_INVALID 0 @@ -418,9 +419,20 @@ static inline void cpu_tlbstate_update_lam(unsigned long lam, u64 untag_mask) #endif #endif /* !MODULE */ +#define __NATIVE_TLB_FLUSH_GLOBAL(suffix, cr4) \ + native_write_cr4##suffix(cr4 ^ X86_CR4_PGE); \ + native_write_cr4##suffix(cr4) +#define NATIVE_TLB_FLUSH_GLOBAL(cr4) __NATIVE_TLB_FLUSH_GLOBAL(, cr4) +#define NATIVE_TLB_FLUSH_GLOBAL_NOINSTR(cr4) __NATIVE_TLB_FLUSH_GLOBAL(_noinstr, cr4) + static inline void __native_tlb_flush_global(unsigned long cr4) { - native_write_cr4(cr4 ^ X86_CR4_PGE); - native_write_cr4(cr4); + NATIVE_TLB_FLUSH_GLOBAL(cr4); } + +static inline void __native_tlb_flush_global_noinstr(unsigned long cr4) +{ + NATIVE_TLB_FLUSH_GLOBAL_NOINSTR(cr4); +} + #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index a5f221ea56888..a84bb8511650b 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -424,7 +424,7 @@ void native_write_cr0(unsigned long val) } EXPORT_SYMBOL(native_write_cr0); -void __no_profile native_write_cr4(unsigned long val) +noinstr void native_write_cr4_noinstr(unsigned long val) { unsigned long bits_changed = 0; @@ -442,6 +442,10 @@ void __no_profile native_write_cr4(unsigned long val) bits_changed); } } +void native_write_cr4(unsigned long val) +{ + native_write_cr4_noinstr(val); +} #if IS_MODULE(CONFIG_LKDTM) EXPORT_SYMBOL_GPL(native_write_cr4); #endif diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 86593d1b787d8..973a4ab3f53b3 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -256,7 +256,7 @@ static void choose_new_asid(struct mm_struct *next, u64 next_tlb_gen, * * See SWITCH_TO_USER_CR3. */ -static inline void invalidate_user_asid(u16 asid) +static __always_inline void invalidate_user_asid(u16 asid) { /* There is no user ASID if address space separation is off */ if (!IS_ENABLED(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION)) @@ -1198,7 +1198,7 @@ STATIC_NOPV void native_flush_tlb_global(void) /* * Flush the entire current user mapping */ -STATIC_NOPV void native_flush_tlb_local(void) +static noinstr void native_flush_tlb_local_noinstr(void) { /* * Preemption or interrupts must be disabled to protect the access @@ -1213,6 +1213,11 @@ STATIC_NOPV void native_flush_tlb_local(void) native_write_cr3(__native_read_cr3()); } +STATIC_NOPV void native_flush_tlb_local(void) +{ + native_flush_tlb_local_noinstr(); +} + void flush_tlb_local(void) { __flush_tlb_local(); @@ -1240,6 +1245,23 @@ void __flush_tlb_all(void) } EXPORT_SYMBOL_GPL(__flush_tlb_all); +void noinstr __flush_tlb_all_noinstr(void) +{ + /* + * This is for invocation in early entry code that cannot be + * instrumented. A RMW to CR4 works for most cases, but relies on + * being able to flip either of the PGE or PCIDE bits. Flipping CR4.PCID + * would require also resetting CR3.PCID, so just try with CR4.PGE, else + * do the CR3 write. + * + * XXX: this gives paravirt the finger. + */ + if (cpu_feature_enabled(X86_FEATURE_PGE)) + __native_tlb_flush_global_noinstr(this_cpu_read(cpu_tlbstate.cr4)); + else + native_flush_tlb_local_noinstr(); +} + void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) { struct flush_tlb_info *info; diff --git a/include/linux/context_tracking_work.h b/include/linux/context_tracking_work.h index 13fc97b395030..47d5ced39a43a 100644 --- a/include/linux/context_tracking_work.h +++ b/include/linux/context_tracking_work.h @@ -6,11 +6,13 @@ enum { CONTEXT_WORK_SYNC_OFFSET, + CONTEXT_WORK_TLBI_OFFSET, CONTEXT_WORK_MAX_OFFSET }; enum ct_work { CONTEXT_WORK_SYNC = BIT(CONTEXT_WORK_SYNC_OFFSET), + CONTEXT_WORK_TLBI = BIT(CONTEXT_WORK_TLBI_OFFSET), CONTEXT_WORK_MAX = BIT(CONTEXT_WORK_MAX_OFFSET) }; From patchwork Tue Nov 19 15:35:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880195 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1ABEC1D1507 for ; Tue, 19 Nov 2024 15:39:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030785; cv=none; b=oxmTDUklVbcELsFl/wATdt3f1Abe5Wc8WCLUS4/7+Lvds6z66J7bhDlsDIcR9q+sAdaHcg6w7qSVDKHyq8tEYeFqhLue2Uwmqgpy0mFAB0mj3QW5pbtTWGGez1vmEF9s3yVgtaww5MjTAM4EG7uxvl1RHS83g3XYma5+xny6VlM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030785; c=relaxed/simple; bh=xDTiVCke+dcOcHJv8qISnvLe6s6Z2yfLt5JyeXbmVNY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pW9GpzLfdTC1qwEYOOh2zr5vhOcukFdt3ROU3UA3mJvN2/Vhnrue7XP++Wl5fFTLWlBvXpsrLnI5V53IThMqELqAyz91Xlm6VDRyy8OorCUUr0ajleHEmcelFyJon1WvgWc3//HIVtW0HCRrJXhVZjQUKiEMMdJOg8jzxlC7HFs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=JJeT2Tyl; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="JJeT2Tyl" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030783; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xImyrBZyvIgQkmg3c7kz+ISL+g7A1zTmv4nEmku3rgI=; b=JJeT2Tyl0AB0TWw6cslgZoBG+I/YkoALTGILYVjDpfm1MKG2XHNktYy+U8S1ecXs2huUhU Ty9YZZHrFPpxEpTNqz67bOVNf3ZIaN7drJheODbTJvNQQIq3egdmB6ZNBCHN1tJp6kiOpM u/BzbgaDmlOt0sqQZYQsTWYZtxuQ5Xk= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-615-Nxl_Cjx6PWO_riG5D1782Q-1; Tue, 19 Nov 2024 10:39:41 -0500 X-MC-Unique: Nxl_Cjx6PWO_riG5D1782Q-1 X-Mimecast-MFC-AGG-ID: Nxl_Cjx6PWO_riG5D1782Q Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E9AB019560A2; Tue, 19 Nov 2024 15:39:36 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 901D830001A0; Tue, 19 Nov 2024 15:39:22 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 14/15] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs Date: Tue, 19 Nov 2024 16:35:01 +0100 Message-ID: <20241119153502.41361-15-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 vunmap()'s issued from housekeeping CPUs are a relatively common source of interference for isolated NOHZ_FULL CPUs, as they are hit by the flush_tlb_kernel_range() IPIs. Given that CPUs executing in userspace do not access data in the vmalloc range, these IPIs could be deferred until their next kernel entry. Deferral vs early entry danger zone =================================== This requires a guarantee that nothing in the vmalloc range can be vunmap'd and then accessed in early entry code. Vmalloc uses are, as reported by vmallocinfo: $ cat /proc/vmallocinfo | awk '{ print $3 }' | sort | uniq __pci_enable_msix_range+0x32b/0x560 acpi_os_map_iomem+0x22d/0x250 bpf_prog_alloc_no_stats+0x34/0x140 fork_idle+0x79/0x120 gen_pool_add_owner+0x3e/0xb0 ? hpet_enable+0xbf/0x470 irq_init_percpu_irqstack+0x129/0x160 kernel_clone+0xab/0x3b0 memremap+0x164/0x330 n_tty_open+0x18/0xa0 pcpu_create_chunk+0x4e/0x1b0 pcpu_create_chunk+0x75/0x1b0 pcpu_get_vm_areas+0x0/0x1100 unpurged vp_modern_map_capability+0x140/0x270 zisofs_init+0x16/0x30 I've categorized these as: a) Device or percpu mappings For these to be unmapped, the device (or CPU) has to be removed and an eventual IRQ freed. Even if the IRQ isn't freed, context tracking entry happens before handling the IRQ itself, per irqentry_enter() usage. __pci_enable_msix_range() acpi_os_map_iomem() irq_init_percpu_irqstack() (not even unmapped when CPU is hot-unplugged!) memremap() n_tty_open() pcpu_create_chunk() pcpu_get_vm_areas() vp_modern_map_capability() b) CONFIG_VMAP_STACK fork_idle() & kernel_clone() vmalloc'd kernel stacks are AFAICT a safe example, as a task running in userspace needs to enter kernelspace to execute do_exit() before its stack can be vfree'd. c) Non-issues bpf_prog_alloc_no_stats() - early entry is noinstr, no BPF! hpet_enable() - hpet_clear_mapping() is only called if __init function fails, no runtime worries zisofs_init () - used for zisofs block decompression, that's way past context tracking entry d) I'm not sure, have a look? gen_pool_add_owner() - AIUI this is mainly for PCI / DMA stuff, which again I wouldn't expect to be accessed before context tracking entry. Changes ====== Blindly deferring any and all flush of the kernel mappings is a risky move, so introduce a variant of flush_tlb_kernel_range() that explicitly allows deferral. Use it for vunmap flushes. Note that while flush_tlb_kernel_range() may end up issuing a full flush (including user mappings), this only happens when reaching a invalidation range threshold where it is cheaper to do a full flush than to individually invalidate each page in the range via INVLPG. IOW, it doesn't *require* invalidating user mappings, and thus remains safe to defer until a later kernel entry. Signed-off-by: Valentin Schneider --- arch/x86/include/asm/tlbflush.h | 1 + arch/x86/mm/tlb.c | 23 +++++++++++++++++++--- mm/vmalloc.c | 35 ++++++++++++++++++++++++++++----- 3 files changed, 51 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index a653b5f47f0e6..d89345c85fa9c 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -249,6 +249,7 @@ extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start, unsigned long end, unsigned int stride_shift, bool freed_tables); extern void flush_tlb_kernel_range(unsigned long start, unsigned long end); +extern void flush_tlb_kernel_range_deferrable(unsigned long start, unsigned long end); static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a) { diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 973a4ab3f53b3..bf6ff16a1a523 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -1041,6 +1042,11 @@ static void do_flush_tlb_all(void *info) __flush_tlb_all(); } +static bool do_kernel_flush_defer_cond(int cpu, void *info) +{ + return !ct_set_cpu_work(cpu, CONTEXT_WORK_TLBI); +} + void flush_tlb_all(void) { count_vm_tlb_event(NR_TLB_REMOTE_FLUSH); @@ -1057,12 +1063,13 @@ static void do_kernel_range_flush(void *info) flush_tlb_one_kernel(addr); } -void flush_tlb_kernel_range(unsigned long start, unsigned long end) +static inline void +__flush_tlb_kernel_range(smp_cond_func_t cond_func, unsigned long start, unsigned long end) { /* Balance as user space task's flush, a bit conservative */ if (end == TLB_FLUSH_ALL || (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) { - on_each_cpu(do_flush_tlb_all, NULL, 1); + on_each_cpu_cond(cond_func, do_flush_tlb_all, NULL, 1); } else { struct flush_tlb_info *info; @@ -1070,13 +1077,23 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end) info = get_flush_tlb_info(NULL, start, end, 0, false, TLB_GENERATION_INVALID); - on_each_cpu(do_kernel_range_flush, info, 1); + on_each_cpu_cond(cond_func, do_kernel_range_flush, info, 1); put_flush_tlb_info(); preempt_enable(); } } +void flush_tlb_kernel_range(unsigned long start, unsigned long end) +{ + __flush_tlb_kernel_range(NULL, start, end); +} + +void flush_tlb_kernel_range_deferrable(unsigned long start, unsigned long end) +{ + __flush_tlb_kernel_range(do_kernel_flush_defer_cond, start, end); +} + /* * This can be used from process context to figure out what the value of * CR3 is without needing to do a (slow) __read_cr3(). diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 634162271c004..02838c515ce2c 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -467,6 +467,31 @@ void vunmap_range_noflush(unsigned long start, unsigned long end) __vunmap_range_noflush(start, end); } +#ifdef CONFIG_CONTEXT_TRACKING_WORK +/* + * !!! BIG FAT WARNING !!! + * + * The CPU is free to cache any part of the paging hierarchy it wants at any + * time. It's also free to set accessed and dirty bits at any time, even for + * instructions that may never execute architecturally. + * + * This means that deferring a TLB flush affecting freed page-table-pages (IOW, + * keeping them in a CPU's paging hierarchy cache) is akin to dancing in a + * minefield. + * + * This isn't a problem for deferral of TLB flushes in vmalloc, because + * page-table-pages used for vmap() mappings are never freed - see how + * __vunmap_range_noflush() walks the whole mapping but only clears the leaf PTEs. + * If this ever changes, TLB flush deferral will cause misery. + */ +void __weak flush_tlb_kernel_range_deferrable(unsigned long start, unsigned long end) +{ + flush_tlb_kernel_range(start, end); +} +#else +#define flush_tlb_kernel_range_deferrable(start, end) flush_tlb_kernel_range(start, end) +#endif + /** * vunmap_range - unmap kernel virtual addresses * @addr: start of the VM area to unmap @@ -480,7 +505,7 @@ void vunmap_range(unsigned long addr, unsigned long end) { flush_cache_vunmap(addr, end); vunmap_range_noflush(addr, end); - flush_tlb_kernel_range(addr, end); + flush_tlb_kernel_range_deferrable(addr, end); } static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, @@ -2265,7 +2290,7 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end, nr_purge_nodes = cpumask_weight(&purge_nodes); if (nr_purge_nodes > 0) { - flush_tlb_kernel_range(start, end); + flush_tlb_kernel_range_deferrable(start, end); /* One extra worker is per a lazy_max_pages() full set minus one. */ nr_purge_helpers = atomic_long_read(&vmap_lazy_nr) / lazy_max_pages(); @@ -2368,7 +2393,7 @@ static void free_unmap_vmap_area(struct vmap_area *va) flush_cache_vunmap(va->va_start, va->va_end); vunmap_range_noflush(va->va_start, va->va_end); if (debug_pagealloc_enabled_static()) - flush_tlb_kernel_range(va->va_start, va->va_end); + flush_tlb_kernel_range_deferrable(va->va_start, va->va_end); free_vmap_area_noflush(va); } @@ -2816,7 +2841,7 @@ static void vb_free(unsigned long addr, unsigned long size) vunmap_range_noflush(addr, addr + size); if (debug_pagealloc_enabled_static()) - flush_tlb_kernel_range(addr, addr + size); + flush_tlb_kernel_range_deferrable(addr, addr + size); spin_lock(&vb->lock); @@ -2881,7 +2906,7 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush) free_purged_blocks(&purge_list); if (!__purge_vmap_area_lazy(start, end, false) && flush) - flush_tlb_kernel_range(start, end); + flush_tlb_kernel_range_deferrable(start, end); mutex_unlock(&vmap_purge_lock); } From patchwork Tue Nov 19 15:35:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Valentin Schneider X-Patchwork-Id: 13880196 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 629821D1F73 for ; Tue, 19 Nov 2024 15:40:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030803; cv=none; b=RJrWlatYEB4BHJt8nhurwvCnILQwH+d5UkmeeHgjKLEi0DevK/ynUMqX+VD/6WR5lqN/NvjMvA4juhrKLPILwRdWuGNmRkonDWkIFA9rSCTkQH7XtmElhIruVaKQkHjShELhmeFLfHddtFmewkiuLsrNF5j97jkOc9cwj2Yc58c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732030803; c=relaxed/simple; bh=FDSvP1EhYqYkFNY54ZUxRHJZiVHOU5zN9wYGjv6JRq4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=D5qMhMpuygtyRGpZoeRUSTy1284cUbu1/qaAwduEvakgS6WDHTrJOzS5/gMUJVPJACjgS5yug4Y0LZi2MCkvBxHxksx5ikcdOaOCKzay7F705FNZ8LPI6xSTZbFOAm0X7j1XHYpDzEZ91XnsGcsMu+kT9b96qDhFahuc8PDE3fw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=OICJEgQh; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="OICJEgQh" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732030800; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=F1GnqQSFPr8VITMqjCT6H9sq2glpq77hQWm9Vj7AfgE=; b=OICJEgQhlqvRogHx92dhLfq7JCpGGCfyKI3RpuKIR+izFdYESAGX/yxLUD5U1ZLxF/Qn8w A5LXOPhn61tKr4vgY/9/WXMRuAZJ3caT5gtZIh9dSHzOh7kofZWyGP7UB12G8UOzfbsV6D 00OI6vt0P0V0mw8NqYdbdR5CuH1RHcY= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-207-92ypQAdDPJmP9uLfkEuaoA-1; Tue, 19 Nov 2024 10:39:57 -0500 X-MC-Unique: 92ypQAdDPJmP9uLfkEuaoA-1 X-Mimecast-MFC-AGG-ID: 92ypQAdDPJmP9uLfkEuaoA Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 15EC119541A0; Tue, 19 Nov 2024 15:39:52 +0000 (UTC) Received: from vschneid-thinkpadt14sgen2i.remote.csb (unknown [10.39.194.94]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 63E1730001A0; Tue, 19 Nov 2024 15:39:37 +0000 (UTC) From: Valentin Schneider To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, x86@kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Steven Rostedt , Masami Hiramatsu , Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Vitaly Kuznetsov , Andy Lutomirski , Peter Zijlstra , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , Josh Poimboeuf , Jason Baron , Kees Cook , Sami Tolvanen , Ard Biesheuvel , Nicholas Piggin , Juerg Haefliger , Nicolas Saenz Julienne , "Kirill A. Shutemov" , Nadav Amit , Dan Carpenter , Chuang Wang , Yang Jihong , Petr Mladek , "Jason A. Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , =?utf-8?q?Thomas_Wei=C3=9Fschuh?= , Juri Lelli , Marcelo Tosatti , Yair Podemsky , Daniel Wagner , Petr Tesarik Subject: [RFC PATCH v3 15/15] context-tracking: Add a Kconfig to enable IPI deferral for NO_HZ_IDLE Date: Tue, 19 Nov 2024 16:35:02 +0100 Message-ID: <20241119153502.41361-16-vschneid@redhat.com> In-Reply-To: <20241119153502.41361-1-vschneid@redhat.com> References: <20241119153502.41361-1-vschneid@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 With NO_HZ_IDLE, we get CONTEXT_TRACKING_IDLE, so we get these transitions: ct_idle_enter() ct_kernel_exit() ct_state_inc_clear_work() ct_idle_exit() ct_kernel_enter() ct_work_flush() With just CONTEXT_TRACKING_IDLE, ct_state_inc_clear_work() is just ct_state_inc() and ct_work_flush() is a no-op. However, making them be functional as if under CONTEXT_TRACKING_WORK would allow NO_HZ_IDLE to leverage IPI deferral to keep idle CPUs idle longer. Having this enabled for NO_HZ_IDLE is a different argument than for having it for NO_HZ_FULL (power savings vs latency/performance), but the backing mechanism is identical. Add a default-no option to enable IPI deferral with NO_HZ_IDLE. Signed-off-by: Valentin Schneider --- kernel/time/Kconfig | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig index 04efc2b605823..a3cb903a4e022 100644 --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig @@ -188,9 +188,23 @@ config CONTEXT_TRACKING_USER_FORCE config CONTEXT_TRACKING_WORK bool - depends on HAVE_CONTEXT_TRACKING_WORK && CONTEXT_TRACKING_USER + depends on HAVE_CONTEXT_TRACKING_WORK && (CONTEXT_TRACKING_USER || CONTEXT_TRACKING_WORK_IDLE) default y +config CONTEXT_TRACKING_WORK_IDLE + bool + depends on HAVE_CONTEXT_TRACKING_WORK && CONTEXT_TRACKING_IDLE && !CONTEXT_TRACKING_USER + default n + help + This option enables deferral of some IPIs when they are targeted at CPUs + that are idle. This can help keep CPUs idle longer, but induces some + extra overhead to idle <-> kernel transitions and to IPI sending. + + Say Y if the power improvements are worth more to you than the added + overheads. + + Say N otherwise. + config NO_HZ bool "Old Idle dynticks config" help