From patchwork Fri May 24 20:03:32 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrew Cooper
X-Patchwork-Id: 13673591
List-Id: Xen developer discussion
From: Andrew Cooper
To: Xen-devel
Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu,
    Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
    Bertrand Marquis, Michal Orzel, Oleksii Kurochko, Shawn Anastasio,
    "consulting @ bugseng . com", Simone Ballarin, Federico Serafini,
    Nicola Vetrini
Subject: [PATCH v2 07/13] x86/bitops: Improve arch_ffs() in the general case
Date: Fri, 24 May 2024 21:03:32 +0100
Message-Id: <20240524200338.1232391-8-andrew.cooper3@citrix.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20240524200338.1232391-1-andrew.cooper3@citrix.com>
References: <20240524200338.1232391-1-andrew.cooper3@citrix.com>

The asm in arch_ffs() is safe but inefficient.

CMOV would be an improvement over a conditional branch, but for 64bit CPUs
both Intel and AMD have provided enough details about the behaviour for a
zero input. It is safe to pre-load the destination register with -1 and
drop the conditional logic.

However, it is common to find ffs() in a context where the optimiser knows
that x is nonzero even if the value isn't known precisely, and in that case
it's safe to drop the preload of -1 too.

There are only a handful of uses of ffs() in the x86 build, and all of them
improve as a result of this:

  add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-31 (-31)
  Function                                     old     new   delta
  mask_write                                   114     107      -7
  xmem_pool_alloc                             1063    1039     -24

Signed-off-by: Andrew Cooper
Reviewed-by: Jan Beulich
---
CC: Jan Beulich
CC: Roger Pau Monné
CC: Wei Liu
CC: Stefano Stabellini
CC: Julien Grall
CC: Volodymyr Babchuk
CC: Bertrand Marquis
CC: Michal Orzel
CC: Oleksii Kurochko
CC: Shawn Anastasio
CC: consulting@bugseng.com
CC: Simone Ballarin
CC: Federico Serafini
CC: Nicola Vetrini

v2:
 * New.
 * Use __builtin_constant_p(x > 0) to optimise better.
---
 xen/arch/x86/include/asm/bitops.h | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index 122767fc0d10..1d7aea6065ef 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
 
 static always_inline unsigned int arch_ffs(unsigned int x)
 {
-    int r;
+    unsigned int r;
+
+    if ( __builtin_constant_p(x > 0) && x > 0 )
+    {
+        /* Safe, when the compiler knows that x is nonzero. */
+        asm ( "bsf %[val], %[res]"
+              : [res] "=r" (r)
+              : [val] "rm" (x) );
+    }
+    else
+    {
+        /*
+         * The AMD manual states that BSF won't modify the destination
+         * register if x=0. The Intel manual states that the result is
+         * undefined, but the architects have said that the register is
+         * written back with its old value (zero extended as normal).
+         */
+        asm ( "bsf %[val], %[res]"
+              : [res] "=r" (r)
+              : [val] "rm" (x), "[res]" (-1) );
+    }
 
-    asm ( "bsf %1,%0\n\t"
-          "jnz 1f\n\t"
-          "mov $-1,%0\n"
-          "1:" : "=r" (r) : "rm" (x));
     return r + 1;
 }
 #define arch_ffs arch_ffs
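
Not part of the patch: a minimal standalone sketch of the contract the new
arch_ffs() is expected to keep, namely a 1-based index of the lowest set bit
and 0 for a zero input. The names sketch_ffs(), reference_ffs() and
check_ffs() are hypothetical and do not exist in Xen. The asm path is only
built for x86-64 with a GCC-compatible compiler, and it relies on the same
BSF zero-input behaviour the patch comment describes; any other target falls
back to the portable loop.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Portable reference: 1-based index of the lowest set bit, 0 if x == 0. */
static unsigned int reference_ffs(unsigned int x)
{
    unsigned int i;

    for ( i = 0; i < sizeof(x) * CHAR_BIT; i++ )
        if ( x & (1u << i) )
            return i + 1;

    return 0;
}

/* Hypothetical stand-in mirroring the patched arch_ffs(); not Xen code. */
static inline unsigned int sketch_ffs(unsigned int x)
{
#if defined(__x86_64__)
    unsigned int r;

    if ( __builtin_constant_p(x > 0) && x > 0 )
    {
        /* Compiler has proved x != 0, so no preload is needed. */
        __asm__ ( "bsf %[val], %[res]"
                  : [res] "=r" (r)
                  : [val] "rm" (x) );
    }
    else
    {
        /*
         * Assumption taken from the patch: BSF leaves the destination
         * unchanged for a zero input, so the preloaded -1 survives and
         * r + 1 wraps around to 0.
         */
        __asm__ ( "bsf %[val], %[res]"
                  : [res] "=r" (r)
                  : [val] "rm" (x), "[res]" (-1) );
    }

    return r + 1;
#else
    return reference_ffs(x);
#endif
}

/* Compare the sketch against the reference on a few sample inputs. */
static void check_ffs(void)
{
    static const unsigned int samples[] = {
        0, 1, 2, 0x80, 0x00f0f000u, 0x80000000u, 0xffffffffu,
    };
    size_t i;

    for ( i = 0; i < sizeof(samples) / sizeof(samples[0]); i++ )
        assert(sketch_ffs(samples[i]) == reference_ffs(samples[i]));
}
```

Calling check_ffs() from a test harness exercises both the zero and nonzero
inputs. Whether the compiler picks the preload-free asm depends on what it
can prove about x at each inlined call site, which is the effect the
__builtin_constant_p(x > 0) test in the patch is after.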