From patchwork Wed Jul 3 14:33:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Ma, Yu"
X-Patchwork-Id: 13722350
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
 mjguzik@gmail.com, edumazet@google.com
Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com,
 tim.c.chen@intel.com, tim.c.chen@linux.intel.com
Subject: [PATCH v3 1/3] fs/file.c: remove sanity_check and add
 likely/unlikely in alloc_fd()
Date: Wed, 3 Jul 2024 10:33:09 -0400
Message-ID: <20240703143311.2184454-2-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240703143311.2184454-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com>
 <20240703143311.2184454-1-yu.ma@intel.com>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

alloc_fd() has a sanity check to make sure that the struct file mapped to
the allocated fd is NULL. Remove this sanity check, since the NULL state is
already guaranteed by the existing zero initialization and by setting the
slot to NULL when an fd is recycled. Meanwhile, add likely/unlikely
annotations and avoid calling expand_files() when no expansion is needed,
to reduce the work done under file_lock.

Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
---
 fs/file.c | 38 ++++++++++++++++----------------------
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index a3b72aa64f11..5178b246e54b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -515,28 +515,29 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
 	if (fd < files->next_fd)
 		fd = files->next_fd;
 
-	if (fd < fdt->max_fds)
+	if (likely(fd < fdt->max_fds))
 		fd = find_next_fd(fdt, fd);
 
+	error = -EMFILE;
+	if (unlikely(fd >= fdt->max_fds)) {
+		error = expand_files(files, fd);
+		if (error < 0)
+			goto out;
+		/*
+		 * If we needed to expand the fs array we
+		 * might have blocked - try again.
+		 */
+		if (error)
+			goto repeat;
+	}
+
 	/*
 	 * N.B. For clone tasks sharing a files structure, this test
 	 * will limit the total number of files that can be opened.
 	 */
-	error = -EMFILE;
-	if (fd >= end)
-		goto out;
-
-	error = expand_files(files, fd);
-	if (error < 0)
+	if (unlikely(fd >= end))
 		goto out;
 
-	/*
-	 * If we needed to expand the fs array we
-	 * might have blocked - try again.
-	 */
-	if (error)
-		goto repeat;
-
 	if (start <= files->next_fd)
 		files->next_fd = fd + 1;
 
@@ -546,13 +547,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
 	else
 		__clear_close_on_exec(fd, fdt);
 	error = fd;
-#if 1
-	/* Sanity check */
-	if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
-		printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
-		rcu_assign_pointer(fdt->fd[fd], NULL);
-	}
-#endif
 
 out:
 	spin_unlock(&files->file_lock);
@@ -618,7 +612,7 @@ void fd_install(unsigned int fd, struct file *file)
 		rcu_read_unlock_sched();
 		spin_lock(&files->file_lock);
 		fdt = files_fdtable(files);
-		BUG_ON(fdt->fd[fd] != NULL);
+		WARN_ON(fdt->fd[fd] != NULL);
 		rcu_assign_pointer(fdt->fd[fd], file);
 		spin_unlock(&files->file_lock);
 		return;

From patchwork Wed Jul 3 14:33:10 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Ma, Yu"
X-Patchwork-Id: 13722351
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
 mjguzik@gmail.com, edumazet@google.com
Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com,
 tim.c.chen@intel.com, tim.c.chen@linux.intel.com
Subject: [PATCH v3 2/3] fs/file.c: conditionally clear full_fds
Date: Wed, 3 Jul 2024 10:33:10 -0400
Message-ID: <20240703143311.2184454-3-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240703143311.2184454-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com>
 <20240703143311.2184454-1-yu.ma@intel.com>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

Each group of 64 bits in open_fds maps to a single bit in full_fds_bits.
By the time __clear_open_fd() runs, that bit in full_fds_bits has very
likely been cleared already. Test the bit in full_fds_bits before clearing
it, to avoid an unnecessary write and the resulting cache bouncing.

See commit fc90888d07b8 ("vfs: conditionally clear close-on-exec flag")
for a similar optimization.

With the stock kernel plus patch 1 as the baseline, this improves
pts/blogbench-1.1.0 read by 13% and write by 5% on an Intel ICX 160-core
configuration with v6.10-rc6.

Reviewed-by: Jan Kara
Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
---
 fs/file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/file.c b/fs/file.c
index 5178b246e54b..a15317db3119 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -268,7 +268,9 @@ static inline void __set_open_fd(unsigned int fd, struct fdtable *fdt)
 static inline void __clear_open_fd(unsigned int fd, struct fdtable *fdt)
 {
 	__clear_bit(fd, fdt->open_fds);
-	__clear_bit(fd / BITS_PER_LONG, fdt->full_fds_bits);
+	fd /= BITS_PER_LONG;
+	if (test_bit(fd, fdt->full_fds_bits))
+		__clear_bit(fd, fdt->full_fds_bits);
 }
 
 static inline bool fd_is_open(unsigned int fd, const struct fdtable *fdt)

From patchwork Wed Jul 3 14:33:11 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Ma, Yu"
X-Patchwork-Id: 13722352
From: Yu Ma
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
 mjguzik@gmail.com, edumazet@google.com
Cc: yu.ma@intel.com, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, pan.deng@intel.com, tianyou.li@intel.com,
 tim.c.chen@intel.com, tim.c.chen@linux.intel.com
Subject: [PATCH v3 3/3] fs/file.c: add fast path in find_next_fd()
Date: Wed, 3 Jul 2024 10:33:11 -0400
Message-ID: <20240703143311.2184454-4-yu.ma@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240703143311.2184454-1-yu.ma@intel.com>
References: <20240614163416.728752-1-yu.ma@intel.com>
 <20240703143311.2184454-1-yu.ma@intel.com>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

In most cases there is an available fd in the lower 64 bits of the
open_fds bitmap when we look for a free fd slot. For this common fast
path, skip the two-level search by calling find_next_zero_bit() directly.

Look directly for a free bit in the lower 64 bits of the open_fds bitmap
when a free slot is available there, because:
(1) The fd allocation algorithm always allocates fds from small to large,
    so lower bits in the open_fds bitmap are used much more frequently
    than higher bits.
(2) After the fdt is expanded (the bitmap size doubles on each
    expansion), it is never shrunk. The search size increases, but there
    are few open fds available there.
(3) find_next_zero_bit() has its own fast path for size <= 64, which
    speeds up the search.

As suggested by Mateusz Guzik and Jan Kara, move the fast path from
alloc_fd() to find_next_fd(). With this change, on top of patches 1 and 2,
pts/blogbench-1.1.0 read is improved by 13% and write by 7% on an Intel
ICX 160-core configuration with v6.10-rc6.

Reviewed-by: Tim Chen
Signed-off-by: Yu Ma
Reviewed-by: Jan Kara
---
 fs/file.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/file.c b/fs/file.c
index a15317db3119..f25eca311f51 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -488,6 +488,11 @@ struct files_struct init_files = {
 
 static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
 {
+	unsigned int bit;
+	bit = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, start);
+	if (bit < BITS_PER_LONG)
+		return bit;
+
 	unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */
 	unsigned int maxbit = maxfd / BITS_PER_LONG;
 	unsigned int bitbit = start / BITS_PER_LONG;
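
Illustration (not part of the series): the two ideas in patches 2 and 3
can be tried out in userspace with a toy two-level bitmap. This is only a
rough sketch under stated assumptions, not the kernel code. Every name
here (toy_fdtable, the toy_* helpers, WORD_BITS, MAX_FDS) is hypothetical,
the table size is fixed, and there is no locking or RCU.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define MAX_FDS   (4 * WORD_BITS)	/* hypothetical fixed table size */

/* Toy stand-in for struct fdtable: one bit per fd, plus one summary
 * bit per word of open_fds, set while that word is completely full. */
struct toy_fdtable {
	unsigned long open_fds[MAX_FDS / WORD_BITS];
	unsigned long full_fds_bits;	/* 4 summary bits fit in one word */
};

static bool toy_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / WORD_BITS] >> (nr % WORD_BITS)) & 1UL;
}

static void toy_set_bit(unsigned int nr, unsigned long *map)
{
	map[nr / WORD_BITS] |= 1UL << (nr % WORD_BITS);
}

static void toy_clear_bit(unsigned int nr, unsigned long *map)
{
	map[nr / WORD_BITS] &= ~(1UL << (nr % WORD_BITS));
}

static void toy_set_open_fd(unsigned int fd, struct toy_fdtable *t)
{
	toy_set_bit(fd, t->open_fds);
	fd /= WORD_BITS;
	if (!~t->open_fds[fd])		/* word is now all ones */
		toy_set_bit(fd, &t->full_fds_bits);
}

/* Patch 2 idea: write the summary bit only if it is actually set, so
 * the common case reads the summary word without dirtying it. */
static void toy_clear_open_fd(unsigned int fd, struct toy_fdtable *t)
{
	toy_clear_bit(fd, t->open_fds);
	fd /= WORD_BITS;
	if (toy_test_bit(fd, &t->full_fds_bits))
		toy_clear_bit(fd, &t->full_fds_bits);
}

/* Lowest zero bit in word at or above start, or WORD_BITS if none. */
static unsigned int toy_next_zero(unsigned long word, unsigned int start)
{
	for (unsigned int b = start; b < WORD_BITS; b++)
		if (!((word >> b) & 1UL))
			return b;
	return WORD_BITS;
}

/* Patch 3 idea: try the first word directly, then fall back to the
 * two-level search. Returns MAX_FDS when no free slot exists. */
static unsigned int toy_find_next_fd(const struct toy_fdtable *t,
				     unsigned int start)
{
	unsigned int first = start / WORD_BITS;
	unsigned int bit, word;

	if (start < WORD_BITS) {
		bit = toy_next_zero(t->open_fds[0], start);
		if (bit < WORD_BITS)
			return bit;
	}
	for (word = first; word < MAX_FDS / WORD_BITS; word++) {
		if (toy_test_bit(word, &t->full_fds_bits))
			continue;	/* summary bit says: no free slot */
		bit = toy_next_zero(t->open_fds[word],
				    word == first ? start % WORD_BITS : 0);
		if (bit < WORD_BITS)
			return word * WORD_BITS + bit;
	}
	return MAX_FDS;
}
```

In this toy model, filling the first word sets its summary bit, the first
toy_clear_open_fd() on that word clears it, and later clears on the same
word only read the summary word. That read-only common case is what the
conditional clear in patch 2 aims for.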