From patchwork Thu Jun 13 00:12:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mateusz Guzik X-Patchwork-Id: 13695696 Received: from mail-lf1-f44.google.com (mail-lf1-f44.google.com [209.85.167.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 46B1D15CB; Thu, 13 Jun 2024 00:12:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718237554; cv=none; b=rEjK1q1EBkvAjoeQD1Lbg244LSPTEkprU1i/UsdLPjyUapdVHKqBsL0vWLTnX5x5i2PuYBWq01OfSe6vzvuALfGCpkDDq82ICkRxyDuCxPEAJoBAvVPdCxi0xFolaOgle1V9jI12HtXMfkzoRUEVnB7xBvLbpEfUw3J9M66ja4o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718237554; c=relaxed/simple; bh=8EUyhfj+zpqg4Fce77r2+ZXQqEGm3vNZ/nYjDp2Nkc0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=U57g4pSxf925xCiFteMQkZ60Mr+j83Q6cbO4UiCGFW4xNeHQxw9popR0lMIa0aLibHw4cf89Qiaf4/CGPWo2CyRcvGoo6JEn3HPS6Dv8DCF7W74funWOw+z/T/wp1tqfQ15vj41GRdySpMtCC8CEqA4qBHIvSEs0QwwJ/70wadQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=DJp9VgaE; arc=none smtp.client-ip=209.85.167.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="DJp9VgaE" Received: by mail-lf1-f44.google.com with SMTP id 2adb3069b0e04-52bc335e49aso558904e87.3; Wed, 12 Jun 2024 17:12:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1718237549; x=1718842349; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=oXABC3gILXNUweUcHa6z6RMjkl0AXhA15Mutin2/Bpc=; b=DJp9VgaEUCVmwAuDgVdyCna+LtZ+JrlYoUXYW/xSKPW4+BVNLf1cH/YSRuRRpJG+BS 1oCHXvnysEnhM9w9YTVvX0wdOK8gN3qTsxVwBggEMruDdMFXfzTLrweB173GaKncJhb/ /BCT4YY4RrJrKqviPmIO6iyir5SCTW07s2BAVFcT+Gix9o/LXdnQPFwWIlukN2k0plpR mZ6tS7y5bXmscBWGTXqoN0YgMM09sbPfe9417thSUfmnBJTZ5m1o2A3UiGoVQ8J2MxcX 15Fwj/6nzhKyAsSjIns9cg/bQs3fC0S0pTLO0KD94yuAIoBl9r/gbN7xFdsdDzJM8xnh 8zRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718237549; x=1718842349; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oXABC3gILXNUweUcHa6z6RMjkl0AXhA15Mutin2/Bpc=; b=SVQEzLaXL7Np4nqANjvZ4MdUs2Tu/vUHaoEnXssG/E8ne1FMR8Kb4oUYKIClEmkRuA /8vrnrUfc6xnJ2QJBubf1Y9cYJDAHIGhAqdE50D4kpUm5V05iSXEAtyS94JgsYsykqZq ZXP0nTK5TJRdyBtxwKzFj7cnqAYS5IE9hZ5fzJagfPE7y7kmOge1zwFePYUjsKmARg4/ cFt9MCfYcyIJhATbLN0INbBrKUlsSC/CHUnbipO7hXU2veA6ereFdSEH1AYlDUJJ1agE 6IjrOP4lPhcSm3+Kr05x5PZYgAS04B5xmPEjuS499PrlzvryyEMlddr7xiHd5tNawXgL Q/nQ== X-Forwarded-Encrypted: i=1; AJvYcCWfT9y035cb5LWkFHFswqfmkNzRYrIXCH/PyG4W/QqxGXnXBtjBCEP1FsYryDvPJxFqhCgw7gew2bU+OYuIWoW7nMpJ+a2hoTjRq3KSkHdrqCZrlos2sz+LwoNFxAZHQ86AeFNZZmjHWm+IJQ== X-Gm-Message-State: AOJu0YwnhKa9hpOEHjjvBK0LmMDlUm5H2TyJqUS9AtDUKKHhpMQMfhHB V/VTVb1YWUSFV6TE0eJt5ujivFah+4FAyHiFuplqi+MhiEkEtzxb X-Google-Smtp-Source: AGHT+IGjPBG5pMhwkwGYp05epSPLIMXAMD1I3C5uel4NRtBgDRj93E7KJ2rVGTs5w8txmElp3SaDCg== X-Received: by 2002:a05:6512:2f7:b0:52c:88b8:12ab with SMTP id 2adb3069b0e04-52c9a3c7362mr1662292e87.23.1718237549121; Wed, 12 Jun 2024 17:12:29 -0700 (PDT) Received: from f.. (cst-prg-65-249.cust.vodafone.cz. [46.135.65.249]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-360750f2489sm145114f8f.69.2024.06.12.17.12.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 12 Jun 2024 17:12:28 -0700 (PDT) From: Mateusz Guzik To: torvalds@linux-foundation.org Cc: brauner@kernel.org, viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Mateusz Guzik Subject: [PATCH 1/2] lockref: speculatively spin waiting for the lock to be released Date: Thu, 13 Jun 2024 02:12:14 +0200 Message-ID: <20240613001215.648829-2-mjguzik@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240613001215.648829-1-mjguzik@gmail.com> References: <20240613001215.648829-1-mjguzik@gmail.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The usual inc/dec scheme is to try to do it with atomics if the lock is not taken and fallback to locked operation otherwise. If the lock is only transiently held routines could instead speculatively wait and avoid the acquire altogether. This has a minor benefit of slightly speeding up cases of the sort, which do happen in real workloads. More importantly this fixes a corner case where lockref consumers degrade to locked operation and are unable to go back to atomics. Consider a continuous stream of lockref ops from several threads and another thread taking the lock for a brief moment. Any lockref user which manages to observe the lock as taken is going to fallback to locking it itself, but this increases the likelihood that more threads executing the code will do the same. This eventually can culminate in nobody being able to go back to atomics on the problematic lockref. While this is not a state which can persist in a real workload for anything but few calls, it very much can permanently degrade when benchmarking and in consequence grossly disfigure results. Here is an example of will-it-scale doing 20-way stat(2) on the same file residing on tmpfs, reports once per second with number of ops completed: [snip several ok results] min:417297 max:426249 total:8401766 <-- expected performance min:219024 max:221764 total:4398413 <-- some locking started min:62855 max:64949 total:1273712 <-- everyone degraded min:62472 max:64873 total:1268733 min:62179 max:63843 total:1256375 [snip remaining bad results] While I did not try to figure out who transiently took the lock (it was something outside of the benchmark), I devised a trivial reproducer which triggers the problem almost every time: merely issue "ls" of the directory containing the tested file (in this case: "ls /tmp"). The problem does not persist with the fix below. Signed-off-by: Mateusz Guzik --- lib/lockref.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 85 insertions(+) diff --git a/lib/lockref.c b/lib/lockref.c index 2afe4c5d8919..596b521bc1f1 100644 --- a/lib/lockref.c +++ b/lib/lockref.c @@ -26,10 +26,46 @@ } \ } while (0) +/* + * Most routines only ever inc/dec the reference, but the lock may be + * transiently held forcing them to take it as well. + * + * Should the lock be taken for any reason (including outside of lockref), + * a steady stream of ref/unref requests may find itself unable to go back + * to lockless operation. + * + * Combat the problem by giving the routines a way to speculatively wait in + * hopes of avoiding having to take the lock. + * + * The spin count is limited to guarantee forward progress, although the + * value is arbitrarily chosen. + * + * Note this routine is only used if the lock was found to be taken. + */ +static inline bool lockref_trywait_unlocked(struct lockref *lockref) +{ + struct lockref old; + int retry = 100; + + for (;;) { + cpu_relax(); + old.lock_count = READ_ONCE(lockref->lock_count); + if (arch_spin_value_unlocked(old.lock.rlock.raw_lock)) + return true; + if (!--retry) + return false; + } +} + #else #define CMPXCHG_LOOP(CODE, SUCCESS) do { } while (0) +static inline bool lockref_trywait_unlocked(struct lockref *lockref) +{ + return false; +} + #endif /** @@ -47,6 +83,14 @@ void lockref_get(struct lockref *lockref) return; ); + if (lockref_trywait_unlocked(lockref)) { + CMPXCHG_LOOP( + new.count++; + , + return; + ); + } + spin_lock(&lockref->lock); lockref->count++; spin_unlock(&lockref->lock); @@ -70,6 +114,16 @@ int lockref_get_not_zero(struct lockref *lockref) return 1; ); + if (lockref_trywait_unlocked(lockref)) { + CMPXCHG_LOOP( + new.count++; + if (old.count <= 0) + return 0; + , + return 1; + ); + } + spin_lock(&lockref->lock); retval = 0; if (lockref->count > 0) { @@ -98,6 +152,16 @@ int lockref_put_not_zero(struct lockref *lockref) return 1; ); + if (lockref_trywait_unlocked(lockref)) { + CMPXCHG_LOOP( + new.count--; + if (old.count <= 1) + return 0; + , + return 1; + ); + } + spin_lock(&lockref->lock); retval = 0; if (lockref->count > 1) { @@ -125,6 +189,17 @@ int lockref_put_return(struct lockref *lockref) , return new.count; ); + + if (lockref_trywait_unlocked(lockref)) { + CMPXCHG_LOOP( + new.count--; + if (old.count <= 0) + return -1; + , + return new.count; + ); + } + return -1; } EXPORT_SYMBOL(lockref_put_return); @@ -181,6 +256,16 @@ int lockref_get_not_dead(struct lockref *lockref) return 1; ); + if (lockref_trywait_unlocked(lockref)) { + CMPXCHG_LOOP( + new.count++; + if (old.count < 0) + return 0; + , + return 1; + ); + } + spin_lock(&lockref->lock); retval = 0; if (lockref->count >= 0) { From patchwork Thu Jun 13 00:12:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mateusz Guzik X-Patchwork-Id: 13695697 Received: from mail-wr1-f44.google.com (mail-wr1-f44.google.com [209.85.221.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B3C52F30; Thu, 13 Jun 2024 00:12:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718237556; cv=none; b=QROAfvEkOT1WZgW+F53++ebcs0YhIK7VaEI20cNwPyPcXfO34qWUiUnN5NRyBZ7WJwKD1sb39ZaZ/WmkvMEoQ42UFo+enpU3bscyozQznwehoSah8fUaiVhNbSwar1bM/k0vi03Qur/TSH7nJ+WJhEkdD4hFFkktOKjzkepuRuU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718237556; c=relaxed/simple; bh=4CN0BABw9P45+dIm5MM8+Z9BGuyxpCqM9kUy0ysbb6w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=e+oT7Ytn7hwhyefLv/TpScCiuZkh2RqpGIGSh57tKijvMqskVT2VPwF5HkAoFArzU6n+WZopq7GwJzlWB39CXrQhqFbAJiyztgTmiqyL7Wc7ElVhBiFjSM/o0DLd7W2RpoMebQ26Px1m7fvWr3Ov8bDRpKHDVniV7DnE9zIOhRE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=nF3hm3Lx; arc=none smtp.client-ip=209.85.221.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="nF3hm3Lx" Received: by mail-wr1-f44.google.com with SMTP id ffacd0b85a97d-35f24fabb81so458043f8f.3; Wed, 12 Jun 2024 17:12:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1718237553; x=1718842353; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=skJsGTfQA/atlXrCIf+eUs5ghWrbbjUT30HKSoC+DWs=; b=nF3hm3LxSxoGBHe/AmZlRSjhYXKvn/ywUys3ny+yNNOIzvmjNF+VRVZdYv6lrQ48ba U7pnqs0AlTVF5P3XCC0CZSrXA9dQKRkHeWdPC2Zetkbx28K7nXpZb9Ez9QGhzhhzikQe qzOL01R83/CiKsauLWL3yX7ptIyNSDgtV8Kb7rW9gee584GSYyo3r/fkhX0jBxfZYn2r q0+6U0HrqnaRr46P7QAcRMBJzl0VDEli6SdGPli4neYRZSJmTlcZ9+JFNEyD31zhC3fj XSrmwH4v5GWhSLDjOj9OIdVUqH0S+8HDURHh5OkhqbPHMgz6Te2Zu2Zon53DaA9EAjEs 18jg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718237553; x=1718842353; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=skJsGTfQA/atlXrCIf+eUs5ghWrbbjUT30HKSoC+DWs=; b=j/SQsuYKvJDXYIgmGgAlqPGjUz1M/S0oz8KAAB5Wx2DUbQ1O3Dbv7e7ZhN3PyolNcM G4MHvNNX7msyX+/mD/1yO+807cnS2HZ3003Gu4mWWXBMscSCz6OFPFBsVm0QsJ1ly8X0 HAs5YTmYKUnJ3kVHcy/OWDUeRK0KMoh+c1SnT/ga74GIahPS95zzcFZ6aSjdSJ4MM+cW Nix/V20QV43hdJ7oGOAmpmEC+o6PwAhe7FSSjzXyppkPPlnOWbX9kl3tK4dtvF6QLsmn dG/Q6toNIB+ShZlGZNbj9Bp76DBQMAK4zL7WKK1eFJtkt7c62RmyV71jwgO5+pDbHRWP e/Pg== X-Forwarded-Encrypted: i=1; AJvYcCWguGp7RSQihVUi4KuiT2efIWy2GH7ffli5jm2j7vT/uzsweCLCEFFnH2INz4epJY1FBve6EiMjbjhV+rbqqPN1XUmd+p1sxQ/WaWXykZCRkHwlHm/helV7yiIx8wyKiy1ORB7H+KM1F2L3yg== X-Gm-Message-State: AOJu0YwKtOTdpOgJFj8t6ZKPTUXUcXlkZAlcEQLQGliKhFewC8L31OHf 56oIqJwdkjUxMWHmbSXE3zHaLCnC9xaLAV5lgXyihv8keHWJNj6bD/gfFg== X-Google-Smtp-Source: AGHT+IFmdvPhq1t2HVQXAKgkUAxLvn/tCNhQ3cEx2SPcDqi3MH5JN7s2+xM+NatUFQjLGS7UgRfuAQ== X-Received: by 2002:a5d:4d01:0:b0:35f:1c34:adfc with SMTP id ffacd0b85a97d-35fe8937f33mr2232162f8f.67.1718237552999; Wed, 12 Jun 2024 17:12:32 -0700 (PDT) Received: from f.. (cst-prg-65-249.cust.vodafone.cz. [46.135.65.249]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-360750f2489sm145114f8f.69.2024.06.12.17.12.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 12 Jun 2024 17:12:32 -0700 (PDT) From: Mateusz Guzik To: torvalds@linux-foundation.org Cc: brauner@kernel.org, viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Mateusz Guzik Subject: [PATCH 2/2] vfs: move d_lockref out of the area used by RCU lookup Date: Thu, 13 Jun 2024 02:12:15 +0200 Message-ID: <20240613001215.648829-3-mjguzik@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240613001215.648829-1-mjguzik@gmail.com> References: <20240613001215.648829-1-mjguzik@gmail.com> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Stock kernel scales worse than FreeBSD when doing a 20-way stat(2) on the same tmpfs-backed file. According to perf top: 38.09% lockref_put_return 26.08% lockref_get_not_dead 25.60% __d_lookup_rcu 0.89% clear_bhb_loop __d_lookup_rcu is participating in cacheline ping pong due to the embedded name sharing a cacheline with lockref. Moving it out resolves the problem: 41.50% lockref_put_return 41.03% lockref_get_not_dead 1.54% clear_bhb_loop benchmark (will-it-scale, Sapphire Rapids, tmpfs, ops/s): FreeBSD:7219334 before: 5038006 after: 7842883 (+55%) One minor remark: the 'after' result is unstable, fluctuating in the range ~7.8 mln to ~9 mln during different runs. Signed-off-by: Mateusz Guzik --- include/linux/dcache.h | 7 ++++++- lib/lockref.c | 2 +- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/include/linux/dcache.h b/include/linux/dcache.h index bf53e3894aae..326dbccc3736 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -89,13 +89,18 @@ struct dentry { struct inode *d_inode; /* Where the name belongs to - NULL is * negative */ unsigned char d_iname[DNAME_INLINE_LEN]; /* small names */ + /* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */ /* Ref lookup also touches following */ - struct lockref d_lockref; /* per-dentry lock and refcount */ const struct dentry_operations *d_op; struct super_block *d_sb; /* The root of the dentry tree */ unsigned long d_time; /* used by d_revalidate */ void *d_fsdata; /* fs-specific data */ + /* --- cacheline 2 boundary (128 bytes) --- */ + struct lockref d_lockref; /* per-dentry lock and refcount + * keep separate from RCU lookup area if + * possible! + */ union { struct list_head d_lru; /* LRU list */ diff --git a/lib/lockref.c b/lib/lockref.c index 596b521bc1f1..c1e2736a7bac 100644 --- a/lib/lockref.c +++ b/lib/lockref.c @@ -45,7 +45,7 @@ static inline bool lockref_trywait_unlocked(struct lockref *lockref) { struct lockref old; - int retry = 100; + int retry = 256; for (;;) { cpu_relax();