From patchwork Wed Oct 16 04:35:59 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: lizhe.67@bytedance.com
X-Patchwork-Id: 13837714
From: lizhe.67@bytedance.com
To: peterz@infradead.org, mingo@redhat.com, will@kernel.org,
 longman@redhat.com, boqun.feng@gmail.com, akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 lizhe.67@bytedance.com
Subject: [RFC 1/2] rwsem: introduce upgrade_read interface
Date: Wed, 16 Oct 2024 12:35:59 +0800
Message-ID: <20241016043600.35139-2-lizhe.67@bytedance.com>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20241016043600.35139-1-lizhe.67@bytedance.com>
References: <20241016043600.35139-1-lizhe.67@bytedance.com>

From: Li Zhe <lizhe.67@bytedance.com>

Introduce a new rwsem interface, upgrade_read(), which upgrades a held
read lock to a write lock. The interface waits for all other readers to
exit before obtaining the write lock.
In addition, the upgrading task takes priority over any task already
waiting for the write lock and over any task that subsequently tries to
obtain the read lock.

upgrade_read() returns 0 on success. It returns -EBUSY, leaving the read
lock held, if the writer bit or the upgrade flag is already set in the
count.

Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
 include/linux/rwsem.h  |  1 +
 kernel/locking/rwsem.c | 87 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index c8b543d428b0..90183ab5ea79 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -249,6 +249,7 @@ DEFINE_GUARD_COND(rwsem_write, _try, down_write_trylock(_T))
  * downgrade write lock to read lock
  */
 extern void downgrade_write(struct rw_semaphore *sem);
+extern int upgrade_read(struct rw_semaphore *sem);
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 /*
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 2bbb6eca5144..0583e1be3dbf 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -37,6 +37,7 @@
  * meanings when set.
  *  - Bit 0: RWSEM_READER_OWNED - rwsem may be owned by readers (just a hint)
  *  - Bit 1: RWSEM_NONSPINNABLE - Cannot spin on a reader-owned lock
+ *  - Bit 2: RWSEM_UPGRADING - doing upgrade read process
  *
  * When the rwsem is reader-owned and a spinning writer has timed out,
  * the nonspinnable bit will be set to disable optimistic spinning.
@@ -62,7 +63,8 @@
  */
 #define RWSEM_READER_OWNED	(1UL << 0)
 #define RWSEM_NONSPINNABLE	(1UL << 1)
-#define RWSEM_OWNER_FLAGS_MASK	(RWSEM_READER_OWNED | RWSEM_NONSPINNABLE)
+#define RWSEM_UPGRADING		(1UL << 2)
+#define RWSEM_OWNER_FLAGS_MASK	(RWSEM_READER_OWNED | RWSEM_NONSPINNABLE | RWSEM_UPGRADING)
 
 #ifdef CONFIG_DEBUG_RWSEMS
 # define DEBUG_RWSEMS_WARN_ON(c, sem)	do {			\
@@ -93,7 +95,8 @@
  * Bit  0    - writer locked bit
  * Bit  1    - waiters present bit
  * Bit  2    - lock handoff bit
- * Bits 3-7  - reserved
+ * Bit  3    - upgrade read bit
+ * Bits 4-7  - reserved
  * Bits 8-30 - 23-bit reader count
  * Bit  31   - read fail bit
  *
@@ -117,6 +120,7 @@
 #define RWSEM_WRITER_LOCKED	(1UL << 0)
 #define RWSEM_FLAG_WAITERS	(1UL << 1)
 #define RWSEM_FLAG_HANDOFF	(1UL << 2)
+#define RWSEM_FLAG_UPGRADE_READ	(1UL << 3)
 #define RWSEM_FLAG_READFAIL	(1UL << (BITS_PER_LONG - 1))
 
 #define RWSEM_READER_SHIFT	8
@@ -143,6 +147,13 @@ static inline void rwsem_set_owner(struct rw_semaphore *sem)
 	atomic_long_set(&sem->owner, (long)current);
 }
 
+static inline void rwsem_set_owner_upgrade(struct rw_semaphore *sem)
+{
+	lockdep_assert_preemption_disabled();
+	atomic_long_set(&sem->owner, (long)current | RWSEM_UPGRADING |
+			RWSEM_READER_OWNED | RWSEM_NONSPINNABLE);
+}
+
 static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 {
 	lockdep_assert_preemption_disabled();
@@ -201,7 +212,7 @@ static inline bool is_rwsem_reader_owned(struct rw_semaphore *sem)
 	 */
 	long count = atomic_long_read(&sem->count);
 
-	if (count & RWSEM_WRITER_MASK)
+	if ((count & RWSEM_WRITER_MASK) && !(count & RWSEM_FLAG_UPGRADE_READ))
 		return false;
 	return rwsem_test_oflags(sem, RWSEM_READER_OWNED);
 }
@@ -1336,6 +1347,8 @@ static inline int __down_write_trylock(struct rw_semaphore *sem)
 static inline void __up_read(struct rw_semaphore *sem)
 {
 	long tmp;
+	unsigned long flags;
+	struct task_struct *owner;
 
 	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);
 	DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
@@ -1349,6 +1362,9 @@ static inline void __up_read(struct rw_semaphore *sem)
 		clear_nonspinnable(sem);
 		rwsem_wake(sem);
 	}
+	owner = rwsem_owner_flags(sem, &flags);
+	if (unlikely(!(tmp & RWSEM_READER_MASK) && (flags & RWSEM_UPGRADING)))
+		wake_up_process(owner);
 	preempt_enable();
 }
 
@@ -1641,6 +1657,71 @@ void downgrade_write(struct rw_semaphore *sem)
 }
 EXPORT_SYMBOL(downgrade_write);
 
+static inline void rwsem_clear_upgrade_flag(struct rw_semaphore *sem)
+{
+	atomic_long_andnot(RWSEM_FLAG_UPGRADE_READ, &sem->count);
+}
+
+/*
+ * upgrade read lock to write lock
+ */
+static inline int __upgrade_read(struct rw_semaphore *sem)
+{
+	long tmp;
+
+	preempt_disable();
+
+	tmp = atomic_long_read(&sem->count);
+	do {
+		if (tmp & (RWSEM_WRITER_MASK | RWSEM_FLAG_UPGRADE_READ)) {
+			preempt_enable();
+			return -EBUSY;
+		}
+	} while (!atomic_long_try_cmpxchg(&sem->count, &tmp,
+		tmp + RWSEM_FLAG_UPGRADE_READ + RWSEM_WRITER_LOCKED - RWSEM_READER_BIAS));
+
+	if ((tmp & RWSEM_READER_MASK) == RWSEM_READER_BIAS) {
+		/* fast path */
+		DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);
+		rwsem_clear_upgrade_flag(sem);
+		rwsem_set_owner(sem);
+		preempt_enable();
+		return 0;
+	}
+	/* slow path */
+	raw_spin_lock_irq(&sem->wait_lock);
+	rwsem_set_owner_upgrade(sem);
+
+	set_current_state(TASK_UNINTERRUPTIBLE);
+
+	for (;;) {
+		if (!(atomic_long_read(&sem->count) & RWSEM_READER_MASK))
+			break;
+		raw_spin_unlock_irq(&sem->wait_lock);
+		schedule_preempt_disabled();
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		raw_spin_lock_irq(&sem->wait_lock);
+	}
+
+	rwsem_clear_upgrade_flag(sem);
+	rwsem_set_owner(sem);
+	__set_current_state(TASK_RUNNING);
+	raw_spin_unlock_irq(&sem->wait_lock);
+	preempt_enable();
+	return 0;
+}
+
+/*
+ * upgrade read lock to write lock
+ *
+ * Return: 0 on success, error code on failure
+ */
+int upgrade_read(struct rw_semaphore *sem)
+{
+	return __upgrade_read(sem);
+}
+EXPORT_SYMBOL(upgrade_read);
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 
 void down_read_nested(struct rw_semaphore *sem, int subclass)

From patchwork Wed Oct 16 04:36:00 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: lizhe.67@bytedance.com
X-Patchwork-Id: 13837715
From: lizhe.67@bytedance.com
To: peterz@infradead.org, mingo@redhat.com, will@kernel.org,
 longman@redhat.com, boqun.feng@gmail.com, akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 lizhe.67@bytedance.com
Subject: [RFC 2/2] khugepaged: use upgrade_read() to optimize
 collapse_huge_page
Date: Wed, 16 Oct 2024 12:36:00 +0800
Message-ID: <20241016043600.35139-3-lizhe.67@bytedance.com>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20241016043600.35139-1-lizhe.67@bytedance.com>
References: <20241016043600.35139-1-lizhe.67@bytedance.com>

From: Li Zhe <lizhe.67@bytedance.com>

collapse_huge_page() drops the mmap read lock and then takes the mmap
write lock to prevent most accesses to the page tables. Between the two
calls there is a small window in which other tasks can acquire the mmap
lock, so the vma and pmd have to be revalidated once the write lock is
held. With upgrade_read(), that revalidation can be skipped in most
cases.

Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
 mm/khugepaged.c | 36 +++++++++++++++++++-----------------
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f9c39898eaff..934051274f7a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1142,23 +1142,25 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		goto out_nolock;
 	}
 
-	mmap_read_unlock(mm);
-	/*
-	 * Prevent all access to pagetables with the exception of
-	 * gup_fast later handled by the ptep_clear_flush and the VM
-	 * handled by the anon_vma lock + PG_lock.
-	 *
-	 * UFFDIO_MOVE is prevented to race as well thanks to the
-	 * mmap_lock.
-	 */
-	mmap_write_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, true, &vma, cc);
-	if (result != SCAN_SUCCEED)
-		goto out_up_write;
-	/* check if the pmd is still valid */
-	result = check_pmd_still_valid(mm, address, pmd);
-	if (result != SCAN_SUCCEED)
-		goto out_up_write;
+	if (upgrade_read(&mm->mmap_lock)) {
+		mmap_read_unlock(mm);
+		/*
+		 * Prevent all access to pagetables with the exception of
+		 * gup_fast later handled by the ptep_clear_flush and the VM
+		 * handled by the anon_vma lock + PG_lock.
+		 *
+		 * UFFDIO_MOVE is prevented to race as well thanks to the
+		 * mmap_lock.
+		 */
+		mmap_write_lock(mm);
+		result = hugepage_vma_revalidate(mm, address, true, &vma, cc);
+		if (result != SCAN_SUCCEED)
+			goto out_up_write;
+		/* check if the pmd is still valid */
+		result = check_pmd_still_valid(mm, address, pmd);
+		if (result != SCAN_SUCCEED)
+			goto out_up_write;
+	}
 
 	vma_start_write(vma);
 	anon_vma_lock_write(vma->anon_vma);