From patchwork Fri Oct  6 03:59:07 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 13410942
From: riel@surriel.com
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, akpm@linux-foundation.org,
 muchun.song@linux.dev, mike.kravetz@oracle.com, leit@meta.com,
 willy@infradead.org, Rik van Riel, stable@kernel.org
Subject: [PATCH 2/4] hugetlbfs: extend hugetlb_vma_lock to private VMAs
Date: Thu, 5 Oct 2023 23:59:07 -0400
Message-ID: <20231006040020.3677377-3-riel@surriel.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231006040020.3677377-1-riel@surriel.com>
References: <20231006040020.3677377-1-riel@surriel.com>

From: Rik van Riel

Extend the locking scheme used to protect shared hugetlb mappings
from truncate vs page fault races, in order to protect private
hugetlb mappings (with resv_map) against MADV_DONTNEED.

Add a read-write semaphore to the resv_map data structure, and
use that from the hugetlb_vma_(un)lock_* functions, in preparation
for closing the race between MADV_DONTNEED and page faults.

Signed-off-by: Rik van Riel
Reviewed-by: Mike Kravetz
Cc: stable@kernel.org
Fixes: 04ada095dcfc ("hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing")
---
 include/linux/hugetlb.h |  6 ++++++
 mm/hugetlb.c            | 41 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 5b2626063f4f..694928fa06a3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -60,6 +60,7 @@ struct resv_map {
 	long adds_in_progress;
 	struct list_head region_cache;
 	long region_cache_count;
+	struct rw_semaphore rw_sema;
 #ifdef CONFIG_CGROUP_HUGETLB
 	/*
 	 * On private mappings, the counter to uncharge reservations is stored
@@ -1231,6 +1232,11 @@ static inline bool __vma_shareable_lock(struct vm_area_struct *vma)
 	return (vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
 }
 
+static inline bool __vma_private_lock(struct vm_area_struct *vma)
+{
+	return (!(vma->vm_flags & VM_MAYSHARE)) && vma->vm_private_data;
+}
+
 /*
  * Safe version of huge_pte_offset() to check the locks. See comments
  * above huge_pte_offset().
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a86e070d735b..dd3de6ec8f1a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -97,6 +97,7 @@ static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
 static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
 static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
 		unsigned long start, unsigned long end);
+static struct resv_map *vma_resv_map(struct vm_area_struct *vma);
 
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
@@ -267,6 +268,10 @@ void hugetlb_vma_lock_read(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		down_read(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		down_read(&resv_map->rw_sema);
 	}
 }
 
@@ -276,6 +281,10 @@ void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		up_read(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		up_read(&resv_map->rw_sema);
 	}
 }
 
@@ -285,6 +294,10 @@ void hugetlb_vma_lock_write(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		down_write(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		down_write(&resv_map->rw_sema);
 	}
 }
 
@@ -294,17 +307,27 @@ void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		up_write(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		up_write(&resv_map->rw_sema);
 	}
 }
 
 int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
 {
-	struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
-	if (!__vma_shareable_lock(vma))
-		return 1;
+	if (__vma_shareable_lock(vma)) {
+		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
-	return down_write_trylock(&vma_lock->rw_sema);
+		return down_write_trylock(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		return down_write_trylock(&resv_map->rw_sema);
+	}
+
+	return 1;
 }
 
 void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
@@ -313,6 +336,10 @@ void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		lockdep_assert_held(&vma_lock->rw_sema);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		lockdep_assert_held(&resv_map->rw_sema);
 	}
 }
 
@@ -345,6 +372,11 @@ static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
 
 		__hugetlb_vma_unlock_write_put(vma_lock);
+	} else if (__vma_private_lock(vma)) {
+		struct resv_map *resv_map = vma_resv_map(vma);
+
+		/* no free for anon vmas, but still need to unlock */
+		up_write(&resv_map->rw_sema);
 	}
 }
 
@@ -1068,6 +1100,7 @@ struct resv_map *resv_map_alloc(void)
 	kref_init(&resv_map->refs);
 	spin_lock_init(&resv_map->lock);
 	INIT_LIST_HEAD(&resv_map->regions);
+	init_rwsem(&resv_map->rw_sema);
 
 	resv_map->adds_in_progress = 0;
 	/*
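
For readers following the dispatch in the hugetlb_vma_(un)lock_* helpers, here is a minimal userspace sketch of the selection logic. It is not kernel code. Every toy_* name and the TOY_VM_MAYSHARE value are hypothetical. The semaphore is a single-threaded counter that never blocks. The sketch also assumes vm_private_data points directly at the lock-carrying object, whereas the kernel goes through vma_resv_map() for private mappings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct rw_semaphore: tracks holders, never blocks. */
struct toy_rwsem {
	int readers;
	bool writer;
};

#define TOY_VM_MAYSHARE 0x1UL	/* hypothetical value, not the kernel's */

/* Shared mappings: private data is a per-VMA lock object. */
struct toy_vma_lock {
	struct toy_rwsem rw_sema;
};

/* Private mappings: the semaphore lives in the reservation map. */
struct toy_resv_map {
	struct toy_rwsem rw_sema;
};

struct toy_vma {
	unsigned long vm_flags;
	void *vm_private_data;
};

static bool toy_shareable_lock(const struct toy_vma *vma)
{
	return (vma->vm_flags & TOY_VM_MAYSHARE) && vma->vm_private_data;
}

static bool toy_private_lock(const struct toy_vma *vma)
{
	return !(vma->vm_flags & TOY_VM_MAYSHARE) && vma->vm_private_data;
}

/* Select the semaphore by mapping type; NULL means nothing to lock. */
static struct toy_rwsem *toy_vma_rwsem(struct toy_vma *vma)
{
	if (toy_shareable_lock(vma))
		return &((struct toy_vma_lock *)vma->vm_private_data)->rw_sema;
	if (toy_private_lock(vma))
		return &((struct toy_resv_map *)vma->vm_private_data)->rw_sema;
	return NULL;
}

/* Returns 1 on success (including the lock-less case), 0 if already held. */
static int toy_vma_trylock_write(struct toy_vma *vma)
{
	struct toy_rwsem *sem = toy_vma_rwsem(vma);

	if (!sem)
		return 1;
	if (sem->writer || sem->readers > 0)
		return 0;
	sem->writer = true;
	return 1;
}

static void toy_vma_unlock_write(struct toy_vma *vma)
{
	struct toy_rwsem *sem = toy_vma_rwsem(vma);

	if (sem) {
		assert(sem->writer);	/* caller must hold the write lock */
		sem->writer = false;
	}
}
```

As in the patched hugetlb_vma_trylock_write(), a VMA with no private data falls through and reports success. A private VMA now contends on the semaphore embedded in its reservation map.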