From patchwork Wed Oct 4 03:25:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rik van Riel X-Patchwork-Id: 13408260 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E416E8FDC7 for ; Wed, 4 Oct 2023 03:28:39 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 36C848D0002; Tue, 3 Oct 2023 23:28:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0CB3B8D009A; Tue, 3 Oct 2023 23:28:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E36FD8D009C; Tue, 3 Oct 2023 23:28:36 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id C96C38D0002 for ; Tue, 3 Oct 2023 23:28:36 -0400 (EDT) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id A0A5C160569 for ; Wed, 4 Oct 2023 03:28:36 +0000 (UTC) X-FDA: 81306346632.18.401BBFD Received: from shelob.surriel.com (shelob.surriel.com [96.67.55.147]) by imf08.hostedemail.com (Postfix) with ESMTP id 093F7160011 for ; Wed, 4 Oct 2023 03:28:34 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=none; spf=none (imf08.hostedemail.com: domain of riel@shelob.surriel.com has no SPF policy when checking 96.67.55.147) smtp.mailfrom=riel@shelob.surriel.com; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1696390115; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DcHNSNGJCV+XZ3KxBK1lCzjQrk1JZIFGOOUAqNkTUsA=; b=zyNr4VPd6hjtjPbuozF07MkO6OpZgVj8Dk18vQBhWDIRcVb92JYn40WYLKLkuHFXsvzJq4 q1fuSjyUen0H5+Rk/GZmcyZIMoEI9/zr51IOrJGp/hUJlwR0zGIHNt7r0QzHo6Ix81rrza cq+9YGkMmC3r9yrp2KKJevcqApJOUHE= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1696390115; a=rsa-sha256; cv=none; b=m6QyVbvDyU5TCI7Cijd1LnMfyCIIZ4saOGy5AartafbcaEo15jcDQD7TDtFzQJOXUPsL5r vBMTTYr6iFXk8GkfGEf7Y3GmS85P9yRF8Y9fBmcRe/uH3GxyNqG6HqLguOpjihVdHCsRXj 8dkCuOrogv+vtAGzHS3Rja/twDHrlTc= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=none; spf=none (imf08.hostedemail.com: domain of riel@shelob.surriel.com has no SPF policy when checking 96.67.55.147) smtp.mailfrom=riel@shelob.surriel.com; dmarc=none Received: from imladris.home.surriel.com ([10.0.13.28] helo=imladris.surriel.com) by shelob.surriel.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1qnsYb-0005Bb-0n; Tue, 03 Oct 2023 23:28:17 -0400 From: riel@surriel.com To: linux-kernel@vger.kernel.org Cc: kernel-team@meta.com, linux-mm@kvack.org, akpm@linux-foundation.org, muchun.song@linux.dev, mike.kravetz@oracle.com, leit@meta.com, willy@infradead.org, Rik van Riel Subject: [PATCH 3/3] hugetlbfs: replace hugetlb_vma_lock with invalidate_lock Date: Tue, 3 Oct 2023 23:25:05 -0400 Message-ID: <20231004032814.3108383-4-riel@surriel.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231004032814.3108383-1-riel@surriel.com> References: <20231004032814.3108383-1-riel@surriel.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 093F7160011 X-Rspam-User: X-Stat-Signature: xh41krffua97yj4y4j6s1gudwq5wpqht X-Rspamd-Server: rspam03 X-HE-Tag: 1696390114-563723 X-HE-Meta: U2FsdGVkX1+bRVkeuXbkzUHuBXb6ULnwzZqc3WEcdmQPjglpjHMVerTAuxe0PfI3PevoizaJ2WdRdRC3YjReFFnKt/82tvL88xtbMx3JYJD8skSntfGLYix/s4UoT9sSVPwXSYine2+04gkaWiQtLW5JoxwTcVHuMN5Xy+B7i4V8h95pW8l2DLB33+YMffSB6I745oDS/4oP0lq5Apb0Utdz2NuDm8yAP0CIkGRaNWZ6GGixIE/YmNgNHdQHa03RE/YKBd4ix1HKXSmhhPLX9Kms4Bb3pBGBgGaU8dovmAAUEayHz1auQXYKnT2+K3+ATgaQ+2yNM6MjYWy8BeV3bSnea34au2CVES7KC1IXLH+/3QwiOgPpwtmFz19r1PsXrm6CO/b99xxRDOQK5AVSWgdW4o+Nu2l0iJxhNNzY5hTrla25QFosogMghP/ReJGK6CrDU/53BqBp2Ss5Mb1EEMpnzHQpPYb8jL3Pg+eEyjaAc2LXBlL9RlxXMIFNzREv5/qFMJGbKTUrtBAppxZ17pU3+OYIsyOilK3CSR1mT+E2pdqhyjh4H7+BVU5qxQv5ATfPJaZstq6O85P4uCS2BOTVK5atREI//ZAICJThgiEHTF+CryUUsPI/QCvJ0pdblwS9f1O/FQVAK38oSN7xvLAF7FWACbzKp7E0spiTN7ovAMcXv4KykdpC496aaZ/0HJNeP+1TiwHwxWFWWIp3yclzHwFUsjjuhQxmlPQMeOOiXFibKUgdLACosR6lM/78TfF/t32HnGjHhRc2iA9IAnTE8i3OUwsOdt8jxrKIqI3TEC87ntYbtCQ/a9BUXqSe7XlWogbtwtpJdAOq0tx1u8zYWdU+/XSyRR7X8uSjInPxPwEmcQ2wNyOrQhh5833kgwqfhO6YgBYKeLJQxiuZXoxmelVUnh+77bWr1PwO+a6YQ2VulJsCvWsefuL1WsqGv10fZOlwkCxdzjCI3nX xT8Vgx0l 01DbAepl3A8R/RrKXRQFo+huucnid1jkHPI+k5ll+aIKqYOPCqEBU3abipc/R6mRFo3pB9oCk7Pri1Y+76vARx1xnh/2IcrhoKgsKFAGgxv8c94CuadKJ1ikFq1xJJbeZMJy14MMzTLvff73RrT2jTiIFVo9zeqtAZUiWJjhSbiEAfocyR5eLXsrHgDjPP3rbPJEQPSCd4b0cBPa8nnmOA3hKv7Qr5cV8QIBok1ePSB8kDgOpdlfPB1A/5zl11RUOpjb1cTQSS1+M7gaE005lfs2tbxxeV5ZJ+IzeI9t/wAQJ1QI3cBDB+vn7/TE80XkAhCdFA2iqfivZzk5E22vQLS3aVL104rVhY9xT4hRm5GXAFfgTsbnWMJ1/qUUaDRGiZoIyoGw3MY2rB6Y= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Rik van Riel Replace the custom hugetlbfs VMA locking code with the recently introduced invalidate_lock. This greatly simplifies things. However, this is a large enough change that it should probably go in separately from the other changes. Another question is whether this simplification hurts scalability for certain workloads. Suggested-by: Matthew Wilcox Signed-off-by: Rik van Riel --- fs/hugetlbfs/inode.c | 70 ++---------- include/linux/fs.h | 6 + include/linux/hugetlb.h | 21 +--- mm/hugetlb.c | 237 ++++------------------------------------ 4 files changed, 35 insertions(+), 299 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 316c4cebd3f3..d1aed4590b3d 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -485,7 +485,6 @@ static void hugetlb_unmap_file_folio(struct hstate *h, struct folio *folio, pgoff_t index) { struct rb_root_cached *root = &mapping->i_mmap; - struct hugetlb_vma_lock *vma_lock; struct page *page = &folio->page; struct vm_area_struct *vma; unsigned long v_start; @@ -495,9 +494,9 @@ static void hugetlb_unmap_file_folio(struct hstate *h, start = index * pages_per_huge_page(h); end = (index + 1) * pages_per_huge_page(h); + filemap_invalidate_lock(mapping); i_mmap_lock_write(mapping); -retry: - vma_lock = NULL; + vma_interval_tree_foreach(vma, root, start, end - 1) { v_start = vma_offset_start(vma, start); v_end = vma_offset_end(vma, end); @@ -505,62 +504,13 @@ static void hugetlb_unmap_file_folio(struct hstate *h, if (!hugetlb_vma_maps_page(vma, v_start, page)) continue; - if (!hugetlb_vma_trylock_write(vma)) { - vma_lock = vma->vm_private_data; - /* - * If we can not get vma lock, we need to drop - * immap_sema and take locks in order. First, - * take a ref on the vma_lock structure so that - * we can be guaranteed it will not go away when - * dropping immap_sema. - */ - kref_get(&vma_lock->refs); - break; - } - unmap_hugepage_range(vma, v_start, v_end, NULL, ZAP_FLAG_DROP_MARKER); hugetlb_vma_unlock_write(vma); } + filemap_invalidate_unlock(mapping); i_mmap_unlock_write(mapping); - - if (vma_lock) { - /* - * Wait on vma_lock. We know it is still valid as we have - * a reference. We must 'open code' vma locking as we do - * not know if vma_lock is still attached to vma. - */ - down_write(&vma_lock->rw_sema); - i_mmap_lock_write(mapping); - - vma = vma_lock->vma; - if (!vma) { - /* - * If lock is no longer attached to vma, then just - * unlock, drop our reference and retry looking for - * other vmas. - */ - up_write(&vma_lock->rw_sema); - kref_put(&vma_lock->refs, hugetlb_vma_lock_release); - goto retry; - } - - /* - * vma_lock is still attached to vma. Check to see if vma - * still maps page and if so, unmap. - */ - v_start = vma_offset_start(vma, start); - v_end = vma_offset_end(vma, end); - if (hugetlb_vma_maps_page(vma, v_start, page)) - unmap_hugepage_range(vma, v_start, v_end, NULL, - ZAP_FLAG_DROP_MARKER); - - kref_put(&vma_lock->refs, hugetlb_vma_lock_release); - hugetlb_vma_unlock_write(vma); - - goto retry; - } } static void @@ -578,20 +528,10 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, unsigned long v_start; unsigned long v_end; - if (!hugetlb_vma_trylock_write(vma)) - continue; - v_start = vma_offset_start(vma, start); v_end = vma_offset_end(vma, end); unmap_hugepage_range(vma, v_start, v_end, NULL, zap_flags); - - /* - * Note that vma lock only exists for shared/non-private - * vmas. Therefore, lock is not held when calling - * unmap_hugepage_range for private vmas. - */ - hugetlb_vma_unlock_write(vma); } } @@ -725,10 +665,12 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset) pgoff = offset >> PAGE_SHIFT; i_size_write(inode, offset); + filemap_invalidate_lock(mapping); i_mmap_lock_write(mapping); if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)) hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0, ZAP_FLAG_DROP_MARKER); + filemap_invalidate_unlock(mapping); i_mmap_unlock_write(mapping); remove_inode_hugepages(inode, offset, LLONG_MAX); } @@ -778,6 +720,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) return -EPERM; } + filemap_invalidate_lock(mapping); i_mmap_lock_write(mapping); /* If range starts before first full page, zero partial page. */ @@ -799,6 +742,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) hole_end, offset + len); i_mmap_unlock_write(mapping); + filemap_invalidate_unlock(mapping); /* Remove full pages from the file. */ if (hole_end > hole_start) diff --git a/include/linux/fs.h b/include/linux/fs.h index 4aeb3fa11927..b455a8913db4 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -847,6 +847,12 @@ static inline void filemap_invalidate_lock(struct address_space *mapping) down_write(&mapping->invalidate_lock); } +static inline int filemap_invalidate_trylock( + struct address_space *mapping) +{ + return down_write_trylock(&mapping->invalidate_lock); +} + static inline void filemap_invalidate_unlock(struct address_space *mapping) { up_write(&mapping->invalidate_lock); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index d9ec500cfef9..2908c47e7bf2 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -60,7 +60,6 @@ struct resv_map { long adds_in_progress; struct list_head region_cache; long region_cache_count; - struct rw_semaphore rw_sema; #ifdef CONFIG_CGROUP_HUGETLB /* * On private mappings, the counter to uncharge reservations is stored @@ -107,12 +106,6 @@ struct file_region { #endif }; -struct hugetlb_vma_lock { - struct kref refs; - struct rw_semaphore rw_sema; - struct vm_area_struct *vma; -}; - extern struct resv_map *resv_map_alloc(void); void resv_map_release(struct kref *ref); @@ -1277,17 +1270,9 @@ hugetlb_walk(struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { #if defined(CONFIG_HUGETLB_PAGE) && \ defined(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && defined(CONFIG_LOCKDEP) - struct hugetlb_vma_lock *vma_lock = vma->vm_private_data; - - /* - * If pmd sharing possible, locking needed to safely walk the - * hugetlb pgtables. More information can be found at the comment - * above huge_pte_offset() in the same file. - * - * NOTE: lockdep_is_held() is only defined with CONFIG_LOCKDEP. - */ - if (__vma_shareable_lock(vma)) - WARN_ON_ONCE(!lockdep_is_held(&vma_lock->rw_sema) && + if (vma->vm_file) + WARN_ON_ONCE(!lockdep_is_held( + &vma->vm_file->f_mapping->invalidate_lock) && !lockdep_is_held( &vma->vm_file->f_mapping->i_mmap_rwsem)); #endif diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 424bb8da9519..a942ea3579f8 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -92,9 +92,6 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp; /* Forward declaration */ static int hugetlb_acct_memory(struct hstate *h, long delta); -static void hugetlb_vma_lock_free(struct vm_area_struct *vma); -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma); -static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma); static void hugetlb_unshare_pmds(struct vm_area_struct *vma, unsigned long start, unsigned long end); static struct resv_map *vma_resv_map(struct vm_area_struct *vma); @@ -264,170 +261,41 @@ static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma) */ void hugetlb_vma_lock_read(struct vm_area_struct *vma) { - if (__vma_shareable_lock(vma)) { - struct hugetlb_vma_lock *vma_lock = vma->vm_private_data; - - down_read(&vma_lock->rw_sema); - } else if (__vma_private_lock(vma)) { - struct resv_map *resv_map = vma_resv_map(vma); - - down_read(&resv_map->rw_sema); - } + if (vma->vm_file) + filemap_invalidate_lock_shared(vma->vm_file->f_mapping); } void hugetlb_vma_unlock_read(struct vm_area_struct *vma) { - if (__vma_shareable_lock(vma)) { - struct hugetlb_vma_lock *vma_lock = vma->vm_private_data; - - up_read(&vma_lock->rw_sema); - } else if (__vma_private_lock(vma)) { - struct resv_map *resv_map = vma_resv_map(vma); - - up_read(&resv_map->rw_sema); - } + if (vma->vm_file) + filemap_invalidate_unlock_shared(vma->vm_file->f_mapping); } void hugetlb_vma_lock_write(struct vm_area_struct *vma) { - if (__vma_shareable_lock(vma)) { - struct hugetlb_vma_lock *vma_lock = vma->vm_private_data; - - down_write(&vma_lock->rw_sema); - } else if (__vma_private_lock(vma)) { - struct resv_map *resv_map = vma_resv_map(vma); - - down_write(&resv_map->rw_sema); - } + if (vma->vm_file) + filemap_invalidate_lock(vma->vm_file->f_mapping); } void hugetlb_vma_unlock_write(struct vm_area_struct *vma) { - if (__vma_shareable_lock(vma)) { - struct hugetlb_vma_lock *vma_lock = vma->vm_private_data; - - up_write(&vma_lock->rw_sema); - } else if (__vma_private_lock(vma)) { - struct resv_map *resv_map = vma_resv_map(vma); - - up_write(&resv_map->rw_sema); - } + if (vma->vm_file) + filemap_invalidate_unlock(vma->vm_file->f_mapping); } int hugetlb_vma_trylock_write(struct vm_area_struct *vma) { - if (__vma_shareable_lock(vma)) { - struct hugetlb_vma_lock *vma_lock = vma->vm_private_data; - - return down_write_trylock(&vma_lock->rw_sema); - } else if (__vma_private_lock(vma)) { - struct resv_map *resv_map = vma_resv_map(vma); - - return down_write_trylock(&resv_map->rw_sema); - } + if (vma->vm_file) + return filemap_invalidate_trylock(vma->vm_file->f_mapping); return 1; } void hugetlb_vma_assert_locked(struct vm_area_struct *vma) { - if (__vma_shareable_lock(vma)) { - struct hugetlb_vma_lock *vma_lock = vma->vm_private_data; - - lockdep_assert_held(&vma_lock->rw_sema); - } else if (__vma_private_lock(vma)) { - struct resv_map *resv_map = vma_resv_map(vma); - - lockdep_assert_held(&resv_map->rw_sema); - } -} - -void hugetlb_vma_lock_release(struct kref *kref) -{ - struct hugetlb_vma_lock *vma_lock = container_of(kref, - struct hugetlb_vma_lock, refs); - - kfree(vma_lock); -} - -static void __hugetlb_vma_unlock_write_put(struct hugetlb_vma_lock *vma_lock) -{ - struct vm_area_struct *vma = vma_lock->vma; - - /* - * vma_lock structure may or not be released as a result of put, - * it certainly will no longer be attached to vma so clear pointer. - * Semaphore synchronizes access to vma_lock->vma field. - */ - vma_lock->vma = NULL; - vma->vm_private_data = NULL; - up_write(&vma_lock->rw_sema); - kref_put(&vma_lock->refs, hugetlb_vma_lock_release); -} - -static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma) -{ - if (__vma_shareable_lock(vma)) { - struct hugetlb_vma_lock *vma_lock = vma->vm_private_data; - - __hugetlb_vma_unlock_write_put(vma_lock); - } else if (__vma_private_lock(vma)) { - struct resv_map *resv_map = vma_resv_map(vma); - - /* no free for anon vmas, but still need to unlock */ - up_write(&resv_map->rw_sema); - } -} - -static void hugetlb_vma_lock_free(struct vm_area_struct *vma) -{ - /* - * Only present in sharable vmas. - */ - if (!vma || !__vma_shareable_lock(vma)) - return; - - if (vma->vm_private_data) { - struct hugetlb_vma_lock *vma_lock = vma->vm_private_data; - - down_write(&vma_lock->rw_sema); - __hugetlb_vma_unlock_write_put(vma_lock); - } -} - -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma) -{ - struct hugetlb_vma_lock *vma_lock; - - /* Only establish in (flags) sharable vmas */ - if (!vma || !(vma->vm_flags & VM_MAYSHARE)) - return; - - /* Should never get here with non-NULL vm_private_data */ - if (vma->vm_private_data) - return; - - vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL); - if (!vma_lock) { - /* - * If we can not allocate structure, then vma can not - * participate in pmd sharing. This is only a possible - * performance enhancement and memory saving issue. - * However, the lock is also used to synchronize page - * faults with truncation. If the lock is not present, - * unlikely races could leave pages in a file past i_size - * until the file is removed. Warn in the unlikely case of - * allocation failure. - */ - pr_warn_once("HugeTLB: unable to allocate vma specific lock\n"); - return; - } - - kref_init(&vma_lock->refs); - init_rwsem(&vma_lock->rw_sema); - vma_lock->vma = vma; - vma->vm_private_data = vma_lock; + if (vma->vm_file) + lockdep_assert_held(&vma->vm_file->f_mapping->invalidate_lock); } /* Helper that removes a struct file_region from the resv_map cache and returns @@ -1100,7 +968,6 @@ struct resv_map *resv_map_alloc(void) kref_init(&resv_map->refs); spin_lock_init(&resv_map->lock); INIT_LIST_HEAD(&resv_map->regions); - init_rwsem(&resv_map->rw_sema); resv_map->adds_in_progress = 0; /* @@ -1195,22 +1062,11 @@ void hugetlb_dup_vma_private(struct vm_area_struct *vma) VM_BUG_ON_VMA(!is_vm_hugetlb_page(vma), vma); /* * Clear vm_private_data - * - For shared mappings this is a per-vma semaphore that may be - * allocated in a subsequent call to hugetlb_vm_op_open. - * Before clearing, make sure pointer is not associated with vma - * as this will leak the structure. This is the case when called - * via clear_vma_resv_huge_pages() and hugetlb_vm_op_open has already - * been called to allocate a new structure. * - For MAP_PRIVATE mappings, this is the reserve map which does * not apply to children. Faults generated by the children are * not guaranteed to succeed, even if read-only. */ - if (vma->vm_flags & VM_MAYSHARE) { - struct hugetlb_vma_lock *vma_lock = vma->vm_private_data; - - if (vma_lock && vma_lock->vma != vma) - vma->vm_private_data = NULL; - } else + if (!(vma->vm_flags & VM_MAYSHARE)) vma->vm_private_data = NULL; } @@ -4846,25 +4702,6 @@ static void hugetlb_vm_op_open(struct vm_area_struct *vma) resv_map_dup_hugetlb_cgroup_uncharge_info(resv); kref_get(&resv->refs); } - - /* - * vma_lock structure for sharable mappings is vma specific. - * Clear old pointer (if copied via vm_area_dup) and allocate - * new structure. Before clearing, make sure vma_lock is not - * for this vma. - */ - if (vma->vm_flags & VM_MAYSHARE) { - struct hugetlb_vma_lock *vma_lock = vma->vm_private_data; - - if (vma_lock) { - if (vma_lock->vma != vma) { - vma->vm_private_data = NULL; - hugetlb_vma_lock_alloc(vma); - } else - pr_warn("HugeTLB: vma_lock already exists in %s.\n", __func__); - } else - hugetlb_vma_lock_alloc(vma); - } } static void hugetlb_vm_op_close(struct vm_area_struct *vma) @@ -4875,8 +4712,6 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma) unsigned long reserve, start, end; long gbl_reserve; - hugetlb_vma_lock_free(vma); - resv = vma_resv_map(vma); if (!resv || !is_vma_resv_set(vma, HPAGE_RESV_OWNER)) return; @@ -5048,16 +4883,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, mmu_notifier_invalidate_range_start(&range); vma_assert_write_locked(src_vma); raw_write_seqcount_begin(&src->write_protect_seq); - } else { - /* - * For shared mappings the vma lock must be held before - * calling hugetlb_walk() in the src vma. Otherwise, the - * returned ptep could go away if part of a shared pmd and - * another thread calls huge_pmd_unshare. - */ - hugetlb_vma_lock_read(src_vma); } + hugetlb_vma_lock_read(src_vma); + last_addr_mask = hugetlb_mask_last_page(h); for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) { spinlock_t *src_ptl, *dst_ptl; @@ -5209,10 +5038,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, if (cow) { raw_write_seqcount_end(&src->write_protect_seq); mmu_notifier_invalidate_range_end(&range); - } else { - hugetlb_vma_unlock_read(src_vma); } + hugetlb_vma_unlock_read(src_vma); + return ret; } @@ -5447,25 +5276,9 @@ void __hugetlb_zap_begin(struct vm_area_struct *vma, void __hugetlb_zap_end(struct vm_area_struct *vma, struct zap_details *details) { - zap_flags_t zap_flags = details ? details->zap_flags : 0; - - if (zap_flags & ZAP_FLAG_UNMAP) { /* final unmap */ - /* - * Unlock and free the vma lock before releasing i_mmap_rwsem. - * When the vma_lock is freed, this makes the vma ineligible - * for pmd sharing. And, i_mmap_rwsem is required to set up - * pmd sharing. This is important as page tables for this - * unmapped range will be asynchrously deleted. If the page - * tables are shared, there will be issues when accessed by - * someone else. - */ - __hugetlb_vma_unlock_write_free(vma); - } else { - hugetlb_vma_unlock_write(vma); - } - if (vma->vm_file) i_mmap_unlock_write(vma->vm_file->f_mapping); + hugetlb_vma_unlock_write(vma); } void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, @@ -6708,12 +6521,6 @@ bool hugetlb_reserve_pages(struct inode *inode, return false; } - /* - * vma specific semaphore used for pmd sharing and fault/truncation - * synchronization - */ - hugetlb_vma_lock_alloc(vma); - /* * Only apply hugepage reservation if asked. At fault time, an * attempt will be made for VM_NORESERVE to allocate a page @@ -6836,7 +6643,6 @@ bool hugetlb_reserve_pages(struct inode *inode, hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h), chg * pages_per_huge_page(h), h_cg); out_err: - hugetlb_vma_lock_free(vma); if (!vma || vma->vm_flags & VM_MAYSHARE) /* Only call region_abort if the region_chg succeeded but the * region_add failed or didn't run. @@ -6906,13 +6712,10 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma, /* * match the virtual addresses, permission and the alignment of the * page table page. - * - * Also, vma_lock (vm_private_data) is required for sharing. */ if (pmd_index(addr) != pmd_index(saddr) || vm_flags != svm_flags || - !range_in_vma(svma, sbase, s_end) || - !svma->vm_private_data) + !range_in_vma(svma, sbase, s_end)) return 0; return saddr; @@ -6932,8 +6735,6 @@ bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr) */ if (!(vma->vm_flags & VM_MAYSHARE)) return false; - if (!vma->vm_private_data) /* vma lock required for sharing */ - return false; if (!range_in_vma(vma, start, end)) return false; return true;