From patchwork Sun Dec 1 01:56:49 2019
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 11268433
Date: Sat, 30 Nov 2019 17:56:49 -0800
From: akpm@linux-foundation.org
To: akpm@linux-foundation.org, dave@stgolabs.net, linux-mm@kvack.org, longman@redhat.com,
 mike.kravetz@oracle.com, mingo@redhat.com, mm-commits@vger.kernel.org, peterz@infradead.org, torvalds@linux-foundation.org, will.deacon@arm.com, willy@infradead.org
Subject: [patch 128/158] hugetlbfs: take read_lock on i_mmap for PMD sharing
Message-ID: <20191201015649.nrcYKXX9H%akpm@linux-foundation.org>

From: Waiman Long
Subject: hugetlbfs: take read_lock on i_mmap for PMD sharing

A customer running large SMP systems (up to 16 sockets) with an
application that uses a large amount of static hugepages (~500-1500GB)
is experiencing random multisecond delays.  These delays were caused by
the long time it took to scan the VMA interval tree with mmap_sem held.

Sharing a huge PMD does not require any change to the i_mmap tree.
Therefore, we can just take the read lock and let other threads
searching for the right VMA share it in parallel.  Once the right VMA is
found, either the PMD lock (2M huge page for x86-64) or the
mm->page_table_lock will be acquired to perform the actual PMD sharing.

Lock contention, if present, will happen in the spinlock.  That is much
better than contention in the rwsem, where the time needed to scan the
interval tree is indeterminate.

With this patch applied, the customer is seeing a significant
performance improvement over the unpatched kernel.

Link: http://lkml.kernel.org/r/20191107211809.9539-1-longman@redhat.com
Signed-off-by: Waiman Long
Suggested-by: Mike Kravetz
Reviewed-by: Mike Kravetz
Cc: Davidlohr Bueso
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Will Deacon
Cc: Matthew Wilcox
Signed-off-by: Andrew Morton
---

 mm/hugetlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/hugetlb.c~hugetlbfs-take-read_lock-on-i_mmap-for-pmd-sharing
+++ a/mm/hugetlb.c
@@ -4769,7 +4769,7 @@ pte_t *huge_pmd_share(struct mm_struct *
 	if (!vma_shareable(vma, addr))
 		return (pte_t *)pmd_alloc(mm, pud, addr);
 
-	i_mmap_lock_write(mapping);
+	i_mmap_lock_read(mapping);
 	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
 			continue;
@@ -4799,7 +4799,7 @@ pte_t *huge_pmd_share(struct mm_struct *
 	spin_unlock(ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
-	i_mmap_unlock_write(mapping);
+	i_mmap_unlock_read(mapping);
 	return pte;
 }