From patchwork Tue Jan 7 20:40:00 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13929597
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Breno Leitao, Rik van Riel, Muchun Song, Naoya Horiguchi,
 Roman Gushchin, Ackerley Tng, Andrew Morton, peterx@redhat.com,
 Oscar Salvador
Subject: [PATCH v2 5/7] mm/hugetlb: Simplify vma_has_reserves()
Date: Tue, 7 Jan 2025 15:40:00 -0500
Message-ID: <20250107204002.2683356-6-peterx@redhat.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20250107204002.2683356-1-peterx@redhat.com>
References: <20250107204002.2683356-1-peterx@redhat.com>

vma_has_reserves() is a helper "trying" to know whether the vma should
consume one reservation when allocating the hugetlb folio.  However, it's
not clear why we need such complexity, as that information is already
represented in the "chg" variable.

From alloc_hugetlb_folio()'s context, "chg" (named "gbl_chg" inside that
function) is defined as:

  - If gbl_chg=1, the allocation cannot reuse an existing reservation
  - If gbl_chg=0, the allocation should reuse an existing reservation

Firstly, map_chg is defined as follows, to cover all hugetlb reservation
scenarios (mostly via vma_needs_reservation(), but cow_from_owner is an
outlier):

  CONDITION                                            HAS RESERVATION?
  =========                                            ================
  - SHARED: always check against per-inode resv_map
    (ignore NORESERVE)
    - If resv exists                                   ==> YES [1]
    - If not                                           ==> NO  [2]
  - PRIVATE: complicated...
    - Request came from a CoW from owner resv map      ==> NO  [3]
      (when cow_from_owner==true)
    - If does not own a resv_map at all..              ==> NO  [4]
      (examples: VM_NORESERVE, private fork())
    - If owns a resv_map, but resv doesn't exist       ==> NO  [5]
    - If owns a resv_map, and resv exists              ==> YES [6]

Further on, gbl_chg also takes the subpool (spool) setup into account, so
it is a decision based on the full context.

If we look at vma_has_reserves(), it mostly repeats checks that the
map_chg accounting has already done (each return value is marked with the
matching case above):

  static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
  {
          if (vma->vm_flags & VM_NORESERVE) {
                  if (vma->vm_flags & VM_MAYSHARE && chg == 0)
                          return true;  ==> [1]
                  else
                          return false; ==> [2] or [4]
          }

          if (vma->vm_flags & VM_MAYSHARE) {
                  if (chg)
                          return false; ==> [2]
                  else
                          return true;  ==> [1]
          }

          if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
                  if (chg)
                          return false; ==> [5]
                  else
                          return true;  ==> [6]
          }

          return false; ==> [4]
  }

It does not check [3], but case [3] is already covered by the "chg" /
"gbl_chg" / "map_chg" calculations.

In short, vma_has_reserves() provides nothing beyond returning "!chg", so
simplify it accordingly.

The removed code carried many comments describing truncation races.  IIUC
there should be no race as long as map_chg is computed properly.

Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 67 ++++----------------------------------------------------
 1 file changed, 7 insertions(+), 60 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b8a849fe1531..5ec079f32f44 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1247,66 +1247,13 @@ void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
 }
 
 /* Returns true if the VMA has associated reserve pages */
-static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
+static bool vma_has_reserves(long chg)
 {
-	if (vma->vm_flags & VM_NORESERVE) {
-		/*
-		 * This address is already reserved by other process(chg == 0),
-		 * so, we should decrement reserved count. Without decrementing,
-		 * reserve count remains after releasing inode, because this
-		 * allocated page will go into page cache and is regarded as
-		 * coming from reserved pool in releasing step. Currently, we
-		 * don't have any other solution to deal with this situation
-		 * properly, so add work-around here.
-		 */
-		if (vma->vm_flags & VM_MAYSHARE && chg == 0)
-			return true;
-		else
-			return false;
-	}
-
-	/* Shared mappings always use reserves */
-	if (vma->vm_flags & VM_MAYSHARE) {
-		/*
-		 * We know VM_NORESERVE is not set. Therefore, there SHOULD
-		 * be a region map for all pages. The only situation where
-		 * there is no region map is if a hole was punched via
-		 * fallocate. In this case, there really are no reserves to
-		 * use. This situation is indicated if chg != 0.
-		 */
-		if (chg)
-			return false;
-		else
-			return true;
-	}
-
 	/*
-	 * Only the process that called mmap() has reserves for
-	 * private mappings.
+	 * Now "chg" has all the conditions considered for whether we
+	 * should use an existing reservation.
 	 */
-	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
-		/*
-		 * Like the shared case above, a hole punch or truncate
-		 * could have been performed on the private mapping.
-		 * Examine the value of chg to determine if reserves
-		 * actually exist or were previously consumed.
-		 * Very Subtle - The value of chg comes from a previous
-		 * call to vma_needs_reserves(). The reserve map for
-		 * private mappings has different (opposite) semantics
-		 * than that of shared mappings. vma_needs_reserves()
-		 * has already taken this difference in semantics into
-		 * account. Therefore, the meaning of chg is the same
-		 * as in the shared case above. Code could easily be
-		 * combined, but keeping it separate draws attention to
-		 * subtle differences.
-		 */
-		if (chg)
-			return false;
-		else
-			return true;
-	}
-
-	return false;
+	return chg == 0;
 }
 
 static void enqueue_hugetlb_folio(struct hstate *h, struct folio *folio)
@@ -1407,7 +1354,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 	 * have no page reserves. This check ensures that reservations are
 	 * not "stolen". The child may still get SIGKILLed
 	 */
-	if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
+	if (!vma_has_reserves(chg) && !available_huge_pages(h))
 		goto err;
 
 	gfp_mask = htlb_alloc_mask(h);
@@ -1425,7 +1372,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
 							nid, nodemask);
 
-	if (folio && vma_has_reserves(vma, chg)) {
+	if (folio && vma_has_reserves(chg)) {
 		folio_set_hugetlb_restore_reserve(folio);
 		h->resv_huge_pages--;
 	}
@@ -3116,7 +3063,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
-		if (vma_has_reserves(vma, gbl_chg)) {
+		if (vma_has_reserves(gbl_chg)) {
 			folio_set_hugetlb_restore_reserve(folio);
 			h->resv_huge_pages--;
 		}
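
As a sanity check on the case analysis in the commit message, the claim
that the old helper reduces to "!chg" can be exercised with a small
userspace model.  This is only a sketch and not kernel code: struct
mock_vma, the flag values, and state_is_reachable() are hypothetical
stand-ins, and the reachability rules encode the assumptions of the table
above (a private mapping that owns no resv_map always has chg != 0, and a
private VM_NORESERVE mapping never owns a resv_map).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flags; the values are arbitrary. */
#define VM_MAYSHARE	0x1u
#define VM_NORESERVE	0x2u

/* Hypothetical reduction of vm_area_struct to what the helper reads. */
struct mock_vma {
	unsigned int vm_flags;
	bool resv_owner;	/* models is_vma_resv_set(vma, HPAGE_RESV_OWNER) */
};

/* The removed logic, transcribed from the "-" lines of the diff. */
bool old_vma_has_reserves(const struct mock_vma *vma, long chg)
{
	if (vma->vm_flags & VM_NORESERVE)
		return (vma->vm_flags & VM_MAYSHARE) && chg == 0;
	if (vma->vm_flags & VM_MAYSHARE)
		return chg == 0;
	if (vma->resv_owner)
		return chg == 0;
	return false;
}

/* The simplified helper from the "+" lines of the diff. */
bool new_vma_has_reserves(long chg)
{
	return chg == 0;
}

/* ASSUMPTION: which (vma, chg) pairs the map_chg accounting can produce. */
bool state_is_reachable(const struct mock_vma *vma, long chg)
{
	bool shared = vma->vm_flags & VM_MAYSHARE;
	bool noreserve = vma->vm_flags & VM_NORESERVE;

	if (shared)
		return true;		/* cases [1], [2]: chg is 0 or 1 */
	if (noreserve && vma->resv_owner)
		return false;		/* assumed impossible, see case [4] */
	if (!vma->resv_owner)
		return chg != 0;	/* case [4]: never has a reservation */
	return true;			/* cases [3], [5], [6] */
}

/* Count reachable states where the old and new helpers disagree. */
int count_mismatches(void)
{
	int mismatches = 0;

	for (unsigned int flags = 0; flags < 4; flags++) {
		for (int owner = 0; owner < 2; owner++) {
			for (long chg = 0; chg < 2; chg++) {
				struct mock_vma vma = { flags, owner != 0 };

				if (!state_is_reachable(&vma, chg))
					continue;
				if (old_vma_has_reserves(&vma, chg) !=
				    new_vma_has_reserves(chg))
					mismatches++;
			}
		}
	}
	return mismatches;
}

/* Aborts if the two helpers ever disagree on a reachable state. */
void check_equivalence(void)
{
	assert(count_mismatches() == 0);
}
```

Under these assumptions count_mismatches() returns 0.  The only states
where the two helpers would differ are the ones the model excludes (a
private mapping without a resv_map but with chg == 0), which is why the
simplification depends on map_chg being computed properly.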