From patchwork Tue Jul 5 20:00:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 12906994
From: Josef Bacik
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: "Kirill A. Shutemov", Matthew Wilcox, Rik van Riel, Chris Mason
Subject: [PATCH] mm: fix page leak with multiple threads mapping the same page
Date: Tue, 5 Jul 2022 16:00:36 -0400
Message-Id: <2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com>
X-Mailer: git-send-email 2.26.3
Sender: owner-linux-mm@kvack.org

We have an application with a lot of threads that use a shared mmap
backed by tmpfs mounted with -o huge=within_size.  This application
started leaking loads of huge pages when we upgraded to a recent kernel.

Using the page ref tracepoints and a BPF program written by Tejun Heo we
were able to determine that these pages would have multiple refcounts
from the page fault path, but when it came to unmap time we wouldn't
drop the number of refs we had added from the faults.

I wrote a reproducer that mmap'ed a file backed by tmpfs with
-o huge=always, and then spawned 20 threads, all looping and faulting
random offsets in this map, while using madvise(MADV_DONTNEED) randomly
for huge page aligned ranges.  This very quickly reproduced the problem.

The problem here is that we check for the case where we have multiple
threads faulting in a range that was previously unmapped.  One thread
maps the PMD, the other thread loses the race and then returns 0.
However, at this point we already have the page, and we are no longer
putting this page into the process's address space, and so we leak the
page.  We actually did the correct thing prior to f9ce0be71d1f; however,
it looks like Kirill copied what we do in the anonymous page case.  In
the anonymous page case we don't yet have a page, so we don't have to
drop a reference on anything.  Previously we did the correct thing for
file based faults by returning VM_FAULT_NOPAGE so we correctly drop the
reference on the page we faulted in.

Fix this by returning VM_FAULT_NOPAGE in the
pmd_devmap_trans_unstable() case.  This makes us drop the ref on the
page properly, and now my reproducer no longer leaks the huge pages.

Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
Cc: Kirill A. Shutemov
Cc: Matthew Wilcox (Oracle)
Signed-off-by: Josef Bacik
Signed-off-by: Rik van Riel
Signed-off-by: Chris Mason
Acked-by: Kirill A. Shutemov
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7a089145cad4..f10724d7dca3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4371,7 +4371,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 
 	/* See comment in handle_pte_fault() */
 	if (pmd_devmap_trans_unstable(vmf->pmd))
-		return 0;
+		return VM_FAULT_NOPAGE;
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 				      vmf->address, &vmf->ptl);
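
For readers following the refcount argument, here is a minimal userspace
model of it.  This is only a sketch and not kernel code: every model_*
name and the MODEL_FAULT_NOPAGE value are hypothetical stand-ins, and it
assumes the caller of finish_fault() drops its page reference only when
the return value carries the NOPAGE bit, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for VM_FAULT_NOPAGE; the value is illustrative. */
#define MODEL_FAULT_NOPAGE 0x1u

struct model_page {
	int refcount;
};

/* Models the file fault handler: hands back the page with one extra ref. */
static void model_do_fault(struct model_page *page)
{
	page->refcount++;
}

/*
 * Models finish_fault().  When another thread has already installed the
 * PMD, nothing gets mapped and race_ret is returned to the caller.
 */
static unsigned int model_finish_fault(bool lost_race, unsigned int race_ret)
{
	if (lost_race)
		return race_ret;
	return 0;	/* page mapped; the mapping now owns the reference */
}

/* Models the read-fault caller; returns the number of leaked references. */
static int model_read_fault(bool lost_race, unsigned int race_ret)
{
	struct model_page page = { .refcount = 0 };
	unsigned int ret;
	int mapped_refs;

	model_do_fault(&page);
	ret = model_finish_fault(lost_race, race_ret);
	if (ret & MODEL_FAULT_NOPAGE)
		page.refcount--;	/* caller drops the fault reference */

	/* Only a page that was actually mapped has its ref dropped at unmap. */
	mapped_refs = lost_race ? 0 : 1;
	return page.refcount - mapped_refs;
}
```

In this model, calling model_read_fault(true, 0) leaves one reference
that nothing will ever drop, while model_read_fault(true,
MODEL_FAULT_NOPAGE) leaves none, which mirrors the one-line change in
the patch.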