From patchwork Wed Sep 20 02:16:08 2023
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 13392032
From: riel@surriel.com
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, akpm@linux-foundation.org, muchun.song@linux.dev, mike.kravetz@oracle.com, leit@meta.com
Subject: [PATCH 0/2] hugetlbfs: close race between MADV_DONTNEED and page fault
Date: Tue, 19 Sep 2023 22:16:08 -0400
Message-ID: <20230920021811.3095089-1-riel@surriel.com>

Malloc libraries, like jemalloc and tcmalloc, decide when to call madvise
independently from the code in the main application.

This sometimes results in the application page faulting on an address
right after the malloc library has shot down the backing memory with
MADV_DONTNEED.

Usually this is harmless, because we always have some 4kB pages sitting
around to satisfy a page fault. However, with hugetlbfs, systems often
allocate only the exact number of huge pages that the application wants.

Due to TLB batching, hugetlbfs MADV_DONTNEED will free pages outside of
any lock taken on the page fault path, which can open up the following
race condition:

       CPU 1                            CPU 2

       MADV_DONTNEED
       unmap page
       shoot down TLB entry
                                        page fault
                                        fail to allocate a huge page
                                        killed with SIGBUS
       free page

Fix that race by extending the hugetlb_vma_lock locking scheme to also
cover private hugetlb mappings (with resv_map), and pulling the locking
from __unmap_hugepage_range_final into helper functions called from
zap_page_range_single. This ensures page faults stay locked out of the
MADV_DONTNEED VMA until the huge pages have actually been freed.
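
For readers who want to step through the interleaving described above, here
is a minimal single-threaded toy model in C. It is only a sketch, not kernel
code: struct hugepage_pool, handle_fault() and replay_madv_dontneed() are
hypothetical names invented for this illustration, and the "lock" is modeled
simply as whether the fault is allowed to run before the page is freed. It
assumes a pool sized for exactly one huge page, which is already mapped.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not kernel code): a huge page pool that was
 * sized for exactly the pages the application uses. */
struct hugepage_pool {
	int free_pages;
};

enum fault_result { FAULT_OK, FAULT_SIGBUS };

/* Fault path: takes one free huge page, or fails with SIGBUS if none. */
static enum fault_result handle_fault(struct hugepage_pool *pool)
{
	assert(pool->free_pages >= 0);
	if (pool->free_pages == 0)
		return FAULT_SIGBUS;
	pool->free_pages--;
	return FAULT_OK;
}

/*
 * Replay the two-CPU interleaving on one thread.
 *
 * lock_covers_free == false: the fault on "CPU 2" runs after the TLB
 *   shootdown but before "CPU 1" has freed the page (the racy ordering).
 * lock_covers_free == true: the fault is held off until the page has
 *   been freed (the ordering the cover letter aims for).
 */
static enum fault_result replay_madv_dontneed(bool lock_covers_free)
{
	/* The only huge page is currently mapped, so none are free. */
	struct hugepage_pool pool = { .free_pages = 0 };
	bool fault_pending = true;
	enum fault_result res = FAULT_OK;

	/* CPU 1: unmap page, shoot down TLB entry (no pool change yet). */

	/* CPU 2: fault sneaks in here if nothing locks it out. */
	if (!lock_covers_free) {
		res = handle_fault(&pool);
		fault_pending = false;
	}

	/* CPU 1: free page (deferred by TLB batching). */
	pool.free_pages++;

	/* CPU 2: with the lock held across the free, the fault runs now. */
	if (fault_pending)
		res = handle_fault(&pool);

	return res;
}
```

In this model replay_madv_dontneed(false) returns FAULT_SIGBUS, because the
fault finds an empty pool, while replay_madv_dontneed(true) returns FAULT_OK,
because the freed page is back in the pool before the fault is handled.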