From patchwork Fri Jul 27 11:48:17 2018
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 10546943
From: Nicholas Piggin
To: linuxppc-dev@lists.ozlabs.org
Cc: Nicholas Piggin, Andrew Morton, Linus Torvalds, "Aneesh Kumar K . V", linux-mm@kvack.org
Subject: [PATCH resend] powerpc/64s: fix page table fragment refcount race vs speculative references
Date: Fri, 27 Jul 2018 21:48:17 +1000
Message-Id: <20180727114817.27190-1-npiggin@gmail.com>
X-Mailer: git-send-email 2.17.0

The page table fragment allocator uses the main page refcount racily
with respect to speculative references. A customer observed a BUG due
to page table page refcount underflow in the fragment allocator. This
can be caused by the fragment allocator's set_page_count stomping on a
speculative reference; the speculative failure handler then decrements
the new count, and the underflow eventually pops when the page tables
are freed.
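To see the interleaving concretely, here is a minimal stand-alone
user-space sketch of the race (illustrative only, not kernel code:
get_unless_zero() stands in for the kernel's speculative
get_page_unless_zero(), a plain atomic_store() for set_page_count(),
and 16 for the fragment count):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int refcount = 1;	/* stands in for page->_refcount */

/* Speculative reference: increment only if the count is non-zero. */
static int get_unless_zero(atomic_int *c)
{
	int old = atomic_load(c);

	while (old != 0) {
		if (atomic_compare_exchange_weak(c, &old, old + 1))
			return 1;
	}
	return 0;
}

static void *speculative_walker(void *arg)
{
	if (get_unless_zero(&refcount)) {
		/* page turns out not to be the one wanted: drop it */
		atomic_fetch_sub(&refcount, 1);
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, speculative_walker, NULL);

	/*
	 * Fragment allocator: blindly store the fragment count. If the
	 * walker's increment lands before this store and its decrement
	 * after it, the increment is silently wiped out and the final
	 * value is one short of the fragment accounting.
	 */
	atomic_store(&refcount, 16);

	pthread_join(t, NULL);
	printf("refcount = %d (15 means a reference was lost)\n",
	       atomic_load(&refcount));
	return 0;
}

Build with "cc -pthread"; most runs print 16, but the occasional 15 is
exactly the stomped reference described above.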
Fix this by using a dedicated field in the struct page for the page
table fragment allocator.

Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
Reviewed-by: Aneesh Kumar K.V
Signed-off-by: Nicholas Piggin
---
Any objection to the struct page change to grab the arch-specific page
table page word for powerpc to use? If not, then this should go via the
powerpc tree, because it's inconsequential for core mm.

Thanks,
Nick

 arch/powerpc/mm/mmu_context_book3s64.c |  8 ++++----
 arch/powerpc/mm/pgtable-book3s64.c     | 17 +++++++++++------
 include/linux/mm_types.h               |  5 ++++-
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index f3d4b4a0e561..3bb5cec03d1f 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -200,9 +200,9 @@ static void pte_frag_destroy(void *pte_frag)
 	/* drop all the pending references */
 	count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
 	/* We allow PTE_FRAG_NR fragments from a PTE page */
-	if (page_ref_sub_and_test(page, PTE_FRAG_NR - count)) {
+	if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
 		pgtable_page_dtor(page);
-		free_unref_page(page);
+		__free_page(page);
 	}
 }
 
@@ -215,9 +215,9 @@ static void pmd_frag_destroy(void *pmd_frag)
 	/* drop all the pending references */
 	count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
 	/* We allow PTE_FRAG_NR fragments from a PTE page */
-	if (page_ref_sub_and_test(page, PMD_FRAG_NR - count)) {
+	if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
 		pgtable_pmd_page_dtor(page);
-		free_unref_page(page);
+		__free_page(page);
 	}
 }
 
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 4afbfbb64bfd..78d0b3d5ebad 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -270,6 +270,8 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
 		return NULL;
 	}
 
+	atomic_set(&page->pt_frag_refcount, 1);
+
 	ret = page_address(page);
 	/*
 	 * if we support only one fragment just return the
@@ -285,7 +287,7 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
 	 * count.
 	 */
 	if (likely(!mm->context.pmd_frag)) {
-		set_page_count(page, PMD_FRAG_NR);
+		atomic_set(&page->pt_frag_refcount, PMD_FRAG_NR);
 		mm->context.pmd_frag = ret + PMD_FRAG_SIZE;
 	}
 	spin_unlock(&mm->page_table_lock);
@@ -308,9 +310,10 @@ void pmd_fragment_free(unsigned long *pmd)
 {
 	struct page *page = virt_to_page(pmd);
 
-	if (put_page_testzero(page)) {
+	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
+	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
 		pgtable_pmd_page_dtor(page);
-		free_unref_page(page);
+		__free_page(page);
 	}
 }
 
@@ -352,6 +355,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 		return NULL;
 	}
 
+	atomic_set(&page->pt_frag_refcount, 1);
 	ret = page_address(page);
 	/*
@@ -367,7 +371,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 	 * count.
 	 */
 	if (likely(!mm->context.pte_frag)) {
-		set_page_count(page, PTE_FRAG_NR);
+		atomic_set(&page->pt_frag_refcount, PTE_FRAG_NR);
 		mm->context.pte_frag = ret + PTE_FRAG_SIZE;
 	}
 	spin_unlock(&mm->page_table_lock);
@@ -390,10 +394,11 @@ void pte_fragment_free(unsigned long *table, int kernel)
 {
 	struct page *page = virt_to_page(table);
 
-	if (put_page_testzero(page)) {
+	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
+	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
 		if (!kernel)
 			pgtable_page_dtor(page);
-		free_unref_page(page);
+		__free_page(page);
 	}
 }
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 99ce070e7dcb..22651e124071 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -139,7 +139,10 @@ struct page {
 		unsigned long _pt_pad_1;	/* compound_head */
 		pgtable_t pmd_huge_pte;		/* protected by page->ptl */
 		unsigned long _pt_pad_2;	/* mapping */
-		struct mm_struct *pt_mm;	/* x86 pgds only */
+		union {
+			struct mm_struct *pt_mm;	/* x86 pgds only */
+			atomic_t pt_frag_refcount;	/* powerpc */
+		};
 #if ALLOC_SPLIT_PTLOCKS
 		spinlock_t *ptl;
 #else
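For contrast with the race sketch above, here is what the dedicated
field buys in the same user-space setting (again illustrative, with
hypothetical names, not the kernel implementation): the fragment
accounting moves to a counter that speculative walkers never touch, so
the allocator's blind store can no longer lose one of their
increments.

#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page. */
struct fake_page {
	atomic_int refcount;		/* main count: walkers get/put this */
	atomic_int pt_frag_refcount;	/* fragment count: allocator only */
};

/* Allocator publishes the fragment count without touching ->refcount. */
static void frag_init(struct fake_page *p, int nr_frags)
{
	atomic_store(&p->pt_frag_refcount, nr_frags);
}

/* Returns 1 when the last fragment is gone and the page can be freed. */
static int frag_free(struct fake_page *p)
{
	return atomic_fetch_sub(&p->pt_frag_refcount, 1) == 1;
}

int main(void)
{
	struct fake_page page = { .refcount = 1, .pt_frag_refcount = 0 };
	int i, freed = 0;

	frag_init(&page, 16);
	for (i = 0; i < 16; i++)
		freed = frag_free(&page);
	printf("last fragment freed the page: %d\n", freed);	/* 1 */
	return 0;
}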