From patchwork Tue Dec 8 17:28:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joao Martins X-Patchwork-Id: 11959129 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E903EC4361B for ; Tue, 8 Dec 2020 17:30:46 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7324623AFD for ; Tue, 8 Dec 2020 17:30:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7324623AFD Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0F74F6B006C; Tue, 8 Dec 2020 12:30:46 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0D1056B0070; Tue, 8 Dec 2020 12:30:46 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EFFE06B0071; Tue, 8 Dec 2020 12:30:45 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0198.hostedemail.com [216.40.44.198]) by kanga.kvack.org (Postfix) with ESMTP id D7DC16B006C for ; Tue, 8 Dec 2020 12:30:45 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 9EAF98249980 for ; Tue, 8 Dec 2020 17:30:45 +0000 (UTC) X-FDA: 77570804850.30.scene99_410b44f273e8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin30.hostedemail.com (Postfix) with ESMTP id 69369180B3C8B for ; Tue, 8 Dec 2020 17:30:45 +0000 (UTC) X-HE-Tag: scene99_410b44f273e8 X-Filterd-Recvd-Size: 7164 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by imf28.hostedemail.com (Postfix) with ESMTP for ; Tue, 8 Dec 2020 17:30:44 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0B8HNtKb191639; Tue, 8 Dec 2020 17:30:36 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=g6swsjDZvzJC7QAazYa+f++QasgrbynEkAanWwTCzP4=; b=G8frdM5XIvg9XNJYzwKy9DT8AVDtQLnRWPKHtTS1ysuB8J6wjUoZ48J6Ah6k2K08fCYN gayysIbwvSAGqQ+2rsmuUyvbcfD65SUtXoOveitYVRNSHxkzsQi2Kido21OcUpobyErC BkFrxaUds/KwXQUfH435n3AxCugDcn6NAO7bAynllz+8CIDrY8zg5rYN/cfhCw5paFhV Yh5LCbrwqB+9kZBZ/XqsDmLdl/qaWuXjzfKRqxZeeLh5xIRyJicrZAEdK/9bv4vgo4LB 924C1ySZJbXDduABBaU9lMizWtDsya5BR6n44HWzrkGn7x9N8vfgGDuPJZlkmGUTVzI1 0Q== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2130.oracle.com with ESMTP id 3581mqv00b-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Tue, 08 Dec 2020 17:30:36 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0B8HPjfh032498; Tue, 8 Dec 2020 17:30:36 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userp3030.oracle.com with ESMTP id 358m4y6ssv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 08 Dec 2020 17:30:36 +0000 Received: from abhmp0019.oracle.com (abhmp0019.oracle.com [141.146.116.25]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 0B8HUZH2004555; Tue, 8 Dec 2020 17:30:35 GMT Received: from paddy.uk.oracle.com (/10.175.194.215) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 08 Dec 2020 09:30:34 -0800 From: Joao Martins To: linux-mm@kvack.org Cc: Dan Williams , Ira Weiny , linux-nvdimm@lists.01.org, Matthew Wilcox , Jason Gunthorpe , Jane Chu , Muchun Song , Mike Kravetz , Andrew Morton , Joao Martins Subject: [PATCH RFC 5/9] device-dax: Compound pagemap support Date: Tue, 8 Dec 2020 17:28:57 +0000 Message-Id: <20201208172901.17384-7-joao.m.martins@oracle.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20201208172901.17384-1-joao.m.martins@oracle.com> References: <20201208172901.17384-1-joao.m.martins@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9829 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 spamscore=0 suspectscore=1 bulkscore=0 malwarescore=0 phishscore=0 adultscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2012080107 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9829 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=1 mlxlogscore=999 clxscore=1015 malwarescore=0 priorityscore=1501 adultscore=0 lowpriorityscore=0 phishscore=0 spamscore=0 impostorscore=0 mlxscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2012080107 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: dax devices are created with a fixed @align (huge page size) which is enforced through as well at mmap() of the device. Faults, consequently happen too at the specified @align specified at the creation, and those don't change through out dax device lifetime. MCEs poisons a whole dax huge page, as well as splits occurring at at the configured page size. As such, use the newly added compound pagemap facility which onlines the assigned dax ranges as compound pages. Currently, this means, that region/namespace bootstrap would take considerably less, given that you would initialize considerably less pages. On emulated NVDIMM guests this can be easily seen, e.g. on a setup with an emulated NVDIMM with 128G in size seeing improvements from ~750ms to ~190ms with 2M pages, and to less than a 1msec with 1G pages. Signed-off-by: Joao Martins --- Probably deserves its own sysfs attribute for enabling PGMAP_COMPOUND? --- drivers/dax/device.c | 54 +++++++++++++++++++++++++++++++++----------- 1 file changed, 41 insertions(+), 13 deletions(-) diff --git a/drivers/dax/device.c b/drivers/dax/device.c index 25e0b84a4296..9daec6e08efe 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -192,6 +192,39 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax, } #endif /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ +static void set_page_mapping(struct vm_fault *vmf, pfn_t pfn, + unsigned int fault_size, + struct address_space *f_mapping) +{ + unsigned long i; + pgoff_t pgoff; + + pgoff = linear_page_index(vmf->vma, vmf->address + & ~(fault_size - 1)); + + for (i = 0; i < fault_size / PAGE_SIZE; i++) { + struct page *page; + + page = pfn_to_page(pfn_t_to_pfn(pfn) + i); + if (page->mapping) + continue; + page->mapping = f_mapping; + page->index = pgoff + i; + } +} + +static void set_compound_mapping(struct vm_fault *vmf, pfn_t pfn, + unsigned int fault_size, + struct address_space *f_mapping) +{ + struct page *head; + + head = pfn_to_page(pfn_t_to_pfn(pfn)); + head->mapping = f_mapping; + head->index = linear_page_index(vmf->vma, vmf->address + & ~(fault_size - 1)); +} + static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf, enum page_entry_size pe_size) { @@ -225,8 +258,7 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf, } if (rc == VM_FAULT_NOPAGE) { - unsigned long i; - pgoff_t pgoff; + struct dev_pagemap *pgmap = pfn_t_to_page(pfn)->pgmap; /* * In the device-dax case the only possibility for a @@ -234,17 +266,10 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf, * mapped. No need to consider the zero page, or racing * conflicting mappings. */ - pgoff = linear_page_index(vmf->vma, vmf->address - & ~(fault_size - 1)); - for (i = 0; i < fault_size / PAGE_SIZE; i++) { - struct page *page; - - page = pfn_to_page(pfn_t_to_pfn(pfn) + i); - if (page->mapping) - continue; - page->mapping = filp->f_mapping; - page->index = pgoff + i; - } + if (pgmap->flags & PGMAP_COMPOUND) + set_compound_mapping(vmf, pfn, fault_size, filp->f_mapping); + else + set_page_mapping(vmf, pfn, fault_size, filp->f_mapping); } dax_read_unlock(id); @@ -426,6 +451,9 @@ int dev_dax_probe(struct dev_dax *dev_dax) } pgmap->type = MEMORY_DEVICE_GENERIC; + pgmap->flags = PGMAP_COMPOUND; + pgmap->align = dev_dax->align; + addr = devm_memremap_pages(dev, pgmap); if (IS_ERR(addr)) return PTR_ERR(addr);