From patchwork Mon May 14 08:13:38 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 10397475
From: Mike Rapoport
To: Jonathan Corbet
Cc: linux-doc, linux-mm, lkml, Mike Rapoport
Subject: [PATCH 1/3] docs/vm: transhuge: change sections order
Date: Mon, 14 May 2018 11:13:38 +0300
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1526285620-453-1-git-send-email-rppt@linux.vnet.ibm.com>
References: <1526285620-453-1-git-send-email-rppt@linux.vnet.ibm.com>
Message-Id: <1526285620-453-2-git-send-email-rppt@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

so that userspace interface and implementation description will be grouped
together

Signed-off-by: Mike Rapoport
---
 Documentation/vm/transhuge.rst | 82 +++++++++++++++++++++---------------------
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/Documentation/vm/transhuge.rst b/Documentation/vm/transhuge.rst
index 2c6867f..56d04cbb 100644
--- a/Documentation/vm/transhuge.rst
+++ b/Documentation/vm/transhuge.rst
@@ -38,31 +38,6 @@ are using hugepages but a significant speedup already happens if only
 one of the two is using hugepages just because of the fact the TLB
 miss is going to run faster.
 
-Design
-======
-
-- "graceful fallback": mm components which don't have transparent hugepage
-  knowledge fall back to breaking huge pmd mapping into table of ptes and,
-  if necessary, split a transparent hugepage. Therefore these components
-  can continue working on the regular pages or regular pte mappings.
-
-- if a hugepage allocation fails because of memory fragmentation,
-  regular pages should be gracefully allocated instead and mixed in
-  the same vma without any failure or significant delay and without
-  userland noticing
-
-- if some task quits and more hugepages become available (either
-  immediately in the buddy or through the VM), guest physical memory
-  backed by regular pages should be relocated on hugepages
-  automatically (with khugepaged)
-
-- it doesn't require memory reservation and in turn it uses hugepages
-  whenever possible (the only possible reservation here is kernelcore=
-  to avoid unmovable pages to fragment all the memory but such a tweak
-  is not specific to transparent hugepage support and it's a generic
-  feature that applies to all dynamic high order allocations in the
-  kernel)
-
 Transparent Hugepage Support maximizes the usefulness of free memory
 if compared to the reservation approach of hugetlbfs by allowing all
 unused memory to be used as cache or other movable (or even unmovable
@@ -401,6 +376,47 @@ tracer to record how long was spent in __alloc_pages_nodemask and
 using the mm_page_alloc tracepoint to identify which allocations were for
 huge pages.
 
+Optimizing the applications
+===========================
+
+To be guaranteed that the kernel will map a 2M page immediately in any
+memory region, the mmap region has to be hugepage naturally
+aligned. posix_memalign() can provide that guarantee.
+
+Hugetlbfs
+=========
+
+You can use hugetlbfs on a kernel that has transparent hugepage
+support enabled just fine as always. No difference can be noted in
+hugetlbfs other than there will be less overall fragmentation. All
+usual features belonging to hugetlbfs are preserved and
+unaffected. libhugetlbfs will also work fine as usual.
+
+Design principles
+=================
+
+- "graceful fallback": mm components which don't have transparent hugepage
+  knowledge fall back to breaking huge pmd mapping into table of ptes and,
+  if necessary, split a transparent hugepage. Therefore these components
+  can continue working on the regular pages or regular pte mappings.
+
+- if a hugepage allocation fails because of memory fragmentation,
+  regular pages should be gracefully allocated instead and mixed in
+  the same vma without any failure or significant delay and without
+  userland noticing
+
+- if some task quits and more hugepages become available (either
+  immediately in the buddy or through the VM), guest physical memory
+  backed by regular pages should be relocated on hugepages
+  automatically (with khugepaged)
+
+- it doesn't require memory reservation and in turn it uses hugepages
+  whenever possible (the only possible reservation here is kernelcore=
+  to avoid unmovable pages to fragment all the memory but such a tweak
+  is not specific to transparent hugepage support and it's a generic
+  feature that applies to all dynamic high order allocations in the
+  kernel)
+
 get_user_pages and follow_page
 ==============================
 
@@ -432,22 +448,6 @@ hugepages being returned (as it's not only checking the pfn of the
 page and pinning it during the copy but it pretends to migrate the
 memory in regular page sizes and with regular pte/pmd mappings).
 
-Optimizing the applications
-===========================
-
-To be guaranteed that the kernel will map a 2M page immediately in any
-memory region, the mmap region has to be hugepage naturally
-aligned. posix_memalign() can provide that guarantee.
-
-Hugetlbfs
-=========
-
-You can use hugetlbfs on a kernel that has transparent hugepage
-support enabled just fine as always. No difference can be noted in
-hugetlbfs other than there will be less overall fragmentation. All
-usual features belonging to hugetlbfs are preserved and
-unaffected. libhugetlbfs will also work fine as usual.
-
 Graceful fallback
 =================