From patchwork Tue Aug 23 13:58:40 2022
From: Haiyue Wang
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, david@redhat.com, apopple@nvidia.com, linmiaohe@huawei.com, ying.huang@intel.com, songmuchun@bytedance.com, naoya.horiguchi@linux.dev, alex.sierra@amd.com, mike.kravetz@oracle.com, gerald.schaefer@linux.ibm.com, Haiyue Wang, Baolin Wang
Subject: [PATCH v7 1/2] mm: migration: fix the FOLL_GET failure on following huge page
Date: Tue, 23 Aug 2022 21:58:40 +0800
Message-Id: <20220823135841.934465-2-haiyue.wang@intel.com>
In-Reply-To: <20220823135841.934465-1-haiyue.wang@intel.com>
References: <20220823135841.934465-1-haiyue.wang@intel.com>

Not all huge page APIs support the FOLL_GET option, so the move_pages()
syscall fails to get the page node information for some huge pages.

For example, on x86 with Linux 5.19, the 1GB huge page API
follow_huge_pud() returns a NULL page for FOLL_GET. When move_pages() is
called with a NULL 'nodes' parameter, the 'status' array then contains
-2 (-ENOENT) for those pages.

Note: follow_huge_pud() supports FOLL_GET as of Linux 6.0.
      Link: https://lore.kernel.org/all/20220714042420.1847125-3-naoya.horiguchi@linux.dev

But these huge page APIs still don't support FOLL_GET:
  1. follow_huge_pud() in arch/s390/mm/hugetlbpage.c
  2. follow_huge_addr() in arch/ia64/mm/hugetlbpage.c
     It triggers a WARN_ON_ONCE for FOLL_GET.
  3. follow_huge_pgd() in mm/hugetlb.c

This is a temporary solution to mitigate the side effect of the race
condition fix in commit 4cd614841c06, which calls follow_page() with
FOLL_GET set, including for huge pages.

Once all the huge page follow APIs support FOLL_GET, this fix can be
safely reverted.

Fixes: 4cd614841c06 ("mm: migration: fix possible do_pages_stat_array racing with memory offline")
Signed-off-by: Haiyue Wang
Reviewed-by: "Huang, Ying"
Reviewed-by: Baolin Wang
Reviewed-by: Miaohe Lin
---
 mm/migrate.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 6a1597c92261..581dfaad9257 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1848,6 +1848,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 
 	for (i = 0; i < nr_pages; i++) {
 		unsigned long addr = (unsigned long)(*pages);
+		unsigned int foll_flags = FOLL_DUMP;
 		struct vm_area_struct *vma;
 		struct page *page;
 		int err = -EFAULT;
@@ -1856,8 +1857,12 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 		if (!vma)
 			goto set_status;
 
+		/* Not all huge page follow APIs support 'FOLL_GET' */
+		if (!is_vm_hugetlb_page(vma))
+			foll_flags |= FOLL_GET;
+
 		/* FOLL_DUMP to ignore special (like zero) pages */
-		page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
+		page = follow_page(vma, addr, foll_flags);
 
 		err = PTR_ERR(page);
 		if (IS_ERR(page))
@@ -1865,7 +1870,8 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 
 		if (page && !is_zone_device_page(page)) {
 			err = page_to_nid(page);
-			put_page(page);
+			if (foll_flags & FOLL_GET)
+				put_page(page);
 		} else {
 			err = -ENOENT;
 		}

From: Haiyue Wang
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, david@redhat.com, apopple@nvidia.com, linmiaohe@huawei.com, ying.huang@intel.com, songmuchun@bytedance.com, naoya.horiguchi@linux.dev, alex.sierra@amd.com, mike.kravetz@oracle.com, gerald.schaefer@linux.ibm.com, Haiyue Wang, Felix Kuehling
Subject: [PATCH v7 2/2] mm: fix the handling Non-LRU pages returned by follow_page
Date: Tue, 23 Aug 2022 21:58:41 +0800
Message-Id: <20220823135841.934465-3-haiyue.wang@intel.com>
In-Reply-To: <20220823135841.934465-1-haiyue.wang@intel.com>
References: <20220823135841.934465-1-haiyue.wang@intel.com>

The code that handles non-LRU pages returned by follow_page() jumps out
directly without calling put_page(), even though follow_page() with the
'FOLL_GET' flag has already taken a page reference via get_page(), so
that reference is leaked. Fix the zone device page checks so that the
page reference count is dropped correctly before returning.

As David pointed out in review, "device pages are never PageKsm pages",
so drop the zone device page check in break_ksm().

Since a zone device page can't be a transparent huge page, also drop the
redundant zone device page check in split_huge_pages_pid(). (by Miaohe)

Fixes: 3218f8712d6b ("mm: handling Non-LRU pages returned by vm_normal_pages")
Signed-off-by: Haiyue Wang
Reviewed-by: "Huang, Ying"
Reviewed-by: Felix Kuehling
Reviewed-by: Alistair Popple
Reviewed-by: Miaohe Lin
Acked-by: David Hildenbrand
---
 mm/huge_memory.c |  2 +-
 mm/ksm.c         | 12 +++++++++---
 mm/migrate.c     | 19 ++++++++++++-------
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8a7c1b344abe..2ee6d38a1426 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2963,7 +2963,7 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
 		/* FOLL_DUMP to ignore special (like zero) pages */
 		page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
 
-		if (IS_ERR_OR_NULL(page) || is_zone_device_page(page))
+		if (IS_ERR_OR_NULL(page))
 			continue;
 
 		if (!is_transparent_hugepage(page))
diff --git a/mm/ksm.c b/mm/ksm.c
index 42ab153335a2..e26f57fc1f0e 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -475,7 +475,7 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 		cond_resched();
 		page = follow_page(vma, addr,
 				FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE);
-		if (IS_ERR_OR_NULL(page) || is_zone_device_page(page))
+		if (IS_ERR_OR_NULL(page))
 			break;
 		if (PageKsm(page))
 			ret = handle_mm_fault(vma, addr,
@@ -560,12 +560,15 @@ static struct page *get_mergeable_page(struct rmap_item *rmap_item)
 		goto out;
 
 	page = follow_page(vma, addr, FOLL_GET);
-	if (IS_ERR_OR_NULL(page) || is_zone_device_page(page))
+	if (IS_ERR_OR_NULL(page))
 		goto out;
+	if (is_zone_device_page(page))
+		goto out_putpage;
 	if (PageAnon(page)) {
 		flush_anon_page(vma, page, addr);
 		flush_dcache_page(page);
 	} else {
+out_putpage:
 		put_page(page);
 out:
 		page = NULL;
@@ -2308,11 +2311,13 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 			if (ksm_test_exit(mm))
 				break;
 			*page = follow_page(vma, ksm_scan.address, FOLL_GET);
-			if (IS_ERR_OR_NULL(*page) || is_zone_device_page(*page)) {
+			if (IS_ERR_OR_NULL(*page)) {
 				ksm_scan.address += PAGE_SIZE;
 				cond_resched();
 				continue;
 			}
+			if (is_zone_device_page(*page))
+				goto next_page;
 			if (PageAnon(*page)) {
 				flush_anon_page(vma, *page, ksm_scan.address);
 				flush_dcache_page(*page);
@@ -2327,6 +2332,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 				mmap_read_unlock(mm);
 				return rmap_item;
 			}
+next_page:
 			put_page(*page);
 			ksm_scan.address += PAGE_SIZE;
 			cond_resched();
diff --git a/mm/migrate.c b/mm/migrate.c
index 581dfaad9257..44e05ce41d49 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1672,9 +1672,12 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 		goto out;
 
 	err = -ENOENT;
-	if (!page || is_zone_device_page(page))
+	if (!page)
 		goto out;
 
+	if (is_zone_device_page(page))
+		goto out_putpage;
+
 	err = 0;
 	if (page_to_nid(page) == node)
 		goto out_putpage;
@@ -1868,13 +1871,15 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 		if (IS_ERR(page))
 			goto set_status;
 
-		if (page && !is_zone_device_page(page)) {
+		err = -ENOENT;
+		if (!page)
+			goto set_status;
+
+		if (!is_zone_device_page(page))
 			err = page_to_nid(page);
-			if (foll_flags & FOLL_GET)
-				put_page(page);
-		} else {
-			err = -ENOENT;
-		}
+
+		if (foll_flags & FOLL_GET)
+			put_page(page);
 
 set_status:
 		*status = err;