From patchwork Mon Jul 29 05:17:27 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Li Wang
X-Patchwork-Id: 11063181
From: Li Wang
Date: Mon, 29 Jul 2019 13:17:27 +0800
Subject: [MM Bug?] mmap() triggers SIGBUS while doing the numa_move_pages() for offlined hugepage in background
To: Naoya Horiguchi
Cc: Linux-MM, LTP List, mike.kravetz@oracle.com,
 xishi.qiuxishi@alibaba-inc.com, mhocko@kernel.org, Cyril Hrubis

Hi Naoya and Linux-MMers,

The LTP/move_pages12 V2 triggers SIGBUS
in kernel-v5.2.3 testing.
https://github.com/wangli5665/ltp/blob/master/testcases/kernel/syscalls/move_pages/move_pages12.c

It seems that the retried mmap() triggers SIGBUS while numa_move_pages() is
running in the background. That is very similar to the kernel bug mentioned by
commit 6bc9b56433b76e40d (mm: fix race on soft-offlining ): a race condition
between soft offline and hugetlb_fault which causes unexpected process SIGBUS
killing.

I'm not sure whether the patch below makes sense for memory-failure.c, but after
building a new kernel-5.2.3 with this change, the problem can NOT be reproduced.

Any comments?

----------------------------------
----- test on kernel-v5.2.3 ------
# ./move_pages12
tst_test.c:1100: INFO: Timeout per run is 0h 05m 00s
move_pages12.c:251: INFO: Free RAM 194212832 kB
move_pages12.c:269: INFO: Increasing 2048kB hugepages pool on node 0 to 4
move_pages12.c:279: INFO: Increasing 2048kB hugepages pool on node 1 to 6
move_pages12.c:195: INFO: Allocating and freeing 4 hugepages on node 0
move_pages12.c:195: INFO: Allocating and freeing 4 hugepages on node 1
move_pages12.c:185: PASS: Bug not reproduced
tst_test.c:1145: BROK: Test killed by SIGBUS!
move_pages12.c:114: FAIL: move_pages failed: ESRCH

----- test on kernel-v5.2.3 + above patch ------
# ./move_pages12
tst_test.c:1100: INFO: Timeout per run is 0h 05m 00s
move_pages12.c:252: INFO: Free RAM 64780164 kB
move_pages12.c:270: INFO: Increasing 2048kB hugepages pool on node 0 to 7
move_pages12.c:280: INFO: Increasing 2048kB hugepages pool on node 1 to 10
move_pages12.c:196: INFO: Allocating and freeing 4 hugepages on node 0
move_pages12.c:196: INFO: Allocating and freeing 4 hugepages on node 1
move_pages12.c:186: PASS: Bug not reproduced
move_pages12.c:186: PASS: Bug not reproduced

---
Regards,
Li Wang

--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1695,15 +1695,16 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	unlock_page(hpage);
 
 	ret = isolate_huge_page(hpage, &pagelist);
+	if (!ret) {
+		pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
+		return -EBUSY;
+	}
+
 	/*
 	 * get_any_page() and isolate_huge_page() takes a refcount each,
 	 * so need to drop one here.
 	 */
 	put_hwpoison_page(hpage);
-	if (!ret) {
-		pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
-		return -EBUSY;
-	}
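
To make the effect of the reordering easier to discuss, here is a small
user-space sketch of the reference counting on both paths. It is only a toy
model, not kernel code: struct toy_page, toy_isolate(), toy_put() and the two
toy_offline_*() functions are hypothetical names invented for illustration, and
the model assumes that isolate_huge_page() takes its reference only when it
succeeds.

```c
#include <assert.h>

/* Toy stand-in for a huge page; NOT the kernel's struct page. */
struct toy_page {
	int refcount;
};

/* Hypothetical model of isolate_huge_page(): assumed to take a
 * reference only when isolation succeeds. */
static int toy_isolate(struct toy_page *p, int succeed)
{
	if (succeed)
		p->refcount++;
	return succeed;
}

/* Hypothetical model of put_hwpoison_page(): drops one reference. */
static void toy_put(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Ordering before the patch: put first, then check the result. */
static int toy_offline_before(struct toy_page *p, int isolate_ok)
{
	int ret = toy_isolate(p, isolate_ok);

	toy_put(p);
	if (!ret)
		return -1;	/* stands in for -EBUSY */
	return 0;
}

/* Ordering after the patch: check first, put only on success. */
static int toy_offline_after(struct toy_page *p, int isolate_ok)
{
	int ret = toy_isolate(p, isolate_ok);

	if (!ret)
		return -1;	/* early return keeps the caller's reference */
	toy_put(p);
	return 0;
}
```

Starting from a refcount of 1 (standing in for the reference taken by
get_any_page()), the two orderings end up identical when isolation succeeds.
They differ only when isolation fails: the old ordering leaves the count at 0,
while the patched ordering returns with the count still at 1. Whether keeping
that reference is the right fix, or only hides the race, is the open question.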