From patchwork Fri Nov 9 06:47:12 2018
X-Patchwork-Submitter: Naoya Horiguchi
X-Patchwork-Id: 10675375
From: Naoya Horiguchi
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi@alibaba-inc.com, Laurent Dufour
Subject: [RFC][PATCH v1 08/11] mm: soft-offline: isolate error pages from buddy freelist
Date: Fri, 9 Nov 2018 15:47:12 +0900
Message-Id: <1541746035-13408-9-git-send-email-n-horiguchi@ah.jp.nec.com>
In-Reply-To: <1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com>
References: <1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com>

Soft-offline shares PG_hwpoison with hard-offline to keep track of memory
errors, but we recently found that this approach can be undesirable for
soft-offline, which, unlike hard-offline, never expects to stop applications.
So this patch has the memory error handler not only set PG_hwpoison but also
isolate the error page from the buddy allocator in its own context.

Previous work [1] allows the soft-offline handler to set PG_hwpoison only
after successful page migration and page freeing. Building on that, this
patch makes the isolation always go through set_hwpoison_free_buddy_page()
under zone->lock, so the behavior should be less racy and more predictable.

Note that for isolation alone we would not have to set PG_hwpoison, but my
analysis shows that memory hotremove still needs some flag to clearly
separate memory-error pages from any other type of page, so this patch keeps
setting it.

[1]: commit 6bc9b56433b7 ("mm: fix race on soft-offlining free huge pages")
     commit d4ae9916ea29 ("mm: soft-offline: close the race against page allocation")

Signed-off-by: Naoya Horiguchi
---
 mm/memory-failure.c |  8 +++---
 mm/page_alloc.c     | 71 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 70 insertions(+), 9 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index 869ff8f..ecafd4a 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1762,9 +1762,11 @@ static int __soft_offline_page(struct page *page)
 	if (ret == 1) {
 		put_hwpoison_page(page);
 		pr_info("soft_offline: %#lx: invalidated\n", pfn);
-		SetPageHWPoison(page);
-		num_poisoned_pages_inc();
-		return 0;
+		if (set_hwpoison_free_buddy_page(page)) {
+			num_poisoned_pages_inc();
+			return 0;
+		} else
+			return -EBUSY;
 	}
 
 	/*
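To illustrate the user-visible effect of the hunk above: when the invalidated
page cannot be grabbed from the buddy allocator (for example because it was
reallocated in the meantime), the soft-offline request now fails with -EBUSY
and the caller can simply retry. The following is a minimal userspace sketch,
not from this patch, assuming soft offline is triggered for testing via
madvise(MADV_SOFT_OFFLINE) (gated by CONFIG_MEMORY_FAILURE and typically
requiring CAP_SYS_ADMIN); the retry count and messages are arbitrary.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101	/* value from asm-generic/mman-common.h */
#endif

int main(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	char *buf;
	int i;

	/* Get a page-aligned, touched page to soft-offline. */
	if (posix_memalign((void **)&buf, pagesize, pagesize))
		return 1;
	buf[0] = 1;

	/* With this patch, EBUSY means "not isolated yet, try again". */
	for (i = 0; i < 5; i++) {
		if (madvise(buf, pagesize, MADV_SOFT_OFFLINE) == 0) {
			printf("page soft-offlined\n");
			return 0;
		}
		if (errno != EBUSY) {
			perror("madvise(MADV_SOFT_OFFLINE)");
			return 1;
		}
	}
	fprintf(stderr, "still busy after retries\n");
	return 1;
}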
diff --git v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
index ae31839..970d6ff 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
@@ -8183,10 +8183,55 @@ bool is_free_buddy_page(struct page *page)
 }
 
 #ifdef CONFIG_MEMORY_FAILURE
+
+/*
+ * Pick out a free page from the buddy allocator. Unlike expand(), this
+ * function can choose the target page via @target, which is not limited
+ * to the first page of some free block.
+ *
+ * This function changes zone state, so callers need to hold zone->lock.
+ */
+static inline void pickout_buddy_page(struct zone *zone, struct page *page,
+		struct page *target, int torder, int low, int high,
+		struct free_area *area, int migratetype)
+{
+	unsigned long size = 1 << high;
+	struct page *current_buddy, *next_page;
+
+	while (high > low) {
+		area--;
+		high--;
+		size >>= 1;
+
+		if (target >= &page[size]) { /* target is in higher buddy */
+			next_page = page + size;
+			current_buddy = page;
+		} else { /* target is in lower buddy */
+			next_page = page;
+			current_buddy = page + size;
+		}
+		VM_BUG_ON_PAGE(bad_range(zone, current_buddy), current_buddy);
+
+		if (set_page_guard(zone, &page[size], high, migratetype))
+			continue;
+
+		list_add(&current_buddy->lru, &area->free_list[migratetype]);
+		area->nr_free++;
+		set_page_order(current_buddy, high);
+		page = next_page;
+	}
+}
+
 /*
- * Set PG_hwpoison flag if a given page is confirmed to be a free page. This
- * test is performed under the zone lock to prevent a race against page
- * allocation.
+ * Isolate a hwpoisoned free page, which actually does the following:
+ * - confirm that the given page is a free page under zone->lock,
+ * - set the PG_hwpoison flag,
+ * - remove the page from the buddy allocator, subdividing the buddy page
+ *   of each order.
+ *
+ * Just setting the PG_hwpoison flag is not safe enough for complete
+ * isolation, because the rapidly-changing memory allocator code always
+ * carries the risk of mishandling the flag and racing.
  */
 bool set_hwpoison_free_buddy_page(struct page *page)
 {
@@ -8199,10 +8244,24 @@ bool set_hwpoison_free_buddy_page(struct page *page)
 	spin_lock_irqsave(&zone->lock, flags);
 	for (order = 0; order < MAX_ORDER; order++) {
 		struct page *page_head = page - (pfn & ((1 << order) - 1));
+		unsigned int forder = page_order(page_head);
+		struct free_area *area = &(zone->free_area[forder]);
 
-		if (PageBuddy(page_head) && page_order(page_head) >= order) {
-			if (!TestSetPageHWPoison(page))
-				hwpoisoned = true;
+		if (PageBuddy(page_head) && forder >= order) {
+			int migtype = get_pfnblock_migratetype(page_head,
+						page_to_pfn(page_head));
+			/*
+			 * TestSetPageHWPoison() will be used later, once the
+			 * reworking of the hard-offline part is finished.
+			 */
+			SetPageHWPoison(page);
+
+			list_del(&page_head->lru);
+			rmv_page_order(page_head);
+			area->nr_free--;
+			pickout_buddy_page(zone, page_head, page, 0, 0, forder,
+					area, migtype);
+			hwpoisoned = true;
 			break;
 		}
 	}
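The splitting loop in pickout_buddy_page() can be hard to picture from the
diff alone. Below is a minimal userspace simulation of that strategy, not
from the patch, assuming low == 0 and torder == 0 as in the caller above and
ignoring debug guard pages: starting from the order-forder block containing
the poisoned pfn, the half that does not contain the target is returned to
the free list at one order lower, and the walk descends into the other half
until only the order-0 target page remains. All names are made up for
illustration; the real code operates on struct page and struct free_area
under zone->lock.

#include <stdio.h>

/* Hypothetical stand-in for one free-list insertion done by the kernel. */
static void add_to_free_list(unsigned long pfn, int order)
{
	printf("re-add block: pfn %lu, order %d\n", pfn, order);
}

/*
 * Simulate the split: remove @target from a free block of 2^forder pages
 * starting at @base, returning each untouched half to the (simulated)
 * free lists on the way down to order 0.
 */
static void pickout_simulate(unsigned long base, int forder, unsigned long target)
{
	unsigned long page = base;
	int high = forder;

	while (high > 0) {
		unsigned long size = 1UL << --high;	/* half of the current block */

		if (target >= page + size) {
			/* target is in the upper half: free the lower half */
			add_to_free_list(page, high);
			page += size;
		} else {
			/* target is in the lower half: free the upper half */
			add_to_free_list(page + size, high);
		}
	}
	printf("isolated: pfn %lu (order 0)\n", page);
}

int main(void)
{
	/* e.g. poisoned pfn 1029 inside an order-4 (16-page) block at pfn 1024 */
	pickout_simulate(1024, 4, 1029);
	return 0;
}

For that example input, the simulation re-adds blocks (pfn 1032, order 3),
(pfn 1024, order 2), (pfn 1030, order 1) and (pfn 1028, order 0), leaving
only pfn 1029 off the free lists, i.e. the 16-page block is fully accounted
for while the single poisoned page stays isolated.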