From patchwork Wed May 23 07:35:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oscar Salvador X-Patchwork-Id: 10420455 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 60E896016C for ; Wed, 23 May 2018 07:35:52 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4A80428E8E for ; Wed, 23 May 2018 07:35:52 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3EB7228E9B; Wed, 23 May 2018 07:35:52 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BF36628E8E for ; Wed, 23 May 2018 07:35:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 03E0A6B0003; Wed, 23 May 2018 03:35:50 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id F30AC6B0005; Wed, 23 May 2018 03:35:49 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E46FA6B0006; Wed, 23 May 2018 03:35:49 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id 8D2B16B0003 for ; Wed, 23 May 2018 03:35:49 -0400 (EDT) Received: by mail-wm0-f70.google.com with SMTP id f21-v6so1826440wmh.5 for ; Wed, 23 May 2018 00:35:49 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:date:from:to :cc:subject:message-id:mime-version:content-disposition:user-agent; bh=m7Qv6LqXkXoX7SWL/gookRAmgcbmXxt6h+669ncQwOo=; b=KpXmNREr7MSEl8jDND0eznHENmv2KvoL/zYqeQSw3dWncPX0VDtJ/RvHA2lVL5RMJx FkR+jvImR6gcw50CVhZt/LLovbcjeilZGuEv+6S5z3YRw2QFWC0fRrUaPf8gmW2HmPPh C0rT432OtGsS3AgbQ4axiE0ht+UKQIqUGlxFa8vgwRzdVr+IQPHsgdSlotgPsDCwzk5x p5YGVzRd1mzL5pwUUDzVUl5EvHbuR6PVtWoYaMd81mc2i31zRJ8A+UgRuEGmX9IE3to4 tazxH3ewvBo/FFLvFrAbW0ZI7VxnZjjA8EtuzqprVXeMkpQEhWeoerrqJNpLk2KcF10/ zLQQ== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of osalvador@techadventures.net designates 62.201.165.239 as permitted sender) smtp.mailfrom=osalvador@techadventures.net X-Gm-Message-State: ALKqPweqhNkby8qbQEBantGisqizZy99S/EoU4BElblwJJ0AuDYV/xBQ HqM9/WxXbxNoHDBekQdEX+sXf9ldb76ZAblCXq0RNnooA34U45twX+5A4cGQp7kscYPuo5v1VfB hoy6iC6QwF8eYfyAvjngNuAtGAhi7o9QF4xbV9DtCH7Xt7mqTKZf7fUsijiXQ8o/ZaQ== X-Received: by 2002:a1c:24c4:: with SMTP id k187-v6mr3220892wmk.116.1527060948949; Wed, 23 May 2018 00:35:48 -0700 (PDT) X-Google-Smtp-Source: AB8JxZqJCfU48Pd8R9BOlCaoJbMdq9s4FzjixYsL2Ba9LSwjGvGFZwv9ld5eywvhvZ9UGwIgrhoO X-Received: by 2002:a1c:24c4:: with SMTP id k187-v6mr3220859wmk.116.1527060947975; Wed, 23 May 2018 00:35:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1527060947; cv=none; d=google.com; s=arc-20160816; b=X330zcKqmcJP3G1m3Y7xFOYeHRhTHgtQ1k1nJ+sPXdrv5HsZTkzh3oIxPnizJI++3p vxTo2q38xxb7Jcs+/zzOl9LWevXFg+oGC+q6twJhDzOAA19e7an/WAbs8dKDPyrCuIhC ff/X1XOlnMdvPirgBhMjnZMukpVk5vN/Ov7A4Pzj7h+1Aqvj6ew2ELksD0xepqfxTa7H dbgLPCj9bkwq0L5OzMk+f0rs65IVFtwCkj95qVy42GDtzrenrvSQJU8cPcRiOwEfyxF0 k9ovMOY9glPzN7OFGTMzQsxSkothDE+izc85MtVZLEYjfiHQtIumF6JTQutr6T1Dar09 b5RA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=user-agent:content-disposition:mime-version:message-id:subject:cc :to:from:date:arc-authentication-results; bh=m7Qv6LqXkXoX7SWL/gookRAmgcbmXxt6h+669ncQwOo=; b=lauk/IHO21w1XTkXlsDVGd/eKJWeAZimidPnG9M2KxKQza9nds/T0DuH2KdMumNngt AjuLw/tqMIz3Fyl1cPSUcyvInKehTKWFFw/EUK0c9aHGuTCIkM460XzV4dLtobD9ZBku heWptlh+uVKOWX1qFQvmw+sajKrzsMGMO/DJHw000gxKSKc2JiSDo4xhZWx0zP11siF6 CzXCL1jrVVulSBJjnOGNpWUHQkl7rHEh3OoGy54plPn4SDTTrLzOU/3ICvQLQQrLWRGh X6L1yY3iEI3H9tAFBAr/LTKwZIKGVzXFN1nNA0Z6PA8jAiddtRCTLH26biBrwrekYfhd /UZA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of osalvador@techadventures.net designates 62.201.165.239 as permitted sender) smtp.mailfrom=osalvador@techadventures.net Received: from techadventures.net (techadventures.net. [62.201.165.239]) by mx.google.com with ESMTP id q184-v6si1054106wma.222.2018.05.23.00.35.47 for ; Wed, 23 May 2018 00:35:47 -0700 (PDT) Received-SPF: pass (google.com: domain of osalvador@techadventures.net designates 62.201.165.239 as permitted sender) client-ip=62.201.165.239; Authentication-Results: mx.google.com; spf=pass (google.com: domain of osalvador@techadventures.net designates 62.201.165.239 as permitted sender) smtp.mailfrom=osalvador@techadventures.net Received: by techadventures.net (Postfix, from userid 1000) id 56C0D122DC4; Wed, 23 May 2018 09:35:47 +0200 (CEST) Date: Wed, 23 May 2018 09:35:47 +0200 From: Oscar Salvador To: linux-mm@kvack.org Cc: mhocko@suse.com, vbabka@suse.cz, pasha.tatashin@oracle.com, akpm@linux-foundation.org Subject: [RFC] Checking for error code in __offline_pages Message-ID: <20180523073547.GA29266@techadventures.net> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000002, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Hi, This is something I spotted while testing offlining memory. __offline_pages() calls do_migrate_range() to try to migrate a range, but we do not actually check for the error code. This, besides of ignoring underlying failures, can led to a situations where we never break up the loop because we are totally unaware of what is going on. They way I spotted this was when trying to offline all memblocks belonging to a node. Due to an unfortunate setting with movablecore, memblocks containing bootmem memory (pages marked by get_page_bootmem()) ended up marked in zone_movable. So while trying to remove that memory, the system failed in: do_migrate_range() { ... if (PageLRU(page)) ret = isolate_lru_page(page); else ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE); if (!ret) // success: do something else if (page_count(page)) ret = -EBUSY; ... } Since the pages from bootmem are not LRU, we call isolate_movable_page() but we fail when checking for __PageMovable(). Since the page_count is more than 0 we return -EBUSY, but we do not check this in our caller, so we keep trying to migrate this memory over and over: repeat: ... pfn = scan_movable_pages(start_pfn, end_pfn); if (pfn) { /* We have movable pages */ ret = do_migrate_range(pfn, end_pfn); goto repeat; } But this is not only situation where we can get stuck. For example, if we fail with -ENOMEM in migrate_pages()->unmap_and_move()/unmap_and_move_huge_page(), we will keep trying as well. I think we should really detect these cases and fail with "goto failed_removal". Something like Now, unless I overlooked something migrate_pages()->unmap_and_move()/unmap_and_move_huge_page() can return: -ENOMEM -EAGAIN -EBUSY -ENOSYS. I am not sure if we should differentiate betweeen those errors. For example, it is possible that in migrate_pages() we just get -EAGAIN, and we return the number of "retry" we tried without having really failed. Although, since we do 10 passes it might be considered as failed. And I am not sure either if we want to propagate the error codes, or in case we fail in migrate_pages(), whatever the error was (-ENOMEM, -EBUSY, etc.), we just return -EBUSY. What do you think? Thanks Oscar Salvador --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1651,6 +1651,11 @@ static int __ref __offline_pages(unsigned long start_pfn, pfn = scan_movable_pages(start_pfn, end_pfn); if (pfn) { /* We have movable pages */ ret = do_migrate_range(pfn, end_pfn); + if (ret) { + if (ret != -ENOMEM) + ret = -EBUSY; + goto failed_removal; + } goto repeat; }