From patchwork Fri May 25 08:31:18 2018
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 10426631
Date: Fri, 25 May 2018 10:31:18 +0200
From: Michal Hocko
To: Tetsuo Handa
Cc: guro@fb.com, rientjes@google.com, hannes@cmpxchg.org,
    vdavydov.dev@gmail.com, tj@kernel.org, linux-mm@kvack.org,
    akpm@linux-foundation.org, torvalds@linux-foundation.org
Subject: Re: [PATCH] mm,oom: Don't call schedule_timeout_killable() with oom_lock held.
Message-ID: <20180525083118.GI11881@dhcp22.suse.cz>
References: <201805241951.IFF48475.FMOSOJFQHLVtFO@I-love.SAKURA.ne.jp>
    <20180524115017.GE20441@dhcp22.suse.cz>
    <201805250117.w4P1HgdG039943@www262.sakura.ne.jp>
In-Reply-To: <201805250117.w4P1HgdG039943@www262.sakura.ne.jp>

On Fri 25-05-18 10:17:42, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Thu 24-05-18 19:51:24, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > Look. I am fed up with this discussion. You are fiddling with the code
> > > > and moving hacks around with a lot of hand waving, rather than trying
> > > > to look at the underlying problem. Your patch completely ignores
> > > > PREEMPT as I've mentioned in previous versions.
> > >
> > > I'm not ignoring PREEMPT.
> > > To fix this OOM lockup problem properly, as much effort as fixing the
> > > Spectre/Meltdown problems will be required. This patch is a mitigation
> > > for the regression introduced by fixing CVE-2018-1000200. Nothing good
> > > comes from deferring this patch.
> > >
> > > > I would be OK with removing the sleep from the out_of_memory path based
> > > > on your argumentation that we have a _proper_ synchronization with the
> > > > exit path now.
> > >
> > > Such an attempt should be made in a separate patch.
> > >
> > > You suggested removing this sleep from my patch without realizing that
> > > we need explicit schedule_timeout_*() for PF_WQ_WORKER threads.
> >
> > And that sleep is in should_reclaim_retry. If that is not sufficient it
> > should be addressed rather than spilling more of that crud all over the
> > place.
>
> Then, please show me (by writing a patch yourself) how to tell whether
> we should sleep there. What I can come up with is shown below.
>
> -@@ -4241,6 +4240,12 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
> -	/* Retry as long as the OOM killer is making progress */
> -	if (did_some_progress) {
> -		no_progress_loops = 0;
> -+		/*
> -+		 * This schedule_timeout_*() serves as a guaranteed sleep for
> -+		 * PF_WQ_WORKER threads when __zone_watermark_ok() == false.
> -+		 */
> -+		if (!tsk_is_oom_victim(current))
> -+			schedule_timeout_uninterruptible(1);
> -		goto retry;
> -	}
> +@@ -3927,6 +3926,14 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
> +	(*no_progress_loops)++;
>
> +	/*
> ++	 * We do a short sleep here if the OOM killer/reaper/victims are
> ++	 * holding oom_lock, in order to try to give them some CPU resources
> ++	 * for releasing memory.
> ++	 */
> ++	if (mutex_is_locked(&oom_lock) && !tsk_is_oom_victim(current))
> ++		schedule_timeout_uninterruptible(1);
> ++
> ++	/*
> +	 * Make sure we converge to OOM if we cannot make any progress
> +	 * several times in the row.
> +	 */
>
> As far as I know, whether a domain which the current thread belongs to is
> already OOM is not known as of should_reclaim_retry(). Therefore, sleeping
> there can become a pointless delay if the domain which the current thread
> belongs to and the domain which the owner of oom_lock (it can be a random
> thread inside out_of_memory() or exit_mmap()) belongs to differ.
>
> But you insist that sleeping there means you don't care about such a
> pointless delay?

What is wrong with the following? should_reclaim_retry should be a natural
reschedule point. PF_WQ_WORKER is a special case which needs a stronger
rescheduling policy. Doing that unconditionally seems more straightforward
than depending on a zone being a good candidate for a further reclaim.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c6f4008ea55..b01b19d3d596 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3925,6 +3925,7 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 {
 	struct zone *zone;
 	struct zoneref *z;
+	bool ret = false;
 
 	/*
 	 * Costly allocations might have made a progress but this doesn't mean
@@ -3988,25 +3989,26 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 				}
 			}
 
-			/*
-			 * Memory allocation/reclaim might be called from a WQ
-			 * context and the current implementation of the WQ
-			 * concurrency control doesn't recognize that
-			 * a particular WQ is congested if the worker thread is
-			 * looping without ever sleeping. Therefore we have to
-			 * do a short sleep here rather than calling
-			 * cond_resched().
-			 */
-			if (current->flags & PF_WQ_WORKER)
-				schedule_timeout_uninterruptible(1);
-			else
-				cond_resched();
-
-			return true;
+			ret = true;
+			goto out;
 		}
 	}
 
-	return false;
+out:
+	/*
+	 * Memory allocation/reclaim might be called from a WQ
+	 * context and the current implementation of the WQ
+	 * concurrency control doesn't recognize that
+	 * a particular WQ is congested if the worker thread is
+	 * looping without ever sleeping. Therefore we have to
+	 * do a short sleep here rather than calling
+	 * cond_resched().
+	 */
+	if (current->flags & PF_WQ_WORKER)
+		schedule_timeout_uninterruptible(1);
+	else
+		cond_resched();
+	return ret;
 }
 
 static inline bool
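[Editor's note] For readability, here is a minimal sketch of the control flow the diff
above produces: the success path no longer returns directly but jumps to a single out:
label, where the reschedule happens unconditionally (a real timed sleep for PF_WQ_WORKER
threads, cond_resched() for everyone else). The zone/watermark scan is elided;
watermark_ok_for_some_zone() is a hypothetical stand-in, not a real kernel function, and
this is not a verbatim copy of the patched should_reclaim_retry().

#include <linux/sched.h>	/* current, PF_WQ_WORKER, cond_resched(),
				 * schedule_timeout_uninterruptible() */

/* Hypothetical helper standing in for the elided zone/watermark scan. */
static bool watermark_ok_for_some_zone(void);

static bool should_reclaim_retry_sketch(void)
{
	bool ret = false;

	/* stand-in for the for_each_zone/__zone_watermark_ok() loop */
	if (watermark_ok_for_some_zone()) {
		ret = true;
		goto out;
	}

out:
	/*
	 * Unconditional reschedule point: a PF_WQ_WORKER thread must do a
	 * real (timed) sleep because workqueue concurrency control cannot
	 * detect a worker that loops without ever sleeping; other tasks
	 * only need cond_resched().
	 */
	if (current->flags & PF_WQ_WORKER)
		schedule_timeout_uninterruptible(1);
	else
		cond_resched();

	return ret;
}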