From patchwork Mon Jun 24 21:26:29 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shakeel Butt
X-Patchwork-Id: 11014177
Date: Mon, 24 Jun 2019 14:26:29 -0700
Message-Id: <20190624212631.87212-1-shakeelb@google.com>
X-Mailer: git-send-email 2.22.0.410.gd8fdbe21b5-goog
Subject: [PATCH v3 1/3] mm, oom: refactor dump_tasks for memcg OOMs
From: Shakeel Butt
To: Johannes Weiner, Vladimir Davydov, Michal Hocko, Andrew Morton,
 Roman Gushchin, David Rientjes, KOSAKI Motohiro, Tetsuo Handa,
 Paul Jackson, Nick Piggin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Shakeel Butt,
 Michal Hocko

dump_tasks() traverses all the existing processes even for the memcg OOM
context which is not only unnecessary but also wasteful. This imposes
a long RCU critical section even from a contained context which can be
quite disruptive.

Change dump_tasks() to be aligned with select_bad_process and use
mem_cgroup_scan_tasks to selectively traverse only processes of the
target memcg hierarchy during memcg OOM.

Signed-off-by: Shakeel Butt
Acked-by: Michal Hocko
---
Changelog since v2:
- Updated the commit message.

Changelog since v1:
- Divide the patch into two patches.
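Not part of the patch: a minimal userspace sketch of the dispatch shape
described above, assuming a flat "group" id in place of a real memcg
hierarchy. All names here (struct task, struct oom_ctx, dump_one,
scan_group, dump_all) are hypothetical stand-ins and not kernel APIs; only
the structure mirrors the change, i.e. one per-task callback shared by a
scoped walk and a full walk.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct task {
	int pid;
	int group;		/* flat model of memcg membership */
	const char *comm;
};

struct oom_ctx {
	int memcg;		/* negative models a global OOM */
	int dumped;		/* number of tasks reported so far */
};

typedef int (*task_fn)(const struct task *t, void *arg);

static const struct task all_tasks[] = {
	{ 101, 0, "daemon" },
	{ 202, 1, "worker-a" },
	{ 203, 1, "worker-b" },
	{ 304, 2, "batch" },
};
#define NR_TASKS (sizeof(all_tasks) / sizeof(all_tasks[0]))

/* Per-task callback, playing the role of dump_task(). */
static int dump_one(const struct task *t, void *arg)
{
	struct oom_ctx *ctx = arg;

	printf("[%7d] %s\n", t->pid, t->comm);
	ctx->dumped++;
	return 0;		/* zero means "keep iterating" */
}

/* Scoped walk, playing the role of mem_cgroup_scan_tasks(). */
static void scan_group(int group, task_fn fn, void *arg)
{
	for (size_t i = 0; i < NR_TASKS; i++) {
		if (all_tasks[i].group != group)
			continue;
		if (fn(&all_tasks[i], arg))
			break;
	}
}

/* Dispatcher, playing the role of the refactored dump_tasks(). */
static int dump_all(struct oom_ctx *ctx)
{
	ctx->dumped = 0;
	if (ctx->memcg >= 0) {
		/* memcg OOM: visit only members of the target group */
		scan_group(ctx->memcg, dump_one, ctx);
	} else {
		/* global OOM: visit every task */
		for (size_t i = 0; i < NR_TASKS; i++)
			dump_one(&all_tasks[i], ctx);
	}
	return ctx->dumped;
}
```

In this model a memcg OOM on group 1 touches two tasks while a global OOM
touches all four, which is the saving the commit message is after.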

 mm/oom_kill.c | 68 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 40 insertions(+), 28 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 05aaa1a5920b..bd80997e0969 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -385,10 +385,38 @@ static void select_bad_process(struct oom_control *oc)
 	oc->chosen_points = oc->chosen_points * 1000 / oc->totalpages;
 }
 
+static int dump_task(struct task_struct *p, void *arg)
+{
+	struct oom_control *oc = arg;
+	struct task_struct *task;
+
+	if (oom_unkillable_task(p, NULL, oc->nodemask))
+		return 0;
+
+	task = find_lock_task_mm(p);
+	if (!task) {
+		/*
+		 * This is a kthread or all of p's threads have already
+		 * detached their mm's. There's no need to report
+		 * them; they can't be oom killed anyway.
+		 */
+		return 0;
+	}
+
+	pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu %5hd %s\n",
+		task->pid, from_kuid(&init_user_ns, task_uid(task)),
+		task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
+		mm_pgtables_bytes(task->mm),
+		get_mm_counter(task->mm, MM_SWAPENTS),
+		task->signal->oom_score_adj, task->comm);
+	task_unlock(task);
+
+	return 0;
+}
+
 /**
  * dump_tasks - dump current memory state of all system tasks
- * @memcg: current's memory controller, if constrained
- * @nodemask: nodemask passed to page allocator for mempolicy ooms
+ * @oc: pointer to struct oom_control
  *
  * Dumps the current memory state of all eligible tasks. Tasks not in the same
  * memcg, not in the same cpuset, or bound to a disjoint set of mempolicy nodes
@@ -396,37 +424,21 @@ static void select_bad_process(struct oom_control *oc)
  * State information includes task's pid, uid, tgid, vm size, rss,
  * pgtables_bytes, swapents, oom_score_adj value, and name.
  */
-static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
+static void dump_tasks(struct oom_control *oc)
 {
-	struct task_struct *p;
-	struct task_struct *task;
-
 	pr_info("Tasks state (memory values in pages):\n");
 	pr_info("[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name\n");
-	rcu_read_lock();
-	for_each_process(p) {
-		if (oom_unkillable_task(p, memcg, nodemask))
-			continue;
 
-		task = find_lock_task_mm(p);
-		if (!task) {
-			/*
-			 * This is a kthread or all of p's threads have already
-			 * detached their mm's. There's no need to report
-			 * them; they can't be oom killed anyway.
-			 */
-			continue;
-		}
+	if (is_memcg_oom(oc))
+		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
+	else {
+		struct task_struct *p;
 
-		pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu %5hd %s\n",
-			task->pid, from_kuid(&init_user_ns, task_uid(task)),
-			task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
-			mm_pgtables_bytes(task->mm),
-			get_mm_counter(task->mm, MM_SWAPENTS),
-			task->signal->oom_score_adj, task->comm);
-		task_unlock(task);
+		rcu_read_lock();
+		for_each_process(p)
+			dump_task(p, oc);
+		rcu_read_unlock();
 	}
-	rcu_read_unlock();
 }
 
 static void dump_oom_summary(struct oom_control *oc, struct task_struct *victim)
@@ -458,7 +470,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
 		dump_unreclaimable_slab();
 	}
 	if (sysctl_oom_dump_tasks)
-		dump_tasks(oc->memcg, oc->nodemask);
+		dump_tasks(oc);
 	if (p)
 		dump_oom_summary(oc, p);
 }

From patchwork Mon Jun 24 21:26:30 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shakeel Butt
X-Patchwork-Id: 11014179
Date: Mon, 24 Jun 2019 14:26:30 -0700
In-Reply-To: <20190624212631.87212-1-shakeelb@google.com>
Message-Id: <20190624212631.87212-2-shakeelb@google.com>
References: <20190624212631.87212-1-shakeelb@google.com>
X-Mailer: git-send-email 2.22.0.410.gd8fdbe21b5-goog
Subject: [PATCH v3 2/3] mm, oom: remove redundant task_in_mem_cgroup() check
From: Shakeel Butt
To: Johannes Weiner, Vladimir Davydov, Michal Hocko, Andrew Morton,
 Roman Gushchin, David Rientjes, KOSAKI Motohiro, Tetsuo Handa,
 Paul Jackson, Nick Piggin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Shakeel Butt

oom_unkillable_task() can be called from three different contexts, i.e.
global OOM, memcg OOM and the oom_score procfs interface. At the moment
oom_unkillable_task() does a task_in_mem_cgroup() check on the given
process. Since there is no reason to perform the task_in_mem_cgroup()
check for global OOM and the oom_score procfs interface, those contexts
provide a NULL memcg and skip the task_in_mem_cgroup() check. However,
for the memcg OOM context, oom_unkillable_task() is always called from
mem_cgroup_scan_tasks() and thus the task_in_mem_cgroup() check becomes
redundant. So, just remove the task_in_mem_cgroup() check altogether.

Signed-off-by: Shakeel Butt
Signed-off-by: Tetsuo Handa
Acked-by: Michal Hocko
---
Changelog since v2:
- Further divided the patch into two patches.
- Incorporated the task_in_mem_cgroup() from Tetsuo.

Changelog since v1:
- Divide the patch into two patches.
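Not part of the patch: a small userspace model of the redundancy argument
above, assuming a flat "group" id instead of a memcg hierarchy. The names
(struct task, unkillable_old, unkillable_new, count_disagreements) are
hypothetical and only illustrate the reasoning: once the walk itself is
scoped to the group, or no group is given at all, a membership test inside
the predicate can never change its result.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task model; 'group' stands in for memcg membership. */
struct task {
	int pid;
	int group;
	bool kthread;
};

static const struct task tasks[] = {
	{ 1, 0, false },	/* init-like task */
	{ 2, 0, true },		/* kernel thread */
	{ 202, 1, false },
	{ 203, 1, false },
	{ 304, 2, false },
};
#define NR_TASKS (sizeof(tasks) / sizeof(tasks[0]))

/* Old shape: membership test applied when a group id is supplied. */
static bool unkillable_old(const struct task *t, int memcg)
{
	if (t->pid == 1 || t->kthread)
		return true;
	if (memcg >= 0 && t->group != memcg)
		return true;
	return false;
}

/* New shape: the caller is trusted to have scoped the walk already. */
static bool unkillable_new(const struct task *t)
{
	return t->pid == 1 || t->kthread;
}

/*
 * Count tasks on which the two predicates disagree. A negative memcg
 * models the global OOM and procfs callers (full walk, no group id);
 * otherwise only members of 'memcg' are visited, as a scoped iterator
 * would do.
 */
static int count_disagreements(int memcg)
{
	int n = 0;

	for (size_t i = 0; i < NR_TASKS; i++) {
		const struct task *t = &tasks[i];

		if (memcg >= 0 && t->group != memcg)
			continue;
		if (unkillable_old(t, memcg) != unkillable_new(t))
			n++;
	}
	return n;
}
```

The predicates only differ when a group id is passed for a task outside
that group, which the scoped walk never does in this model.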

 fs/proc/base.c             |  2 +-
 include/linux/memcontrol.h |  7 -------
 include/linux/oom.h        |  2 +-
 mm/memcontrol.c            | 26 --------------------------
 mm/oom_kill.c              | 19 +++++++------------
 5 files changed, 9 insertions(+), 47 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index b8d5d100ed4a..5eacce5e924a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -532,7 +532,7 @@ static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
 	unsigned long totalpages = totalram_pages() + total_swap_pages;
 	unsigned long points = 0;
 
-	points = oom_badness(task, NULL, NULL, totalpages) *
+	points = oom_badness(task, NULL, totalpages) *
 					1000 / totalpages;
 	seq_printf(m, "%lu\n", points);
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9abf31bbe53a..2cbce1fe7780 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -407,7 +407,6 @@ static inline struct lruvec *mem_cgroup_lruvec(struct pglist_data *pgdat,
 
 struct lruvec *mem_cgroup_page_lruvec(struct page *, struct pglist_data *);
 
-bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg);
 struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
 
 struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm);
@@ -896,12 +895,6 @@ static inline bool mm_match_cgroup(struct mm_struct *mm,
 	return true;
 }
 
-static inline bool task_in_mem_cgroup(struct task_struct *task,
-				      const struct mem_cgroup *memcg)
-{
-	return true;
-}
-
 static inline struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
 {
 	return NULL;
diff --git a/include/linux/oom.h b/include/linux/oom.h
index d07992009265..b75104690311 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -108,7 +108,7 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
 bool __oom_reap_task_mm(struct mm_struct *mm);
 
 extern unsigned long oom_badness(struct task_struct *p,
-		struct mem_cgroup *memcg, const nodemask_t *nodemask,
+		const nodemask_t *nodemask,
 		unsigned long totalpages);
 
 extern bool out_of_memory(struct oom_control *oc);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index db46a9dc37ab..27c92c2b99be 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1259,32 +1259,6 @@ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
 		*lru_size += nr_pages;
 }
 
-bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg)
-{
-	struct mem_cgroup *task_memcg;
-	struct task_struct *p;
-	bool ret;
-
-	p = find_lock_task_mm(task);
-	if (p) {
-		task_memcg = get_mem_cgroup_from_mm(p->mm);
-		task_unlock(p);
-	} else {
-		/*
-		 * All threads may have already detached their mm's, but the oom
-		 * killer still needs to detect if they have already been oom
-		 * killed to prevent needlessly killing additional tasks.
-		 */
-		rcu_read_lock();
-		task_memcg = mem_cgroup_from_task(task);
-		css_get(&task_memcg->css);
-		rcu_read_unlock();
-	}
-	ret = mem_cgroup_is_descendant(task_memcg, memcg);
-	css_put(&task_memcg->css);
-	return ret;
-}
-
 /**
  * mem_cgroup_margin - calculate chargeable space of a memory cgroup
  * @memcg: the memory cgroup
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index bd80997e0969..e0cdcbd58b0b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -153,17 +153,13 @@ static inline bool is_memcg_oom(struct oom_control *oc)
 
 /* return true if the task is not adequate as candidate victim task. */
 static bool oom_unkillable_task(struct task_struct *p,
-		struct mem_cgroup *memcg, const nodemask_t *nodemask)
+				const nodemask_t *nodemask)
 {
 	if (is_global_init(p))
 		return true;
 	if (p->flags & PF_KTHREAD)
 		return true;
 
-	/* When mem_cgroup_out_of_memory() and p is not member of the group */
-	if (memcg && !task_in_mem_cgroup(p, memcg))
-		return true;
-
 	/* p may not have freeable memory in nodemask */
 	if (!has_intersects_mems_allowed(p, nodemask))
 		return true;
@@ -194,20 +190,19 @@ static bool is_dump_unreclaim_slabs(void)
  * oom_badness - heuristic function to determine which candidate task to kill
  * @p: task struct of which task we should calculate
  * @totalpages: total present RAM allowed for page allocation
- * @memcg: task's memory controller, if constrained
  * @nodemask: nodemask passed to page allocator for mempolicy ooms
  *
  * The heuristic for determining which task to kill is made to be as simple and
  * predictable as possible. The goal is to return the highest value for the
  * task consuming the most memory to avoid subsequent oom failures.
  */
-unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
+unsigned long oom_badness(struct task_struct *p,
 			  const nodemask_t *nodemask, unsigned long totalpages)
 {
 	long points;
 	long adj;
 
-	if (oom_unkillable_task(p, memcg, nodemask))
+	if (oom_unkillable_task(p, nodemask))
 		return 0;
 
 	p = find_lock_task_mm(p);
@@ -318,7 +313,7 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 	struct oom_control *oc = arg;
 	unsigned long points;
 
-	if (oom_unkillable_task(task, NULL, oc->nodemask))
+	if (oom_unkillable_task(task, oc->nodemask))
 		goto next;
 
 	/*
@@ -342,7 +337,7 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 		goto select;
 	}
 
-	points = oom_badness(task, NULL, oc->nodemask, oc->totalpages);
+	points = oom_badness(task, oc->nodemask, oc->totalpages);
 	if (!points || points < oc->chosen_points)
 		goto next;
 
@@ -390,7 +385,7 @@ static int dump_task(struct task_struct *p, void *arg)
 	struct oom_control *oc = arg;
 	struct task_struct *task;
 
-	if (oom_unkillable_task(p, NULL, oc->nodemask))
+	if (oom_unkillable_task(p, oc->nodemask))
 		return 0;
 
 	task = find_lock_task_mm(p);
@@ -1090,7 +1085,7 @@ bool out_of_memory(struct oom_control *oc)
 	check_panic_on_oom(oc, constraint);
 
 	if (!is_memcg_oom(oc) && sysctl_oom_kill_allocating_task &&
-	    current->mm && !oom_unkillable_task(current, NULL, oc->nodemask) &&
+	    current->mm && !oom_unkillable_task(current, oc->nodemask) &&
 	    current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
 		get_task_struct(current);
 		oc->chosen = current;

From patchwork Mon Jun 24 21:26:31 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shakeel Butt
X-Patchwork-Id: 11014181
Date: Mon, 24 Jun 2019 14:26:31 -0700
In-Reply-To: <20190624212631.87212-1-shakeelb@google.com>
Message-Id: <20190624212631.87212-3-shakeelb@google.com>
References: <20190624212631.87212-1-shakeelb@google.com>
X-Mailer: git-send-email 2.22.0.410.gd8fdbe21b5-goog
Subject: [PATCH v3 3/3] oom: decouple mems_allowed from oom_unkillable_task
From: Shakeel Butt
To: Johannes Weiner, Vladimir Davydov, Michal Hocko, Andrew Morton,
 Roman Gushchin, David Rientjes, KOSAKI Motohiro, Tetsuo Handa,
 Paul Jackson, Nick Piggin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Shakeel Butt,
 syzbot+d0fc9d3c166bc5e4a94b@syzkaller.appspotmail.com

The commit ef08e3b4981a ("[PATCH] cpusets: confine oom_killer to
mem_exclusive cpuset") introduced a heuristic where a potential
oom-killer victim is skipped if the intersection of the potential victim
and the current (the process that triggered the oom) is empty, based on
the reason that killing such a victim most probably will not help the
current allocating process.

However the commit 7887a3da753e ("[PATCH] oom: cpuset hint") changed the
heuristic to just decrease the oom_badness scores of such potential
victims, based on the reason that the cpuset of such processes might have
changed and previously they might have allocated memory on mems where the
current allocating process can allocate from.

Unintentionally, commit 7887a3da753e ("[PATCH] oom: cpuset hint")
introduced a side effect: as the oom_badness is also exposed to user
space through /proc/[pid]/oom_score, readers with different cpusets can
read different oom_score values for the same process.

Later the commit 6cf86ac6f36b ("oom: filter tasks not sharing the same
cpuset") fixed the side effect introduced by 7887a3da753e by moving the
cpuset intersection back to only the oom-killer context and out of
oom_badness.
However, the combination of commit ab290adbaf8f ("oom: make
oom_unkillable_task() helper function") and commit 26ebc984913b ("oom:
/proc//oom_score treat kernel thread honestly") unintentionally brought
the cpuset intersection check back into the oom_badness calculation
function.

Besides the cpuset/mempolicy intersection being done from oom_badness,
the memcg oom context is also doing the cpuset/mempolicy intersection
check, which is quite wrong and was caught by syzkaller with the
following report:

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 28426 Comm: syz-executor.5 Not tainted 5.2.0-rc3-next-20190607
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
RIP: 0010:has_intersects_mems_allowed mm/oom_kill.c:84 [inline]
RIP: 0010:oom_unkillable_task mm/oom_kill.c:168 [inline]
RIP: 0010:oom_unkillable_task+0x180/0x400 mm/oom_kill.c:155
Code: c1 ea 03 80 3c 02 00 0f 85 80 02 00 00 4c 8b a3 10 07 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8d 74 24 10 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 67 02 00 00 49 8b 44 24 10 4c 8d a0 68 fa ff ff
RSP: 0018:ffff888000127490 EFLAGS: 00010a03
RAX: dffffc0000000000 RBX: ffff8880a4cd5438 RCX: ffffffff818dae9c
RDX: 100000000c3cc602 RSI: ffffffff818dac8d RDI: 0000000000000001
RBP: ffff8880001274d0 R08: ffff888000086180 R09: ffffed1015d26be0
R10: ffffed1015d26bdf R11: ffff8880ae935efb R12: 8000000061e63007
R13: 0000000000000000 R14: 8000000061e63017 R15: 1ffff11000024ea6
FS: 00005555561f5940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000607304 CR3: 000000009237e000 CR4: 00000000001426f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Call Trace:
 oom_evaluate_task+0x49/0x520 mm/oom_kill.c:321
 mem_cgroup_scan_tasks+0xcc/0x180 mm/memcontrol.c:1169
 select_bad_process mm/oom_kill.c:374 [inline]
 out_of_memory mm/oom_kill.c:1088 [inline]
 out_of_memory+0x6b2/0x1280 mm/oom_kill.c:1035
 mem_cgroup_out_of_memory+0x1ca/0x230 mm/memcontrol.c:1573
 mem_cgroup_oom mm/memcontrol.c:1905 [inline]
 try_charge+0xfbe/0x1480 mm/memcontrol.c:2468
 mem_cgroup_try_charge+0x24d/0x5e0 mm/memcontrol.c:6073
 mem_cgroup_try_charge_delay+0x1f/0xa0 mm/memcontrol.c:6088
 do_huge_pmd_wp_page_fallback+0x24f/0x1680 mm/huge_memory.c:1201
 do_huge_pmd_wp_page+0x7fc/0x2160 mm/huge_memory.c:1359
 wp_huge_pmd mm/memory.c:3793 [inline]
 __handle_mm_fault+0x164c/0x3eb0 mm/memory.c:4006
 handle_mm_fault+0x3b7/0xa90 mm/memory.c:4053
 do_user_addr_fault arch/x86/mm/fault.c:1455 [inline]
 __do_page_fault+0x5ef/0xda0 arch/x86/mm/fault.c:1521
 do_page_fault+0x71/0x57d arch/x86/mm/fault.c:1552
 page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1156
RIP: 0033:0x400590
Code: 06 e9 49 01 00 00 48 8b 44 24 10 48 0b 44 24 28 75 1f 48 8b 14 24 48 8b 7c 24 20 be 04 00 00 00 e8 f5 56 00 00 48 8b 74 24 08 <89> 06 e9 1e 01 00 00 48 8b 44 24 08 48 8b 14 24 be 04 00 00 00 8b
RSP: 002b:00007fff7bc49780 EFLAGS: 00010206
RAX: 0000000000000001 RBX: 0000000000760000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000002000cffc RDI: 0000000000000001
RBP: fffffffffffffffe R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000075 R11: 0000000000000246 R12: 0000000000760008
R13: 00000000004c55f2 R14: 0000000000000000 R15: 00007fff7bc499b0
Modules linked in:
---[ end trace a65689219582ffff ]---
RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
RIP: 0010:has_intersects_mems_allowed mm/oom_kill.c:84 [inline]
RIP: 0010:oom_unkillable_task mm/oom_kill.c:168 [inline]
RIP: 0010:oom_unkillable_task+0x180/0x400 mm/oom_kill.c:155
Code: c1 ea 03 80 3c 02 00 0f 85 80 02 00 00 4c 8b a3 10 07 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8d 74 24 10 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 67 02 00 00 49 8b 44 24 10 4c 8d a0 68 fa ff ff
RSP: 0018:ffff888000127490 EFLAGS: 00010a03
RAX: dffffc0000000000 RBX: ffff8880a4cd5438 RCX: ffffffff818dae9c
RDX: 100000000c3cc602 RSI: ffffffff818dac8d RDI: 0000000000000001
RBP: ffff8880001274d0 R08: ffff888000086180 R09: ffffed1015d26be0
R10: ffffed1015d26bdf R11: ffff8880ae935efb R12: 8000000061e63007
R13: 0000000000000000 R14: 8000000061e63017 R15: 1ffff11000024ea6
FS: 00005555561f5940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2f823000 CR3: 000000009237e000 CR4: 00000000001426f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600

The fix is to decouple the cpuset/mempolicy intersection check from
oom_unkillable_task() and make sure the cpuset/mempolicy intersection
check is only done in the global oom context.

Reported-by: syzbot+d0fc9d3c166bc5e4a94b@syzkaller.appspotmail.com
Signed-off-by: Shakeel Butt
Acked-by: Michal Hocko
Acked-by: Roman Gushchin
---
Changelog since v2:
- Further divided the patch into two patches.
- Cleaner version.

Changelog since v1:
- Divide the patch into two patches.

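Note for reviewers: the candidate filtering that results from this patch
can be modelled in userspace roughly as below. This is a minimal sketch
under stated assumptions, not kernel code: every toy_* name is
hypothetical, a single unsigned long stands in for nodemask_t, and a plain
bitmask test stands in for the real cpuset/mempolicy intersection walk
over all threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; all names here are hypothetical. */
struct toy_task {
	bool is_global_init;        /* models is_global_init(p) */
	bool is_kthread;            /* models p->flags & PF_KTHREAD */
	unsigned long mems_allowed; /* one bit per NUMA node (assumption) */
};

struct toy_oom_control {
	const void *memcg;          /* non-NULL for a memcg oom */
	unsigned long nodemask;     /* nodes the failing allocation may use */
};

static bool toy_is_memcg_oom(const struct toy_oom_control *oc)
{
	return oc->memcg != NULL;
}

/*
 * After the patch this takes no nodemask, so the result depends only on
 * the task itself and not on who is asking.
 */
static bool toy_oom_unkillable_task(const struct toy_task *p)
{
	return p->is_global_init || p->is_kthread;
}

/*
 * Only meaningful in the global oom context; the assert mirrors the
 * VM_BUG_ON(is_memcg_oom(oc)) added by the patch.
 */
static bool toy_has_intersects_mems_allowed(const struct toy_task *p,
					    const struct toy_oom_control *oc)
{
	assert(!toy_is_memcg_oom(oc));
	return (p->mems_allowed & oc->nodemask) != 0;
}

/* Models the eligibility test at the top of oom_evaluate_task(). */
static bool toy_task_is_candidate(const struct toy_task *p,
				  const struct toy_oom_control *oc)
{
	if (toy_oom_unkillable_task(p))
		return false;

	/* The intersection check is skipped entirely for a memcg oom. */
	if (!toy_is_memcg_oom(oc) && !toy_has_intersects_mems_allowed(p, oc))
		return false;

	return true;
}
```

In this model a task whose nodes do not overlap the nodemask is filtered
out of a global oom but stays a candidate in a memcg oom, where the
nodemask is never consulted.
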
 fs/proc/base.c      |  3 +--
 include/linux/oom.h |  1 -
 mm/oom_kill.c       | 51 ++++++++++++++++++++++++++-------------------
 3 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 5eacce5e924a..57b7a0d75ef5 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -532,8 +532,7 @@ static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
 	unsigned long totalpages = totalram_pages() + total_swap_pages;
 	unsigned long points = 0;
 
-	points = oom_badness(task, NULL, totalpages) *
-					1000 / totalpages;
+	points = oom_badness(task, totalpages) * 1000 / totalpages;
 	seq_printf(m, "%lu\n", points);
 
 	return 0;
diff --git a/include/linux/oom.h b/include/linux/oom.h
index b75104690311..c696c265f019 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -108,7 +108,6 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
 bool __oom_reap_task_mm(struct mm_struct *mm);
 
 extern unsigned long oom_badness(struct task_struct *p,
-		const nodemask_t *nodemask,
 		unsigned long totalpages);
 
 extern bool out_of_memory(struct oom_control *oc);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index e0cdcbd58b0b..9f91cb7036fb 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -64,6 +64,11 @@ int sysctl_oom_dump_tasks = 1;
  */
 DEFINE_MUTEX(oom_lock);
 
+static inline bool is_memcg_oom(struct oom_control *oc)
+{
+	return oc->memcg != NULL;
+}
+
 #ifdef CONFIG_NUMA
 /**
  * has_intersects_mems_allowed() - check task eligiblity for kill
@@ -73,12 +78,18 @@ DEFINE_MUTEX(oom_lock);
  * Task eligibility is determined by whether or not a candidate task, @tsk,
  * shares the same mempolicy nodes as current if it is bound by such a policy
  * and whether or not it has the same set of allowed cpuset nodes.
+ *
+ * Only call in the global oom context (i.e. not in memcg oom). This function
+ * is assuming 'current' has triggered the oom-killer.
  */
 static bool has_intersects_mems_allowed(struct task_struct *start,
-					const nodemask_t *mask)
+					struct oom_control *oc)
 {
 	struct task_struct *tsk;
 	bool ret = false;
+	const nodemask_t *mask = oc->nodemask;
+
+	VM_BUG_ON(is_memcg_oom(oc));
 
 	rcu_read_lock();
 	for_each_thread(start, tsk) {
@@ -106,7 +117,7 @@ static bool has_intersects_mems_allowed(struct task_struct *start,
 }
 #else
 static bool has_intersects_mems_allowed(struct task_struct *tsk,
-					const nodemask_t *mask)
+					struct oom_control *oc)
 {
 	return true;
 }
@@ -146,24 +157,13 @@ static inline bool is_sysrq_oom(struct oom_control *oc)
 	return oc->order == -1;
 }
 
-static inline bool is_memcg_oom(struct oom_control *oc)
-{
-	return oc->memcg != NULL;
-}
-
 /* return true if the task is not adequate as candidate victim task. */
-static bool oom_unkillable_task(struct task_struct *p,
-				const nodemask_t *nodemask)
+static bool oom_unkillable_task(struct task_struct *p)
 {
 	if (is_global_init(p))
 		return true;
 	if (p->flags & PF_KTHREAD)
 		return true;
-
-	/* p may not have freeable memory in nodemask */
-	if (!has_intersects_mems_allowed(p, nodemask))
-		return true;
-
 	return false;
 }
 
@@ -190,19 +190,17 @@ static bool is_dump_unreclaim_slabs(void)
  * oom_badness - heuristic function to determine which candidate task to kill
  * @p: task struct of which task we should calculate
  * @totalpages: total present RAM allowed for page allocation
- * @nodemask: nodemask passed to page allocator for mempolicy ooms
  *
  * The heuristic for determining which task to kill is made to be as simple and
  * predictable as possible. The goal is to return the highest value for the
  * task consuming the most memory to avoid subsequent oom failures.
  */
-unsigned long oom_badness(struct task_struct *p,
-		const nodemask_t *nodemask, unsigned long totalpages)
+unsigned long oom_badness(struct task_struct *p, unsigned long totalpages)
 {
 	long points;
 	long adj;
 
-	if (oom_unkillable_task(p, nodemask))
+	if (oom_unkillable_task(p))
 		return 0;
 
 	p = find_lock_task_mm(p);
@@ -313,7 +311,11 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 	struct oom_control *oc = arg;
 	unsigned long points;
 
-	if (oom_unkillable_task(task, oc->nodemask))
+	if (oom_unkillable_task(task))
+		goto next;
+
+	/* p may not have freeable memory in nodemask */
+	if (!is_memcg_oom(oc) && !has_intersects_mems_allowed(task, oc))
 		goto next;
 
 	/*
@@ -337,7 +339,7 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 		goto select;
 	}
 
-	points = oom_badness(task, oc->nodemask, oc->totalpages);
+	points = oom_badness(task, oc->totalpages);
 	if (!points || points < oc->chosen_points)
 		goto next;
 
@@ -385,7 +387,11 @@ static int dump_task(struct task_struct *p, void *arg)
 	struct oom_control *oc = arg;
 	struct task_struct *task;
 
-	if (oom_unkillable_task(p, oc->nodemask))
+	if (oom_unkillable_task(p))
+		return 0;
+
+	/* p may not have freeable memory in nodemask */
+	if (!is_memcg_oom(oc) && !has_intersects_mems_allowed(p, oc))
 		return 0;
 
 	task = find_lock_task_mm(p);
@@ -1085,7 +1091,8 @@ bool out_of_memory(struct oom_control *oc)
 	check_panic_on_oom(oc, constraint);
 
 	if (!is_memcg_oom(oc) && sysctl_oom_kill_allocating_task &&
-	    current->mm && !oom_unkillable_task(current, oc->nodemask) &&
+	    current->mm && !oom_unkillable_task(current) &&
+	    has_intersects_mems_allowed(current, oc) &&
 	    current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
 		get_task_struct(current);
 		oc->chosen = current;