From patchwork Wed Dec 12 15:50:55 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 10726619
From: Michal Hocko
To: Andrew Morton, "Kirill A.
 Shutemov"
Cc: Liu Bo, Jan Kara, Dave Chinner, "Theodore Ts'o", Johannes Weiner, Vladimir Davydov, LKML, Michal Hocko
Subject: [PATCH v2] mm, memcg: fix reclaim deadlock with writeback
Date: Wed, 12 Dec 2018 16:50:55 +0100
Message-Id: <20181212155055.1269-1-mhocko@kernel.org>
X-Mailer: git-send-email 2.19.2
In-Reply-To: <20181211132645.31053-1-mhocko@kernel.org>
References: <20181211132645.31053-1-mhocko@kernel.org>

From: Michal Hocko

Liu Bo has experienced a deadlock between memcg (legacy) reclaim and the
ext4 writeback:

task1:
[] wait_on_page_bit+0x82/0xa0
[] shrink_page_list+0x907/0x960
[] shrink_inactive_list+0x2c7/0x680
[] shrink_node_memcg+0x404/0x830
[] shrink_node+0xd8/0x300
[] do_try_to_free_pages+0x10d/0x330
[] try_to_free_mem_cgroup_pages+0xd5/0x1b0
[] try_charge+0x14d/0x720
[] memcg_kmem_charge_memcg+0x3c/0xa0
[] memcg_kmem_charge+0x7e/0xd0
[] __alloc_pages_nodemask+0x178/0x260
[] alloc_pages_current+0x95/0x140
[] pte_alloc_one+0x17/0x40
[] __pte_alloc+0x1e/0x110
[] alloc_set_pte+0x5fe/0xc20
[] do_fault+0x103/0x970
[] handle_mm_fault+0x61e/0xd10
[] __do_page_fault+0x252/0x4d0
[] do_page_fault+0x30/0x80
[] page_fault+0x28/0x30
[] 0xffffffffffffffff

task2:
[] __lock_page+0x86/0xa0
[] mpage_prepare_extent_to_map+0x2e7/0x310 [ext4]
[] ext4_writepages+0x479/0xd60
[] do_writepages+0x1e/0x30
[] __writeback_single_inode+0x45/0x320
[] writeback_sb_inodes+0x272/0x600
[] __writeback_inodes_wb+0x92/0xc0
[] wb_writeback+0x268/0x300
[] wb_workfn+0xb4/0x390
[] process_one_work+0x189/0x420
[] worker_thread+0x4e/0x4b0
[] kthread+0xe6/0x100
[] ret_from_fork+0x41/0x50
[] 0xffffffffffffffff

He adds:
: task1 is waiting for the PageWriteback bit of the page that task2 has
: collected in mpd->io_submit->io_bio, and task2 is waiting for the LOCKED
: bit of the page which task1 has locked.

More precisely, task1 is handling a page fault and holds a page lock
while it charges a new page table to a memcg. That charge hits the
memory limit and triggers memcg reclaim, which for the legacy controller
waits on writeback. The writeback is never going to finish because it is
itself waiting for the page locked in the #PF path. So this is
essentially an ABBA deadlock.

Waiting for the writeback in the legacy memcg controller is a workaround
for premature OOM killer invocations because there is no dirty IO
throttling available for the controller. There is no easy way around
that unfortunately. Therefore fix this specific issue by pre-allocating
the page table outside of the page lock. We already have handy
infrastructure for that, so simply reuse the fault-around pattern which
already does this.

Reported-and-Debugged-by: Liu Bo
Signed-off-by: Michal Hocko
---
 mm/memory.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 4ad2d293ddc2..bb78e90a9b70 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2993,6 +2993,17 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret;
 
+	/*
+	 * Preallocate pte before we take page_lock because this might lead to
+	 * deadlocks for memcg reclaim which waits for pages under writeback.
+	 */
+	if (pmd_none(*vmf->pmd) && !vmf->prealloc_pte) {
+		vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm, vmf->address);
+		if (!vmf->prealloc_pte)
+			return VM_FAULT_OOM;
+		smp_wmb(); /* See comment in __pte_alloc() */
+	}
+
 	ret = vma->vm_ops->fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
 			    VM_FAULT_DONE_COW)))
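
The ordering change described in the changelog can be modelled in user
space. What follows is only an illustrative sketch, not kernel code:
struct fault_ctx, alloc_may_reclaim(), fault_alloc_under_lock(),
fault_prealloc() and count_unsafe_allocs() are hypothetical names made up
for this example, and the "page lock" is a plain flag rather than a real
lock. It counts how many allocations that could enter reclaim happen
while the page is held, which is the window in which the ABBA deadlock
can occur.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
struct fault_ctx {
	bool page_locked;	/* models the page lock held in the fault path */
	void *prealloc;		/* models vmf->prealloc_pte */
	int unsafe_allocs;	/* allocations attempted while the page is locked */
};

/* Models an allocation that may enter reclaim and wait for writeback. */
static void *alloc_may_reclaim(struct fault_ctx *ctx)
{
	if (ctx->page_locked)
		ctx->unsafe_allocs++;	/* this is the deadlock-prone window */
	return malloc(64);
}

/* Old ordering: take the page lock, then allocate the table under it. */
static int fault_alloc_under_lock(struct fault_ctx *ctx)
{
	void *table;

	ctx->page_locked = true;
	table = alloc_may_reclaim(ctx);
	ctx->page_locked = false;
	if (!table)
		return -1;
	free(table);
	return 0;
}

/* Patched ordering: preallocate first, then lock and consume the table. */
static int fault_prealloc(struct fault_ctx *ctx)
{
	void *table;

	if (!ctx->prealloc) {
		ctx->prealloc = alloc_may_reclaim(ctx);
		if (!ctx->prealloc)
			return -1;	/* analogous to returning VM_FAULT_OOM */
	}
	ctx->page_locked = true;
	table = ctx->prealloc;		/* no allocation while locked */
	ctx->prealloc = NULL;
	ctx->page_locked = false;
	free(table);
	return 0;
}

/* Runs one modelled fault and reports allocations made under the lock. */
int count_unsafe_allocs(bool preallocate)
{
	struct fault_ctx ctx = { 0 };
	int rc = preallocate ? fault_prealloc(&ctx)
			     : fault_alloc_under_lock(&ctx);

	assert(rc == 0);	/* the sketch does not model recovery from OOM */
	return ctx.unsafe_allocs;
}
```

In this model count_unsafe_allocs(false) reports one allocation made
under the lock and count_unsafe_allocs(true) reports none, which mirrors
the intent of moving the page table allocation ahead of the ->fault()
call.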