From patchwork Thu Dec 13 09:22:21 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 10728261
From: Michal Hocko
To: Andrew Morton , "Kirill A.
 Shutemov"
Cc: Liu Bo , Jan Kara , Dave Chinner , "Theodore Ts'o" ,
 Johannes Weiner , Vladimir Davydov , , , LKML , Shakeel Butt ,
 Michal Hocko , Stable tree
Subject: [PATCH v3] mm, memcg: fix reclaim deadlock with writeback
Date: Thu, 13 Dec 2018 10:22:21 +0100
Message-Id: <20181213092221.27270-1-mhocko@kernel.org>
X-Mailer: git-send-email 2.19.2
In-Reply-To: <20181212155055.1269-1-mhocko@kernel.org>
References: <20181212155055.1269-1-mhocko@kernel.org>

From: Michal Hocko

Liu Bo has experienced a deadlock between memcg (legacy) reclaim and the
ext4 writeback

task1:
[] wait_on_page_bit+0x82/0xa0
[] shrink_page_list+0x907/0x960
[] shrink_inactive_list+0x2c7/0x680
[] shrink_node_memcg+0x404/0x830
[] shrink_node+0xd8/0x300
[] do_try_to_free_pages+0x10d/0x330
[] try_to_free_mem_cgroup_pages+0xd5/0x1b0
[] try_charge+0x14d/0x720
[] memcg_kmem_charge_memcg+0x3c/0xa0
[] memcg_kmem_charge+0x7e/0xd0
[] __alloc_pages_nodemask+0x178/0x260
[] alloc_pages_current+0x95/0x140
[] pte_alloc_one+0x17/0x40
[] __pte_alloc+0x1e/0x110
[] alloc_set_pte+0x5fe/0xc20
[] do_fault+0x103/0x970
[] handle_mm_fault+0x61e/0xd10
[] __do_page_fault+0x252/0x4d0
[] do_page_fault+0x30/0x80
[] page_fault+0x28/0x30
[] 0xffffffffffffffff

task2:
[] __lock_page+0x86/0xa0
[] mpage_prepare_extent_to_map+0x2e7/0x310 [ext4]
[] ext4_writepages+0x479/0xd60
[] do_writepages+0x1e/0x30
[] __writeback_single_inode+0x45/0x320
[] writeback_sb_inodes+0x272/0x600
[] __writeback_inodes_wb+0x92/0xc0
[] wb_writeback+0x268/0x300
[] wb_workfn+0xb4/0x390
[] process_one_work+0x189/0x420
[] worker_thread+0x4e/0x4b0
[] kthread+0xe6/0x100
[] ret_from_fork+0x41/0x50
[] 0xffffffffffffffff

He adds
: task1 is waiting for the PageWriteback bit of the page that task2 has
: collected in mpd->io_submit->io_bio, and task2 is waiting for the LOCKED
: bit of the page which task1 has locked.

More precisely, task1 is handling a page fault and holds a page lock
while it charges a new page table to a memcg. That charge hits the memory
limit and triggers memcg reclaim, and reclaim for the legacy controller
waits on the writeback. The writeback is never going to finish, because
the writeback itself is waiting for the page locked in the #PF path. So
this is essentially an ABBA deadlock:

					lock_page(A)
					SetPageWriteback(A)
					unlock_page(A)
lock_page(B)
					lock_page(B)
pte_alloc_one
  shrink_page_list
    wait_on_page_writeback(A)
					SetPageWriteback(B)
					unlock_page(B)

					# flush A, B to clear the writeback

Several filesystems accumulate pages to flush in this way to generate
more optimal IO patterns.

Waiting for the writeback in the legacy memcg controller is a workaround
for premature OOM killer invocations, because there is no dirty IO
throttling available for that controller. There is no easy way around
that, unfortunately. Therefore fix this specific issue by pre-allocating
the page table outside of the page lock. We already have handy
infrastructure for that, so simply reuse the fault-around pattern, which
already does this.

There are probably other hidden __GFP_ACCOUNT | GFP_KERNEL allocations
made while a fs page is locked, but they should be really rare. I am not
aware of a better solution, unfortunately.

Reported-and-Debugged-by: Liu Bo
Cc: stable
Fixes: c3b94f44fcb0 ("memcg: further prevent OOM with too many dirty pages")
Signed-off-by: Michal Hocko
Acked-by: Kirill A. Shutemov
Acked-by: Johannes Weiner
Reviewed-by: Liu Bo
---
 mm/memory.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 4ad2d293ddc2..bb78e90a9b70 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2993,6 +2993,17 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret;
 
+	/*
+	 * Preallocate pte before we take page_lock because this might lead to
+	 * deadlocks for memcg reclaim which waits for pages under writeback.
+	 */
+	if (pmd_none(*vmf->pmd) && !vmf->prealloc_pte) {
+		vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm, vmf->address);
+		if (!vmf->prealloc_pte)
+			return VM_FAULT_OOM;
+		smp_wmb(); /* See comment in __pte_alloc() */
+	}
+
 	ret = vma->vm_ops->fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
 			    VM_FAULT_DONE_COW)))
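
Illustration only, not part of the patch: the idea behind the hunk above
can be modelled in user space. The sketch below is a rough analogy under
stated assumptions. struct fault_ctx, alloc_table(), prealloc_table() and
handle_fault() are hypothetical names, not kernel APIs, and malloc()
merely stands in for a charged page table allocation that may enter
reclaim. It counts how many allocations happen while the modelled page
lock is held, with and without preallocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct fault_ctx {
	void *prealloc;        /* plays the role of vmf->prealloc_pte */
	bool table_present;    /* plays the role of !pmd_none(*vmf->pmd) */
	bool page_locked;      /* true while the modelled page lock is held */
	int allocs_under_lock; /* allocations made while the lock was held */
};

/* Stand-in for a charged allocation that may enter reclaim and block. */
static void *alloc_table(struct fault_ctx *ctx)
{
	if (ctx->page_locked)
		ctx->allocs_under_lock++; /* the case that can deadlock */
	return malloc(64);
}

/* Mirrors the added hunk: allocate before the lock is taken. */
static int prealloc_table(struct fault_ctx *ctx)
{
	if (!ctx->table_present && !ctx->prealloc) {
		ctx->prealloc = alloc_table(ctx);
		if (!ctx->prealloc)
			return -1;
	}
	return 0;
}

/*
 * Models one fault. Returns the number of allocations made under the
 * lock so far, or -1 if the preallocation failed.
 */
static int handle_fault(struct fault_ctx *ctx, bool preallocate)
{
	void *table = NULL;

	assert(ctx != NULL);
	if (preallocate && prealloc_table(ctx) < 0)
		return -1;

	ctx->page_locked = true; /* the fault handler returns a locked page */
	if (!ctx->table_present) {
		/* Prefer the preallocated table; otherwise allocate now. */
		table = ctx->prealloc ? ctx->prealloc : alloc_table(ctx);
		ctx->prealloc = NULL;
		ctx->table_present = (table != NULL);
	}
	ctx->page_locked = false;

	free(table); /* the model only tracks the flag, not the memory */
	return ctx->allocs_under_lock;
}
```

Starting from a zeroed struct fault_ctx, handle_fault(&ctx, false)
performs its allocation under the lock, while handle_fault(&ctx, true)
performs none there. In the kernel the allocation itself is not the
problem; the problem is that it can wait on writeback while the page
lock is held.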