From: Johannes Weiner <hannes@cmpxchg.org>
To: linux-mm@kvack.org
Cc: Kaiyang Zhao, Mel Gorman, Vlastimil Babka, David Rientjes, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [RFC PATCH 02/26] mm: compaction: avoid GFP_NOFS deadlocks
Date: Tue, 18 Apr 2023 15:12:49 -0400
Message-Id: <20230418191313.268131-3-hannes@cmpxchg.org>
In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org>
References: <20230418191313.268131-1-hannes@cmpxchg.org>
X-Patchwork-Id: 13216089

During stress testing, two deadlock scenarios were observed:

1. One GFP_NOFS allocation was sleeping on too_many_isolated(), and
   all CPUs were busy with compactors that appeared to be spinning on
   buffer locks.

   Give GFP_NOFS compactors additional isolation headroom, the same
   way we do during reclaim, to eliminate this deadlock scenario.

2. In a more pernicious scenario, the GFP_NOFS allocation was
   busy-spinning in compaction, but seemingly never making
   progress. Upon closer inspection, memory was dominated by file
   pages, which the fs compactor isn't allowed to touch. The remaining
   anon pages didn't have the contiguity to satisfy the request.

   Allow GFP_NOFS allocations to bypass watermarks when compaction
   failed at the highest priority.

While these deadlocks were encountered only in tests with the
subsequent patches (which put a lot more demand on compaction), in
theory these problems already exist in the code today. Fix them now.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/compaction.c | 15 +++++++++++++--
 mm/page_alloc.c | 10 +++++++++-
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 8238e83385a7..84db84e8fd3a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -745,8 +745,9 @@ isolate_freepages_range(struct compact_control *cc,
 }
 
 /* Similar to reclaim, but different enough that they don't share logic */
-static bool too_many_isolated(pg_data_t *pgdat)
+static bool too_many_isolated(struct compact_control *cc)
 {
+	pg_data_t *pgdat = cc->zone->zone_pgdat;
 	bool too_many;
 
 	unsigned long active, inactive, isolated;
@@ -758,6 +759,16 @@ static bool too_many_isolated(pg_data_t *pgdat)
 	isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
 			node_page_state(pgdat, NR_ISOLATED_ANON);
 
+	/*
+	 * GFP_NOFS callers are allowed to isolate more pages, so they
+	 * won't get blocked by normal direct-reclaimers, forming a
+	 * circular deadlock. GFP_NOIO won't get here.
+	 */
+	if (cc->gfp_mask & __GFP_FS) {
+		inactive >>= 3;
+		active >>= 3;
+	}
+
 	too_many = isolated > (inactive + active) / 2;
 	if (!too_many)
 		wake_throttle_isolated(pgdat);
@@ -806,7 +817,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	 * list by either parallel reclaimers or compaction. If there are,
 	 * delay for some time until fewer pages are isolated
 	 */
-	while (unlikely(too_many_isolated(pgdat))) {
+	while (unlikely(too_many_isolated(cc))) {
 		/* stop isolation if there are still pages not migrated */
 		if (cc->nr_migratepages)
 			return -EAGAIN;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3bb3484563ed..ac03571e0532 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4508,8 +4508,16 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		prep_new_page(page, order, gfp_mask, alloc_flags);
 
 	/* Try get a page from the freelist if available */
-	if (!page)
+	if (!page) {
+		/*
+		 * It's possible that the only migration sources are
+		 * file pages, and the GFP_NOFS stack is holding up
+		 * other compactors. Use reserves to avoid deadlock.
+		 */
+		if (prio == MIN_COMPACT_PRIORITY && !(gfp_mask & __GFP_FS))
+			alloc_flags |= ALLOC_NO_WATERMARKS;
 		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
+	}
 
 	if (page) {
 		struct zone *zone = page_zone(page);
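
For reviewers who want to sanity-check the arithmetic outside the kernel, here is a rough userspace sketch of the two decisions the patch changes. It is not kernel code: the MODEL_* constants, the enum, and both function names are made up for illustration, and the page counts are passed in as plain arguments rather than read from node_page_state(). Only the shift by 3, the "(inactive + active) / 2" comparison, and the "highest priority and no __GFP_FS" condition are taken from the diff above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's __GFP_FS bit; the value is arbitrary. */
#define MODEL_GFP_FS 0x1u

/* Hypothetical priorities; SYNC_FULL plays the role of MIN_COMPACT_PRIORITY. */
enum model_compact_priority {
	MODEL_PRIO_SYNC_FULL = 0,	/* highest priority */
	MODEL_PRIO_SYNC_LIGHT,
	MODEL_PRIO_ASYNC,
};

/*
 * Model of the patched too_many_isolated(): callers that may enter
 * the filesystem have their LRU counts scaled down by 8, so they hit
 * the throttle earlier. Callers without the FS bit keep the full
 * limit, which is the extra headroom described in the changelog.
 */
bool model_too_many_isolated(unsigned int gfp_mask, unsigned long active,
			     unsigned long inactive, unsigned long isolated)
{
	if (gfp_mask & MODEL_GFP_FS) {
		inactive >>= 3;
		active >>= 3;
	}

	return isolated > (inactive + active) / 2;
}

/*
 * Model of the page_alloc.c condition: only a caller without the FS
 * bit whose compaction attempt ran at the highest priority gets to
 * ignore the watermarks.
 */
bool model_may_bypass_watermarks(unsigned int gfp_mask,
				 enum model_compact_priority prio)
{
	return prio == MODEL_PRIO_SYNC_FULL && !(gfp_mask & MODEL_GFP_FS);
}
```

With 8000 active, 8000 inactive and 2000 isolated pages, the model throttles an FS-capable caller (limit 1000) but lets a caller without the FS bit proceed (limit 8000).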