From patchwork Sun Jun 25 19:06:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13292096
Date: Sun, 25 Jun 2023 12:06:15 -0700 (PDT)
From: Hugh Dickins
To: Linus Torvalds
cc: David Stevens, Andrew Morton, Peter Xu, Matthew Wilcox,
    "Kirill A . Shutemov", Yang Shi, David Hildenbrand, Jiaqi Yan,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH] mm/khugepaged: fix regression in collapse_file()
In-Reply-To: <8ef3ee-ba41-8e9e-4453-73736ff27783@google.com>
References: <20230607053135.2087354-1-stevensd@google.com>
    <8ef3ee-ba41-8e9e-4453-73736ff27783@google.com>

There is no xas_pause(&xas) in collapse_file()'s main loop, at the points
where it does xas_unlock_irq(&xas) and then continues.

That would explain why, once two weeks ago and twice yesterday, I have
hit the VM_BUG_ON_PAGE(page != xas_load(&xas), page) since "mm/khugepaged:
fix iteration in collapse_file" removed the xas_set(&xas, index) just
before it: xas.xa_node could be left pointing to a stale node, if there
was concurrent activity on the file which transformed its xarray.

I tried inserting xas_pause()s, but then even bootup crashed on that
VM_BUG_ON_PAGE(): there appears to be a subtle "nextness" implicit in
xas_pause().

xas_next() and xas_pause() are good for use in simple loops, but not in
this one: xas_set() worked well until now, so use xas_set(&xas, index)
explicitly at the head of the loop; and change that VM_BUG_ON_PAGE() not
to need its own xas_set(), and not to interfere with the xa_state (which
would probably stop the crashes from xas_pause(), but I trust that less).

Link: https://lore.kernel.org/linux-mm/f18e4b64-3f88-a8ab-56cc-d1f5f9c58d4@google.com/
Fixes: c8a8f3b4a95a ("mm/khugepaged: fix iteration in collapse_file")
Signed-off-by: Hugh Dickins
---
Linus, I'm rushing this directly to you, but not really expecting you to
put it in at this stage, unless you're very comfortable with it, or
perhaps it catches Matthew's eye and gets a quick Ack from him.

The commit being fixed only got in after -rc7, after being held up by my
initial report of this crash; but I had to rescind that when I couldn't
reproduce it at all. Then yesterday morning it hit again on two machines,
and reading XArray Doc reminded me of xas_pause() - seems obvious now.
Patch ran for 14 hours overnight on those two machines without a problem.

 mm/khugepaged.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 2d0d58fb4e7f..47b59f2843f6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1918,9 +1918,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}
 	} while (1);
 
-	xas_set(&xas, start);
 	for (index = start; index < end; index++) {
-		page = xas_next(&xas);
+		xas_set(&xas, index);
+		page = xas_load(&xas);
 
 		VM_BUG_ON(index != xas.xa_index);
 		if (is_shmem) {
@@ -1935,7 +1935,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 						result = SCAN_TRUNCATED;
 						goto xa_locked;
 					}
-					xas_set(&xas, index + 1);
 				}
 				if (!shmem_charge(mapping->host, 1)) {
 					result = SCAN_FAIL;
@@ -2071,7 +2070,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 
 		xas_lock_irq(&xas);
 
-		VM_BUG_ON_PAGE(page != xas_load(&xas), page);
+		VM_BUG_ON_PAGE(page != xa_load(xas.xa, index), page);
 
 		/*
 		 * We control three references to the page:
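
Illustration only, not part of the patch: the stale-node hazard described
in the commit message can be sketched in user space. This is a toy model
under loud assumptions: every toy_* name is hypothetical, nothing here is
the XArray API, and it does not attempt to model xas_pause(). It only
shows why a cursor that caches a node can read the wrong slot after the
container is transformed while the lock is dropped, and why resetting the
cursor by index before each load avoids that.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4
#define POISON (-1)

/* Toy stand-in for one tree node: a small block of slots. */
struct toy_node {
	int slots[NSLOTS];
};

/* Toy container: 'live' is the node that lookups should use right now. */
struct toy_store {
	struct toy_node a, b;
	struct toy_node *live;
};

/* Toy cursor, loosely modelled on xa_state: cached node plus index. */
struct toy_cursor {
	struct toy_store *st;
	struct toy_node *node;	/* NULL: walk again from the top on next use */
	size_t index;
};

static void toy_init(struct toy_store *st)
{
	for (size_t i = 0; i < NSLOTS; i++) {
		st->a.slots[i] = 100 + (int)i;
		st->b.slots[i] = POISON;
	}
	st->live = &st->a;
}

/* In the spirit of xas_set(): remember the index, forget the cached node. */
static void toy_set(struct toy_cursor *c, size_t index)
{
	c->index = index;
	c->node = NULL;
}

/* In the spirit of xas_load(): re-walk if needed, then read the slot. */
static int toy_load(struct toy_cursor *c)
{
	if (!c->node)
		c->node = c->st->live;
	return c->node->slots[c->index];
}

/* Step to the following slot, trusting whatever node is cached. */
static int toy_next(struct toy_cursor *c)
{
	if (!c->node)
		return toy_load(c);
	c->index++;
	return c->node->slots[c->index];
}

/*
 * Models concurrent activity while the lock is dropped: the contents move
 * to the other node and the old node is poisoned, as if it had been freed.
 */
static void toy_transform(struct toy_store *st)
{
	struct toy_node *old = st->live;
	struct toy_node *fresh = (old == &st->a) ? &st->b : &st->a;

	for (size_t i = 0; i < NSLOTS; i++) {
		fresh->slots[i] = old->slots[i];
		old->slots[i] = POISON;
	}
	st->live = fresh;
}

/*
 * Returns the value seen at index 1 after a transform.
 * reseek != 0: reset the cursor by index first, then load.
 * reseek == 0: keep stepping with the cached (now stale) node.
 */
int toy_read_after_transform(int reseek)
{
	struct toy_store st;
	struct toy_cursor c = { .st = &st };

	toy_init(&st);
	toy_set(&c, 0);
	(void)toy_load(&c);	/* caches &st.a in the cursor */
	toy_transform(&st);	/* "unlock", concurrent change, "relock" */

	if (reseek) {
		toy_set(&c, 1);
		return toy_load(&c);
	}
	return toy_next(&c);
}
```

In this toy, toy_read_after_transform(1) returns 101, the value that
belongs at index 1, while toy_read_after_transform(0) returns the poison
value -1 because it reads through the stale node. The real kernel failure
mode is of course less polite than a poison value.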