From patchwork Wed Mar 20 18:02:12 2024
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 13598067
From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton
Cc: Vlastimil Babka, Mel Gorman, Zi Yan, "Huang, Ying",
 David Hildenbrand, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 07/10] mm: page_alloc: close migratetype race between freeing and stealing
Date: Wed, 20 Mar 2024 14:02:12 -0400
Message-ID: <20240320180429.678181-8-hannes@cmpxchg.org>
In-Reply-To: <20240320180429.678181-1-hannes@cmpxchg.org>
References: <20240320180429.678181-1-hannes@cmpxchg.org>
MIME-Version: 1.0
There are several freeing paths that read the page's migratetype
optimistically before grabbing the zone lock. When this races with block
stealing, those pages go on the wrong freelist.

The paths in question are:
- when freeing >costly orders that aren't THP
- when freeing pages to the buddy upon pcp lock contention
- when freeing pages that are isolated
- when freeing pages initially during boot
- when freeing the remainder in alloc_pages_exact()
- when "accepting" unaccepted VM host memory before first use
- when freeing pages during unpoisoning

None of these paths is so hot that it would need this optimization at the
cost of hampering defrag efforts. Especially when contrasted with the fact
that the most common buddy freeing path - free_pcppages_bulk - checks the
migratetype under the zone->lock just fine.

In addition, isolated pages need to look up the migratetype under the lock
anyway, which adds branches to the locked section, and results in a double
lookup when the pages are in fact isolated.

Move the lookups into the lock.
Reported-by: Vlastimil Babka
Signed-off-by: Johannes Weiner
Reviewed-by: Vlastimil Babka
---
 mm/page_alloc.c | 52 ++++++++++++++++++-------------------------------
 1 file changed, 19 insertions(+), 33 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e7d0d4711bdd..3f65b565eaad 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1227,18 +1227,15 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
-static void free_one_page(struct zone *zone,
-			  struct page *page, unsigned long pfn,
-			  unsigned int order,
-			  int migratetype, fpi_t fpi_flags)
+static void free_one_page(struct zone *zone, struct page *page,
+			  unsigned long pfn, unsigned int order,
+			  fpi_t fpi_flags)
 {
 	unsigned long flags;
+	int migratetype;
 
 	spin_lock_irqsave(&zone->lock, flags);
-	if (unlikely(has_isolate_pageblock(zone) ||
-		     is_migrate_isolate(migratetype))) {
-		migratetype = get_pfnblock_migratetype(page, pfn);
-	}
+	migratetype = get_pfnblock_migratetype(page, pfn);
 	__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -1246,21 +1243,13 @@ static void free_one_page(struct zone *zone,
 static void __free_pages_ok(struct page *page, unsigned int order,
 			    fpi_t fpi_flags)
 {
-	int migratetype;
 	unsigned long pfn = page_to_pfn(page);
 	struct zone *zone = page_zone(page);
 
 	if (!free_pages_prepare(page, order))
 		return;
 
-	/*
-	 * Calling get_pfnblock_migratetype() without spin_lock_irqsave() here
-	 * is used to avoid calling get_pfnblock_migratetype() under the lock.
-	 * This will reduce the lock holding time.
-	 */
-	migratetype = get_pfnblock_migratetype(page, pfn);
-
-	free_one_page(zone, page, pfn, order, migratetype, fpi_flags);
+	free_one_page(zone, page, pfn, order, fpi_flags);
 
 	__count_vm_events(PGFREE, 1 << order);
 }
@@ -2533,7 +2522,7 @@ void free_unref_page(struct page *page, unsigned int order)
 	struct per_cpu_pages *pcp;
 	struct zone *zone;
 	unsigned long pfn = page_to_pfn(page);
-	int migratetype, pcpmigratetype;
+	int migratetype;
 
 	if (!free_pages_prepare(page, order))
 		return;
@@ -2545,23 +2534,23 @@ void free_unref_page(struct page *page, unsigned int order)
 	 * get those areas back if necessary. Otherwise, we may have to free
 	 * excessively into the page allocator
 	 */
-	migratetype = pcpmigratetype = get_pfnblock_migratetype(page, pfn);
+	migratetype = get_pfnblock_migratetype(page, pfn);
 	if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
 		if (unlikely(is_migrate_isolate(migratetype))) {
-			free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE);
+			free_one_page(page_zone(page), page, pfn, order, FPI_NONE);
 			return;
 		}
-		pcpmigratetype = MIGRATE_MOVABLE;
+		migratetype = MIGRATE_MOVABLE;
 	}
 
 	zone = page_zone(page);
 	pcp_trylock_prepare(UP_flags);
 	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 	if (pcp) {
-		free_unref_page_commit(zone, pcp, page, pcpmigratetype, order);
+		free_unref_page_commit(zone, pcp, page, migratetype, order);
 		pcp_spin_unlock(pcp);
 	} else {
-		free_one_page(zone, page, pfn, order, migratetype, FPI_NONE);
+		free_one_page(zone, page, pfn, order, FPI_NONE);
 	}
 	pcp_trylock_finish(UP_flags);
 }
@@ -2591,12 +2580,8 @@ void free_unref_folios(struct folio_batch *folios)
 		 * allocator.
 		 */
 		if (!pcp_allowed_order(order)) {
-			int migratetype;
-
-			migratetype = get_pfnblock_migratetype(&folio->page,
-							       pfn);
-			free_one_page(folio_zone(folio), &folio->page, pfn,
-				      order, migratetype, FPI_NONE);
+			free_one_page(folio_zone(folio), &folio->page,
+				      pfn, order, FPI_NONE);
 			continue;
 		}
 		folio->private = (void *)(unsigned long)order;
@@ -2632,7 +2617,7 @@ void free_unref_folios(struct folio_batch *folios)
 		 */
 		if (is_migrate_isolate(migratetype)) {
 			free_one_page(zone, &folio->page, pfn,
-				      order, migratetype, FPI_NONE);
+				      order, FPI_NONE);
 			continue;
 		}
 
@@ -2645,7 +2630,7 @@ void free_unref_folios(struct folio_batch *folios)
 			if (unlikely(!pcp)) {
 				pcp_trylock_finish(UP_flags);
 				free_one_page(zone, &folio->page, pfn,
-					      order, migratetype, FPI_NONE);
+					      order, FPI_NONE);
 				continue;
 			}
 			locked_zone = zone;
@@ -6823,13 +6808,14 @@ bool take_page_off_buddy(struct page *page)
 bool put_page_back_buddy(struct page *page)
 {
 	struct zone *zone = page_zone(page);
-	unsigned long pfn = page_to_pfn(page);
 	unsigned long flags;
-	int migratetype = get_pfnblock_migratetype(page, pfn);
 	bool ret = false;
 
 	spin_lock_irqsave(&zone->lock, flags);
 	if (put_page_testzero(page)) {
+		unsigned long pfn = page_to_pfn(page);
+		int migratetype = get_pfnblock_migratetype(page, pfn);
+
 		ClearPageHWPoisonTakenOff(page);
 		__free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
 		if (TestClearPageHWPoison(page)) {