From patchwork Wed Nov 3 17:05:10 2021
X-Patchwork-Submitter: Nicolas Saenz Julienne
X-Patchwork-Id: 12601221
From: Nicolas Saenz Julienne <nsaenzju@redhat.com>
To: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, frederic@kernel.org, tglx@linutronix.de, peterz@infradead.org, mtosatti@redhat.com, nilal@redhat.com, mgorman@suse.de, linux-rt-users@vger.kernel.org, vbabka@suse.cz, cl@linux.com, ppandit@redhat.com, Nicolas Saenz Julienne <nsaenzju@redhat.com>
Subject: [PATCH v2 1/3] mm/page_alloc: Don't pass pfn to free_unref_page_commit()
Date: Wed, 3 Nov 2021 18:05:10 +0100
Message-Id: <20211103170512.2745765-2-nsaenzju@redhat.com>
In-Reply-To: <20211103170512.2745765-1-nsaenzju@redhat.com>
References: <20211103170512.2745765-1-nsaenzju@redhat.com>

free_unref_page_commit() doesn't make use of its pfn argument, so get
rid of it.

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..9ef03dfb8f95 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3355,8 +3355,8 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone)
         return min(READ_ONCE(pcp->batch) << 2, high);
 }
 
-static void free_unref_page_commit(struct page *page, unsigned long pfn,
-                                   int migratetype, unsigned int order)
+static void free_unref_page_commit(struct page *page, int migratetype,
+                                   unsigned int order)
 {
         struct zone *zone = page_zone(page);
         struct per_cpu_pages *pcp;
@@ -3405,7 +3405,7 @@ void free_unref_page(struct page *page, unsigned int order)
         }
 
         local_lock_irqsave(&pagesets.lock, flags);
-        free_unref_page_commit(page, pfn, migratetype, order);
+        free_unref_page_commit(page, migratetype, order);
         local_unlock_irqrestore(&pagesets.lock, flags);
 }
 
@@ -3415,13 +3415,13 @@ void free_unref_page(struct page *page, unsigned int order)
 void free_unref_page_list(struct list_head *list)
 {
         struct page *page, *next;
-        unsigned long flags, pfn;
+        unsigned long flags;
         int batch_count = 0;
         int migratetype;
 
         /* Prepare pages for freeing */
         list_for_each_entry_safe(page, next, list, lru) {
-                pfn = page_to_pfn(page);
+                unsigned long pfn = page_to_pfn(page);
                 if (!free_unref_page_prepare(page, pfn, 0)) {
                         list_del(&page->lru);
                         continue;
@@ -3437,15 +3437,10 @@ void free_unref_page_list(struct list_head *list)
                         free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);
                         continue;
                 }
-
-                set_page_private(page, pfn);
         }
 
         local_lock_irqsave(&pagesets.lock, flags);
         list_for_each_entry_safe(page, next, list, lru) {
-                pfn = page_private(page);
-                set_page_private(page, 0);
-
                 /*
                  * Non-isolated types over MIGRATE_PCPTYPES get added
                  * to the MIGRATE_MOVABLE pcp list.
@@ -3455,7 +3450,7 @@ void free_unref_page_list(struct list_head *list)
                         migratetype = MIGRATE_MOVABLE;
 
                 trace_mm_page_free_batched(page);
-                free_unref_page_commit(page, pfn, migratetype, 0);
+                free_unref_page_commit(page, migratetype, 0);
 
                 /*
                  * Guard against excessive IRQ disabled times when we get

From patchwork Wed Nov 3 17:05:11 2021
X-Patchwork-Submitter: Nicolas Saenz Julienne
X-Patchwork-Id: 12601223
From: Nicolas Saenz Julienne <nsaenzju@redhat.com>
To: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, frederic@kernel.org, tglx@linutronix.de, peterz@infradead.org, mtosatti@redhat.com, nilal@redhat.com, mgorman@suse.de, linux-rt-users@vger.kernel.org, vbabka@suse.cz, cl@linux.com, ppandit@redhat.com, Nicolas Saenz Julienne <nsaenzju@redhat.com>
Subject: [PATCH v2 2/3] mm/page_alloc: Convert per-cpu lists' local locks to per-cpu spin locks
Date: Wed, 3 Nov 2021 18:05:11 +0100
Message-Id: <20211103170512.2745765-3-nsaenzju@redhat.com>
In-Reply-To: <20211103170512.2745765-1-nsaenzju@redhat.com>
References: <20211103170512.2745765-1-nsaenzju@redhat.com>

page_alloc's per-cpu page lists are currently protected by local locks.
This forces any remote operation dependent on draining them to schedule
drain work on all CPUs. This doesn't play well with NOHZ_FULL CPUs,
which can't be bothered to run housekeeping tasks.

As a first step to mitigate this, convert the current locking scheme to
per-cpu spinlocks. The conversion also moves the actual lock into
'struct per_cpu_pages', which is nicer code, but also essential in order
to couple access to the lock and the lists. One side effect of this is a
more complex free_unref_page_list(), as the patch tries to maintain the
previous function optimizations[1]. Other than that, the conversion
itself is mostly trivial.

The performance difference between local locks and uncontended per-cpu
spinlocks (which they happen to be during normal operation) is pretty
small.

On an Intel Xeon E5-2640 (x86_64) with 32GB of memory (mean variation
vs. vanilla runs, higher is worse):

 - netperf: -0.5% to 0.5% (no difference)
 - hackbench: -0.3% to 0.7% (almost no difference)
 - mmtests/sparsetruncate-tiny: -0.1% to 0.6%

On a Cavium ThunderX2 (arm64) with 64GB of memory:

 - netperf: 1.0% to 1.7%
 - hackbench: 0.8% to 1.5%
 - mmtests/sparsetruncate-tiny: 1.6% to 2.1%

arm64 is a bit more sensitive to the change, probably due to the effect
of the spinlock's memory barriers.

Note that the aim9 test suite was also run (through
mmtests/pagealloc-performance) but the test's own variance distorts the
results too much.

[1] See:
 - 9cca35d42eb61 ("mm, page_alloc: enable/disable IRQs once when freeing
   a list of pages")
 - c24ad77d962c3 ("mm/page_alloc.c: avoid excessive IRQ disabled times in
   free_unref_page_list()")

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Reported-by: kernel test robot
---
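For illustration only (not part of the patch; the helper name
free_pages_batched() is made up, and the isolation and migratetype
corner cases are left out), the free_unref_page_list() batching scheme
mentioned above boils down to holding one pcp spinlock across
consecutive pages of the same zone:

static void free_pages_batched(struct list_head *list)
{
        struct page *page, *next;
        spinlock_t *lock = NULL;
        unsigned long flags;
        int batch_count = 0;

        list_for_each_entry_safe(page, next, list, lru) {
                /* This CPU's pcp lists for the page's zone. */
                struct per_cpu_pages *pcp =
                        this_cpu_ptr(page_zone(page)->per_cpu_pageset);

                /* Zone changed or batch full: drop the held lock. */
                if (lock && (lock != &pcp->lock ||
                             ++batch_count == SWAP_CLUSTER_MAX)) {
                        spin_unlock_irqrestore(lock, flags);
                        batch_count = 0;
                        lock = NULL;
                }

                if (!lock) {
                        spin_lock_irqsave(&pcp->lock, flags);
                        lock = &pcp->lock;
                }

                free_unref_page_commit(page, pcp,
                                       get_pcppage_migratetype(page), 0);
        }

        if (lock)
                spin_unlock_irqrestore(lock, flags);
}

The lock is released on a zone change or after SWAP_CLUSTER_MAX pages,
which keeps the "lock once per batch" optimization from [1] while still
bounding IRQ-disabled time.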
 include/linux/mmzone.h |  1 +
 mm/page_alloc.c        | 87 ++++++++++++++++++++++--------------------
 2 files changed, 47 insertions(+), 41 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..83c51036c756 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -376,6 +376,7 @@ struct per_cpu_pages {
 
         /* Lists of pages, one per migrate type stored on the pcp-lists */
         struct list_head lists[NR_PCP_LISTS];
+        spinlock_t lock;
 };
 
 struct per_cpu_zonestat {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9ef03dfb8f95..b332d5cc40f1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -122,13 +122,6 @@ typedef int __bitwise fpi_t;
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8)
 
-struct pagesets {
-        local_lock_t lock;
-};
-static DEFINE_PER_CPU(struct pagesets, pagesets) = {
-        .lock = INIT_LOCAL_LOCK(lock),
-};
-
 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
 DEFINE_PER_CPU(int, numa_node);
 EXPORT_PER_CPU_SYMBOL(numa_node);
@@ -1505,8 +1498,8 @@ static void free_pcppages_bulk(struct zone *zone, int count,
         pcp->count -= nr_freed;
 
         /*
-         * local_lock_irq held so equivalent to spin_lock_irqsave for
-         * both PREEMPT_RT and non-PREEMPT_RT configurations.
+         * spin_lock_irqsave(&pcp->lock) held so equivalent to
+         * spin_lock_irqsave().
          */
         spin_lock(&zone->lock);
         isolated_pageblocks = has_isolate_pageblock(zone);
@@ -3011,8 +3004,8 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
         int i, allocated = 0;
 
         /*
-         * local_lock_irq held so equivalent to spin_lock_irqsave for
-         * both PREEMPT_RT and non-PREEMPT_RT configurations.
+         * spin_lock_irqsave(&pcp->lock) held so equivalent to
+         * spin_lock_irqsave().
          */
         spin_lock(&zone->lock);
         for (i = 0; i < count; ++i) {
@@ -3066,12 +3059,12 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
         unsigned long flags;
         int to_drain, batch;
 
-        local_lock_irqsave(&pagesets.lock, flags);
+        spin_lock_irqsave(&pcp->lock, flags);
         batch = READ_ONCE(pcp->batch);
         to_drain = min(pcp->count, batch);
         if (to_drain > 0)
                 free_pcppages_bulk(zone, to_drain, pcp);
-        local_unlock_irqrestore(&pagesets.lock, flags);
+        spin_unlock_irqrestore(&pcp->lock, flags);
 }
 #endif
 
@@ -3087,13 +3080,11 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
         unsigned long flags;
         struct per_cpu_pages *pcp;
 
-        local_lock_irqsave(&pagesets.lock, flags);
-
         pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
+        spin_lock_irqsave(&pcp->lock, flags);
         if (pcp->count)
                 free_pcppages_bulk(zone, pcp->count, pcp);
-
-        local_unlock_irqrestore(&pagesets.lock, flags);
+        spin_unlock_irqrestore(&pcp->lock, flags);
 }
 
 /*
@@ -3355,16 +3346,14 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone)
         return min(READ_ONCE(pcp->batch) << 2, high);
 }
 
-static void free_unref_page_commit(struct page *page, int migratetype,
-                                   unsigned int order)
+static void free_unref_page_commit(struct page *page, struct per_cpu_pages *pcp,
+                                   int migratetype, unsigned int order)
 {
         struct zone *zone = page_zone(page);
-        struct per_cpu_pages *pcp;
         int high;
         int pindex;
 
         __count_vm_event(PGFREE);
-        pcp = this_cpu_ptr(zone->per_cpu_pageset);
         pindex = order_to_pindex(migratetype, order);
         list_add(&page->lru, &pcp->lists[pindex]);
         pcp->count += 1 << order;
@@ -3383,6 +3372,7 @@ void free_unref_page(struct page *page, unsigned int order)
 {
         unsigned long flags;
         unsigned long pfn = page_to_pfn(page);
+        struct per_cpu_pages *pcp;
         int migratetype;
 
         if (!free_unref_page_prepare(page, pfn, order))
@@ -3404,9 +3394,10 @@ void free_unref_page(struct page *page, unsigned int order)
                 migratetype = MIGRATE_MOVABLE;
         }
 
-        local_lock_irqsave(&pagesets.lock, flags);
-        free_unref_page_commit(page, migratetype, order);
-        local_unlock_irqrestore(&pagesets.lock, flags);
+        pcp = this_cpu_ptr(page_zone(page)->per_cpu_pageset);
+        spin_lock_irqsave(&pcp->lock, flags);
+        free_unref_page_commit(page, pcp, migratetype, order);
+        spin_unlock_irqrestore(&pcp->lock, flags);
 }
 
 /*
@@ -3415,6 +3406,7 @@ void free_unref_page(struct page *page, unsigned int order)
 void free_unref_page_list(struct list_head *list)
 {
         struct page *page, *next;
+        spinlock_t *lock = NULL;
         unsigned long flags;
         int batch_count = 0;
         int migratetype;
@@ -3422,6 +3414,7 @@ void free_unref_page_list(struct list_head *list)
         /* Prepare pages for freeing */
         list_for_each_entry_safe(page, next, list, lru) {
                 unsigned long pfn = page_to_pfn(page);
+
                 if (!free_unref_page_prepare(page, pfn, 0)) {
                         list_del(&page->lru);
                         continue;
@@ -3439,8 +3432,22 @@ void free_unref_page_list(struct list_head *list)
                 }
         }
 
-        local_lock_irqsave(&pagesets.lock, flags);
         list_for_each_entry_safe(page, next, list, lru) {
+                struct per_cpu_pages *pcp = this_cpu_ptr(page_zone(page)->per_cpu_pageset);
+
+                /*
+                 * As an optimization, release the previously held lock only if
+                 * the page belongs to a different zone. But also, guard
+                 * against excessive IRQ disabled times when we get a large
+                 * list of pages to free.
+                 */
+                if (++batch_count == SWAP_CLUSTER_MAX ||
+                    (lock != &pcp->lock && lock)) {
+                        spin_unlock_irqrestore(lock, flags);
+                        batch_count = 0;
+                        lock = NULL;
+                }
+
                 /*
                  * Non-isolated types over MIGRATE_PCPTYPES get added
                  * to the MIGRATE_MOVABLE pcp list.
@@ -3450,19 +3457,17 @@ void free_unref_page_list(struct list_head *list)
                         migratetype = MIGRATE_MOVABLE;
 
                 trace_mm_page_free_batched(page);
-                free_unref_page_commit(page, migratetype, 0);
 
-                /*
-                 * Guard against excessive IRQ disabled times when we get
-                 * a large list of pages to free.
-                 */
-                if (++batch_count == SWAP_CLUSTER_MAX) {
-                        local_unlock_irqrestore(&pagesets.lock, flags);
-                        batch_count = 0;
-                        local_lock_irqsave(&pagesets.lock, flags);
+                if (!lock) {
+                        spin_lock_irqsave(&pcp->lock, flags);
+                        lock = &pcp->lock;
                 }
+
+                free_unref_page_commit(page, pcp, migratetype, 0);
         }
-        local_unlock_irqrestore(&pagesets.lock, flags);
+
+        if (lock)
+                spin_unlock_irqrestore(lock, flags);
 }
 
 /*
@@ -3636,18 +3641,17 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
         struct page *page;
         unsigned long flags;
 
-        local_lock_irqsave(&pagesets.lock, flags);
-
         /*
          * On allocation, reduce the number of pages that are batch freed.
          * See nr_pcp_free() where free_factor is increased for subsequent
          * frees.
          */
         pcp = this_cpu_ptr(zone->per_cpu_pageset);
+        spin_lock_irqsave(&pcp->lock, flags);
         pcp->free_factor >>= 1;
         list = &pcp->lists[order_to_pindex(migratetype, order)];
         page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
-        local_unlock_irqrestore(&pagesets.lock, flags);
+        spin_unlock_irqrestore(&pcp->lock, flags);
         if (page) {
                 __count_zid_vm_events(PGALLOC, page_zonenum(page), 1);
                 zone_statistics(preferred_zone, zone, 1);
@@ -5265,8 +5269,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
                 goto failed;
 
         /* Attempt the batch allocation */
-        local_lock_irqsave(&pagesets.lock, flags);
         pcp = this_cpu_ptr(zone->per_cpu_pageset);
+        spin_lock_irqsave(&pcp->lock, flags);
         pcp_list = &pcp->lists[order_to_pindex(ac.migratetype, 0)];
 
         while (nr_populated < nr_pages) {
@@ -5295,7 +5299,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
                 nr_populated++;
         }
 
-        local_unlock_irqrestore(&pagesets.lock, flags);
+        spin_unlock_irqrestore(&pcp->lock, flags);
 
         __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
         zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
@@ -5304,7 +5308,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
         return nr_populated;
 
 failed_irq:
-        local_unlock_irqrestore(&pagesets.lock, flags);
+        spin_unlock_irqrestore(&pcp->lock, flags);
 
 failed:
         page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
@@ -6947,6 +6951,7 @@ void __meminit setup_zone_pageset(struct zone *zone)
                 struct per_cpu_zonestat *pzstats;
 
                 pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
+                spin_lock_init(&pcp->lock);
                 pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
                 per_cpu_pages_init(pcp, pzstats);
         }

From patchwork Wed Nov 3 17:05:12 2021
X-Patchwork-Submitter: Nicolas Saenz Julienne
X-Patchwork-Id: 12601225
From: Nicolas Saenz Julienne <nsaenzju@redhat.com>
To: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, frederic@kernel.org, tglx@linutronix.de, peterz@infradead.org, mtosatti@redhat.com, nilal@redhat.com, mgorman@suse.de, linux-rt-users@vger.kernel.org, vbabka@suse.cz, cl@linux.com, ppandit@redhat.com, Nicolas Saenz Julienne <nsaenzju@redhat.com>
Subject: [PATCH v2 3/3] mm/page_alloc: Remotely drain per-cpu lists
Date: Wed, 3 Nov 2021 18:05:12 +0100
Message-Id: <20211103170512.2745765-4-nsaenzju@redhat.com>
In-Reply-To: <20211103170512.2745765-1-nsaenzju@redhat.com>
References: <20211103170512.2745765-1-nsaenzju@redhat.com>

Some setups, notably NOHZ_FULL CPUs, are too busy to handle the per-cpu
drain work queued by __drain_all_pages(). So introduce a new mechanism
to remotely drain the per-cpu lists. It is made possible by remotely
locking the new per-cpu spinlocks in 'struct per_cpu_pages'. A benefit
of this new scheme is that drain operations are now migration safe.

There was no observed performance degradation vs. the previous scheme.
Both netperf and hackbench were run in parallel to triggering the
__drain_all_pages(NULL, true) code path around 100 times per second.
The new scheme performs a bit better (~5%), although the important
point here is that there are no performance regressions vs. the
previous mechanism. Per-cpu lists draining happens only in slow paths.

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
---
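For illustration only (not part of the patch; the wrapper name
drain_remote_pcps() is made up and the cpumask selection logic is
omitted), the new drain loop in __drain_all_pages() boils down to:

static void drain_remote_pcps(struct zone *zone,
                              const struct cpumask *cpus_with_pcps)
{
        unsigned int cpu;

        /*
         * drain_pages_zone()/drain_pages() take the target CPU's
         * pcp->lock internally (see patch 2/3), so the requesting CPU
         * can empty remote lists directly instead of queueing
         * drain_local_pages_wq() on every CPU and flushing the work.
         */
        for_each_cpu(cpu, cpus_with_pcps) {
                if (zone)
                        drain_pages_zone(cpu, zone);    /* just this zone */
                else
                        drain_pages(cpu);       /* every zone on this CPU */
        }
}

Because no work items are queued, the drain no longer depends on
mm_percpu_wq or on the target CPUs being able to run housekeeping work,
which is what makes it NOHZ_FULL friendly.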
 mm/page_alloc.c | 59 +++++--------------------------------------------
 1 file changed, 5 insertions(+), 54 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b332d5cc40f1..7dbdab100461 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -140,13 +140,7 @@ DEFINE_PER_CPU(int, _numa_mem_);        /* Kernel "local memory" node */
 EXPORT_PER_CPU_SYMBOL(_numa_mem_);
 #endif
 
-/* work_structs for global per-cpu drains */
-struct pcpu_drain {
-        struct zone *zone;
-        struct work_struct work;
-};
 static DEFINE_MUTEX(pcpu_drain_mutex);
-static DEFINE_PER_CPU(struct pcpu_drain, pcpu_drain);
 
 #ifdef CONFIG_GCC_PLUGIN_LATENT_ENTROPY
 volatile unsigned long latent_entropy __latent_entropy;
@@ -3050,9 +3044,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
  * Called from the vmstat counter updater to drain pagesets of this
  * currently executing processor on remote nodes after they have
  * expired.
- *
- * Note that this function must be called with the thread pinned to
- * a single processor.
  */
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 {
@@ -3070,10 +3061,6 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 
 /*
  * Drain pcplists of the indicated processor and zone.
- *
- * The processor must either be the current processor and the
- * thread pinned to the current processor or a processor that
- * is not online.
  */
 static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 {
@@ -3089,10 +3076,6 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 
 /*
  * Drain pcplists of all zones on the indicated processor.
- *
- * The processor must either be the current processor and the
- * thread pinned to the current processor or a processor that
- * is not online.
  */
 static void drain_pages(unsigned int cpu)
 {
@@ -3105,9 +3088,6 @@ static void drain_pages(unsigned int cpu)
 
 /*
  * Spill all of this CPU's per-cpu pages back into the buddy allocator.
- *
- * The CPU has to be pinned. When zone parameter is non-NULL, spill just
- * the single zone's pages.
  */
 void drain_local_pages(struct zone *zone)
 {
@@ -3119,24 +3099,6 @@ void drain_local_pages(struct zone *zone)
                 drain_pages(cpu);
 }
 
-static void drain_local_pages_wq(struct work_struct *work)
-{
-        struct pcpu_drain *drain;
-
-        drain = container_of(work, struct pcpu_drain, work);
-
-        /*
-         * drain_all_pages doesn't use proper cpu hotplug protection so
-         * we can race with cpu offline when the WQ can move this from
-         * a cpu pinned worker to an unbound one. We can operate on a different
-         * cpu which is alright but we also have to make sure to not move to
-         * a different one.
-         */
-        migrate_disable();
-        drain_local_pages(drain->zone);
-        migrate_enable();
-}
-
 /*
  * The implementation of drain_all_pages(), exposing an extra parameter to
  * drain on all cpus.
@@ -3157,13 +3119,6 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
          */
         static cpumask_t cpus_with_pcps;
 
-        /*
-         * Make sure nobody triggers this path before mm_percpu_wq is fully
-         * initialized.
-         */
-        if (WARN_ON_ONCE(!mm_percpu_wq))
-                return;
-
         /*
          * Do not drain if one is already in progress unless it's specific to
          * a zone. Such callers are primarily CMA and memory hotplug and need
@@ -3213,14 +3168,12 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
         }
 
         for_each_cpu(cpu, &cpus_with_pcps) {
-                struct pcpu_drain *drain = per_cpu_ptr(&pcpu_drain, cpu);
-
-                drain->zone = zone;
-                INIT_WORK(&drain->work, drain_local_pages_wq);
-                queue_work_on(cpu, mm_percpu_wq, &drain->work);
+                if (zone) {
+                        drain_pages_zone(cpu, zone);
+                } else {
+                        drain_pages(cpu);
+                }
         }
-        for_each_cpu(cpu, &cpus_with_pcps)
-                flush_work(&per_cpu_ptr(&pcpu_drain, cpu)->work);
 
         mutex_unlock(&pcpu_drain_mutex);
 }
@@ -3229,8 +3182,6 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
  * Spill all the per-cpu pages from all CPUs back into the buddy allocator.
  *
  * When zone parameter is non-NULL, spill just the single zone's pages.
- *
- * Note that this can be extremely slow as the draining happens in a workqueue.
  */
 void drain_all_pages(struct zone *zone)
 {