From patchwork Fri Oct 8 16:19:21 2021
X-Patchwork-Submitter: Nicolas Saenz Julienne <nsaenzju@redhat.com>
X-Patchwork-Id: 12545687
From: Nicolas Saenz Julienne <nsaenzju@redhat.com>
To: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, frederic@kernel.org,
    tglx@linutronix.de, peterz@infradead.org, mtosatti@redhat.com,
    nilal@redhat.com, mgorman@suse.de, linux-rt-users@vger.kernel.org,
    vbabka@suse.cz, cl@linux.com, paulmck@kernel.org, ppandit@redhat.com,
    Nicolas Saenz Julienne <nsaenzju@redhat.com>
Subject: [RFC 2/3] mm/page_alloc: Access lists in 'struct per_cpu_pages' indirectly
Date: Fri, 8 Oct 2021 18:19:21 +0200
Message-Id: <20211008161922.942459-3-nsaenzju@redhat.com>
In-Reply-To: <20211008161922.942459-1-nsaenzju@redhat.com>
References: <20211008161922.942459-1-nsaenzju@redhat.com>

In preparation for adding remote pcplists drain support, let's bundle
'struct per_cpu_pages' list heads and page count into a new structure,
'struct pcplists', and have all code access it indirectly through a
pointer. It'll be used by upcoming patches, which will maintain multiple
versions of pcplists and switch the pointer atomically.

free_pcppages_bulk() also gains a new argument, since we want to avoid
dereferencing the pcplists pointer twice per critical section (delimited
by the 'pagesets' local lock).

'struct pcplists' data is marked as __private, so as to make sure nobody
accesses it directly, except for the initialization code. Note that
'struct per_cpu_pages' is used during boot, when no allocation is
possible.
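Not part of the patch, but to illustrate the indirection this introduces:
below is a minimal, self-contained userspace sketch (C11, not kernel code)
of how an 'lp' pointer can be switched between two instances of
'struct pcplists', so that a detached set can be drained without racing
against local users. The 'spare' field, swap_pcplists() and the use of C11
atomics are illustrative assumptions only; the actual remote-drain
mechanism is what the follow-up patches in this series introduce.

/* sketch.c: toy model of the pcp->lp indirection, not kernel code */
#include <stdio.h>
#include <stdatomic.h>

#define NR_PCP_LISTS 4

struct list_head {
	struct list_head *prev, *next;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->prev = head->next = head;
}

struct pcplists {
	int count;                            /* pages currently on the lists */
	struct list_head lists[NR_PCP_LISTS]; /* one list per migrate type */
};

struct per_cpu_pages {
	int high;
	int batch;
	struct pcplists *_Atomic lp;  /* all users dereference this pointer */
	struct pcplists pcplists;     /* default instance, never used directly */
	struct pcplists spare;        /* hypothetical spare set for remote drain */
};

/* Point 'lp' at the other instance and return the detached one. */
static struct pcplists *swap_pcplists(struct per_cpu_pages *pcp)
{
	struct pcplists *old = atomic_load(&pcp->lp);
	struct pcplists *new = (old == &pcp->pcplists) ? &pcp->spare
						       : &pcp->pcplists;

	atomic_store(&pcp->lp, new);
	return old;  /* caller may now free pages on 'old' without racing */
}

int main(void)
{
	struct per_cpu_pages pcp = { .high = 64, .batch = 16 };

	for (int i = 0; i < NR_PCP_LISTS; i++) {
		INIT_LIST_HEAD(&pcp.pcplists.lists[i]);
		INIT_LIST_HEAD(&pcp.spare.lists[i]);
	}
	atomic_store(&pcp.lp, &pcp.pcplists);

	struct pcplists *detached = swap_pcplists(&pcp);
	printf("detached set holds %d pages\n", detached->count);
	return 0;
}

The point of the double-buffer-style pointer swap is that the hot paths
only pay one extra dereference (pcp->lp), while a remote drainer can take
ownership of a whole set of lists in a single pointer update.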
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
---
 include/linux/mmzone.h | 10 +++++--
 mm/page_alloc.c        | 66 +++++++++++++++++++++++++-----------------
 mm/vmstat.c            |  6 ++--
 3 files changed, 49 insertions(+), 33 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6a1d79d84675..fb023da9a181 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -358,7 +358,6 @@ enum zone_watermarks {
 
 /* Fields and list protected by pagesets local_lock in page_alloc.c */
 struct per_cpu_pages {
-	int count;		/* number of pages in the list */
 	int high;		/* high watermark, emptying needed */
 	int batch;		/* chunk size for buddy add/remove */
 	short free_factor;	/* batch scaling factor during free */
@@ -366,8 +365,13 @@ struct per_cpu_pages {
 	short expire;		/* When 0, remote pagesets are drained */
 #endif
 
-	/* Lists of pages, one per migrate type stored on the pcp-lists */
-	struct list_head lists[NR_PCP_LISTS];
+	struct pcplists *lp;
+	struct pcplists {
+		/* Number of pages in the lists */
+		int count;
+		/* Lists of pages, one per migrate type stored on the pcp-lists */
+		struct list_head lists[NR_PCP_LISTS];
+	} __private pcplists;
 };
 
 struct per_cpu_zonestat {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dd89933503b4..842816f269da 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1438,7 +1438,8 @@ static inline void prefetch_buddy(struct page *page)
  * pinned" detection logic.
  */
 static void free_pcppages_bulk(struct zone *zone, int count,
-					struct per_cpu_pages *pcp)
+					struct per_cpu_pages *pcp,
+					struct pcplists *lp)
 {
 	int pindex = 0;
 	int batch_free = 0;
@@ -1453,7 +1454,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	 * Ensure proper count is passed which otherwise would stuck in the
 	 * below while (list_empty(list)) loop.
 	 */
-	count = min(pcp->count, count);
+	count = min(lp->count, count);
 	while (count > 0) {
 		struct list_head *list;
 
@@ -1468,7 +1469,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			batch_free++;
 			if (++pindex == NR_PCP_LISTS)
 				pindex = 0;
-			list = &pcp->lists[pindex];
+			list = &lp->lists[pindex];
 		} while (list_empty(list));
 
 		/* This is the only non-empty list. Free them all. */
@@ -1508,7 +1509,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			}
 		} while (count > 0 && --batch_free && !list_empty(list));
 	}
-	pcp->count -= nr_freed;
+	lp->count -= nr_freed;
 
 	/*
 	 * local_lock_irq held so equivalent to spin_lock_irqsave for
@@ -3069,14 +3070,16 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
  */
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 {
+	struct pcplists *lp;
 	unsigned long flags;
 	int to_drain, batch;
 
 	local_lock_irqsave(&pagesets.lock, flags);
 	batch = READ_ONCE(pcp->batch);
-	to_drain = min(pcp->count, batch);
+	lp = pcp->lp;
+	to_drain = min(lp->count, batch);
 	if (to_drain > 0)
-		free_pcppages_bulk(zone, to_drain, pcp);
+		free_pcppages_bulk(zone, to_drain, pcp, lp);
 	local_unlock_irqrestore(&pagesets.lock, flags);
 }
 #endif
@@ -3092,12 +3095,14 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 {
 	unsigned long flags;
 	struct per_cpu_pages *pcp;
+	struct pcplists *lp;
 
 	local_lock_irqsave(&pagesets.lock, flags);
 
 	pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
-	if (pcp->count)
-		free_pcppages_bulk(zone, pcp->count, pcp);
+	lp = pcp->lp;
+	if (lp->count)
+		free_pcppages_bulk(zone, lp->count, pcp, lp);
 
 	local_unlock_irqrestore(&pagesets.lock, flags);
 }
@@ -3158,7 +3163,7 @@ static void drain_local_pages_wq(struct work_struct *work)
  *
  * drain_all_pages() is optimized to only execute on cpus where pcplists are
  * not empty. The check for non-emptiness can however race with a free to
- * pcplist that has not yet increased the pcp->count from 0 to 1. Callers
+ * pcplist that has not yet increased the lp->count from 0 to 1. Callers
  * that need the guarantee that every CPU has drained can disable the
  * optimizing racy check.
  */
@@ -3200,21 +3205,22 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
 		struct per_cpu_pages *pcp;
 		struct zone *z;
 		bool has_pcps = false;
+		struct pcplists *lp;
 
 		if (force_all_cpus) {
 			/*
-			 * The pcp.count check is racy, some callers need a
+			 * The lp->count check is racy, some callers need a
 			 * guarantee that no cpu is missed.
 			 */
 			has_pcps = true;
 		} else if (zone) {
-			pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
-			if (pcp->count)
+			lp = per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp;
+			if (lp->count)
 				has_pcps = true;
 		} else {
 			for_each_populated_zone(z) {
-				pcp = per_cpu_ptr(z->per_cpu_pageset, cpu);
-				if (pcp->count) {
+				lp = per_cpu_ptr(z->per_cpu_pageset, cpu)->lp;
+				if (lp->count) {
 					has_pcps = true;
 					break;
 				}
@@ -3366,19 +3372,21 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn,
 {
 	struct zone *zone = page_zone(page);
 	struct per_cpu_pages *pcp;
+	struct pcplists *lp;
 	int high;
 	int pindex;
 
 	__count_vm_event(PGFREE);
 	pcp = this_cpu_ptr(zone->per_cpu_pageset);
+	lp = pcp->lp;
 	pindex = order_to_pindex(migratetype, order);
-	list_add(&page->lru, &pcp->lists[pindex]);
-	pcp->count += 1 << order;
+	list_add(&page->lru, &lp->lists[pindex]);
+	lp->count += 1 << order;
 	high = nr_pcp_high(pcp, zone);
-	if (pcp->count >= high) {
+	if (lp->count >= high) {
 		int batch = READ_ONCE(pcp->batch);
 
-		free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp);
+		free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp, lp);
 	}
 }
 
@@ -3603,9 +3611,11 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 			struct per_cpu_pages *pcp)
 {
 	struct list_head *list;
+	struct pcplists *lp;
 	struct page *page;
 
-	list = &pcp->lists[order_to_pindex(migratetype, order)];
+	lp = pcp->lp;
+	list = &lp->lists[order_to_pindex(migratetype, order)];
 
 	do {
 		if (list_empty(list)) {
@@ -3625,14 +3635,14 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 					batch, list,
 					migratetype, alloc_flags);
 
-			pcp->count += alloced << order;
+			lp->count += alloced << order;
 			if (unlikely(list_empty(list)))
 				return NULL;
 		}
 
 		page = list_first_entry(list, struct page, lru);
 		list_del(&page->lru);
-		pcp->count -= 1 << order;
+		lp->count -= 1 << order;
 	} while (check_new_pcp(page));
 
 	return page;
@@ -5877,7 +5887,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			continue;
 
 		for_each_online_cpu(cpu)
-			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
+			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp->count;
 	}
 
 	printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
@@ -5971,7 +5981,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 
 		free_pcp = 0;
 		for_each_online_cpu(cpu)
-			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->count;
+			free_pcp += per_cpu_ptr(zone->per_cpu_pageset, cpu)->lp->count;
 
 		show_node(zone);
 		printk(KERN_CONT
@@ -6012,7 +6022,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			K(zone_page_state(zone, NR_MLOCK)),
 			K(zone_page_state(zone, NR_BOUNCE)),
 			K(free_pcp),
-			K(this_cpu_read(zone->per_cpu_pageset->count)),
+			K(this_cpu_read(zone->per_cpu_pageset)->lp->count),
 			K(zone_page_state(zone, NR_FREE_CMA_PAGES)));
 		printk("lowmem_reserve[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)
@@ -6848,7 +6858,7 @@ static int zone_highsize(struct zone *zone, int batch, int cpu_online)
 
 /*
  * pcp->high and pcp->batch values are related and generally batch is lower
- * than high. They are also related to pcp->count such that count is lower
+ * than high. They are also related to pcp->lp->count such that count is lower
  * than high, and as soon as it reaches high, the pcplist is flushed.
  *
  * However, guaranteeing these relations at all times would require e.g. write
@@ -6856,7 +6866,7 @@ static int zone_highsize(struct zone *zone, int batch, int cpu_online)
 * thus be prone to error and bad for performance. Thus the update only prevents
 * store tearing. Any new users of pcp->batch and pcp->high should ensure they
 * can cope with those fields changing asynchronously, and fully trust only the
- * pcp->count field on the local CPU with interrupts disabled.
+ * pcp->lp->count field on the local CPU with interrupts disabled.
 *
 * mutex_is_locked(&pcp_batch_high_lock) required when calling this function
 * outside of boot time (or some other assurance that no concurrent updaters
@@ -6876,8 +6886,10 @@ static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonesta
 	memset(pcp, 0, sizeof(*pcp));
 	memset(pzstats, 0, sizeof(*pzstats));
 
+	pcp->lp = &ACCESS_PRIVATE(pcp, pcplists);
+
 	for (pindex = 0; pindex < NR_PCP_LISTS; pindex++)
-		INIT_LIST_HEAD(&pcp->lists[pindex]);
+		INIT_LIST_HEAD(&pcp->lp->lists[pindex]);
 
 	/*
 	 * Set batch and high values safe for a boot pageset. A true percpu
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 8ce2620344b2..5279d3f34e0b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -856,7 +856,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 			 * if not then there is nothing to expire.
 			 */
 			if (!__this_cpu_read(pcp->expire) ||
-			    !__this_cpu_read(pcp->count))
+			    !this_cpu_ptr(pcp)->lp->count)
 				continue;
 
 			/*
@@ -870,7 +870,7 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 			if (__this_cpu_dec_return(pcp->expire))
 				continue;
 
-			if (__this_cpu_read(pcp->count)) {
+			if (this_cpu_ptr(pcp)->lp->count) {
 				drain_zone_pages(zone, this_cpu_ptr(pcp));
 				changes++;
 			}
@@ -1707,7 +1707,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 			   "\n              high:  %i"
 			   "\n              batch: %i",
 			   i,
-			   pcp->count,
+			   pcp->lp->count,
 			   pcp->high,
 			   pcp->batch);
 #ifdef CONFIG_SMP