From patchwork Tue Sep 11 05:36:14 2018
X-Patchwork-Submitter: Aaron Lu
X-Patchwork-Id: 10595073
From: Aaron Lu <aaron.lu@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Dave Hansen, Michal Hocko, Vlastimil Babka,
    Mel Gorman, Matthew Wilcox, Daniel Jordan, Tariq Toukan,
    Yosef Lev, Jesper Dangaard Brouer
Subject: [RFC PATCH 7/9] mm: use read_lock for free path
Date: Tue, 11 Sep 2018 13:36:14 +0800
Message-Id: <20180911053616.6894-8-aaron.lu@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180911053616.6894-1-aaron.lu@intel.com>
References: <20180911053616.6894-1-aaron.lu@intel.com>

Daniel Jordan's patch makes it possible for multiple threads to operate
on a global list concurrently without taking any lock, using
smp_list_del() at any position and smp_list_add/splice() at the head.
This patch applies that technique to the buddy free lists.

To make this work, add_to_buddy_tail() is removed: only adding at the
list head is safe in combination with smp_list_del(), so head insertion
via add_to_buddy() is the only remaining variant.

Once the free path runs concurrently, multiple threads may free pages
at the same time. If two pages being freed are buddies, they can miss
the opportunity to be merged.
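[Editor's note] The lost-merge window is easiest to see in a user-space
analogue (a hypothetical sketch using pthreads and C11 atomics;
page_buddy[] and free_page() below only mimic PageBuddy and
__free_one_page(), they are not kernel code): each thread checks whether
its buddy is already free before marking its own page, so if both check
before either marks, neither merges.

/* Hypothetical user-space sketch of the lost-merge race: two threads
 * each free one page of a buddy pair without any shared lock.
 * Build with: cc -pthread race.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int page_buddy[2];	/* "PageBuddy" flag of pages 0 and 1 */
static atomic_int merged;

static void *free_page(void *arg)
{
	int me = (int)(long)arg;

	/* Check-then-mark, like a free path with no serialization. */
	if (atomic_load(&page_buddy[1 - me]))
		atomic_fetch_add(&merged, 1);	 /* buddy already free: merge */
	else
		atomic_store(&page_buddy[me], 1); /* stay on the order-0 list */
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t0, NULL, free_page, (void *)0L);
	pthread_create(&t1, NULL, free_page, (void *)1L);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	/* If both loads ran before either store, merged stays 0. */
	printf("merged=%d\n", atomic_load(&merged));
	return 0;
}

Whether the merge is missed depends on the interleaving; the range lock
introduced below closes exactly this window.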
For this reason, introduce range locks to protect the merge operation:
within one range, only one merge can happen at a time, and a page's
Buddy status is set while the lock is held. Each range covers an
order-(MAX_ORDER-1) block of pages, since a merge cannot exceed that
order.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 include/linux/list.h   |  1 +
 include/linux/mmzone.h |  3 ++
 lib/list.c             | 23 ++++++++++
 mm/page_alloc.c        | 95 +++++++++++++++++++++++-------------------
 4 files changed, 78 insertions(+), 44 deletions(-)

diff --git a/include/linux/list.h b/include/linux/list.h
index 5f203fb55939..608e40f6489e 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -49,6 +49,7 @@ static inline bool __list_del_entry_valid(struct list_head *entry)
 
 extern void smp_list_del(struct list_head *entry);
 extern void smp_list_splice(struct list_head *list, struct list_head *head);
+extern void smp_list_add(struct list_head *entry, struct list_head *head);
 
 /*
  * Insert a new entry between two known consecutive entries.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e66b8c63d5d1..0ea52e9bb610 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -467,6 +467,9 @@ struct zone {
	/* Primarily protects free_area */
	rwlock_t	lock;
 
+	/* Protects merge operation for a range of order=(MAX_ORDER-1) pages */
+	spinlock_t	*range_locks;
+
	/* Write-intensive fields used by compaction and vmstats. */
	ZONE_PADDING(_pad2_)
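[Editor's note] For scale, a hypothetical worked example of the new
array's size and indexing (the constants assume 4 KiB pages and
MAX_ORDER = 11 as on x86_64; none of these numbers come from the patch
itself): one range lock covers 2^(MAX_ORDER-1) = 1024 pages = 4 MiB, so
a 1 GiB zone needs (262144 >> 10) + 1 = 257 locks, about 1 KiB with a
4-byte spinlock_t.

/* Hypothetical worked example: how many range locks a zone needs and
 * which lock a given pfn maps to, mirroring the arithmetic of
 * setup_range_locks() and get_range_lock() in this patch. */
#include <stdio.h>

#define MAX_ORDER 11	/* assumed, as on x86_64 */

int main(void)
{
	unsigned long spanned_pages = 1UL << 18;	/* 1 GiB of 4 KiB pages */
	unsigned long nr = (spanned_pages >> (MAX_ORDER - 1)) + 1;
	unsigned long zone_start_pfn = 0x100000;
	unsigned long pfn = 0x123456;

	/* Same shift as get_range_lock(): offset divided by 2^(MAX_ORDER-1). */
	unsigned long range = (pfn - zone_start_pfn) >> (MAX_ORDER - 1);

	printf("locks needed: %lu\n", nr);		/* prints 257 */
	printf("pfn %#lx -> range_locks[%lu]\n", pfn, range);
	return 0;
}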
diff --git a/lib/list.c b/lib/list.c
index 104faa144abf..3ecf62b88c86 100644
--- a/lib/list.c
+++ b/lib/list.c
@@ -202,3 +202,26 @@ void smp_list_splice(struct list_head *list, struct list_head *head)
	/* Simultaneously complete the splice and unlock the head node. */
	WRITE_ONCE(head->next, first);
 }
+
+void smp_list_add(struct list_head *entry, struct list_head *head)
+{
+	struct list_head *succ;
+
+	/*
+	 * Lock the front of @head by replacing its next pointer with NULL.
+	 * Should another thread be adding to the front, wait until it's done.
+	 */
+	succ = READ_ONCE(head->next);
+	while (succ == NULL || cmpxchg(&head->next, succ, NULL) != succ) {
+		cpu_relax();
+		succ = READ_ONCE(head->next);
+	}
+
+	entry->next = succ;
+	entry->prev = head;
+	succ->prev = entry;
+
+	smp_wmb();
+
+	WRITE_ONCE(head->next, entry);
+}
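[Editor's note] The publication protocol above may be easier to follow
in a self-contained user-space analogue (a sketch assuming C11 atomics
in place of cmpxchg()/smp_wmb()/WRITE_ONCE(); struct node and
smp_add_head() are hypothetical stand-ins): head->next doubles as a
lock, with NULL meaning "an insertion is in flight", and the final
release store both publishes the new entry and unlocks the head.

/* Hypothetical C11 sketch of smp_list_add()'s protocol. */
#include <stdatomic.h>
#include <stdio.h>

struct node {
	_Atomic(struct node *) next;
	struct node *prev;
	int val;
};

static void smp_add_head(struct node *entry, struct node *head)
{
	struct node *succ = atomic_load(&head->next);

	/* "Lock" the head: swing next to NULL; retry if someone beat us. */
	do {
		while (succ == NULL)		/* another add in flight */
			succ = atomic_load(&head->next);
	} while (!atomic_compare_exchange_weak(&head->next, &succ, NULL));

	entry->next = succ;			/* link the new entry */
	entry->prev = head;
	succ->prev = entry;

	/* Publish and unlock in one store; the release ordering plays the
	 * role of smp_wmb() in the kernel version. */
	atomic_store_explicit(&head->next, entry, memory_order_release);
}

int main(void)
{
	struct node head = { .val = -1 }, a = { .val = 1 }, b = { .val = 2 };

	head.next = &head;			/* empty circular list */
	head.prev = &head;
	smp_add_head(&a, &head);
	smp_add_head(&b, &head);		/* list is now head->b->a */
	for (struct node *n = atomic_load(&head.next); n != &head;
	     n = atomic_load(&n->next))
		printf("%d\n", n->val);
	return 0;
}

Note the asymmetry this buys: adders only contend on the head, so
deleters running smp_list_del() elsewhere in the list never wait on
them.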
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dff3edc60d71..5f5cc671bcf7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -339,6 +339,17 @@ static inline bool update_defer_init(pg_data_t *pgdat,
 }
 #endif
 
+/* Return a pointer to the spinlock for the merge range this page belongs to */
+static inline spinlock_t *get_range_lock(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long zone_start_pfn = zone->zone_start_pfn;
+	unsigned long range = (page_to_pfn(page) - zone_start_pfn) >>
+				(MAX_ORDER - 1);
+
+	return &zone->range_locks[range];
+}
+
 /* Return a pointer to the bitmap storing bits affecting a block of pages */
 static inline unsigned long *get_pageblock_bitmap(struct page *page,
						unsigned long pfn)
@@ -697,25 +708,12 @@ static inline void set_page_order(struct page *page, unsigned int order)
	__SetPageBuddy(page);
 }
 
-static inline void add_to_buddy_common(struct page *page, struct zone *zone,
-					unsigned int order)
+static inline void add_to_buddy(struct page *page, struct zone *zone,
+				unsigned int order, int mt)
 {
	set_page_order(page, order);
	atomic_long_inc(&zone->free_area[order].nr_free);
-}
-
-static inline void add_to_buddy_head(struct page *page, struct zone *zone,
-					unsigned int order, int mt)
-{
-	add_to_buddy_common(page, zone, order);
-	list_add(&page->lru, &zone->free_area[order].free_list[mt]);
-}
-
-static inline void add_to_buddy_tail(struct page *page, struct zone *zone,
-					unsigned int order, int mt)
-{
-	add_to_buddy_common(page, zone, order);
-	list_add_tail(&page->lru, &zone->free_area[order].free_list[mt]);
+	smp_list_add(&page->lru, &zone->free_area[order].free_list[mt]);
 }
 
 static inline void rmv_page_order(struct page *page)
@@ -724,12 +722,25 @@ static inline void rmv_page_order(struct page *page)
	set_page_private(page, 0);
 }
 
+static inline void remove_from_buddy_common(struct page *page,
+					struct zone *zone, unsigned int order)
+{
+	atomic_long_dec(&zone->free_area[order].nr_free);
+	rmv_page_order(page);
+}
+
 static inline void remove_from_buddy(struct page *page, struct zone *zone,
					unsigned int order)
 {
	list_del(&page->lru);
-	atomic_long_dec(&zone->free_area[order].nr_free);
-	rmv_page_order(page);
+	remove_from_buddy_common(page, zone, order);
+}
+
+static inline void remove_from_buddy_concurrent(struct page *page,
+					struct zone *zone, unsigned int order)
+{
+	smp_list_del(&page->lru);
+	remove_from_buddy_common(page, zone, order);
 }
 
 /*
@@ -806,6 +817,7 @@ static inline void __free_one_page(struct page *page,
	unsigned long uninitialized_var(buddy_pfn);
	struct page *buddy;
	unsigned int max_order;
+	spinlock_t *range_lock;
 
	max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
 
@@ -819,6 +831,8 @@ static inline void __free_one_page(struct page *page,
	VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
	VM_BUG_ON_PAGE(bad_range(zone, page), page);
 
+	range_lock = get_range_lock(page);
+	spin_lock(range_lock);
 continue_merging:
	while (order < max_order - 1) {
		buddy_pfn = __find_buddy_pfn(pfn, order);
@@ -835,7 +849,7 @@ static inline void __free_one_page(struct page *page,
		if (page_is_guard(buddy))
			clear_page_guard(zone, buddy, order, migratetype);
		else
-			remove_from_buddy(buddy, zone, order);
+			remove_from_buddy_concurrent(buddy, zone, order);
		combined_pfn = buddy_pfn & pfn;
		page = page + (combined_pfn - pfn);
		pfn = combined_pfn;
@@ -867,28 +881,8 @@ static inline void __free_one_page(struct page *page,
	}
 
 done_merging:
-	/*
-	 * If this is not the largest possible page, check if the buddy
-	 * of the next-highest order is free. If it is, it's possible
-	 * that pages are being freed that will coalesce soon. In case,
-	 * that is happening, add the free page to the tail of the list
-	 * so it's less likely to be used soon and more likely to be merged
-	 * as a higher order page
-	 */
-	if ((order < MAX_ORDER-2) && pfn_valid_within(buddy_pfn)) {
-		struct page *higher_page, *higher_buddy;
-		combined_pfn = buddy_pfn & pfn;
-		higher_page = page + (combined_pfn - pfn);
-		buddy_pfn = __find_buddy_pfn(combined_pfn, order + 1);
-		higher_buddy = higher_page + (buddy_pfn - combined_pfn);
-		if (pfn_valid_within(buddy_pfn) &&
-		    page_is_buddy(higher_page, higher_buddy, order + 1)) {
-			add_to_buddy_tail(page, zone, order, migratetype);
-			return;
-		}
-	}
-
-	add_to_buddy_head(page, zone, order, migratetype);
+	add_to_buddy(page, zone, order, migratetype);
+	spin_unlock(range_lock);
 }
 
 /*
@@ -1154,7 +1148,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
		} while (--count && --batch_free && !list_empty(list));
	}
 
-	write_lock(&zone->lock);
+	read_lock(&zone->lock);
	isolated_pageblocks = has_isolate_pageblock(zone);
 
	/*
@@ -1172,7 +1166,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
		__free_one_page(page, page_to_pfn(page), zone, 0, mt);
		trace_mm_page_pcpu_drain(page, 0, mt);
	}
-	write_unlock(&zone->lock);
+	read_unlock(&zone->lock);
 }
 
 static void free_one_page(struct zone *zone,
@@ -1826,7 +1820,7 @@ static inline void expand(struct zone *zone, struct page *page,
		if (set_page_guard(zone, &page[size], high, migratetype))
			continue;
 
-		add_to_buddy_head(&page[size], zone, high, migratetype);
+		add_to_buddy(&page[size], zone, high, migratetype);
	}
 }
 
@@ -6286,6 +6280,18 @@ void __ref free_area_init_core_hotplug(int nid)
 }
 #endif
 
+static void __init setup_range_locks(struct zone *zone)
+{
+	unsigned long nr = (zone->spanned_pages >> (MAX_ORDER - 1)) + 1;
+	unsigned long size = nr * sizeof(spinlock_t);
+	unsigned long i;
+
+	zone->range_locks = memblock_virt_alloc_node_nopanic(size,
+					zone->zone_pgdat->node_id);
+	for (i = 0; i < nr; i++)
+		spin_lock_init(&zone->range_locks[i]);
+}
+
 /*
  * Set up the zone data structures:
  * - mark all pages reserved
@@ -6357,6 +6363,7 @@ static void __init free_area_init_core(struct pglist_data *pgdat)
		setup_usemap(pgdat, zone, zone_start_pfn, size);
		init_currently_empty_zone(zone, zone_start_pfn, size);
		memmap_init(size, nid, j, zone_start_pfn);
+		setup_range_locks(zone);
	}
 }
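[Editor's note] For reference, the pfn arithmetic driving the merge
loop (in mainline, __find_buddy_pfn() computes pfn ^ (1UL << order);
the sample values below are arbitrary): order-n buddies differ only in
bit n of their pfn, and the merged page starts at the lower pfn of the
pair, which is why the range lock can be sized at order MAX_ORDER-1.

/* Worked example of the buddy arithmetic used in __free_one_page(). */
#include <stdio.h>

static unsigned long find_buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);	/* flip bit 'order' */
}

int main(void)
{
	unsigned long pfn = 0x1230;	/* an order-2 aligned free page */
	unsigned int order = 2;

	unsigned long buddy_pfn = find_buddy_pfn(pfn, order);	/* 0x1234 */
	unsigned long combined_pfn = buddy_pfn & pfn;		/* 0x1230 */

	printf("buddy of %#lx at order %u is %#lx; merged page starts at %#lx\n",
	       pfn, order, buddy_pfn, combined_pfn);
	return 0;
}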