From patchwork Fri Dec 25 09:59:47 2020
X-Patchwork-Submitter: Alex Shi
X-Patchwork-Id: 11990089
From: Alex Shi <alex.shi@linux.alibaba.com>
To: willy@infradead.org
Cc: tim.c.chen@linux.intel.com, Konstantin Khlebnikov, Hugh Dickins,
    Yu Zhao, Michal Hocko, Andrew Morton, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: [RFC PATCH 1/4] mm/swap.c: pre-sort pages in pagevec for pagevec_lru_move_fn
Date: Fri, 25 Dec 2020 17:59:47 +0800
Message-Id: <1608890390-64305-2-git-send-email-alex.shi@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1608890390-64305-1-git-send-email-alex.shi@linux.alibaba.com>
References: <20201126155553.GT4327@casper.infradead.org>
 <1608890390-64305-1-git-send-email-alex.shi@linux.alibaba.com>
Pages in a pagevec may belong to different lruvecs, so
pagevec_lru_move_fn() has to relock as it walks the vector, and each
relock can leave the current CPU waiting a long time on the same lock
because of spinlock fairness.

Before the per-memcg lru_lock we had to bear this relocking, since the
spinlock was the only way to serialize a page's memcg/lruvec. Now
TestClearPageLRU can be used to isolate pages exclusively and stabilize
each page's lruvec/memcg. That gives us a chance to sort the pagevec by
lruvec before doing the move in pagevec_lru_move_fn(), so pages on the
same lruvec are handled under a single lock acquisition and we no
longer suffer the spinlock fairness wait.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Konstantin Khlebnikov
Cc: Hugh Dickins
Cc: Yu Zhao
Cc: Michal Hocko
Cc: Matthew Wilcox (Oracle)
Cc: Andrew Morton
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/swap.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 79 insertions(+), 13 deletions(-)
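For reference, the pre-sort can be exercised on its own with the small
userspace sketch below. It mirrors the shell_sort() helper added by the
diff further down, but works on plain parallel arrays instead of a
struct pagevec; all demo_* names are made up for illustration, and the
gap loop here simply skips gaps that are too large so the final gap-1
pass always runs:

/*
 * Standalone illustration of the pre-sort: order two parallel arrays
 * (a key standing in for the lruvec address, a payload standing in
 * for the page pointer) with the same descending gap table.
 * Plain userspace C, not kernel code; demo_* names are made up.
 */
#include <assert.h>
#include <stdio.h>

static const int demo_gaps[] = { 6, 4, 3, 2, 1, 0 };

static void demo_shell_sort(unsigned long *key, const char **payload, int n)
{
        int g, i, j;

        for (g = 0; demo_gaps[g] > 0; g++) {
                int gap = demo_gaps[g];

                if (gap >= n)
                        continue;       /* gap too large for this vector */

                for (i = gap; i < n; i++) {
                        unsigned long tmp_key = key[i];
                        const char *tmp_payload = payload[i];

                        /* Gapped insertion sort, keeping both arrays in step. */
                        for (j = i - gap; j >= 0 && key[j] > tmp_key; j -= gap) {
                                key[j + gap] = key[j];
                                payload[j + gap] = payload[j];
                        }
                        key[j + gap] = tmp_key;
                        payload[j + gap] = tmp_payload;
                }
        }
}

int main(void)
{
        /* Three fake "lruvec addresses", deliberately interleaved. */
        unsigned long key[] = { 3, 1, 2, 3, 1, 2, 3, 1, 2 };
        const char *payload[] = { "p0", "p1", "p2", "p3", "p4",
                                  "p5", "p6", "p7", "p8" };
        int n = 9, i;

        demo_shell_sort(key, payload, n);

        for (i = 0; i < n; i++) {
                if (i)
                        assert(key[i - 1] <= key[i]);
                printf("%lu:%s ", key[i], payload[i]);
        }
        printf("\n");   /* equal keys are now adjacent: one lock per group */
        return 0;
}

Built with something like "gcc -Wall demo_sort.c" and run, it prints the
keys in nondecreasing order with their payloads still attached.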
diff --git a/mm/swap.c b/mm/swap.c
index c5363bdebe67..994641331bf7 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -201,29 +201,95 @@ int get_kernel_page(unsigned long start, int write, struct page **pages)
 }
 EXPORT_SYMBOL_GPL(get_kernel_page);
 
+/* Pratt's gaps for shell sort, https://en.wikipedia.org/wiki/Shellsort */
+static int gaps[] = { 6, 4, 3, 2, 1, 0};
+
+/* Shell sort pagevec[] on page's lruvec.*/
+static void shell_sort(struct pagevec *pvec, unsigned long *lvaddr)
+{
+        int g, i, j, n = pagevec_count(pvec);
+
+        for (g=0; gaps[g] > 0 && gaps[g] <= n/2; g++) {
+                int gap = gaps[g];
+
+                for (i = gap; i < n; i++) {
+                        unsigned long tmp = lvaddr[i];
+                        struct page *page = pvec->pages[i];
+
+                        for (j = i - gap; j >= 0 && lvaddr[j] > tmp; j -= gap) {
+                                lvaddr[j + gap] = lvaddr[j];
+                                pvec->pages[j + gap] = pvec->pages[j];
+                        }
+                        lvaddr[j + gap] = tmp;
+                        pvec->pages[j + gap] = page;
+                }
+        }
+}
+
+/* Get lru bit cleared page and their lruvec address, release the others */
+void sort_isopv(struct pagevec *pvec, struct pagevec *isopv,
+                unsigned long *lvaddr)
+{
+        int i, j;
+        struct pagevec busypv;
+
+        pagevec_init(&busypv);
+
+        for (i = 0, j = 0; i < pagevec_count(pvec); i++) {
+                struct page *page = pvec->pages[i];
+
+                pvec->pages[i] = NULL;
+
+                /* block memcg migration during page moving between lru */
+                if (!TestClearPageLRU(page)) {
+                        pagevec_add(&busypv, page);
+                        continue;
+                }
+                lvaddr[j++] = (unsigned long)
+                        mem_cgroup_page_lruvec(page, page_pgdat(page));
+                pagevec_add(isopv, page);
+        }
+        pagevec_reinit(pvec);
+        if (pagevec_count(&busypv))
+                release_pages(busypv.pages, busypv.nr);
+
+        shell_sort(isopv, lvaddr);
+}
+
 static void pagevec_lru_move_fn(struct pagevec *pvec,
         void (*move_fn)(struct page *page, struct lruvec *lruvec))
 {
-        int i;
+        int i, n;
         struct lruvec *lruvec = NULL;
         unsigned long flags = 0;
+        unsigned long lvaddr[PAGEVEC_SIZE];
+        struct pagevec isopv;
 
-        for (i = 0; i < pagevec_count(pvec); i++) {
-                struct page *page = pvec->pages[i];
+        pagevec_init(&isopv);
 
-                /* block memcg migration during page moving between lru */
-                if (!TestClearPageLRU(page))
-                        continue;
+        sort_isopv(pvec, &isopv, lvaddr);
 
-                lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags);
-                (*move_fn)(page, lruvec);
+        n = pagevec_count(&isopv);
+        if (!n)
+                return;
 
-                SetPageLRU(page);
+        lruvec = (struct lruvec *)lvaddr[0];
+        spin_lock_irqsave(&lruvec->lru_lock, flags);
+
+        for (i = 0; i < n; i++) {
+                /* lock new lruvec if lruvec changes, we have sorted them */
+                if (lruvec != (struct lruvec *)lvaddr[i]) {
+                        spin_unlock_irqrestore(&lruvec->lru_lock, flags);
+                        lruvec = (struct lruvec *)lvaddr[i];
+                        spin_lock_irqsave(&lruvec->lru_lock, flags);
+                }
+
+                (*move_fn)(isopv.pages[i], lruvec);
+
+                SetPageLRU(isopv.pages[i]);
         }
-        if (lruvec)
-                unlock_page_lruvec_irqrestore(lruvec, flags);
-        release_pages(pvec->pages, pvec->nr);
-        pagevec_reinit(pvec);
+        spin_unlock_irqrestore(&lruvec->lru_lock, flags);
+        release_pages(isopv.pages, isopv.nr);
 }
 
 static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec)
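To show why the pre-sort pays off, here is a rough userspace sketch of
the relock-on-change walk that the new pagevec_lru_move_fn() performs.
pthread mutexes stand in for lruvec->lru_lock, and the demo_* names are
again made up; with the vector already sorted, the lock is taken once
per lruvec group instead of being bounced on every page:

/*
 * Userspace sketch of the batched relocking pattern: after sorting,
 * pages that share an lruvec are adjacent, so the lock is dropped and
 * re-taken only when the lruvec actually changes.
 */
#include <pthread.h>
#include <stdio.h>

struct demo_lruvec {
        pthread_mutex_t lock;
        int nr_moved;
};

static void demo_move_one(struct demo_lruvec *lruvec, int page)
{
        /* Caller holds lruvec->lock, mirroring (*move_fn)(page, lruvec). */
        lruvec->nr_moved++;
        printf("moved page %d under lruvec %p\n", page, (void *)lruvec);
}

static void demo_move_sorted(struct demo_lruvec **vec, int n)
{
        struct demo_lruvec *lruvec;
        int i;

        if (!n)
                return;

        lruvec = vec[0];
        pthread_mutex_lock(&lruvec->lock);

        for (i = 0; i < n; i++) {
                /* Relock only when the (sorted) lruvec changes. */
                if (vec[i] != lruvec) {
                        pthread_mutex_unlock(&lruvec->lock);
                        lruvec = vec[i];
                        pthread_mutex_lock(&lruvec->lock);
                }
                demo_move_one(lruvec, i);
        }
        pthread_mutex_unlock(&lruvec->lock);
}

int main(void)
{
        struct demo_lruvec a = { PTHREAD_MUTEX_INITIALIZER, 0 };
        struct demo_lruvec b = { PTHREAD_MUTEX_INITIALIZER, 0 };
        /* Already sorted: two lock acquisitions cover all six pages. */
        struct demo_lruvec *vec[] = { &a, &a, &a, &b, &b, &b };

        demo_move_sorted(vec, 6);
        printf("a moved %d, b moved %d\n", a.nr_moved, b.nr_moved);
        return 0;
}

Built with -pthread, this moves three pages under each of the two fake
lruvecs while acquiring each lock exactly once.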