From patchwork Fri Nov 15 15:44:36 2019
X-Patchwork-Submitter: Vitaly Wool <vitalywool@gmail.com>
X-Patchwork-Id: 11246621
Date: Fri, 15 Nov 2019 16:44:36 +0100
From: Vitaly Wool <vitalywool@gmail.com>
To: Linux-MM <linux-mm@kvack.org>, linux-kernel@vger.kernel.org
Cc: vitaly.wool@konsulko.com, Andrew Morton <akpm@linux-foundation.org>, Dan Streetman <ddstreet@ieee.org>, Seth Jennings <sjenning@redhat.com>
Subject: [RFC] zswap: use B-tree for search
Message-Id: <20191115164436.da3ae5bd403564174e334bca@gmail.com>

The current zswap implementation uses red-black trees to store entries and to
perform lookups. Although this algorithm has O(log N) complexity, a lookup (or,
even worse, a replacement) of an entry still takes a while to complete when the
number of entries is huge (100K+). B-trees are known to handle such cases more
efficiently (i.e. also with O(log N) complexity, but with a much lower constant
factor), so trying zswap with B-trees sounded good to me from the start.

The B-tree implementation currently present in the Linux kernel isn't really
doing things in the best possible way (for one thing, it uses recursion), but I
thought it was worth trying anyway. So I modified zswap (the patch follows
below) to use lib/btree, and I can see a substantial improvement in my tests
when this patch is applied. I'm not posting test results here since we're not
quite there yet; I'd like to first hear opinions on the idea as such and on the
way I use B-trees here. A minimal sketch of the lib/btree calls this patch
relies on is appended after the patch for reference.

Thanks in advance :)

Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com>
---
 mm/zswap.c | 147 +++++++++++++++++++++++++++-------------------------------------
 1 file changed, 64 insertions(+), 83 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 46a322316e52..c2cc07be1322 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -21,7 +21,7 @@
 #include
 #include
 #include
-#include <linux/rbtree.h>
+#include <linux/btree.h>
 #include
 #include
 #include
@@ -134,7 +134,6 @@ struct zswap_pool {
  * This structure contains the metadata for tracking a single compressed
  * page within zswap.
  *
- * rbnode - links the entry into red-black tree for the appropriate swap type
  * offset - the swap offset for the entry. Index into the red-black tree.
  * refcount - the number of outstanding reference to the entry. This is needed
  *            to protect against premature freeing of the entry by code
@@ -149,7 +148,6 @@ struct zswap_pool {
  * value - value of the same-value filled pages which have same content
  */
 struct zswap_entry {
-	struct rb_node rbnode;
 	pgoff_t offset;
 	int refcount;
 	unsigned int length;
@@ -166,11 +164,11 @@ struct zswap_header {
 
 /*
  * The tree lock in the zswap_tree struct protects a few things:
- * - the rbtree
+ * - the tree
  * - the refcount field of each entry in the tree
  */
 struct zswap_tree {
-	struct rb_root rbroot;
+	struct btree_head head;
 	spinlock_t lock;
 };
 
@@ -252,7 +250,6 @@ static struct zswap_entry *zswap_entry_cache_alloc(gfp_t gfp)
 	if (!entry)
 		return NULL;
 	entry->refcount = 1;
-	RB_CLEAR_NODE(&entry->rbnode);
 	return entry;
 }
 
@@ -262,58 +259,18 @@ static void zswap_entry_cache_free(struct zswap_entry *entry)
 }
 
 /*********************************
-* rbtree functions
+* btree functions
 **********************************/
-static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset)
-{
-	struct rb_node *node = root->rb_node;
-	struct zswap_entry *entry;
+static struct btree_geo *btree_pgofft_geo;
 
-	while (node) {
-		entry = rb_entry(node, struct zswap_entry, rbnode);
-		if (entry->offset > offset)
-			node = node->rb_left;
-		else if (entry->offset < offset)
-			node = node->rb_right;
-		else
-			return entry;
-	}
-	return NULL;
-}
-
-/*
- * In the case that a entry with the same offset is found, a pointer to
- * the existing entry is stored in dupentry and the function returns -EEXIST
- */
-static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry,
-			struct zswap_entry **dupentry)
+static struct zswap_entry *zswap_search(struct btree_head *head, pgoff_t offset)
 {
-	struct rb_node **link = &root->rb_node, *parent = NULL;
-	struct zswap_entry *myentry;
-
-	while (*link) {
-		parent = *link;
-		myentry = rb_entry(parent, struct zswap_entry, rbnode);
-		if (myentry->offset > entry->offset)
-			link = &(*link)->rb_left;
-		else if (myentry->offset < entry->offset)
-			link = &(*link)->rb_right;
-		else {
-			*dupentry = myentry;
-			return -EEXIST;
-		}
-	}
-	rb_link_node(&entry->rbnode, parent, link);
-	rb_insert_color(&entry->rbnode, root);
-	return 0;
+	return btree_lookup(head, btree_pgofft_geo, &offset);
 }
 
-static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
+static void zswap_erase(struct btree_head *head, struct zswap_entry *entry)
 {
-	if (!RB_EMPTY_NODE(&entry->rbnode)) {
-		rb_erase(&entry->rbnode, root);
-		RB_CLEAR_NODE(&entry->rbnode);
-	}
+	btree_remove(head, btree_pgofft_geo, &entry->offset);
 }
 
 /*
@@ -342,25 +299,40 @@ static void zswap_entry_get(struct zswap_entry *entry)
 /* caller must hold the tree lock
  * remove from the tree and free it, if nobody reference the entry
  */
-static void zswap_entry_put(struct zswap_tree *tree,
+static void zswap_entry_put(struct btree_head *head,
 			struct zswap_entry *entry)
 {
 	int refcount = --entry->refcount;
 
 	BUG_ON(refcount < 0);
 	if (refcount == 0) {
-		zswap_rb_erase(&tree->rbroot, entry);
+		zswap_erase(head, entry);
 		zswap_free_entry(entry);
 	}
 }
 
+static int zswap_insert_or_replace(struct btree_head *head,
+				   struct zswap_entry *entry)
+{
+	struct zswap_entry *old;
+
+	do {
+		old = btree_remove(head, btree_pgofft_geo, &entry->offset);
+		if (old) {
+			zswap_duplicate_entry++;
+			zswap_entry_put(head, old);
+		}
+	} while (old);
+	return btree_insert(head, btree_pgofft_geo, &entry->offset, entry,
+			GFP_ATOMIC);
+}
 /* caller must hold the tree lock */
-static struct zswap_entry *zswap_entry_find_get(struct rb_root *root,
+static struct zswap_entry *zswap_entry_find_get(struct btree_head *head,
 				pgoff_t offset)
 {
 	struct zswap_entry *entry;
 
-	entry = zswap_rb_search(root, offset);
+	entry = zswap_search(head, offset);
 	if (entry)
 		zswap_entry_get(entry);
 
@@ -861,7 +833,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
 
 	/* find and ref zswap entry */
 	spin_lock(&tree->lock);
-	entry = zswap_entry_find_get(&tree->rbroot, offset);
+	entry = zswap_entry_find_get(&tree->head, offset);
 	if (!entry) {
 		/* entry was invalidated */
 		spin_unlock(&tree->lock);
@@ -910,7 +882,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
 
 	spin_lock(&tree->lock);
 	/* drop local reference */
-	zswap_entry_put(tree, entry);
+	zswap_entry_put(&tree->head, entry);
 
 	/*
 	 * There are two possible situations for entry here:
@@ -919,8 +891,8 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
 	 * because invalidate happened during writeback
 	 * search the tree and free the entry if find entry
 	 */
-	if (entry == zswap_rb_search(&tree->rbroot, offset))
-		zswap_entry_put(tree, entry);
+	if (entry == zswap_search(&tree->head, offset))
+		zswap_entry_put(&tree->head, entry);
 	spin_unlock(&tree->lock);
 
 	goto end;
@@ -934,7 +906,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
 	 */
 fail:
 	spin_lock(&tree->lock);
-	zswap_entry_put(tree, entry);
+	zswap_entry_put(&tree->head, entry);
 	spin_unlock(&tree->lock);
 
 end:
@@ -988,7 +960,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 				struct page *page)
 {
 	struct zswap_tree *tree = zswap_trees[type];
-	struct zswap_entry *entry, *dupentry;
+	struct zswap_entry *entry;
 	struct crypto_comp *tfm;
 	int ret;
 	unsigned int hlen, dlen = PAGE_SIZE;
@@ -1096,16 +1068,12 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 insert_entry:
 	/* map */
 	spin_lock(&tree->lock);
-	do {
-		ret = zswap_rb_insert(&tree->rbroot, entry, &dupentry);
-		if (ret == -EEXIST) {
-			zswap_duplicate_entry++;
-			/* remove from rbtree */
-			zswap_rb_erase(&tree->rbroot, dupentry);
-			zswap_entry_put(tree, dupentry);
-		}
-	} while (ret == -EEXIST);
+	ret = zswap_insert_or_replace(&tree->head, entry);
 	spin_unlock(&tree->lock);
+	if (ret < 0) {
+		zswap_reject_alloc_fail++;
+		goto put_dstmem;
+	}
 
 	/* update stats */
 	atomic_inc(&zswap_stored_pages);
@@ -1138,7 +1106,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
 
 	/* find */
 	spin_lock(&tree->lock);
-	entry = zswap_entry_find_get(&tree->rbroot, offset);
+	entry = zswap_entry_find_get(&tree->head, offset);
 	if (!entry) {
 		/* entry was written back */
 		spin_unlock(&tree->lock);
@@ -1168,7 +1136,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
 
 freeentry:
 	spin_lock(&tree->lock);
-	zswap_entry_put(tree, entry);
+	zswap_entry_put(&tree->head, entry);
 	spin_unlock(&tree->lock);
 
 	return 0;
@@ -1182,36 +1150,41 @@ static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
 
 	/* find */
 	spin_lock(&tree->lock);
-	entry = zswap_rb_search(&tree->rbroot, offset);
+	entry = zswap_search(&tree->head, offset);
 	if (!entry) {
 		/* entry was written back */
 		spin_unlock(&tree->lock);
 		return;
 	}
 
-	/* remove from rbtree */
-	zswap_rb_erase(&tree->rbroot, entry);
+	/* remove from tree */
+	zswap_erase(&tree->head, entry);
 
 	/* drop the initial reference from entry creation */
-	zswap_entry_put(tree, entry);
+	zswap_entry_put(&tree->head, entry);
 
 	spin_unlock(&tree->lock);
 }
 
+void do_free_entry(void *elem, unsigned long opaque, unsigned long *key,
+		   size_t index, void *func2)
+{
+	struct zswap_entry *entry = elem;
+	zswap_free_entry(entry);
+}
+
 /* frees all zswap entries for the given swap type */
 static void zswap_frontswap_invalidate_area(unsigned type)
 {
 	struct zswap_tree *tree = zswap_trees[type];
-	struct zswap_entry *entry, *n;
 
 	if (!tree)
 		return;
 
 	/* walk the tree and free everything */
 	spin_lock(&tree->lock);
-	rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode)
-		zswap_free_entry(entry);
-	tree->rbroot = RB_ROOT;
+	btree_visitor(&tree->head, btree_pgofft_geo, 0, do_free_entry, NULL);
+	btree_destroy(&tree->head);
 	spin_unlock(&tree->lock);
 	kfree(tree);
 	zswap_trees[type] = NULL;
@@ -1226,8 +1199,11 @@ static void zswap_frontswap_init(unsigned type)
 		pr_err("alloc failed, zswap disabled for swap type %d\n", type);
 		return;
 	}
-
-	tree->rbroot = RB_ROOT;
+	if (btree_init(&tree->head) < 0) {
+		pr_err("couldn't init the tree head\n");
+		kfree(tree);
+		return;
+	}
 	spin_lock_init(&tree->lock);
 	zswap_trees[type] = tree;
 }
@@ -1302,6 +1278,11 @@ static int __init init_zswap(void)
 
 	zswap_init_started = true;
 
+	if (sizeof(pgoff_t) == 8)
+		btree_pgofft_geo = &btree_geo64;
+	else
+		btree_pgofft_geo = &btree_geo32;
+
 	if (zswap_entry_cache_create()) {
 		pr_err("entry cache creation failed\n");
 		goto cache_fail;
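
For reference, here is a minimal, self-contained sketch (an illustration only,
not part of the patch) of the lib/btree calls the patch relies on: btree_init(),
btree_insert(), btree_lookup(), btree_remove(), btree_visitor() and
btree_destroy(), with the geometry picked by key width the same way the patch
does for pgoff_t. The module name, key and value below are made up, and it
assumes CONFIG_BTREE=y:

/* Hypothetical test module, illustration only; requires CONFIG_BTREE=y. */
#include <linux/module.h>
#include <linux/btree.h>

static struct btree_head test_head;
static struct btree_geo *test_geo;
static char test_val[] = "hello";

/* Callback with the signature btree_visitor() expects. */
static void test_visit(void *elem, unsigned long opaque, unsigned long *key,
		       size_t index, void *func2)
{
	pr_info("btree-test: visited key %lu -> %s\n", *key, (char *)elem);
}

static int __init btree_test_init(void)
{
	unsigned long key = 42;
	int ret;

	/* Pick the geometry matching the key width, as the patch does for pgoff_t. */
	test_geo = (sizeof(unsigned long) == 8) ? &btree_geo64 : &btree_geo32;

	ret = btree_init(&test_head);
	if (ret < 0)
		return ret;

	/* Keys are passed by pointer; values are opaque non-NULL pointers. */
	ret = btree_insert(&test_head, test_geo, &key, test_val, GFP_KERNEL);
	if (ret < 0)
		goto out;

	pr_info("btree-test: lookup -> %s\n",
		(char *)btree_lookup(&test_head, test_geo, &key));

	/* Walk all entries, then remove the one we added. */
	btree_visitor(&test_head, test_geo, 0, test_visit, NULL);
	btree_remove(&test_head, test_geo, &key);
out:
	btree_destroy(&test_head);
	return ret;
}

static void __exit btree_test_exit(void)
{
}

module_init(btree_test_init);
module_exit(btree_test_exit);
MODULE_LICENSE("GPL");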