From patchwork Tue Apr 18 06:25:01 2023
X-Patchwork-Submitter: Alexey Romanov
X-Patchwork-Id: 13215071
From: Alexey Romanov <avromanov@sberdevices.ru>
Subject: [RFC PATCH v1 3/5] mm/zsmalloc: introduce objects folding mechanism
Date: Tue, 18 Apr 2023 09:25:01 +0300
Message-ID: <20230418062503.62121-4-avromanov@sberdevices.ru>
In-Reply-To: <20230418062503.62121-1-avromanov@sberdevices.ru>
References: <20230418062503.62121-1-avromanov@sberdevices.ru>
This patch adds a mechanism that scans every zspage in a zs_pool and frees
all identical objects, leaving only one copy in memory. All zsmalloc handles
that referenced the freed objects now refer to the single remaining object.

To implement this mechanism, we sequentially scan every allocated object,
compute a hash of its contents, and save the object value in a hash table
(hlist_head). If the hash matches an existing entry and the contents are
identical, we free the duplicate object via obj_free() and update the
reference rbtree.

The rbtree is needed for reference counting: each node's key is the object
value, and its value is the number of handles that refer to this object.
This keeps the data consistent, so we never obj_free() an object that is
still referenced by another handle. The mechanism is also fully compatible
with the zs_compact() function.
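The mechanism described above can be sketched in plain userspace C. This is a minimal, illustrative model only: `fold_objects`, `obj_hash`, and the arrays are invented stand-ins for zsmalloc handles, the `xxhash()` call, and the `fold_rbtree` reference counts, and the sketch probes all earlier objects in O(n²) instead of using the patch's hlist buckets and rbtree.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

#define OBJSZ 4   /* toy object size, playing the role of class->size */

/* FNV-1a, standing in for the xxhash() call the patch uses. */
static uint64_t obj_hash(const unsigned char *p, size_t len)
{
	uint64_t h = 1469598103934665603ULL;

	while (len--)
		h = (h ^ *p++) * 1099511628211ULL;
	return h;
}

/*
 * Fold duplicates: for each object, compare hashes with earlier surviving
 * objects; on a full content match, point the handle at the first copy
 * (the record_obj() analogue) and bump that copy's reference count (the
 * role of the patch's fold_rbtree node). Returns the number of objects
 * "freed" (the obj_free() analogue).
 */
static int fold_objects(unsigned char objs[][OBJSZ], int n,
			int handle_to_obj[], int refcnt[])
{
	int i, j, freed = 0;

	for (i = 0; i < n; i++) {
		for (j = 0; j < i; j++) {
			if (handle_to_obj[j] != j)
				continue; /* j was already folded away */
			if (obj_hash(objs[i], OBJSZ) != obj_hash(objs[j], OBJSZ))
				continue; /* hash miss: cannot be identical */
			if (memcmp(objs[i], objs[j], OBJSZ) == 0) {
				handle_to_obj[i] = j; /* redirect the handle */
				refcnt[j]++;          /* one more referrer */
				freed++;              /* duplicate released */
				break;
			}
		}
	}
	return freed;
}
```

The hash only pre-filters candidates; as in the patch, a full `memcmp()` of the contents decides whether two objects really fold together.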
Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru>
---
 include/linux/zsmalloc.h |   4 +
 mm/Kconfig               |   9 +
 mm/zsmalloc.c            | 470 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 476 insertions(+), 7 deletions(-)

diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index a48cd0ffe57d..a581283c68b4 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -36,6 +36,8 @@ enum zs_mapmode {
 struct zs_pool_stats {
 	/* How many pages were migrated (freed) */
 	atomic_long_t pages_compacted;
+	/* How many pages were freed during objects folding */
+	atomic_long_t pages_folded;
 };
 
 struct zs_pool;
@@ -58,4 +60,6 @@ unsigned long zs_compact(struct zs_pool *pool);
 unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size);
 
 void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
+
+unsigned long zs_fold(struct zs_pool *pool);
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index ff7b209dec05..f01ec9038101 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -191,6 +191,15 @@ config ZSMALLOC_STAT
 	  information to userspace via debugfs.
 	  If unsure, say N.
 
+config ZSMALLOC_FOLD
+	bool "Export zsmalloc objects folding mechanism"
+	depends on ZSMALLOC
+	default n
+	help
+	  This option enables a mechanism for folding identical objects,
+	  which can reduce the amount of memory used by zsmalloc.
+	  If unsure, say N.
+
 menu "SLAB allocator options"
 
 choice
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 0a3b11aa07a9..de6be26cca65 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -62,6 +62,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #define ZSPAGE_MAGIC	0x58
 
@@ -178,6 +180,7 @@ enum class_stat_type {
 	CLASS_FULL,
 	OBJ_ALLOCATED,
 	OBJ_USED,
+	OBJ_FOLDED,
 	NR_ZS_STAT_TYPE,
 };
 
@@ -219,6 +222,14 @@ struct size_class {
 	unsigned int index;
 
 	struct zs_size_stat stats;
+#ifdef CONFIG_ZSMALLOC_FOLD
+	/*
+	 * We use this rbtree only in zs_free() and __zs_fold().
+	 * Both always run under spin_lock(&pool->lock), so no
+	 * additional lock specifically for the rbtree is needed.
+	 */
+	struct rb_root fold_rbtree;
+#endif
 };
 
 /*
@@ -253,6 +264,9 @@ struct zs_pool {
 	struct size_class *size_class[ZS_SIZE_CLASSES];
 	struct kmem_cache *handle_cachep;
 	struct kmem_cache *zspage_cachep;
+#ifdef CONFIG_ZSMALLOC_FOLD
+	struct kmem_cache *fold_rbtree_cachep;
+#endif
 
 	atomic_long_t pages_allocated;
 
@@ -307,6 +321,129 @@ struct mapping_area {
 	enum zs_mapmode vm_mm; /* mapping mode */
 };
 
+#ifdef CONFIG_ZSMALLOC_FOLD
+struct obj_hash_node {
+	unsigned long handle;
+	struct hlist_node next;
+};
+
+struct hash_table {
+	struct hlist_head *table;
+	struct kmem_cache *cachep;
+	size_t size;
+};
+
+struct fold_rbtree_node {
+	struct rb_node node;
+	unsigned long key;
+	unsigned int cnt;
+};
+
+static int fold_rbtree_key_cmp(const void *k, const struct rb_node *node)
+{
+	const struct fold_rbtree_node *entry = rb_entry(node, struct fold_rbtree_node, node);
+	unsigned long key = *(unsigned long *)k;
+
+	if (entry->key == key)
+		return 0;
+
+	return key < entry->key ? -1 : 1;
+}
+
+static struct fold_rbtree_node *fold_rbtree_search(struct rb_root *root,
+		unsigned long key)
+{
+	struct rb_node *node =
+		rb_find((void *)&key, root, fold_rbtree_key_cmp);
+
+	if (!node)
+		return NULL;
+
+	return rb_entry(node, struct fold_rbtree_node, node);
+}
+
+static int fold_rbtree_node_cmp(struct rb_node *a_node, const struct rb_node *b_node)
+{
+	const struct fold_rbtree_node *a = rb_entry(a_node, struct fold_rbtree_node, node);
+
+	return fold_rbtree_key_cmp((void *)&a->key, b_node);
+}
+
+static bool fold_rbtree_insert(struct rb_root *root, struct fold_rbtree_node *data)
+{
+	return rb_find_add(&data->node, root, fold_rbtree_node_cmp);
+}
+
+static struct fold_rbtree_node *fold_rbtree_alloc_node(struct zs_pool *pool,
+		unsigned long key, unsigned int cnt)
+{
+	/*
+	 * This function is called under a spinlock,
+	 * so it is necessary to use the GFP_ATOMIC flag.
+	 */
+	struct fold_rbtree_node *node =
+		kmem_cache_alloc(pool->fold_rbtree_cachep, GFP_ATOMIC);
+
+	if (!node)
+		return NULL;
+
+	node->key = key;
+	node->cnt = cnt;
+
+	return node;
+}
+
+static void fold_rbtree_free_node(struct zs_pool *pool, struct fold_rbtree_node *node)
+{
+	kmem_cache_free(pool->fold_rbtree_cachep, node);
+}
+
+static void free_htable(struct hash_table *htable)
+{
+	size_t i;
+	struct hlist_node *tmp;
+	struct obj_hash_node *node;
+
+	for (i = 0; i < htable->size; i++) {
+		hlist_for_each_entry_safe(node, tmp, &htable->table[i], next) {
+			hlist_del(&node->next);
+			kmem_cache_free(htable->cachep, node);
+		}
+	}
+
+	vfree(htable->table);
+	kmem_cache_destroy(htable->cachep);
+}
+
+static int init_htable(struct hash_table *htable, size_t size)
+{
+	size_t i;
+
+	htable->size = size;
+	htable->table = vmalloc_array(htable->size, sizeof(struct hlist_head));
+	if (!htable->table)
+		return -ENOMEM;
+
+	htable->cachep = kmem_cache_create("fold_htable", sizeof(struct obj_hash_node),
+			0, 0, NULL);
+	if (!htable->cachep) {
+		vfree(htable->table);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < htable->size; i++)
+		INIT_HLIST_HEAD(&htable->table[i]);
+
+	return 0;
+}
+
+static void insert_htable(struct hash_table *htable, struct obj_hash_node *node,
+		unsigned long hash)
+{
+	hlist_add_head(&node->next, &htable->table[hash % htable->size]);
+}
+#endif /* CONFIG_ZSMALLOC_FOLD */
+
 /* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
 static void SetZsHugePage(struct zspage *zspage)
 {
@@ -353,6 +490,18 @@ static int create_cache(struct zs_pool *pool)
 		return 1;
 	}
 
+#ifdef CONFIG_ZSMALLOC_FOLD
+	pool->fold_rbtree_cachep = kmem_cache_create("fold_rbtree",
+			sizeof(struct fold_rbtree_node), 0, 0, NULL);
+	if (!pool->fold_rbtree_cachep) {
+		kmem_cache_destroy(pool->handle_cachep);
+		kmem_cache_destroy(pool->zspage_cachep);
+		pool->handle_cachep = NULL;
+		pool->zspage_cachep = NULL;
+		return 1;
+	}
+#endif
+
 	return 0;
 }
 
@@ -360,6 +509,9 @@ static void destroy_cache(struct zs_pool *pool)
 {
 	kmem_cache_destroy(pool->handle_cachep);
 	kmem_cache_destroy(pool->zspage_cachep);
+#ifdef CONFIG_ZSMALLOC_FOLD
+	kmem_cache_destroy(pool->fold_rbtree_cachep);
+#endif
 }
 
 static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp)
@@ -639,15 +791,15 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
 	struct size_class *class;
 	int objs_per_zspage;
 	unsigned long class_almost_full, class_almost_empty;
-	unsigned long obj_allocated, obj_used, pages_used, freeable;
+	unsigned long obj_allocated, obj_used, pages_used, freeable, folded;
 	unsigned long total_class_almost_full = 0, total_class_almost_empty = 0;
 	unsigned long total_objs = 0, total_used_objs = 0, total_pages = 0;
-	unsigned long total_freeable = 0;
+	unsigned long total_freeable = 0, total_folded = 0;
 
-	seq_printf(s, " %5s %5s %11s %12s %13s %10s %10s %16s %8s\n",
+	seq_printf(s, " %5s %5s %11s %12s %13s %10s %10s %16s %8s %8s\n",
 			"class", "size", "almost_full", "almost_empty",
 			"obj_allocated", "obj_used", "pages_used",
-			"pages_per_zspage", "freeable");
+			"pages_per_zspage", "freeable", "folded");
 
 	for (i = 0; i < ZS_SIZE_CLASSES; i++) {
 		class = pool->size_class[i];
@@ -660,6 +812,7 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
 		class_almost_empty = zs_stat_get(class, CLASS_ALMOST_EMPTY);
 		obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
 		obj_used = zs_stat_get(class, OBJ_USED);
+		folded = zs_stat_get(class, OBJ_FOLDED);
 		freeable = zs_can_compact(class);
 		spin_unlock(&pool->lock);
 
@@ -668,10 +821,10 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
 				class->pages_per_zspage;
 
 		seq_printf(s, " %5u %5u %11lu %12lu %13lu"
-				" %10lu %10lu %16d %8lu\n",
+				" %10lu %10lu %16d %8lu %8lu\n",
 			i, class->size, class_almost_full, class_almost_empty,
 			obj_allocated, obj_used, pages_used,
-			class->pages_per_zspage, freeable);
+			class->pages_per_zspage, freeable, folded);
 
 		total_class_almost_full += class_almost_full;
 		total_class_almost_empty += class_almost_empty;
@@ -679,13 +832,14 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
 		total_used_objs += obj_used;
 		total_pages += pages_used;
 		total_freeable += freeable;
+		total_folded += folded;
 	}
 
 	seq_puts(s, "\n");
-	seq_printf(s, " %5s %5s %11lu %12lu %13lu %10lu %10lu %16s %8lu\n",
+	seq_printf(s, " %5s %5s %11lu %12lu %13lu %10lu %10lu %16s %8lu %8lu\n",
 			"Total", "", total_class_almost_full,
 			total_class_almost_empty, total_objs,
-			total_used_objs, total_pages, "", total_freeable);
+			total_used_objs, total_pages, "", total_freeable, total_folded);
 
 	return 0;
 }
@@ -1663,6 +1817,9 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	unsigned long obj;
 	struct size_class *class;
 	enum fullness_group fullness;
+#ifdef CONFIG_ZSMALLOC_FOLD
+	struct fold_rbtree_node *node;
+#endif
 
 	if (IS_ERR_OR_NULL((void *)handle))
 		return;
@@ -1677,6 +1834,22 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	zspage = get_zspage(f_page);
 	class = zspage_class(pool, zspage);
 
+#ifdef CONFIG_ZSMALLOC_FOLD
+	node = fold_rbtree_search(&class->fold_rbtree, obj);
+	if (node) {
+		node->cnt--;
+
+		if (node->cnt) {
+			cache_free_handle(pool, handle);
+			spin_unlock(&pool->lock);
+			return;
+		}
+
+		rb_erase(&node->node, &class->fold_rbtree);
+		fold_rbtree_free_node(pool, node);
+	}
+#endif
+
 	class_stat_dec(class, OBJ_USED, 1);
 
 #ifdef CONFIG_ZPOOL
@@ -2345,6 +2518,286 @@ unsigned long zs_compact(struct zs_pool *pool)
 }
 EXPORT_SYMBOL_GPL(zs_compact);
 
+#ifdef CONFIG_ZSMALLOC_FOLD
+struct zs_fold_control {
+	struct zspage *src_zspage;
+	unsigned long handle;
+	char *current_buf;
+	char *folded_buf;
+};
+
+static unsigned long obj_get_hash(void *src, size_t len)
+{
+	return xxhash(src, len, 0);
+}
+
+static void read_object(void *buf, struct zs_pool *pool, unsigned long handle)
+{
+	struct zspage *zspage;
+	struct page *page;
+	unsigned long obj, off;
+	unsigned int obj_idx;
+	struct size_class *class;
+	struct mapping_area *area;
+	struct page *pages[2];
+	void *ret;
+
+	obj = handle_to_obj(handle);
+	obj_to_location(obj, &page, &obj_idx);
+	zspage = get_zspage(page);
+
+	class = zspage_class(pool, zspage);
+	off = offset_in_page(class->size * obj_idx);
+
+	local_lock(&zs_map_area.lock);
+	area = this_cpu_ptr(&zs_map_area);
+	area->vm_mm = ZS_MM_RO;
+	if (off + class->size <= PAGE_SIZE) {
+		area->vm_addr = kmap_local_page(page);
+		ret = area->vm_addr + off;
+		if (likely(!ZsHugePage(zspage)))
+			ret += ZS_HANDLE_SIZE;
+
+		memcpy(buf, ret, class->size);
+		kunmap_local(area->vm_addr);
+		goto out;
+	}
+
+	pages[0] = page;
+	pages[1] = get_next_page(page);
+
+	ret = __zs_map_object(area, pages, off, class->size);
+	if (likely(!ZsHugePage(zspage)))
+		ret += ZS_HANDLE_SIZE;
+
+	memcpy(buf, ret, class->size);
+
+	__zs_unmap_object(area, pages, off, class->size);
+out:
+	local_unlock(&zs_map_area.lock);
+}
+
+static void fold_object(struct size_class *class, unsigned long obj)
+{
+	class_stat_dec(class, OBJ_USED, 1);
+	class_stat_inc(class, OBJ_FOLDED, 1);
+	obj_free(class->size, obj, NULL);
+}
+
+static int zs_cmp_obj_and_fold(struct zs_pool *pool, struct size_class *class,
+		struct hash_table *htable, const struct zs_fold_control *fc)
+{
+	struct fold_rbtree_node *current_rbnode, *fold_rbnode;
+	struct obj_hash_node *hash_node;
+	struct zspage *current_zspage;
+	struct page *page;
+	unsigned long hash;
+
+	read_object(fc->folded_buf, pool, fc->handle);
+	hash = obj_get_hash(fc->folded_buf, class->size);
+
+	hlist_for_each_entry(hash_node, &htable->table[hash % htable->size], next) {
+		unsigned long current_obj, folded_obj;
+		int cmp;
+
+		current_obj = handle_to_obj(hash_node->handle);
+		obj_to_page(current_obj, &page);
+		current_zspage = get_zspage(page);
+
+		/*
+		 * Because we can fold objects on the same zspage,
+		 * current_zspage and fc->src_zspage can be equal and
+		 * fc->src_zspage is already locked.
+		 */
+		if (current_zspage != fc->src_zspage)
+			migrate_write_lock(current_zspage);
+
+		read_object(fc->current_buf, pool, hash_node->handle);
+		cmp = memcmp(fc->folded_buf, fc->current_buf, class->size);
+
+		if (!cmp) {
+			/* Skip the already folded objects */
+			folded_obj = handle_to_obj(fc->handle);
+			if (current_obj == folded_obj)
+				return 0;
+
+			current_rbnode = fold_rbtree_search(&class->fold_rbtree, current_obj);
+			if (!current_rbnode) {
+				/* Two handles refer to an object */
+				current_rbnode = fold_rbtree_alloc_node(pool, current_obj, 2);
+				if (!current_rbnode) {
+					if (current_zspage != fc->src_zspage)
+						migrate_write_unlock(current_zspage);
+
+					return -ENOMEM;
+				}
+
+				fold_rbtree_insert(&class->fold_rbtree, current_rbnode);
+			} else {
+				current_rbnode->cnt++;
+			}
+
+			record_obj(fc->handle, current_obj);
+
+			/*
+			 * This check is necessary in order to avoid freeing an object
+			 * that someone already refers to. This situation can occur when
+			 * there are repeated calls to zs_fold(). For example:
+			 *
+			 * [handle0] [handle1] [handle2] [handle3] [handle4]
+			 *  [obj0]    [obj1]    [obj2]    [obj3]    [obj4]
+			 *
+			 * Let's imagine that obj2 and obj3 are equal, and we call zs_fold():
+			 *
+			 * [handle0] [handle1] [handle2] [handle3] [handle4]
+			 *  [obj0]    [obj1]    [obj2]    [obj2]    [obj4]
+			 *
+			 * Now, handle2 and handle3 refer to the obj2 object. Time passes,
+			 * and now handle0 refers to obj0_n, which is equal to obj2:
+			 *
+			 * [handle0] [handle1] [handle2] [handle3] [handle4]
+			 * [obj0_n]   [obj1]    [obj2]    [obj2]    [obj4]
+			 *
+			 * If we call the zs_fold() function again, we come to handle2,
+			 * and we understand that the obj2 and obj0_n hashes are the same.
+			 * We can't just free obj2 because handle3 also refers to it already!
+			 */
+			fold_rbnode = fold_rbtree_search(&class->fold_rbtree, folded_obj);
+			if (fold_rbnode) {
+				fold_rbnode->cnt--;
+
+				if (!fold_rbnode->cnt) {
+					rb_erase(&fold_rbnode->node, &class->fold_rbtree);
+					fold_object(class, folded_obj);
+					fold_rbtree_free_node(pool, fold_rbnode);
+				}
+			} else {
+				fold_object(class, folded_obj);
+			}
+
+			if (current_zspage != fc->src_zspage)
+				migrate_write_unlock(current_zspage);
+
+			return 0;
+		} else if (current_zspage != fc->src_zspage) {
+			migrate_write_unlock(current_zspage);
+		}
+	}
+
+	/* We use GFP_ATOMIC because we are under a spin-lock */
+	hash_node = kmem_cache_alloc(htable->cachep, GFP_ATOMIC);
+	if (!hash_node)
+		return -ENOMEM;
+
+	hash_node->handle = fc->handle;
+	insert_htable(htable, hash_node, hash);
+	return 0;
+}
+
+static unsigned long __zs_fold(struct zs_pool *pool, struct size_class *class)
+{
+	struct zspage *src_zspage, *tmp;
+	struct zs_fold_control fc;
+	struct hash_table htable;
+	struct page *s_page;
+	unsigned long pages_freed = 0, handle;
+	enum fullness_group fg, newfg;
+	size_t htable_size;
+	int obj_idx;
+
+	htable_size = zs_stat_get(class, OBJ_USED);
+	if (!htable_size)
+		return 0;
+
+	if (init_htable(&htable, htable_size))
+		return 0;
+
+	/*
+	 * We can't allocate these buffers inside zs_cmp_obj_and_fold()
+	 * because that function runs under a spinlock. There we would
+	 * have to use kmalloc with GFP_ATOMIC, and allocations happen
+	 * very often. Therefore, it is better to allocate the two buffers here.
+	 */
+	fc.current_buf = kmalloc(class->size, GFP_KERNEL);
+	if (!fc.current_buf) {
+		free_htable(&htable);
+		return 0;
+	}
+
+	fc.folded_buf = kmalloc(class->size, GFP_KERNEL);
+	if (!fc.folded_buf) {
+		free_htable(&htable);
+		kfree(fc.current_buf);
+		return 0;
+	}
+
+	spin_lock(&pool->lock);
+
+	for (fg = ZS_ALMOST_EMPTY; fg <= ZS_FULL; fg++) {
+		list_for_each_entry_safe(src_zspage, tmp, &class->fullness_list[fg], list) {
+			remove_zspage(class, src_zspage, fg);
+			migrate_write_lock(src_zspage);
+
+			fc.src_zspage = src_zspage;
+			s_page = get_first_page(src_zspage);
+			obj_idx = 0;
+
+			/* Iterate over all the objects on the zspage */
+			while (1) {
+				handle = find_alloced_obj(class, s_page, &obj_idx);
+				if (!handle) {
+					s_page = get_next_page(s_page);
+					if (!s_page)
+						break;
+
+					obj_idx = 0;
+					continue;
+				}
+
+				fc.handle = handle;
+				/*
+				 * Nothing bad will happen if we ignore the return code from
+				 * zs_cmp_obj_and_fold(). Errors only occur in case of allocator
+				 * failure. Furthermore, even if an error occurs, this function
+				 * doesn't corrupt any data structures in any way.
+				 */
+				zs_cmp_obj_and_fold(pool, class, &htable, &fc);
+				obj_idx++;
+			}
+
+			newfg = putback_zspage(class, src_zspage);
+			migrate_write_unlock(src_zspage);
+			if (newfg == ZS_EMPTY) {
+				free_zspage(pool, class, src_zspage);
+				pages_freed += class->pages_per_zspage;
+			}
+		}
+	}
+
+	spin_unlock(&pool->lock);
+	free_htable(&htable);
+	kfree(fc.current_buf);
+	kfree(fc.folded_buf);
+
+	return pages_freed;
+}
+
+unsigned long zs_fold(struct zs_pool *pool)
+{
+	int i;
+	struct size_class *class;
+	unsigned long pages_freed = 0;
+
+	for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) {
+		class = pool->size_class[i];
+		if (class->index != i)
+			continue;
+
+		pages_freed += __zs_fold(pool, class);
+	}
+	atomic_long_add(pages_freed, &pool->stats.pages_folded);
+
+	return pages_freed;
+}
+EXPORT_SYMBOL_GPL(zs_fold);
+#endif /* CONFIG_ZSMALLOC_FOLD */
+
 void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats)
 {
 	memcpy(stats, &pool->stats, sizeof(struct zs_pool_stats));
@@ -2496,6 +2949,9 @@ struct zs_pool *zs_create_pool(const char *name)
 		class->index = i;
 		class->pages_per_zspage = pages_per_zspage;
 		class->objs_per_zspage = objs_per_zspage;
+#ifdef CONFIG_ZSMALLOC_FOLD
+		class->fold_rbtree = RB_ROOT;
+#endif
 		pool->size_class[i] = class;
 
 		for (fullness = ZS_EMPTY; fullness < NR_ZS_FULLNESS; fullness++)

From patchwork Tue Apr 18 06:25:02 2023
X-Patchwork-Submitter: Alexey Romanov
X-Patchwork-Id: 13215072
From: Alexey Romanov <avromanov@sberdevices.ru>
Subject: [RFC PATCH v1 4/5] zram: add fold sysfs knob
Date: Tue, 18 Apr 2023 09:25:02 +0300
Message-ID: <20230418062503.62121-5-avromanov@sberdevices.ru>
In-Reply-To: <20230418062503.62121-1-avromanov@sberdevices.ru>
References: <20230418062503.62121-1-avromanov@sberdevices.ru>
Allow zram to fold identical zsmalloc objects into a single one:

  echo 1 > /sys/block/zramX/fold

Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru>
---
 drivers/block/zram/zram_drv.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e290d6d97047..06a614d1643d 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1184,6 +1184,25 @@ static ssize_t compact_store(struct device *dev,
 	return len;
 }
 
+#ifdef CONFIG_ZSMALLOC_FOLD
+static ssize_t fold_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	down_read(&zram->init_lock);
+	if (!init_done(zram)) {
+		up_read(&zram->init_lock);
+		return -EINVAL;
+	}
+
+	zs_fold(zram->mem_pool);
+	up_read(&zram->init_lock);
+
+	return len;
+}
+#endif
+
 static ssize_t io_stat_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
@@ -2313,6 +2332,9 @@ static DEVICE_ATTR_RW(writeback_limit_enable);
 static DEVICE_ATTR_RW(recomp_algorithm);
 static DEVICE_ATTR_WO(recompress);
 #endif
+#ifdef CONFIG_ZSMALLOC_FOLD
+static DEVICE_ATTR_WO(fold);
+#endif
 
 static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_disksize.attr,
@@ -2339,6 +2361,9 @@ static struct attribute *zram_disk_attrs[] = {
 #ifdef CONFIG_ZRAM_MULTI_COMP
 	&dev_attr_recomp_algorithm.attr,
 	&dev_attr_recompress.attr,
+#endif
+#ifdef CONFIG_ZSMALLOC_FOLD
+	&dev_attr_fold.attr,
 #endif
 	NULL,
 };
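Beyond the sysfs plumbing, the heart of this series' zs_free() change is the reference count carried by a folded object: a shared object is only really released when the last handle referring to it goes away. A toy userspace model of that rule (all names here — `shared_obj`, `free_handle` — are invented for illustration, not kernel API):

```c
#include <assert.h>

/*
 * Illustrative model of the zs_free() change: an object shared by several
 * handles carries a reference count (the role of fold_rbtree_node.cnt in
 * the patch); the object itself is released only on the last free.
 */
struct shared_obj {
	int refcnt; /* number of handles referring to this object */
	int freed;  /* set once the object is really released */
};

/* Returns 1 if the underlying object was freed, 0 if only the handle was. */
static int free_handle(struct shared_obj *obj)
{
	if (--obj->refcnt > 0)
		return 0; /* other handles still reference the object */

	obj->freed = 1;   /* the obj_free() analogue */
	return 1;
}
```

This is why the patch decrements `node->cnt` in zs_free() and only falls through to the real free path once the count reaches zero.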