From patchwork Wed Oct 26 20:06:13 2022
X-Patchwork-Submitter: Nhat Pham <nphamcs@gmail.com>
X-Patchwork-Id: 13021243
From: Nhat Pham <nphamcs@gmail.com>
To: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    minchan@kernel.org, ngupta@vflare.org, senozhatsky@chromium.org,
    sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com
Subject: [PATCH 5/5] zsmalloc: Implement writeback mechanism for zsmalloc
Date: Wed, 26 Oct 2022 13:06:13 -0700
Message-Id: <20221026200613.1031261-6-nphamcs@gmail.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20221026200613.1031261-1-nphamcs@gmail.com>
References: <20221026200613.1031261-1-nphamcs@gmail.com>

This commit adds a writeback mechanism to zsmalloc, analogous to the
one in the zbud allocator. zsmalloc selects the coldest (i.e., least
recently used) zspage in the pool and attempts to write back all of
its stored compressed objects via the pool's evict handler.
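For reference, here is a minimal sketch (not part of this patch) of how
a zpool user could drive this writeback path. The caller
example_reclaim() is a made-up name for illustration; zpool_shrink() is
the existing zpool API, which dispatches to the new .shrink op
(zs_zpool_shrink() in the diff below):

#include <linux/zpool.h>
#include <linux/printk.h>

/* Illustrative caller (hypothetical): reclaim one page's worth of objects. */
static int example_reclaim(struct zpool *zpool)
{
	unsigned int reclaimed = 0;
	int ret;

	/*
	 * Dispatches to zs_zpool_shrink(), which repeatedly calls
	 * zs_reclaim_page() to pop the tail (coldest) zspage off
	 * pool->lru and evict its objects via pool->ops->evict.
	 */
	ret = zpool_shrink(zpool, 1, &reclaimed);
	if (ret)
		return ret;	/* e.g. -EINVAL when no evict handler is set */

	pr_info("zsmalloc writeback reclaimed %u page(s)\n", reclaimed);
	return 0;
}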
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 mm/zsmalloc.c | 192 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 177 insertions(+), 15 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 76ff2ed839d0..c79cbd3f46f3 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -279,10 +279,13 @@ struct zspage {
 	/* links the zspage to the lru list in the pool */
 	struct list_head lru;
+	bool under_reclaim;
+
+	/* list of unfreed handles whose objects have been reclaimed */
+	unsigned long *deferred_handles;
+
 	struct zs_pool *pool;
-#ifdef CONFIG_COMPACTION
 	rwlock_t lock;
-#endif
 };

 struct mapping_area {
@@ -303,10 +306,11 @@ static bool ZsHugePage(struct zspage *zspage)
 	return zspage->huge;
 }

-#ifdef CONFIG_COMPACTION
 static void migrate_lock_init(struct zspage *zspage);
 static void migrate_read_lock(struct zspage *zspage);
 static void migrate_read_unlock(struct zspage *zspage);
+
+#ifdef CONFIG_COMPACTION
 static void migrate_write_lock(struct zspage *zspage);
 static void migrate_write_lock_nested(struct zspage *zspage);
 static void migrate_write_unlock(struct zspage *zspage);
@@ -314,9 +318,6 @@ static void kick_deferred_free(struct zs_pool *pool);
 static void init_deferred_free(struct zs_pool *pool);
 static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage);
 #else
-static void migrate_lock_init(struct zspage *zspage) {}
-static void migrate_read_lock(struct zspage *zspage) {}
-static void migrate_read_unlock(struct zspage *zspage) {}
 static void migrate_write_lock(struct zspage *zspage) {}
 static void migrate_write_lock_nested(struct zspage *zspage) {}
 static void migrate_write_unlock(struct zspage *zspage) {}
@@ -444,6 +445,27 @@ static void zs_zpool_free(void *pool, unsigned long handle)
 	zs_free(pool, handle);
 }

+static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries);
+
+static int zs_zpool_shrink(void *pool, unsigned int pages,
+			unsigned int *reclaimed)
+{
+	unsigned int total = 0;
+	int ret = -EINVAL;
+
+	while (total < pages) {
+		ret = zs_reclaim_page(pool, 8);
+		if (ret < 0)
+			break;
+		total++;
+	}
+
+	if (reclaimed)
+		*reclaimed = total;
+
+	return ret;
+}
+
 static void *zs_zpool_map(void *pool, unsigned long handle,
 			enum zpool_mapmode mm)
 {
@@ -482,6 +504,7 @@ static struct zpool_driver zs_zpool_driver = {
 	.malloc_support_movable = true,
 	.malloc = zs_zpool_malloc,
 	.free = zs_zpool_free,
+	.shrink = zs_zpool_shrink,
 	.map = zs_zpool_map,
 	.unmap = zs_zpool_unmap,
 	.total_size = zs_zpool_total_size,
@@ -955,6 +978,21 @@ static int trylock_zspage(struct zspage *zspage)
 	return 0;
 }

+/*
+ * Free all the deferred handles whose objects are freed in zs_free.
+ */
+static void free_handles(struct zs_pool *pool, struct zspage *zspage)
+{
+	unsigned long handle = (unsigned long) zspage->deferred_handles;
+
+	while (handle) {
+		unsigned long nxt_handle = handle_to_obj(handle);
+
+		cache_free_handle(pool, handle);
+		handle = nxt_handle;
+	}
+}
+
 static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 				struct zspage *zspage)
 {
@@ -969,6 +1007,9 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 	VM_BUG_ON(get_zspage_inuse(zspage));
 	VM_BUG_ON(fg != ZS_EMPTY);

+	/* Free all deferred handles from zs_free */
+	free_handles(pool, zspage);
+
 	next = page = get_first_page(zspage);
 	do {
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -1051,6 +1092,8 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
 	}

 	INIT_LIST_HEAD(&zspage->lru);
+	zspage->under_reclaim = false;
+	zspage->deferred_handles = NULL;
 	set_freeobj(zspage, 0);
 }

@@ -1472,11 +1515,8 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 		fix_fullness_group(class, zspage);
 		record_obj(handle, obj);
 		class_stat_inc(class, OBJ_USED, 1);
-		/* Move the zspage to front of pool's LRU */
-		move_to_front(pool, zspage);
-		spin_unlock(&pool->lock);

-		return handle;
+		goto out;
 	}

 	spin_unlock(&pool->lock);
@@ -1500,6 +1540,8 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 	/* We completely set up zspage so mark them as movable */
 	SetZsPageMovable(pool, zspage);
+
+out:
 	/* Move the zspage to front of pool's LRU */
 	move_to_front(pool, zspage);
 	spin_unlock(&pool->lock);
@@ -1557,12 +1599,24 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	obj_free(class->size, obj);
 	class_stat_dec(class, OBJ_USED, 1);
+
+	if (zspage->under_reclaim) {
+		/*
+		 * Reclaim needs the handles during writeback. It'll free
+		 * them along with the zspage when it's done with them.
+		 *
+		 * Record current deferred handle at the memory location
+		 * whose address is given by handle.
+		 */
+		record_obj(handle, (unsigned long) zspage->deferred_handles);
+		zspage->deferred_handles = (unsigned long *) handle;
+		spin_unlock(&pool->lock);
+		return;
+	}
 	fullness = fix_fullness_group(class, zspage);
-	if (fullness != ZS_EMPTY)
-		goto out;
+	if (fullness == ZS_EMPTY)
+		free_zspage(pool, class, zspage);

-	free_zspage(pool, class, zspage);
-out:
 	spin_unlock(&pool->lock);
 	cache_free_handle(pool, handle);
 }
@@ -1762,7 +1816,6 @@ static enum fullness_group putback_zspage(struct size_class *class,
 	return fullness;
 }

-#ifdef CONFIG_COMPACTION
 /*
  * To prevent zspage destroy during migration, zspage freeing should
  * hold locks of all pages in the zspage.
@@ -1805,6 +1858,21 @@ static void lock_zspage(struct zspage *zspage)
 	migrate_read_unlock(zspage);
 }

+/*
+ * Unlocks all the pages of the zspage.
+ *
+ * pool->lock must be held before this function is called
+ * to prevent the underlying pages from migrating.
+ */
+static void unlock_zspage(struct zspage *zspage)
+{
+	struct page *page = get_first_page(zspage);
+
+	do {
+		unlock_page(page);
+	} while ((page = get_next_page(page)) != NULL);
+}
+
 static void migrate_lock_init(struct zspage *zspage)
 {
 	rwlock_init(&zspage->lock);
@@ -1820,6 +1888,7 @@ static void migrate_read_unlock(struct zspage *zspage) __releases(&zspage->lock)
 	read_unlock(&zspage->lock);
 }

+#ifdef CONFIG_COMPACTION
 static void migrate_write_lock(struct zspage *zspage)
 {
 	write_lock(&zspage->lock);
@@ -2380,6 +2449,99 @@ void zs_destroy_pool(struct zs_pool *pool)
 }
 EXPORT_SYMBOL_GPL(zs_destroy_pool);

+static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries)
+{
+	int i, obj_idx, ret = 0;
+	unsigned long handle;
+	struct zspage *zspage;
+	struct page *page;
+	enum fullness_group fullness;
+
+	/* Lock LRU and fullness list */
+	spin_lock(&pool->lock);
+	if (!pool->ops || !pool->ops->evict || list_empty(&pool->lru) ||
+			retries == 0) {
+		spin_unlock(&pool->lock);
+		return -EINVAL;
+	}
+
+	for (i = 0; i < retries; i++) {
+		struct size_class *class;
+
+		zspage = list_last_entry(&pool->lru, struct zspage, lru);
+		list_del(&zspage->lru);
+
+		/* zs_free may free objects, but not the zspage and handles */
+		zspage->under_reclaim = true;
+
+		/* Lock backing pages into place */
+		lock_zspage(zspage);
+
+		class = zspage_class(pool, zspage);
+		fullness = get_fullness_group(class, zspage);
+
+		/* Lock out object allocations and object compaction */
+		remove_zspage(class, zspage, fullness);
+
+		spin_unlock(&pool->lock);
+
+		obj_idx = 0;
+		page = zspage->first_page;
+		while (1) {
+			handle = find_alloced_obj(class, page, &obj_idx);
+			if (!handle) {
+				page = get_next_page(page);
+				if (!page)
+					break;
+				obj_idx = 0;
+				continue;
+			}
+
+			/*
+			 * This will write the object and call
+			 * zs_free.
+			 *
+			 * zs_free will free the object, but the
+			 * under_reclaim flag prevents it from freeing
+			 * the zspage altogether. This is necessary so
+			 * that we can continue working with the
+			 * zspage potentially after the last object
+			 * has been freed.
+			 */
+			ret = pool->ops->evict(pool, handle);
+			if (ret)
+				goto next;
+
+			obj_idx++;
+		}
+
+next:
+		/* For freeing the zspage, or putting it back in the pool and LRU list. */
+		spin_lock(&pool->lock);
+		zspage->under_reclaim = false;
+
+		if (!get_zspage_inuse(zspage)) {
+			/*
+			 * Fullness went stale as zs_free() won't touch it
+			 * while the page is removed from the pool. Fix it
+			 * up for the check in __free_zspage().
+			 */
+			zspage->fullness = ZS_EMPTY;
+
+			__free_zspage(pool, class, zspage);
+			spin_unlock(&pool->lock);
+			return 0;
+		}
+
+		putback_zspage(class, zspage);
+		list_add(&zspage->lru, &pool->lru);
+		unlock_zspage(zspage);
+	}
+
+	spin_unlock(&pool->lock);
+	return -EAGAIN;
+}
+
 static int __init zs_init(void)
 {
 	int ret;
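A note on the deferred-handle list manipulated in zs_free() and
free_handles() above: the freed handles themselves act as the list
nodes, since each handle addresses a slot large enough to hold a
pointer. The program below is a userspace toy of that idea only;
record_obj() and handle_to_obj() are reduced here to plain stores and
loads (an assumption for illustration; the kernel versions also manage
tag bits), and the handle slots are simulated with an ordinary array:

#include <stdio.h>

/* Toy stand-ins for the kernel helpers (simplified, no tag bits). */
static void record_obj(unsigned long handle, unsigned long val)
{
	*(unsigned long *)handle = val;
}

static unsigned long handle_to_obj(unsigned long handle)
{
	return *(unsigned long *)handle;
}

int main(void)
{
	unsigned long *head = NULL;	/* plays zspage->deferred_handles */
	unsigned long slot[3];		/* plays three handle slots */

	/* zs_free() under reclaim: push each freed handle onto the list. */
	for (int i = 0; i < 3; i++) {
		unsigned long handle = (unsigned long)&slot[i];

		record_obj(handle, (unsigned long)head);
		head = (unsigned long *)handle;
	}

	/* free_handles(): walk the list and release each deferred handle. */
	for (unsigned long handle = (unsigned long)head; handle; ) {
		unsigned long nxt_handle = handle_to_obj(handle);

		printf("freeing handle %#lx\n", handle);
		handle = nxt_handle;
	}
	return 0;
}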