From patchwork Mon Mar 3 02:03:10 2025
X-Patchwork-Submitter: Sergey Senozhatsky
X-Patchwork-Id: 13998051
From: Sergey Senozhatsky
To: Andrew Morton
Cc: Yosry Ahmed, Hillf Danton, Kairui Song, Sebastian Andrzej Siewior,
 Minchan Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Sergey Senozhatsky
Subject: [PATCH v10 01/19] zram: sleepable entry locking
Date: Mon, 3 Mar 2025 11:03:10 +0900
Message-ID: <20250303022425.285971-2-senozhatsky@chromium.org>
In-Reply-To: <20250303022425.285971-1-senozhatsky@chromium.org>
References: <20250303022425.285971-1-senozhatsky@chromium.org>
Concurrent modification of meta table entries is currently handled by a
per-entry spin-lock. This has a number of shortcomings.

First, it imposes atomicity requirements on compression backends. zram
can call both zcomp_compress() and zcomp_decompress() under the entry
spin-lock, which implies that only compression algorithms that never
schedule/sleep/wait during compression and decompression can be used.
This, for instance, makes it impossible to use some ASYNC compression
algorithm implementations (H/W compression, etc.).

Second, it can potentially trigger watchdogs. For example, entry
re-compression with secondary algorithms is performed under the entry
spin-lock. Given that we chain secondary compression algorithms, and
that some of them can be configured for the best compression ratio (and
the worst compression speed), zram can stay under the spin-lock for
quite some time.

Having a per-entry mutex (or, for instance, an rw-semaphore) would
significantly increase the sizeof() of each entry and hence of the meta
table. Therefore entry locking returns to bit locking, as before, but
this time in a preempt-rt friendly manner, because it waits on the bit
instead of spinning on it. Lock owners are also now permitted to
schedule, which is a first step on the path to making zram non-atomic.
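For reference, the core of the new scheme is the kernel's standard
wait-on-bit locking pattern. Below is a minimal sketch of that pattern
(the entry_*() helpers and ENTRY_LOCK_BIT are illustrative names, not
part of this patch; the real zram_slot_*() implementations in the diff
below additionally carry lockdep annotations):

    #include <linux/bitops.h>
    #include <linux/sched.h>
    #include <linux/wait_bit.h>

    #define ENTRY_LOCK_BIT	0	/* illustrative bit index */

    /* Sleeps until the bit is acquired; only safe in process context. */
    static void entry_lock(unsigned long *flags)
    {
    	wait_on_bit_lock(flags, ENTRY_LOCK_BIT, TASK_UNINTERRUPTIBLE);
    }

    /* Never sleeps and may fail; this is the variant for atomic context. */
    static bool entry_trylock(unsigned long *flags)
    {
    	return !test_and_set_bit_lock(ENTRY_LOCK_BIT, flags);
    }

    /* Releases the bit and wakes up any sleeping waiters. */
    static void entry_unlock(unsigned long *flags)
    {
    	clear_and_wake_up_bit(ENTRY_LOCK_BIT, flags);
    }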
Signed-off-by: Sergey Senozhatsky
---
 drivers/block/zram/zram_drv.c | 54 ++++++++++++++++++++++++++++-------
 drivers/block/zram/zram_drv.h | 15 ++++++----
 2 files changed, 52 insertions(+), 17 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9f5020b077c5..70599d41b828 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -58,19 +58,56 @@ static void zram_free_page(struct zram *zram, size_t index);
 static int zram_read_from_zspool(struct zram *zram, struct page *page,
 				 u32 index);
 
-static int zram_slot_trylock(struct zram *zram, u32 index)
+#define slot_dep_map(zram, index) (&(zram)->table[(index)].dep_map)
+
+static void zram_slot_lock_init(struct zram *zram, u32 index)
+{
+	static struct lock_class_key __key;
+
+	lockdep_init_map(slot_dep_map(zram, index), "zram->table[index].lock",
+			 &__key, 0);
+}
+
+/*
+ * entry locking rules:
+ *
+ * 1) Lock is exclusive
+ *
+ * 2) lock() function can sleep waiting for the lock
+ *
+ * 3) Lock owner can sleep
+ *
+ * 4) Use TRY lock variant when in atomic context
+ *    - must check return value and handle locking failures
+ */
+static __must_check bool zram_slot_trylock(struct zram *zram, u32 index)
 {
-	return spin_trylock(&zram->table[index].lock);
+	unsigned long *lock = &zram->table[index].flags;
+
+	if (!test_and_set_bit_lock(ZRAM_ENTRY_LOCK, lock)) {
+		mutex_acquire(slot_dep_map(zram, index), 0, 1, _RET_IP_);
+		lock_acquired(slot_dep_map(zram, index), _RET_IP_);
+		return true;
+	}
+
+	return false;
 }
 
 static void zram_slot_lock(struct zram *zram, u32 index)
 {
-	spin_lock(&zram->table[index].lock);
+	unsigned long *lock = &zram->table[index].flags;
+
+	mutex_acquire(slot_dep_map(zram, index), 0, 0, _RET_IP_);
+	wait_on_bit_lock(lock, ZRAM_ENTRY_LOCK, TASK_UNINTERRUPTIBLE);
+	lock_acquired(slot_dep_map(zram, index), _RET_IP_);
 }
 
 static void zram_slot_unlock(struct zram *zram, u32 index)
 {
-	spin_unlock(&zram->table[index].lock);
+	unsigned long *lock = &zram->table[index].flags;
+
+	mutex_release(slot_dep_map(zram, index), _RET_IP_);
+	clear_and_wake_up_bit(ZRAM_ENTRY_LOCK, lock);
 }
 
 static inline bool init_done(struct zram *zram)
@@ -93,7 +130,6 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle)
 	zram->table[index].handle = handle;
 }
 
-/* flag operations require table entry bit_spin_lock() being held */
 static bool zram_test_flag(struct zram *zram, u32 index,
 			enum zram_pageflags flag)
 {
@@ -1473,15 +1509,11 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
 		huge_class_size = zs_huge_class_size(zram->mem_pool);
 
 	for (index = 0; index < num_pages; index++)
-		spin_lock_init(&zram->table[index].lock);
+		zram_slot_lock_init(zram, index);
+
 	return true;
 }
 
-/*
- * To protect concurrent access to the same index entry,
- * caller should hold this table index entry's bit_spinlock to
- * indicate this index entry is accessing.
- */
 static void zram_free_page(struct zram *zram, size_t index)
 {
 	unsigned long handle;
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index db78d7c01b9a..c804f78a7fa8 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -28,7 +28,6 @@
 #define ZRAM_SECTOR_PER_LOGICAL_BLOCK	\
 	(1 << (ZRAM_LOGICAL_BLOCK_SHIFT - SECTOR_SHIFT))
 
-
 /*
  * ZRAM is mainly used for memory efficiency so we want to keep memory
  * footprint small and thus squeeze size and zram pageflags into a flags
@@ -46,6 +45,7 @@
 /* Flags for zram pages (table[page_no].flags) */
 enum zram_pageflags {
 	ZRAM_SAME = ZRAM_FLAG_SHIFT,	/* Page consists the same element */
+	ZRAM_ENTRY_LOCK,	/* entry access lock bit */
 	ZRAM_WB,	/* page is stored on backing_device */
 	ZRAM_PP_SLOT,	/* Selected for post-processing */
 	ZRAM_HUGE,	/* Incompressible page */
@@ -58,16 +58,19 @@ enum zram_pageflags {
 	__NR_ZRAM_PAGEFLAGS,
 };
 
-/*-- Data structures */
-
-/* Allocated for each disk page */
+/*
+ * Allocated for each disk page. We use bit-lock (ZRAM_ENTRY_LOCK bit
+ * of flags) to save memory. There can be plenty of entries and standard
+ * locking primitives (e.g. mutex) will significantly increase sizeof()
+ * of each entry and hence of the meta table.
+ */
 struct zram_table_entry {
 	unsigned long handle;
-	unsigned int flags;
-	spinlock_t lock;
+	unsigned long flags;
 #ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME
 	ktime_t ac_time;
 #endif
+	struct lockdep_map dep_map;
 };
 
 struct zram_stats {
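As a usage note, locking rule 4 above means that atomic-context callers
must use the TRY variant and handle failure. A hypothetical caller
sketch (example_atomic_access() is illustrative and not part of this
patch) under the new API:

    /*
     * Hypothetical caller (not in this patch): in atomic context only
     * the TRY variant may be used, and its return value must be
     * checked, per entry locking rule 4.
     */
    static int example_atomic_access(struct zram *zram, u32 index)
    {
    	if (!zram_slot_trylock(zram, index))
    		return -EAGAIN;	/* lock busy: caller retries or backs off */

    	/* ... safely access zram->table[index] here ... */

    	zram_slot_unlock(zram, index);
    	return 0;
    }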