From patchwork Thu Feb 27 04:35:19 2025
X-Patchwork-Submitter: Sergey Senozhatsky
X-Patchwork-Id: 13993731
From: Sergey Senozhatsky
To: Andrew Morton
Cc: Yosry Ahmed, Hillf Danton, Kairui Song, Sebastian Andrzej Siewior,
    Minchan Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Sergey Senozhatsky
Subject: [PATCH v9 01/19] zram: sleepable entry locking
Date: Thu, 27 Feb 2025 13:35:19 +0900
Message-ID: <20250227043618.88380-2-senozhatsky@chromium.org>
X-Mailer: git-send-email 2.48.1.658.g4767266eb4-goog
In-Reply-To: <20250227043618.88380-1-senozhatsky@chromium.org>
References: <20250227043618.88380-1-senozhatsky@chromium.org>
Concurrent modifications of meta table entries are currently handled
by a per-entry spin-lock. This has a number of shortcomings.

First, this imposes atomic requirements on compression backends. zram
can call both zcomp_compress() and zcomp_decompress() under entry
spin-lock, which implies that we can use only compression algorithms
that don't schedule/sleep/wait during compression and decompression.
This, for instance, makes it impossible to use some ASYNC compression
algorithm implementations (H/W compression, etc.).

Second, this can potentially trigger watchdogs. For example, entry
re-compression with secondary algorithms is performed under entry
spin-lock. Given that we chain secondary compression algorithms and
that some of them can be configured for best compression ratio (and
worst compression speed), zram can stay under the spin-lock for quite
some time.

Having a per-entry mutex (or, for instance, a rw-semaphore) would
significantly increase the sizeof() of each entry and hence of the
meta table. Therefore entry locking returns to bit locking, as
before, but this time in a preempt-rt friendly way, because it waits
on the bit instead of spinning on it. Lock owners are also now
permitted to schedule, which is a first step on the path to making
zram non-atomic.
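
For illustration only (this example is not part of the patch, and the
MY_/my_ names are made up): the waiting bit-lock described above
reduces to the kernel's generic wait_on_bit machinery, roughly like
so:

/*
 * Minimal sketch of a sleepable bit lock: the lock lives in one bit
 * of an existing flags word, so it costs no extra per-entry storage.
 */
#include <linux/bitops.h>
#include <linux/sched.h>
#include <linux/wait_bit.h>

#define MY_LOCK_BIT	0	/* which bit of *flags is the lock */

static bool my_trylock(unsigned long *flags)
{
	/* the old bit value was 0 -> we now own the lock */
	return !test_and_set_bit_lock(MY_LOCK_BIT, flags);
}

static void my_lock(unsigned long *flags)
{
	/* sleeps, instead of spinning, until the bit is acquired */
	wait_on_bit_lock(flags, MY_LOCK_BIT, TASK_UNINTERRUPTIBLE);
}

static void my_unlock(unsigned long *flags)
{
	/* releases the bit and wakes wait_on_bit_lock() sleepers */
	clear_and_wake_up_bit(MY_LOCK_BIT, flags);
}

The slow path in my_lock() schedules instead of spinning, which is
what makes the lock preempt-rt friendly and lets the lock owner
sleep.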
Signed-off-by: Sergey Senozhatsky
---
 drivers/block/zram/zram_drv.c | 56 ++++++++++++++++++++++++++++-------
 drivers/block/zram/zram_drv.h | 16 ++++++----
 2 files changed, 55 insertions(+), 17 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9f5020b077c5..ddf03f6cbeed 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -58,19 +58,56 @@ static void zram_free_page(struct zram *zram, size_t index);
 static int zram_read_from_zspool(struct zram *zram, struct page *page,
 				 u32 index);
 
-static int zram_slot_trylock(struct zram *zram, u32 index)
+#define slot_dep_map(zram, index) (&(zram)->table[(index)].dep_map)
+
+static void zram_slot_lock_init(struct zram *zram, u32 index)
 {
-	return spin_trylock(&zram->table[index].lock);
+	lockdep_init_map(slot_dep_map(zram, index),
+			 "zram->table[index].lock",
+			 &zram->lock_class, 0);
+}
+
+/*
+ * entry locking rules:
+ *
+ * 1) Lock is exclusive
+ *
+ * 2) lock() function can sleep waiting for the lock
+ *
+ * 3) Lock owner can sleep
+ *
+ * 4) Use TRY lock variant when in atomic context
+ *    - must check return value and handle locking failures
+ */
+static __must_check bool zram_slot_trylock(struct zram *zram, u32 index)
+{
+	unsigned long *lock = &zram->table[index].flags;
+
+	if (!test_and_set_bit_lock(ZRAM_ENTRY_LOCK, lock)) {
+		mutex_acquire(slot_dep_map(zram, index), 0, 1, _RET_IP_);
+		lock_acquired(slot_dep_map(zram, index), _RET_IP_);
+		return true;
+	}
+
+	lock_contended(slot_dep_map(zram, index), _RET_IP_);
+	return false;
 }
 
 static void zram_slot_lock(struct zram *zram, u32 index)
 {
-	spin_lock(&zram->table[index].lock);
+	unsigned long *lock = &zram->table[index].flags;
+
+	mutex_acquire(slot_dep_map(zram, index), 0, 0, _RET_IP_);
+	wait_on_bit_lock(lock, ZRAM_ENTRY_LOCK, TASK_UNINTERRUPTIBLE);
+	lock_acquired(slot_dep_map(zram, index), _RET_IP_);
 }
 
 static void zram_slot_unlock(struct zram *zram, u32 index)
 {
-	spin_unlock(&zram->table[index].lock);
+	unsigned long *lock = &zram->table[index].flags;
+
+	mutex_release(slot_dep_map(zram, index), _RET_IP_);
+	clear_and_wake_up_bit(ZRAM_ENTRY_LOCK, lock);
 }
 
 static inline bool init_done(struct zram *zram)
@@ -93,7 +130,6 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle)
 	zram->table[index].handle = handle;
 }
 
-/* flag operations require table entry bit_spin_lock() being held */
 static bool zram_test_flag(struct zram *zram, u32 index,
 			enum zram_pageflags flag)
 {
@@ -1473,15 +1509,11 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
 		huge_class_size = zs_huge_class_size(zram->mem_pool);
 
 	for (index = 0; index < num_pages; index++)
-		spin_lock_init(&zram->table[index].lock);
+		zram_slot_lock_init(zram, index);
+
 	return true;
 }
 
-/*
- * To protect concurrent access to the same index entry,
- * caller should hold this table index entry's bit_spinlock to
- * indicate this index entry is accessing.
- */
 static void zram_free_page(struct zram *zram, size_t index)
 {
 	unsigned long handle;
@@ -2625,6 +2657,7 @@ static int zram_add(void)
 	if (ret)
 		goto out_cleanup_disk;
 
+	lockdep_register_key(&zram->lock_class);
 	zram_debugfs_register(zram);
 	pr_info("Added device: %s\n", zram->disk->disk_name);
 	return device_id;
@@ -2653,6 +2686,7 @@ static int zram_remove(struct zram *zram)
 		zram->claim = true;
 	mutex_unlock(&zram->disk->open_mutex);
 
+	lockdep_unregister_key(&zram->lock_class);
 	zram_debugfs_unregister(zram);
 
 	if (claimed) {
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index db78d7c01b9a..8a7d52fbab4d 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -28,7 +28,6 @@
 #define ZRAM_SECTOR_PER_LOGICAL_BLOCK	\
 	(1 << (ZRAM_LOGICAL_BLOCK_SHIFT - SECTOR_SHIFT))
-
 /*
  * ZRAM is mainly used for memory efficiency so we want to keep memory
  * footprint small and thus squeeze size and zram pageflags into a flags
@@ -46,6 +45,7 @@
 /* Flags for zram pages (table[page_no].flags) */
 enum zram_pageflags {
 	ZRAM_SAME = ZRAM_FLAG_SHIFT,	/* Page consists the same element */
+	ZRAM_ENTRY_LOCK,	/* entry access lock bit */
 	ZRAM_WB,	/* page is stored on backing_device */
 	ZRAM_PP_SLOT,	/* Selected for post-processing */
 	ZRAM_HUGE,	/* Incompressible page */
@@ -58,16 +58,19 @@ enum zram_pageflags {
 	__NR_ZRAM_PAGEFLAGS,
 };
 
-/*-- Data structures */
-
-/* Allocated for each disk page */
+/*
+ * Allocated for each disk page. We use bit-lock (ZRAM_ENTRY_LOCK bit
+ * of flags) to save memory. There can be plenty of entries and standard
+ * locking primitives (e.g. mutex) will significantly increase sizeof()
+ * of each entry and hence of the meta table.
+ */
 struct zram_table_entry {
 	unsigned long handle;
-	unsigned int flags;
-	spinlock_t lock;
+	unsigned long flags;
 #ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME
 	ktime_t ac_time;
 #endif
+	struct lockdep_map dep_map;
 };
 
 struct zram_stats {
@@ -137,5 +140,6 @@ struct zram {
 	struct dentry *debugfs_dir;
 #endif
 	atomic_t pp_in_progress;
+	struct lock_class_key lock_class;
 };
 
 #endif
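
A side note on the lockdep wiring above: a dynamically registered
lock_class_key plus a per-entry lockdep_map is the usual pattern for
teaching lockdep about a hand-rolled lock. A rough sketch, under
made-up my_* names (not part of the patch):

/*
 * One lock class per device; every per-slot bit lock shares it, so
 * lockdep treats all slot locks of a device as the same lock class.
 */
#include <linux/lockdep.h>

struct my_slot {
	unsigned long flags;		/* the lock bit lives in here */
	struct lockdep_map dep_map;	/* lockdep identity of the lock */
};

struct my_dev {
	struct lock_class_key key;	/* dynamic lock class key */
	struct my_slot *slots;
	unsigned int nr_slots;
};

static void my_dev_init_locks(struct my_dev *dev)
{
	unsigned int i;

	/* dynamic (non-static) keys must be registered before use */
	lockdep_register_key(&dev->key);

	for (i = 0; i < dev->nr_slots; i++)
		lockdep_init_map(&dev->slots[i].dep_map,
				 "my_dev->slots[i].lock", &dev->key, 0);
}

static void my_dev_destroy_locks(struct my_dev *dev)
{
	/* must be unregistered before the key's memory goes away */
	lockdep_unregister_key(&dev->key);
}

With this in place, the mutex_acquire()/mutex_release() annotations
in zram_slot_lock()/zram_slot_unlock() make the bit lock visible to
lockdep exactly as if it were a mutex.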