From patchwork Fri Feb 14 04:50:12 2025
X-Patchwork-Submitter: Sergey Senozhatsky
X-Patchwork-Id: 13974462
From: Sergey Senozhatsky
To: Andrew Morton
Cc: Yosry Ahmed, Hillf Danton, Kairui Song, Minchan Kim,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: [PATCH v6 00/17] zsmalloc/zram: there be preemption
Date: Fri, 14 Feb 2025 13:50:12 +0900
Message-ID: <20250214045208.1388854-1-senozhatsky@chromium.org>

Currently zram runs compression and decompression in non-preemptible
sections, e.g.

    zcomp_stream_get()     // grabs CPU local lock
    zcomp_compress()

or

    zram_slot_lock()       // grabs entry spin-lock
    zcomp_stream_get()     // grabs CPU local lock
    zs_map_object()        // grabs rwlock and CPU local lock
    zcomp_decompress()

Potentially a little troublesome for a number of reasons.

For instance, this makes it impossible to use async compression
algorithms or/and H/W compression algorithms, which can wait for OP
completion or resource availability.  This also restricts what
compression algorithms can do internally, for example, zstd can
allocate internal state memory for C/D dictionaries:

do_fsync()
 do_writepages()
  zram_bio_write()
   zram_write_page()                          // becomes non-preemptible
    zcomp_compress()
     zstd_compress()
      ZSTD_compress_usingCDict()
       ZSTD_compressBegin_usingCDict_internal()
        ZSTD_resetCCtx_usingCDict()
         ZSTD_resetCCtx_internal()
          zstd_custom_alloc()                 // memory allocation

Not to mention that the system can be configured to maximize
compression ratio at a cost of CPU/HW time (e.g. lz4hc or deflate
with very high compression level) so zram can stay in a non-preemptible
section (even under spin-lock or/and rwlock) for an extended period
of time.  Aside from compression algorithms, this also restricts what
zram can do.  One particular example is zram_write_page() zsmalloc
handle allocation, which has an optimistic allocation (disallowing
direct reclaim) and a pessimistic fallback path, which then forces
zram to compress the page one more time.

This series changes zram to not directly impose atomicity restrictions
on compression algorithms (and on itself), which makes zram write()
fully preemptible; zram read(), sadly, is not always preemptible yet.
There are still indirect atomicity restrictions imposed by zsmalloc.
One notable example is the object mapping API, which returns with:
a) local CPU lock held
b) zspage rwlock held

First, zsmalloc's zspage lock is converted from rwlock to a special
type of RW-lookalike lock with some extra guarantees/features.  Second,
a new handle mapping is introduced which doesn't use per-CPU buffers
(and hence no local CPU lock), does fewer memcpy() calls, but requires
users to provide a pointer to a temp buffer for object copy-in (when
needed).
Third, zram is converted to the new zsmalloc mapping API and thus zram
read() becomes preemptible.

v5 -> v6
- new zspage lock implementation, based on a spin-lock (Hillf)
- added CONFIG_LOCK_STAT support to zram entry lock and zspage lock
- tweaked lockdep names of zram entry lock and zspage lock
- factored out lockdep-enabled zram entry lock and zspage lock functions
  into separate helpers to avoid numerous #ifdef-s in the code (Andrew)
- updated zspage lock rules (Yosry)
- moved comp stream mutex initialisation out of cpu-up handler to
  close cpu-dead/stream-get/cpu-up race window (Yosry)
- dropped patches that factored out zspool and size-class locking (Yosry)
- rewrote commit messages for some patches (Yosry)

Sergey Senozhatsky (17):
  zram: sleepable entry locking
  zram: permit preemption with active compression stream
  zram: remove unused crypto include
  zram: remove max_comp_streams device attr
  zram: remove two-staged handle allocation
  zram: remove writestall zram_stats member
  zram: limit max recompress prio to num_active_comps
  zram: filter out recomp targets based on priority
  zram: rework recompression loop
  zsmalloc: rename pool lock
  zsmalloc: make zspage lock preemptible
  zsmalloc: introduce new object mapping API
  zram: switch to new zsmalloc object mapping API
  zram: permit reclaim in zstd custom allocator
  zram: do not leak page on recompress_store error path
  zram: do not leak page on writeback_store error path
  zram: add might_sleep to zcomp API

 Documentation/ABI/testing/sysfs-block-zram  |   8 -
 Documentation/admin-guide/blockdev/zram.rst |  36 +-
 drivers/block/zram/backend_zstd.c           |  11 +-
 drivers/block/zram/zcomp.c                  |  48 ++-
 drivers/block/zram/zcomp.h                  |   8 +-
 drivers/block/zram/zram_drv.c               | 326 ++++++++-------
 drivers/block/zram/zram_drv.h               |  22 +-
 include/linux/zsmalloc.h                    |   8 +
 mm/zsmalloc.c                               | 413 ++++++++++++++++----
 9 files changed, 592 insertions(+), 288 deletions(-)

---
2.48.1.601.g30ceb7b040-goog
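
For readers unfamiliar with the idea behind the new handle mapping
(no per-CPU buffer, fewer memcpy() calls, caller-provided temp buffer
for copy-in when needed), here is a rough userspace model of the read
side.  It is only a sketch under stated assumptions, not the code from
this series: struct toy_zspage, toy_obj_read_begin() and TOY_PAGE_SIZE
are hypothetical names invented for this illustration, and the
"object spans two pages" condition is assumed to be the case where a
copy-in is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16	/* tiny "pages" keep the example readable */

/* Hypothetical stand-in for a zspage: two physically separate pages. */
struct toy_zspage {
	const unsigned char *page[2];
};

/*
 * Hypothetical read-side mapping helper (NOT the kernel API).
 *
 * Returns a pointer to @len bytes of the object that starts at offset
 * @off in page[0].  If the object fits in page[0], the page memory is
 * returned directly and @local_copy is left untouched.  If the object
 * spills into page[1], both parts are copied into the caller-provided
 * @local_copy, so no shared per-CPU buffer (and no lock protecting
 * one) is involved.
 */
static const void *toy_obj_read_begin(const struct toy_zspage *zspage,
				      size_t off, size_t len,
				      void *local_copy)
{
	size_t head;

	assert(off < TOY_PAGE_SIZE && len <= TOY_PAGE_SIZE);

	if (off + len <= TOY_PAGE_SIZE)
		return zspage->page[0] + off;	/* contiguous: zero copies */

	head = TOY_PAGE_SIZE - off;		/* bytes available in page[0] */
	memcpy(local_copy, zspage->page[0] + off, head);
	memcpy((unsigned char *)local_copy + head, zspage->page[1],
	       len - head);
	return local_copy;
}
```

In this model the caller owns the temp buffer, so nothing in the helper
forces the caller to stay on one CPU or to keep preemption disabled
while it uses the returned pointer; any locking the real zsmalloc code
needs on top of this is outside the scope of the sketch.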