From patchwork Mon Feb 6 09:25:58 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sergey Senozhatsky
X-Patchwork-Id: 13129547
From: Sergey Senozhatsky
To: Minchan Kim, Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Sergey Senozhatsky
Subject: [PATCH 1/2] zsmalloc: remove insert_zspage() ->inuse optimization
Date: Mon, 6 Feb 2023 18:25:58 +0900
Message-Id: <20230206092559.2722946-2-senozhatsky@chromium.org>
In-Reply-To: <20230206092559.2722946-1-senozhatsky@chromium.org>
References: <20230206092559.2722946-1-senozhatsky@chromium.org>

This optimization has no effect. It only ensures that when a page was
added to its corresponding fullness list, its "inuse" counter was higher
or lower than the "inuse" counter of the page at the head of the list.
The intention was to keep busy pages at the head, so they could be
filled up and moved to the ZS_FULL fullness group more quickly.
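
A userspace sketch of the insertion rule described above may help to
see what it does and does not guarantee. The array-backed list and the
names fullness_list_model and model_insert() are hypothetical and are
not zsmalloc code; index 0 plays the role of the list head.

```c
#include <string.h>

#define MODEL_MAX_PAGES 16

/* Hypothetical model of one fullness list; inuse[0] is the list head. */
struct fullness_list_model {
	int inuse[MODEL_MAX_PAGES];
	int nr;
};

/*
 * Mirror the rule being removed: a page whose ->inuse is lower than the
 * head's goes right after the head, any other page becomes the new head.
 * Only the head is compared; the rest of the list is never inspected.
 */
static void model_insert(struct fullness_list_model *l, int inuse)
{
	int pos = (l->nr > 0 && inuse < l->inuse[0]) ? 1 : 0;

	if (l->nr == MODEL_MAX_PAGES)
		return;

	/* Shift the tail by one slot to open a gap at pos. */
	memmove(&l->inuse[pos + 1], &l->inuse[pos],
		(l->nr - pos) * sizeof(l->inuse[0]));
	l->inuse[pos] = inuse;
	l->nr++;
}
```

Inserting 9, 6, 5 and 2 in that order leaves the model holding
9, 2, 5, 6: the head relation holds at insertion time, but nothing else
about the order is maintained.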

However, this doesn't work as the "inuse" counter of a page can be
modified by obj_free() but the page may still belong to the same
fullness list. So, fix_fullness_group() won't change the page's position
in relation to the head's "inuse" counter, leading to a largely random
order of pages within the fullness list.

For instance, consider a printout of the "inuse" counters of the first
10 pages in a class that holds 93 objects per zspage:

 ZS_ALMOST_EMPTY:  36  67  68  64  35  54  63  52

As we can see, a page with one of the lowest "inuse" counters is
actually at the head of the fullness list.

Signed-off-by: Sergey Senozhatsky
---
 mm/zsmalloc.c | 29 ++++++++---------------------
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 3aed46ab7e6c..b57a89ed6f30 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -753,37 +753,24 @@ static enum fullness_group get_fullness_group(struct size_class *class,
 }

 /*
- * Each size class maintains various freelists and zspages are assigned
- * to one of these freelists based on the number of live objects they
- * have. This functions inserts the given zspage into the freelist
- * identified by .
+ * This function adds the given zspage to the fullness list identified
+ * by .
  */
 static void insert_zspage(struct size_class *class,
-				struct zspage *zspage,
-				enum fullness_group fullness)
+			  struct zspage *zspage,
+			  enum fullness_group fullness)
 {
-	struct zspage *head;
-
 	class_stat_inc(class, fullness, 1);
-	head = list_first_entry_or_null(&class->fullness_list[fullness],
-					struct zspage, list);
-	/*
-	 * We want to see more ZS_FULL pages and less almost empty/full.
-	 * Put pages with higher ->inuse first.
-	 */
-	if (head && get_zspage_inuse(zspage) < get_zspage_inuse(head))
-		list_add(&zspage->list, &head->list);
-	else
-		list_add(&zspage->list, &class->fullness_list[fullness]);
+	list_add(&zspage->list, &class->fullness_list[fullness]);
 }

 /*
- * This function removes the given zspage from the freelist identified
+ * This function removes the given zspage from the fullness list identified
  * by .
  */
 static void remove_zspage(struct size_class *class,
-				struct zspage *zspage,
-				enum fullness_group fullness)
+			  struct zspage *zspage,
+			  enum fullness_group fullness)
 {
 	VM_BUG_ON(list_empty(&class->fullness_list[fullness]));


From patchwork Mon Feb 6 09:25:59 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sergey Senozhatsky
X-Patchwork-Id: 13129548
From: Sergey Senozhatsky
To: Minchan Kim, Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Sergey Senozhatsky
Subject: [PATCH 2/2] zsmalloc: fine-grained inuse ratio based fullness grouping
Date: Mon, 6 Feb 2023 18:25:59 +0900
Message-Id: <20230206092559.2722946-3-senozhatsky@chromium.org>
In-Reply-To: <20230206092559.2722946-1-senozhatsky@chromium.org>
References: <20230206092559.2722946-1-senozhatsky@chromium.org>

Each zspage maintains ->inuse counter which keeps track of the number of
objects stored in the page.
The ->inuse counter also determines the page's "fullness group" which is
calculated as the ratio of the "inuse" objects to the total number of
objects the page can hold (objs_per_zspage). The closer the ->inuse
counter is to objs_per_zspage, the better.

Each size class maintains several fullness lists that keep track of
zspages of particular "fullness". There are four lists at the moment:

 ZS_EMPTY        for pages with zero "inuse" counter
 ZS_FULL         for pages with "inuse" equal to objs_per_zspage
 ZS_ALMOST_EMPTY for pages with "inuse" less than or equal to
                 3 * objs_per_zspage / 4
 ZS_ALMOST_FULL  for pages with "inuse" greater than
                 3 * objs_per_zspage / 4.

First of all, this makes ZS_ALMOST_EMPTY fullness list pretty busy for
certain size classes. For example, the ZS_ALMOST_EMPTY list for
class-112 (which can store 256 objects per zspage) will contain pages
with ->inuse counters in the range from 1 to 192.

Second, pages within each fullness list are stored in random order with
regard to the ->inuse counter. This is because sorting the pages by
->inuse counter each time obj_malloc() or obj_free() is called would be
too expensive. However, the ->inuse counter is still a crucial factor in
many situations.

In a busy system with many obj_malloc() and obj_free() calls, fullness
lists become inefficient. For instance, the ->inuse counters for the
first 7 zspages of some random classes are:

 class-1840 objs_per_zspage 20:
   ZS_ALMOST_EMPTY:  3 13  8  2 11 14  3
   ZS_ALMOST_FULL :  empty

 class-688 objs_per_zspage 59:
   ZS_ALMOST_EMPTY:  1  3  5  1 18 13 15
   ZS_ALMOST_FULL :  empty

For the two major zsmalloc operations, zs_malloc() and zs_compact(), we
typically select the head page from the corresponding fullness list as
the best candidate page. However, this assumption is not always
accurate.

For the zs_malloc() operation, the optimal candidate page should have
the highest ->inuse counter. This is because the goal is to maximize the
number of ZS_FULL pages and make full use of all allocated memory.
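
To make the thresholds above concrete, here is a small userspace model
of the four-group classification as this message describes it. It is a
sketch, not the kernel function: old_fullness_group() and the OLD_ZS_*
names are made up for illustration.

```c
/* Hypothetical stand-ins for the four fullness groups listed above. */
enum old_fullness_model {
	OLD_ZS_EMPTY,
	OLD_ZS_ALMOST_EMPTY,
	OLD_ZS_ALMOST_FULL,
	OLD_ZS_FULL,
};

/*
 * Classify a zspage by its ->inuse counter using the 3/4 threshold:
 * everything from 1 up to 3 * objs_per_zspage / 4 lands in one group.
 */
static enum old_fullness_model old_fullness_group(int inuse,
						   int objs_per_zspage)
{
	if (inuse == 0)
		return OLD_ZS_EMPTY;
	if (inuse == objs_per_zspage)
		return OLD_ZS_FULL;
	if (inuse <= 3 * objs_per_zspage / 4)
		return OLD_ZS_ALMOST_EMPTY;
	return OLD_ZS_ALMOST_FULL;
}
```

For class-112 (objs_per_zspage = 256) the ALMOST_EMPTY cut-off is
3 * 256 / 4 = 192, so ->inuse values 1..192 share one list and 193..255
share the other.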

For the zs_compact() operation, the optimal candidate page should have
the lowest ->inuse counter. This is because compaction needs to move
objects in use to another page before it can release the zspage and
return its physical pages to the buddy allocator. The fewer objects in
use, the quicker compaction can release the page. Additionally,
compaction is measured by the number of pages it releases.

For example, assume the following case:
- size class stores 8 objects per zspage
- ALMOST_FULL list contains one page that has ->inuse equal to 6
- ALMOST_EMPTY list contains 3 pages: one page has ->inuse equal to 2,
  and two pages have ->inuse equal to 1.

The current compaction algorithm selects the head page of the
ALMOST_EMPTY list (the source page), which has ->inuse equal to 2, moves
its objects to the ALMOST_FULL list page (the destination page), and
then releases the source page. The ALMOST_FULL page (destination page)
becomes FULL, so further compaction is not possible.

At the same time, if compaction were to choose ALMOST_EMPTY pages with
->inuse equal to 1, it would be able to release two zspages while still
performing the same number of memcpy() operations.

This patch reworks the fullness grouping mechanism. Instead of relying
on a threshold that results in too many pages being included in the
ALMOST_EMPTY group for specific classes, size classes maintain a larger
number of fullness lists that give strict guarantees on the minimum and
maximum ->inuse values within each group. Each group represents a 10%
change in the ->inuse ratio compared to neighboring groups. In essence,
there are groups for pages with 0%, 10%, 20% usage ratios, and so on, up
to 100%.

This enhances the selection of candidate pages for both zs_malloc() and
zs_compact().
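
The compaction example above can be checked with a toy model. This is
only a sketch under simplifying assumptions: there is a single
destination page, and a source page that does not fit entirely ends the
run (real compaction may migrate part of a page). The name
model_pages_released() is hypothetical.

```c
#include <stddef.h>

/*
 * Drain source pages, in the order given, into one destination page
 * that has free_slots empty object slots. Returns how many source
 * pages were emptied and could therefore be released.
 */
static int model_pages_released(const int *src_inuse, size_t nr_src,
				int free_slots)
{
	int released = 0;
	size_t i;

	for (i = 0; i < nr_src; i++) {
		/* Destination cannot take this page's objects: stop. */
		if (src_inuse[i] > free_slots)
			break;
		free_slots -= src_inuse[i];
		released++;
	}
	return released;
}
```

With a destination holding 6 of 8 objects (2 free slots), the head-first
order {2, 1, 1} releases one page, while the lowest-first order
{1, 1, 2} releases two, for the same two object copies.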

A printout of the ->inuse counters of the first 7 pages per (random)
class fullness group:

 class-768 objs_per_zspage 16:
 fullness 100%:  empty
 fullness  99%:  empty
 fullness  90%:  empty
 fullness  80%:  empty
 fullness  70%:  empty
 fullness  60%:  8  8  9  9  8  8  8
 fullness  50%:  empty
 fullness  40%:  5  5  6  5  5  5  5
 fullness  30%:  4  4  4  4  4  4  4
 fullness  20%:  2  3  2  3  3  2  2
 fullness  10%:  1  1  1  1  1  1  1
 fullness   0%:  empty

The zs_malloc() function searches through the groups of pages starting
with the one having the highest usage ratio. This means that it always
selects a page from the group with the least internal fragmentation
(highest usage ratio) and makes it even less fragmented by increasing
its usage ratio.

The zs_compact() function, on the other hand, begins by scanning the
group with the highest fragmentation (lowest usage ratio) to locate the
source page. The first available page is selected, and then the function
moves downward to find a destination page in the group with the lowest
internal fragmentation (highest usage ratio).

The example demonstrates that zs_malloc() would choose a page with
->inuse of 8 as the candidate page, while zs_compact() would pick a page
with ->inuse of 1 as the source page and another page with ->inuse of 8
as the destination page.

A 1/10 difference in ratio between fullness groups is intentional and
critical for classes that have a high number of objs_per_zspage. For
instance, class-624 stores 59 objects per zspage. With a 1/10 ratio
grouping, the difference in inuse values between the page with the
lowest and highest inuse in a single fullness group is only 4 objects
(2496 bytes), whereas a 1/5 ratio grouping would result in a difference
of 10 objects (6240 bytes).
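
For readers who want to reproduce the grouping in the printout, the
following userspace sketch mirrors the ratio computation that
get_fullness_group() performs in this patch. The function name
usage_group_label() and the idea of returning the group's percent label
as an int are illustrative assumptions, not part of the patch.

```c
/*
 * Return the fullness group label in percent (0, 10, 20, ..., 90, 99,
 * 100) for a zspage with the given ->inuse counter.
 */
static int usage_group_label(int inuse, int objs_per_zspage)
{
	int ratio;

	if (inuse == 0)
		return 0;
	if (inuse == objs_per_zspage)
		return 100;

	/* Integer percentage, 0..99 for a partially used zspage. */
	ratio = 100 * inuse / objs_per_zspage;

	/* The top partial group is labelled 99% rather than 100%. */
	if (ratio >= 90)
		return 99;

	/* ratio 0..9 -> 10%, 10..19 -> 20%, and so on. */
	return (ratio / 10 + 1) * 10;
}
```

For class-768 (16 objects per zspage) this gives 60 for ->inuse 8 or 9,
40 for 5 or 6, and 10 for 1, which agrees with the printout above.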

Signed-off-by: Sergey Senozhatsky
---
 mm/zsmalloc.c | 224 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 148 insertions(+), 76 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b57a89ed6f30..1901edd01e38 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -127,7 +127,7 @@
 #define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)

 #define HUGE_BITS	1
-#define FULLNESS_BITS	2
+#define FULLNESS_BITS	4
 #define CLASS_BITS	8
 #define ISOLATED_BITS	5
 #define MAGIC_VAL_BITS	8
@@ -159,24 +159,88 @@
 #define ZS_SIZE_CLASSES	(DIV_ROUND_UP(ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE, \
 				      ZS_SIZE_CLASS_DELTA) + 1)

+/*
+ * Pages are distinguished by the ratio of used memory (that is the ratio
+ * of ->inuse objects to all objects that page can store). For example,
+ * USAGE_30 means that the ratio of used objects is > 20% and <= 30%.
+ *
+ * The number of fullness groups is not random. It allows us to keep
+ * difference between the least busy page in the group (minimum permitted
+ * number of ->inuse objects) and the most busy page (maximum permitted
+ * number of ->inuse objects) at a reasonable value.
+ */
 enum fullness_group {
-	ZS_EMPTY,
-	ZS_ALMOST_EMPTY,
-	ZS_ALMOST_FULL,
-	ZS_FULL,
+	ZS_USAGE_0,
+	ZS_USAGE_10,
+	ZS_USAGE_20,
+	ZS_USAGE_30,
+	ZS_USAGE_40,
+	ZS_USAGE_50,
+	ZS_USAGE_60,
+	ZS_USAGE_70,
+	ZS_USAGE_80,
+	ZS_USAGE_90,
+	ZS_USAGE_99,
+	ZS_USAGE_100,
 	NR_ZS_FULLNESS,
 };

 enum class_stat_type {
-	CLASS_EMPTY,
-	CLASS_ALMOST_EMPTY,
-	CLASS_ALMOST_FULL,
-	CLASS_FULL,
+	CLASS_USAGE_0,
+	CLASS_USAGE_10,
+	CLASS_USAGE_20,
+	CLASS_USAGE_30,
+	CLASS_USAGE_40,
+	CLASS_USAGE_50,
+	CLASS_USAGE_60,
+	CLASS_USAGE_70,
+	CLASS_USAGE_80,
+	CLASS_USAGE_90,
+	CLASS_USAGE_99,
+	CLASS_USAGE_100,
 	OBJ_ALLOCATED,
 	OBJ_USED,
 	NR_ZS_STAT_TYPE,
 };

+#define NUM_FULLNESS_GROUPS	10
+
+/*
+ * Lookup pages in increasing (from lowest to highest) order of usage ratio.
+ * This is useful, for instance, during compaction, when we want to migrate
+ * as few objects as possible in order to free zspage.
+ */
+static const enum fullness_group fullness_asc[NUM_FULLNESS_GROUPS] = {
+	ZS_USAGE_10,
+	ZS_USAGE_20,
+	ZS_USAGE_30,
+	ZS_USAGE_40,
+	ZS_USAGE_50,
+	ZS_USAGE_60,
+	ZS_USAGE_70,
+	ZS_USAGE_80,
+	ZS_USAGE_90,
+	ZS_USAGE_99
+};
+
+/*
+ * Lookup pages in decreasing (from highest to lowest) order of usage ratio.
+ * This is useful in zs_malloc() and compaction, when we want to have as
+ * many full pages as possible for more efficient memory usage.
+ */
+static const enum fullness_group fullness_desc[NUM_FULLNESS_GROUPS] = {
+	ZS_USAGE_99,
+	ZS_USAGE_90,
+	ZS_USAGE_80,
+	ZS_USAGE_70,
+	ZS_USAGE_60,
+	ZS_USAGE_50,
+	ZS_USAGE_40,
+	ZS_USAGE_30,
+	ZS_USAGE_20,
+	ZS_USAGE_10,
+};
+
 struct zs_size_stat {
 	unsigned long objs[NR_ZS_STAT_TYPE];
 };
@@ -185,21 +249,6 @@ struct zs_size_stat {
 static struct dentry *zs_stat_root;
 #endif

-/*
- * We assign a page to ZS_ALMOST_EMPTY fullness group when:
- *	n <= N / f, where
- * n = number of allocated objects
- * N = total number of objects zspage can store
- * f = fullness_threshold_frac
- *
- * Similarly, we assign zspage to:
- *	ZS_ALMOST_FULL	when n > N / f
- *	ZS_EMPTY	when n == 0
- *	ZS_FULL		when n == N
- *
- * (see: fix_fullness_group())
- */
-static const int fullness_threshold_frac = 4;
 static size_t huge_class_size;

 struct size_class {
@@ -652,8 +701,23 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
 			continue;

 		spin_lock(&pool->lock);
-		class_almost_full = zs_stat_get(class, CLASS_ALMOST_FULL);
-		class_almost_empty = zs_stat_get(class, CLASS_ALMOST_EMPTY);
+
+		/*
+		 * Replicate old behaviour for almost_full and almost_empty
+		 * stats.
+		 */
+		class_almost_full = zs_stat_get(class, CLASS_USAGE_99);
+		class_almost_full += zs_stat_get(class, CLASS_USAGE_90);
+		class_almost_full += zs_stat_get(class, CLASS_USAGE_80);
+		class_almost_full += zs_stat_get(class, CLASS_USAGE_70);
+
+		class_almost_empty = zs_stat_get(class, CLASS_USAGE_60);
+		class_almost_empty += zs_stat_get(class, CLASS_USAGE_50);
+		class_almost_empty += zs_stat_get(class, CLASS_USAGE_40);
+		class_almost_empty += zs_stat_get(class, CLASS_USAGE_30);
+		class_almost_empty += zs_stat_get(class, CLASS_USAGE_20);
+		class_almost_empty += zs_stat_get(class, CLASS_USAGE_10);
+
 		obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
 		obj_used = zs_stat_get(class, OBJ_USED);
 		freeable = zs_can_compact(class);
@@ -723,33 +787,39 @@ static inline void zs_pool_stat_destroy(struct zs_pool *pool)
 }
 #endif

-
 /*
  * For each size class, zspages are divided into different groups
- * depending on how "full" they are. This was done so that we could
- * easily find empty or nearly empty zspages when we try to shrink
- * the pool (not yet implemented). This function returns fullness
+ * depending on their usage ratio. This function returns fullness
  * status of the given page.
  */
 static enum fullness_group get_fullness_group(struct size_class *class,
-						struct zspage *zspage)
-{
+					      struct zspage *zspage)
+{
+	static const enum fullness_group groups[] = {
+		ZS_USAGE_10,
+		ZS_USAGE_20,
+		ZS_USAGE_30,
+		ZS_USAGE_40,
+		ZS_USAGE_50,
+		ZS_USAGE_60,
+		ZS_USAGE_70,
+		ZS_USAGE_80,
+		ZS_USAGE_90,
+		ZS_USAGE_99,
+	};
 	int inuse, objs_per_zspage;
-	enum fullness_group fg;
+	int ratio;

 	inuse = get_zspage_inuse(zspage);
 	objs_per_zspage = class->objs_per_zspage;

 	if (inuse == 0)
-		fg = ZS_EMPTY;
-	else if (inuse == objs_per_zspage)
-		fg = ZS_FULL;
-	else if (inuse <= 3 * objs_per_zspage / fullness_threshold_frac)
-		fg = ZS_ALMOST_EMPTY;
-	else
-		fg = ZS_ALMOST_FULL;
+		return ZS_USAGE_0;
+	if (inuse == objs_per_zspage)
+		return ZS_USAGE_100;

-	return fg;
+	ratio = 100 * inuse / objs_per_zspage;
+	return groups[ratio / 10];
 }

 /*
@@ -781,14 +851,13 @@ static void remove_zspage(struct size_class *class,
 /*
  * Each size class maintains zspages in different fullness groups depending
  * on the number of live objects they contain. When allocating or freeing
- * objects, the fullness status of the page can change, say, from ALMOST_FULL
- * to ALMOST_EMPTY when freeing an object. This function checks if such
- * a status change has occurred for the given page and accordingly moves the
- * page from the freelist of the old fullness group to that of the new
- * fullness group.
+ * objects, the fullness status of the page can change, say, from USAGE_80
+ * to USAGE_70 when freeing an object. This function checks if such a status
+ * change has occurred for the given page and accordingly moves the page from
+ * the list of the old fullness group to that of the new fullness group.
  */
 static enum fullness_group fix_fullness_group(struct size_class *class,
-						struct zspage *zspage)
+					      struct zspage *zspage)
 {
 	int class_idx;
 	enum fullness_group currfg, newfg;
@@ -972,7 +1041,7 @@ static void __free_zspage(struct zs_pool *pool, struct size_class *class,
 	assert_spin_locked(&pool->lock);

 	VM_BUG_ON(get_zspage_inuse(zspage));
-	VM_BUG_ON(fg != ZS_EMPTY);
+	VM_BUG_ON(fg != ZS_USAGE_0);

 	/* Free all deferred handles from zs_free */
 	free_handles(pool, class, zspage);
@@ -1011,7 +1080,7 @@ static void free_zspage(struct zs_pool *pool, struct size_class *class,
 		return;
 	}

-	remove_zspage(class, zspage, ZS_EMPTY);
+	remove_zspage(class, zspage, ZS_USAGE_0);
 #ifdef CONFIG_ZPOOL
 	list_del(&zspage->lru);
 #endif
@@ -1142,14 +1211,15 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 	return zspage;
 }

-static struct zspage *find_get_zspage(struct size_class *class)
+static struct zspage *find_get_zspage(struct size_class *class,
+				      const enum fullness_group *groups)
 {
-	int i;
 	struct zspage *zspage;
+	int i;

-	for (i = ZS_ALMOST_FULL; i >= ZS_EMPTY; i--) {
+	for (i = 0; i < NUM_FULLNESS_GROUPS; i++) {
 		zspage = list_first_entry_or_null(&class->fullness_list[i],
-							struct zspage, list);
+						  struct zspage, list);
 		if (zspage)
 			break;
 	}
@@ -1524,7 +1594,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)

 	/* pool->lock effectively protects the zpage migration */
 	spin_lock(&pool->lock);
-	zspage = find_get_zspage(class);
+	zspage = find_get_zspage(class, fullness_desc);
 	if (likely(zspage)) {
 		obj = obj_malloc(pool, zspage, handle);
 		/* Now move the zspage to another fullness group, if required */
@@ -1642,7 +1712,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 	obj_free(class->size, obj, NULL);

 	fullness = fix_fullness_group(class, zspage);
-	if (fullness == ZS_EMPTY)
+	if (fullness == ZS_USAGE_0)
 		free_zspage(pool, class, zspage);

 	spin_unlock(&pool->lock);
@@ -1824,22 +1894,19 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
 	return ret;
 }

-static struct zspage *isolate_zspage(struct size_class *class, bool source)
+static struct zspage *isolate_zspage(struct size_class *class,
+				     const enum fullness_group *groups)
 {
-	int i;
 	struct zspage *zspage;
-	enum fullness_group fg[2] = {ZS_ALMOST_EMPTY, ZS_ALMOST_FULL};
+	int i;

-	if (!source) {
-		fg[0] = ZS_ALMOST_FULL;
-		fg[1] = ZS_ALMOST_EMPTY;
-	}
+	for (i = 0; i < NUM_FULLNESS_GROUPS; i++) {
+		enum fullness_group fg = groups[i];

-	for (i = 0; i < 2; i++) {
-		zspage = list_first_entry_or_null(&class->fullness_list[fg[i]],
-							struct zspage, list);
+		zspage = list_first_entry_or_null(&class->fullness_list[fg],
+						  struct zspage, list);
 		if (zspage) {
-			remove_zspage(class, zspage, fg[i]);
+			remove_zspage(class, zspage, fg);
 			return zspage;
 		}
 	}
@@ -2133,7 +2200,8 @@ static void async_free_zspage(struct work_struct *work)
 			continue;

 		spin_lock(&pool->lock);
-		list_splice_init(&class->fullness_list[ZS_EMPTY], &free_pages);
+		list_splice_init(&class->fullness_list[ZS_USAGE_0],
+				 &free_pages);
 		spin_unlock(&pool->lock);
 	}

@@ -2142,7 +2210,7 @@ static void async_free_zspage(struct work_struct *work)
 		lock_zspage(zspage);

 		get_zspage_mapping(zspage, &class_idx, &fullness);
-		VM_BUG_ON(fullness != ZS_EMPTY);
+		VM_BUG_ON(fullness != ZS_USAGE_0);
 		class = pool->size_class[class_idx];
 		spin_lock(&pool->lock);
 #ifdef CONFIG_ZPOOL
@@ -2215,7 +2283,7 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 	 * as well as zpage allocation/free
 	 */
 	spin_lock(&pool->lock);
-	while ((src_zspage = isolate_zspage(class, true))) {
+	while ((src_zspage = isolate_zspage(class, fullness_asc))) {
 		/* protect someone accessing the zspage(i.e., zs_map_object) */
 		migrate_write_lock(src_zspage);

@@ -2225,10 +2293,11 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 		cc.obj_idx = 0;
 		cc.s_page = get_first_page(src_zspage);

-		while ((dst_zspage = isolate_zspage(class, false))) {
+		while ((dst_zspage = isolate_zspage(class, fullness_desc))) {
 			migrate_write_lock_nested(dst_zspage);

 			cc.d_page = get_first_page(dst_zspage);
+
 			/*
 			 * If there is no more space in dst_page, resched
 			 * and see if anyone had allocated another zspage.
@@ -2250,7 +2319,7 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 		putback_zspage(class, dst_zspage);
 		migrate_write_unlock(dst_zspage);

-		if (putback_zspage(class, src_zspage) == ZS_EMPTY) {
+		if (putback_zspage(class, src_zspage) == ZS_USAGE_0) {
 			migrate_write_unlock(src_zspage);
 			free_zspage(pool, class, src_zspage);
 			pages_freed += class->pages_per_zspage;
@@ -2408,7 +2477,7 @@ struct zs_pool *zs_create_pool(const char *name)
 		int pages_per_zspage;
 		int objs_per_zspage;
 		struct size_class *class;
-		int fullness = 0;
+		int fullness;

 		size = ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA;
 		if (size > ZS_MAX_ALLOC_SIZE)
@@ -2462,9 +2531,12 @@ struct zs_pool *zs_create_pool(const char *name)
 		class->pages_per_zspage = pages_per_zspage;
 		class->objs_per_zspage = objs_per_zspage;
 		pool->size_class[i] = class;
-		for (fullness = ZS_EMPTY; fullness < NR_ZS_FULLNESS;
-							fullness++)
+
+		fullness = ZS_USAGE_0;
+		while (fullness < NR_ZS_FULLNESS) {
 			INIT_LIST_HEAD(&class->fullness_list[fullness]);
+			fullness++;
+		}

 		prev_class = class;
 	}
@@ -2510,7 +2582,7 @@ void zs_destroy_pool(struct zs_pool *pool)
 		if (class->index != i)
 			continue;

-		for (fg = ZS_EMPTY; fg < NR_ZS_FULLNESS; fg++) {
+		for (fg = ZS_USAGE_0; fg < NR_ZS_FULLNESS; fg++) {
 			if (!list_empty(&class->fullness_list[fg])) {
 				pr_info("Freeing non-empty class with size %db, fullness group %d\n",
 					class->size, fg);
@@ -2686,7 +2758,7 @@ static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries)
 			 * while the page is removed from the pool. Fix it
 			 * up for the check in __free_zspage().
 			 */
-			zspage->fullness = ZS_EMPTY;
+			zspage->fullness = ZS_USAGE_0;

 			__free_zspage(pool, class, zspage);
 			spin_unlock(&pool->lock);