From patchwork Wed Aug 16 08:34:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qi Zheng X-Patchwork-Id: 13354736 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2FF88C001DF for ; Wed, 16 Aug 2023 08:36:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C648A8D002D; Wed, 16 Aug 2023 04:36:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BEB9D8D0001; Wed, 16 Aug 2023 04:36:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A651C8D002D; Wed, 16 Aug 2023 04:36:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 8F8808D0001 for ; Wed, 16 Aug 2023 04:36:23 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 6D15AB2406 for ; Wed, 16 Aug 2023 08:36:23 +0000 (UTC) X-FDA: 81129311046.06.1A69096 Received: from mail-io1-f52.google.com (mail-io1-f52.google.com [209.85.166.52]) by imf25.hostedemail.com (Postfix) with ESMTP id 900B3A0010 for ; Wed, 16 Aug 2023 08:36:21 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=UFJQxcDf; spf=pass (imf25.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.166.52 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1692174981; a=rsa-sha256; cv=none; b=ohGRoSHM32o+id2JaEJMEa5V4YINGLhh4xMLT94jsjymrJXa/UzRuQ3CcjoOtUiJH5B+dC OMkl306RX9xgCFnY+62j3Ttp/X7TPJW7zW8DQp07Bgsbfv70VQ5k/o6uUlObEDV49lx3CO yfdMnGXa8RIFPdtr0YYYeKHuVn0SHds= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=UFJQxcDf; spf=pass (imf25.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.166.52 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1692174981; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=woVtApjKtbzIe2/hEV7galerWEkdSpR+3F7GcJJJS+s=; b=VEnI4ZUgkuNbSuSW+vgedtzv9ROjzPifYbzIpCChxPHixiE7wOc+9W4v+6H53gylw6j7r2 R86N7GHjBqr76DlG+dWr0PgsQLo77mPaAPvkAXUVISzc/QVkMGjh/o8L8k+82nZrEKpUqu iiRAZOu7j2xza6YiZ0rh8TTRmwcxztY= Received: by mail-io1-f52.google.com with SMTP id ca18e2360f4ac-7748ca56133so80729339f.0 for ; Wed, 16 Aug 2023 01:36:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1692174980; x=1692779780; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=woVtApjKtbzIe2/hEV7galerWEkdSpR+3F7GcJJJS+s=; b=UFJQxcDfJufS+u4ZP03ThFdl1ypcosn80vvkH4QoVYrGHIGgyIwFNNdwKIZcDeZCQj AUnywyiNZlwo5JEIeF6kKvTCFfU6WFiIqamMhkl6qtZ/fRKZdTWKyBJiTBcPGoPVWwFk gGJbpAf4w/gCDa+BLf06yTO5dyPgUoMRSBuNIoaRONBJ/XR1u8OtD/UjK6euxqlOZf+e vaAZvH8YuOLdIVtsH491XO4/ROlhb4xps69xm3gh5rrF1PJsGbbhenUYBf9IVZGp8+lc 4H3N9OW6ekjZhSp6AsttSM1YAYgT4SuN5D7ogcHHmeys41oqqNJGHnG+nP2Z2NZ74u3M A6Tw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692174980; x=1692779780; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=woVtApjKtbzIe2/hEV7galerWEkdSpR+3F7GcJJJS+s=; b=cpaw4/9RhiK6sR0Azzpp4uw1x3WbyiBQhuls9Uo93dICvrC2eIVx8wXBTJOXozV/VE 0pRWX9VJlJ4BqRE8pTo47Zv6PhZI4C0fHnze4rOHKIhl7fr6dkAxP8X5lTzqzx//GMXt WNhvdTKw/IJTPOzuZq6twvXlWCmd+BKWrZ2SmVZ9Biohi9EINhoVM1r2x4aMVKtigK75 3p3tqCAFVR8DYtOe6JrI0MrlYNe9CTx35KgwQ+nRA8gQo5bPexAUpvgD8wswiLcoarqm MshAxAiQz480s8CqHtDE8Dql8711mL9cDUxiVrQUec7Pd+DpeTDTWRBOYbsgFDwW4ERh w/HQ== X-Gm-Message-State: AOJu0YzwaAw3sMbDVuDRDbOEVQihxPVCKigz7fY+s7gigfFhodMWUZy5 zlIypFXymU6k0x2jwgOtJZduxg== X-Google-Smtp-Source: AGHT+IEQ4qXRi8kiLipV9oxETxH879Tvy9+vsKRQ6VK+eYHALO/sHbhIYHG1ERD0ezaGVdvnTAWMiw== X-Received: by 2002:a05:6e02:549:b0:349:385e:287e with SMTP id i9-20020a056e02054900b00349385e287emr1762872ils.1.1692174980399; Wed, 16 Aug 2023 01:36:20 -0700 (PDT) Received: from C02DW0BEMD6R.bytedance.net ([203.208.167.146]) by smtp.gmail.com with ESMTPSA id p16-20020a639510000000b005658d3a46d7sm7506333pgd.84.2023.08.16.01.36.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Aug 2023 01:36:19 -0700 (PDT) From: Qi Zheng To: akpm@linux-foundation.org, david@fromorbit.com, tkhai@ya.ru, vbabka@suse.cz, roman.gushchin@linux.dev, djwong@kernel.org, brauner@kernel.org, paulmck@kernel.org, tytso@mit.edu, steven.price@arm.com, cel@kernel.org, senozhatsky@chromium.org, yujie.liu@intel.com, gregkh@linuxfoundation.org, muchun.song@linux.dev, joel@joelfernandes.org, christian.koenig@amd.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, dri-devel@lists.freedesktop.org, linux-fsdevel@vger.kernel.org, Qi Zheng , Muchun Song Subject: [PATCH 5/5] mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred} Date: Wed, 16 Aug 2023 16:34:19 +0800 Message-Id: <20230816083419.41088-6-zhengqi.arch@bytedance.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: <20230816083419.41088-1-zhengqi.arch@bytedance.com> References: <20230816083419.41088-1-zhengqi.arch@bytedance.com> MIME-Version: 1.0 X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 900B3A0010 X-Stat-Signature: zqmqb8a6qwik3umey7f7pimttufx4nfw X-Rspam-User: X-HE-Tag: 1692174981-330310 X-HE-Meta: U2FsdGVkX1+V8uMiEu/dtFlgwS2iA6n09Jdxcj65LP7/QQyUdJnAsSRUOcdzyrjXf36DQjyY+Cbmzp5ZAlDYxIGlYvhnqFJ1Yv//qIOKT6m6qq8qYKvT08tNqSqHu79VspcKyXF6pozkJ0D7/NGWee7sZvE0jde+olV8XXp/qSNi6+xv/qXGmcYxKrCsqWULW22Jb/0ODkSyfN99wJXjzwii7rlfA2v1Zs3JhaqUPN1hiZCoLKxOboY5ea0qIAnqt5dZCVHkXOTLvdXT765TXUcxihhGkCTMUO5nfWNu/RGbK7v0MlifqHR0SB+p7EDKfpH7RP27eArTUPYJMIiCjVKNrImpBhZJXHO/y5piimwJ1EcpDf0Yr4WR85HDrU48kmgK4gxZcSaU3IDIl5Q18zzPVl4TJz7IgKN5h8Rzng3tdp4pCcGU0U0BzatoJ1u/avUh2aiaf2KiepjmR8Mz+EfTvuKiEUJsFdRfysSXNrzwUOUAaxCPqXEAVW7wdz8mdxcOMY/cuK0JI3eFlvEvDEYHOdTykJ2O+GWXEppoATkUFoBNz7APKdeTvcm0k0Krb0t4TstAn1cgvyfSibdeEhvrvhQrDtq3nHmjGs1oU0UTKOCpIJqgKV5NfH8ruFTUdqnZ3jHVTuIs1cJbv3rEyNjeQj++5NM8KctFlqyKZ/QV6H7PrNFSMTMkqpbDZiiJhKG01p77+qIj2jbAy7XsaGl2Ch8Gms6rJGPevfQWxhL0k42tmsWuoIJI7XoSnI8kQQxR1ZNKLZ4eHxYvGgpJ4Q+LzdxXSiS5rQhE0O5DRCfAedwL21+lphderlLBOtnJdVhqi0YJxCeSfbrzKiToyURmccO7k3qV1cUmTs3/VLJlsB0iOH04vkkxg9mKaVC5B4szQBbz1AgxQ9n3+Y/frWUrzvGPGJ+KXth22CD2JnzdCA084KeTC4e8fh9pPcY65IAUoINYjgot73eahe5 hgmqm1Wj XFhYn0jgvTzFq6e7AZf3v+LfJmQorDyCeOn6d7SL4eMiLB6fvIxyo7+ZaYRWN/Lc7DeNsCReqfNS8V0h7tfBc36w1rPkGU+KWIOJORnbeRjFUko9exZ4j7TI/wM73W/Vt1rAe22lh6R1SZ5Ps8TpRg5BNvNa0Z0zF6KgX6Goo3nUumg17Z/XBNOARA39uz7BLjMgNKzWeZLaz3OXUpj0SibmnTDvrVVIGlgiGX90Yq2SVBjFWPz6l+HB/jtv3O46jIBaPErx9+BrVl566BM6SBPcEA5T8q2lI8+YNnp5xiN4rxOfeA+aMMQTCwtdEDs2+5iWN2A1Cm4CzQ1W2zOrq1B6BPcvFE9ddEIeQlbLWoNgshEWkl3nIBCwe3RAHJPz/942fGZFppcKqf1gSJLvB3h5KF56nNvOmgjIU4SSnz8V3x2dja7cQbWFnZIOfLL2NLzBBrKP98CQX/WzhT3CvIrwAgcuyFMMIhaueL3Ohjy5yggPrt6N19JP0QCpmI65VGQt8rIDkHTBPTac= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently, we maintain two linear arrays per node per memcg, which are shrinker_info::map and shrinker_info::nr_deferred. And we need to resize them when the shrinker_nr_max is exceeded, that is, allocate a new array, and then copy the old array to the new array, and finally free the old array by RCU. For shrinker_info::map, we do set_bit() under the RCU lock, so we may set the value into the old map which is about to be freed. This may cause the value set to be lost. The current solution is not to copy the old map when resizing, but to set all the corresponding bits in the new map to 1. This solves the data loss problem, but bring the overhead of more pointless loops while doing memcg slab shrink. For shrinker_info::nr_deferred, we will only modify it under the read lock of shrinker_rwsem, so it will not run concurrently with the resizing. But after we make memcg slab shrink lockless, there will be the same data loss problem as shrinker_info::map, and we can't work around it like the map. For such resizable arrays, the most straightforward idea is to change it to xarray, like we did for list_lru [1]. We need to do xa_store() in the list_lru_add()-->set_shrinker_bit(), but this will cause memory allocation, and the list_lru_add() doesn't accept failure. A possible solution is to pre-allocate, but the location of pre-allocation is not well determined (such as deferred_split_shrinker case). Therefore, this commit chooses to introduce the following secondary array for shrinker_info::{map, nr_deferred}: +---------------+--------+--------+-----+ | shrinker_info | unit 0 | unit 1 | ... | (secondary array) +---------------+--------+--------+-----+ | v +---------------+-----+ | nr_deferred[] | map | (leaf array) +---------------+-----+ (shrinker_info_unit) The leaf array is never freed unless the memcg is destroyed. The secondary array will be resized every time the shrinker id exceeds shrinker_nr_max. So the shrinker_info_unit can be indexed from both the old and the new shrinker_info->unit[x]. Then even if we get the old secondary array under the RCU lock, the found map and nr_deferred are also true, so the updated nr_deferred and map will not be lost. [1]. https://lore.kernel.org/all/20220228122126.37293-13-songmuchun@bytedance.com/ Signed-off-by: Qi Zheng Reviewed-by: Muchun Song --- include/linux/memcontrol.h | 12 +- include/linux/shrinker.h | 17 +++ mm/shrinker.c | 249 +++++++++++++++++++++++-------------- 3 files changed, 171 insertions(+), 107 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 11810a2cfd2d..b49515bb6fbd 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -21,6 +21,7 @@ #include #include #include +#include struct mem_cgroup; struct obj_cgroup; @@ -88,17 +89,6 @@ struct mem_cgroup_reclaim_iter { unsigned int generation; }; -/* - * Bitmap and deferred work of shrinker::id corresponding to memcg-aware - * shrinkers, which have elements charged to this memcg. - */ -struct shrinker_info { - struct rcu_head rcu; - atomic_long_t *nr_deferred; - unsigned long *map; - int map_nr_max; -}; - struct lruvec_stats_percpu { /* Local (CPU and cgroup) state */ long state[NR_VM_NODE_STAT_ITEMS]; diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h index 6b5843c3b827..8a3c99422fd3 100644 --- a/include/linux/shrinker.h +++ b/include/linux/shrinker.h @@ -5,6 +5,23 @@ #include #include +#define SHRINKER_UNIT_BITS BITS_PER_LONG + +/* + * Bitmap and deferred work of shrinker::id corresponding to memcg-aware + * shrinkers, which have elements charged to the memcg. + */ +struct shrinker_info_unit { + atomic_long_t nr_deferred[SHRINKER_UNIT_BITS]; + DECLARE_BITMAP(map, SHRINKER_UNIT_BITS); +}; + +struct shrinker_info { + struct rcu_head rcu; + int map_nr_max; + struct shrinker_info_unit *unit[]; +}; + /* * This struct is used to pass information from page reclaim to the shrinkers. * We consolidate the values for easier extension later. diff --git a/mm/shrinker.c b/mm/shrinker.c index a16cd448b924..a7b5397a4fb9 100644 --- a/mm/shrinker.c +++ b/mm/shrinker.c @@ -12,15 +12,50 @@ DECLARE_RWSEM(shrinker_rwsem); #ifdef CONFIG_MEMCG static int shrinker_nr_max; -/* The shrinker_info is expanded in a batch of BITS_PER_LONG */ -static inline int shrinker_map_size(int nr_items) +static inline int shrinker_unit_size(int nr_items) { - return (DIV_ROUND_UP(nr_items, BITS_PER_LONG) * sizeof(unsigned long)); + return (DIV_ROUND_UP(nr_items, SHRINKER_UNIT_BITS) * sizeof(struct shrinker_info_unit *)); } -static inline int shrinker_defer_size(int nr_items) +static inline void shrinker_unit_free(struct shrinker_info *info, int start) { - return (round_up(nr_items, BITS_PER_LONG) * sizeof(atomic_long_t)); + struct shrinker_info_unit **unit; + int nr, i; + + if (!info) + return; + + unit = info->unit; + nr = DIV_ROUND_UP(info->map_nr_max, SHRINKER_UNIT_BITS); + + for (i = start; i < nr; i++) { + if (!unit[i]) + break; + + kfree(unit[i]); + unit[i] = NULL; + } +} + +static inline int shrinker_unit_alloc(struct shrinker_info *new, + struct shrinker_info *old, int nid) +{ + struct shrinker_info_unit *unit; + int nr = DIV_ROUND_UP(new->map_nr_max, SHRINKER_UNIT_BITS); + int start = old ? DIV_ROUND_UP(old->map_nr_max, SHRINKER_UNIT_BITS) : 0; + int i; + + for (i = start; i < nr; i++) { + unit = kzalloc_node(sizeof(*unit), GFP_KERNEL, nid); + if (!unit) { + shrinker_unit_free(new, start); + return -ENOMEM; + } + + new->unit[i] = unit; + } + + return 0; } void free_shrinker_info(struct mem_cgroup *memcg) @@ -32,6 +67,7 @@ void free_shrinker_info(struct mem_cgroup *memcg) for_each_node(nid) { pn = memcg->nodeinfo[nid]; info = rcu_dereference_protected(pn->shrinker_info, true); + shrinker_unit_free(info, 0); kvfree(info); rcu_assign_pointer(pn->shrinker_info, NULL); } @@ -40,28 +76,27 @@ void free_shrinker_info(struct mem_cgroup *memcg) int alloc_shrinker_info(struct mem_cgroup *memcg) { struct shrinker_info *info; - int nid, size, ret = 0; - int map_size, defer_size = 0; + int nid, ret = 0; + int array_size = 0; down_write(&shrinker_rwsem); - map_size = shrinker_map_size(shrinker_nr_max); - defer_size = shrinker_defer_size(shrinker_nr_max); - size = map_size + defer_size; + array_size = shrinker_unit_size(shrinker_nr_max); for_each_node(nid) { - info = kvzalloc_node(sizeof(*info) + size, GFP_KERNEL, nid); - if (!info) { - free_shrinker_info(memcg); - ret = -ENOMEM; - break; - } - info->nr_deferred = (atomic_long_t *)(info + 1); - info->map = (void *)info->nr_deferred + defer_size; + info = kvzalloc_node(sizeof(*info) + array_size, GFP_KERNEL, nid); + if (!info) + goto err; info->map_nr_max = shrinker_nr_max; + if (shrinker_unit_alloc(info, NULL, nid)) + goto err; rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info); } up_write(&shrinker_rwsem); return ret; + +err: + free_shrinker_info(memcg); + return -ENOMEM; } static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg, @@ -71,15 +106,12 @@ static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg, lockdep_is_held(&shrinker_rwsem)); } -static int expand_one_shrinker_info(struct mem_cgroup *memcg, - int map_size, int defer_size, - int old_map_size, int old_defer_size, - int new_nr_max) +static int expand_one_shrinker_info(struct mem_cgroup *memcg, int new_size, + int old_size, int new_nr_max) { struct shrinker_info *new, *old; struct mem_cgroup_per_node *pn; int nid; - int size = map_size + defer_size; for_each_node(nid) { pn = memcg->nodeinfo[nid]; @@ -92,21 +124,17 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg, if (new_nr_max <= old->map_nr_max) continue; - new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid); + new = kvmalloc_node(sizeof(*new) + new_size, GFP_KERNEL, nid); if (!new) return -ENOMEM; - new->nr_deferred = (atomic_long_t *)(new + 1); - new->map = (void *)new->nr_deferred + defer_size; new->map_nr_max = new_nr_max; - /* map: set all old bits, clear all new bits */ - memset(new->map, (int)0xff, old_map_size); - memset((void *)new->map + old_map_size, 0, map_size - old_map_size); - /* nr_deferred: copy old values, clear all new values */ - memcpy(new->nr_deferred, old->nr_deferred, old_defer_size); - memset((void *)new->nr_deferred + old_defer_size, 0, - defer_size - old_defer_size); + memcpy(new->unit, old->unit, old_size); + if (shrinker_unit_alloc(new, old, nid)) { + kvfree(new); + return -ENOMEM; + } rcu_assign_pointer(pn->shrinker_info, new); kvfree_rcu(old, rcu); @@ -118,9 +146,8 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg, static int expand_shrinker_info(int new_id) { int ret = 0; - int new_nr_max = round_up(new_id + 1, BITS_PER_LONG); - int map_size, defer_size = 0; - int old_map_size, old_defer_size = 0; + int new_nr_max = round_up(new_id + 1, SHRINKER_UNIT_BITS); + int new_size, old_size = 0; struct mem_cgroup *memcg; if (!root_mem_cgroup) @@ -128,15 +155,12 @@ static int expand_shrinker_info(int new_id) lockdep_assert_held(&shrinker_rwsem); - map_size = shrinker_map_size(new_nr_max); - defer_size = shrinker_defer_size(new_nr_max); - old_map_size = shrinker_map_size(shrinker_nr_max); - old_defer_size = shrinker_defer_size(shrinker_nr_max); + new_size = shrinker_unit_size(new_nr_max); + old_size = shrinker_unit_size(shrinker_nr_max); memcg = mem_cgroup_iter(NULL, NULL, NULL); do { - ret = expand_one_shrinker_info(memcg, map_size, defer_size, - old_map_size, old_defer_size, + ret = expand_one_shrinker_info(memcg, new_size, old_size, new_nr_max); if (ret) { mem_cgroup_iter_break(NULL, memcg); @@ -150,17 +174,34 @@ static int expand_shrinker_info(int new_id) return ret; } +static inline int shrinker_id_to_index(int shrinker_id) +{ + return shrinker_id / SHRINKER_UNIT_BITS; +} + +static inline int shrinker_id_to_offset(int shrinker_id) +{ + return shrinker_id % SHRINKER_UNIT_BITS; +} + +static inline int calc_shrinker_id(int index, int offset) +{ + return index * SHRINKER_UNIT_BITS + offset; +} + void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id) { if (shrinker_id >= 0 && memcg && !mem_cgroup_is_root(memcg)) { struct shrinker_info *info; + struct shrinker_info_unit *unit; rcu_read_lock(); info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info); + unit = info->unit[shrinker_id_to_index(shrinker_id)]; if (!WARN_ON_ONCE(shrinker_id >= info->map_nr_max)) { /* Pairs with smp mb in shrink_slab() */ smp_mb__before_atomic(); - set_bit(shrinker_id, info->map); + set_bit(shrinker_id_to_offset(shrinker_id), unit->map); } rcu_read_unlock(); } @@ -209,26 +250,31 @@ static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker, struct mem_cgroup *memcg) { struct shrinker_info *info; + struct shrinker_info_unit *unit; info = shrinker_info_protected(memcg, nid); - return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0); + unit = info->unit[shrinker_id_to_index(shrinker->id)]; + return atomic_long_xchg(&unit->nr_deferred[shrinker_id_to_offset(shrinker->id)], 0); } static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker, struct mem_cgroup *memcg) { struct shrinker_info *info; + struct shrinker_info_unit *unit; info = shrinker_info_protected(memcg, nid); - return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]); + unit = info->unit[shrinker_id_to_index(shrinker->id)]; + return atomic_long_add_return(nr, &unit->nr_deferred[shrinker_id_to_offset(shrinker->id)]); } void reparent_shrinker_deferred(struct mem_cgroup *memcg) { - int i, nid; + int nid, index, offset; long nr; struct mem_cgroup *parent; struct shrinker_info *child_info, *parent_info; + struct shrinker_info_unit *child_unit, *parent_unit; parent = parent_mem_cgroup(memcg); if (!parent) @@ -239,9 +285,13 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg) for_each_node(nid) { child_info = shrinker_info_protected(memcg, nid); parent_info = shrinker_info_protected(parent, nid); - for (i = 0; i < child_info->map_nr_max; i++) { - nr = atomic_long_read(&child_info->nr_deferred[i]); - atomic_long_add(nr, &parent_info->nr_deferred[i]); + for (index = 0; index < shrinker_id_to_index(child_info->map_nr_max); index++) { + child_unit = child_info->unit[index]; + parent_unit = parent_info->unit[index]; + for (offset = 0; offset < SHRINKER_UNIT_BITS; offset++) { + nr = atomic_long_read(&child_unit->nr_deferred[offset]); + atomic_long_add(nr, &parent_unit->nr_deferred[offset]); + } } } up_read(&shrinker_rwsem); @@ -407,7 +457,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, { struct shrinker_info *info; unsigned long ret, freed = 0; - int i; + int offset, index = 0; if (!mem_cgroup_online(memcg)) return 0; @@ -419,56 +469,63 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, if (unlikely(!info)) goto unlock; - for_each_set_bit(i, info->map, info->map_nr_max) { - struct shrink_control sc = { - .gfp_mask = gfp_mask, - .nid = nid, - .memcg = memcg, - }; - struct shrinker *shrinker; + for (; index < shrinker_id_to_index(info->map_nr_max); index++) { + struct shrinker_info_unit *unit; - shrinker = idr_find(&shrinker_idr, i); - if (unlikely(!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))) { - if (!shrinker) - clear_bit(i, info->map); - continue; - } + unit = info->unit[index]; - /* Call non-slab shrinkers even though kmem is disabled */ - if (!memcg_kmem_online() && - !(shrinker->flags & SHRINKER_NONSLAB)) - continue; + for_each_set_bit(offset, unit->map, SHRINKER_UNIT_BITS) { + struct shrink_control sc = { + .gfp_mask = gfp_mask, + .nid = nid, + .memcg = memcg, + }; + struct shrinker *shrinker; + int shrinker_id = calc_shrinker_id(index, offset); - ret = do_shrink_slab(&sc, shrinker, priority); - if (ret == SHRINK_EMPTY) { - clear_bit(i, info->map); - /* - * After the shrinker reported that it had no objects to - * free, but before we cleared the corresponding bit in - * the memcg shrinker map, a new object might have been - * added. To make sure, we have the bit set in this - * case, we invoke the shrinker one more time and reset - * the bit if it reports that it is not empty anymore. - * The memory barrier here pairs with the barrier in - * set_shrinker_bit(): - * - * list_lru_add() shrink_slab_memcg() - * list_add_tail() clear_bit() - * - * set_bit() do_shrink_slab() - */ - smp_mb__after_atomic(); - ret = do_shrink_slab(&sc, shrinker, priority); - if (ret == SHRINK_EMPTY) - ret = 0; - else - set_shrinker_bit(memcg, nid, i); - } - freed += ret; + shrinker = idr_find(&shrinker_idr, shrinker_id); + if (unlikely(!shrinker || !(shrinker->flags & SHRINKER_REGISTERED))) { + if (!shrinker) + clear_bit(offset, unit->map); + continue; + } - if (rwsem_is_contended(&shrinker_rwsem)) { - freed = freed ? : 1; - break; + /* Call non-slab shrinkers even though kmem is disabled */ + if (!memcg_kmem_online() && + !(shrinker->flags & SHRINKER_NONSLAB)) + continue; + + ret = do_shrink_slab(&sc, shrinker, priority); + if (ret == SHRINK_EMPTY) { + clear_bit(offset, unit->map); + /* + * After the shrinker reported that it had no objects to + * free, but before we cleared the corresponding bit in + * the memcg shrinker map, a new object might have been + * added. To make sure, we have the bit set in this + * case, we invoke the shrinker one more time and reset + * the bit if it reports that it is not empty anymore. + * The memory barrier here pairs with the barrier in + * set_shrinker_bit(): + * + * list_lru_add() shrink_slab_memcg() + * list_add_tail() clear_bit() + * + * set_bit() do_shrink_slab() + */ + smp_mb__after_atomic(); + ret = do_shrink_slab(&sc, shrinker, priority); + if (ret == SHRINK_EMPTY) + ret = 0; + else + set_shrinker_bit(memcg, nid, shrinker_id); + } + freed += ret; + + if (rwsem_is_contended(&shrinker_rwsem)) { + freed = freed ? : 1; + goto unlock; + } } } unlock: