From patchwork Fri Jun 14 01:05:57 2024
X-Patchwork-Submitter: "zhaoyang.huang"
X-Patchwork-Id: 13697696
X-Delivered-To: linux-mm@kvack.org
From: "zhaoyang.huang"
To: Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes,
 Baoquan He, Thomas Gleixner, hailong liu, Zhaoyang Huang
Subject: [PATCHv5 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
Date: Fri, 14 Jun 2024 09:05:57 +0800
Message-ID: <20240614010557.1821327-1-zhaoyang.huang@unisoc.com>

From: Zhaoyang Huang

The function xa_for_each() in _vm_unmap_aliases() loops through all
vbs. However, since commit 062eacf57ad9 ("mm: vmalloc: remove a global
vmap_blocks xarray") the vb from xarray may not be on the corresponding
CPU vmap_block_queue. Consequently, purge_fragmented_block() might
use the wrong vbq->lock to protect the free list, leading to vbq->free
breakage.

Incorrect lock protection can exhaust all vmalloc space as follows:
CPU0                                            CPU1
+--------------------------------------------+
|    +--------------------+     +-----+      |
+--> |                    |---->|     |------+
     | CPU1:vbq free_list |     | vb1 |
+--- |                    |<----|     |<-----+
|    +--------------------+     +-----+      |
+--------------------------------------------+

_vm_unmap_aliases()                             vb_alloc()
                                                new_vmap_block()
xa_for_each(&vbq->vmap_blocks, idx, vb)
--> vb in CPU1:vbq->freelist

purge_fragmented_block(vb)
spin_lock(&vbq->lock)                           spin_lock(&vbq->lock)
--> use CPU0:vbq->lock                          --> use CPU1:vbq->lock

list_del_rcu(&vb->free_list)                    list_add_tail_rcu(&vb->free_list, &vbq->free)
    __list_del(vb->prev, vb->next)
        next->prev = prev
    +--------------------+
    |                    |
    | CPU1:vbq free_list |
+---|                    |<--+
|   +--------------------+   |
+----------------------------+
                                                __list_add(new, head->prev, head)
+--------------------------------------------+
|    +--------------------+     +-----+      |
+--> |                    |---->|     |------+
     | CPU1:vbq free_list |     | vb2 |
+--- |                    |<----|     |<-----+
|    +--------------------+     +-----+      |
+--------------------------------------------+

        prev->next = next
+--------------------------------------------+
|----------------------------+               |
|    +--------------------+  |  +-----+      |
+--> |                    |--+  |     |------+
     | CPU1:vbq free_list |     | vb2 |
+--- |                    |<----|     |<-----+
|    +--------------------+     +-----+      |
+--------------------------------------------+

The list is now broken. All vbs that are subsequently added after
'prev' cannot be found by list_for_each_entry_rcu(vb, &vbq->free,
free_list) in vb_alloc(). Thus, vmalloc space is exhausted.

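The interleaving above can be replayed deterministically in userspace.
The following is only a minimal sketch, not kernel code: struct
list_node, list_add_tail_sim() and replay() are hypothetical stand-ins
that copy the pointer stores of __list_del() and __list_add(), and the
"two CPUs" are simulated by ordering those stores by hand in a single
thread.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Same four stores, in the same order, as __list_add(new, prev, next). */
static void list_insert_between(struct list_node *new,
				struct list_node *prev,
				struct list_node *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

static void list_add_tail_sim(struct list_node *new, struct list_node *head)
{
	assert(new != head);
	list_insert_between(new, head->prev, head);
}

/* Count the entries a forward walk over ->next can reach. */
static int count_forward(const struct list_node *head)
{
	int n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/*
 * Delete vb1 and add vb2 and vb3 to the same list. With interleave != 0,
 * the add of vb2 runs between the two stores of the delete, which is what
 * can happen when the two sides hold different locks.
 * Returns the number of blocks still reachable from the list head.
 */
static int replay(int interleave)
{
	struct list_node head, vb1, vb2, vb3;
	struct list_node *prev, *next;

	list_init(&head);
	list_add_tail_sim(&vb1, &head);

	/* "CPU0": first store of __list_del(vb1.prev, vb1.next) */
	prev = vb1.prev;
	next = vb1.next;
	next->prev = prev;

	/* "CPU1": complete list_add_tail of vb2, racing with the delete */
	if (interleave)
		list_add_tail_sim(&vb2, &head);

	/* "CPU0": second store of __list_del() */
	prev->next = next;

	if (!interleave)
		list_add_tail_sim(&vb2, &head);

	/* A later add chains behind vb2 via head.prev. */
	list_add_tail_sim(&vb3, &head);

	return count_forward(&head);
}
```

In this model replay(0) leaves both vb2 and vb3 reachable, while
replay(1) leaves head.next pointing at the head itself, so a forward
walk finds no blocks even though vb2 and vb3 are still linked through
head.prev.
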
This issue affects both erofs and f2fs; the stack traces are as follows:
erofs:
[] __switch_to+0x174
[] __schedule+0x624
[] schedule+0x7c
[] schedule_preempt_disabled+0x24
[] __mutex_lock+0x374
[] __mutex_lock_slowpath+0x14
[] mutex_lock+0x24
[] reclaim_and_purge_vmap_areas+0x44
[] alloc_vmap_area+0x2e0
[] vm_map_ram+0x1b0
[] z_erofs_lz4_decompress+0x278
[] z_erofs_decompress_queue+0x650
[] z_erofs_runqueue+0x7f4
[] z_erofs_read_folio+0x104
[] filemap_read_folio+0x6c
[] filemap_fault+0x300
[] __do_fault+0xc8
[] handle_mm_fault+0xb38
[] do_page_fault+0x288
[] do_translation_fault[jt]+0x40
[] do_mem_abort+0x58
[] el0_ia+0x70
[] el0t_64_sync_handler[jt]+0xb0
[] ret_to_user[jt]+0x0

f2fs:
[] __switch_to+0x174
[] __schedule+0x624
[] schedule+0x7c
[] schedule_preempt_disabled+0x24
[] __mutex_lock+0x374
[] __mutex_lock_slowpath+0x14
[] mutex_lock+0x24
[] reclaim_and_purge_vmap_areas+0x44
[] alloc_vmap_area+0x2e0
[] vm_map_ram+0x1b0
[] f2fs_prepare_decomp_mem+0x144
[] f2fs_alloc_dic+0x264
[] f2fs_read_multi_pages+0x428
[] f2fs_mpage_readpages+0x314
[] f2fs_readahead+0x50
[] read_pages+0x80
[] page_cache_ra_unbounded+0x1a0
[] page_cache_ra_order+0x274
[] do_sync_mmap_readahead+0x11c
[] filemap_fault+0x1a0
[] f2fs_filemap_fault+0x28
[] __do_fault+0xc8
[] handle_mm_fault+0xb38
[] do_page_fault+0x288
[] do_translation_fault[jt]+0x40
[] do_mem_abort+0x58
[] el0_ia+0x70
[] el0t_64_sync_handler[jt]+0xb0
[] ret_to_user[jt]+0x0

To fix this, record in a new vmap_block field (vb->cpu) the CPU whose
vmap_block_queue the vb is added to, and have purge_fragmented_block()
derive vbq from vb->cpu instead of taking it from the caller, so that
the lock taken always matches the free list being modified.

Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
Cc: stable@vger.kernel.org
Suggested-by: Hailong.Liu
Signed-off-by: Zhaoyang Huang
---
v2: introduce cpu in vmap_block to record the right CPU number
v3: use get_cpu/put_cpu to prevent scheduling between cores
v4: replace get_cpu/put_cpu by another API to avoid disabling preemption
v5: update the commit message by Hailong.Liu
---
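As a reading aid for the diff that follows, the idea from the v2 note
(store the owning CPU in the block and look the queue up from it) can be
modelled in a few lines of userspace C. This is a sketch under stated
assumptions, not the kernel implementation: queue_sim, block_sim,
NR_QUEUES and all function names are hypothetical, and the spinlock is
reduced to a flag so that the model can report whether the list being
modified was covered by the lock that was taken.

```c
#include <assert.h>

#define NR_QUEUES 4	/* hypothetical CPU count for this model */

/* Userspace stand-in for a per-CPU vmap_block_queue. */
struct queue_sim {
	int lock_held;	/* 1 while this queue's lock is "held" */
	int nr_free;	/* number of blocks on this queue's free list */
};

/* Stand-in for struct vmap_block with the field the patch adds. */
struct block_sim {
	unsigned int cpu;	/* queue whose free list holds this block */
};

static struct queue_sim queues[NR_QUEUES];

/* Creation path: record the queue once, then add the block to it. */
static void block_create(struct block_sim *vb, unsigned int creating_cpu)
{
	assert(creating_cpu < NR_QUEUES);
	vb->cpu = creating_cpu;
	queues[vb->cpu].nr_free++;
}

/*
 * Remove the block from its owner's free list while holding the lock of
 * `locked`. Returns 1 if the modified list was covered by the held lock.
 */
static int purge_under(struct block_sim *vb, struct queue_sim *locked)
{
	struct queue_sim *owner = &queues[vb->cpu];
	int covered;

	locked->lock_held = 1;
	covered = owner->lock_held;
	owner->nr_free--;
	locked->lock_held = 0;
	return covered;
}

/* Pre-patch shape: the queue comes from whoever is walking the xarray. */
static int purge_with_caller_queue(struct block_sim *vb,
				   unsigned int walking_cpu)
{
	return purge_under(vb, &queues[walking_cpu]);
}

/* Patched shape: the queue is derived from the block itself. */
static int purge_with_owner_queue(struct block_sim *vb)
{
	return purge_under(vb, &queues[vb->cpu]);
}
```

For a block created on queue 1, purge_with_caller_queue(&vb, 0) reports
the list as unprotected, whereas purge_with_owner_queue(&vb) always
takes the lock of the queue recorded in vb->cpu.
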
---
 mm/vmalloc.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 22aa63f4ef63..89eb034f4ac6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2458,6 +2458,7 @@ struct vmap_block {
 	struct list_head free_list;
 	struct rcu_head rcu_head;
 	struct list_head purge;
+	unsigned int cpu;
 };
 
 /* Queue of free and dirty vmap blocks, for allocation and flushing purposes */
@@ -2585,8 +2586,15 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 		free_vmap_area(va);
 		return ERR_PTR(err);
 	}
-
-	vbq = raw_cpu_ptr(&vmap_block_queue);
+	/*
+	 * list_add_tail_rcu could happened in another core
+	 * rather than vb->cpu due to task migration, which
+	 * is safe as list_add_tail_rcu will ensure the list's
+	 * integrity together with list_for_each_rcu from read
+	 * side.
+	 */
+	vb->cpu = raw_smp_processor_id();
+	vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu);
 	spin_lock(&vbq->lock);
 	list_add_tail_rcu(&vb->free_list, &vbq->free);
 	spin_unlock(&vbq->lock);
@@ -2614,9 +2622,10 @@ static void free_vmap_block(struct vmap_block *vb)
 }
 
 static bool purge_fragmented_block(struct vmap_block *vb,
-		struct vmap_block_queue *vbq, struct list_head *purge_list,
-		bool force_purge)
+		struct list_head *purge_list, bool force_purge)
 {
+	struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, vb->cpu);
+
 	if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
 	    vb->dirty == VMAP_BBMAP_BITS)
 		return false;
@@ -2664,7 +2673,7 @@ static void purge_fragmented_blocks(int cpu)
 			continue;
 
 		spin_lock(&vb->lock);
-		purge_fragmented_block(vb, vbq, &purge, true);
+		purge_fragmented_block(vb, &purge, true);
 		spin_unlock(&vb->lock);
 	}
 	rcu_read_unlock();
@@ -2801,7 +2810,7 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
 			 * not purgeable, check whether there is dirty
 			 * space to be flushed.
 			 */
-			if (!purge_fragmented_block(vb, vbq, &purge_list, false) &&
+			if (!purge_fragmented_block(vb, &purge_list, false) &&
 			    vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
 				unsigned long va_start = vb->va->va_start;
 				unsigned long s, e;