From patchwork Sun Jan 19 10:32:05 2025
X-Patchwork-Submitter: Mateusz Guzik
X-Patchwork-Id: 13944405
From: Mateusz Guzik <mjguzik@gmail.com>
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, tavianator@tavianator.com, linux-mm@kvack.org,
 akpm@linux-foundation.org, Mateusz Guzik <mjguzik@gmail.com>
Subject: [RESEND PATCH] fs: avoid mmap sem relocks when coredumping with many
 missing pages
Date: Sun, 19 Jan 2025 11:32:05 +0100
Message-ID: <20250119103205.2172432-1-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
MIME-Version: 1.0
Dumping processes with large allocated and mostly not-faulted areas is
very slow.

Borrowing a test case from Tavian Barnes:

int main(void) {
	char *mem = mmap(NULL, 1ULL << 40, PROT_READ | PROT_WRITE,
			MAP_ANONYMOUS | MAP_NORESERVE | MAP_PRIVATE, -1, 0);
	printf("%p %m\n", mem);
	if (mem != MAP_FAILED) {
		mem[0] = 1;
	}

	abort();
}

That's 1TB of almost completely not-populated area.

On my test box it takes 13-14 seconds to dump. The profile shows:

-   99.89%     0.00%  a.out
       entry_SYSCALL_64_after_hwframe
       do_syscall_64
       syscall_exit_to_user_mode
       arch_do_signal_or_restart
     - get_signal
        - 99.89% do_coredump
           - 99.88% elf_core_dump
              - dump_user_range
                 - 98.12% get_dump_page
                    - 64.19% __get_user_pages
                       - 40.92% gup_vma_lookup
                          - find_vma
                               mt_find
                         4.21% __rcu_read_lock
                         1.33% __rcu_read_unlock
                       - 3.14% check_vma_flags
                            0.68% vma_is_secretmem
                         0.61% __cond_resched
                         0.60% vma_pgtable_walk_end
                         0.59% vma_pgtable_walk_begin
                         0.58% no_page_table
                    - 15.13% down_read_killable
                         0.69% __cond_resched
                     13.84% up_read
                      0.58% __cond_resched

Almost 29% of the time is spent relocking the mmap semaphore between
calls to get_dump_page() which find nothing.

Whacking that results in times of 10 seconds (down from 13-14).

While here make the thing killable.

The real problem is the page-sized iteration and the real fix would
patch it up instead. It is left as an exercise for the mm-familiar
reader.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
Minimally tested, very plausible I missed something.

Sent again because the previous one had myself in To -- I failed to fix
up the oneliner suggested by lore.kernel.org. It seems the original got
lost.

 arch/arm64/kernel/elfcore.c |  3 ++-
 fs/coredump.c               | 38 ++++++++++++++++++++++++++++++------
 include/linux/mm.h          |  2 +-
 mm/gup.c                    |  5 ++---
 4 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kernel/elfcore.c b/arch/arm64/kernel/elfcore.c
index 2e94d20c4ac7..b735f4c2fe5e 100644
--- a/arch/arm64/kernel/elfcore.c
+++ b/arch/arm64/kernel/elfcore.c
@@ -27,9 +27,10 @@ static int mte_dump_tag_range(struct coredump_params *cprm,
 	int ret = 1;
 	unsigned long addr;
 	void *tags = NULL;
+	int locked = 0;
 
 	for (addr = start; addr < start + len; addr += PAGE_SIZE) {
-		struct page *page = get_dump_page(addr);
+		struct page *page = get_dump_page(addr, &locked);
 
 		/*
 		 * get_dump_page() returns NULL when encountering an empty
diff --git a/fs/coredump.c b/fs/coredump.c
index d48edb37bc35..84cf76f0d5b6 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -925,14 +925,23 @@ int dump_user_range(struct coredump_params *cprm, unsigned long start,
 {
 	unsigned long addr;
 	struct page *dump_page;
+	int locked, ret;
 
 	dump_page = dump_page_alloc();
 	if (!dump_page)
 		return 0;
 
+	ret = 0;
+	locked = 0;
 	for (addr = start; addr < start + len; addr += PAGE_SIZE) {
 		struct page *page;
 
+		if (!locked) {
+			if (mmap_read_lock_killable(current->mm))
+				goto out;
+			locked = 1;
+		}
+
 		/*
 		 * To avoid having to allocate page tables for virtual address
 		 * ranges that have never been used yet, and also to make it
@@ -940,21 +949,38 @@ int dump_user_range(struct coredump_params *cprm, unsigned long start,
 		 * NULL when encountering an empty page table entry that would
 		 * otherwise have been filled with the zero page.
 		 */
-		page = get_dump_page(addr);
+		page = get_dump_page(addr, &locked);
 		if (page) {
+			if (locked) {
+				mmap_read_unlock(current->mm);
+				locked = 0;
+			}
 			int stop = !dump_emit_page(cprm, dump_page_copy(page, dump_page));
 			put_page(page);
-			if (stop) {
-				dump_page_free(dump_page);
-				return 0;
-			}
+			if (stop)
+				goto out;
 		} else {
 			dump_skip(cprm, PAGE_SIZE);
 		}
+
+		if (dump_interrupted())
+			goto out;
+
+		if (!need_resched())
+			continue;
+		if (locked) {
+			mmap_read_unlock(current->mm);
+			locked = 0;
+		}
 		cond_resched();
 	}
+	ret = 1;
+out:
+	if (locked)
+		mmap_read_unlock(current->mm);
 	dump_page_free(dump_page);
-	return 1;
+	return ret;
 }
 #endif
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 75c9b4f46897..7df0d9200d8c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2633,7 +2633,7 @@ int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
 			struct task_struct *task, bool bypass_rlim);
 
 struct kvec;
-struct page *get_dump_page(unsigned long addr);
+struct page *get_dump_page(unsigned long addr, int *locked);
 
 bool folio_mark_dirty(struct folio *folio);
 bool folio_mark_dirty_lock(struct folio *folio);
diff --git a/mm/gup.c b/mm/gup.c
index 2304175636df..f3be2aa43543 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2266,13 +2266,12 @@ EXPORT_SYMBOL(fault_in_readable);
  * Called without mmap_lock (takes and releases the mmap_lock by itself).
  */
 #ifdef CONFIG_ELF_CORE
-struct page *get_dump_page(unsigned long addr)
+struct page *get_dump_page(unsigned long addr, int *locked)
 {
 	struct page *page;
-	int locked = 0;
 	int ret;
 
-	ret = __get_user_pages_locked(current->mm, addr, 1, &page, &locked,
+	ret = __get_user_pages_locked(current->mm, addr, 1, &page, locked,
 				      FOLL_FORCE | FOLL_DUMP | FOLL_GET);
 	return (ret == 1) ? page : NULL;
 }