From patchwork Tue Oct 25 20:52:45 2022
X-Patchwork-Submitter: Catalin Marinas
X-Patchwork-Id: 13019898
From: Catalin Marinas
To: Linus Torvalds, Arnd Bergmann
Cc: Will Deacon, Marc Zyngier, Greg Kroah-Hartman, Andrew Morton,
 Herbert Xu, Ard Biesheuvel, Christoph Hellwig, Isaac Manjarres,
 Saravana Kannan, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2 0/2] mm: Allow kmalloc() allocations below ARCH_KMALLOC_MINALIGN
Date: Tue, 25 Oct 2022 21:52:45 +0100
Message-Id: <20221025205247.3264568-1-catalin.marinas@arm.com>

Hi,

I called this 'v2' but it's a different incarnation of the previous
attempt to reduce ARCH_KMALLOC_MINALIGN for arm64 (and please treat it
as an RFC):

  https://lore.kernel.org/all/20220405135758.774016-1-catalin.marinas@arm.com/

The long discussion in the thread above seemed to converge towards
bouncing the DMA buffers that come from small kmalloc() caches.
However, there are a few problems with such an approach:

- We still need ARCH_KMALLOC_MINALIGN to match the smaller kmalloc()
  size, but this breaks crypto without changing the structure alignments
  to the larger ARCH_DMA_MINALIGN (not agreed upon yet).

- Not all systems, especially mobile devices, have a swiotlb buffer
  (most boot with 'noswiotlb').
  Even if we allow a small bounce buffer, it's hard to come up with a
  size that would please the mobile vendors.

- An alternative to the default bounce buffer is to use another
  kmalloc() for the bouncing, though this adds overhead and potential
  allocation failures, and DMA unmapping cannot easily be tracked back
  to the bounce kmalloc'ed buffer (no simple is_swiotlb_buffer() check).

- The IOMMU needs bouncing as well when cache maintenance is required.
  This gets pretty complicated in the presence of scatterlists
  (probably doable with enough dedication, but the additional
  scatterlist scanning would add overhead).

- Custom dma_ops: there are not too many, but they'd also need the
  bouncing logic.

- It is undecided whether we should bounce only small sizes or any
  unaligned size. The latter would end up bouncing large-ish buffers
  (network), but it has the advantage that we don't need to change the
  crypto code if ARCH_KMALLOC_MINALIGN becomes 8 (well, it would bounce
  all the DMA until the code is changed).

So, with this patchset, I've given up on trying to sort out the
bouncing and gone for an explicit opt-in to smaller kmalloc() sizes via
a newly introduced __GFP_PACKED flag, which returns objects aligned to
KMALLOC_PACKED_ALIGN (8 bytes with slub, 32 with slab). The second
patch sprinkles the kernel with gfp flag updates in places where hot
allocations show up on my hardware (the commit log has instructions on
how to identify them).
On a Cavium TX2, after boot, I get:

  kmalloc-128        24608  24608    128   32    1 : tunables    0    0    0 : slabdata    769    769      0
  kmalloc-96          9408   9660     96   42    1 : tunables    0    0    0 : slabdata    230    230      0
  kmalloc-64          9594   9728     64   64    1 : tunables    0    0    0 : slabdata    152    152      0
  kmalloc-32         14052  14464     32  128    1 : tunables    0    0    0 : slabdata    113    113      0
  kmalloc-16         26368  26368     16  256    1 : tunables    0    0    0 : slabdata    103    103      0
  kmalloc-8          36352  36352      8  512    1 : tunables    0    0    0 : slabdata     71     71      0

It's still not ideal, and I suspect there are some early allocations
made even before ftrace can identify them, but this is a significant
improvement without breaking any of the existing cases.

To me, the ideal approach would be __dma annotations on pointers aimed
at DMA, using kernel tools like sparse to identify them. A
dma_kmalloc() would return such pointers. However, things get muddier
with scatterlists, since functions like sg_set_page() don't carry any
such pointer information (can we mark the offset as __dma instead?).

Comments/suggestions are welcome but, as per the above, I'd like to
stay away from bouncing if possible.

Thanks.
Catalin Marinas (2):
  mm: slab: Introduce __GFP_PACKED for smaller kmalloc() alignments
  treewide: Add the __GFP_PACKED flag to several non-DMA kmalloc()
    allocations

 drivers/usb/core/message.c           |  3 ++-
 fs/binfmt_elf.c                      |  6 ++++--
 fs/dcache.c                          |  3 ++-
 fs/ext4/dir.c                        |  4 ++--
 fs/ext4/extents.c                    |  4 ++--
 fs/file.c                            |  2 +-
 fs/kernfs/file.c                     |  8 ++++----
 fs/nfs/dir.c                         |  7 ++++---
 fs/nfs/inode.c                       |  2 +-
 fs/nfs/nfs4state.c                   |  2 +-
 fs/nfs/write.c                       |  3 ++-
 fs/notify/inotify/inotify_fsnotify.c |  3 ++-
 fs/proc/self.c                       |  2 +-
 fs/seq_file.c                        |  5 +++--
 include/acpi/platform/aclinuxex.h    |  6 ++++--
 include/linux/gfp_types.h            | 10 ++++++++--
 include/linux/slab.h                 | 22 ++++++++++++++++++----
 kernel/trace/ftrace.c                |  2 +-
 kernel/trace/tracing_map.c           |  2 +-
 lib/kasprintf.c                      |  2 +-
 lib/kobject.c                        |  4 ++--
 lib/mpi/mpiutil.c                    |  2 +-
 mm/list_lru.c                        |  6 ++++--
 mm/memcontrol.c                      |  4 ++--
 mm/slab_common.c                     |  3 ++-
 mm/util.c                            |  6 +++---
 mm/vmalloc.c                         |  6 ++++--
 net/sunrpc/auth_unix.c               |  2 +-
 net/sunrpc/xdr.c                     |  2 +-
 security/apparmor/lsm.c              |  2 +-
 security/security.c                  |  4 ++--
 security/tomoyo/realpath.c           |  2 +-
 32 files changed, 88 insertions(+), 53 deletions(-)