From patchwork Thu Feb 13 08:55:37 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Juergen Gross X-Patchwork-Id: 13972954 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9CA67C0219D for ; Thu, 13 Feb 2025 08:56:01 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.887230.1296768 (Exim 4.92) (envelope-from ) id 1tiV0i-00005I-7P; Thu, 13 Feb 2025 08:55:52 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 887230.1296768; Thu, 13 Feb 2025 08:55:52 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1tiV0i-00005B-4A; Thu, 13 Feb 2025 08:55:52 +0000 Received: by outflank-mailman (input) for mailman id 887230; Thu, 13 Feb 2025 08:55:50 +0000 Received: from se1-gles-flk1-in.inumbo.com ([94.247.172.50] helo=se1-gles-flk1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1tiV0g-0008Vs-Q4 for xen-devel@lists.xenproject.org; Thu, 13 Feb 2025 08:55:50 +0000 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) by se1-gles-flk1.inumbo.com (Halon) with ESMTPS id 518d49b8-e9e8-11ef-b3ef-695165c68f79; Thu, 13 Feb 2025 09:55:48 +0100 (CET) Received: from imap1.dmz-prg2.suse.org (unknown [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id DA9FF1FBAA; Thu, 13 Feb 2025 08:55:46 +0000 (UTC) Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 9EB01137DB; Thu, 13 Feb 2025 08:55:46 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id G3hNJZKzrWcJdgAAD6G6ig (envelope-from ); Thu, 13 Feb 2025 08:55:46 +0000 X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 518d49b8-e9e8-11ef-b3ef-695165c68f79 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1739436947; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5roWTjW+RnJz72UMkoQG0UFH3GgE3qHAOCHBzpPQNeA=; b=MQyTHHh5vUOEGZpbG10NkPrZFDPNgDK9dtuo4Tva8KNikzMhp8LfojnVGJ0RYI/qE4HiVf 8Vz2UOz9xkbgG2AQIBE7WOkxcQ72JO+LQB4izZbu+g7H4OY36bN0Ve8w/MQuWwWRHOgE8t F8Qq/p2PCtw8MjiL5DF5nEXYLOxgp6c= Authentication-Results: smtp-out2.suse.de; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1739436946; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5roWTjW+RnJz72UMkoQG0UFH3GgE3qHAOCHBzpPQNeA=; b=CCQRalYxu3fSyIQ1BlZYGQvIsj3ZC/bYV+G30yTRsLllQEMMLXJS4otXsEdjzCwccghlWq K1DYdyPg4m2xIUykGpcgXIjSju4/BJS5IEraQvxOa5Fy0Agdjsjfnwjs2SqwgbaRimmBbV 6m87dlOGUhQrBeIRBKiDJjugPIKKK98= From: Juergen Gross To: linux-kernel@vger.kernel.org, iommu@lists.linux.dev Cc: Juergen Gross , Stefano Stabellini , Oleksandr Tyshchenko , xen-devel@lists.xenproject.org, Jan Vejvalka Subject: [PATCH v2 1/2] xen/swiotlb: relax alignment requirements Date: Thu, 13 Feb 2025 09:55:37 +0100 Message-ID: <20250213085538.17060-2-jgross@suse.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250213085538.17060-1-jgross@suse.com> References: <20250213085538.17060-1-jgross@suse.com> MIME-Version: 1.0 X-Spamd-Result: default: False [-2.80 / 50.00]; BAYES_HAM(-3.00)[100.00%]; MID_CONTAINS_FROM(1.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; R_MISSING_CHARSET(0.50)[]; NEURAL_HAM_SHORT(-0.20)[-0.999]; MIME_GOOD(-0.10)[text/plain]; RCPT_COUNT_SEVEN(0.00)[7]; RCVD_VIA_SMTP_AUTH(0.00)[]; MIME_TRACE(0.00)[0:+]; ARC_NA(0.00)[]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; FUZZY_BLOCKED(0.00)[rspamd.com]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[imap1.dmz-prg2.suse.org:helo]; RCVD_TLS_ALL(0.00)[] When mapping a buffer for DMA via .map_page or .map_sg DMA operations, there is no need to check the machine frames to be aligned according to the mapped areas size. All what is needed in these cases is that the buffer is contiguous at machine level. So carve out the alignment check from range_straddles_page_boundary() and move it to a helper called by xen_swiotlb_alloc_coherent() and xen_swiotlb_free_coherent() directly. Fixes: 9f40ec84a797 ("xen/swiotlb: add alignment check for dma buffers") Reported-by: Jan Vejvalka Tested-by: Jan Vejvalka Signed-off-by: Juergen Gross Reviewed-by: Stefano Stabellini --- drivers/xen/swiotlb-xen.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index a337edcf8faf..26c62e0d34e9 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -74,19 +74,21 @@ static inline phys_addr_t xen_dma_to_phys(struct device *dev, return xen_bus_to_phys(dev, dma_to_phys(dev, dma_addr)); } +static inline bool range_requires_alignment(phys_addr_t p, size_t size) +{ + phys_addr_t algn = 1ULL << (get_order(size) + PAGE_SHIFT); + phys_addr_t bus_addr = pfn_to_bfn(XEN_PFN_DOWN(p)) << XEN_PAGE_SHIFT; + + return IS_ALIGNED(p, algn) && !IS_ALIGNED(bus_addr, algn); +} + static inline int range_straddles_page_boundary(phys_addr_t p, size_t size) { unsigned long next_bfn, xen_pfn = XEN_PFN_DOWN(p); unsigned int i, nr_pages = XEN_PFN_UP(xen_offset_in_page(p) + size); - phys_addr_t algn = 1ULL << (get_order(size) + PAGE_SHIFT); next_bfn = pfn_to_bfn(xen_pfn); - /* If buffer is physically aligned, ensure DMA alignment. */ - if (IS_ALIGNED(p, algn) && - !IS_ALIGNED((phys_addr_t)next_bfn << XEN_PAGE_SHIFT, algn)) - return 1; - for (i = 1; i < nr_pages; i++) if (pfn_to_bfn(++xen_pfn) != ++next_bfn) return 1; @@ -156,7 +158,8 @@ xen_swiotlb_alloc_coherent(struct device *dev, size_t size, *dma_handle = xen_phys_to_dma(dev, phys); if (*dma_handle + size - 1 > dma_mask || - range_straddles_page_boundary(phys, size)) { + range_straddles_page_boundary(phys, size) || + range_requires_alignment(phys, size)) { if (xen_create_contiguous_region(phys, order, fls64(dma_mask), dma_handle) != 0) goto out_free_pages; @@ -182,7 +185,8 @@ xen_swiotlb_free_coherent(struct device *dev, size_t size, void *vaddr, size = ALIGN(size, XEN_PAGE_SIZE); if (WARN_ON_ONCE(dma_handle + size - 1 > dev->coherent_dma_mask) || - WARN_ON_ONCE(range_straddles_page_boundary(phys, size))) + WARN_ON_ONCE(range_straddles_page_boundary(phys, size) || + range_requires_alignment(phys, size))) return; if (TestClearPageXenRemapped(virt_to_page(vaddr))) From patchwork Thu Feb 13 08:55:38 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Juergen Gross X-Patchwork-Id: 13972956 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6CD1CC0219D for ; Thu, 13 Feb 2025 08:56:05 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.887232.1296779 (Exim 4.92) (envelope-from ) id 1tiV0m-0000Nk-IS; Thu, 13 Feb 2025 08:55:56 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 887232.1296779; Thu, 13 Feb 2025 08:55:56 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1tiV0m-0000Nd-Dd; Thu, 13 Feb 2025 08:55:56 +0000 Received: by outflank-mailman (input) for mailman id 887232; Thu, 13 Feb 2025 08:55:54 +0000 Received: from se1-gles-flk1-in.inumbo.com ([94.247.172.50] helo=se1-gles-flk1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1tiV0k-0008Vs-Lu for xen-devel@lists.xenproject.org; Thu, 13 Feb 2025 08:55:54 +0000 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) by se1-gles-flk1.inumbo.com (Halon) with ESMTPS id 54634b49-e9e8-11ef-b3ef-695165c68f79; Thu, 13 Feb 2025 09:55:53 +0100 (CET) Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 9D522229A7; Thu, 13 Feb 2025 08:55:52 +0000 (UTC) Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 49F83137DB; Thu, 13 Feb 2025 08:55:52 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id M4uOEJizrWcTdgAAD6G6ig (envelope-from ); Thu, 13 Feb 2025 08:55:52 +0000 X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 54634b49-e9e8-11ef-b3ef-695165c68f79 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1739436952; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RIJbT7p+RWBm8N1M1rf2T5LjDA7WDiFc9SjxPhO+fQ0=; b=iAFR59lbL84z0ot4Z/yIZP3ZboFOospn171yHp7M1F2cUBYOPkUaVKKceH4Ihxo77o4UbP WeHu04klIY8m25kkbE75/5C9NPY7I0z1cFzE2HtM5zQ4IMowd0yl+nArOAbsvwLy7L0Rvj /mlyH84br7HmAF/poky0Y9fAsR8cKec= Authentication-Results: smtp-out1.suse.de; dkim=pass header.d=suse.com header.s=susede1 header.b=iAFR59lb DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1739436952; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RIJbT7p+RWBm8N1M1rf2T5LjDA7WDiFc9SjxPhO+fQ0=; b=iAFR59lbL84z0ot4Z/yIZP3ZboFOospn171yHp7M1F2cUBYOPkUaVKKceH4Ihxo77o4UbP WeHu04klIY8m25kkbE75/5C9NPY7I0z1cFzE2HtM5zQ4IMowd0yl+nArOAbsvwLy7L0Rvj /mlyH84br7HmAF/poky0Y9fAsR8cKec= From: Juergen Gross To: linux-kernel@vger.kernel.org, x86@kernel.org Cc: Juergen Gross , Boris Ostrovsky , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , xen-devel@lists.xenproject.org, Alan Robinson Subject: [PATCH v2 2/2] x86/xen: allow larger contiguous memory regions in PV guests Date: Thu, 13 Feb 2025 09:55:38 +0100 Message-ID: <20250213085538.17060-3-jgross@suse.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250213085538.17060-1-jgross@suse.com> References: <20250213085538.17060-1-jgross@suse.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 9D522229A7 X-Rspamd-Action: no action X-Spamd-Result: default: False [-3.01 / 50.00]; BAYES_HAM(-3.00)[100.00%]; NEURAL_HAM_LONG(-1.00)[-1.000]; MID_CONTAINS_FROM(1.00)[]; R_MISSING_CHARSET(0.50)[]; R_DKIM_ALLOW(-0.20)[suse.com:s=susede1]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[imap1.dmz-prg2.suse.org:rdns,imap1.dmz-prg2.suse.org:helo]; RCVD_VIA_SMTP_AUTH(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; ARC_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; FUZZY_BLOCKED(0.00)[rspamd.com]; RCVD_TLS_ALL(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; RCPT_COUNT_SEVEN(0.00)[11]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; TO_DN_SOME(0.00)[]; DKIM_TRACE(0.00)[suse.com:+] X-Rspamd-Server: rspamd1.dmz-prg2.suse.org Today a PV guest (including dom0) can create 2MB contiguous memory regions for DMA buffers at max. This has led to problems at least with the megaraid_sas driver, which wants to allocate a 2.3MB DMA buffer. The limiting factor is the frame array used to do the hypercall for making the memory contiguous, which has 512 entries and is just a static array in mmu_pv.c. In order to not waste memory for non-PV guests, put the initial frame array into .init.data section and dynamically allocate an array from the .init_after_bootmem hook of PV guests. In case a contiguous memory area larger than the initially supported 2MB is requested, allocate a larger buffer for the frame list. Note that such an allocation is tried only after memory management has been initialized properly, which is tested via a flag being set in the .init_after_bootmem hook. Fixes: 9f40ec84a797 ("xen/swiotlb: add alignment check for dma buffers") Signed-off-by: Juergen Gross Tested-by: Alan Robinson Reviewed-by: Jan Beulich --- Note that the "Fixes:" tag is not really correct, as that patch didn't introduce the problem, but rather made it visible. OTOH it is the best indicator we have to identify kernel versions this patch should be backported to. --- arch/x86/xen/mmu_pv.c | 71 +++++++++++++++++++++++++++++++++++++------ 1 file changed, 62 insertions(+), 9 deletions(-) diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c index 2c70cd35e72c..d078de2c952b 100644 --- a/arch/x86/xen/mmu_pv.c +++ b/arch/x86/xen/mmu_pv.c @@ -111,6 +111,51 @@ static pud_t level3_user_vsyscall[PTRS_PER_PUD] __page_aligned_bss; */ static DEFINE_SPINLOCK(xen_reservation_lock); +/* Protected by xen_reservation_lock. */ +#define MIN_CONTIG_ORDER 9 /* 2MB */ +static unsigned int discontig_frames_order = MIN_CONTIG_ORDER; +static unsigned long discontig_frames_early[1UL << MIN_CONTIG_ORDER] __initdata; +static unsigned long *discontig_frames __refdata = discontig_frames_early; +static bool discontig_frames_dyn; + +static int alloc_discontig_frames(unsigned int order) +{ + unsigned long *new_array, *old_array; + unsigned int old_order; + unsigned long flags; + + BUG_ON(order < MIN_CONTIG_ORDER); + BUILD_BUG_ON(sizeof(discontig_frames_early) != PAGE_SIZE); + + new_array = (unsigned long *)__get_free_pages(GFP_KERNEL, + order - MIN_CONTIG_ORDER); + if (!new_array) + return -ENOMEM; + + spin_lock_irqsave(&xen_reservation_lock, flags); + + old_order = discontig_frames_order; + + if (order > discontig_frames_order || !discontig_frames_dyn) { + if (!discontig_frames_dyn) + old_array = NULL; + else + old_array = discontig_frames; + + discontig_frames = new_array; + discontig_frames_order = order; + discontig_frames_dyn = true; + } else { + old_array = new_array; + } + + spin_unlock_irqrestore(&xen_reservation_lock, flags); + + free_pages((unsigned long)old_array, old_order - MIN_CONTIG_ORDER); + + return 0; +} + /* * Note about cr3 (pagetable base) values: * @@ -814,6 +859,9 @@ static void __init xen_after_bootmem(void) SetPagePinned(virt_to_page(level3_user_vsyscall)); #endif xen_pgd_walk(&init_mm, xen_mark_pinned, FIXADDR_TOP); + + if (alloc_discontig_frames(MIN_CONTIG_ORDER)) + BUG(); } static void xen_unpin_page(struct mm_struct *mm, struct page *page, @@ -2203,10 +2251,6 @@ void __init xen_init_mmu_ops(void) memset(dummy_mapping, 0xff, PAGE_SIZE); } -/* Protected by xen_reservation_lock. */ -#define MAX_CONTIG_ORDER 9 /* 2MB */ -static unsigned long discontig_frames[1< MAX_CONTIG_ORDER)) - return -ENOMEM; + if (unlikely(order > discontig_frames_order)) { + if (!discontig_frames_dyn) + return -ENOMEM; + + if (alloc_discontig_frames(order)) + return -ENOMEM; + } memset((void *) vstart, 0, PAGE_SIZE << order); spin_lock_irqsave(&xen_reservation_lock, flags); + in_frames = discontig_frames; + /* 1. Zap current PTEs, remembering MFNs. */ xen_zap_pfn_range(vstart, order, in_frames, NULL); @@ -2358,12 +2409,12 @@ int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order, void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order) { - unsigned long *out_frames = discontig_frames, in_frame; + unsigned long *out_frames, in_frame; unsigned long flags; int success; unsigned long vstart; - if (unlikely(order > MAX_CONTIG_ORDER)) + if (unlikely(order > discontig_frames_order)) return; vstart = (unsigned long)phys_to_virt(pstart); @@ -2371,6 +2422,8 @@ void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order) spin_lock_irqsave(&xen_reservation_lock, flags); + out_frames = discontig_frames; + /* 1. Find start MFN of contiguous extent. */ in_frame = virt_to_mfn((void *)vstart);