Message ID | 1550266182-4061-1-git-send-email-michael.d.labriola@gmail.com (mailing list archive)
---|---
State | New, archived
Series | [RFC] drm: add func to better detect whether swiotlb is needed
On Sat, Feb 16, 2019 at 2:00 PM Koenig, Christian <Christian.Koenig@amd.com> wrote:
>
> Am 15.02.19 um 22:29 schrieb Michael D Labriola:
> > This commit fixes DRM failures on Xen PV systems that were introduced in
> > v4.17 by the following commits:
> >
> > 82626363 drm: add func to get max iomem address v2
> > fd5fd480 drm/amdgpu: only enable swiotlb alloc when need v2
> > 1bc3d3cc drm/radeon: only enable swiotlb path when need v2
> >
> > The introduction of ->need_swiotlb to the ttm_dma_populate() conditionals
> > in the radeon and amdgpu device drivers causes Gnome to immediately crash
> > on Xen PV systems, returning the user to the login screen. The following
> > kernel errors get logged:
> >
> > [ 28.554259] radeon_dp_aux_transfer_native: 200 callbacks suppressed
> > [ 31.219821] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> > [ 31.220030] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> > [ 31.226109] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> > [ 31.226300] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> > [ 31.300734] gnome-shell[1935]: segfault at 88 ip 00007f39151cd904 sp 00007ffc97611ad8 error 4 in libmutter-cogl.so[7f3915178000+aa000]
> > [ 31.300745] Code: 5f c3 0f 1f 40 00 48 8b 47 78 48 8b 40 40 ff e0 66 0f 1f 44 00 00 48 8b 47 78 48 8b 40 48 ff e0 66 0f 1f 44 00 00 48 8b 47 78 <48> 8b 80 88 00 00 00 ff e0 0f 1f 00 48 8b 47 78 48 8b 40 68 ff e0
> > [ 38.193302] radeon_dp_aux_transfer_native: 116 callbacks suppressed
> > [ 40.009317] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> > [ 40.009488] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> > [ 40.015114] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> > [ 40.015297] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> > [ 40.028302] gnome-shell[2431]: segfault at 2dadf40 ip 0000000002dadf40 sp 00007ffcd24ea5f8 error 15
> > [ 40.028306] Code: 20 6e 31 00 00 00 00 00 00 00 00 37 e3 3d 2d 7f 00 00 80 f4 e6 3d 2d 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 c1 00 00 00 00 00 00 00 80 e1 d2 03 00 00
> >
> > This commit renames drm_get_max_iomem() to drm_need_swiotlb(), adds a
> > xen_pv_domain() check to it, and moves the bit shifting comparison that
> > always follows its usage into the function (simplifying the drm driver
> > code).
>
> Your Signed-off-by line is missing. Apart from that it looks like a good
> fix to me.

Oops, I can resend w/ signoff.

> But the question is why does Xen need dma_alloc_coherent() ?

I'm not sure I understand, even after discussing on xen-devel. What I was
told by Juergen (on CC) is:

"The thing which is different between Xen PV guests and most others (all
others(?), now that Lguest and UML have been dropped) is that what Linux
thinks of as PFN $N isn't necessarily adjacent to PFN $N+1 in system
physical address space.

Therefore, code which has a buffer spanning a page boundary can't just
convert a pointer to the buffer into a physical address, and hand that
address to a device. You generally end up with either memory corruption
(DMA hitting the wrong page allocated to the guest), or an IOMMU fault
(DMA hitting a page which isn't allocated to the guest)."

And Christoph (also on CC) pointed out:

"ttm_populate_and_map_pages uses dma_map_page to map GPU memory, and from
what I can tell from your report potentially lots of it. So this does
"properly" use the DMA API for some amount of "properly". The problem here
is that ttm_populate_and_map_pages first allocates memory and then maps it
in a way where it bounce buffers a lot, leading to swiotlb buffer
exhaustion, as seen in your report.

ttm_dma_populate also sort of "properly" uses the DMA API in that it uses
the dma_alloc_coherent allocator. The benefit of that allocator is that it
always returns addressable memory without exhausting the swiotlb buffer.
The downside of ttm_dma_populate / dma_alloc_coherent is that on
architectures where PCIe is not cache coherent it pointlessly uses up
other resources, as coherent DMA memory can be a very finite resource
there.

So for a short term fix forcing the dma_alloc_coherent route on Xen/x86 is
the right thing. On Xen/arm and Xen/arm64 it might already be problematic
due to the explanation above, unfortunately.

The real fix is to finally get a broadly available non-coherent device
memory allocator into mainline and switch ttm_populate_and_map_pages to
use it. Note that for non-coherent devices ttm_populate_and_map_pages also
seems to miss proper ownership management, but that is another issue.

My proposal for such an allocator here: https://lwn.net/Articles/774429/"

> Using dma_alloc_coherent() is really just a fallback path and we need to
> disable a bunch of userspace features when going down that route.

So, for the short-term fix, I'll resubmit my patch w/ appropriate
sign-off. I'm all for making it work even better on Xen after this
band-aid is applied. I can help test out proposed patches on my systems
if that helps.

Thanks!

-Mike

--
Michael D Labriola
21 Rip Van Winkle Cir
Warwick, RI 02886
401-316-9844 (cell)
401-848-8871 (work)
401-234-1306 (home)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 910c4ce..6bc0266 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -1029,7 +1029,7 @@ static int gmc_v7_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		pr_warn("amdgpu: No coherent DMA available\n");
 	}
-	adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+	adev->need_swiotlb = drm_need_swiotlb(dma_bits);
 
 	r = gmc_v7_0_init_microcode(adev);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 747c068..8638adf 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1155,7 +1155,7 @@ static int gmc_v8_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		pr_warn("amdgpu: No coherent DMA available\n");
 	}
-	adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+	adev->need_swiotlb = drm_need_swiotlb(dma_bits);
 
 	r = gmc_v8_0_init_microcode(adev);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index f35d7a5..4f67093 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -989,7 +989,7 @@ static int gmc_v9_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
 	}
-	adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+	adev->need_swiotlb = drm_need_swiotlb(dma_bits);
 
 	if (adev->asic_type == CHIP_VEGA20) {
 		r = gfxhub_v1_1_get_xgmi_info(adev);
diff --git a/drivers/gpu/drm/drm_memory.c b/drivers/gpu/drm/drm_memory.c
index d69e4fc..6af59a6 100644
--- a/drivers/gpu/drm/drm_memory.c
+++ b/drivers/gpu/drm/drm_memory.c
@@ -35,6 +35,7 @@
 
 #include <linux/highmem.h>
 #include <linux/export.h>
+#include <xen/xen.h>
 #include <drm/drmP.h>
 #include "drm_legacy.h"
@@ -150,15 +151,27 @@ void drm_legacy_ioremapfree(struct drm_local_map *map, struct drm_device *dev)
 }
 EXPORT_SYMBOL(drm_legacy_ioremapfree);
 
-u64 drm_get_max_iomem(void)
+bool drm_need_swiotlb(int dma_bits)
 {
 	struct resource *tmp;
 	resource_size_t max_iomem = 0;
 
+	/*
+	 * Xen paravirtual hosts require swiotlb regardless of requested dma
+	 * transfer size.
+	 *
+	 * NOTE: Really, what it requires is use of the dma_alloc_coherent
+	 *       allocator used in ttm_dma_populate() instead of
+	 *       ttm_populate_and_map_pages(), which bounce buffers so much in
+	 *       Xen it leads to swiotlb buffer exhaustion.
+	 */
+	if (xen_pv_domain())
+		return true;
+
 	for (tmp = iomem_resource.child; tmp; tmp = tmp->sibling) {
 		max_iomem = max(max_iomem, tmp->end);
 	}
 
-	return max_iomem;
+	return max_iomem > ((u64)1 << dma_bits);
 }
-EXPORT_SYMBOL(drm_get_max_iomem);
+EXPORT_SYMBOL(drm_need_swiotlb);
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 59c8a66..a1d3c62 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1387,7 +1387,7 @@ int radeon_device_init(struct radeon_device *rdev,
 		pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(32));
 		pr_warn("radeon: No coherent DMA available\n");
 	}
-	rdev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+	rdev->need_swiotlb = drm_need_swiotlb(dma_bits);
 
 	/* Registers mapping */
 	/* TODO: block userspace mapping of io register */
diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
index bfe1639..633eaaf 100644
--- a/include/drm/drm_cache.h
+++ b/include/drm/drm_cache.h
@@ -38,7 +38,7 @@
 void drm_clflush_pages(struct page *pages[], unsigned long num_pages);
 void drm_clflush_sg(struct sg_table *st);
 void drm_clflush_virt_range(void *addr, unsigned long length);
-u64 drm_get_max_iomem(void);
+bool drm_need_swiotlb(int dma_bits);
 
 static inline bool drm_arch_can_wc_memory(void)