Message ID | 1523394001-4615-1-git-send-email-okaya@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
On 10/04/18 21:59, Sinan Kaya wrote:
> Code is expecting to observe the same number of buffers returned from
> the dma_map_sg() function compared to sg_alloc_table_from_pages(). This
> doesn't hold true universally, especially for systems with an IOMMU.

So why not fix said code? It's clearly not a real hardware limitation, and
the map_sg() APIs have potentially returned fewer than nents since forever,
so there's really no excuse.

> The IOMMU driver tries to combine buffers into a single DMA address as
> much as it can. The right thing is to tell the DMA layer how much
> combining the IOMMU can do.

Disagree; this is a dodgy hack, since you'll now end up passing
scatterlists into dma_map_sg() which already violate max_seg_size to begin
with, and I think a conscientious DMA API implementation would be within
its rights to fail the mapping for that reason (I know arm64 happens not
to, but that was a deliberate design decision to make my life easier at the
time).

As a short-term fix, at least do something like what i915 does and
constrain the table allocation to the desired segment size as well, so
things remain self-consistent. But still, never claim that faking a
hardware constraint as a workaround for a driver shortcoming is "the right
thing to do" ;)

Robin.
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
>  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 1 +
>  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 1 +
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 +
>  4 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> index 8e28270..1b031eb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
> @@ -851,7 +851,7 @@ static int gmc_v6_0_sw_init(void *handle)
>  		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
>  		dev_warn(adev->dev, "amdgpu: No coherent DMA available.\n");
>  	}
> -
> +	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
>  	r = gmc_v6_0_init_microcode(adev);
>  	if (r) {
>  		dev_err(adev->dev, "Failed to load mc firmware!\n");
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> index 86e9d682..0a4b2cc1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> @@ -999,6 +999,7 @@ static int gmc_v7_0_sw_init(void *handle)
>  		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
>  		pr_warn("amdgpu: No coherent DMA available\n");
>  	}
> +	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
>
>  	r = gmc_v7_0_init_microcode(adev);
>  	if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> index 9a813d8..b171529 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> @@ -1096,6 +1096,7 @@ static int gmc_v8_0_sw_init(void *handle)
>  		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
>  		pr_warn("amdgpu: No coherent DMA available\n");
>  	}
> +	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
>
>  	r = gmc_v8_0_init_microcode(adev);
>  	if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 3b7e7af..36e658ab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -855,6 +855,7 @@ static int gmc_v9_0_sw_init(void *handle)
>  		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
>  		printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
>  	}
> +	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
>
>  	r = gmc_v9_0_mc_init(adev);
>  	if (r)
>
On 4/11/2018 8:03 AM, Robin Murphy wrote:
> On 10/04/18 21:59, Sinan Kaya wrote:
>> Code is expecting to observe the same number of buffers returned from
>> the dma_map_sg() function compared to sg_alloc_table_from_pages(). This
>> doesn't hold true universally, especially for systems with an IOMMU.
>
> So why not fix said code? It's clearly not a real hardware limitation,
> and the map_sg() APIs have potentially returned fewer than nents since
> forever, so there's really no excuse.

Sure, I'll take a better fix if there is one.

>> The IOMMU driver tries to combine buffers into a single DMA address as
>> much as it can. The right thing is to tell the DMA layer how much
>> combining the IOMMU can do.
>
> Disagree; this is a dodgy hack, since you'll now end up passing
> scatterlists into dma_map_sg() which already violate max_seg_size to
> begin with, and I think a conscientious DMA API implementation would be
> within its rights to fail the mapping for that reason (I know arm64
> happens not to, but that was a deliberate design decision to make my
> life easier at the time).
>
> As a short-term fix, at least do something like what i915 does and
> constrain the table allocation to the desired segment size as well, so
> things remain self-consistent. But still, never claim that faking a
> hardware constraint as a workaround for a driver shortcoming is "the
> right thing to do" ;)

You are asking for something like this from here, right?

https://elixir.bootlin.com/linux/v4.16.1/source/drivers/gpu/drm/i915/i915_gem_dmabuf.c#L58

	ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);
	if (ret)
		goto err_free;

	src = obj->mm.pages->sgl;
	dst = st->sgl;
	for (i = 0; i < obj->mm.pages->nents; i++) {
		sg_set_page(dst, sg_page(src), src->length, 0);
		dst = sg_next(dst);
		src = sg_next(src);
	}

This seems to allocate the scatter-gather list and fill it in manually
before passing it to dma_map_sg(). I'll give it a try. Just double
checking.

> Robin.
>
>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
On 11/04/18 15:33, Sinan Kaya wrote:
> On 4/11/2018 8:03 AM, Robin Murphy wrote:
>> On 10/04/18 21:59, Sinan Kaya wrote:
>>> Code is expecting to observe the same number of buffers returned
>>> from the dma_map_sg() function compared to
>>> sg_alloc_table_from_pages(). This doesn't hold true universally,
>>> especially for systems with an IOMMU.
>>
>> So why not fix said code? It's clearly not a real hardware
>> limitation, and the map_sg() APIs have potentially returned fewer
>> than nents since forever, so there's really no excuse.
>
> Sure, I'll take a better fix if there is one.
>
>>> The IOMMU driver tries to combine buffers into a single DMA address
>>> as much as it can. The right thing is to tell the DMA layer how
>>> much combining the IOMMU can do.
>>
>> Disagree; this is a dodgy hack, since you'll now end up passing
>> scatterlists into dma_map_sg() which already violate max_seg_size
>> to begin with, and I think a conscientious DMA API implementation
>> would be within its rights to fail the mapping for that reason (I
>> know arm64 happens not to, but that was a deliberate design decision
>> to make my life easier at the time).
>>
>> As a short-term fix, at least do something like what i915 does and
>> constrain the table allocation to the desired segment size as well,
>> so things remain self-consistent. But still, never claim that faking
>> a hardware constraint as a workaround for a driver shortcoming is
>> "the right thing to do" ;)
>
> You are asking for something like this from here, right?
>
> https://elixir.bootlin.com/linux/v4.16.1/source/drivers/gpu/drm/i915/i915_gem_dmabuf.c#L58

I just looked for callers of __sg_alloc_table_from_pages() with a limit
other than SCATTERLIST_MAX_SEGMENT, and found
__i915_gem_userptr_alloc_pages(), which at first glance appears to be
doing something vaguely similar to amdgpu_ttm_tt_pin_userptr().

Robin.
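For reference, a rough sketch of the i915-style short-term fix being discussed: constrain the sg_table allocation itself to the device's segment-size limit via __sg_alloc_table_from_pages() (as in __i915_gem_userptr_alloc_pages()). This is kernel code and not compilable standalone; the surrounding variables (st, pages, num_pages, dev) are illustrative, not amdgpu's actual fields.

```c
/* Sketch only: allocate the sg_table with the same segment-size limit
 * the device advertises, so the list later handed to dma_map_sg()
 * never violates max_seg_size in the first place. */
struct sg_table *st;
int ret;

st = kzalloc(sizeof(*st), GFP_KERNEL);
if (!st)
	return -ENOMEM;

ret = __sg_alloc_table_from_pages(st, pages, num_pages,
				  0,				/* offset into first page */
				  (unsigned long)num_pages << PAGE_SHIFT,
				  dma_get_max_seg_size(dev),	/* per-segment cap */
				  GFP_KERNEL);
if (ret) {
	kfree(st);
	return ret;
}
```

The difference from plain sg_alloc_table_from_pages() is the explicit max_segment argument: contiguous pages are still coalesced, but never into a segment larger than the device's advertised limit, so the table stays self-consistent with what dma_map_sg() is allowed to produce.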
On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
> On 10/04/18 21:59, Sinan Kaya wrote:
>> Code is expecting to observe the same number of buffers returned from
>> the dma_map_sg() function compared to sg_alloc_table_from_pages(). This
>> doesn't hold true universally, especially for systems with an IOMMU.
>
> So why not fix said code? It's clearly not a real hardware limitation,
> and the map_sg() APIs have potentially returned fewer than nents since
> forever, so there's really no excuse.

Yes, relying on dma_map_sg() returning the same number of entries as
passed to it is completely bogus.

>> The IOMMU driver tries to combine buffers into a single DMA address as
>> much as it can. The right thing is to tell the DMA layer how much
>> combining the IOMMU can do.
>
> Disagree; this is a dodgy hack, since you'll now end up passing
> scatterlists into dma_map_sg() which already violate max_seg_size to
> begin with, and I think a conscientious DMA API implementation would be
> within its rights to fail the mapping for that reason (I know arm64
> happens not to, but that was a deliberate design decision to make my
> life easier at the time).

Agreed.
On 12.04.2018 at 08:26, Christoph Hellwig wrote:
> On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
>> On 10/04/18 21:59, Sinan Kaya wrote:
>>> Code is expecting to observe the same number of buffers returned from
>>> the dma_map_sg() function compared to sg_alloc_table_from_pages().
>>> This doesn't hold true universally, especially for systems with an
>>> IOMMU.
>> So why not fix said code? It's clearly not a real hardware limitation,
>> and the map_sg() APIs have potentially returned fewer than nents since
>> forever, so there's really no excuse.
> Yes, relying on dma_map_sg() returning the same number of entries as
> passed to it is completely bogus.

I agree that the common DRM functions should be able to deal with both,
but we should let the driver side decide if it wants multiple addresses
combined or not.

>>> The IOMMU driver tries to combine buffers into a single DMA address as
>>> much as it can. The right thing is to tell the DMA layer how much
>>> combining the IOMMU can do.
>> Disagree; this is a dodgy hack, since you'll now end up passing
>> scatterlists into dma_map_sg() which already violate max_seg_size to
>> begin with, and I think a conscientious DMA API implementation would be
>> within its rights to fail the mapping for that reason (I know arm64
>> happens not to, but that was a deliberate design decision to make my
>> life easier at the time).
> Agreed.

Sounds like my understanding of max_seg_size is incorrect; what exactly
should that describe?

Thanks,
Christian.
On 12/04/18 10:42, Christian König wrote:
> On 12.04.2018 at 08:26, Christoph Hellwig wrote:
>> On Wed, Apr 11, 2018 at 01:03:59PM +0100, Robin Murphy wrote:
>>> On 10/04/18 21:59, Sinan Kaya wrote:
>>>> Code is expecting to observe the same number of buffers returned
>>>> from the dma_map_sg() function compared to
>>>> sg_alloc_table_from_pages(). This doesn't hold true universally,
>>>> especially for systems with an IOMMU.
>>> So why not fix said code? It's clearly not a real hardware
>>> limitation, and the map_sg() APIs have potentially returned fewer
>>> than nents since forever, so there's really no excuse.
>> Yes, relying on dma_map_sg() returning the same number of entries as
>> passed to it is completely bogus.
>
> I agree that the common DRM functions should be able to deal with both,
> but we should let the driver side decide if it wants multiple addresses
> combined or not.
>
>>>> The IOMMU driver tries to combine buffers into a single DMA address
>>>> as much as it can. The right thing is to tell the DMA layer how
>>>> much combining the IOMMU can do.
>>> Disagree; this is a dodgy hack, since you'll now end up passing
>>> scatterlists into dma_map_sg() which already violate max_seg_size
>>> to begin with, and I think a conscientious DMA API implementation
>>> would be within its rights to fail the mapping for that reason (I
>>> know arm64 happens not to, but that was a deliberate design decision
>>> to make my life easier at the time).
>> Agreed.
>
> Sounds like my understanding of max_seg_size is incorrect; what exactly
> should that describe?

The segment size and boundary mask are there to represent a device's
hardware scatter-gather capabilities: a lot of things have limits on the
size and alignment of the data pointed to by a single descriptor (e.g. an
xHCI TRB, where the data buffer must not cross a 64KB boundary), although
they can also be relevant to non-scatter-gather devices if they just have
limits on how big/aligned a single DMA transfer can be.

Robin.
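To make the distinction concrete, here is a hedged sketch of a driver advertising *genuine* hardware limits of the kind Robin describes, such as the 64KB-boundary constraint of an xHCI TRB. Kernel code, not compilable standalone; the device and probe function are hypothetical.

```c
/* Sketch: a probe routine for a hypothetical device whose DMA
 * descriptors cover at most 64K of data and must not cross a 64K
 * boundary. These calls describe real hardware limits to the DMA
 * layer, rather than papering over a driver assumption. */
static int example_probe(struct device *dev)
{
	int ret;

	ret = dma_set_max_seg_size(dev, SZ_64K);	/* bytes per segment */
	if (ret)
		return ret;

	return dma_set_seg_boundary(dev, SZ_64K - 1);	/* no 64K crossing */
}
```

dma_map_sg() implementations and the scatterlist helpers may then split or merge entries only within these limits, which is exactly why declaring PAGE_SIZE for a device that has no such hardware limit is the "dodgy hack" objected to above.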
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
index 8e28270..1b031eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
@@ -851,7 +851,7 @@ static int gmc_v6_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		dev_warn(adev->dev, "amdgpu: No coherent DMA available.\n");
 	}
-
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 	r = gmc_v6_0_init_microcode(adev);
 	if (r) {
 		dev_err(adev->dev, "Failed to load mc firmware!\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 86e9d682..0a4b2cc1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -999,6 +999,7 @@ static int gmc_v7_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		pr_warn("amdgpu: No coherent DMA available\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 
 	r = gmc_v7_0_init_microcode(adev);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 9a813d8..b171529 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1096,6 +1096,7 @@ static int gmc_v8_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		pr_warn("amdgpu: No coherent DMA available\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 
 	r = gmc_v8_0_init_microcode(adev);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 3b7e7af..36e658ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -855,6 +855,7 @@ static int gmc_v9_0_sw_init(void *handle)
 		pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
 		printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
 	}
+	dma_set_max_seg_size(adev->dev, PAGE_SIZE);
 
 	r = gmc_v9_0_mc_init(adev);
 	if (r)
Code is expecting to observe the same number of buffers returned from the
dma_map_sg() function compared to sg_alloc_table_from_pages(). This
doesn't hold true universally, especially for systems with an IOMMU.

The IOMMU driver tries to combine buffers into a single DMA address as
much as it can. The right thing is to tell the DMA layer how much
combining the IOMMU can do.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 +
 4 files changed, 4 insertions(+), 1 deletion(-)