diff mbox series

drm/radeon: Fix screen corruption (v2)

Message ID 20221212020821.8248-1-luben.tuikov@amd.com (mailing list archive)
State New, archived
Headers show
Series drm/radeon: Fix screen corruption (v2) | expand

Commit Message

Luben Tuikov Dec. 12, 2022, 2:08 a.m. UTC
Fix screen corruption on older 32-bit systems using AGP chips.

On older systems with little memory, for instance 1.5 GiB, using an AGP chip,
the device's DMA mask is 0xFFFFFFFF, but the memory mask is 0x7FFFFFF, and
subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
false. As such the result of this static inline isn't suitable for the last
argument to ttm_device_init()--it simply needs to now whether to use GFP_DMA32
when allocating DMA buffers.

Partially reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.

v2: Amend the commit description.

Cc: Mikhail Krylov <sqarert@gmail.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org>
Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
Fixes: 33b3ad3788aba8 ("drm/radeon: handle PCIe root ports with addressing limitations")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h        | 1 +
 drivers/gpu/drm/radeon/radeon_device.c | 2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c    | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)


base-commit: 20e03e7f6e8efd42168db6d3fe044b804e0ede8f

Comments

Robin Murphy Dec. 14, 2022, 9:53 p.m. UTC | #1
On 2022-12-12 02:08, Luben Tuikov wrote:
> Fix screen corruption on older 32-bit systems using AGP chips.
> 
> On older systems with little memory, for instance 1.5 GiB, using an AGP chip,
> the device's DMA mask is 0xFFFFFFFF, but the memory mask is 0x7FFFFFF, and
> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
> false. As such the result of this static inline isn't suitable for the last
> argument to ttm_device_init()--it simply needs to now whether to use GFP_DMA32
> when allocating DMA buffers.

This sounds wrong to me. If the issues happen on systems without PAE it 
clearly can't have anything to with the actual DMA address size. Not to 
mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so 
GFP_DMA32 would be functionally meaningless anyway. Although the 
reported symptoms initially sounded like they could be caused by DMA 
going to the wrong place, that is also equally consistent with a loss of 
cache coherency.

My (limited) understanding of AGP is that the GART can effectively alias 
memory to a second physical address, so I could well believe that 
something somewhere in the driver stack needs to perform some cache 
maintenance to avoid coherency issues, and that in these particular 
setups whatever that is might be assuming the memory is direct-mapped 
and thus going wrong for highmem pages.

So as I said before, I really think this is not about using GFP_DMA32 at 
all, but about *not* using GFP_HIGHUSER.

Thanks,
Robin.

> Partially reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.
> 
> v2: Amend the commit description.
> 
> Cc: Mikhail Krylov <sqarert@gmail.com>
> Cc: Alex Deucher <Alexander.Deucher@amd.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org>
> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
> Fixes: 33b3ad3788aba8 ("drm/radeon: handle PCIe root ports with addressing limitations")
> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
> ---
>   drivers/gpu/drm/radeon/radeon.h        | 1 +
>   drivers/gpu/drm/radeon/radeon_device.c | 2 +-
>   drivers/gpu/drm/radeon/radeon_ttm.c    | 2 +-
>   3 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 37dec92339b16a..4fe38fd9be3267 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -2426,6 +2426,7 @@ struct radeon_device {
>   	struct radeon_wb		wb;
>   	struct radeon_dummy_page	dummy_page;
>   	bool				shutdown;
> +	bool                            need_dma32;
>   	bool				need_swiotlb;
>   	bool				accel_working;
>   	bool				fastfb_working; /* IGP feature*/
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index 6344454a772172..3643a3cfe061bd 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1370,7 +1370,7 @@ int radeon_device_init(struct radeon_device *rdev,
>   	if (rdev->family == CHIP_CEDAR)
>   		dma_bits = 32;
>   #endif
> -
> +	rdev->need_dma32 = dma_bits == 32;
>   	r = dma_set_mask_and_coherent(&rdev->pdev->dev, DMA_BIT_MASK(dma_bits));
>   	if (r) {
>   		pr_warn("radeon: No suitable DMA available\n");
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index bdb4c0e0736ba2..3debaeb720d173 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -696,7 +696,7 @@ int radeon_ttm_init(struct radeon_device *rdev)
>   			       rdev->ddev->anon_inode->i_mapping,
>   			       rdev->ddev->vma_offset_manager,
>   			       rdev->need_swiotlb,
> -			       dma_addressing_limited(&rdev->pdev->dev));
> +			       rdev->need_dma32);
>   	if (r) {
>   		DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
>   		return r;
> 
> base-commit: 20e03e7f6e8efd42168db6d3fe044b804e0ede8f
Alex Deucher Dec. 14, 2022, 10:02 p.m. UTC | #2
On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2022-12-12 02:08, Luben Tuikov wrote:
> > Fix screen corruption on older 32-bit systems using AGP chips.
> >
> > On older systems with little memory, for instance 1.5 GiB, using an AGP chip,
> > the device's DMA mask is 0xFFFFFFFF, but the memory mask is 0x7FFFFFF, and
> > subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
> > false. As such the result of this static inline isn't suitable for the last
> > argument to ttm_device_init()--it simply needs to now whether to use GFP_DMA32
> > when allocating DMA buffers.
>
> This sounds wrong to me. If the issues happen on systems without PAE it
> clearly can't have anything to with the actual DMA address size. Not to
> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
> GFP_DMA32 would be functionally meaningless anyway. Although the
> reported symptoms initially sounded like they could be caused by DMA
> going to the wrong place, that is also equally consistent with a loss of
> cache coherency.
>
> My (limited) understanding of AGP is that the GART can effectively alias
> memory to a second physical address, so I could well believe that
> something somewhere in the driver stack needs to perform some cache
> maintenance to avoid coherency issues, and that in these particular
> setups whatever that is might be assuming the memory is direct-mapped
> and thus going wrong for highmem pages.
>
> So as I said before, I really think this is not about using GFP_DMA32 at
> all, but about *not* using GFP_HIGHUSER.

One of the wonderful features of AGP is that it has to be used with
uncached memory.  The aperture basically just provides a remapping of
physical pages into a linear aperture that you point the GPU at.  TTM
has to jump through quite a few hoops to get uncached memory in the
first place, so it's likely that that somehow isn't compatible with
HIGHMEM.  Can you get uncached HIGHMEM?

Alex

>
> Thanks,
> Robin.
>
> > Partially reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.
> >
> > v2: Amend the commit description.
> >
> > Cc: Mikhail Krylov <sqarert@gmail.com>
> > Cc: Alex Deucher <Alexander.Deucher@amd.com>
> > Cc: Robin Murphy <robin.murphy@arm.com>
> > Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org>
> > Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
> > Fixes: 33b3ad3788aba8 ("drm/radeon: handle PCIe root ports with addressing limitations")
> > Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
> > ---
> >   drivers/gpu/drm/radeon/radeon.h        | 1 +
> >   drivers/gpu/drm/radeon/radeon_device.c | 2 +-
> >   drivers/gpu/drm/radeon/radeon_ttm.c    | 2 +-
> >   3 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> > index 37dec92339b16a..4fe38fd9be3267 100644
> > --- a/drivers/gpu/drm/radeon/radeon.h
> > +++ b/drivers/gpu/drm/radeon/radeon.h
> > @@ -2426,6 +2426,7 @@ struct radeon_device {
> >       struct radeon_wb                wb;
> >       struct radeon_dummy_page        dummy_page;
> >       bool                            shutdown;
> > +     bool                            need_dma32;
> >       bool                            need_swiotlb;
> >       bool                            accel_working;
> >       bool                            fastfb_working; /* IGP feature*/
> > diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> > index 6344454a772172..3643a3cfe061bd 100644
> > --- a/drivers/gpu/drm/radeon/radeon_device.c
> > +++ b/drivers/gpu/drm/radeon/radeon_device.c
> > @@ -1370,7 +1370,7 @@ int radeon_device_init(struct radeon_device *rdev,
> >       if (rdev->family == CHIP_CEDAR)
> >               dma_bits = 32;
> >   #endif
> > -
> > +     rdev->need_dma32 = dma_bits == 32;
> >       r = dma_set_mask_and_coherent(&rdev->pdev->dev, DMA_BIT_MASK(dma_bits));
> >       if (r) {
> >               pr_warn("radeon: No suitable DMA available\n");
> > diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> > index bdb4c0e0736ba2..3debaeb720d173 100644
> > --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> > +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> > @@ -696,7 +696,7 @@ int radeon_ttm_init(struct radeon_device *rdev)
> >                              rdev->ddev->anon_inode->i_mapping,
> >                              rdev->ddev->vma_offset_manager,
> >                              rdev->need_swiotlb,
> > -                            dma_addressing_limited(&rdev->pdev->dev));
> > +                            rdev->need_dma32);
> >       if (r) {
> >               DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
> >               return r;
> >
> > base-commit: 20e03e7f6e8efd42168db6d3fe044b804e0ede8f
Robin Murphy Dec. 14, 2022, 11:08 p.m. UTC | #3
On 2022-12-14 22:02, Alex Deucher wrote:
> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>
>>> On older systems with little memory, for instance 1.5 GiB, using an AGP chip,
>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is 0x7FFFFFF, and
>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
>>> false. As such the result of this static inline isn't suitable for the last
>>> argument to ttm_device_init()--it simply needs to now whether to use GFP_DMA32
>>> when allocating DMA buffers.
>>
>> This sounds wrong to me. If the issues happen on systems without PAE it
>> clearly can't have anything to with the actual DMA address size. Not to
>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>> GFP_DMA32 would be functionally meaningless anyway. Although the
>> reported symptoms initially sounded like they could be caused by DMA
>> going to the wrong place, that is also equally consistent with a loss of
>> cache coherency.
>>
>> My (limited) understanding of AGP is that the GART can effectively alias
>> memory to a second physical address, so I could well believe that
>> something somewhere in the driver stack needs to perform some cache
>> maintenance to avoid coherency issues, and that in these particular
>> setups whatever that is might be assuming the memory is direct-mapped
>> and thus going wrong for highmem pages.
>>
>> So as I said before, I really think this is not about using GFP_DMA32 at
>> all, but about *not* using GFP_HIGHUSER.
> 
> One of the wonderful features of AGP is that it has to be used with
> uncached memory.  The aperture basically just provides a remapping of
> physical pages into a linear aperture that you point the GPU at.  TTM
> has to jump through quite a few hoops to get uncached memory in the
> first place, so it's likely that that somehow isn't compatible with
> HIGHMEM.  Can you get uncached HIGHMEM?

I guess in principle yes, if you're careful not to use regular 
kmap()/kmap_atomic(), and always use pgprot_noncached() for 
userspace/vmalloc mappings, but clearly that leaves lots of scope for 
slipping up.

Working backwards from primitives like set_memory_uc(), I see various 
paths in TTM where manipulating the caching state is skipped for highmem 
pages, but I wouldn't even know where to start looking for whether the 
right state is propagated to all the places where they might eventually 
be mapped somewhere.

Cheers,
Robin.
Christian König Dec. 15, 2022, 8:07 a.m. UTC | #4
Am 15.12.22 um 00:08 schrieb Robin Murphy:
> On 2022-12-14 22:02, Alex Deucher wrote:
>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@arm.com> 
>> wrote:
>>>
>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>
>>>> On older systems with little memory, for instance 1.5 GiB, using an 
>>>> AGP chip,
>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is 
>>>> 0x7FFFFFF, and
>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
>>>> false. As such the result of this static inline isn't suitable for 
>>>> the last
>>>> argument to ttm_device_init()--it simply needs to now whether to 
>>>> use GFP_DMA32
>>>> when allocating DMA buffers.
>>>
>>> This sounds wrong to me. If the issues happen on systems without PAE it
>>> clearly can't have anything to with the actual DMA address size. Not to
>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>> reported symptoms initially sounded like they could be caused by DMA
>>> going to the wrong place, that is also equally consistent with a 
>>> loss of
>>> cache coherency.
>>>
>>> My (limited) understanding of AGP is that the GART can effectively 
>>> alias
>>> memory to a second physical address, so I could well believe that
>>> something somewhere in the driver stack needs to perform some cache
>>> maintenance to avoid coherency issues, and that in these particular
>>> setups whatever that is might be assuming the memory is direct-mapped
>>> and thus going wrong for highmem pages.
>>>
>>> So as I said before, I really think this is not about using 
>>> GFP_DMA32 at
>>> all, but about *not* using GFP_HIGHUSER.
>>
>> One of the wonderful features of AGP is that it has to be used with
>> uncached memory.  The aperture basically just provides a remapping of
>> physical pages into a linear aperture that you point the GPU at.  TTM
>> has to jump through quite a few hoops to get uncached memory in the
>> first place, so it's likely that that somehow isn't compatible with
>> HIGHMEM.  Can you get uncached HIGHMEM?
>
> I guess in principle yes, if you're careful not to use regular 
> kmap()/kmap_atomic(), and always use pgprot_noncached() for 
> userspace/vmalloc mappings, but clearly that leaves lots of scope for 
> slipping up.

I theory we should do exactly that in TTM, but we have very few users 
who actually still exercise that functionality.

>
> Working backwards from primitives like set_memory_uc(), I see various 
> paths in TTM where manipulating the caching state is skipped for 
> highmem pages, but I wouldn't even know where to start looking for 
> whether the right state is propagated to all the places where they 
> might eventually be mapped somewhere.

The tt object has the caching state for the pages and 
ttm_prot_from_caching() then uses pgprot_noncached() and co for the 
userspace/vmalloc mappings.

Regards,
Christian.

>
> Cheers,
> Robin.
Luben Tuikov Dec. 15, 2022, 9:08 a.m. UTC | #5
On 2022-12-15 03:07, Christian König wrote:
> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>> On 2022-12-14 22:02, Alex Deucher wrote:
>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@arm.com> 
>>> wrote:
>>>>
>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>>
>>>>> On older systems with little memory, for instance 1.5 GiB, using an 
>>>>> AGP chip,
>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is 
>>>>> 0x7FFFFFF, and
>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
>>>>> false. As such the result of this static inline isn't suitable for 
>>>>> the last
>>>>> argument to ttm_device_init()--it simply needs to now whether to 
>>>>> use GFP_DMA32
>>>>> when allocating DMA buffers.
>>>>
>>>> This sounds wrong to me. If the issues happen on systems without PAE it
>>>> clearly can't have anything to with the actual DMA address size. Not to
>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>>> reported symptoms initially sounded like they could be caused by DMA
>>>> going to the wrong place, that is also equally consistent with a 
>>>> loss of
>>>> cache coherency.
>>>>
>>>> My (limited) understanding of AGP is that the GART can effectively 
>>>> alias
>>>> memory to a second physical address, so I could well believe that
>>>> something somewhere in the driver stack needs to perform some cache
>>>> maintenance to avoid coherency issues, and that in these particular
>>>> setups whatever that is might be assuming the memory is direct-mapped
>>>> and thus going wrong for highmem pages.
>>>>
>>>> So as I said before, I really think this is not about using 
>>>> GFP_DMA32 at
>>>> all, but about *not* using GFP_HIGHUSER.
>>>
>>> One of the wonderful features of AGP is that it has to be used with
>>> uncached memory.  The aperture basically just provides a remapping of
>>> physical pages into a linear aperture that you point the GPU at.  TTM
>>> has to jump through quite a few hoops to get uncached memory in the
>>> first place, so it's likely that that somehow isn't compatible with
>>> HIGHMEM.  Can you get uncached HIGHMEM?
>>
>> I guess in principle yes, if you're careful not to use regular 
>> kmap()/kmap_atomic(), and always use pgprot_noncached() for 
>> userspace/vmalloc mappings, but clearly that leaves lots of scope for 
>> slipping up.
> 
> I theory we should do exactly that in TTM, but we have very few users 
> who actually still exercise that functionality.
> 
>>
>> Working backwards from primitives like set_memory_uc(), I see various 
>> paths in TTM where manipulating the caching state is skipped for 
>> highmem pages, but I wouldn't even know where to start looking for 
>> whether the right state is propagated to all the places where they 
>> might eventually be mapped somewhere.
> 
> The tt object has the caching state for the pages and 
> ttm_prot_from_caching() then uses pgprot_noncached() and co for the 
> userspace/vmalloc mappings.
> 

The point of this patch is that dma_addressing_limited() is unsuitable as
the last parameter to ttm_pool_init(), since if it is "false"--as it is in this
particular case--then TTM ends up using HIGHUSER, and we get the screen corruption.
(gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())

Is there an objection to this patch, if it fixes the screen corruption?

Or does TTM need fixing, in that what we really need is to specify whether
caching is desired and/or DMA32 when we allocate a TTM pool (ttm_pool_init(),
called from ttm_device_init(), called from radeon_ttm_init.c)?

Regards,
Luben
Christian König Dec. 15, 2022, 9:46 a.m. UTC | #6
Am 15.12.22 um 10:08 schrieb Luben Tuikov:
> On 2022-12-15 03:07, Christian König wrote:
>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@arm.com>
>>>> wrote:
>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>>>
>>>>>> On older systems with little memory, for instance 1.5 GiB, using an
>>>>>> AGP chip,
>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is
>>>>>> 0x7FFFFFF, and
>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
>>>>>> false. As such the result of this static inline isn't suitable for
>>>>>> the last
>>>>>> argument to ttm_device_init()--it simply needs to now whether to
>>>>>> use GFP_DMA32
>>>>>> when allocating DMA buffers.
>>>>> This sounds wrong to me. If the issues happen on systems without PAE it
>>>>> clearly can't have anything to with the actual DMA address size. Not to
>>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>>>> reported symptoms initially sounded like they could be caused by DMA
>>>>> going to the wrong place, that is also equally consistent with a
>>>>> loss of
>>>>> cache coherency.
>>>>>
>>>>> My (limited) understanding of AGP is that the GART can effectively
>>>>> alias
>>>>> memory to a second physical address, so I could well believe that
>>>>> something somewhere in the driver stack needs to perform some cache
>>>>> maintenance to avoid coherency issues, and that in these particular
>>>>> setups whatever that is might be assuming the memory is direct-mapped
>>>>> and thus going wrong for highmem pages.
>>>>>
>>>>> So as I said before, I really think this is not about using
>>>>> GFP_DMA32 at
>>>>> all, but about *not* using GFP_HIGHUSER.
>>>> One of the wonderful features of AGP is that it has to be used with
>>>> uncached memory.  The aperture basically just provides a remapping of
>>>> physical pages into a linear aperture that you point the GPU at.  TTM
>>>> has to jump through quite a few hoops to get uncached memory in the
>>>> first place, so it's likely that that somehow isn't compatible with
>>>> HIGHMEM.  Can you get uncached HIGHMEM?
>>> I guess in principle yes, if you're careful not to use regular
>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for
>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for
>>> slipping up.
>> I theory we should do exactly that in TTM, but we have very few users
>> who actually still exercise that functionality.
>>
>>> Working backwards from primitives like set_memory_uc(), I see various
>>> paths in TTM where manipulating the caching state is skipped for
>>> highmem pages, but I wouldn't even know where to start looking for
>>> whether the right state is propagated to all the places where they
>>> might eventually be mapped somewhere.
>> The tt object has the caching state for the pages and
>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the
>> userspace/vmalloc mappings.
>>
> The point of this patch is that dma_addressing_limited() is unsuitable as
> the last parameter to ttm_pool_init(), since if it is "false"--as it is in this
> particular case--then TTM ends up using HIGHUSER, and we get the screen corruption.
> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())

Well I would rather say that dma_addressing_limited() works, but the 
default value from dma_get_required_mask() is broken.

32 bits only work with bounce buffers and we can't use those on graphics 
hardware.

> Is there an objection to this patch, if it fixes the screen corruption?

Not from my side, but fixing the underlying issues would be better I think.

> Or does TTM need fixing, in that what we really need is to specify whether
> caching is desired and/or DMA32 when we allocate a TTM pool (ttm_pool_init(),
> called from ttm_device_init(), called from radeon_ttm_init.c)?

Could be, but it's more likely that the problem is in the DMA layer 
because we fail to recognize that the device can't access all of the memory.

Regards,
Christian.

>
> Regards,
> Luben
>
Luben Tuikov Dec. 15, 2022, 10:19 a.m. UTC | #7
On 2022-12-15 04:46, Christian König wrote:
> Am 15.12.22 um 10:08 schrieb Luben Tuikov:
>> On 2022-12-15 03:07, Christian König wrote:
>>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@arm.com>
>>>>> wrote:
>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>>>>
>>>>>>> On older systems with little memory, for instance 1.5 GiB, using an
>>>>>>> AGP chip,
>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is
>>>>>>> 0x7FFFFFF, and
>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
>>>>>>> false. As such the result of this static inline isn't suitable for
>>>>>>> the last
>>>>>>> argument to ttm_device_init()--it simply needs to now whether to
>>>>>>> use GFP_DMA32
>>>>>>> when allocating DMA buffers.
>>>>>> This sounds wrong to me. If the issues happen on systems without PAE it
>>>>>> clearly can't have anything to with the actual DMA address size. Not to
>>>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>>>>> reported symptoms initially sounded like they could be caused by DMA
>>>>>> going to the wrong place, that is also equally consistent with a
>>>>>> loss of
>>>>>> cache coherency.
>>>>>>
>>>>>> My (limited) understanding of AGP is that the GART can effectively
>>>>>> alias
>>>>>> memory to a second physical address, so I could well believe that
>>>>>> something somewhere in the driver stack needs to perform some cache
>>>>>> maintenance to avoid coherency issues, and that in these particular
>>>>>> setups whatever that is might be assuming the memory is direct-mapped
>>>>>> and thus going wrong for highmem pages.
>>>>>>
>>>>>> So as I said before, I really think this is not about using
>>>>>> GFP_DMA32 at
>>>>>> all, but about *not* using GFP_HIGHUSER.
>>>>> One of the wonderful features of AGP is that it has to be used with
>>>>> uncached memory.  The aperture basically just provides a remapping of
>>>>> physical pages into a linear aperture that you point the GPU at.  TTM
>>>>> has to jump through quite a few hoops to get uncached memory in the
>>>>> first place, so it's likely that that somehow isn't compatible with
>>>>> HIGHMEM.  Can you get uncached HIGHMEM?
>>>> I guess in principle yes, if you're careful not to use regular
>>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for
>>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for
>>>> slipping up.
>>> I theory we should do exactly that in TTM, but we have very few users
>>> who actually still exercise that functionality.
>>>
>>>> Working backwards from primitives like set_memory_uc(), I see various
>>>> paths in TTM where manipulating the caching state is skipped for
>>>> highmem pages, but I wouldn't even know where to start looking for
>>>> whether the right state is propagated to all the places where they
>>>> might eventually be mapped somewhere.
>>> The tt object has the caching state for the pages and
>>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the
>>> userspace/vmalloc mappings.
>>>
>> The point of this patch is that dma_addressing_limited() is unsuitable as
>> the last parameter to ttm_pool_init(), since if it is "false"--as it is in this
>> particular case--then TTM ends up using HIGHUSER, and we get the screen corruption.
>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())
> 
> Well I would rather say that dma_addressing_limited() works, but the 
> default value from dma_get_required_mask() is broken.
> 

dma_get_required_mask() for his setup of 1.5 GiB of memory returns 0x7FFFFFF.
While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in radeon_device_init().

> 32 bits only work with bounce buffers and we can't use those on graphics 
> hardware.
> 
>> Is there an objection to this patch, if it fixes the screen corruption?
> 
> Not from my side, but fixing the underlying issues would be better I think.
> 

Have they been identified?

>> Or does TTM need fixing, in that what we really need is to specify whether
>> caching is desired and/or DMA32 when we allocate a TTM pool (ttm_pool_init(),
>> called from ttm_device_init(), called from radeon_ttm_init.c)?
> 
> Could be, but it's more likely that the problem is in the DMA layer 
> because we fail to recognize that the device can't access all of the memory.
> 

Right, I agree. Ideally, setting dev->{dma_mask, coherent_dma_mask, bus_dma_limit},
should be sufficient to tell the DMA layer what kind of memory the device
can handle.

But this patch doesn't change non-local behaviour and as such is local and safe
to apply.

Regards,
Luben
Christian König Dec. 15, 2022, 11:27 a.m. UTC | #8
Am 15.12.22 um 11:19 schrieb Luben Tuikov:
> On 2022-12-15 04:46, Christian König wrote:
>> Am 15.12.22 um 10:08 schrieb Luben Tuikov:
>>> On 2022-12-15 03:07, Christian König wrote:
>>>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>>>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@arm.com>
>>>>>> wrote:
>>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>>>>>
>>>>>>>> On older systems with little memory, for instance 1.5 GiB, using an
>>>>>>>> AGP chip,
>>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is
>>>>>>>> 0x7FFFFFF, and
>>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
>>>>>>>> false. As such the result of this static inline isn't suitable for
>>>>>>>> the last
>>>>>>>> argument to ttm_device_init()--it simply needs to now whether to
>>>>>>>> use GFP_DMA32
>>>>>>>> when allocating DMA buffers.
>>>>>>> This sounds wrong to me. If the issues happen on systems without PAE it
>>>>>>> clearly can't have anything to with the actual DMA address size. Not to
>>>>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>>>>>> reported symptoms initially sounded like they could be caused by DMA
>>>>>>> going to the wrong place, that is also equally consistent with a
>>>>>>> loss of
>>>>>>> cache coherency.
>>>>>>>
>>>>>>> My (limited) understanding of AGP is that the GART can effectively
>>>>>>> alias
>>>>>>> memory to a second physical address, so I could well believe that
>>>>>>> something somewhere in the driver stack needs to perform some cache
>>>>>>> maintenance to avoid coherency issues, and that in these particular
>>>>>>> setups whatever that is might be assuming the memory is direct-mapped
>>>>>>> and thus going wrong for highmem pages.
>>>>>>>
>>>>>>> So as I said before, I really think this is not about using
>>>>>>> GFP_DMA32 at
>>>>>>> all, but about *not* using GFP_HIGHUSER.
>>>>>> One of the wonderful features of AGP is that it has to be used with
>>>>>> uncached memory.  The aperture basically just provides a remapping of
>>>>>> physical pages into a linear aperture that you point the GPU at.  TTM
>>>>>> has to jump through quite a few hoops to get uncached memory in the
>>>>>> first place, so it's likely that that somehow isn't compatible with
>>>>>> HIGHMEM.  Can you get uncached HIGHMEM?
>>>>> I guess in principle yes, if you're careful not to use regular
>>>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for
>>>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for
>>>>> slipping up.
>>>> I theory we should do exactly that in TTM, but we have very few users
>>>> who actually still exercise that functionality.
>>>>
>>>>> Working backwards from primitives like set_memory_uc(), I see various
>>>>> paths in TTM where manipulating the caching state is skipped for
>>>>> highmem pages, but I wouldn't even know where to start looking for
>>>>> whether the right state is propagated to all the places where they
>>>>> might eventually be mapped somewhere.
>>>> The tt object has the caching state for the pages and
>>>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the
>>>> userspace/vmalloc mappings.
>>>>
>>> The point of this patch is that dma_addressing_limited() is unsuitable as
>>> the last parameter to ttm_pool_init(), since if it is "false"--as it is in this
>>> particular case--then TTM ends up using HIGHUSER, and we get the screen corruption.
>>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())
>> Well I would rather say that dma_addressing_limited() works, but the
>> default value from dma_get_required_mask() is broken.
>>
> dma_get_required_mask() for his setup of 1.5 GiB of memory returns 0x7FFFFFF.

This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB 
addressable memory (27 bits set)? Or is there another F missing?

> While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in radeon_device_init().
>
>> 32 bits only work with bounce buffers and we can't use those on graphics
>> hardware.
>>
>>> Is there an objection to this patch, if it fixes the screen corruption?
>> Not from my side, but fixing the underlying issues would be better I think.
>>
> Have they been identified?

I'm not 100% sure. I think by using GFP_DMA32 we just work around the 
issue somehow.

>>> Or does TTM need fixing, in that what we really need is to specify whether
>>> caching is desired and/or DMA32 when we allocate a TTM pool (ttm_pool_init(),
>>> called from ttm_device_init(), called from radeon_ttm_init.c)?
>> Could be, but it's more likely that the problem is in the DMA layer
>> because we fail to recognize that the device can't access all of the memory.
>>
> Right, I agree. Ideally, setting dev->{dma_mask, coherent_dma_mask, bus_dma_limit},
> should be sufficient to tell the DMA layer what kind of memory the device
> can handle.
>
> But this patch doesn't change non-local behaviour and as such is local and safe
> to apply.

Yeah, agree. It's pretty hard to find such old hardware for testing anyway.

I do still have a working AGP system somewhere in my basement, but 
dusting that of just for testing this doesn't sounds like valuable time 
spend.

Regards,
Christian.

>
> Regards,
> Luben
>
Luben Tuikov Dec. 15, 2022, 11:40 a.m. UTC | #9
On 2022-12-15 06:27, Christian König wrote:
> Am 15.12.22 um 11:19 schrieb Luben Tuikov:
>> On 2022-12-15 04:46, Christian König wrote:
>>> Am 15.12.22 um 10:08 schrieb Luben Tuikov:
>>>> On 2022-12-15 03:07, Christian König wrote:
>>>>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>>>>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@arm.com>
>>>>>>> wrote:
>>>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>>>>>>
>>>>>>>>> On older systems with little memory, for instance 1.5 GiB, using an
>>>>>>>>> AGP chip,
>>>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is
>>>>>>>>> 0x7FFFFFF, and
>>>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
>>>>>>>>> false. As such the result of this static inline isn't suitable for
>>>>>>>>> the last
>>>>>>>>> argument to ttm_device_init()--it simply needs to now whether to
>>>>>>>>> use GFP_DMA32
>>>>>>>>> when allocating DMA buffers.
>>>>>>>> This sounds wrong to me. If the issues happen on systems without PAE it
>>>>>>>> clearly can't have anything to with the actual DMA address size. Not to
>>>>>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>>>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>>>>>>> reported symptoms initially sounded like they could be caused by DMA
>>>>>>>> going to the wrong place, that is also equally consistent with a
>>>>>>>> loss of
>>>>>>>> cache coherency.
>>>>>>>>
>>>>>>>> My (limited) understanding of AGP is that the GART can effectively
>>>>>>>> alias
>>>>>>>> memory to a second physical address, so I could well believe that
>>>>>>>> something somewhere in the driver stack needs to perform some cache
>>>>>>>> maintenance to avoid coherency issues, and that in these particular
>>>>>>>> setups whatever that is might be assuming the memory is direct-mapped
>>>>>>>> and thus going wrong for highmem pages.
>>>>>>>>
>>>>>>>> So as I said before, I really think this is not about using
>>>>>>>> GFP_DMA32 at
>>>>>>>> all, but about *not* using GFP_HIGHUSER.
>>>>>>> One of the wonderful features of AGP is that it has to be used with
>>>>>>> uncached memory.  The aperture basically just provides a remapping of
>>>>>>> physical pages into a linear aperture that you point the GPU at.  TTM
>>>>>>> has to jump through quite a few hoops to get uncached memory in the
>>>>>>> first place, so it's likely that that somehow isn't compatible with
>>>>>>> HIGHMEM.  Can you get uncached HIGHMEM?
>>>>>> I guess in principle yes, if you're careful not to use regular
>>>>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for
>>>>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for
>>>>>> slipping up.
>>>>> I theory we should do exactly that in TTM, but we have very few users
>>>>> who actually still exercise that functionality.
>>>>>
>>>>>> Working backwards from primitives like set_memory_uc(), I see various
>>>>>> paths in TTM where manipulating the caching state is skipped for
>>>>>> highmem pages, but I wouldn't even know where to start looking for
>>>>>> whether the right state is propagated to all the places where they
>>>>>> might eventually be mapped somewhere.
>>>>> The tt object has the caching state for the pages and
>>>>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the
>>>>> userspace/vmalloc mappings.
>>>>>
>>>> The point of this patch is that dma_addressing_limited() is unsuitable as
>>>> the last parameter to ttm_pool_init(), since if it is "false"--as it is in this
>>>> particular case--then TTM ends up using HIGHUSER, and we get the screen corruption.
>>>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())
>>> Well I would rather say that dma_addressing_limited() works, but the
>>> default value from dma_get_required_mask() is broken.
>>>
>> dma_get_required_mask() for his setup of 1.5 GiB of memory returns 0x7FFFFFF.
> 
> This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB 
> addressable memory (27 bits set)? Or is there another F missing?

Yeah, I'm missing an F--it is correctly described at the top of the thread above,
i.e. in the commit of v2 patch.

0x7FFF_FFFF, which seems correct, no?

>> While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in radeon_device_init().
>>
>>> 32 bits only work with bounce buffers and we can't use those on graphics
>>> hardware.
>>>
>>>> Is there an objection to this patch, if it fixes the screen corruption?
>>> Not from my side, but fixing the underlying issues would be better I think.
>>>
>> Have they been identified?
> 
> I'm not 100% sure. I think by using GFP_DMA32 we just work around the 
> issue somehow.

Right. Using GFP_DMA32, we don't touch high-mem. I was looking at the DRM
code trying to understand what we do when GFP_DMA32 is not set, and the immediate
thing I see is that we set GFP_HIGHUSER when use_dma32 is unset in the device struct.
(Then I got down to the caching attributes...)

It's be nice if we can find the actual issue--what else would it show us that needs fixing...?

So what do we do with this patch?

Shouldn't leave it in a limbo--some OSes ship their kernel
with 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations") wholly
reverted.

Regards,
Luben
Robin Murphy Dec. 15, 2022, 11:53 a.m. UTC | #10
On 2022-12-15 11:40, Luben Tuikov wrote:
> On 2022-12-15 06:27, Christian König wrote:
>> Am 15.12.22 um 11:19 schrieb Luben Tuikov:
>>> On 2022-12-15 04:46, Christian König wrote:
>>>> Am 15.12.22 um 10:08 schrieb Luben Tuikov:
>>>>> On 2022-12-15 03:07, Christian König wrote:
>>>>>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>>>>>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@arm.com>
>>>>>>>> wrote:
>>>>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>>>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>>>>>>>
>>>>>>>>>> On older systems with little memory, for instance 1.5 GiB, using an
>>>>>>>>>> AGP chip,
>>>>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is
>>>>>>>>>> 0x7FFFFFF, and
>>>>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
>>>>>>>>>> false. As such the result of this static inline isn't suitable for
>>>>>>>>>> the last
>>>>>>>>>> argument to ttm_device_init()--it simply needs to now whether to
>>>>>>>>>> use GFP_DMA32
>>>>>>>>>> when allocating DMA buffers.
>>>>>>>>> This sounds wrong to me. If the issues happen on systems without PAE it
>>>>>>>>> clearly can't have anything to with the actual DMA address size. Not to
>>>>>>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>>>>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>>>>>>>> reported symptoms initially sounded like they could be caused by DMA
>>>>>>>>> going to the wrong place, that is also equally consistent with a
>>>>>>>>> loss of
>>>>>>>>> cache coherency.
>>>>>>>>>
>>>>>>>>> My (limited) understanding of AGP is that the GART can effectively
>>>>>>>>> alias
>>>>>>>>> memory to a second physical address, so I could well believe that
>>>>>>>>> something somewhere in the driver stack needs to perform some cache
>>>>>>>>> maintenance to avoid coherency issues, and that in these particular
>>>>>>>>> setups whatever that is might be assuming the memory is direct-mapped
>>>>>>>>> and thus going wrong for highmem pages.
>>>>>>>>>
>>>>>>>>> So as I said before, I really think this is not about using
>>>>>>>>> GFP_DMA32 at
>>>>>>>>> all, but about *not* using GFP_HIGHUSER.
>>>>>>>> One of the wonderful features of AGP is that it has to be used with
>>>>>>>> uncached memory.  The aperture basically just provides a remapping of
>>>>>>>> physical pages into a linear aperture that you point the GPU at.  TTM
>>>>>>>> has to jump through quite a few hoops to get uncached memory in the
>>>>>>>> first place, so it's likely that that somehow isn't compatible with
>>>>>>>> HIGHMEM.  Can you get uncached HIGHMEM?
>>>>>>> I guess in principle yes, if you're careful not to use regular
>>>>>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for
>>>>>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for
>>>>>>> slipping up.
>>>>>> I theory we should do exactly that in TTM, but we have very few users
>>>>>> who actually still exercise that functionality.
>>>>>>
>>>>>>> Working backwards from primitives like set_memory_uc(), I see various
>>>>>>> paths in TTM where manipulating the caching state is skipped for
>>>>>>> highmem pages, but I wouldn't even know where to start looking for
>>>>>>> whether the right state is propagated to all the places where they
>>>>>>> might eventually be mapped somewhere.
>>>>>> The tt object has the caching state for the pages and
>>>>>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the
>>>>>> userspace/vmalloc mappings.
>>>>>>
>>>>> The point of this patch is that dma_addressing_limited() is unsuitable as
>>>>> the last parameter to ttm_pool_init(), since if it is "false"--as it is in this
>>>>> particular case--then TTM ends up using HIGHUSER, and we get the screen corruption.
>>>>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())
>>>> Well I would rather say that dma_addressing_limited() works, but the
>>>> default value from dma_get_required_mask() is broken.
>>>>
>>> dma_get_required_mask() for his setup of 1.5 GiB of memory returns 0x7FFFFFF.
>>
>> This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB
>> addressable memory (27 bits set)? Or is there another F missing?
> 
> Yeah, I'm missing an F--it is correctly described at the top of the thread above,
> i.e. in the commit of v2 patch.
> 
> 0x7FFF_FFFF, which seems correct, no?
> 
>>> While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in radeon_device_init().
>>>
>>>> 32 bits only work with bounce buffers and we can't use those on graphics
>>>> hardware.
>>>>
>>>>> Is there an objection to this patch, if it fixes the screen corruption?
>>>> Not from my side, but fixing the underlying issues would be better I think.
>>>>
>>> Have they been identified?
>>
>> I'm not 100% sure. I think by using GFP_DMA32 we just work around the
>> issue somehow.
> 
> Right. Using GFP_DMA32, we don't touch high-mem. I was looking at the DRM
> code trying to understand what we do when GFP_DMA32 is not set, and the immediate
> thing I see is that we set GFP_HIGHUSER when use_dma32 is unset in the device struct.
> (Then I got down to the caching attributes...)
> 
> It's be nice if we can find the actual issue--what else would it show us that needs fixing...?
> 
> So what do we do with this patch?
> 
> Shouldn't leave it in a limbo--some OSes ship their kernel
> with 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations") wholly
> reverted.

Removing dma_addressing_limited() is still wrong, for the reasons given 
in that commit. What we need is an *additional* condition that 
encapsulates "also pass use_dma32 for AGP devices because it avoids some 
weird coherency issue with 32-bit highmem that isn't worth trying to 
debug further".

Thanks,
Robin.
Luben Tuikov Dec. 15, 2022, 12:07 p.m. UTC | #11
On 2022-12-15 06:53, Robin Murphy wrote:
> On 2022-12-15 11:40, Luben Tuikov wrote:
>> On 2022-12-15 06:27, Christian König wrote:
>>> Am 15.12.22 um 11:19 schrieb Luben Tuikov:
>>>> On 2022-12-15 04:46, Christian König wrote:
>>>>> Am 15.12.22 um 10:08 schrieb Luben Tuikov:
>>>>>> On 2022-12-15 03:07, Christian König wrote:
>>>>>>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>>>>>>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>>>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@arm.com>
>>>>>>>>> wrote:
>>>>>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>>>>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>>>>>>>>
>>>>>>>>>>> On older systems with little memory, for instance 1.5 GiB, using an
>>>>>>>>>>> AGP chip,
>>>>>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is
>>>>>>>>>>> 0x7FFFFFF, and
>>>>>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
>>>>>>>>>>> false. As such the result of this static inline isn't suitable for
>>>>>>>>>>> the last
>>>>>>>>>>> argument to ttm_device_init()--it simply needs to now whether to
>>>>>>>>>>> use GFP_DMA32
>>>>>>>>>>> when allocating DMA buffers.
>>>>>>>>>> This sounds wrong to me. If the issues happen on systems without PAE it
>>>>>>>>>> clearly can't have anything to with the actual DMA address size. Not to
>>>>>>>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>>>>>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>>>>>>>>> reported symptoms initially sounded like they could be caused by DMA
>>>>>>>>>> going to the wrong place, that is also equally consistent with a
>>>>>>>>>> loss of
>>>>>>>>>> cache coherency.
>>>>>>>>>>
>>>>>>>>>> My (limited) understanding of AGP is that the GART can effectively
>>>>>>>>>> alias
>>>>>>>>>> memory to a second physical address, so I could well believe that
>>>>>>>>>> something somewhere in the driver stack needs to perform some cache
>>>>>>>>>> maintenance to avoid coherency issues, and that in these particular
>>>>>>>>>> setups whatever that is might be assuming the memory is direct-mapped
>>>>>>>>>> and thus going wrong for highmem pages.
>>>>>>>>>>
>>>>>>>>>> So as I said before, I really think this is not about using
>>>>>>>>>> GFP_DMA32 at
>>>>>>>>>> all, but about *not* using GFP_HIGHUSER.
>>>>>>>>> One of the wonderful features of AGP is that it has to be used with
>>>>>>>>> uncached memory.  The aperture basically just provides a remapping of
>>>>>>>>> physical pages into a linear aperture that you point the GPU at.  TTM
>>>>>>>>> has to jump through quite a few hoops to get uncached memory in the
>>>>>>>>> first place, so it's likely that that somehow isn't compatible with
>>>>>>>>> HIGHMEM.  Can you get uncached HIGHMEM?
>>>>>>>> I guess in principle yes, if you're careful not to use regular
>>>>>>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for
>>>>>>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for
>>>>>>>> slipping up.
>>>>>>> I theory we should do exactly that in TTM, but we have very few users
>>>>>>> who actually still exercise that functionality.
>>>>>>>
>>>>>>>> Working backwards from primitives like set_memory_uc(), I see various
>>>>>>>> paths in TTM where manipulating the caching state is skipped for
>>>>>>>> highmem pages, but I wouldn't even know where to start looking for
>>>>>>>> whether the right state is propagated to all the places where they
>>>>>>>> might eventually be mapped somewhere.
>>>>>>> The tt object has the caching state for the pages and
>>>>>>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the
>>>>>>> userspace/vmalloc mappings.
>>>>>>>
>>>>>> The point of this patch is that dma_addressing_limited() is unsuitable as
>>>>>> the last parameter to ttm_pool_init(), since if it is "false"--as it is in this
>>>>>> particular case--then TTM ends up using HIGHUSER, and we get the screen corruption.
>>>>>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())
>>>>> Well I would rather say that dma_addressing_limited() works, but the
>>>>> default value from dma_get_required_mask() is broken.
>>>>>
>>>> dma_get_required_mask() for his setup of 1.5 GiB of memory returns 0x7FFFFFF.
>>>
>>> This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB
>>> addressable memory (27 bits set)? Or is there another F missing?
>>
>> Yeah, I'm missing an F--it is correctly described at the top of the thread above,
>> i.e. in the commit of v2 patch.
>>
>> 0x7FFF_FFFF, which seems correct, no?
>>
>>>> While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in radeon_device_init().
>>>>
>>>>> 32 bits only work with bounce buffers and we can't use those on graphics
>>>>> hardware.
>>>>>
>>>>>> Is there an objection to this patch, if it fixes the screen corruption?
>>>>> Not from my side, but fixing the underlying issues would be better I think.
>>>>>
>>>> Have they been identified?
>>>
>>> I'm not 100% sure. I think by using GFP_DMA32 we just work around the
>>> issue somehow.
>>
>> Right. Using GFP_DMA32, we don't touch high-mem. I was looking at the DRM
>> code trying to understand what we do when GFP_DMA32 is not set, and the immediate
>> thing I see is that we set GFP_HIGHUSER when use_dma32 is unset in the device struct.
>> (Then I got down to the caching attributes...)
>>
>> It's be nice if we can find the actual issue--what else would it show us that needs fixing...?
>>
>> So what do we do with this patch?
>>
>> Shouldn't leave it in a limbo--some OSes ship their kernel
>> with 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations") wholly
>> reverted.
> 
> Removing dma_addressing_limited() is still wrong, for the reasons given 
> in that commit. What we need is an *additional* condition that 
> encapsulates "also pass use_dma32 for AGP devices because it avoids some 
> weird coherency issue with 32-bit highmem that isn't worth trying to 
> debug further".

Yes, you had a patch earlier which did exactly that--why not push that patch?

Q: If host memory is 1.5 GiB, i.e. mask of 0x7FFF_FFFF, but the device's mask is 0xFFFF_FFFF,
shouldn't we use GFP_DMA32, instead of GFP_HIGHUSER?

Regards,
Luben
Krylov Michael Jan. 19, 2023, 4:56 p.m. UTC | #12
On Thu, 15 Dec 2022 07:07:33 -0500
Luben Tuikov <luben.tuikov@amd.com> wrote:

> On 2022-12-15 06:53, Robin Murphy wrote:
> > On 2022-12-15 11:40, Luben Tuikov wrote:
> >> On 2022-12-15 06:27, Christian König wrote:
> >>> Am 15.12.22 um 11:19 schrieb Luben Tuikov:
> >>>> On 2022-12-15 04:46, Christian König wrote:
> >>>>> Am 15.12.22 um 10:08 schrieb Luben Tuikov:
> >>>>>> On 2022-12-15 03:07, Christian König wrote:
> >>>>>>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
> >>>>>>>> On 2022-12-14 22:02, Alex Deucher wrote:
> >>>>>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy
> >>>>>>>>> <robin.murphy@arm.com> wrote:
> >>>>>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
> >>>>>>>>>>> Fix screen corruption on older 32-bit systems using AGP
> >>>>>>>>>>> chips.
> >>>>>>>>>>>
> >>>>>>>>>>> On older systems with little memory, for instance 1.5
> >>>>>>>>>>> GiB, using an AGP chip,
> >>>>>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask
> >>>>>>>>>>> is 0x7FFFFFF, and
> >>>>>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF
> >>>>>>>>>>> < 0x7FFFFFFF, false. As such the result of this static
> >>>>>>>>>>> inline isn't suitable for the last
> >>>>>>>>>>> argument to ttm_device_init()--it simply needs to now
> >>>>>>>>>>> whether to use GFP_DMA32
> >>>>>>>>>>> when allocating DMA buffers.
> >>>>>>>>>> This sounds wrong to me. If the issues happen on systems
> >>>>>>>>>> without PAE it clearly can't have anything to with the
> >>>>>>>>>> actual DMA address size. Not to mention that AFAICS 32-bit
> >>>>>>>>>> x86 doesn't even have ZONE_DMA32, so GFP_DMA32 would be
> >>>>>>>>>> functionally meaningless anyway. Although the reported
> >>>>>>>>>> symptoms initially sounded like they could be caused by
> >>>>>>>>>> DMA going to the wrong place, that is also equally
> >>>>>>>>>> consistent with a loss of cache coherency.
> >>>>>>>>>>
> >>>>>>>>>> My (limited) understanding of AGP is that the GART can
> >>>>>>>>>> effectively alias
> >>>>>>>>>> memory to a second physical address, so I could well
> >>>>>>>>>> believe that something somewhere in the driver stack needs
> >>>>>>>>>> to perform some cache maintenance to avoid coherency
> >>>>>>>>>> issues, and that in these particular setups whatever that
> >>>>>>>>>> is might be assuming the memory is direct-mapped and thus
> >>>>>>>>>> going wrong for highmem pages.
> >>>>>>>>>>
> >>>>>>>>>> So as I said before, I really think this is not about using
> >>>>>>>>>> GFP_DMA32 at
> >>>>>>>>>> all, but about *not* using GFP_HIGHUSER.
> >>>>>>>>> One of the wonderful features of AGP is that it has to be
> >>>>>>>>> used with uncached memory.  The aperture basically just
> >>>>>>>>> provides a remapping of physical pages into a linear
> >>>>>>>>> aperture that you point the GPU at.  TTM has to jump
> >>>>>>>>> through quite a few hoops to get uncached memory in the
> >>>>>>>>> first place, so it's likely that that somehow isn't
> >>>>>>>>> compatible with HIGHMEM.  Can you get uncached HIGHMEM?
> >>>>>>>> I guess in principle yes, if you're careful not to use
> >>>>>>>> regular kmap()/kmap_atomic(), and always use
> >>>>>>>> pgprot_noncached() for userspace/vmalloc mappings, but
> >>>>>>>> clearly that leaves lots of scope for slipping up.
> >>>>>>> I theory we should do exactly that in TTM, but we have very
> >>>>>>> few users who actually still exercise that functionality.
> >>>>>>>
> >>>>>>>> Working backwards from primitives like set_memory_uc(), I
> >>>>>>>> see various paths in TTM where manipulating the caching
> >>>>>>>> state is skipped for highmem pages, but I wouldn't even know
> >>>>>>>> where to start looking for whether the right state is
> >>>>>>>> propagated to all the places where they might eventually be
> >>>>>>>> mapped somewhere.
> >>>>>>> The tt object has the caching state for the pages and
> >>>>>>> ttm_prot_from_caching() then uses pgprot_noncached() and co
> >>>>>>> for the userspace/vmalloc mappings.
> >>>>>>>
> >>>>>> The point of this patch is that dma_addressing_limited() is
> >>>>>> unsuitable as the last parameter to ttm_pool_init(), since if
> >>>>>> it is "false"--as it is in this particular case--then TTM ends
> >>>>>> up using HIGHUSER, and we get the screen corruption.
> >>>>>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())
> >>>>> Well I would rather say that dma_addressing_limited() works,
> >>>>> but the default value from dma_get_required_mask() is broken.
> >>>>>
> >>>> dma_get_required_mask() for his setup of 1.5 GiB of memory
> >>>> returns 0x7FFFFFF.
> >>>
> >>> This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB
> >>> addressable memory (27 bits set)? Or is there another F missing?
> >>
> >> Yeah, I'm missing an F--it is correctly described at the top of
> >> the thread above, i.e. in the commit of v2 patch.
> >>
> >> 0x7FFF_FFFF, which seems correct, no?
> >>
> >>>> While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in
> >>>> radeon_device_init().
> >>>>
> >>>>> 32 bits only work with bounce buffers and we can't use those on
> >>>>> graphics hardware.
> >>>>>
> >>>>>> Is there an objection to this patch, if it fixes the screen
> >>>>>> corruption?
> >>>>> Not from my side, but fixing the underlying issues would be
> >>>>> better I think.
> >>>>>
> >>>> Have they been identified?
> >>>
> >>> I'm not 100% sure. I think by using GFP_DMA32 we just work around
> >>> the issue somehow.
> >>
> >> Right. Using GFP_DMA32, we don't touch high-mem. I was looking at
> >> the DRM code trying to understand what we do when GFP_DMA32 is not
> >> set, and the immediate thing I see is that we set GFP_HIGHUSER
> >> when use_dma32 is unset in the device struct. (Then I got down to
> >> the caching attributes...)
> >>
> >> It's be nice if we can find the actual issue--what else would it
> >> show us that needs fixing...?
> >>
> >> So what do we do with this patch?
> >>
> >> Shouldn't leave it in a limbo--some OSes ship their kernel
> >> with 33b3ad3788ab ("drm/radeon: handle PCIe root ports with
> >> addressing limitations") wholly reverted.
> > 
> > Removing dma_addressing_limited() is still wrong, for the reasons
> > given in that commit. What we need is an *additional* condition
> > that encapsulates "also pass use_dma32 for AGP devices because it
> > avoids some weird coherency issue with 32-bit highmem that isn't
> > worth trying to debug further".
> 
> Yes, you had a patch earlier which did exactly that--why not push
> that patch?
> 
> Q: If host memory is 1.5 GiB, i.e. mask of 0x7FFF_FFFF, but the
> device's mask is 0xFFFF_FFFF, shouldn't we use GFP_DMA32, instead of
> GFP_HIGHUSER?
> 
> Regards,
> Luben
> 

Sorry for being pushy, but given that we are so close to the finish, is
there any chance that one of the variants will be merged to the kernel
sources any time soon and if so, can I help with testing? I would really
benefit from this patch making it into Debian 12.
Luben Tuikov Jan. 20, 2023, 4:31 a.m. UTC | #13
On 2023-01-19 11:56, Krylov Michael wrote:
> On Thu, 15 Dec 2022 07:07:33 -0500
> Luben Tuikov <luben.tuikov@amd.com> wrote:
> 
>> On 2022-12-15 06:53, Robin Murphy wrote:
>>> On 2022-12-15 11:40, Luben Tuikov wrote:
>>>> On 2022-12-15 06:27, Christian König wrote:
>>>>> Am 15.12.22 um 11:19 schrieb Luben Tuikov:
>>>>>> On 2022-12-15 04:46, Christian König wrote:
>>>>>>> Am 15.12.22 um 10:08 schrieb Luben Tuikov:
>>>>>>>> On 2022-12-15 03:07, Christian König wrote:
>>>>>>>>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>>>>>>>>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>>>>>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy
>>>>>>>>>>> <robin.murphy@arm.com> wrote:
>>>>>>>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>>>>>>>>> Fix screen corruption on older 32-bit systems using AGP
>>>>>>>>>>>>> chips.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On older systems with little memory, for instance 1.5
>>>>>>>>>>>>> GiB, using an AGP chip,
>>>>>>>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask
>>>>>>>>>>>>> is 0x7FFFFFF, and
>>>>>>>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF
>>>>>>>>>>>>> < 0x7FFFFFFF, false. As such the result of this static
>>>>>>>>>>>>> inline isn't suitable for the last
>>>>>>>>>>>>> argument to ttm_device_init()--it simply needs to now
>>>>>>>>>>>>> whether to use GFP_DMA32
>>>>>>>>>>>>> when allocating DMA buffers.
>>>>>>>>>>>> This sounds wrong to me. If the issues happen on systems
>>>>>>>>>>>> without PAE it clearly can't have anything to with the
>>>>>>>>>>>> actual DMA address size. Not to mention that AFAICS 32-bit
>>>>>>>>>>>> x86 doesn't even have ZONE_DMA32, so GFP_DMA32 would be
>>>>>>>>>>>> functionally meaningless anyway. Although the reported
>>>>>>>>>>>> symptoms initially sounded like they could be caused by
>>>>>>>>>>>> DMA going to the wrong place, that is also equally
>>>>>>>>>>>> consistent with a loss of cache coherency.
>>>>>>>>>>>>
>>>>>>>>>>>> My (limited) understanding of AGP is that the GART can
>>>>>>>>>>>> effectively alias
>>>>>>>>>>>> memory to a second physical address, so I could well
>>>>>>>>>>>> believe that something somewhere in the driver stack needs
>>>>>>>>>>>> to perform some cache maintenance to avoid coherency
>>>>>>>>>>>> issues, and that in these particular setups whatever that
>>>>>>>>>>>> is might be assuming the memory is direct-mapped and thus
>>>>>>>>>>>> going wrong for highmem pages.
>>>>>>>>>>>>
>>>>>>>>>>>> So as I said before, I really think this is not about using
>>>>>>>>>>>> GFP_DMA32 at
>>>>>>>>>>>> all, but about *not* using GFP_HIGHUSER.
>>>>>>>>>>> One of the wonderful features of AGP is that it has to be
>>>>>>>>>>> used with uncached memory.  The aperture basically just
>>>>>>>>>>> provides a remapping of physical pages into a linear
>>>>>>>>>>> aperture that you point the GPU at.  TTM has to jump
>>>>>>>>>>> through quite a few hoops to get uncached memory in the
>>>>>>>>>>> first place, so it's likely that that somehow isn't
>>>>>>>>>>> compatible with HIGHMEM.  Can you get uncached HIGHMEM?
>>>>>>>>>> I guess in principle yes, if you're careful not to use
>>>>>>>>>> regular kmap()/kmap_atomic(), and always use
>>>>>>>>>> pgprot_noncached() for userspace/vmalloc mappings, but
>>>>>>>>>> clearly that leaves lots of scope for slipping up.
>>>>>>>>> I theory we should do exactly that in TTM, but we have very
>>>>>>>>> few users who actually still exercise that functionality.
>>>>>>>>>
>>>>>>>>>> Working backwards from primitives like set_memory_uc(), I
>>>>>>>>>> see various paths in TTM where manipulating the caching
>>>>>>>>>> state is skipped for highmem pages, but I wouldn't even know
>>>>>>>>>> where to start looking for whether the right state is
>>>>>>>>>> propagated to all the places where they might eventually be
>>>>>>>>>> mapped somewhere.
>>>>>>>>> The tt object has the caching state for the pages and
>>>>>>>>> ttm_prot_from_caching() then uses pgprot_noncached() and co
>>>>>>>>> for the userspace/vmalloc mappings.
>>>>>>>>>
>>>>>>>> The point of this patch is that dma_addressing_limited() is
>>>>>>>> unsuitable as the last parameter to ttm_pool_init(), since if
>>>>>>>> it is "false"--as it is in this particular case--then TTM ends
>>>>>>>> up using HIGHUSER, and we get the screen corruption.
>>>>>>>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())
>>>>>>> Well I would rather say that dma_addressing_limited() works,
>>>>>>> but the default value from dma_get_required_mask() is broken.
>>>>>>>
>>>>>> dma_get_required_mask() for his setup of 1.5 GiB of memory
>>>>>> returns 0x7FFFFFF.
>>>>>
>>>>> This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB
>>>>> addressable memory (27 bits set)? Or is there another F missing?
>>>>
>>>> Yeah, I'm missing an F--it is correctly described at the top of
>>>> the thread above, i.e. in the commit of v2 patch.
>>>>
>>>> 0x7FFF_FFFF, which seems correct, no?
>>>>
>>>>>> While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in
>>>>>> radeon_device_init().
>>>>>>
>>>>>>> 32 bits only work with bounce buffers and we can't use those on
>>>>>>> graphics hardware.
>>>>>>>
>>>>>>>> Is there an objection to this patch, if it fixes the screen
>>>>>>>> corruption?
>>>>>>> Not from my side, but fixing the underlying issues would be
>>>>>>> better I think.
>>>>>>>
>>>>>> Have they been identified?
>>>>>
>>>>> I'm not 100% sure. I think by using GFP_DMA32 we just work around
>>>>> the issue somehow.
>>>>
>>>> Right. Using GFP_DMA32, we don't touch high-mem. I was looking at
>>>> the DRM code trying to understand what we do when GFP_DMA32 is not
>>>> set, and the immediate thing I see is that we set GFP_HIGHUSER
>>>> when use_dma32 is unset in the device struct. (Then I got down to
>>>> the caching attributes...)
>>>>
>>>> It's be nice if we can find the actual issue--what else would it
>>>> show us that needs fixing...?
>>>>
>>>> So what do we do with this patch?
>>>>
>>>> Shouldn't leave it in a limbo--some OSes ship their kernel
>>>> with 33b3ad3788ab ("drm/radeon: handle PCIe root ports with
>>>> addressing limitations") wholly reverted.
>>>
>>> Removing dma_addressing_limited() is still wrong, for the reasons
>>> given in that commit. What we need is an *additional* condition
>>> that encapsulates "also pass use_dma32 for AGP devices because it
>>> avoids some weird coherency issue with 32-bit highmem that isn't
>>> worth trying to debug further".
>>
>> Yes, you had a patch earlier which did exactly that--why not push
>> that patch?
>>
>> Q: If host memory is 1.5 GiB, i.e. mask of 0x7FFF_FFFF, but the
>> device's mask is 0xFFFF_FFFF, shouldn't we use GFP_DMA32, instead of
>> GFP_HIGHUSER?
>>
>> Regards,
>> Luben
>>
> 
> Sorry for being pushy, but given that we are so close to the finish, is
> there any chance that one of the variants will be merged to the kernel
> sources any time soon and if so, can I help with testing? I would really
> benefit from this patch making it into Debian 12.

Well, there's a couple of patches addressing this problem here in this thread.
If consensus is reached, perhaps one of them could be picked up by upstream.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 37dec92339b16a..4fe38fd9be3267 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2426,6 +2426,7 @@  struct radeon_device {
 	struct radeon_wb		wb;
 	struct radeon_dummy_page	dummy_page;
 	bool				shutdown;
+	bool                            need_dma32;
 	bool				need_swiotlb;
 	bool				accel_working;
 	bool				fastfb_working; /* IGP feature*/
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 6344454a772172..3643a3cfe061bd 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1370,7 +1370,7 @@  int radeon_device_init(struct radeon_device *rdev,
 	if (rdev->family == CHIP_CEDAR)
 		dma_bits = 32;
 #endif
-
+	rdev->need_dma32 = dma_bits == 32;
 	r = dma_set_mask_and_coherent(&rdev->pdev->dev, DMA_BIT_MASK(dma_bits));
 	if (r) {
 		pr_warn("radeon: No suitable DMA available\n");
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index bdb4c0e0736ba2..3debaeb720d173 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -696,7 +696,7 @@  int radeon_ttm_init(struct radeon_device *rdev)
 			       rdev->ddev->anon_inode->i_mapping,
 			       rdev->ddev->vma_offset_manager,
 			       rdev->need_swiotlb,
-			       dma_addressing_limited(&rdev->pdev->dev));
+			       rdev->need_dma32);
 	if (r) {
 		DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
 		return r;