
drm/radeon: "ring test failed" on PA-RISC Linux

Message ID C501E045-BE57-46B9-A1A9-9BD652964F3A@p0n4ik.tk (mailing list archive)
State New, archived

Commit Message

Alex Ivanov Sept. 10, 2013, 9:20 a.m. UTC
Alex,

On 09.09.2013, at 21:43, Alex Deucher <alexdeucher@gmail.com> wrote:

> On Mon, Sep 9, 2013 at 12:44 PM, Alex Ivanov <gnidorah@p0n4ik.tk> wrote:
>> Folks,
>> 
>> We (people on the linux-parisc@vger.kernel.org mailing list) are trying to get
>> the native video options of the latest PA-RISC servers and workstations
>> (ATI cards, most of which are based on R100/R300/R420 chips) working
>> correctly on this platform (big-endian PA-RISC).
>> 
>> However, we haven't had much success. DRM fails every time with
>> "ring test failed" for both AGP & PCI.
>> 
>> Could you give us some suggestions on what to check?
>> 
>> Topic started here:
>> http://www.spinics.net/lists/linux-parisc/msg04908.html
>> And continued there:
>> http://www.spinics.net/lists/linux-parisc/msg04995.html
>> http://www.spinics.net/lists/linux-parisc/msg05006.html
>> 
>> Problems we've already resolved, without any visible progress:
>> - Verified that the microcode loads successfully
>> - "parisc AGP GART code writes IOMMU entries in the wrong byte order and
>>   doesn't add the coherency information SBA code adds"
>> - "our PCI BAR setup doesn't really work very well together with the Radeon
>>   DRM address setup. DRM will generate addresses, which are even outside
>>   of the connected LBA"
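The byte-order item above is a classic big-endian pitfall: if a device reads its table entries as little-endian, a big-endian CPU has to byte-swap each entry before storing it. Below is a minimal userspace sketch of an endian-independent little-endian store. The names store_le32(), load_le32() and table_set_entry() are hypothetical, and the 32-bit little-endian entry format is an assumption for illustration, not a description of the parisc AGP GART or SBA IOMMU layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian byte order, one byte at a time, so
 * the result does not depend on the CPU's own endianness. In kernel code,
 * cpu_to_le32() serves this purpose; this is a userspace illustration. */
static void store_le32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v);
    dst[1] = (uint8_t)(v >> 8);
    dst[2] = (uint8_t)(v >> 16);
    dst[3] = (uint8_t)(v >> 24);
}

/* Read a 32-bit value the way a little-endian device would. */
static uint32_t load_le32(const uint8_t *src)
{
    return (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
           ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
}

/* Hypothetical helper: set entry idx of a table of 32-bit LE entries. */
static void table_set_entry(uint8_t *table, size_t idx, uint32_t bus_addr)
{
    assert(table != NULL);
    store_le32(table + 4 * idx, bus_addr);
}
```

On a little-endian host, a plain native store happens to produce the same bytes, which is why a missing swap like this tends to surface only on big-endian machines such as PA-RISC.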
>> 
>> Things we plan to check:
>> "The drivers/video/aty uses
>> an endian config bit DRM doesn't use, but I haven't tested whether
>> this makes a difference and how it is connected to the overall picture."
> 
> I don't think that will make any difference.  radeon kms works fine on
> other big endian platforms such as powerpc.

Good! I'll rule it out then.

> 
>> 
>> "The Rage128 product revealed a weakness in some motherboard
>> chipsets in that there is no mechanism to guarantee
>> that data written by the CPU to memory is actually in a readable
>> state before the Graphics Controller receives an
>> update to its copy of the Write Pointer. In an effort to alleviate this
>> problem, we've introduced a mechanism into the
>> Graphics Controller that will delay the actual write to the Write Pointer
>> for some programmable amount of time, in
>> order to give the chipset time to flush its internal write buffers to
>> memory.
>> There are two register fields that control this mechanism:
>> PRE_WRITE_TIMER and PRE_WRITE_LIMIT.
>> 
>> In the radeon DRM codebase I didn't find anything using/setting
>> those registers. Maybe PA-RISC has some problem here?..."
> 
> I doubt it.  If you are using AGP, I'd suggest disabling it and first
> try to get things working using the on chip gart rather than AGP.
> Load radeon with agpmode=-1.  

I already tried this without any luck. In any case, the radeon driver falls
back to PCI mode here, so does it really matter?

In addition, people with PCI cards are experiencing the same issue...

> The on chip gart always uses cache
> snooped pci transactions and the driver assumes pci is cache coherent.
> On AGP/PCI chips, the on-chip gart mechanism stores the gart table in
> system ram.  On PCIE asics, the gart table is stored in vram.  The
> gart page table maps system pages to a contiguous aperture in the
> GPU's address space.  The ring lives in gart memory.  The GPU sees a
> contiguous buffer and the gart mechanism handles the access to the
> backing pages via the page table.  I'd suggest verifying that the
> entries written to the gart page table are valid and then the
> information written to the ring buffer is valid before updating the
> ring's wptr in radeon_ring_unlock_commit().  Changing the wptr is what
> causes the CP to start fetching data from the ring.

Thanks! I'll try. Meanwhile, I've tried switching from alloc_page() to
dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(

> 
> Alex
> 
>> 
>> Thanks.
>> 
>> -------- Forwarded message --------
>> 04.08.2013, 15:06, "Alex Ivanov" <gnidorah@p0n4ik.tk>:
>> 
>> 11.07.2013, 23:48, "Helge Deller" <deller@gmx.de>:
>> 
>>> adding linux parisc mailing list...:
>>> 
>>> On 07/11/2013 09:46 PM, Helge Deller wrote:
>>>>  On 07/10/2013 11:29 PM, Alex Ivanov wrote:
>>>>>  11.07.2013, 01:14, "Matt Turner" <mattst88@gmail.com>:
>>>>>>  On Wed, Jul 10, 2013 at 1:19 PM, Alex Ivanov <gnidorah@p0n4ik.tk> wrote:
>>>>>>>   Thank you so much! Your guess looks to be right. After applying your
>>>>>>>   patch there was no more kernel panic (KP) and X just worked.
>>>>>>  Nice! Does DRI work?
>>>>>  Not on my side. Also, I can't visually get beyond 8-bit depth, although Xorg
>>>>>  reports 24-bit in its log.
>>>>>  As for DRI, I'm experiencing
>>>>>  "ring test failed (scratch(0x15E4)=0xCAFEDEAD)" with a firegl x3.
>>>>  FWIW, I'm seeing the same failure on my FireGL X1:
>>>>  80:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Radeon R300 NG [FireGL X1] (rev 80)
>>>> 
>>>>  [drm] radeon: irq initialized.
>>>>  [drm] Loading R300 Microcode
>>>>  [drm] radeon: ring at 0x0000000060001000
>>>>  [drm:r100_ring_test] *ERROR* radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
>>>>  [drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).
>>>>  radeon 0000:80:00.0: failed initializing CP (-22).
>>>>  radeon 0000:80:00.0: Disabling GPU acceleration
>>>>  [drm:r100_cp_fini] *ERROR* Wait for CP idle timeout, shutting down CP.
>>>>  [drm] radeon: cp finalized
>>>>  [drm] radeon: cp finalized
>> 
>> I still have no clue why this happens. Broken SBA IOMMU / DRM code? Missing syncing primitives?
>> Should we forward this to the dri-devel mailing list?
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> -------- End of forwarded message --------
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel

Comments

Alex Deucher Sept. 10, 2013, 12:37 p.m. UTC | #1
On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov <gnidorah@p0n4ik.tk> wrote:
> Thanks! I'll try. Meanwhile, I've tried switching from alloc_page() to
> dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(
>

The dummy page isn't really going to help much.  That page is just
used as a safety placeholder for gart entries that aren't mapped on
the GPU.  TTM (drivers/gpu/drm/ttm) actually does the allocation of
the backing pages for the gart.  You may want to look there.

Alex
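Following the description above, the dummy page exists so that GART entries with nothing bound to them still point somewhere harmless. A minimal sketch of that idea, assuming a flat table of 32-bit entries; GART_ENTRIES, DUMMY_DMA_ADDR, gart_bind_range(), gart_unbind_range() and gart_init() are hypothetical names, not the radeon API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GART_ENTRIES   16u
#define DUMMY_DMA_ADDR 0x00100000u /* hypothetical bus address of the dummy page */

static uint32_t gart_entries[GART_ENTRIES];

/* Point entries [first, first + count) at their real backing pages. */
static void gart_bind_range(size_t first, size_t count, const uint32_t *dma)
{
    assert(first + count <= GART_ENTRIES);
    for (size_t i = 0; i < count; i++)
        gart_entries[first + i] = dma[i];
}

/* Unbinding redirects entries to the dummy page instead of leaving them
 * pointing at memory that may be freed, so stray GPU reads stay harmless. */
static void gart_unbind_range(size_t first, size_t count)
{
    assert(first + count <= GART_ENTRIES);
    for (size_t i = 0; i < count; i++)
        gart_entries[first + i] = DUMMY_DMA_ADDR;
}

/* At init time nothing is bound, so every entry starts at the dummy page. */
static void gart_init(void)
{
    gart_unbind_range(0, GART_ENTRIES);
}
```

Because the ring is bound to its own backing pages, how the dummy page itself is allocated should not change what the CP fetches from a correctly bound ring, which fits the reply above.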

> --- radeon_device.c.orig        2013-09-10 08:55:05.000000000 +0000
> +++ radeon_device.c     2013-09-10 09:12:17.000000000 +0000
> @@ -673,15 +673,13 @@ int radeon_dummy_page_init(struct radeon
>  {
>         if (rdev->dummy_page.page)
>                 return 0;
> -       rdev->dummy_page.page = alloc_page(GFP_DMA32 | GFP_KERNEL | __GFP_ZERO);
> -       if (rdev->dummy_page.page == NULL)
> +       rdev->dummy_page.page = dma_alloc_coherent(&rdev->pdev->dev, PAGE_SIZE,
> +               &rdev->dummy_page.addr, GFP_DMA32|GFP_KERNEL);
> +       if (!rdev->dummy_page.page)
>                 return -ENOMEM;
> -       rdev->dummy_page.addr = pci_map_page(rdev->pdev, rdev->dummy_page.page,
> -                                       0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>         if (pci_dma_mapping_error(rdev->pdev, rdev->dummy_page.addr)) {
>                 dev_err(&rdev->pdev->dev, "Failed to DMA MAP the dummy page\n");
> -               __free_page(rdev->dummy_page.page);
> -               rdev->dummy_page.page = NULL;
> +               radeon_dummy_page_fini(rdev);
>                 return -ENOMEM;
>         }
>         return 0;
> @@ -698,9 +696,8 @@ void radeon_dummy_page_fini(struct radeo
>  {
>         if (rdev->dummy_page.page == NULL)
>                 return;
> -       pci_unmap_page(rdev->pdev, rdev->dummy_page.addr,
> -                       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> -       __free_page(rdev->dummy_page.page);
> +       dma_free_coherent(&rdev->pdev->dev, PAGE_SIZE,
> +               rdev->dummy_page.page, rdev->dummy_page.addr);
>         rdev->dummy_page.page = NULL;
>  }
>
Konrad Rzeszutek Wilk Sept. 10, 2013, 1:25 p.m. UTC | #2
On Tue, Sep 10, 2013 at 01:20:57PM +0400, Alex Ivanov wrote:
> Thanks! I'll try. Meanwhile, I've tried switching from alloc_page() to
> dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(

Is this platform enabling the SWIOTLB layer? The reason I ask is that
if you do enable it, you end up using the TTM DMA pool, which allocates
pages using dma_alloc_coherent(). That means all of the pages that come
out of TTM are already DMA-mapped.

And that means radeon_gart_bind() and all its friends use the DMA
addresses that have been constructed by the SWIOTLB IOMMU.

Perhaps the PA-RISC IOMMU creates the DMA addresses differently?

When the card gets programmed, you do end up using ttm_agp_bind, right?
I am wondering if something like this:

https://lkml.org/lkml/2010/12/6/512

is needed to pass in the right DMA address?
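Konrad's question comes down to which address ends up in each GART entry. When an IOMMU sits between the card and memory, the value to program is the DMA (bus) address returned by the mapping layer, not the page's CPU physical address. The toy model below illustrates that difference; struct toy_iommu, iommu_map(), iommu_translate() and the IOVA window base are hypothetical and do not describe the PA-RISC SBA IOMMU or any kernel interface.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define IOVA_BASE     0x80000000u /* hypothetical start of the IOVA window */
#define IOVA_SLOTS    8u

/* Hypothetical IOMMU: slot n maps IOVA_BASE + n * page to phys[n]. */
struct toy_iommu {
    uint64_t phys[IOVA_SLOTS];
    unsigned used;
};

/* Map one physical page; return the DMA address the device must be given. */
static uint32_t iommu_map(struct toy_iommu *mmu, uint64_t phys)
{
    assert(mmu->used < IOVA_SLOTS);
    mmu->phys[mmu->used] = phys;
    return IOVA_BASE + mmu->used++ * TOY_PAGE_SIZE;
}

/* What the device experiences: translate a bus address through the IOMMU.
 * Returns 0 for addresses with no mapping (the access would fault). */
static uint64_t iommu_translate(const struct toy_iommu *mmu, uint32_t dma)
{
    if (dma < IOVA_BASE)
        return 0;
    uint32_t slot = (dma - IOVA_BASE) / TOY_PAGE_SIZE;
    if (slot >= mmu->used)
        return 0;
    return mmu->phys[slot] + (dma - IOVA_BASE) % TOY_PAGE_SIZE;
}
```

In this model, an entry filled with the physical address instead of the value iommu_map() returned points the device outside the mapped window, so its accesses fail.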

> 
> --- radeon_device.c.orig	2013-09-10 08:55:05.000000000 +0000
> +++ radeon_device.c	2013-09-10 09:12:17.000000000 +0000
> @@ -673,15 +673,13 @@ int radeon_dummy_page_init(struct radeon
>  {
>  	if (rdev->dummy_page.page)
>  		return 0;
> -	rdev->dummy_page.page = alloc_page(GFP_DMA32 | GFP_KERNEL | __GFP_ZERO);
> -	if (rdev->dummy_page.page == NULL)
> +	rdev->dummy_page.page = dma_alloc_coherent(&rdev->pdev->dev, PAGE_SIZE,
> +		&rdev->dummy_page.addr, GFP_DMA32|GFP_KERNEL);
> +	if (!rdev->dummy_page.page)
>  		return -ENOMEM;
> -	rdev->dummy_page.addr = pci_map_page(rdev->pdev, rdev->dummy_page.page,
> -					0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>  	if (pci_dma_mapping_error(rdev->pdev, rdev->dummy_page.addr)) {
>  		dev_err(&rdev->pdev->dev, "Failed to DMA MAP the dummy page\n");
> -		__free_page(rdev->dummy_page.page);
> -		rdev->dummy_page.page = NULL;
> +		radeon_dummy_page_fini(rdev);
>  		return -ENOMEM;
>  	}
>  	return 0;
> @@ -698,9 +696,8 @@ void radeon_dummy_page_fini(struct radeo
>  {
>  	if (rdev->dummy_page.page == NULL)
>  		return;
> -	pci_unmap_page(rdev->pdev, rdev->dummy_page.addr,
> -			PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> -	__free_page(rdev->dummy_page.page);
> +	dma_free_coherent(&rdev->pdev->dev, PAGE_SIZE,
> +		rdev->dummy_page.page, rdev->dummy_page.addr);
>  	rdev->dummy_page.page = NULL;
>  }
> 
Alex Ivanov Sept. 17, 2013, 9:23 a.m. UTC | #3
Alex,

On 10.09.2013, at 16:37, Alex Deucher <alexdeucher@gmail.com> wrote:

> The dummy page isn't really going to help much.  That page is just
> used as a safety placeholder for gart entries that aren't mapped on
> the GPU.  TTM (drivers/gpu/drm/ttm) actually does the allocation of
> the backing pages for the gart.  

> You may want to look there.

Ah, sorry, you're right. However, my idea with:

On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov <gnidorah@p0n4ik.tk> wrote:

> Thanks! I'll try. Meanwhile, I've tried switching from alloc_page() to
> dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(

doesn't make sense for the TTM part either.

Konrad,

10.09.2013, 17:25, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>:
>
> Is this platform enabling the SWIOTLB layer? 

Doesn't look like it.

> The reason I am asking is
> b/c if you do indeed enable it you end up using the TTM DMA pool
> which allocates pages using the dma_alloc_coherent - which means that
> all of the pages that come out of TTM are already 'DMA' mapped.
>
> And that means the radeon_gart_bind and all its friends
> use the DMA addresses that have been constructed by SWIOTLB IOMMU.
>
> Perhaps the PA-RISC IOMMU creates the DMA addresses differently?
>
> When the card gets programmed, you do end up using ttm_agp_bind right?
> I am wondering if something like this:
>
> https://lkml.org/lkml/2010/12/6/512
>
> is needed to pass in the right DMA address?

I have no idea how to modify ttm_agp_bind() this way, but does it even
matter if swiotlb isn't used?
Alex Deucher Sept. 17, 2013, 2:24 p.m. UTC | #4
On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov <gnidorah@p0n4ik.tk> wrote:
> Alex,
>
> On 10.09.2013, at 16:37, Alex Deucher <alexdeucher@gmail.com> wrote:
>
>> The dummy page isn't really going to help much.  That page is just
>> used as a safety placeholder for gart entries that aren't mapped on
>> the GPU.  TTM (drivers/gpu/drm/ttm) actually does the allocation of
>> the backing pages for the gart.
>
>> You may want to look there.
>
> Ah, sorry, you're right. However, my idea with:
>
> On Tue, Sep 10, 2013 at 5:20 AM, Alex Ivanov <gnidorah@p0n4ik.tk> wrote:
>
>> Thanks! I'll try. Meanwhile, I've tried switching from alloc_page() to
>> dma_alloc_coherent() in radeon_dummy_page_*(), which didn't help :(
>
> doesn't make sense for the TTM part either.

After the driver is loaded, you can dump some info from debugfs:
r100_rbbm_info
r100_cp_ring_info
r100_cp_csq_fifo
These will dump a bunch of registers and internal FIFOs so we can see
what the chip actually processed.

Alex

>
> Konrad,
>
> 10.09.2013, 17:25, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>:
>>
>> Is this platform enabling the SWIOTLB layer?
>
> Doesn't look like.
>
>> The reason I am asking is
>> b/c if you do indeed enable it you end up using the TTM DMA pool
>> which allocates pages using the dma_alloc_coherent - which means that
>> all of the pages that come out of TTM are already 'DMA' mapped.
>>
>> And that means the radeon_gart_bind and all its friends
>> use the DMA addresses that have been constructed by SWIOTLB IOMMU.
>>
>> Perhaps the PA-RISC IOMMU creates the DMA addresses differently?
>>
>> When the card gets programmed, you do end up using ttm_agp_bind right?
>> I am wondering if something like this:
>>
>> https://lkml.org/lkml/2010/12/6/512
>>
>> is needed to pass in the right DMA address?
>
> No idea how to modify ttm_agp_bind() this way, though doesn't matter if
> swiotlb isn't used anyway?
Alex Ivanov Sept. 17, 2013, 7:33 p.m. UTC | #5
On 17.09.2013, at 18:24, Alex Deucher <alexdeucher@gmail.com> wrote:

> On Tue, Sep 17, 2013 at 5:23 AM, Alex Ivanov <gnidorah@p0n4ik.tk> wrote:
> 
> After the driver is loaded, you can dump some info from debugfs:
> r100_rbbm_info
> r100_cp_ring_info
> r100_cp_csq_fifo
> These will dump a bunch of registers and internal FIFOs so we can see
> what the chip actually processed.
> 
> Alex

Reading r100_cp_ring_info leads to a kernel panic (KP):

r100_debugfs_cp_ring_info():
count = (rdp + ring->ring_size - wdp) & ring->ptr_mask;
i = (rdp + j) & ring->ptr_mask;

for (j = 0; j <= count; j++) {
        i = (rdp + j) & ring->ptr_mask;
        /* KP here, on the first iteration: count = 262080, i = 0 */
        seq_printf(m, "r[%04d]=0x%08x\n", i, ring->ring[i]);
}

Reading radeon_ring_gfx (which I additionally tried) throws an MCE:

radeon_debugfs_ring_info():
count = (ring->ring_size / 4) - ring->ring_free_dw;
i = (ring->rptr + ring->ptr_mask + 1 - 32) & ring->ptr_mask;

for (j = 0; j <= (count + 32); j++) {
        /* MCE here, on the first iteration: i = 262112, j = 0 */
        seq_printf(m, "r[%5d]=0x%08x\n", i, ring->ring[i]);
        i = (i + 1) & ring->ptr_mask;
}

I'm attaching debug output from a kernel built with these loops commented out.
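The indices printed above can be turned back into read and write pointers. A small sketch, assuming the 1 MiB ring that r100 sets up (ring_size = 1048576 bytes, so ptr_mask = 262143 dwords); the i = 262112 value from radeon_debugfs_ring_info() is consistent with that mask, since 262112 = 262144 - 32. Under that assumption, count = 262080 from the r100 formula means the write pointer sits 64 dwords ahead of the read pointer, and the radeon_debugfs_ring_info() starting index implies rptr = 0; in other words, the CP appears not to have consumed anything that was queued. The two helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the 1 MiB ring r100 sets up; ptr_mask counts dwords. */
#define RING_SIZE_BYTES (1024u * 1024u)
#define PTR_MASK        (RING_SIZE_BYTES / 4u - 1u) /* 262143 */

static_assert((RING_SIZE_BYTES & (RING_SIZE_BYTES - 1u)) == 0,
              "ring size must be a power of two");

/* r100_debugfs_cp_ring_info(): count = (rdp + ring_size - wdp) & ptr_mask.
 * ring_size is a multiple of ptr_mask + 1, so it drops out and
 * count == (rdp - wdp) mod ring; negating gives how far wdp leads rdp. */
static uint32_t wptr_lead_from_r100_count(uint32_t count)
{
    return (0u - count) & PTR_MASK;
}

/* radeon_debugfs_ring_info(): i = (rptr + ptr_mask + 1 - 32) & ptr_mask.
 * Undo the 32-dword look-back to recover rptr from the first index. */
static uint32_t rptr_from_first_index(uint32_t first_i)
{
    return (first_i + 32u) & PTR_MASK;
}
```

If the ring-size assumption holds, a 64-dword lead also agrees with the count = 64 reported for radeon_debugfs_ring_info() in the follow-up.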
Alex Ivanov Sept. 20, 2013, 6:52 a.m. UTC | #6
On 17.09.2013, at 23:33, Alex Ivanov <gnidorah@p0n4ik.tk> wrote:

> Reading r100_cp_ring_info leads to a kernel panic (KP):
> 
> r100_debugfs_cp_ring_info():
> count = (rdp + ring->ring_size - wdp) & ring->ptr_mask;
> i = (rdp + j) & ring->ptr_mask;
> 
> for (j = 0; j <= count; j++) {
>         i = (rdp + j) & ring->ptr_mask;
>         /* KP here, on the first iteration: count = 262080, i = 0 */
>         seq_printf(m, "r[%04d]=0x%08x\n", i, ring->ring[i]);
> }
> 
> Reading radeon_ring_gfx (which I additionally tried) throws an MCE:
> 
> radeon_debugfs_ring_info():
> count = (ring->ring_size / 4) - ring->ring_free_dw;
> i = (ring->rptr + ring->ptr_mask + 1 - 32) & ring->ptr_mask;
> 
> for (j = 0; j <= (count + 32); j++) {
>         /* MCE here, on the first iteration: count = 64, i = 262112 */
>         seq_printf(m, "r[%5d]=0x%08x\n", i, ring->ring[i]);
>         i = (i + 1) & ring->ptr_mask;
> }
> 
> I'm attaching debug output from a kernel built with these loops commented out.
> <drm_parisc_debug.tgz>

The ring->ring is NULL...
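With ring->ring NULL, every ring->ring[i] access in the two debugfs loops quoted above dereferences an invalid pointer, which would explain both the kernel panic and the machine check. A minimal sketch of a dump loop that checks the CPU mapping before touching it, written against a hypothetical cut-down struct (struct toy_ring) holding only the fields those loops use; dump_ring() is likewise hypothetical and illustrates the guard, not the actual radeon debugfs code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of the ring fields used by the loops above. */
struct toy_ring {
    volatile uint32_t *ring; /* CPU mapping of the ring; may be NULL */
    uint32_t ptr_mask;       /* ring size in dwords minus one */
};

/* Print count + 1 entries starting at rptr, like the debugfs loops, but
 * bail out if the ring was never mapped. Returns the number of entries
 * printed, or -1 if ring->ring is NULL. */
static long dump_ring(FILE *out, const struct toy_ring *r,
                      uint32_t rptr, uint32_t count)
{
    assert(out != NULL && r != NULL);
    if (r->ring == NULL) {
        fprintf(out, "ring not mapped (ring->ring == NULL)\n");
        return -1;
    }
    if (count > r->ptr_mask) /* never walk more than one full ring */
        count = r->ptr_mask;
    for (uint32_t j = 0; j <= count; j++) {
        uint32_t i = (rptr + j) & r->ptr_mask;
        fprintf(out, "r[%5u]=0x%08x\n", (unsigned)i, (unsigned)r->ring[i]);
    }
    return (long)count + 1;
}
```

Guarding the debugfs output only avoids the crash; the open question remains why the ring's CPU mapping is missing in the first place.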

Patch

--- radeon_device.c.orig	2013-09-10 08:55:05.000000000 +0000
+++ radeon_device.c	2013-09-10 09:12:17.000000000 +0000
@@ -673,15 +673,13 @@  int radeon_dummy_page_init(struct radeon
 {
 	if (rdev->dummy_page.page)
 		return 0;
-	rdev->dummy_page.page = alloc_page(GFP_DMA32 | GFP_KERNEL | __GFP_ZERO);
-	if (rdev->dummy_page.page == NULL)
+	rdev->dummy_page.page = dma_alloc_coherent(&rdev->pdev->dev, PAGE_SIZE,
+		&rdev->dummy_page.addr, GFP_DMA32|GFP_KERNEL);
+	if (!rdev->dummy_page.page)
 		return -ENOMEM;
-	rdev->dummy_page.addr = pci_map_page(rdev->pdev, rdev->dummy_page.page,
-					0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	if (pci_dma_mapping_error(rdev->pdev, rdev->dummy_page.addr)) {
 		dev_err(&rdev->pdev->dev, "Failed to DMA MAP the dummy page\n");
-		__free_page(rdev->dummy_page.page);
-		rdev->dummy_page.page = NULL;
+		radeon_dummy_page_fini(rdev);
 		return -ENOMEM;
 	}
 	return 0;
@@ -698,9 +696,8 @@  void radeon_dummy_page_fini(struct radeo
 {
 	if (rdev->dummy_page.page == NULL)
 		return;
-	pci_unmap_page(rdev->pdev, rdev->dummy_page.addr,
-			PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-	__free_page(rdev->dummy_page.page);
+	dma_free_coherent(&rdev->pdev->dev, PAGE_SIZE,
+		rdev->dummy_page.page, rdev->dummy_page.addr);
 	rdev->dummy_page.page = NULL;
 }