Message ID | 20181219144546.28224-3-l.stach@pengutronix.de |
---|---|
State | New, archived |
Series | per-process address spaces for MMUv2 |
Hi Lucas,

On Wed, Dec 19, 2018 at 03:45:38PM +0100, Lucas Stach wrote:
> Keep the page at address 0 as faulting to catch any potential state
> setup issues early.

This is a nice idea! But applying this and making mesa hit that page
leads to the process hanging in D state over here on GC7000:

# [ 242.726192] INFO: task kworker/u8:2:37 blocked for more than 120 seconds.
[ 242.733010] Not tainted 4.18.0-00129-gce2b21074b41 #504
[ 242.738795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.746638] kworker/u8:2 D 0 37 2 0x00000028
[ 242.752144] Workqueue: events_unbound commit_work
[ 242.756860] Call trace:
[ 242.759318] __switch_to+0x94/0xd0
[ 242.762741] __schedule+0x1c0/0x6b8
[ 242.766239] schedule+0x40/0xa8
[ 242.769380] schedule_timeout+0x2f0/0x428
[ 242.773410] dma_fence_default_wait+0x1cc/0x2b8
[ 242.777951] dma_fence_wait_timeout+0x44/0x1b0
[ 242.782403] drm_atomic_helper_wait_for_fences+0x48/0x108
[ 242.787819] commit_tail+0x30/0x80
[ 242.791229] commit_work+0x20/0x30
[ 242.794642] process_one_work+0x1ec/0x458
[ 242.798659] worker_thread+0x48/0x430
[ 242.802331] kthread+0x130/0x138
[ 242.805557] ret_from_fork+0x10/0x1c

This is in dmesg showing that we hit the first page:

[ 65.907388] etnaviv-gpu 38000000.gpu: MMU fault status 0x00000002
[ 65.913497] etnaviv-gpu 38000000.gpu: MMU 0 fault addr 0x00000e40

Without that patch it's sampling random data from that page but does not hang.

Cheers,
 -- Guido
Hi Guido,

On Sunday, 30.12.2018, 16:49 +0100, Guido Günther wrote:
> Hi Lucas,
> On Wed, Dec 19, 2018 at 03:45:38PM +0100, Lucas Stach wrote:
> > Keep the page at address 0 as faulting to catch any potential state
> > setup issues early.
>
> This is a nice idea! But applying this and making mesa hit that page
> leads to the process hanging in D state over here on GC7000:
>
> [..snip..]
>
> This is in dmesg showing that we hit the first page:
>
> [ 65.907388] etnaviv-gpu 38000000.gpu: MMU fault status 0x00000002
> [ 65.913497] etnaviv-gpu 38000000.gpu: MMU 0 fault addr 0x00000e40
>
> Without that patch it's sampling random data from that page but does not hang.

GPU hangs after an MMU fault are expected, or more accurately, we
actively request the GPU to stop by setting the exception bit in the
page table.

A hanging GPU should trigger the scheduler timeout handler, which then
makes sure to get the GPU back into a working state. So if things don't
progress after the fault for you, either the timeout handler is buggy on
GC7000 or the fence signaling is broken somehow.
I'll take a look at this.

Regards,
Lucas
Hi,

On Mon, Jan 07, 2019 at 09:50:52AM +0100, Lucas Stach wrote:
> Hi Guido,
>
> On Sunday, 30.12.2018, 16:49 +0100, Guido Günther wrote:
> > Hi Lucas,
> > On Wed, Dec 19, 2018 at 03:45:38PM +0100, Lucas Stach wrote:
> > > Keep the page at address 0 as faulting to catch any potential state
> > > setup issues early.
> >
> > This is a nice idea! But applying this and making mesa hit that page
> > leads to the process hanging in D state over here on GC7000:
> >
> > [..snip..]
> >
> > Without that patch it's sampling random data from that page but does not hang.
>
> GPU hangs after an MMU fault are expected, or more accurately, we
> actively request the GPU to stop by setting the exception bit in the
> page table.

Yeah. I put that in to show that this is the cause for the trouble above.
>
> A hanging GPU should trigger the scheduler timeout handler, which then
> makes sure to get the GPU back into a working state. So if things don't
> progress after the fault for you, either the timeout handler is buggy on
> GC7000 or the fence signaling is broken somehow. I'll take a look at
> this.

This isn't a top-notch linux-next based tree yet, so if you're not seeing
this, let me forward-port our stuff to that and report back again.

Cheers,
 -- Guido
On Monday, 07.01.2019, 10:13 +0100, Guido Günther wrote:
> Hi,
> On Mon, Jan 07, 2019 at 09:50:52AM +0100, Lucas Stach wrote:
> [..snip..]
> > GPU hangs after an MMU fault are expected, or more accurately, we
> > actively request the GPU to stop by setting the exception bit in the
> > page table.
>
> Yeah. I put that in to show that this is the cause for the trouble above.
>
> > A hanging GPU should trigger the scheduler timeout handler, which then
> > makes sure to get the GPU back into a working state. So if things don't
> > progress after the fault for you, either the timeout handler is buggy on
> > GC7000 or the fence signaling is broken somehow. I'll take a look at
> > this.
>
> This isn't a top-notch linux-next based tree yet, so if you're not seeing
> this, let me forward-port our stuff to that and report back again.

I've certainly seen the timeout handler working on GC7000, but with the
GC7000 support being relatively lightly tested right now, I wouldn't bet
on us handling all corner cases correctly.

If this is an issue on a recent kernel, I would certainly love to learn
what's going wrong.

Regards,
Lucas
On Wed, 19 Dec 2018 at 15:45, Lucas Stach <l.stach@pengutronix.de> wrote:
>
> Keep the page at address 0 as faulting to catch any potential state
> setup issues early.
>
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

I like this idea, but I am unsure about Guido's GC7000 problem.

Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Hi,

On Mon, Jan 07, 2019 at 04:02:33PM +0100, Lucas Stach wrote:
[..snip..]
> I've certainly seen the timeout handler working on GC7000, but with the
> GC7000 support being relatively lightly tested right now, I wouldn't
> bet on us handling all corner cases correctly.
>
> If this is an issue on a recent kernel, I would certainly love to learn
> what's going wrong.

I've brought my drm more in line with 5.x and it doesn't seem to hang
anymore.

Cheers,
 -- Guido
Hi,

On Wed, Dec 19, 2018 at 03:45:38PM +0100, Lucas Stach wrote:
> Keep the page at address 0 as faulting to catch any potential state
> setup issues early.
>
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> [..snip..]

Reviewed-By: Guido Günther <agx@sigxcpu.org>

Cheers and sorry for the extreme delay,
 -- Guido
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
index f1c88d8ad5ba..f794e04be9e6 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
@@ -320,8 +320,8 @@ etnaviv_iommuv2_domain_alloc(struct etnaviv_gpu *gpu)
 	domain = &etnaviv_domain->base;
 
 	domain->dev = gpu->dev;
-	domain->base = 0;
-	domain->size = (u64)SZ_1G * 4;
+	domain->base = SZ_4K;
+	domain->size = (u64)SZ_1G * 4 - SZ_4K;
 	domain->ops = &etnaviv_iommuv2_ops;
 
 	ret = etnaviv_iommuv2_init(etnaviv_domain);
Keep the page at address 0 as faulting to catch any potential state
setup issues early.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)