
[4/4] drm/vgem: flush page during page fault

Message ID 20180117003559.67837-4-gurchetansingh@chromium.org (mailing list archive)
State New, archived

Commit Message

Gurchetan Singh Jan. 17, 2018, 12:35 a.m. UTC
This is required to use buffers allocated by vgem on AMD and ARM devices.
We're experiencing a case where eviction of the cache races with userspace
writes. To fix this, flush the cache after retrieving a page.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
---
 drivers/gpu/drm/vgem/vgem_drv.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Daniel Vetter Jan. 17, 2018, 8:39 a.m. UTC | #1
On Tue, Jan 16, 2018 at 04:35:59PM -0800, Gurchetan Singh wrote:
> This is required to use buffers allocated by vgem on AMD and ARM devices.
> We're experiencing a case where eviction of the cache races with userspace
> writes. To fix this, flush the cache after retrieving a page.
> 
> Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
> ---
>  drivers/gpu/drm/vgem/vgem_drv.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
> index 35bfdfb746a7..fb263969f02d 100644
> --- a/drivers/gpu/drm/vgem/vgem_drv.c
> +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> @@ -112,6 +112,7 @@ static int vgem_gem_fault(struct vm_fault *vmf)
>  				break;
>  		}
>  
> +		drm_flush_pages(obj->base.dev->dev, &page, 1);

Uh ... what exactly are you doing?

Asking because the entire "who's responsible for coherency" story is
still entirely undefined when doing buffer sharing :-/ What is clear is
that vgem currently ignores this entirely (there's no
begin/end_cpu_access callback), mostly because the shared dma-buf support
in drm_prime.c also ignores it entirely. And a one-time-only flush in
your fault handler is definitely not going to fix this (at least not if
you do anything other than one-shot uploads).
-Daniel

>  	}
>  	return ret;
>  }
> -- 
> 2.13.5
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Gurchetan Singh Jan. 17, 2018, 10:49 p.m. UTC | #2
On Wed, Jan 17, 2018 at 12:39 AM, Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Jan 16, 2018 at 04:35:59PM -0800, Gurchetan Singh wrote:
> > This is required to use buffers allocated by vgem on AMD and ARM devices.
> > We're experiencing a case where eviction of the cache races with
> userspace
> > writes. To fix this, flush the cache after retrieving a page.
> >
> > Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
> > ---
> >  drivers/gpu/drm/vgem/vgem_drv.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/vgem/vgem_drv.c
> b/drivers/gpu/drm/vgem/vgem_drv.c
> > index 35bfdfb746a7..fb263969f02d 100644
> > --- a/drivers/gpu/drm/vgem/vgem_drv.c
> > +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> > @@ -112,6 +112,7 @@ static int vgem_gem_fault(struct vm_fault *vmf)
> >                               break;
> >               }
> >
> > +             drm_flush_pages(obj->base.dev->dev, &page, 1);
>
> Uh ... what exactly are you doing?
>
> Asking because the entire "who's responsible for coherency" story is
> entirely undefined still when doing buffer sharing :-/ What is clear is
> that currently vgem entirely ignores this (there's not
> begin/end_cpu_access callback), mostly because the shared dma-buf support
> in drm_prime.c also entirely ignores this.



This patch isn't trying to address the case of a dma-buf imported into
vgem.  It's trying to address the case where a buffer is created by
vgem_gem_dumb_create, mapped by vgem_gem_dumb_map, and then accessed by
user space.  Since the page retrieved by shmem_read_mapping_page during
the page fault may still be in the cache, we're seeing incorrect data in
the buffer.  Here's the test case we're running:

https://chromium.googlesource.com/chromiumos/platform/drm-tests/+/master/vgem_test.c

It fails on line 210 on AMD and ARM devices (not Intel though).
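
For context, the failing path is just the standard dumb-buffer flow.  A
paraphrased sketch (not an excerpt from the test; error handling trimmed,
and the drm.h include path may differ depending on the libdrm setup):

#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <drm/drm.h>

static int write_through_dumb_map(int vgem_fd)
{
	struct drm_mode_create_dumb create = {
		.width = 64, .height = 64, .bpp = 32,
	};
	struct drm_mode_map_dumb map = { 0 };
	void *ptr;

	if (ioctl(vgem_fd, DRM_IOCTL_MODE_CREATE_DUMB, &create))
		return -1;			/* vgem_gem_dumb_create */

	map.handle = create.handle;
	if (ioctl(vgem_fd, DRM_IOCTL_MODE_MAP_DUMB, &map))
		return -1;			/* vgem_gem_dumb_map */

	ptr = mmap(NULL, create.size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   vgem_fd, map.offset);
	if (ptr == MAP_FAILED)
		return -1;

	/* Each fault below goes through vgem_gem_fault(); on ARM/AMD the
	 * returned page can still have stale/dirty lines in the CPU cache. */
	memset(ptr, 0xaa, create.size);
	munmap(ptr, create.size);
	return 0;
}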

> And doing a one-time only
> flushing in your fault handler is definitely not going to fix this (at
> least not if you do anything else than one-shot uploads).
>

There used to be a vgem_gem_get_pages function, but that's been
removed.  I don't know where else to flush in this situation.


Daniel Vetter Jan. 18, 2018, 7:38 a.m. UTC | #3
On Wed, Jan 17, 2018 at 11:49 PM, Gurchetan Singh
<gurchetansingh@chromium.org> wrote:
>
> On Wed, Jan 17, 2018 at 12:39 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>>
>> On Tue, Jan 16, 2018 at 04:35:59PM -0800, Gurchetan Singh wrote:
>> > This is required to use buffers allocated by vgem on AMD and ARM
>> > devices.
>> > We're experiencing a case where eviction of the cache races with
>> > userspace
>> > writes. To fix this, flush the cache after retrieving a page.
>> >
>> > Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
>> > ---
>> >  drivers/gpu/drm/vgem/vgem_drv.c | 1 +
>> >  1 file changed, 1 insertion(+)
>> >
>> > diff --git a/drivers/gpu/drm/vgem/vgem_drv.c
>> > b/drivers/gpu/drm/vgem/vgem_drv.c
>> > index 35bfdfb746a7..fb263969f02d 100644
>> > --- a/drivers/gpu/drm/vgem/vgem_drv.c
>> > +++ b/drivers/gpu/drm/vgem/vgem_drv.c
>> > @@ -112,6 +112,7 @@ static int vgem_gem_fault(struct vm_fault *vmf)
>> >                               break;
>> >               }
>> >
>> > +             drm_flush_pages(obj->base.dev->dev, &page, 1);
>>
>> Uh ... what exactly are you doing?
>>
>> Asking because the entire "who's responsible for coherency" story is
>> entirely undefined still when doing buffer sharing :-/ What is clear is
>> that currently vgem entirely ignores this (there's not
>> begin/end_cpu_access callback), mostly because the shared dma-buf support
>> in drm_prime.c also entirely ignores this.
>
>
>
> This patch isn't trying to address the case of a dma-buf imported into vgem.
> It's trying to address the case when a buffer is created by
> vgem_gem_dumb_create, mapped by vgem_gem_dumb_map and then accessed by user
> space.  Since the page retrieved by shmem_read_mapping_page during the page
> fault may still be in the cache, we're experiencing incorrect data in
> buffer.  Here's the test case we're running:
>
> https://chromium.googlesource.com/chromiumos/platform/drm-tests/+/master/vgem_test.c

404s over here (Internal url?).

> It fails on line 210 on AMD and ARM devices (not Intel though).

So you _do_ import it on the other device driver as a dma-buf (and
export it from vgem)? Because coherency isn't well-defined for dma-buf
no matter who the exporter/importer is.

> And doing a one-time only
>> flushing in your fault handler is definitely not going to fix this (at
>> least not if you do anything else than one-shot uploads).
>
>
> There used to be a be vgem_gem_get_pages function, but that's been removed.
> I don't know where else to flush in this situation.

dma_buf begin/end cpu access. It's even exposed as an ioctl when you're
using the dma-buf mmap stuff. Vgem doesn't have that, which means that,
as-is, vgem's dumb mmap support can't really handle this if you want to
do explicit flushing.
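
Rough idea of what such a hook could look like on the vgem side -- purely
a hypothetical sketch, not existing code; it assumes bo->pages is already
populated and reuses the drm_flush_pages() helper added earlier in this
series:

static int vgem_dmabuf_begin_cpu_access(struct dma_buf *dmabuf,
					enum dma_data_direction dir)
{
	/* For prime exports, dmabuf->priv is the GEM object. */
	struct drm_gem_object *obj = dmabuf->priv;
	struct drm_vgem_gem_object *bo = to_vgem_bo(obj);

	/* Flush CPU cache lines for the whole object around CPU access. */
	drm_flush_pages(obj->dev->dev, bo->pages, obj->size >> PAGE_SHIFT);
	return 0;
}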

What would work is uncached/writecombine/coherent dma memory. But then
we're right back in the middle of the entire "who's responsible for
coherency" question.
-Daniel

Gurchetan Singh Jan. 18, 2018, 5:23 p.m. UTC | #4
On Wed, Jan 17, 2018 at 11:38 PM, Daniel Vetter <daniel@ffwll.ch> wrote:

> On Wed, Jan 17, 2018 at 11:49 PM, Gurchetan Singh
> <gurchetansingh@chromium.org> wrote:
> >
> > On Wed, Jan 17, 2018 at 12:39 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> >>
> >> On Tue, Jan 16, 2018 at 04:35:59PM -0800, Gurchetan Singh wrote:
> >> > This is required to use buffers allocated by vgem on AMD and ARM
> >> > devices.
> >> > We're experiencing a case where eviction of the cache races with
> >> > userspace
> >> > writes. To fix this, flush the cache after retrieving a page.
> >> >
> >> > Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
> >> > ---
> >> >  drivers/gpu/drm/vgem/vgem_drv.c | 1 +
> >> >  1 file changed, 1 insertion(+)
> >> >
> >> > diff --git a/drivers/gpu/drm/vgem/vgem_drv.c
> >> > b/drivers/gpu/drm/vgem/vgem_drv.c
> >> > index 35bfdfb746a7..fb263969f02d 100644
> >> > --- a/drivers/gpu/drm/vgem/vgem_drv.c
> >> > +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> >> > @@ -112,6 +112,7 @@ static int vgem_gem_fault(struct vm_fault *vmf)
> >> >                               break;
> >> >               }
> >> >
> >> > +             drm_flush_pages(obj->base.dev->dev, &page, 1);
> >>
> >> Uh ... what exactly are you doing?
> >>
> >> Asking because the entire "who's responsible for coherency" story is
> >> entirely undefined still when doing buffer sharing :-/ What is clear is
> >> that currently vgem entirely ignores this (there's not
> >> begin/end_cpu_access callback), mostly because the shared dma-buf
> support
> >> in drm_prime.c also entirely ignores this.
> >
> >
> >
> > This patch isn't trying to address the case of a dma-buf imported into
> vgem.
> > It's trying to address the case when a buffer is created by
> > vgem_gem_dumb_create, mapped by vgem_gem_dumb_map and then accessed by
> user
> > space.  Since the page retrieved by shmem_read_mapping_page during the
> page
> > fault may still be in the cache, we're experiencing incorrect data in
> > buffer.  Here's the test case we're running:
> >
> > https://chromium.googlesource.com/chromiumos/platform/drm-tests/+/master/vgem_test.c
>
> 404s over here (Internal url?).


Hmm ... I was able to access that link without being logged in to any
accounts.

> It fails on line 210 on AMD and ARM devices (not Intel though).
>
> So you _do_ import it on the other device driver as a dma-buf (and
> export it from vgem)?


Those portions of the test work fine (if the platform has a drm_clflush
implementation).  vgem_prime_pin calls drm_clflush_pages for the exporting
case.  Typically, ARM drivers flush the cache after drm_gem_get_pages() and
only do WC mappings, so import works.  For Intel, there is some
hardware-level coherency involved.  The problem is that vgem doesn't flush
the cache on ARM/AMD when getting pages for the non-export/non-import case
(when faulting after a vgem_gem_dumb_map, not during dma-buf mmap) -- i.e.,
during regular use of the buffer.

>
> >> And doing a one-time only
> >> flushing in your fault handler is definitely not going to fix this (at
> >> least not if you do anything else than one-shot uploads).
> >
> >
> > There used to be a be vgem_gem_get_pages function, but that's been
> removed.
> > I don't know where else to flush in this situation.
>
> dma_buf begin/end cpu access. Even exposed as an ioctl when you're
> using the dma-buf mmap stuff. Vgem doesn't have that, which means
> as-is the dumb mmap support of vgem can't really support this if you
> want to do explicit flushing.


> What would work is uncached/writecombine/coherent dma memory. But then
> we're in the middle of the entire
> "who's responsible.
> -Daniel
>
> >
> >>
> >> -Daniel
> >>
> >> >       }
> >> >       return ret;
> >> >  }
> >> > --
> >> > 2.13.5
> >> >
> >> > _______________________________________________
> >> > dri-devel mailing list
> >> > dri-devel@lists.freedesktop.org
> >> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> http://blog.ffwll.ch
> >
> >
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
Daniel Vetter Jan. 30, 2018, 9:14 a.m. UTC | #5
On Thu, Jan 18, 2018 at 09:23:31AM -0800, Gurchetan Singh wrote:
> On Wed, Jan 17, 2018 at 11:38 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Wed, Jan 17, 2018 at 11:49 PM, Gurchetan Singh
> > <gurchetansingh@chromium.org> wrote:
> > >
> > > On Wed, Jan 17, 2018 at 12:39 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > >>
> > >> On Tue, Jan 16, 2018 at 04:35:59PM -0800, Gurchetan Singh wrote:
> > >> > This is required to use buffers allocated by vgem on AMD and ARM
> > >> > devices.
> > >> > We're experiencing a case where eviction of the cache races with
> > >> > userspace
> > >> > writes. To fix this, flush the cache after retrieving a page.
> > >> >
> > >> > Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
> > >> > ---
> > >> >  drivers/gpu/drm/vgem/vgem_drv.c | 1 +
> > >> >  1 file changed, 1 insertion(+)
> > >> >
> > >> > diff --git a/drivers/gpu/drm/vgem/vgem_drv.c
> > >> > b/drivers/gpu/drm/vgem/vgem_drv.c
> > >> > index 35bfdfb746a7..fb263969f02d 100644
> > >> > --- a/drivers/gpu/drm/vgem/vgem_drv.c
> > >> > +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> > >> > @@ -112,6 +112,7 @@ static int vgem_gem_fault(struct vm_fault *vmf)
> > >> >                               break;
> > >> >               }
> > >> >
> > >> > +             drm_flush_pages(obj->base.dev->dev, &page, 1);
> > >>
> > >> Uh ... what exactly are you doing?
> > >>
> > >> Asking because the entire "who's responsible for coherency" story is
> > >> entirely undefined still when doing buffer sharing :-/ What is clear is
> > >> that currently vgem entirely ignores this (there's not
> > >> begin/end_cpu_access callback), mostly because the shared dma-buf
> > support
> > >> in drm_prime.c also entirely ignores this.
> > >
> > >
> > >
> > > This patch isn't trying to address the case of a dma-buf imported into
> > vgem.
> > > It's trying to address the case when a buffer is created by
> > > vgem_gem_dumb_create, mapped by vgem_gem_dumb_map and then accessed by
> > user
> > > space.  Since the page retrieved by shmem_read_mapping_page during the
> > page
> > > fault may still be in the cache, we're experiencing incorrect data in
> > > buffer.  Here's the test case we're running:
> > >
> > > https://chromium.googlesource.com/chromiumos/platform/drm-tests/+/master/vgem_test.c
> >
> > 404s over here (Internal url?).
> 
> 
> Hmm ... I was able to access that link without being logged in to any
> accounts.
> 
> > It fails on line 210 on AMD and ARM devices (not Intel though).
> >
> > So you _do_ import it on the other device driver as a dma-buf (and
> > export it from vgem)?
> 
> 
> Those portions of the test work fine (if the platform has a drm_clflush
> implementation).  vgem_prime_pin calls drm_clflush_pages for the exporting
> case.  Typically, ARM drivers flush the cache after drm_gem_get_pages() and
> only do WC mappings, so import works.  For Intel, there is some hardware
> level coherency involved.  The problem is vgem doesn't flush the cache on
> ARM/AMD when getting pages for the non-export/non-import case (when
> faulting after a vgem_gem_dumb_map, not during dma-buf mmap) -- i.e, during
> regular use of the buffer.

So if the idea is that the vgem buffers should be accessed using WC, then
we need to switch the dumb_map stuff to do WC. Flushing once is not going
to fix things if you write again (afaik CrOS only does write-once and then
throws buffers away again, but not sure).

The other option is to throw dumb_map into the wind and only support
dma-buf mmapping, which has a special ioctl for range flushing (so that we
could flush before/after each access as needed). A gross hack would be to
keep using dumb_map but abuse the dma-buf flushing ioctl for the flushing.

The problem ofc is that there's no agreement between importers/exporters
on who should flush when and where. Fixing that is way too much work, so
personally I think the simplest clean fix is something along the lines of
using the dma-buf flush ioctl (DMA_BUF_IOCTL_SYNC).
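
On the userspace side that looks roughly like this (sketch, error
handling omitted):

#include <linux/dma-buf.h>
#include <sys/ioctl.h>

/* Bracket CPU writes to a mapped dma-buf with the sync ioctl so the
 * exporter gets a chance to flush/invalidate caches as needed. */
static void cpu_write_begin(int dmabuf_fd)
{
	struct dma_buf_sync sync = {
		.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE,
	};

	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}

static void cpu_write_end(int dmabuf_fd)
{
	struct dma_buf_sync sync = {
		.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE,
	};

	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}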

Cheers, Daniel

Gurchetan Singh Jan. 30, 2018, 8:01 p.m. UTC | #6
On Tue, Jan 30, 2018 at 1:14 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Thu, Jan 18, 2018 at 09:23:31AM -0800, Gurchetan Singh wrote:
> > On Wed, Jan 17, 2018 at 11:38 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > > On Wed, Jan 17, 2018 at 11:49 PM, Gurchetan Singh
> > > <gurchetansingh@chromium.org> wrote:
> > > >
> > > > On Wed, Jan 17, 2018 at 12:39 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >>
> > > >> On Tue, Jan 16, 2018 at 04:35:59PM -0800, Gurchetan Singh wrote:
> > > >> > This is required to use buffers allocated by vgem on AMD and ARM
> > > >> > devices.
> > > >> > We're experiencing a case where eviction of the cache races with
> > > >> > userspace
> > > >> > writes. To fix this, flush the cache after retrieving a page.
> > > >> >
> > > >> > Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
> > > >> > ---
> > > >> >  drivers/gpu/drm/vgem/vgem_drv.c | 1 +
> > > >> >  1 file changed, 1 insertion(+)
> > > >> >
> > > >> > diff --git a/drivers/gpu/drm/vgem/vgem_drv.c
> > > >> > b/drivers/gpu/drm/vgem/vgem_drv.c
> > > >> > index 35bfdfb746a7..fb263969f02d 100644
> > > >> > --- a/drivers/gpu/drm/vgem/vgem_drv.c
> > > >> > +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> > > >> > @@ -112,6 +112,7 @@ static int vgem_gem_fault(struct vm_fault *vmf)
> > > >> >                               break;
> > > >> >               }
> > > >> >
> > > >> > +             drm_flush_pages(obj->base.dev->dev, &page, 1);
> > > >>
> > > >> Uh ... what exactly are you doing?
> > > >>
> > > >> Asking because the entire "who's responsible for coherency" story is
> > > >> entirely undefined still when doing buffer sharing :-/ What is clear is
> > > >> that currently vgem entirely ignores this (there's not
> > > >> begin/end_cpu_access callback), mostly because the shared dma-buf
> > > support
> > > >> in drm_prime.c also entirely ignores this.
> > > >
> > > >
> > > >
> > > > This patch isn't trying to address the case of a dma-buf imported into
> > > vgem.
> > > > It's trying to address the case when a buffer is created by
> > > > vgem_gem_dumb_create, mapped by vgem_gem_dumb_map and then accessed by
> > > user
> > > > space.  Since the page retrieved by shmem_read_mapping_page during the
> > > page
> > > > fault may still be in the cache, we're experiencing incorrect data in
> > > > buffer.  Here's the test case we're running:
> > > >
> > > > https://chromium.googlesource.com/chromiumos/platform/drm-tests/+/master/vgem_test.c
> > >
> > > 404s over here (Internal url?).
> >
> >
> > Hmm ... I was able to access that link without being logged in to any
> > accounts.
> >
> > > It fails on line 210 on AMD and ARM devices (not Intel though).
> > >
> > > So you _do_ import it on the other device driver as a dma-buf (and
> > > export it from vgem)?
> >
> >
> > Those portions of the test work fine (if the platform has a drm_clflush
> > implementation).  vgem_prime_pin calls drm_clflush_pages for the exporting
> > case.  Typically, ARM drivers flush the cache after drm_gem_get_pages() and
> > only do WC mappings, so import works.  For Intel, there is some hardware
> > level coherency involved.  The problem is vgem doesn't flush the cache on
> > ARM/AMD when getting pages for the non-export/non-import case (when
> > faulting after a vgem_gem_dumb_map, not during dma-buf mmap) -- i.e, during
> > regular use of the buffer.
>
> So if the idea is that the vgem buffers should be accessed using WC, then
> we need to switch the dump_map stuff to do wc. Flushing once is not going
> to fix things if you write again (afaik CrOS only does write-once and then
> throws buffers away again, but not sure).

vgem dumb mmap is already write-combined, no?  As I understand it,
after calling the dumb map ioctl (to create the fake offset) and then
calling mmap(), the stack looks something like this:

drm_gem_mmap(..)
vgem_mmap(..)
call_mmap(..)
mmap_region(..)
mmap_pgoff(..)
SyS_mmap_pgoff(..)

drm_gem_mmap() always maps WC.
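
i.e., drm_gem_mmap_obj() ends with roughly this (quoting drm_gem.c from
memory, so the exact lines may differ):

	/* GEM mmap path: force a write-combined mapping regardless of
	 * the driver. */
	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
	vma->vm_ops = dev->driver->gem_vm_ops;
	vma->vm_private_data = obj;
	vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));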

> The other option is to throw dumb_map into the wind and only support
> dma-buf mmaping, which has a special ioctl for range flushing (so that we
> could flush before/after each access as needed). A gross hack would be to
> keep using dumb_map but abuse the dma-buf flushing ioctl for the flushing.

Some SW drivers (kms_swrast in Mesa) rely on dumb mapping, though in
theory they can be modified to use dma-buf mmap.

> The problem ofc is that there's no agreement between importers/exporters
> on who should flush when and where. Fixing that is way too much work, so
> personally I think the simplest clean fix is something along the lines of
> using the dma-buf flush ioctl (DMA_BUF_IOCTL_SYNC).

The failure case I'm experiencing is not related to
importing/exporting.  The issue is that pages retrieved on ARM
architectures usually have to be flushed before they can be used (see
rockchip_gem_get_pages / tegra_bo_get_pages /
etnaviv_gem_scatter_map()), even if they are mapped WC later.  Since
we couldn't get any agreement on implementing ARM support in
drm_cache.c, the recommended approach for getting vgem working on ARM
seems to be:

1) Create vgem_get_pages / vgem_put_pages functions.  vgem_get_pages
will be called from vgem_gem_fault (since currently vgem does delayed
allocation) and vgem_pin_pages.
2) Call dma_map_sg() / dma_unmap_sg() in vgem_get_pages / vgem_put_pages
if the architecture is ARM (we'll have some #ifdef-ery); see the rough
sketch below.

Does that work for you?
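
Very roughly, something like this -- a hypothetical sketch of 1)/2), not
existing vgem code, with the dma_map_sg()/dma_unmap_sg() pair used purely
to force dirty cache lines out to memory:

/* Called from vgem_gem_fault() and vgem_pin_pages(); sketch only. */
static struct page **vgem_get_pages(struct drm_vgem_gem_object *bo)
{
	struct page **pages;
	int npages = bo->base.size >> PAGE_SHIFT;

	pages = drm_gem_get_pages(&bo->base);
	if (IS_ERR(pages))
		return pages;

#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
	{
		/* Write dirty cache lines for these pages back to memory
		 * before they're used via a WC mapping or by a device. */
		struct sg_table *sgt = drm_prime_pages_to_sg(pages, npages);

		if (!IS_ERR(sgt)) {
			dma_map_sg(bo->base.dev->dev, sgt->sgl, sgt->nents,
				   DMA_TO_DEVICE);
			dma_unmap_sg(bo->base.dev->dev, sgt->sgl, sgt->nents,
				     DMA_TO_DEVICE);
			sg_free_table(sgt);
			kfree(sgt);
		}
	}
#endif

	bo->pages = pages;
	return pages;
}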

Daniel Vetter Jan. 30, 2018, 10:08 p.m. UTC | #7
On Tue, Jan 30, 2018 at 9:01 PM, Gurchetan Singh
<gurchetansingh@chromium.org> wrote:
> On Tue, Jan 30, 2018 at 1:14 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>>
>> On Thu, Jan 18, 2018 at 09:23:31AM -0800, Gurchetan Singh wrote:
>> > On Wed, Jan 17, 2018 at 11:38 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> >
>> > > On Wed, Jan 17, 2018 at 11:49 PM, Gurchetan Singh
>> > > <gurchetansingh@chromium.org> wrote:
>> > > >
>> > > > On Wed, Jan 17, 2018 at 12:39 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> > > >>
>> > > >> On Tue, Jan 16, 2018 at 04:35:59PM -0800, Gurchetan Singh wrote:
>> > > >> > This is required to use buffers allocated by vgem on AMD and ARM
>> > > >> > devices.
>> > > >> > We're experiencing a case where eviction of the cache races with
>> > > >> > userspace
>> > > >> > writes. To fix this, flush the cache after retrieving a page.
>> > > >> >
>> > > >> > Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
>> > > >> > ---
>> > > >> >  drivers/gpu/drm/vgem/vgem_drv.c | 1 +
>> > > >> >  1 file changed, 1 insertion(+)
>> > > >> >
>> > > >> > diff --git a/drivers/gpu/drm/vgem/vgem_drv.c
>> > > >> > b/drivers/gpu/drm/vgem/vgem_drv.c
>> > > >> > index 35bfdfb746a7..fb263969f02d 100644
>> > > >> > --- a/drivers/gpu/drm/vgem/vgem_drv.c
>> > > >> > +++ b/drivers/gpu/drm/vgem/vgem_drv.c
>> > > >> > @@ -112,6 +112,7 @@ static int vgem_gem_fault(struct vm_fault *vmf)
>> > > >> >                               break;
>> > > >> >               }
>> > > >> >
>> > > >> > +             drm_flush_pages(obj->base.dev->dev, &page, 1);
>> > > >>
>> > > >> Uh ... what exactly are you doing?
>> > > >>
>> > > >> Asking because the entire "who's responsible for coherency" story is
>> > > >> entirely undefined still when doing buffer sharing :-/ What is clear is
>> > > >> that currently vgem entirely ignores this (there's not
>> > > >> begin/end_cpu_access callback), mostly because the shared dma-buf
>> > > support
>> > > >> in drm_prime.c also entirely ignores this.
>> > > >
>> > > >
>> > > >
>> > > > This patch isn't trying to address the case of a dma-buf imported into
>> > > vgem.
>> > > > It's trying to address the case when a buffer is created by
>> > > > vgem_gem_dumb_create, mapped by vgem_gem_dumb_map and then accessed by
>> > > user
>> > > > space.  Since the page retrieved by shmem_read_mapping_page during the
>> > > page
>> > > > fault may still be in the cache, we're experiencing incorrect data in
>> > > > buffer.  Here's the test case we're running:
>> > > >
>> > > > https://chromium.googlesource.com/chromiumos/platform/drm-tests/+/master/vgem_test.c
>> > >
>> > > 404s over here (Internal url?).
>> >
>> >
>> > Hmm ... I was able to access that link without being logged in to any
>> > accounts.
>> >
>> > > It fails on line 210 on AMD and ARM devices (not Intel though).
>> > >
>> > > So you _do_ import it on the other device driver as a dma-buf (and
>> > > export it from vgem)?
>> >
>> >
>> > Those portions of the test work fine (if the platform has a drm_clflush
>> > implementation).  vgem_prime_pin calls drm_clflush_pages for the exporting
>> > case.  Typically, ARM drivers flush the cache after drm_gem_get_pages() and
>> > only do WC mappings, so import works.  For Intel, there is some hardware
>> > level coherency involved.  The problem is vgem doesn't flush the cache on
>> > ARM/AMD when getting pages for the non-export/non-import case (when
>> > faulting after a vgem_gem_dumb_map, not during dma-buf mmap) -- i.e, during
>> > regular use of the buffer.
>>
>> So if the idea is that the vgem buffers should be accessed using WC, then
>> we need to switch the dump_map stuff to do wc. Flushing once is not going
>> to fix things if you write again (afaik CrOS only does write-once and then
>> throws buffers away again, but not sure).
>
> vgem dumb mmap is already write-combined, no?  As I understand it,
> after calling the dumb map ioctl (to create the fake offset) and then
> calling mmap(), the stack looks something like this:
>
> drm_gem_mmap(..)
> vgem_mmap(..)
> call_mmap(..)
> mmap_region(..)
> mmap_pgoff(..)
> SyS_mmap_pgoff(..)
>
> drm_gem_mmap() always maps WC.

Oh, interesting. Never realized that, but yeah, if you just return a
page (and don't call one of the vm_insert_* functions from your fault
handler directly), then the default pgprot settings of drm_gem_mmap
take over, which is WC.

Now I wonder how this ever worked/works on i915 ... The other issue is
that if you get a lowmem page on arm32 you have caching-attribute
aliasing issues, as Russell points out. We'd not just need to make sure
the pages are flushed, but also that they're allocated as something we
can use for WC.

>> The other option is to throw dumb_map into the wind and only support
>> dma-buf mmaping, which has a special ioctl for range flushing (so that we
>> could flush before/after each access as needed). A gross hack would be to
>> keep using dumb_map but abuse the dma-buf flushing ioctl for the flushing.
>
> Some SW drivers (kms_swrast in Mesa) rely on dumb mapping, though in
> theory they can be modified to use dma-buf mmap.

As long as you don't share, dumb is good enough.

>> The problem ofc is that there's no agreement between importers/exporters
>> on who should flush when and where. Fixing that is way too much work, so
>> personally I think the simplest clean fix is something along the lines of
>> using the dma-buf flush ioctl (DMA_BUF_IOCTL_SYNC).
>
> The failure case I'm experiencing is not related to
> importing/exporting.  The issue is the pages retrieved on ARM
> architecture usually have to be flushed before they can be used (see
> rockchip_gem_get_pages / tegra_bo_get_pages /
> etnaviv_gem_scatter_map()), even if they are mapped WC later.  Since
> we couldn't get any agreement on implementing ARM support in
> drm_cache.c, the recommended approach for getting vgem working on ARM
> seems to be:
>
> 1) Create vgem_get_pages / vgem_put_pages functions.  vgem_get_pages
> will be called from vgem_gem_fault (since currently vgem does delayed
> allocation) and vgem_pin_pages.
> 2) Call dma_map_sg() / dma_unmap_sg in vgem_get_pages / vgem_put_pages
> if the architecture is ARM (we'll have some #ifdef-ery).
>
> Does that work for you?

Tbh I have no idea how you get at WC-capable pages in a
platform-agnostic way. Figuring that out will probably solve the
"must flush them at allocation time" issue too.

But yeah, from your description a vgem_get_pages function that makes
sure the pages are correctly flushed before anyone starts to use them
sounds like the right thing to do.
-Daniel


Patch

diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
index 35bfdfb746a7..fb263969f02d 100644
--- a/drivers/gpu/drm/vgem/vgem_drv.c
+++ b/drivers/gpu/drm/vgem/vgem_drv.c
@@ -112,6 +112,7 @@  static int vgem_gem_fault(struct vm_fault *vmf)
 				break;
 		}
 
+		drm_flush_pages(obj->base.dev->dev, &page, 1);
 	}
 	return ret;
 }