
media: mediatek: vcodec: Alloc DMA memory with DMA_ATTR_ALLOC_SINGLE_PAGES

Message ID 20240422100354.1.I58b4456c014a4d678455a4ec09b908b1c71c3017@changeid (mailing list archive)
State New, archived

Commit Message

Douglas Anderson April 22, 2024, 5:03 p.m. UTC
As talked about in commit 14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc"), it doesn't
really make sense to try to allocate contiguous chunks of memory for
video encoding/decoding. Let's switch the Mediatek vcodec driver to
pass DMA_ATTR_ALLOC_SINGLE_PAGES and take some of the stress off the
memory subsystem.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
NOTE: I haven't personally done massive amounts of testing with this
change, but I originally added the DMA_ATTR_ALLOC_SINGLE_PAGES flag
specifically for the video encoding / decoding cases and I know it
helped avoid memory problems in the past on other systems. Colleagues
of mine have told me that with this change memory problems are harder
to reproduce, so it seems like we should consider doing it.

 .../media/platform/mediatek/vcodec/common/mtk_vcodec_util.c    | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Nicolas Dufresne April 22, 2024, 6:27 p.m. UTC | #1
Hi,

On Monday, April 22, 2024 at 10:03 -0700, Douglas Anderson wrote:
> As talked about in commit 14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
> DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc"), it doesn't
> really make sense to try to allocate contiguous chunks of memory for
> video encoding/decoding. Let's switch the Mediatek vcodec driver to
> pass DMA_ATTR_ALLOC_SINGLE_PAGES and take some of the stress off the
> memory subsystem.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> NOTE: I haven't personally done massive amounts of testing with this
> change, but I originally added the DMA_ATTR_ALLOC_SINGLE_PAGES flag
> specifically for the video encoding / decoding cases and I know it
> helped avoid memory problems in the past on other systems. Colleagues
> of mine have told me that with this change memory problems are harder
> to reproduce, so it seems like we should consider doing it.

One thing to improve in your patch submission is to avoid abstracting the
problems. Patch review and pulling are based on a technical rationale and very
rarely on the trust that it helps someone somewhere in some unknown context.
What kind of memory issues are you facing? What is the technical advantage of
using DMA_ATTR_ALLOC_SINGLE_PAGES over the current approach that helps fix
the issue? I do expect this to be documented in the commit message itself.

regards,
Nicolas

> 
>  .../media/platform/mediatek/vcodec/common/mtk_vcodec_util.c    | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/media/platform/mediatek/vcodec/common/mtk_vcodec_util.c b/drivers/media/platform/mediatek/vcodec/common/mtk_vcodec_util.c
> index 9ce34a3b5ee6..3fb1d48c3e15 100644
> --- a/drivers/media/platform/mediatek/vcodec/common/mtk_vcodec_util.c
> +++ b/drivers/media/platform/mediatek/vcodec/common/mtk_vcodec_util.c
> @@ -64,7 +64,8 @@ int mtk_vcodec_mem_alloc(void *priv, struct mtk_vcodec_mem *mem)
>  		id = dec_ctx->id;
>  	}
>  
> -	mem->va = dma_alloc_coherent(&plat_dev->dev, size, &mem->dma_addr, GFP_KERNEL);
> +	mem->va = dma_alloc_attrs(&plat_dev->dev, size, &mem->dma_addr,
> +				  GFP_KERNEL, DMA_ATTR_ALLOC_SINGLE_PAGES);
>  	if (!mem->va) {
>  		mtk_v4l2_err(plat_dev, "%s dma_alloc size=%ld failed!",
>  			     dev_name(&plat_dev->dev), size);
Douglas Anderson April 22, 2024, 7:25 p.m. UTC | #2
Hi,

On Mon, Apr 22, 2024 at 11:27 AM Nicolas Dufresne
<nicolas.dufresne@collabora.com> wrote:
>
> Hi,
>
> On Monday, April 22, 2024 at 10:03 -0700, Douglas Anderson wrote:
> > As talked about in commit 14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
> > DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc"), it doesn't
> > really make sense to try to allocate contiguous chunks of memory for
> > video encoding/decoding. Let's switch the Mediatek vcodec driver to
> > pass DMA_ATTR_ALLOC_SINGLE_PAGES and take some of the stress off the
> > memory subsystem.
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> > NOTE: I haven't personally done massive amounts of testing with this
> > change, but I originally added the DMA_ATTR_ALLOC_SINGLE_PAGES flag
> > specifically for the video encoding / decoding cases and I know it
> > helped avoid memory problems in the past on other systems. Colleagues
> > of mine have told me that with this change memory problems are harder
> > to reproduce, so it seems like we should consider doing it.
>
> One thing to improve in your patch submission is to avoid abstracting the
> problems. Patch review and pulling are based on a technical rationale and very
> rarely on the trust that it helps someone somewhere in some unknown context.
> What kind of memory issues are you facing? What is the technical advantage of
> using DMA_ATTR_ALLOC_SINGLE_PAGES over the current approach that helps fix
> the issue? I do expect this to be documented in the commit message itself.

Right. The problem here is that I'm not _directly_ facing any problems
here and I also haven't done massive amounts of analysis of the
Mediatek video codec. I know that some of my colleagues have run into
issues on Mediatek devices where the system starts getting
unresponsive when lots of videos are decoded in parallel. That
reminded me of the old problem I debugged in 2015 on Rockchip
platforms and is talked about a bunch in the referenced commit
14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc") so I wrote up
this patch. The referenced commit contains quite a bit of details
about the problems faced back in 2015.

When I asked, my colleagues said that my patch seemed to help, though
it was more of a qualitative statement than a quantitative one.

I wasn't 100% sure if it was worth sending the patch up at this point,
but logically, I think it makes sense. There aren't great reasons to
hog all the large chunks of memory for video decoding.

-Doug
Nicolas Dufresne April 23, 2024, 1:47 p.m. UTC | #3
Hey,

On Monday, April 22, 2024 at 12:25 -0700, Doug Anderson wrote:
> Hi,
> 
> On Mon, Apr 22, 2024 at 11:27 AM Nicolas Dufresne
> <nicolas.dufresne@collabora.com> wrote:
> > 
> > Hi,
> > 
> > On Monday, April 22, 2024 at 10:03 -0700, Douglas Anderson wrote:
> > > As talked about in commit 14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
> > > DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc"), it doesn't
> > > really make sense to try to allocate contiguous chunks of memory for
> > > video encoding/decoding. Let's switch the Mediatek vcodec driver to
> > > pass DMA_ATTR_ALLOC_SINGLE_PAGES and take some of the stress off the
> > > memory subsystem.
> > > 
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > > NOTE: I haven't personally done massive amounts of testing with this
> > > change, but I originally added the DMA_ATTR_ALLOC_SINGLE_PAGES flag
> > > specifically for the video encoding / decoding cases and I know it
> > > helped avoid memory problems in the past on other systems. Colleagues
> > > of mine have told me that with this change memory problems are harder
> > > to reproduce, so it seems like we should consider doing it.
> > 
> > One thing to improve in your patch submission is to avoid abstracting the
> > problems. Patch review and pulling are based on a technical rationale and very
> > rarely on the trust that it helps someone somewhere in some unknown context.
> > What kind of memory issues are you facing? What is the technical advantage of
> > using DMA_ATTR_ALLOC_SINGLE_PAGES over the current approach that helps fix
> > the issue? I do expect this to be documented in the commit message itself.
> 
> Right. The problem here is that I'm not _directly_ facing any problems
> here and I also haven't done massive amounts of analysis of the
> Mediatek video codec. I know that some of my colleagues have run into
> issues on Mediatek devices where the system starts getting
> unresponsive when lots of videos are decoded in parallel. That
> reminded me of the old problem I debugged in 2015 on Rockchip
> platforms and is talked about a bunch in the referenced commit
> 14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
> DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc") so I wrote up
> this patch. The referenced commit contains quite a bit of details
> about the problems faced back in 2015.
> 
> When I asked, my colleagues said that my patch seemed to help, though
> it was more of a qualitative statement than a quantitative one.
> 
> I wasn't 100% sure if it was worth sending the patch up at this point,
> but logically, I think it makes sense. There aren't great reasons to
> hog all the large chunks of memory for video decoding.

Ok, I slowly started retracing this 2016 effort (which I now understand you
were deeply involved in). It's pretty clear this hint is only used for codecs.
One thing the explanation seems to be missing (or that I missed) is that all
the enabled drivers seem to come with a dedicated MMU (dedicated TLB). But this
argument seems void if it is not combined with DMA_ATTR_NO_KERNEL_MAPPING to
avoid using the main MMU TLB space. There are currently three drivers that use
this hint (S5P_MFC, Hantro and RKVDEC), and only Hantro also sets the
DMA_ATTR_NO_KERNEL_MAPPING hint.

It would be nice to check if VCODEC needs kernel mapping on the RAW images, and
introduce that hint too while introducing DMA_ATTR_ALLOC_SINGLE_PAGES. But with
a now proper understanding, I also feel like this is wanted, though I have a
hard time thinking of a test that shows the performance gain, since it requires
a specific level of fragmentation in the system to make a difference.

Another aspect of the original description that is off is CODECs doing linear
access. While this is mostly true for reconstruction (writes), it is not true
for prediction (reads). What really matters is that the CODEC's tiles are most
of the time no bigger than a page, or less than a handful of pages.

Nicolas
Douglas Anderson April 23, 2024, 9:52 p.m. UTC | #4
Hi,

On Tue, Apr 23, 2024 at 6:47 AM Nicolas Dufresne
<nicolas.dufresne@collabora.com> wrote:
>
> Hey,
>
> On Monday, April 22, 2024 at 12:25 -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Mon, Apr 22, 2024 at 11:27 AM Nicolas Dufresne
> > <nicolas.dufresne@collabora.com> wrote:
> > >
> > > Hi,
> > >
> > > On Monday, April 22, 2024 at 10:03 -0700, Douglas Anderson wrote:
> > > > As talked about in commit 14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
> > > > DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc"), it doesn't
> > > > really make sense to try to allocate contiguous chunks of memory for
> > > > video encoding/decoding. Let's switch the Mediatek vcodec driver to
> > > > pass DMA_ATTR_ALLOC_SINGLE_PAGES and take some of the stress off the
> > > > memory subsystem.
> > > >
> > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > ---
> > > > NOTE: I haven't personally done massive amounts of testing with this
> > > > change, but I originally added the DMA_ATTR_ALLOC_SINGLE_PAGES flag
> > > > specifically for the video encoding / decoding cases and I know it
> > > > helped avoid memory problems in the past on other systems. Colleagues
> > > > of mine have told me that with this change memory problems are harder
> > > > to reproduce, so it seems like we should consider doing it.
> > >
> > > One thing to improve in your patch submission is to avoid abstracting the
> > > problems. Patch review and pulling are based on a technical rationale and very
> > > rarely on the trust that it helps someone somewhere in some unknown context.
> > > What kind of memory issues are you facing? What is the technical advantage of
> > > using DMA_ATTR_ALLOC_SINGLE_PAGES over the current approach that helps fix
> > > the issue? I do expect this to be documented in the commit message itself.
> >
> > Right. The problem here is that I'm not _directly_ facing any problems
> > here and I also haven't done massive amounts of analysis of the
> > Mediatek video codec. I know that some of my colleagues have run into
> > issues on Mediatek devices where the system starts getting
> > unresponsive when lots of videos are decoded in parallel. That
> > reminded me of the old problem I debugged in 2015 on Rockchip
> > platforms and is talked about a bunch in the referenced commit
> > 14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
> > DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc") so I wrote up
> > this patch. The referenced commit contains quite a bit of details
> > about the problems faced back in 2015.
> >
> > When I asked, my colleagues said that my patch seemed to help, though
> > it was more of a qualitative statement than a quantitative one.
> >
> > I wasn't 100% sure if it was worth sending the patch up at this point,
> > but logically, I think it makes sense. There aren't great reasons to
> > hog all the large chunks of memory for video decoding.
>
> Ok, I slowly started retracing this 2016 effort (which I now understand you
> were deeply involved in). It's pretty clear this hint is only used for codecs.
> One thing the explanation seems to be missing (or that I missed) is that all
> the enabled drivers seem to come with a dedicated MMU (dedicated TLB). But this
> argument seems void if it is not combined with DMA_ATTR_NO_KERNEL_MAPPING to
> avoid using the main MMU TLB space. There are currently three drivers that use
> this hint (S5P_MFC, Hantro and RKVDEC), and only Hantro also sets the
> DMA_ATTR_NO_KERNEL_MAPPING hint.

Why would it be void if not combined with DMA_ATTR_NO_KERNEL_MAPPING?
I mean: sure, if we have a kernel mapping and the kernel is accessing
the memory through this mapping then it will take up space in the TLB
of the main processor. ...but that's just fine, isn't it?

...actually, unless I'm mistaken, the kernel today is always using 4K
pages anyway, even for contiguous chunks of memory (though folks are
working on this problem). That means that from a kernel mapping point
of view the TLB usage is the same whether you use big chunks of memory
or small chunks...

In any case, let's take a step back. So we're going to allocate a big
chunk of memory for video decoding / encoding, right? We're going to
map this memory (via IOMMU) in the space of the video
encoding/decoding device and probably in the space of the kernel. If
we allocate larger chunks of memory then we can (if we set things up
right) configure the page tables on the device side and (maybe) on the
kernel side to use fewer TLB entries.

In general, taking all the big chunks of memory in the system has
downsides. It makes other parts of the kernel that also need big
chunks of memory work harder.

So which is worse, eating up more TLB entries or eating up all the big
chunks of memory? It depends on your access patterns, the size of your
TLB, and your requirements. I forget the exact number, but each TLB
miss incurs something like 4 extra memory accesses. So if you only
need an extra TLB miss every 1024 accesses it's not a huge deal but if
you incur a TLB miss every few accesses then it can be a deal breaker.

In general: If you can successfully meet your performance goals while
using 4K chunks then I'd argue that's the best based on what I've
seen.


> It would be nice to check if VCODEC needs kernel mapping on the RAW images, and
> introduce that hint too while introducing DMA_ATTR_ALLOC_SINGLE_PAGES. But with
> a now proper understanding, I also feel like this is wanted, though I have a
> hard time thinking of a test that shows the performance gain, since it requires
> a specific level of fragmentation in the system to make a difference.
>
> Another aspect of the original description that is off is CODECs doing linear
> access. While this is mostly true for reconstruction (writes), it is not true
> for prediction (reads). What really matters is that the CODEC's tiles are most
> of the time no bigger than a page, or less than a handful of pages.

I haven't spent lots of time looking at video formats. I guess,
though, that almost everything is somewhat linear compared to trying
to do a 90 degree rotation which is copying image data from rows to
columns. 90-degree rotation is _super_ non-linear. I don't know the
exact history, but I could imagine trying to do rotation on a 512x512
8-bit image would look like this:

uint8 image[512][512];

image[0][0] = image[0][511];
image[0][1] = image[1][511];
image[0][2] = image[2][511];
...
image[1][0] = image[0][510];
image[1][1] = image[1][510];
...
image[511][511] = image[511][0];

(maybe you could optimize this by reading 32-bits at a time and doing
fancier math, but you get the picture)


Let's imagine you had a tiny TLB with only 4 entries in it. If you
could get 64K chunks then you could imagine that you could do the
whole 512x512 90-degree rotation without any TLB misses since the
whole "image" takes up 256K (64K * 4) of memory. If you had 4K chunks then
I think you'd get a TLB miss after every 8 accesses and that would
tank your performance.

(Hopefully I didn't mess the example above up too badly--I just wrote
it up off the cuff).

...so while I don't know tons about encoded video formats, I'd hope
that at least they wouldn't be accessing memory in such a terrible way
as 90-degree rotation. I'd also hope that, if they did, that the
hardware designers would design them with a TLB that was big enough
for the job at hand. Even if TLB misses are a little worse with 4K
pages, as long as they aren't _terrible_ with 4K pages and you can
still meet your FPS goals then they're the way to go.

-Doug
Fei Shao April 26, 2024, 10:20 a.m. UTC | #5
Hi Nicolas,

On Tue, Apr 23, 2024 at 2:52 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Apr 22, 2024 at 11:27 AM Nicolas Dufresne
> <nicolas.dufresne@collabora.com> wrote:
> >
> > Hi,
> >
> > > On Monday, April 22, 2024 at 10:03 -0700, Douglas Anderson wrote:
> > > As talked about in commit 14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
> > > DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc"), it doesn't
> > > really make sense to try to allocate contiguous chunks of memory for
> > > video encoding/decoding. Let's switch the Mediatek vcodec driver to
> > > pass DMA_ATTR_ALLOC_SINGLE_PAGES and take some of the stress off the
> > > memory subsystem.
> > >
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > > NOTE: I haven't personally done massive amounts of testing with this
> > > change, but I originally added the DMA_ATTR_ALLOC_SINGLE_PAGES flag
> > > specifically for the video encoding / decoding cases and I know it
> > > helped avoid memory problems in the past on other systems. Colleagues
> > > of mine have told me that with this change memory problems are harder
> > > to reproduce, so it seems like we should consider doing it.
> >
> > > One thing to improve in your patch submission is to avoid abstracting the
> > > problems. Patch review and pulling are based on a technical rationale and very
> > > rarely on the trust that it helps someone somewhere in some unknown context.
> > > What kind of memory issues are you facing? What is the technical advantage of
> > > using DMA_ATTR_ALLOC_SINGLE_PAGES over the current approach that helps fix
> > > the issue? I do expect this to be documented in the commit message itself.
>
> Right. The problem here is that I'm not _directly_ facing any problems
> here and I also haven't done massive amounts of analysis of the
> Mediatek video codec. I know that some of my colleagues have run into
> issues on Mediatek devices where the system starts getting
> unresponsive when lots of videos are decoded in parallel. That
> reminded me of the old problem I debugged in 2015 on Rockchip
> platforms and is talked about a bunch in the referenced commit
> 14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
> DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc") so I wrote up
> this patch. The referenced commit contains quite a bit of details
> about the problems faced back in 2015.
>
> When I asked, my colleagues said that my patch seemed to help, though
> it was more of a qualitative statement than a quantitative one.

The story behind this is that I'm looking into an issue on the MediaTek
MT8188 Chromebook, where in some scenarios the system may spawn 30+
video decoders concurrently (e.g. generating thumbnails for an excessive
number of video files), and such behavior can easily hang the
system if it has a smaller amount of memory (<4GB).

In addition to seeking mitigation in the user space software side,
we're also looking for ways to optimize how the video decoders use
memory, so Doug suggested this improvement.
My preliminary experiment showed that it has some positive impact -
the system doesn't freeze up completely with it and is still
responsive in the UART serial console. However, just like mentioned, I
didn’t have any rigorous numbers to support it.

To test the patch better, today I set up a local WebRTC demo to
simulate a video conference with 49 people where the mocked input
stream is captured from the device's own front camera.
With that, the original system easily hung in less than one minute
with less than 40MB available memory at the time; but with the change,
the system ran for several minutes and had an average of over 100MB
memory. It's not a huge improvement, but it's something.

I know this isn't the most scientific experiment, but I hope it’s a
good enough representation of one of the multi video decoder use
cases, and gives you some confidence that the patch is worth merging.

With the test above I think I can give this:
Tested-by: Fei Shao <fshao@chromium.org>

And, since this patch LGTM and I support it, here's my humble
Reviewed-by: Fei Shao <fshao@chromium.org>

Regards,
Fei

>
> I wasn't 100% sure if it was worth sending the patch up at this point,
> but logically, I think it makes sense. There aren't great reasons to
> hog all the large chunks of memory for video decoding.
>
> -Doug
Nicolas Dufresne May 1, 2024, 6:31 p.m. UTC | #6
On Friday, April 26, 2024 at 18:20 +0800, Fei Shao wrote:
> Hi Nicolas,
> 
> On Tue, Apr 23, 2024 at 2:52 PM Doug Anderson <dianders@chromium.org> wrote:
> > 
> > Hi,
> > 
> > On Mon, Apr 22, 2024 at 11:27 AM Nicolas Dufresne
> > <nicolas.dufresne@collabora.com> wrote:
> > > 
> > > Hi,
> > > 
> > > > On Monday, April 22, 2024 at 10:03 -0700, Douglas Anderson wrote:
> > > > As talked about in commit 14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
> > > > DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc"), it doesn't
> > > > really make sense to try to allocate contiguous chunks of memory for
> > > > video encoding/decoding. Let's switch the Mediatek vcodec driver to
> > > > pass DMA_ATTR_ALLOC_SINGLE_PAGES and take some of the stress off the
> > > > memory subsystem.
> > > > 
> > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > ---
> > > > NOTE: I haven't personally done massive amounts of testing with this
> > > > change, but I originally added the DMA_ATTR_ALLOC_SINGLE_PAGES flag
> > > > specifically for the video encoding / decoding cases and I know it
> > > > helped avoid memory problems in the past on other systems. Colleagues
> > > > of mine have told me that with this change memory problems are harder
> > > > to reproduce, so it seems like we should consider doing it.
> > > 
> > > > One thing to improve in your patch submission is to avoid abstracting the
> > > > problems. Patch review and pulling are based on a technical rationale and very
> > > > rarely on the trust that it helps someone somewhere in some unknown context.
> > > > What kind of memory issues are you facing? What is the technical advantage of
> > > > using DMA_ATTR_ALLOC_SINGLE_PAGES over the current approach that helps fix
> > > > the issue? I do expect this to be documented in the commit message itself.
> > 
> > Right. The problem here is that I'm not _directly_ facing any problems
> > here and I also haven't done massive amounts of analysis of the
> > Mediatek video codec. I know that some of my colleagues have run into
> > issues on Mediatek devices where the system starts getting
> > unresponsive when lots of videos are decoded in parallel. That
> > reminded me of the old problem I debugged in 2015 on Rockchip
> > platforms and is talked about a bunch in the referenced commit
> > 14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
> > DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc") so I wrote up
> > this patch. The referenced commit contains quite a bit of details
> > about the problems faced back in 2015.
> > 
> > When I asked, my colleagues said that my patch seemed to help, though
> > it was more of a qualitative statement than a quantitative one.
> 
> The story behind this is that I'm looking into an issue on the MediaTek
> MT8188 Chromebook, where in some scenarios the system may spawn 30+
> video decoders concurrently (e.g. generating thumbnails for an excessive
> number of video files), and such behavior can easily hang the
> system if it has a smaller amount of memory (<4GB).
> 
> In addition to seeking mitigation in the user space software side,
> we're also looking for ways to optimize how the video decoders use
> memory, so Doug suggested this improvement.
> My preliminary experiment showed that it has some positive impact -
> the system doesn't freeze up completely with it and is still
> responsive in the UART serial console. However, just like mentioned, I
> didn’t have any rigorous numbers to support it.
> 
> To test the patch better, today I set up a local WebRTC demo to
> simulate a video conference with 49 people where the mocked input
> stream is captured from the device's own front camera.
> With that, the original system easily hung in less than one minute
> with less than 40MB available memory at the time; but with the change,
> the system ran for several minutes and had an average of over 100MB
> memory. It's not a huge improvement, but it's something.
> 
> I know this isn't the most scientific experiment, but I hope it’s a
> good enough representation of one of the multi video decoder use
> cases, and gives you some confidence that the patch is worth merging.
> 
> With the test above I think I can give this:
> Tested-by: Fei Shao <fshao@chromium.org>
> 
> And, since this patch LGTM and I support it, here's my humble
> Reviewed-by: Fei Shao <fshao@chromium.org>

The arguments here and my own research have finished convincing me that we want
to do this (unless we had limited TLB space at the device level, or performance
metrics showing that bigger contiguous chunks help).

Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>

> 
> Regards,
> Fei
> 
> > 
> > I wasn't 100% sure if it was worth sending the patch up at this point,
> > but logically, I think it makes sense. There aren't great reasons to
> > hog all the large chunks of memory for video decoding.
> > 
> > -Doug
>

Patch

diff --git a/drivers/media/platform/mediatek/vcodec/common/mtk_vcodec_util.c b/drivers/media/platform/mediatek/vcodec/common/mtk_vcodec_util.c
index 9ce34a3b5ee6..3fb1d48c3e15 100644
--- a/drivers/media/platform/mediatek/vcodec/common/mtk_vcodec_util.c
+++ b/drivers/media/platform/mediatek/vcodec/common/mtk_vcodec_util.c
@@ -64,7 +64,8 @@  int mtk_vcodec_mem_alloc(void *priv, struct mtk_vcodec_mem *mem)
 		id = dec_ctx->id;
 	}
 
-	mem->va = dma_alloc_coherent(&plat_dev->dev, size, &mem->dma_addr, GFP_KERNEL);
+	mem->va = dma_alloc_attrs(&plat_dev->dev, size, &mem->dma_addr,
+				  GFP_KERNEL, DMA_ATTR_ALLOC_SINGLE_PAGES);
 	if (!mem->va) {
 		mtk_v4l2_err(plat_dev, "%s dma_alloc size=%ld failed!",
 			     dev_name(&plat_dev->dev), size);