Message ID | 1716564705-9929-1-git-send-email-quic_mojha@quicinc.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
Series | [v2] firmware: qcom_scm: Add a padded page to ensure DMA memory from lower 4GB | expand |
On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote: > For SCM protection, memory allocation should be physically contiguous, > 4K aligned, and non-cacheable to avoid XPU violations. This granularity > of protection applies from the secure world. Additionally, it's possible > that a 32-bit secure peripheral will access memory in SoCs like > sm8{4|5|6}50 for some remote processors. Therefore, memory allocation > needs to be done in the lower 4 GB range. To achieve this, Linux's CMA > pool can be used with dma_alloc APIs. > > However, dma_alloc APIs will fall back to the buddy pool if the requested > size is less than or equal to PAGE_SIZE. It's also possible that the remote > processor's metadata blob size is less than a PAGE_SIZE. Even though the > DMA APIs align the requested memory size to PAGE_SIZE, they can still fall > back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}` > is disabled. Does "fail" here mean that the buddy heap returns a failure - in some case where dma_alloc would have succeeded, or that it does give you a PAGE_SIZE allocation which doesn't meeting your requirements? From this I do find the behavior of dma_alloc unintuitive, do we know if there's a reason for the "equal to PAGE_SIZE" case you describe here? > > To address this issue, use an extra page as padding to ensure allocation > from the CMA region. Since this memory is temporary, it will be released > once the remote processor is up or in case of any failure. > Thanks for updating the commit message, this is good. Regards, Bjorn > Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> > --- > Changes in v2: > - Described the issue more clearly in commit text. > > drivers/firmware/qcom/qcom_scm.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c > index 520de9b5633a..0426972178a4 100644 > --- a/drivers/firmware/qcom/qcom_scm.c > +++ b/drivers/firmware/qcom/qcom_scm.c > @@ -538,6 +538,7 @@ static void qcom_scm_set_download_mode(bool enable) > int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size, > struct qcom_scm_pas_metadata *ctx) > { > + size_t page_aligned_size; > dma_addr_t mdata_phys; > void *mdata_buf; > int ret; > @@ -555,7 +556,8 @@ int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size, > * data blob, so make sure it's physically contiguous, 4K aligned and > * non-cachable to avoid XPU violations. > */ > - mdata_buf = dma_alloc_coherent(__scm->dev, size, &mdata_phys, > + page_aligned_size = PAGE_ALIGN(size + PAGE_SIZE); > + mdata_buf = dma_alloc_coherent(__scm->dev, page_aligned_size, &mdata_phys, > GFP_KERNEL); > if (!mdata_buf) { > dev_err(__scm->dev, "Allocation of metadata buffer failed.\n"); > @@ -580,11 +582,11 @@ int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size, > > out: > if (ret < 0 || !ctx) { > - dma_free_coherent(__scm->dev, size, mdata_buf, mdata_phys); > + dma_free_coherent(__scm->dev, page_aligned_size, mdata_buf, mdata_phys); > } else if (ctx) { > ctx->ptr = mdata_buf; > ctx->phys = mdata_phys; > - ctx->size = size; > + ctx->size = page_aligned_size; > } > > return ret ? : res.result[0]; > -- > 2.7.4 >
On 5/27/2024 2:16 AM, Bjorn Andersson wrote: > On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote: >> For SCM protection, memory allocation should be physically contiguous, >> 4K aligned, and non-cacheable to avoid XPU violations. This granularity >> of protection applies from the secure world. Additionally, it's possible >> that a 32-bit secure peripheral will access memory in SoCs like >> sm8{4|5|6}50 for some remote processors. Therefore, memory allocation >> needs to be done in the lower 4 GB range. To achieve this, Linux's CMA >> pool can be used with dma_alloc APIs. >> >> However, dma_alloc APIs will fall back to the buddy pool if the requested >> size is less than or equal to PAGE_SIZE. It's also possible that the remote >> processor's metadata blob size is less than a PAGE_SIZE. Even though the >> DMA APIs align the requested memory size to PAGE_SIZE, they can still fall >> back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}` >> is disabled. > > Does "fail" here mean that the buddy heap returns a failure - in some > case where dma_alloc would have succeeded, or that it does give you > a PAGE_SIZE allocation which doesn't meeting your requirements? Yes, buddy will also try to allocate memory and may not get PAGE_SIZE memory in lower 4GB(for 32bit capable device) if CONFIG_ZONE_{DMA|DMA32} is disabled. However, DMA memory would have successful such case if padding is added to size to cross > PAGE_SIZE. > > From this I do find the behavior of dma_alloc unintuitive, do we know if > there's a reason for the "equal to PAGE_SIZE" case you describe here? I am not a memory expert but the reason i can think of could be, <= PAGE_SIZE can anyway possible to be requested outside DMA coherent api's with kmalloc and friends api and that could be the reason it is falling back to buddy pool in DMA api. -Mukesh
On Wed, May 29, 2024 at 05:24:29PM GMT, Mukesh Ojha wrote: > > > On 5/27/2024 2:16 AM, Bjorn Andersson wrote: > > On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote: > > > For SCM protection, memory allocation should be physically contiguous, > > > 4K aligned, and non-cacheable to avoid XPU violations. This granularity > > > of protection applies from the secure world. Additionally, it's possible > > > that a 32-bit secure peripheral will access memory in SoCs like > > > sm8{4|5|6}50 for some remote processors. Therefore, memory allocation > > > needs to be done in the lower 4 GB range. To achieve this, Linux's CMA > > > pool can be used with dma_alloc APIs. > > > > > > However, dma_alloc APIs will fall back to the buddy pool if the requested > > > size is less than or equal to PAGE_SIZE. It's also possible that the remote > > > processor's metadata blob size is less than a PAGE_SIZE. Even though the > > > DMA APIs align the requested memory size to PAGE_SIZE, they can still fall > > > back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}` > > > is disabled. > > > > Does "fail" here mean that the buddy heap returns a failure - in some > > case where dma_alloc would have succeeded, or that it does give you > > a PAGE_SIZE allocation which doesn't meeting your requirements? > > Yes, buddy will also try to allocate memory and may not get PAGE_SIZE memory > in lower 4GB(for 32bit capable device) if CONFIG_ZONE_{DMA|DMA32} is > disabled. Is that -ENOMEM or does "not get" mean that the buddy fallback will provide an allocation above 4GB? Regards, Bjorn > However, DMA memory would have successful such case if > padding is added to size to cross > PAGE_SIZE. > > > > > From this I do find the behavior of dma_alloc unintuitive, do we know if > > there's a reason for the "equal to PAGE_SIZE" case you describe here? > > I am not a memory expert but the reason i can think of could be, <= > PAGE_SIZE can anyway possible to be requested outside DMA coherent api's > with kmalloc and friends api and that could be the reason it is falling > back to buddy pool in DMA api. > > -Mukesh
On Sun, Jun 23, 2024 at 07:25:23PM -0500, Bjorn Andersson wrote: > On Wed, May 29, 2024 at 05:24:29PM GMT, Mukesh Ojha wrote: > > > > > > On 5/27/2024 2:16 AM, Bjorn Andersson wrote: > > > On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote: > > > > For SCM protection, memory allocation should be physically contiguous, > > > > 4K aligned, and non-cacheable to avoid XPU violations. This granularity > > > > of protection applies from the secure world. Additionally, it's possible > > > > that a 32-bit secure peripheral will access memory in SoCs like > > > > sm8{4|5|6}50 for some remote processors. Therefore, memory allocation > > > > needs to be done in the lower 4 GB range. To achieve this, Linux's CMA > > > > pool can be used with dma_alloc APIs. > > > > > > > > However, dma_alloc APIs will fall back to the buddy pool if the requested > > > > size is less than or equal to PAGE_SIZE. It's also possible that the remote > > > > processor's metadata blob size is less than a PAGE_SIZE. Even though the > > > > DMA APIs align the requested memory size to PAGE_SIZE, they can still fall > > > > back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}` > > > > is disabled. > > > > > > Does "fail" here mean that the buddy heap returns a failure - in some > > > case where dma_alloc would have succeeded, or that it does give you > > > a PAGE_SIZE allocation which doesn't meeting your requirements? > > > > Yes, buddy will also try to allocate memory and may not get PAGE_SIZE memory > > in lower 4GB(for 32bit capable device) if CONFIG_ZONE_{DMA|DMA32} is > > disabled. > > Is that -ENOMEM or does "not get" mean that the buddy fallback will > provide an allocation above 4GB? dma_alloc_coherent() returns NULL in that situation. https://elixir.bootlin.com/linux/v6.10-rc5/source/kernel/dma/direct.c#L142 -Mukesh > > Regards, > Bjorn > > > However, DMA memory would have successful such case if > > padding is added to size to cross > PAGE_SIZE. > > > > > > > > From this I do find the behavior of dma_alloc unintuitive, do we know if > > > there's a reason for the "equal to PAGE_SIZE" case you describe here? > > > > I am not a memory expert but the reason i can think of could be, <= > > PAGE_SIZE can anyway possible to be requested outside DMA coherent api's > > with kmalloc and friends api and that could be the reason it is falling > > back to buddy pool in DMA api. > > > > -Mukesh
On 6/24/2024 8:34 PM, Mukesh Ojha wrote: > On Sun, Jun 23, 2024 at 07:25:23PM -0500, Bjorn Andersson wrote: >> On Wed, May 29, 2024 at 05:24:29PM GMT, Mukesh Ojha wrote: >>> >>> >>> On 5/27/2024 2:16 AM, Bjorn Andersson wrote: >>>> On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote: >>>>> For SCM protection, memory allocation should be physically contiguous, >>>>> 4K aligned, and non-cacheable to avoid XPU violations. This granularity >>>>> of protection applies from the secure world. Additionally, it's possible >>>>> that a 32-bit secure peripheral will access memory in SoCs like >>>>> sm8{4|5|6}50 for some remote processors. Therefore, memory allocation >>>>> needs to be done in the lower 4 GB range. To achieve this, Linux's CMA >>>>> pool can be used with dma_alloc APIs. >>>>> >>>>> However, dma_alloc APIs will fall back to the buddy pool if the requested >>>>> size is less than or equal to PAGE_SIZE. It's also possible that the remote >>>>> processor's metadata blob size is less than a PAGE_SIZE. Even though the >>>>> DMA APIs align the requested memory size to PAGE_SIZE, they can still fall >>>>> back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}` >>>>> is disabled. >>>> >>>> Does "fail" here mean that the buddy heap returns a failure - in some >>>> case where dma_alloc would have succeeded, or that it does give you >>>> a PAGE_SIZE allocation which doesn't meeting your requirements? >>> >>> Yes, buddy will also try to allocate memory and may not get PAGE_SIZE memory >>> in lower 4GB(for 32bit capable device) if CONFIG_ZONE_{DMA|DMA32} is >>> disabled. >> >> Is that -ENOMEM or does "not get" mean that the buddy fallback will >> provide an allocation above 4GB? > > dma_alloc_coherent() returns NULL in that situation. > > https://elixir.bootlin.com/linux/v6.10-rc5/source/kernel/dma/direct.c#L142 > > -Mukesh > scm device is using DMA mask as 32b. With size <= PAGE_SIZE, call to dma_alloc_contiguous() at [1] will return NULL. With DMA mask as 32b and CONFIG_ZONE_{DMA|DMA32} disabled, the allocation can't be guaranteed within 32b with buddy. Hence will return page as NULL. Adding a padded page should allow allocation from CMA region. Thanks, Prakash [1] https://elixir.bootlin.com/linux/v6.10-rc5/source/kernel/dma/direct.c#L131 >> >> Regards, >> Bjorn >> >>> However, DMA memory would have successful such case if >>> padding is added to size to cross > PAGE_SIZE. >>> >>>> >>>> From this I do find the behavior of dma_alloc unintuitive, do we know if >>>> there's a reason for the "equal to PAGE_SIZE" case you describe here? >>> >>> I am not a memory expert but the reason i can think of could be, <= >>> PAGE_SIZE can anyway possible to be requested outside DMA coherent api's >>> with kmalloc and friends api and that could be the reason it is falling >>> back to buddy pool in DMA api. >>> >>> -Mukesh > >
On Mon, Jul 08, 2024 at 03:33:57PM GMT, Prakash Gupta wrote: > > > On 6/24/2024 8:34 PM, Mukesh Ojha wrote: > > On Sun, Jun 23, 2024 at 07:25:23PM -0500, Bjorn Andersson wrote: > >> On Wed, May 29, 2024 at 05:24:29PM GMT, Mukesh Ojha wrote: > >>> > >>> > >>> On 5/27/2024 2:16 AM, Bjorn Andersson wrote: > >>>> On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote: > >>>>> For SCM protection, memory allocation should be physically contiguous, > >>>>> 4K aligned, and non-cacheable to avoid XPU violations. This granularity > >>>>> of protection applies from the secure world. Additionally, it's possible > >>>>> that a 32-bit secure peripheral will access memory in SoCs like > >>>>> sm8{4|5|6}50 for some remote processors. Therefore, memory allocation > >>>>> needs to be done in the lower 4 GB range. To achieve this, Linux's CMA > >>>>> pool can be used with dma_alloc APIs. > >>>>> > >>>>> However, dma_alloc APIs will fall back to the buddy pool if the requested > >>>>> size is less than or equal to PAGE_SIZE. It's also possible that the remote > >>>>> processor's metadata blob size is less than a PAGE_SIZE. Even though the > >>>>> DMA APIs align the requested memory size to PAGE_SIZE, they can still fall > >>>>> back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}` > >>>>> is disabled. > >>>> > >>>> Does "fail" here mean that the buddy heap returns a failure - in some > >>>> case where dma_alloc would have succeeded, or that it does give you > >>>> a PAGE_SIZE allocation which doesn't meeting your requirements? > >>> > >>> Yes, buddy will also try to allocate memory and may not get PAGE_SIZE memory > >>> in lower 4GB(for 32bit capable device) if CONFIG_ZONE_{DMA|DMA32} is > >>> disabled. > >> > >> Is that -ENOMEM or does "not get" mean that the buddy fallback will > >> provide an allocation above 4GB? > > > > dma_alloc_coherent() returns NULL in that situation. > > > > https://elixir.bootlin.com/linux/v6.10-rc5/source/kernel/dma/direct.c#L142 > > > > -Mukesh > > > > scm device is using DMA mask as 32b. With size <= PAGE_SIZE, call to > dma_alloc_contiguous() at [1] will return NULL. > > With DMA mask as 32b and CONFIG_ZONE_{DMA|DMA32} disabled, the allocation > can't be guaranteed within 32b with buddy. Hence will return page as NULL. > > Adding a padded page should allow allocation from CMA region. > Thanks for the explanation, Prakash and Mukesh. It sounds wrong to me that one has to trick the DMA API this way. I assume this comes from the check [a], could we perhaps propose modifying this? If nothing else it will give us the DMA maintainers' input regarding this being the appropriate workaround. [a] https://elixir.bootlin.com/linux/v6.10-rc5/source/kernel/dma/contiguous.c#L362 Regards, Bjorn > Thanks, > Prakash > > [1] https://elixir.bootlin.com/linux/v6.10-rc5/source/kernel/dma/direct.c#L131 > >> > >> Regards, > >> Bjorn > >> > >>> However, DMA memory would have successful such case if > >>> padding is added to size to cross > PAGE_SIZE. > >>> > >>>> > >>>> From this I do find the behavior of dma_alloc unintuitive, do we know if > >>>> there's a reason for the "equal to PAGE_SIZE" case you describe here? > >>> > >>> I am not a memory expert but the reason i can think of could be, <= > >>> PAGE_SIZE can anyway possible to be requested outside DMA coherent api's > >>> with kmalloc and friends api and that could be the reason it is falling > >>> back to buddy pool in DMA api. > >>> > >>> -Mukesh > > > > >
diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c index 520de9b5633a..0426972178a4 100644 --- a/drivers/firmware/qcom/qcom_scm.c +++ b/drivers/firmware/qcom/qcom_scm.c @@ -538,6 +538,7 @@ static void qcom_scm_set_download_mode(bool enable) int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size, struct qcom_scm_pas_metadata *ctx) { + size_t page_aligned_size; dma_addr_t mdata_phys; void *mdata_buf; int ret; @@ -555,7 +556,8 @@ int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size, * data blob, so make sure it's physically contiguous, 4K aligned and * non-cachable to avoid XPU violations. */ - mdata_buf = dma_alloc_coherent(__scm->dev, size, &mdata_phys, + page_aligned_size = PAGE_ALIGN(size + PAGE_SIZE); + mdata_buf = dma_alloc_coherent(__scm->dev, page_aligned_size, &mdata_phys, GFP_KERNEL); if (!mdata_buf) { dev_err(__scm->dev, "Allocation of metadata buffer failed.\n"); @@ -580,11 +582,11 @@ int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size, out: if (ret < 0 || !ctx) { - dma_free_coherent(__scm->dev, size, mdata_buf, mdata_phys); + dma_free_coherent(__scm->dev, page_aligned_size, mdata_buf, mdata_phys); } else if (ctx) { ctx->ptr = mdata_buf; ctx->phys = mdata_phys; - ctx->size = size; + ctx->size = page_aligned_size; } return ret ? : res.result[0];
For SCM protection, memory allocation should be physically contiguous, 4K aligned, and non-cacheable to avoid XPU violations. This granularity of protection applies from the secure world. Additionally, it's possible that a 32-bit secure peripheral will access memory in SoCs like sm8{4|5|6}50 for some remote processors. Therefore, memory allocation needs to be done in the lower 4 GB range. To achieve this, Linux's CMA pool can be used with dma_alloc APIs. However, dma_alloc APIs will fall back to the buddy pool if the requested size is less than or equal to PAGE_SIZE. It's also possible that the remote processor's metadata blob size is less than a PAGE_SIZE. Even though the DMA APIs align the requested memory size to PAGE_SIZE, they can still fall back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}` is disabled. To address this issue, use an extra page as padding to ensure allocation from the CMA region. Since this memory is temporary, it will be released once the remote processor is up or in case of any failure. Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> --- Changes in v2: - Described the issue more clearly in commit text. drivers/firmware/qcom/qcom_scm.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)