diff mbox series

[v2] firmware: qcom_scm: Add a padded page to ensure DMA memory from lower 4GB

Message ID 1716564705-9929-1-git-send-email-quic_mojha@quicinc.com (mailing list archive)
State Changes Requested, archived
Headers show
Series [v2] firmware: qcom_scm: Add a padded page to ensure DMA memory from lower 4GB | expand

Commit Message

Mukesh Ojha May 24, 2024, 3:31 p.m. UTC
For SCM protection, memory allocation should be physically contiguous,
4K aligned, and non-cacheable to avoid XPU violations. This granularity
of protection applies from the secure world. Additionally, it's possible
that a 32-bit secure peripheral will access memory in SoCs like
sm8{4|5|6}50 for some remote processors. Therefore, memory allocation
needs to be done in the lower 4 GB range. To achieve this, Linux's CMA
pool can be used with dma_alloc APIs.

However, dma_alloc APIs will fall back to the buddy pool if the requested
size is less than or equal to PAGE_SIZE. It's also possible that the remote
processor's metadata blob size is less than a PAGE_SIZE. Even though the
DMA APIs align the requested memory size to PAGE_SIZE, they can still fall
back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}`
is disabled.

To address this issue, use an extra page as padding to ensure allocation
from the CMA region. Since this memory is temporary, it will be released
once the remote processor is up or in case of any failure.

Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
---
Changes in v2:
 - Described the issue more clearly in commit text.

 drivers/firmware/qcom/qcom_scm.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Bjorn Andersson May 26, 2024, 8:46 p.m. UTC | #1
On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote:
> For SCM protection, memory allocation should be physically contiguous,
> 4K aligned, and non-cacheable to avoid XPU violations. This granularity
> of protection applies from the secure world. Additionally, it's possible
> that a 32-bit secure peripheral will access memory in SoCs like
> sm8{4|5|6}50 for some remote processors. Therefore, memory allocation
> needs to be done in the lower 4 GB range. To achieve this, Linux's CMA
> pool can be used with dma_alloc APIs.
> 
> However, dma_alloc APIs will fall back to the buddy pool if the requested
> size is less than or equal to PAGE_SIZE. It's also possible that the remote
> processor's metadata blob size is less than a PAGE_SIZE. Even though the
> DMA APIs align the requested memory size to PAGE_SIZE, they can still fall
> back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}`
> is disabled.

Does "fail" here mean that the buddy heap returns a failure - in some
case where dma_alloc would have succeeded, or that it does give you
a PAGE_SIZE allocation which doesn't meeting your requirements?

From this I do find the behavior of dma_alloc unintuitive, do we know if
there's a reason for the "equal to PAGE_SIZE" case you describe here?

> 
> To address this issue, use an extra page as padding to ensure allocation
> from the CMA region. Since this memory is temporary, it will be released
> once the remote processor is up or in case of any failure.
> 

Thanks for updating the commit message, this is good.

Regards,
Bjorn

> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
> ---
> Changes in v2:
>  - Described the issue more clearly in commit text.
> 
>  drivers/firmware/qcom/qcom_scm.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
> index 520de9b5633a..0426972178a4 100644
> --- a/drivers/firmware/qcom/qcom_scm.c
> +++ b/drivers/firmware/qcom/qcom_scm.c
> @@ -538,6 +538,7 @@ static void qcom_scm_set_download_mode(bool enable)
>  int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
>  			    struct qcom_scm_pas_metadata *ctx)
>  {
> +	size_t page_aligned_size;
>  	dma_addr_t mdata_phys;
>  	void *mdata_buf;
>  	int ret;
> @@ -555,7 +556,8 @@ int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
>  	 * data blob, so make sure it's physically contiguous, 4K aligned and
>  	 * non-cachable to avoid XPU violations.
>  	 */
> -	mdata_buf = dma_alloc_coherent(__scm->dev, size, &mdata_phys,
> +	page_aligned_size = PAGE_ALIGN(size + PAGE_SIZE);
> +	mdata_buf = dma_alloc_coherent(__scm->dev, page_aligned_size, &mdata_phys,
>  				       GFP_KERNEL);
>  	if (!mdata_buf) {
>  		dev_err(__scm->dev, "Allocation of metadata buffer failed.\n");
> @@ -580,11 +582,11 @@ int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
>  
>  out:
>  	if (ret < 0 || !ctx) {
> -		dma_free_coherent(__scm->dev, size, mdata_buf, mdata_phys);
> +		dma_free_coherent(__scm->dev, page_aligned_size, mdata_buf, mdata_phys);
>  	} else if (ctx) {
>  		ctx->ptr = mdata_buf;
>  		ctx->phys = mdata_phys;
> -		ctx->size = size;
> +		ctx->size = page_aligned_size;
>  	}
>  
>  	return ret ? : res.result[0];
> -- 
> 2.7.4
>
Mukesh Ojha May 29, 2024, 11:54 a.m. UTC | #2
On 5/27/2024 2:16 AM, Bjorn Andersson wrote:
> On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote:
>> For SCM protection, memory allocation should be physically contiguous,
>> 4K aligned, and non-cacheable to avoid XPU violations. This granularity
>> of protection applies from the secure world. Additionally, it's possible
>> that a 32-bit secure peripheral will access memory in SoCs like
>> sm8{4|5|6}50 for some remote processors. Therefore, memory allocation
>> needs to be done in the lower 4 GB range. To achieve this, Linux's CMA
>> pool can be used with dma_alloc APIs.
>>
>> However, dma_alloc APIs will fall back to the buddy pool if the requested
>> size is less than or equal to PAGE_SIZE. It's also possible that the remote
>> processor's metadata blob size is less than a PAGE_SIZE. Even though the
>> DMA APIs align the requested memory size to PAGE_SIZE, they can still fall
>> back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}`
>> is disabled.
> 
> Does "fail" here mean that the buddy heap returns a failure - in some
> case where dma_alloc would have succeeded, or that it does give you
> a PAGE_SIZE allocation which doesn't meeting your requirements?

Yes, buddy will also try to allocate memory and may not get PAGE_SIZE 
memory in lower 4GB(for 32bit capable device) if CONFIG_ZONE_{DMA|DMA32} 
is disabled. However, DMA memory would have successful such case if
padding is added to size to cross > PAGE_SIZE.

> 
>  From this I do find the behavior of dma_alloc unintuitive, do we know if
> there's a reason for the "equal to PAGE_SIZE" case you describe here?

I am not a memory expert but the reason i can think of could be, <= 
PAGE_SIZE can anyway possible to be requested outside DMA coherent api's
with kmalloc and friends api and that could be the reason it is falling
back to buddy pool in DMA api.

-Mukesh
Bjorn Andersson June 24, 2024, 12:25 a.m. UTC | #3
On Wed, May 29, 2024 at 05:24:29PM GMT, Mukesh Ojha wrote:
> 
> 
> On 5/27/2024 2:16 AM, Bjorn Andersson wrote:
> > On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote:
> > > For SCM protection, memory allocation should be physically contiguous,
> > > 4K aligned, and non-cacheable to avoid XPU violations. This granularity
> > > of protection applies from the secure world. Additionally, it's possible
> > > that a 32-bit secure peripheral will access memory in SoCs like
> > > sm8{4|5|6}50 for some remote processors. Therefore, memory allocation
> > > needs to be done in the lower 4 GB range. To achieve this, Linux's CMA
> > > pool can be used with dma_alloc APIs.
> > > 
> > > However, dma_alloc APIs will fall back to the buddy pool if the requested
> > > size is less than or equal to PAGE_SIZE. It's also possible that the remote
> > > processor's metadata blob size is less than a PAGE_SIZE. Even though the
> > > DMA APIs align the requested memory size to PAGE_SIZE, they can still fall
> > > back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}`
> > > is disabled.
> > 
> > Does "fail" here mean that the buddy heap returns a failure - in some
> > case where dma_alloc would have succeeded, or that it does give you
> > a PAGE_SIZE allocation which doesn't meeting your requirements?
> 
> Yes, buddy will also try to allocate memory and may not get PAGE_SIZE memory
> in lower 4GB(for 32bit capable device) if CONFIG_ZONE_{DMA|DMA32} is
> disabled.

Is that -ENOMEM or does "not get" mean that the buddy fallback will
provide an allocation above 4GB?

Regards,
Bjorn

> However, DMA memory would have successful such case if
> padding is added to size to cross > PAGE_SIZE.
> 
> > 
> >  From this I do find the behavior of dma_alloc unintuitive, do we know if
> > there's a reason for the "equal to PAGE_SIZE" case you describe here?
> 
> I am not a memory expert but the reason i can think of could be, <=
> PAGE_SIZE can anyway possible to be requested outside DMA coherent api's
> with kmalloc and friends api and that could be the reason it is falling
> back to buddy pool in DMA api.
> 
> -Mukesh
Mukesh Ojha June 24, 2024, 3:04 p.m. UTC | #4
On Sun, Jun 23, 2024 at 07:25:23PM -0500, Bjorn Andersson wrote:
> On Wed, May 29, 2024 at 05:24:29PM GMT, Mukesh Ojha wrote:
> > 
> > 
> > On 5/27/2024 2:16 AM, Bjorn Andersson wrote:
> > > On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote:
> > > > For SCM protection, memory allocation should be physically contiguous,
> > > > 4K aligned, and non-cacheable to avoid XPU violations. This granularity
> > > > of protection applies from the secure world. Additionally, it's possible
> > > > that a 32-bit secure peripheral will access memory in SoCs like
> > > > sm8{4|5|6}50 for some remote processors. Therefore, memory allocation
> > > > needs to be done in the lower 4 GB range. To achieve this, Linux's CMA
> > > > pool can be used with dma_alloc APIs.
> > > > 
> > > > However, dma_alloc APIs will fall back to the buddy pool if the requested
> > > > size is less than or equal to PAGE_SIZE. It's also possible that the remote
> > > > processor's metadata blob size is less than a PAGE_SIZE. Even though the
> > > > DMA APIs align the requested memory size to PAGE_SIZE, they can still fall
> > > > back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}`
> > > > is disabled.
> > > 
> > > Does "fail" here mean that the buddy heap returns a failure - in some
> > > case where dma_alloc would have succeeded, or that it does give you
> > > a PAGE_SIZE allocation which doesn't meeting your requirements?
> > 
> > Yes, buddy will also try to allocate memory and may not get PAGE_SIZE memory
> > in lower 4GB(for 32bit capable device) if CONFIG_ZONE_{DMA|DMA32} is
> > disabled.
> 
> Is that -ENOMEM or does "not get" mean that the buddy fallback will
> provide an allocation above 4GB?

dma_alloc_coherent() returns NULL in that situation.

https://elixir.bootlin.com/linux/v6.10-rc5/source/kernel/dma/direct.c#L142

-Mukesh

> 
> Regards,
> Bjorn
> 
> > However, DMA memory would have successful such case if
> > padding is added to size to cross > PAGE_SIZE.
> > 
> > > 
> > >  From this I do find the behavior of dma_alloc unintuitive, do we know if
> > > there's a reason for the "equal to PAGE_SIZE" case you describe here?
> > 
> > I am not a memory expert but the reason i can think of could be, <=
> > PAGE_SIZE can anyway possible to be requested outside DMA coherent api's
> > with kmalloc and friends api and that could be the reason it is falling
> > back to buddy pool in DMA api.
> > 
> > -Mukesh
Prakash Gupta July 8, 2024, 10:03 a.m. UTC | #5
On 6/24/2024 8:34 PM, Mukesh Ojha wrote:
> On Sun, Jun 23, 2024 at 07:25:23PM -0500, Bjorn Andersson wrote:
>> On Wed, May 29, 2024 at 05:24:29PM GMT, Mukesh Ojha wrote:
>>>
>>>
>>> On 5/27/2024 2:16 AM, Bjorn Andersson wrote:
>>>> On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote:
>>>>> For SCM protection, memory allocation should be physically contiguous,
>>>>> 4K aligned, and non-cacheable to avoid XPU violations. This granularity
>>>>> of protection applies from the secure world. Additionally, it's possible
>>>>> that a 32-bit secure peripheral will access memory in SoCs like
>>>>> sm8{4|5|6}50 for some remote processors. Therefore, memory allocation
>>>>> needs to be done in the lower 4 GB range. To achieve this, Linux's CMA
>>>>> pool can be used with dma_alloc APIs.
>>>>>
>>>>> However, dma_alloc APIs will fall back to the buddy pool if the requested
>>>>> size is less than or equal to PAGE_SIZE. It's also possible that the remote
>>>>> processor's metadata blob size is less than a PAGE_SIZE. Even though the
>>>>> DMA APIs align the requested memory size to PAGE_SIZE, they can still fall
>>>>> back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}`
>>>>> is disabled.
>>>>
>>>> Does "fail" here mean that the buddy heap returns a failure - in some
>>>> case where dma_alloc would have succeeded, or that it does give you
>>>> a PAGE_SIZE allocation which doesn't meeting your requirements?
>>>
>>> Yes, buddy will also try to allocate memory and may not get PAGE_SIZE memory
>>> in lower 4GB(for 32bit capable device) if CONFIG_ZONE_{DMA|DMA32} is
>>> disabled.
>>
>> Is that -ENOMEM or does "not get" mean that the buddy fallback will
>> provide an allocation above 4GB?
> 
> dma_alloc_coherent() returns NULL in that situation.
> 
> https://elixir.bootlin.com/linux/v6.10-rc5/source/kernel/dma/direct.c#L142
> 
> -Mukesh
> 

scm device is using DMA mask as 32b. With size <= PAGE_SIZE, call to 
dma_alloc_contiguous() at [1] will return NULL.

With DMA mask as 32b and CONFIG_ZONE_{DMA|DMA32} disabled, the allocation 
can't be guaranteed within 32b with buddy. Hence will return page as NULL.

Adding a padded page should allow allocation from CMA region.

Thanks,
Prakash

[1] https://elixir.bootlin.com/linux/v6.10-rc5/source/kernel/dma/direct.c#L131
>>
>> Regards,
>> Bjorn
>>
>>> However, DMA memory would have successful such case if
>>> padding is added to size to cross > PAGE_SIZE.
>>>
>>>>
>>>>  From this I do find the behavior of dma_alloc unintuitive, do we know if
>>>> there's a reason for the "equal to PAGE_SIZE" case you describe here?
>>>
>>> I am not a memory expert but the reason i can think of could be, <=
>>> PAGE_SIZE can anyway possible to be requested outside DMA coherent api's
>>> with kmalloc and friends api and that could be the reason it is falling
>>> back to buddy pool in DMA api.
>>>
>>> -Mukesh
> 

>
Bjorn Andersson July 9, 2024, 6:54 p.m. UTC | #6
On Mon, Jul 08, 2024 at 03:33:57PM GMT, Prakash Gupta wrote:
> 
> 
> On 6/24/2024 8:34 PM, Mukesh Ojha wrote:
> > On Sun, Jun 23, 2024 at 07:25:23PM -0500, Bjorn Andersson wrote:
> >> On Wed, May 29, 2024 at 05:24:29PM GMT, Mukesh Ojha wrote:
> >>>
> >>>
> >>> On 5/27/2024 2:16 AM, Bjorn Andersson wrote:
> >>>> On Fri, May 24, 2024 at 09:01:45PM GMT, Mukesh Ojha wrote:
> >>>>> For SCM protection, memory allocation should be physically contiguous,
> >>>>> 4K aligned, and non-cacheable to avoid XPU violations. This granularity
> >>>>> of protection applies from the secure world. Additionally, it's possible
> >>>>> that a 32-bit secure peripheral will access memory in SoCs like
> >>>>> sm8{4|5|6}50 for some remote processors. Therefore, memory allocation
> >>>>> needs to be done in the lower 4 GB range. To achieve this, Linux's CMA
> >>>>> pool can be used with dma_alloc APIs.
> >>>>>
> >>>>> However, dma_alloc APIs will fall back to the buddy pool if the requested
> >>>>> size is less than or equal to PAGE_SIZE. It's also possible that the remote
> >>>>> processor's metadata blob size is less than a PAGE_SIZE. Even though the
> >>>>> DMA APIs align the requested memory size to PAGE_SIZE, they can still fall
> >>>>> back to the buddy allocator, which may fail if `CONFIG_ZONE_{DMA|DMA32}`
> >>>>> is disabled.
> >>>>
> >>>> Does "fail" here mean that the buddy heap returns a failure - in some
> >>>> case where dma_alloc would have succeeded, or that it does give you
> >>>> a PAGE_SIZE allocation which doesn't meeting your requirements?
> >>>
> >>> Yes, buddy will also try to allocate memory and may not get PAGE_SIZE memory
> >>> in lower 4GB(for 32bit capable device) if CONFIG_ZONE_{DMA|DMA32} is
> >>> disabled.
> >>
> >> Is that -ENOMEM or does "not get" mean that the buddy fallback will
> >> provide an allocation above 4GB?
> > 
> > dma_alloc_coherent() returns NULL in that situation.
> > 
> > https://elixir.bootlin.com/linux/v6.10-rc5/source/kernel/dma/direct.c#L142
> > 
> > -Mukesh
> > 
> 
> scm device is using DMA mask as 32b. With size <= PAGE_SIZE, call to 
> dma_alloc_contiguous() at [1] will return NULL.
> 
> With DMA mask as 32b and CONFIG_ZONE_{DMA|DMA32} disabled, the allocation 
> can't be guaranteed within 32b with buddy. Hence will return page as NULL.
> 
> Adding a padded page should allow allocation from CMA region.
> 

Thanks for the explanation, Prakash and Mukesh. It sounds wrong to me
that one has to trick the DMA API this way.

I assume this comes from the check [a], could we perhaps propose
modifying this? If nothing else it will give us the DMA maintainers'
input regarding this being the appropriate workaround.

[a] https://elixir.bootlin.com/linux/v6.10-rc5/source/kernel/dma/contiguous.c#L362

Regards,
Bjorn

> Thanks,
> Prakash
> 
> [1] https://elixir.bootlin.com/linux/v6.10-rc5/source/kernel/dma/direct.c#L131
> >>
> >> Regards,
> >> Bjorn
> >>
> >>> However, DMA memory would have successful such case if
> >>> padding is added to size to cross > PAGE_SIZE.
> >>>
> >>>>
> >>>>  From this I do find the behavior of dma_alloc unintuitive, do we know if
> >>>> there's a reason for the "equal to PAGE_SIZE" case you describe here?
> >>>
> >>> I am not a memory expert but the reason i can think of could be, <=
> >>> PAGE_SIZE can anyway possible to be requested outside DMA coherent api's
> >>> with kmalloc and friends api and that could be the reason it is falling
> >>> back to buddy pool in DMA api.
> >>>
> >>> -Mukesh
> > 
> 
> >
diff mbox series

Patch

diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
index 520de9b5633a..0426972178a4 100644
--- a/drivers/firmware/qcom/qcom_scm.c
+++ b/drivers/firmware/qcom/qcom_scm.c
@@ -538,6 +538,7 @@  static void qcom_scm_set_download_mode(bool enable)
 int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
 			    struct qcom_scm_pas_metadata *ctx)
 {
+	size_t page_aligned_size;
 	dma_addr_t mdata_phys;
 	void *mdata_buf;
 	int ret;
@@ -555,7 +556,8 @@  int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
 	 * data blob, so make sure it's physically contiguous, 4K aligned and
 	 * non-cachable to avoid XPU violations.
 	 */
-	mdata_buf = dma_alloc_coherent(__scm->dev, size, &mdata_phys,
+	page_aligned_size = PAGE_ALIGN(size + PAGE_SIZE);
+	mdata_buf = dma_alloc_coherent(__scm->dev, page_aligned_size, &mdata_phys,
 				       GFP_KERNEL);
 	if (!mdata_buf) {
 		dev_err(__scm->dev, "Allocation of metadata buffer failed.\n");
@@ -580,11 +582,11 @@  int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
 
 out:
 	if (ret < 0 || !ctx) {
-		dma_free_coherent(__scm->dev, size, mdata_buf, mdata_phys);
+		dma_free_coherent(__scm->dev, page_aligned_size, mdata_buf, mdata_phys);
 	} else if (ctx) {
 		ctx->ptr = mdata_buf;
 		ctx->phys = mdata_phys;
-		ctx->size = size;
+		ctx->size = page_aligned_size;
 	}
 
 	return ret ? : res.result[0];