diff mbox series

[v3,5/5] platform/x86/amd/pmf: Add PMF driver changes to make compatible with PMF-TA

Message ID 20241023063245.1404420-6-Shyam-sundar.S-k@amd.com (mailing list archive)
State Changes Requested, archived
Headers show
Series platform/x86/amd/pmf: Updates to AMD PMF driver | expand

Commit Message

Shyam Sundar S K Oct. 23, 2024, 6:32 a.m. UTC
The PMF driver will allocate shared buffer memory using the
tee_shm_alloc_kernel_buf(). This allocated memory is located in the
secure world and is used for communication with the PMF-TA.

The latest PMF-TA version introduces new structures with OEM debug
information and additional policy input conditions for evaluating the
policy binary. Consequently, the shared memory size must be increased to
ensure compatibility between the PMF driver and the updated PMF-TA.

Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
---
 drivers/platform/x86/amd/pmf/pmf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Mario Limonciello Oct. 23, 2024, 2:11 p.m. UTC | #1
On 10/23/2024 01:32, Shyam Sundar S K wrote:
> The PMF driver will allocate shared buffer memory using the
> tee_shm_alloc_kernel_buf(). This allocated memory is located in the
> secure world and is used for communication with the PMF-TA.
> 
> The latest PMF-TA version introduces new structures with OEM debug
> information and additional policy input conditions for evaluating the
> policy binary. Consequently, the shared memory size must be increased to
> ensure compatibility between the PMF driver and the updated PMF-TA.
> 
> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>

How does this present to a user?  From what you describe it seems to me 
like this means a new TA will fail on older kernel in some way.
Some ideas:

1) Should there be header version check on the TA and dynamically 
allocate the structure size based on the version of the F/W?

2) Or is there a command to the TA that can query the expected output size?

3) Or should the new TA filename be versioned, and the driver has a 
fallback policy?

Whatever the outcome is; I think it's best that if possible this change 
goes back to stable to try to minimize regressions to users as distros 
update linux-firmware.  For example Fedora updates this monthly, but 
also tracks stable kernels.

> ---
>   drivers/platform/x86/amd/pmf/pmf.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/platform/x86/amd/pmf/pmf.h b/drivers/platform/x86/amd/pmf/pmf.h
> index a79808fda1d8..18f12aad46a9 100644
> --- a/drivers/platform/x86/amd/pmf/pmf.h
> +++ b/drivers/platform/x86/amd/pmf/pmf.h
> @@ -106,7 +106,7 @@ struct cookie_header {
>   #define PMF_TA_IF_VERSION_MAJOR				1
>   #define TA_PMF_ACTION_MAX					32
>   #define TA_PMF_UNDO_MAX						8
> -#define TA_OUTPUT_RESERVED_MEM				906
> +#define TA_OUTPUT_RESERVED_MEM				922
>   #define MAX_OPERATION_PARAMS					4
>   
>   #define PMF_IF_V1		1
Shyam Sundar S K Oct. 23, 2024, 2:29 p.m. UTC | #2
On 10/23/2024 19:41, Mario Limonciello wrote:
> On 10/23/2024 01:32, Shyam Sundar S K wrote:
>> The PMF driver will allocate shared buffer memory using the
>> tee_shm_alloc_kernel_buf(). This allocated memory is located in the
>> secure world and is used for communication with the PMF-TA.
>>
>> The latest PMF-TA version introduces new structures with OEM debug
>> information and additional policy input conditions for evaluating the
>> policy binary. Consequently, the shared memory size must be
>> increased to
>> ensure compatibility between the PMF driver and the updated PMF-TA.
>>
>> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
> 
> How does this present to a user?  From what you describe it seems to
> me like this means a new TA will fail on older kernel in some way.

Newer TA will not fail on older systems. This change is just about the
increase in TA reserved memory that is presented as "shared memory",
as TA needs the additional memory for its own debug data structures.

From user standpoint, always be on latest FW, irrespective of the
platform. At this point in time, I don't see a need for FW versioning
name (in the future, if there is a need for having a limited support
to older platforms, we can carve out a logic to do versioning stuff).

> Some ideas:
> 
> 1) Should there be header version check on the TA and dynamically
> allocate the structure size based on the version of the F/W?
> 

This can be done, when the TA versioning upgrade happens, like from
1.3 to 1.4, apart from that there is no header stuff association.

> 2) Or is there a command to the TA that can query the expected output
> size?
> 

No, this is just the initial shared memory that the driver allocates
to pass the inputs and the commands to TA.

> 3) Or should the new TA filename be versioned, and the driver has a
> fallback policy?
> 
> Whatever the outcome is; I think it's best that if possible this
> change goes back to stable to try to minimize regressions to users as
> distros update linux-firmware.  For example Fedora updates this
> monthly, but also tracks stable kernels.
> 

Advisory to distros should be to pick the latest PMF TA (note that, I
have not still submitted to new TA FW).

Thanks,
Shyam

>> ---
>>   drivers/platform/x86/amd/pmf/pmf.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/platform/x86/amd/pmf/pmf.h
>> b/drivers/platform/x86/amd/pmf/pmf.h
>> index a79808fda1d8..18f12aad46a9 100644
>> --- a/drivers/platform/x86/amd/pmf/pmf.h
>> +++ b/drivers/platform/x86/amd/pmf/pmf.h
>> @@ -106,7 +106,7 @@ struct cookie_header {
>>   #define PMF_TA_IF_VERSION_MAJOR                1
>>   #define TA_PMF_ACTION_MAX                    32
>>   #define TA_PMF_UNDO_MAX                        8
>> -#define TA_OUTPUT_RESERVED_MEM                906
>> +#define TA_OUTPUT_RESERVED_MEM                922
>>   #define MAX_OPERATION_PARAMS                    4
>>     #define PMF_IF_V1        1
>
Mario Limonciello Oct. 23, 2024, 2:34 p.m. UTC | #3
On 10/23/2024 09:29, Shyam Sundar S K wrote:
> 
> 
> On 10/23/2024 19:41, Mario Limonciello wrote:
>> On 10/23/2024 01:32, Shyam Sundar S K wrote:
>>> The PMF driver will allocate shared buffer memory using the
>>> tee_shm_alloc_kernel_buf(). This allocated memory is located in the
>>> secure world and is used for communication with the PMF-TA.
>>>
>>> The latest PMF-TA version introduces new structures with OEM debug
>>> information and additional policy input conditions for evaluating the
>>> policy binary. Consequently, the shared memory size must be
>>> increased to
>>> ensure compatibility between the PMF driver and the updated PMF-TA.
>>>
>>> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
>>
>> How does this present to a user?  From what you describe it seems to
>> me like this means a new TA will fail on older kernel in some way.
> 
> Newer TA will not fail on older systems. This change is just about the
> increase in TA reserved memory that is presented as "shared memory",
> as TA needs the additional memory for its own debug data structures.

Thx for comments. But so if you use new TA with older kernel driver, 
what will happen?  Can TA do a buffer overrun because the presented 
shared memory was too small?

> 
>  From user standpoint, always be on latest FW, irrespective of the
> platform. At this point in time, I don't see a need for FW versioning
> name (in the future, if there is a need for having a limited support
> to older platforms, we can carve out a logic to do versioning stuff).

I wish we could enforce this, but In the Linux world there is an 
expectation that these two trains don't need to arrive at station at the 
same time.

> 
>> Some ideas:
>>
>> 1) Should there be header version check on the TA and dynamically
>> allocate the structure size based on the version of the F/W?
>>
> 
> This can be done, when the TA versioning upgrade happens, like from
> 1.3 to 1.4, apart from that there is no header stuff association.
> 
>> 2) Or is there a command to the TA that can query the expected output
>> size?
>>
> 
> No, this is just the initial shared memory that the driver allocates
> to pass the inputs and the commands to TA.
> 
>> 3) Or should the new TA filename be versioned, and the driver has a
>> fallback policy?
>>
>> Whatever the outcome is; I think it's best that if possible this
>> change goes back to stable to try to minimize regressions to users as
>> distros update linux-firmware.  For example Fedora updates this
>> monthly, but also tracks stable kernels.
>>
> 
> Advisory to distros should be to pick the latest PMF TA (note that, I
> have not still submitted to new TA FW).

Yeah we can advise distros to pick it up when upstreamed as long as 
there isn't tight dependency on this patch being present.

> 
> Thanks,
> Shyam
> 
>>> ---
>>>    drivers/platform/x86/amd/pmf/pmf.h | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/platform/x86/amd/pmf/pmf.h
>>> b/drivers/platform/x86/amd/pmf/pmf.h
>>> index a79808fda1d8..18f12aad46a9 100644
>>> --- a/drivers/platform/x86/amd/pmf/pmf.h
>>> +++ b/drivers/platform/x86/amd/pmf/pmf.h
>>> @@ -106,7 +106,7 @@ struct cookie_header {
>>>    #define PMF_TA_IF_VERSION_MAJOR                1
>>>    #define TA_PMF_ACTION_MAX                    32
>>>    #define TA_PMF_UNDO_MAX                        8
>>> -#define TA_OUTPUT_RESERVED_MEM                906
>>> +#define TA_OUTPUT_RESERVED_MEM                922
>>>    #define MAX_OPERATION_PARAMS                    4
>>>      #define PMF_IF_V1        1
>>
Shyam Sundar S K Oct. 23, 2024, 3:32 p.m. UTC | #4
On 10/23/2024 20:04, Mario Limonciello wrote:
> On 10/23/2024 09:29, Shyam Sundar S K wrote:
>>
>>
>> On 10/23/2024 19:41, Mario Limonciello wrote:
>>> On 10/23/2024 01:32, Shyam Sundar S K wrote:
>>>> The PMF driver will allocate shared buffer memory using the
>>>> tee_shm_alloc_kernel_buf(). This allocated memory is located in the
>>>> secure world and is used for communication with the PMF-TA.
>>>>
>>>> The latest PMF-TA version introduces new structures with OEM debug
>>>> information and additional policy input conditions for evaluating the
>>>> policy binary. Consequently, the shared memory size must be
>>>> increased to
>>>> ensure compatibility between the PMF driver and the updated PMF-TA.
>>>>
>>>> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
>>>
>>> How does this present to a user?  From what you describe it seems to
>>> me like this means a new TA will fail on older kernel in some way.
>>
>> Newer TA will not fail on older systems. This change is just about the
>> increase in TA reserved memory that is presented as "shared memory",
>> as TA needs the additional memory for its own debug data structures.
> 
> Thx for comments. But so if you use new TA with older kernel driver,
> what will happen?  Can TA do a buffer overrun because the presented
> shared memory was too small?
> 

New TA will fail on older kernel and hence this change will be
required for new TA to work.

>>
>>  From user standpoint, always be on latest FW, irrespective of the
>> platform. At this point in time, I don't see a need for FW versioning
>> name (in the future, if there is a need for having a limited support
>> to older platforms, we can carve out a logic to do versioning stuff).
> 
> I wish we could enforce this, but In the Linux world there is an
> expectation that these two trains don't need to arrive at station at
> the same time.
> 
>>
>>> Some ideas:
>>>
>>> 1) Should there be header version check on the TA and dynamically
>>> allocate the structure size based on the version of the F/W?
>>>
>>
>> This can be done, when the TA versioning upgrade happens, like from
>> 1.3 to 1.4, apart from that there is no header stuff association.
>>
>>> 2) Or is there a command to the TA that can query the expected output
>>> size?
>>>
>>
>> No, this is just the initial shared memory that the driver allocates
>> to pass the inputs and the commands to TA.
>>
>>> 3) Or should the new TA filename be versioned, and the driver has a
>>> fallback policy?
>>>
>>> Whatever the outcome is; I think it's best that if possible this
>>> change goes back to stable to try to minimize regressions to users as
>>> distros update linux-firmware.  For example Fedora updates this
>>> monthly, but also tracks stable kernels.
>>>
>>
>> Advisory to distros should be to pick the latest PMF TA (note that, I
>> have not still submitted to new TA FW).
> 
> Yeah we can advise distros to pick it up when upstreamed as long as
> there isn't tight dependency on this patch being present.
> 

That is the reason I am waiting for this change to land. Once that is
done, I will submit the new TA, you can send out a advisory to upgrade
the kernel or this change has to be back-ported to stable/oem kernels
for their enablement.

Makes sense?

Thanks,
Shyam

>>
>> Thanks,
>> Shyam
>>
>>>> ---
>>>>    drivers/platform/x86/amd/pmf/pmf.h | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/platform/x86/amd/pmf/pmf.h
>>>> b/drivers/platform/x86/amd/pmf/pmf.h
>>>> index a79808fda1d8..18f12aad46a9 100644
>>>> --- a/drivers/platform/x86/amd/pmf/pmf.h
>>>> +++ b/drivers/platform/x86/amd/pmf/pmf.h
>>>> @@ -106,7 +106,7 @@ struct cookie_header {
>>>>    #define PMF_TA_IF_VERSION_MAJOR                1
>>>>    #define TA_PMF_ACTION_MAX                    32
>>>>    #define TA_PMF_UNDO_MAX                        8
>>>> -#define TA_OUTPUT_RESERVED_MEM                906
>>>> +#define TA_OUTPUT_RESERVED_MEM                922
>>>>    #define MAX_OPERATION_PARAMS                    4
>>>>      #define PMF_IF_V1        1
>>>
>
Mario Limonciello Oct. 23, 2024, 3:40 p.m. UTC | #5
On 10/23/2024 10:32, Shyam Sundar S K wrote:
> 
> 
> On 10/23/2024 20:04, Mario Limonciello wrote:
>> On 10/23/2024 09:29, Shyam Sundar S K wrote:
>>>
>>>
>>> On 10/23/2024 19:41, Mario Limonciello wrote:
>>>> On 10/23/2024 01:32, Shyam Sundar S K wrote:
>>>>> The PMF driver will allocate shared buffer memory using the
>>>>> tee_shm_alloc_kernel_buf(). This allocated memory is located in the
>>>>> secure world and is used for communication with the PMF-TA.
>>>>>
>>>>> The latest PMF-TA version introduces new structures with OEM debug
>>>>> information and additional policy input conditions for evaluating the
>>>>> policy binary. Consequently, the shared memory size must be
>>>>> increased to
>>>>> ensure compatibility between the PMF driver and the updated PMF-TA.
>>>>>
>>>>> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
>>>>
>>>> How does this present to a user?  From what you describe it seems to
>>>> me like this means a new TA will fail on older kernel in some way.
>>>
>>> Newer TA will not fail on older systems. This change is just about the
>>> increase in TA reserved memory that is presented as "shared memory",
>>> as TA needs the additional memory for its own debug data structures.
>>
>> Thx for comments. But so if you use new TA with older kernel driver,
>> what will happen?  Can TA do a buffer overrun because the presented
>> shared memory was too small?
>>
> 
> New TA will fail on older kernel and hence this change will be
> required for new TA to work.

OK, that's what I was worried about.

> 
>>>
>>>   From user standpoint, always be on latest FW, irrespective of the
>>> platform. At this point in time, I don't see a need for FW versioning
>>> name (in the future, if there is a need for having a limited support
>>> to older platforms, we can carve out a logic to do versioning stuff).
>>
>> I wish we could enforce this, but In the Linux world there is an
>> expectation that these two trains don't need to arrive at station at
>> the same time.
>>
>>>
>>>> Some ideas:
>>>>
>>>> 1) Should there be header version check on the TA and dynamically
>>>> allocate the structure size based on the version of the F/W?
>>>>
>>>
>>> This can be done, when the TA versioning upgrade happens, like from
>>> 1.3 to 1.4, apart from that there is no header stuff association.
>>>
>>>> 2) Or is there a command to the TA that can query the expected output
>>>> size?
>>>>
>>>
>>> No, this is just the initial shared memory that the driver allocates
>>> to pass the inputs and the commands to TA.
>>>
>>>> 3) Or should the new TA filename be versioned, and the driver has a
>>>> fallback policy?
>>>>
>>>> Whatever the outcome is; I think it's best that if possible this
>>>> change goes back to stable to try to minimize regressions to users as
>>>> distros update linux-firmware.  For example Fedora updates this
>>>> monthly, but also tracks stable kernels.
>>>>
>>>
>>> Advisory to distros should be to pick the latest PMF TA (note that, I
>>> have not still submitted to new TA FW).
>>
>> Yeah we can advise distros to pick it up when upstreamed as long as
>> there isn't tight dependency on this patch being present.
>>
> 
> That is the reason I am waiting for this change to land. Once that is
> done, I will submit the new TA, you can send out a advisory to upgrade
> the kernel or this change has to be back-ported to stable/oem kernels
> for their enablement.
> 
> Makes sense?
> 

I think we need Hans' and Ilpo's comments here to decide what to do.

I will say that when we had this happen in amdgpu for a breaking reason 
there was a new firmware binary filename created/upstreamed for the 
breaking version (IIRC foo.bin -> foo_1.bin) and amdgpu had to have 
fallback code so it could be compatible with either binary.

* If user on older kernel took newer linux-firmware package they used 
older binary.
* If user on newer kernel took older linux-firmware package they used 
older binary.
* If user on newer kernel took newer linux-firmware package they used 
newer binary.

If the decision is this goes in "as is" it definitely needs to go back 
to stable kernels.
Shyam Sundar S K Oct. 23, 2024, 3:52 p.m. UTC | #6
On 10/23/2024 21:10, Mario Limonciello wrote:
> On 10/23/2024 10:32, Shyam Sundar S K wrote:
>>
>>
>> On 10/23/2024 20:04, Mario Limonciello wrote:
>>> On 10/23/2024 09:29, Shyam Sundar S K wrote:
>>>>
>>>>
>>>> On 10/23/2024 19:41, Mario Limonciello wrote:
>>>>> On 10/23/2024 01:32, Shyam Sundar S K wrote:
>>>>>> The PMF driver will allocate shared buffer memory using the
>>>>>> tee_shm_alloc_kernel_buf(). This allocated memory is located in the
>>>>>> secure world and is used for communication with the PMF-TA.
>>>>>>
>>>>>> The latest PMF-TA version introduces new structures with OEM debug
>>>>>> information and additional policy input conditions for
>>>>>> evaluating the
>>>>>> policy binary. Consequently, the shared memory size must be
>>>>>> increased to
>>>>>> ensure compatibility between the PMF driver and the updated PMF-TA.
>>>>>>
>>>>>> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
>>>>>
>>>>> How does this present to a user?  From what you describe it seems to
>>>>> me like this means a new TA will fail on older kernel in some way.
>>>>
>>>> Newer TA will not fail on older systems. This change is just about
>>>> the
>>>> increase in TA reserved memory that is presented as "shared memory",
>>>> as TA needs the additional memory for its own debug data structures.
>>>
>>> Thx for comments. But so if you use new TA with older kernel driver,
>>> what will happen?  Can TA do a buffer overrun because the presented
>>> shared memory was too small?
>>>
>>
>> New TA will fail on older kernel and hence this change will be
>> required for new TA to work.
> 
> OK, that's what I was worried about.
> 
>>
>>>>
>>>>   From user standpoint, always be on latest FW, irrespective of the
>>>> platform. At this point in time, I don't see a need for FW versioning
>>>> name (in the future, if there is a need for having a limited support
>>>> to older platforms, we can carve out a logic to do versioning stuff).
>>>
>>> I wish we could enforce this, but In the Linux world there is an
>>> expectation that these two trains don't need to arrive at station at
>>> the same time.
>>>
>>>>
>>>>> Some ideas:
>>>>>
>>>>> 1) Should there be header version check on the TA and dynamically
>>>>> allocate the structure size based on the version of the F/W?
>>>>>
>>>>
>>>> This can be done, when the TA versioning upgrade happens, like from
>>>> 1.3 to 1.4, apart from that there is no header stuff association.
>>>>
>>>>> 2) Or is there a command to the TA that can query the expected
>>>>> output
>>>>> size?
>>>>>
>>>>
>>>> No, this is just the initial shared memory that the driver allocates
>>>> to pass the inputs and the commands to TA.
>>>>
>>>>> 3) Or should the new TA filename be versioned, and the driver has a
>>>>> fallback policy?
>>>>>
>>>>> Whatever the outcome is; I think it's best that if possible this
>>>>> change goes back to stable to try to minimize regressions to
>>>>> users as
>>>>> distros update linux-firmware.  For example Fedora updates this
>>>>> monthly, but also tracks stable kernels.
>>>>>
>>>>
>>>> Advisory to distros should be to pick the latest PMF TA (note that, I
>>>> have not still submitted to new TA FW).
>>>
>>> Yeah we can advise distros to pick it up when upstreamed as long as
>>> there isn't tight dependency on this patch being present.
>>>
>>
>> That is the reason I am waiting for this change to land. Once that is
>> done, I will submit the new TA, you can send out a advisory to upgrade
>> the kernel or this change has to be back-ported to stable/oem kernels
>> for their enablement.
>>
>> Makes sense?
>>
> 
> I think we need Hans' and Ilpo's comments here to decide what to do.
> 

Sure.

> I will say that when we had this happen in amdgpu for a breaking
> reason there was a new firmware binary filename created/upstreamed for
> the breaking version (IIRC foo.bin -> foo_1.bin) and amdgpu had to
> have fallback code so it could be compatible with either binary.
> 

True. In case of amdgpu, the FW loading is part of the amdgpu driver.
But in case of PMF, the PMF TA gets picked from the AMD TEE driver
through the TEE commands.

So, there is no need for FW versioning logic in PMF driver.


> * If user on older kernel took newer linux-firmware package they used
> older binary.
> * If user on newer kernel took older linux-firmware package they used
> older binary.
> * If user on newer kernel took newer linux-firmware package they used
> newer binary.
> 
> If the decision is this goes in "as is" it definitely needs to go back
> to stable kernels.
> 

IMHO, let's not put too many fallback mechanisms. The philosophy
should be use latest driver and latest FW that avoids a lot of
confusion and yeah for that to happen this change has to go to stable.

Thanks,
Shyam
Mario Limonciello Oct. 23, 2024, 4:20 p.m. UTC | #7
On 10/23/2024 10:52, Shyam Sundar S K wrote:
> 
> 
> On 10/23/2024 21:10, Mario Limonciello wrote:
>> On 10/23/2024 10:32, Shyam Sundar S K wrote:
>>>
>>>
>>> On 10/23/2024 20:04, Mario Limonciello wrote:
>>>> On 10/23/2024 09:29, Shyam Sundar S K wrote:
>>>>>
>>>>>
>>>>> On 10/23/2024 19:41, Mario Limonciello wrote:
>>>>>> On 10/23/2024 01:32, Shyam Sundar S K wrote:
>>>>>>> The PMF driver will allocate shared buffer memory using the
>>>>>>> tee_shm_alloc_kernel_buf(). This allocated memory is located in the
>>>>>>> secure world and is used for communication with the PMF-TA.
>>>>>>>
>>>>>>> The latest PMF-TA version introduces new structures with OEM debug
>>>>>>> information and additional policy input conditions for
>>>>>>> evaluating the
>>>>>>> policy binary. Consequently, the shared memory size must be
>>>>>>> increased to
>>>>>>> ensure compatibility between the PMF driver and the updated PMF-TA.
>>>>>>>
>>>>>>> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
>>>>>>
>>>>>> How does this present to a user?  From what you describe it seems to
>>>>>> me like this means a new TA will fail on older kernel in some way.
>>>>>
>>>>> Newer TA will not fail on older systems. This change is just about
>>>>> the
>>>>> increase in TA reserved memory that is presented as "shared memory",
>>>>> as TA needs the additional memory for its own debug data structures.
>>>>
>>>> Thx for comments. But so if you use new TA with older kernel driver,
>>>> what will happen?  Can TA do a buffer overrun because the presented
>>>> shared memory was too small?
>>>>
>>>
>>> New TA will fail on older kernel and hence this change will be
>>> required for new TA to work.
>>
>> OK, that's what I was worried about.
>>
>>>
>>>>>
>>>>>    From user standpoint, always be on latest FW, irrespective of the
>>>>> platform. At this point in time, I don't see a need for FW versioning
>>>>> name (in the future, if there is a need for having a limited support
>>>>> to older platforms, we can carve out a logic to do versioning stuff).
>>>>
>>>> I wish we could enforce this, but In the Linux world there is an
>>>> expectation that these two trains don't need to arrive at station at
>>>> the same time.
>>>>
>>>>>
>>>>>> Some ideas:
>>>>>>
>>>>>> 1) Should there be header version check on the TA and dynamically
>>>>>> allocate the structure size based on the version of the F/W?
>>>>>>
>>>>>
>>>>> This can be done, when the TA versioning upgrade happens, like from
>>>>> 1.3 to 1.4, apart from that there is no header stuff association.
>>>>>
>>>>>> 2) Or is there a command to the TA that can query the expected
>>>>>> output
>>>>>> size?
>>>>>>
>>>>>
>>>>> No, this is just the initial shared memory that the driver allocates
>>>>> to pass the inputs and the commands to TA.
>>>>>
>>>>>> 3) Or should the new TA filename be versioned, and the driver has a
>>>>>> fallback policy?
>>>>>>
>>>>>> Whatever the outcome is; I think it's best that if possible this
>>>>>> change goes back to stable to try to minimize regressions to
>>>>>> users as
>>>>>> distros update linux-firmware.  For example Fedora updates this
>>>>>> monthly, but also tracks stable kernels.
>>>>>>
>>>>>
>>>>> Advisory to distros should be to pick the latest PMF TA (note that, I
>>>>> have not still submitted to new TA FW).
>>>>
>>>> Yeah we can advise distros to pick it up when upstreamed as long as
>>>> there isn't tight dependency on this patch being present.
>>>>
>>>
>>> That is the reason I am waiting for this change to land. Once that is
>>> done, I will submit the new TA, you can send out a advisory to upgrade
>>> the kernel or this change has to be back-ported to stable/oem kernels
>>> for their enablement.
>>>
>>> Makes sense?
>>>
>>
>> I think we need Hans' and Ilpo's comments here to decide what to do.
>>
> 
> Sure.
> 
>> I will say that when we had this happen in amdgpu for a breaking
>> reason there was a new firmware binary filename created/upstreamed for
>> the breaking version (IIRC foo.bin -> foo_1.bin) and amdgpu had to
>> have fallback code so it could be compatible with either binary.
>>
> 
> True. In case of amdgpu, the FW loading is part of the amdgpu driver.
> But in case of PMF, the PMF TA gets picked from the AMD TEE driver
> through the TEE commands.
> 
> So, there is no need for FW versioning logic in PMF driver.
> 

That's a very good point, and this is a lot of complexity then.

> 
>> * If user on older kernel took newer linux-firmware package they used
>> older binary.
>> * If user on newer kernel took older linux-firmware package they used
>> older binary.
>> * If user on newer kernel took newer linux-firmware package they used
>> newer binary.
>>
>> If the decision is this goes in "as is" it definitely needs to go back
>> to stable kernels.
>>
> 
> IMHO, let's not put too many fallback mechanisms. The philosophy
> should be use latest driver and latest FW that avoids a lot of
> confusion and yeah for that to happen this change has to go to stable.
> 
> Thanks,
> Shyam

Of course Hans and Ilpo make the final call, but I think from our 
discussions here it would be ideal that patch 1 and patch 5 from this 
series go into 6.12 and have stable tags, the rest would be 6.13 material.
Ilpo Järvinen Oct. 29, 2024, 2:07 p.m. UTC | #8
Hi Hens,

There a question / item needing your input below.

On Wed, 23 Oct 2024, Mario Limonciello wrote:
> On 10/23/2024 10:52, Shyam Sundar S K wrote:
> > On 10/23/2024 21:10, Mario Limonciello wrote:
> > > On 10/23/2024 10:32, Shyam Sundar S K wrote:
> > > > On 10/23/2024 20:04, Mario Limonciello wrote:
> > > > > On 10/23/2024 09:29, Shyam Sundar S K wrote:
> > > > > > On 10/23/2024 19:41, Mario Limonciello wrote:
> > > > > > > On 10/23/2024 01:32, Shyam Sundar S K wrote:
> > > > > > > > The PMF driver will allocate shared buffer memory using the
> > > > > > > > tee_shm_alloc_kernel_buf(). This allocated memory is located in
> > > > > > > > the
> > > > > > > > secure world and is used for communication with the PMF-TA.
> > > > > > > > 
> > > > > > > > The latest PMF-TA version introduces new structures with OEM
> > > > > > > > debug
> > > > > > > > information and additional policy input conditions for
> > > > > > > > evaluating the
> > > > > > > > policy binary. Consequently, the shared memory size must be
> > > > > > > > increased to
> > > > > > > > ensure compatibility between the PMF driver and the updated
> > > > > > > > PMF-TA.
> > > > > > > > 
> > > > > > > > Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
> > > > > > > > Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
> > > > > > > > Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
> > > > > > > 
> > > > > > > How does this present to a user?  From what you describe it seems
> > > > > > > to
> > > > > > > me like this means a new TA will fail on older kernel in some way.
> > > > > > 
> > > > > > Newer TA will not fail on older systems. This change is just about
> > > > > > the
> > > > > > increase in TA reserved memory that is presented as "shared memory",
> > > > > > as TA needs the additional memory for its own debug data structures.
> > > > > 
> > > > > Thx for comments. But so if you use new TA with older kernel driver,
> > > > > what will happen?  Can TA do a buffer overrun because the presented
> > > > > shared memory was too small?
> > > > > 
> > > > 
> > > > New TA will fail on older kernel and hence this change will be
> > > > required for new TA to work.
> > > 
> > > OK, that's what I was worried about.
> > > 
> > > > 
> > > > > > 
> > > > > >    From user standpoint, always be on latest FW, irrespective of the
> > > > > > platform. At this point in time, I don't see a need for FW
> > > > > > versioning
> > > > > > name (in the future, if there is a need for having a limited support
> > > > > > to older platforms, we can carve out a logic to do versioning
> > > > > > stuff).
> > > > > 
> > > > > I wish we could enforce this, but In the Linux world there is an
> > > > > expectation that these two trains don't need to arrive at station at
> > > > > the same time.
> > > > > 
> > > > > > 
> > > > > > > Some ideas:
> > > > > > > 
> > > > > > > 1) Should there be header version check on the TA and dynamically
> > > > > > > allocate the structure size based on the version of the F/W?
> > > > > > > 
> > > > > > 
> > > > > > This can be done, when the TA versioning upgrade happens, like from
> > > > > > 1.3 to 1.4, apart from that there is no header stuff association.
> > > > > > 
> > > > > > > 2) Or is there a command to the TA that can query the expected
> > > > > > > output
> > > > > > > size?
> > > > > > > 
> > > > > > 
> > > > > > No, this is just the initial shared memory that the driver allocates
> > > > > > to pass the inputs and the commands to TA.
> > > > > > 
> > > > > > > 3) Or should the new TA filename be versioned, and the driver has
> > > > > > > a
> > > > > > > fallback policy?
> > > > > > > 
> > > > > > > Whatever the outcome is; I think it's best that if possible this
> > > > > > > change goes back to stable to try to minimize regressions to
> > > > > > > users as
> > > > > > > distros update linux-firmware.  For example Fedora updates this
> > > > > > > monthly, but also tracks stable kernels.
> > > > > > > 
> > > > > > 
> > > > > > Advisory to distros should be to pick the latest PMF TA (note that,
> > > > > > I
> > > > > > have not still submitted to new TA FW).
> > > > > 
> > > > > Yeah we can advise distros to pick it up when upstreamed as long as
> > > > > there isn't tight dependency on this patch being present.
> > > > > 
> > > > 
> > > > That is the reason I am waiting for this change to land. Once that is
> > > > done, I will submit the new TA, you can send out a advisory to upgrade
> > > > the kernel or this change has to be back-ported to stable/oem kernels
> > > > for their enablement.
> > > > 
> > > > Makes sense?
> > > > 
> > > 
> > > I think we need Hans' and Ilpo's comments here to decide what to do.
> > > 
> > 
> > Sure.
> > 
> > > I will say that when we had this happen in amdgpu for a breaking
> > > reason there was a new firmware binary filename created/upstreamed for
> > > the breaking version (IIRC foo.bin -> foo_1.bin) and amdgpu had to
> > > have fallback code so it could be compatible with either binary.
> > > 
> > 
> > True. In case of amdgpu, the FW loading is part of the amdgpu driver.
> > But in case of PMF, the PMF TA gets picked from the AMD TEE driver
> > through the TEE commands.
> > 
> > So, there is no need for FW versioning logic in PMF driver.
> > 
> 
> That's a very good point, and this is a lot of complexity then.
> 
> > 
> > > * If user on older kernel took newer linux-firmware package they used
> > > older binary.
> > > * If user on newer kernel took older linux-firmware package they used
> > > older binary.
> > > * If user on newer kernel took newer linux-firmware package they used
> > > newer binary.
> > > 
> > > If the decision is this goes in "as is" it definitely needs to go back
> > > to stable kernels.
> > > 
> > 
> > IMHO, let's not put too many fallback mechanisms. The philosophy
> > should be use latest driver and latest FW that avoids a lot of
> > confusion and yeah for that to happen this change has to go to stable.
> > 
> > Thanks,
> > Shyam
> 
> Of course Hans and Ilpo make the final call, but I think from our discussions
> here it would be ideal that patch 1 and patch 5 from this series go into 6.12
> and have stable tags, the rest would be 6.13 material.

Distros and SW component management challenges are more in the domain of 
Hans' expertise so I'd prefer to hear his opinion on this.

Personally I feel though that the commit message is not entirely honest 
on all the impact as is. The wordings are sounding quite innocent while if 
I infer the above right, an incorrect combination will cause a 
non-gracious failure.
Hans de Goede Oct. 30, 2024, 2 p.m. UTC | #9
Hi,

On 29-Oct-24 3:07 PM, Ilpo Järvinen wrote:
> Hi Hens,
> 
> There a question / item needing your input below.
> 
> On Wed, 23 Oct 2024, Mario Limonciello wrote:
>> On 10/23/2024 10:52, Shyam Sundar S K wrote:
>>> On 10/23/2024 21:10, Mario Limonciello wrote:
>>>> On 10/23/2024 10:32, Shyam Sundar S K wrote:
>>>>> On 10/23/2024 20:04, Mario Limonciello wrote:
>>>>>> On 10/23/2024 09:29, Shyam Sundar S K wrote:
>>>>>>> On 10/23/2024 19:41, Mario Limonciello wrote:
>>>>>>>> On 10/23/2024 01:32, Shyam Sundar S K wrote:
>>>>>>>>> The PMF driver will allocate shared buffer memory using the
>>>>>>>>> tee_shm_alloc_kernel_buf(). This allocated memory is located in
>>>>>>>>> the
>>>>>>>>> secure world and is used for communication with the PMF-TA.
>>>>>>>>>
>>>>>>>>> The latest PMF-TA version introduces new structures with OEM
>>>>>>>>> debug
>>>>>>>>> information and additional policy input conditions for
>>>>>>>>> evaluating the
>>>>>>>>> policy binary. Consequently, the shared memory size must be
>>>>>>>>> increased to
>>>>>>>>> ensure compatibility between the PMF driver and the updated
>>>>>>>>> PMF-TA.
>>>>>>>>>
>>>>>>>>> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>>>> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
>>>>>>>>
>>>>>>>> How does this present to a user?  From what you describe it seems
>>>>>>>> to
>>>>>>>> me like this means a new TA will fail on older kernel in some way.
>>>>>>>
>>>>>>> Newer TA will not fail on older systems. This change is just about
>>>>>>> the
>>>>>>> increase in TA reserved memory that is presented as "shared memory",
>>>>>>> as TA needs the additional memory for its own debug data structures.
>>>>>>
>>>>>> Thx for comments. But so if you use new TA with older kernel driver,
>>>>>> what will happen?  Can TA do a buffer overrun because the presented
>>>>>> shared memory was too small?
>>>>>>
>>>>>
>>>>> New TA will fail on older kernel and hence this change will be
>>>>> required for new TA to work.
>>>>
>>>> OK, that's what I was worried about.
>>>>
>>>>>
>>>>>>>
>>>>>>>    From user standpoint, always be on latest FW, irrespective of the
>>>>>>> platform. At this point in time, I don't see a need for FW
>>>>>>> versioning
>>>>>>> name (in the future, if there is a need for having a limited support
>>>>>>> to older platforms, we can carve out a logic to do versioning
>>>>>>> stuff).
>>>>>>
>>>>>> I wish we could enforce this, but In the Linux world there is an
>>>>>> expectation that these two trains don't need to arrive at station at
>>>>>> the same time.
>>>>>>
>>>>>>>
>>>>>>>> Some ideas:
>>>>>>>>
>>>>>>>> 1) Should there be header version check on the TA and dynamically
>>>>>>>> allocate the structure size based on the version of the F/W?
>>>>>>>>
>>>>>>>
>>>>>>> This can be done, when the TA versioning upgrade happens, like from
>>>>>>> 1.3 to 1.4, apart from that there is no header stuff association.
>>>>>>>
>>>>>>>> 2) Or is there a command to the TA that can query the expected
>>>>>>>> output
>>>>>>>> size?
>>>>>>>>
>>>>>>>
>>>>>>> No, this is just the initial shared memory that the driver allocates
>>>>>>> to pass the inputs and the commands to TA.
>>>>>>>
>>>>>>>> 3) Or should the new TA filename be versioned, and the driver has
>>>>>>>> a
>>>>>>>> fallback policy?
>>>>>>>>
>>>>>>>> Whatever the outcome is; I think it's best that if possible this
>>>>>>>> change goes back to stable to try to minimize regressions to
>>>>>>>> users as
>>>>>>>> distros update linux-firmware.  For example Fedora updates this
>>>>>>>> monthly, but also tracks stable kernels.
>>>>>>>>
>>>>>>>
>>>>>>> Advisory to distros should be to pick the latest PMF TA (note that,
>>>>>>> I
>>>>>>> have not still submitted to new TA FW).
>>>>>>
>>>>>> Yeah we can advise distros to pick it up when upstreamed as long as
>>>>>> there isn't tight dependency on this patch being present.
>>>>>>
>>>>>
>>>>> That is the reason I am waiting for this change to land. Once that is
>>>>> done, I will submit the new TA, you can send out a advisory to upgrade
>>>>> the kernel or this change has to be back-ported to stable/oem kernels
>>>>> for their enablement.
>>>>>
>>>>> Makes sense?
>>>>>
>>>>
>>>> I think we need Hans' and Ilpo's comments here to decide what to do.
>>>>
>>>
>>> Sure.
>>>
>>>> I will say that when we had this happen in amdgpu for a breaking
>>>> reason there was a new firmware binary filename created/upstreamed for
>>>> the breaking version (IIRC foo.bin -> foo_1.bin) and amdgpu had to
>>>> have fallback code so it could be compatible with either binary.
>>>>
>>>
>>> True. In case of amdgpu, the FW loading is part of the amdgpu driver.
>>> But in case of PMF, the PMF TA gets picked from the AMD TEE driver
>>> through the TEE commands.
>>>
>>> So, there is no need for FW versioning logic in PMF driver.
>>>
>>
>> That's a very good point, and this is a lot of complexity then.
>>
>>>
>>>> * If user on older kernel took newer linux-firmware package they used
>>>> older binary.
>>>> * If user on newer kernel took older linux-firmware package they used
>>>> older binary.
>>>> * If user on newer kernel took newer linux-firmware package they used
>>>> newer binary.
>>>>
>>>> If the decision is this goes in "as is" it definitely needs to go back
>>>> to stable kernels.
>>>>
>>>
>>> IMHO, let's not put too many fallback mechanisms. The philosophy
>>> should be use latest driver and latest FW that avoids a lot of
>>> confusion and yeah for that to happen this change has to go to stable.
>>>
>>> Thanks,
>>> Shyam
>>
>> Of course Hans and Ilpo make the final call, but I think from our discussions
>> here it would be ideal that patch 1 and patch 5 from this series go into 6.12
>> and have stable tags, the rest would be 6.13 material.
> 
> Distros and SW component management challenges are more in the domain of 
> Hans' expertise so I'd prefer to hear his opinion on this.
> 
> Personally I feel though that the commit message is not entirely honest 
> on all the impact as is. The wordings are sounding quite innocent while if 
> I infer the above right, an incorrect combination will cause a 
> non-gracious failure.

There are basically 4 possible scenarios and to me it
is only clear from this thread what will happen in 3 of
the 4 scenarios :

1. Old TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> works
2. New TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> broken
3. Old TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> ???
4. New TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> works

If the answer to 3 is: "works" then I agree that this patch
should be submitted to Linus as a fix with Cc: stable ASAP
and then once that has hit most stable series it should be
ok to upgrade the fw in linux-firmware

Note this is still not ideal but IMHO it would be ok.

But if the answer is "broken" then we will really need to
find some way to unbreak this, which could be as simple
as querying the fw-version and basing the size on this,
but having a kernel change which will regress things for
users who do not have the old firmware yet is simply
not acceptable.

Regards,

Hans
Shyam Sundar S K Oct. 30, 2024, 4:03 p.m. UTC | #10
Hi,

On 10/30/2024 19:30, Hans de Goede wrote:
> Hi,
> 
> On 29-Oct-24 3:07 PM, Ilpo Järvinen wrote:
>> Hi Hens,
>>
>> There a question / item needing your input below.
>>
>> On Wed, 23 Oct 2024, Mario Limonciello wrote:
>>> On 10/23/2024 10:52, Shyam Sundar S K wrote:
>>>> On 10/23/2024 21:10, Mario Limonciello wrote:
>>>>> On 10/23/2024 10:32, Shyam Sundar S K wrote:
>>>>>> On 10/23/2024 20:04, Mario Limonciello wrote:
>>>>>>> On 10/23/2024 09:29, Shyam Sundar S K wrote:
>>>>>>>> On 10/23/2024 19:41, Mario Limonciello wrote:
>>>>>>>>> On 10/23/2024 01:32, Shyam Sundar S K wrote:
>>>>>>>>>> The PMF driver will allocate shared buffer memory using the
>>>>>>>>>> tee_shm_alloc_kernel_buf(). This allocated memory is located in
>>>>>>>>>> the
>>>>>>>>>> secure world and is used for communication with the PMF-TA.
>>>>>>>>>>
>>>>>>>>>> The latest PMF-TA version introduces new structures with OEM
>>>>>>>>>> debug
>>>>>>>>>> information and additional policy input conditions for
>>>>>>>>>> evaluating the
>>>>>>>>>> policy binary. Consequently, the shared memory size must be
>>>>>>>>>> increased to
>>>>>>>>>> ensure compatibility between the PMF driver and the updated
>>>>>>>>>> PMF-TA.
>>>>>>>>>>
>>>>>>>>>> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>>>>> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
>>>>>>>>>
>>>>>>>>> How does this present to a user?  From what you describe it seems
>>>>>>>>> to
>>>>>>>>> me like this means a new TA will fail on older kernel in some way.
>>>>>>>>
>>>>>>>> Newer TA will not fail on older systems. This change is just about
>>>>>>>> the
>>>>>>>> increase in TA reserved memory that is presented as "shared memory",
>>>>>>>> as TA needs the additional memory for its own debug data structures.
>>>>>>>
>>>>>>> Thx for comments. But so if you use new TA with older kernel driver,
>>>>>>> what will happen?  Can TA do a buffer overrun because the presented
>>>>>>> shared memory was too small?
>>>>>>>
>>>>>>
>>>>>> New TA will fail on older kernel and hence this change will be
>>>>>> required for new TA to work.
>>>>>
>>>>> OK, that's what I was worried about.
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>>    From user standpoint, always be on latest FW, irrespective of the
>>>>>>>> platform. At this point in time, I don't see a need for FW
>>>>>>>> versioning
>>>>>>>> name (in the future, if there is a need for having a limited support
>>>>>>>> to older platforms, we can carve out a logic to do versioning
>>>>>>>> stuff).
>>>>>>>
>>>>>>> I wish we could enforce this, but In the Linux world there is an
>>>>>>> expectation that these two trains don't need to arrive at station at
>>>>>>> the same time.
>>>>>>>
>>>>>>>>
>>>>>>>>> Some ideas:
>>>>>>>>>
>>>>>>>>> 1) Should there be header version check on the TA and dynamically
>>>>>>>>> allocate the structure size based on the version of the F/W?
>>>>>>>>>
>>>>>>>>
>>>>>>>> This can be done, when the TA versioning upgrade happens, like from
>>>>>>>> 1.3 to 1.4, apart from that there is no header stuff association.
>>>>>>>>
>>>>>>>>> 2) Or is there a command to the TA that can query the expected
>>>>>>>>> output
>>>>>>>>> size?
>>>>>>>>>
>>>>>>>>
>>>>>>>> No, this is just the initial shared memory that the driver allocates
>>>>>>>> to pass the inputs and the commands to TA.
>>>>>>>>
>>>>>>>>> 3) Or should the new TA filename be versioned, and the driver has
>>>>>>>>> a
>>>>>>>>> fallback policy?
>>>>>>>>>
>>>>>>>>> Whatever the outcome is; I think it's best that if possible this
>>>>>>>>> change goes back to stable to try to minimize regressions to
>>>>>>>>> users as
>>>>>>>>> distros update linux-firmware.  For example Fedora updates this
>>>>>>>>> monthly, but also tracks stable kernels.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Advisory to distros should be to pick the latest PMF TA (note that,
>>>>>>>> I
>>>>>>>> have not still submitted to new TA FW).
>>>>>>>
>>>>>>> Yeah we can advise distros to pick it up when upstreamed as long as
>>>>>>> there isn't tight dependency on this patch being present.
>>>>>>>
>>>>>>
>>>>>> That is the reason I am waiting for this change to land. Once that is
>>>>>> done, I will submit the new TA, you can send out a advisory to upgrade
>>>>>> the kernel or this change has to be back-ported to stable/oem kernels
>>>>>> for their enablement.
>>>>>>
>>>>>> Makes sense?
>>>>>>
>>>>>
>>>>> I think we need Hans' and Ilpo's comments here to decide what to do.
>>>>>
>>>>
>>>> Sure.
>>>>
>>>>> I will say that when we had this happen in amdgpu for a breaking
>>>>> reason there was a new firmware binary filename created/upstreamed for
>>>>> the breaking version (IIRC foo.bin -> foo_1.bin) and amdgpu had to
>>>>> have fallback code so it could be compatible with either binary.
>>>>>
>>>>
>>>> True. In case of amdgpu, the FW loading is part of the amdgpu driver.
>>>> But in case of PMF, the PMF TA gets picked from the AMD TEE driver
>>>> through the TEE commands.
>>>>
>>>> So, there is no need for FW versioning logic in PMF driver.
>>>>
>>>
>>> That's a very good point, and this is a lot of complexity then.
>>>
>>>>
>>>>> * If user on older kernel took newer linux-firmware package they used
>>>>> older binary.
>>>>> * If user on newer kernel took older linux-firmware package they used
>>>>> older binary.
>>>>> * If user on newer kernel took newer linux-firmware package they used
>>>>> newer binary.
>>>>>
>>>>> If the decision is this goes in "as is" it definitely needs to go back
>>>>> to stable kernels.
>>>>>
>>>>
>>>> IMHO, let's not put too many fallback mechanisms. The philosophy
>>>> should be use latest driver and latest FW that avoids a lot of
>>>> confusion and yeah for that to happen this change has to go to stable.
>>>>
>>>> Thanks,
>>>> Shyam
>>>
>>> Of course Hans and Ilpo make the final call, but I think from our discussions
>>> here it would be ideal that patch 1 and patch 5 from this series go into 6.12
>>> and have stable tags, the rest would be 6.13 material.
>>
>> Distros and SW component management challenges are more in the domain of 
>> Hans' expertise so I'd prefer to hear his opinion on this.
>>
>> Personally I feel though that the commit message is not entirely honest 
>> on all the impact as is. The wordings are sounding quite innocent while if 
>> I infer the above right, an incorrect combination will cause a 
>> non-gracious failure.
> 
> There are basically 4 possible scenarios and to me it
> is only clear from this thread what will happen in 3 of
> the 4 scenarios :
> 
> 1. Old TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> works
> 2. New TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> broken
> 3. Old TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> ???
> 4. New TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> works
> 
> If the answer to 3 is: "works" then I agree that this patch
> should be submitted to Linus as a fix with Cc: stable ASAP
> and then once that has hit most stable series it should be
> ok to upgrade the fw in linux-firmware
> 

Short answer, "yes" it does not work for "3." and you can consider it
a broken.

> Note this is still not ideal but IMHO it would be ok.
> 
> But if the answer is "broken" then we will really need to
> find some way to unbreak this, which could be as simple
> as querying the fw-version and basing the size on this,
> but having a kernel change which will regress things for
> users who do not have the old firmware yet is simply
> not acceptable.
> 

I am not sure if there is a firmware versioning interface that the ASP
(AMD Security Processor) returns back the kernel/driver.

The code path in this case is:

AMD PMF driver -> AMD TEE driver -> AMD CCP driver -> ASP TEE -> ASP
TA -> ASP HW.

So, I uncertain which module has this information and where exactly
the code of fw versioning has to reside. It will take a while for me
to dig this in.

Meanwhile, shall I drop this patch and resend the series (by
addressing the dev_dbg change Mario commented) so that this atleast
becomes a 6.13 material?


Thanks,
Shyam

> Regards,
> 
> Hans
> 
> 
> 
>
Mario Limonciello Oct. 30, 2024, 4:08 p.m. UTC | #11
On 10/30/2024 11:03, Shyam Sundar S K wrote:
> Hi,
> 
> On 10/30/2024 19:30, Hans de Goede wrote:
>> Hi,
>>
>> On 29-Oct-24 3:07 PM, Ilpo Järvinen wrote:
>>> Hi Hens,
>>>
>>> There a question / item needing your input below.
>>>
>>> On Wed, 23 Oct 2024, Mario Limonciello wrote:
>>>> On 10/23/2024 10:52, Shyam Sundar S K wrote:
>>>>> On 10/23/2024 21:10, Mario Limonciello wrote:
>>>>>> On 10/23/2024 10:32, Shyam Sundar S K wrote:
>>>>>>> On 10/23/2024 20:04, Mario Limonciello wrote:
>>>>>>>> On 10/23/2024 09:29, Shyam Sundar S K wrote:
>>>>>>>>> On 10/23/2024 19:41, Mario Limonciello wrote:
>>>>>>>>>> On 10/23/2024 01:32, Shyam Sundar S K wrote:
>>>>>>>>>>> The PMF driver will allocate shared buffer memory using the
>>>>>>>>>>> tee_shm_alloc_kernel_buf(). This allocated memory is located in
>>>>>>>>>>> the
>>>>>>>>>>> secure world and is used for communication with the PMF-TA.
>>>>>>>>>>>
>>>>>>>>>>> The latest PMF-TA version introduces new structures with OEM
>>>>>>>>>>> debug
>>>>>>>>>>> information and additional policy input conditions for
>>>>>>>>>>> evaluating the
>>>>>>>>>>> policy binary. Consequently, the shared memory size must be
>>>>>>>>>>> increased to
>>>>>>>>>>> ensure compatibility between the PMF driver and the updated
>>>>>>>>>>> PMF-TA.
>>>>>>>>>>>
>>>>>>>>>>> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>>>>>> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>>>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
>>>>>>>>>>
>>>>>>>>>> How does this present to a user?  From what you describe it seems
>>>>>>>>>> to
>>>>>>>>>> me like this means a new TA will fail on older kernel in some way.
>>>>>>>>>
>>>>>>>>> Newer TA will not fail on older systems. This change is just about
>>>>>>>>> the
>>>>>>>>> increase in TA reserved memory that is presented as "shared memory",
>>>>>>>>> as TA needs the additional memory for its own debug data structures.
>>>>>>>>
>>>>>>>> Thx for comments. But so if you use new TA with older kernel driver,
>>>>>>>> what will happen?  Can TA do a buffer overrun because the presented
>>>>>>>> shared memory was too small?
>>>>>>>>
>>>>>>>
>>>>>>> New TA will fail on older kernel and hence this change will be
>>>>>>> required for new TA to work.
>>>>>>
>>>>>> OK, that's what I was worried about.
>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>>     From user standpoint, always be on latest FW, irrespective of the
>>>>>>>>> platform. At this point in time, I don't see a need for FW
>>>>>>>>> versioning
>>>>>>>>> name (in the future, if there is a need for having a limited support
>>>>>>>>> to older platforms, we can carve out a logic to do versioning
>>>>>>>>> stuff).
>>>>>>>>
>>>>>>>> I wish we could enforce this, but In the Linux world there is an
>>>>>>>> expectation that these two trains don't need to arrive at station at
>>>>>>>> the same time.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Some ideas:
>>>>>>>>>>
>>>>>>>>>> 1) Should there be header version check on the TA and dynamically
>>>>>>>>>> allocate the structure size based on the version of the F/W?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This can be done, when the TA versioning upgrade happens, like from
>>>>>>>>> 1.3 to 1.4, apart from that there is no header stuff association.
>>>>>>>>>
>>>>>>>>>> 2) Or is there a command to the TA that can query the expected
>>>>>>>>>> output
>>>>>>>>>> size?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> No, this is just the initial shared memory that the driver allocates
>>>>>>>>> to pass the inputs and the commands to TA.
>>>>>>>>>
>>>>>>>>>> 3) Or should the new TA filename be versioned, and the driver has
>>>>>>>>>> a
>>>>>>>>>> fallback policy?
>>>>>>>>>>
>>>>>>>>>> Whatever the outcome is; I think it's best that if possible this
>>>>>>>>>> change goes back to stable to try to minimize regressions to
>>>>>>>>>> users as
>>>>>>>>>> distros update linux-firmware.  For example Fedora updates this
>>>>>>>>>> monthly, but also tracks stable kernels.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Advisory to distros should be to pick the latest PMF TA (note that,
>>>>>>>>> I
>>>>>>>>> have not still submitted to new TA FW).
>>>>>>>>
>>>>>>>> Yeah we can advise distros to pick it up when upstreamed as long as
>>>>>>>> there isn't tight dependency on this patch being present.
>>>>>>>>
>>>>>>>
>>>>>>> That is the reason I am waiting for this change to land. Once that is
>>>>>>> done, I will submit the new TA, you can send out a advisory to upgrade
>>>>>>> the kernel or this change has to be back-ported to stable/oem kernels
>>>>>>> for their enablement.
>>>>>>>
>>>>>>> Makes sense?
>>>>>>>
>>>>>>
>>>>>> I think we need Hans' and Ilpo's comments here to decide what to do.
>>>>>>
>>>>>
>>>>> Sure.
>>>>>
>>>>>> I will say that when we had this happen in amdgpu for a breaking
>>>>>> reason there was a new firmware binary filename created/upstreamed for
>>>>>> the breaking version (IIRC foo.bin -> foo_1.bin) and amdgpu had to
>>>>>> have fallback code so it could be compatible with either binary.
>>>>>>
>>>>>
>>>>> True. In case of amdgpu, the FW loading is part of the amdgpu driver.
>>>>> But in case of PMF, the PMF TA gets picked from the AMD TEE driver
>>>>> through the TEE commands.
>>>>>
>>>>> So, there is no need for FW versioning logic in PMF driver.
>>>>>
>>>>
>>>> That's a very good point, and this is a lot of complexity then.
>>>>
>>>>>
>>>>>> * If user on older kernel took newer linux-firmware package they used
>>>>>> older binary.
>>>>>> * If user on newer kernel took older linux-firmware package they used
>>>>>> older binary.
>>>>>> * If user on newer kernel took newer linux-firmware package they used
>>>>>> newer binary.
>>>>>>
>>>>>> If the decision is this goes in "as is" it definitely needs to go back
>>>>>> to stable kernels.
>>>>>>
>>>>>
>>>>> IMHO, let's not put too many fallback mechanisms. The philosophy
>>>>> should be use latest driver and latest FW that avoids a lot of
>>>>> confusion and yeah for that to happen this change has to go to stable.
>>>>>
>>>>> Thanks,
>>>>> Shyam
>>>>
>>>> Of course Hans and Ilpo make the final call, but I think from our discussions
>>>> here it would be ideal that patch 1 and patch 5 from this series go into 6.12
>>>> and have stable tags, the rest would be 6.13 material.
>>>
>>> Distros and SW component management challenges are more in the domain of
>>> Hans' expertise so I'd prefer to hear his opinion on this.
>>>
>>> Personally I feel though that the commit message is not entirely honest
>>> on all the impact as is. The wordings are sounding quite innocent while if
>>> I infer the above right, an incorrect combination will cause a
>>> non-gracious failure.
>>
>> There are basically 4 possible scenarios and to me it
>> is only clear from this thread what will happen in 3 of
>> the 4 scenarios :
>>
>> 1. Old TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> works
>> 2. New TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> broken
>> 3. Old TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> ???
>> 4. New TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> works
>>
>> If the answer to 3 is: "works" then I agree that this patch
>> should be submitted to Linus as a fix with Cc: stable ASAP
>> and then once that has hit most stable series it should be
>> ok to upgrade the fw in linux-firmware
>>
> 
> Short answer, "yes" it does not work for "3." and you can consider it
> a broken.
> 
>> Note this is still not ideal but IMHO it would be ok.
>>
>> But if the answer is "broken" then we will really need to
>> find some way to unbreak this, which could be as simple
>> as querying the fw-version and basing the size on this,
>> but having a kernel change which will regress things for
>> users who do not have the old firmware yet is simply
>> not acceptable.
>>
> 
> I am not sure if there is a firmware versioning interface that the ASP
> (AMD Security Processor) returns back the kernel/driver.
  > The code path in this case is:
> 
> AMD PMF driver -> AMD TEE driver -> AMD CCP driver -> ASP TEE -> ASP
> TA -> ASP HW.
> 
> So, I uncertain which module has this information and where exactly
> the code of fw versioning has to reside. It will take a while for me
> to dig this in.

As a solution to this, can amd-pmf explicitly do it's own 
request_firmware() call to load the firmware binary and determine the 
size to use in the array and then discard the loaded binary?

This would let the TEE module still do it's own load later like normal 
without having to plumb this information across subsystems.

> 
> Meanwhile, shall I drop this patch and resend the series (by
> addressing the dev_dbg change Mario commented) so that this atleast
> becomes a 6.13 material?
 > >
> Thanks,
> Shyam
> 
>> Regards,
>>
>> Hans
>>
>>
>>
>>
Shyam Sundar S K Oct. 30, 2024, 4:23 p.m. UTC | #12
On 10/30/2024 21:38, Mario Limonciello wrote:
> On 10/30/2024 11:03, Shyam Sundar S K wrote:
>> Hi,
>>
>> On 10/30/2024 19:30, Hans de Goede wrote:
>>> Hi,
>>>
>>> On 29-Oct-24 3:07 PM, Ilpo Järvinen wrote:
>>>> Hi Hens,
>>>>
>>>> There a question / item needing your input below.
>>>>
>>>> On Wed, 23 Oct 2024, Mario Limonciello wrote:
>>>>> On 10/23/2024 10:52, Shyam Sundar S K wrote:
>>>>>> On 10/23/2024 21:10, Mario Limonciello wrote:
>>>>>>> On 10/23/2024 10:32, Shyam Sundar S K wrote:
>>>>>>>> On 10/23/2024 20:04, Mario Limonciello wrote:
>>>>>>>>> On 10/23/2024 09:29, Shyam Sundar S K wrote:
>>>>>>>>>> On 10/23/2024 19:41, Mario Limonciello wrote:
>>>>>>>>>>> On 10/23/2024 01:32, Shyam Sundar S K wrote:
>>>>>>>>>>>> The PMF driver will allocate shared buffer memory using the
>>>>>>>>>>>> tee_shm_alloc_kernel_buf(). This allocated memory is
>>>>>>>>>>>> located in
>>>>>>>>>>>> the
>>>>>>>>>>>> secure world and is used for communication with the PMF-TA.
>>>>>>>>>>>>
>>>>>>>>>>>> The latest PMF-TA version introduces new structures with OEM
>>>>>>>>>>>> debug
>>>>>>>>>>>> information and additional policy input conditions for
>>>>>>>>>>>> evaluating the
>>>>>>>>>>>> policy binary. Consequently, the shared memory size must be
>>>>>>>>>>>> increased to
>>>>>>>>>>>> ensure compatibility between the PMF driver and the updated
>>>>>>>>>>>> PMF-TA.
>>>>>>>>>>>>
>>>>>>>>>>>> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>>>>>>> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>>>>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
>>>>>>>>>>>
>>>>>>>>>>> How does this present to a user?  From what you describe it
>>>>>>>>>>> seems
>>>>>>>>>>> to
>>>>>>>>>>> me like this means a new TA will fail on older kernel in
>>>>>>>>>>> some way.
>>>>>>>>>>
>>>>>>>>>> Newer TA will not fail on older systems. This change is just
>>>>>>>>>> about
>>>>>>>>>> the
>>>>>>>>>> increase in TA reserved memory that is presented as "shared
>>>>>>>>>> memory",
>>>>>>>>>> as TA needs the additional memory for its own debug data
>>>>>>>>>> structures.
>>>>>>>>>
>>>>>>>>> Thx for comments. But so if you use new TA with older kernel
>>>>>>>>> driver,
>>>>>>>>> what will happen?  Can TA do a buffer overrun because the
>>>>>>>>> presented
>>>>>>>>> shared memory was too small?
>>>>>>>>>
>>>>>>>>
>>>>>>>> New TA will fail on older kernel and hence this change will be
>>>>>>>> required for new TA to work.
>>>>>>>
>>>>>>> OK, that's what I was worried about.
>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     From user standpoint, always be on latest FW,
>>>>>>>>>> irrespective of the
>>>>>>>>>> platform. At this point in time, I don't see a need for FW
>>>>>>>>>> versioning
>>>>>>>>>> name (in the future, if there is a need for having a limited
>>>>>>>>>> support
>>>>>>>>>> to older platforms, we can carve out a logic to do versioning
>>>>>>>>>> stuff).
>>>>>>>>>
>>>>>>>>> I wish we could enforce this, but In the Linux world there is an
>>>>>>>>> expectation that these two trains don't need to arrive at
>>>>>>>>> station at
>>>>>>>>> the same time.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Some ideas:
>>>>>>>>>>>
>>>>>>>>>>> 1) Should there be header version check on the TA and
>>>>>>>>>>> dynamically
>>>>>>>>>>> allocate the structure size based on the version of the F/W?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This can be done, when the TA versioning upgrade happens,
>>>>>>>>>> like from
>>>>>>>>>> 1.3 to 1.4, apart from that there is no header stuff
>>>>>>>>>> association.
>>>>>>>>>>
>>>>>>>>>>> 2) Or is there a command to the TA that can query the expected
>>>>>>>>>>> output
>>>>>>>>>>> size?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> No, this is just the initial shared memory that the driver
>>>>>>>>>> allocates
>>>>>>>>>> to pass the inputs and the commands to TA.
>>>>>>>>>>
>>>>>>>>>>> 3) Or should the new TA filename be versioned, and the
>>>>>>>>>>> driver has
>>>>>>>>>>> a
>>>>>>>>>>> fallback policy?
>>>>>>>>>>>
>>>>>>>>>>> Whatever the outcome is; I think it's best that if possible
>>>>>>>>>>> this
>>>>>>>>>>> change goes back to stable to try to minimize regressions to
>>>>>>>>>>> users as
>>>>>>>>>>> distros update linux-firmware.  For example Fedora updates
>>>>>>>>>>> this
>>>>>>>>>>> monthly, but also tracks stable kernels.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Advisory to distros should be to pick the latest PMF TA
>>>>>>>>>> (note that,
>>>>>>>>>> I
>>>>>>>>>> have not still submitted to new TA FW).
>>>>>>>>>
>>>>>>>>> Yeah we can advise distros to pick it up when upstreamed as
>>>>>>>>> long as
>>>>>>>>> there isn't tight dependency on this patch being present.
>>>>>>>>>
>>>>>>>>
>>>>>>>> That is the reason I am waiting for this change to land. Once
>>>>>>>> that is
>>>>>>>> done, I will submit the new TA, you can send out a advisory to
>>>>>>>> upgrade
>>>>>>>> the kernel or this change has to be back-ported to stable/oem
>>>>>>>> kernels
>>>>>>>> for their enablement.
>>>>>>>>
>>>>>>>> Makes sense?
>>>>>>>>
>>>>>>>
>>>>>>> I think we need Hans' and Ilpo's comments here to decide what
>>>>>>> to do.
>>>>>>>
>>>>>>
>>>>>> Sure.
>>>>>>
>>>>>>> I will say that when we had this happen in amdgpu for a breaking
>>>>>>> reason there was a new firmware binary filename
>>>>>>> created/upstreamed for
>>>>>>> the breaking version (IIRC foo.bin -> foo_1.bin) and amdgpu had to
>>>>>>> have fallback code so it could be compatible with either binary.
>>>>>>>
>>>>>>
>>>>>> True. In case of amdgpu, the FW loading is part of the amdgpu
>>>>>> driver.
>>>>>> But in case of PMF, the PMF TA gets picked from the AMD TEE driver
>>>>>> through the TEE commands.
>>>>>>
>>>>>> So, there is no need for FW versioning logic in PMF driver.
>>>>>>
>>>>>
>>>>> That's a very good point, and this is a lot of complexity then.
>>>>>
>>>>>>
>>>>>>> * If user on older kernel took newer linux-firmware package
>>>>>>> they used
>>>>>>> older binary.
>>>>>>> * If user on newer kernel took older linux-firmware package
>>>>>>> they used
>>>>>>> older binary.
>>>>>>> * If user on newer kernel took newer linux-firmware package
>>>>>>> they used
>>>>>>> newer binary.
>>>>>>>
>>>>>>> If the decision is this goes in "as is" it definitely needs to
>>>>>>> go back
>>>>>>> to stable kernels.
>>>>>>>
>>>>>>
>>>>>> IMHO, let's not put too many fallback mechanisms. The philosophy
>>>>>> should be use latest driver and latest FW that avoids a lot of
>>>>>> confusion and yeah for that to happen this change has to go to
>>>>>> stable.
>>>>>>
>>>>>> Thanks,
>>>>>> Shyam
>>>>>
>>>>> Of course Hans and Ilpo make the final call, but I think from our
>>>>> discussions
>>>>> here it would be ideal that patch 1 and patch 5 from this series
>>>>> go into 6.12
>>>>> and have stable tags, the rest would be 6.13 material.
>>>>
>>>> Distros and SW component management challenges are more in the
>>>> domain of
>>>> Hans' expertise so I'd prefer to hear his opinion on this.
>>>>
>>>> Personally I feel though that the commit message is not entirely
>>>> honest
>>>> on all the impact as is. The wordings are sounding quite innocent
>>>> while if
>>>> I infer the above right, an incorrect combination will cause a
>>>> non-gracious failure.
>>>
>>> There are basically 4 possible scenarios and to me it
>>> is only clear from this thread what will happen in 3 of
>>> the 4 scenarios :
>>>
>>> 1. Old TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> works
>>> 2. New TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> broken
>>> 3. Old TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> ???
>>> 4. New TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> works
>>>
>>> If the answer to 3 is: "works" then I agree that this patch
>>> should be submitted to Linus as a fix with Cc: stable ASAP
>>> and then once that has hit most stable series it should be
>>> ok to upgrade the fw in linux-firmware
>>>
>>
>> Short answer, "yes" it does not work for "3." and you can consider it
>> a broken.
>>
>>> Note this is still not ideal but IMHO it would be ok.
>>>
>>> But if the answer is "broken" then we will really need to
>>> find some way to unbreak this, which could be as simple
>>> as querying the fw-version and basing the size on this,
>>> but having a kernel change which will regress things for
>>> users who do not have the old firmware yet is simply
>>> not acceptable.
>>>
>>
>> I am not sure if there is a firmware versioning interface that the ASP
>> (AMD Security Processor) returns back the kernel/driver.
>  > The code path in this case is:
>>
>> AMD PMF driver -> AMD TEE driver -> AMD CCP driver -> ASP TEE -> ASP
>> TA -> ASP HW.
>>
>> So, I uncertain which module has this information and where exactly
>> the code of fw versioning has to reside. It will take a while for me
>> to dig this in.
> 
> As a solution to this, can amd-pmf explicitly do it's own
> request_firmware() call to load the firmware binary and determine the
> size to use in the array and then discard the loaded binary?
> 
> This would let the TEE module still do it's own load later like normal
> without having to plumb this information across subsystems.
> 

TEE driver feeds in a lot of metadata and the structure information
for PSP headers and I don't think just having a request_firmware()
will help.

Sidebar, TEE driver has a lot of plumbing that can be used decrypt the
policy binaries to debug issues related to TA load failures and policy
binary issues (basically the descriptors)

So we might end up in replicating a majority of TEE code into PMF
driver, which might not be a good design choice.

Let me first talk to internal folks to see if we can solve it by not
making complex changes.

Thanks,
Shyam


>>
>> Meanwhile, shall I drop this patch and resend the series (by
>> addressing the dev_dbg change Mario commented) so that this atleast
>> becomes a 6.13 material?
>> >
>> Thanks,
>> Shyam
>>
>>> Regards,
>>>
>>> Hans
>>>
>>>
>>>
>>>
>
Mario Limonciello Oct. 30, 2024, 4:38 p.m. UTC | #13
On 10/30/2024 11:23, Shyam Sundar S K wrote:
> 
> 
> On 10/30/2024 21:38, Mario Limonciello wrote:
>> On 10/30/2024 11:03, Shyam Sundar S K wrote:
>>> Hi,
>>>
>>> On 10/30/2024 19:30, Hans de Goede wrote:
>>>> Hi,
>>>>
>>>> On 29-Oct-24 3:07 PM, Ilpo Järvinen wrote:
>>>>> Hi Hens,
>>>>>
>>>>> There a question / item needing your input below.
>>>>>
>>>>> On Wed, 23 Oct 2024, Mario Limonciello wrote:
>>>>>> On 10/23/2024 10:52, Shyam Sundar S K wrote:
>>>>>>> On 10/23/2024 21:10, Mario Limonciello wrote:
>>>>>>>> On 10/23/2024 10:32, Shyam Sundar S K wrote:
>>>>>>>>> On 10/23/2024 20:04, Mario Limonciello wrote:
>>>>>>>>>> On 10/23/2024 09:29, Shyam Sundar S K wrote:
>>>>>>>>>>> On 10/23/2024 19:41, Mario Limonciello wrote:
>>>>>>>>>>>> On 10/23/2024 01:32, Shyam Sundar S K wrote:
>>>>>>>>>>>>> The PMF driver will allocate shared buffer memory using the
>>>>>>>>>>>>> tee_shm_alloc_kernel_buf(). This allocated memory is
>>>>>>>>>>>>> located in
>>>>>>>>>>>>> the
>>>>>>>>>>>>> secure world and is used for communication with the PMF-TA.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The latest PMF-TA version introduces new structures with OEM
>>>>>>>>>>>>> debug
>>>>>>>>>>>>> information and additional policy input conditions for
>>>>>>>>>>>>> evaluating the
>>>>>>>>>>>>> policy binary. Consequently, the shared memory size must be
>>>>>>>>>>>>> increased to
>>>>>>>>>>>>> ensure compatibility between the PMF driver and the updated
>>>>>>>>>>>>> PMF-TA.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>>>>>>>> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>>>>>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
>>>>>>>>>>>>
>>>>>>>>>>>> How does this present to a user?  From what you describe it
>>>>>>>>>>>> seems
>>>>>>>>>>>> to
>>>>>>>>>>>> me like this means a new TA will fail on older kernel in
>>>>>>>>>>>> some way.
>>>>>>>>>>>
>>>>>>>>>>> Newer TA will not fail on older systems. This change is just
>>>>>>>>>>> about
>>>>>>>>>>> the
>>>>>>>>>>> increase in TA reserved memory that is presented as "shared
>>>>>>>>>>> memory",
>>>>>>>>>>> as TA needs the additional memory for its own debug data
>>>>>>>>>>> structures.
>>>>>>>>>>
>>>>>>>>>> Thx for comments. But so if you use new TA with older kernel
>>>>>>>>>> driver,
>>>>>>>>>> what will happen?  Can TA do a buffer overrun because the
>>>>>>>>>> presented
>>>>>>>>>> shared memory was too small?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> New TA will fail on older kernel and hence this change will be
>>>>>>>>> required for new TA to work.
>>>>>>>>
>>>>>>>> OK, that's what I was worried about.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>      From user standpoint, always be on latest FW,
>>>>>>>>>>> irrespective of the
>>>>>>>>>>> platform. At this point in time, I don't see a need for FW
>>>>>>>>>>> versioning
>>>>>>>>>>> name (in the future, if there is a need for having a limited
>>>>>>>>>>> support
>>>>>>>>>>> to older platforms, we can carve out a logic to do versioning
>>>>>>>>>>> stuff).
>>>>>>>>>>
>>>>>>>>>> I wish we could enforce this, but In the Linux world there is an
>>>>>>>>>> expectation that these two trains don't need to arrive at
>>>>>>>>>> station at
>>>>>>>>>> the same time.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Some ideas:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Should there be header version check on the TA and
>>>>>>>>>>>> dynamically
>>>>>>>>>>>> allocate the structure size based on the version of the F/W?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This can be done, when the TA versioning upgrade happens,
>>>>>>>>>>> like from
>>>>>>>>>>> 1.3 to 1.4, apart from that there is no header stuff
>>>>>>>>>>> association.
>>>>>>>>>>>
>>>>>>>>>>>> 2) Or is there a command to the TA that can query the expected
>>>>>>>>>>>> output
>>>>>>>>>>>> size?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> No, this is just the initial shared memory that the driver
>>>>>>>>>>> allocates
>>>>>>>>>>> to pass the inputs and the commands to TA.
>>>>>>>>>>>
>>>>>>>>>>>> 3) Or should the new TA filename be versioned, and the
>>>>>>>>>>>> driver has
>>>>>>>>>>>> a
>>>>>>>>>>>> fallback policy?
>>>>>>>>>>>>
>>>>>>>>>>>> Whatever the outcome is; I think it's best that if possible
>>>>>>>>>>>> this
>>>>>>>>>>>> change goes back to stable to try to minimize regressions to
>>>>>>>>>>>> users as
>>>>>>>>>>>> distros update linux-firmware.  For example Fedora updates
>>>>>>>>>>>> this
>>>>>>>>>>>> monthly, but also tracks stable kernels.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Advisory to distros should be to pick the latest PMF TA
>>>>>>>>>>> (note that,
>>>>>>>>>>> I
>>>>>>>>>>> have not still submitted to new TA FW).
>>>>>>>>>>
>>>>>>>>>> Yeah we can advise distros to pick it up when upstreamed as
>>>>>>>>>> long as
>>>>>>>>>> there isn't tight dependency on this patch being present.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That is the reason I am waiting for this change to land. Once
>>>>>>>>> that is
>>>>>>>>> done, I will submit the new TA, you can send out a advisory to
>>>>>>>>> upgrade
>>>>>>>>> the kernel or this change has to be back-ported to stable/oem
>>>>>>>>> kernels
>>>>>>>>> for their enablement.
>>>>>>>>>
>>>>>>>>> Makes sense?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think we need Hans' and Ilpo's comments here to decide what
>>>>>>>> to do.
>>>>>>>>
>>>>>>>
>>>>>>> Sure.
>>>>>>>
>>>>>>>> I will say that when we had this happen in amdgpu for a breaking
>>>>>>>> reason there was a new firmware binary filename
>>>>>>>> created/upstreamed for
>>>>>>>> the breaking version (IIRC foo.bin -> foo_1.bin) and amdgpu had to
>>>>>>>> have fallback code so it could be compatible with either binary.
>>>>>>>>
>>>>>>>
>>>>>>> True. In case of amdgpu, the FW loading is part of the amdgpu
>>>>>>> driver.
>>>>>>> But in case of PMF, the PMF TA gets picked from the AMD TEE driver
>>>>>>> through the TEE commands.
>>>>>>>
>>>>>>> So, there is no need for FW versioning logic in PMF driver.
>>>>>>>
>>>>>>
>>>>>> That's a very good point, and this is a lot of complexity then.
>>>>>>
>>>>>>>
>>>>>>>> * If user on older kernel took newer linux-firmware package
>>>>>>>> they used
>>>>>>>> older binary.
>>>>>>>> * If user on newer kernel took older linux-firmware package
>>>>>>>> they used
>>>>>>>> older binary.
>>>>>>>> * If user on newer kernel took newer linux-firmware package
>>>>>>>> they used
>>>>>>>> newer binary.
>>>>>>>>
>>>>>>>> If the decision is this goes in "as is" it definitely needs to
>>>>>>>> go back
>>>>>>>> to stable kernels.
>>>>>>>>
>>>>>>>
>>>>>>> IMHO, let's not put too many fallback mechanisms. The philosophy
>>>>>>> should be use latest driver and latest FW that avoids a lot of
>>>>>>> confusion and yeah for that to happen this change has to go to
>>>>>>> stable.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shyam
>>>>>>
>>>>>> Of course Hans and Ilpo make the final call, but I think from our
>>>>>> discussions
>>>>>> here it would be ideal that patch 1 and patch 5 from this series
>>>>>> go into 6.12
>>>>>> and have stable tags, the rest would be 6.13 material.
>>>>>
>>>>> Distros and SW component management challenges are more in the
>>>>> domain of
>>>>> Hans' expertise so I'd prefer to hear his opinion on this.
>>>>>
>>>>> Personally I feel though that the commit message is not entirely
>>>>> honest
>>>>> on all the impact as is. The wordings are sounding quite innocent
>>>>> while if
>>>>> I infer the above right, an incorrect combination will cause a
>>>>> non-gracious failure.
>>>>
>>>> There are basically 4 possible scenarios and to me it
>>>> is only clear from this thread what will happen in 3 of
>>>> the 4 scenarios :
>>>>
>>>> 1. Old TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> works
>>>> 2. New TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> broken
>>>> 3. Old TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> ???
>>>> 4. New TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> works
>>>>
>>>> If the answer to 3 is: "works" then I agree that this patch
>>>> should be submitted to Linus as a fix with Cc: stable ASAP
>>>> and then once that has hit most stable series it should be
>>>> ok to upgrade the fw in linux-firmware
>>>>
>>>
>>> Short answer, "yes" it does not work for "3." and you can consider it
>>> a broken.
>>>
>>>> Note this is still not ideal but IMHO it would be ok.
>>>>
>>>> But if the answer is "broken" then we will really need to
>>>> find some way to unbreak this, which could be as simple
>>>> as querying the fw-version and basing the size on this,
>>>> but having a kernel change which will regress things for
>>>> users who do not have the old firmware yet is simply
>>>> not acceptable.
>>>>
>>>
>>> I am not sure if there is a firmware versioning interface that the ASP
>>> (AMD Security Processor) returns back the kernel/driver.
>>   > The code path in this case is:
>>>
>>> AMD PMF driver -> AMD TEE driver -> AMD CCP driver -> ASP TEE -> ASP
>>> TA -> ASP HW.
>>>
>>> So, I uncertain which module has this information and where exactly
>>> the code of fw versioning has to reside. It will take a while for me
>>> to dig this in.
>>
>> As a solution to this, can amd-pmf explicitly do it's own
>> request_firmware() call to load the firmware binary and determine the
>> size to use in the array and then discard the loaded binary?
>>
>> This would let the TEE module still do it's own load later like normal
>> without having to plumb this information across subsystems.
>>
> 
> TEE driver feeds in a lot of metadata and the structure information
> for PSP headers and I don't think just having a request_firmware()
> will help.
> 
> Sidebar, TEE driver has a lot of plumbing that can be used decrypt the
> policy binaries to debug issues related to TA load failures and policy
> binary issues (basically the descriptors)
> 
> So we might end up in replicating a majority of TEE code into PMF
> driver, which might not be a good design choice.
> 
> Let me first talk to internal folks to see if we can solve it by not
> making complex changes.
> 

OK, I think for the purpose of 6.13 then this series minus the last 
patch probably makes sense.

I also think the first patch should ideally come in 6.12 if Hans is OK 
with that.  As 6.12 is probably going to be the next LTS we'll want 
hardware supported as widely as possible within stable rules.

> Thanks,
> Shyam
> 
> 
>>>
>>> Meanwhile, shall I drop this patch and resend the series (by
>>> addressing the dev_dbg change Mario commented) so that this atleast
>>> becomes a 6.13 material?
>>>>
>>> Thanks,
>>> Shyam
>>>
>>>> Regards,
>>>>
>>>> Hans
>>>>
>>>>
>>>>
>>>>
>>
Hans de Goede Oct. 30, 2024, 5:19 p.m. UTC | #14
Hi,

On 30-Oct-24 5:38 PM, Mario Limonciello wrote:
> On 10/30/2024 11:23, Shyam Sundar S K wrote:
>>
>>
>> On 10/30/2024 21:38, Mario Limonciello wrote:
>>> On 10/30/2024 11:03, Shyam Sundar S K wrote:
>>>> Hi,
>>>>
>>>> On 10/30/2024 19:30, Hans de Goede wrote:
>>>>> Hi,
>>>>>
>>>>> On 29-Oct-24 3:07 PM, Ilpo Järvinen wrote:
>>>>>> Hi Hens,
>>>>>>
>>>>>> There a question / item needing your input below.
>>>>>>
>>>>>> On Wed, 23 Oct 2024, Mario Limonciello wrote:
>>>>>>> On 10/23/2024 10:52, Shyam Sundar S K wrote:
>>>>>>>> On 10/23/2024 21:10, Mario Limonciello wrote:
>>>>>>>>> On 10/23/2024 10:32, Shyam Sundar S K wrote:
>>>>>>>>>> On 10/23/2024 20:04, Mario Limonciello wrote:
>>>>>>>>>>> On 10/23/2024 09:29, Shyam Sundar S K wrote:
>>>>>>>>>>>> On 10/23/2024 19:41, Mario Limonciello wrote:
>>>>>>>>>>>>> On 10/23/2024 01:32, Shyam Sundar S K wrote:
>>>>>>>>>>>>>> The PMF driver will allocate shared buffer memory using the
>>>>>>>>>>>>>> tee_shm_alloc_kernel_buf(). This allocated memory is
>>>>>>>>>>>>>> located in
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> secure world and is used for communication with the PMF-TA.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The latest PMF-TA version introduces new structures with OEM
>>>>>>>>>>>>>> debug
>>>>>>>>>>>>>> information and additional policy input conditions for
>>>>>>>>>>>>>> evaluating the
>>>>>>>>>>>>>> policy binary. Consequently, the shared memory size must be
>>>>>>>>>>>>>> increased to
>>>>>>>>>>>>>> ensure compatibility between the PMF driver and the updated
>>>>>>>>>>>>>> PMF-TA.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>>>>>>>>> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
>>>>>>>>>>>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> How does this present to a user?  From what you describe it
>>>>>>>>>>>>> seems
>>>>>>>>>>>>> to
>>>>>>>>>>>>> me like this means a new TA will fail on older kernel in
>>>>>>>>>>>>> some way.
>>>>>>>>>>>>
>>>>>>>>>>>> Newer TA will not fail on older systems. This change is just
>>>>>>>>>>>> about
>>>>>>>>>>>> the
>>>>>>>>>>>> increase in TA reserved memory that is presented as "shared
>>>>>>>>>>>> memory",
>>>>>>>>>>>> as TA needs the additional memory for its own debug data
>>>>>>>>>>>> structures.
>>>>>>>>>>>
>>>>>>>>>>> Thx for comments. But so if you use new TA with older kernel
>>>>>>>>>>> driver,
>>>>>>>>>>> what will happen?  Can TA do a buffer overrun because the
>>>>>>>>>>> presented
>>>>>>>>>>> shared memory was too small?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> New TA will fail on older kernel and hence this change will be
>>>>>>>>>> required for new TA to work.
>>>>>>>>>
>>>>>>>>> OK, that's what I was worried about.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>      From user standpoint, always be on latest FW,
>>>>>>>>>>>> irrespective of the
>>>>>>>>>>>> platform. At this point in time, I don't see a need for FW
>>>>>>>>>>>> versioning
>>>>>>>>>>>> name (in the future, if there is a need for having a limited
>>>>>>>>>>>> support
>>>>>>>>>>>> to older platforms, we can carve out a logic to do versioning
>>>>>>>>>>>> stuff).
>>>>>>>>>>>
>>>>>>>>>>> I wish we could enforce this, but In the Linux world there is an
>>>>>>>>>>> expectation that these two trains don't need to arrive at
>>>>>>>>>>> station at
>>>>>>>>>>> the same time.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Some ideas:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Should there be header version check on the TA and
>>>>>>>>>>>>> dynamically
>>>>>>>>>>>>> allocate the structure size based on the version of the F/W?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This can be done, when the TA versioning upgrade happens,
>>>>>>>>>>>> like from
>>>>>>>>>>>> 1.3 to 1.4, apart from that there is no header stuff
>>>>>>>>>>>> association.
>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Or is there a command to the TA that can query the expected
>>>>>>>>>>>>> output
>>>>>>>>>>>>> size?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> No, this is just the initial shared memory that the driver
>>>>>>>>>>>> allocates
>>>>>>>>>>>> to pass the inputs and the commands to TA.
>>>>>>>>>>>>
>>>>>>>>>>>>> 3) Or should the new TA filename be versioned, and the
>>>>>>>>>>>>> driver has
>>>>>>>>>>>>> a
>>>>>>>>>>>>> fallback policy?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Whatever the outcome is; I think it's best that if possible
>>>>>>>>>>>>> this
>>>>>>>>>>>>> change goes back to stable to try to minimize regressions to
>>>>>>>>>>>>> users as
>>>>>>>>>>>>> distros update linux-firmware.  For example Fedora updates
>>>>>>>>>>>>> this
>>>>>>>>>>>>> monthly, but also tracks stable kernels.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Advisory to distros should be to pick the latest PMF TA
>>>>>>>>>>>> (note that,
>>>>>>>>>>>> I
>>>>>>>>>>>> have not still submitted to new TA FW).
>>>>>>>>>>>
>>>>>>>>>>> Yeah we can advise distros to pick it up when upstreamed as
>>>>>>>>>>> long as
>>>>>>>>>>> there isn't tight dependency on this patch being present.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> That is the reason I am waiting for this change to land. Once
>>>>>>>>>> that is
>>>>>>>>>> done, I will submit the new TA, you can send out a advisory to
>>>>>>>>>> upgrade
>>>>>>>>>> the kernel or this change has to be back-ported to stable/oem
>>>>>>>>>> kernels
>>>>>>>>>> for their enablement.
>>>>>>>>>>
>>>>>>>>>> Makes sense?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think we need Hans' and Ilpo's comments here to decide what
>>>>>>>>> to do.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Sure.
>>>>>>>>
>>>>>>>>> I will say that when we had this happen in amdgpu for a breaking
>>>>>>>>> reason there was a new firmware binary filename
>>>>>>>>> created/upstreamed for
>>>>>>>>> the breaking version (IIRC foo.bin -> foo_1.bin) and amdgpu had to
>>>>>>>>> have fallback code so it could be compatible with either binary.
>>>>>>>>>
>>>>>>>>
>>>>>>>> True. In case of amdgpu, the FW loading is part of the amdgpu
>>>>>>>> driver.
>>>>>>>> But in case of PMF, the PMF TA gets picked from the AMD TEE driver
>>>>>>>> through the TEE commands.
>>>>>>>>
>>>>>>>> So, there is no need for FW versioning logic in PMF driver.
>>>>>>>>
>>>>>>>
>>>>>>> That's a very good point, and this is a lot of complexity then.
>>>>>>>
>>>>>>>>
>>>>>>>>> * If user on older kernel took newer linux-firmware package
>>>>>>>>> they used
>>>>>>>>> older binary.
>>>>>>>>> * If user on newer kernel took older linux-firmware package
>>>>>>>>> they used
>>>>>>>>> older binary.
>>>>>>>>> * If user on newer kernel took newer linux-firmware package
>>>>>>>>> they used
>>>>>>>>> newer binary.
>>>>>>>>>
>>>>>>>>> If the decision is this goes in "as is" it definitely needs to
>>>>>>>>> go back
>>>>>>>>> to stable kernels.
>>>>>>>>>
>>>>>>>>
>>>>>>>> IMHO, let's not put too many fallback mechanisms. The philosophy
>>>>>>>> should be use latest driver and latest FW that avoids a lot of
>>>>>>>> confusion and yeah for that to happen this change has to go to
>>>>>>>> stable.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Shyam
>>>>>>>
>>>>>>> Of course Hans and Ilpo make the final call, but I think from our
>>>>>>> discussions
>>>>>>> here it would be ideal that patch 1 and patch 5 from this series
>>>>>>> go into 6.12
>>>>>>> and have stable tags, the rest would be 6.13 material.
>>>>>>
>>>>>> Distros and SW component management challenges are more in the
>>>>>> domain of
>>>>>> Hans' expertise so I'd prefer to hear his opinion on this.
>>>>>>
>>>>>> Personally I feel though that the commit message is not entirely
>>>>>> honest
>>>>>> on all the impact as is. The wordings are sounding quite innocent
>>>>>> while if
>>>>>> I infer the above right, an incorrect combination will cause a
>>>>>> non-gracious failure.
>>>>>
>>>>> There are basically 4 possible scenarios and to me it
>>>>> is only clear from this thread what will happen in 3 of
>>>>> the 4 scenarios :
>>>>>
>>>>> 1. Old TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> works
>>>>> 2. New TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> broken
>>>>> 3. Old TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> ???
>>>>> 4. New TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> works
>>>>>
>>>>> If the answer to 3 is: "works" then I agree that this patch
>>>>> should be submitted to Linus as a fix with Cc: stable ASAP
>>>>> and then once that has hit most stable series it should be
>>>>> ok to upgrade the fw in linux-firmware
>>>>>
>>>>
>>>> Short answer, "yes" it does not work for "3." and you can consider it
>>>> a broken.
>>>>
>>>>> Note this is still not ideal but IMHO it would be ok.
>>>>>
>>>>> But if the answer is "broken" then we will really need to
>>>>> find some way to unbreak this, which could be as simple
>>>>> as querying the fw-version and basing the size on this,
>>>>> but having a kernel change which will regress things for
>>>>> users who do not have the old firmware yet is simply
>>>>> not acceptable.
>>>>>
>>>>
>>>> I am not sure if there is a firmware versioning interface that the ASP
>>>> (AMD Security Processor) returns back the kernel/driver.
>>>   > The code path in this case is:
>>>>
>>>> AMD PMF driver -> AMD TEE driver -> AMD CCP driver -> ASP TEE -> ASP
>>>> TA -> ASP HW.
>>>>
>>>> So, I uncertain which module has this information and where exactly
>>>> the code of fw versioning has to reside. It will take a while for me
>>>> to dig this in.
>>>
>>> As a solution to this, can amd-pmf explicitly do it's own
>>> request_firmware() call to load the firmware binary and determine the
>>> size to use in the array and then discard the loaded binary?
>>>
>>> This would let the TEE module still do it's own load later like normal
>>> without having to plumb this information across subsystems.
>>>
>>
>> TEE driver feeds in a lot of metadata and the structure information
>> for PSP headers and I don't think just having a request_firmware()
>> will help.
>>
>> Sidebar, TEE driver has a lot of plumbing that can be used decrypt the
>> policy binaries to debug issues related to TA load failures and policy
>> binary issues (basically the descriptors)
>>
>> So we might end up in replicating a majority of TEE code into PMF
>> driver, which might not be a good design choice.
>>
>> Let me first talk to internal folks to see if we can solve it by not
>> making complex changes.
>>
> 
> OK, I think for the purpose of 6.13 then this series minus the last patch probably makes sense.
> 
> I also think the first patch should ideally come in 6.12 if Hans is OK with that.  As 6.12 is probably going to be the next LTS we'll want hardware supported as widely as possible within stable rules.

Yes I have already tagged the first patch in patchwork for picking it up
for 6.12 .

Regards,

Hans
Ilpo Järvinen Oct. 31, 2024, 10:57 a.m. UTC | #15
On Wed, 30 Oct 2024, Shyam Sundar S K wrote:
> On 10/30/2024 19:30, Hans de Goede wrote:
> > On 29-Oct-24 3:07 PM, Ilpo Järvinen wrote:
> >> Hi Hens,
> >>
> >> There a question / item needing your input below.
> >>
> >> On Wed, 23 Oct 2024, Mario Limonciello wrote:
> >>> On 10/23/2024 10:52, Shyam Sundar S K wrote:
> >>>> On 10/23/2024 21:10, Mario Limonciello wrote:
> >>>>> On 10/23/2024 10:32, Shyam Sundar S K wrote:
> >>>>>> On 10/23/2024 20:04, Mario Limonciello wrote:
> >>>>>>> On 10/23/2024 09:29, Shyam Sundar S K wrote:
> >>>>>>>> On 10/23/2024 19:41, Mario Limonciello wrote:
> >>>>>>>>> On 10/23/2024 01:32, Shyam Sundar S K wrote:
> >>>>>>>>>> The PMF driver will allocate shared buffer memory using the
> >>>>>>>>>> tee_shm_alloc_kernel_buf(). This allocated memory is located in
> >>>>>>>>>> the
> >>>>>>>>>> secure world and is used for communication with the PMF-TA.
> >>>>>>>>>>
> >>>>>>>>>> The latest PMF-TA version introduces new structures with OEM
> >>>>>>>>>> debug
> >>>>>>>>>> information and additional policy input conditions for
> >>>>>>>>>> evaluating the
> >>>>>>>>>> policy binary. Consequently, the shared memory size must be
> >>>>>>>>>> increased to
> >>>>>>>>>> ensure compatibility between the PMF driver and the updated
> >>>>>>>>>> PMF-TA.
> >>>>>>>>>>
> >>>>>>>>>> Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
> >>>>>>>>>> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
> >>>>>>>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
> >>>>>>>>>
> >>>>>>>>> How does this present to a user?  From what you describe it seems
> >>>>>>>>> to
> >>>>>>>>> me like this means a new TA will fail on older kernel in some way.
> >>>>>>>>
> >>>>>>>> Newer TA will not fail on older systems. This change is just about
> >>>>>>>> the
> >>>>>>>> increase in TA reserved memory that is presented as "shared memory",
> >>>>>>>> as TA needs the additional memory for its own debug data structures.
> >>>>>>>
> >>>>>>> Thx for comments. But so if you use new TA with older kernel driver,
> >>>>>>> what will happen?  Can TA do a buffer overrun because the presented
> >>>>>>> shared memory was too small?
> >>>>>>>
> >>>>>>
> >>>>>> New TA will fail on older kernel and hence this change will be
> >>>>>> required for new TA to work.
> >>>>>
> >>>>> OK, that's what I was worried about.
> >>>>>
> >>>>>>
> >>>>>>>>
> >>>>>>>>    From user standpoint, always be on latest FW, irrespective of the
> >>>>>>>> platform. At this point in time, I don't see a need for FW
> >>>>>>>> versioning
> >>>>>>>> name (in the future, if there is a need for having a limited support
> >>>>>>>> to older platforms, we can carve out a logic to do versioning
> >>>>>>>> stuff).
> >>>>>>>
> >>>>>>> I wish we could enforce this, but In the Linux world there is an
> >>>>>>> expectation that these two trains don't need to arrive at station at
> >>>>>>> the same time.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>> Some ideas:
> >>>>>>>>>
> >>>>>>>>> 1) Should there be header version check on the TA and dynamically
> >>>>>>>>> allocate the structure size based on the version of the F/W?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> This can be done, when the TA versioning upgrade happens, like from
> >>>>>>>> 1.3 to 1.4, apart from that there is no header stuff association.
> >>>>>>>>
> >>>>>>>>> 2) Or is there a command to the TA that can query the expected
> >>>>>>>>> output
> >>>>>>>>> size?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> No, this is just the initial shared memory that the driver allocates
> >>>>>>>> to pass the inputs and the commands to TA.
> >>>>>>>>
> >>>>>>>>> 3) Or should the new TA filename be versioned, and the driver has
> >>>>>>>>> a
> >>>>>>>>> fallback policy?
> >>>>>>>>>
> >>>>>>>>> Whatever the outcome is; I think it's best that if possible this
> >>>>>>>>> change goes back to stable to try to minimize regressions to
> >>>>>>>>> users as
> >>>>>>>>> distros update linux-firmware.  For example Fedora updates this
> >>>>>>>>> monthly, but also tracks stable kernels.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Advisory to distros should be to pick the latest PMF TA (note that,
> >>>>>>>> I
> >>>>>>>> have not still submitted to new TA FW).
> >>>>>>>
> >>>>>>> Yeah we can advise distros to pick it up when upstreamed as long as
> >>>>>>> there isn't tight dependency on this patch being present.
> >>>>>>>
> >>>>>>
> >>>>>> That is the reason I am waiting for this change to land. Once that is
> >>>>>> done, I will submit the new TA, you can send out a advisory to upgrade
> >>>>>> the kernel or this change has to be back-ported to stable/oem kernels
> >>>>>> for their enablement.
> >>>>>>
> >>>>>> Makes sense?
> >>>>>>
> >>>>>
> >>>>> I think we need Hans' and Ilpo's comments here to decide what to do.
> >>>>>
> >>>>
> >>>> Sure.
> >>>>
> >>>>> I will say that when we had this happen in amdgpu for a breaking
> >>>>> reason there was a new firmware binary filename created/upstreamed for
> >>>>> the breaking version (IIRC foo.bin -> foo_1.bin) and amdgpu had to
> >>>>> have fallback code so it could be compatible with either binary.
> >>>>>
> >>>>
> >>>> True. In case of amdgpu, the FW loading is part of the amdgpu driver.
> >>>> But in case of PMF, the PMF TA gets picked from the AMD TEE driver
> >>>> through the TEE commands.
> >>>>
> >>>> So, there is no need for FW versioning logic in PMF driver.
> >>>>
> >>>
> >>> That's a very good point, and this is a lot of complexity then.
> >>>
> >>>>
> >>>>> * If user on older kernel took newer linux-firmware package they used
> >>>>> older binary.
> >>>>> * If user on newer kernel took older linux-firmware package they used
> >>>>> older binary.
> >>>>> * If user on newer kernel took newer linux-firmware package they used
> >>>>> newer binary.
> >>>>>
> >>>>> If the decision is this goes in "as is" it definitely needs to go back
> >>>>> to stable kernels.
> >>>>>
> >>>>
> >>>> IMHO, let's not put too many fallback mechanisms. The philosophy
> >>>> should be use latest driver and latest FW that avoids a lot of
> >>>> confusion and yeah for that to happen this change has to go to stable.
> >>>>
> >>>> Thanks,
> >>>> Shyam
> >>>
> >>> Of course Hans and Ilpo make the final call, but I think from our discussions
> >>> here it would be ideal that patch 1 and patch 5 from this series go into 6.12
> >>> and have stable tags, the rest would be 6.13 material.
> >>
> >> Distros and SW component management challenges are more in the domain of 
> >> Hans' expertise so I'd prefer to hear his opinion on this.
> >>
> >> Personally I feel though that the commit message is not entirely honest 
> >> on all the impact as is. The wordings are sounding quite innocent while if 
> >> I infer the above right, an incorrect combination will cause a 
> >> non-gracious failure.
> > 
> > There are basically 4 possible scenarios and to me it
> > is only clear from this thread what will happen in 3 of
> > the 4 scenarios :
> > 
> > 1. Old TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> works
> > 2. New TA fw, Old kernel (TA_OUTPUT_RESERVED_MEM=906) -> broken
> > 3. Old TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> ???
> > 4. New TA fw, new kernel (TA_OUTPUT_RESERVED_MEM=922) -> works
> > 
> > If the answer to 3 is: "works" then I agree that this patch
> > should be submitted to Linus as a fix with Cc: stable ASAP
> > and then once that has hit most stable series it should be
> > ok to upgrade the fw in linux-firmware
> > 
> 
> Short answer, "yes" it does not work for "3." and you can consider it
> a broken.
> 
> > Note this is still not ideal but IMHO it would be ok.
> > 
> > But if the answer is "broken" then we will really need to
> > find some way to unbreak this, which could be as simple
> > as querying the fw-version and basing the size on this,
> > but having a kernel change which will regress things for
> > users who do not have the old firmware yet is simply
> > not acceptable.
> > 
> 
> I am not sure if there is a firmware versioning interface that the ASP
> (AMD Security Processor) returns back the kernel/driver.
> 
> The code path in this case is:
> 
> AMD PMF driver -> AMD TEE driver -> AMD CCP driver -> ASP TEE -> ASP
> TA -> ASP HW.
> 
> So, I uncertain which module has this information and where exactly
> the code of fw versioning has to reside. It will take a while for me
> to dig this in.
> 
> Meanwhile, shall I drop this patch and resend the series (by
> addressing the dev_dbg change Mario commented) so that this atleast
> becomes a 6.13 material?

Yes, please do send a new version which has the comment addressed (I think 
Hans can pick the first patch out of this version so you don't need to 
include that one).

Also, please remember add a lore Link: to this thread into the patch 5/5 
when you send it later in some form.
diff mbox series

Patch

diff --git a/drivers/platform/x86/amd/pmf/pmf.h b/drivers/platform/x86/amd/pmf/pmf.h
index a79808fda1d8..18f12aad46a9 100644
--- a/drivers/platform/x86/amd/pmf/pmf.h
+++ b/drivers/platform/x86/amd/pmf/pmf.h
@@ -106,7 +106,7 @@  struct cookie_header {
 #define PMF_TA_IF_VERSION_MAJOR				1
 #define TA_PMF_ACTION_MAX					32
 #define TA_PMF_UNDO_MAX						8
-#define TA_OUTPUT_RESERVED_MEM				906
+#define TA_OUTPUT_RESERVED_MEM				922
 #define MAX_OPERATION_PARAMS					4
 
 #define PMF_IF_V1		1