diff mbox series

HID: amd_sfh: Return immediately if no sensor is found

Message ID 20241003160454.3017229-1-Basavaraj.Natikar@amd.com (mailing list archive)
State New
Delegated to: Jiri Kosina
Headers show
Series HID: amd_sfh: Return immediately if no sensor is found | expand

Commit Message

Basavaraj Natikar Oct. 3, 2024, 4:04 p.m. UTC
There is no need for additional cleanup, as all resources are managed.
Additionally, if no sensor is found, there will be no initialization of
HID devices. Therefore, return immediately if no sensor is detected.

Fixes: 8031b001da70 ("HID: amd_sfh: Move sensor discovery before HID device initialization")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
---
 drivers/hid/amd-sfh-hid/amd_sfh_client.c | 3 +--
 drivers/hid/amd-sfh-hid/amd_sfh_pcie.c   | 4 +++-
 2 files changed, 4 insertions(+), 3 deletions(-)

Comments

Thorsten Leemhuis Oct. 4, 2024, 4:51 a.m. UTC | #1
[CCing the three reporters and the regressions list]

On 03.10.24 18:04, Basavaraj Natikar wrote:
> There is no need for additional cleanup, as all resources are managed.
> Additionally, if no sensor is found, there will be no initialization of
> HID devices. Therefore, return immediately if no sensor is detected.

I'm not a reviewer, so feel free to ignore the follow comment:

I think the patch description should mentioned that this bug caused
Memory Errors / Page Faults / btrfs going read-only / btrfs disk
corruption, as that is a crucial detail for later and downstreams that
need to consider when deciding about backporting.

> Fixes: 8031b001da70 ("HID: amd_sfh: Move sensor discovery before HID device initialization")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331

Some reported-by tags IMHO would be appropriate to give credit; all
three reporters already agreed to use their email address in public.

There meanwhile is also one comment in the bugzilla ticket that could be
read as a tested-by tag.

Maybe a Link: to
https://lore.kernel.org/all/90f6ee64-df5e-43b2-ad04-fa3a35efc1d5@leemhuis.info/
might be appropriate as well.

Ohh, and participation in stable is optional, but given the severeness
on the problem: would you maybe be willing to add a stable tag to the
commit to ensure this is backported to affected stable series quickly?

Ciao, Thorsten

> Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> ---
>  drivers/hid/amd-sfh-hid/amd_sfh_client.c | 3 +--
>  drivers/hid/amd-sfh-hid/amd_sfh_pcie.c   | 4 +++-
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> index 4b59687ff5d8..3fcb971d5fda 100644
> --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> @@ -297,8 +297,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
>  	    (mp2_ops->discovery_status && mp2_ops->discovery_status(privdata) == 0)) {
>  		dev_warn(dev, "Failed to discover, sensors not enabled is %d\n",
>  			 cl_data->is_any_sensor_enabled);
> -		rc = -EOPNOTSUPP;
> -		goto cleanup;
> +		return -EOPNOTSUPP;
>  	}
>  
>  	for (i = 0; i < cl_data->num_hid_devices; i++) {
> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
> index 0c28ca349bcd..1300f122b524 100644
> --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
> @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work)
>  
>  	rc = amd_sfh_hid_client_init(mp2);
>  	if (rc) {
> -		amd_sfh_clear_intr(mp2);
> +		if (rc != -EOPNOTSUPP)
> +			amd_sfh_clear_intr(mp2);
> +
>  		dev_err(&pdev->dev, "amd_sfh_hid_client_init failed err %d\n", rc);
>  		return;
>  	}
Benjamin Tissoires Oct. 4, 2024, 9:12 a.m. UTC | #2
On Oct 04 2024, Linux regression tracking (Thorsten Leemhuis) wrote:
> [CCing the three reporters and the regressions list]
> 
> On 03.10.24 18:04, Basavaraj Natikar wrote:
> > There is no need for additional cleanup, as all resources are managed.
> > Additionally, if no sensor is found, there will be no initialization of
> > HID devices. Therefore, return immediately if no sensor is detected.
> 
> I'm not a reviewer, so feel free to ignore the follow comment:
> 
> I think the patch description should mentioned that this bug caused
> Memory Errors / Page Faults / btrfs going read-only / btrfs disk
> corruption, as that is a crucial detail for later and downstreams that
> need to consider when deciding about backporting.
> 
> > Fixes: 8031b001da70 ("HID: amd_sfh: Move sensor discovery before HID device initialization")
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331
> 
> Some reported-by tags IMHO would be appropriate to give credit; all
> three reporters already agreed to use their email address in public.
> 
> There meanwhile is also one comment in the bugzilla ticket that could be
> read as a tested-by tag.
> 
> Maybe a Link: to
> https://lore.kernel.org/all/90f6ee64-df5e-43b2-ad04-fa3a35efc1d5@leemhuis.info/
> might be appropriate as well.
> 
> Ohh, and participation in stable is optional, but given the severeness
> on the problem: would you maybe be willing to add a stable tag to the
> commit to ensure this is backported to affected stable series quickly?

Fully agree here. It would definitely help if the submitter of the patch
keeps track of all of these instead of relying on the maintainers or
Thorsten to do the tedious work.

I was about to apply the patch, but I still have one remark on the fix
itself.


> 
> Ciao, Thorsten
> 
> > Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> > ---
> >  drivers/hid/amd-sfh-hid/amd_sfh_client.c | 3 +--
> >  drivers/hid/amd-sfh-hid/amd_sfh_pcie.c   | 4 +++-
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> > index 4b59687ff5d8..3fcb971d5fda 100644
> > --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> > +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> > @@ -297,8 +297,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
> >  	    (mp2_ops->discovery_status && mp2_ops->discovery_status(privdata) == 0)) {
> >  		dev_warn(dev, "Failed to discover, sensors not enabled is %d\n",
> >  			 cl_data->is_any_sensor_enabled);
> > -		rc = -EOPNOTSUPP;
> > -		goto cleanup;
> > +		return -EOPNOTSUPP;

so cleanup is doing:
cleanup:
	amd_sfh_hid_client_deinit(privdata);
	for (i = 0; i < cl_data->num_hid_devices; i++) {
		devm_kfree(dev, cl_data->feature_report[i]);
		devm_kfree(dev, in_data->input_report[i]);
		devm_kfree(dev, cl_data->report_descr[i]);
	}
	return rc;

Would that means that the memory corruption appears during
amd_sfh_hid_client_deinit(), or...

> >  	}
> >  
> >  	for (i = 0; i < cl_data->num_hid_devices; i++) {
> > diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
> > index 0c28ca349bcd..1300f122b524 100644
> > --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
> > +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
> > @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work)
> >  
> >  	rc = amd_sfh_hid_client_init(mp2);
> >  	if (rc) {
> > -		amd_sfh_clear_intr(mp2);
> > +		if (rc != -EOPNOTSUPP)
> > +			amd_sfh_clear_intr(mp2);

... or during amd_sfh_clear_intr()?

This very much looks like a band-aid (I know it is because you can not
reproduce, not blaming anyone), but I'd like to know a little bit more
if the bug is not appearing anywhere else in the normal processing of
the driver itself.

Also a comment explaining why this is the only case where
amd_sfh_clear_intr() should not be called would be appreciated.

So all in all, I have a feeling one of these 2 functions is not making a
proper check and I would rather fix the root cause, not the symptoms.

Cheers,
Benjamin


PS: sorry, I know this is a long standing issue, but I'd rather not
paper over a bigger issue :/

> > +
> >  		dev_err(&pdev->dev, "amd_sfh_hid_client_init failed err %d\n", rc);
> >  		return;
> >  	}
>
Basavaraj Natikar Oct. 6, 2024, 12:40 p.m. UTC | #3
On 10/4/2024 2:42 PM, Benjamin Tissoires wrote:
> On Oct 04 2024, Linux regression tracking (Thorsten Leemhuis) wrote:
>> [CCing the three reporters and the regressions list]
>>
>> On 03.10.24 18:04, Basavaraj Natikar wrote:
>>> There is no need for additional cleanup, as all resources are managed.
>>> Additionally, if no sensor is found, there will be no initialization of
>>> HID devices. Therefore, return immediately if no sensor is detected.
>> I'm not a reviewer, so feel free to ignore the follow comment:
>>
>> I think the patch description should mentioned that this bug caused
>> Memory Errors / Page Faults / btrfs going read-only / btrfs disk
>> corruption, as that is a crucial detail for later and downstreams that
>> need to consider when deciding about backporting.
>>
>>> Fixes: 8031b001da70 ("HID: amd_sfh: Move sensor discovery before HID device initialization")
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331
>> Some reported-by tags IMHO would be appropriate to give credit; all
>> three reporters already agreed to use their email address in public.
>>
>> There meanwhile is also one comment in the bugzilla ticket that could be
>> read as a tested-by tag.
>>
>> Maybe a Link: to
>> https://lore.kernel.org/all/90f6ee64-df5e-43b2-ad04-fa3a35efc1d5@leemhuis.info/
>> might be appropriate as well.
>>
>> Ohh, and participation in stable is optional, but given the severeness
>> on the problem: would you maybe be willing to add a stable tag to the
>> commit to ensure this is backported to affected stable series quickly?
> Fully agree here. It would definitely help if the submitter of the patch
> keeps track of all of these instead of relying on the maintainers or
> Thorsten to do the tedious work.
>
> I was about to apply the patch, but I still have one remark on the fix
> itself.
>
>
>> Ciao, Thorsten
>>
>>> Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
>>> ---
>>>   drivers/hid/amd-sfh-hid/amd_sfh_client.c | 3 +--
>>>   drivers/hid/amd-sfh-hid/amd_sfh_pcie.c   | 4 +++-
>>>   2 files changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>>> index 4b59687ff5d8..3fcb971d5fda 100644
>>> --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>>> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
>>> @@ -297,8 +297,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
>>>   	    (mp2_ops->discovery_status && mp2_ops->discovery_status(privdata) == 0)) {
>>>   		dev_warn(dev, "Failed to discover, sensors not enabled is %d\n",
>>>   			 cl_data->is_any_sensor_enabled);
>>> -		rc = -EOPNOTSUPP;
>>> -		goto cleanup;
>>> +		return -EOPNOTSUPP;
> so cleanup is doing:
> cleanup:
> 	amd_sfh_hid_client_deinit(privdata);
> 	for (i = 0; i < cl_data->num_hid_devices; i++) {
> 		devm_kfree(dev, cl_data->feature_report[i]);
> 		devm_kfree(dev, in_data->input_report[i]);
> 		devm_kfree(dev, cl_data->report_descr[i]);
> 	}
> 	return rc;
>
> Would that means that the memory corruption appears during
> amd_sfh_hid_client_deinit(), or...
>
>>>   	}
>>>   
>>>   	for (i = 0; i < cl_data->num_hid_devices; i++) {
>>> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
>>> index 0c28ca349bcd..1300f122b524 100644
>>> --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
>>> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
>>> @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work)
>>>   
>>>   	rc = amd_sfh_hid_client_init(mp2);
>>>   	if (rc) {
>>> -		amd_sfh_clear_intr(mp2);
>>> +		if (rc != -EOPNOTSUPP)
>>> +			amd_sfh_clear_intr(mp2);
> ... or during amd_sfh_clear_intr()?
>
> This very much looks like a band-aid (I know it is because you can not
> reproduce, not blaming anyone), but I'd like to know a little bit more
> if the bug is not appearing anywhere else in the normal processing of
> the driver itself.
>
> Also a comment explaining why this is the only case where
> amd_sfh_clear_intr() should not be called would be appreciated.
>
> So all in all, I have a feeling one of these 2 functions is not making a
> proper check and I would rather fix the root cause, not the symptoms.

Sure Benjamin, I have added the latest cleanup patch in the bug ID to see
if that helps resolve the issue and to find the root cause analysis.

Thanks
--
Basavaraj
Thorsten Leemhuis Oct. 9, 2024, 10:20 a.m. UTC | #4
On 04.10.24 11:12, Benjamin Tissoires wrote:
> On Oct 04 2024, Linux regression tracking (Thorsten Leemhuis) wrote:
>
> so cleanup is doing:
> cleanup:
> 	amd_sfh_hid_client_deinit(privdata);
> 	for (i = 0; i < cl_data->num_hid_devices; i++) {
> 		devm_kfree(dev, cl_data->feature_report[i]);
> 		devm_kfree(dev, in_data->input_report[i]);
> 		devm_kfree(dev, cl_data->report_descr[i]);
> 	}
> 	return rc;
> 
> Would that means that the memory corruption appears during
> amd_sfh_hid_client_deinit(), or...
> 
>>>  	}
>>>  
>>>  	for (i = 0; i < cl_data->num_hid_devices; i++) {
>>> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
>>> index 0c28ca349bcd..1300f122b524 100644
>>> --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
>>> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
>>> @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work)
>>>  
>>>  	rc = amd_sfh_hid_client_init(mp2);
>>>  	if (rc) {
>>> -		amd_sfh_clear_intr(mp2);
>>> +		if (rc != -EOPNOTSUPP)
>>> +			amd_sfh_clear_intr(mp2);
> 
> ... or during amd_sfh_clear_intr()?
> 
> This very much looks like a band-aid (I know it is because you can not
> reproduce, not blaming anyone), but I'd like to know a little bit more
> if the bug is not appearing anywhere else in the normal processing of
> the driver itself.
> 
> Also a comment explaining why this is the only case where
> amd_sfh_clear_intr() should not be called would be appreciated.
> 
> So all in all, I have a feeling one of these 2 functions is not making a
> proper check and I would rather fix the root cause, not the symptoms.
> 
> Cheers,
> Benjamin
> 
> PS: sorry, I know this is a long standing issue, but I'd rather not
> paper over a bigger issue :/

Yeah, me to!

There was a new patch in bugzilla and people tested it (see
https://bugzilla.kernel.org/show_bug.cgi?id=219331 for details) , but
nothing has happened now for 48 hours, so please allow me to ask:

Basavaraj Natikar, is there any hope that a proper fix will emerge soon?

Sorry, normally I would wait a lot longer before asking things like
this, but given the severeness of the problem I thought a inquiry for a
quick status update was in order.

Ciao, Thorsten
Basavaraj Natikar Oct. 9, 2024, 2:52 p.m. UTC | #5
On 10/9/2024 3:50 PM, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 04.10.24 11:12, Benjamin Tissoires wrote:
>> On Oct 04 2024, Linux regression tracking (Thorsten Leemhuis) wrote:
>>
>> so cleanup is doing:
>> cleanup:
>> 	amd_sfh_hid_client_deinit(privdata);
>> 	for (i = 0; i < cl_data->num_hid_devices; i++) {
>> 		devm_kfree(dev, cl_data->feature_report[i]);
>> 		devm_kfree(dev, in_data->input_report[i]);
>> 		devm_kfree(dev, cl_data->report_descr[i]);
>> 	}
>> 	return rc;
>>
>> Would that means that the memory corruption appears during
>> amd_sfh_hid_client_deinit(), or...
>>
>>>>   	}
>>>>   
>>>>   	for (i = 0; i < cl_data->num_hid_devices; i++) {
>>>> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
>>>> index 0c28ca349bcd..1300f122b524 100644
>>>> --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
>>>> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
>>>> @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work)
>>>>   
>>>>   	rc = amd_sfh_hid_client_init(mp2);
>>>>   	if (rc) {
>>>> -		amd_sfh_clear_intr(mp2);
>>>> +		if (rc != -EOPNOTSUPP)
>>>> +			amd_sfh_clear_intr(mp2);
>> ... or during amd_sfh_clear_intr()?
>>
>> This very much looks like a band-aid (I know it is because you can not
>> reproduce, not blaming anyone), but I'd like to know a little bit more
>> if the bug is not appearing anywhere else in the normal processing of
>> the driver itself.
>>
>> Also a comment explaining why this is the only case where
>> amd_sfh_clear_intr() should not be called would be appreciated.
>>
>> So all in all, I have a feeling one of these 2 functions is not making a
>> proper check and I would rather fix the root cause, not the symptoms.
>>
>> Cheers,
>> Benjamin
>>
>> PS: sorry, I know this is a long standing issue, but I'd rather not
>> paper over a bigger issue :/
> Yeah, me to!
>
> There was a new patch in bugzilla and people tested it (see
> https://bugzilla.kernel.org/show_bug.cgi?id=219331 for details) , but
> nothing has happened now for 48 hours, so please allow me to ask:
>
> Basavaraj Natikar, is there any hope that a proper fix will emerge soon?

Yes , please find link for the fix as mentioned in bugid.

https://lore.kernel.org/all/20241009144757.3577625-1-Basavaraj.Natikar@amd.com/

Thanks,
--
Basavaraj
Thorsten Leemhuis Oct. 9, 2024, 3:32 p.m. UTC | #6
On 09.10.24 16:52, Basavaraj Natikar wrote:
> On 10/9/2024 3:50 PM, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 04.10.24 11:12, Benjamin Tissoires wrote:
>>> On Oct 04 2024, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>
>>> PS: sorry, I know this is a long standing issue, but I'd rather not
>>> paper over a bigger issue :/
>
>> Yeah, me to!
>>
>> There was a new patch in bugzilla and people tested it (see
>> https://bugzilla.kernel.org/show_bug.cgi?id=219331 for details) , but
>> nothing has happened now for 48 hours, so please allow me to ask:
>>
>> Basavaraj Natikar, is there any hope that a proper fix will emerge soon?
> 
> Yes , please find link for the fix as mentioned in bugid.
> 
> https://lore.kernel.org/all/20241009144757.3577625-1-
> Basavaraj.Natikar@amd.com/

Great, many thx!

Nitpicking: still no stable tag though. :-/ And ideally Richard and
Skyler would get a "Reported-by" as well.

Ciao, Thorsten
Jiri Kosina Oct. 9, 2024, 3:34 p.m. UTC | #7
On Wed, 9 Oct 2024, Linux regression tracking (Thorsten Leemhuis) wrote:

> >>> PS: sorry, I know this is a long standing issue, but I'd rather not
> >>> paper over a bigger issue :/
> >
> >> Yeah, me to!
> >>
> >> There was a new patch in bugzilla and people tested it (see
> >> https://bugzilla.kernel.org/show_bug.cgi?id=219331 for details) , but
> >> nothing has happened now for 48 hours, so please allow me to ask:
> >>
> >> Basavaraj Natikar, is there any hope that a proper fix will emerge soon?
> > 
> > Yes , please find link for the fix as mentioned in bugid.
> > 
> > https://lore.kernel.org/all/20241009144757.3577625-1-
> > Basavaraj.Natikar@amd.com/
> 
> Great, many thx!
> 
> Nitpicking: still no stable tag though. :-/ And ideally Richard and
> Skyler would get a "Reported-by" as well.

I don't care all that much about stable tag, but if there are more 
Reported-by:s, I like to give credit where it's due, and I don't think 
rebasing for-6.12/upstream-fixes because of that would break anybody.

So please feel free to send additional tags, and I'll rewrite it.
diff mbox series

Patch

diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
index 4b59687ff5d8..3fcb971d5fda 100644
--- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
+++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
@@ -297,8 +297,7 @@  int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
 	    (mp2_ops->discovery_status && mp2_ops->discovery_status(privdata) == 0)) {
 		dev_warn(dev, "Failed to discover, sensors not enabled is %d\n",
 			 cl_data->is_any_sensor_enabled);
-		rc = -EOPNOTSUPP;
-		goto cleanup;
+		return -EOPNOTSUPP;
 	}
 
 	for (i = 0; i < cl_data->num_hid_devices; i++) {
diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
index 0c28ca349bcd..1300f122b524 100644
--- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
+++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c
@@ -351,7 +351,9 @@  static void sfh_init_work(struct work_struct *work)
 
 	rc = amd_sfh_hid_client_init(mp2);
 	if (rc) {
-		amd_sfh_clear_intr(mp2);
+		if (rc != -EOPNOTSUPP)
+			amd_sfh_clear_intr(mp2);
+
 		dev_err(&pdev->dev, "amd_sfh_hid_client_init failed err %d\n", rc);
 		return;
 	}