Message ID | 20241003160454.3017229-1-Basavaraj.Natikar@amd.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Jiri Kosina |
Headers | show |
Series | HID: amd_sfh: Return immediately if no sensor is found | expand |
[CCing the three reporters and the regressions list] On 03.10.24 18:04, Basavaraj Natikar wrote: > There is no need for additional cleanup, as all resources are managed. > Additionally, if no sensor is found, there will be no initialization of > HID devices. Therefore, return immediately if no sensor is detected. I'm not a reviewer, so feel free to ignore the follow comment: I think the patch description should mentioned that this bug caused Memory Errors / Page Faults / btrfs going read-only / btrfs disk corruption, as that is a crucial detail for later and downstreams that need to consider when deciding about backporting. > Fixes: 8031b001da70 ("HID: amd_sfh: Move sensor discovery before HID device initialization") > Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331 Some reported-by tags IMHO would be appropriate to give credit; all three reporters already agreed to use their email address in public. There meanwhile is also one comment in the bugzilla ticket that could be read as a tested-by tag. Maybe a Link: to https://lore.kernel.org/all/90f6ee64-df5e-43b2-ad04-fa3a35efc1d5@leemhuis.info/ might be appropriate as well. Ohh, and participation in stable is optional, but given the severeness on the problem: would you maybe be willing to add a stable tag to the commit to ensure this is backported to affected stable series quickly? Ciao, Thorsten > Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> > --- > drivers/hid/amd-sfh-hid/amd_sfh_client.c | 3 +-- > drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 4 +++- > 2 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c > index 4b59687ff5d8..3fcb971d5fda 100644 > --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c > +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c > @@ -297,8 +297,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata) > (mp2_ops->discovery_status && mp2_ops->discovery_status(privdata) == 0)) { > dev_warn(dev, "Failed to discover, sensors not enabled is %d\n", > cl_data->is_any_sensor_enabled); > - rc = -EOPNOTSUPP; > - goto cleanup; > + return -EOPNOTSUPP; > } > > for (i = 0; i < cl_data->num_hid_devices; i++) { > diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c > index 0c28ca349bcd..1300f122b524 100644 > --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c > +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c > @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work) > > rc = amd_sfh_hid_client_init(mp2); > if (rc) { > - amd_sfh_clear_intr(mp2); > + if (rc != -EOPNOTSUPP) > + amd_sfh_clear_intr(mp2); > + > dev_err(&pdev->dev, "amd_sfh_hid_client_init failed err %d\n", rc); > return; > }
On Oct 04 2024, Linux regression tracking (Thorsten Leemhuis) wrote: > [CCing the three reporters and the regressions list] > > On 03.10.24 18:04, Basavaraj Natikar wrote: > > There is no need for additional cleanup, as all resources are managed. > > Additionally, if no sensor is found, there will be no initialization of > > HID devices. Therefore, return immediately if no sensor is detected. > > I'm not a reviewer, so feel free to ignore the follow comment: > > I think the patch description should mentioned that this bug caused > Memory Errors / Page Faults / btrfs going read-only / btrfs disk > corruption, as that is a crucial detail for later and downstreams that > need to consider when deciding about backporting. > > > Fixes: 8031b001da70 ("HID: amd_sfh: Move sensor discovery before HID device initialization") > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331 > > Some reported-by tags IMHO would be appropriate to give credit; all > three reporters already agreed to use their email address in public. > > There meanwhile is also one comment in the bugzilla ticket that could be > read as a tested-by tag. > > Maybe a Link: to > https://lore.kernel.org/all/90f6ee64-df5e-43b2-ad04-fa3a35efc1d5@leemhuis.info/ > might be appropriate as well. > > Ohh, and participation in stable is optional, but given the severeness > on the problem: would you maybe be willing to add a stable tag to the > commit to ensure this is backported to affected stable series quickly? Fully agree here. It would definitely help if the submitter of the patch keeps track of all of these instead of relying on the maintainers or Thorsten to do the tedious work. I was about to apply the patch, but I still have one remark on the fix itself. > > Ciao, Thorsten > > > Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> > > --- > > drivers/hid/amd-sfh-hid/amd_sfh_client.c | 3 +-- > > drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 4 +++- > > 2 files changed, 4 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c > > index 4b59687ff5d8..3fcb971d5fda 100644 > > --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c > > +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c > > @@ -297,8 +297,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata) > > (mp2_ops->discovery_status && mp2_ops->discovery_status(privdata) == 0)) { > > dev_warn(dev, "Failed to discover, sensors not enabled is %d\n", > > cl_data->is_any_sensor_enabled); > > - rc = -EOPNOTSUPP; > > - goto cleanup; > > + return -EOPNOTSUPP; so cleanup is doing: cleanup: amd_sfh_hid_client_deinit(privdata); for (i = 0; i < cl_data->num_hid_devices; i++) { devm_kfree(dev, cl_data->feature_report[i]); devm_kfree(dev, in_data->input_report[i]); devm_kfree(dev, cl_data->report_descr[i]); } return rc; Would that means that the memory corruption appears during amd_sfh_hid_client_deinit(), or... > > } > > > > for (i = 0; i < cl_data->num_hid_devices; i++) { > > diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c > > index 0c28ca349bcd..1300f122b524 100644 > > --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c > > +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c > > @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work) > > > > rc = amd_sfh_hid_client_init(mp2); > > if (rc) { > > - amd_sfh_clear_intr(mp2); > > + if (rc != -EOPNOTSUPP) > > + amd_sfh_clear_intr(mp2); ... or during amd_sfh_clear_intr()? This very much looks like a band-aid (I know it is because you can not reproduce, not blaming anyone), but I'd like to know a little bit more if the bug is not appearing anywhere else in the normal processing of the driver itself. Also a comment explaining why this is the only case where amd_sfh_clear_intr() should not be called would be appreciated. So all in all, I have a feeling one of these 2 functions is not making a proper check and I would rather fix the root cause, not the symptoms. Cheers, Benjamin PS: sorry, I know this is a long standing issue, but I'd rather not paper over a bigger issue :/ > > + > > dev_err(&pdev->dev, "amd_sfh_hid_client_init failed err %d\n", rc); > > return; > > } >
On 10/4/2024 2:42 PM, Benjamin Tissoires wrote: > On Oct 04 2024, Linux regression tracking (Thorsten Leemhuis) wrote: >> [CCing the three reporters and the regressions list] >> >> On 03.10.24 18:04, Basavaraj Natikar wrote: >>> There is no need for additional cleanup, as all resources are managed. >>> Additionally, if no sensor is found, there will be no initialization of >>> HID devices. Therefore, return immediately if no sensor is detected. >> I'm not a reviewer, so feel free to ignore the follow comment: >> >> I think the patch description should mentioned that this bug caused >> Memory Errors / Page Faults / btrfs going read-only / btrfs disk >> corruption, as that is a crucial detail for later and downstreams that >> need to consider when deciding about backporting. >> >>> Fixes: 8031b001da70 ("HID: amd_sfh: Move sensor discovery before HID device initialization") >>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331 >> Some reported-by tags IMHO would be appropriate to give credit; all >> three reporters already agreed to use their email address in public. >> >> There meanwhile is also one comment in the bugzilla ticket that could be >> read as a tested-by tag. >> >> Maybe a Link: to >> https://lore.kernel.org/all/90f6ee64-df5e-43b2-ad04-fa3a35efc1d5@leemhuis.info/ >> might be appropriate as well. >> >> Ohh, and participation in stable is optional, but given the severeness >> on the problem: would you maybe be willing to add a stable tag to the >> commit to ensure this is backported to affected stable series quickly? > Fully agree here. It would definitely help if the submitter of the patch > keeps track of all of these instead of relying on the maintainers or > Thorsten to do the tedious work. > > I was about to apply the patch, but I still have one remark on the fix > itself. > > >> Ciao, Thorsten >> >>> Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> >>> --- >>> drivers/hid/amd-sfh-hid/amd_sfh_client.c | 3 +-- >>> drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 4 +++- >>> 2 files changed, 4 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c >>> index 4b59687ff5d8..3fcb971d5fda 100644 >>> --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c >>> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c >>> @@ -297,8 +297,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata) >>> (mp2_ops->discovery_status && mp2_ops->discovery_status(privdata) == 0)) { >>> dev_warn(dev, "Failed to discover, sensors not enabled is %d\n", >>> cl_data->is_any_sensor_enabled); >>> - rc = -EOPNOTSUPP; >>> - goto cleanup; >>> + return -EOPNOTSUPP; > so cleanup is doing: > cleanup: > amd_sfh_hid_client_deinit(privdata); > for (i = 0; i < cl_data->num_hid_devices; i++) { > devm_kfree(dev, cl_data->feature_report[i]); > devm_kfree(dev, in_data->input_report[i]); > devm_kfree(dev, cl_data->report_descr[i]); > } > return rc; > > Would that means that the memory corruption appears during > amd_sfh_hid_client_deinit(), or... > >>> } >>> >>> for (i = 0; i < cl_data->num_hid_devices; i++) { >>> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c >>> index 0c28ca349bcd..1300f122b524 100644 >>> --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c >>> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c >>> @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work) >>> >>> rc = amd_sfh_hid_client_init(mp2); >>> if (rc) { >>> - amd_sfh_clear_intr(mp2); >>> + if (rc != -EOPNOTSUPP) >>> + amd_sfh_clear_intr(mp2); > ... or during amd_sfh_clear_intr()? > > This very much looks like a band-aid (I know it is because you can not > reproduce, not blaming anyone), but I'd like to know a little bit more > if the bug is not appearing anywhere else in the normal processing of > the driver itself. > > Also a comment explaining why this is the only case where > amd_sfh_clear_intr() should not be called would be appreciated. > > So all in all, I have a feeling one of these 2 functions is not making a > proper check and I would rather fix the root cause, not the symptoms. Sure Benjamin, I have added the latest cleanup patch in the bug ID to see if that helps resolve the issue and to find the root cause analysis. Thanks -- Basavaraj
On 04.10.24 11:12, Benjamin Tissoires wrote: > On Oct 04 2024, Linux regression tracking (Thorsten Leemhuis) wrote: > > so cleanup is doing: > cleanup: > amd_sfh_hid_client_deinit(privdata); > for (i = 0; i < cl_data->num_hid_devices; i++) { > devm_kfree(dev, cl_data->feature_report[i]); > devm_kfree(dev, in_data->input_report[i]); > devm_kfree(dev, cl_data->report_descr[i]); > } > return rc; > > Would that means that the memory corruption appears during > amd_sfh_hid_client_deinit(), or... > >>> } >>> >>> for (i = 0; i < cl_data->num_hid_devices; i++) { >>> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c >>> index 0c28ca349bcd..1300f122b524 100644 >>> --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c >>> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c >>> @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work) >>> >>> rc = amd_sfh_hid_client_init(mp2); >>> if (rc) { >>> - amd_sfh_clear_intr(mp2); >>> + if (rc != -EOPNOTSUPP) >>> + amd_sfh_clear_intr(mp2); > > ... or during amd_sfh_clear_intr()? > > This very much looks like a band-aid (I know it is because you can not > reproduce, not blaming anyone), but I'd like to know a little bit more > if the bug is not appearing anywhere else in the normal processing of > the driver itself. > > Also a comment explaining why this is the only case where > amd_sfh_clear_intr() should not be called would be appreciated. > > So all in all, I have a feeling one of these 2 functions is not making a > proper check and I would rather fix the root cause, not the symptoms. > > Cheers, > Benjamin > > PS: sorry, I know this is a long standing issue, but I'd rather not > paper over a bigger issue :/ Yeah, me to! There was a new patch in bugzilla and people tested it (see https://bugzilla.kernel.org/show_bug.cgi?id=219331 for details) , but nothing has happened now for 48 hours, so please allow me to ask: Basavaraj Natikar, is there any hope that a proper fix will emerge soon? Sorry, normally I would wait a lot longer before asking things like this, but given the severeness of the problem I thought a inquiry for a quick status update was in order. Ciao, Thorsten
On 10/9/2024 3:50 PM, Linux regression tracking (Thorsten Leemhuis) wrote: > On 04.10.24 11:12, Benjamin Tissoires wrote: >> On Oct 04 2024, Linux regression tracking (Thorsten Leemhuis) wrote: >> >> so cleanup is doing: >> cleanup: >> amd_sfh_hid_client_deinit(privdata); >> for (i = 0; i < cl_data->num_hid_devices; i++) { >> devm_kfree(dev, cl_data->feature_report[i]); >> devm_kfree(dev, in_data->input_report[i]); >> devm_kfree(dev, cl_data->report_descr[i]); >> } >> return rc; >> >> Would that means that the memory corruption appears during >> amd_sfh_hid_client_deinit(), or... >> >>>> } >>>> >>>> for (i = 0; i < cl_data->num_hid_devices; i++) { >>>> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c >>>> index 0c28ca349bcd..1300f122b524 100644 >>>> --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c >>>> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c >>>> @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work) >>>> >>>> rc = amd_sfh_hid_client_init(mp2); >>>> if (rc) { >>>> - amd_sfh_clear_intr(mp2); >>>> + if (rc != -EOPNOTSUPP) >>>> + amd_sfh_clear_intr(mp2); >> ... or during amd_sfh_clear_intr()? >> >> This very much looks like a band-aid (I know it is because you can not >> reproduce, not blaming anyone), but I'd like to know a little bit more >> if the bug is not appearing anywhere else in the normal processing of >> the driver itself. >> >> Also a comment explaining why this is the only case where >> amd_sfh_clear_intr() should not be called would be appreciated. >> >> So all in all, I have a feeling one of these 2 functions is not making a >> proper check and I would rather fix the root cause, not the symptoms. >> >> Cheers, >> Benjamin >> >> PS: sorry, I know this is a long standing issue, but I'd rather not >> paper over a bigger issue :/ > Yeah, me to! > > There was a new patch in bugzilla and people tested it (see > https://bugzilla.kernel.org/show_bug.cgi?id=219331 for details) , but > nothing has happened now for 48 hours, so please allow me to ask: > > Basavaraj Natikar, is there any hope that a proper fix will emerge soon? Yes , please find link for the fix as mentioned in bugid. https://lore.kernel.org/all/20241009144757.3577625-1-Basavaraj.Natikar@amd.com/ Thanks, -- Basavaraj
On 09.10.24 16:52, Basavaraj Natikar wrote: > On 10/9/2024 3:50 PM, Linux regression tracking (Thorsten Leemhuis) wrote: >> On 04.10.24 11:12, Benjamin Tissoires wrote: >>> On Oct 04 2024, Linux regression tracking (Thorsten Leemhuis) wrote: >>> >>> PS: sorry, I know this is a long standing issue, but I'd rather not >>> paper over a bigger issue :/ > >> Yeah, me to! >> >> There was a new patch in bugzilla and people tested it (see >> https://bugzilla.kernel.org/show_bug.cgi?id=219331 for details) , but >> nothing has happened now for 48 hours, so please allow me to ask: >> >> Basavaraj Natikar, is there any hope that a proper fix will emerge soon? > > Yes , please find link for the fix as mentioned in bugid. > > https://lore.kernel.org/all/20241009144757.3577625-1- > Basavaraj.Natikar@amd.com/ Great, many thx! Nitpicking: still no stable tag though. :-/ And ideally Richard and Skyler would get a "Reported-by" as well. Ciao, Thorsten
On Wed, 9 Oct 2024, Linux regression tracking (Thorsten Leemhuis) wrote: > >>> PS: sorry, I know this is a long standing issue, but I'd rather not > >>> paper over a bigger issue :/ > > > >> Yeah, me to! > >> > >> There was a new patch in bugzilla and people tested it (see > >> https://bugzilla.kernel.org/show_bug.cgi?id=219331 for details) , but > >> nothing has happened now for 48 hours, so please allow me to ask: > >> > >> Basavaraj Natikar, is there any hope that a proper fix will emerge soon? > > > > Yes , please find link for the fix as mentioned in bugid. > > > > https://lore.kernel.org/all/20241009144757.3577625-1- > > Basavaraj.Natikar@amd.com/ > > Great, many thx! > > Nitpicking: still no stable tag though. :-/ And ideally Richard and > Skyler would get a "Reported-by" as well. I don't care all that much about stable tag, but if there are more Reported-by:s, I like to give credit where it's due, and I don't think rebasing for-6.12/upstream-fixes because of that would break anybody. So please feel free to send additional tags, and I'll rewrite it.
diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c index 4b59687ff5d8..3fcb971d5fda 100644 --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c @@ -297,8 +297,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata) (mp2_ops->discovery_status && mp2_ops->discovery_status(privdata) == 0)) { dev_warn(dev, "Failed to discover, sensors not enabled is %d\n", cl_data->is_any_sensor_enabled); - rc = -EOPNOTSUPP; - goto cleanup; + return -EOPNOTSUPP; } for (i = 0; i < cl_data->num_hid_devices; i++) { diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c index 0c28ca349bcd..1300f122b524 100644 --- a/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c +++ b/drivers/hid/amd-sfh-hid/amd_sfh_pcie.c @@ -351,7 +351,9 @@ static void sfh_init_work(struct work_struct *work) rc = amd_sfh_hid_client_init(mp2); if (rc) { - amd_sfh_clear_intr(mp2); + if (rc != -EOPNOTSUPP) + amd_sfh_clear_intr(mp2); + dev_err(&pdev->dev, "amd_sfh_hid_client_init failed err %d\n", rc); return; }
There is no need for additional cleanup, as all resources are managed. Additionally, if no sensor is found, there will be no initialization of HID devices. Therefore, return immediately if no sensor is detected. Fixes: 8031b001da70 ("HID: amd_sfh: Move sensor discovery before HID device initialization") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331 Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> --- drivers/hid/amd-sfh-hid/amd_sfh_client.c | 3 +-- drivers/hid/amd-sfh-hid/amd_sfh_pcie.c | 4 +++- 2 files changed, 4 insertions(+), 3 deletions(-)