[V5,0/6] firmware: arm_scmi: Misc Fixes

Message ID 20241030125512.2884761-1-quic_sibis@quicinc.com (mailing list archive)

Message

Sibi Sankar Oct. 30, 2024, 12:55 p.m. UTC
The series addresses the kernel warnings reported by Johan at [1] and is
required for the X1E cpufreq device tree changes to land.

[1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/

Duplicate levels:
arm-scmi arm-scmi.0.auto: Level 2976000 Power 218062 Latency 30us Ifreq 2976000 Index 10
arm-scmi arm-scmi.0.auto: Level 3206400 Power 264356 Latency 30us Ifreq 3206400 Index 11
arm-scmi arm-scmi.0.auto: Level 3417600 Power 314966 Latency 30us Ifreq 3417600 Index 12
arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
arm-scmi arm-scmi.0.auto: Level 4012800 Power 528848 Latency 30us Ifreq 4012800 Index 15

^^ The failures above occur because the SCP reports duplicate values for the
highest sustainable freq for perf domains 1 and 2. These are the only freqs
that appear as duplicates and they will be fixed with a firmware update. FWIW
the warnings that we are addressing in this series will also get fixed by a
firmware update, but the patches still have to land for devices already out
in the wild.
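
As an illustration only (this is not the code from the patches), the
"skip duplicates and report them as a firmware bug" idea can be sketched
in a few lines of Python. The names Opp and dedup_opps are hypothetical.
Indexes 13 and 14 and the power values of the duplicate entries are
assumptions, since the log above only shows that two insertions at level
3417600 failed with -16 (-EBUSY). The "[Firmware Bug]: " prefix mirrors
what the kernel's FW_BUG macro expands to.

```python
from typing import NamedTuple


class Opp(NamedTuple):
    level: int  # performance level reported by the firmware
    power: int  # power cost reported for that level
    index: int  # position of the entry in the firmware's reply


def dedup_opps(reported):
    """Keep the first entry seen for each level and note every skipped duplicate."""
    by_level = {}
    notes = []
    for opp in reported:
        if opp.level in by_level:
            # A second entry for the same level is treated as a firmware bug:
            # report it and carry on instead of failing the insertion.
            notes.append(
                f"[Firmware Bug]: duplicate level {opp.level} "
                f"at index {opp.index}, skipping"
            )
            continue
        by_level[opp.level] = opp
    # dict preserves insertion order, so the kept entries stay in reply order.
    return list(by_level.values()), notes


# Levels, powers and indexes 10, 11, 12 and 15 are taken from the log above.
# Indexes 13 and 14 and their power values are placeholders (assumptions).
REPORTED = [
    Opp(2976000, 218062, 10),
    Opp(3206400, 264356, 11),
    Opp(3417600, 314966, 12),
    Opp(3417600, 314966, 13),
    Opp(3417600, 314966, 14),
    Opp(4012800, 528848, 15),
]

kept, notes = dedup_opps(REPORTED)
for line in notes:
    print(line)
```

With this input the sketch keeps four distinct levels and emits two
notes, one per duplicate, rather than one error per failed insertion.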

V4:
* Rework debugfs node creation patch [Ulf/Dmitry]
* Reduce report level to dev_info and tag it with FW_BUG [Johan/Dmitry]
* Add cc stable and err logs to patch 1 commit message [Johan]

V3:
* Pick up R-b, T-b from the list.
* Pick up the updated patch from Cristian for skipping opps.
* Update device names only when a name collision occurs [Dmitry/Ulf]
* Drop Johan's T-b from "fix debugfs node creation failure"
* Move scmi_protocol_msg_check to the top [Sudeep]

V2:
* Include the fix for do_xfer timeout
* Include the fix for debugfs node creation failure
* Include Cristian's fix for skipping opp duplication

V1:
* add missing MSG_SUPPORTS_FASTCHANNEL definition.

base: next-20241029

Cristian Marussi (1):
  firmware: arm_scmi: Skip opp duplicates

Sibi Sankar (5):
  firmware: arm_scmi: Ensure that the message-id supports fastchannel
  firmware: arm_scmi: Report duplicate opps as firmware bugs
  pmdomain: core: Add GENPD_FLAG_DEV_NAME_FW flag
  pmdomain: arm: Use FLAG_DEV_NAME_FW to ensure unique names
  mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag

 drivers/firmware/arm_scmi/driver.c      | 72 +++++++++++++------------
 drivers/firmware/arm_scmi/perf.c        | 44 ++++++++++-----
 drivers/firmware/arm_scmi/protocols.h   |  2 +
 drivers/mailbox/qcom-cpucp-mbox.c       |  2 +-
 drivers/pmdomain/arm/scmi_perf_domain.c |  3 +-
 drivers/pmdomain/core.c                 | 49 +++++++++++------
 include/linux/pm_domain.h               |  6 +++
 7 files changed, 116 insertions(+), 62 deletions(-)

Comments

Ulf Hansson Oct. 30, 2024, 4:19 p.m. UTC | #1
On Wed, 30 Oct 2024 at 13:55, Sibi Sankar <quic_sibis@quicinc.com> wrote:
> [...]

Patch4 and patch5 applied for fixes to my pmdomain tree - and by
adding a stable tag to them, thanks!

Potentially I could help to take the other patches too, to keep things
together, but in that case I need confirmation that's okay to do so.

[...]

Kind regards
Uffe

Cristian Marussi Oct. 30, 2024, 4:28 p.m. UTC | #2
On Wed, Oct 30, 2024 at 05:19:39PM +0100, Ulf Hansson wrote:
> On Wed, 30 Oct 2024 at 13:55, Sibi Sankar <quic_sibis@quicinc.com> wrote:
> > [...]
> 
> Patch4 and patch5 applied for fixes to my pmdomain tree - and by
> adding a stable tag to them, thanks!
> 
> Potentially I could help to take the other patches too, to keep things
> together, but in that case I need confirmation that's okay to do so.

SCMI patches in this series are all reviewed (all but one even by Sudeep)
so it is really up to Sudeep's preference...(who is travelling now so it could
take a bit to reply)...moreover I am not sure if the SCMI patches in this
series could end up with some trivial conflicts against the scmi patches
already queued at

	sudeep/for-next/scmi/updates

(at least the perf related ones 2 and 3 probably not)

Thanks,
Cristian

Sudeep Holla Nov. 6, 2024, 7:12 a.m. UTC | #3
On Wed, Oct 30, 2024 at 04:28:41PM +0000, Cristian Marussi wrote:
> On Wed, Oct 30, 2024 at 05:19:39PM +0100, Ulf Hansson wrote:
> > On Wed, 30 Oct 2024 at 13:55, Sibi Sankar <quic_sibis@quicinc.com> wrote:
> > > [...]
> > 
> > Patch4 and patch5 applied for fixes to my pmdomain tree - and by
> > adding a stable tag to them, thanks!
> > 
> > Potentially I could help to take the other patches too, to keep things
> > together, but in that case I need confirmation that's okay to do so.
> 
> SCMI patches in this series are all reviewed (all but one even by Sudeep)
> so it is really up to Sudeep's preference...(who is travelling now so it could
> take a bit to reply)

I have added my Reviewed-by now.

> ...moreover I am not sure if the SCMI patches in this
> series could end up with some trivial conflicts against the scmi patches
> already queued at
> 
> 	sudeep/for-next/scmi/updates
> 
> (at least the perf related ones 2 and 3 probably not)
>

I did a quick check and no conflicts were observed. Let me know if you need
a branch with the first 3 patches, but I need to do that today or after Sunday
as I will be away from my computer for a few more days again from tomorrow.

Let me know ASAP.

Ulf Hansson Nov. 12, 2024, 3:56 p.m. UTC | #4
On Wed, 6 Nov 2024 at 08:12, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Wed, Oct 30, 2024 at 04:28:41PM +0000, Cristian Marussi wrote:
> > On Wed, Oct 30, 2024 at 05:19:39PM +0100, Ulf Hansson wrote:
> > > On Wed, 30 Oct 2024 at 13:55, Sibi Sankar <quic_sibis@quicinc.com> wrote:
> > > > [...]
> > >
> > > Patch4 and patch5 applied for fixes to my pmdomain tree - and by
> > > adding a stable tag to them, thanks!
> > >
> > > Potentially I could help to take the other patches too, to keep things
> > > together, but in that case I need confirmation that's okay to do so.
> >
> > SCMI patches in this series are all reviewed (all but one even by Sudeep)
> > so it is really up to Sudeep's preference...(who is travelling now so it could
> > take a bit to reply)
>
> I have added my Reviewed-by now.
>
> > ...moreover I am not sure if the SCMI patches in this
> > series could end up with some trivial conflicts against the scmi patches
> > already queued at
> >
> >       sudeep/for-next/scmi/updates
> >
> > (at least the perf related ones 2 and 3 probably not)
> >
>
> I did a quick check and no conflicts were observed. Let me know if you need
> a branch with the first 3 patches, but I need to do that today or after Sunday
> as I will be away from my computer for a few more days again from tomorrow.
>
> Let me know ASAP.
>
> --
> Regards,
> Sudeep

Sorry for the delay. I have picked up the remaining patches from this
series. All applied for fixes and by adding stable tags to them,
thanks!

Kind regards
Uffe

Johan Hovold Nov. 12, 2024, 5:19 p.m. UTC | #5
On Tue, Nov 12, 2024 at 04:56:26PM +0100, Ulf Hansson wrote:
> On Wed, 6 Nov 2024 at 08:12, Sudeep Holla <sudeep.holla@arm.com> wrote:
> > On Wed, Oct 30, 2024 at 04:28:41PM +0000, Cristian Marussi wrote:
> > > On Wed, Oct 30, 2024 at 05:19:39PM +0100, Ulf Hansson wrote:
> > > > On Wed, 30 Oct 2024 at 13:55, Sibi Sankar <quic_sibis@quicinc.com> wrote:
> > > > > [...]

> Sorry for the delay. I have picked up the remaining patches from this
> series. All applied for fixes and by adding stable tags to them,
> thanks!

As I reported here:

	https://lore.kernel.org/lkml/ZyTQ9QD1tEkhQ9eu@hovoldconsulting.com/

I'm seeing a hard reset on the x1e80100 CRD and Lenovo ThinkPad T14s
when accessing the cpufreq sysfs attributes.

Sibi tracked it down to the first patch in this series ("firmware:
arm_scmi: Ensure that the message-id supports fastchannel") and
reverting that one indeed fixes the reset.

Unfortunately this was only discussed on IRC and was never reported in
this thread.

Ulf, could you please drop the first patch again until we've figured out
how best to handle this?

Johan

Ulf Hansson Nov. 12, 2024, 6:48 p.m. UTC | #6
On Tue, 12 Nov 2024 at 18:20, Johan Hovold <johan@kernel.org> wrote:
>
> On Tue, Nov 12, 2024 at 04:56:26PM +0100, Ulf Hansson wrote:
> > On Wed, 6 Nov 2024 at 08:12, Sudeep Holla <sudeep.holla@arm.com> wrote:
> > > On Wed, Oct 30, 2024 at 04:28:41PM +0000, Cristian Marussi wrote:
> > > > On Wed, Oct 30, 2024 at 05:19:39PM +0100, Ulf Hansson wrote:
> > > > > On Wed, 30 Oct 2024 at 13:55, Sibi Sankar <quic_sibis@quicinc.com> wrote:
> > > > > > [...]
>
> > Sorry for the delay. I have picked up the remaining patches from this
> > series. All applied for fixes and by adding stable tags to them,
> > thanks!
>
> As I reported here:
>
>         https://lore.kernel.org/lkml/ZyTQ9QD1tEkhQ9eu@hovoldconsulting.com/
>
> I'm seeing a hard reset on the x1e80100 CRD and Lenovo ThinkPad T14s
> when accessing the cpufreq sysfs attributes.
>
> Sibi tracked it down to the first patch in this series ("firmware:
> arm_scmi: Ensure that the message-id supports fastchannel") and
> reverting that one indeed fixes the reset.
>
> Unfortunately this was only discussed on IRC and was never reported in
> this thread.
>
> Ulf, could you please drop the first patch again until we've figured out
> how best to handle this?

Done, thanks!

Kind regards
Uffe