Message ID | 1547790380-6276-1-git-send-email-pramod.kumar@broadcom.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | [RFC,1/1] arm64: Use PSCI calls for CPU stop when hotplug is supported | expand |
On Fri, Jan 18, 2019 at 11:16:20AM +0530, Pramod Kumar wrote: > If CPU hotplug is supported, ipi_cpu_stop should use PSCI cpudie > call to stop the CPU. This call ensures L1/L2 cache flush, > CPUs cache-cohenrecy setting w.r.to interconnect. > Firstly, this is not specific to PSCI and I don't see any PSCI calls as $subject claims. Next, you fail to explain why do you have to ensure caches are cleaned and why do you need that in ipi_cpu_stop ? What's the use case ? > Apart from this, this gives control to f/w to reduce power consumption > by take appropriate decesion on power rails for plugging-out core. > May be, but ipi_cpu_stop is used is machine reboot/poweroff/halt which may restart or poweroff the system, powering down individual CPUs is not necessary and may consume lot of time in systems with large number of CPUs. It would be good to know the use-case in case I am missing to consider that. -- Regards, Sudeep
On Mon, Jan 21, 2019 at 11:30 AM Pramod Kumar <pramod.kumar@broadcom.com> wrote: > > > > On Mon, Jan 21, 2019 at 11:28 AM Pramod Kumar <pramod.kumar@broadcom.com> wrote: >> >> Hi Sudeep, >> >> Thanks for having a glance on the patch. Please see my reply in line. >> >> Regards, >> Pramod >> >> On Fri, Jan 18, 2019 at 5:02 PM Sudeep Holla <sudeep.holla@arm.com> wrote: >>> >>> On Fri, Jan 18, 2019 at 11:16:20AM +0530, Pramod Kumar wrote: >>> > If CPU hotplug is supported, ipi_cpu_stop should use PSCI cpudie >>> > call to stop the CPU. This call ensures L1/L2 cache flush, >>> > CPUs cache-cohenrecy setting w.r.to interconnect. >>> > >>> >>> Firstly, this is not specific to PSCI and I don't see any PSCI calls as >>> $subject claims. >> >> >> I had seen all the other cpu ops methods where only PSCI supports only cpu_die features hence >> used PSCI reference in my subject line to be more clear where this die gets converted in last. >> >>> >>> Next, you fail to explain why do you have to ensure >>> caches are cleaned and why do you need that in ipi_cpu_stop ? >>> What's the use case ? >>> >> >> Need comes from a specific use case where one Accelerator card(SoC) is plugged in a sever over a PCIe interface. This Card gets supply from a battery, which could provide very less power for a very small time, in case of any power loss. Once Card switches to battery, this has to reduce its power consumption to its lowest point and back-up the DDR contents asap before battery gets fully drained off. >> >> Since battery can provide limited power for a very short time hence need to transition to lowest power. As per the transition process , CPUs power domain has to be off but before that it needs to flush out its content to system memory(L3) so that content could be backed-up by a MCU, a controller consuming very less power. Since we can not afford plugging-out every individual CPUs in sequence hence uses ipi_cpu_stop for all other CPUs which ultimately switch to ATF to flush out all the CPUs caches and comes out of coherency domain so that its power rails could be switched-off. >> >>> >>> > Apart from this, this gives control to f/w to reduce power consumption >>> > by take appropriate decesion on power rails for plugging-out core. >>> > >>> >>> May be, but ipi_cpu_stop is used is machine reboot/poweroff/halt which >>> may restart or poweroff the system, powering down individual CPUs is not >>> necessary and may consume lot of time in systems with large number of CPUs. >>> It would be good to know the use-case in case I am missing to consider >>> that. >> >> >> Use case as explained above. >> >>> >>> -- >>> Regards, >>> Sudeep
On Mon, Jan 21, 2019 at 11:28:27AM +0530, Pramod Kumar wrote: > On Fri, Jan 18, 2019 at 5:02 PM Sudeep Holla <sudeep.holla@arm.com> wrote: > > On Fri, Jan 18, 2019 at 11:16:20AM +0530, Pramod Kumar wrote: > > > If CPU hotplug is supported, ipi_cpu_stop should use PSCI cpudie > > > call to stop the CPU. This call ensures L1/L2 cache flush, > > > CPUs cache-cohenrecy setting w.r.to interconnect. > > > > Firstly, this is not specific to PSCI and I don't see any PSCI calls as > > $subject claims. > > I had seen all the other cpu ops methods where only PSCI supports only > cpu_die features hence used PSCI reference in my subject line to be more > clear where this die gets converted in last. > OK, but since you are not directly dealing with PSCI APIs, it's misleading. > > Next, you fail to explain why do you have to ensure > > caches are cleaned and why do you need that in ipi_cpu_stop ? > > What's the use case ? > > > Need comes from a specific use case where one Accelerator card(SoC) is > plugged in a sever over a PCIe interface. This Card gets supply from a > battery, which could provide very less power for a very small time, in case > of any power loss. Once Card switches to battery, this has to reduce its > power consumption to its lowest point and back-up the DDR contents asap > before battery gets fully drained off. > OK, but I was expecting the complete call path for this use-case. > Since battery can provide limited power for a very short time hence need to > transition to lowest power. As per the transition process , CPUs power > domain has to be off but before that it needs to flush out its content to > system memory(L3) so that content could be backed-up by a MCU, a controller > consuming very less power. Since we can not afford plugging-out every > individual CPUs in sequence hence uses ipi_cpu_stop for all other CPUs So, you are randomly using ipi_cpu_stop for something it's not supposed to be used. What do you exactly want to achieve in that context where you need to save power ? Why system off or reset not sufficient ? -- Regards, Sudeep
Thanks Sudeep for reviewing. Please see my comment inline below. On Mon, Jan 21, 2019 at 4:52 PM Sudeep Holla <sudeep.holla@arm.com> wrote: > > On Mon, Jan 21, 2019 at 11:28:27AM +0530, Pramod Kumar wrote: > > On Fri, Jan 18, 2019 at 5:02 PM Sudeep Holla <sudeep.holla@arm.com> wrote: > > > On Fri, Jan 18, 2019 at 11:16:20AM +0530, Pramod Kumar wrote: > > > > If CPU hotplug is supported, ipi_cpu_stop should use PSCI cpudie > > > > call to stop the CPU. This call ensures L1/L2 cache flush, > > > > CPUs cache-cohenrecy setting w.r.to interconnect. > > > > > > Firstly, this is not specific to PSCI and I don't see any PSCI calls as > > > $subject claims. > > > > I had seen all the other cpu ops methods where only PSCI supports only > > cpu_die features hence used PSCI reference in my subject line to be more > > clear where this die gets converted in last. > > > > OK, but since you are not directly dealing with PSCI APIs, it's misleading. > I can re-phrase my commit message which links explanation to PSCI. > > > Next, you fail to explain why do you have to ensure > > > caches are cleaned and why do you need that in ipi_cpu_stop ? > > > What's the use case ? > > > > > Need comes from a specific use case where one Accelerator card(SoC) is > > plugged in a sever over a PCIe interface. This Card gets supply from a > > battery, which could provide very less power for a very small time, in case > > of any power loss. Once Card switches to battery, this has to reduce its > > power consumption to its lowest point and back-up the DDR contents asap > > before battery gets fully drained off. > > > > OK, but I was expecting the complete call path for this use-case. OK. Once power loss happens, Application processor gets interrupt and it has to switch to MCU asap so that it can switch-off all irrelevant power domain to reduce battery drain. flow is like this- Power loss interrupt --> interrupt handler -> send IPI to stop all secondary CPUs-> switch primary CPU to ATF which ultimately transition to MCU where all power off action is taken. > > > Since battery can provide limited power for a very short time hence need to > > transition to lowest power. As per the transition process , CPUs power > > domain has to be off but before that it needs to flush out its content to > > system memory(L3) so that content could be backed-up by a MCU, a controller > > consuming very less power. Since we can not afford plugging-out every > > individual CPUs in sequence hence uses ipi_cpu_stop for all other CPUs > > So, you are randomly using ipi_cpu_stop for something it's not supposed to > be used. What do you exactly want to achieve in that context where you need > to save power ? Need to back-up DDR from MCU which will require L1/L2 content gets flushed before CPUs getting shutdown. >Why system off or reset not sufficient ? System off or reset is not sufficient as we can not afford that much delay in transition. Flushing L1/L2 contents to System memory and Bringing out CPUs Clusters out of coherency domain before shutting down Clusters puts rest of system in sane state. If cluster are not being taken out properly from coherency domain before shutting down, it could lead system un-reliable for rest parts. > -- > Regards, > Sudeep >
On Wed, Jan 23, 2019 at 10:21:05AM +0530, Pramod Kumar wrote: > Thanks Sudeep for reviewing. Please see my comment inline below. > > On Mon, Jan 21, 2019 at 4:52 PM Sudeep Holla <sudeep.holla@arm.com> wrote: [...] > > > > OK, but I was expecting the complete call path for this use-case. > > OK. Once power loss happens, Application processor gets interrupt and > it has to switch to MCU asap so that it can switch-off all irrelevant > power domain to reduce battery drain. > flow is like this- > Power loss interrupt --> interrupt handler -> send IPI to stop all > secondary CPUs-> switch primary CPU to ATF which ultimately transition > to MCU where all power off action is taken. > So does the entire AP domain get powered off or is that the intention ? If so, just stopping secondaries through IPI is not sufficient. What happens to all the running tasks and other operations going on in the system. > > > > > Since battery can provide limited power for a very short time hence need to > > > transition to lowest power. As per the transition process , CPUs power > > > domain has to be off but before that it needs to flush out its content to > > > system memory(L3) so that content could be backed-up by a MCU, a controller > > > consuming very less power. Since we can not afford plugging-out every > > > individual CPUs in sequence hence uses ipi_cpu_stop for all other CPUs > > > > So, you are randomly using ipi_cpu_stop for something it's not supposed to > > be used. What do you exactly want to achieve in that context where you need > > to save power ? > > Need to back-up DDR from MCU which will require L1/L2 content gets > flushed before CPUs getting shutdown. > Understood. > >Why system off or reset not sufficient ? > System off or reset is not sufficient as we can not afford that much > delay in transition. > But who will save the system state ? You may corrupt data at the cost of saving transition delay. So IMO, you don't have a valid use-case or you are failing to present one properly. > Flushing L1/L2 contents to System memory and Bringing out CPUs > Clusters out of coherency domain before shutting down Clusters puts > rest of system in sane state. If cluster are not being taken out > properly from coherency domain before shutting down, it could lead > system un-reliable for rest parts. > Looks like you are concerned only with CPUs in this scenario, but what about other devices in the domain. So suspend-to-{ram,idle} are only sane options to make sure you don't loose/corrupt any data/state. Further point me to the code either part of this series or upstream handling the "Power loss interrupt" explained above. -- Regards, Sudeep
On Mon, Jan 21, 2019 at 11:30:02AM +0530, Pramod Kumar wrote: > On Mon, Jan 21, 2019 at 11:28 AM Pramod Kumar <pramod.kumar@broadcom.com> > wrote: > > Need comes from a specific use case where one Accelerator card(SoC) is > plugged in a sever over a PCIe interface. This Card gets supply from a > battery, which could provide very less power for a very small time, in case > of any power loss. Once Card switches to battery, this has to reduce its > power consumption to its lowest point and back-up the DDR contents asap > before battery gets fully drained off. In this example is Linux running on the server, or on the accelerator? What precisely are you trying to back up from DDR, and why? What is responsible for backing up that contents? > Since battery can provide limited power for a very short time hence need to > transition to lowest power. As per the transition process , CPUs power > domain has to be off but before that it needs to flush out its content to > system memory(L3) so that content could be backed-up by a MCU, a controller > consuming very less power. Since we can not afford plugging-out every > individual CPUs in sequence hence uses ipi_cpu_stop for all other CPUs > which ultimately switch to ATF to flush out all the CPUs caches and comes > out of coherency domain so that its power rails could be switched-off. If you're stopping CPUs from completely arbitrary states, what is the benefit of saving the RAM contents? CPUs might be running with IRQs disabled for an arbitrarily long time, so there's no guarantee that all of them will be turned off before power is lost. I'm very confused as to what you're trying to achieve here. Thanks, Mark.
On Fri, Jan 18, 2019 at 11:16:20AM +0530, Pramod Kumar wrote: > If CPU hotplug is supported, ipi_cpu_stop should use PSCI cpudie > call to stop the CPU. This call ensures L1/L2 cache flush, > CPUs cache-cohenrecy setting w.r.to interconnect. > > Apart from this, this gives control to f/w to reduce power consumption > by take appropriate decesion on power rails for plugging-out core. > > Signed-off-by: Pramod Kumar <pramod.kumar@broadcom.com> > Reviewed-by: Ray Jui <ray.jui@broadcom.com> > Reviewed-by: Scott Branden <scott.branden@broadcom.com> > --- > arch/arm64/kernel/smp.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 1598d6f..360e52b 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -822,8 +822,13 @@ static void ipi_cpu_stop(unsigned int cpu) > local_daif_mask(); > sdei_mask_local_cpu(); > > +#ifdef CONFIG_HOTPLUG_CPU > + if (cpu_ops[cpu]->cpu_die) > + cpu_ops[cpu]->cpu_die(cpu); > +#else > while (1) > cpu_relax(); > +#endif If cpu_ops[cpu]->cpu_die is NULL, this change makes ipi_cpu_stop() return, which is not correct. Regardless, I don't think that there is sufficient rationale for this change, especially given that your commit message describes platform-specific assumptions which do not hold in general. Thanks, Mark.
Hi Mark, Hopefully I can shed some light on the use case inline. On 2019-01-23 8:48 a.m., Mark Rutland wrote: > On Mon, Jan 21, 2019 at 11:30:02AM +0530, Pramod Kumar wrote: >> On Mon, Jan 21, 2019 at 11:28 AM Pramod Kumar <pramod.kumar@broadcom.com> >> wrote: >> >> Need comes from a specific use case where one Accelerator card(SoC) is >> plugged in a sever over a PCIe interface. This Card gets supply from a >> battery, which could provide very less power for a very small time, in case >> of any power loss. Once Card switches to battery, this has to reduce its >> power consumption to its lowest point and back-up the DDR contents asap >> before battery gets fully drained off. > In this example is Linux running on the server, or on the accelerator? Accelerator > > What precisely are you trying to back up from DDR, and why? Data in DDR is being written to disk at this time (disk is connected to accelerator) > > What is responsible for backing up that contents? A low power M-class processor and DMA engine which continues necessary operations to transfer DDR memory to disk. The high power processors on the accelerator running linux needed to be halted ASAP on this power loss event and M0 take over. Graceful shutdown of linux and other peripherals is unnecessary (and we don't have the power necessary to do so). > >> Since battery can provide limited power for a very short time hence need to >> transition to lowest power. As per the transition process , CPUs power >> domain has to be off but before that it needs to flush out its content to >> system memory(L3) so that content could be backed-up by a MCU, a controller >> consuming very less power. Since we can not afford plugging-out every >> individual CPUs in sequence hence uses ipi_cpu_stop for all other CPUs >> which ultimately switch to ATF to flush out all the CPUs caches and comes >> out of coherency domain so that its power rails could be switched-off. > If you're stopping CPUs from completely arbitrary states, what is the > benefit of saving the RAM contents? Some of the RAM contains data that was in the process of being written to disk by the accelerator. This data must be saved to disk and the high power CPUs consume too much power to continue performing this operation. > > CPUs might be running with IRQs disabled for an arbitrarily long time, In an embedded linux system we control everything running. But good point - we'll need to ensure interrupts are not disabled for too long. > so there's no guarantee that all of them will be turned off before power > is lost. > > I'm very confused as to what you're trying to achieve here. > > Thanks, > Mark. Regards, Scott
On Wed, Jan 23, 2019 at 09:05:26AM -0800, Scott Branden wrote: > Hi Mark, > > Hopefully I can shed some light on the use case inline. > > On 2019-01-23 8:48 a.m., Mark Rutland wrote: > > On Mon, Jan 21, 2019 at 11:30:02AM +0530, Pramod Kumar wrote: > > > On Mon, Jan 21, 2019 at 11:28 AM Pramod Kumar <pramod.kumar@broadcom.com> > > > wrote: > > > > > > Need comes from a specific use case where one Accelerator card(SoC) is > > > plugged in a sever over a PCIe interface. This Card gets supply from a > > > battery, which could provide very less power for a very small time, in case > > > of any power loss. Once Card switches to battery, this has to reduce its > > > power consumption to its lowest point and back-up the DDR contents asap > > > before battery gets fully drained off. > > In this example is Linux running on the server, or on the accelerator? > Accelerator > > > > What precisely are you trying to back up from DDR, and why? > Data in DDR is being written to disk at this time (disk is connected to > accelerator) > > > > What is responsible for backing up that contents? > > A low power M-class processor and DMA engine which continues necessary > operations to transfer DDR memory to disk. > > The high power processors on the accelerator running linux needed to be > halted ASAP on this power loss event and M0 take over. Graceful shutdown of > linux and other peripherals is unnecessary (and we don't have the power > necessary to do so). > It may be unnecessary for your use-case, but not recommended. > > > > > Since battery can provide limited power for a very short time hence need to > > > transition to lowest power. As per the transition process , CPUs power > > > domain has to be off but before that it needs to flush out its content to > > > system memory(L3) so that content could be backed-up by a MCU, a controller > > > consuming very less power. Since we can not afford plugging-out every > > > individual CPUs in sequence hence uses ipi_cpu_stop for all other CPUs > > > which ultimately switch to ATF to flush out all the CPUs caches and comes > > > out of coherency domain so that its power rails could be switched-off. > > If you're stopping CPUs from completely arbitrary states, what is the > > benefit of saving the RAM contents? > > Some of the RAM contains data that was in the process of being written to > disk by the accelerator. > > This data must be saved to disk and the high power CPUs consume too much > power to continue performing this operation. > Why will suspend to ram or idle not work ? It will power off the secondaries which this patch is trying to achieve, but in more sane way so that no data/state is lost/corrupted as I stated earlier. > > > > CPUs might be running with IRQs disabled for an arbitrarily long time, > > In an embedded linux system we control everything running. > By which I assume you have patches to do all sorts of things to make this work and this patch standalone is of no use :) I don't like this as it's not scalable to big systems as this is in the same code path as system off/reset. -- Regards, Sudeep
Hi Sudeep, On 2019-01-23 9:21 a.m., Sudeep Holla wrote: > On Wed, Jan 23, 2019 at 09:05:26AM -0800, Scott Branden wrote: >> Hi Mark, >> >> Hopefully I can shed some light on the use case inline. >> >> On 2019-01-23 8:48 a.m., Mark Rutland wrote: >>> On Mon, Jan 21, 2019 at 11:30:02AM +0530, Pramod Kumar wrote: >>>> On Mon, Jan 21, 2019 at 11:28 AM Pramod Kumar <pramod.kumar@broadcom.com> >>>> wrote: >>>> >>>> Need comes from a specific use case where one Accelerator card(SoC) is >>>> plugged in a sever over a PCIe interface. This Card gets supply from a >>>> battery, which could provide very less power for a very small time, in case >>>> of any power loss. Once Card switches to battery, this has to reduce its >>>> power consumption to its lowest point and back-up the DDR contents asap >>>> before battery gets fully drained off. >>> In this example is Linux running on the server, or on the accelerator? >> Accelerator >>> What precisely are you trying to back up from DDR, and why? >> Data in DDR is being written to disk at this time (disk is connected to >> accelerator) >>> What is responsible for backing up that contents? >> A low power M-class processor and DMA engine which continues necessary >> operations to transfer DDR memory to disk. >> >> The high power processors on the accelerator running linux needed to be >> halted ASAP on this power loss event and M0 take over. Graceful shutdown of >> linux and other peripherals is unnecessary (and we don't have the power >> necessary to do so). >> > It may be unnecessary for your use-case, but not recommended. No choice - we don't have the time/power for a graceful shutdown. > >>>> Since battery can provide limited power for a very short time hence need to >>>> transition to lowest power. As per the transition process , CPUs power >>>> domain has to be off but before that it needs to flush out its content to >>>> system memory(L3) so that content could be backed-up by a MCU, a controller >>>> consuming very less power. Since we can not afford plugging-out every >>>> individual CPUs in sequence hence uses ipi_cpu_stop for all other CPUs >>>> which ultimately switch to ATF to flush out all the CPUs caches and comes >>>> out of coherency domain so that its power rails could be switched-off. >>> If you're stopping CPUs from completely arbitrary states, what is the >>> benefit of saving the RAM contents? >> Some of the RAM contains data that was in the process of being written to >> disk by the accelerator. >> >> This data must be saved to disk and the high power CPUs consume too much >> power to continue performing this operation. >> > Why will suspend to ram or idle not work ? It will power off the secondaries > which this patch is trying to achieve, but in more sane way so that no > data/state is lost/corrupted as I stated earlier. We need to take over control of the disk write operations. There is no time/power available to leave linux running. > >>> CPUs might be running with IRQs disabled for an arbitrarily long time, >> In an embedded linux system we control everything running. >> > By which I assume you have patches to do all sorts of things to make this > work and this patch standalone is of no use :) I believe other patch is in a standalone driver to be upstreamed. Remainder of code is on a standalone M0 processor not running linux. > > I don't like this as it's not scalable to big systems as this is in the > same code path as system off/reset. Many things are not used by big systems and vice versa. What do you suggest is done otherwise? > > -- > Regards, > Sudeep Regards, Scott
On Wed, Jan 23, 2019 at 09:05:26AM -0800, Scott Branden wrote: > Hi Mark, > > Hopefully I can shed some light on the use case inline. > > On 2019-01-23 8:48 a.m., Mark Rutland wrote: > > On Mon, Jan 21, 2019 at 11:30:02AM +0530, Pramod Kumar wrote: > > > On Mon, Jan 21, 2019 at 11:28 AM Pramod Kumar <pramod.kumar@broadcom.com> > > > wrote: > > > > > > Need comes from a specific use case where one Accelerator card(SoC) is > > > plugged in a sever over a PCIe interface. This Card gets supply from a > > > battery, which could provide very less power for a very small time, in case > > > of any power loss. Once Card switches to battery, this has to reduce its > > > power consumption to its lowest point and back-up the DDR contents asap > > > before battery gets fully drained off. > > In this example is Linux running on the server, or on the accelerator? > Accelerator > > > > What precisely are you trying to back up from DDR, and why? > Data in DDR is being written to disk at this time (disk is connected to > accelerator) > > > > What is responsible for backing up that contents? > > A low power M-class processor and DMA engine which continues necessary > operations to transfer DDR memory to disk. > > The high power processors on the accelerator running linux needed to be > halted ASAP on this power loss event and M0 take over. Graceful shutdown of > linux and other peripherals is unnecessary (and we don't have the power > necessary to do so). If graceful shutdown of Linux is not required (and is in fact undesireable), why is Linux involved at all in this shutdown process? For example, why is this not a secure interrupt taken to EL3, which can (gracefully) shut down the CPUs regardless? > > > Since battery can provide limited power for a very short time hence need to > > > transition to lowest power. As per the transition process , CPUs power > > > domain has to be off but before that it needs to flush out its content to > > > system memory(L3) so that content could be backed-up by a MCU, a controller > > > consuming very less power. Since we can not afford plugging-out every > > > individual CPUs in sequence hence uses ipi_cpu_stop for all other CPUs > > > which ultimately switch to ATF to flush out all the CPUs caches and comes > > > out of coherency domain so that its power rails could be switched-off. > > If you're stopping CPUs from completely arbitrary states, what is the > > benefit of saving the RAM contents? > > Some of the RAM contains data that was in the process of being written to > disk by the accelerator. Ok, so this isn't actually about backing up RAM contents; it's about completing pending I/O. I'm still confused as to how that works. How do you avoid leaving the disk in some corrupt state if data runs out partway through? > This data must be saved to disk and the high power CPUs consume too much > power to continue performing this operation. > > > CPUs might be running with IRQs disabled for an arbitrarily long time, > > In an embedded linux system we control everything running. Sure, and that complete control allows you to do something better than this RFC, AFAICT. Thanks, Mark.
On 2019-01-23 9:33 a.m., Mark Rutland wrote: > On Wed, Jan 23, 2019 at 09:05:26AM -0800, Scott Branden wrote: >> Hi Mark, >> >> Hopefully I can shed some light on the use case inline. >> >> On 2019-01-23 8:48 a.m., Mark Rutland wrote: >>> On Mon, Jan 21, 2019 at 11:30:02AM +0530, Pramod Kumar wrote: >>>> On Mon, Jan 21, 2019 at 11:28 AM Pramod Kumar <pramod.kumar@broadcom.com> >>>> wrote: >>>> >>>> Need comes from a specific use case where one Accelerator card(SoC) is >>>> plugged in a sever over a PCIe interface. This Card gets supply from a >>>> battery, which could provide very less power for a very small time, in case >>>> of any power loss. Once Card switches to battery, this has to reduce its >>>> power consumption to its lowest point and back-up the DDR contents asap >>>> before battery gets fully drained off. >>> In this example is Linux running on the server, or on the accelerator? >> Accelerator >>> What precisely are you trying to back up from DDR, and why? >> Data in DDR is being written to disk at this time (disk is connected to >> accelerator) >>> What is responsible for backing up that contents? >> A low power M-class processor and DMA engine which continues necessary >> operations to transfer DDR memory to disk. >> >> The high power processors on the accelerator running linux needed to be >> halted ASAP on this power loss event and M0 take over. Graceful shutdown of >> linux and other peripherals is unnecessary (and we don't have the power >> necessary to do so). > If graceful shutdown of Linux is not required (and is in fact > undesireable), why is Linux involved at all in this shutdown process? > > For example, why is this not a secure interrupt taken to EL3, which can > (gracefully) shut down the CPUs regardless? Will need Pramod to explain the detailed rationale here. >>>> Since battery can provide limited power for a very short time hence need to >>>> transition to lowest power. As per the transition process , CPUs power >>>> domain has to be off but before that it needs to flush out its content to >>>> system memory(L3) so that content could be backed-up by a MCU, a controller >>>> consuming very less power. Since we can not afford plugging-out every >>>> individual CPUs in sequence hence uses ipi_cpu_stop for all other CPUs >>>> which ultimately switch to ATF to flush out all the CPUs caches and comes >>>> out of coherency domain so that its power rails could be switched-off. >>> If you're stopping CPUs from completely arbitrary states, what is the >>> benefit of saving the RAM contents? >> Some of the RAM contains data that was in the process of being written to >> disk by the accelerator. > Ok, so this isn't actually about backing up RAM contents; it's about > completing pending I/O. > > I'm still confused as to how that works. How do you avoid leaving the > disk in some corrupt state if data runs out partway through? Some additional flags and details are saved to disk with the "pending i/o". On next power up an app runs which recovers the data and recovers it and completes processing. Of course, if the store doesn't succeed properly portions of the recovery are discarded. > >> This data must be saved to disk and the high power CPUs consume too much >> power to continue performing this operation. >> >>> CPUs might be running with IRQs disabled for an arbitrarily long time, >> In an embedded linux system we control everything running. > Sure, and that complete control allows you to do something better than > this RFC, AFAICT. If possible that would be great. Need Pramod to comment whether the direct EL3 will solve all issues. > > Thanks, > Mark. Thanks for input Mark. Scott
On Wed, Jan 23, 2019 at 09:46:22AM -0800, Scott Branden wrote: [...] > > Some additional flags and details are saved to disk with the "pending i/o". > I am getting a sense that this is some raw partition on a disk(flash/eMMC/UFS) the M0 can write to and Linux can read. It's can't be full fledge file system like ext4 were are talking here or are we ? -- Regards, Sudeep
On Wed, Jan 23, 2019 at 11:03 PM Mark Rutland <mark.rutland@arm.com> wrote: > > On Wed, Jan 23, 2019 at 09:05:26AM -0800, Scott Branden wrote: > > Hi Mark, > > > > Hopefully I can shed some light on the use case inline. > > > > On 2019-01-23 8:48 a.m., Mark Rutland wrote: > > > On Mon, Jan 21, 2019 at 11:30:02AM +0530, Pramod Kumar wrote: > > > > On Mon, Jan 21, 2019 at 11:28 AM Pramod Kumar <pramod.kumar@broadcom.com> > > > > wrote: > > > > > > > > Need comes from a specific use case where one Accelerator card(SoC) is > > > > plugged in a sever over a PCIe interface. This Card gets supply from a > > > > battery, which could provide very less power for a very small time, in case > > > > of any power loss. Once Card switches to battery, this has to reduce its > > > > power consumption to its lowest point and back-up the DDR contents asap > > > > before battery gets fully drained off. > > > In this example is Linux running on the server, or on the accelerator? > > Accelerator > > > > > > What precisely are you trying to back up from DDR, and why? > > Data in DDR is being written to disk at this time (disk is connected to > > accelerator) > > > > > > What is responsible for backing up that contents? > > > > A low power M-class processor and DMA engine which continues necessary > > operations to transfer DDR memory to disk. > > > > The high power processors on the accelerator running linux needed to be > > halted ASAP on this power loss event and M0 take over. Graceful shutdown of > > linux and other peripherals is unnecessary (and we don't have the power > > necessary to do so). > > If graceful shutdown of Linux is not required (and is in fact > undesireable), why is Linux involved at all in this shutdown process? > > For example, why is this not a secure interrupt taken to EL3, which can > (gracefully) shut down the CPUs regardless? > This is an GPIO interrupt. This can not be marked secure as for that we need to mark whole GPIO controller as secure which is not possible as GPIO controller is meant for non-secure world having more than 100 lines connected. I agree we have work around where we invoke handler in Linux and switch to ATF via SMC and from ATF we need bring all secondary CPU to ATF via sending SGI and and then respective core flushes the L1/L2 and bring himself out of coherency domain and cluster and MCU shutdowns the CPU subsystem gracefully. This could work for our requirement. Need to check ATF support for that. But What about generic system? This patch address the generic multi-master system's requirement. Consider system where shutting down the linux does not mean shutting down the complete system. Lets take an example of smartnic case Where NIC master and CPUs access cachable DDR. In smarnic its quite common to bring CPUs on demand means when needed via MCU help. Now in full-fledged system. if CPU subsystem is shutdown via poweroff command which does not bring secondary CPUs out of coherency domain, it will bring the complete system unstable when NIC master tries to access DDR and snoop is send to CPUs as well which is not available. Fabric/System hangs... I feel While shutting down the CPUs subsystem or powering off, All secondary CPUs must be shutdown properly by bring-out of coherency domain to remain rest of subsystem usable. I agree that introducing PSCI call introduce delay for shutdown/reboot case but stability matter than little delay. > > > > Since battery can provide limited power for a very short time hence need to > > > > transition to lowest power. As per the transition process , CPUs power > > > > domain has to be off but before that it needs to flush out its content to > > > > system memory(L3) so that content could be backed-up by a MCU, a controller > > > > consuming very less power. Since we can not afford plugging-out every > > > > individual CPUs in sequence hence uses ipi_cpu_stop for all other CPUs > > > > which ultimately switch to ATF to flush out all the CPUs caches and comes > > > > out of coherency domain so that its power rails could be switched-off. > > > If you're stopping CPUs from completely arbitrary states, what is the > > > benefit of saving the RAM contents? > > > > Some of the RAM contains data that was in the process of being written to > > disk by the accelerator. > > Ok, so this isn't actually about backing up RAM contents; it's about > completing pending I/O. > > I'm still confused as to how that works. How do you avoid leaving the > disk in some corrupt state if data runs out partway through? > > > This data must be saved to disk and the high power CPUs consume too much > > power to continue performing this operation. > > > > > CPUs might be running with IRQs disabled for an arbitrarily long time, > > > > In an embedded linux system we control everything running. > > Sure, and that complete control allows you to do something better than > this RFC, AFAICT. > > Thanks, > Mark.
On 25/01/2019 07:03, Pramod Kumar wrote: > On Wed, Jan 23, 2019 at 11:03 PM Mark Rutland <mark.rutland@arm.com> wrote: >> >> On Wed, Jan 23, 2019 at 09:05:26AM -0800, Scott Branden wrote: >>> Hi Mark, >>> >>> Hopefully I can shed some light on the use case inline. >>> >>> On 2019-01-23 8:48 a.m., Mark Rutland wrote: >>>> On Mon, Jan 21, 2019 at 11:30:02AM +0530, Pramod Kumar wrote: >>>>> On Mon, Jan 21, 2019 at 11:28 AM Pramod Kumar <pramod.kumar@broadcom.com> >>>>> wrote: >>>>> >>>>> Need comes from a specific use case where one Accelerator card(SoC) is >>>>> plugged in a sever over a PCIe interface. This Card gets supply from a >>>>> battery, which could provide very less power for a very small time, in case >>>>> of any power loss. Once Card switches to battery, this has to reduce its >>>>> power consumption to its lowest point and back-up the DDR contents asap >>>>> before battery gets fully drained off. >>>> In this example is Linux running on the server, or on the accelerator? >>> Accelerator >>>> >>>> What precisely are you trying to back up from DDR, and why? >>> Data in DDR is being written to disk at this time (disk is connected to >>> accelerator) >>>> >>>> What is responsible for backing up that contents? >>> >>> A low power M-class processor and DMA engine which continues necessary >>> operations to transfer DDR memory to disk. >>> >>> The high power processors on the accelerator running linux needed to be >>> halted ASAP on this power loss event and M0 take over. Graceful shutdown of >>> linux and other peripherals is unnecessary (and we don't have the power >>> necessary to do so). >> >> If graceful shutdown of Linux is not required (and is in fact >> undesireable), why is Linux involved at all in this shutdown process? >> >> For example, why is this not a secure interrupt taken to EL3, which can >> (gracefully) shut down the CPUs regardless? >> > > This is an GPIO interrupt. This can not be marked secure as for that > we need to mark whole GPIO controller as secure which is not possible > as GPIO controller is meant for non-secure world having more than 100 > lines connected. > > I agree we have work around where we invoke handler in Linux and > switch to ATF via SMC and from ATF we need bring all secondary CPU to > ATF via sending SGI and and then respective core flushes the L1/L2 and > bring himself out of coherency domain and cluster and MCU shutdowns > the CPU subsystem gracefully. This could work for our requirement. > Need to check ATF support for that. Right, SMCCC has whole spaces for SoC-specific and platform-specific service calls. If your system has a need to power off as fast as possible under system-specific constraints, it seems much more sensible to immediately tell the firmware "power off as fast as possible under the system-specific constraints that you have full knowledge of, please", rather than trying to coax the generic kernel_halt() (or whatever) infrastructure to sort-of-do-what-you-want. > But What about generic system? This patch address the generic > multi-master system's requirement. Consider system where shutting down > the linux does not mean shutting down the complete system. Lets take > an example of smartnic case Where NIC master and CPUs access cachable > DDR. In smarnic its quite common to bring CPUs on demand means when > needed via MCU help. > Now in full-fledged system. if CPU subsystem is shutdown via poweroff > command which does not bring secondary CPUs out of coherency domain, > it will bring the complete system unstable when NIC master tries to > access DDR and snoop is send to CPUs as well which is not available. > Fabric/System hangs... Not sure that's really relevant here... If platform firmware is able to power things off in a way that breaks the platform, surely that's entirely the firmware's own fault. > I feel While shutting down the CPUs subsystem or powering off, All > secondary CPUs must be shutdown properly by bring-out of coherency > domain to remain rest of subsystem usable. I agree that introducing > PSCI call introduce delay for shutdown/reboot case but stability > matter than little delay. Again, if you don't trust the firmware to implement SYSTEM_OFF appropriately for the platform, can you really assume its CPU_OFF implementation is safe either? People already complain today about how long CPU bringup takes on certain systems. Extending their reboot cycle by a similar degree for reasons that are entirely irrelevant to those systems is hardly going to make those users any happier. Robin.
On Fri, Jan 25, 2019 at 12:33:56PM +0530, Pramod Kumar wrote: [...] > > This is an GPIO interrupt. This can not be marked secure as for that > we need to mark whole GPIO controller as secure which is not possible > as GPIO controller is meant for non-secure world having more than 100 > lines connected. > > I agree we have work around where we invoke handler in Linux and > switch to ATF via SMC and from ATF we need bring all secondary CPU to > ATF via sending SGI and and then respective core flushes the L1/L2 and > bring himself out of coherency domain and cluster and MCU shutdowns > the CPU subsystem gracefully. This could work for our requirement. > Need to check ATF support for that. > Yes platform specific requirement and platform specific solution, happy ending :). > But What about generic system? This patch address the generic > multi-master system's requirement. Why do we need to address this in generic system ? And what exactly is this multi-master system's requirement ? Linux runs on one or more masters and will own it completely. Shutdown must involve everything it owns if it needs to be graceful. > Consider system where shutting down the linux does not mean shutting > down the complete system. Why is that the case ? Is it forceful shutdown ? Does Linux just own CPUs and don't care about other blocks in the system ? Irrespective of what it owns, system shutdown will take care. > Lets take an example of smartnic case Where NIC master and CPUs access > cachable DDR. In smarnic its quite common to bring CPUs on demand means > when needed via MCU help. Yes you are talking about CPU hotplug or system here ? The above indicates, it's just CPU hotplug and a solution already exists for that. > Now in full-fledged system. if CPU subsystem is shutdown via poweroff > command which does not bring secondary CPUs out of coherency domain, > it will bring the complete system unstable when NIC master tries to > access DDR and snoop is send to CPUs as well which is not available. > Fabric/System hangs... > That's because your custom solution just sends ipi to stop CPUs. If you shutdown the system, all the required information is save to non-volatile memory and system is powered off gracefully. > I feel While shutting down the CPUs subsystem or powering off, All > secondary CPUs must be shutdown properly by bring-out of coherency > domain to remain rest of subsystem usable. I agree that introducing > PSCI call introduce delay for shutdown/reboot case but stability > matter than little delay. IPI_STOP is not designed to do a graceful shutdown of CPU subsystem. Use CPU hotplug. You are trying to make you custom requirement a generic one. We have CPU hotplug framework to do what you want, you just have to use it. Infact you are doing almost the same with you patch, I don't see any point as why CPU hotplug can't be used. -- Regards, Sudeep
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 1598d6f..360e52b 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -822,8 +822,13 @@ static void ipi_cpu_stop(unsigned int cpu) local_daif_mask(); sdei_mask_local_cpu(); +#ifdef CONFIG_HOTPLUG_CPU + if (cpu_ops[cpu]->cpu_die) + cpu_ops[cpu]->cpu_die(cpu); +#else while (1) cpu_relax(); +#endif } #ifdef CONFIG_KEXEC_CORE