Message ID: 1359651943-21752-4-git-send-email-dinguyen@altera.com (mailing list archive)
State: New, archived
On 01/31/2013 10:05 AM, dinguyen@altera.com wrote: > From: Dinh Nguyen <dinguyen@altera.com> > > mach-socfpga is another platform that needs to use > v7_invalidate_l1 to bringup additional cores. There was a comment that > the ideal place for v7_invalidate_l1 should be in arm/mm/cache-v7.S Tested-by: Stephen Warren <swarren@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> (on Tegra30 Cardhu, with this patch applied on top of Tegra's for-next branch, played audio and did repeated CPU hotplugs) There will be a minor merge conflict when this patch is merged with the Tegra tree changes, but it's trivial to resolve; some other code was removed from this file right next to this function.
Hi, On Thu, Jan 31, 2013 at 11:05:42AM -0600, dinguyen@altera.com wrote: > From: Dinh Nguyen <dinguyen@altera.com> > > mach-socfpga is another platform that needs to use > v7_invalidate_l1 to bringup additional cores. There was a comment that > the ideal place for v7_invalidate_l1 should be in arm/mm/cache-v7.S > > Signed-off-by: Dinh Nguyen <dinguyen@altera.com> > Acked-by: Simon Horman <horms+renesas@verge.net.au> > Tested-by: Pavel Machek <pavel@denx.de> > Reviewed-by: Pavel Machek <pavel@denx.de> > Cc: Arnd Bergmann <arnd@arndb.de> > Cc: Russell King <linux@arm.linux.org.uk> > Cc: Olof Johansson <olof@lixom.net> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Rob Herring <rob.herring@calxeda.com> > Cc: Sascha Hauer <kernel@pengutronix.de> > Cc: Magnus Damm <magnus.damm@gmail.com> > Cc: Stephen Warren <swarren@wwwdotorg.org> This should be merged through Russell's tree. Please feed it to his patch tracker. -Olof
+ Lorenzo, On Thursday 31 January 2013 10:35 PM, dinguyen@altera.com wrote: > From: Dinh Nguyen <dinguyen@altera.com> > > mach-socfpga is another platform that needs to use > v7_invalidate_l1 to bringup additional cores. There was a comment that > the ideal place for v7_invalidate_l1 should be in arm/mm/cache-v7.S > > Signed-off-by: Dinh Nguyen <dinguyen@altera.com> > Acked-by: Simon Horman <horms+renesas@verge.net.au> > Tested-by: Pavel Machek <pavel@denx.de> > Reviewed-by: Pavel Machek <pavel@denx.de> > Cc: Arnd Bergmann <arnd@arndb.de> > Cc: Russell King <linux@arm.linux.org.uk> > Cc: Olof Johansson <olof@lixom.net> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Rob Herring <rob.herring@calxeda.com> > Cc: Sascha Hauer <kernel@pengutronix.de> > Cc: Magnus Damm <magnus.damm@gmail.com> > Cc: Stephen Warren <swarren@wwwdotorg.org> > --- > arch/arm/mach-imx/headsmp.S | 47 ------------------------------------- > arch/arm/mach-shmobile/headsmp.S | 48 -------------------------------------- > arch/arm/mach-tegra/headsmp.S | 43 ---------------------------------- > arch/arm/mm/cache-v7.S | 46 ++++++++++++++++++++++++++++++++++++ > 4 files changed, 46 insertions(+), 138 deletions(-) > [..] > diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S > index 7539ec2..15451ee 100644 > --- a/arch/arm/mm/cache-v7.S > +++ b/arch/arm/mm/cache-v7.S > @@ -19,6 +19,52 @@ > #include "proc-macros.S" > > /* > + * The secondary kernel init calls v7_flush_dcache_all before it enables > + * the L1; however, the L1 comes out of reset in an undefined state, so > + * the clean + invalidate performed by v7_flush_dcache_all causes a bunch > + * of cache lines with uninitialized data and uninitialized tags to get > + * written out to memory, which does really unpleasant things to the main > + * processor. We fix this by performing an invalidate, rather than a > + * clean + invalidate, before jumping into the kernel. > + * > + * This function is cloned from arch/arm/mach-tegra/headsmp.S, and needs > + * to be called for both secondary cores startup and primary core resume > + * procedures. > + */ > +ENTRY(v7_invalidate_l1) Now since we are moving the code to a common place, probably we should update this function a bit so that it invalidates the CPU cache up to the level of unification, just to be consistent with the other flush APIs. Lorenzo, does that make sense? Regards, Santosh
On Fri, Feb 01, 2013 at 04:59:44PM +0530, Santosh Shilimkar wrote: > Now since we are moving the code under common place, probably we should > update this a function a bit so that it invalidates the CPU cache till > line of unification. Just to be consistent with other flush API. Hmm. Do you really want a CPU being brought up to do that to the PoU, every time that it is brought up? I thought you wanted to get rid of that kind of stuff from the hotplug paths so that a CPU being brought up/taken down doesn't affect the caches for the other CPUs within the inner sharable domain.
On Friday 01 February 2013 05:02 PM, Russell King - ARM Linux wrote: > On Fri, Feb 01, 2013 at 04:59:44PM +0530, Santosh Shilimkar wrote: >> Now since we are moving the code under common place, probably we should >> update this a function a bit so that it invalidates the CPU cache till >> line of unification. Just to be consistent with other flush API. > > Hmm. Do you really want a CPU being brought up to do that to the PoU, > every time that it is brought up? I thought you wanted to get rid of > that kind of stuff from the hotplug paths so that a CPU being brought > up/taken down doesn't affect the caches for the other CPUs within the > inner sharable domain. > You are right. We already got rid of the flush of all cache levels in hotplug and wakeup paths and now it is restricted till the PoU. Assuming for the current v7 machines, PoU is L2, invalidating the cache *till* PoU means only the CPU-local cache. So the API will in a way invalidate only the local cache. Maybe I am missing your point here. Regards, Santosh
On Fri, Feb 01, 2013 at 11:29:44AM +0000, Santosh Shilimkar wrote: > + Lorenzo, > > On Thursday 31 January 2013 10:35 PM, dinguyen@altera.com wrote: > > From: Dinh Nguyen <dinguyen@altera.com> > > > > mach-socfpga is another platform that needs to use > > v7_invalidate_l1 to bringup additional cores. There was a comment that > > the ideal place for v7_invalidate_l1 should be in arm/mm/cache-v7.S > > > > Signed-off-by: Dinh Nguyen <dinguyen@altera.com> > > Acked-by: Simon Horman <horms+renesas@verge.net.au> > > Tested-by: Pavel Machek <pavel@denx.de> > > Reviewed-by: Pavel Machek <pavel@denx.de> > > Cc: Arnd Bergmann <arnd@arndb.de> > > Cc: Russell King <linux@arm.linux.org.uk> > > Cc: Olof Johansson <olof@lixom.net> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Rob Herring <rob.herring@calxeda.com> > > Cc: Sascha Hauer <kernel@pengutronix.de> > > Cc: Magnus Damm <magnus.damm@gmail.com> > > Cc: Stephen Warren <swarren@wwwdotorg.org> > > --- > > arch/arm/mach-imx/headsmp.S | 47 ------------------------------------- > > arch/arm/mach-shmobile/headsmp.S | 48 -------------------------------------- > > arch/arm/mach-tegra/headsmp.S | 43 ---------------------------------- > > arch/arm/mm/cache-v7.S | 46 ++++++++++++++++++++++++++++++++++++ > > 4 files changed, 46 insertions(+), 138 deletions(-) > > > [..] > > > diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S > > index 7539ec2..15451ee 100644 > > --- a/arch/arm/mm/cache-v7.S > > +++ b/arch/arm/mm/cache-v7.S > > @@ -19,6 +19,52 @@ > > #include "proc-macros.S" > > > > /* > > + * The secondary kernel init calls v7_flush_dcache_all before it enables > > + * the L1; however, the L1 comes out of reset in an undefined state, so > > + * the clean + invalidate performed by v7_flush_dcache_all causes a bunch > > + * of cache lines with uninitialized data and uninitialized tags to get > > + * written out to memory, which does really unpleasant things to the main > > + * processor. We fix this by performing an invalidate, rather than a > > + * clean + invalidate, before jumping into the kernel. > > + * > > + * This function is cloned from arch/arm/mach-tegra/headsmp.S, and needs > > + * to be called for both secondary cores startup and primary core resume > > + * procedures. > > + */ > > +ENTRY(v7_invalidate_l1) > Now since we are moving the code under common place, probably we should > update this a function a bit so that it invalidates the CPU cache till > line of unification. Just to be consistent with other flush API. > > Lorenzo, > Does that make sense ? Well, on latest processors (A15, A7) caches are invalidated on reset unless the chip is programmed to skip that on reset (ie L2 retained). But it makes sense, for sure it should not be called v7_invalidate_l1, but invalidate_louis, and instead of forcing level 0 we should be reading LoUIS and invalidate up to that level as we do in the clean and invalidate function. Is it worth adding a v7 cache function where the only difference wrt the clean and invalidate operation is a coprocessor opcode ? Probably not IMHO, why add the set/way loop again ? It is never called from C code, so I do not think there is a point in adding a C API either. Lorenzo
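For reference, the LoUIS field Lorenzo refers to lives in CLIDR[23:21] on ARMv7 with the multiprocessing extensions. A minimal sketch of reading it, assuming GCC inline asm in C; the helper name clidr_louis is invented for illustration, this is not code from the patch:

	/* Sketch only: read the Level of Unification Inner Shareable from CLIDR. */
	static inline unsigned int clidr_louis(void)
	{
		unsigned int clidr;

		/* CLIDR is read via: mrc p15, 1, <Rt>, c0, c0, 1 */
		asm volatile("mrc p15, 1, %0, c0, c0, 1" : "=r" (clidr));
		return (clidr >> 21) & 0x7;	/* CLIDR[23:21] = LoUIS */
	}

The "coprocessor opcode" difference Lorenzo mentions is that the set/way loop stays identical and only the final MCR changes: c7, c6, 2 (DCISW, invalidate by set/way, no writeback) versus c7, c14, 2 (DCCISW, clean and invalidate by set/way).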
On Friday 01 February 2013 05:41 PM, Lorenzo Pieralisi wrote: > On Fri, Feb 01, 2013 at 11:29:44AM +0000, Santosh Shilimkar wrote: >> + Lorenzo, >> >> On Thursday 31 January 2013 10:35 PM, dinguyen@altera.com wrote: >>> From: Dinh Nguyen <dinguyen@altera.com> >>> >>> mach-socfpga is another platform that needs to use >>> v7_invalidate_l1 to bringup additional cores. There was a comment that >>> the ideal place for v7_invalidate_l1 should be in arm/mm/cache-v7.S >>> >>> Signed-off-by: Dinh Nguyen <dinguyen@altera.com> >>> Acked-by: Simon Horman <horms+renesas@verge.net.au> >>> Tested-by: Pavel Machek <pavel@denx.de> >>> Reviewed-by: Pavel Machek <pavel@denx.de> >>> Cc: Arnd Bergmann <arnd@arndb.de> >>> Cc: Russell King <linux@arm.linux.org.uk> >>> Cc: Olof Johansson <olof@lixom.net> >>> Cc: Thomas Gleixner <tglx@linutronix.de> >>> Cc: Rob Herring <rob.herring@calxeda.com> >>> Cc: Sascha Hauer <kernel@pengutronix.de> >>> Cc: Magnus Damm <magnus.damm@gmail.com> >>> Cc: Stephen Warren <swarren@wwwdotorg.org> >>> --- >>> arch/arm/mach-imx/headsmp.S | 47 ------------------------------------- >>> arch/arm/mach-shmobile/headsmp.S | 48 -------------------------------------- >>> arch/arm/mach-tegra/headsmp.S | 43 ---------------------------------- >>> arch/arm/mm/cache-v7.S | 46 ++++++++++++++++++++++++++++++++++++ >>> 4 files changed, 46 insertions(+), 138 deletions(-) >>> >> [..] >> >>> diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S >>> index 7539ec2..15451ee 100644 >>> --- a/arch/arm/mm/cache-v7.S >>> +++ b/arch/arm/mm/cache-v7.S >>> @@ -19,6 +19,52 @@ >>> #include "proc-macros.S" >>> >>> /* >>> + * The secondary kernel init calls v7_flush_dcache_all before it enables >>> + * the L1; however, the L1 comes out of reset in an undefined state, so >>> + * the clean + invalidate performed by v7_flush_dcache_all causes a bunch >>> + * of cache lines with uninitialized data and uninitialized tags to get >>> + * written out to memory, which does really unpleasant things to the main >>> + * processor. We fix this by performing an invalidate, rather than a >>> + * clean + invalidate, before jumping into the kernel. >>> + * >>> + * This function is cloned from arch/arm/mach-tegra/headsmp.S, and needs >>> + * to be called for both secondary cores startup and primary core resume >>> + * procedures. >>> + */ >>> +ENTRY(v7_invalidate_l1) >> Now since we are moving the code under common place, probably we should >> update this a function a bit so that it invalidates the CPU cache till >> line of unification. Just to be consistent with other flush API. >> >> Lorenzo, >> Does that make sense ? > > Well, on latest processors (A15, A7) caches are invalidated on reset unless > the chip is programmed to skip that on reset (ie L2 retained). > > But it makes sense, for sure it should not be called v7_invalidate_l1, > but invalidate_louis, and instead of forcing level 0 we should be > reading LoUIS and invalidate up to that level as we do in the clean and > invalidate function. > That is exactly what I was thinking. > Is it worth adding a v7 cache function where the only difference wrt the clean > and invalidate operation is a coprocessor opcode ? Probably not IMHO, why add > the set/way loop again ? > Probably the same function can be re-used, with a parameter passed in to differentiate the inv and flush. > It is never called from C code, so I do not think there is a point in > adding a C API either. > I agree. A C API isn't needed. Regards, Santosh
On Fri, Feb 01, 2013 at 05:14:11PM +0530, Santosh Shilimkar wrote: > On Friday 01 February 2013 05:02 PM, Russell King - ARM Linux wrote: >> On Fri, Feb 01, 2013 at 04:59:44PM +0530, Santosh Shilimkar wrote: >>> Now since we are moving the code under common place, probably we should >>> update this a function a bit so that it invalidates the CPU cache till >>> line of unification. Just to be consistent with other flush API. >> >> Hmm. Do you really want a CPU being brought up to do that to the PoU, >> every time that it is brought up? I thought you wanted to get rid of >> that kind of stuff from the hotplug paths so that a CPU being brought >> up/taken down doesn't affect the caches for the other CPUs within the >> inner sharable domain. >> > You are right. We already git rid of the flush of all cache levels > in hotplug and wakeup paths and now it is restricted till the PoU. > > Assuming for the current v7 machines, PoU is L2, invalidating the cache > *till* PoU means only CPU local cache. So the API will in a way > invalidate only local cache. Err, you want to _invalidate_ the caches down to the point of I/D/TLB unification? Are you really sure you want to do that on a system where other CPUs are running? Even going down to the LoUIS, that point is the point at which the _other_ CPUs may be sharing caches. And invalidating those caches while the other CPUs are running on secondary CPU startup will be VERY VERY VERY bad.
On Fri, Feb 01, 2013 at 12:11:38PM +0000, Lorenzo Pieralisi wrote: > Well, on latest processors (A15, A7) caches are invalidated on reset unless > the chip is programmed to skip that on reset (ie L2 retained). > > But it makes sense, for sure it should not be called v7_invalidate_l1, > but invalidate_louis, and instead of forcing level 0 we should be > reading LoUIS and invalidate up to that level as we do in the clean and > invalidate function. No. Think about it. c7, c6, 2 _invalidates_ the cache. That means any data contained within the cache is discarded. Data is not written back. If you do this down to the LoUIS, that includes all cache levels in the inner sharable domain. The inner sharable domain includes the other CPUs in the system which may already be running (certainly the boot CPU will be running). Are you _really_ sure you want to be invalidating _valid_ data held in caches in the inner sharable domain by other CPUs, rather than just the cache associated with the CPU which is being brought online?
On Friday 01 February 2013 06:18 PM, Russell King - ARM Linux wrote: > On Fri, Feb 01, 2013 at 05:14:11PM +0530, Santosh Shilimkar wrote: >> On Friday 01 February 2013 05:02 PM, Russell King - ARM Linux wrote: >>> On Fri, Feb 01, 2013 at 04:59:44PM +0530, Santosh Shilimkar wrote: >>>> Now since we are moving the code under common place, probably we should >>>> update this a function a bit so that it invalidates the CPU cache till >>>> line of unification. Just to be consistent with other flush API. >>> >>> Hmm. Do you really want a CPU being brought up to do that to the PoU, >>> every time that it is brought up? I thought you wanted to get rid of >>> that kind of stuff from the hotplug paths so that a CPU being brought >>> up/taken down doesn't affect the caches for the other CPUs within the >>> inner sharable domain. >>> >> You are right. We already git rid of the flush of all cache levels >> in hotplug and wakeup paths and now it is restricted till the PoU. >> >> Assuming for the current v7 machines, PoU is L2, invalidating the cache >> *till* PoU means only CPU local cache. So the API will in a way >> invalidate only local cache. > > Err, you want to _invalidate_ the caches down to the point of I/D/TLB > unification? Are you really sure you want to do that on a system > here other CPUs are running? > > Even going down to the LoUIS, that point is the point at which the > _other_ CPUs may be sharing caches. > > And invalidating those caches while the other CPUs are running on > secondary CPU startup will be VERY VERY VERY bad. > Absolutely, and my intention was never to invalidate all the cache levels. When I said lous, I meant till that point, not including that and the next cache levels. Maybe my terminology isn't accurate. Regards, Santosh
On Fri, Feb 01, 2013 at 06:34:06PM +0530, Santosh Shilimkar wrote: > On Friday 01 February 2013 06:18 PM, Russell King - ARM Linux wrote: >> On Fri, Feb 01, 2013 at 05:14:11PM +0530, Santosh Shilimkar wrote: >>> On Friday 01 February 2013 05:02 PM, Russell King - ARM Linux wrote: >>>> On Fri, Feb 01, 2013 at 04:59:44PM +0530, Santosh Shilimkar wrote: >>>>> Now since we are moving the code under common place, probably we should >>>>> update this a function a bit so that it invalidates the CPU cache till >>>>> line of unification. Just to be consistent with other flush API. >>>> >>>> Hmm. Do you really want a CPU being brought up to do that to the PoU, >>>> every time that it is brought up? I thought you wanted to get rid of >>>> that kind of stuff from the hotplug paths so that a CPU being brought >>>> up/taken down doesn't affect the caches for the other CPUs within the >>>> inner sharable domain. >>>> >>> You are right. We already git rid of the flush of all cache levels >>> in hotplug and wakeup paths and now it is restricted till the PoU. >>> >>> Assuming for the current v7 machines, PoU is L2, invalidating the cache >>> *till* PoU means only CPU local cache. So the API will in a way >>> invalidate only local cache. >> >> Err, you want to _invalidate_ the caches down to the point of I/D/TLB >> unification? Are you really sure you want to do that on a system >> here other CPUs are running? >> >> Even going down to the LoUIS, that point is the point at which the >> _other_ CPUs may be sharing caches. >> >> And invalidating those caches while the other CPUs are running on >> secondary CPU startup will be VERY VERY VERY bad. >> > Absolutly and my intention was never to invalidate all the cache > levels. When I said lous, I mean till that point and not including > that and next cache levels. May be my terminology isn't accurate. Confused. Can't find the term "lous" in your previous mails. I'll assume you mean LoUIS. I'm saying that's totally wrong because going down to that point _includes_ the other CPUs in the system. What we should be doing on secondary CPU bringup for CPUs where the caches are in an unknown state is invalidating those caches which are local to _that_ _CPU_ _only_. That is not "all cache levels down to LoUIS". Here's an example. CPU0-3 are in the inner sharable domain. +----------+ +----------+ +----------+ +----------+ | CPU0 | | CPU1 | | CPU2 | | CPU3 | +----------+ +----------+ +----------+ +----------+ | | | | | | | | +--v-+--v-+ +--v-+--v-+ +--v-+--v-+ +--v-+--v-+ |CL0I|CL0D| |CL0I|CL0D| |CL0I|CL0D| |CL0I|CL0D| <-- cache level 0 +----+----+ +----+----+ +----+----+ +----+----+ | | | | | | | | | | +---+ +--+ | | | +------------------+ | | | | | +-------|--------------+ | | | | +-------+ | | | | | | | +-----------------|-----+ | | | | | | +-----------+ | | | | +---|--------|--------|--------+ | | | +--v--------v--------v--------v--+ +--v--------v--------v--------v--+ | CL1I | | CL1D | level 1 +--------------------------------+ +--------------------------------+ | | +----------------v-----------------------------------v---------------+ | CL2 | level 2 +--------------------------------------------------------------------+ Therefore, because the point of unification for the inner sharable domain is defined as the point at which the I, D and TLB streams (TLB not shown) are combined, this happens at CL2, and does *not* include CL2. CL2 is where the two paths see the same data. So, the LoUIS includes caches in level 0 and level 1. 
However, CL1 _is_ shared between CPU0-3 - it is part of the inner sharable domain. If you invalidate CL1, then you're destroying data held there by the other CPUs. So, invalidating down to the LoUIS on secondary CPU bringup is _wrong_.
On Friday 01 February 2013 06:50 PM, Russell King - ARM Linux wrote: > On Fri, Feb 01, 2013 at 06:34:06PM +0530, Santosh Shilimkar wrote: >> On Friday 01 February 2013 06:18 PM, Russell King - ARM Linux wrote: >>> On Fri, Feb 01, 2013 at 05:14:11PM +0530, Santosh Shilimkar wrote: >>>> On Friday 01 February 2013 05:02 PM, Russell King - ARM Linux wrote: >>>>> On Fri, Feb 01, 2013 at 04:59:44PM +0530, Santosh Shilimkar wrote: >>>>>> Now since we are moving the code under common place, probably we should >>>>>> update this a function a bit so that it invalidates the CPU cache till >>>>>> line of unification. Just to be consistent with other flush API. >>>>> >>>>> Hmm. Do you really want a CPU being brought up to do that to the PoU, >>>>> every time that it is brought up? I thought you wanted to get rid of >>>>> that kind of stuff from the hotplug paths so that a CPU being brought >>>>> up/taken down doesn't affect the caches for the other CPUs within the >>>>> inner sharable domain. >>>>> >>>> You are right. We already git rid of the flush of all cache levels >>>> in hotplug and wakeup paths and now it is restricted till the PoU. >>>> >>>> Assuming for the current v7 machines, PoU is L2, invalidating the cache >>>> *till* PoU means only CPU local cache. So the API will in a way >>>> invalidate only local cache. >>> >>> Err, you want to _invalidate_ the caches down to the point of I/D/TLB >>> unification? Are you really sure you want to do that on a system >>> here other CPUs are running? >>> >>> Even going down to the LoUIS, that point is the point at which the >>> _other_ CPUs may be sharing caches. >>> >>> And invalidating those caches while the other CPUs are running on >>> secondary CPU startup will be VERY VERY VERY bad. >>> >> Absolutly and my intention was never to invalidate all the cache >> levels. When I said lous, I mean till that point and not including >> that and next cache levels. May be my terminology isn't accurate. > > Confused. Can't find the term "lous" in your previous mails. I'll > assume you mean LoUIS. I'm saying that's totally wrong because going > down to that point _includes_ the other CPUs in the system. > Sorry for the typo. I meant LoUIS. > What we should be doing on secondary CPU bringup for CPUs where the > caches are in an unknown state is invalidating those caches which are > local to _that_ _CPU_ _only_. That is not "all cache levels down to > LoUIS". > Restricting it to the local CPU cache is also my point. My example was a bit narrow, given current A9 and A15 designs where there are only L1 and L2 caches, and hence I had not considered the case below. > Here's an example. CPU0-3 are in the inner sharable domain. 
> 
> +----------+ +----------+ +----------+ +----------+ 
> | CPU0 | | CPU1 | | CPU2 | | CPU3 | 
> +----------+ +----------+ +----------+ +----------+ 
> | | | | | | | | 
> +--v-+--v-+ +--v-+--v-+ +--v-+--v-+ +--v-+--v-+ 
> |CL0I|CL0D| |CL0I|CL0D| |CL0I|CL0D| |CL0I|CL0D| <-- cache level 0 
> +----+----+ +----+----+ +----+----+ +----+----+ 
> | | | | | | | | 
> | | +---+ +--+ | | | +------------------+ 
> | | | | | +-------|--------------+ | 
> | | | +-------+ | | | 
> | | | | +-----------------|-----+ | | 
> | | | | +-----------+ | | | 
> | +---|--------|--------|--------+ | | | 
> +--v--------v--------v--------v--+ +--v--------v--------v--------v--+ 
> | CL1I | | CL1D | level 1 
> +--------------------------------+ +--------------------------------+ 
> | | 
> +----------------v-----------------------------------v---------------+ 
> | CL2 | level 2 
> +--------------------------------------------------------------------+ 
> 
> Therefore, because the point of unification for the inner sharable domain 
> is defined as the point at which the I, D and TLB streams (TLB not shown) 
> are combined, this happens at CL2, and does *not* include CL2. CL2 is 
> where the two paths see the same data. 
> 
> So, the LoUIS includes caches in level 0 and level 1. 
That's right for the above example. I haven't seen such a design so far, and hence my view was narrow. 
> However, CL1 _is_ shared between CPU0-3 - it is part of the inner sharable 
> domain. If you invalidate CL1, then you're destroying data held there by 
> the other CPUs. So, invalidating down to the LoUIS on secondary CPU 
> bringup is _wrong_. 
As I said above, I didn't consider the case you mentioned here. Thanks for bringing up this example. I am aligned with you on a _NO_ on invalidating, from one CPU, a cache level which is shared across multiple CPUs. Regards, Santosh
On Fri, Feb 01, 2013 at 12:54:17PM +0000, Russell King - ARM Linux wrote: > On Fri, Feb 01, 2013 at 12:11:38PM +0000, Lorenzo Pieralisi wrote: > > Well, on latest processors (A15, A7) caches are invalidated on reset unless > > the chip is programmed to skip that on reset (ie L2 retained). > > > > But it makes sense, for sure it should not be called v7_invalidate_l1, > > but invalidate_louis, and instead of forcing level 0 we should be > > reading LoUIS and invalidate up to that level as we do in the clean and > > invalidate function. > > No. Think about it. c7, c6, 2 _invalidates_ the cache. That means any > data contained within the cache is discarded. Data is not written back. > > If you do this down to the LoUIS, that includes all cache levels in the > inner sharable domain. The inner sharable domain includes the other CPUs > in the system which may already be running (certainly the boot CPU will > be running). On all v7 ARM systems I know of, LoUIS corresponds to L1, and using LoUIS is a plaster that works for current systems (where by LoUIS I mean all cache levels down to, but not inclusive of, LoUIS). What you are saying is true for the cpu_suspend code and hotplug as well: there we are cleaning and invalidating down to the LoUIS, which might not be needed, since LoUIS does not mean that all cache levels included are _local_ to the CPU. There is no way on ARM systems to detect what cache levels are local to a CPU, so we decided to use LoUIS for that, as a temporary solution to avoid cluttering the code with DT bindings that link cache topology and CPU topology. And on all ARM systems in the mainline, "operating on all levels down to but not inclusive of LoUIS" is equivalent to cache levels local to a CPU; if it is not, I am missing something and I apologize. > Are you _really_ sure you want to be invalidating _valid_ data held in > caches in the inner sharable domain by other CPUs, rather than just the > cache associated with the CPU which is being brought online? We know that LoUIS corresponds to cache levels local to a CPU for now, and we know that it will fail as soon as there are caches in the inner shareability domain that belong to multiple CPUs. Again, that's valid for the clean and invalidate functions as well (though there it is an optimization problem; here, I agree, it is a validity issue and more serious). There is no way to detect cache levels local to a CPU; I tried to do that before posting the new LoUIS API, and all I can say is that we need device tree bindings to do that cleanly, as architecturally it is just not detectable. I am happy to discuss a way forward, Russell; your concerns are correct, we are aware of them, and we are happy to improve the code to make it work consistently. Thanks, Lorenzo
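As background to "architecturally it is just not detectable": CLIDR does describe which cache levels exist and their types, just not whether a level is private to the current CPU. A sketch of that walk, assuming GCC inline asm on ARMv7; the function name is invented and this is not code from the thread:

	/* Sketch: enumerate cache levels from CLIDR. Each level 1-7 has a
	 * 3-bit Ctype field: 0 = no cache, 1 = instruction only, 2 = data
	 * only, 3 = separate I+D, 4 = unified. Nothing here says whether a
	 * level is local to this CPU or shared, which is exactly the gap
	 * Lorenzo describes above.
	 */
	static unsigned int count_cache_levels(void)
	{
		unsigned int clidr, level;

		asm volatile("mrc p15, 1, %0, c0, c0, 1" : "=r" (clidr));
		for (level = 0; level < 7; level++)
			if (((clidr >> (3 * level)) & 0x7) == 0)
				break;	/* no cache at this level or beyond */
		return level;
	}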
On Fri, Feb 01, 2013 at 02:10:52PM +0000, Lorenzo Pieralisi wrote: > On Fri, Feb 01, 2013 at 12:54:17PM +0000, Russell King - ARM Linux wrote: > > On Fri, Feb 01, 2013 at 12:11:38PM +0000, Lorenzo Pieralisi wrote: > > > Well, on latest processors (A15, A7) caches are invalidated on reset unless > > > the chip is programmed to skip that on reset (ie L2 retained). > > > > > > But it makes sense, for sure it should not be called v7_invalidate_l1, > > > but invalidate_louis, and instead of forcing level 0 we should be > > > reading LoUIS and invalidate up to that level as we do in the clean and > > > invalidate function. > > > > No. Think about it. c7, c6, 2 _invalidates_ the cache. That means any > > data contained within the cache is discarded. Data is not written back. > > > > If you do this down to the LoUIS, that includes all cache levels in the > > inner sharable domain. The inner sharable domain includes the other CPUs > > in the system which may already be running (certainly the boot CPU will > > be running). > > On all v7 ARM systems I know of LoUIS correspond to L1 and using LoUIS is a > plaster that works for current systems (where by LoUIS I mean all cache > levels down to but not inclusive of, LoUIS). All that I'm saying is that suggesting that v7_invalidate_l1 should go down to LoUIS is wrong, in much the same way as _invalidating_ only the first level of cache. However, invalidating the first level of cache only is safer than invalidating down to LoUIS. The only path which needs this is the secondary CPU bringup path; that's the only path we have where some platforms give us a CPU which has only just come out of reset, and so the caches for that CPU may be in an unknown state. And it only happens on a small subset of platforms. Currently, that small subset of platforms only needs the first level of cache invalidating. (Most platforms don't need this; for most platforms this would be a waste of time - and people seem to care about hotplug times.) So, that's what we should do. And v7_invalidate_l1 is a perfectly reasonable name for a function to do that for the V7 architecture. As has been pointed out, there are several duplications of that. That's fine, let's move them into the v7 cache code. But... let's not change how they work and go through a pointless design exercise for invalidating more levels of cache when we know that no platform needs it. If/when we do end up with a platform which requires it, we can deal with it then. But let's not lead people down the route of thinking that LoUIS is suitable here when it isn't.
On Fri, Feb 01, 2013 at 02:19:14PM +0000, Russell King - ARM Linux wrote: > On Fri, Feb 01, 2013 at 02:10:52PM +0000, Lorenzo Pieralisi wrote: > > On Fri, Feb 01, 2013 at 12:54:17PM +0000, Russell King - ARM Linux wrote: > > > On Fri, Feb 01, 2013 at 12:11:38PM +0000, Lorenzo Pieralisi wrote: > > > > Well, on latest processors (A15, A7) caches are invalidated on reset unless > > > > the chip is programmed to skip that on reset (ie L2 retained). > > > > > > > > But it makes sense, for sure it should not be called v7_invalidate_l1, > > > > but invalidate_louis, and instead of forcing level 0 we should be > > > > reading LoUIS and invalidate up to that level as we do in the clean and > > > > invalidate function. > > > > > > No. Think about it. c7, c6, 2 _invalidates_ the cache. That means any > > > data contained within the cache is discarded. Data is not written back. > > > > > > If you do this down to the LoUIS, that includes all cache levels in the > > > inner sharable domain. The inner sharable domain includes the other CPUs > > > in the system which may already be running (certainly the boot CPU will > > > be running). > > > > On all v7 ARM systems I know of LoUIS correspond to L1 and using LoUIS is a > > plaster that works for current systems (where by LoUIS I mean all cache > > levels down to but not inclusive of, LoUIS). > > All that I'm saying is that suggesting that v7_invalidate_l1 should go > down to LoUIS is wrong, in much the same way as _invalidating_ only the > first level of cache. > > However, invalidating the first level of cache only is safer than > invalidating down to LoUIS. > > The only path which needs this is the secondary CPU bringup path; that's > the only path we have where some platforms give us a CPU which has only > just come out of reset, and so the caches for that CPU may be in an > unknown state. And it only happens to a small subset of platforms. > > Currently, that small subset of platforms only need the first level of > cache invalidating. (Most platforms don't need this; most platforms > this would be a waste of time - and people seem to care about hotplug > times.) > > So, that's what we should do. And v7_invalidate_l1 is a perfectly > reasonable name for a function to do that for the V7 architecture. > > As has been pointed out, there's several duplications of that. That's > fine, let's move them into the v7 cache code. But... let's not change > how they work and go through a pointless design exercise for > invalidating more levels of cache when we know that no platform needs > it. > > If/when we do end up with a platform which requires it, we can deal > with it then. But let's not lead people down the route of thinking > that LoUIS is suitable here when it isn't. Just to further provide some insight into the reasoning: Invalidating data out of a working cache risks data corruption; maybe the data being invalidated is filesystem metadata which was about to be cleaned and written back to storage. That risks filesystem corruption. Invalidating fewer levels than are actually required is different: we may leave dirty cache lines behind which may be evicted, but there's also the chance that the CPU will end up _reading_ from its uninitialized caches and may crash before that happens. So, the risks are: 1. invalidate more levels than are necessary and risk discarding data which other CPUs are using, which may be important data. 2. 
invalidate fewer levels than are necessary and risk writing out data from the CPU cache, which may or may not happen _before_ the CPU crashes due to reading invalid data. Out of those two, (2) sounds to me to be the safer approach. Plus, I can't think of a reason why you'd want to put on an SMP system more than one layer of CPU-local caches... to do so would seem to me to be an exercise in coherency complexity... So, I suspect that in the real world, we will _never_ see any system which has more than one layer of caches local to the CPU. But we may see a system with a cache architecture similar to the one I drew in my email to Santosh.
On Fri, Feb 01, 2013 at 02:19:14PM +0000, Russell King - ARM Linux wrote: > On Fri, Feb 01, 2013 at 02:10:52PM +0000, Lorenzo Pieralisi wrote: > > On Fri, Feb 01, 2013 at 12:54:17PM +0000, Russell King - ARM Linux wrote: > > > On Fri, Feb 01, 2013 at 12:11:38PM +0000, Lorenzo Pieralisi wrote: > > > > Well, on latest processors (A15, A7) caches are invalidated on reset unless > > > > the chip is programmed to skip that on reset (ie L2 retained). > > > > > > > > But it makes sense, for sure it should not be called v7_invalidate_l1, > > > > but invalidate_louis, and instead of forcing level 0 we should be > > > > reading LoUIS and invalidate up to that level as we do in the clean and > > > > invalidate function. > > > > > > No. Think about it. c7, c6, 2 _invalidates_ the cache. That means any > > > data contained within the cache is discarded. Data is not written back. > > > > > > If you do this down to the LoUIS, that includes all cache levels in the > > > inner sharable domain. The inner sharable domain includes the other CPUs > > > in the system which may already be running (certainly the boot CPU will > > > be running). > > > > On all v7 ARM systems I know of LoUIS correspond to L1 and using LoUIS is a > > plaster that works for current systems (where by LoUIS I mean all cache > > levels down to but not inclusive of, LoUIS). > > All that I'm saying is that suggesting that v7_invalidate_l1 should go > down to LoUIS is wrong, in much the same way as _invalidating_ only the > first level of cache. > > However, invalidating the first level of cache only is safer than > invalidating down to LoUIS. > > The only path which needs this is the secondary CPU bringup path; that's > the only path we have where some platforms give us a CPU which has only > just come out of reset, and so the caches for that CPU may be in an > unknown state. And it only happens to a small subset of platforms. > > Currently, that small subset of platforms only need the first level of > cache invalidating. (Most platforms don't need this; most platforms > this would be a waste of time - and people seem to care about hotplug > times.) > > So, that's what we should do. And v7_invalidate_l1 is a perfectly > reasonable name for a function to do that for the V7 architecture. > > As has been pointed out, there's several duplications of that. That's > fine, let's move them into the v7 cache code. But... let's not change > how they work and go through a pointless design exercise for > invalidating more levels of cache when we know that no platform needs > it. > > If/when we do end up with a platform which requires it, we can deal > with it then. But let's not lead people down the route of thinking > that LoUIS is suitable here when it isn't. Your point is fair Russell, and I agree with that in the context you provided. Last thing I want to mention is that duplicating code that loops through sets/ways is not that great, but I also understand that refactoring the code to share the loop in v7_flush_dcache_all might do more harm than good and introduce bugs, so let's merge the invalidate_l1 function as it stands. Thank you, Lorenzo
On Friday 01 February 2013 08:01 PM, Russell King - ARM Linux wrote: > Just to further provide some insight into the reasoning: > > Invalidating data out of a working cache risks data corruption; maybe > the data being invalidated is filesystem metadata which was about to > be cleaned and written back to storage. That risks filesystem > corruption. > > Invalidating fewer levels than are actually required is different: we > may leave dirty cache lines behind which may be evicted, but there's > also the chance that the CPU will end up _reading_ from its > uninitialized caches and may crash before that happens. > > So, the risks are: > 1. invalidate more levels than are necessary and risk discarding data > which other CPUs are using, which may be important data. > 2. invalidate less levels than are necessary and risk writing out > data from the CPU cache, which may or may not happen _before_ the > CPU crashes due to reading invalid data. > > Out of those two, (2) sounds to me to be the safer approach. > > Plus, I can't think of a reason why you'd want to put on a SMP system > more than one layer of CPU local caches... to do so would seem to me to > be an exercise in coherency complexity... So, I suspect that in the > real world, we will _never_ see any system which has more than one > layer of caches local to the CPU. But we may see a system with a > cache architecture similar to the one I drew in my email to Santosh. > I am still scratching my head over why you would even have a CPU design with two shared L2 caches for a 4-CPU system. If you ever design such a system, you need to ensure that 1. Both L2s are used in exclusive mode. 2. Both L2 caches have coherency hardware connected to keep them in sync for shared data. For 1, one would just increase the size of L2 and have only one memory. 2 doesn't bring much advantage unless and until your L3 is too far away in terms of CPU access cycles. Regards, Santosh
On Fri, Feb 01, 2013 at 08:13:34PM +0530, Santosh Shilimkar wrote: > On Friday 01 February 2013 08:01 PM, Russell King - ARM Linux wrote: >> Just to further provide some insight into the reasoning: >> >> Invalidating data out of a working cache risks data corruption; maybe >> the data being invalidated is filesystem metadata which was about to >> be cleaned and written back to storage. That risks filesystem >> corruption. >> >> Invalidating fewer levels than are actually required is different: we >> may leave dirty cache lines behind which may be evicted, but there's >> also the chance that the CPU will end up _reading_ from its >> uninitialized caches and may crash before that happens. >> >> So, the risks are: >> 1. invalidate more levels than are necessary and risk discarding data >> which other CPUs are using, which may be important data. >> 2. invalidate less levels than are necessary and risk writing out >> data from the CPU cache, which may or may not happen _before_ the >> CPU crashes due to reading invalid data. >> >> Out of those two, (2) sounds to me to be the safer approach. >> >> Plus, I can't think of a reason why you'd want to put on a SMP system >> more than one layer of CPU local caches... to do so would seem to me to >> be an exercise in coherency complexity... So, I suspect that in the >> real world, we will _never_ see any system which has more than one >> layer of caches local to the CPU. But we may see a system with a >> cache architecture similar to the one I drew in my email to Santosh. >> > I still scratching my head on why you would even have a CPU design > with two L2 shared caches for a 4 CPU system. > > If you ever design such a system, you need to ensure that > > 1. Both L2 are used in exclusive mode > 2. Both L2 cache has coherency hardware connected to keep them in sync > for shared data. > > For 1, one would just increase the size of L2 and have only 1 memory. > > 2 Doesn't bring much advantage unless and until your L3 is too far > away for access in terms of CPU access cycles. I don't think you quite understood my diagram. There aren't two separate L2 data caches (CL1I and CL1D). I'm showing the L2 cache as having a harvard structure (separate instruction and data) with no coherency between them - and because they're harvard structured, that means the unification level must be _below_ that point.
On Friday 01 February 2013 08:19 PM, Russell King - ARM Linux wrote: > On Fri, Feb 01, 2013 at 08:13:34PM +0530, Santosh Shilimkar wrote: >> On Friday 01 February 2013 08:01 PM, Russell King - ARM Linux wrote: >>> Just to further provide some insight into the reasoning: >>> >>> Invalidating data out of a working cache risks data corruption; maybe >>> the data being invalidated is filesystem metadata which was about to >>> be cleaned and written back to storage. That risks filesystem >>> corruption. >>> >>> Invalidating fewer levels than are actually required is different: we >>> may leave dirty cache lines behind which may be evicted, but there's >>> also the chance that the CPU will end up _reading_ from its >>> uninitialized caches and may crash before that happens. >>> >>> So, the risks are: >>> 1. invalidate more levels than are necessary and risk discarding data >>> which other CPUs are using, which may be important data. >>> 2. invalidate less levels than are necessary and risk writing out >>> data from the CPU cache, which may or may not happen _before_ the >>> CPU crashes due to reading invalid data. >>> >>> Out of those two, (2) sounds to me to be the safer approach. >>> >>> Plus, I can't think of a reason why you'd want to put on a SMP system >>> more than one layer of CPU local caches... to do so would seem to me to >>> be an exercise in coherency complexity... So, I suspect that in the >>> real world, we will _never_ see any system which has more than one >>> layer of caches local to the CPU. But we may see a system with a >>> cache architecture similar to the one I drew in my email to Santosh. >>> >> I still scratching my head on why you would even have a CPU design >> with two L2 shared caches for a 4 CPU system. >> >> If you ever design such a system, you need to ensure that >> >> 1. Both L2 are used in exclusive mode >> 2. Both L2 cache has coherency hardware connected to keep them in sync >> for shared data. >> >> For 1, one would just increase the size of L2 and have only 1 memory. >> >> 2 Doesn't bring much advantage unless and until your L3 is too far >> away for access in terms of CPU access cycles. > > I don't think you quite understood my diagram. There aren't two separate > L2 data caches (CL1I and CL1D). I'm showing the L2 cache as having a > harvard structure (separate instruction and data) with no coherency > between them - and because they're harvard structured, that means the > unification level must be _below_ that point. > Now I get it. Yes I missed the I and D separation in the diagram. Thanks a lot for drawing. Regards, Santosh
diff --git a/arch/arm/mach-imx/headsmp.S b/arch/arm/mach-imx/headsmp.S
index 7e49deb..921fc15 100644
--- a/arch/arm/mach-imx/headsmp.S
+++ b/arch/arm/mach-imx/headsmp.S
@@ -17,53 +17,6 @@
 
 	.section ".text.head", "ax"
 
-/*
- * The secondary kernel init calls v7_flush_dcache_all before it enables
- * the L1; however, the L1 comes out of reset in an undefined state, so
- * the clean + invalidate performed by v7_flush_dcache_all causes a bunch
- * of cache lines with uninitialized data and uninitialized tags to get
- * written out to memory, which does really unpleasant things to the main
- * processor. We fix this by performing an invalidate, rather than a
- * clean + invalidate, before jumping into the kernel.
- *
- * This funciton is cloned from arch/arm/mach-tegra/headsmp.S, and needs
- * to be called for both secondary cores startup and primary core resume
- * procedures. Ideally, it should be moved into arch/arm/mm/cache-v7.S.
- */
-ENTRY(v7_invalidate_l1)
-	mov	r0, #0
-	mcr	p15, 0, r0, c7, c5, 0	@ invalidate I cache
-	mcr	p15, 2, r0, c0, c0, 0
-	mrc	p15, 1, r0, c0, c0, 0
-
-	ldr	r1, =0x7fff
-	and	r2, r1, r0, lsr #13
-
-	ldr	r1, =0x3ff
-
-	and	r3, r1, r0, lsr #3	@ NumWays - 1
-	add	r2, r2, #1		@ NumSets
-
-	and	r0, r0, #0x7
-	add	r0, r0, #4		@ SetShift
-
-	clz	r1, r3			@ WayShift
-	add	r4, r3, #1		@ NumWays
-1:	sub	r2, r2, #1		@ NumSets--
-	mov	r3, r4			@ Temp = NumWays
-2:	subs	r3, r3, #1		@ Temp--
-	mov	r5, r3, lsl r1
-	mov	r6, r2, lsl r0
-	orr	r5, r5, r6		@ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
-	mcr	p15, 0, r5, c7, c6, 2
-	bgt	2b
-	cmp	r2, #0
-	bgt	1b
-	dsb
-	isb
-	mov	pc, lr
-ENDPROC(v7_invalidate_l1)
-
 #ifdef CONFIG_SMP
 ENTRY(v7_secondary_startup)
 	bl	v7_invalidate_l1
diff --git a/arch/arm/mach-shmobile/headsmp.S b/arch/arm/mach-shmobile/headsmp.S
index b202c12..96001fd 100644
--- a/arch/arm/mach-shmobile/headsmp.S
+++ b/arch/arm/mach-shmobile/headsmp.S
@@ -16,54 +16,6 @@
 
 	__CPUINIT
 
-/* Cache invalidation nicked from arch/arm/mach-imx/head-v7.S, thanks!
- *
- * The secondary kernel init calls v7_flush_dcache_all before it enables
- * the L1; however, the L1 comes out of reset in an undefined state, so
- * the clean + invalidate performed by v7_flush_dcache_all causes a bunch
- * of cache lines with uninitialized data and uninitialized tags to get
- * written out to memory, which does really unpleasant things to the main
- * processor. We fix this by performing an invalidate, rather than a
- * clean + invalidate, before jumping into the kernel.
- *
- * This funciton is cloned from arch/arm/mach-tegra/headsmp.S, and needs
- * to be called for both secondary cores startup and primary core resume
- * procedures. Ideally, it should be moved into arch/arm/mm/cache-v7.S.
- */
-ENTRY(v7_invalidate_l1)
-	mov	r0, #0
-	mcr	p15, 0, r0, c7, c5, 0	@ invalidate I cache
-	mcr	p15, 2, r0, c0, c0, 0
-	mrc	p15, 1, r0, c0, c0, 0
-
-	ldr	r1, =0x7fff
-	and	r2, r1, r0, lsr #13
-
-	ldr	r1, =0x3ff
-
-	and	r3, r1, r0, lsr #3	@ NumWays - 1
-	add	r2, r2, #1		@ NumSets
-
-	and	r0, r0, #0x7
-	add	r0, r0, #4		@ SetShift
-
-	clz	r1, r3			@ WayShift
-	add	r4, r3, #1		@ NumWays
-1:	sub	r2, r2, #1		@ NumSets--
-	mov	r3, r4			@ Temp = NumWays
-2:	subs	r3, r3, #1		@ Temp--
-	mov	r5, r3, lsl r1
-	mov	r6, r2, lsl r0
-	orr	r5, r5, r6		@ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
-	mcr	p15, 0, r5, c7, c6, 2
-	bgt	2b
-	cmp	r2, #0
-	bgt	1b
-	dsb
-	isb
-	mov	pc, lr
-ENDPROC(v7_invalidate_l1)
-
 ENTRY(shmobile_invalidate_start)
 	bl	v7_invalidate_l1
 	b	secondary_startup
diff --git a/arch/arm/mach-tegra/headsmp.S b/arch/arm/mach-tegra/headsmp.S
index 4a317fa..fb082c4 100644
--- a/arch/arm/mach-tegra/headsmp.S
+++ b/arch/arm/mach-tegra/headsmp.S
@@ -18,49 +18,6 @@
 	.section ".text.head", "ax"
 	__CPUINIT
 
-/*
- * Tegra specific entry point for secondary CPUs.
- * The secondary kernel init calls v7_flush_dcache_all before it enables
- * the L1; however, the L1 comes out of reset in an undefined state, so
- * the clean + invalidate performed by v7_flush_dcache_all causes a bunch
- * of cache lines with uninitialized data and uninitialized tags to get
- * written out to memory, which does really unpleasant things to the main
- * processor. We fix this by performing an invalidate, rather than a
- * clean + invalidate, before jumping into the kernel.
- */
-ENTRY(v7_invalidate_l1)
-	mov	r0, #0
-	mcr	p15, 2, r0, c0, c0, 0
-	mrc	p15, 1, r0, c0, c0, 0
-
-	ldr	r1, =0x7fff
-	and	r2, r1, r0, lsr #13
-
-	ldr	r1, =0x3ff
-
-	and	r3, r1, r0, lsr #3	@ NumWays - 1
-	add	r2, r2, #1		@ NumSets
-
-	and	r0, r0, #0x7
-	add	r0, r0, #4		@ SetShift
-
-	clz	r1, r3			@ WayShift
-	add	r4, r3, #1		@ NumWays
-1:	sub	r2, r2, #1		@ NumSets--
-	mov	r3, r4			@ Temp = NumWays
-2:	subs	r3, r3, #1		@ Temp--
-	mov	r5, r3, lsl r1
-	mov	r6, r2, lsl r0
-	orr	r5, r5, r6		@ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
-	mcr	p15, 0, r5, c7, c6, 2
-	bgt	2b
-	cmp	r2, #0
-	bgt	1b
-	dsb
-	isb
-	mov	pc, lr
-ENDPROC(v7_invalidate_l1)
-
 ENTRY(tegra_secondary_startup)
 
 	bl	v7_invalidate_l1
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 7539ec2..15451ee 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -19,6 +19,52 @@
 #include "proc-macros.S"
 
 /*
+ * The secondary kernel init calls v7_flush_dcache_all before it enables
+ * the L1; however, the L1 comes out of reset in an undefined state, so
+ * the clean + invalidate performed by v7_flush_dcache_all causes a bunch
+ * of cache lines with uninitialized data and uninitialized tags to get
+ * written out to memory, which does really unpleasant things to the main
+ * processor. We fix this by performing an invalidate, rather than a
+ * clean + invalidate, before jumping into the kernel.
+ *
+ * This function is cloned from arch/arm/mach-tegra/headsmp.S, and needs
+ * to be called for both secondary cores startup and primary core resume
+ * procedures.
+ */
+ENTRY(v7_invalidate_l1)
+	mov	r0, #0
+	mcr	p15, 2, r0, c0, c0, 0
+	mrc	p15, 1, r0, c0, c0, 0
+
+	ldr	r1, =0x7fff
+	and	r2, r1, r0, lsr #13
+
+	ldr	r1, =0x3ff
+
+	and	r3, r1, r0, lsr #3	@ NumWays - 1
+	add	r2, r2, #1		@ NumSets
+
+	and	r0, r0, #0x7
+	add	r0, r0, #4		@ SetShift
+
+	clz	r1, r3			@ WayShift
+	add	r4, r3, #1		@ NumWays
+1:	sub	r2, r2, #1		@ NumSets--
+	mov	r3, r4			@ Temp = NumWays
+2:	subs	r3, r3, #1		@ Temp--
+	mov	r5, r3, lsl r1
+	mov	r6, r2, lsl r0
+	orr	r5, r5, r6		@ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
+	mcr	p15, 0, r5, c7, c6, 2
+	bgt	2b
+	cmp	r2, #0
+	bgt	1b
+	dsb
+	isb
+	mov	pc, lr
+ENDPROC(v7_invalidate_l1)
+
+/*
  * v7_flush_icache_all()
  *
  *	Flush the whole I-cache.
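For readers less fluent in ARM assembly, the following is a C rendering of the set/way walk the function above performs. This is a sketch under the assumption of GCC inline asm on ARMv7, not kernel code, and the function name is invented:

	/* Mirror of the assembly: select the L1 data cache in CSSELR, read
	 * its geometry from CCSIDR, then issue DCISW (invalidate by set/way,
	 * no writeback) for every set/way combination. Level bits [3:1] of
	 * the operand are zero because we operate on level 1 only.
	 */
	static void v7_invalidate_l1_sketch(void)
	{
		unsigned int ccsidr, zero = 0;
		unsigned int num_sets, num_ways, set_shift, way_shift;
		unsigned int set, way, val;

		asm volatile("mcr p15, 2, %0, c0, c0, 0" : : "r" (zero));	/* CSSELR = 0 */
		asm volatile("mrc p15, 1, %0, c0, c0, 0" : "=r" (ccsidr));	/* read CCSIDR */

		num_sets  = ((ccsidr >> 13) & 0x7fff) + 1;
		num_ways  = ((ccsidr >> 3) & 0x3ff) + 1;
		set_shift = (ccsidr & 0x7) + 4;		/* log2(cache line bytes) */
		/* guard the clz: the assembly relies on ARM shift-by-32 behaviour */
		way_shift = num_ways > 1 ? __builtin_clz(num_ways - 1) : 0;

		for (set = 0; set < num_sets; set++)
			for (way = 0; way < num_ways; way++) {
				val = (way << way_shift) | (set << set_shift);
				/* DCISW: discard this line without writing it back */
				asm volatile("mcr p15, 0, %0, c7, c6, 2"
					     : : "r" (val) : "memory");
			}
		asm volatile("dsb\n\tisb" : : : "memory");
	}

The only thing distinguishing this walk from the clean-and-invalidate walk in v7_flush_dcache_all is the DCISW opcode, which is the point made in the thread above: an invalidate is exactly what a cold L1 needs, and exactly what a live shared cache level must never receive.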