| Message ID | 1357777251-13541-8-git-send-email-nicolas.pitre@linaro.org (mailing list archive) |
|---|---|
| State | New, archived |
On Thu, Jan 10, 2013 at 12:20:42AM +0000, Nicolas Pitre wrote:
> If for whatever reason a CPU is unexpectedly awaken, it shouldn't
> re-enter the kernel by using whatever entry vector that might have
> been set by a previous operation.
>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> ---
>  arch/arm/common/bL_platsmp.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm/common/bL_platsmp.c b/arch/arm/common/bL_platsmp.c
> index 0acb9f4685..0ae44123bf 100644
> --- a/arch/arm/common/bL_platsmp.c
> +++ b/arch/arm/common/bL_platsmp.c
> @@ -63,6 +63,11 @@ static int bL_cpu_disable(unsigned int cpu)
>
>  static void __ref bL_cpu_die(unsigned int cpu)
>  {
> +	unsigned int mpidr, pcpu, pcluster;
> +	asm ("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
> +	pcpu = mpidr & 0xff;
> +	pcluster = (mpidr >> 8) & 0xff;

Usual comment about helper functions :)

> +	bL_set_entry_vector(pcpu, pcluster, NULL);

Similar to the power_on story, you need a barrier here (unless you change
your platform_ops API to require barriers).

> 	bL_cpu_power_down();

Will
On Mon, 14 Jan 2013, Will Deacon wrote:
> On Thu, Jan 10, 2013 at 12:20:42AM +0000, Nicolas Pitre wrote:
> > If for whatever reason a CPU is unexpectedly awaken, it shouldn't
> > re-enter the kernel by using whatever entry vector that might have
> > been set by a previous operation.
> >
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > ---
> >  arch/arm/common/bL_platsmp.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/arm/common/bL_platsmp.c b/arch/arm/common/bL_platsmp.c
> > index 0acb9f4685..0ae44123bf 100644
> > --- a/arch/arm/common/bL_platsmp.c
> > +++ b/arch/arm/common/bL_platsmp.c
> > @@ -63,6 +63,11 @@ static int bL_cpu_disable(unsigned int cpu)
> >
> >  static void __ref bL_cpu_die(unsigned int cpu)
> >  {
> > +	unsigned int mpidr, pcpu, pcluster;
> > +	asm ("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
> > +	pcpu = mpidr & 0xff;
> > +	pcluster = (mpidr >> 8) & 0xff;
>
> Usual comment about helper functions :)
>
> > +	bL_set_entry_vector(pcpu, pcluster, NULL);
>
> Similar to the power_on story, you need a barrier here (unless you change
> your platform_ops API to require barriers).

The bL_set_entry_vector() includes a cache flush which itself has a DSB.
Hence my previous interrogation.

Nicolas
On Mon, Jan 14, 2013 at 04:53:41PM +0000, Nicolas Pitre wrote:
> On Mon, 14 Jan 2013, Will Deacon wrote:
> > On Thu, Jan 10, 2013 at 12:20:42AM +0000, Nicolas Pitre wrote:
> > > If for whatever reason a CPU is unexpectedly awaken, it shouldn't
> > > re-enter the kernel by using whatever entry vector that might have
> > > been set by a previous operation.
> > >
> > > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > > ---
> > >  arch/arm/common/bL_platsmp.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/arch/arm/common/bL_platsmp.c b/arch/arm/common/bL_platsmp.c
> > > index 0acb9f4685..0ae44123bf 100644
> > > --- a/arch/arm/common/bL_platsmp.c
> > > +++ b/arch/arm/common/bL_platsmp.c
> > > @@ -63,6 +63,11 @@ static int bL_cpu_disable(unsigned int cpu)
> > >
> > >  static void __ref bL_cpu_die(unsigned int cpu)
> > >  {
> > > +	unsigned int mpidr, pcpu, pcluster;
> > > +	asm ("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
> > > +	pcpu = mpidr & 0xff;
> > > +	pcluster = (mpidr >> 8) & 0xff;
> >
> > Usual comment about helper functions :)
> >
> > > +	bL_set_entry_vector(pcpu, pcluster, NULL);
> >
> > Similar to the power_on story, you need a barrier here (unless you change
> > your platform_ops API to require barriers).
>
> The bL_set_entry_vector() includes a cache flush which itself has a DSB.
> Hence my previous interrogation.

For L1, sure, we always have the dsb for v7. However, for the outer-cache we
only have a dsb by virtue of a spin_unlock in l2x0.c... it seems a bit risky
to rely on that for ordering your entry_vector write with the power_on.

I think the best bet is to put a barrier in power_on, before invoking the
platform_ops function and similarly for power_off.

What do you reckon?

Will
On Mon, Jan 14, 2013 at 05:00:28PM +0000, Will Deacon wrote:
> On Mon, Jan 14, 2013 at 04:53:41PM +0000, Nicolas Pitre wrote:
> > On Mon, 14 Jan 2013, Will Deacon wrote:
> > > On Thu, Jan 10, 2013 at 12:20:42AM +0000, Nicolas Pitre wrote:
> > > > If for whatever reason a CPU is unexpectedly awaken, it shouldn't
> > > > re-enter the kernel by using whatever entry vector that might have
> > > > been set by a previous operation.
> > > >
> > > > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > > > ---
> > > >  arch/arm/common/bL_platsmp.c | 5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > >
> > > > diff --git a/arch/arm/common/bL_platsmp.c b/arch/arm/common/bL_platsmp.c
> > > > index 0acb9f4685..0ae44123bf 100644
> > > > --- a/arch/arm/common/bL_platsmp.c
> > > > +++ b/arch/arm/common/bL_platsmp.c
> > > > @@ -63,6 +63,11 @@ static int bL_cpu_disable(unsigned int cpu)
> > > >
> > > >  static void __ref bL_cpu_die(unsigned int cpu)
> > > >  {
> > > > +	unsigned int mpidr, pcpu, pcluster;
> > > > +	asm ("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
> > > > +	pcpu = mpidr & 0xff;
> > > > +	pcluster = (mpidr >> 8) & 0xff;
> > >
> > > Usual comment about helper functions :)
> > >
> > > > +	bL_set_entry_vector(pcpu, pcluster, NULL);
> > >
> > > Similar to the power_on story, you need a barrier here (unless you change
> > > your platform_ops API to require barriers).
> >
> > The bL_set_entry_vector() includes a cache flush which itself has a DSB.
> > Hence my previous interrogation.
>
> For L1, sure, we always have the dsb for v7. However, for the outer-cache we
> only have a dsb by virtue of a spin_unlock in l2x0.c... it seems a bit risky
> to rely on that for ordering your entry_vector write with the power_on.

I was discussing this with Dave earlier, I think we need to fix the
outer-cache functions even for the UP case to include a barrier (for
PL310 actually we may need to read a register as the cache_wait is a
no-op). We assume that cache functions (both inner and outer) fully
complete the operation before returning and there is no additional need
for barriers.
On Mon, 14 Jan 2013, Will Deacon wrote:
> On Mon, Jan 14, 2013 at 04:53:41PM +0000, Nicolas Pitre wrote:
> > On Mon, 14 Jan 2013, Will Deacon wrote:
> > > On Thu, Jan 10, 2013 at 12:20:42AM +0000, Nicolas Pitre wrote:
> > > > If for whatever reason a CPU is unexpectedly awaken, it shouldn't
> > > > re-enter the kernel by using whatever entry vector that might have
> > > > been set by a previous operation.
> > > >
> > > > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > > > ---
> > > >  arch/arm/common/bL_platsmp.c | 5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > >
> > > > diff --git a/arch/arm/common/bL_platsmp.c b/arch/arm/common/bL_platsmp.c
> > > > index 0acb9f4685..0ae44123bf 100644
> > > > --- a/arch/arm/common/bL_platsmp.c
> > > > +++ b/arch/arm/common/bL_platsmp.c
> > > > @@ -63,6 +63,11 @@ static int bL_cpu_disable(unsigned int cpu)
> > > >
> > > >  static void __ref bL_cpu_die(unsigned int cpu)
> > > >  {
> > > > +	unsigned int mpidr, pcpu, pcluster;
> > > > +	asm ("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
> > > > +	pcpu = mpidr & 0xff;
> > > > +	pcluster = (mpidr >> 8) & 0xff;
> > >
> > > Usual comment about helper functions :)
> > >
> > > > +	bL_set_entry_vector(pcpu, pcluster, NULL);
> > >
> > > Similar to the power_on story, you need a barrier here (unless you change
> > > your platform_ops API to require barriers).
> >
> > The bL_set_entry_vector() includes a cache flush which itself has a DSB.
> > Hence my previous interrogation.
>
> For L1, sure, we always have the dsb for v7. However, for the outer-cache we
> only have a dsb by virtue of a spin_unlock in l2x0.c... it seems a bit risky
> to rely on that for ordering your entry_vector write with the power_on.
>
> I think the best bet is to put a barrier in power_on, before invoking the
> platform_ops function and similarly for power_off.
>
> What do you reckon?

I much prefer adding barriers inside the API when they are needed for
proper execution of the API intent. So if I call bL_set_entry_vector(),
I trust that by the time it returns the vector is indeed set and ready
for use by other processors.

The same could be said about the outer cache ops. If a DSB is needed
for their intent to be valid, then why isn't this DSB always implied by
the corresponding cache op calls? And as you say, there is already one
implied by the spinlock used there, so it is not as if things would
change much in practice.

Nicolas
On Mon, Jan 14, 2013 at 05:15:07PM +0000, Nicolas Pitre wrote:
> On Mon, 14 Jan 2013, Will Deacon wrote:
> > On Mon, Jan 14, 2013 at 04:53:41PM +0000, Nicolas Pitre wrote:
> > > The bL_set_entry_vector() includes a cache flush which itself has a DSB.
> > > Hence my previous interrogation.
> >
> > For L1, sure, we always have the dsb for v7. However, for the outer-cache we
> > only have a dsb by virtue of a spin_unlock in l2x0.c... it seems a bit risky
> > to rely on that for ordering your entry_vector write with the power_on.
> >
> > I think the best bet is to put a barrier in power_on, before invoking the
> > platform_ops function and similarly for power_off.
> >
> > What do you reckon?
>
> I much prefer adding barriers inside the API when they are needed for
> proper execution of the API intent. So if I call bL_set_entry_vector(),
> I trust that by the time it returns the vector is indeed set and ready
> for use by other processors.
>
> The same could be said about the outer cache ops. If a DSB is needed
> for their intent to be valid, then why isn't this DSB always implied by
> the corresponding cache op calls? And as you say, there is already one
> implied by the spinlock used there, so that is not if things would
> change much in practice.

Ok, so we can fix the outer_cache functions as suggested by Catalin. That
still leaves the GIC CPU interface problem in the later patch, which uses
a writel_relaxed to disable the CPU interface, so I suppose we can just
put a dsb at the end of gic_cpu_if_down().

Will
On Mon, Jan 14, 2013 at 12:15:07PM -0500, Nicolas Pitre wrote:
> The same could be said about the outer cache ops. If a DSB is needed
> for their intent to be valid, then why isn't this DSB always implied by
> the corresponding cache op calls?

Hmm, just been thinking about this.

The L2x0 calls do contain a DSB but it's not obvious. They hold a
raw spinlock, and when that spinlock is dropped, we issue a dsb and
sev instruction.

Whether the other L2 implementations do this or not I'm not sure -
but the above is a requirement of the spinlock implementation, and
it just happens to provide the right behaviour for L2x0.

But... we _probably_ don't want to impose that down at the L2 cache
level of things - at least not for DMA ops, particularly for the sanity
of the scatter-list operations. We really want to avoid doing one DSB
per scatterlist entry, doing one DSB per scatterlist operation instead.

That does affect how the L2 cache API gets used - maybe we want to
separate out the DMA stuff from the other users so that we can have
dsbs in that path for non-DMA users.

Thoughts?
On Mon, 14 Jan 2013, Russell King - ARM Linux wrote:
> On Mon, Jan 14, 2013 at 12:15:07PM -0500, Nicolas Pitre wrote:
> > The same could be said about the outer cache ops. If a DSB is needed
> > for their intent to be valid, then why isn't this DSB always implied by
> > the corresponding cache op calls?
>
> Hmm, just been thinking about this.
>
> The L2x0 calls do contain a DSB but it's not obvious. They hold a
> raw spinlock, and when that spinlock is dropped, we issue a dsb and
> sev instruction.
>
> Whether the other L2 implementations do this or not I'm not sure -
> but the above is a requirement of the spinlock implementation, and
> it just happens to provide the right behaviour for L2x0.
>
> But... we _probably_ don't want to impose that down at the L2 cache
> level of things - at least not for DMA ops, particular for the sanity
> of the scatter-list operations. We really want to avoid doing one DSB
> per scatterlist entry, doing one DSB per scatterlist operation instead.
>
> That does affect how the L2 cache API gets used - maybe we want to
> separate out the DMA stuff from the other users so that we can have
> dsbs in that path for non-DMA users.
>
> Thoughts?

The dsb or its intended effect could be confined to outer_sync() and
then cache_sync() removed from l2x0_flush_range(). That would allow the
sync to be applied when appropriate.

However that suffers the same API intent mismatch I was talking about.
Maybe adding some asynchronous methods to outer_cache (that could
default to the synchronous calls) where the name of the function clearly
implies a posted operation would be a better solution. In that case the
effect of the operation would be assumed complete only after a
terminating outer_sync().

Nicolas
On Mon, Jan 14, 2013 at 06:26:04PM +0000, Russell King - ARM Linux wrote:
> On Mon, Jan 14, 2013 at 12:15:07PM -0500, Nicolas Pitre wrote:
> > The same could be said about the outer cache ops. If a DSB is needed
> > for their intent to be valid, then why isn't this DSB always implied by
> > the corresponding cache op calls?
>
> Hmm, just been thinking about this.
>
> The L2x0 calls do contain a DSB but it's not obvious. They hold a
> raw spinlock, and when that spinlock is dropped, we issue a dsb and
> sev instruction.
>
> Whether the other L2 implementations do this or not I'm not sure -
> but the above is a requirement of the spinlock implementation, and
> it just happens to provide the right behaviour for L2x0.
>
> But... we _probably_ don't want to impose that down at the L2 cache
> level of things - at least not for DMA ops, particular for the sanity
> of the scatter-list operations. We really want to avoid doing one DSB
> per scatterlist entry, doing one DSB per scatterlist operation instead.
>
> That does affect how the L2 cache API gets used - maybe we want to
> separate out the DMA stuff from the other users so that we can have
> dsbs in that path for non-DMA users.
>
> Thoughts?

Perhaps the existing functions could be renamed to things like:

    outer_XXX_flush_range()
    outer_XXX_sync()

Where XXX is something like "batch" or "background". Optionally these
could be declared somewhere separate to discourage non-DMA code from
using them. Other code could still want to do batches of outer cache
operations efficiently, but I guess DMA is the main user.

Then we could provide simple non-background wrappers which also do the
appropriate CPU-side synchronisation, and provide the familiar interface
for that.

It might be less invasive to rename the new functions instead of the old
ones. It partly depends on what proportion of existing uses of these
functions are incorrect (i.e., assume full synchronisation).

Cheers
---Dave
On Tue, Jan 15, 2013 at 06:40:37PM +0000, Dave Martin wrote:
> On Mon, Jan 14, 2013 at 06:26:04PM +0000, Russell King - ARM Linux wrote:
> > On Mon, Jan 14, 2013 at 12:15:07PM -0500, Nicolas Pitre wrote:
> > > The same could be said about the outer cache ops. If a DSB is needed
> > > for their intent to be valid, then why isn't this DSB always implied by
> > > the corresponding cache op calls?
> >
> > Hmm, just been thinking about this.
> >
> > The L2x0 calls do contain a DSB but it's not obvious. They hold a
> > raw spinlock, and when that spinlock is dropped, we issue a dsb and
> > sev instruction.
> >
> > Whether the other L2 implementations do this or not I'm not sure -
> > but the above is a requirement of the spinlock implementation, and
> > it just happens to provide the right behaviour for L2x0.
> >
> > But... we _probably_ don't want to impose that down at the L2 cache
> > level of things - at least not for DMA ops, particular for the sanity
> > of the scatter-list operations. We really want to avoid doing one DSB
> > per scatterlist entry, doing one DSB per scatterlist operation instead.
> >
> > That does affect how the L2 cache API gets used - maybe we want to
> > separate out the DMA stuff from the other users so that we can have
> > dsbs in that path for non-DMA users.
> >
> > Thoughts?
>
> Perhaps the existing functions could be renamed to things like:
>
>     outer_XXX_flush_range()
>     outer_XXX_sync()
>
> Where XXX is something like "batch" or "background". Optionally these
> could be declared somewhere separate to discourage non-DMA code from
> using them. Other code could still want to do batches of outer cache
> operations efficiently, but I guess DMA is the main user.

There can be some confusion with using the 'background' name because
both PL310 and L2X0 have background operations (the former only for the
set/way ops). But in software you need to ensure the completion of such
operations, otherwise the L2 controller behaviour can be unpredictable.
So if you just want to drop the barriers (outer_sync and dsb), maybe you
could use the 'relaxed' suffix, which matches the I/O accessors.
diff --git a/arch/arm/common/bL_platsmp.c b/arch/arm/common/bL_platsmp.c
index 0acb9f4685..0ae44123bf 100644
--- a/arch/arm/common/bL_platsmp.c
+++ b/arch/arm/common/bL_platsmp.c
@@ -63,6 +63,11 @@ static int bL_cpu_disable(unsigned int cpu)
 
 static void __ref bL_cpu_die(unsigned int cpu)
 {
+	unsigned int mpidr, pcpu, pcluster;
+	asm ("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
+	pcpu = mpidr & 0xff;
+	pcluster = (mpidr >> 8) & 0xff;
+	bL_set_entry_vector(pcpu, pcluster, NULL);
 	bL_cpu_power_down();
 }
If for whatever reason a CPU is unexpectedly awakened, it shouldn't
re-enter the kernel by using whatever entry vector that might have
been set by a previous operation.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/common/bL_platsmp.c | 5 +++++
 1 file changed, 5 insertions(+)