Message ID: 20170302225515.GG23726@toto (mailing list archive)
State: New, archived
On Thu, 2 Mar 2017, Edgar E. Iglesias wrote:
> On Thu, Mar 02, 2017 at 02:39:55PM -0800, Stefano Stabellini wrote:
> > On Thu, 2 Mar 2017, Julien Grall wrote:
> > > Hi Stefano,
> > >
> > > On 02/03/17 19:12, Stefano Stabellini wrote:
> > > > On Thu, 2 Mar 2017, Julien Grall wrote:
> > > > > On 02/03/17 08:53, Edgar E. Iglesias wrote:
> > > > > > On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
> > > > > > > On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
> > > > Julien, from looking at the two diffs, this is simpler and nicer, but if
> > > > you look at xen/include/asm-arm/page.h, my patch made
> > > > clean_dcache_va_range consistent with invalidate_dcache_va_range. For
> > > > consistency, I would prefer to deal with the two functions the same way.
> > > > Although it is not a spec requirement, I also think that it is a good
> > > > idea to issue cache flushes from cacheline aligned addresses, like
> > > > invalidate_dcache_va_range does and Linux does, to make more obvious
> > > > what is going on.
> > >
> > > invalidate_dcache_va_range is split because the cache instruction differs for the
> > > start and end if unaligned. For them you want to use clean & invalidate rather
> > > than invalidate.
> > >
> > > If you look at the implementation of other cache helpers in Linux (see
> > > dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only align
> > > start & end.
> >
> > I don't think so, unless I am reading dcache_by_line_op wrong.
> >
> >
> > > Also, invalidate_dcache_va_range is using modulo, which I would rather avoid.
> > > The modulo in this case will not be optimized by the compiler because
> > > cacheline_bytes is not a constant.
> >
> > That is a good point. What if I replace the modulo op with
> >
> > p & (cacheline_bytes - 1)
> >
> > in invalidate_dcache_va_range, then add the similar code to
> > clean_dcache_va_range and clean_and_invalidate_dcache_va_range?
>
>
> Yeah, if there was some kind of generic ALIGN or ROUND_DOWN macro we could do:
>
> --- a/xen/include/asm-arm/page.h
> +++ b/xen/include/asm-arm/page.h
> @@ -325,7 +325,9 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
>  {
>      const void *end;
>      dsb(sy);           /* So the CPU issues all writes to the range */
> -    for ( end = p + size; p < end; p += cacheline_bytes )
> +
> +    p = (void *)ALIGN((uintptr_t)p, cacheline_bytes);
> +    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);

Even simpler:

end = p + size;
p = (void *)ALIGN((uintptr_t)p, cacheline_bytes);

> +    for ( ; p < end; p += cacheline_bytes )
>          asm volatile (__clean_dcache_one(0) : : "r" (p));
>      dsb(sy);           /* So we know the flushes happen before continuing */
>      /* ARM callers assume that dcache_* functions cannot fail. */
>
> I think that would achieve the same result as your patch Stefano?

Yes, indeed, that's better.
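For readers following the arithmetic: the replacement Stefano proposes relies on the cache line size being a power of two, in which case the modulo and the bit mask select the same intra-line offset. A minimal standalone sketch (plain C, not Xen code; the 64-byte line size below is only an example, the real value is probed from the hardware at boot):

/* standalone illustration, not Xen code: for a power-of-two line size,
 * the modulo and the bit mask compute the same intra-line offset, so
 * "p % cacheline_bytes" can be replaced by "p & (cacheline_bytes - 1)".
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uintptr_t cacheline_bytes = 64;   /* example only; not the probed value */

    for ( uintptr_t p = 0x1000; p < 0x1100; p++ )
        assert((p % cacheline_bytes) == (p & (cacheline_bytes - 1)));

    printf("modulo and mask agree for all tested addresses\n");
    return 0;
}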
On 02/03/2017 23:07, Stefano Stabellini wrote:
> On Thu, 2 Mar 2017, Edgar E. Iglesias wrote:
>> On Thu, Mar 02, 2017 at 02:39:55PM -0800, Stefano Stabellini wrote:
>>> On Thu, 2 Mar 2017, Julien Grall wrote:
>>>> Hi Stefano,
>>>>
>>>> On 02/03/17 19:12, Stefano Stabellini wrote:
>>>>> On Thu, 2 Mar 2017, Julien Grall wrote:
>>>>>> On 02/03/17 08:53, Edgar E. Iglesias wrote:
>>>>>>> On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
>>>>>>>> On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
>>>>> Julien, from looking at the two diffs, this is simpler and nicer, but if
>>>>> you look at xen/include/asm-arm/page.h, my patch made
>>>>> clean_dcache_va_range consistent with invalidate_dcache_va_range. For
>>>>> consistency, I would prefer to deal with the two functions the same way.
>>>>> Although it is not a spec requirement, I also think that it is a good
>>>>> idea to issue cache flushes from cacheline aligned addresses, like
>>>>> invalidate_dcache_va_range does and Linux does, to make more obvious
>>>>> what is going on.
>>>>
>>>> invalidate_dcache_va_range is split because the cache instruction differs for the
>>>> start and end if unaligned. For them you want to use clean & invalidate rather
>>>> than invalidate.
>>>>
>>>> If you look at the implementation of other cache helpers in Linux (see
>>>> dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only align
>>>> start & end.
>>>
>>> I don't think so, unless I am reading dcache_by_line_op wrong.
>>>
>>>
>>>> Also, invalidate_dcache_va_range is using modulo, which I would rather avoid.
>>>> The modulo in this case will not be optimized by the compiler because
>>>> cacheline_bytes is not a constant.
>>>
>>> That is a good point. What if I replace the modulo op with
>>>
>>> p & (cacheline_bytes - 1)
>>>
>>> in invalidate_dcache_va_range, then add the similar code to
>>> clean_dcache_va_range and clean_and_invalidate_dcache_va_range?
>>
>>
>> Yeah, if there was some kind of generic ALIGN or ROUND_DOWN macro we could do:
>>
>> --- a/xen/include/asm-arm/page.h
>> +++ b/xen/include/asm-arm/page.h
>> @@ -325,7 +325,9 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
>>  {
>>      const void *end;
>>      dsb(sy);           /* So the CPU issues all writes to the range */
>> -    for ( end = p + size; p < end; p += cacheline_bytes )
>> +
>> +    p = (void *)ALIGN((uintptr_t)p, cacheline_bytes);
>> +    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);
>
> Even simpler:
>
> end = p + size;
> p = (void *)ALIGN((uintptr_t)p, cacheline_bytes);

We don't have any ALIGN macro in Xen, and the way we use the term "align" in
Xen is very similar to ROUNDUP. However, a simple

    p = (void *)((uintptr_t)p & ~(cacheline_bytes - 1));

should work here.

Cheers,
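To make the effect of Julien's round-down concrete, here is a hedged sketch of the resulting loop shape. It is standalone C working on integer addresses, not the Xen function: clean_line() is a made-up stand-in for the cache-maintenance instruction issued by __clean_dcache_one(), and the 64-byte line size is again just an example.

/* standalone sketch of the loop shape under discussion; the real
 * function issues one cache-maintenance instruction per line instead
 * of printing.  clean_line() is an illustrative stand-in.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uintptr_t cacheline_bytes = 64;      /* example value only */

static void clean_line(uintptr_t va)        /* stand-in for __clean_dcache_one() */
{
    printf("clean line at 0x%" PRIxPTR "\n", va);
}

static void clean_dcache_va_range_sketch(uintptr_t p, unsigned long size)
{
    uintptr_t end = p + size;

    /* Round the start down to a cache-line boundary, as suggested above.
     * The end needs no rounding: with an aligned p and the p < end test,
     * the last (possibly partial) line is still reached. */
    p &= ~(cacheline_bytes - 1);

    for ( ; p < end; p += cacheline_bytes )
        clean_line(p);
}

int main(void)
{
    /* An unaligned 100-byte range starting at 0x100030: every cache line
     * overlapping [0x100030, 0x100094) is visited exactly once. */
    clean_dcache_va_range_sketch(0x100030, 100);
    return 0;
}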
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -325,7 +325,9 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
 {
     const void *end;
     dsb(sy);           /* So the CPU issues all writes to the range */
-    for ( end = p + size; p < end; p += cacheline_bytes )
+
+    p = (void *)ALIGN((uintptr_t)p, cacheline_bytes);
+    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);
+    for ( ; p < end; p += cacheline_bytes )
         asm volatile (__clean_dcache_one(0) : : "r" (p));
     dsb(sy);           /* So we know the flushes happen before continuing */
     /* ARM callers assume that dcache_* functions cannot fail. */
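The diff above leans on ALIGN (round down) and ROUNDUP (round up) helpers; as Julien notes, Xen has no ALIGN macro. For a power-of-two alignment both operations reduce to the usual masking idioms, sketched below with illustrative names (these are not definitions taken from the Xen tree):

/* Typical power-of-two round-down / round-up helpers, shown only to make
 * the intent of the ALIGN/ROUNDUP lines in the diff explicit.  The macro
 * names are illustrative, not existing Xen definitions.
 */
#include <assert.h>
#include <stdint.h>

#define ROUND_DOWN_POW2(x, a)  ((x) & ~((uintptr_t)(a) - 1))
#define ROUND_UP_POW2(x, a)    (((x) + ((uintptr_t)(a) - 1)) & ~((uintptr_t)(a) - 1))

int main(void)
{
    const uintptr_t line = 64;

    assert(ROUND_DOWN_POW2((uintptr_t)0x1003, line) == 0x1000);
    assert(ROUND_UP_POW2((uintptr_t)0x1003, line)   == 0x1040);
    assert(ROUND_UP_POW2((uintptr_t)0x1040, line)   == 0x1040); /* already aligned */
    return 0;
}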