Message ID | cover.1577111363.git.christophe.leroy@c-s.fr (mailing list archive) |
---|---|
Headers | show |
Series | powerpc/32: switch VDSO to C implementation. | expand |
Wondering why we get something so complicated/redundant for vdso_read_begin() <include/vdso/helpers.h> static __always_inline u32 vdso_read_begin(const struct vdso_data *vd) { u32 seq; while ((seq = READ_ONCE(vd->seq)) & 1) cpu_relax(); smp_rmb(); return seq; } 6e0: 81 05 00 f0 lwz r8,240(r5) 6e4: 71 09 00 01 andi. r9,r8,1 6e8: 41 82 00 10 beq 6f8 <__c_kernel_clock_gettime+0x158> 6ec: 81 05 00 f0 lwz r8,240(r5) 6f0: 71 0a 00 01 andi. r10,r8,1 6f4: 40 82 ff f8 bne 6ec <__c_kernel_clock_gettime+0x14c> 6f8: r5 being vd pointer Why the first triplet, not only the second triplet ? Something wrong with using READ_ONCE() for that ? Christophe
On Thu, Jan 09, 2020 at 05:52:34PM +0000, Christophe Leroy wrote: > Wondering why we get something so complicated/redundant for > vdso_read_begin() <include/vdso/helpers.h> > > static __always_inline u32 vdso_read_begin(const struct vdso_data *vd) > { > u32 seq; > > while ((seq = READ_ONCE(vd->seq)) & 1) > cpu_relax(); > > smp_rmb(); > return seq; > } > > > 6e0: 81 05 00 f0 lwz r8,240(r5) > 6e4: 71 09 00 01 andi. r9,r8,1 > 6e8: 41 82 00 10 beq 6f8 <__c_kernel_clock_gettime+0x158> > 6ec: 81 05 00 f0 lwz r8,240(r5) > 6f0: 71 0a 00 01 andi. r10,r8,1 > 6f4: 40 82 ff f8 bne 6ec <__c_kernel_clock_gettime+0x14c> > 6f8: > > r5 being vd pointer > > Why the first triplet, not only the second triplet ? Something wrong > with using READ_ONCE() for that ? It looks like the compiler did loop peeling. What GCC version is this? Please try current trunk (to become GCC 10), or at least GCC 9? Segher
Le 09/01/2020 à 21:07, Segher Boessenkool a écrit : > On Thu, Jan 09, 2020 at 05:52:34PM +0000, Christophe Leroy wrote: >> Wondering why we get something so complicated/redundant for >> vdso_read_begin() <include/vdso/helpers.h> >> >> static __always_inline u32 vdso_read_begin(const struct vdso_data *vd) >> { >> u32 seq; >> >> while ((seq = READ_ONCE(vd->seq)) & 1) >> cpu_relax(); >> >> smp_rmb(); >> return seq; >> } >> >> >> 6e0: 81 05 00 f0 lwz r8,240(r5) >> 6e4: 71 09 00 01 andi. r9,r8,1 >> 6e8: 41 82 00 10 beq 6f8 <__c_kernel_clock_gettime+0x158> >> 6ec: 81 05 00 f0 lwz r8,240(r5) >> 6f0: 71 0a 00 01 andi. r10,r8,1 >> 6f4: 40 82 ff f8 bne 6ec <__c_kernel_clock_gettime+0x14c> >> 6f8: >> >> r5 being vd pointer >> >> Why the first triplet, not only the second triplet ? Something wrong >> with using READ_ONCE() for that ? > > It looks like the compiler did loop peeling. What GCC version is this? > Please try current trunk (to become GCC 10), or at least GCC 9? > It is with GCC 5.5 https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more recent than 8.1 With 8.1, the problem doesn't show up. Thanks Christophe
On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote: > Le 09/01/2020 à 21:07, Segher Boessenkool a écrit : > >It looks like the compiler did loop peeling. What GCC version is this? > >Please try current trunk (to become GCC 10), or at least GCC 9? > > It is with GCC 5.5 > > https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more > recent than 8.1 Arnd, can you update the tools? We are at 8.3 and 9.2 now :-) Or is this hard and/or painful to do? > With 8.1, the problem doesn't show up. Good to hear that. Hrm, I wonder when it was fixed... Segher
On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote: > > Le 09/01/2020 à 21:07, Segher Boessenkool a écrit : > > >It looks like the compiler did loop peeling. What GCC version is this? > > >Please try current trunk (to become GCC 10), or at least GCC 9? > > > > It is with GCC 5.5 > > > > https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more > > recent than 8.1 > > Arnd, can you update the tools? We are at 8.3 and 9.2 now :-) Or is > this hard and/or painful to do? To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2 binaries, as well as a recent 10.0 snapshot. I hope these work, let me know if there are problems. Arnd
Le 16/02/2020 à 19:10, Arnd Bergmann a écrit : > On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool > <segher@kernel.crashing.org> wrote: >> >> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote: >>> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit : >>>> It looks like the compiler did loop peeling. What GCC version is this? >>>> Please try current trunk (to become GCC 10), or at least GCC 9? >>> >>> It is with GCC 5.5 >>> >>> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more >>> recent than 8.1 >> >> Arnd, can you update the tools? We are at 8.3 and 9.2 now :-) Or is >> this hard and/or painful to do? > > To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2 > binaries, as well as a recent 10.0 snapshot. > Thanks Arnd, I have built the VDSO with 9.2, I get less performant result than with 8.2 (same performance as with 5.5). After a quick look, I see: - Irrelevant NOPs to align loops and stuff, allthough -mpcu=860 should avoid that. - A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC 8.1 don't need that, all VDSO functions are frameless with 8.1 Christophe
On Wed, Feb 19, 2020 at 9:45 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote: > Le 16/02/2020 à 19:10, Arnd Bergmann a écrit : > > On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool > > <segher@kernel.crashing.org> wrote: > >> > >> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote: > >>> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit : > >>>> It looks like the compiler did loop peeling. What GCC version is this? > >>>> Please try current trunk (to become GCC 10), or at least GCC 9? > >>> > >>> It is with GCC 5.5 > >>> > >>> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more > >>> recent than 8.1 > >> > >> Arnd, can you update the tools? We are at 8.3 and 9.2 now :-) Or is > >> this hard and/or painful to do? > > > > To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2 > > binaries, as well as a recent 10.0 snapshot. > > > > Thanks Arnd, > > I have built the VDSO with 9.2, I get less performant result than with > 8.2 (same performance as with 5.5). > > After a quick look, I see: > - Irrelevant NOPs to align loops and stuff, allthough -mpcu=860 should > avoid that. > - A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC > 8.1 don't need that, all VDSO functions are frameless with 8.1 If you think it should be fixed in gcc, maybe try to reproduce it in https://godbolt.org/ and open a gcc bug against that. Also, please try the gcc-10 snapshot, which has the highest chance of getting fixes if it shows the same issue (or worse). Arnd
On Wed, Feb 19, 2020 at 10:52:16AM +0100, Arnd Bergmann wrote: > On Wed, Feb 19, 2020 at 9:45 AM Christophe Leroy > <christophe.leroy@c-s.fr> wrote: > > Le 16/02/2020 à 19:10, Arnd Bergmann a écrit : > > > On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool > > > <segher@kernel.crashing.org> wrote: > > >> > > >> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote: > > >>> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit : > > >>>> It looks like the compiler did loop peeling. What GCC version is this? > > >>>> Please try current trunk (to become GCC 10), or at least GCC 9? > > >>> > > >>> It is with GCC 5.5 > > >>> > > >>> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more > > >>> recent than 8.1 > > >> > > >> Arnd, can you update the tools? We are at 8.3 and 9.2 now :-) Or is > > >> this hard and/or painful to do? > > > > > > To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2 > > > binaries, as well as a recent 10.0 snapshot. > > > > > > > Thanks Arnd, > > > > I have built the VDSO with 9.2, I get less performant result than with > > 8.2 (same performance as with 5.5). > > > > After a quick look, I see: > > - Irrelevant NOPs to align loops and stuff, allthough -mpcu=860 should > > avoid that. > > - A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC > > 8.1 don't need that, all VDSO functions are frameless with 8.1 > > If you think it should be fixed in gcc, maybe try to reproduce it in > https://godbolt.org/ (Feel free to skip this step; and don't put links to godbolt (or anything else external) in our bugzilla, please; such links go stale before you know it.) > and open a gcc bug against that. Yes please :-) > Also, please try the gcc-10 snapshot, which has the highest chance > of getting fixes if it shows the same issue (or worse). If it is a regression, chances are it will be backported. (But not to 9.3, which is due in just a few weeks, just like 8.4). If it is just a side effect of some other change, it will probably *not* be undone, not on trunk (GCC 10) either. It depends. But sure, always test trunk if you can. Segher