Message ID | 20241103145153.105097-14-alexghiti@rivosinc.com (mailing list archive) |
---|---|
State | Accepted |
Commit | ab83647fadae2f1f723119dc066b39a461d6d288 |
Series | Zacas/Zabha support and qspinlocks |
Hi Alexandre,

kernel test robot noticed the following build warnings:

[auto build test WARNING on arnd-asm-generic/master]
[also build test WARNING on robh/for-next tip/locking/core linus/master v6.12-rc6]
[cannot apply to next-20241101]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Alexandre-Ghiti/riscv-Move-cpufeature-h-macros-into-their-own-header/20241103-230614
base: https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
patch link: https://lore.kernel.org/r/20241103145153.105097-14-alexghiti%40rivosinc.com
patch subject: [PATCH v6 13/13] riscv: Add qspinlock support
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411041609.gxjI2dsw-lkp@intel.com/

includecheck warnings: (new ones prefixed by >>)
>> arch/riscv/include/asm/spinlock.h: asm/ticket_spinlock.h is included more than once.
>> arch/riscv/include/asm/spinlock.h: asm/qspinlock.h is included more than once.

vim +10 arch/riscv/include/asm/spinlock.h

     8
     9  #define __no_arch_spinlock_redefine
  > 10  #include <asm/ticket_spinlock.h>
    11  #include <asm/qspinlock.h>
    12  #include <asm/jump_label.h>
    13
    14  /*
    15   * TODO: Use an alternative instead of a static key when we are able to parse
    16   * the extensions string earlier in the boot process.
    17   */
    18  DECLARE_STATIC_KEY_TRUE(qspinlock_key);
    19
    20  #define SPINLOCK_BASE_DECLARE(op, type, type_lock)              \
    21  static __always_inline type arch_spin_##op(type_lock lock)      \
    22  {                                                               \
    23          if (static_branch_unlikely(&qspinlock_key))             \
    24                  return queued_spin_##op(lock);                  \
    25          return ticket_spin_##op(lock);                          \
    26  }
    27
    28  SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
    29  SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
    30  SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
    31  SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
    32  SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
    33  SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
    34
    35  #elif defined(CONFIG_RISCV_QUEUED_SPINLOCKS)
    36
    37  #include <asm/qspinlock.h>
    38
    39  #else
    40
  > 41  #include <asm/ticket_spinlock.h>
    42
On Mon, Nov 4, 2024 at 10:05 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Alexandre,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on arnd-asm-generic/master]
> [also build test WARNING on robh/for-next tip/locking/core linus/master v6.12-rc6]
> [cannot apply to next-20241101]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Alexandre-Ghiti/riscv-Move-cpufeature-h-macros-into-their-own-header/20241103-230614
> base: https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
> patch link: https://lore.kernel.org/r/20241103145153.105097-14-alexghiti%40rivosinc.com
> patch subject: [PATCH v6 13/13] riscv: Add qspinlock support
> compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202411041609.gxjI2dsw-lkp@intel.com/
>
> includecheck warnings: (new ones prefixed by >>)
> >> arch/riscv/include/asm/spinlock.h: asm/ticket_spinlock.h is included more than once.
> >> arch/riscv/include/asm/spinlock.h: asm/qspinlock.h is included more than once.

Yes, but that's in an #ifdef/#elif/#else clause, so nothing to do here!

>
> vim +10 arch/riscv/include/asm/spinlock.h
>
>      8
>      9  #define __no_arch_spinlock_redefine
>   > 10  #include <asm/ticket_spinlock.h>
>     11  #include <asm/qspinlock.h>
>     12  #include <asm/jump_label.h>
>     13
>     14  /*
>     15   * TODO: Use an alternative instead of a static key when we are able to parse
>     16   * the extensions string earlier in the boot process.
>     17   */
>     18  DECLARE_STATIC_KEY_TRUE(qspinlock_key);
>     19
>     20  #define SPINLOCK_BASE_DECLARE(op, type, type_lock)              \
>     21  static __always_inline type arch_spin_##op(type_lock lock)      \
>     22  {                                                               \
>     23          if (static_branch_unlikely(&qspinlock_key))             \
>     24                  return queued_spin_##op(lock);                  \
>     25          return ticket_spin_##op(lock);                          \
>     26  }
>     27
>     28  SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
>     29  SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
>     30  SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
>     31  SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
>     32  SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
>     33  SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
>     34
>     35  #elif defined(CONFIG_RISCV_QUEUED_SPINLOCKS)
>     36
>     37  #include <asm/qspinlock.h>
>     38
>     39  #else
>     40
>   > 41  #include <asm/ticket_spinlock.h>
>     42
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
On Mon, Nov 04, 2024 at 10:09:07AM +0100, Alexandre Ghiti wrote:
> On Mon, Nov 4, 2024 at 10:05 AM kernel test robot <lkp@intel.com> wrote:
> >
> > Hi Alexandre,
> >
> > kernel test robot noticed the following build warnings:
> >
> > [auto build test WARNING on arnd-asm-generic/master]
> > [also build test WARNING on robh/for-next tip/locking/core linus/master v6.12-rc6]
> > [cannot apply to next-20241101]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch#_base_tree_information]
> >
> > url: https://github.com/intel-lab-lkp/linux/commits/Alexandre-Ghiti/riscv-Move-cpufeature-h-macros-into-their-own-header/20241103-230614
> > base: https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
> > patch link: https://lore.kernel.org/r/20241103145153.105097-14-alexghiti%40rivosinc.com
> > patch subject: [PATCH v6 13/13] riscv: Add qspinlock support
> > compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202411041609.gxjI2dsw-lkp@intel.com/
> >
> > includecheck warnings: (new ones prefixed by >>)
> > >> arch/riscv/include/asm/spinlock.h: asm/ticket_spinlock.h is included more than once.
> > >> arch/riscv/include/asm/spinlock.h: asm/qspinlock.h is included more than once.
>
> Yes but that's in a #ifdef/#elif#else clause so nothing to do here!

Thanks for the info, we will fix the bot. Sorry for the false positive report.

>
> >
> > vim +10 arch/riscv/include/asm/spinlock.h
> >
> >      8
> >      9  #define __no_arch_spinlock_redefine
> >   > 10  #include <asm/ticket_spinlock.h>
> >     11  #include <asm/qspinlock.h>
> >     12  #include <asm/jump_label.h>
> >     13
> >     14  /*
> >     15   * TODO: Use an alternative instead of a static key when we are able to parse
> >     16   * the extensions string earlier in the boot process.
> >     17   */
> >     18  DECLARE_STATIC_KEY_TRUE(qspinlock_key);
> >     19
> >     20  #define SPINLOCK_BASE_DECLARE(op, type, type_lock)              \
> >     21  static __always_inline type arch_spin_##op(type_lock lock)      \
> >     22  {                                                               \
> >     23          if (static_branch_unlikely(&qspinlock_key))             \
> >     24                  return queued_spin_##op(lock);                  \
> >     25          return ticket_spin_##op(lock);                          \
> >     26  }
> >     27
> >     28  SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
> >     29  SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
> >     30  SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
> >     31  SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
> >     32  SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
> >     33  SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
> >     34
> >     35  #elif defined(CONFIG_RISCV_QUEUED_SPINLOCKS)
> >     36
> >     37  #include <asm/qspinlock.h>
> >     38
> >     39  #else
> >     40
> >   > 41  #include <asm/ticket_spinlock.h>
> >     42
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
>
On Sun, Nov 03, 2024 at 03:51:53PM +0100, Alexandre Ghiti wrote:
> In order to produce a generic kernel, a user can select
> CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> spinlock implementation if Zabha or Ziccrse are not present.
>
> Note that we can't use alternatives here because the discovery of
> extensions is done too late and we need to start with the qspinlock
> implementation because the ticket spinlock implementation would pollute
> the spinlock value, so let's use static keys.

I think the static key toggling takes a mutex (jump_label_lock()) which
can take a spinlock (lock->wait_lock) internally, so I don't grok how
this works:

> +static void __init riscv_spinlock_init(void)
> +{
> +        char *using_ext = NULL;
> +
> +        if (IS_ENABLED(CONFIG_RISCV_TICKET_SPINLOCKS)) {
> +                pr_info("Ticket spinlock: enabled\n");
> +                return;
> +        }
> +
> +        if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&
> +            IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) &&
> +            riscv_isa_extension_available(NULL, ZABHA) &&
> +            riscv_isa_extension_available(NULL, ZACAS)) {
> +                using_ext = "using Zabha";
> +        } else if (riscv_isa_extension_available(NULL, ZICCRSE)) {
> +                using_ext = "using Ziccrse";
> +        }
> +#if defined(CONFIG_RISCV_COMBO_SPINLOCKS)
> +        else {
> +                static_branch_disable(&qspinlock_key);
> +                pr_info("Ticket spinlock: enabled\n");
> +                return;
> +        }
> +#endif

i.e. we've potentially already used the qspinlock at this point.

Will
On Tue, Nov 12, 2024 at 12:43 AM Will Deacon <will@kernel.org> wrote:
>
> On Sun, Nov 03, 2024 at 03:51:53PM +0100, Alexandre Ghiti wrote:
> > In order to produce a generic kernel, a user can select
> > CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> > spinlock implementation if Zabha or Ziccrse are not present.
> >
> > Note that we can't use alternatives here because the discovery of
> > extensions is done too late and we need to start with the qspinlock
> > implementation because the ticket spinlock implementation would pollute
> > the spinlock value, so let's use static keys.
>
> I think the static key toggling takes a mutex (jump_label_lock()) which
> can take a spinlock (lock->wait_lock) internally, so I don't grok how
> this works:
>
> > +static void __init riscv_spinlock_init(void)
> > +{
> > +        char *using_ext = NULL;
> > +
> > +        if (IS_ENABLED(CONFIG_RISCV_TICKET_SPINLOCKS)) {
> > +                pr_info("Ticket spinlock: enabled\n");
> > +                return;
> > +        }
> > +
> > +        if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&
> > +            IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) &&
> > +            riscv_isa_extension_available(NULL, ZABHA) &&
> > +            riscv_isa_extension_available(NULL, ZACAS)) {
> > +                using_ext = "using Zabha";
> > +        } else if (riscv_isa_extension_available(NULL, ZICCRSE)) {
> > +                using_ext = "using Ziccrse";
> > +        }
> > +#if defined(CONFIG_RISCV_COMBO_SPINLOCKS)
> > +        else {
> > +                static_branch_disable(&qspinlock_key);
> > +                pr_info("Ticket spinlock: enabled\n");
> > +                return;
> > +        }
> > +#endif
>
> i.e. we've potentially already used the qspinlock at this point.

Yes, I've used qspinlock here. But riscv_spinlock_init() is called with
IRQs disabled and SMP off, which means this qspinlock only takes the
fast path and behaves like a test-and-set lock.

The qspinlock is a clean implementation: after the unlock, the lock value
returns to zero, whereas the ticket lock leaves the value dirty. So we
use qspinlock at first, or change to the ticket lock before IRQs and SMP
are up.

>
> Will
On Tue, Nov 12, 2024 at 09:49:15AM +0800, Guo Ren wrote:
> On Tue, Nov 12, 2024 at 12:43 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Sun, Nov 03, 2024 at 03:51:53PM +0100, Alexandre Ghiti wrote:
> > > In order to produce a generic kernel, a user can select
> > > CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> > > spinlock implementation if Zabha or Ziccrse are not present.
> > >
> > > Note that we can't use alternatives here because the discovery of
> > > extensions is done too late and we need to start with the qspinlock
> > > implementation because the ticket spinlock implementation would pollute
> > > the spinlock value, so let's use static keys.
> >
> > I think the static key toggling takes a mutex (jump_label_lock()) which
> > can take a spinlock (lock->wait_lock) internally, so I don't grok how
> > this works:
> >
> > > +static void __init riscv_spinlock_init(void)
> > > +{
> > > +        char *using_ext = NULL;
> > > +
> > > +        if (IS_ENABLED(CONFIG_RISCV_TICKET_SPINLOCKS)) {
> > > +                pr_info("Ticket spinlock: enabled\n");
> > > +                return;
> > > +        }
> > > +
> > > +        if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&
> > > +            IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) &&
> > > +            riscv_isa_extension_available(NULL, ZABHA) &&
> > > +            riscv_isa_extension_available(NULL, ZACAS)) {
> > > +                using_ext = "using Zabha";
> > > +        } else if (riscv_isa_extension_available(NULL, ZICCRSE)) {
> > > +                using_ext = "using Ziccrse";
> > > +        }
> > > +#if defined(CONFIG_RISCV_COMBO_SPINLOCKS)
> > > +        else {
> > > +                static_branch_disable(&qspinlock_key);
> > > +                pr_info("Ticket spinlock: enabled\n");
> > > +                return;
> > > +        }
> > > +#endif
> >
> > i.e. we've potentially already used the qspinlock at this point.
> Yes, I've used qspinlock here. But riscv_spinlock_init is called with
> irq_disabled and smp_off. That means this qspinlock only performs a
> test-set lock behavior by qspinlock fast-path.

That's... horrendous.

Will
On Sun, Nov 03, 2024 at 03:51:53PM +0100, Alexandre Ghiti wrote:
> In order to produce a generic kernel, a user can select
> CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> spinlock implementation if Zabha or Ziccrse are not present.
>
> Note that we can't use alternatives here because the discovery of
> extensions is done too late and we need to start with the qspinlock
> implementation because the ticket spinlock implementation would pollute
> the spinlock value, so let's use static keys.
>
> This is largely based on Guo's work and Leonardo reviews at [1].
>
> Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> Signed-off-by: Guo Ren <guoren@kernel.org>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>

This patch (now commit ab83647fadae2 ("riscv: Add qspinlock support"))
breaks boot on polarfire soc. It dies before outputting anything to the
console. My .config has:

# CONFIG_RISCV_TICKET_SPINLOCKS is not set
# CONFIG_RISCV_QUEUED_SPINLOCKS is not set
CONFIG_RISCV_COMBO_SPINLOCKS=y
On Thu, Nov 28, 2024 at 12:56:55PM +0000, Conor Dooley wrote:
> On Sun, Nov 03, 2024 at 03:51:53PM +0100, Alexandre Ghiti wrote:
> > In order to produce a generic kernel, a user can select
> > CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> > spinlock implementation if Zabha or Ziccrse are not present.
> >
> > Note that we can't use alternatives here because the discovery of
> > extensions is done too late and we need to start with the qspinlock
> > implementation because the ticket spinlock implementation would pollute
> > the spinlock value, so let's use static keys.
> >
> > This is largely based on Guo's work and Leonardo reviews at [1].
> >
> > Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> > Signed-off-by: Guo Ren <guoren@kernel.org>
> > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>
> This patch (now commit ab83647fadae2 ("riscv: Add qspinlock support"))
> breaks boot on polarfire soc. It dies before outputting anything to the
> console. My .config has:
>
> # CONFIG_RISCV_TICKET_SPINLOCKS is not set
> # CONFIG_RISCV_QUEUED_SPINLOCKS is not set
> CONFIG_RISCV_COMBO_SPINLOCKS=y

I pointed out some of the fragility during review:

https://lore.kernel.org/all/20241111164259.GA20042@willie-the-truck/

so I'm kinda surprised it got merged tbh :/

Will
On Thu, Nov 28, 2024 at 01:41:36PM +0000, Will Deacon wrote:
> On Thu, Nov 28, 2024 at 12:56:55PM +0000, Conor Dooley wrote:
> > On Sun, Nov 03, 2024 at 03:51:53PM +0100, Alexandre Ghiti wrote:
> > > In order to produce a generic kernel, a user can select
> > > CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> > > spinlock implementation if Zabha or Ziccrse are not present.
> > >
> > > Note that we can't use alternatives here because the discovery of
> > > extensions is done too late and we need to start with the qspinlock
> > > implementation because the ticket spinlock implementation would pollute
> > > the spinlock value, so let's use static keys.
> > >
> > > This is largely based on Guo's work and Leonardo reviews at [1].
> > >
> > > Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> > > Signed-off-by: Guo Ren <guoren@kernel.org>
> > > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> >
> > This patch (now commit ab83647fadae2 ("riscv: Add qspinlock support"))
> > breaks boot on polarfire soc. It dies before outputting anything to the
> > console. My .config has:
> >
> > # CONFIG_RISCV_TICKET_SPINLOCKS is not set
> > # CONFIG_RISCV_QUEUED_SPINLOCKS is not set
> > CONFIG_RISCV_COMBO_SPINLOCKS=y
>
> I pointed out some of the fragility during review:
>
> https://lore.kernel.org/all/20241111164259.GA20042@willie-the-truck/
>
> so I'm kinda surprised it got merged tbh :/

Maybe it could be reverted rather than having a broken boot with the
default settings in -rc1.
On 28/11/2024 15:14, Conor Dooley wrote:
> On Thu, Nov 28, 2024 at 01:41:36PM +0000, Will Deacon wrote:
>> On Thu, Nov 28, 2024 at 12:56:55PM +0000, Conor Dooley wrote:
>>> On Sun, Nov 03, 2024 at 03:51:53PM +0100, Alexandre Ghiti wrote:
>>>> In order to produce a generic kernel, a user can select
>>>> CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
>>>> spinlock implementation if Zabha or Ziccrse are not present.
>>>>
>>>> Note that we can't use alternatives here because the discovery of
>>>> extensions is done too late and we need to start with the qspinlock
>>>> implementation because the ticket spinlock implementation would pollute
>>>> the spinlock value, so let's use static keys.
>>>>
>>>> This is largely based on Guo's work and Leonardo reviews at [1].
>>>>
>>>> Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
>>>> Signed-off-by: Guo Ren <guoren@kernel.org>
>>>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>>> This patch (now commit ab83647fadae2 ("riscv: Add qspinlock support"))
>>> breaks boot on polarfire soc. It dies before outputting anything to the
>>> console. My .config has:
>>>
>>> # CONFIG_RISCV_TICKET_SPINLOCKS is not set
>>> # CONFIG_RISCV_QUEUED_SPINLOCKS is not set
>>> CONFIG_RISCV_COMBO_SPINLOCKS=y
>> I pointed out some of the fragility during review:
>>
>> https://lore.kernel.org/all/20241111164259.GA20042@willie-the-truck/
>>
>> so I'm kinda surprised it got merged tbh :/
> Maybe it could be reverted rather than having a broken boot with the
> default settings in -rc1.

No need to rush before we know what's happening. I guess you bisected to
this commit, right?

I don't have this soc, so can you provide $stval/$sepc/$scause, a config, a
kernel, anything?

Does the polarfire soc provide Ziccrse?
On Thu, Nov 28, 2024 at 03:50:09PM +0100, Alexandre Ghiti wrote:
> On 28/11/2024 15:14, Conor Dooley wrote:
> > On Thu, Nov 28, 2024 at 01:41:36PM +0000, Will Deacon wrote:
> > > On Thu, Nov 28, 2024 at 12:56:55PM +0000, Conor Dooley wrote:
> > > > On Sun, Nov 03, 2024 at 03:51:53PM +0100, Alexandre Ghiti wrote:
> > > > > In order to produce a generic kernel, a user can select
> > > > > CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> > > > > spinlock implementation if Zabha or Ziccrse are not present.
> > > > >
> > > > > Note that we can't use alternatives here because the discovery of
> > > > > extensions is done too late and we need to start with the qspinlock
> > > > > implementation because the ticket spinlock implementation would pollute
> > > > > the spinlock value, so let's use static keys.
> > > > >
> > > > > This is largely based on Guo's work and Leonardo reviews at [1].
> > > > >
> > > > > Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> > > > > Signed-off-by: Guo Ren <guoren@kernel.org>
> > > > > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > > > This patch (now commit ab83647fadae2 ("riscv: Add qspinlock support"))
> > > > breaks boot on polarfire soc. It dies before outputting anything to the
> > > > console. My .config has:
> > > >
> > > > # CONFIG_RISCV_TICKET_SPINLOCKS is not set
> > > > # CONFIG_RISCV_QUEUED_SPINLOCKS is not set
> > > > CONFIG_RISCV_COMBO_SPINLOCKS=y
> > > I pointed out some of the fragility during review:
> > >
> > > https://lore.kernel.org/all/20241111164259.GA20042@willie-the-truck/
> > >
> > > so I'm kinda surprised it got merged tbh :/
> > Maybe it could be reverted rather than having a broken boot with the
> > default settings in -rc1.
>
>
> No need to rush before we know what's happening,I guess you bisected to this
> commit right?

The symptom is a failure to boot, without any console output, of course
I bisected it before blaming something specific. But I don't think it is
"rushing" as having -rc1 broken with an option's default is a massive pain
in the arse when it comes to testing.

> I don't have this soc, so can you provide $stval/$sepc/$scause, a config, a
> kernel, anything?

I don't have the former cos it died immediately on boot. config is
attached. It reproduces in QEMU so you don't need any hardware.

> Does the polarfire soc provide Ziccrse?

I don't think that is relevant because ziccrse is not listed in the dts,
so the kernel should not be assuming that LR/SC has a forward progress
guarantee. It's not even getting as far as riscv_spinlock_init() given
several things before that should be emitting logs, so it doesn't even
get to make any decisions about Ziccrse.

CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_GENERIC_IRQ_DEBUGFS=y
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_BPF_SYSCALL=y
CONFIG_PREEMPT_RT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_CGROUP_BPF=y
CONFIG_NAMESPACES=y
CONFIG_USER_NS=y
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_EXPERT=y
# CONFIG_SYSFS_SYSCALL is not set
CONFIG_PROFILING=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_MICROCHIP=y
CONFIG_ARCH_RENESAS=y
CONFIG_ARCH_SOPHGO=y
CONFIG_SOC_STARFIVE=y
CONFIG_ARCH_THEAD=y
CONFIG_ARCH_VIRT=y
CONFIG_NONPORTABLE=y
CONFIG_SMP=y
CONFIG_RANDOMIZE_BASE=y
CONFIG_CMDLINE="earlycon keep_bootcon reboot=cold"
CONFIG_HIBERNATION=y
CONFIG_CPU_FREQ=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_ACPI=y
CONFIG_JUMP_LABEL=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
CONFIG_IP_PNP_RARP=y
CONFIG_NETLINK_DIAG=y
CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=y
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCI_HOST_GENERIC=y
CONFIG_PCIE_XILINX=y
CONFIG_PCIE_MICROCHIP_HOST=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_DEBUG_DRIVER=y
CONFIG_AX45MP_L2_CACHE=y
CONFIG_SIFIVE_CCACHE=y
CONFIG_EFI_ZBOOT=y
CONFIG_MTD=y
CONFIG_MTD_CMDLINE_PARTS=y
CONFIG_MTD_CFI=y
CONFIG_MTD_JEDECPROBE=y
CONFIG_MTD_SPI_NAND=y
CONFIG_MTD_SPI_NOR=y
# CONFIG_MTD_SPI_NOR_USE_4K_SECTORS is not set
CONFIG_OF_OVERLAY=y
CONFIG_ZRAM=y
CONFIG_ZRAM_MEMORY_TRACKING=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_VIRTIO_BLK=y
CONFIG_BLK_DEV_NVME=y
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=y
CONFIG_SCSI_VIRTIO=y
CONFIG_ATA=y
CONFIG_SATA_AHCI=y
CONFIG_SATA_AHCI_PLATFORM=y
CONFIG_NETDEVICES=y
CONFIG_VIRTIO_NET=y
CONFIG_MACB=y
CONFIG_E1000E=y
CONFIG_R8169=y
CONFIG_MICROSEMI_PHY=y
CONFIG_VITESSE_PHY=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_JOYDEV=m
CONFIG_INPUT_JOYSTICK=y
CONFIG_JOYSTICK_SENSEHAT=m
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_OF_PLATFORM=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_HW_RANDOM_POLARFIRE_SOC=y
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_MICROCHIP_CORE=y
CONFIG_SPI=y
CONFIG_SPI_MICROCHIP_CORE=y
CONFIG_SPI_MICROCHIP_CORE_QSPI=y
CONFIG_SPI_SIFIVE=y
# CONFIG_PTP_1588_CLOCK is not set
CONFIG_GPIO_SYSFS=y
CONFIG_GPIO_POLARFIRE_SOC=y
CONFIG_GPIO_SIFIVE=y
CONFIG_AUXDISPLAY=y
CONFIG_FB=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_SOUND=y
CONFIG_SND=y
CONFIG_SND_SOC=y
CONFIG_SND_SOC_MAX9867=y
CONFIG_USB=y
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_XHCI_PLATFORM=y
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_HCD_PLATFORM=y
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_OHCI_HCD_PLATFORM=y
CONFIG_USB_STORAGE=y
CONFIG_USB_UAS=y
CONFIG_USB_MUSB_HDRC=y
CONFIG_USB_MUSB_POLARFIRE_SOC=y
CONFIG_USB_INVENTRA_DMA=y
CONFIG_NOP_USB_XCEIV=y
CONFIG_MMC=y
CONFIG_MMC_SDHCI=y
CONFIG_MMC_SDHCI_PLTFM=y
CONFIG_MMC_SDHCI_CADENCE=y
CONFIG_MMC_SPI=y
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_GPIO=y
CONFIG_LEDS_PWM=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_PCF2123=y
CONFIG_DMADEVICES=y
CONFIG_SF_PDMA=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_BALLOON=y
CONFIG_VIRTIO_INPUT=y
CONFIG_VIRTIO_MMIO=y
CONFIG_MAILBOX=y
CONFIG_POLARFIRE_SOC_MAILBOX=y
CONFIG_RPMSG_CHAR=y
CONFIG_RPMSG_CTRL=y
CONFIG_RPMSG_VIRTIO=y
CONFIG_POLARFIRE_SOC_SYS_CTRL=y
CONFIG_IIO=y
CONFIG_ADXL345_SPI=y
CONFIG_MCP320X=y
CONFIG_MCP3564=y
CONFIG_PAC1934=y
CONFIG_SD_ADC_MODULATOR=y
CONFIG_HTS221=m
CONFIG_IIO_ST_LSM6DSX=m
CONFIG_IIO_ST_MAGN_3AXIS=m
CONFIG_IIO_ST_PRESS=m
CONFIG_PWM=y
CONFIG_PWM_DEBUG=y
CONFIG_PWM_MICROCHIP_CORE=y
CONFIG_LIBNVDIMM=y
CONFIG_FPGA=y
CONFIG_FPGA_BRIDGE=y
CONFIG_FPGA_REGION=y
CONFIG_FPGA_MGR_MICROCHIP_SPI=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_HUGETLBFS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_NFS_V4_1_MIGRATION=y
CONFIG_ROOT_NFS=y
CONFIG_9P_FS=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=m
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_ZSTD=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_DEV_VIRTIO=y
CONFIG_PRINTK_TIME=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_PER_CPU_MAPS=y
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_WQ_WATCHDOG=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_PLIST=y
CONFIG_DEBUG_SG=y
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_EQS_DEBUG=y
CONFIG_SAMPLES=y
CONFIG_MEMTEST=y
On Fri, Nov 29, 2024 at 12:19 AM Conor Dooley <conor@kernel.org> wrote:
>
> On Thu, Nov 28, 2024 at 03:50:09PM +0100, Alexandre Ghiti wrote:
> > On 28/11/2024 15:14, Conor Dooley wrote:
> > > On Thu, Nov 28, 2024 at 01:41:36PM +0000, Will Deacon wrote:
> > > > On Thu, Nov 28, 2024 at 12:56:55PM +0000, Conor Dooley wrote:
> > > > > On Sun, Nov 03, 2024 at 03:51:53PM +0100, Alexandre Ghiti wrote:
> > > > > > In order to produce a generic kernel, a user can select
> > > > > > CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> > > > > > spinlock implementation if Zabha or Ziccrse are not present.
> > > > > >
> > > > > > Note that we can't use alternatives here because the discovery of
> > > > > > extensions is done too late and we need to start with the qspinlock
> > > > > > implementation because the ticket spinlock implementation would pollute
> > > > > > the spinlock value, so let's use static keys.
> > > > > >
> > > > > > This is largely based on Guo's work and Leonardo reviews at [1].
> > > > > >
> > > > > > Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> > > > > > Signed-off-by: Guo Ren <guoren@kernel.org>
> > > > > > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > > > > This patch (now commit ab83647fadae2 ("riscv: Add qspinlock support"))
> > > > > breaks boot on polarfire soc. It dies before outputting anything to the
> > > > > console. My .config has:
> > > > >
> > > > > # CONFIG_RISCV_TICKET_SPINLOCKS is not set
> > > > > # CONFIG_RISCV_QUEUED_SPINLOCKS is not set
> > > > > CONFIG_RISCV_COMBO_SPINLOCKS=y
> > > > I pointed out some of the fragility during review:
> > > >
> > > > https://lore.kernel.org/all/20241111164259.GA20042@willie-the-truck/
> > > >
> > > > so I'm kinda surprised it got merged tbh :/
> > > Maybe it could be reverted rather than having a broken boot with the
> > > default settings in -rc1.
> >
> >
> > No need to rush before we know what's happening,I guess you bisected to this
> > commit right?
>
> The symptom is a failure to boot, without any console output, of course
> I bisected it before blaming something specific. But I don't think it is
> "rushing" as having -rc1 broken with an option's default is a massive pain
> in the arse when it comes to testing.
>
> > I don't have this soc, so can you provide $stval/$sepc/$scause, a config, a
> > kernel, anything?
>
> I don't have the former cos it died immediately on boot. config is
> attached. It reproduces in QEMU so you don't need any hardware.

If it reproduces in QEMU, could you provide a dmesg using the method below?

Append to the QEMU command line: -s -S
ref: https://qemu-project.gitlab.io/qemu/system/gdb.html

Connect gdb and, in its console:
1. file vmlinux
2. source ./Documentation/admin-guide/kdump/gdbmacros.txt
3. dmesg

Then we can get the kernel's early boot logs from memory.

>
>
> > Does the polarfire soc provide Ziccrse?
>
> I don't think that is relevant because ziccrse is not listed in the dts,
> so the kernel should not be assuming that LR/SC has a forward progress
> guarantee. It's not even getting as far as riscv_spinlock_init() given
> several things before that should be emitting logs, so it doesn't even
> get to make any decisions about Ziccrse.
On Fri, Nov 29, 2024 at 8:55 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Fri, Nov 29, 2024 at 12:19 AM Conor Dooley <conor@kernel.org> wrote:
> >
> > On Thu, Nov 28, 2024 at 03:50:09PM +0100, Alexandre Ghiti wrote:
> > > On 28/11/2024 15:14, Conor Dooley wrote:
> > > > On Thu, Nov 28, 2024 at 01:41:36PM +0000, Will Deacon wrote:
> > > > > On Thu, Nov 28, 2024 at 12:56:55PM +0000, Conor Dooley wrote:
> > > > > > On Sun, Nov 03, 2024 at 03:51:53PM +0100, Alexandre Ghiti wrote:
> > > > > > > In order to produce a generic kernel, a user can select
> > > > > > > CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> > > > > > > spinlock implementation if Zabha or Ziccrse are not present.
> > > > > > >
> > > > > > > Note that we can't use alternatives here because the discovery of
> > > > > > > extensions is done too late and we need to start with the qspinlock
> > > > > > > implementation because the ticket spinlock implementation would pollute
> > > > > > > the spinlock value, so let's use static keys.
> > > > > > >
> > > > > > > This is largely based on Guo's work and Leonardo reviews at [1].
> > > > > > >
> > > > > > > Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> > > > > > > Signed-off-by: Guo Ren <guoren@kernel.org>
> > > > > > > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > > > > > This patch (now commit ab83647fadae2 ("riscv: Add qspinlock support"))
> > > > > > breaks boot on polarfire soc. It dies before outputting anything to the
> > > > > > console. My .config has:
> > > > > >
> > > > > > # CONFIG_RISCV_TICKET_SPINLOCKS is not set
> > > > > > # CONFIG_RISCV_QUEUED_SPINLOCKS is not set
> > > > > > CONFIG_RISCV_COMBO_SPINLOCKS=y
> > > > > I pointed out some of the fragility during review:
> > > > >
> > > > > https://lore.kernel.org/all/20241111164259.GA20042@willie-the-truck/
> > > > >
> > > > > so I'm kinda surprised it got merged tbh :/
> > > > Maybe it could be reverted rather than having a broken boot with the
> > > > default settings in -rc1.
> > >
> > >
> > > No need to rush before we know what's happening,I guess you bisected to this
> > > commit right?
> >
> > The symptom is a failure to boot, without any console output, of course
> > I bisected it before blaming something specific. But I don't think it is
> > "rushing" as having -rc1 broken with an option's default is a massive pain
> > in the arse when it comes to testing.
> >
> > > I don't have this soc, so can you provide $stval/$sepc/$scause, a config, a
> > > kernel, anything?
> >
> > I don't have the former cos it died immediately on boot. config is
> > attached. It reproduces in QEMU so you don't need any hardware.
> If QEMU could reproduce, could you provide a dmesg by the below method?
>
> Qemu cmd append: -s -S
> ref: https://qemu-project.gitlab.io/qemu/system/gdb.html
>
> Connect gdb and in console:
> 1. file vmlinux
> 2. source ./Documentation/admin-guide/kdump/gdbmacros.txt
> 3. dmesg
>
> Then, we could get the kernel's early boot logs from memory.

I've reproduced it on qemu, thx for the config.

Reading symbols from ../build-rv64lp64/vmlinux...
(gdb) tar rem:1234
Remote debugging using :1234
ticket_spin_lock (lock=0xffffffff81b9a5b8 <text_mutex>) at
    /home/guoren/source/kernel/linux/include/asm-generic/ticket_spinlock.h:49
49 atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
(gdb) bt
#0 ticket_spin_lock (lock=0xffffffff81b9a5b8 <text_mutex>) at
    /home/guoren/source/kernel/linux/include/asm-generic/ticket_spinlock.h:49
#1 arch_spin_lock (lock=0xffffffff81b9a5b8 <text_mutex>) at
    /home/guoren/source/kernel/linux/arch/riscv/include/asm/spinlock.h:28
#2 do_raw_spin_lock (lock=lock@entry=0xffffffff81b9a5b8 <text_mutex>)
    at /home/guoren/source/kernel/linux/kernel/locking/spinlock_debug.c:116
#3 0xffffffff80b2ea0e in __raw_spin_lock_irqsave
    (lock=0xffffffff81b9a5b8 <text_mutex>) at
    /home/guoren/source/kernel/linux/include/linux/spinlock_api_smp.h:111
#4 _raw_spin_lock_irqsave (lock=lock@entry=0xffffffff81b9a5b8
    <text_mutex>) at
    /home/guoren/source/kernel/linux/kernel/locking/spinlock.c:162
#5 0xffffffff80b27c54 in rt_mutex_slowtrylock
    (lock=0xffffffff81b9a5b8 <text_mutex>) at
    /home/guoren/source/kernel/linux/kernel/locking/rtmutex.c:1393
#6 0xffffffff80b295ea in rt_mutex_try_acquire
    (lock=0xffffffff81b9a5b8 <text_mutex>) at
    /home/guoren/source/kernel/linux/kernel/locking/rtmutex.c:319
#7 __rt_mutex_lock (state=2, lock=0xffffffff81b9a5b8 <text_mutex>) at
    /home/guoren/source/kernel/linux/kernel/locking/rtmutex.c:1805
#8 __mutex_lock_common (ip=18446744071562135170, nest_lock=0x0,
    subclass=0, state=2, lock=0xffffffff81b9a5b8 <text_mutex>) at
    /home/guoren/source/kernel/linux/kernel/locking/rtmutex_api.c:518
#9 mutex_lock_nested (lock=0xffffffff81b9a5b8 <text_mutex>,
    subclass=subclass@entry=0) at
    /home/guoren/source/kernel/linux/kernel/locking/rtmutex_api.c:529
#10 0xffffffff80010682 in arch_jump_label_transform_queue
    (entry=entry@entry=0xffffffff8158da28, type=<optimized out>) at
    /home/guoren/source/kernel/linux/arch/riscv/kernel/jump_label.c:39
#11 0xffffffff801d86b2 in __jump_label_update
    (key=key@entry=0xffffffff81a1abb0 <qspinlock_key>,
    entry=0xffffffff8158da28, stop=stop@entry=0xffffffff815a5e68
    <__tracepoint_ptr_initcall_finish>, init=init@entry=true)
    at /home/guoren/source/kernel/linux/kernel/jump_label.c:513
#12 0xffffffff801d890c in jump_label_update
    (key=key@entry=0xffffffff81a1abb0 <qspinlock_key>) at
    /home/guoren/source/kernel/linux/kernel/jump_label.c:920
#13 0xffffffff801d8be8 in static_key_disable_cpuslocked
    (key=key@entry=0xffffffff81a1abb0 <qspinlock_key>) at
    /home/guoren/source/kernel/linux/kernel/jump_label.c:240
#14 0xffffffff801d8c04 in static_key_disable
    (key=key@entry=0xffffffff81a1abb0 <qspinlock_key>) at
    /home/guoren/source/kernel/linux/kernel/jump_label.c:248
#15 0xffffffff80c04a1a in riscv_spinlock_init () at
    /home/guoren/source/kernel/linux/arch/riscv/kernel/setup.c:271
#16 setup_arch (cmdline_p=cmdline_p@entry=0xffffffff81a03e88) at
    /home/guoren/source/kernel/linux/arch/riscv/kernel/setup.c:336
#17 0xffffffff80c007a2 in start_kernel () at
    /home/guoren/source/kernel/linux/init/main.c:922
#18 0xffffffff80001164 in _start_kernel ()
Backtrace stopped: frame did not save the PC
(gdb) p /x lock
$1 = 0xffffffff81b9a5b8
(gdb) p /x *lock
$2 = {{val = {counter = 0x20000}, {locked = 0x0, pending = 0x0},
{locked_pending = 0x0, tail = 0x2}}}

>
> >
> > > Does the polarfire soc provide Ziccrse?
> >
> > I don't think that is relevant because ziccrse is not listed in the dts,
> > so the kernel should not be assuming that LR/SC has a forward progress
> > guarantee. It's not even getting as far as riscv_spinlock_init() given
> > several things before that should be emitting logs, so it doesn't even
> > get to make any decisions about Ziccrse.
> >
>
> --
> Best Regards
>  Guo Ren

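The lock word printed in the gdb session (`counter = 0x20000`) reads differently depending on which spinlock layout interprets it. What follows is a minimal single-threaded C model, not kernel code. It assumes a little-endian 32-bit lock word, the ticket layout of the generic ticket lock (next ticket in the upper 16 bits, owner in the lower 16 bits) and the qspinlock fields shown in the dump (locked byte, pending byte, 16-bit tail). All helper names are made up for illustration. The function `hypothetical_mixed_sequence()` shows only one sequence that would yield this value; the thread does not confirm it as the root cause.

```c
#include <assert.h>
#include <stdint.h>

/* Ticket view of the lock word: next ticket in bits 31..16,
 * current owner in bits 15..0. */
static inline uint16_t ticket_next(uint32_t val)  { return (uint16_t)(val >> 16); }
static inline uint16_t ticket_owner(uint32_t val) { return (uint16_t)val; }

/* qspinlock view (little-endian): locked byte, pending byte, 16-bit tail. */
static inline uint8_t  q_locked(uint32_t val)  { return (uint8_t)val; }
static inline uint8_t  q_pending(uint32_t val) { return (uint8_t)(val >> 8); }
static inline uint16_t q_tail(uint32_t val)    { return (uint16_t)(val >> 16); }

/* Model of drawing a ticket: bump "next" and return the ticket drawn. */
static inline uint16_t model_ticket_take(uint32_t *val)
{
    uint32_t old = *val;

    *val = old + (1u << 16);
    return (uint16_t)(old >> 16);
}

/* Model of ticket unlock: advance only the owner half. */
static inline void model_ticket_unlock(uint32_t *val)
{
    *val = (*val & 0xffff0000u) | (uint16_t)(*val + 1);
}

/* Model of the uncontended qspinlock fast path: it succeeds only when
 * the whole word is zero. Returns 1 on success, 0 otherwise. */
static inline int model_queued_trylock(uint32_t *val)
{
    if (*val != 0)
        return 0;
    *val = 1;
    return 1;
}

/* Model of qspinlock unlock: clear only the locked byte. */
static inline void model_queued_unlock(uint32_t *val)
{
    *val &= ~0xffu;
}

/* A ticket holder may proceed only once owner equals its ticket. */
static inline int ticket_must_wait(uint32_t val, uint16_t ticket)
{
    return ticket_owner(val) != ticket;
}

/* HYPOTHETICAL: a ticket-style lock paired with a qspinlock-style
 * unlock, followed by another ticket-style lock. Not taken from the
 * thread; it only shows that such a mix ends in a permanent wait. */
static inline uint32_t hypothetical_mixed_sequence(void)
{
    uint32_t val = 0;
    uint16_t t;

    t = model_ticket_take(&val);        /* ticket 0, word is 0x10000 */
    assert(!ticket_must_wait(val, t));  /* acquired immediately */
    model_queued_unlock(&val);          /* owner half is not advanced */
    t = model_ticket_take(&val);        /* ticket 1, word is 0x20000 */
    assert(ticket_must_wait(val, t));   /* owner is still 0 */
    return val;
}
```

Under the ticket view the dumped word has next = 2 and owner = 0, so a lone waiter holding ticket 1 would keep spinning in the `ticket == (u16)VAL` wait shown in frame #0. The model also illustrates the remark in the patch description that the ticket implementation would "pollute" the value: one ticket lock/unlock cycle leaves 0x10001, which the qspinlock view reads as a set locked byte, whereas an uncontended qspinlock cycle returns the word to 0.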
Hi Conor & Alexandre,

On Fri, Nov 29, 2024 at 10:58 AM Guo Ren <guoren@kernel.org> wrote:
>
> [...]
>
> I've reproduced it on qemu, thx for the config.
>
> [...]
>
> (gdb) p /x lock
> $1 = 0xffffffff81b9a5b8
> (gdb) p /x *lock
> $2 = {{val = {counter = 0x20000}, {locked = 0x0, pending = 0x0},
> {locked_pending = 0x0, tail = 0x2}}}

I have for you here a fast fixup for reference. (PS: I'm digging into
the root cause mentioned by Will Deacon.)

diff --git a/arch/riscv/include/asm/text-patching.h b/arch/riscv/include/asm/text-patching.h
index 7228e266b9a1..0439609f1cff 100644
--- a/arch/riscv/include/asm/text-patching.h
+++ b/arch/riscv/include/asm/text-patching.h
@@ -12,5 +12,6 @@ int patch_text_set_nosync(void *addr, u8 c, size_t len);
 int patch_text(void *addr, u32 *insns, size_t len);

 extern int riscv_patch_in_stop_machine;
+extern int riscv_patch_in_spinlock_init;

 #endif /* _ASM_RISCV_PATCH_H */
diff --git a/arch/riscv/kernel/jump_label.c b/arch/riscv/kernel/jump_label.c
index 6eee6f736f68..d9a5a5c1933d 100644
--- a/arch/riscv/kernel/jump_label.c
+++ b/arch/riscv/kernel/jump_label.c
@@ -36,9 +36,11 @@ bool arch_jump_label_transform_queue(struct jump_entry *entry,
                insn = RISCV_INSN_NOP;
        }

-       mutex_lock(&text_mutex);
+       if (!riscv_patch_in_spinlock_init)
+               mutex_lock(&text_mutex);
        patch_insn_write(addr, &insn, sizeof(insn));
-       mutex_unlock(&text_mutex);
+       if (!riscv_patch_in_spinlock_init)
+               mutex_unlock(&text_mutex);

        return true;
 }
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index db13c9ddf9e3..ab009cf855c2 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -24,6 +24,7 @@ struct patch_insn {
 };

 int riscv_patch_in_stop_machine = false;
+int riscv_patch_in_spinlock_init = false;

 #ifdef CONFIG_MMU

@@ -131,7 +132,7 @@ static int __patch_insn_write(void *addr, const void *insn, size_t len)
         * safe but triggers a lockdep failure, so just elide it for that
         * specific case.
         */
-       if (!riscv_patch_in_stop_machine)
+       if (!riscv_patch_in_stop_machine && !riscv_patch_in_spinlock_init)
                lockdep_assert_held(&text_mutex);

        preempt_disable();
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 016b48fcd6f2..87ddf1702be4 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -268,7 +268,9 @@ static void __init riscv_spinlock_init(void)
        }
 #if defined(CONFIG_RISCV_COMBO_SPINLOCKS)
        else {
+               riscv_patch_in_spinlock_init = 1;
                static_branch_disable(&qspinlock_key);
+               riscv_patch_in_spinlock_init = 0;
                pr_info("Ticket spinlock: enabled\n");
                return;
        }

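The shape of the fixup diff can be sketched outside the kernel: a global flag, set only around the early switch, makes the patching helper skip the mutex it would normally take. This is a user-space model under that assumption. Every `model_*` name is a hypothetical stand-in rather than a kernel symbol, and the "mutex" is a plain struct that only records acquisitions. The sketch shows the control flow of the change and makes no claim about whether bypassing the lock resolves the boot failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a mutex: tracks whether it is held and how often it
 * was acquired. No real locking is performed. */
struct model_mutex {
    bool held;
    unsigned int acquisitions;
};

/* Stand-in for the flag the diff adds; true only during the early switch. */
static bool model_patch_in_early_init;

static inline void model_mutex_lock(struct model_mutex *m)
{
    assert(!m->held);
    m->held = true;
    m->acquisitions++;
}

static inline void model_mutex_unlock(struct model_mutex *m)
{
    assert(m->held);
    m->held = false;
}

/* Patch one site, taking the mutex unless the early-init flag is set. */
static inline void model_patch_site(struct model_mutex *m, int *site, int insn)
{
    if (!model_patch_in_early_init)
        model_mutex_lock(m);
    *site = insn;
    if (!model_patch_in_early_init)
        model_mutex_unlock(m);
}

/* Early switch: raise the flag, patch every site, lower the flag. */
static inline void model_early_switch(struct model_mutex *m, int *sites,
                                      size_t n, int insn)
{
    model_patch_in_early_init = true;
    for (size_t i = 0; i < n; i++)
        model_patch_site(m, &sites[i], insn);
    model_patch_in_early_init = false;
}
```

Calling `model_early_switch()` on an array of sites rewrites all of them without touching the mutex, while a later `model_patch_site()` call goes back to the normal lock/unlock path. The design relies on the early switch running while nothing else can patch text concurrently, which is an assumption of this sketch.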
Hi everyone,

On Fri, Nov 29, 2024 at 7:28 AM Guo Ren <guoren@kernel.org> wrote:
>
> Hi Conor & Alexandre,
>
> [...]
>
> I have for you here a fast fixup for reference. (PS: I'm digging into
> the root cause mentioned by Will Deacon.)
>
> [...]
>
> --
> Best Regards
>  Guo Ren

Thanks Guo for looking into this.

Your solution is not very pretty but I don't have anything better :/
Unless introducing a static_branch_XXX_nolock() API? I gave it a try
and it fixes the issue, but not sure this will be accepted.

The thing is the usage of static branches is temporary, we'll use
alternatives when I finish working on getting the extensions very
early from the ACPI tables (I have a poc that works, just needs some
cleaning).

So let's say that I make this early extension parsing my priority for
6.14, can we live with Guo's hack in this release? Or should we revert
this commit?

Thanks,

Alex

On Fri, Nov 29, 2024 at 11:31:44AM +0100, Alexandre Ghiti wrote:
> Hi everyone,
>
> On Fri, Nov 29, 2024 at 7:28 AM Guo Ren <guoren@kernel.org> wrote:
> >
> > [...]
> >
> > I have for you here a fast fixup for reference. (PS: I'm digging into
> > the root cause mentioned by Will Deacon.)
> >
> > [...]
>
> Thanks Guo for looking into this.
>
> Your solution is not very pretty but I don't have anything better :/
> Unless introducing a static_branch_XXX_nolock() API? I gave it a try
> and it fixes the issue, but not sure this will be accepted.
>
> The thing is the usage of static branches is temporary, we'll use
> alternatives when I finish working on getting the extensions very
> early from the ACPI tables (I have a poc that works, just needs some
> cleaning).
>
> So let's say that I make this early extension parsing my priority for
> 6.14, can we live with Guo's hack in this release? Or should we revert
> this commit?

I tried this diff, and it doesn't actually fix the problem - either in
QEMU or in hardware. I'll do some more poking.
On Fri, Nov 29, 2024 at 11:18:24AM +0000, Conor Dooley wrote:
> I tried this diff, and it doesn't actually fix the problem - either in
> QEMU or in hardware. I'll do some more poking.

Looks like it might have been my fault, typo while hand-applying the
diff pasted above cos it was corrupted :/ Rebuilding..
On Fri, Nov 29, 2024 at 11:43:44AM +0000, Conor Dooley wrote:
> On Fri, Nov 29, 2024 at 11:18:24AM +0000, Conor Dooley wrote:
> >
> > I tried this diff, and it doesn't actually fix the problem - either in
> > QEMU or in hardware. I'll do some more poking.
>
> Looks like it might have been my fault, typo while hand-applying the
> diff pasted above cos it was corrupted :/ Rebuilding..

sans-typo, the diff does make it boot, my bad.
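For reference, the lock word from the gdb session quoted earlier in the thread (counter = 0x20000) can be read against the comparison shown at ticket_spinlock.h:49. What follows is only a simplified, single-threaded model in plain C, not kernel code. It assumes the generic ticket lock keeps the next ticket in the upper 16 bits and the owner in the lower 16 bits, and that a queued-spinlock release clears only the low "locked" byte. All model_* names are hypothetical, and the interleaving in model_mixed_sequence() is just one possible way to arrive at 0x20000, not a confirmed trace of what happened on the board.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the 32-bit lock word. Hypothetical, not kernel code. */
struct model_lock {
	uint32_t val;
};

/*
 * Ticket-style acquire (assumed layout): take a ticket from the upper
 * 16 bits, then compare it with the owner in the lower 16 bits.
 * Returns true if the caller would have to spin.
 */
static bool model_ticket_lock(struct model_lock *l)
{
	uint32_t old = l->val;
	uint16_t ticket = (uint16_t)(old >> 16);
	uint16_t owner = (uint16_t)old;

	l->val = old + (1u << 16);	/* models the fetch-add of 1 << 16 */
	return ticket != owner;
}

/* Ticket-style release: advance the owner half by one. */
static void model_ticket_unlock(struct model_lock *l)
{
	uint16_t owner = (uint16_t)l->val;

	l->val = (l->val & 0xffff0000u) | (uint16_t)(owner + 1);
}

/* Queued-style release (assumed): clear only the low "locked" byte. */
static void model_queued_unlock(struct model_lock *l)
{
	l->val &= ~0xffu;
}

/*
 * Hypothetical interleaving: the lock is taken through the ticket path
 * but released through the queued path, so the owner half never moves.
 * Returns the final lock word and reports whether the second acquire
 * would spin.
 */
static uint32_t model_mixed_sequence(bool *second_lock_spins)
{
	struct model_lock l = { 0 };

	(void)model_ticket_lock(&l);	/* 0x00000 -> 0x10000, acquired */
	model_queued_unlock(&l);	/* low byte already 0, still 0x10000 */
	*second_lock_spins = model_ticket_lock(&l);	/* -> 0x20000 */
	return l.val;
}
```

In this model, model_mixed_sequence() returns 0x20000 and reports that the second ticket-style acquire would spin (ticket 1, owner 0), which is consistent with frame #0 of the backtrace sitting in atomic_cond_read_acquire(). A matched model_ticket_lock()/model_ticket_unlock() pair never reports a spin.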
On Fri, Nov 29, 2024 at 6:32 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> Thanks Guo for looking into this.
>
> Your solution is not very pretty but I don't have anything better :/
> Unless introducing a static_branch_XXX_nolock() API? I gave it a try
> and it fixes the issue, but not sure this will be accepted.
>
> The thing is the usage of static branches is temporary, we'll use
> alternatives when I finish working on getting the extensions very
> early from the ACPI tables (I have a poc that works, just needs some
> cleaning).
>
> So let's say that I make this early extension parsing my priority for
> 6.14, can we live with Guo's hack in this release? Or should we revert
> this commit?

I almost get the root cause. Please give me a while.

>
> Thanks,
>
> Alex

--
Best Regards
 Guo Ren
Hi Alexandre & Conor

On Fri, Nov 29, 2024 at 8:50 PM Guo Ren <guoren@kernel.org> wrote:
> I almost get the root cause. Please give me a while.

Here is the root cause (CONFIG_RT_MUTEXES=y):
When CONFIG_RT_MUTEXES=y, rt_mutex_try_acquire would change from rt_mutex_cmpxchg_acquire to rt_mutex_slowtrylock.
rt_mutex_slowtrylock()
    *raw_spin_lock_irqsave(&lock->wait_lock, flags);*
    ret = __rt_mutex_slowtrylock(lock);
    *raw_spin_unlock_irqrestore(&lock->wait_lock, flags);*

Because jump_label changes the queued_spin_#op call sites to ticket_#op
one at a time, using a spinlock while that change is in progress can
cause a deadlock. That means, in arch/riscv/kernel/jump_label.c:

arch_jump_label_transform_queue() ->
    mutex_lock(&text_mutex); -> raw_spin_lock   -> queued_spin_lock
                             |-> raw_spin_unlock -> queued_spin_unlock
    patch_insn_write -> change the raw_spin_lock to ticket_lock
    mutex_unlock(&text_mutex);
...
arch_jump_label_transform_queue() ->
    mutex_lock(&text_mutex); -> raw_spin_lock   -> ticket_lock
                             |-> raw_spin_unlock -> queued_spin_unlock // *cause the problem*
    patch_insn_write -> change the raw_spin_unlock to ticket_unlock
    mutex_unlock(&text_mutex);
...
arch_jump_label_transform_queue() ->
    mutex_lock(&text_mutex); -> raw_spin_lock   -> ticket_lock // *deadlock*
                             |-> raw_spin_unlock -> ticket_unlock
    patch_insn_write -> change other raw_spin_#op -> ticket_#op
    mutex_unlock(&text_mutex);

So, the solution is to disable the mutex usage in
arch_jump_label_transform_queue() during spinlock_init, just like we
have done for stop_machine.

PS: The improvement plan (removing the jump_label): use the alternative
mechanism to improve the performance of combo_spinlock's ticket_lock
(there is no branch jump for ticket_lock) and to reduce its code size.

--
Best Regards
Guo Ren
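The half-patched window described above can be replayed in user space. The following is only a minimal single-threaded sketch, not kernel code: it assumes a little-endian lock word with the ticket "next" field in bits 31..16, the ticket "owner" in bits 15..0 and the queued "locked" byte in bits 7..0, and every model_* / replay_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the shared 32-bit lock word (layout is an assumption). */
static uint32_t lock_val;

/* Models ticket_spin_lock(): returns false where the kernel would spin. */
static bool model_ticket_lock(void)
{
	uint32_t old = lock_val;	/* value before taking a ticket */

	lock_val += UINT32_C(1) << 16;	/* take the next ticket */
	/* Single-threaded: nobody else can ever advance the owner field. */
	return (uint16_t)(old >> 16) == (uint16_t)lock_val;
}

/* Models ticket_spin_unlock(): advance the owner half-word. */
static void model_ticket_unlock(void)
{
	uint16_t owner = (uint16_t)(lock_val + 1);

	lock_val = (lock_val & UINT32_C(0xffff0000)) | owner;
}

/* Models queued_spin_unlock(): clear only the low "locked" byte. */
static void model_queued_unlock(void)
{
	lock_val &= ~UINT32_C(0xff);
}

/* Lock site already patched to ticket, unlock site still queued. */
static uint32_t replay_half_patched(void)
{
	bool first, second;

	lock_val = 0;
	first = model_ticket_lock();	/* succeeds, word becomes 0x10000 */
	model_queued_unlock();		/* owner field is never advanced */
	second = model_ticket_lock();	/* ticket 1 waits for owner 0 */
	assert(first && !second);
	return lock_val;
}

/* Control case: both sites use the ticket implementation. */
static bool replay_fully_patched(void)
{
	lock_val = 0;
	model_ticket_lock();
	model_ticket_unlock();
	return model_ticket_lock();
}
```

In this model replay_half_patched() ends with the lock word at 0x20000, the same value as the counter field in the gdb dump, while replay_fully_patched() acquires the lock again without waiting.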
On Fri, Nov 29, 2024 at 6:32 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> Hi everyone,
>
> On Fri, Nov 29, 2024 at 7:28 AM Guo Ren <guoren@kernel.org> wrote:
> >
> > Hi Conor & Alexandre,
> >
> > On Fri, Nov 29, 2024 at 10:58 AM Guo Ren <guoren@kernel.org> wrote:
> > >
> > > On Fri, Nov 29, 2024 at 8:55 AM Guo Ren <guoren@kernel.org> wrote:
> > > >
> > > > On Fri, Nov 29, 2024 at 12:19 AM Conor Dooley <conor@kernel.org> wrote:
> > > > >
> > > > > On Thu, Nov 28, 2024 at 03:50:09PM +0100, Alexandre Ghiti wrote:
> > > > > > On 28/11/2024 15:14, Conor Dooley wrote:
> > > > > > > On Thu, Nov 28, 2024 at 01:41:36PM +0000, Will Deacon wrote:
> > > > > > > > On Thu, Nov 28, 2024 at 12:56:55PM +0000, Conor Dooley wrote:
> > > > > > > > > On Sun, Nov 03, 2024 at 03:51:53PM +0100, Alexandre Ghiti wrote:
> > > > > > > > > > In order to produce a generic kernel, a user can select
> > > > > > > > > > CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket
> > > > > > > > > > spinlock implementation if Zabha or Ziccrse are not present.
> > > > > > > > > >
> > > > > > > > > > Note that we can't use alternatives here because the discovery of
> > > > > > > > > > extensions is done too late and we need to start with the qspinlock
> > > > > > > > > > implementation because the ticket spinlock implementation would pollute
> > > > > > > > > > the spinlock value, so let's use static keys.
> > > > > > > > > >
> > > > > > > > > > This is largely based on Guo's work and Leonardo reviews at [1].
> > > > > > > > > >
> > > > > > > > > > Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1]
> > > > > > > > > > Signed-off-by: Guo Ren <guoren@kernel.org>
> > > > > > > > > > Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > > > > > > > > This patch (now commit ab83647fadae2 ("riscv: Add qspinlock support"))
> > > > > > > > > breaks boot on polarfire soc. It dies before outputting anything to the
> > > > > > > > > console. My .config has:
> > > > > > > > >
> > > > > > > > > # CONFIG_RISCV_TICKET_SPINLOCKS is not set
> > > > > > > > > # CONFIG_RISCV_QUEUED_SPINLOCKS is not set
> > > > > > > > > CONFIG_RISCV_COMBO_SPINLOCKS=y
> > > > > > > > I pointed out some of the fragility during review:
> > > > > > > >
> > > > > > > > https://lore.kernel.org/all/20241111164259.GA20042@willie-the-truck/
> > > > > > > >
> > > > > > > > so I'm kinda surprised it got merged tbh :/
> > > > > > > Maybe it could be reverted rather than having a broken boot with the
> > > > > > > default settings in -rc1.
> > > > > >
> > > > > >
> > > > > > No need to rush before we know what's happening, I guess you bisected to this
> > > > > > commit right?
> > > > >
> > > > > The symptom is a failure to boot, without any console output, of course
> > > > > I bisected it before blaming something specific. But I don't think it is
> > > > > "rushing" as having -rc1 broken with an option's default is a massive pain
> > > > > in the arse when it comes to testing.
> > > > >
> > > > > > I don't have this soc, so can you provide $stval/$sepc/$scause, a config, a
> > > > > > kernel, anything?
> > > > >
> > > > > I don't have the former cos it died immediately on boot. config is
> > > > > attached. It reproduces in QEMU so you don't need any hardware.
> > > > If QEMU could reproduce, could you provide a dmesg by the below method?
> > > >
> > > > Qemu cmd append: -s -S
> > > > ref: https://qemu-project.gitlab.io/qemu/system/gdb.html
> > > >
> > > > Connect gdb and in console:
> > > > 1. file vmlinux
> > > > 2. source ./Documentation/admin-guide/kdump/gdbmacros.txt
> > > > 3. dmesg
> > > >
> > > > Then, we could get the kernel's early boot logs from memory.
> > > I've reproduced it on qemu, thx for the config.
> > >
> > > Reading symbols from ../build-rv64lp64/vmlinux...
> > > (gdb) tar rem:1234
> > > Remote debugging using :1234
> > > ticket_spin_lock (lock=0xffffffff81b9a5b8 <text_mutex>) at
> > > /home/guoren/source/kernel/linux/include/asm-generic/ticket_spinlock.h:49
> > > 49 atomic_cond_read_acquire(&lock->val, ticket == (u16)VAL);
> > > (gdb) bt
> > > #0 ticket_spin_lock (lock=0xffffffff81b9a5b8 <text_mutex>) at
> > > /home/guoren/source/kernel/linux/include/asm-generic/ticket_spinlock.h:49
> > > #1 arch_spin_lock (lock=0xffffffff81b9a5b8 <text_mutex>) at
> > > /home/guoren/source/kernel/linux/arch/riscv/include/asm/spinlock.h:28
> > > #2 do_raw_spin_lock (lock=lock@entry=0xffffffff81b9a5b8 <text_mutex>)
> > > at /home/guoren/source/kernel/linux/kernel/locking/spinlock_debug.c:116
> > > #3 0xffffffff80b2ea0e in __raw_spin_lock_irqsave
> > > (lock=0xffffffff81b9a5b8 <text_mutex>) at
> > > /home/guoren/source/kernel/linux/include/linux/spinlock_api_smp.h:111
> > > #4 _raw_spin_lock_irqsave (lock=lock@entry=0xffffffff81b9a5b8
> > > <text_mutex>) at
> > > /home/guoren/source/kernel/linux/kernel/locking/spinlock.c:162
> > > #5 0xffffffff80b27c54 in rt_mutex_slowtrylock
> > > (lock=0xffffffff81b9a5b8 <text_mutex>) at
> > > /home/guoren/source/kernel/linux/kernel/locking/rtmutex.c:1393
> > > #6 0xffffffff80b295ea in rt_mutex_try_acquire
> > > (lock=0xffffffff81b9a5b8 <text_mutex>) at
> > > /home/guoren/source/kernel/linux/kernel/locking/rtmutex.c:319
> > > #7 __rt_mutex_lock (state=2, lock=0xffffffff81b9a5b8 <text_mutex>) at
> > > /home/guoren/source/kernel/linux/kernel/locking/rtmutex.c:1805
> > > #8 __mutex_lock_common (ip=18446744071562135170, nest_lock=0x0,
> > > subclass=0, state=2, lock=0xffffffff81b9a5b8 <text_mutex>) at
> > > /home/guoren/source/kernel/linux/kernel/locking/rtmutex_api.c:518
> > > #9 mutex_lock_nested (lock=0xffffffff81b9a5b8 <text_mutex>,
> > > subclass=subclass@entry=0) at
> > > /home/guoren/source/kernel/linux/kernel/locking/rtmutex_api.c:529
> > > #10 0xffffffff80010682 in arch_jump_label_transform_queue
> > > (entry=entry@entry=0xffffffff8158da28, type=<optimized out>) at
> > > /home/guoren/source/kernel/linux/arch/riscv/kernel/jump_label.c:39
> > > #11 0xffffffff801d86b2 in __jump_label_update
> > > (key=key@entry=0xffffffff81a1abb0 <qspinlock_key>,
> > > entry=0xffffffff8158da28, stop=stop@entry=0xffffffff815a5e68
> > > <__tracepoint_ptr_initcall_finish>, init=init@entry=true)
> > > at /home/guoren/source/kernel/linux/kernel/jump_label.c:513
> > > #12 0xffffffff801d890c in jump_label_update
> > > (key=key@entry=0xffffffff81a1abb0 <qspinlock_key>) at
> > > /home/guoren/source/kernel/linux/kernel/jump_label.c:920
> > > #13 0xffffffff801d8be8 in static_key_disable_cpuslocked
> > > (key=key@entry=0xffffffff81a1abb0 <qspinlock_key>) at
> > > /home/guoren/source/kernel/linux/kernel/jump_label.c:240
> > > #14 0xffffffff801d8c04 in static_key_disable
> > > (key=key@entry=0xffffffff81a1abb0 <qspinlock_key>) at
> > > /home/guoren/source/kernel/linux/kernel/jump_label.c:248
> > > #15 0xffffffff80c04a1a in riscv_spinlock_init () at
> > > /home/guoren/source/kernel/linux/arch/riscv/kernel/setup.c:271
> > > #16 setup_arch (cmdline_p=cmdline_p@entry=0xffffffff81a03e88) at
> > > /home/guoren/source/kernel/linux/arch/riscv/kernel/setup.c:336
> > > #17 0xffffffff80c007a2 in start_kernel () at
> > > /home/guoren/source/kernel/linux/init/main.c:922
> > > #18 0xffffffff80001164 in _start_kernel ()
> > > Backtrace stopped: frame did not save the PC
> > > (gdb) p /x lock
> > > $1 = 0xffffffff81b9a5b8
> > > (gdb) p /x *lock
> > > $2 = {{val = {counter = 0x20000}, {locked = 0x0, pending = 0x0},
> > > {locked_pending = 0x0, tail = 0x2}}}
> >
> > I have for you here a fast fixup for reference. (PS: I'm digging into
> > the root cause mentioned by Will Deacon.)
> >
> > diff --git a/arch/riscv/include/asm/text-patching.h
> > b/arch/riscv/include/asm/text-patching.h
> > index 7228e266b9a1..0439609f1cff 100644
> > --- a/arch/riscv/include/asm/text-patching.h
> > +++ b/arch/riscv/include/asm/text-patching.h
> > @@ -12,5 +12,6 @@ int patch_text_set_nosync(void *addr, u8 c, size_t len);
> >  int patch_text(void *addr, u32 *insns, size_t len);
> >
> >  extern int riscv_patch_in_stop_machine;
> > +extern int riscv_patch_in_spinlock_init;
> >
> >  #endif /* _ASM_RISCV_PATCH_H */
> > diff --git a/arch/riscv/kernel/jump_label.c b/arch/riscv/kernel/jump_label.c
> > index 6eee6f736f68..d9a5a5c1933d 100644
> > --- a/arch/riscv/kernel/jump_label.c
> > +++ b/arch/riscv/kernel/jump_label.c
> > @@ -36,9 +36,11 @@ bool arch_jump_label_transform_queue(struct
> > jump_entry *entry,
> >                 insn = RISCV_INSN_NOP;
> >         }
> >
> > -       mutex_lock(&text_mutex);
> > +       if (!riscv_patch_in_spinlock_init)
> > +               mutex_lock(&text_mutex);
> >         patch_insn_write(addr, &insn, sizeof(insn));
> > -       mutex_unlock(&text_mutex);
> > +       if (!riscv_patch_in_spinlock_init)
> > +               mutex_unlock(&text_mutex);
> >
> >         return true;
> >  }
> > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> > index db13c9ddf9e3..ab009cf855c2 100644
> > --- a/arch/riscv/kernel/patch.c
> > +++ b/arch/riscv/kernel/patch.c
> > @@ -24,6 +24,7 @@ struct patch_insn {
> >  };
> >
> >  int riscv_patch_in_stop_machine = false;
> > +int riscv_patch_in_spinlock_init = false;
> >
> >  #ifdef CONFIG_MMU
> >
> > @@ -131,7 +132,7 @@ static int __patch_insn_write(void *addr, const
> > void *insn, size_t len)
> >          * safe but triggers a lockdep failure, so just elide it for that
> >          * specific case.
> >          */
> > -       if (!riscv_patch_in_stop_machine)
> > +       if (!riscv_patch_in_stop_machine && !riscv_patch_in_spinlock_init)
> >                 lockdep_assert_held(&text_mutex);
> >
> >         preempt_disable();
> > diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> > index 016b48fcd6f2..87ddf1702be4 100644
> > --- a/arch/riscv/kernel/setup.c
> > +++ b/arch/riscv/kernel/setup.c
> > @@ -268,7 +268,9 @@ static void __init riscv_spinlock_init(void)
> >         }
> >  #if defined(CONFIG_RISCV_COMBO_SPINLOCKS)
> >         else {
> > +               riscv_patch_in_spinlock_init = 1;
> >                 static_branch_disable(&qspinlock_key);
> > +               riscv_patch_in_spinlock_init = 0;
> >                 pr_info("Ticket spinlock: enabled\n");
> >                 return;
> >         }
> >
> >
> >
> > --
> > Best Regards
> > Guo Ren
>
> Thanks Guo for looking into this.
>
> Your solution is not very pretty but I don't have anything better :/

Here is another solution (only one file modified, maybe better):

diff --git a/arch/riscv/kernel/jump_label.c b/arch/riscv/kernel/jump_label.c
index 6eee6f736f68..654ed159c830 100644
--- a/arch/riscv/kernel/jump_label.c
+++ b/arch/riscv/kernel/jump_label.c
@@ -36,9 +36,15 @@ bool arch_jump_label_transform_queue(struct jump_entry *entry,
                insn = RISCV_INSN_NOP;
        }

-       mutex_lock(&text_mutex);
-       patch_insn_write(addr, &insn, sizeof(insn));
-       mutex_unlock(&text_mutex);
+       if (early_boot_irqs_disabled) {
+               riscv_patch_in_stop_machine = 1;
+               patch_insn_write(addr, &insn, sizeof(insn));
+               riscv_patch_in_stop_machine = 0;
+       } else {
+               mutex_lock(&text_mutex);
+               patch_insn_write(addr, &insn, sizeof(insn));
+               mutex_unlock(&text_mutex);
+       }

        return true;
 }

> Unless introducing a static_branch_XXX_nolock() API? I gave it a try
> and it fixes the issue, but not sure this will be accepted.
>
> The thing is the usage of static branches is temporary, we'll use
> alternatives when I finish working on getting the extensions very

The "alternatives" also need to patch the code sites one by one, which
means they will hit the same problem as the jump_label. So, you will
still need to provide a patch like the one above for the alternative
implementation.

> early from the ACPI tables (I have a poc that works, just needs some
> cleaning).
>
> So let's say that I make this early extension parsing my priority for
> 6.14, can we live with Guo's hack in this release? Or should we revert
> this commit?
>
> Thanks,
>
> Alex

--
Best Regards
Guo Ren
diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt
index 22f2990392ff..cf26042480e2 100644
--- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
+++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
@@ -20,7 +20,7 @@
     |    openrisc: |  ok  |
     |      parisc: | TODO |
     |     powerpc: |  ok  |
-    |       riscv: | TODO |
+    |       riscv: |  ok  |
     |        s390: | TODO |
     |          sh: | TODO |
     |       sparc: |  ok  |
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 093ee6537331..f5698ecc5ccc 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -82,6 +82,7 @@ config RISCV
 	select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
+	select ARCH_WEAK_RELEASE_ACQUIRE if ARCH_USE_QUEUED_SPINLOCKS
 	select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
 	select BUILDTIME_TABLE_SORT if MMU
 	select CLINT_TIMER if RISCV_M_MODE
@@ -507,6 +508,39 @@ config NODES_SHIFT
 	  Specify the maximum number of NUMA Nodes available on the target
 	  system. Increases memory reserved to accommodate various tables.

+choice
+	prompt "RISC-V spinlock type"
+	default RISCV_COMBO_SPINLOCKS
+
+config RISCV_TICKET_SPINLOCKS
+	bool "Using ticket spinlock"
+
+config RISCV_QUEUED_SPINLOCKS
+	bool "Using queued spinlock"
+	depends on SMP && MMU && NONPORTABLE
+	select ARCH_USE_QUEUED_SPINLOCKS
+	help
+	  The queued spinlock implementation requires the forward progress
+	  guarantee of cmpxchg()/xchg() atomic operations: CAS with Zabha or
+	  LR/SC with Ziccrse provide such guarantee.
+
+	  Select this if and only if Zabha or Ziccrse is available on your
+	  platform, RISCV_QUEUED_SPINLOCKS must not be selected for platforms
+	  without one of those extensions.
+
+	  If unsure, select RISCV_COMBO_SPINLOCKS, which will use qspinlocks
+	  when supported and otherwise ticket spinlocks.
+
+config RISCV_COMBO_SPINLOCKS
+	bool "Using combo spinlock"
+	depends on SMP && MMU
+	select ARCH_USE_QUEUED_SPINLOCKS
+	help
+	  Embed both queued spinlock and ticket lock so that the spinlock
+	  implementation can be chosen at runtime.
+
+endchoice
+
 config RISCV_ALTERNATIVE
 	bool
 	depends on !XIP_KERNEL
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 1461af12da6e..de13d5a234f8 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -6,10 +6,12 @@ generic-y += early_ioremap.h
 generic-y += flat.h
 generic-y += kvm_para.h
 generic-y += mmzone.h
+generic-y += mcs_spinlock.h
 generic-y += parport.h
-generic-y += spinlock.h
 generic-y += spinlock_types.h
+generic-y += ticket_spinlock.h
 generic-y += qrwlock.h
 generic-y += qrwlock_types.h
+generic-y += qspinlock.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
diff --git a/arch/riscv/include/asm/spinlock.h b/arch/riscv/include/asm/spinlock.h
new file mode 100644
index 000000000000..e5121b89acea
--- /dev/null
+++ b/arch/riscv/include/asm/spinlock.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_RISCV_SPINLOCK_H
+#define __ASM_RISCV_SPINLOCK_H
+
+#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
+#define _Q_PENDING_LOOPS	(1 << 9)
+
+#define __no_arch_spinlock_redefine
+#include <asm/ticket_spinlock.h>
+#include <asm/qspinlock.h>
+#include <asm/jump_label.h>
+
+/*
+ * TODO: Use an alternative instead of a static key when we are able to parse
+ * the extensions string earlier in the boot process.
+ */
+DECLARE_STATIC_KEY_TRUE(qspinlock_key);
+
+#define SPINLOCK_BASE_DECLARE(op, type, type_lock)			\
+static __always_inline type arch_spin_##op(type_lock lock)		\
+{									\
+	if (static_branch_unlikely(&qspinlock_key))			\
+		return queued_spin_##op(lock);				\
+	return ticket_spin_##op(lock);					\
+}
+
+SPINLOCK_BASE_DECLARE(lock, void, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(unlock, void, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(is_locked, int, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(is_contended, int, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(trylock, bool, arch_spinlock_t *)
+SPINLOCK_BASE_DECLARE(value_unlocked, int, arch_spinlock_t)
+
+#elif defined(CONFIG_RISCV_QUEUED_SPINLOCKS)
+
+#include <asm/qspinlock.h>
+
+#else
+
+#include <asm/ticket_spinlock.h>
+
+#endif
+
+#include <asm/qrwlock.h>
+
+#endif /* __ASM_RISCV_SPINLOCK_H */
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index a2cde65b69e9..438e4f6ad2ad 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -244,6 +244,42 @@ static void __init parse_dtb(void)
 #endif
 }

+#if defined(CONFIG_RISCV_COMBO_SPINLOCKS)
+DEFINE_STATIC_KEY_TRUE(qspinlock_key);
+EXPORT_SYMBOL(qspinlock_key);
+#endif
+
+static void __init riscv_spinlock_init(void)
+{
+	char *using_ext = NULL;
+
+	if (IS_ENABLED(CONFIG_RISCV_TICKET_SPINLOCKS)) {
+		pr_info("Ticket spinlock: enabled\n");
+		return;
+	}
+
+	if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) &&
+	    IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) &&
+	    riscv_isa_extension_available(NULL, ZABHA) &&
+	    riscv_isa_extension_available(NULL, ZACAS)) {
+		using_ext = "using Zabha";
+	} else if (riscv_isa_extension_available(NULL, ZICCRSE)) {
+		using_ext = "using Ziccrse";
+	}
+#if defined(CONFIG_RISCV_COMBO_SPINLOCKS)
+	else {
+		static_branch_disable(&qspinlock_key);
+		pr_info("Ticket spinlock: enabled\n");
+		return;
+	}
+#endif
+
+	if (!using_ext)
+		pr_err("Queued spinlock without Zabha or Ziccrse");
+	else
+		pr_info("Queued spinlock %s: enabled\n", using_ext);
+}
+
 extern void __init init_rt_signal_env(void);

 void __init setup_arch(char **cmdline_p)
@@ -297,6 +333,7 @@ void __init setup_arch(char **cmdline_p)
 	riscv_set_dma_cache_alignment();

 	riscv_user_isa_enable();
+	riscv_spinlock_init();
 }

 bool arch_cpu_is_hotpluggable(int cpu)
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 0655aa5b57b2..bf47cca2c375 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -136,6 +136,7 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
 }
 #endif

+#ifndef __no_arch_spinlock_redefine
 /*
  * Remapping spinlock architecture specific functions to the corresponding
  * queued spinlock functions.
@@ -146,5 +147,6 @@ static __always_inline bool virt_spin_lock(struct qspinlock *lock)
 #define arch_spin_lock(l)		queued_spin_lock(l)
 #define arch_spin_trylock(l)		queued_spin_trylock(l)
 #define arch_spin_unlock(l)		queued_spin_unlock(l)
+#endif

 #endif /* __ASM_GENERIC_QSPINLOCK_H */
diff --git a/include/asm-generic/ticket_spinlock.h b/include/asm-generic/ticket_spinlock.h
index cfcff22b37b3..325779970d8a 100644
--- a/include/asm-generic/ticket_spinlock.h
+++ b/include/asm-generic/ticket_spinlock.h
@@ -89,6 +89,7 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
 	return (s16)((val >> 16) - (val & 0xffff)) > 1;
 }

+#ifndef __no_arch_spinlock_redefine
 /*
  * Remapping spinlock architecture specific functions to the corresponding
  * ticket spinlock functions.
@@ -99,5 +100,6 @@ static __always_inline int ticket_spin_is_contended(arch_spinlock_t *lock)
 #define arch_spin_lock(l)		ticket_spin_lock(l)
 #define arch_spin_trylock(l)		ticket_spin_trylock(l)
 #define arch_spin_unlock(l)		ticket_spin_unlock(l)
+#endif

 #endif /* __ASM_GENERIC_TICKET_SPINLOCK_H */
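For readers who want to see the shape of the SPINLOCK_BASE_DECLARE() dispatch outside the kernel, here is a rough user-space sketch. It is an illustration under stated assumptions, not the patch's code: a plain bool stands in for the qspinlock_key static key (so nothing is patched at runtime), model_lock_t stands in for arch_spinlock_t, and all model_* / MODEL_* names are hypothetical. It also hints at why the commit message says the ticket implementation would "pollute" the lock value: after one ticket lock/unlock cycle the word is 0x00010001, which the ticket view reads as unlocked and the queued view reads as locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for arch_spinlock_t. */
typedef struct { uint32_t val; } model_lock_t;

/* Stand-in for the qspinlock_key static key; a plain flag, not patched code. */
static bool model_qspinlock_key = true;

/* Queued view: any non-zero lock word counts as locked. */
static int model_queued_is_locked(model_lock_t *lock)
{
	return lock->val != 0;
}

/* Ticket view: locked while next ticket (high half) != owner (low half). */
static int model_ticket_is_locked(model_lock_t *lock)
{
	return (lock->val >> 16) != (lock->val & 0xffff);
}

/* Same shape as the patch's macro: one wrapper per op, selected by the key. */
#define MODEL_SPINLOCK_DECLARE(op, type, type_lock)		\
static inline type model_spin_##op(type_lock lock)		\
{								\
	if (model_qspinlock_key)				\
		return model_queued_##op(lock);			\
	return model_ticket_##op(lock);				\
}

MODEL_SPINLOCK_DECLARE(is_locked, int, model_lock_t *)

/* One lock word, read through each view in turn. */
static void model_demo(void)
{
	/* Word left behind by one ticket lock/unlock cycle. */
	model_lock_t l = { .val = UINT32_C(0x00010001) };

	model_qspinlock_key = false;
	assert(!model_spin_is_locked(&l));	/* ticket view: free */
	model_qspinlock_key = true;
	assert(model_spin_is_locked(&l));	/* queued view: busy */
}
```

The design point this models is that both implementations share one lock word, so every call site has to agree on which view is in force; the kernel gets that agreement from the static key, whose branch sites are rewritten one at a time.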