Message ID | 20240823091100.598162-1-qiaozhe@iscas.ac.cn (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | riscv: Report error when repeatedly recording CPU hardware ID | expand |
On Fri, Aug 23, 2024 at 2:44 PM Zhe Qiao <qiaozhe@iscas.ac.cn> wrote: > > In the of_parse_and_init_cpus() function, when the __cpuid_to_hartid_map[] > array records the CPU hardware ID, if the same CPU hardware attribute has > been recorded, an error report is issued, thereby ensuring the uniqueness > of the CPU hardware ID recorded in the __cpuid_to_hartid_map[] array. > > Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn> > --- > arch/riscv/kernel/smpboot.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) > > diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c > index 0f8f1c95ac38..698f9fe791f7 100644 > --- a/arch/riscv/kernel/smpboot.c > +++ b/arch/riscv/kernel/smpboot.c > @@ -118,6 +118,16 @@ static void __init acpi_parse_and_init_cpus(void) > #define acpi_parse_and_init_cpus(...) do { } while (0) > #endif > > +static bool __init is_mpidr_duplicate(unsigned int cpuid, u64 hart) There is no such thing as mpidr in the RISC-V world. s/mpidr/hartid/ > +{ > + unsigned int i; > + > + for (i = 1; (i < cpuid) && (i < NR_CPUS); i++) > + if (cpuid_to_hartid_map(i) == hart) > + return true; > + return false; > +} > + > static void __init of_parse_and_init_cpus(void) > { > struct device_node *dn; > @@ -131,6 +141,12 @@ static void __init of_parse_and_init_cpus(void) > if (rc < 0) > continue; > > + if (is_mpidr_duplicate(cpuid, hart)) { > + pr_err("%pOF: duplicate cpu reg properties in the DT\n", > + dn); > + continue; > + } > + > if (hart == cpuid_to_hartid_map(0)) { > BUG_ON(found_boot_cpu); > found_boot_cpu = 1; > -- > 2.43.0 > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv Regards, Anup
On Fri, Aug 23, 2024 at 05:11:00PM +0800, Zhe Qiao wrote: > In the of_parse_and_init_cpus() function, when the __cpuid_to_hartid_map[] > array records the CPU hardware ID, if the same CPU hardware attribute has > been recorded, an error report is issued, thereby ensuring the uniqueness > of the CPU hardware ID recorded in the __cpuid_to_hartid_map[] array. Why is this actually required? On what system did you encounter this? > Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn> > --- > arch/riscv/kernel/smpboot.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) > > diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c > index 0f8f1c95ac38..698f9fe791f7 100644 > --- a/arch/riscv/kernel/smpboot.c > +++ b/arch/riscv/kernel/smpboot.c > @@ -118,6 +118,16 @@ static void __init acpi_parse_and_init_cpus(void) > #define acpi_parse_and_init_cpus(...) do { } while (0) > #endif > > +static bool __init is_mpidr_duplicate(unsigned int cpuid, u64 hart) > +{ > + unsigned int i; > + > + for (i = 1; (i < cpuid) && (i < NR_CPUS); i++) > + if (cpuid_to_hartid_map(i) == hart) > + return true; > + return false; > +} > + > static void __init of_parse_and_init_cpus(void) > { > struct device_node *dn; > @@ -131,6 +141,12 @@ static void __init of_parse_and_init_cpus(void) > if (rc < 0) > continue; > > + if (is_mpidr_duplicate(cpuid, hart)) { > + pr_err("%pOF: duplicate cpu reg properties in the DT\n", > + dn); > + continue; Why would we continue in this case? If the devicetree is this broken, why shouldn't we just BUG() and abort immediately? > + } > + > if (hart == cpuid_to_hartid_map(0)) { > BUG_ON(found_boot_cpu); > found_boot_cpu = 1; > -- > 2.43.0 > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
On 2024/8/23 18:08, Anup Patel wrote: > On Fri, Aug 23, 2024 at 2:44 PM Zhe Qiao <qiaozhe@iscas.ac.cn> wrote: >> In the of_parse_and_init_cpus() function, when the __cpuid_to_hartid_map[] >> array records the CPU hardware ID, if the same CPU hardware attribute has >> been recorded, an error report is issued, thereby ensuring the uniqueness >> of the CPU hardware ID recorded in the __cpuid_to_hartid_map[] array. >> >> Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn> >> --- >> arch/riscv/kernel/smpboot.c | 16 ++++++++++++++++ >> 1 file changed, 16 insertions(+) >> >> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c >> index 0f8f1c95ac38..698f9fe791f7 100644 >> --- a/arch/riscv/kernel/smpboot.c >> +++ b/arch/riscv/kernel/smpboot.c >> @@ -118,6 +118,16 @@ static void __init acpi_parse_and_init_cpus(void) >> #define acpi_parse_and_init_cpus(...) do { } while (0) >> #endif >> >> +static bool __init is_mpidr_duplicate(unsigned int cpuid, u64 hart) > There is no such thing as mpidr in the RISC-V world. > > s/mpidr/hartid/ I'm very sorry, I didn't notice that using mpidr on function names is incorrect in the RISC-V architecture. This is because I referred to the processing methods on the ARM64 architecture, so I did not modify the function name naming. Thanks for your reminder. >> +{ >> + unsigned int i; >> + >> + for (i = 1; (i < cpuid) && (i < NR_CPUS); i++) >> + if (cpuid_to_hartid_map(i) == hart) >> + return true; >> + return false; >> +} >> + >> static void __init of_parse_and_init_cpus(void) >> { >> struct device_node *dn; >> @@ -131,6 +141,12 @@ static void __init of_parse_and_init_cpus(void) >> if (rc < 0) >> continue; >> >> + if (is_mpidr_duplicate(cpuid, hart)) { >> + pr_err("%pOF: duplicate cpu reg properties in the DT\n", >> + dn); >> + continue; >> + } >> + >> if (hart == cpuid_to_hartid_map(0)) { >> BUG_ON(found_boot_cpu); >> found_boot_cpu = 1; >> -- >> 2.43.0 >> >> >> _______________________________________________ >> linux-riscv mailing list >> linux-riscv@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-riscv > Regards, > Anup Thanks Zhe
On 2024/8/23 20:57, Conor Dooley wrote: > On Fri, Aug 23, 2024 at 05:11:00PM +0800, Zhe Qiao wrote: >> In the of_parse_and_init_cpus() function, when the __cpuid_to_hartid_map[] >> array records the CPU hardware ID, if the same CPU hardware attribute has >> been recorded, an error report is issued, thereby ensuring the uniqueness >> of the CPU hardware ID recorded in the __cpuid_to_hartid_map[] array. > Why is this actually required? On what system did you encounter this? This is not actually a patch submitted for problems encountered in actual development environments, but rather a comparison of ARM architecture when I was learning Linux kernel and found similar judgments on ARM architecture. In addition, if the same attribute exists on the CPU hardware ID and is recorded in __cpuid_to_hartid_map[], the kernel may need to make a judgment on this error. >> Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn> >> --- >> arch/riscv/kernel/smpboot.c | 16 ++++++++++++++++ >> 1 file changed, 16 insertions(+) >> >> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c >> index 0f8f1c95ac38..698f9fe791f7 100644 >> --- a/arch/riscv/kernel/smpboot.c >> +++ b/arch/riscv/kernel/smpboot.c >> @@ -118,6 +118,16 @@ static void __init acpi_parse_and_init_cpus(void) >> #define acpi_parse_and_init_cpus(...) do { } while (0) >> #endif >> >> +static bool __init is_mpidr_duplicate(unsigned int cpuid, u64 hart) >> +{ >> + unsigned int i; >> + >> + for (i = 1; (i < cpuid) && (i < NR_CPUS); i++) >> + if (cpuid_to_hartid_map(i) == hart) >> + return true; >> + return false; >> +} >> + >> static void __init of_parse_and_init_cpus(void) >> { >> struct device_node *dn; >> @@ -131,6 +141,12 @@ static void __init of_parse_and_init_cpus(void) >> if (rc < 0) >> continue; >> >> + if (is_mpidr_duplicate(cpuid, hart)) { >> + pr_err("%pOF: duplicate cpu reg properties in the DT\n", >> + dn); >> + continue; > Why would we continue in this case? If the devicetree is this broken, > why shouldn't we just BUG() and abort immediately? This is because I did not find any judgment on this issue in the previous code during the analysis process, so I did not take more aggressive measures in this regard, but only issued an error alarm. >> + } >> + >> if (hart == cpuid_to_hartid_map(0)) { >> BUG_ON(found_boot_cpu); >> found_boot_cpu = 1; >> -- >> 2.43.0 >> >> >> _______________________________________________ >> linux-riscv mailing list >> linux-riscv@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-riscv Thanks Zhe
On Sat, Aug 24, 2024 at 10:41:28AM +0800, qiaozhe wrote: > > > > On 2024/8/23 20:57, Conor Dooley wrote: > > > On Fri, Aug 23, 2024 at 05:11:00PM +0800, Zhe Qiao wrote: > >> In the of_parse_and_init_cpus() function, when the __cpuid_to_hartid_map[] > >> array records the CPU hardware ID, if the same CPU hardware attribute has > >> been recorded, an error report is issued, thereby ensuring the uniqueness > >> of the CPU hardware ID recorded in the __cpuid_to_hartid_map[] array. > > Why is this actually required? On what system did you encounter this? > > This is not actually a patch submitted for problems encountered in actual > development environments, but rather a comparison of ARM architecture when > I was learning Linux kernel and found similar judgments on ARM architecture. Okay, it's good that you didn't find such a bad dtb "in the wild" :) > In addition, if the same attribute exists on the CPU hardware ID and is > recorded in __cpuid_to_hartid_map[], the kernel may need to make a judgment > on this error. > > > >> Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn> > >> --- > >> arch/riscv/kernel/smpboot.c | 16 ++++++++++++++++ > >> 1 file changed, 16 insertions(+) > >> > >> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c > >> index 0f8f1c95ac38..698f9fe791f7 100644 > >> --- a/arch/riscv/kernel/smpboot.c > >> +++ b/arch/riscv/kernel/smpboot.c > >> @@ -118,6 +118,16 @@ static void __init acpi_parse_and_init_cpus(void) > >> #define acpi_parse_and_init_cpus(...) do { } while (0) > >> #endif > >> > >> +static bool __init is_mpidr_duplicate(unsigned int cpuid, u64 hart) > >> +{ > >> + unsigned int i; > >> + > >> + for (i = 1; (i < cpuid) && (i < NR_CPUS); i++) > >> + if (cpuid_to_hartid_map(i) == hart) > >> + return true; > >> + return false; > >> +} > >> + > >> static void __init of_parse_and_init_cpus(void) > >> { > >> struct device_node *dn; > >> @@ -131,6 +141,12 @@ static void __init of_parse_and_init_cpus(void) > >> if (rc < 0) > >> continue; > >> > >> + if (is_mpidr_duplicate(cpuid, hart)) { > >> + pr_err("%pOF: duplicate cpu reg properties in the DT\n", > >> + dn); > >> + continue; > > > > Why would we continue in this case? If the devicetree is this broken, > > why shouldn't we just BUG() and abort immediately? > > This is because I did not find any judgment on this issue in the previous code > during the analysis process, so I did not take more aggressive measures in this > regard, but only issued an error alarm. What do you think though? Should we continue to boot in this case? If you read the function a bit further, you'll see that we abort boot if there are two instances of the boot CPU. Do you think the same should be done for all CPUs? Thanks, Conor. > > >> + } > >> + > >> if (hart == cpuid_to_hartid_map(0)) { > >> BUG_ON(found_boot_cpu); > >> found_boot_cpu = 1; > >> -- > >> 2.43.0
On 2024/8/26 16:15, Conor Dooley wrote: > On Sat, Aug 24, 2024 at 10:41:28AM +0800, qiaozhe wrote: >> >> >> On 2024/8/23 20:57, Conor Dooley wrote: >> >>> On Fri, Aug 23, 2024 at 05:11:00PM +0800, Zhe Qiao wrote: >>>> In the of_parse_and_init_cpus() function, when the __cpuid_to_hartid_map[] >>>> array records the CPU hardware ID, if the same CPU hardware attribute has >>>> been recorded, an error report is issued, thereby ensuring the uniqueness >>>> of the CPU hardware ID recorded in the __cpuid_to_hartid_map[] array. >>> Why is this actually required? On what system did you encounter this? >> This is not actually a patch submitted for problems encountered in actual >> development environments, but rather a comparison of ARM architecture when >> I was learning Linux kernel and found similar judgments on ARM architecture. > Okay, it's good that you didn't find such a bad dtb "in the wild" :) > >> In addition, if the same attribute exists on the CPU hardware ID and is >> recorded in __cpuid_to_hartid_map[], the kernel may need to make a judgment >> on this error. >> >> >>>> Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn> >>>> --- >>>> arch/riscv/kernel/smpboot.c | 16 ++++++++++++++++ >>>> 1 file changed, 16 insertions(+) >>>> >>>> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c >>>> index 0f8f1c95ac38..698f9fe791f7 100644 >>>> --- a/arch/riscv/kernel/smpboot.c >>>> +++ b/arch/riscv/kernel/smpboot.c >>>> @@ -118,6 +118,16 @@ static void __init acpi_parse_and_init_cpus(void) >>>> #define acpi_parse_and_init_cpus(...) do { } while (0) >>>> #endif >>>> >>>> +static bool __init is_mpidr_duplicate(unsigned int cpuid, u64 hart) >>>> +{ >>>> + unsigned int i; >>>> + >>>> + for (i = 1; (i < cpuid) && (i < NR_CPUS); i++) >>>> + if (cpuid_to_hartid_map(i) == hart) >>>> + return true; >>>> + return false; >>>> +} >>>> + >>>> static void __init of_parse_and_init_cpus(void) >>>> { >>>> struct device_node *dn; >>>> @@ -131,6 +141,12 @@ static void __init of_parse_and_init_cpus(void) >>>> if (rc < 0) >>>> continue; >>>> >>>> + if (is_mpidr_duplicate(cpuid, hart)) { >>>> + pr_err("%pOF: duplicate cpu reg properties in the DT\n", >>>> + dn); >>>> + continue; >>> Why would we continue in this case? If the devicetree is this broken, >>> why shouldn't we just BUG() and abort immediately? >> This is because I did not find any judgment on this issue in the previous code >> during the analysis process, so I did not take more aggressive measures in this >> regard, but only issued an error alarm. > What do you think though? Should we continue to boot in this case? > If you read the function a bit further, you'll see that we abort boot > if there are two instances of the boot CPU. Do you think the same should > be done for all CPUs? Yes, I saw that if there are two boot CPUs, a BUG will occur. For all CPUs, if there are two CPUs with the same attributes, I think a bug should be generated directly. This will attract more attention than issuing false warnings, while also reducing the difficulty of troubleshooting related issues. These are some of my opinions, I don't know if they are reasonable to consider. What do you think? > Thanks, > Conor. > >>>> + } >>>> + >>>> if (hart == cpuid_to_hartid_map(0)) { >>>> BUG_ON(found_boot_cpu); >>>> found_boot_cpu = 1; >>>> -- >>>> 2.43.0 Thanks, Zhe
On Tue, Aug 27, 2024 at 09:39:30AM +0800, qiaozhe wrote: > > On 2024/8/26 16:15, Conor Dooley wrote: > > > On Sat, Aug 24, 2024 at 10:41:28AM +0800, qiaozhe wrote: > >> > >> > >> On 2024/8/23 20:57, Conor Dooley wrote: > >> > >>> On Fri, Aug 23, 2024 at 05:11:00PM +0800, Zhe Qiao wrote: > >>>> In the of_parse_and_init_cpus() function, when the __cpuid_to_hartid_map[] > >>>> array records the CPU hardware ID, if the same CPU hardware attribute has > >>>> been recorded, an error report is issued, thereby ensuring the uniqueness > >>>> of the CPU hardware ID recorded in the __cpuid_to_hartid_map[] array. > >>> Why is this actually required? On what system did you encounter this? > >> This is not actually a patch submitted for problems encountered in actual > >> development environments, but rather a comparison of ARM architecture when > >> I was learning Linux kernel and found similar judgments on ARM architecture. > > Okay, it's good that you didn't find such a bad dtb "in the wild" :) > > > >> In addition, if the same attribute exists on the CPU hardware ID and is > >> recorded in __cpuid_to_hartid_map[], the kernel may need to make a judgment > >> on this error. > >> > >> > >>>> Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn> > >>>> --- > >>>> arch/riscv/kernel/smpboot.c | 16 ++++++++++++++++ > >>>> 1 file changed, 16 insertions(+) > >>>> > >>>> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c > >>>> index 0f8f1c95ac38..698f9fe791f7 100644 > >>>> --- a/arch/riscv/kernel/smpboot.c > >>>> +++ b/arch/riscv/kernel/smpboot.c > >>>> @@ -118,6 +118,16 @@ static void __init acpi_parse_and_init_cpus(void) > >>>> #define acpi_parse_and_init_cpus(...) do { } while (0) > >>>> #endif > >>>> > >>>> +static bool __init is_mpidr_duplicate(unsigned int cpuid, u64 hart) > >>>> +{ > >>>> + unsigned int i; > >>>> + > >>>> + for (i = 1; (i < cpuid) && (i < NR_CPUS); i++) > >>>> + if (cpuid_to_hartid_map(i) == hart) > >>>> + return true; > >>>> + return false; > >>>> +} > >>>> + > >>>> static void __init of_parse_and_init_cpus(void) > >>>> { > >>>> struct device_node *dn; > >>>> @@ -131,6 +141,12 @@ static void __init of_parse_and_init_cpus(void) > >>>> if (rc < 0) > >>>> continue; > >>>> > >>>> + if (is_mpidr_duplicate(cpuid, hart)) { > >>>> + pr_err("%pOF: duplicate cpu reg properties in the DT\n", > >>>> + dn); > >>>> + continue; > >>> Why would we continue in this case? If the devicetree is this broken, > >>> why shouldn't we just BUG() and abort immediately? > >> This is because I did not find any judgment on this issue in the previous code > >> during the analysis process, so I did not take more aggressive measures in this > >> regard, but only issued an error alarm. > > What do you think though? Should we continue to boot in this case? > > If you read the function a bit further, you'll see that we abort boot > > if there are two instances of the boot CPU. Do you think the same should > > be done for all CPUs? > > > Yes, I saw that if there are two boot CPUs, a BUG will occur. For all CPUs, > > if there are two CPUs with the same attributes, I think a bug should be generated > > directly. This will attract more attention than issuing false warnings, while also > > reducing the difficulty of troubleshooting related issues. > > > These are some of my opinions, I don't know if they are reasonable to consider. > > What do you think? I think that BUG()ing here would be reasonable to do.
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c index 0f8f1c95ac38..698f9fe791f7 100644 --- a/arch/riscv/kernel/smpboot.c +++ b/arch/riscv/kernel/smpboot.c @@ -118,6 +118,16 @@ static void __init acpi_parse_and_init_cpus(void) #define acpi_parse_and_init_cpus(...) do { } while (0) #endif +static bool __init is_mpidr_duplicate(unsigned int cpuid, u64 hart) +{ + unsigned int i; + + for (i = 1; (i < cpuid) && (i < NR_CPUS); i++) + if (cpuid_to_hartid_map(i) == hart) + return true; + return false; +} + static void __init of_parse_and_init_cpus(void) { struct device_node *dn; @@ -131,6 +141,12 @@ static void __init of_parse_and_init_cpus(void) if (rc < 0) continue; + if (is_mpidr_duplicate(cpuid, hart)) { + pr_err("%pOF: duplicate cpu reg properties in the DT\n", + dn); + continue; + } + if (hart == cpuid_to_hartid_map(0)) { BUG_ON(found_boot_cpu); found_boot_cpu = 1;
In the of_parse_and_init_cpus() function, when the __cpuid_to_hartid_map[] array records the CPU hardware ID, if the same CPU hardware attribute has been recorded, an error report is issued, thereby ensuring the uniqueness of the CPU hardware ID recorded in the __cpuid_to_hartid_map[] array. Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn> --- arch/riscv/kernel/smpboot.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)