diff mbox series

[1/2] riscv: avoid the PIC offset of static percpu data in module beyond 2G limits

Message ID 1582097304-5547-2-git-send-email-vincent.chen@sifive.com (mailing list archive)
State New, archived
Headers show
Series solve static percpu symbol issue in module and refine code model of module | expand

Commit Message

Vincent Chen Feb. 19, 2020, 7:28 a.m. UTC
The compiler uses the PIC-relative method to access static variables
instead of GOT when the code model is PIC. Therefore, the limitation of
the access range from the instruction to the symbol address is +-2GB.
Under this circumstance, the kernel cannot load a kernel module if this
module has static per-CPU symbols declared by DEFINE_PER_CPU(). The reason
is that kernel relocates the .data..percpu section of the kernel module to
the end of kernel's .data..percpu. Hence, the distance between the per-CPU
symbols and the instruction will exceed the 2GB limits. To solve this
problem, the kernel should place the loaded module in the memory area
[&_end-2G, VMALLOC_END].

Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Suggested-by: Alex Ghiti <alex@ghiti.fr>
Suggested-by: Anup Patel <anup@brainfault.org>

---
 arch/riscv/kernel/module.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Comments

Alexandre Ghiti Feb. 19, 2020, 5:52 p.m. UTC | #1
Hi Vincent,

On 2/19/20 8:28 AM, Vincent Chen wrote:
> The compiler uses the PIC-relative method to access static variables
> instead of GOT when the code model is PIC. Therefore, the limitation of
> the access range from the instruction to the symbol address is +-2GB.
> Under this circumstance, the kernel cannot load a kernel module if this
> module has static per-CPU symbols declared by DEFINE_PER_CPU(). The reason
> is that kernel relocates the .data..percpu section of the kernel module to
> the end of kernel's .data..percpu. Hence, the distance between the per-CPU
> symbols and the instruction will exceed the 2GB limits. To solve this
> problem, the kernel should place the loaded module in the memory area
> [&_end-2G, VMALLOC_END].
> 
> Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> Suggested-by: Alex Ghiti <alex@ghiti.fr>
> Suggested-by: Anup Patel <anup@brainfault.org>
> 
> ---
>   arch/riscv/kernel/module.c | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
> 
> diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> index b7401858d872..c498beb82369 100644
> --- a/arch/riscv/kernel/module.c
> +++ b/arch/riscv/kernel/module.c
> @@ -8,6 +8,10 @@
>   #include <linux/err.h>
>   #include <linux/errno.h>
>   #include <linux/moduleloader.h>
> +#include <linux/vmalloc.h>
> +#include <linux/sizes.h>
> +#include <asm/pgtable.h>
> +#include <asm/sections.h>
>   
>   static int apply_r_riscv_32_rela(struct module *me, u32 *location, Elf_Addr v)
>   {
> @@ -386,3 +390,17 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>   
>   	return 0;
>   }
> +#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
> +#ifdef CONFIG_MAXPHYSMEM_2GB
> +#define VMALLOC_MODULE_START \
> +	max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
> +#else
> +#define VMALLOC_MODULE_START PFN_ALIGN((unsigned long)&_end - SZ_2G)
> +#endif

I would use the same definition for both cases:

#define VMALLOC_MODULE_START \
	max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)

as it avoids ifdefs and amounts to the same. And maybe you can avoid the 
definition of VMALLOC_MODULE_START at the same time.

> +void *module_alloc(unsigned long size)
> +{
> +	return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
> +	VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> +	__builtin_return_address(0));
> +}
> +#endif
> 

It's weird checkpatch does not complain about the alignment of those lines.

Otherwise, I have just tested it and it works, so you can add:

Tested-by: Alexandre Ghiti <alex@ghiti.fr>

Thanks,

Alex
Carlos de Paula Feb. 19, 2020, 10:43 p.m. UTC | #2
On Wed, Feb 19, 2020 at 2:53 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> Hi Vincent,
>
> On 2/19/20 8:28 AM, Vincent Chen wrote:
> > The compiler uses the PIC-relative method to access static variables
> > instead of GOT when the code model is PIC. Therefore, the limitation of
> > the access range from the instruction to the symbol address is +-2GB.
> > Under this circumstance, the kernel cannot load a kernel module if this
> > module has static per-CPU symbols declared by DEFINE_PER_CPU(). The reason
> > is that kernel relocates the .data..percpu section of the kernel module to
> > the end of kernel's .data..percpu. Hence, the distance between the per-CPU
> > symbols and the instruction will exceed the 2GB limits. To solve this
> > problem, the kernel should place the loaded module in the memory area
> > [&_end-2G, VMALLOC_END].
> >
> > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > Suggested-by: Alex Ghiti <alex@ghiti.fr>
> > Suggested-by: Anup Patel <anup@brainfault.org>
> >
> > ---
> >   arch/riscv/kernel/module.c | 18 ++++++++++++++++++
> >   1 file changed, 18 insertions(+)
> >
> > diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> > index b7401858d872..c498beb82369 100644
> > --- a/arch/riscv/kernel/module.c
> > +++ b/arch/riscv/kernel/module.c
> > @@ -8,6 +8,10 @@
> >   #include <linux/err.h>
> >   #include <linux/errno.h>
> >   #include <linux/moduleloader.h>
> > +#include <linux/vmalloc.h>
> > +#include <linux/sizes.h>
> > +#include <asm/pgtable.h>
> > +#include <asm/sections.h>
> >
> >   static int apply_r_riscv_32_rela(struct module *me, u32 *location, Elf_Addr v)
> >   {
> > @@ -386,3 +390,17 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >
> >       return 0;
> >   }
> > +#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
> > +#ifdef CONFIG_MAXPHYSMEM_2GB
> > +#define VMALLOC_MODULE_START \
> > +     max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
> > +#else
> > +#define VMALLOC_MODULE_START PFN_ALIGN((unsigned long)&_end - SZ_2G)
> > +#endif
>
> I would use the same definition for both cases:
>
> #define VMALLOC_MODULE_START \
>         max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
>
> as it avoids ifdefs and amounts to the same. And maybe you can avoid the
> definition of VMALLOC_MODULE_START at the same time.
>
> > +void *module_alloc(unsigned long size)
> > +{
> > +     return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
> > +     VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> > +     __builtin_return_address(0));
> > +}
> > +#endif
> >
>
> It's weird checkpatch does not complain about the alignment of those lines.
>
> Otherwise, I have just tested it and it works, so you can add:
>
> Tested-by: Alexandre Ghiti <alex@ghiti.fr>
>
> Thanks,
>
> Alex
>

Thanks for the patch, applied on v5.5.0 and v5.6.0-rc2. Worked fine on
Qemu and Unleashed:

root@debian10-riscv64:~# sudo modprobe openvswitch
[  124.257220] openvswitch: Open vSwitch switching datapath

root@debian10-riscv64:~# modprobe br_netfilter
[  193.168269] Bridge firewalling registered

root@debian10-riscv64:~# lsmod
Module                  Size  Used by
br_netfilter           23054  0
bridge                217063  2 br_netfilter
stp                     2891  1 bridge
llc                     5968  2 bridge,stp
openvswitch           197057  0
nsh                     3501  1 openvswitch
nf_conncount           11362  1 openvswitch
nf_nat                 39088  1 openvswitch
nf_conntrack          143270  3 nf_nat,openvswitch,nf_conncount
nf_defrag_ipv6         10091  2 nf_conntrack,openvswitch
nf_defrag_ipv4          2410  1 nf_conntrack
ip_tables              16409  0

If desired, add:

Tested-by: Carlos de Paula <me@carlosedp.com>
Vincent Chen Feb. 20, 2020, 2:29 a.m. UTC | #3
On Thu, Feb 20, 2020 at 1:52 AM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> Hi Vincent,
>
> On 2/19/20 8:28 AM, Vincent Chen wrote:
> > The compiler uses the PIC-relative method to access static variables
> > instead of GOT when the code model is PIC. Therefore, the limitation of
> > the access range from the instruction to the symbol address is +-2GB.
> > Under this circumstance, the kernel cannot load a kernel module if this
> > module has static per-CPU symbols declared by DEFINE_PER_CPU(). The reason
> > is that kernel relocates the .data..percpu section of the kernel module to
> > the end of kernel's .data..percpu. Hence, the distance between the per-CPU
> > symbols and the instruction will exceed the 2GB limits. To solve this
> > problem, the kernel should place the loaded module in the memory area
> > [&_end-2G, VMALLOC_END].
> >
> > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > Suggested-by: Alex Ghiti <alex@ghiti.fr>
> > Suggested-by: Anup Patel <anup@brainfault.org>
> >
> > ---
> >   arch/riscv/kernel/module.c | 18 ++++++++++++++++++
> >   1 file changed, 18 insertions(+)
> >
> > diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> > index b7401858d872..c498beb82369 100644
> > --- a/arch/riscv/kernel/module.c
> > +++ b/arch/riscv/kernel/module.c
> > @@ -8,6 +8,10 @@
> >   #include <linux/err.h>
> >   #include <linux/errno.h>
> >   #include <linux/moduleloader.h>
> > +#include <linux/vmalloc.h>
> > +#include <linux/sizes.h>
> > +#include <asm/pgtable.h>
> > +#include <asm/sections.h>
> >
> >   static int apply_r_riscv_32_rela(struct module *me, u32 *location, Elf_Addr v)
> >   {
> > @@ -386,3 +390,17 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >
> >       return 0;
> >   }
> > +#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
> > +#ifdef CONFIG_MAXPHYSMEM_2GB
> > +#define VMALLOC_MODULE_START \
> > +     max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
> > +#else
> > +#define VMALLOC_MODULE_START PFN_ALIGN((unsigned long)&_end - SZ_2G)
> > +#endif
>
> I would use the same definition for both cases:
>
> #define VMALLOC_MODULE_START \
>         max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
>
> as it avoids ifdefs and amounts to the same. And maybe you can avoid the
> definition of VMALLOC_MODULE_START at the same time.
>
Thanks for your comments. I will follow your suggestion to use the
same definition for both cases. For the definition of
VMALLOC_MODULE_START, I may prefer to keep it , because I think it may
be more readable than directly passing the max() function to the
__vmalloc_node_range(). I am afriad that I misunderstood what you
meant. If possible, could you give me an example? Thank you.

> > +void *module_alloc(unsigned long size)
> > +{
> > +     return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
> > +     VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> > +     __builtin_return_address(0));
> > +}
> > +#endif
> >
>
> It's weird checkpatch does not complain about the alignment of those lines.
>
I will modify it.
> Otherwise, I have just tested it and it works, so you can add:
>
> Tested-by: Alexandre Ghiti <alex@ghiti.fr>
>
> Thanks,
>
> Alex

Thank you for testing this patch, I will add it.
Vincent Chen Feb. 20, 2020, 2:31 a.m. UTC | #4
On Thu, Feb 20, 2020 at 6:43 AM Carlos Eduardo de Paula
<me@carlosedp.com> wrote:
>
> On Wed, Feb 19, 2020 at 2:53 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
> >
> > Hi Vincent,
> >
> > On 2/19/20 8:28 AM, Vincent Chen wrote:
> > > The compiler uses the PIC-relative method to access static variables
> > > instead of GOT when the code model is PIC. Therefore, the limitation of
> > > the access range from the instruction to the symbol address is +-2GB.
> > > Under this circumstance, the kernel cannot load a kernel module if this
> > > module has static per-CPU symbols declared by DEFINE_PER_CPU(). The reason
> > > is that kernel relocates the .data..percpu section of the kernel module to
> > > the end of kernel's .data..percpu. Hence, the distance between the per-CPU
> > > symbols and the instruction will exceed the 2GB limits. To solve this
> > > problem, the kernel should place the loaded module in the memory area
> > > [&_end-2G, VMALLOC_END].
> > >
> > > Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
> > > Suggested-by: Alex Ghiti <alex@ghiti.fr>
> > > Suggested-by: Anup Patel <anup@brainfault.org>
> > >
> > > ---
> > >   arch/riscv/kernel/module.c | 18 ++++++++++++++++++
> > >   1 file changed, 18 insertions(+)
> > >
> > > diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> > > index b7401858d872..c498beb82369 100644
> > > --- a/arch/riscv/kernel/module.c
> > > +++ b/arch/riscv/kernel/module.c
> > > @@ -8,6 +8,10 @@
> > >   #include <linux/err.h>
> > >   #include <linux/errno.h>
> > >   #include <linux/moduleloader.h>
> > > +#include <linux/vmalloc.h>
> > > +#include <linux/sizes.h>
> > > +#include <asm/pgtable.h>
> > > +#include <asm/sections.h>
> > >
> > >   static int apply_r_riscv_32_rela(struct module *me, u32 *location, Elf_Addr v)
> > >   {
> > > @@ -386,3 +390,17 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> > >
> > >       return 0;
> > >   }
> > > +#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
> > > +#ifdef CONFIG_MAXPHYSMEM_2GB
> > > +#define VMALLOC_MODULE_START \
> > > +     max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
> > > +#else
> > > +#define VMALLOC_MODULE_START PFN_ALIGN((unsigned long)&_end - SZ_2G)
> > > +#endif
> >
> > I would use the same definition for both cases:
> >
> > #define VMALLOC_MODULE_START \
> >         max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
> >
> > as it avoids ifdefs and amounts to the same. And maybe you can avoid the
> > definition of VMALLOC_MODULE_START at the same time.
> >
> > > +void *module_alloc(unsigned long size)
> > > +{
> > > +     return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
> > > +     VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> > > +     __builtin_return_address(0));
> > > +}
> > > +#endif
> > >
> >
> > It's weird checkpatch does not complain about the alignment of those lines.
> >
> > Otherwise, I have just tested it and it works, so you can add:
> >
> > Tested-by: Alexandre Ghiti <alex@ghiti.fr>
> >
> > Thanks,
> >
> > Alex
> >
>
> Thanks for the patch, applied on v5.5.0 and v5.6.0-rc2. Worked fine on
> Qemu and Unleashed:
>
> root@debian10-riscv64:~# sudo modprobe openvswitch
> [  124.257220] openvswitch: Open vSwitch switching datapath
>
> root@debian10-riscv64:~# modprobe br_netfilter
> [  193.168269] Bridge firewalling registered
>
> root@debian10-riscv64:~# lsmod
> Module                  Size  Used by
> br_netfilter           23054  0
> bridge                217063  2 br_netfilter
> stp                     2891  1 bridge
> llc                     5968  2 bridge,stp
> openvswitch           197057  0
> nsh                     3501  1 openvswitch
> nf_conncount           11362  1 openvswitch
> nf_nat                 39088  1 openvswitch
> nf_conntrack          143270  3 nf_nat,openvswitch,nf_conncount
> nf_defrag_ipv6         10091  2 nf_conntrack,openvswitch
> nf_defrag_ipv4          2410  1 nf_conntrack
> ip_tables              16409  0
>
> If desired, add:
>
> Tested-by: Carlos de Paula <me@carlosedp.com>

Thank you for testing this patch, I will add it.

Vincent Chen
Alexandre Ghiti Feb. 20, 2020, 5:53 a.m. UTC | #5
Hi Vincent,

On 2/19/20 9:29 PM, Vincent Chen wrote:
> On Thu, Feb 20, 2020 at 1:52 AM Alexandre Ghiti <alex@ghiti.fr> wrote:
>>
>> Hi Vincent,
>>
>> On 2/19/20 8:28 AM, Vincent Chen wrote:
>>> The compiler uses the PIC-relative method to access static variables
>>> instead of GOT when the code model is PIC. Therefore, the limitation of
>>> the access range from the instruction to the symbol address is +-2GB.
>>> Under this circumstance, the kernel cannot load a kernel module if this
>>> module has static per-CPU symbols declared by DEFINE_PER_CPU(). The reason
>>> is that kernel relocates the .data..percpu section of the kernel module to
>>> the end of kernel's .data..percpu. Hence, the distance between the per-CPU
>>> symbols and the instruction will exceed the 2GB limits. To solve this
>>> problem, the kernel should place the loaded module in the memory area
>>> [&_end-2G, VMALLOC_END].
>>>
>>> Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
>>> Suggested-by: Alex Ghiti <alex@ghiti.fr>
>>> Suggested-by: Anup Patel <anup@brainfault.org>
>>>
>>> ---
>>>    arch/riscv/kernel/module.c | 18 ++++++++++++++++++
>>>    1 file changed, 18 insertions(+)
>>>
>>> diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
>>> index b7401858d872..c498beb82369 100644
>>> --- a/arch/riscv/kernel/module.c
>>> +++ b/arch/riscv/kernel/module.c
>>> @@ -8,6 +8,10 @@
>>>    #include <linux/err.h>
>>>    #include <linux/errno.h>
>>>    #include <linux/moduleloader.h>
>>> +#include <linux/vmalloc.h>
>>> +#include <linux/sizes.h>
>>> +#include <asm/pgtable.h>
>>> +#include <asm/sections.h>
>>>
>>>    static int apply_r_riscv_32_rela(struct module *me, u32 *location, Elf_Addr v)
>>>    {
>>> @@ -386,3 +390,17 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>>>
>>>        return 0;
>>>    }
>>> +#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
>>> +#ifdef CONFIG_MAXPHYSMEM_2GB
>>> +#define VMALLOC_MODULE_START \
>>> +     max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
>>> +#else
>>> +#define VMALLOC_MODULE_START PFN_ALIGN((unsigned long)&_end - SZ_2G)
>>> +#endif
>>
>> I would use the same definition for both cases:
>>
>> #define VMALLOC_MODULE_START \
>>          max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
>>
>> as it avoids ifdefs and amounts to the same. And maybe you can avoid the
>> definition of VMALLOC_MODULE_START at the same time.
>>
> Thanks for your comments. I will follow your suggestion to use the
> same definition for both cases. For the definition of
> VMALLOC_MODULE_START, I may prefer to keep it , because I think it may
> be more readable than directly passing the max() function to the
> __vmalloc_node_range(). I am afriad that I misunderstood what you
> meant. If possible, could you give me an example? Thank you.
> 

I meant you could get rid of VMALLOC_MODULE_START definition if there 
was only one, but I don't mind, you can keep it if you prefer.

>>> +void *module_alloc(unsigned long size)
>>> +{
>>> +     return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
>>> +     VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
>>> +     __builtin_return_address(0));
>>> +}
>>> +#endif
>>>
>>
>> It's weird checkpatch does not complain about the alignment of those lines.
>>
> I will modify it.
>> Otherwise, I have just tested it and it works, so you can add:
>>
>> Tested-by: Alexandre Ghiti <alex@ghiti.fr>
>>
>> Thanks,
>>
>> Alex
> 
> Thank you for testing this patch, I will add it.
> 

Thanks again,

Alex
diff mbox series

Patch

diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
index b7401858d872..c498beb82369 100644
--- a/arch/riscv/kernel/module.c
+++ b/arch/riscv/kernel/module.c
@@ -8,6 +8,10 @@ 
 #include <linux/err.h>
 #include <linux/errno.h>
 #include <linux/moduleloader.h>
+#include <linux/vmalloc.h>
+#include <linux/sizes.h>
+#include <asm/pgtable.h>
+#include <asm/sections.h>
 
 static int apply_r_riscv_32_rela(struct module *me, u32 *location, Elf_Addr v)
 {
@@ -386,3 +390,17 @@  int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
 
 	return 0;
 }
+#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
+#ifdef CONFIG_MAXPHYSMEM_2GB
+#define VMALLOC_MODULE_START \
+	max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
+#else
+#define VMALLOC_MODULE_START PFN_ALIGN((unsigned long)&_end - SZ_2G)
+#endif
+void *module_alloc(unsigned long size)
+{
+	return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
+	VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+	__builtin_return_address(0));
+}
+#endif