diff mbox

Arm: mm: ftrace: Only set text back to ro after kernel has been marked ro

Message ID 20170823135836.52fb44fc@gandalf.local.home (mailing list archive)
State New, archived
Headers show

Commit Message

Steven Rostedt Aug. 23, 2017, 5:58 p.m. UTC
ftrace needs to modify the kernel text in order to enable function tracing.
For security reasons, the kernel text is marked to read-only (ro) at the end
of system bootup. When enabling function tracing after that, ftrace calls
arch specific code that needs to enable the modification of kernel text
while ftrace does the update, and reset it back again when finished.

The issue arises when function tracing is enabled during system bootup. The
text hasn't been marked as read-only yet, but the same code to modify the
kernel is executed, and when it is finished, it will cause the kernel to
become read-only. This causes issues for other init code that requires
modification of kernel text during system bootup. This appears to cause
issue with Raspberry Pi 2.

By implementing the feature that is used in x86 to deal with this issue, it
fixes the problem. The solution is simple. Have a variable
(kernel_set_to_readonly) get set when the system finished boot and marks the
kernel to readonly. If that variable is not set, both
kernel_set_to_readonly() and kernel_set_to_rw() return without doing any
modifications. Those functions are used by ftrace to change the permissions
of the kernel text. By not doing anything, ftrace will not mess with the
permissions when it is enabled at system bootup.

Link: http://lkml.kernel.org/r/20170821153402.7so2u364htvt6tnf@camel2.lan
Link: https://github.com/raspberrypi/linux/issues/2166#issuecomment-323355145
Reported-by: Matthias Reichl <hias@horus.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Phil Elwell <phil@raspberrypi.org>
Cc: linux-rpi-kernel@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org
Fixes: 80d6b0c2ee ("ARM: mm: allow text and rodata sections to be read-only")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 arch/arm/mm/init.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Matthias Reichl Aug. 23, 2017, 6:20 p.m. UTC | #1
On Wed, Aug 23, 2017 at 01:58:36PM -0400, Steven Rostedt wrote:
> 
> ftrace needs to modify the kernel text in order to enable function tracing.
> For security reasons, the kernel text is marked to read-only (ro) at the end
> of system bootup. When enabling function tracing after that, ftrace calls
> arch specific code that needs to enable the modification of kernel text
> while ftrace does the update, and reset it back again when finished.
> 
> The issue arises when function tracing is enabled during system bootup. The
> text hasn't been marked as read-only yet, but the same code to modify the
> kernel is executed, and when it is finished, it will cause the kernel to
> become read-only. This causes issues for other init code that requires
> modification of kernel text during system bootup. This appears to cause
> issue with Raspberry Pi 2.
> 
> By implementing the feature that is used in x86 to deal with this issue, it
> fixes the problem. The solution is simple. Have a variable
> (kernel_set_to_readonly) get set when the system finished boot and marks the
> kernel to readonly. If that variable is not set, both
> kernel_set_to_readonly() and kernel_set_to_rw() return without doing any
> modifications. Those functions are used by ftrace to change the permissions
> of the kernel text. By not doing anything, ftrace will not mess with the
> permissions when it is enabled at system bootup.
> 
> Link: http://lkml.kernel.org/r/20170821153402.7so2u364htvt6tnf@camel2.lan
> Link: https://github.com/raspberrypi/linux/issues/2166#issuecomment-323355145
> Reported-by: Matthias Reichl <hias@horus.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Eric Anholt <eric@anholt.net>
> Cc: Stefan Wahren <stefan.wahren@i2se.com>
> Cc: Phil Elwell <phil@raspberrypi.org>
> Cc: linux-rpi-kernel@lists.infradead.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: stable@vger.kernel.org
> Fixes: 80d6b0c2ee ("ARM: mm: allow text and rodata sections to be read-only")
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Tested-by: Matthias Reichl <hias@horus.com>

Thanks, again, for resolving the issue so quickly!

so long,

Hias

> ---
>  arch/arm/mm/init.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index ad80548..fd75f38 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -745,19 +745,29 @@ static int __mark_rodata_ro(void *unused)
>  	return 0;
>  }
>  
> +static int kernel_set_to_readonly;
> +
>  void mark_rodata_ro(void)
>  {
> +	kernel_set_to_readonly = 1;
> +
>  	stop_machine(__mark_rodata_ro, NULL, NULL);
>  }
>  
>  void set_kernel_text_rw(void)
>  {
> +	if (!kernel_set_to_readonly)
> +		return;
> +
>  	set_section_perms(ro_perms, ARRAY_SIZE(ro_perms), false,
>  				current->active_mm);
>  }
>  
>  void set_kernel_text_ro(void)
>  {
> +	if (!kernel_set_to_readonly)
> +		return;
> +
>  	set_section_perms(ro_perms, ARRAY_SIZE(ro_perms), true,
>  				current->active_mm);
>  }
> -- 
> 2.9.5
>
Kees Cook Aug. 23, 2017, 6:48 p.m. UTC | #2
On Wed, Aug 23, 2017 at 10:58 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> ftrace needs to modify the kernel text in order to enable function tracing.
> For security reasons, the kernel text is marked to read-only (ro) at the end
> of system bootup. When enabling function tracing after that, ftrace calls
> arch specific code that needs to enable the modification of kernel text
> while ftrace does the update, and reset it back again when finished.
>
> The issue arises when function tracing is enabled during system bootup. The
> text hasn't been marked as read-only yet, but the same code to modify the
> kernel is executed, and when it is finished, it will cause the kernel to
> become read-only. This causes issues for other init code that requires
> modification of kernel text during system bootup. This appears to cause
> issue with Raspberry Pi 2.
>
> By implementing the feature that is used in x86 to deal with this issue, it
> fixes the problem. The solution is simple. Have a variable
> (kernel_set_to_readonly) get set when the system finished boot and marks the
> kernel to readonly. If that variable is not set, both
> kernel_set_to_readonly() and kernel_set_to_rw() return without doing any
> modifications. Those functions are used by ftrace to change the permissions
> of the kernel text. By not doing anything, ftrace will not mess with the
> permissions when it is enabled at system bootup.
>
> Link: http://lkml.kernel.org/r/20170821153402.7so2u364htvt6tnf@camel2.lan
> Link: https://github.com/raspberrypi/linux/issues/2166#issuecomment-323355145
> Reported-by: Matthias Reichl <hias@horus.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Eric Anholt <eric@anholt.net>
> Cc: Stefan Wahren <stefan.wahren@i2se.com>
> Cc: Phil Elwell <phil@raspberrypi.org>
> Cc: linux-rpi-kernel@lists.infradead.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: stable@vger.kernel.org
> Fixes: 80d6b0c2ee ("ARM: mm: allow text and rodata sections to be read-only")
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  arch/arm/mm/init.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index ad80548..fd75f38 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -745,19 +745,29 @@ static int __mark_rodata_ro(void *unused)
>         return 0;
>  }
>
> +static int kernel_set_to_readonly;

Adding a comment here might be a good idea, something like:

/* Has system boot-up reached mark_rodata_ro() yet? */

Otherwise:

Acked-by: Kees Cook <keescook@chromium.org>

> +
>  void mark_rodata_ro(void)
>  {
> +       kernel_set_to_readonly = 1;
> +
>         stop_machine(__mark_rodata_ro, NULL, NULL);
>  }
>
>  void set_kernel_text_rw(void)
>  {
> +       if (!kernel_set_to_readonly)
> +               return;
> +
>         set_section_perms(ro_perms, ARRAY_SIZE(ro_perms), false,
>                                 current->active_mm);
>  }
>
>  void set_kernel_text_ro(void)
>  {
> +       if (!kernel_set_to_readonly)
> +               return;
> +
>         set_section_perms(ro_perms, ARRAY_SIZE(ro_perms), true,
>                                 current->active_mm);
>  }

Does arm64 suffer from a similar condition? (It looks like no, as text
patching is done with a fixmap poke.)

-Kees
Steven Rostedt Aug. 23, 2017, 7:03 p.m. UTC | #3
On Wed, 23 Aug 2017 11:48:13 -0700
Kees Cook <keescook@chromium.org> wrote:

> > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > index ad80548..fd75f38 100644
> > --- a/arch/arm/mm/init.c
> > +++ b/arch/arm/mm/init.c
> > @@ -745,19 +745,29 @@ static int __mark_rodata_ro(void *unused)
> >         return 0;
> >  }
> >
> > +static int kernel_set_to_readonly;  
> 
> Adding a comment here might be a good idea, something like:
> 
> /* Has system boot-up reached mark_rodata_ro() yet? */

I don't mind adding a comment, but the above is rather self explanatory
(one can easily see that it is set in mark_rodata_ro() with a simple
search).

If a comment is to be added, something a bit more descriptive of the
functionality of the variable would be appropriate:

/*
 * Ignore modifying kernel text permissions until the kernel core calls
 * make_rodata_ro() at system start up.
 */

I can resend with the comment, or whoever takes this could add it
themselves.

-- Steve


> 
> Otherwise:
> 
> Acked-by: Kees Cook <keescook@chromium.org>
> 
> > +
> >  void mark_rodata_ro(void)
> >  {
> > +       kernel_set_to_readonly = 1;
> > +
> >         stop_machine(__mark_rodata_ro, NULL, NULL);
> >  }
> >
> >  void set_kernel_text_rw(void)
> >  {
> > +       if (!kernel_set_to_readonly)
> > +               return;
> > +
> >         set_section_perms(ro_perms, ARRAY_SIZE(ro_perms), false,
> >                                 current->active_mm);
> >  }
> >
> >  void set_kernel_text_ro(void)
> >  {
> > +       if (!kernel_set_to_readonly)
> > +               return;
> > +
> >         set_section_perms(ro_perms, ARRAY_SIZE(ro_perms), true,
> >                                 current->active_mm);
> >  }  
> 
> Does arm64 suffer from a similar condition? (It looks like no, as text
> patching is done with a fixmap poke.)
> 
> -Kees
>
Matthias Reichl Dec. 5, 2017, 11:47 a.m. UTC | #4
On Wed, Aug 23, 2017 at 03:03:51PM -0400, Steven Rostedt wrote:
> On Wed, 23 Aug 2017 11:48:13 -0700
> Kees Cook <keescook@chromium.org> wrote:
> 
> > > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > > index ad80548..fd75f38 100644
> > > --- a/arch/arm/mm/init.c
> > > +++ b/arch/arm/mm/init.c
> > > @@ -745,19 +745,29 @@ static int __mark_rodata_ro(void *unused)
> > >         return 0;
> > >  }
> > >
> > > +static int kernel_set_to_readonly;  
> > 
> > Adding a comment here might be a good idea, something like:
> > 
> > /* Has system boot-up reached mark_rodata_ro() yet? */
> 
> I don't mind adding a comment, but the above is rather self explanatory
> (one can easily see that it is set in mark_rodata_ro() with a simple
> search).
> 
> If a comment is to be added, something a bit more descriptive of the
> functionality of the variable would be appropriate:
> 
> /*
>  * Ignore modifying kernel text permissions until the kernel core calls
>  * make_rodata_ro() at system start up.
>  */
> 
> I can resend with the comment, or whoever takes this could add it
> themselves.

Gentle ping: this patch doesn't seem to have landed in upstream
trees yet. Is any more work required?

It would be nice to have this fix added. Just tested next-20171205
on RPi B+, it oopses when the function tracer is enabled during boot.
next-20171205 plus this patch boots up fine.

so long,

Hias

> 
> -- Steve
> 
> 
> > 
> > Otherwise:
> > 
> > Acked-by: Kees Cook <keescook@chromium.org>
> > 
> > > +
> > >  void mark_rodata_ro(void)
> > >  {
> > > +       kernel_set_to_readonly = 1;
> > > +
> > >         stop_machine(__mark_rodata_ro, NULL, NULL);
> > >  }
> > >
> > >  void set_kernel_text_rw(void)
> > >  {
> > > +       if (!kernel_set_to_readonly)
> > > +               return;
> > > +
> > >         set_section_perms(ro_perms, ARRAY_SIZE(ro_perms), false,
> > >                                 current->active_mm);
> > >  }
> > >
> > >  void set_kernel_text_ro(void)
> > >  {
> > > +       if (!kernel_set_to_readonly)
> > > +               return;
> > > +
> > >         set_section_perms(ro_perms, ARRAY_SIZE(ro_perms), true,
> > >                                 current->active_mm);
> > >  }  
> > 
> > Does arm64 suffer from a similar condition? (It looks like no, as text
> > patching is done with a fixmap poke.)
> > 
> > -Kees
> > 
>
Russell King (Oracle) Dec. 5, 2017, 1:14 p.m. UTC | #5
On Tue, Dec 05, 2017 at 12:47:09PM +0100, Matthias Reichl wrote:
> On Wed, Aug 23, 2017 at 03:03:51PM -0400, Steven Rostedt wrote:
> > On Wed, 23 Aug 2017 11:48:13 -0700
> > Kees Cook <keescook@chromium.org> wrote:
> > 
> > > > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > > > index ad80548..fd75f38 100644
> > > > --- a/arch/arm/mm/init.c
> > > > +++ b/arch/arm/mm/init.c
> > > > @@ -745,19 +745,29 @@ static int __mark_rodata_ro(void *unused)
> > > >         return 0;
> > > >  }
> > > >
> > > > +static int kernel_set_to_readonly;  
> > > 
> > > Adding a comment here might be a good idea, something like:
> > > 
> > > /* Has system boot-up reached mark_rodata_ro() yet? */
> > 
> > I don't mind adding a comment, but the above is rather self explanatory
> > (one can easily see that it is set in mark_rodata_ro() with a simple
> > search).
> > 
> > If a comment is to be added, something a bit more descriptive of the
> > functionality of the variable would be appropriate:
> > 
> > /*
> >  * Ignore modifying kernel text permissions until the kernel core calls
> >  * make_rodata_ro() at system start up.
> >  */
> > 
> > I can resend with the comment, or whoever takes this could add it
> > themselves.
> 
> Gentle ping: this patch doesn't seem to have landed in upstream
> trees yet. Is any more work required?
> 
> It would be nice to have this fix added. Just tested next-20171205
> on RPi B+, it oopses when the function tracer is enabled during boot.
> next-20171205 plus this patch boots up fine.

When does it oops?

Reading through this code, I'm left wondering why we switch the rodata
section to be writable here - if we're poking at kernel text, then
surely we shouldn't be the read-only data read-write?

Should kernel_set_to_readonly also be a rodata-after-init variable?
Matthias Reichl Dec. 5, 2017, 1:23 p.m. UTC | #6
On Tue, Dec 05, 2017 at 01:14:17PM +0000, Russell King - ARM Linux wrote:
> On Tue, Dec 05, 2017 at 12:47:09PM +0100, Matthias Reichl wrote:
> > On Wed, Aug 23, 2017 at 03:03:51PM -0400, Steven Rostedt wrote:
> > > On Wed, 23 Aug 2017 11:48:13 -0700
> > > Kees Cook <keescook@chromium.org> wrote:
> > > 
> > > > > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > > > > index ad80548..fd75f38 100644
> > > > > --- a/arch/arm/mm/init.c
> > > > > +++ b/arch/arm/mm/init.c
> > > > > @@ -745,19 +745,29 @@ static int __mark_rodata_ro(void *unused)
> > > > >         return 0;
> > > > >  }
> > > > >
> > > > > +static int kernel_set_to_readonly;  
> > > > 
> > > > Adding a comment here might be a good idea, something like:
> > > > 
> > > > /* Has system boot-up reached mark_rodata_ro() yet? */
> > > 
> > > I don't mind adding a comment, but the above is rather self explanatory
> > > (one can easily see that it is set in mark_rodata_ro() with a simple
> > > search).
> > > 
> > > If a comment is to be added, something a bit more descriptive of the
> > > functionality of the variable would be appropriate:
> > > 
> > > /*
> > >  * Ignore modifying kernel text permissions until the kernel core calls
> > >  * make_rodata_ro() at system start up.
> > >  */
> > > 
> > > I can resend with the comment, or whoever takes this could add it
> > > themselves.
> > 
> > Gentle ping: this patch doesn't seem to have landed in upstream
> > trees yet. Is any more work required?
> > 
> > It would be nice to have this fix added. Just tested next-20171205
> > on RPi B+, it oopses when the function tracer is enabled during boot.
> > next-20171205 plus this patch boots up fine.
> 
> When does it oops?

Rather early in the boot process:

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.15.0-rc2-next-20171205 (hias@camel2) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #3 Tue Dec 5 12:35:08 CET 2017
[    0.000000] CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), cr=00c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
[    0.000000] OF: fdt: Machine model: Raspberry Pi Model B Plus Rev 1.2
[    0.000000] earlycon: pl11 at MMIO32 0x20201000 (options '')
[    0.000000] bootconsole [pl11] enabled
[    0.000000] Memory policy: Data cache writeback
[    0.000000] cma: Reserved 32 MiB at 0x13c00000
[    0.000000] CPU: All CPU(s) started in SVC mode.
[    0.000000] random: fast init done
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 89408
[    0.000000] Kernel command line: bcm2708_fb.fbwidth=1280 bcm2708_fb.fbheight=1024 bcm2708_fb.fbswap=1 dma.dmachans=0x7f35 bcm2708.boardrev=0x10 bcm2708.serial=0x59ce1e57 bcm2708.uart_clock=48000000 bcm2708.disk_led_gpio=47 bcm2708.disk_led_active_low=0 smsc95xx.macaddr=B8:27:EB:CE:1E:57 vc_mem.mem_base=0x1ec00000 vc_mem.mem_size=0x20000000  root=/dev/mmcblk0p2 rw rootwait elevator=noop earlycon=pl011,mmio32,0x20201000 console=/dev/ttyAMA0,115200 ftrace=function
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.000000] Memory: 311776K/360448K available (7168K kernel code, 561K rwdata, 2212K rodata, 1024K init, 683K bss, 15904K reserved, 32768K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
[    0.000000]     vmalloc : 0xd6800000 - 0xff800000   ( 656 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xd6000000   ( 352 MB)
[    0.000000]     modules : 0xbf000000 - 0xc0000000   (  16 MB)
[    0.000000]       .text : 0x(ptrval) - 0x(ptrval)   (8160 kB)
[    0.000000]       .init : 0x(ptrval) - 0x(ptrval)   (1024 kB)
[    0.000000]       .data : 0x(ptrval) - 0x(ptrval)   ( 562 kB)
[    0.000000]        .bss : 0x(ptrval) - 0x(ptrval)   ( 684 kB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] ftrace: allocating 25789 entries in 76 pages
[    0.000000] Starting tracer 'function'
[    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[    0.000052] sched_clock: 32 bits at 1000kHz, resolution 1000ns, wraps every 2147483647500ns
[    0.008889] clocksource: timer: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275 ns
[    0.018979] bcm2835: system timer (irq = 27)
[    0.028070] Console: colour dummy device 80x30
[    0.033154] Calibrating delay loop... 697.95 BogoMIPS (lpj=3489792)
[    0.090179] pid_max: default: 32768 minimum: 301
[    0.097851] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.104852] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.117353] CPU: Testing write buffer coherency: ok
[    0.127472] Setting up static identity map for 0x100000 - 0x100054
[    0.145654] devtmpfs: initialized
[    0.235561] VFP support v0.3: implementor 41 architecture 1 part 20 variant b rev 5
[    0.245751] Unable to handle kernel paging request at virtual address c09eb214
[    0.253373] pgd = 8aaa5336
[    0.256250] [c09eb214] *pgd=0080840e(bad)
[    0.260567] Internal error: Oops: 80d [#1] ARM
[    0.265073] Modules linked in:
[    0.268188] CPU: 0 PID: 1 Comm: swapper Not tainted 4.15.0-rc2-next-20171205 #3
[    0.275592] Hardware name: BCM2835
[    0.279046] task: 11ad8790 task.stack: 886fda4c
[    0.283670] PC is at ksysfs_init+0x64/0xb0
[    0.287840] LR is at internal_create_group+0x294/0x2c4
[    0.293049] pc : [<c0b0a33c>]    lr : [<c0292898>]    psr: 20000053
[    0.299400] sp : d3a3ded0  ip : c028e908  fp : d3a3dee4
[    0.304694] r10: 00000000  r9 : c0c8c500  r8 : c0c8c500
[    0.309991] r7 : 00000000  r6 : c0c04048  r5 : c0c8d19c  r4 : 00000000
[    0.316607] r3 : 00000024  r2 : c0abf19c  r1 : c09eb20c  r0 : d3a81d80
[    0.323224] Flags: nzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
[    0.330543] Control: 00c5387d  Table: 00004008  DAC: 00000051
[    0.336367] Process swapper (pid: 1, stack limit = 0xfa32e9e1)
[    0.342280] Stack: (0xd3a3ded0 to 0xd3a3e000)
[    0.346703] dec0:                                     00000001 c0b0a2d8 d3a3df5c d3a3dee8
[    0.354999] dee0: c01027b0 c0b0a2e4 c0a280c0 00000000 d3a3df5c d3a3df00 c0138578 c0b00650
[    0.363294] df00: d3a3df00 c0c0c328 00000000 c0a280d4 0000009f c0a280d4 00000001 00000001
[    0.371590] df20: 000000a0 c0a27454 d5fffd21 d5fffd28 c014f4c0 4710e3ed 00000001 000000a0
[    0.379886] df40: c0b5681c c0b7cb84 c0c8c500 c0c8c500 d3a3df94 d3a3df60 c0b00ef8 c01026f8
[    0.388180] df60: 00000001 00000001 00000000 c0b00644 00000000 c07463a4 00000000 00000000
[    0.396475] df80: 00000000 00000000 d3a3dfac d3a3df98 c07463bc c0b00df0 ffffffff 00000000
[    0.404769] dfa0: 00000000 d3a3dfb0 c01010e8 c07463b0 00000000 00000000 00000000 00000000
[    0.413061] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    0.421354] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    0.429689] [<c0b0a33c>] (ksysfs_init) from [<c01027b0>] (do_one_initcall+0xc4/0x184)
[    0.437652] [<c01027b0>] (do_one_initcall) from [<c0b00ef8>] (kernel_init_freeable+0x114/0x1d4)
[    0.446490] [<c0b00ef8>] (kernel_init_freeable) from [<c07463bc>] (kernel_init+0x18/0x11c)
[    0.454887] [<c07463bc>] (kernel_init) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[    0.462559] Exception stack(0xd3a3dfb0 to 0xd3a3dff8)
[    0.467683] dfa0:                                     00000000 00000000 00000000 00000000
[    0.475976] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    0.484266] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    0.490982] Code: e3a00000 e89da830 e59f1048 e5950000 (e5813008) 
[    0.497194] ---[ end trace 53d55c7b93eb8c51 ]---
[    0.502064] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.502064] 
[    0.511346] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.511346] 

so long,

Hias

> 
> Reading through this code, I'm left wondering why we switch the rodata
> section to be writable here - if we're poking at kernel text, then
> surely we shouldn't be the read-only data read-write?
> 
> Should kernel_set_to_readonly also be a rodata-after-init variable?
Phil Elwell Dec. 5, 2017, 1:30 p.m. UTC | #7
On 05/12/2017 13:23, Matthias Reichl wrote:
> On Tue, Dec 05, 2017 at 01:14:17PM +0000, Russell King - ARM Linux wrote:
>> On Tue, Dec 05, 2017 at 12:47:09PM +0100, Matthias Reichl wrote:
>>> On Wed, Aug 23, 2017 at 03:03:51PM -0400, Steven Rostedt wrote:
>>>> On Wed, 23 Aug 2017 11:48:13 -0700
>>>> Kees Cook <keescook@chromium.org> wrote:
>>>>
>>>>>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>>>>>> index ad80548..fd75f38 100644
>>>>>> --- a/arch/arm/mm/init.c
>>>>>> +++ b/arch/arm/mm/init.c
>>>>>> @@ -745,19 +745,29 @@ static int __mark_rodata_ro(void *unused)
>>>>>>         return 0;
>>>>>>  }
>>>>>>
>>>>>> +static int kernel_set_to_readonly;  
>>>>>
>>>>> Adding a comment here might be a good idea, something like:
>>>>>
>>>>> /* Has system boot-up reached mark_rodata_ro() yet? */
>>>>
>>>> I don't mind adding a comment, but the above is rather self explanatory
>>>> (one can easily see that it is set in mark_rodata_ro() with a simple
>>>> search).
>>>>
>>>> If a comment is to be added, something a bit more descriptive of the
>>>> functionality of the variable would be appropriate:
>>>>
>>>> /*
>>>>  * Ignore modifying kernel text permissions until the kernel core calls
>>>>  * make_rodata_ro() at system start up.
>>>>  */
>>>>
>>>> I can resend with the comment, or whoever takes this could add it
>>>> themselves.
>>>
>>> Gentle ping: this patch doesn't seem to have landed in upstream
>>> trees yet. Is any more work required?
>>>
>>> It would be nice to have this fix added. Just tested next-20171205
>>> on RPi B+, it oopses when the function tracer is enabled during boot.
>>> next-20171205 plus this patch boots up fine.
>>
>> When does it oops?
> 
> Rather early in the boot process:
> 
> [    0.000000] Booting Linux on physical CPU 0x0
> [    0.000000] Linux version 4.15.0-rc2-next-20171205 (hias@camel2) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #3 Tue Dec 5 12:35:08 CET 2017
> [    0.000000] CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), cr=00c5387d
> [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
> [    0.000000] OF: fdt: Machine model: Raspberry Pi Model B Plus Rev 1.2
> [    0.000000] earlycon: pl11 at MMIO32 0x20201000 (options '')
> [    0.000000] bootconsole [pl11] enabled
> [    0.000000] Memory policy: Data cache writeback
> [    0.000000] cma: Reserved 32 MiB at 0x13c00000
> [    0.000000] CPU: All CPU(s) started in SVC mode.
> [    0.000000] random: fast init done
> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 89408
> [    0.000000] Kernel command line: bcm2708_fb.fbwidth=1280 bcm2708_fb.fbheight=1024 bcm2708_fb.fbswap=1 dma.dmachans=0x7f35 bcm2708.boardrev=0x10 bcm2708.serial=0x59ce1e57 bcm2708.uart_clock=48000000 bcm2708.disk_led_gpio=47 bcm2708.disk_led_active_low=0 smsc95xx.macaddr=B8:27:EB:CE:1E:57 vc_mem.mem_base=0x1ec00000 vc_mem.mem_size=0x20000000  root=/dev/mmcblk0p2 rw rootwait elevator=noop earlycon=pl011,mmio32,0x20201000 console=/dev/ttyAMA0,115200 ftrace=function
> [    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
> [    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
> [    0.000000] Memory: 311776K/360448K available (7168K kernel code, 561K rwdata, 2212K rodata, 1024K init, 683K bss, 15904K reserved, 32768K cma-reserved)
> [    0.000000] Virtual kernel memory layout:
> [    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
> [    0.000000]     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
> [    0.000000]     vmalloc : 0xd6800000 - 0xff800000   ( 656 MB)
> [    0.000000]     lowmem  : 0xc0000000 - 0xd6000000   ( 352 MB)
> [    0.000000]     modules : 0xbf000000 - 0xc0000000   (  16 MB)
> [    0.000000]       .text : 0x(ptrval) - 0x(ptrval)   (8160 kB)
> [    0.000000]       .init : 0x(ptrval) - 0x(ptrval)   (1024 kB)
> [    0.000000]       .data : 0x(ptrval) - 0x(ptrval)   ( 562 kB)
> [    0.000000]        .bss : 0x(ptrval) - 0x(ptrval)   ( 684 kB)
> [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> [    0.000000] ftrace: allocating 25789 entries in 76 pages
> [    0.000000] Starting tracer 'function'
> [    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
> [    0.000052] sched_clock: 32 bits at 1000kHz, resolution 1000ns, wraps every 2147483647500ns
> [    0.008889] clocksource: timer: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275 ns
> [    0.018979] bcm2835: system timer (irq = 27)
> [    0.028070] Console: colour dummy device 80x30
> [    0.033154] Calibrating delay loop... 697.95 BogoMIPS (lpj=3489792)
> [    0.090179] pid_max: default: 32768 minimum: 301
> [    0.097851] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
> [    0.104852] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
> [    0.117353] CPU: Testing write buffer coherency: ok
> [    0.127472] Setting up static identity map for 0x100000 - 0x100054
> [    0.145654] devtmpfs: initialized
> [    0.235561] VFP support v0.3: implementor 41 architecture 1 part 20 variant b rev 5
> [    0.245751] Unable to handle kernel paging request at virtual address c09eb214
> [    0.253373] pgd = 8aaa5336
> [    0.256250] [c09eb214] *pgd=0080840e(bad)
> [    0.260567] Internal error: Oops: 80d [#1] ARM
> [    0.265073] Modules linked in:
> [    0.268188] CPU: 0 PID: 1 Comm: swapper Not tainted 4.15.0-rc2-next-20171205 #3
> [    0.275592] Hardware name: BCM2835
> [    0.279046] task: 11ad8790 task.stack: 886fda4c
> [    0.283670] PC is at ksysfs_init+0x64/0xb0
> [    0.287840] LR is at internal_create_group+0x294/0x2c4
> [    0.293049] pc : [<c0b0a33c>]    lr : [<c0292898>]    psr: 20000053
> [    0.299400] sp : d3a3ded0  ip : c028e908  fp : d3a3dee4
> [    0.304694] r10: 00000000  r9 : c0c8c500  r8 : c0c8c500
> [    0.309991] r7 : 00000000  r6 : c0c04048  r5 : c0c8d19c  r4 : 00000000
> [    0.316607] r3 : 00000024  r2 : c0abf19c  r1 : c09eb20c  r0 : d3a81d80
> [    0.323224] Flags: nzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
> [    0.330543] Control: 00c5387d  Table: 00004008  DAC: 00000051
> [    0.336367] Process swapper (pid: 1, stack limit = 0xfa32e9e1)
> [    0.342280] Stack: (0xd3a3ded0 to 0xd3a3e000)
> [    0.346703] dec0:                                     00000001 c0b0a2d8 d3a3df5c d3a3dee8
> [    0.354999] dee0: c01027b0 c0b0a2e4 c0a280c0 00000000 d3a3df5c d3a3df00 c0138578 c0b00650
> [    0.363294] df00: d3a3df00 c0c0c328 00000000 c0a280d4 0000009f c0a280d4 00000001 00000001
> [    0.371590] df20: 000000a0 c0a27454 d5fffd21 d5fffd28 c014f4c0 4710e3ed 00000001 000000a0
> [    0.379886] df40: c0b5681c c0b7cb84 c0c8c500 c0c8c500 d3a3df94 d3a3df60 c0b00ef8 c01026f8
> [    0.388180] df60: 00000001 00000001 00000000 c0b00644 00000000 c07463a4 00000000 00000000
> [    0.396475] df80: 00000000 00000000 d3a3dfac d3a3df98 c07463bc c0b00df0 ffffffff 00000000
> [    0.404769] dfa0: 00000000 d3a3dfb0 c01010e8 c07463b0 00000000 00000000 00000000 00000000
> [    0.413061] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [    0.421354] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
> [    0.429689] [<c0b0a33c>] (ksysfs_init) from [<c01027b0>] (do_one_initcall+0xc4/0x184)
> [    0.437652] [<c01027b0>] (do_one_initcall) from [<c0b00ef8>] (kernel_init_freeable+0x114/0x1d4)
> [    0.446490] [<c0b00ef8>] (kernel_init_freeable) from [<c07463bc>] (kernel_init+0x18/0x11c)
> [    0.454887] [<c07463bc>] (kernel_init) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
> [    0.462559] Exception stack(0xd3a3dfb0 to 0xd3a3dff8)
> [    0.467683] dfa0:                                     00000000 00000000 00000000 00000000
> [    0.475976] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [    0.484266] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [    0.490982] Code: e3a00000 e89da830 e59f1048 e5950000 (e5813008) 
> [    0.497194] ---[ end trace 53d55c7b93eb8c51 ]---
> [    0.502064] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [    0.502064] 
> [    0.511346] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [    0.511346] 
> 
> so long,
> 
> Hias
> 
>>
>> Reading through this code, I'm left wondering why we switch the rodata
>> section to be writable here - if we're poking at kernel text, then
>> surely we shouldn't be the read-only data read-write?
>>
>> Should kernel_set_to_readonly also be a rodata-after-init variable?

This was my initial explanation:

1. Data which is marked __ro_after_init is initially writeable.

2. The ro_perms data covers kernel text, read-only data and __ro_after_init data.

3. set_kernel_text_rw marks everything in ro_perms as writeable.

4. set_kernel_text_ro marks everything in ro_perms as read-only, including the __ro_after_init data.

5. Using the function tracing code involves code modification, resulting in calls to
   __ftrace_modify_code and set_kernel_text_ro.

6. Therefore if function tracing is enabled before kernel_init has completed then the __ro_after_init
   data is made read-only prematurely.

Phil
Russell King (Oracle) Dec. 5, 2017, 1:36 p.m. UTC | #8
On Tue, Dec 05, 2017 at 01:30:11PM +0000, Phil Elwell wrote:
> On 05/12/2017 13:23, Matthias Reichl wrote:
> > On Tue, Dec 05, 2017 at 01:14:17PM +0000, Russell King - ARM Linux wrote:
> >> On Tue, Dec 05, 2017 at 12:47:09PM +0100, Matthias Reichl wrote:
> >>> On Wed, Aug 23, 2017 at 03:03:51PM -0400, Steven Rostedt wrote:
> >>>> On Wed, 23 Aug 2017 11:48:13 -0700
> >>>> Kees Cook <keescook@chromium.org> wrote:
> >>>>
> >>>>>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> >>>>>> index ad80548..fd75f38 100644
> >>>>>> --- a/arch/arm/mm/init.c
> >>>>>> +++ b/arch/arm/mm/init.c
> >>>>>> @@ -745,19 +745,29 @@ static int __mark_rodata_ro(void *unused)
> >>>>>>         return 0;
> >>>>>>  }
> >>>>>>
> >>>>>> +static int kernel_set_to_readonly;  
> >>>>>
> >>>>> Adding a comment here might be a good idea, something like:
> >>>>>
> >>>>> /* Has system boot-up reached mark_rodata_ro() yet? */
> >>>>
> >>>> I don't mind adding a comment, but the above is rather self explanatory
> >>>> (one can easily see that it is set in mark_rodata_ro() with a simple
> >>>> search).
> >>>>
> >>>> If a comment is to be added, something a bit more descriptive of the
> >>>> functionality of the variable would be appropriate:
> >>>>
> >>>> /*
> >>>>  * Ignore modifying kernel text permissions until the kernel core calls
> >>>>  * make_rodata_ro() at system start up.
> >>>>  */
> >>>>
> >>>> I can resend with the comment, or whoever takes this could add it
> >>>> themselves.
> >>>
> >>> Gentle ping: this patch doesn't seem to have landed in upstream
> >>> trees yet. Is any more work required?
> >>>
> >>> It would be nice to have this fix added. Just tested next-20171205
> >>> on RPi B+, it oopses when the function tracer is enabled during boot.
> >>> next-20171205 plus this patch boots up fine.
> >>
> >> When does it oops?
> > 
> > Rather early in the boot process:
> > 
> > [    0.000000] Booting Linux on physical CPU 0x0
> > [    0.000000] Linux version 4.15.0-rc2-next-20171205 (hias@camel2) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #3 Tue Dec 5 12:35:08 CET 2017
> > [    0.000000] CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), cr=00c5387d
> > [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
> > [    0.000000] OF: fdt: Machine model: Raspberry Pi Model B Plus Rev 1.2
> > [    0.000000] earlycon: pl11 at MMIO32 0x20201000 (options '')
> > [    0.000000] bootconsole [pl11] enabled
> > [    0.000000] Memory policy: Data cache writeback
> > [    0.000000] cma: Reserved 32 MiB at 0x13c00000
> > [    0.000000] CPU: All CPU(s) started in SVC mode.
> > [    0.000000] random: fast init done
> > [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 89408
> > [    0.000000] Kernel command line: bcm2708_fb.fbwidth=1280 bcm2708_fb.fbheight=1024 bcm2708_fb.fbswap=1 dma.dmachans=0x7f35 bcm2708.boardrev=0x10 bcm2708.serial=0x59ce1e57 bcm2708.uart_clock=48000000 bcm2708.disk_led_gpio=47 bcm2708.disk_led_active_low=0 smsc95xx.macaddr=B8:27:EB:CE:1E:57 vc_mem.mem_base=0x1ec00000 vc_mem.mem_size=0x20000000  root=/dev/mmcblk0p2 rw rootwait elevator=noop earlycon=pl011,mmio32,0x20201000 console=/dev/ttyAMA0,115200 ftrace=function
> > [    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
> > [    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
> > [    0.000000] Memory: 311776K/360448K available (7168K kernel code, 561K rwdata, 2212K rodata, 1024K init, 683K bss, 15904K reserved, 32768K cma-reserved)
> > [    0.000000] Virtual kernel memory layout:
> > [    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
> > [    0.000000]     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
> > [    0.000000]     vmalloc : 0xd6800000 - 0xff800000   ( 656 MB)
> > [    0.000000]     lowmem  : 0xc0000000 - 0xd6000000   ( 352 MB)
> > [    0.000000]     modules : 0xbf000000 - 0xc0000000   (  16 MB)
> > [    0.000000]       .text : 0x(ptrval) - 0x(ptrval)   (8160 kB)
> > [    0.000000]       .init : 0x(ptrval) - 0x(ptrval)   (1024 kB)
> > [    0.000000]       .data : 0x(ptrval) - 0x(ptrval)   ( 562 kB)
> > [    0.000000]        .bss : 0x(ptrval) - 0x(ptrval)   ( 684 kB)
> > [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> > [    0.000000] ftrace: allocating 25789 entries in 76 pages
> > [    0.000000] Starting tracer 'function'
> > [    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
> > [    0.000052] sched_clock: 32 bits at 1000kHz, resolution 1000ns, wraps every 2147483647500ns
> > [    0.008889] clocksource: timer: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275 ns
> > [    0.018979] bcm2835: system timer (irq = 27)
> > [    0.028070] Console: colour dummy device 80x30
> > [    0.033154] Calibrating delay loop... 697.95 BogoMIPS (lpj=3489792)
> > [    0.090179] pid_max: default: 32768 minimum: 301
> > [    0.097851] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
> > [    0.104852] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
> > [    0.117353] CPU: Testing write buffer coherency: ok
> > [    0.127472] Setting up static identity map for 0x100000 - 0x100054
> > [    0.145654] devtmpfs: initialized
> > [    0.235561] VFP support v0.3: implementor 41 architecture 1 part 20 variant b rev 5
> > [    0.245751] Unable to handle kernel paging request at virtual address c09eb214
> > [    0.253373] pgd = 8aaa5336
> > [    0.256250] [c09eb214] *pgd=0080840e(bad)
> > [    0.260567] Internal error: Oops: 80d [#1] ARM
> > [    0.265073] Modules linked in:
> > [    0.268188] CPU: 0 PID: 1 Comm: swapper Not tainted 4.15.0-rc2-next-20171205 #3
> > [    0.275592] Hardware name: BCM2835
> > [    0.279046] task: 11ad8790 task.stack: 886fda4c
> > [    0.283670] PC is at ksysfs_init+0x64/0xb0
> > [    0.287840] LR is at internal_create_group+0x294/0x2c4
> > [    0.293049] pc : [<c0b0a33c>]    lr : [<c0292898>]    psr: 20000053
> > [    0.299400] sp : d3a3ded0  ip : c028e908  fp : d3a3dee4
> > [    0.304694] r10: 00000000  r9 : c0c8c500  r8 : c0c8c500
> > [    0.309991] r7 : 00000000  r6 : c0c04048  r5 : c0c8d19c  r4 : 00000000
> > [    0.316607] r3 : 00000024  r2 : c0abf19c  r1 : c09eb20c  r0 : d3a81d80
> > [    0.323224] Flags: nzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
> > [    0.330543] Control: 00c5387d  Table: 00004008  DAC: 00000051
> > [    0.336367] Process swapper (pid: 1, stack limit = 0xfa32e9e1)
> > [    0.342280] Stack: (0xd3a3ded0 to 0xd3a3e000)
> > [    0.346703] dec0:                                     00000001 c0b0a2d8 d3a3df5c d3a3dee8
> > [    0.354999] dee0: c01027b0 c0b0a2e4 c0a280c0 00000000 d3a3df5c d3a3df00 c0138578 c0b00650
> > [    0.363294] df00: d3a3df00 c0c0c328 00000000 c0a280d4 0000009f c0a280d4 00000001 00000001
> > [    0.371590] df20: 000000a0 c0a27454 d5fffd21 d5fffd28 c014f4c0 4710e3ed 00000001 000000a0
> > [    0.379886] df40: c0b5681c c0b7cb84 c0c8c500 c0c8c500 d3a3df94 d3a3df60 c0b00ef8 c01026f8
> > [    0.388180] df60: 00000001 00000001 00000000 c0b00644 00000000 c07463a4 00000000 00000000
> > [    0.396475] df80: 00000000 00000000 d3a3dfac d3a3df98 c07463bc c0b00df0 ffffffff 00000000
> > [    0.404769] dfa0: 00000000 d3a3dfb0 c01010e8 c07463b0 00000000 00000000 00000000 00000000
> > [    0.413061] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > [    0.421354] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
> > [    0.429689] [<c0b0a33c>] (ksysfs_init) from [<c01027b0>] (do_one_initcall+0xc4/0x184)
> > [    0.437652] [<c01027b0>] (do_one_initcall) from [<c0b00ef8>] (kernel_init_freeable+0x114/0x1d4)
> > [    0.446490] [<c0b00ef8>] (kernel_init_freeable) from [<c07463bc>] (kernel_init+0x18/0x11c)
> > [    0.454887] [<c07463bc>] (kernel_init) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
> > [    0.462559] Exception stack(0xd3a3dfb0 to 0xd3a3dff8)
> > [    0.467683] dfa0:                                     00000000 00000000 00000000 00000000
> > [    0.475976] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > [    0.484266] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [    0.490982] Code: e3a00000 e89da830 e59f1048 e5950000 (e5813008) 
> > [    0.497194] ---[ end trace 53d55c7b93eb8c51 ]---
> > [    0.502064] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > [    0.502064] 
> > [    0.511346] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > [    0.511346] 
> > 
> > so long,
> > 
> > Hias
> > 
> >>
> >> Reading through this code, I'm left wondering why we switch the rodata
> >> section to be writable here - if we're poking at kernel text, then
> >> surely we shouldn't be the read-only data read-write?
> >>
> >> Should kernel_set_to_readonly also be a rodata-after-init variable?
> 
> This was my initial explanation:
> 
> 1. Data which is marked __ro_after_init is initially writeable.
> 
> 2. The ro_perms data covers kernel text, read-only data and __ro_after_init data.
> 
> 3. set_kernel_text_rw marks everything in ro_perms as writeable.
> 
> 4. set_kernel_text_ro marks everything in ro_perms as read-only, including the __ro_after_init data.
> 
> 5. Using the function tracing code involves code modification, resulting in calls to
>    __ftrace_modify_code and set_kernel_text_ro.
> 
> 6. Therefore if function tracing is enabled before kernel_init has completed then the __ro_after_init
>    data is made read-only prematurely.

My question still stands, but let me rephrase.  Do we need
set_kernel_text_*() to touch the read-only data?
Kees Cook Dec. 5, 2017, 7:35 p.m. UTC | #9
On Tue, Dec 5, 2017 at 5:36 AM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Tue, Dec 05, 2017 at 01:30:11PM +0000, Phil Elwell wrote:
>> This was my initial explanation:
>>
>> 1. Data which is marked __ro_after_init is initially writeable.
>>
>> 2. The ro_perms data covers kernel text, read-only data and __ro_after_init data.
>>
>> 3. set_kernel_text_rw marks everything in ro_perms as writeable.
>>
>> 4. set_kernel_text_ro marks everything in ro_perms as read-only, including the __ro_after_init data.
>>
>> 5. Using the function tracing code involves code modification, resulting in calls to
>>    __ftrace_modify_code and set_kernel_text_ro.
>>
>> 6. Therefore if function tracing is enabled before kernel_init has completed then the __ro_after_init
>>    data is made read-only prematurely.
>
> My question still stands, but let me rephrase.  Do we need
> set_kernel_text_*() to touch the read-only data?

We don't _need_ to, but they're all contiguous, so the ro_perms array
used by set_kernel_text_*() is actually only a single entry:

static struct section_perm ro_perms[] = {
        /* Make kernel code and rodata RX (set RO). */
        {
                .name   = "text/rodata RO",
                .start  = (unsigned long)_stext,
                .end    = (unsigned long)__init_begin,
...


-Kees
Russell King (Oracle) Dec. 5, 2017, 8:09 p.m. UTC | #10
On Tue, Dec 05, 2017 at 11:35:59AM -0800, Kees Cook wrote:
> We don't _need_ to, but they're all contiguous, so the ro_perms array
> used by set_kernel_text_*() is actually only a single entry:
> 
> static struct section_perm ro_perms[] = {
>         /* Make kernel code and rodata RX (set RO). */
>         {
>                 .name   = "text/rodata RO",
>                 .start  = (unsigned long)_stext,
>                 .end    = (unsigned long)__init_begin,
> ...

Well, they may not be contiguous - it depends on DEBUG_ALIGN_RODATA.

Either way, we have __start_rodata_section_aligned, which is either
the start of the read-only data section, or the start of the first
section beyond __start_rodata if DEBUG_ALIGN_RODATA is not set.

Given that __start_rodata_section_aligned will always be less than
__init_begin, is there any reason not to make the above end at
__start_rodata_section_aligned, thereby allowing more of the read-only
data (in the case of DEBUG_ALIGN_RODATA=n) or all of the read-only
data (in the case of DEBUG_ALIGN_RODATA=y) to remain write-protected?
Kees Cook Dec. 5, 2017, 8:14 p.m. UTC | #11
On Tue, Dec 5, 2017 at 12:09 PM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Tue, Dec 05, 2017 at 11:35:59AM -0800, Kees Cook wrote:
>> We don't _need_ to, but they're all contiguous, so the ro_perms array
>> used by set_kernel_text_*() is actually only a single entry:
>>
>> static struct section_perm ro_perms[] = {
>>         /* Make kernel code and rodata RX (set RO). */
>>         {
>>                 .name   = "text/rodata RO",
>>                 .start  = (unsigned long)_stext,
>>                 .end    = (unsigned long)__init_begin,
>> ...
>
> Well, they may not be contiguous - it depends on DEBUG_ALIGN_RODATA.

Maybe I'm picking a slightly wrong word. I guess I meant adjacent. The
range _stext to __init_begin is all read-only, though there may be
padding (controlled by DEBUG_ALIGN_RODATA), to allow a split for NX
markings on rodata.

> Either way, we have __start_rodata_section_aligned, which is either
> the start of the read-only data section, or the start of the first
> section beyond __start_rodata if DEBUG_ALIGN_RODATA is not set.
>
> Given that __start_rodata_section_aligned will always be less than
> __init_begin, is there any reason not to make the above end at
> __start_rodata_section_aligned, thereby allowing more of the read-only
> data (in the case of DEBUG_ALIGN_RODATA=n) or all of the read-only
> data (in the case of DEBUG_ALIGN_RODATA=y) to remain write-protected?

Sure, there's no reason not to split this into two entries. It'll
require some reworking of the function calls to get it right,
obviously.

-Kees
Matthias Reichl June 29, 2018, 2:47 p.m. UTC | #12
On Tue, Dec 05, 2017 at 12:14:46PM -0800, Kees Cook wrote:
> On Tue, Dec 5, 2017 at 12:09 PM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> > On Tue, Dec 05, 2017 at 11:35:59AM -0800, Kees Cook wrote:
> >> We don't _need_ to, but they're all contiguous, so the ro_perms array
> >> used by set_kernel_text_*() is actually only a single entry:
> >>
> >> static struct section_perm ro_perms[] = {
> >>         /* Make kernel code and rodata RX (set RO). */
> >>         {
> >>                 .name   = "text/rodata RO",
> >>                 .start  = (unsigned long)_stext,
> >>                 .end    = (unsigned long)__init_begin,
> >> ...
> >
> > Well, they may not be contiguous - it depends on DEBUG_ALIGN_RODATA.
> 
> Maybe I'm picking a slightly wrong word. I guess I meant adjacent. The
> range _stext to __init_begin is all read-only, though there may be
> padding (controlled by DEBUG_ALIGN_RODATA), to allow a split for NX
> markings on rodata.
> 
> > Either way, we have __start_rodata_section_aligned, which is either
> > the start of the read-only data section, or the start of the first
> > section beyond __start_rodata if DEBUG_ALIGN_RODATA is not set.
> >
> > Given that __start_rodata_section_aligned will always be less than
> > __init_begin, is there any reason not to make the above end at
> > __start_rodata_section_aligned, thereby allowing more of the read-only
> > data (in the case of DEBUG_ALIGN_RODATA=n) or all of the read-only
> > data (in the case of DEBUG_ALIGN_RODATA=y) to remain write-protected?
> 
> Sure, there's no reason not to split this into two entries. It'll
> require some reworking of the function calls to get it right,
> obviously.

Gentle ping, arm is still oopsing when the function tracer is
enabled at boot time.

Tested on bcm2835 (RPiB+) with current mainline tree
(githash 90368a37fbbe) and bcm2835_defconfig.

arm64 seems to be fine, tested on bcm2837 (RPi3) with same tree and
arm64 defconfig plus function tracer enabled.

earlycon log on bvm2835:

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.18.0-rc2+ (hias@camel2) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1 Fri Jun 29 10:55:21 CEST 2018
[    0.000000] CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), cr=00c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
[    0.000000] OF: fdt: Machine model: Raspberry Pi Model B Plus Rev 1.2
[    0.000000] earlycon: pl11 at MMIO32 0x20201000 (options '')
[    0.000000] bootconsole [pl11] enabled
[    0.000000] Memory policy: Data cache writeback
[    0.000000] cma: Reserved 32 MiB at 0x13c00000
[    0.000000] CPU: All CPU(s) started in SVC mode.
[    0.000000] random: get_random_bytes called from start_kernel+0x88/0x464 with crng_init=0
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 89408
[    0.000000] Kernel command line: bcm2708_fb.fbwidth=1280 bcm2708_fb.fbheight=1024 bcm2708_fb.fbswap=1 dma.dmachans=0x7f35 bcm2708.boardrev=0x10 bcm2708.serial=0x59ce1e57 bcm2708.uart_clock=48000000 bcm2708.disk_led_gpio=47 bcm2708.disk_led_active_low=0 smsc95xx.macaddr=B8:27:EB:CE:1E:57 vc_mem.mem_base=0x1ec00000 vc_mem.mem_size=0x20000000  root=/dev/mmcblk0p2 rw rootwait elevator=noop earlycon=pl011,mmio32,0x20201000 console=/dev/ttyAMA0,115200 ftrace=function
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.000000] Memory: 311736K/360448K available (7168K kernel code, 610K rwdata, 2328K rodata, 1024K init, 675K bss, 15944K reserved, 32768K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
[    0.000000]     vmalloc : 0xd6800000 - 0xff800000   ( 656 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xd6000000   ( 352 MB)
[    0.000000]     modules : 0xbf000000 - 0xc0000000   (  16 MB)
[    0.000000]       .text : 0x(ptrval) - 0x(ptrval)   (8160 kB)
[    0.000000]       .init : 0x(ptrval) - 0x(ptrval)   (1024 kB)
[    0.000000]       .data : 0x(ptrval) - 0x(ptrval)   ( 611 kB)
[    0.000000]        .bss : 0x(ptrval) - 0x(ptrval)   ( 676 kB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] ftrace: allocating 27246 entries in 80 pages
[    0.000000] Starting tracer 'function'
[    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[    0.000054] sched_clock: 32 bits at 1000kHz, resolution 1000ns, wraps every 2147483647500ns
[    0.008887] clocksource: timer: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275 ns
[    0.018977] bcm2835: system timer (irq = 27)
[    0.028348] Console: colour dummy device 80x30
[    0.033444] Calibrating delay loop... 695.09 BogoMIPS (lpj=3475456)
[    0.100506] pid_max: default: 32768 minimum: 301
[    0.107059] Unable to handle kernel paging request at virtual address c0a07250
[    0.114665] pgd = (ptrval)
[    0.117541] [c0a07250] *pgd=00a0840e(bad)
[    0.121858] Internal error: Oops: 80d [#1] ARM
[    0.126369] Modules linked in:
[    0.129491] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18.0-rc2+ #1
[    0.135755] Hardware name: BCM2835
[    0.139240] PC is at uts_ns_init+0x40/0x58
[    0.143395] LR is at   (null)
[    0.146406] pc : [<c0b0fde4>]    lr : [<00000000>]    psr: 60000053
[    0.152757] sp : c0c01f88  ip : c0c07120  fp : c0c01fa4
[    0.158054] r10: d5fff9c0  r9 : c0b59a28  r8 : c0c04040
[    0.163352] r7 : c0c04048  r6 : ffffffff  r5 : c0c98980  r4 : c0c989cc
[    0.169969] r3 : c0a07250  r2 : 00000000  r1 : c0209384  r0 : d3a1b300
[    0.176586] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
[    0.183906] Control: 00c5387d  Table: 00004008  DAC: 00000051
[    0.189730] Process swapper (pid: 0, stack limit = 0x(ptrval))
[    0.195645] Stack: (0xc0c01f88 to 0xc0c02000)
[    0.200072] 1f80:                   00000004 00000186 00000000 c0c98980 c0c01ff4 c0c01fa8
[    0.208369] 1fa0: c0b00ebc c0b0fdb0 ffffffff ffffffff 00000000 c0b00864 00000000 c0b59a28
[    0.216665] 1fc0: 1297841c 00000000 00000000 c0b00330 00000051 00c0387d 00000c42 15feca00
[    0.224961] 1fe0: 410fb767 00c5387d 00000000 c0c01ff8 00000000 c0b00b40 00000000 00000000
[    0.233301] [<c0b0fde4>] (uts_ns_init) from [<c0b00ebc>] (start_kernel+0x388/0x464)
[    0.241081] [<c0b00ebc>] (start_kernel) from [<00000000>] (  (null))
[    0.247541] Code: e3a03701 e59f0014 ebdbe4f1 e59f3010 (e5830000) 
[    0.253755] ---[ end trace fe6d5629dd3ec6e3 ]---
[    0.258446] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.265255] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
[    2.749496] random: fast init done

so long,

Hias
Steven Rostedt June 29, 2018, 3:16 p.m. UTC | #13
On Fri, 29 Jun 2018 16:47:14 +0200
Matthias Reichl <hias@horus.com> wrote:

> On Tue, Dec 05, 2017 at 12:14:46PM -0800, Kees Cook wrote:
> > On Tue, Dec 5, 2017 at 12:09 PM, Russell King - ARM Linux
> > <linux@armlinux.org.uk> wrote:  
> > > On Tue, Dec 05, 2017 at 11:35:59AM -0800, Kees Cook wrote:  
> > >> We don't _need_ to, but they're all contiguous, so the ro_perms array
> > >> used by set_kernel_text_*() is actually only a single entry:
> > >>
> > >> static struct section_perm ro_perms[] = {
> > >>         /* Make kernel code and rodata RX (set RO). */
> > >>         {
> > >>                 .name   = "text/rodata RO",
> > >>                 .start  = (unsigned long)_stext,
> > >>                 .end    = (unsigned long)__init_begin,
> > >> ...  
> > >
> > > Well, they may not be contiguous - it depends on DEBUG_ALIGN_RODATA.  
> > 
> > Maybe I'm picking a slightly wrong word. I guess I meant adjacent. The
> > range _stext to __init_begin is all read-only, though there may be
> > padding (controlled by DEBUG_ALIGN_RODATA), to allow a split for NX
> > markings on rodata.
> >   
> > > Either way, we have __start_rodata_section_aligned, which is either
> > > the start of the read-only data section, or the start of the first
> > > section beyond __start_rodata if DEBUG_ALIGN_RODATA is not set.
> > >
> > > Given that __start_rodata_section_aligned will always be less than
> > > __init_begin, is there any reason not to make the above end at
> > > __start_rodata_section_aligned, thereby allowing more of the read-only
> > > data (in the case of DEBUG_ALIGN_RODATA=n) or all of the read-only
> > > data (in the case of DEBUG_ALIGN_RODATA=y) to remain write-protected?  
> > 
> > Sure, there's no reason not to split this into two entries. It'll
> > require some reworking of the function calls to get it right,
> > obviously.  
> 
> Gentle ping, arm is still oopsing when the function tracer is
> enabled at boot time.
> 

I take it that my patch never got applied:

 http://lkml.kernel.org/r/20180621124710.453ee0ae@gandalf.local.home

-- Steve


> Tested on bcm2835 (RPiB+) with current mainline tree
> (githash 90368a37fbbe) and bcm2835_defconfig.
> 
> arm64 seems to be fine, tested on bcm2837 (RPi3) with same tree and
> arm64 defconfig plus function tracer enabled.
Matthias Reichl June 29, 2018, 5:29 p.m. UTC | #14
On Fri, Jun 29, 2018 at 11:16:58AM -0400, Steven Rostedt wrote:
> On Fri, 29 Jun 2018 16:47:14 +0200
> Matthias Reichl <hias@horus.com> wrote:
> 
> > On Tue, Dec 05, 2017 at 12:14:46PM -0800, Kees Cook wrote:
> > > On Tue, Dec 5, 2017 at 12:09 PM, Russell King - ARM Linux
> > > <linux@armlinux.org.uk> wrote:  
> > > > On Tue, Dec 05, 2017 at 11:35:59AM -0800, Kees Cook wrote:  
> > > >> We don't _need_ to, but they're all contiguous, so the ro_perms array
> > > >> used by set_kernel_text_*() is actually only a single entry:
> > > >>
> > > >> static struct section_perm ro_perms[] = {
> > > >>         /* Make kernel code and rodata RX (set RO). */
> > > >>         {
> > > >>                 .name   = "text/rodata RO",
> > > >>                 .start  = (unsigned long)_stext,
> > > >>                 .end    = (unsigned long)__init_begin,
> > > >> ...  
> > > >
> > > > Well, they may not be contiguous - it depends on DEBUG_ALIGN_RODATA.  
> > > 
> > > Maybe I'm picking a slightly wrong word. I guess I meant adjacent. The
> > > range _stext to __init_begin is all read-only, though there may be
> > > padding (controlled by DEBUG_ALIGN_RODATA), to allow a split for NX
> > > markings on rodata.
> > >   
> > > > Either way, we have __start_rodata_section_aligned, which is either
> > > > the start of the read-only data section, or the start of the first
> > > > section beyond __start_rodata if DEBUG_ALIGN_RODATA is not set.
> > > >
> > > > Given that __start_rodata_section_aligned will always be less than
> > > > __init_begin, is there any reason not to make the above end at
> > > > __start_rodata_section_aligned, thereby allowing more of the read-only
> > > > data (in the case of DEBUG_ALIGN_RODATA=n) or all of the read-only
> > > > data (in the case of DEBUG_ALIGN_RODATA=y) to remain write-protected?  
> > > 
> > > Sure, there's no reason not to split this into two entries. It'll
> > > require some reworking of the function calls to get it right,
> > > obviously.  
> > 
> > Gentle ping, arm is still oopsing when the function tracer is
> > enabled at boot time.
> > 
> 
> I take it that my patch never got applied:
> 
>  http://lkml.kernel.org/r/20180621124710.453ee0ae@gandalf.local.home

Yes, sorry, forgot to include this info in my mail.

Your patch no longer applies cleanly - a8e53c151fe7a added a
debug_checkwx() to mark_rodata_ro() - but when applying it manually
it still fixes the oops.

so long,

Hias
> 
> -- Steve
> 
> 
> > Tested on bcm2835 (RPiB+) with current mainline tree
> > (githash 90368a37fbbe) and bcm2835_defconfig.
> > 
> > arm64 seems to be fine, tested on bcm2837 (RPi3) with same tree and
> > arm64 defconfig plus function tracer enabled.
>
diff mbox

Patch

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index ad80548..fd75f38 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -745,19 +745,29 @@  static int __mark_rodata_ro(void *unused)
 	return 0;
 }
 
+static int kernel_set_to_readonly;
+
 void mark_rodata_ro(void)
 {
+	kernel_set_to_readonly = 1;
+
 	stop_machine(__mark_rodata_ro, NULL, NULL);
 }
 
 void set_kernel_text_rw(void)
 {
+	if (!kernel_set_to_readonly)
+		return;
+
 	set_section_perms(ro_perms, ARRAY_SIZE(ro_perms), false,
 				current->active_mm);
 }
 
 void set_kernel_text_ro(void)
 {
+	if (!kernel_set_to_readonly)
+		return;
+
 	set_section_perms(ro_perms, ARRAY_SIZE(ro_perms), true,
 				current->active_mm);
 }