diff mbox series

m68k 54418 fails to execute user space

Message ID c252ab82-dc4a-4267-84c3-4a20471524c5@yoseli.org (mailing list archive)
State New
Headers show
Series m68k 54418 fails to execute user space | expand

Commit Message

Jean-Michel Hautbois June 19, 2024, 12:29 p.m. UTC
Hello !

I am (still :-)) trying to make an upstream kernel work on my mcf54418 
custom board. The kernel boots nicely now, but I have an issue when 
executing binaries.

I tried uclibc-ng and glibc, with similar results.

First, I need a patch to change the ELF_ET_DYN_BASE value [1].
Next, I switched from jffs2 to ubifs, but the behavior is the same, with 
one or the other (just that jffs2 needs an extra patch to use 
dirty_folio callback) [2].

Now, I wrote a very dumb and classical program:

$> cat hello.c
#include <stdio.h>

int main()
{
	printf("Hello World on coldfire\n");
}

And I linked it statically, and dynamically (later called hello-stat and 
hello-dyn).

-- I added another patch to help debug the issue:
  }

Now comes the (fun ?) part.

When I execute /bin/bash as my init, and I launch the hello-dyn, I get:
[    4.140000] Run /bin/bash as init process
[    4.150000]   with arguments:
[    4.150000]     /bin/bash
[    4.150000]   with environment:
[    4.160000]     HOME=/
[    4.160000]     TERM=linux
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
bash-5.2# [  166.510000] random: crng init done

bash-5.2# /bin/hello-dyn
Hello World on coldfir[  382.450000] page: refcount:2 mapcount:1 
mapping:4206dc84 index:0x94 pfn:0x20ac9
e
[  382.460000] aops:0x413b1470 ino:9f dentry name:"libc.so.6"
[  382.460000] flags: 0x38(uptodate|dirty|lru|zone=0)
[  382.470000] raw: 00000038 4fed43dc 4fed4544 4206dc84 00000094 
00000000 00000000 00000002
[  382.480000] raw: 41592000
[  382.480000] page dumped because: VM_BUG_ON_FOLIO(ret != false)
[  382.490000] kernel BUG at fs/ubifs/file.c:1477!
[  382.490000] *** TRAP #7 ***   FORMAT=4
[  382.490000] Current process id is 26
[  382.490000] BAD KERNEL TRAP: 00000000
[  382.490000] PC: [<410fba84>] ubifs_dirty_folio+0x3e/0x42
[  382.490000] SR: 2010  SP: 4286bda4  a2: 41f24510
[  382.490000] d0: 00000023    d1: 000000b9    d2: 4fed4444    d3: 6014c000
[  382.490000] d4: 41593419    d5: 60150000    a0: 4286bce8    a1: 414d5180
[  382.490000] Process hello-dyn (pid: 26, task=41f24510)
[  382.490000] Frame format=4 eff addr=41450e8d pc=414674a3
[  382.490000] Stack from 4286bde0:
[  382.490000]         000005c5 4145aeed 41467485 4fed4444 410586de 
4206dc84 4fed4444 41593419
[  382.490000]         4286bf1a 41074196 4fed4444 00000000 ffffffff 
fffffffe 4286be9e 60150000
[  382.490000]         00000001 41387c58 41073de0 4107797c 00000001 
4283f184 ffffffff 42884600
[  382.490000]         60150000 42884600 4283a3d0 4286a000 4283a3fc 
00000000 00000000 00000000
[  382.490000]         00000000 4286bf5c 41074456 4286bf1a 4283f184 
6014c000 60150000 4286be9e
[  382.490000]         4283a3fc 4286bef6 4286bf1a 4102a05a 4283a3d4 
4283f344 4283a3d0 42830000
[  382.490000] Call Trace: [<410586de>] folio_mark_dirty+0x44/0x58
[  382.490000]  [<41074196>] unmap_page_range+0x3b6/0x602
[  382.490000]  [<41387c58>] mas_find+0x0/0xfa
[  382.490000]  [<41073de0>] unmap_page_range+0x0/0x602
[  382.490000]  [<4107797c>] vma_next+0x0/0x14
[  382.490000]  [<41074456>] unmap_vmas+0x74/0x98
[  382.490000]  [<4102a05a>] up_read+0x0/0x36
[  382.490000]  [<41079ab2>] exit_mmap+0xd4/0x1c0
[  382.490000]  [<41009300>] arch_local_irq_enable+0x0/0xc
[  382.490000]  [<410092f4>] arch_local_irq_disable+0x0/0xc
[  382.490000]  [<41006b76>] __mmput+0x2e/0x86
[  382.490000]  [<4100a080>] do_exit+0x21e/0x6e8
[  382.490000]  [<4100a6c8>] sys_exit_group+0x0/0x14
[  382.490000]  [<4100a686>] do_group_exit+0x22/0x64
[  382.490000]  [<4100a6dc>] pid_child_should_wake+0x0/0x56
[  382.490000]  [<41005848>] system_call+0x54/0xa8
[  382.490000]
[  382.490000] Code: 74a3 4879 4145 0e8d 4eb9 4139 adb8 4e47 <241f> 4e75 
91c8 f4e8 5288 f4e8 5288 f4e8 5288 f4e8 41e8 000d b1fc 0000 07f0 63e6
[  382.490000] Disabling lock debugging due to kernel taint
[  382.490000] note: hello-dyn[26] exited with irqs disabled
[  382.500000] Fixing recursive fault but reboot is needed!
[  387.520000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
assert failed: folio->private != NULL, in fs/ubifs/file.c:1016
[  387.530000] UBIFS warning (ubi0:0 pid 24): ubifs_ro_mode: switched to 
read-only mode, error -22
[  387.540000] CPU: 0 PID: 24 Comm: kworker/u4:1 Tainted: G      D 
      6.10.0-rc4-00050-g21c7ef993c85 #491
[  387.540000] Workqueue: writeback wb_workfn (flush-ubifs_0_0)
[  387.540000] Stack from 41c7fc1c:
[  387.540000]         41c7fc1c 4145ed0d 4145ed0d 00000000 0012d0a0 
41e4d000 413a271a 4145ed0d
[  387.540000]         410fbf84 41e4d000 ffffffea 414674a3 000003f8 
4fed4444 41c7fd72 41c7fcea
[  387.540000]         41c7fce6 41c7fd72 00002230 41058b72 41051efc 
41c7fd72 41051efc 41c7fd72
[  387.540000]         41058e74 4fed4444 41c7fd72 00000000 00000064 
00000400 4281c200 4206dc84
[  387.540000]         00000000 410fb9dc 4206dc84 41c7fd72 410fbf00 
00000000 41059f34 4206dc84
[  387.540000]         41c7fd72 4206dbac 00000400 00000000 00000004 
41c7fd72 4206dbac 41c7fd72
[  387.540000] Call Trace: [<413a271a>] dump_stack+0xc/0x10
[  387.540000]  [<410fbf84>] ubifs_writepage+0x84/0x1e0
[  387.540000]  [<41058b72>] writeback_iter+0x0/0x2b0
[  387.540000]  [<41051efc>] folio_unlock+0x0/0x5c
[  387.540000]  [<41051efc>] folio_unlock+0x0/0x5c
[  387.540000]  [<41058e74>] write_cache_pages+0x52/0x6e
[  387.540000]  [<410fb9dc>] ubifs_writepages+0x14/0x1a
[  387.540000]  [<410fbf00>] ubifs_writepage+0x0/0x1e0
[  387.540000]  [<41059f34>] do_writepages+0x76/0x192
[  387.540000]  [<41024378>] vruntime_eligible+0x70/0x8c
[  387.540000]  [<410b7b06>] __writeback_single_inode+0x46/0x182
[  387.540000]  [<410b7e9a>] writeback_sb_inodes+0x2c/0x31a
[  387.540000]  [<410b7ffc>] writeback_sb_inodes+0x18e/0x31a
[  387.540000]  [<4109784a>] super_trylock_shared+0x0/0x40
[  387.540000]  [<410b7e6e>] writeback_sb_inodes+0x0/0x31a
[  387.540000]  [<4102a05a>] up_read+0x0/0x36
[  387.540000]  [<41020000>] devm_register_sys_off_handler+0x18/0x64
[  387.540000]  [<410b81f4>] __writeback_inodes_wb+0x6c/0x9e
[  387.540000]  [<41059ce8>] wb_over_bg_thresh+0x0/0x7c
[  387.540000]  [<410b8332>] queue_io+0x0/0xaa
[  387.540000]  [<410b8484>] wb_writeback.isra.0+0xa8/0x132
[  387.540000]  [<410b84ea>] wb_writeback.isra.0+0x10e/0x132
[  387.540000]  [<410b72d8>] arch_local_irq_enable+0x0/0xc
[  387.540000]  [<410b72cc>] arch_local_irq_disable+0x0/0xc
[  387.540000]  [<410b740a>] finish_writeback_work+0x0/0x44
[  387.540000]  [<410b83dc>] wb_writeback.isra.0+0x0/0x132
[  387.540000]  [<410b8ac4>] wb_workfn+0x188/0x286
[  387.540000]  [<41393fba>] sized_strscpy+0x0/0x13e
[  387.540000]  [<41016312>] list_del_init+0x0/0x18
[  387.540000]  [<41021ccc>] arch_local_irq_disable+0x0/0x12
[  387.540000]  [<41018382>] process_scheduled_works+0x172/0x220
[  387.540000]  [<410162e8>] arch_local_irq_disable+0x0/0xc
[  387.540000]  [<41017b8c>] assign_work+0x0/0x54
[  387.540000]  [<41018afa>] worker_thread+0x0/0x212
[  387.540000]  [<41018cd2>] worker_thread+0x1d8/0x212
[  387.540000]  [<4101d3f4>] kthread_exit+0x0/0x14
[  387.540000]  [<41018afa>] worker_thread+0x0/0x212
[  387.540000]  [<4101d6ac>] kthread+0x9a/0xa4
[  387.540000]  [<4101d612>] kthread+0x0/0xa4
[  387.540000]  [<410033d0>] ret_from_kernel_thread+0xc/0x14
[  387.540000]
[  387.550000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
assert failed: !c->ro_media && !c->ro_mount, in fs/ubifs/journal.c:108
[  387.570000] UBIFS error (ubi0:0 pid 24): make_reservation: cannot 
reserve 4144 bytes in jhead 2, error -30
[  387.580000] UBIFS error (ubi0:0 pid 24): do_writepage: cannot write 
folio 148 of inode 159, error -30
[  387.590000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
assert failed: folio->private != NULL, in fs/ubifs/file.c:944
[  387.600000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
assert failed: c->bi.dd_growth >= 0, in fs/ubifs/budget.c:550

If I reboot, and launch hello-stat from bash, I get:
[    4.140000] Run /bin/bash as init process
[    4.150000]   with arguments:
[    4.150000]     /bin/bash
[    4.150000]   with environment:
[    4.160000]     HOME=/
[    4.160000]     TERM=linux
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
bash-5.2# /bin/hello-stat

I don't have a printf() printed, and a second launch results in:
bash-5.2# /bin/hello-stat
[   18.540000] page: refcount:2 mapcount:1 mapping:4207134c index:0x29 
pfn:0x27f23
[   18.550000] aops:0x413b1470 ino:8f dentry name:"hello-stat"
[   18.560000] flags: 0x38(uptodate|dirty|lru|zone=0)
[   18.560000] raw: 00000038 4fed5a14 414f3fdc 4207134c 00000029 
00000000 00000000 00000002
[   18.570000] raw: 4fe46000
[   18.570000] page dumped because: VM_BUG_ON_FOLIO(ret != false)
[   18.580000] kernel BUG at fs/ubifs/file.c:1477!
[   18.580000] *** TRAP #7 ***   FORMAT=4
[   18.580000] Current process id is 28
[   18.580000] BAD KERNEL TRAP: 00000000
[   18.580000] PC: [<410fba84>] ubifs_dirty_folio+0x3e/0x42
[   18.580000] SR: 2010  SP: 4288bc9c  a2: 41f24a10
[   18.580000] d0: 00000023    d1: 000000b8    d2: 4ffda0ec    d3: 80054000
[   18.580000] d4: 4fe47419    d5: 8005a000    a0: 4288bbe0    a1: 414d5180
[   18.580000] Process hello-stat (pid: 28, task=41f24a10)
[   18.580000] Frame format=4 eff addr=41450e8d pc=414674a3
[   18.580000] Stack from 4288bcd8:
[   18.580000]         000005c5 4145aeed 41467485 4ffda0ec 410586de 
4207134c 4ffda0ec 4fe47419
[   18.580000]         4288be12 41074196 4ffda0ec 00000000 ffffffff 
fffffffe 4288bd96 8005a000
[   18.580000]         00000001 41387c58 41073de0 4107797c 00000001 
4283eb64 ffffffff 42890800
[   18.580000]         8005a000 42890800 4283a1f0 4288a000 4283a21c 
00000000 00000000 00000000
[   18.580000]         00000000 4288be54 41074456 4288be12 4283eb64 
80052000 8005a000 4288bd96
[   18.580000]         4283a21c 4288bdee 4288be12 4102a05a 4283a1f4 
4283eaf4 4283a1f0 42830000
[   18.580000] Call Trace: [<410586de>] folio_mark_dirty+0x44/0x58
[   18.580000]  [<41074196>] unmap_page_range+0x3b6/0x602
[   18.580000]  [<41387c58>] mas_find+0x0/0xfa
[   18.580000]  [<41073de0>] unmap_page_range+0x0/0x602
[   18.580000]  [<4107797c>] vma_next+0x0/0x14
[   18.580000]  [<41074456>] unmap_vmas+0x74/0x98
[   18.580000]  [<4102a05a>] up_read+0x0/0x36
[   18.580000]  [<41079ab2>] exit_mmap+0xd4/0x1c0
[   18.580000]  [<41009300>] arch_local_irq_enable+0x0/0xc
[   18.580000]  [<410092f4>] arch_local_irq_disable+0x0/0xc
[   18.580000]  [<41014283>] __sys_setreuid+0x99/0x138
[   18.580000]  [<41006b76>] __mmput+0x2e/0x86
[   18.580000]  [<4100a080>] do_exit+0x21e/0x6e8
[   18.580000]  [<4100a686>] do_group_exit+0x22/0x64
[   18.580000]  [<4101276e>] get_signal+0x420/0x502
[   18.580000]  [<41003d68>] test_ti_thread_flag+0x0/0x14
[   18.580000]  [<41004444>] do_notify_resume+0x4a/0x4e2
[   18.580000]  [<41011cac>] force_sig_fault_to_task+0x40/0x4c
[   18.580000]  [<410058fe>] Lsignal_return+0x14/0x26
[   18.580000]
[   18.580000] Code: 74a3 4879 4145 0e8d 4eb9 4139 adb8 4e47 <241f> 4e75 
91c8 f4e8 5288 f4e8 5288 f4e8 5288 f4e8 41e8 000d b1fc 0000 07f0 63e6
[   18.580000] Disabling lock debugging due to kernel taint
[   18.580000] note: hello-stat[28] exited with irqs disabled
[   18.590000] Fixing recursive fault but reboot is needed!
[   23.600000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
assert failed: folio->private != NULL, in fs/ubifs/file.c:1016
[   23.610000] UBIFS warning (ubi0:0 pid 24): ubifs_ro_mode: switched to 
read-only mode, error -22
[   23.620000] CPU: 0 PID: 24 Comm: kworker/u4:1 Tainted: G      D 
      6.10.0-rc4-00050-g21c7ef993c85 #491
[   23.620000] Workqueue: writeback wb_workfn (flush-ubifs_0_0)
[   23.620000] Stack from 41c7fc1c:
[   23.620000]         41c7fc1c 4145ed0d 4145ed0d 00000000 00233cf8 
41e4d000 413a271a 4145ed0d
[   23.620000]         410fbf84 41e4d000 ffffffea 414674a3 000003f8 
4ffda0ec 41c7fd72 41c7fcea
[   23.620000]         41c7fce6 41c7fd72 ffff9408 41058b72 41051efc 
41c7fd72 41051efc 41c7fd72
[   23.620000]         41058e74 4ffda0ec 41c7fd72 00000000 00000064 
00000400 4281c200 4207134c
[   23.620000]         00000000 410fb9dc 4207134c 41c7fd72 410fbf00 
00000000 41059f34 4207134c
[   23.620000]         41c7fd72 42071274 00000400 00000000 00000004 
41c7fd72 42071274 41c7fd72
[   23.620000] Call Trace: [<413a271a>] dump_stack+0xc/0x10
[   23.620000]  [<410fbf84>] ubifs_writepage+0x84/0x1e0
[   23.620000]  [<41058b72>] writeback_iter+0x0/0x2b0
[   23.620000]  [<41051efc>] folio_unlock+0x0/0x5c
[   23.620000]  [<41051efc>] folio_unlock+0x0/0x5c
[   23.620000]  [<41058e74>] write_cache_pages+0x52/0x6e
[   23.620000]  [<410fb9dc>] ubifs_writepages+0x14/0x1a
[   23.620000]  [<410fbf00>] ubifs_writepage+0x0/0x1e0
[   23.620000]  [<41059f34>] do_writepages+0x76/0x192
[   23.620000]  [<41024378>] vruntime_eligible+0x70/0x8c
[   23.620000]  [<410b7b06>] __writeback_single_inode+0x46/0x182
[   23.620000]  [<410b7e9a>] writeback_sb_inodes+0x2c/0x31a
[   23.620000]  [<410b7ffc>] writeback_sb_inodes+0x18e/0x31a
[   23.620000]  [<4109784a>] super_trylock_shared+0x0/0x40
[   23.620000]  [<410b7e6e>] writeback_sb_inodes+0x0/0x31a
[   23.620000]  [<4102a05a>] up_read+0x0/0x36
[   23.620000]  [<41020000>] devm_register_sys_off_handler+0x18/0x64
[   23.620000]  [<410b81f4>] __writeback_inodes_wb+0x6c/0x9e
[   23.620000]  [<41059ce8>] wb_over_bg_thresh+0x0/0x7c
[   23.620000]  [<410b8332>] queue_io+0x0/0xaa
[   23.620000]  [<410b8484>] wb_writeback.isra.0+0xa8/0x132
[   23.620000]  [<410b84ea>] wb_writeback.isra.0+0x10e/0x132
[   23.620000]  [<410b72d8>] arch_local_irq_enable+0x0/0xc
[   23.620000]  [<410b72cc>] arch_local_irq_disable+0x0/0xc
[   23.620000]  [<410b740a>] finish_writeback_work+0x0/0x44
[   23.620000]  [<410b83dc>] wb_writeback.isra.0+0x0/0x132
[   23.620000]  [<410b8ac4>] wb_workfn+0x188/0x286
[   23.620000]  [<41393fba>] sized_strscpy+0x0/0x13e
[   23.620000]  [<41016312>] list_del_init+0x0/0x18
[   23.620000]  [<41021ccc>] arch_local_irq_disable+0x0/0x12
[   23.620000]  [<41018382>] process_scheduled_works+0x172/0x220
[   23.620000]  [<410162e8>] arch_local_irq_disable+0x0/0xc
[   23.620000]  [<41017b8c>] assign_work+0x0/0x54
[   23.620000]  [<41018afa>] worker_thread+0x0/0x212
[   23.620000]  [<41018cd2>] worker_thread+0x1d8/0x212
[   23.620000]  [<4101d3f4>] kthread_exit+0x0/0x14
[   23.620000]  [<41018afa>] worker_thread+0x0/0x212
[   23.620000]  [<4101d6ac>] kthread+0x9a/0xa4
[   23.620000]  [<4101d612>] kthread+0x0/0xa4
[   23.620000]  [<410033d0>] ret_from_kernel_thread+0xc/0x14
[   23.620000]
[   23.630000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
assert failed: !c->ro_media && !c->ro_mount, in fs/ubifs/journal.c:108
[   23.650000] UBIFS error (ubi0:0 pid 24): make_reservation: cannot 
reserve 4144 bytes in jhead 2, error -30
[   23.660000] UBIFS error (ubi0:0 pid 24): do_writepage: cannot write 
folio 41 of inode 143, error -30
[   23.670000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
assert failed: folio->private != NULL, in fs/ubifs/file.c:944
[   23.680000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
assert failed: c->bi.dd_growth >= 0, in fs/ubifs/budget.c:550

Now, if I change the init process, and call /bin/hello-dyn instead:
[    4.140000] Run /bin/hello-dyn as init process
[    4.150000]   with arguments:
[    4.150000]     /bin/hello-dyn
[    4.150000]   with environment:
[    4.160000]     HOME=/
[    4.160000]     TERM=linux
Hello World on coldfire
[    4.470000] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x00000000
[    4.470000] CPU: 0 PID: 1 Comm: hello-dyn Not tainted 
6.10.0-rc4-00050-g21c7ef993c85 #491
[    4.470000] Stack from 4186df2c:
[    4.470000]         4186df2c 4145ed0d 4145ed0d 41009300 00000001 
000000ff 413a271a 4145ed0d
[    4.470000]         4139a532 41009300 00000000 000000ff 00000001 
418a0010 00000000 410092f4
[    4.470000]         4186dfac 41009fd4 41452c80 00000000 00000000 
4100a6c8 00000000 00000001
[    4.470000]         00000000 00000001 41872010 6014f468 6014c000 
41872010 6014f468 6014c000
[    4.470000]         601501cc 4100a686 00000000 4186c000 00000000 
4100a6dc 00000000 41005848
[    4.470000]         00000000 00000001 000000f7 00000000 00000001 
60161700 60161700 601501d4
[    4.470000] Call Trace: [<41009300>] arch_local_irq_enable+0x0/0xc
[    4.470000]  [<413a271a>] dump_stack+0xc/0x10
[    4.470000]  [<4139a532>] panic+0xc6/0x25c
[    4.470000]  [<41009300>] arch_local_irq_enable+0x0/0xc
[    4.470000]  [<410092f4>] arch_local_irq_disable+0x0/0xc
[    4.470000]  [<41009fd4>] do_exit+0x172/0x6e8
[    4.470000]  [<4100a6c8>] sys_exit_group+0x0/0x14
[    4.470000]  [<4100a686>] do_group_exit+0x22/0x64
[    4.470000]  [<4100a6dc>] pid_child_should_wake+0x0/0x56
[    4.470000]  [<41005848>] system_call+0x54/0xa8
[    4.470000]
[    4.470000] ---[ end Kernel panic - not syncing: Attempted to kill 
init! exitcode=0x00000000 ]---

Which is expected. So, I modified hello-dyn a bit:
$> cat hello.c
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main()
{
	printf("Hello World on coldfire, execute bash !\n");
	/* Let's now execute /bin/bash with execve */
	char *args[] = {"/bin/bash", NULL};
	execve(args[0], args, NULL);
	/* If we reach here, it means execve failed */
	perror("execve");
	exit(EXIT_FAILURE);
}

And now, I get:
Hello World on coldfire, execute bash !
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
bash-5.2# ls
[  122.030000] page: refcount:3 mapcount:2 mapping:4206cdc4 index:0x94 
pfn:0x20ab2
[  122.030000] aops:0x413b1470 ino:9f dentry name:"libc.so.6"
[  122.040000] flags: 0x38(uptodate|dirty|lru|zone=0)
[  122.040000] raw: 00000038 4fed3fec 4fed4208 4206cdc4 00000094 
00000000 00000001 00000003
[  122.050000] raw: 41564000
[  122.050000] page dumped because: VM_BUG_ON_FOLIO(ret != false)
[  122.060000] kernel BUG at fs/ubifs/file.c:1477!
[  122.060000] *** TRAP #7 ***   FORMAT=4
[  122.060000] Current process id is 26
[  122.060000] BAD KERNEL TRAP: 00000000
[  122.060000] PC: [<410fba84>] ubifs_dirty_folio+0x3e/0x42
[  122.060000] SR: 2010  SP: 42871cec  a2: 41f24510
[  122.060000] d0: 00000023    d1: 000000b9    d2: 4fed4108    d3: 601ce000
[  122.060000] d4: 41565411    d5: 601d2000    a0: 42871c30    a1: 414d5180
[  122.060000] Process bash (pid: 26, task=41f24510)
[  122.060000] Frame format=4 eff addr=41450e8d pc=414674a3
<snip>

I can see the page refcount growing, and a difference between it and 
mapcount of 1. But I can't get exactly what is going on.

Any pointer, ideas, will be greatly appreciated, as I am out of clue 
right now :-(.

Thanks !

[1]: 
https://lore.kernel.org/linux-m68k/82ba7ea3-d290-494c-b3e3-93d9dc7c6c0d@yoseli.org/
[2]: 
https://lore.kernel.org/linux-m68k/0b657056-3a7f-46ba-8e99-a8fe2203901f@yoseli.org/

Comments

Jean-Michel Hautbois June 24, 2024, 8:56 a.m. UTC | #1
Hi all,

I keep going with my investigations, hoping that someone will have an 
idea to help :-) !

I think I found something, but not sure:

On 19/06/2024 14:29, Jean-Michel Hautbois wrote:
> Hello !
> 
> I am (still :-)) trying to make an upstream kernel work on my mcf54418 
> custom board. The kernel boots nicely now, but I have an issue when 
> executing binaries.
> 
> I tried uclibc-ng and glibc, with similar results.
> 
> First, I need a patch to change the ELF_ET_DYN_BASE value [1].
> Next, I switched from jffs2 to ubifs, but the behavior is the same, with 
> one or the other (just that jffs2 needs an extra patch to use 
> dirty_folio callback) [2].
> 
> Now, I wrote a very dumb and classical program:
> 
> $> cat hello.c
> #include <stdio.h>
> 
> int main()
> {
>      printf("Hello World on coldfire\n");
> }
> 
> And I linked it statically, and dynamically (later called hello-stat and 
> hello-dyn).
> 
> -- I added another patch to help debug the issue:
> diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
> index a1f46919934c..d01bd3276204 100644
> --- a/fs/ubifs/file.c
> +++ b/fs/ubifs/file.c
> @@ -1474,6 +1474,7 @@ static bool ubifs_dirty_folio(struct address_space 
> *mapping,
>           * An attempt to dirty a page without budgeting for it - should 
> not
>           * happen.
>           */
> +       VM_BUG_ON_FOLIO(ret != false, folio);
>          ubifs_assert(c, ret == false);
>          return ret;
>   }
> 
> Now comes the (fun ?) part.
> 
> When I execute /bin/bash as my init, and I launch the hello-dyn, I get:
> [    4.140000] Run /bin/bash as init process
> [    4.150000]   with arguments:
> [    4.150000]     /bin/bash
> [    4.150000]   with environment:
> [    4.160000]     HOME=/
> [    4.160000]     TERM=linux
> bash: cannot set terminal process group (-1): Inappropriate ioctl for 
> device
> bash: no job control in this shell
> bash-5.2# [  166.510000] random: crng init done
> 
> bash-5.2# /bin/hello-dyn
> Hello World on coldfir[  382.450000] page: refcount:2 mapcount:1 
> mapping:4206dc84 index:0x94 pfn:0x20ac9
> e
> [  382.460000] aops:0x413b1470 ino:9f dentry name:"libc.so.6"
> [  382.460000] flags: 0x38(uptodate|dirty|lru|zone=0)
> [  382.470000] raw: 00000038 4fed43dc 4fed4544 4206dc84 00000094 
> 00000000 00000000 00000002
> [  382.480000] raw: 41592000
> [  382.480000] page dumped because: VM_BUG_ON_FOLIO(ret != false)
> [  382.490000] kernel BUG at fs/ubifs/file.c:1477!
> [  382.490000] *** TRAP #7 ***   FORMAT=4
> [  382.490000] Current process id is 26
> [  382.490000] BAD KERNEL TRAP: 00000000
> [  382.490000] PC: [<410fba84>] ubifs_dirty_folio+0x3e/0x42
> [  382.490000] SR: 2010  SP: 4286bda4  a2: 41f24510
> [  382.490000] d0: 00000023    d1: 000000b9    d2: 4fed4444    d3: 6014c000
> [  382.490000] d4: 41593419    d5: 60150000    a0: 4286bce8    a1: 414d5180
> [  382.490000] Process hello-dyn (pid: 26, task=41f24510)
> [  382.490000] Frame format=4 eff addr=41450e8d pc=414674a3
> [  382.490000] Stack from 4286bde0:
> [  382.490000]         000005c5 4145aeed 41467485 4fed4444 410586de 
> 4206dc84 4fed4444 41593419
> [  382.490000]         4286bf1a 41074196 4fed4444 00000000 ffffffff 
> fffffffe 4286be9e 60150000
> [  382.490000]         00000001 41387c58 41073de0 4107797c 00000001 
> 4283f184 ffffffff 42884600
> [  382.490000]         60150000 42884600 4283a3d0 4286a000 4283a3fc 
> 00000000 00000000 00000000
> [  382.490000]         00000000 4286bf5c 41074456 4286bf1a 4283f184 
> 6014c000 60150000 4286be9e
> [  382.490000]         4283a3fc 4286bef6 4286bf1a 4102a05a 4283a3d4 
> 4283f344 4283a3d0 42830000
> [  382.490000] Call Trace: [<410586de>] folio_mark_dirty+0x44/0x58
> [  382.490000]  [<41074196>] unmap_page_range+0x3b6/0x602
> [  382.490000]  [<41387c58>] mas_find+0x0/0xfa
> [  382.490000]  [<41073de0>] unmap_page_range+0x0/0x602
> [  382.490000]  [<4107797c>] vma_next+0x0/0x14
> [  382.490000]  [<41074456>] unmap_vmas+0x74/0x98
> [  382.490000]  [<4102a05a>] up_read+0x0/0x36
> [  382.490000]  [<41079ab2>] exit_mmap+0xd4/0x1c0
> [  382.490000]  [<41009300>] arch_local_irq_enable+0x0/0xc
> [  382.490000]  [<410092f4>] arch_local_irq_disable+0x0/0xc
> [  382.490000]  [<41006b76>] __mmput+0x2e/0x86
> [  382.490000]  [<4100a080>] do_exit+0x21e/0x6e8
> [  382.490000]  [<4100a6c8>] sys_exit_group+0x0/0x14
> [  382.490000]  [<4100a686>] do_group_exit+0x22/0x64
> [  382.490000]  [<4100a6dc>] pid_child_should_wake+0x0/0x56
> [  382.490000]  [<41005848>] system_call+0x54/0xa8
> [  382.490000]
> [  382.490000] Code: 74a3 4879 4145 0e8d 4eb9 4139 adb8 4e47 <241f> 4e75 
> 91c8 f4e8 5288 f4e8 5288 f4e8 5288 f4e8 41e8 000d b1fc 0000 07f0 63e6
> [  382.490000] Disabling lock debugging due to kernel taint
> [  382.490000] note: hello-dyn[26] exited with irqs disabled
> [  382.500000] Fixing recursive fault but reboot is needed!
> [  387.520000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
> assert failed: folio->private != NULL, in fs/ubifs/file.c:1016
> [  387.530000] UBIFS warning (ubi0:0 pid 24): ubifs_ro_mode: switched to 
> read-only mode, error -22
> [  387.540000] CPU: 0 PID: 24 Comm: kworker/u4:1 Tainted: G      D      
> 6.10.0-rc4-00050-g21c7ef993c85 #491
> [  387.540000] Workqueue: writeback wb_workfn (flush-ubifs_0_0)
> [  387.540000] Stack from 41c7fc1c:
> [  387.540000]         41c7fc1c 4145ed0d 4145ed0d 00000000 0012d0a0 
> 41e4d000 413a271a 4145ed0d
> [  387.540000]         410fbf84 41e4d000 ffffffea 414674a3 000003f8 
> 4fed4444 41c7fd72 41c7fcea
> [  387.540000]         41c7fce6 41c7fd72 00002230 41058b72 41051efc 
> 41c7fd72 41051efc 41c7fd72
> [  387.540000]         41058e74 4fed4444 41c7fd72 00000000 00000064 
> 00000400 4281c200 4206dc84
> [  387.540000]         00000000 410fb9dc 4206dc84 41c7fd72 410fbf00 
> 00000000 41059f34 4206dc84
> [  387.540000]         41c7fd72 4206dbac 00000400 00000000 00000004 
> 41c7fd72 4206dbac 41c7fd72
> [  387.540000] Call Trace: [<413a271a>] dump_stack+0xc/0x10
> [  387.540000]  [<410fbf84>] ubifs_writepage+0x84/0x1e0
> [  387.540000]  [<41058b72>] writeback_iter+0x0/0x2b0
> [  387.540000]  [<41051efc>] folio_unlock+0x0/0x5c
> [  387.540000]  [<41051efc>] folio_unlock+0x0/0x5c
> [  387.540000]  [<41058e74>] write_cache_pages+0x52/0x6e
> [  387.540000]  [<410fb9dc>] ubifs_writepages+0x14/0x1a
> [  387.540000]  [<410fbf00>] ubifs_writepage+0x0/0x1e0
> [  387.540000]  [<41059f34>] do_writepages+0x76/0x192
> [  387.540000]  [<41024378>] vruntime_eligible+0x70/0x8c
> [  387.540000]  [<410b7b06>] __writeback_single_inode+0x46/0x182
> [  387.540000]  [<410b7e9a>] writeback_sb_inodes+0x2c/0x31a
> [  387.540000]  [<410b7ffc>] writeback_sb_inodes+0x18e/0x31a
> [  387.540000]  [<4109784a>] super_trylock_shared+0x0/0x40
> [  387.540000]  [<410b7e6e>] writeback_sb_inodes+0x0/0x31a
> [  387.540000]  [<4102a05a>] up_read+0x0/0x36
> [  387.540000]  [<41020000>] devm_register_sys_off_handler+0x18/0x64
> [  387.540000]  [<410b81f4>] __writeback_inodes_wb+0x6c/0x9e
> [  387.540000]  [<41059ce8>] wb_over_bg_thresh+0x0/0x7c
> [  387.540000]  [<410b8332>] queue_io+0x0/0xaa
> [  387.540000]  [<410b8484>] wb_writeback.isra.0+0xa8/0x132
> [  387.540000]  [<410b84ea>] wb_writeback.isra.0+0x10e/0x132
> [  387.540000]  [<410b72d8>] arch_local_irq_enable+0x0/0xc
> [  387.540000]  [<410b72cc>] arch_local_irq_disable+0x0/0xc
> [  387.540000]  [<410b740a>] finish_writeback_work+0x0/0x44
> [  387.540000]  [<410b83dc>] wb_writeback.isra.0+0x0/0x132
> [  387.540000]  [<410b8ac4>] wb_workfn+0x188/0x286
> [  387.540000]  [<41393fba>] sized_strscpy+0x0/0x13e
> [  387.540000]  [<41016312>] list_del_init+0x0/0x18
> [  387.540000]  [<41021ccc>] arch_local_irq_disable+0x0/0x12
> [  387.540000]  [<41018382>] process_scheduled_works+0x172/0x220
> [  387.540000]  [<410162e8>] arch_local_irq_disable+0x0/0xc
> [  387.540000]  [<41017b8c>] assign_work+0x0/0x54
> [  387.540000]  [<41018afa>] worker_thread+0x0/0x212
> [  387.540000]  [<41018cd2>] worker_thread+0x1d8/0x212
> [  387.540000]  [<4101d3f4>] kthread_exit+0x0/0x14
> [  387.540000]  [<41018afa>] worker_thread+0x0/0x212
> [  387.540000]  [<4101d6ac>] kthread+0x9a/0xa4
> [  387.540000]  [<4101d612>] kthread+0x0/0xa4
> [  387.540000]  [<410033d0>] ret_from_kernel_thread+0xc/0x14
> [  387.540000]
> [  387.550000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
> assert failed: !c->ro_media && !c->ro_mount, in fs/ubifs/journal.c:108
> [  387.570000] UBIFS error (ubi0:0 pid 24): make_reservation: cannot 
> reserve 4144 bytes in jhead 2, error -30
> [  387.580000] UBIFS error (ubi0:0 pid 24): do_writepage: cannot write 
> folio 148 of inode 159, error -30
> [  387.590000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
> assert failed: folio->private != NULL, in fs/ubifs/file.c:944
> [  387.600000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
> assert failed: c->bi.dd_growth >= 0, in fs/ubifs/budget.c:550
> 
> If I reboot, and launch hello-stat from bash, I get:
> [    4.140000] Run /bin/bash as init process
> [    4.150000]   with arguments:
> [    4.150000]     /bin/bash
> [    4.150000]   with environment:
> [    4.160000]     HOME=/
> [    4.160000]     TERM=linux
> bash: cannot set terminal process group (-1): Inappropriate ioctl for 
> device
> bash: no job control in this shell
> bash-5.2# /bin/hello-stat
> 
> I don't have a printf() printed, and a second launch results in:
> bash-5.2# /bin/hello-stat
> [   18.540000] page: refcount:2 mapcount:1 mapping:4207134c index:0x29 
> pfn:0x27f23
> [   18.550000] aops:0x413b1470 ino:8f dentry name:"hello-stat"
> [   18.560000] flags: 0x38(uptodate|dirty|lru|zone=0)
> [   18.560000] raw: 00000038 4fed5a14 414f3fdc 4207134c 00000029 
> 00000000 00000000 00000002
> [   18.570000] raw: 4fe46000
> [   18.570000] page dumped because: VM_BUG_ON_FOLIO(ret != false)
> [   18.580000] kernel BUG at fs/ubifs/file.c:1477!
> [   18.580000] *** TRAP #7 ***   FORMAT=4
> [   18.580000] Current process id is 28
> [   18.580000] BAD KERNEL TRAP: 00000000
> [   18.580000] PC: [<410fba84>] ubifs_dirty_folio+0x3e/0x42
> [   18.580000] SR: 2010  SP: 4288bc9c  a2: 41f24a10
> [   18.580000] d0: 00000023    d1: 000000b8    d2: 4ffda0ec    d3: 80054000
> [   18.580000] d4: 4fe47419    d5: 8005a000    a0: 4288bbe0    a1: 414d5180
> [   18.580000] Process hello-stat (pid: 28, task=41f24a10)
> [   18.580000] Frame format=4 eff addr=41450e8d pc=414674a3
> [   18.580000] Stack from 4288bcd8:
> [   18.580000]         000005c5 4145aeed 41467485 4ffda0ec 410586de 
> 4207134c 4ffda0ec 4fe47419
> [   18.580000]         4288be12 41074196 4ffda0ec 00000000 ffffffff 
> fffffffe 4288bd96 8005a000
> [   18.580000]         00000001 41387c58 41073de0 4107797c 00000001 
> 4283eb64 ffffffff 42890800
> [   18.580000]         8005a000 42890800 4283a1f0 4288a000 4283a21c 
> 00000000 00000000 00000000
> [   18.580000]         00000000 4288be54 41074456 4288be12 4283eb64 
> 80052000 8005a000 4288bd96
> [   18.580000]         4283a21c 4288bdee 4288be12 4102a05a 4283a1f4 
> 4283eaf4 4283a1f0 42830000
> [   18.580000] Call Trace: [<410586de>] folio_mark_dirty+0x44/0x58
> [   18.580000]  [<41074196>] unmap_page_range+0x3b6/0x602
> [   18.580000]  [<41387c58>] mas_find+0x0/0xfa
> [   18.580000]  [<41073de0>] unmap_page_range+0x0/0x602
> [   18.580000]  [<4107797c>] vma_next+0x0/0x14
> [   18.580000]  [<41074456>] unmap_vmas+0x74/0x98
> [   18.580000]  [<4102a05a>] up_read+0x0/0x36
> [   18.580000]  [<41079ab2>] exit_mmap+0xd4/0x1c0
> [   18.580000]  [<41009300>] arch_local_irq_enable+0x0/0xc
> [   18.580000]  [<410092f4>] arch_local_irq_disable+0x0/0xc
> [   18.580000]  [<41014283>] __sys_setreuid+0x99/0x138
> [   18.580000]  [<41006b76>] __mmput+0x2e/0x86
> [   18.580000]  [<4100a080>] do_exit+0x21e/0x6e8
> [   18.580000]  [<4100a686>] do_group_exit+0x22/0x64
> [   18.580000]  [<4101276e>] get_signal+0x420/0x502
> [   18.580000]  [<41003d68>] test_ti_thread_flag+0x0/0x14
> [   18.580000]  [<41004444>] do_notify_resume+0x4a/0x4e2
> [   18.580000]  [<41011cac>] force_sig_fault_to_task+0x40/0x4c
> [   18.580000]  [<410058fe>] Lsignal_return+0x14/0x26
> [   18.580000]
> [   18.580000] Code: 74a3 4879 4145 0e8d 4eb9 4139 adb8 4e47 <241f> 4e75 
> 91c8 f4e8 5288 f4e8 5288 f4e8 5288 f4e8 41e8 000d b1fc 0000 07f0 63e6
> [   18.580000] Disabling lock debugging due to kernel taint
> [   18.580000] note: hello-stat[28] exited with irqs disabled
> [   18.590000] Fixing recursive fault but reboot is needed!
> [   23.600000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
> assert failed: folio->private != NULL, in fs/ubifs/file.c:1016
> [   23.610000] UBIFS warning (ubi0:0 pid 24): ubifs_ro_mode: switched to 
> read-only mode, error -22
> [   23.620000] CPU: 0 PID: 24 Comm: kworker/u4:1 Tainted: G      D      
> 6.10.0-rc4-00050-g21c7ef993c85 #491
> [   23.620000] Workqueue: writeback wb_workfn (flush-ubifs_0_0)
> [   23.620000] Stack from 41c7fc1c:
> [   23.620000]         41c7fc1c 4145ed0d 4145ed0d 00000000 00233cf8 
> 41e4d000 413a271a 4145ed0d
> [   23.620000]         410fbf84 41e4d000 ffffffea 414674a3 000003f8 
> 4ffda0ec 41c7fd72 41c7fcea
> [   23.620000]         41c7fce6 41c7fd72 ffff9408 41058b72 41051efc 
> 41c7fd72 41051efc 41c7fd72
> [   23.620000]         41058e74 4ffda0ec 41c7fd72 00000000 00000064 
> 00000400 4281c200 4207134c
> [   23.620000]         00000000 410fb9dc 4207134c 41c7fd72 410fbf00 
> 00000000 41059f34 4207134c
> [   23.620000]         41c7fd72 42071274 00000400 00000000 00000004 
> 41c7fd72 42071274 41c7fd72
> [   23.620000] Call Trace: [<413a271a>] dump_stack+0xc/0x10
> [   23.620000]  [<410fbf84>] ubifs_writepage+0x84/0x1e0
> [   23.620000]  [<41058b72>] writeback_iter+0x0/0x2b0
> [   23.620000]  [<41051efc>] folio_unlock+0x0/0x5c
> [   23.620000]  [<41051efc>] folio_unlock+0x0/0x5c
> [   23.620000]  [<41058e74>] write_cache_pages+0x52/0x6e
> [   23.620000]  [<410fb9dc>] ubifs_writepages+0x14/0x1a
> [   23.620000]  [<410fbf00>] ubifs_writepage+0x0/0x1e0
> [   23.620000]  [<41059f34>] do_writepages+0x76/0x192
> [   23.620000]  [<41024378>] vruntime_eligible+0x70/0x8c
> [   23.620000]  [<410b7b06>] __writeback_single_inode+0x46/0x182
> [   23.620000]  [<410b7e9a>] writeback_sb_inodes+0x2c/0x31a
> [   23.620000]  [<410b7ffc>] writeback_sb_inodes+0x18e/0x31a
> [   23.620000]  [<4109784a>] super_trylock_shared+0x0/0x40
> [   23.620000]  [<410b7e6e>] writeback_sb_inodes+0x0/0x31a
> [   23.620000]  [<4102a05a>] up_read+0x0/0x36
> [   23.620000]  [<41020000>] devm_register_sys_off_handler+0x18/0x64
> [   23.620000]  [<410b81f4>] __writeback_inodes_wb+0x6c/0x9e
> [   23.620000]  [<41059ce8>] wb_over_bg_thresh+0x0/0x7c
> [   23.620000]  [<410b8332>] queue_io+0x0/0xaa
> [   23.620000]  [<410b8484>] wb_writeback.isra.0+0xa8/0x132
> [   23.620000]  [<410b84ea>] wb_writeback.isra.0+0x10e/0x132
> [   23.620000]  [<410b72d8>] arch_local_irq_enable+0x0/0xc
> [   23.620000]  [<410b72cc>] arch_local_irq_disable+0x0/0xc
> [   23.620000]  [<410b740a>] finish_writeback_work+0x0/0x44
> [   23.620000]  [<410b83dc>] wb_writeback.isra.0+0x0/0x132
> [   23.620000]  [<410b8ac4>] wb_workfn+0x188/0x286
> [   23.620000]  [<41393fba>] sized_strscpy+0x0/0x13e
> [   23.620000]  [<41016312>] list_del_init+0x0/0x18
> [   23.620000]  [<41021ccc>] arch_local_irq_disable+0x0/0x12
> [   23.620000]  [<41018382>] process_scheduled_works+0x172/0x220
> [   23.620000]  [<410162e8>] arch_local_irq_disable+0x0/0xc
> [   23.620000]  [<41017b8c>] assign_work+0x0/0x54
> [   23.620000]  [<41018afa>] worker_thread+0x0/0x212
> [   23.620000]  [<41018cd2>] worker_thread+0x1d8/0x212
> [   23.620000]  [<4101d3f4>] kthread_exit+0x0/0x14
> [   23.620000]  [<41018afa>] worker_thread+0x0/0x212
> [   23.620000]  [<4101d6ac>] kthread+0x9a/0xa4
> [   23.620000]  [<4101d612>] kthread+0x0/0xa4
> [   23.620000]  [<410033d0>] ret_from_kernel_thread+0xc/0x14
> [   23.620000]
> [   23.630000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
> assert failed: !c->ro_media && !c->ro_mount, in fs/ubifs/journal.c:108
> [   23.650000] UBIFS error (ubi0:0 pid 24): make_reservation: cannot 
> reserve 4144 bytes in jhead 2, error -30
> [   23.660000] UBIFS error (ubi0:0 pid 24): do_writepage: cannot write 
> folio 41 of inode 143, error -30
> [   23.670000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
> assert failed: folio->private != NULL, in fs/ubifs/file.c:944
> [   23.680000] UBIFS error (ubi0:0 pid 24): ubifs_assert_failed: UBIFS 
> assert failed: c->bi.dd_growth >= 0, in fs/ubifs/budget.c:550
> 
> Now, if I change the init process, and call /bin/hello-dyn instead:
> [    4.140000] Run /bin/hello-dyn as init process
> [    4.150000]   with arguments:
> [    4.150000]     /bin/hello-dyn
> [    4.150000]   with environment:
> [    4.160000]     HOME=/
> [    4.160000]     TERM=linux
> Hello World on coldfire
> [    4.470000] Kernel panic - not syncing: Attempted to kill init! 
> exitcode=0x00000000
> [    4.470000] CPU: 0 PID: 1 Comm: hello-dyn Not tainted 
> 6.10.0-rc4-00050-g21c7ef993c85 #491
> [    4.470000] Stack from 4186df2c:
> [    4.470000]         4186df2c 4145ed0d 4145ed0d 41009300 00000001 
> 000000ff 413a271a 4145ed0d
> [    4.470000]         4139a532 41009300 00000000 000000ff 00000001 
> 418a0010 00000000 410092f4
> [    4.470000]         4186dfac 41009fd4 41452c80 00000000 00000000 
> 4100a6c8 00000000 00000001
> [    4.470000]         00000000 00000001 41872010 6014f468 6014c000 
> 41872010 6014f468 6014c000
> [    4.470000]         601501cc 4100a686 00000000 4186c000 00000000 
> 4100a6dc 00000000 41005848
> [    4.470000]         00000000 00000001 000000f7 00000000 00000001 
> 60161700 60161700 601501d4
> [    4.470000] Call Trace: [<41009300>] arch_local_irq_enable+0x0/0xc
> [    4.470000]  [<413a271a>] dump_stack+0xc/0x10
> [    4.470000]  [<4139a532>] panic+0xc6/0x25c
> [    4.470000]  [<41009300>] arch_local_irq_enable+0x0/0xc
> [    4.470000]  [<410092f4>] arch_local_irq_disable+0x0/0xc
> [    4.470000]  [<41009fd4>] do_exit+0x172/0x6e8
> [    4.470000]  [<4100a6c8>] sys_exit_group+0x0/0x14
> [    4.470000]  [<4100a686>] do_group_exit+0x22/0x64
> [    4.470000]  [<4100a6dc>] pid_child_should_wake+0x0/0x56
> [    4.470000]  [<41005848>] system_call+0x54/0xa8
> [    4.470000]
> [    4.470000] ---[ end Kernel panic - not syncing: Attempted to kill 
> init! exitcode=0x00000000 ]---
> 
> Which is expected. So, I modified hello-dyn a bit:
> $> cat hello.c
> #include <stdlib.h>
> #include <stdio.h>
> #include <unistd.h>
> 
> int main()
> {
>      printf("Hello World on coldfire, execute bash !\n");
>      /* Let's now execute /bin/bash with execve */
>      char *args[] = {"/bin/bash", NULL};
>      execve(args[0], args, NULL);
>      /* If we reach here, it means execve failed */
>      perror("execve");
>      exit(EXIT_FAILURE);
> }
> 
> And now, I get:
> Hello World on coldfire, execute bash !
> bash: cannot set terminal process group (-1): Inappropriate ioctl for 
> device
> bash: no job control in this shell
> bash-5.2# ls
> [  122.030000] page: refcount:3 mapcount:2 mapping:4206cdc4 index:0x94 
> pfn:0x20ab2
> [  122.030000] aops:0x413b1470 ino:9f dentry name:"libc.so.6"
> [  122.040000] flags: 0x38(uptodate|dirty|lru|zone=0)
> [  122.040000] raw: 00000038 4fed3fec 4fed4208 4206cdc4 00000094 
> 00000000 00000001 00000003
> [  122.050000] raw: 41564000
> [  122.050000] page dumped because: VM_BUG_ON_FOLIO(ret != false)
> [  122.060000] kernel BUG at fs/ubifs/file.c:1477!
> [  122.060000] *** TRAP #7 ***   FORMAT=4
> [  122.060000] Current process id is 26
> [  122.060000] BAD KERNEL TRAP: 00000000
> [  122.060000] PC: [<410fba84>] ubifs_dirty_folio+0x3e/0x42
> [  122.060000] SR: 2010  SP: 42871cec  a2: 41f24510
> [  122.060000] d0: 00000023    d1: 000000b9    d2: 4fed4108    d3: 601ce000
> [  122.060000] d4: 41565411    d5: 601d2000    a0: 42871c30    a1: 414d5180
> [  122.060000] Process bash (pid: 26, task=41f24510)
> [  122.060000] Frame format=4 eff addr=41450e8d pc=414674a3
> <snip>

When I printk the do_page_fault first debug, I get for the first call to ls:
bash-5.2# ls
[   14.700000] do page fault:
[   14.700000] regs->sr=0x0, regs->pc=0x70069ee6, address=0x70069ee6, 0, 
(ptrval)

This call works almost fine (I still have the assert failed: 
folio->private != NULL issue).

And when I call it a second time, I get:
bash-5.2# ls
[   19.820000] do page fault:
[   19.820000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700e2004, 2, 
(ptrval)

The address corresponds to the defined zone ELF_ET_DYN_BASE as I set it 
to 0x70000000.

regs->pc is not the same as the address. It might be unrelevant, but any 
help is appreciated to understand the process behind :-).

I keep digging, and I am in the asm part which fears me a bit !

Thanks !
JM
Michael Schmitz June 26, 2024, 1:56 a.m. UTC | #2
Jean-Michel,

On 24/06/24 20:56, Jean-Michel Hautbois wrote:
>
> When I printk the do_page_fault first debug, I get for the first call 
> to ls:
> bash-5.2# ls
> [   14.700000] do page fault:
> [   14.700000] regs->sr=0x0, regs->pc=0x70069ee6, address=0x70069ee6, 
> 0, (ptrval)

Page not present, read fault. Please disable obfuscation of kernel 
pointer addresses by printk. Maybe also disable address space 
randomization while debugging this.

> This call works almost fine (I still have the assert failed: 
> folio->private != NULL issue).
>
> And when I call it a second time, I get:
> bash-5.2# ls
> [   19.820000] do page fault:
> [   19.820000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700e2004, 
> 2, (ptrval)

Page not present, write fault.

It would be helpful if you could get a dump of /proc/1/maps before the 
execve() syscall in your helloworld init replacement. That might confirm 
all these addresses are legit (assuming mappings survive across 
execve(), that is), and what they correspond to.

>
> The address corresponds to the defined zone ELF_ET_DYN_BASE as I set 
> it to 0x70000000.
>
> regs->pc is not the same as the address. It might be unrelevant, but 
> any help is appreciated to understand the process behind :-).
>
> I keep digging, and I am in the asm part which fears me a bit !

I don't see that you'd need to look at any asm code here.

Cheers,

     Michael

>
> Thanks !
> JM
Jean-Michel Hautbois June 26, 2024, 5:35 a.m. UTC | #3
Hi Michael,

On 26/06/2024 03:56, Michael Schmitz wrote:
> Jean-Michel,
> 
> On 24/06/24 20:56, Jean-Michel Hautbois wrote:
>>
>> When I printk the do_page_fault first debug, I get for the first call 
>> to ls:
>> bash-5.2# ls
>> [   14.700000] do page fault:
>> [   14.700000] regs->sr=0x0, regs->pc=0x70069ee6, address=0x70069ee6, 
>> 0, (ptrval)
> 
> Page not present, read fault. Please disable obfuscation of kernel 
> pointer addresses by printk. Maybe also disable address space 
> randomization while debugging this.

Randomization is disabled, I think I don't have hash pointers... Will 
heck again, thanks for the explanation.

> 
>> This call works almost fine (I still have the assert failed: 
>> folio->private != NULL issue).
>>
>> And when I call it a second time, I get:
>> bash-5.2# ls
>> [   19.820000] do page fault:
>> [   19.820000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700e2004, 
>> 2, (ptrval)
> 
> Page not present, write fault.
> 
> It would be helpful if you could get a dump of /proc/1/maps before the 
> execve() syscall in your helloworld init replacement. That might confirm 
> all these addresses are legit (assuming mappings survive across 
> execve(), that is), and what they correspond to.

Executing mount /proc will make it impossible to then call cat 
/proc/1/maps :-(. Can I enable a printk somewhere in the kernel to do 
the same maybe ?

JM

> 
>>
>> The address corresponds to the defined zone ELF_ET_DYN_BASE as I set 
>> it to 0x70000000.
>>
>> regs->pc is not the same as the address. It might be unrelevant, but 
>> any help is appreciated to understand the process behind :-).
>>
>> I keep digging, and I am in the asm part which fears me a bit !
> 
> I don't see that you'd need to look at any asm code here.
> 
> Cheers,
> 
>      Michael
> 
>>
>> Thanks !
>> JM
Jean-Michel Hautbois June 26, 2024, 7:01 a.m. UTC | #4
Michael,

On 26/06/2024 03:56, Michael Schmitz wrote:
> Jean-Michel,
> 
> On 24/06/24 20:56, Jean-Michel Hautbois wrote:
>>
>> When I printk the do_page_fault first debug, I get for the first call 
>> to ls:
>> bash-5.2# ls
>> [   14.700000] do page fault:
>> [   14.700000] regs->sr=0x0, regs->pc=0x70069ee6, address=0x70069ee6, 
>> 0, (ptrval)
> 
> Page not present, read fault. Please disable obfuscation of kernel 
> pointer addresses by printk. Maybe also disable address space 
> randomization while debugging this.
> 
>> This call works almost fine (I still have the assert failed: 
>> folio->private != NULL issue).
>>
>> And when I call it a second time, I get:
>> bash-5.2# ls
>> [   19.820000] do page fault:
>> [   19.820000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700e2004, 
>> 2, (ptrval)
> 
> Page not present, write fault.
> 
> It would be helpful if you could get a dump of /proc/1/maps before the 
> execve() syscall in your helloworld init replacement. That might confirm 
> all these addresses are legit (assuming mappings survive across 
> execve(), that is), and what they correspond to.

I changed the code a bit, and display an error only when error_code is 
not 0. The result is for the bash launch:
[    3.864000] Run /bin/bash as init process
[    3.865000]   with arguments:
[    3.866000]     /bin/bash
[    3.868000]   with environment:
[    3.869000]     HOME=/
[    3.871000]     TERM=linux
[    4.250000] do page fault:
[    4.250000] regs->sr=0x2000, regs->pc=0x41366984, address=0x700b3364, 
2, 41fb0000
[    4.258000] do page fault:
[    4.258000] regs->sr=0x2000, regs->pc=0x41366984, address=0x6002140c, 
2, 41fb0000
[    4.273000] do page fault:
[    4.273000] regs->sr=0x0, regs->pc=0x60011996, address=0x6001ff74, 3, 
41fb0000
[    4.332000] do page fault:
[    4.332000] regs->sr=0x10, regs->pc=0x60012bf4, address=0x700adf44, 
3, 41fb0000
[    4.497000] do page fault:
[    4.497000] regs->sr=0x0, regs->pc=0x60016344, address=0x600592d0, 2, 
41fb0000
[    4.505000] do page fault:
[    4.505000] regs->sr=0x10, regs->pc=0x60007320, address=0x60055f54, 
3, 41fb0000
[    4.533000] do page fault:
[    4.533000] regs->sr=0x0, regs->pc=0x60016344, address=0x60066244, 2, 
41fb0000
[    4.539000] do page fault:
[    4.539000] regs->sr=0x10, regs->pc=0x60007320, address=0x60065f54, 
3, 41fb0000
[    4.635000] do page fault:
[    4.635000] regs->sr=0x0, regs->pc=0x600098d2, address=0x60068014, 2, 
41fb0000
[    4.637000] do page fault:
[    4.637000] regs->sr=0x0, regs->pc=0x60016344, address=0x600a2d1c, 2, 
41fb0000
[    4.647000] do page fault:
[    4.647000] regs->sr=0x10, regs->pc=0x60007320, address=0x600a1f54, 
3, 41fb0000
[    5.163000] do page fault:
[    5.163000] regs->sr=0x0, regs->pc=0x60016344, address=0x601d1d28, 2, 
41fb0000
[    5.175000] do page fault:
[    5.175000] regs->sr=0x10, regs->pc=0x60007320, address=0x601cdf44, 
3, 41fb0000
[    5.212000] do page fault:
[    5.212000] regs->sr=0x14, regs->pc=0x6000b04a, address=0x601ce004, 
2, 41fb0000
[    5.373000] do page fault:
[    5.373000] regs->sr=0x14, regs->pc=0x6000b04a, address=0x60056004, 
2, 41fb0000
[    5.396000] random: crng init done
[    5.401000] do page fault:
[    5.401000] regs->sr=0x14, regs->pc=0x6000b04a, address=0x700ae004, 
2, 41fb0000
[    5.414000] do page fault:
[    5.414000] regs->sr=0x0, regs->pc=0x6000b42a, address=0x700b0000, 2, 
41fb0000
[    5.487000] do page fault:
[    5.487000] regs->sr=0x0, regs->pc=0x6019cbf0, address=0x601d5010, 2, 
41fb0000
[    5.488000] do page fault:
[    5.488000] regs->sr=0x0, regs->pc=0x6019cbf6, address=0x601db288, 2, 
41fb0000
[    5.552000] do page fault:
[    5.552000] regs->sr=0x0, regs->pc=0x600e1032, address=0x601d21d4, 3, 
41fb0000
[    5.569000] do page fault:
[    5.569000] regs->sr=0x0, regs->pc=0x600df938, address=0x700ba8e4, 2, 
41fb0000
[    5.580000] do page fault:
[    5.580000] regs->sr=0x8, regs->pc=0x70038476, address=0x700b4890, 2, 
41fb0000
[    5.634000] do page fault:
[    5.634000] regs->sr=0x0, regs->pc=0x6011cbf2, address=0x700be004, 2, 
41fb0000
[    5.677000] do page fault:
[    5.677000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700c71b4, 2, 
41fb0000
[    5.684000] do page fault:
[    5.684000] regs->sr=0x2004, regs->pc=0x410a724a, address=0x700c0000, 
2, 41fb0000
[    5.686000] do page fault:
[    5.686000] regs->sr=0x2004, regs->pc=0x410a724a, address=0x700c2000, 
2, 41fb0000
[    6.058000] do page fault:
[    6.058000] regs->sr=0x10, regs->pc=0x7004b63c, address=0x700b8d2a, 
3, 41fb0000
[    6.206000] do page fault:
[    6.206000] regs->sr=0x0, regs->pc=0x700365d2, address=0x6005a468, 2, 
41fb0000
[    6.231000] do page fault:
[    6.231000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700c483c, 2, 
41fb0000
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
[    6.327000] do page fault:
[    6.327000] regs->sr=0x4, regs->pc=0x70092344, address=0x700bc410, 3, 
41fb0000
[    6.349000] do page fault:
[    6.349000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700c80f4, 2, 
41fb0000
[    6.438000] do page fault:
[    6.438000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700ca114, 2, 
41fb0000
[    6.452000] do page fault:
[    6.452000] regs->sr=0x8, regs->pc=0x600f28d0, address=0xbffcfd4c, 2, 
41fb0000
[    6.481000] do page fault:
[    6.481000] regs->sr=0x0, regs->pc=0x6008b6a8, address=0xbffcd1a0, 2, 
41fb0000
[    6.493000] do page fault:
[    6.493000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700cc474, 2, 
41fb0000
[    6.516000] do page fault:
[    6.516000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700ce1bc, 2, 
41fb0000
[    6.517000] do page fault:
[    6.517000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700d000c, 2, 
41fb0000
[    6.519000] do page fault:
[    6.519000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700d246c, 2, 
41fb0000
[    6.520000] do page fault:
[    6.520000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700d42bc, 2, 
41fb0000
[    6.528000] do page fault:
[    6.528000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700d6004, 2, 
41fb0000
[    6.543000] do page fault:
[    6.543000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700d8584, 2, 
41fb0000
[    6.544000] do page fault:
[    6.544000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700da3d4, 2, 
41fb0000
[    6.546000] do page fault:
[    6.546000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700dc224, 2, 
41fb0000
[    6.547000] do page fault:
[    6.547000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700de074, 2, 
41fb0000
[    6.576000] do page fault:
[    6.576000] regs->sr=0x0, regs->pc=0x6011ca12, address=0x700e02c4, 2, 
41fb0000
bash-5.2#

My kernel is loaded at 0x41000000.
JM
Jean-Michel Hautbois June 26, 2024, 1:28 p.m. UTC | #5
Hi Michael,

On 26/06/2024 03:56, Michael Schmitz wrote:
> Jean-Michel,
> 
> On 24/06/24 20:56, Jean-Michel Hautbois wrote:
>>
>> When I printk the do_page_fault first debug, I get for the first call 
>> to ls:
>> bash-5.2# ls
>> [   14.700000] do page fault:
>> [   14.700000] regs->sr=0x0, regs->pc=0x70069ee6, address=0x70069ee6, 
>> 0, (ptrval)
> 
> Page not present, read fault. Please disable obfuscation of kernel 
> pointer addresses by printk. Maybe also disable address space 
> randomization while debugging this.
> 
>> This call works almost fine (I still have the assert failed: 
>> folio->private != NULL issue).
>>
>> And when I call it a second time, I get:
>> bash-5.2# ls
>> [   19.820000] do page fault:
>> [   19.820000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700e2004, 
>> 2, (ptrval)
> 
> Page not present, write fault.
> 
> It would be helpful if you could get a dump of /proc/1/maps before the 
> execve() syscall in your helloworld init replacement. That might confirm 
> all these addresses are legit (assuming mappings survive across 
> execve(), that is), and what they correspond to.
> 
>>
>> The address corresponds to the defined zone ELF_ET_DYN_BASE as I set 
>> it to 0x70000000.
>>
>> regs->pc is not the same as the address. It might be unrelevant, but 
>> any help is appreciated to understand the process behind :-).
>>
>> I keep digging, and I am in the asm part which fears me a bit !
> 
> I don't see that you'd need to look at any asm code here.

I add a small test in do_page_fault, and in case of an error, it panics. 
The result follows:
  ./scripts/decode_stacktrace.sh vmlinux < /tmp/trace.log
[    3.857000] Run /bin/bash as init process
[    3.858000]   with arguments:
[    3.861000]     /bin/bash
[    3.862000]   with environment:
[    3.863000]     HOME=/
[    3.864000]     TERM=linux
[    4.242000] do page fault:
[    4.242000] regs->sr=0x2000, regs->pc=0x41366924, address=0x700b3364, 
2, 41fb0000
[    4.242000] Kernel panic - not syncing: page fault error
[    4.242000] CPU: 0 PID: 1 Comm: bash Not tainted 
6.10.0-rc5-g927da6cf01fe-dirty #25
[    4.242000] Stack from 4186dda8:
[    4.242000]         4186dda8 41423aa4 41423aa4 700b3300 00000001 
00000000 4136ee10 41423aa4
[    4.242000]         41366d7a 700b3364 700b3364 00000000 0000000d 
4186de60 41fb0000 41d51a60
[    4.242000]         41005696 41416a90 41416a4d 00002000 41366924 
700b3364 00000002 41fb0000
[    4.242000]         0000000a 700b3364 00000000 0000000d 00000012 
41d51a00 4186de60 41d51a60
[    4.242000]         41fb81c0 41d51a60 410052fe 4100529a 4186de60 
700b3364 00000002 00000000
[    4.242000]         700bc414 00000003 00008000 700ac000 41003660 
4186de60 00000000 00000000
[    4.242000] Call Trace: dump_stack (lib/dump_stack.c:124)
[    4.242000] panic (kernel/panic.c:266 kernel/panic.c:368)
[    4.242000] do_page_fault (arch/m68k/mm/fault.c:88 (discriminator 1))
[    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
[    4.242000] buserr_c (arch/m68k/kernel/traps.c:725 
arch/m68k/kernel/traps.c:775)
[    4.242000] buserr_c (arch/m68k/kernel/traps.c:748 
arch/m68k/kernel/traps.c:775)
[    4.242000] buserr (arch/m68k/kernel/entry.S:116)
[    4.242000] ma_slots (lib/maple_tree.c:759)
[    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
[    4.242000] elf_load (fs/binfmt_elf.c:125 (discriminator 1) 
fs/binfmt_elf.c:421 (discriminator 1))
[    4.242000] load_elf_binary (fs/binfmt_elf.c:1132)
[    4.242000] memset (arch/m68k/lib/memset.c:11)
[    4.242000] load_misc_binary (fs/binfmt_misc.c:97 
fs/binfmt_misc.c:146 fs/binfmt_misc.c:213)
[    4.242000] memset (arch/m68k/lib/memset.c:11)
[    4.242000] bprm_execve (fs/exec.c:1797 fs/exec.c:1839 fs/exec.c:1891 
fs/exec.c:1867)
[    4.242000] copy_strings_kernel (fs/exec.c:669)
[    4.242000] count_strings_kernel (fs/exec.c:473)
[    4.242000] kernel_execve (fs/exec.c:2058)
[    4.242000] __dynamic_pr_debug (lib/dynamic_debug.c:865)
[    4.242000] run_init_process (init/main.c:1389)
[    4.242000] _printk (kernel/printk/printk.c:2365)
[    4.242000] kernel_init (init/main.c:1508)
[    4.242000] kernel_init (init/main.c:1459)
[    4.242000] ret_from_kernel_thread (arch/m68k/kernel/entry.S:142)
[    4.242000]
[    4.242000] ---[ end Kernel panic - not syncing: page fault error ]---

Looks like a memory mapping failure, but why ?
My JTAG at this point dumps a list of 0s at 0x41fb0000 and my SDRAM 
starts at 0x40000000 and ends at 0x50000000 (256MB).

It looks like a TLB write miss which is obscure to me :-).

I tried to use the /proc but as expected it is not alive after mounting it.
Thanks,
JM


> Cheers,
> 
>      Michael
> 
>>
>> Thanks !
>> JM
Michael Schmitz June 26, 2024, 7:36 p.m. UTC | #6
Jean-Michel,

On 27/06/24 01:28, Jean-Michel Hautbois wrote:
> Hi Michael,
>
> On 26/06/2024 03:56, Michael Schmitz wrote:
>> Jean-Michel,
>>
>> On 24/06/24 20:56, Jean-Michel Hautbois wrote:
>>>
>>> When I printk the do_page_fault first debug, I get for the first 
>>> call to ls:
>>> bash-5.2# ls
>>> [   14.700000] do page fault:
>>> [   14.700000] regs->sr=0x0, regs->pc=0x70069ee6, 
>>> address=0x70069ee6, 0, (ptrval)
>>
>> Page not present, read fault. Please disable obfuscation of kernel 
>> pointer addresses by printk. Maybe also disable address space 
>> randomization while debugging this.
>>
>>> This call works almost fine (I still have the assert failed: 
>>> folio->private != NULL issue).
>>>
>>> And when I call it a second time, I get:
>>> bash-5.2# ls
>>> [   19.820000] do page fault:
>>> [   19.820000] regs->sr=0x0, regs->pc=0x6011d65a, 
>>> address=0x700e2004, 2, (ptrval)
>>
>> Page not present, write fault.
>>
>> It would be helpful if you could get a dump of /proc/1/maps before 
>> the execve() syscall in your helloworld init replacement. That might 
>> confirm all these addresses are legit (assuming mappings survive 
>> across execve(), that is), and what they correspond to.
>>
>>>
>>> The address corresponds to the defined zone ELF_ET_DYN_BASE as I set 
>>> it to 0x70000000.
>>>
>>> regs->pc is not the same as the address. It might be unrelevant, but 
>>> any help is appreciated to understand the process behind :-).
>>>
>>> I keep digging, and I am in the asm part which fears me a bit !
>>
>> I don't see that you'd need to look at any asm code here.
>
> I add a small test in do_page_fault, and in case of an error, it 
> panics. The result follows:

Please take a look at the comments at the start of 
arch/m68k/mm/fault.c:do_page_fault(). The meaning of the bits in 
error_code are explained there.

error_code != 0 is just one possible case out of the four that are 
handled by do_page_fault(). It does not signify 'no error' - if there 
hadn't been a page fault, do_page_fault() would not have been called.

You just forced a panic each time a write fault and/or a protection 
fault happens. Write faults are absolutely expected to happen when 
loading a library - ld.so needs to perform relocation after loading a 
dynamic library, and that means writes to the GOT in the library's data 
segment (PIC assumed).


>  ./scripts/decode_stacktrace.sh vmlinux < /tmp/trace.log
> [    3.857000] Run /bin/bash as init process
> [    3.858000]   with arguments:
> [    3.861000]     /bin/bash
> [    3.862000]   with environment:
> [    3.863000]     HOME=/
> [    3.864000]     TERM=linux
> [    4.242000] do page fault:
> [    4.242000] regs->sr=0x2000, regs->pc=0x41366924, 
> address=0x700b3364, 2, 41fb0000
> [    4.242000] Kernel panic - not syncing: page fault error
> [    4.242000] CPU: 0 PID: 1 Comm: bash Not tainted 
> 6.10.0-rc5-g927da6cf01fe-dirty #25
> [    4.242000] Stack from 4186dda8:
> [    4.242000]         4186dda8 41423aa4 41423aa4 700b3300 00000001 
> 00000000 4136ee10 41423aa4
> [    4.242000]         41366d7a 700b3364 700b3364 00000000 0000000d 
> 4186de60 41fb0000 41d51a60
> [    4.242000]         41005696 41416a90 41416a4d 00002000 41366924 
> 700b3364 00000002 41fb0000
> [    4.242000]         0000000a 700b3364 00000000 0000000d 00000012 
> 41d51a00 4186de60 41d51a60
> [    4.242000]         41fb81c0 41d51a60 410052fe 4100529a 4186de60 
> 700b3364 00000002 00000000
> [    4.242000]         700bc414 00000003 00008000 700ac000 41003660 
> 4186de60 00000000 00000000
> [    4.242000] Call Trace: dump_stack (lib/dump_stack.c:124)
> [    4.242000] panic (kernel/panic.c:266 kernel/panic.c:368)
> [    4.242000] do_page_fault (arch/m68k/mm/fault.c:88 (discriminator 1))
> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:725 
> arch/m68k/kernel/traps.c:775)
> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:748 
> arch/m68k/kernel/traps.c:775)
> [    4.242000] buserr (arch/m68k/kernel/entry.S:116)
> [    4.242000] ma_slots (lib/maple_tree.c:759)
> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
> [    4.242000] elf_load (fs/binfmt_elf.c:125 (discriminator 1) 
> fs/binfmt_elf.c:421 (discriminator 1))
> [    4.242000] load_elf_binary (fs/binfmt_elf.c:1132)
> [    4.242000] memset (arch/m68k/lib/memset.c:11)
> [    4.242000] load_misc_binary (fs/binfmt_misc.c:97 
> fs/binfmt_misc.c:146 fs/binfmt_misc.c:213)
> [    4.242000] memset (arch/m68k/lib/memset.c:11)
> [    4.242000] bprm_execve (fs/exec.c:1797 fs/exec.c:1839 
> fs/exec.c:1891 fs/exec.c:1867)
> [    4.242000] copy_strings_kernel (fs/exec.c:669)
> [    4.242000] count_strings_kernel (fs/exec.c:473)
> [    4.242000] kernel_execve (fs/exec.c:2058)
> [    4.242000] __dynamic_pr_debug (lib/dynamic_debug.c:865)
> [    4.242000] run_init_process (init/main.c:1389)
> [    4.242000] _printk (kernel/printk/printk.c:2365)
> [    4.242000] kernel_init (init/main.c:1508)
> [    4.242000] kernel_init (init/main.c:1459)
> [    4.242000] ret_from_kernel_thread (arch/m68k/kernel/entry.S:142)
> [    4.242000]
> [    4.242000] ---[ end Kernel panic - not syncing: page fault error ]---
>
> Looks like a memory mapping failure, but why ?
> My JTAG at this point dumps a list of 0s at 0x41fb0000 and my SDRAM 
> starts at 0x40000000 and ends at 0x50000000 (256MB).
0x41fb0000 seems to be init's page directory. The fault address is in 
the range where I'd expect dynamic libraries to reside.
>
> It looks like a TLB write miss which is obscure to me :-).
>
> I tried to use the /proc but as expected it is not alive after 
> mounting it.

The memory map ought to be accessible through sysrq - an alternative 
would be to modify the ELF binfmt handler and dump the map once ld.so 
has finished with relocations.

Cheers,

     Michael


> Thanks,
> JM
>
>
>> Cheers,
>>
>>      Michael
>>
>>>
>>> Thanks !
>>> JM
Jean-Michel Hautbois June 26, 2024, 8:29 p.m. UTC | #7
Hi Michael,

On 26/06/2024 21:36, Michael Schmitz wrote:
> Jean-Michel,
> 
> On 27/06/24 01:28, Jean-Michel Hautbois wrote:
>> Hi Michael,
>>
>> On 26/06/2024 03:56, Michael Schmitz wrote:
>>> Jean-Michel,
>>>
>>> On 24/06/24 20:56, Jean-Michel Hautbois wrote:
>>>>
>>>> When I printk the do_page_fault first debug, I get for the first 
>>>> call to ls:
>>>> bash-5.2# ls
>>>> [   14.700000] do page fault:
>>>> [   14.700000] regs->sr=0x0, regs->pc=0x70069ee6, 
>>>> address=0x70069ee6, 0, (ptrval)
>>>
>>> Page not present, read fault. Please disable obfuscation of kernel 
>>> pointer addresses by printk. Maybe also disable address space 
>>> randomization while debugging this.
>>>
>>>> This call works almost fine (I still have the assert failed: 
>>>> folio->private != NULL issue).
>>>>
>>>> And when I call it a second time, I get:
>>>> bash-5.2# ls
>>>> [   19.820000] do page fault:
>>>> [   19.820000] regs->sr=0x0, regs->pc=0x6011d65a, 
>>>> address=0x700e2004, 2, (ptrval)
>>>
>>> Page not present, write fault.
>>>
>>> It would be helpful if you could get a dump of /proc/1/maps before 
>>> the execve() syscall in your helloworld init replacement. That might 
>>> confirm all these addresses are legit (assuming mappings survive 
>>> across execve(), that is), and what they correspond to.
>>>
>>>>
>>>> The address corresponds to the defined zone ELF_ET_DYN_BASE as I set 
>>>> it to 0x70000000.
>>>>
>>>> regs->pc is not the same as the address. It might be unrelevant, but 
>>>> any help is appreciated to understand the process behind :-).
>>>>
>>>> I keep digging, and I am in the asm part which fears me a bit !
>>>
>>> I don't see that you'd need to look at any asm code here.
>>
>> I add a small test in do_page_fault, and in case of an error, it 
>> panics. The result follows:
> 
> Please take a look at the comments at the start of 
> arch/m68k/mm/fault.c:do_page_fault(). The meaning of the bits in 
> error_code are explained there.
> 
> error_code != 0 is just one possible case out of the four that are 
> handled by do_page_fault(). It does not signify 'no error' - if there 
> hadn't been a page fault, do_page_fault() would not have been called.
> 
> You just forced a panic each time a write fault and/or a protection 
> fault happens. Write faults are absolutely expected to happen when 
> loading a library - ld.so needs to perform relocation after loading a 
> dynamic library, and that means writes to the GOT in the library's data 
> segment (PIC assumed).

I feel dumb -_- thanks for the explanation (I *knew* that page faults 
are expected, but as I couldn't not see when it works or not, I kept 
focus on this silly part).

> 
> 
>>  ./scripts/decode_stacktrace.sh vmlinux < /tmp/trace.log
>> [    3.857000] Run /bin/bash as init process
>> [    3.858000]   with arguments:
>> [    3.861000]     /bin/bash
>> [    3.862000]   with environment:
>> [    3.863000]     HOME=/
>> [    3.864000]     TERM=linux
>> [    4.242000] do page fault:
>> [    4.242000] regs->sr=0x2000, regs->pc=0x41366924, 
>> address=0x700b3364, 2, 41fb0000
>> [    4.242000] Kernel panic - not syncing: page fault error
>> [    4.242000] CPU: 0 PID: 1 Comm: bash Not tainted 
>> 6.10.0-rc5-g927da6cf01fe-dirty #25
>> [    4.242000] Stack from 4186dda8:
>> [    4.242000]         4186dda8 41423aa4 41423aa4 700b3300 00000001 
>> 00000000 4136ee10 41423aa4
>> [    4.242000]         41366d7a 700b3364 700b3364 00000000 0000000d 
>> 4186de60 41fb0000 41d51a60
>> [    4.242000]         41005696 41416a90 41416a4d 00002000 41366924 
>> 700b3364 00000002 41fb0000
>> [    4.242000]         0000000a 700b3364 00000000 0000000d 00000012 
>> 41d51a00 4186de60 41d51a60
>> [    4.242000]         41fb81c0 41d51a60 410052fe 4100529a 4186de60 
>> 700b3364 00000002 00000000
>> [    4.242000]         700bc414 00000003 00008000 700ac000 41003660 
>> 4186de60 00000000 00000000
>> [    4.242000] Call Trace: dump_stack (lib/dump_stack.c:124)
>> [    4.242000] panic (kernel/panic.c:266 kernel/panic.c:368)
>> [    4.242000] do_page_fault (arch/m68k/mm/fault.c:88 (discriminator 1))
>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:725 
>> arch/m68k/kernel/traps.c:775)
>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:748 
>> arch/m68k/kernel/traps.c:775)
>> [    4.242000] buserr (arch/m68k/kernel/entry.S:116)
>> [    4.242000] ma_slots (lib/maple_tree.c:759)
>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>> [    4.242000] elf_load (fs/binfmt_elf.c:125 (discriminator 1) 
>> fs/binfmt_elf.c:421 (discriminator 1))
>> [    4.242000] load_elf_binary (fs/binfmt_elf.c:1132)
>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>> [    4.242000] load_misc_binary (fs/binfmt_misc.c:97 
>> fs/binfmt_misc.c:146 fs/binfmt_misc.c:213)
>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>> [    4.242000] bprm_execve (fs/exec.c:1797 fs/exec.c:1839 
>> fs/exec.c:1891 fs/exec.c:1867)
>> [    4.242000] copy_strings_kernel (fs/exec.c:669)
>> [    4.242000] count_strings_kernel (fs/exec.c:473)
>> [    4.242000] kernel_execve (fs/exec.c:2058)
>> [    4.242000] __dynamic_pr_debug (lib/dynamic_debug.c:865)
>> [    4.242000] run_init_process (init/main.c:1389)
>> [    4.242000] _printk (kernel/printk/printk.c:2365)
>> [    4.242000] kernel_init (init/main.c:1508)
>> [    4.242000] kernel_init (init/main.c:1459)
>> [    4.242000] ret_from_kernel_thread (arch/m68k/kernel/entry.S:142)
>> [    4.242000]
>> [    4.242000] ---[ end Kernel panic - not syncing: page fault error ]---
>>
>> Looks like a memory mapping failure, but why ?
>> My JTAG at this point dumps a list of 0s at 0x41fb0000 and my SDRAM 
>> starts at 0x40000000 and ends at 0x50000000 (256MB).
> 0x41fb0000 seems to be init's page directory. The fault address is in 
> the range where I'd expect dynamic libraries to reside.

How can I check it is init's page directory ?

>>
>> It looks like a TLB write miss which is obscure to me :-).
>>
>> I tried to use the /proc but as expected it is not alive after 
>> mounting it.
> 
> The memory map ought to be accessible through sysrq - an alternative 
> would be to modify the ELF binfmt handler and dump the map once ld.so 
> has finished with relocations.

I can use sysrq but AFAIK I can show the memory info, not the memory map ?
Modifying the ELF binfmt handler might be doable...

JM

> 
> Cheers,
> 
>      Michael
> 
> 
>> Thanks,
>> JM
>>
>>
>>> Cheers,
>>>
>>>      Michael
>>>
>>>>
>>>> Thanks !
>>>> JM
Jean-Michel Hautbois June 27, 2024, 12:36 p.m. UTC | #8
Michael,

On 26/06/2024 21:36, Michael Schmitz wrote:
> Jean-Michel,
> 
> On 27/06/24 01:28, Jean-Michel Hautbois wrote:
>> Hi Michael,
>>
>> On 26/06/2024 03:56, Michael Schmitz wrote:
>>> Jean-Michel,
>>>
>>> On 24/06/24 20:56, Jean-Michel Hautbois wrote:
>>>>
>>>> When I printk the do_page_fault first debug, I get for the first 
>>>> call to ls:
>>>> bash-5.2# ls
>>>> [   14.700000] do page fault:
>>>> [   14.700000] regs->sr=0x0, regs->pc=0x70069ee6, 
>>>> address=0x70069ee6, 0, (ptrval)
>>>
>>> Page not present, read fault. Please disable obfuscation of kernel 
>>> pointer addresses by printk. Maybe also disable address space 
>>> randomization while debugging this.
>>>
>>>> This call works almost fine (I still have the assert failed: 
>>>> folio->private != NULL issue).
>>>>
>>>> And when I call it a second time, I get:
>>>> bash-5.2# ls
>>>> [   19.820000] do page fault:
>>>> [   19.820000] regs->sr=0x0, regs->pc=0x6011d65a, 
>>>> address=0x700e2004, 2, (ptrval)
>>>
>>> Page not present, write fault.
>>>
>>> It would be helpful if you could get a dump of /proc/1/maps before 
>>> the execve() syscall in your helloworld init replacement. That might 
>>> confirm all these addresses are legit (assuming mappings survive 
>>> across execve(), that is), and what they correspond to.
>>>
>>>>
>>>> The address corresponds to the defined zone ELF_ET_DYN_BASE as I set 
>>>> it to 0x70000000.
>>>>
>>>> regs->pc is not the same as the address. It might be unrelevant, but 
>>>> any help is appreciated to understand the process behind :-).
>>>>
>>>> I keep digging, and I am in the asm part which fears me a bit !
>>>
>>> I don't see that you'd need to look at any asm code here.
>>
>> I add a small test in do_page_fault, and in case of an error, it 
>> panics. The result follows:
> 
> Please take a look at the comments at the start of 
> arch/m68k/mm/fault.c:do_page_fault(). The meaning of the bits in 
> error_code are explained there.
> 
> error_code != 0 is just one possible case out of the four that are 
> handled by do_page_fault(). It does not signify 'no error' - if there 
> hadn't been a page fault, do_page_fault() would not have been called.
> 
> You just forced a panic each time a write fault and/or a protection 
> fault happens. Write faults are absolutely expected to happen when 
> loading a library - ld.so needs to perform relocation after loading a 
> dynamic library, and that means writes to the GOT in the library's data 
> segment (PIC assumed).
> 
> 
>>  ./scripts/decode_stacktrace.sh vmlinux < /tmp/trace.log
>> [    3.857000] Run /bin/bash as init process
>> [    3.858000]   with arguments:
>> [    3.861000]     /bin/bash
>> [    3.862000]   with environment:
>> [    3.863000]     HOME=/
>> [    3.864000]     TERM=linux
>> [    4.242000] do page fault:
>> [    4.242000] regs->sr=0x2000, regs->pc=0x41366924, 
>> address=0x700b3364, 2, 41fb0000
>> [    4.242000] Kernel panic - not syncing: page fault error
>> [    4.242000] CPU: 0 PID: 1 Comm: bash Not tainted 
>> 6.10.0-rc5-g927da6cf01fe-dirty #25
>> [    4.242000] Stack from 4186dda8:
>> [    4.242000]         4186dda8 41423aa4 41423aa4 700b3300 00000001 
>> 00000000 4136ee10 41423aa4
>> [    4.242000]         41366d7a 700b3364 700b3364 00000000 0000000d 
>> 4186de60 41fb0000 41d51a60
>> [    4.242000]         41005696 41416a90 41416a4d 00002000 41366924 
>> 700b3364 00000002 41fb0000
>> [    4.242000]         0000000a 700b3364 00000000 0000000d 00000012 
>> 41d51a00 4186de60 41d51a60
>> [    4.242000]         41fb81c0 41d51a60 410052fe 4100529a 4186de60 
>> 700b3364 00000002 00000000
>> [    4.242000]         700bc414 00000003 00008000 700ac000 41003660 
>> 4186de60 00000000 00000000
>> [    4.242000] Call Trace: dump_stack (lib/dump_stack.c:124)
>> [    4.242000] panic (kernel/panic.c:266 kernel/panic.c:368)
>> [    4.242000] do_page_fault (arch/m68k/mm/fault.c:88 (discriminator 1))
>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:725 
>> arch/m68k/kernel/traps.c:775)
>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:748 
>> arch/m68k/kernel/traps.c:775)
>> [    4.242000] buserr (arch/m68k/kernel/entry.S:116)
>> [    4.242000] ma_slots (lib/maple_tree.c:759)
>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>> [    4.242000] elf_load (fs/binfmt_elf.c:125 (discriminator 1) 
>> fs/binfmt_elf.c:421 (discriminator 1))
>> [    4.242000] load_elf_binary (fs/binfmt_elf.c:1132)
>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>> [    4.242000] load_misc_binary (fs/binfmt_misc.c:97 
>> fs/binfmt_misc.c:146 fs/binfmt_misc.c:213)
>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>> [    4.242000] bprm_execve (fs/exec.c:1797 fs/exec.c:1839 
>> fs/exec.c:1891 fs/exec.c:1867)
>> [    4.242000] copy_strings_kernel (fs/exec.c:669)
>> [    4.242000] count_strings_kernel (fs/exec.c:473)
>> [    4.242000] kernel_execve (fs/exec.c:2058)
>> [    4.242000] __dynamic_pr_debug (lib/dynamic_debug.c:865)
>> [    4.242000] run_init_process (init/main.c:1389)
>> [    4.242000] _printk (kernel/printk/printk.c:2365)
>> [    4.242000] kernel_init (init/main.c:1508)
>> [    4.242000] kernel_init (init/main.c:1459)
>> [    4.242000] ret_from_kernel_thread (arch/m68k/kernel/entry.S:142)
>> [    4.242000]
>> [    4.242000] ---[ end Kernel panic - not syncing: page fault error ]---
>>
>> Looks like a memory mapping failure, but why ?
>> My JTAG at this point dumps a list of 0s at 0x41fb0000 and my SDRAM 
>> starts at 0x40000000 and ends at 0x50000000 (256MB).
> 0x41fb0000 seems to be init's page directory. The fault address is in 
> the range where I'd expect dynamic libraries to reside.
>>
>> It looks like a TLB write miss which is obscure to me :-).
>>
>> I tried to use the /proc but as expected it is not alive after 
>> mounting it.
> 
> The memory map ought to be accessible through sysrq - an alternative 
> would be to modify the ELF binfmt handler and dump the map once ld.so 
> has finished with relocations.

I added a dump in the binfmt_elf file:
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index a43897b03ce9..395f556f3a90 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -816,6 +816,63 @@ static int parse_elf_properties(struct file *f, 
const struct elf_phdr *phdr,
         return ret == -ENOENT ? 0 : ret;
  }

+static int dump_memory_map(struct task_struct *task)
+{
+    struct mm_struct *mm = task->mm;
+    struct vm_area_struct *vma;
+       MA_STATE(mas, &mm->mm_mt, 0, -1);
+    struct file *file;
+    struct path *path;
+    char *buf;
+    char *pathname;
+
+    // Acquire the read lock for mmap_lock
+    down_read(&mm->mmap_lock);
+       mas_lock(&mas);
+    for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas, 
ULONG_MAX)) {
+        if (vma->vm_file) {
+            buf = (char *)__get_free_page(GFP_KERNEL);
+            if (!buf) {
+                continue; // Handle memory allocation failure
+            }
+
+            file = vma->vm_file;
+            path = &file->f_path;
+            pathname = d_path(path, buf, PAGE_SIZE);
+            if (IS_ERR(pathname)) {
+                pathname = NULL;
+            }
+
+            pr_info("%lx-%lx %c%c%c%c %08lx %02x:%02x %lu %s\n",
+                vma->vm_start, vma->vm_end,
+                vma->vm_flags & VM_READ ? 'r' : '-',
+                vma->vm_flags & VM_WRITE ? 'w' : '-',
+                vma->vm_flags & VM_EXEC ? 'x' : '-',
+                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
+                vma->vm_pgoff << PAGE_SHIFT,
+                MAJOR(file->f_inode->i_rdev),
+                MINOR(file->f_inode->i_rdev),
+                file->f_inode->i_ino,
+                pathname ? pathname : "");
+
+            free_page((unsigned long)buf);
+        } else {
+            pr_info("%lx-%lx %c%c%c%c %08lx 00:00 0\n",
+                vma->vm_start, vma->vm_end,
+                vma->vm_flags & VM_READ ? 'r' : '-',
+                vma->vm_flags & VM_WRITE ? 'w' : '-',
+                vma->vm_flags & VM_EXEC ? 'x' : '-',
+                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
+                vma->vm_pgoff << PAGE_SHIFT);
+        }
+    }
+       mas_unlock(&mas);
+    // Release the read lock for mmap_lock
+    up_read(&mm->mmap_lock);
+
+    return 0;
+}
+
  static int load_elf_binary(struct linux_binprm *bprm)
  {
         struct file *interpreter = NULL; /* to shut gcc up */
@@ -1299,6 +1356,9 @@ static int load_elf_binary(struct linux_binprm *bprm)

         finalize_exec(bprm);
         START_THREAD(elf_ex, regs, elf_entry, bprm->p);
+       if (current->pid == 1) {  // Check if this is the init process
+            dump_memory_map(current);
+    }
         retval = 0;
  out:
         return retval;

I think it is quick and dirty, but seems to do the trick.
I then get in my console:
[    4.265000] 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
[    4.266000] 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
[    4.267000] 70000000-700ac000 r-xp 00000000 00:00 27 /bin/bash
[    4.268000] 700ac000-700b4000 rw-p 000ac000 00:00 27 /bin/bash
[    4.269000] 700b4000-700be000 rwxp 700b4000 00:00 0
[    4.270000] bfe7a000-bfe9c000 rw-p bffde000 00:00 0

But nothing rings a bell at this level for me...
Thanks !
Greg Ungerer June 27, 2024, 2:46 p.m. UTC | #9
Hi JM,

On 27/6/24 22:36, Jean-Michel Hautbois wrote:
> Michael,
> 
> On 26/06/2024 21:36, Michael Schmitz wrote:
>> Jean-Michel,
>>
>> On 27/06/24 01:28, Jean-Michel Hautbois wrote:
>>> Hi Michael,
>>>
>>> On 26/06/2024 03:56, Michael Schmitz wrote:
>>>> Jean-Michel,
>>>>
>>>> On 24/06/24 20:56, Jean-Michel Hautbois wrote:
>>>>>
>>>>> When I printk the do_page_fault first debug, I get for the first call to ls:
>>>>> bash-5.2# ls
>>>>> [   14.700000] do page fault:
>>>>> [   14.700000] regs->sr=0x0, regs->pc=0x70069ee6, address=0x70069ee6, 0, (ptrval)
>>>>
>>>> Page not present, read fault. Please disable obfuscation of kernel pointer addresses by printk. Maybe also disable address space randomization while debugging this.
>>>>
>>>>> This call works almost fine (I still have the assert failed: folio->private != NULL issue).
>>>>>
>>>>> And when I call it a second time, I get:
>>>>> bash-5.2# ls
>>>>> [   19.820000] do page fault:
>>>>> [   19.820000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700e2004, 2, (ptrval)
>>>>
>>>> Page not present, write fault.
>>>>
>>>> It would be helpful if you could get a dump of /proc/1/maps before the execve() syscall in your helloworld init replacement. That might confirm all these addresses are legit (assuming mappings survive across execve(), that is), and what they correspond to.
>>>>
>>>>>
>>>>> The address corresponds to the defined zone ELF_ET_DYN_BASE as I set it to 0x70000000.
>>>>>
>>>>> regs->pc is not the same as the address. It might be unrelevant, but any help is appreciated to understand the process behind :-).
>>>>>
>>>>> I keep digging, and I am in the asm part which fears me a bit !
>>>>
>>>> I don't see that you'd need to look at any asm code here.
>>>
>>> I add a small test in do_page_fault, and in case of an error, it panics. The result follows:
>>
>> Please take a look at the comments at the start of arch/m68k/mm/fault.c:do_page_fault(). The meaning of the bits in error_code are explained there.
>>
>> error_code != 0 is just one possible case out of the four that are handled by do_page_fault(). It does not signify 'no error' - if there hadn't been a page fault, do_page_fault() would not have been called.
>>
>> You just forced a panic each time a write fault and/or a protection fault happens. Write faults are absolutely expected to happen when loading a library - ld.so needs to perform relocation after loading a dynamic library, and that means writes to the GOT in the library's data segment (PIC assumed).
>>
>>
>>>  ./scripts/decode_stacktrace.sh vmlinux < /tmp/trace.log
>>> [    3.857000] Run /bin/bash as init process
>>> [    3.858000]   with arguments:
>>> [    3.861000]     /bin/bash
>>> [    3.862000]   with environment:
>>> [    3.863000]     HOME=/
>>> [    3.864000]     TERM=linux
>>> [    4.242000] do page fault:
>>> [    4.242000] regs->sr=0x2000, regs->pc=0x41366924, address=0x700b3364, 2, 41fb0000
>>> [    4.242000] Kernel panic - not syncing: page fault error
>>> [    4.242000] CPU: 0 PID: 1 Comm: bash Not tainted 6.10.0-rc5-g927da6cf01fe-dirty #25
>>> [    4.242000] Stack from 4186dda8:
>>> [    4.242000]         4186dda8 41423aa4 41423aa4 700b3300 00000001 00000000 4136ee10 41423aa4
>>> [    4.242000]         41366d7a 700b3364 700b3364 00000000 0000000d 4186de60 41fb0000 41d51a60
>>> [    4.242000]         41005696 41416a90 41416a4d 00002000 41366924 700b3364 00000002 41fb0000
>>> [    4.242000]         0000000a 700b3364 00000000 0000000d 00000012 41d51a00 4186de60 41d51a60
>>> [    4.242000]         41fb81c0 41d51a60 410052fe 4100529a 4186de60 700b3364 00000002 00000000
>>> [    4.242000]         700bc414 00000003 00008000 700ac000 41003660 4186de60 00000000 00000000
>>> [    4.242000] Call Trace: dump_stack (lib/dump_stack.c:124)
>>> [    4.242000] panic (kernel/panic.c:266 kernel/panic.c:368)
>>> [    4.242000] do_page_fault (arch/m68k/mm/fault.c:88 (discriminator 1))
>>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:725 arch/m68k/kernel/traps.c:775)
>>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:748 arch/m68k/kernel/traps.c:775)
>>> [    4.242000] buserr (arch/m68k/kernel/entry.S:116)
>>> [    4.242000] ma_slots (lib/maple_tree.c:759)
>>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>>> [    4.242000] elf_load (fs/binfmt_elf.c:125 (discriminator 1) fs/binfmt_elf.c:421 (discriminator 1))
>>> [    4.242000] load_elf_binary (fs/binfmt_elf.c:1132)
>>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>>> [    4.242000] load_misc_binary (fs/binfmt_misc.c:97 fs/binfmt_misc.c:146 fs/binfmt_misc.c:213)
>>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>>> [    4.242000] bprm_execve (fs/exec.c:1797 fs/exec.c:1839 fs/exec.c:1891 fs/exec.c:1867)
>>> [    4.242000] copy_strings_kernel (fs/exec.c:669)
>>> [    4.242000] count_strings_kernel (fs/exec.c:473)
>>> [    4.242000] kernel_execve (fs/exec.c:2058)
>>> [    4.242000] __dynamic_pr_debug (lib/dynamic_debug.c:865)
>>> [    4.242000] run_init_process (init/main.c:1389)
>>> [    4.242000] _printk (kernel/printk/printk.c:2365)
>>> [    4.242000] kernel_init (init/main.c:1508)
>>> [    4.242000] kernel_init (init/main.c:1459)
>>> [    4.242000] ret_from_kernel_thread (arch/m68k/kernel/entry.S:142)
>>> [    4.242000]
>>> [    4.242000] ---[ end Kernel panic - not syncing: page fault error ]---
>>>
>>> Looks like a memory mapping failure, but why ?
>>> My JTAG at this point dumps a list of 0s at 0x41fb0000 and my SDRAM starts at 0x40000000 and ends at 0x50000000 (256MB).
>> 0x41fb0000 seems to be init's page directory. The fault address is in the range where I'd expect dynamic libraries to reside.
>>>
>>> It looks like a TLB write miss which is obscure to me :-).
>>>
>>> I tried to use the /proc but as expected it is not alive after mounting it.
>>
>> The memory map ought to be accessible through sysrq - an alternative would be to modify the ELF binfmt handler and dump the map once ld.so has finished with relocations.
> 
> I added a dump in the binfmt_elf file:
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index a43897b03ce9..395f556f3a90 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -816,6 +816,63 @@ static int parse_elf_properties(struct file *f, const struct elf_phdr *phdr,
>          return ret == -ENOENT ? 0 : ret;
>   }
> 
> +static int dump_memory_map(struct task_struct *task)
> +{
> +    struct mm_struct *mm = task->mm;
> +    struct vm_area_struct *vma;
> +       MA_STATE(mas, &mm->mm_mt, 0, -1);
> +    struct file *file;
> +    struct path *path;
> +    char *buf;
> +    char *pathname;
> +
> +    // Acquire the read lock for mmap_lock
> +    down_read(&mm->mmap_lock);
> +       mas_lock(&mas);
> +    for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas, ULONG_MAX)) {
> +        if (vma->vm_file) {
> +            buf = (char *)__get_free_page(GFP_KERNEL);
> +            if (!buf) {
> +                continue; // Handle memory allocation failure
> +            }
> +
> +            file = vma->vm_file;
> +            path = &file->f_path;
> +            pathname = d_path(path, buf, PAGE_SIZE);
> +            if (IS_ERR(pathname)) {
> +                pathname = NULL;
> +            }
> +
> +            pr_info("%lx-%lx %c%c%c%c %08lx %02x:%02x %lu %s\n",
> +                vma->vm_start, vma->vm_end,
> +                vma->vm_flags & VM_READ ? 'r' : '-',
> +                vma->vm_flags & VM_WRITE ? 'w' : '-',
> +                vma->vm_flags & VM_EXEC ? 'x' : '-',
> +                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
> +                vma->vm_pgoff << PAGE_SHIFT,
> +                MAJOR(file->f_inode->i_rdev),
> +                MINOR(file->f_inode->i_rdev),
> +                file->f_inode->i_ino,
> +                pathname ? pathname : "");
> +
> +            free_page((unsigned long)buf);
> +        } else {
> +            pr_info("%lx-%lx %c%c%c%c %08lx 00:00 0\n",
> +                vma->vm_start, vma->vm_end,
> +                vma->vm_flags & VM_READ ? 'r' : '-',
> +                vma->vm_flags & VM_WRITE ? 'w' : '-',
> +                vma->vm_flags & VM_EXEC ? 'x' : '-',
> +                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
> +                vma->vm_pgoff << PAGE_SHIFT);
> +        }
> +    }
> +       mas_unlock(&mas);
> +    // Release the read lock for mmap_lock
> +    up_read(&mm->mmap_lock);
> +
> +    return 0;
> +}
> +
>   static int load_elf_binary(struct linux_binprm *bprm)
>   {
>          struct file *interpreter = NULL; /* to shut gcc up */
> @@ -1299,6 +1356,9 @@ static int load_elf_binary(struct linux_binprm *bprm)
> 
>          finalize_exec(bprm);
>          START_THREAD(elf_ex, regs, elf_entry, bprm->p);
> +       if (current->pid == 1) {  // Check if this is the init process
> +            dump_memory_map(current);
> +    }
>          retval = 0;
>   out:
>          return retval;
> 
> I think it is quick and dirty, but seems to do the trick.
> I then get in my console:
> [    4.265000] 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
> [    4.266000] 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
> [    4.267000] 70000000-700ac000 r-xp 00000000 00:00 27 /bin/bash
> [    4.268000] 700ac000-700b4000 rw-p 000ac000 00:00 27 /bin/bash
> [    4.269000] 700b4000-700be000 rwxp 700b4000 00:00 0
> [    4.270000] bfe7a000-bfe9c000 rw-p bffde000 00:00 0
> 
> But nothing rings a bell at this level for me...
> Thanks !

Here is the same dump trace generated on my newly resurrected M5475EVB for comparison:

[snip]
Freeing unused kernel image (initmem) memory: 80K
This architecture does not have kernel memory protection.
Run /sbin/init as init process
Run /etc/init as init process
Run /bin/init as init process
process '/bin/init' started with executable stack
60000000-60008000 r-xp 00000000 00:00 550544 /lib/ld-uClibc-0.9.33.2.so
60008000-6000c000 rw-p 00006000 00:00 550544 /lib/ld-uClibc-0.9.33.2.so
80000000-80004000 r-xp 00000000 00:00 1882624 /bin/init
80004000-80008000 rw-p 00002000 00:00 1882624 /bin/init
bfc9a000-bfcbc000 rwxp bffde000 00:00 0
Welcome to
...

Execution otherwise continues as normal to a shell after this.

Regards
Greg
Jean-Michel Hautbois June 27, 2024, 2:52 p.m. UTC | #10
Hi Greg,

On 27/06/2024 16:46, Greg Ungerer wrote:
> Hi JM,
> 
> On 27/6/24 22:36, Jean-Michel Hautbois wrote:
>> Michael,
>>
>> On 26/06/2024 21:36, Michael Schmitz wrote:
>>> Jean-Michel,
>>>
>>> On 27/06/24 01:28, Jean-Michel Hautbois wrote:
>>>> Hi Michael,
>>>>
>>>> On 26/06/2024 03:56, Michael Schmitz wrote:
>>>>> Jean-Michel,
>>>>>
>>>>> On 24/06/24 20:56, Jean-Michel Hautbois wrote:
>>>>>>
>>>>>> When I printk the do_page_fault first debug, I get for the first 
>>>>>> call to ls:
>>>>>> bash-5.2# ls
>>>>>> [   14.700000] do page fault:
>>>>>> [   14.700000] regs->sr=0x0, regs->pc=0x70069ee6, 
>>>>>> address=0x70069ee6, 0, (ptrval)
>>>>>
>>>>> Page not present, read fault. Please disable obfuscation of kernel 
>>>>> pointer addresses by printk. Maybe also disable address space 
>>>>> randomization while debugging this.
>>>>>
>>>>>> This call works almost fine (I still have the assert failed: 
>>>>>> folio->private != NULL issue).
>>>>>>
>>>>>> And when I call it a second time, I get:
>>>>>> bash-5.2# ls
>>>>>> [   19.820000] do page fault:
>>>>>> [   19.820000] regs->sr=0x0, regs->pc=0x6011d65a, 
>>>>>> address=0x700e2004, 2, (ptrval)
>>>>>
>>>>> Page not present, write fault.
>>>>>
>>>>> It would be helpful if you could get a dump of /proc/1/maps before 
>>>>> the execve() syscall in your helloworld init replacement. That 
>>>>> might confirm all these addresses are legit (assuming mappings 
>>>>> survive across execve(), that is), and what they correspond to.
>>>>>
>>>>>>
>>>>>> The address corresponds to the defined zone ELF_ET_DYN_BASE as I 
>>>>>> set it to 0x70000000.
>>>>>>
>>>>>> regs->pc is not the same as the address. It might be unrelevant, 
>>>>>> but any help is appreciated to understand the process behind :-).
>>>>>>
>>>>>> I keep digging, and I am in the asm part which fears me a bit !
>>>>>
>>>>> I don't see that you'd need to look at any asm code here.
>>>>
>>>> I add a small test in do_page_fault, and in case of an error, it 
>>>> panics. The result follows:
>>>
>>> Please take a look at the comments at the start of 
>>> arch/m68k/mm/fault.c:do_page_fault(). The meaning of the bits in 
>>> error_code are explained there.
>>>
>>> error_code != 0 is just one possible case out of the four that are 
>>> handled by do_page_fault(). It does not signify 'no error' - if there 
>>> hadn't been a page fault, do_page_fault() would not have been called.
>>>
>>> You just forced a panic each time a write fault and/or a protection 
>>> fault happens. Write faults are absolutely expected to happen when 
>>> loading a library - ld.so needs to perform relocation after loading a 
>>> dynamic library, and that means writes to the GOT in the library's 
>>> data segment (PIC assumed).
>>>
>>>
>>>>  ./scripts/decode_stacktrace.sh vmlinux < /tmp/trace.log
>>>> [    3.857000] Run /bin/bash as init process
>>>> [    3.858000]   with arguments:
>>>> [    3.861000]     /bin/bash
>>>> [    3.862000]   with environment:
>>>> [    3.863000]     HOME=/
>>>> [    3.864000]     TERM=linux
>>>> [    4.242000] do page fault:
>>>> [    4.242000] regs->sr=0x2000, regs->pc=0x41366924, 
>>>> address=0x700b3364, 2, 41fb0000
>>>> [    4.242000] Kernel panic - not syncing: page fault error
>>>> [    4.242000] CPU: 0 PID: 1 Comm: bash Not tainted 
>>>> 6.10.0-rc5-g927da6cf01fe-dirty #25
>>>> [    4.242000] Stack from 4186dda8:
>>>> [    4.242000]         4186dda8 41423aa4 41423aa4 700b3300 00000001 
>>>> 00000000 4136ee10 41423aa4
>>>> [    4.242000]         41366d7a 700b3364 700b3364 00000000 0000000d 
>>>> 4186de60 41fb0000 41d51a60
>>>> [    4.242000]         41005696 41416a90 41416a4d 00002000 41366924 
>>>> 700b3364 00000002 41fb0000
>>>> [    4.242000]         0000000a 700b3364 00000000 0000000d 00000012 
>>>> 41d51a00 4186de60 41d51a60
>>>> [    4.242000]         41fb81c0 41d51a60 410052fe 4100529a 4186de60 
>>>> 700b3364 00000002 00000000
>>>> [    4.242000]         700bc414 00000003 00008000 700ac000 41003660 
>>>> 4186de60 00000000 00000000
>>>> [    4.242000] Call Trace: dump_stack (lib/dump_stack.c:124)
>>>> [    4.242000] panic (kernel/panic.c:266 kernel/panic.c:368)
>>>> [    4.242000] do_page_fault (arch/m68k/mm/fault.c:88 (discriminator 
>>>> 1))
>>>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>>>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:725 
>>>> arch/m68k/kernel/traps.c:775)
>>>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:748 
>>>> arch/m68k/kernel/traps.c:775)
>>>> [    4.242000] buserr (arch/m68k/kernel/entry.S:116)
>>>> [    4.242000] ma_slots (lib/maple_tree.c:759)
>>>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>>>> [    4.242000] elf_load (fs/binfmt_elf.c:125 (discriminator 1) 
>>>> fs/binfmt_elf.c:421 (discriminator 1))
>>>> [    4.242000] load_elf_binary (fs/binfmt_elf.c:1132)
>>>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>>>> [    4.242000] load_misc_binary (fs/binfmt_misc.c:97 
>>>> fs/binfmt_misc.c:146 fs/binfmt_misc.c:213)
>>>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>>>> [    4.242000] bprm_execve (fs/exec.c:1797 fs/exec.c:1839 
>>>> fs/exec.c:1891 fs/exec.c:1867)
>>>> [    4.242000] copy_strings_kernel (fs/exec.c:669)
>>>> [    4.242000] count_strings_kernel (fs/exec.c:473)
>>>> [    4.242000] kernel_execve (fs/exec.c:2058)
>>>> [    4.242000] __dynamic_pr_debug (lib/dynamic_debug.c:865)
>>>> [    4.242000] run_init_process (init/main.c:1389)
>>>> [    4.242000] _printk (kernel/printk/printk.c:2365)
>>>> [    4.242000] kernel_init (init/main.c:1508)
>>>> [    4.242000] kernel_init (init/main.c:1459)
>>>> [    4.242000] ret_from_kernel_thread (arch/m68k/kernel/entry.S:142)
>>>> [    4.242000]
>>>> [    4.242000] ---[ end Kernel panic - not syncing: page fault error 
>>>> ]---
>>>>
>>>> Looks like a memory mapping failure, but why ?
>>>> My JTAG at this point dumps a list of 0s at 0x41fb0000 and my SDRAM 
>>>> starts at 0x40000000 and ends at 0x50000000 (256MB).
>>> 0x41fb0000 seems to be init's page directory. The fault address is in 
>>> the range where I'd expect dynamic libraries to reside.
>>>>
>>>> It looks like a TLB write miss which is obscure to me :-).
>>>>
>>>> I tried to use the /proc but as expected it is not alive after 
>>>> mounting it.
>>>
>>> The memory map ought to be accessible through sysrq - an alternative 
>>> would be to modify the ELF binfmt handler and dump the map once ld.so 
>>> has finished with relocations.
>>
>> I added a dump in the binfmt_elf file:
>> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>> index a43897b03ce9..395f556f3a90 100644
>> --- a/fs/binfmt_elf.c
>> +++ b/fs/binfmt_elf.c
>> @@ -816,6 +816,63 @@ static int parse_elf_properties(struct file *f, 
>> const struct elf_phdr *phdr,
>>          return ret == -ENOENT ? 0 : ret;
>>   }
>>
>> +static int dump_memory_map(struct task_struct *task)
>> +{
>> +    struct mm_struct *mm = task->mm;
>> +    struct vm_area_struct *vma;
>> +       MA_STATE(mas, &mm->mm_mt, 0, -1);
>> +    struct file *file;
>> +    struct path *path;
>> +    char *buf;
>> +    char *pathname;
>> +
>> +    // Acquire the read lock for mmap_lock
>> +    down_read(&mm->mmap_lock);
>> +       mas_lock(&mas);
>> +    for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas, 
>> ULONG_MAX)) {
>> +        if (vma->vm_file) {
>> +            buf = (char *)__get_free_page(GFP_KERNEL);
>> +            if (!buf) {
>> +                continue; // Handle memory allocation failure
>> +            }
>> +
>> +            file = vma->vm_file;
>> +            path = &file->f_path;
>> +            pathname = d_path(path, buf, PAGE_SIZE);
>> +            if (IS_ERR(pathname)) {
>> +                pathname = NULL;
>> +            }
>> +
>> +            pr_info("%lx-%lx %c%c%c%c %08lx %02x:%02x %lu %s\n",
>> +                vma->vm_start, vma->vm_end,
>> +                vma->vm_flags & VM_READ ? 'r' : '-',
>> +                vma->vm_flags & VM_WRITE ? 'w' : '-',
>> +                vma->vm_flags & VM_EXEC ? 'x' : '-',
>> +                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
>> +                vma->vm_pgoff << PAGE_SHIFT,
>> +                MAJOR(file->f_inode->i_rdev),
>> +                MINOR(file->f_inode->i_rdev),
>> +                file->f_inode->i_ino,
>> +                pathname ? pathname : "");
>> +
>> +            free_page((unsigned long)buf);
>> +        } else {
>> +            pr_info("%lx-%lx %c%c%c%c %08lx 00:00 0\n",
>> +                vma->vm_start, vma->vm_end,
>> +                vma->vm_flags & VM_READ ? 'r' : '-',
>> +                vma->vm_flags & VM_WRITE ? 'w' : '-',
>> +                vma->vm_flags & VM_EXEC ? 'x' : '-',
>> +                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
>> +                vma->vm_pgoff << PAGE_SHIFT);
>> +        }
>> +    }
>> +       mas_unlock(&mas);
>> +    // Release the read lock for mmap_lock
>> +    up_read(&mm->mmap_lock);
>> +
>> +    return 0;
>> +}
>> +
>>   static int load_elf_binary(struct linux_binprm *bprm)
>>   {
>>          struct file *interpreter = NULL; /* to shut gcc up */
>> @@ -1299,6 +1356,9 @@ static int load_elf_binary(struct linux_binprm 
>> *bprm)
>>
>>          finalize_exec(bprm);
>>          START_THREAD(elf_ex, regs, elf_entry, bprm->p);
>> +       if (current->pid == 1) {  // Check if this is the init process
>> +            dump_memory_map(current);
>> +    }
>>          retval = 0;
>>   out:
>>          return retval;
>>
>> I think it is quick and dirty, but seems to do the trick.
>> I then get in my console:
>> [    4.265000] 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> [    4.266000] 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
>> [    4.267000] 70000000-700ac000 r-xp 00000000 00:00 27 /bin/bash
>> [    4.268000] 700ac000-700b4000 rw-p 000ac000 00:00 27 /bin/bash
>> [    4.269000] 700b4000-700be000 rwxp 700b4000 00:00 0
>> [    4.270000] bfe7a000-bfe9c000 rw-p bffde000 00:00 0
>>
>> But nothing rings a bell at this level for me...
>> Thanks !
> 
> Here is the same dump trace generated on my newly resurrected M5475EVB 
> for comparison:
> 
> [snip]
> Freeing unused kernel image (initmem) memory: 80K
> This architecture does not have kernel memory protection.
> Run /sbin/init as init process
> Run /etc/init as init process
> Run /bin/init as init process
> process '/bin/init' started with executable stack

I don't have this message, I suppose it is related to uClibc vs libc ?

> 60000000-60008000 r-xp 00000000 00:00 550544 /lib/ld-uClibc-0.9.33.2.so
> 60008000-6000c000 rw-p 00006000 00:00 550544 /lib/ld-uClibc-0.9.33.2.so
> 80000000-80004000 r-xp 00000000 00:00 1882624 /bin/init
> 80004000-80008000 rw-p 00002000 00:00 1882624 /bin/init

You init is at 0x8000000 and not 0x7000000... Interesting. Even if I 
don't think it has a big impact...

> bfc9a000-bfcbc000 rwxp bffde000 00:00 0
> Welcome to
> ...
> 
> Execution otherwise continues as normal to a shell after this.
> 
> Regards
> Greg
> 
>
Michael Schmitz June 27, 2024, 11:58 p.m. UTC | #11
Jean-Michel,

Am 28.06.2024 um 00:36 schrieb Jean-Michel Hautbois:
>>>  ./scripts/decode_stacktrace.sh vmlinux < /tmp/trace.log
>>> [    3.857000] Run /bin/bash as init process
>>> [    3.858000]   with arguments:
>>> [    3.861000]     /bin/bash
>>> [    3.862000]   with environment:
>>> [    3.863000]     HOME=/
>>> [    3.864000]     TERM=linux
>>> [    4.242000] do page fault:
>>> [    4.242000] regs->sr=0x2000, regs->pc=0x41366924,
>>> address=0x700b3364, 2, 41fb0000
>>> [    4.242000] Kernel panic - not syncing: page fault error
>>> [    4.242000] CPU: 0 PID: 1 Comm: bash Not tainted
>>> 6.10.0-rc5-g927da6cf01fe-dirty #25
>>> [    4.242000] Stack from 4186dda8:
>>> [    4.242000]         4186dda8 41423aa4 41423aa4 700b3300 00000001
>>> 00000000 4136ee10 41423aa4
>>> [    4.242000]         41366d7a 700b3364 700b3364 00000000 0000000d
>>> 4186de60 41fb0000 41d51a60
>>> [    4.242000]         41005696 41416a90 41416a4d 00002000 41366924
>>> 700b3364 00000002 41fb0000
>>> [    4.242000]         0000000a 700b3364 00000000 0000000d 00000012
>>> 41d51a00 4186de60 41d51a60
>>> [    4.242000]         41fb81c0 41d51a60 410052fe 4100529a 4186de60
>>> 700b3364 00000002 00000000
>>> [    4.242000]         700bc414 00000003 00008000 700ac000 41003660
>>> 4186de60 00000000 00000000
>>> [    4.242000] Call Trace: dump_stack (lib/dump_stack.c:124)
>>> [    4.242000] panic (kernel/panic.c:266 kernel/panic.c:368)
>>> [    4.242000] do_page_fault (arch/m68k/mm/fault.c:88 (discriminator 1))
>>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:725
>>> arch/m68k/kernel/traps.c:775)
>>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:748
>>> arch/m68k/kernel/traps.c:775)
>>> [    4.242000] buserr (arch/m68k/kernel/entry.S:116)
>>> [    4.242000] ma_slots (lib/maple_tree.c:759)
>>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>>> [    4.242000] elf_load (fs/binfmt_elf.c:125 (discriminator 1)
>>> fs/binfmt_elf.c:421 (discriminator 1))
>>> [    4.242000] load_elf_binary (fs/binfmt_elf.c:1132)
>>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>>> [    4.242000] load_misc_binary (fs/binfmt_misc.c:97
>>> fs/binfmt_misc.c:146 fs/binfmt_misc.c:213)
>>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>>> [    4.242000] bprm_execve (fs/exec.c:1797 fs/exec.c:1839
>>> fs/exec.c:1891 fs/exec.c:1867)
>>> [    4.242000] copy_strings_kernel (fs/exec.c:669)
>>> [    4.242000] count_strings_kernel (fs/exec.c:473)
>>> [    4.242000] kernel_execve (fs/exec.c:2058)
>>> [    4.242000] __dynamic_pr_debug (lib/dynamic_debug.c:865)
>>> [    4.242000] run_init_process (init/main.c:1389)
>>> [    4.242000] _printk (kernel/printk/printk.c:2365)
>>> [    4.242000] kernel_init (init/main.c:1508)
>>> [    4.242000] kernel_init (init/main.c:1459)
>>> [    4.242000] ret_from_kernel_thread (arch/m68k/kernel/entry.S:142)
>>> [    4.242000]
>>> [    4.242000] ---[ end Kernel panic - not syncing: page fault error
>>> ]---
>>>
>>> Looks like a memory mapping failure, but why ?
>>> My JTAG at this point dumps a list of 0s at 0x41fb0000 and my SDRAM
>>> starts at 0x40000000 and ends at 0x50000000 (256MB).
>> 0x41fb0000 seems to be init's page directory. The fault address is in
>> the range where I'd expect dynamic libraries to reside.
>>>
>>> It looks like a TLB write miss which is obscure to me :-).
>>>
>>> I tried to use the /proc but as expected it is not alive after
>>> mounting it.
>>
>> The memory map ought to be accessible through sysrq - an alternative
>> would be to modify the ELF binfmt handler and dump the map once ld.so
>> has finished with relocations.
>
> I added a dump in the binfmt_elf file:
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index a43897b03ce9..395f556f3a90 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -816,6 +816,63 @@ static int parse_elf_properties(struct file *f,
> const struct elf_phdr *phdr,
>         return ret == -ENOENT ? 0 : ret;
>  }
>
> +static int dump_memory_map(struct task_struct *task)
> +{
> +    struct mm_struct *mm = task->mm;
> +    struct vm_area_struct *vma;
> +       MA_STATE(mas, &mm->mm_mt, 0, -1);
> +    struct file *file;
> +    struct path *path;
> +    char *buf;
> +    char *pathname;
> +
> +    // Acquire the read lock for mmap_lock
> +    down_read(&mm->mmap_lock);
> +       mas_lock(&mas);
> +    for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas,
> ULONG_MAX)) {
> +        if (vma->vm_file) {
> +            buf = (char *)__get_free_page(GFP_KERNEL);
> +            if (!buf) {
> +                continue; // Handle memory allocation failure
> +            }
> +
> +            file = vma->vm_file;
> +            path = &file->f_path;
> +            pathname = d_path(path, buf, PAGE_SIZE);
> +            if (IS_ERR(pathname)) {
> +                pathname = NULL;
> +            }
> +
> +            pr_info("%lx-%lx %c%c%c%c %08lx %02x:%02x %lu %s\n",
> +                vma->vm_start, vma->vm_end,
> +                vma->vm_flags & VM_READ ? 'r' : '-',
> +                vma->vm_flags & VM_WRITE ? 'w' : '-',
> +                vma->vm_flags & VM_EXEC ? 'x' : '-',
> +                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
> +                vma->vm_pgoff << PAGE_SHIFT,
> +                MAJOR(file->f_inode->i_rdev),
> +                MINOR(file->f_inode->i_rdev),
> +                file->f_inode->i_ino,
> +                pathname ? pathname : "");
> +
> +            free_page((unsigned long)buf);
> +        } else {
> +            pr_info("%lx-%lx %c%c%c%c %08lx 00:00 0\n",
> +                vma->vm_start, vma->vm_end,
> +                vma->vm_flags & VM_READ ? 'r' : '-',
> +                vma->vm_flags & VM_WRITE ? 'w' : '-',
> +                vma->vm_flags & VM_EXEC ? 'x' : '-',
> +                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
> +                vma->vm_pgoff << PAGE_SHIFT);
> +        }
> +    }
> +       mas_unlock(&mas);
> +    // Release the read lock for mmap_lock
> +    up_read(&mm->mmap_lock);
> +
> +    return 0;
> +}
> +
>  static int load_elf_binary(struct linux_binprm *bprm)
>  {
>         struct file *interpreter = NULL; /* to shut gcc up */
> @@ -1299,6 +1356,9 @@ static int load_elf_binary(struct linux_binprm *bprm)
>
>         finalize_exec(bprm);
>         START_THREAD(elf_ex, regs, elf_entry, bprm->p);
> +       if (current->pid == 1) {  // Check if this is the init process
> +            dump_memory_map(current);
> +    }
>         retval = 0;
>  out:
>         return retval;
>
> I think it is quick and dirty, but seems to do the trick.
> I then get in my console:
> [    4.265000] 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
> [    4.266000] 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
> [    4.267000] 70000000-700ac000 r-xp 00000000 00:00 27 /bin/bash
> [    4.268000] 700ac000-700b4000 rw-p 000ac000 00:00 27 /bin/bash
> [    4.269000] 700b4000-700be000 rwxp 700b4000 00:00 0
> [    4.270000] bfe7a000-bfe9c000 rw-p bffde000 00:00 0
>
> But nothing rings a bell at this level for me...

You are again using bash as init process, so the process of interest 
(/bin/ls) is a child of init and does not get logged here.

But we are getting some mapping information at last. No shared libraries 
mapped as yet, but that's the same in Greg's dump.

Comparing with what I get from cat /proc/1/maps on a 030 system., the 
rwxp section at 0x700b4000 is the process heap, and the rwp section at 
0xbfe7a000 is the process stack.

I forgot to take into account that libraries are loaded only the binary 
starts executing. Can you print the same map dump in the exit syscall 
code path? That ought to show all loaded libraries at that point.

Cheers,

	Michael


> Thanks !
Jean-Michel Hautbois June 28, 2024, 7:24 a.m. UTC | #12
Hi Michael,

On 28/06/2024 01:58, Michael Schmitz wrote:
> Jean-Michel,
> 
> Am 28.06.2024 um 00:36 schrieb Jean-Michel Hautbois:
>>>>  ./scripts/decode_stacktrace.sh vmlinux < /tmp/trace.log
>>>> [    3.857000] Run /bin/bash as init process
>>>> [    3.858000]   with arguments:
>>>> [    3.861000]     /bin/bash
>>>> [    3.862000]   with environment:
>>>> [    3.863000]     HOME=/
>>>> [    3.864000]     TERM=linux
>>>> [    4.242000] do page fault:
>>>> [    4.242000] regs->sr=0x2000, regs->pc=0x41366924,
>>>> address=0x700b3364, 2, 41fb0000
>>>> [    4.242000] Kernel panic - not syncing: page fault error
>>>> [    4.242000] CPU: 0 PID: 1 Comm: bash Not tainted
>>>> 6.10.0-rc5-g927da6cf01fe-dirty #25
>>>> [    4.242000] Stack from 4186dda8:
>>>> [    4.242000]         4186dda8 41423aa4 41423aa4 700b3300 00000001
>>>> 00000000 4136ee10 41423aa4
>>>> [    4.242000]         41366d7a 700b3364 700b3364 00000000 0000000d
>>>> 4186de60 41fb0000 41d51a60
>>>> [    4.242000]         41005696 41416a90 41416a4d 00002000 41366924
>>>> 700b3364 00000002 41fb0000
>>>> [    4.242000]         0000000a 700b3364 00000000 0000000d 00000012
>>>> 41d51a00 4186de60 41d51a60
>>>> [    4.242000]         41fb81c0 41d51a60 410052fe 4100529a 4186de60
>>>> 700b3364 00000002 00000000
>>>> [    4.242000]         700bc414 00000003 00008000 700ac000 41003660
>>>> 4186de60 00000000 00000000
>>>> [    4.242000] Call Trace: dump_stack (lib/dump_stack.c:124)
>>>> [    4.242000] panic (kernel/panic.c:266 kernel/panic.c:368)
>>>> [    4.242000] do_page_fault (arch/m68k/mm/fault.c:88 (discriminator 
>>>> 1))
>>>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>>>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:725
>>>> arch/m68k/kernel/traps.c:775)
>>>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:748
>>>> arch/m68k/kernel/traps.c:775)
>>>> [    4.242000] buserr (arch/m68k/kernel/entry.S:116)
>>>> [    4.242000] ma_slots (lib/maple_tree.c:759)
>>>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>>>> [    4.242000] elf_load (fs/binfmt_elf.c:125 (discriminator 1)
>>>> fs/binfmt_elf.c:421 (discriminator 1))
>>>> [    4.242000] load_elf_binary (fs/binfmt_elf.c:1132)
>>>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>>>> [    4.242000] load_misc_binary (fs/binfmt_misc.c:97
>>>> fs/binfmt_misc.c:146 fs/binfmt_misc.c:213)
>>>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>>>> [    4.242000] bprm_execve (fs/exec.c:1797 fs/exec.c:1839
>>>> fs/exec.c:1891 fs/exec.c:1867)
>>>> [    4.242000] copy_strings_kernel (fs/exec.c:669)
>>>> [    4.242000] count_strings_kernel (fs/exec.c:473)
>>>> [    4.242000] kernel_execve (fs/exec.c:2058)
>>>> [    4.242000] __dynamic_pr_debug (lib/dynamic_debug.c:865)
>>>> [    4.242000] run_init_process (init/main.c:1389)
>>>> [    4.242000] _printk (kernel/printk/printk.c:2365)
>>>> [    4.242000] kernel_init (init/main.c:1508)
>>>> [    4.242000] kernel_init (init/main.c:1459)
>>>> [    4.242000] ret_from_kernel_thread (arch/m68k/kernel/entry.S:142)
>>>> [    4.242000]
>>>> [    4.242000] ---[ end Kernel panic - not syncing: page fault error
>>>> ]---
>>>>
>>>> Looks like a memory mapping failure, but why ?
>>>> My JTAG at this point dumps a list of 0s at 0x41fb0000 and my SDRAM
>>>> starts at 0x40000000 and ends at 0x50000000 (256MB).
>>> 0x41fb0000 seems to be init's page directory. The fault address is in
>>> the range where I'd expect dynamic libraries to reside.
>>>>
>>>> It looks like a TLB write miss which is obscure to me :-).
>>>>
>>>> I tried to use the /proc but as expected it is not alive after
>>>> mounting it.
>>>
>>> The memory map ought to be accessible through sysrq - an alternative
>>> would be to modify the ELF binfmt handler and dump the map once ld.so
>>> has finished with relocations.
>>
>> I added a dump in the binfmt_elf file:
>> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>> index a43897b03ce9..395f556f3a90 100644
>> --- a/fs/binfmt_elf.c
>> +++ b/fs/binfmt_elf.c
>> @@ -816,6 +816,63 @@ static int parse_elf_properties(struct file *f,
>> const struct elf_phdr *phdr,
>>         return ret == -ENOENT ? 0 : ret;
>>  }
>>
>> +static int dump_memory_map(struct task_struct *task)
>> +{
>> +    struct mm_struct *mm = task->mm;
>> +    struct vm_area_struct *vma;
>> +       MA_STATE(mas, &mm->mm_mt, 0, -1);
>> +    struct file *file;
>> +    struct path *path;
>> +    char *buf;
>> +    char *pathname;
>> +
>> +    // Acquire the read lock for mmap_lock
>> +    down_read(&mm->mmap_lock);
>> +       mas_lock(&mas);
>> +    for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas,
>> ULONG_MAX)) {
>> +        if (vma->vm_file) {
>> +            buf = (char *)__get_free_page(GFP_KERNEL);
>> +            if (!buf) {
>> +                continue; // Handle memory allocation failure
>> +            }
>> +
>> +            file = vma->vm_file;
>> +            path = &file->f_path;
>> +            pathname = d_path(path, buf, PAGE_SIZE);
>> +            if (IS_ERR(pathname)) {
>> +                pathname = NULL;
>> +            }
>> +
>> +            pr_info("%lx-%lx %c%c%c%c %08lx %02x:%02x %lu %s\n",
>> +                vma->vm_start, vma->vm_end,
>> +                vma->vm_flags & VM_READ ? 'r' : '-',
>> +                vma->vm_flags & VM_WRITE ? 'w' : '-',
>> +                vma->vm_flags & VM_EXEC ? 'x' : '-',
>> +                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
>> +                vma->vm_pgoff << PAGE_SHIFT,
>> +                MAJOR(file->f_inode->i_rdev),
>> +                MINOR(file->f_inode->i_rdev),
>> +                file->f_inode->i_ino,
>> +                pathname ? pathname : "");
>> +
>> +            free_page((unsigned long)buf);
>> +        } else {
>> +            pr_info("%lx-%lx %c%c%c%c %08lx 00:00 0\n",
>> +                vma->vm_start, vma->vm_end,
>> +                vma->vm_flags & VM_READ ? 'r' : '-',
>> +                vma->vm_flags & VM_WRITE ? 'w' : '-',
>> +                vma->vm_flags & VM_EXEC ? 'x' : '-',
>> +                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
>> +                vma->vm_pgoff << PAGE_SHIFT);
>> +        }
>> +    }
>> +       mas_unlock(&mas);
>> +    // Release the read lock for mmap_lock
>> +    up_read(&mm->mmap_lock);
>> +
>> +    return 0;
>> +}
>> +
>>  static int load_elf_binary(struct linux_binprm *bprm)
>>  {
>>         struct file *interpreter = NULL; /* to shut gcc up */
>> @@ -1299,6 +1356,9 @@ static int load_elf_binary(struct linux_binprm 
>> *bprm)
>>
>>         finalize_exec(bprm);
>>         START_THREAD(elf_ex, regs, elf_entry, bprm->p);
>> +       if (current->pid == 1) {  // Check if this is the init process
>> +            dump_memory_map(current);
>> +    }
>>         retval = 0;
>>  out:
>>         return retval;
>>
>> I think it is quick and dirty, but seems to do the trick.
>> I then get in my console:
>> [    4.265000] 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> [    4.266000] 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
>> [    4.267000] 70000000-700ac000 r-xp 00000000 00:00 27 /bin/bash
>> [    4.268000] 700ac000-700b4000 rw-p 000ac000 00:00 27 /bin/bash
>> [    4.269000] 700b4000-700be000 rwxp 700b4000 00:00 0
>> [    4.270000] bfe7a000-bfe9c000 rw-p bffde000 00:00 0
>>
>> But nothing rings a bell at this level for me...
> 
> You are again using bash as init process, so the process of interest 
> (/bin/ls) is a child of init and does not get logged here.
> 
> But we are getting some mapping information at last. No shared libraries 
> mapped as yet, but that's the same in Greg's dump.
> 
> Comparing with what I get from cat /proc/1/maps on a 030 system., the 
> rwxp section at 0x700b4000 is the process heap, and the rwp section at 
> 0xbfe7a000 is the process stack.
> 
> I forgot to take into account that libraries are loaded only the binary 
> starts executing. Can you print the same map dump in the exit syscall 
> code path? That ought to show all loaded libraries at that point.

Thanks for your suggestion !
I changed it a bit, and I added a call in do_exit() as suggested. When
executing ls I get:

bash-5.2# ls -l
load_elf_binary: Dump memory for ls (31):
mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
mmap: bfc1e000-bfc40000 rw-p bffde000 00:00 0 [stack]

do_exit: Dump memory for ls (31):
mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
mmap: 60030000-60032000 rw-p 60030000 00:00 0
mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
mmap: 60160000-6016e000 rw-p 60160000 00:00 0
mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
mmap: 700ca000-700ec000 rwxp 700ca000 00:00 0 [heap]
mmap: bfc1e000-bfc40000 rw-p bffde000 00:00 0 [stack]

When I call it a second time, I get:

bash-5.2# ls -l
load_elf_binary: Dump memory for ls (33):
mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
mmap: bfb5a000-bfb7c000 rw-p bffde000 00:00 0 [stack]
do_exit: Dump memory for ls (33):
mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
mmap: 60030000-60032000 rw-p 60030000 00:00 0
mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
mmap: 60160000-6016e000 rw-p 60160000 00:00 0
mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
mmap: bfb5a000-bfb7c000 rw-p bffde000 00:00 0 [stack]

The first call generates the "ls" output, not the second one.
The helper looks like:
diff --git a/mm/mmap.c b/mm/mmap.c
index 83b4682ec85c..14d861e9cba2 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -76,6 +76,87 @@ int mmap_rnd_compat_bits __read_mostly = 
CONFIG_ARCH_MMAP_RND_COMPAT_BITS;
  static bool ignore_rlimit_data;
  core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);

+int dump_memory_map(struct task_struct *task)
+{
+    struct mm_struct *mm = task->mm;
+    struct vm_area_struct *vma;
+    struct file *file;
+    struct path *path;
+    char *buf;
+    char *pathname;
+
+       if (!mm) {
+               return -ENOMEM;
+       }
+
+       MA_STATE(mas, &mm->mm_mt, 0, -1);
+    // Acquire the read lock for mmap_lock
+    down_read(&mm->mmap_lock);
+       mas_lock(&mas);
+       for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas, 
ULONG_MAX)) {
+               char perms[5] = "---p"; // Default permissions
+               // Set permissions based on vm_flags
+               if (vma->vm_flags & VM_READ) perms[0] = 'r';
+               if (vma->vm_flags & VM_WRITE) perms[1] = 'w';
+               if (vma->vm_flags & VM_EXEC) perms[2] = 'x';
+               if (vma->vm_flags & VM_MAYSHARE) perms[3] = 's';
+
+               if (vma->vm_file) { // If there's an associated file
+                       buf = (char *)__get_free_page(GFP_KERNEL);
+                       if (!buf) {
+                               continue; // Handle memory allocation 
failure
+                       }
+
+                       file = vma->vm_file;
+                       path = &file->f_path;
+                       pathname = d_path(path, buf, PAGE_SIZE);
+                       if (IS_ERR(pathname)) {
+                               pathname = NULL;
+                       }
+
+                       // Print memory area information with file path
+                       pr_info("%08lx-%08lx %s %08lx %02x:%02x %lu %s\n",
+                               vma->vm_start, vma->vm_end,
+                               perms,
+                               vma->vm_pgoff << PAGE_SHIFT,
+                               MAJOR(file_inode(file)->i_rdev),
+                               MINOR(file_inode(file)->i_rdev),
+                               file_inode(file)->i_ino,
+                               pathname ? pathname : "");
+
+                       free_page((unsigned long)buf);
+               } else {
+                       char *special_area_name = NULL;
+
+                       // Check for heap
+                       if (vma->vm_end > mm->start_brk && vma->vm_start 
< mm->brk) {
+                               special_area_name = "[heap]";
+                       }
+                       // Check for stack
+                       else if (vma->vm_start <= mm->start_stack && 
vma->vm_end >= mm->start_stack) {
+                               special_area_name = "[stack]";
+                       }
+                       // Check for vdso
+                       else if (vma->vm_flags & VM_EXEC && 
vma->vm_flags & VM_READ && !vma->vm_file) {
+                               special_area_name = "[vdso]";
+                       }
+
+                       // Print memory area information without file path
+                       pr_info("%08lx-%08lx %s %08lx 00:00 0 %s\n",
+                               vma->vm_start, vma->vm_end,
+                               perms,
+                               vma->vm_pgoff << PAGE_SHIFT,
+                               special_area_name ? special_area_name : 
"    ");
+               }
+       }
+       mas_unlock(&mas);
+    // Release the read lock for mmap_lock
+    up_read(&mm->mmap_lock);
+
+    return 0;
+}
+EXPORT_SYMBOL(dump_memory_map);


Thanks,
JM
Michael Schmitz June 28, 2024, 7:48 a.m. UTC | #13
Jean-Michel,

Am 28.06.2024 um 19:24 schrieb Jean-Michel Hautbois:
>> I forgot to take into account that libraries are loaded only the
>> binary starts executing. Can you print the same map dump in the exit
>> syscall code path? That ought to show all loaded libraries at that point.
>
> Thanks for your suggestion !
> I changed it a bit, and I added a call in do_exit() as suggested. When
> executing ls I get:
>
> bash-5.2# ls -l
> load_elf_binary: Dump memory for ls (31):
> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
> mmap: bfc1e000-bfc40000 rw-p bffde000 00:00 0 [stack]
>
> do_exit: Dump memory for ls (31):
> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
> mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
> mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
> mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
> mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
> mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
> mmap: 60030000-60032000 rw-p 60030000 00:00 0
> mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
> mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
> mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
> mmap: 60160000-6016e000 rw-p 60160000 00:00 0
> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
> mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
> mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
> mmap: 700ca000-700ec000 rwxp 700ca000 00:00 0 [heap]
> mmap: bfc1e000-bfc40000 rw-p bffde000 00:00 0 [stack]
>
> When I call it a second time, I get:
>
> bash-5.2# ls -l
> load_elf_binary: Dump memory for ls (33):
> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
> mmap: bfb5a000-bfb7c000 rw-p bffde000 00:00 0 [stack]
> do_exit: Dump memory for ls (33):
> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
> mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
> mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
> mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
> mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
> mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
> mmap: 60030000-60032000 rw-p 60030000 00:00 0
> mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
> mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
> mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
> mmap: 60160000-6016e000 rw-p 60160000 00:00 0
> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
> mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
> mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox

No heap in this second call. Can you print mm->start_brk and mm->brk please?

The process memory layout is a little unusual (I would have expected the 
binary to be mapped before the dynamic libraries, not after). Is that 
expected on Coldfire, Greg?

Cheers,

	Michael


> mmap: bfb5a000-bfb7c000 rw-p bffde000 00:00 0 [stack]
>
> The first call generates the "ls" output, not the second one.
> The helper looks like:
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 83b4682ec85c..14d861e9cba2 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -76,6 +76,87 @@ int mmap_rnd_compat_bits __read_mostly =
> CONFIG_ARCH_MMAP_RND_COMPAT_BITS;
>  static bool ignore_rlimit_data;
>  core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
>
> +int dump_memory_map(struct task_struct *task)
> +{
> +    struct mm_struct *mm = task->mm;
> +    struct vm_area_struct *vma;
> +    struct file *file;
> +    struct path *path;
> +    char *buf;
> +    char *pathname;
> +
> +       if (!mm) {
> +               return -ENOMEM;
> +       }
> +
> +       MA_STATE(mas, &mm->mm_mt, 0, -1);
> +    // Acquire the read lock for mmap_lock
> +    down_read(&mm->mmap_lock);
> +       mas_lock(&mas);
> +       for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas,
> ULONG_MAX)) {
> +               char perms[5] = "---p"; // Default permissions
> +               // Set permissions based on vm_flags
> +               if (vma->vm_flags & VM_READ) perms[0] = 'r';
> +               if (vma->vm_flags & VM_WRITE) perms[1] = 'w';
> +               if (vma->vm_flags & VM_EXEC) perms[2] = 'x';
> +               if (vma->vm_flags & VM_MAYSHARE) perms[3] = 's';
> +
> +               if (vma->vm_file) { // If there's an associated file
> +                       buf = (char *)__get_free_page(GFP_KERNEL);
> +                       if (!buf) {
> +                               continue; // Handle memory allocation
> failure
> +                       }
> +
> +                       file = vma->vm_file;
> +                       path = &file->f_path;
> +                       pathname = d_path(path, buf, PAGE_SIZE);
> +                       if (IS_ERR(pathname)) {
> +                               pathname = NULL;
> +                       }
> +
> +                       // Print memory area information with file path
> +                       pr_info("%08lx-%08lx %s %08lx %02x:%02x %lu %s\n",
> +                               vma->vm_start, vma->vm_end,
> +                               perms,
> +                               vma->vm_pgoff << PAGE_SHIFT,
> +                               MAJOR(file_inode(file)->i_rdev),
> +                               MINOR(file_inode(file)->i_rdev),
> +                               file_inode(file)->i_ino,
> +                               pathname ? pathname : "");
> +
> +                       free_page((unsigned long)buf);
> +               } else {
> +                       char *special_area_name = NULL;
> +
> +                       // Check for heap
> +                       if (vma->vm_end > mm->start_brk && vma->vm_start
> < mm->brk) {
> +                               special_area_name = "[heap]";
> +                       }
> +                       // Check for stack
> +                       else if (vma->vm_start <= mm->start_stack &&
> vma->vm_end >= mm->start_stack) {
> +                               special_area_name = "[stack]";
> +                       }
> +                       // Check for vdso
> +                       else if (vma->vm_flags & VM_EXEC &&
> vma->vm_flags & VM_READ && !vma->vm_file) {
> +                               special_area_name = "[vdso]";
> +                       }
> +
> +                       // Print memory area information without file path
> +                       pr_info("%08lx-%08lx %s %08lx 00:00 0 %s\n",
> +                               vma->vm_start, vma->vm_end,
> +                               perms,
> +                               vma->vm_pgoff << PAGE_SHIFT,
> +                               special_area_name ? special_area_name :
> "    ");
> +               }
> +       }
> +       mas_unlock(&mas);
> +    // Release the read lock for mmap_lock
> +    up_read(&mm->mmap_lock);
> +
> +    return 0;
> +}
> +EXPORT_SYMBOL(dump_memory_map);
>
>
> Thanks,
> JM
Jean-Michel Hautbois June 28, 2024, 8:02 a.m. UTC | #14
Michael,

On 28/06/2024 09:48, Michael Schmitz wrote:
> Jean-Michel,
> 
> Am 28.06.2024 um 19:24 schrieb Jean-Michel Hautbois:
>>> I forgot to take into account that libraries are loaded only the
>>> binary starts executing. Can you print the same map dump in the exit
>>> syscall code path? That ought to show all loaded libraries at that 
>>> point.
>>
>> Thanks for your suggestion !
>> I changed it a bit, and I added a call in do_exit() as suggested. When
>> executing ls I get:
>>
>> bash-5.2# ls -l
>> load_elf_binary: Dump memory for ls (31):
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
>> mmap: bfc1e000-bfc40000 rw-p bffde000 00:00 0 [stack]
>>
>> do_exit: Dump memory for ls (31):
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
>> mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
>> mmap: 60030000-60032000 rw-p 60030000 00:00 0
>> mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
>> mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
>> mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
>> mmap: 60160000-6016e000 rw-p 60160000 00:00 0
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
>> mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
>> mmap: 700ca000-700ec000 rwxp 700ca000 00:00 0 [heap]
>> mmap: bfc1e000-bfc40000 rw-p bffde000 00:00 0 [stack]
>>
>> When I call it a second time, I get:
>>
>> bash-5.2# ls -l
>> load_elf_binary: Dump memory for ls (33):
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
>> mmap: bfb5a000-bfb7c000 rw-p bffde000 00:00 0 [stack]
>> do_exit: Dump memory for ls (33):
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
>> mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
>> mmap: 60030000-60032000 rw-p 60030000 00:00 0
>> mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
>> mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
>> mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
>> mmap: 60160000-6016e000 rw-p 60160000 00:00 0
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
>> mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
> 
> No heap in this second call. Can you print mm->start_brk and mm->brk 
> please?

Sure:
mmap: start_brk: 700ca000, brk: 700ca000

> 
> The process memory layout is a little unusual (I would have expected the 
> binary to be mapped before the dynamic libraries, not after). Is that 
> expected on Coldfire, Greg?
> 
> Cheers,
> 
>      Michael
> 
> 
>> mmap: bfb5a000-bfb7c000 rw-p bffde000 00:00 0 [stack]
>>
>> The first call generates the "ls" output, not the second one.
>> The helper looks like:
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 83b4682ec85c..14d861e9cba2 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -76,6 +76,87 @@ int mmap_rnd_compat_bits __read_mostly =
>> CONFIG_ARCH_MMAP_RND_COMPAT_BITS;
>>  static bool ignore_rlimit_data;
>>  core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
>>
>> +int dump_memory_map(struct task_struct *task)
>> +{
>> +    struct mm_struct *mm = task->mm;
>> +    struct vm_area_struct *vma;
>> +    struct file *file;
>> +    struct path *path;
>> +    char *buf;
>> +    char *pathname;
>> +
>> +       if (!mm) {
>> +               return -ENOMEM;
>> +       }
>> +
>> +       MA_STATE(mas, &mm->mm_mt, 0, -1);
>> +    // Acquire the read lock for mmap_lock
>> +    down_read(&mm->mmap_lock);
>> +       mas_lock(&mas);
>> +       for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas,
>> ULONG_MAX)) {
>> +               char perms[5] = "---p"; // Default permissions
>> +               // Set permissions based on vm_flags
>> +               if (vma->vm_flags & VM_READ) perms[0] = 'r';
>> +               if (vma->vm_flags & VM_WRITE) perms[1] = 'w';
>> +               if (vma->vm_flags & VM_EXEC) perms[2] = 'x';
>> +               if (vma->vm_flags & VM_MAYSHARE) perms[3] = 's';
>> +
>> +               if (vma->vm_file) { // If there's an associated file
>> +                       buf = (char *)__get_free_page(GFP_KERNEL);
>> +                       if (!buf) {
>> +                               continue; // Handle memory allocation
>> failure
>> +                       }
>> +
>> +                       file = vma->vm_file;
>> +                       path = &file->f_path;
>> +                       pathname = d_path(path, buf, PAGE_SIZE);
>> +                       if (IS_ERR(pathname)) {
>> +                               pathname = NULL;
>> +                       }
>> +
>> +                       // Print memory area information with file path
>> +                       pr_info("%08lx-%08lx %s %08lx %02x:%02x %lu 
>> %s\n",
>> +                               vma->vm_start, vma->vm_end,
>> +                               perms,
>> +                               vma->vm_pgoff << PAGE_SHIFT,
>> +                               MAJOR(file_inode(file)->i_rdev),
>> +                               MINOR(file_inode(file)->i_rdev),
>> +                               file_inode(file)->i_ino,
>> +                               pathname ? pathname : "");
>> +
>> +                       free_page((unsigned long)buf);
>> +               } else {
>> +                       char *special_area_name = NULL;
>> +
>> +                       // Check for heap
>> +                       if (vma->vm_end > mm->start_brk && vma->vm_start
>> < mm->brk) {
>> +                               special_area_name = "[heap]";
>> +                       }
>> +                       // Check for stack
>> +                       else if (vma->vm_start <= mm->start_stack &&
>> vma->vm_end >= mm->start_stack) {
>> +                               special_area_name = "[stack]";
>> +                       }
>> +                       // Check for vdso
>> +                       else if (vma->vm_flags & VM_EXEC &&
>> vma->vm_flags & VM_READ && !vma->vm_file) {
>> +                               special_area_name = "[vdso]";
>> +                       }
>> +
>> +                       // Print memory area information without file 
>> path
>> +                       pr_info("%08lx-%08lx %s %08lx 00:00 0 %s\n",
>> +                               vma->vm_start, vma->vm_end,
>> +                               perms,
>> +                               vma->vm_pgoff << PAGE_SHIFT,
>> +                               special_area_name ? special_area_name :
>> "    ");
>> +               }
>> +       }
>> +       mas_unlock(&mas);
>> +    // Release the read lock for mmap_lock
>> +    up_read(&mm->mmap_lock);
>> +
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL(dump_memory_map);
>>
>>
>> Thanks,
>> JM
Jean-Michel Hautbois June 28, 2024, 11:25 a.m. UTC | #15
Michael,

On 28/06/2024 09:48, Michael Schmitz wrote:
> Jean-Michel,
> 
> Am 28.06.2024 um 19:24 schrieb Jean-Michel Hautbois:
>>> I forgot to take into account that libraries are loaded only the
>>> binary starts executing. Can you print the same map dump in the exit
>>> syscall code path? That ought to show all loaded libraries at that 
>>> point.
>>
>> Thanks for your suggestion !
>> I changed it a bit, and I added a call in do_exit() as suggested. When
>> executing ls I get:
>>
>> bash-5.2# ls -l
>> load_elf_binary: Dump memory for ls (31):
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
>> mmap: bfc1e000-bfc40000 rw-p bffde000 00:00 0 [stack]
>>
>> do_exit: Dump memory for ls (31):
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
>> mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
>> mmap: 60030000-60032000 rw-p 60030000 00:00 0
>> mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
>> mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
>> mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
>> mmap: 60160000-6016e000 rw-p 60160000 00:00 0
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
>> mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
>> mmap: 700ca000-700ec000 rwxp 700ca000 00:00 0 [heap]
>> mmap: bfc1e000-bfc40000 rw-p bffde000 00:00 0 [stack]
>>
>> When I call it a second time, I get:
>>
>> bash-5.2# ls -l
>> load_elf_binary: Dump memory for ls (33):
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
>> mmap: bfb5a000-bfb7c000 rw-p bffde000 00:00 0 [stack]
>> do_exit: Dump memory for ls (33):
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
>> mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
>> mmap: 60030000-60032000 rw-p 60030000 00:00 0
>> mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
>> mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
>> mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
>> mmap: 60160000-6016e000 rw-p 60160000 00:00 0
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
>> mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
> 
> No heap in this second call. Can you print mm->start_brk and mm->brk 
> please?
> 
> The process memory layout is a little unusual (I would have expected the 
> binary to be mapped before the dynamic libraries, not after). Is that 
> expected on Coldfire, Greg?
> 

I did the test, and called ls twice to see what differs:
bash-5.2# ls -l > /dev/null
load_elf_binary: Dump memory for ls (31):
mmap: start_brk: 700ca000, brk: 700ca000
mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
mmap: bfa48000-bfa6a000 rw-p bffde000 00:00 0 [stack]
do_exit: Dump memory for ls (31):
mmap: start_brk: 700ca000, brk: 700ec000
mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
mmap: 60030000-60032000 rw-p 60030000 00:00 0
mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
mmap: 60160000-6016e000 rw-p 60160000 00:00 0
mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
mmap: 700ca000-700ec000 rwxp 700ca000 00:00 0 [heap]
mmap: bfa48000-bfa6a000 rw-p bffde000 00:00 0 [stack]
bash-5.2# ls -l > /dev/null
load_elf_binary: Dump memory for ls (33):
mmap: start_brk: 700ca000, brk: 700ca000
mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
mmap: bf894000-bf8b6000 rw-p bffde000 00:00 0 [stack]
do_exit: Dump memory for ls (33):
mmap: start_brk: 700ca000, brk: 700ca000
mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
mmap: 60030000-60032000 rw-p 60030000 00:00 0
mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
mmap: 60160000-6016e000 rw-p 60160000 00:00 0
mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
mmap: bf894000-bf8b6000 rw-p bffde000 00:00 0 [stack]

The second time, there seems to be no heap...

JM

> Cheers,
> 
>      Michael
> 
> 
>> mmap: bfb5a000-bfb7c000 rw-p bffde000 00:00 0 [stack]
>>
>> The first call generates the "ls" output, not the second one.
>> The helper looks like:
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 83b4682ec85c..14d861e9cba2 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -76,6 +76,87 @@ int mmap_rnd_compat_bits __read_mostly =
>> CONFIG_ARCH_MMAP_RND_COMPAT_BITS;
>>  static bool ignore_rlimit_data;
>>  core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
>>
>> +int dump_memory_map(struct task_struct *task)
>> +{
>> +    struct mm_struct *mm = task->mm;
>> +    struct vm_area_struct *vma;
>> +    struct file *file;
>> +    struct path *path;
>> +    char *buf;
>> +    char *pathname;
>> +
>> +       if (!mm) {
>> +               return -ENOMEM;
>> +       }
>> +
>> +       MA_STATE(mas, &mm->mm_mt, 0, -1);
>> +    // Acquire the read lock for mmap_lock
>> +    down_read(&mm->mmap_lock);
>> +       mas_lock(&mas);
>> +       for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas,
>> ULONG_MAX)) {
>> +               char perms[5] = "---p"; // Default permissions
>> +               // Set permissions based on vm_flags
>> +               if (vma->vm_flags & VM_READ) perms[0] = 'r';
>> +               if (vma->vm_flags & VM_WRITE) perms[1] = 'w';
>> +               if (vma->vm_flags & VM_EXEC) perms[2] = 'x';
>> +               if (vma->vm_flags & VM_MAYSHARE) perms[3] = 's';
>> +
>> +               if (vma->vm_file) { // If there's an associated file
>> +                       buf = (char *)__get_free_page(GFP_KERNEL);
>> +                       if (!buf) {
>> +                               continue; // Handle memory allocation
>> failure
>> +                       }
>> +
>> +                       file = vma->vm_file;
>> +                       path = &file->f_path;
>> +                       pathname = d_path(path, buf, PAGE_SIZE);
>> +                       if (IS_ERR(pathname)) {
>> +                               pathname = NULL;
>> +                       }
>> +
>> +                       // Print memory area information with file path
>> +                       pr_info("%08lx-%08lx %s %08lx %02x:%02x %lu 
>> %s\n",
>> +                               vma->vm_start, vma->vm_end,
>> +                               perms,
>> +                               vma->vm_pgoff << PAGE_SHIFT,
>> +                               MAJOR(file_inode(file)->i_rdev),
>> +                               MINOR(file_inode(file)->i_rdev),
>> +                               file_inode(file)->i_ino,
>> +                               pathname ? pathname : "");
>> +
>> +                       free_page((unsigned long)buf);
>> +               } else {
>> +                       char *special_area_name = NULL;
>> +
>> +                       // Check for heap
>> +                       if (vma->vm_end > mm->start_brk && vma->vm_start
>> < mm->brk) {
>> +                               special_area_name = "[heap]";
>> +                       }
>> +                       // Check for stack
>> +                       else if (vma->vm_start <= mm->start_stack &&
>> vma->vm_end >= mm->start_stack) {
>> +                               special_area_name = "[stack]";
>> +                       }
>> +                       // Check for vdso
>> +                       else if (vma->vm_flags & VM_EXEC &&
>> vma->vm_flags & VM_READ && !vma->vm_file) {
>> +                               special_area_name = "[vdso]";
>> +                       }
>> +
>> +                       // Print memory area information without file 
>> path
>> +                       pr_info("%08lx-%08lx %s %08lx 00:00 0 %s\n",
>> +                               vma->vm_start, vma->vm_end,
>> +                               perms,
>> +                               vma->vm_pgoff << PAGE_SHIFT,
>> +                               special_area_name ? special_area_name :
>> "    ");
>> +               }
>> +       }
>> +       mas_unlock(&mas);
>> +    // Release the read lock for mmap_lock
>> +    up_read(&mm->mmap_lock);
>> +
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL(dump_memory_map);
>>
>>
>> Thanks,
>> JM
Michael Schmitz June 29, 2024, 3:41 a.m. UTC | #16
Jean-Michel,

Am 28.06.2024 um 23:25 schrieb Jean-Michel Hautbois:
>>
>> No heap in this second call. Can you print mm->start_brk and mm->brk
>> please?
>>
>> The process memory layout is a little unusual (I would have expected
>> the binary to be mapped before the dynamic libraries, not after). Is
>> that expected on Coldfire, Greg?
>>
>
> I did the test, and called ls twice to see what differs:
> bash-5.2# ls -l > /dev/null
> load_elf_binary: Dump memory for ls (31):
> mmap: start_brk: 700ca000, brk: 700ca000
> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
> mmap: bfa48000-bfa6a000 rw-p bffde000 00:00 0 [stack]
> do_exit: Dump memory for ls (31):
> mmap: start_brk: 700ca000, brk: 700ec000

ls managed to allocate heap space OK ...

> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
> mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
> mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
> mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
> mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
> mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
> mmap: 60030000-60032000 rw-p 60030000 00:00 0
> mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
> mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
> mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
> mmap: 60160000-6016e000 rw-p 60160000 00:00 0
> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
> mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
> mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
> mmap: 700ca000-700ec000 rwxp 700ca000 00:00 0 [heap]
> mmap: bfa48000-bfa6a000 rw-p bffde000 00:00 0 [stack]
> bash-5.2# ls -l > /dev/null
> load_elf_binary: Dump memory for ls (33):
> mmap: start_brk: 700ca000, brk: 700ca000
> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
> mmap: bf894000-bf8b6000 rw-p bffde000 00:00 0 [stack]
> do_exit: Dump memory for ls (33):
> mmap: start_brk: 700ca000, brk: 700ca000

No heap space allocated here. I would have expected that to cause an 
error message from libc ...

Can you print a brief meminfo summary such as found in fs/proc/meminfo.c 
(total, free, available, buffer and cached for starters)?

Cheers,

	Michael


> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
> mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
> mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
> mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
> mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
> mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
> mmap: 60030000-60032000 rw-p 60030000 00:00 0
> mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
> mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
> mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
> mmap: 60160000-6016e000 rw-p 60160000 00:00 0
> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
> mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
> mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
> mmap: bf894000-bf8b6000 rw-p bffde000 00:00 0 [stack]
>
> The second time, there seems to be no heap...
>
> JM
>
>> Cheers,
>>
>>      Michael
>>
>>
>>> mmap: bfb5a000-bfb7c000 rw-p bffde000 00:00 0 [stack]
>>>
>>> The first call generates the "ls" output, not the second one.
>>> The helper looks like:
>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>> index 83b4682ec85c..14d861e9cba2 100644
>>> --- a/mm/mmap.c
>>> +++ b/mm/mmap.c
>>> @@ -76,6 +76,87 @@ int mmap_rnd_compat_bits __read_mostly =
>>> CONFIG_ARCH_MMAP_RND_COMPAT_BITS;
>>>  static bool ignore_rlimit_data;
>>>  core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
>>>
>>> +int dump_memory_map(struct task_struct *task)
>>> +{
>>> +    struct mm_struct *mm = task->mm;
>>> +    struct vm_area_struct *vma;
>>> +    struct file *file;
>>> +    struct path *path;
>>> +    char *buf;
>>> +    char *pathname;
>>> +
>>> +       if (!mm) {
>>> +               return -ENOMEM;
>>> +       }
>>> +
>>> +       MA_STATE(mas, &mm->mm_mt, 0, -1);
>>> +    // Acquire the read lock for mmap_lock
>>> +    down_read(&mm->mmap_lock);
>>> +       mas_lock(&mas);
>>> +       for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas,
>>> ULONG_MAX)) {
>>> +               char perms[5] = "---p"; // Default permissions
>>> +               // Set permissions based on vm_flags
>>> +               if (vma->vm_flags & VM_READ) perms[0] = 'r';
>>> +               if (vma->vm_flags & VM_WRITE) perms[1] = 'w';
>>> +               if (vma->vm_flags & VM_EXEC) perms[2] = 'x';
>>> +               if (vma->vm_flags & VM_MAYSHARE) perms[3] = 's';
>>> +
>>> +               if (vma->vm_file) { // If there's an associated file
>>> +                       buf = (char *)__get_free_page(GFP_KERNEL);
>>> +                       if (!buf) {
>>> +                               continue; // Handle memory allocation
>>> failure
>>> +                       }
>>> +
>>> +                       file = vma->vm_file;
>>> +                       path = &file->f_path;
>>> +                       pathname = d_path(path, buf, PAGE_SIZE);
>>> +                       if (IS_ERR(pathname)) {
>>> +                               pathname = NULL;
>>> +                       }
>>> +
>>> +                       // Print memory area information with file path
>>> +                       pr_info("%08lx-%08lx %s %08lx %02x:%02x %lu
>>> %s\n",
>>> +                               vma->vm_start, vma->vm_end,
>>> +                               perms,
>>> +                               vma->vm_pgoff << PAGE_SHIFT,
>>> +                               MAJOR(file_inode(file)->i_rdev),
>>> +                               MINOR(file_inode(file)->i_rdev),
>>> +                               file_inode(file)->i_ino,
>>> +                               pathname ? pathname : "");
>>> +
>>> +                       free_page((unsigned long)buf);
>>> +               } else {
>>> +                       char *special_area_name = NULL;
>>> +
>>> +                       // Check for heap
>>> +                       if (vma->vm_end > mm->start_brk && vma->vm_start
>>> < mm->brk) {
>>> +                               special_area_name = "[heap]";
>>> +                       }
>>> +                       // Check for stack
>>> +                       else if (vma->vm_start <= mm->start_stack &&
>>> vma->vm_end >= mm->start_stack) {
>>> +                               special_area_name = "[stack]";
>>> +                       }
>>> +                       // Check for vdso
>>> +                       else if (vma->vm_flags & VM_EXEC &&
>>> vma->vm_flags & VM_READ && !vma->vm_file) {
>>> +                               special_area_name = "[vdso]";
>>> +                       }
>>> +
>>> +                       // Print memory area information without file
>>> path
>>> +                       pr_info("%08lx-%08lx %s %08lx 00:00 0 %s\n",
>>> +                               vma->vm_start, vma->vm_end,
>>> +                               perms,
>>> +                               vma->vm_pgoff << PAGE_SHIFT,
>>> +                               special_area_name ? special_area_name :
>>> "    ");
>>> +               }
>>> +       }
>>> +       mas_unlock(&mas);
>>> +    // Release the read lock for mmap_lock
>>> +    up_read(&mm->mmap_lock);
>>> +
>>> +    return 0;
>>> +}
>>> +EXPORT_SYMBOL(dump_memory_map);
>>>
>>>
>>> Thanks,
>>> JM
Jean-Michel Hautbois June 29, 2024, 7:57 a.m. UTC | #17
Michael,

On 29/06/2024 05:41, Michael Schmitz wrote:
> Jean-Michel,
> 
> Am 28.06.2024 um 23:25 schrieb Jean-Michel Hautbois:
>>>
>>> No heap in this second call. Can you print mm->start_brk and mm->brk
>>> please?
>>>
>>> The process memory layout is a little unusual (I would have expected
>>> the binary to be mapped before the dynamic libraries, not after). Is
>>> that expected on Coldfire, Greg?
>>>
>>
>> I did the test, and called ls twice to see what differs:
>> bash-5.2# ls -l > /dev/null
>> load_elf_binary: Dump memory for ls (31):
>> mmap: start_brk: 700ca000, brk: 700ca000
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
>> mmap: bfa48000-bfa6a000 rw-p bffde000 00:00 0 [stack]
>> do_exit: Dump memory for ls (31):
>> mmap: start_brk: 700ca000, brk: 700ec000
> 
> ls managed to allocate heap space OK ...
> 
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
>> mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
>> mmap: 60030000-60032000 rw-p 60030000 00:00 0
>> mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
>> mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
>> mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
>> mmap: 60160000-6016e000 rw-p 60160000 00:00 0
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
>> mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
>> mmap: 700ca000-700ec000 rwxp 700ca000 00:00 0 [heap]
>> mmap: bfa48000-bfa6a000 rw-p bffde000 00:00 0 [stack]
>> bash-5.2# ls -l > /dev/null
>> load_elf_binary: Dump memory for ls (33):
>> mmap: start_brk: 700ca000, brk: 700ca000
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
>> mmap: bf894000-bf8b6000 rw-p bffde000 00:00 0 [stack]
>> do_exit: Dump memory for ls (33):
>> mmap: start_brk: 700ca000, brk: 700ca000
> 
> No heap space allocated here. I would have expected that to cause an 
> error message from libc ...
> 
> Can you print a brief meminfo summary such as found in fs/proc/meminfo.c 
> (total, free, available, buffer and cached for starters)?

I dumped using sysrq, before and after each ls:
bash-5.2# sysrq: Show Memory
Mem-Info:
active_anon:1 inactive_anon:39 isolated_anon:0
  active_file:0 inactive_file:223 isolated_file:0
  unevictable:0 dirty:0 writeback:0 

  slab_reclaimable:109 slab_unreclaimable:641 

  mapped:212 shmem:0 pagetables:3 

  sec_pagetables:0 bounce:0 

  kernel_misc_reclaimable:0 

  free:28227 free_pcp:200 free_cma:0
Node 0 active_anon:8kB inactive_anon:312kB active_file:0kB 
inactive_file:1784kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB mapped:1696kB dirty:0kB writeback:0kB shmem:0kB 
writeback_tmp:0kB kernel_stack:168kB pagetables:24kB s
ec_pagetables:0kB all_unreclaimable? no
DMA free:225816kB boost:0kB min:1952kB low:2440kB high:2928kB 
reserved_highatomic:0KB active_anon:8kB inactive_anon:312kB 
active_file:0kB inactive_file:1784kB unevictable:0kB writepending:0kB 
present:262144kB managed:238752kB mlocked:0k
B bounce:0kB free_pcp:1600kB local_pcp:1600kB free_cma:0kB
lowmem_reserve[]: 0 0 0
DMA: 3*8kB (UME) 0*16kB 2*32kB (UM) 1*64kB (M) 1*128kB (U) 1*256kB (U) 
2*512kB (UM) 1*1024kB (E) 3*2048kB (UME) 3*4096kB (UME) 25*8192kB (M) = 
225816kB
230 total pagecache pages
0 pages in swap cache
Free swap  = 0kB
Total swap = 0kB
32768 pages RAM
0 pages HighMem/MovableOnly
2924 pages reserved

bash-5.2# ls -l > /dev/null
load_elf_binary: Dump memory for ls (31):
mmap: start_brk: 700ca000, brk: 700ca000
mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
mmap: bfb18000-bfb3a000 rw-p bffde000 00:00 0 [stack]
do_exit: Dump memory for ls (31):
mmap: start_brk: 700ca000, brk: 700ec000
mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
mmap: 60030000-60032000 rw-p 60030000 00:00 0
mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
mmap: 60160000-6016e000 rw-p 60160000 00:00 0
mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
mmap: 700ca000-700ec000 rwxp 700ca000 00:00 0 [heap]
mmap: bfb18000-bfb3a000 rw-p bffde000 00:00 0 [stack]
bash-5.2# sysrq: Show Memory
Mem-Info:
active_anon:0 inactive_anon:47 isolated_anon:0
  active_file:83 inactive_file:197 isolated_file:0 

  unevictable:0 dirty:0 writeback:0 

  slab_reclaimable:110 slab_unreclaimable:650 

  mapped:219 shmem:0 pagetables:3 

  sec_pagetables:0 bounce:0 

  kernel_misc_reclaimable:0
  free:28206 free_pcp:160 free_cma:0
Node 0 active_anon:0kB inactive_anon:376kB active_file:664kB 
inactive_file:1576kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB mapped:1752kB dirty:0kB writeback:0kB shmem:0kB 
writeback_tmp:0kB kernel_stack:168kB pagetables:24kB
  sec_pagetables:0kB all_unreclaimable? no
DMA free:225648kB boost:0kB min:1952kB low:2440kB high:2928kB 
reserved_highatomic:0KB active_anon:0kB inactive_anon:376kB 
active_file:664kB inactive_file:1576kB unevictable:0kB writepending:0kB 
present:262144kB managed:238752kB mlocked:
0kB bounce:0kB free_pcp:1280kB local_pcp:1280kB free_cma:0kB
lowmem_reserve[]: 0 0 0
DMA: 2*8kB (UE) 0*16kB 1*32kB (U) 1*64kB (M) 2*128kB (UM) 2*256kB (UM) 
1*512kB (U) 1*1024kB (E) 3*2048kB (UME) 3*4096kB (UME) 25*8192kB (M) = 
225648kB
280 total pagecache pages
0 pages in swap cache
Free swap  = 0kB
Total swap = 0kB
32768 pages RAM
0 pages HighMem/MovableOnly
2924 pages reserved

bash-5.2# ls -l > /dev/null
load_elf_binary: Dump memory for ls (33):
mmap: start_brk: 700ca000, brk: 700ca000
mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
mmap: bf89e000-bf8c0000 rw-p bffde000 00:00 0 [stack]
do_exit: Dump memory for ls (33):
mmap: start_brk: 700ca000, brk: 700ca000
mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
mmap: 60030000-60032000 rw-p 60030000 00:00 0
mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
mmap: 60160000-6016e000 rw-p 60160000 00:00 0
mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
mmap: bf89e000-bf8c0000 rw-p bffde000 00:00 0 [stack]
bash-5.2# sysrq: Show Memory
Mem-Info:
active_anon:0 inactive_anon:47 isolated_anon:0
  active_file:196 inactive_file:84 isolated_file:0
  unevictable:0 dirty:0 writeback:0
  slab_reclaimable:110 slab_unreclaimable:650
  mapped:219 shmem:0 pagetables:3
  sec_pagetables:0 bounce:0
  kernel_misc_reclaimable:0
  free:28206 free_pcp:159 free_cma:0
Node 0 active_anon:0kB inactive_anon:376kB active_file:1568kB 
inactive_file:672kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB mapped:1752kB dirty:0kB writeback:0kB shmem:0kB 
writeback_tmp:0kB kernel_stack:176kB pagetables:24kB
  sec_pagetables:0kB all_unreclaimable? no
DMA free:225648kB boost:0kB min:1952kB low:2440kB high:2928kB 
reserved_highatomic:0KB active_anon:0kB inactive_anon:376kB 
active_file:1568kB inactive_file:672kB unevictable:0kB writepending:0kB 
present:262144kB managed:238752kB mlocked:
0kB bounce:0kB free_pcp:1272kB local_pcp:1272kB free_cma:0kB
lowmem_reserve[]: 0 0 0
DMA: 2*8kB (UE) 0*16kB 1*32kB (U) 1*64kB (M) 2*128kB (UM) 2*256kB (UM) 
1*512kB (U) 1*1024kB (E) 3*2048kB (UME) 3*4096kB (UME) 25*8192kB (M) = 
225648kB
280 total pagecache pages
0 pages in swap cache
Free swap  = 0kB
Total swap = 0kB
32768 pages RAM
0 pages HighMem/MovableOnly
2924 pages reserved

JM
Michael Schmitz June 29, 2024, 8:01 a.m. UTC | #18
Hi Andreas,

Am 29.06.2024 um 18:57 schrieb Andreas Schwab:
> On Jun 29 2024, Michael Schmitz wrote:
>
>> No heap space allocated here. I would have expected that to cause an error
>> message from libc ...
>
> No, the heap starts at _end and there is still enough space before the
> next page boundary.

Right - not all that unusual then.

Thanks,

	Michael
Greg Ungerer June 30, 2024, 10:35 p.m. UTC | #19
Hi Michael,

On 28/6/24 17:48, Michael Schmitz wrote:
> Jean-Michel,
> 
> Am 28.06.2024 um 19:24 schrieb Jean-Michel Hautbois:
>>> I forgot to take into account that libraries are loaded only the
>>> binary starts executing. Can you print the same map dump in the exit
>>> syscall code path? That ought to show all loaded libraries at that point.
>>
>> Thanks for your suggestion !
>> I changed it a bit, and I added a call in do_exit() as suggested. When
>> executing ls I get:
>>
>> bash-5.2# ls -l
>> load_elf_binary: Dump memory for ls (31):
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
>> mmap: bfc1e000-bfc40000 rw-p bffde000 00:00 0 [stack]
>>
>> do_exit: Dump memory for ls (31):
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
>> mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
>> mmap: 60030000-60032000 rw-p 60030000 00:00 0
>> mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
>> mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
>> mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
>> mmap: 60160000-6016e000 rw-p 60160000 00:00 0
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
>> mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
>> mmap: 700ca000-700ec000 rwxp 700ca000 00:00 0 [heap]
>> mmap: bfc1e000-bfc40000 rw-p bffde000 00:00 0 [stack]
>>
>> When I call it a second time, I get:
>>
>> bash-5.2# ls -l
>> load_elf_binary: Dump memory for ls (33):
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
>> mmap: bfb5a000-bfb7c000 rw-p bffde000 00:00 0 [stack]
>> do_exit: Dump memory for ls (33):
>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>> mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
>> mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
>> mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
>> mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
>> mmap: 60030000-60032000 rw-p 60030000 00:00 0
>> mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
>> mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
>> mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
>> mmap: 60160000-6016e000 rw-p 60160000 00:00 0
>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>> mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
>> mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
> 
> No heap in this second call. Can you print mm->start_brk and mm->brk please?
> 
> The process memory layout is a little unusual (I would have expected the binary to be mapped before the dynamic libraries, not after). Is that expected on Coldfire, Greg?

I am not entirely sure of the history behind the layouts. But for the M547x family
I have done most MMU work on this is typical. So like this:

# cat /proc/1/maps
60000000-60008000 r-xp 00000000 1f:00 550544     /lib/ld-uClibc-0.9.33.2.so
60008000-6000a000 r--p 00006000 1f:00 550544     /lib/ld-uClibc-0.9.33.2.so
6000a000-6000c000 rw-p 00008000 1f:00 550544     /lib/ld-uClibc-0.9.33.2.so
6000c000-6000e000 rw-p 00000000 00:00 0
60010000-60014000 r-xp 00000000 1f:00 1194384    /lib/libcrypt-0.9.33.2.so
60014000-60016000 r--p 00002000 1f:00 1194384    /lib/libcrypt-0.9.33.2.so
60016000-60018000 rw-p 00004000 1f:00 1194384    /lib/libcrypt-0.9.33.2.so
60018000-60028000 rw-p 00000000 00:00 0
60028000-60080000 r-xp 00000000 1f:00 184160     /lib/libuClibc-0.9.33.2.so
60080000-60082000 r--p 00056000 1f:00 184160     /lib/libuClibc-0.9.33.2.so
60082000-60084000 rw-p 00058000 1f:00 184160     /lib/libuClibc-0.9.33.2.so
60084000-60086000 rw-p 00000000 00:00 0
80000000-80004000 r-xp 00000000 1f:00 1882624    /bin/init
80004000-80006000 r--p 00002000 1f:00 1882624    /bin/init
80006000-80008000 rw-p 00004000 1f:00 1882624    /bin/init
80008000-8000a000 rwxp 00000000 00:00 0          [heap]
bfdba000-bfddc000 rwxp 00000000 00:00 0          [stack]

The 0x60000000 library addresses are due to mmaping - and that is based on the
definition of TASK_UNMAPPED_BASE for ColdFire, from arch/m68k/include/asm/processor.h

     /* This decides where the kernel will search for a free chunk of vm
      * space during mmap's.
      */
     #ifdef CONFIG_MMU
     #if defined(CONFIG_COLDFIRE)
     #define TASK_UNMAPPED_BASE      0x60000000UL


The application address range of 0x80000000 are baked in at link time:

$ m68k-linux-objdump --headers bin/init

init:     file format elf32-m68k

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
   0 .interp       0000000d  80000114  80000114  00000114  2**0
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   1 .hash         00000170  80000124  80000124  00000124  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   2 .gnu.hash     000001b4  80000294  80000294  00000294  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   3 .dynsym       00000350  80000448  80000448  00000448  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   4 .dynstr       00000174  80000798  80000798  00000798  2**0
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   5 .rela.dyn     00000018  8000090c  8000090c  0000090c  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   6 .rela.plt     00000240  80000924  80000924  00000924  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   7 .init         00000014  80000b64  80000b64  00000b64  2**1
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
   8 .plt          00000498  80000b78  80000b78  00000b78  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
   9 .text         00001258  80001010  80001010  00001010  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
  10 .fini         0000000e  80002268  80002268  00002268  2**1
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
  11 .rodata       00000257  80002276  80002276  00002276  2**0
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  12 .eh_frame     00000004  800024d0  800024d0  000024d0  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  13 .ctors        00000008  80005f30  80005f30  00003f30  2**2
                   CONTENTS, ALLOC, LOAD, DATA
  14 .dtors        00000008  80005f38  80005f38  00003f38  2**2
                   CONTENTS, ALLOC, LOAD, DATA
  15 .dynamic      000000c0  80005f40  80005f40  00003f40  2**2
                   CONTENTS, ALLOC, LOAD, DATA
  16 .got          000000cc  80006000  80006000  00004000  2**2
                   CONTENTS, ALLOC, LOAD, DATA
  17 .data         00000008  800060cc  800060cc  000040cc  2**2
                   CONTENTS, ALLOC, LOAD, DATA
  18 .bss          0000002c  800060d4  800060d4  000040d4  2**2
                   ALLOC


I am not sure why JM's link has applications linked at 0x70000000.
Is that a glibc thing?  My examples above are all based on uClibc.

While we are talking about link addresses, JM, can you tell me what
your kernel is linked at?  For me it is from a base near 0 (well actually
128k offset, but there is some meaning to that address) which matches
the physical DRAM which starts at address 0:

$ m68k-linux-objdump --headers linux/vmlinux

linux/vmlinux:     file format elf32-m68k

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
   0 .text         002e1800  00020000  00020000  00002000  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
   1 .rodata       0003f398  00302000  00302000  002e4000  2**4
                   CONTENTS, ALLOC, LOAD, DATA
...


I know the 54418 typically has DRAM at a different physical offset
(I think it is 0x40000000?), so wondering if the VM layout was
adjusted in some way to cater for that difference?

Regards
Greg





> Cheers,
> 
>      Michael
> 
> 
>> mmap: bfb5a000-bfb7c000 rw-p bffde000 00:00 0 [stack]
>>
>> The first call generates the "ls" output, not the second one.
>> The helper looks like:
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 83b4682ec85c..14d861e9cba2 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -76,6 +76,87 @@ int mmap_rnd_compat_bits __read_mostly =
>> CONFIG_ARCH_MMAP_RND_COMPAT_BITS;
>>  static bool ignore_rlimit_data;
>>  core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
>>
>> +int dump_memory_map(struct task_struct *task)
>> +{
>> +    struct mm_struct *mm = task->mm;
>> +    struct vm_area_struct *vma;
>> +    struct file *file;
>> +    struct path *path;
>> +    char *buf;
>> +    char *pathname;
>> +
>> +       if (!mm) {
>> +               return -ENOMEM;
>> +       }
>> +
>> +       MA_STATE(mas, &mm->mm_mt, 0, -1);
>> +    // Acquire the read lock for mmap_lock
>> +    down_read(&mm->mmap_lock);
>> +       mas_lock(&mas);
>> +       for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas,
>> ULONG_MAX)) {
>> +               char perms[5] = "---p"; // Default permissions
>> +               // Set permissions based on vm_flags
>> +               if (vma->vm_flags & VM_READ) perms[0] = 'r';
>> +               if (vma->vm_flags & VM_WRITE) perms[1] = 'w';
>> +               if (vma->vm_flags & VM_EXEC) perms[2] = 'x';
>> +               if (vma->vm_flags & VM_MAYSHARE) perms[3] = 's';
>> +
>> +               if (vma->vm_file) { // If there's an associated file
>> +                       buf = (char *)__get_free_page(GFP_KERNEL);
>> +                       if (!buf) {
>> +                               continue; // Handle memory allocation
>> failure
>> +                       }
>> +
>> +                       file = vma->vm_file;
>> +                       path = &file->f_path;
>> +                       pathname = d_path(path, buf, PAGE_SIZE);
>> +                       if (IS_ERR(pathname)) {
>> +                               pathname = NULL;
>> +                       }
>> +
>> +                       // Print memory area information with file path
>> +                       pr_info("%08lx-%08lx %s %08lx %02x:%02x %lu %s\n",
>> +                               vma->vm_start, vma->vm_end,
>> +                               perms,
>> +                               vma->vm_pgoff << PAGE_SHIFT,
>> +                               MAJOR(file_inode(file)->i_rdev),
>> +                               MINOR(file_inode(file)->i_rdev),
>> +                               file_inode(file)->i_ino,
>> +                               pathname ? pathname : "");
>> +
>> +                       free_page((unsigned long)buf);
>> +               } else {
>> +                       char *special_area_name = NULL;
>> +
>> +                       // Check for heap
>> +                       if (vma->vm_end > mm->start_brk && vma->vm_start
>> < mm->brk) {
>> +                               special_area_name = "[heap]";
>> +                       }
>> +                       // Check for stack
>> +                       else if (vma->vm_start <= mm->start_stack &&
>> vma->vm_end >= mm->start_stack) {
>> +                               special_area_name = "[stack]";
>> +                       }
>> +                       // Check for vdso
>> +                       else if (vma->vm_flags & VM_EXEC &&
>> vma->vm_flags & VM_READ && !vma->vm_file) {
>> +                               special_area_name = "[vdso]";
>> +                       }
>> +
>> +                       // Print memory area information without file path
>> +                       pr_info("%08lx-%08lx %s %08lx 00:00 0 %s\n",
>> +                               vma->vm_start, vma->vm_end,
>> +                               perms,
>> +                               vma->vm_pgoff << PAGE_SHIFT,
>> +                               special_area_name ? special_area_name :
>> "    ");
>> +               }
>> +       }
>> +       mas_unlock(&mas);
>> +    // Release the read lock for mmap_lock
>> +    up_read(&mm->mmap_lock);
>> +
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL(dump_memory_map);
>>
>>
>> Thanks,
>> JM
Jean-Michel Hautbois July 1, 2024, 5:47 a.m. UTC | #20
Hi Greg, Michael,

On 01/07/2024 00:35, Greg Ungerer wrote:
> Hi Michael,
> 
> On 28/6/24 17:48, Michael Schmitz wrote:
>> Jean-Michel,
>>
>> Am 28.06.2024 um 19:24 schrieb Jean-Michel Hautbois:
>>>> I forgot to take into account that libraries are loaded only the
>>>> binary starts executing. Can you print the same map dump in the exit
>>>> syscall code path? That ought to show all loaded libraries at that 
>>>> point.
>>>
>>> Thanks for your suggestion !
>>> I changed it a bit, and I added a call in do_exit() as suggested. When
>>> executing ls I get:
>>>
>>> bash-5.2# ls -l
>>> load_elf_binary: Dump memory for ls (31):
>>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>>> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
>>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>>> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
>>> mmap: bfc1e000-bfc40000 rw-p bffde000 00:00 0 [stack]
>>>
>>> do_exit: Dump memory for ls (31):
>>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>>> mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
>>> mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
>>> mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
>>> mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
>>> mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
>>> mmap: 60030000-60032000 rw-p 60030000 00:00 0
>>> mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
>>> mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
>>> mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
>>> mmap: 60160000-6016e000 rw-p 60160000 00:00 0
>>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>>> mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
>>> mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
>>> mmap: 700ca000-700ec000 rwxp 700ca000 00:00 0 [heap]
>>> mmap: bfc1e000-bfc40000 rw-p bffde000 00:00 0 [stack]
>>>
>>> When I call it a second time, I get:
>>>
>>> bash-5.2# ls -l
>>> load_elf_binary: Dump memory for ls (33):
>>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>>> mmap: 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
>>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>>> mmap: 700c2000-700ca000 rw-p 000c0000 00:00 28 /bin/busybox
>>> mmap: bfb5a000-bfb7c000 rw-p bffde000 00:00 0 [stack]
>>> do_exit: Dump memory for ls (33):
>>> mmap: 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
>>> mmap: 6001e000-60020000 r--p 0001c000 00:00 178 /lib/ld.so.1
>>> mmap: 60020000-60022000 rw-p 0001e000 00:00 178 /lib/ld.so.1
>>> mmap: 60022000-6002c000 r-xp 00000000 00:00 193 /lib/libresolv.so.2
>>> mmap: 6002c000-6002e000 r--p 00008000 00:00 193 /lib/libresolv.so.2
>>> mmap: 6002e000-60030000 rw-p 0000a000 00:00 193 /lib/libresolv.so.2
>>> mmap: 60030000-60032000 rw-p 60030000 00:00 0
>>> mmap: 60032000-6015a000 r-xp 00000000 00:00 185 /lib/libc.so.6
>>> mmap: 6015a000-6015c000 r--p 00126000 00:00 185 /lib/libc.so.6
>>> mmap: 6015c000-60160000 rw-p 00128000 00:00 185 /lib/libc.so.6
>>> mmap: 60160000-6016e000 rw-p 60160000 00:00 0
>>> mmap: 70000000-700c2000 r-xp 00000000 00:00 28 /bin/busybox
>>> mmap: 700c2000-700c4000 r--p 000c0000 00:00 28 /bin/busybox
>>> mmap: 700c4000-700ca000 rw-p 000c2000 00:00 28 /bin/busybox
>>
>> No heap in this second call. Can you print mm->start_brk and mm->brk 
>> please?
>>
>> The process memory layout is a little unusual (I would have expected 
>> the binary to be mapped before the dynamic libraries, not after). Is 
>> that expected on Coldfire, Greg?
> 
> I am not entirely sure of the history behind the layouts. But for the 
> M547x family
> I have done most MMU work on this is typical. So like this:
> 
> # cat /proc/1/maps
> 60000000-60008000 r-xp 00000000 1f:00 550544     /lib/ld-uClibc-0.9.33.2.so
> 60008000-6000a000 r--p 00006000 1f:00 550544     /lib/ld-uClibc-0.9.33.2.so
> 6000a000-6000c000 rw-p 00008000 1f:00 550544     /lib/ld-uClibc-0.9.33.2.so
> 6000c000-6000e000 rw-p 00000000 00:00 0
> 60010000-60014000 r-xp 00000000 1f:00 1194384    /lib/libcrypt-0.9.33.2.so
> 60014000-60016000 r--p 00002000 1f:00 1194384    /lib/libcrypt-0.9.33.2.so
> 60016000-60018000 rw-p 00004000 1f:00 1194384    /lib/libcrypt-0.9.33.2.so
> 60018000-60028000 rw-p 00000000 00:00 0
> 60028000-60080000 r-xp 00000000 1f:00 184160     /lib/libuClibc-0.9.33.2.so
> 60080000-60082000 r--p 00056000 1f:00 184160     /lib/libuClibc-0.9.33.2.so
> 60082000-60084000 rw-p 00058000 1f:00 184160     /lib/libuClibc-0.9.33.2.so
> 60084000-60086000 rw-p 00000000 00:00 0
> 80000000-80004000 r-xp 00000000 1f:00 1882624    /bin/init
> 80004000-80006000 r--p 00002000 1f:00 1882624    /bin/init
> 80006000-80008000 rw-p 00004000 1f:00 1882624    /bin/init
> 80008000-8000a000 rwxp 00000000 00:00 0          [heap]
> bfdba000-bfddc000 rwxp 00000000 00:00 0          [stack]
> 
> The 0x60000000 library addresses are due to mmaping - and that is based 
> on the
> definition of TASK_UNMAPPED_BASE for ColdFire, from 
> arch/m68k/include/asm/processor.h
> 
>      /* This decides where the kernel will search for a free chunk of vm
>       * space during mmap's.
>       */
>      #ifdef CONFIG_MMU
>      #if defined(CONFIG_COLDFIRE)
>      #define TASK_UNMAPPED_BASE      0x60000000UL
> 
> 
> The application address range of 0x80000000 are baked in at link time:
> 
> $ m68k-linux-objdump --headers bin/init
> 
> init:     file format elf32-m68k
> 
> Sections:
> Idx Name          Size      VMA       LMA       File off  Algn
>    0 .interp       0000000d  80000114  80000114  00000114  2**0
>                    CONTENTS, ALLOC, LOAD, READONLY, DATA
>    1 .hash         00000170  80000124  80000124  00000124  2**2
>                    CONTENTS, ALLOC, LOAD, READONLY, DATA
>    2 .gnu.hash     000001b4  80000294  80000294  00000294  2**2
>                    CONTENTS, ALLOC, LOAD, READONLY, DATA
>    3 .dynsym       00000350  80000448  80000448  00000448  2**2
>                    CONTENTS, ALLOC, LOAD, READONLY, DATA
>    4 .dynstr       00000174  80000798  80000798  00000798  2**0
>                    CONTENTS, ALLOC, LOAD, READONLY, DATA
>    5 .rela.dyn     00000018  8000090c  8000090c  0000090c  2**2
>                    CONTENTS, ALLOC, LOAD, READONLY, DATA
>    6 .rela.plt     00000240  80000924  80000924  00000924  2**2
>                    CONTENTS, ALLOC, LOAD, READONLY, DATA
>    7 .init         00000014  80000b64  80000b64  00000b64  2**1
>                    CONTENTS, ALLOC, LOAD, READONLY, CODE
>    8 .plt          00000498  80000b78  80000b78  00000b78  2**2
>                    CONTENTS, ALLOC, LOAD, READONLY, CODE
>    9 .text         00001258  80001010  80001010  00001010  2**2
>                    CONTENTS, ALLOC, LOAD, READONLY, CODE
>   10 .fini         0000000e  80002268  80002268  00002268  2**1
>                    CONTENTS, ALLOC, LOAD, READONLY, CODE
>   11 .rodata       00000257  80002276  80002276  00002276  2**0
>                    CONTENTS, ALLOC, LOAD, READONLY, DATA
>   12 .eh_frame     00000004  800024d0  800024d0  000024d0  2**2
>                    CONTENTS, ALLOC, LOAD, READONLY, DATA
>   13 .ctors        00000008  80005f30  80005f30  00003f30  2**2
>                    CONTENTS, ALLOC, LOAD, DATA
>   14 .dtors        00000008  80005f38  80005f38  00003f38  2**2
>                    CONTENTS, ALLOC, LOAD, DATA
>   15 .dynamic      000000c0  80005f40  80005f40  00003f40  2**2
>                    CONTENTS, ALLOC, LOAD, DATA
>   16 .got          000000cc  80006000  80006000  00004000  2**2
>                    CONTENTS, ALLOC, LOAD, DATA
>   17 .data         00000008  800060cc  800060cc  000040cc  2**2
>                    CONTENTS, ALLOC, LOAD, DATA
>   18 .bss          0000002c  800060d4  800060d4  000040d4  2**2
>                    ALLOC

In my case I have:
m68k-linux-objdump --headers ../buildroot/output/target/bin/bash

../buildroot/output/target/bin/bash:     file format elf32-m68k

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
   0 .interp       0000000d  00000154  00000154  00000154  2**0
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   1 .note.ABI-tag 00000020  00000164  00000164  00000164  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   2 .hash         00002f74  00000184  00000184  00000184  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   3 .gnu.hash     00003180  000030f8  000030f8  000030f8  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   4 .dynsym       00007d40  00006278  00006278  00006278  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   5 .dynstr       0000771b  0000dfb8  0000dfb8  0000dfb8  2**0
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   6 .gnu.version  00000fa8  000156d4  000156d4  000156d4  2**1
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   7 .gnu.version_r 000000a0  0001667c  0001667c  0001667c  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   8 .rela.dyn     0000a788  0001671c  0001671c  0001671c  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   9 .rela.plt     00003624  00020ea4  00020ea4  00020ea4  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  10 .init         00000024  000244c8  000244c8  000244c8  2**1
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
  11 .plt          00006c60  000244ec  000244ec  000244ec  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
  12 .text         0006a008  0002b14c  0002b14c  0002b14c  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
  13 .fini         00000018  00095154  00095154  00095154  2**1
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
  14 .rodata       00016302  0009516c  0009516c  0009516c  2**1
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  15 .eh_frame_hdr 0000002c  000ab470  000ab470  000ab470  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  16 .eh_frame     000000e8  000ab49c  000ab49c  000ab49c  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  17 .init_array   00000004  000ad2a8  000ad2a8  000ad2a8  2**1
                   CONTENTS, ALLOC, LOAD, DATA
  18 .fini_array   00000004  000ad2ac  000ad2ac  000ad2ac  2**1
                   CONTENTS, ALLOC, LOAD, DATA
  19 .data.rel.ro  00000c40  000ad2b0  000ad2b0  000ad2b0  2**1
                   CONTENTS, ALLOC, LOAD, DATA
  20 .dynamic      00000110  000adef0  000adef0  000adef0  2**2
                   CONTENTS, ALLOC, LOAD, DATA
  21 .got          000039d8  000ae000  000ae000  000ae000  2**2
                   CONTENTS, ALLOC, LOAD, DATA
  22 .data         0000198c  000b19d8  000b19d8  000b19d8  2**2
                   CONTENTS, ALLOC, LOAD, DATA
  23 .bss          000090b0  000b3364  000b3364  000b3364  2**2
                   ALLOC


> 
> I am not sure why JM's link has applications linked at 0x70000000.
> Is that a glibc thing?  My examples above are all based on uClibc.
> 
> While we are talking about link addresses, JM, can you tell me what
> your kernel is linked at?  For me it is from a base near 0 (well actually
> 128k offset, but there is some meaning to that address) which matches
> the physical DRAM which starts at address 0:
> 
> $ m68k-linux-objdump --headers linux/vmlinux
> 
> linux/vmlinux:     file format elf32-m68k
> 
> Sections:
> Idx Name          Size      VMA       LMA       File off  Algn
>    0 .text         002e1800  00020000  00020000  00002000  2**2
>                    CONTENTS, ALLOC, LOAD, READONLY, CODE
>    1 .rodata       0003f398  00302000  00302000  002e4000  2**4
>                    CONTENTS, ALLOC, LOAD, DATA
> ...
> 
> 
> I know the 54418 typically has DRAM at a different physical offset
> (I think it is 0x40000000?), so wondering if the VM layout was
> adjusted in some way to cater for that difference?

The SDRAM is mapped at 0x40000000 yes.
In my case it is linked like this:
m68k-linux-objdump --headers vmlinux

vmlinux:     file format elf32-m68k

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
   0 .text         00377ad0  41002000  41002000  00002000  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
   1 .rodata       000e95c0  4137a000  4137a000  0037a000  2**4
                   CONTENTS, ALLOC, LOAD, DATA
   2 __ksymtab     00009e04  414635c0  414635c0  004635c0  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   3 __ksymtab_gpl 000079e0  4146d3c4  4146d3c4  0046d3c4  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   4 __ksymtab_strings 0001a88b  41474da4  41474da4  00474da4  2**0
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   5 __param       000004d8  4148f630  4148f630  0048f630  2**1
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   6 __modver      000000aa  4148fb08  4148fb08  0048fb08  2**1
                   CONTENTS, ALLOC, LOAD, DATA
   7 .notes        00000054  4148fbb4  4148fbb4  0048fbb4  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   8 .data         00091b70  41490000  41490000  00490000  2**4
                   CONTENTS, ALLOC, LOAD, DATA
   9 __ex_table    000014e8  41521b70  41521b70  00521b70  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
  10 .init.text    000121f4  41524000  41524000  00524000  2**1
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
  11 .init.data    0000275c  415361f4  415361f4  005361f4  2**1
                   CONTENTS, ALLOC, LOAD, DATA
  12 .data..percpu 00000000  4153a000  4153a248  0053a248  2**0
                   CONTENTS, ALLOC, LOAD, DATA
  13 .m68k_fixup   00000248  4153a000  4153a000  0053a000  2**0
                   CONTENTS, ALLOC, LOAD, DATA
  14 .init.data    00001db8  4153a248  4153a248  0053a248  2**0
                   ALLOC
  15 .bss          00035098  4153c000  4153c000  0053a248  2**4
                   ALLOC
  16 .comment      00000035  00000000  00000000  0053a248  2**0
                   CONTENTS, READONLY

Thanks,
JM

> Regards
> Greg
> 
> 
> 
> 
> 
>> Cheers,
>>
>>      Michael
>>
>>
>>> mmap: bfb5a000-bfb7c000 rw-p bffde000 00:00 0 [stack]
>>>
>>> The first call generates the "ls" output, not the second one.
>>> The helper looks like:
>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>> index 83b4682ec85c..14d861e9cba2 100644
>>> --- a/mm/mmap.c
>>> +++ b/mm/mmap.c
>>> @@ -76,6 +76,87 @@ int mmap_rnd_compat_bits __read_mostly =
>>> CONFIG_ARCH_MMAP_RND_COMPAT_BITS;
>>>  static bool ignore_rlimit_data;
>>>  core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
>>>
>>> +int dump_memory_map(struct task_struct *task)
>>> +{
>>> +    struct mm_struct *mm = task->mm;
>>> +    struct vm_area_struct *vma;
>>> +    struct file *file;
>>> +    struct path *path;
>>> +    char *buf;
>>> +    char *pathname;
>>> +
>>> +       if (!mm) {
>>> +               return -ENOMEM;
>>> +       }
>>> +
>>> +       MA_STATE(mas, &mm->mm_mt, 0, -1);
>>> +    // Acquire the read lock for mmap_lock
>>> +    down_read(&mm->mmap_lock);
>>> +       mas_lock(&mas);
>>> +       for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas,
>>> ULONG_MAX)) {
>>> +               char perms[5] = "---p"; // Default permissions
>>> +               // Set permissions based on vm_flags
>>> +               if (vma->vm_flags & VM_READ) perms[0] = 'r';
>>> +               if (vma->vm_flags & VM_WRITE) perms[1] = 'w';
>>> +               if (vma->vm_flags & VM_EXEC) perms[2] = 'x';
>>> +               if (vma->vm_flags & VM_MAYSHARE) perms[3] = 's';
>>> +
>>> +               if (vma->vm_file) { // If there's an associated file
>>> +                       buf = (char *)__get_free_page(GFP_KERNEL);
>>> +                       if (!buf) {
>>> +                               continue; // Handle memory allocation
>>> failure
>>> +                       }
>>> +
>>> +                       file = vma->vm_file;
>>> +                       path = &file->f_path;
>>> +                       pathname = d_path(path, buf, PAGE_SIZE);
>>> +                       if (IS_ERR(pathname)) {
>>> +                               pathname = NULL;
>>> +                       }
>>> +
>>> +                       // Print memory area information with file path
>>> +                       pr_info("%08lx-%08lx %s %08lx %02x:%02x %lu 
>>> %s\n",
>>> +                               vma->vm_start, vma->vm_end,
>>> +                               perms,
>>> +                               vma->vm_pgoff << PAGE_SHIFT,
>>> +                               MAJOR(file_inode(file)->i_rdev),
>>> +                               MINOR(file_inode(file)->i_rdev),
>>> +                               file_inode(file)->i_ino,
>>> +                               pathname ? pathname : "");
>>> +
>>> +                       free_page((unsigned long)buf);
>>> +               } else {
>>> +                       char *special_area_name = NULL;
>>> +
>>> +                       // Check for heap
>>> +                       if (vma->vm_end > mm->start_brk && vma->vm_start
>>> < mm->brk) {
>>> +                               special_area_name = "[heap]";
>>> +                       }
>>> +                       // Check for stack
>>> +                       else if (vma->vm_start <= mm->start_stack &&
>>> vma->vm_end >= mm->start_stack) {
>>> +                               special_area_name = "[stack]";
>>> +                       }
>>> +                       // Check for vdso
>>> +                       else if (vma->vm_flags & VM_EXEC &&
>>> vma->vm_flags & VM_READ && !vma->vm_file) {
>>> +                               special_area_name = "[vdso]";
>>> +                       }
>>> +
>>> +                       // Print memory area information without file 
>>> path
>>> +                       pr_info("%08lx-%08lx %s %08lx 00:00 0 %s\n",
>>> +                               vma->vm_start, vma->vm_end,
>>> +                               perms,
>>> +                               vma->vm_pgoff << PAGE_SHIFT,
>>> +                               special_area_name ? special_area_name :
>>> "    ");
>>> +               }
>>> +       }
>>> +       mas_unlock(&mas);
>>> +    // Release the read lock for mmap_lock
>>> +    up_read(&mm->mmap_lock);
>>> +
>>> +    return 0;
>>> +}
>>> +EXPORT_SYMBOL(dump_memory_map);
>>>
>>>
>>> Thanks,
>>> JM
Andreas Schwab July 1, 2024, 8:01 a.m. UTC | #21
On Jul 01 2024, Greg Ungerer wrote:

> I am not sure why JM's link has applications linked at 0x70000000.
> Is that a glibc thing?  My examples above are all based on uClibc.

No, it is defined by the linker (TEXT_START_ADDR).
diff mbox series

Patch

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index a1f46919934c..d01bd3276204 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1474,6 +1474,7 @@  static bool ubifs_dirty_folio(struct address_space 
*mapping,
          * An attempt to dirty a page without budgeting for it - should not
          * happen.
          */
+       VM_BUG_ON_FOLIO(ret != false, folio);
         ubifs_assert(c, ret == false);
         return ret;