Message ID | 20211111191800.21281-1-david@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v1] proc/vmcore: fix clearing user buffer by properly using clear_user() | expand |
On 11/11/21 at 08:18pm, David Hildenbrand wrote: > To clear a user buffer we cannot simply use memset, we have to use > clear_user(). Using a kernel config based on rawhide Fedora and a > virtio-mem device that registers a vmcore_cb, I can easily trigger: > > [ 11.327580] systemd[1]: Starting Kdump Vmcore Save Service... > [ 11.339697] kdump[420]: Kdump is using the default log level(3). > [ 11.370964] kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ > [ 11.373997] kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ > [ 11.385357] kdump[465]: saving vmcore-dmesg.txt complete > [ 11.386722] kdump[467]: saving vmcore > [ 16.531275] BUG: unable to handle page fault for address: 00007f2374e01000 > [ 16.531705] #PF: supervisor write access in kernel mode > [ 16.532037] #PF: error_code(0x0003) - permissions violation > [ 16.532396] PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867 > [ 16.532872] Oops: 0003 [#1] PREEMPT SMP NOPTI > [ 16.533154] CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6 > [ 16.533513] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014 > [ 16.534198] RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86 > [ 16.534552] Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81 > [ 16.535670] RSP: 0018:ffffc9000073be08 EFLAGS: 00010212 > [ 16.535998] RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000 > [ 16.536441] RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008 > [ 16.536878] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50 > [ 16.537315] R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000 > [ 16.537755] R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8 > [ 16.538200] FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000 > [ 16.538696] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 16.539055] CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0 > [ 16.539510] Call Trace: > [ 16.539679] <TASK> > [ 16.539828] read_vmcore+0x236/0x2c0 > [ 16.540063] ? enqueue_hrtimer+0x2f/0x80 > [ 16.540323] ? inode_security+0x22/0x60 > [ 16.540572] proc_reg_read+0x55/0xa0 > [ 16.540807] vfs_read+0x95/0x190 > [ 16.541022] ksys_read+0x4f/0xc0 > [ 16.541238] do_syscall_64+0x3b/0x90 > [ 16.541475] entry_SYSCALL_64_after_hwframe+0x44/0xae > > To fix, properly use clear_user() when required. Looks a great fix to me, thanks for fixing this. Check the code, clear_user invokes access_ok to do check, then call memset(). It's unclear to me how the bug is triggered, could you please tell more so that I can learn? > > Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages") > Cc: Dave Young <dyoung@redhat.com> > Cc: Baoquan He <bhe@redhat.com> > Cc: Vivek Goyal <vgoyal@redhat.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Philipp Rudo <prudo@redhat.com> > Cc: kexec@lists.infradead.org > Cc: linux-mm@kvack.org > Cc: linux-fsdevel@vger.kernel.org > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > fs/proc/vmcore.c | 20 ++++++++++++-------- > 1 file changed, 12 insertions(+), 8 deletions(-) > > diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c > index 30a3b66f475a..509f85148fee 100644 > --- a/fs/proc/vmcore.c > +++ b/fs/proc/vmcore.c > @@ -154,9 +154,13 @@ ssize_t read_from_oldmem(char *buf, size_t count, > nr_bytes = count; > > /* If pfn is not ram, return zeros for sparse dump files */ > - if (!pfn_is_ram(pfn)) > - memset(buf, 0, nr_bytes); > - else { > + if (!pfn_is_ram(pfn)) { > + tmp = 0; > + if (!userbuf) > + memset(buf, 0, nr_bytes); > + else if (clear_user(buf, nr_bytes)) > + tmp = -EFAULT; > + } else { > if (encrypted) > tmp = copy_oldmem_page_encrypted(pfn, buf, > nr_bytes, > @@ -165,12 +169,12 @@ ssize_t read_from_oldmem(char *buf, size_t count, > else > tmp = copy_oldmem_page(pfn, buf, nr_bytes, > offset, userbuf); > - > - if (tmp < 0) { > - up_read(&vmcore_cb_rwsem); > - return tmp; > - } > } > + if (tmp < 0) { > + up_read(&vmcore_cb_rwsem); > + return tmp; > + } > + > *ppos += nr_bytes; > count -= nr_bytes; > buf += nr_bytes; > > base-commit: debe436e77c72fcee804fb867f275e6d31aa999c > -- > 2.31.1 >
On 12.11.21 08:01, Baoquan He wrote: > On 11/11/21 at 08:18pm, David Hildenbrand wrote: >> To clear a user buffer we cannot simply use memset, we have to use >> clear_user(). Using a kernel config based on rawhide Fedora and a >> virtio-mem device that registers a vmcore_cb, I can easily trigger: >> >> [ 11.327580] systemd[1]: Starting Kdump Vmcore Save Service... >> [ 11.339697] kdump[420]: Kdump is using the default log level(3). >> [ 11.370964] kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ >> [ 11.373997] kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ >> [ 11.385357] kdump[465]: saving vmcore-dmesg.txt complete >> [ 11.386722] kdump[467]: saving vmcore >> [ 16.531275] BUG: unable to handle page fault for address: 00007f2374e01000 >> [ 16.531705] #PF: supervisor write access in kernel mode >> [ 16.532037] #PF: error_code(0x0003) - permissions violation >> [ 16.532396] PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867 >> [ 16.532872] Oops: 0003 [#1] PREEMPT SMP NOPTI >> [ 16.533154] CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6 >> [ 16.533513] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014 >> [ 16.534198] RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86 >> [ 16.534552] Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81 >> [ 16.535670] RSP: 0018:ffffc9000073be08 EFLAGS: 00010212 >> [ 16.535998] RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000 >> [ 16.536441] RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008 >> [ 16.536878] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50 >> [ 16.537315] R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000 >> [ 16.537755] R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8 >> [ 16.538200] FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000 >> [ 16.538696] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 16.539055] CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0 >> [ 16.539510] Call Trace: >> [ 16.539679] <TASK> >> [ 16.539828] read_vmcore+0x236/0x2c0 >> [ 16.540063] ? enqueue_hrtimer+0x2f/0x80 >> [ 16.540323] ? inode_security+0x22/0x60 >> [ 16.540572] proc_reg_read+0x55/0xa0 >> [ 16.540807] vfs_read+0x95/0x190 >> [ 16.541022] ksys_read+0x4f/0xc0 >> [ 16.541238] do_syscall_64+0x3b/0x90 >> [ 16.541475] entry_SYSCALL_64_after_hwframe+0x44/0xae >> >> To fix, properly use clear_user() when required. > > Looks a great fix to me, thanks for fixing this. > > Check the code, clear_user invokes access_ok to do check, then call > memset(). It's unclear to me how the bug is triggered, could you > please tell more so that I can learn? > TBH, I was testing virtio-mem+vmcore before without running into this issue, but after I retested with upstream in a different setup (different kernel config but eventually also different CPU features), I ran into this. Note that you were looking at the generic __clear_user() implementation, the x86-64 variant is different, see arch/x86/lib/usercopy_64.c I can spot that it triggers stac()/clac() (X86_SMAP): https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention "that allows supervisor mode programs to optionally set user-space memory mappings so that access to those mappings from supervisor mode will cause a trap. This makes it harder for malicious programs to "trick" the kernel into using instructions or data from a user-space program" Yes, that's most probably it :)
On 11/12/21 at 09:16am, David Hildenbrand wrote: > On 12.11.21 08:01, Baoquan He wrote: > > On 11/11/21 at 08:18pm, David Hildenbrand wrote: > >> To clear a user buffer we cannot simply use memset, we have to use > >> clear_user(). Using a kernel config based on rawhide Fedora and a > >> virtio-mem device that registers a vmcore_cb, I can easily trigger: > >> > >> [ 11.327580] systemd[1]: Starting Kdump Vmcore Save Service... > >> [ 11.339697] kdump[420]: Kdump is using the default log level(3). > >> [ 11.370964] kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ > >> [ 11.373997] kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ > >> [ 11.385357] kdump[465]: saving vmcore-dmesg.txt complete > >> [ 11.386722] kdump[467]: saving vmcore > >> [ 16.531275] BUG: unable to handle page fault for address: 00007f2374e01000 > >> [ 16.531705] #PF: supervisor write access in kernel mode > >> [ 16.532037] #PF: error_code(0x0003) - permissions violation > >> [ 16.532396] PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867 > >> [ 16.532872] Oops: 0003 [#1] PREEMPT SMP NOPTI > >> [ 16.533154] CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6 > >> [ 16.533513] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014 > >> [ 16.534198] RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86 > >> [ 16.534552] Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81 > >> [ 16.535670] RSP: 0018:ffffc9000073be08 EFLAGS: 00010212 > >> [ 16.535998] RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000 > >> [ 16.536441] RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008 > >> [ 16.536878] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50 > >> [ 16.537315] R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000 > >> [ 16.537755] R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8 > >> [ 16.538200] FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000 > >> [ 16.538696] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >> [ 16.539055] CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0 > >> [ 16.539510] Call Trace: > >> [ 16.539679] <TASK> > >> [ 16.539828] read_vmcore+0x236/0x2c0 > >> [ 16.540063] ? enqueue_hrtimer+0x2f/0x80 > >> [ 16.540323] ? inode_security+0x22/0x60 > >> [ 16.540572] proc_reg_read+0x55/0xa0 > >> [ 16.540807] vfs_read+0x95/0x190 > >> [ 16.541022] ksys_read+0x4f/0xc0 > >> [ 16.541238] do_syscall_64+0x3b/0x90 > >> [ 16.541475] entry_SYSCALL_64_after_hwframe+0x44/0xae > >> > >> To fix, properly use clear_user() when required. > > > > Looks a great fix to me, thanks for fixing this. > > > > Check the code, clear_user invokes access_ok to do check, then call > > memset(). It's unclear to me how the bug is triggered, could you > > please tell more so that I can learn? > > > TBH, I was testing virtio-mem+vmcore before without running into this > issue, but after I retested with upstream in a different setup > (different kernel config but eventually also different CPU features), I > ran into this. > > > Note that you were looking at the generic __clear_user() implementation, > the x86-64 variant is different, see arch/x86/lib/usercopy_64.c > > I can spot that it triggers stac()/clac() (X86_SMAP): > https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention > > "that allows supervisor mode programs to optionally set user-space > memory mappings so that access to those mappings from supervisor mode > will cause a trap. This makes it harder for malicious programs to > "trick" the kernel into using instructions or data from a user-space > program" OK, probably. I thought it's triggered in access_ok(), and tried to figure out why. But seems we should do something to check this in access_ok(), otherwise the logic of clear_user/_clear_user is not so reasonable. Anyway, I have learned it, thanks a lot for digging it out. By the way, I can't open above wiki article, found below commit from hpa. Maybe we can add some into log to tell this, not strong opinin, leave it to you. For this patch, looks good to me. Acked-by: Baoquan He <bhe@redhat.com> ~~~~~~~~~ commit 63bcff2a307b9bcc712a8251eb27df8b2e117967 Author: H. Peter Anvin <hpa@linux.intel.com> Date: Fri Sep 21 12:43:12 2012 -0700 x86, smap: Add STAC and CLAC instructions to control user space access When Supervisor Mode Access Prevention (SMAP) is enabled, access to userspace from the kernel is controlled by the AC flag. To make the performance of manipulating that flag acceptable, there are two new instructions, STAC and CLAC, to set and clear it. This patch adds those instructions, via alternative(), when the SMAP feature is enabled. It also adds X86_EFLAGS_AC unconditionally to the SYSCALL entry mask; there is simply no reason to make that one conditional. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1348256595-29119-9-git-send-email-hpa@linux.intel.com
> > "that allows supervisor mode programs to optionally set user-space > > memory mappings so that access to those mappings from supervisor mode > > will cause a trap. This makes it harder for malicious programs to > > "trick" the kernel into using instructions or data from a user-space > > program" > > OK, probably. I thought it's triggered in access_ok(), and tried to > figure out why. But seems we should do something to check this in > access_ok(), otherwise the logic of clear_user/_clear_user is not so > reasonable. Anyway, I have learned it, thanks a lot for digging it out. > > By the way, I can't open above wiki article, found below commit from > hpa. Maybe we can add some into log to tell this, not strong opinin, > leave it to you. Yes, now that we know the root cause I'll add some more details to the patch description and resend -- thanks Baoquan!
On 11/12/21 at 10:08am, David Hildenbrand wrote: > > > "that allows supervisor mode programs to optionally set user-space > > > memory mappings so that access to those mappings from supervisor mode > > > will cause a trap. This makes it harder for malicious programs to > > > "trick" the kernel into using instructions or data from a user-space > > > program" > > > > OK, probably. I thought it's triggered in access_ok(), and tried to > > figure out why. But seems we should do something to check this in > > access_ok(), otherwise the logic of clear_user/_clear_user is not so > > reasonable. Anyway, I have learned it, thanks a lot for digging it out. > > > > By the way, I can't open above wiki article, found below commit from > > hpa. Maybe we can add some into log to tell this, not strong opinin, > > leave it to you. > > Yes, now that we know the root cause I'll add some more details to the > patch description and resend -- thanks Baoquan! Thanks for sending v2.
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index 30a3b66f475a..509f85148fee 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -154,9 +154,13 @@ ssize_t read_from_oldmem(char *buf, size_t count, nr_bytes = count; /* If pfn is not ram, return zeros for sparse dump files */ - if (!pfn_is_ram(pfn)) - memset(buf, 0, nr_bytes); - else { + if (!pfn_is_ram(pfn)) { + tmp = 0; + if (!userbuf) + memset(buf, 0, nr_bytes); + else if (clear_user(buf, nr_bytes)) + tmp = -EFAULT; + } else { if (encrypted) tmp = copy_oldmem_page_encrypted(pfn, buf, nr_bytes, @@ -165,12 +169,12 @@ ssize_t read_from_oldmem(char *buf, size_t count, else tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf); - - if (tmp < 0) { - up_read(&vmcore_cb_rwsem); - return tmp; - } } + if (tmp < 0) { + up_read(&vmcore_cb_rwsem); + return tmp; + } + *ppos += nr_bytes; count -= nr_bytes; buf += nr_bytes;
To clear a user buffer we cannot simply use memset, we have to use clear_user(). Using a kernel config based on rawhide Fedora and a virtio-mem device that registers a vmcore_cb, I can easily trigger: [ 11.327580] systemd[1]: Starting Kdump Vmcore Save Service... [ 11.339697] kdump[420]: Kdump is using the default log level(3). [ 11.370964] kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ [ 11.373997] kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ [ 11.385357] kdump[465]: saving vmcore-dmesg.txt complete [ 11.386722] kdump[467]: saving vmcore [ 16.531275] BUG: unable to handle page fault for address: 00007f2374e01000 [ 16.531705] #PF: supervisor write access in kernel mode [ 16.532037] #PF: error_code(0x0003) - permissions violation [ 16.532396] PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867 [ 16.532872] Oops: 0003 [#1] PREEMPT SMP NOPTI [ 16.533154] CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6 [ 16.533513] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014 [ 16.534198] RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86 [ 16.534552] Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81 [ 16.535670] RSP: 0018:ffffc9000073be08 EFLAGS: 00010212 [ 16.535998] RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000 [ 16.536441] RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008 [ 16.536878] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50 [ 16.537315] R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000 [ 16.537755] R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8 [ 16.538200] FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000 [ 16.538696] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.539055] CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0 [ 16.539510] Call Trace: [ 16.539679] <TASK> [ 16.539828] read_vmcore+0x236/0x2c0 [ 16.540063] ? enqueue_hrtimer+0x2f/0x80 [ 16.540323] ? inode_security+0x22/0x60 [ 16.540572] proc_reg_read+0x55/0xa0 [ 16.540807] vfs_read+0x95/0x190 [ 16.541022] ksys_read+0x4f/0xc0 [ 16.541238] do_syscall_64+0x3b/0x90 [ 16.541475] entry_SYSCALL_64_after_hwframe+0x44/0xae To fix, properly use clear_user() when required. Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages") Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Philipp Rudo <prudo@redhat.com> Cc: kexec@lists.infradead.org Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by: David Hildenbrand <david@redhat.com> --- fs/proc/vmcore.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) base-commit: debe436e77c72fcee804fb867f275e6d31aa999c