Message ID | 534D6A1F.70102@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
From: David Long <dave.long@linaro.org> Date: Tue, 15 Apr 2014 13:19:27 -0400 > On 04/15/14 11:46, Oleg Nesterov wrote: >> >> But. Please do not add copy_to_user_page() into copy_to_page() (as your patch >> did). This is certainly not what uprobe_write_opcode() wants, we do not want >> or need "flush" in this case. The same for __create_xol_area(). >> > > It looked me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot() > would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefuly > be more philosophically correct for all architectures): It occurs to me that because of the specific environment in which this executes we can make the interface take advantage of the invariants at this call site: 1) We are copying into userspace and thus current->mm 2) We are only storing an instruction or two So it would be just like the dynamic linker lazy resolving a PLT slot in userland. Furthermore, we can do the stores using something akin to put_user(), directly store them into userspace. This avoids completely any D-cache aliasing issues. The kernel stores to the same address, and therefore the same cache lines, as userspace would. So for example, since even on the most braindamaged sparc64 cpus the remote cpus will have their I-cache snoop the store we do locally we just need to do a simple flush on the local cpu where this store is happening. So I'd like to do this something like: stw %r1, [%r2 + 0x0] flush %r2 PowerPC could probably do something similar. The copy_to_user_page() interface has to deal with storing into a non-current 'mm' and therefore via the kernel side copy of the page and that's why the D-cache aliasing issues are so painful to deal with. Just a thought...
On 04/15, David Long wrote: > > On 04/15/14 11:46, Oleg Nesterov wrote: > > > > But. Please do not add copy_to_user_page() into copy_to_page() (as your patch > > did). This is certainly not what uprobe_write_opcode() wants, we do not want > > or need "flush" in this case. The same for __create_xol_area(). > > > > It looked me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot() > would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefuly > be more philosophically correct for all architectures): David, let me repeat. I am not going to argue even if I obviously like the Linus's suggestion more. I only argued with the suggestions to follow the __access_remote_vm() logic. > @@ -1297,13 +1298,11 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe) > return 0; > > /* Initialize the slot */ > - copy_to_page(area->page, xol_vaddr, > - &uprobe->arch.ixol, sizeof(uprobe->arch.ixol)); > - /* > - * We probably need flush_icache_user_range() but it needs vma. > - * This should work on supported architectures too. > - */ > - flush_dcache_page(area->page); > + kaddr = kmap_atomic(area->page); > + __copy_to_user_page(current->mm, area->page, xol_vaddr, > + kaddr + (xol_vaddr & ~PAGE_MASK), > + &uprobe->arch.ixol, sizeof(uprobe->arch.ixol), true); > + kunmap_atomic(kaddr); This all is fine, but you need to introduce __copy_to_user_page() first. And you need to do this for every arch. And you need to verify that it does not need mmap_sem. It seems that at least alpha needs this lock. And as Russel pointed out (again, again, I could easily misunderstood him) you need preempt_disable() around memcpy + flush, so down_read(mmap_sem) should probably go into __copy_to_user_page(). And I am not sure this helper needs "struct mm_struct *mm" argument, but this is minor. Finally, let me repeat, you should verify that this __copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr is not mmapped, or its mapping do not match area->page. Good luck ;) Oleg.
From: Oleg Nesterov <oleg@redhat.com> Date: Tue, 15 Apr 2014 19:43:30 +0200 > Finally, let me repeat, you should verify that this > __copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr > is not mmapped, or its mapping do not match area->page. Just directly access userspace with the usual exception mechanism we use for copy_to_user(), put_user(), et al. and if it faults you'll get -EFAULT and handle it. This also avoids the D-cache aliasing issues entirely as I explained in my other reply.
On 04/15, David Miller wrote: > > From: David Long <dave.long@linaro.org> > Date: Tue, 15 Apr 2014 13:19:27 -0400 > > > On 04/15/14 11:46, Oleg Nesterov wrote: > >> > >> But. Please do not add copy_to_user_page() into copy_to_page() (as your patch > >> did). This is certainly not what uprobe_write_opcode() wants, we do not want > >> or need "flush" in this case. The same for __create_xol_area(). > >> > > > > It looked me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot() > > would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefuly > > be more philosophically correct for all architectures): > > It occurs to me that because of the specific environment in which this > executes we can make the interface take advantage of the invariants at > this call site: > > 1) We are copying into userspace and thus current->mm > > 2) We are only storing an instruction or two Not sure this can help, but also 3) nobody else will even read/execute this peace of memory > Furthermore, we can do the stores using something akin to put_user(), > directly store them into userspace. > > This avoids completely any D-cache aliasing issues. The kernel stores > to the same address, and therefore the same cache lines, as userspace > would. Heh. Of course I thought about this. This can not work afaics. We do not want to write to, say, page cache if the probed application mmaps a file at the same vaddr. Oleg.
From: Oleg Nesterov <oleg@redhat.com> Date: Tue, 15 Apr 2014 19:49:00 +0200 > We do not want to write to, say, page cache if the probed application > mmaps a file at the same vaddr. If user has write access to that page... not our problem. And if he doesn't we'll fault.
On 04/15, David Miller wrote: > > From: Oleg Nesterov <oleg@redhat.com> > Date: Tue, 15 Apr 2014 19:43:30 +0200 > > > Finally, let me repeat, you should verify that this > > __copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr > > is not mmapped, or its mapping do not match area->page. > > Just directly access userspace with the usual exception mechanism we > use for copy_to_user(), put_user(), et al. and if it faults you'll get > -EFAULT and handle it. > > This also avoids the D-cache aliasing issues entirely as I explained > in my other reply. Yes, yes, this is obvious. But I have no idea what else we should do to take care of icache. Oleg.
On 04/15, David Miller wrote: > > From: Oleg Nesterov <oleg@redhat.com> > Date: Tue, 15 Apr 2014 19:49:00 +0200 > > > We do not want to write to, say, page cache if the probed application > > mmaps a file at the same vaddr. > > If user has write access to that page... not our problem. Well, I am not sure. Yes, this won't allow you to write the exploit. But if this actually happens because an application is buggy, this can lead to really hard-to-debug problems. Oleg.
From: Oleg Nesterov <oleg@redhat.com> Date: Tue, 15 Apr 2014 20:07:00 +0200 > On 04/15, David Miller wrote: >> >> From: Oleg Nesterov <oleg@redhat.com> >> Date: Tue, 15 Apr 2014 19:49:00 +0200 >> >> > We do not want to write to, say, page cache if the probed application >> > mmaps a file at the same vaddr. >> >> If user has write access to that page... not our problem. > > Well, I am not sure. > > Yes, this won't allow you to write the exploit. But if this actually > happens because an application is buggy, this can lead to really > hard-to-debug problems. And this is different from a stray userland scribble onto such a page how? This is all so much simpler if you don't insist on doing all of this hand-holding for the user, that's not the kernel's job.
From: Oleg Nesterov <oleg@redhat.com> Date: Tue, 15 Apr 2014 20:03:11 +0200 > On 04/15, David Miller wrote: >> >> From: Oleg Nesterov <oleg@redhat.com> >> Date: Tue, 15 Apr 2014 19:43:30 +0200 >> >> > Finally, let me repeat, you should verify that this >> > __copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr >> > is not mmapped, or its mapping do not match area->page. >> >> Just directly access userspace with the usual exception mechanism we >> use for copy_to_user(), put_user(), et al. and if it faults you'll get >> -EFAULT and handle it. >> >> This also avoids the D-cache aliasing issues entirely as I explained >> in my other reply. > > Yes, yes, this is obvious. > > But I have no idea what else we should do to take care of icache. Russell, if for example userland stores into the instruction stream to patch an instruction, what needs to be done on ARM?
On 04/15, David Miller wrote: > > From: Oleg Nesterov <oleg@redhat.com> > Date: Tue, 15 Apr 2014 20:07:00 +0200 > > > On 04/15, David Miller wrote: > >> > >> From: Oleg Nesterov <oleg@redhat.com> > >> Date: Tue, 15 Apr 2014 19:49:00 +0200 > >> > >> > We do not want to write to, say, page cache if the probed application > >> > mmaps a file at the same vaddr. > >> > >> If user has write access to that page... not our problem. > > > > Well, I am not sure. > > > > Yes, this won't allow you to write the exploit. But if this actually > > happens because an application is buggy, this can lead to really > > hard-to-debug problems. > > And this is different from a stray userland scribble onto such a page > how? Sure. But I think that the kernel should not write to the file just because a buggy application did the extra munmap() or passed a wrong addres to mmap/munmap. > This is all so much simpler if you don't insist on doing all of this > hand-holding for the user, that's not the kernel's job. Well, I do not agree, but perhaps because I do not really understand what do you mean. Oleg.
On Tue, Apr 15, 2014 at 02:30:54PM -0400, David Miller wrote: > Russell, if for example userland stores into the instruction stream > to patch an instruction, what needs to be done on ARM? Hi David, I'm afraid I won't be too responsive this week, but I'll try to provide input where I can. So I'll try to cover all points in the previous discussion in this reply. I think your suggestion that we should be writing directly to userspace from uprobes is definitely the correct way forward, as that nicely takes any d-cache aliasing issues completely out of the picture. However, unlike your "most braindead sparc" CPU, the i-cache doesn't snoop d-cache stores at all. However, this is something that we already deal with since self-modifying code has to work, so (from userspace) we have a syscall that is used to sort that out. Internally in the kernel, this translates to: ret = flush_cache_user_range(start, end) This deals with whatever the CPU requires to be able to correctly execute code which has been previously written in the range - and only actions on the currently mapped userspace. I hope this helps.
From: David Miller <davem@davemloft.net> Date: Tue, 15 Apr 2014 14:30:54 -0400 (EDT) > Russell, if for example userland stores into the instruction stream > to patch an instruction, what needs to be done on ARM? Looking around I suspect something like: mcrne p15, 0, INSN_ADDR, c7, c5, 1 after the instruction stores will do it.
From: Russell King - ARM Linux <linux@arm.linux.org.uk> Date: Tue, 15 Apr 2014 19:47:26 +0100 > However, unlike your "most braindead sparc" CPU, the i-cache doesn't > snoop d-cache stores at all. All that matters is remote snooping on SMP, not local snooping. > However, this is something that we already deal with since > self-modifying code has to work, so (from userspace) we have a > syscall that is used to sort that out. Internally in the kernel, > this translates to: > > ret = flush_cache_user_range(start, end) > > This deals with whatever the CPU requires to be able to correctly execute > code which has been previously written in the range - and only actions on > the currently mapped userspace. Looking around it seems the I-cache line mcr operation should do the right thing for most chips. You could simply make a new cpuc op for writing an instruction or two to userspace and doing the I-cache line mcr op afterwards.
On Tue, Apr 15, 2014 at 02:50:06PM -0400, David Miller wrote: > From: David Miller <davem@davemloft.net> > Date: Tue, 15 Apr 2014 14:30:54 -0400 (EDT) > > > Russell, if for example userland stores into the instruction stream > > to patch an instruction, what needs to be done on ARM? > > Looking around I suspect something like: > > mcrne p15, 0, INSN_ADDR, c7, c5, 1 > > after the instruction stores will do it. It does still need to be pushed out of the D-cache first though. So, for ARMv7 for example: str NEW_INSN, [INSN_ADDR] @ store new instruction mcr p15, 0, INSN_ADDR, c7, c11, 1 @ clean d line mcr p15, 0, INSN_ADDR, c7, c5, 1 @ flush i line would do it. We of course need the user access marking on that (so that any fault doesn't oops the kernel) - not only for the store, but also the following two instructions which could fault (and oops unless they're marked with a fixup) if someone were to munmap() this page in another thread. All those fixups can just do the "lets return -EFAULT" from the operation.
On 04/15/14 13:43, Oleg Nesterov wrote: > On 04/15, David Long wrote: >> >> On 04/15/14 11:46, Oleg Nesterov wrote: >>> >>> But. Please do not add copy_to_user_page() into copy_to_page() (as your patch >>> did). This is certainly not what uprobe_write_opcode() wants, we do not want >>> or need "flush" in this case. The same for __create_xol_area(). >>> >> >> It looked me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot() >> would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefuly >> be more philosophically correct for all architectures): > > David, let me repeat. I am not going to argue even if I obviously like > the Linus's suggestion more. > > I only argued with the suggestions to follow the __access_remote_vm() > logic. > >> @@ -1297,13 +1298,11 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe) >> return 0; >> >> /* Initialize the slot */ >> - copy_to_page(area->page, xol_vaddr, >> - &uprobe->arch.ixol, sizeof(uprobe->arch.ixol)); >> - /* >> - * We probably need flush_icache_user_range() but it needs vma. >> - * This should work on supported architectures too. >> - */ >> - flush_dcache_page(area->page); >> + kaddr = kmap_atomic(area->page); >> + __copy_to_user_page(current->mm, area->page, xol_vaddr, >> + kaddr + (xol_vaddr & ~PAGE_MASK), >> + &uprobe->arch.ixol, sizeof(uprobe->arch.ixol), true); >> + kunmap_atomic(kaddr); > > This all is fine, but you need to introduce __copy_to_user_page() first. > Yes, I have that coded for 14 out of 24 architectures (the easy ones). The remaining ones need more work. Some may prove problematic. The whole approach could yet prove impractical. More recent suggested approaches may be better too (or Linus's relatively simple change). > And you need to do this for every arch. And you need to verify that it > does not need mmap_sem. It seems that at least alpha needs this lock. > And as Russel pointed out (again, again, I could easily misunderstood him) > you need preempt_disable() around memcpy + flush, so down_read(mmap_sem) > should probably go into __copy_to_user_page(). And I am not sure this > helper needs "struct mm_struct *mm" argument, but this is minor. > I will look at the locking. I'm not sure yet if anything needs struct mm_struct either. > Finally, let me repeat, you should verify that this > __copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr > is not mmapped, or its mapping do not match area->page. > > Good luck ;) > > Oleg. >
From: Russell King - ARM Linux <linux@arm.linux.org.uk> Date: Tue, 15 Apr 2014 20:29:15 +0100 > On Tue, Apr 15, 2014 at 02:50:06PM -0400, David Miller wrote: >> From: David Miller <davem@davemloft.net> >> Date: Tue, 15 Apr 2014 14:30:54 -0400 (EDT) >> >> > Russell, if for example userland stores into the instruction stream >> > to patch an instruction, what needs to be done on ARM? >> >> Looking around I suspect something like: >> >> mcrne p15, 0, INSN_ADDR, c7, c5, 1 >> >> after the instruction stores will do it. > > It does still need to be pushed out of the D-cache first though. So, > for ARMv7 for example: > > str NEW_INSN, [INSN_ADDR] @ store new instruction > mcr p15, 0, INSN_ADDR, c7, c11, 1 @ clean d line > mcr p15, 0, INSN_ADDR, c7, c5, 1 @ flush i line > > would do it. We of course need the user access marking on that (so that > any fault doesn't oops the kernel) - not only for the store, but also the > following two instructions which could fault (and oops unless they're > marked with a fixup) if someone were to munmap() this page in another > thread. All those fixups can just do the "lets return -EFAULT" from > the operation. Right, you'd need the exception table business used in put_user() et al.
From: David Long <dave.long@linaro.org> Date: Tue, 15 Apr 2014 15:39:18 -0400 > Yes, I have that coded for 14 out of 24 architectures (the easy > ones). The remaining ones need more work. Some may prove problematic. > The whole approach could yet prove impractical. More recent suggested > approaches may be better too (or Linus's relatively simple change). BTW, another reason perhaps to write directly to userspace and have a uprobes_*() specific interface is that you'll only need to do 3 architectures, the ones that support uprobes. That way, this is just part of what every arch needs to implement in order to support uprobes.
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 04709b6..b418626 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -1287,6 +1287,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe) { struct xol_area *area; unsigned long xol_vaddr; + void *kaddr; area = get_xol_area(); if (!area) @@ -1297,13 +1298,11 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe) return 0; /* Initialize the slot */ - copy_to_page(area->page, xol_vaddr, - &uprobe->arch.ixol, sizeof(uprobe->arch.ixol)); - /* - * We probably need flush_icache_user_range() but it needs vma. - * This should work on supported architectures too. - */ - flush_dcache_page(area->page); + kaddr = kmap_atomic(area->page); + __copy_to_user_page(current->mm, area->page, xol_vaddr, + kaddr + (xol_vaddr & ~PAGE_MASK), + &uprobe->arch.ixol, sizeof(uprobe->arch.ixol), true); + kunmap_atomic(kaddr); return xol_vaddr; }