Message ID | 20210816065417.3987596-1-chenhuacai@loongson.cn (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | futex: Fix fault_in_user_writeable() | expand |
On Mon, 16 Aug 2021, Huacai Chen wrote: >fault_in_user_writeable() should verify R/W access but only verify W. In >most archs W implies R, but not true in MIPS and LoongArch, so fix it. Yuck for a find_vma() in futex.c. If this is a problem in MIPS, shouldn't the fix be there? Furthermore it's stated that fault_in_user_writeable(): "Fault in user address and verify RW access" And you guys seem to have proposed it already: https://lore.kernel.org/linux-mips/20200630005845.1239974-1-liulichao@loongson.cn/ Thanks, Davidlohr
On Mon, Aug 16 2021 at 11:27, Davidlohr Bueso wrote: > On Mon, 16 Aug 2021, Huacai Chen wrote: > >>fault_in_user_writeable() should verify R/W access but only verify W. In >>most archs W implies R, but not true in MIPS and LoongArch, so fix it. > > Yuck for a find_vma() in futex.c. If this is a problem in MIPS, shouldn't > the fix be there? Furthermore it's stated that fault_in_user_writeable(): > > "Fault in user address and verify RW access" That seems to be wishful thinking given the fact that some architectures do not imply R for FLAG_FAULT_WRITE. > And you guys seem to have proposed it already: > > https://lore.kernel.org/linux-mips/20200630005845.1239974-1-liulichao@loongson.cn/ That's surely one way to fix that. If that does not work for whatever reason, then we really don't want this find_vma() hack there, but rather something like: if (IS_ENABLED(CONFIG_ARCH_USER_FAULT_VOODOO) && get_user(&tmp, uaddr)) return -EFAULT; Thanks, tglx
Hi, Davidlohr and Thomas, On Tue, Aug 17, 2021 at 3:03 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Mon, Aug 16 2021 at 11:27, Davidlohr Bueso wrote: > > On Mon, 16 Aug 2021, Huacai Chen wrote: > > > >>fault_in_user_writeable() should verify R/W access but only verify W. In > >>most archs W implies R, but not true in MIPS and LoongArch, so fix it. > > > > Yuck for a find_vma() in futex.c. If this is a problem in MIPS, shouldn't > > the fix be there? Furthermore it's stated that fault_in_user_writeable(): > > > > "Fault in user address and verify RW access" > > That seems to be wishful thinking given the fact that some architectures > do not imply R for FLAG_FAULT_WRITE. > > > And you guys seem to have proposed it already: > > > > https://lore.kernel.org/linux-mips/20200630005845.1239974-1-liulichao@loongson.cn/ This works, but I don't think this is a MIPS problem, so does Thomas Bogendoerfer. Because write-only page is valid in MIPS (so Thomas rejected this patch). > > That's surely one way to fix that. If that does not work for whatever > reason, then we really don't want this find_vma() hack there, but rather > something like: I don't know why find_vma() is unacceptable here, there is also find_vma() in fixup_user_fault(). > > if (IS_ENABLED(CONFIG_ARCH_USER_FAULT_VOODOO) && get_user(&tmp, uaddr)) > return -EFAULT; get_user() may be better than find_vma(), but can we drop CONFIG_ARCH_USER_FAULT_VOODOO here? On those "W implies R" archs, get_user() always success, this can simplify the logic. Huacai > > Thanks, > > tglx
On Tue, Aug 17 2021 at 09:53, Huacai Chen wrote: > On Tue, Aug 17, 2021 at 3:03 AM Thomas Gleixner <tglx@linutronix.de> wrote: >> That's surely one way to fix that. If that does not work for whatever >> reason, then we really don't want this find_vma() hack there, but rather >> something like: > I don't know why find_vma() is unacceptable here, there is also > find_vma() in fixup_user_fault(). Wrong. find_extend_vma() != find_vma(). Aside of that fixup_user_fault() does way more than that. >> if (IS_ENABLED(CONFIG_ARCH_USER_FAULT_VOODOO) && get_user(&tmp, uaddr)) >> return -EFAULT; > > get_user() may be better than find_vma(), but can we drop > CONFIG_ARCH_USER_FAULT_VOODOO here? On those "W implies R" archs, > get_user() always success, this can simplify the logic. For architectures which imply R fixup_user_fault() is way more effinicient than taking the fault on get_user() and then invoking fixup_user_fault() to ensure that the mapping is writeable. No, we are not making stuff less efficient just because of MIPS. Thanks, tglx
Hi, Thomas, On Tue, Aug 17, 2021 at 3:07 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Tue, Aug 17 2021 at 09:53, Huacai Chen wrote: > > On Tue, Aug 17, 2021 at 3:03 AM Thomas Gleixner <tglx@linutronix.de> wrote: > >> That's surely one way to fix that. If that does not work for whatever > >> reason, then we really don't want this find_vma() hack there, but rather > >> something like: > > I don't know why find_vma() is unacceptable here, there is also > > find_vma() in fixup_user_fault(). > > Wrong. find_extend_vma() != find_vma(). Aside of that fixup_user_fault() > does way more than that. > > >> if (IS_ENABLED(CONFIG_ARCH_USER_FAULT_VOODOO) && get_user(&tmp, uaddr)) > >> return -EFAULT; > > > > get_user() may be better than find_vma(), but can we drop > > CONFIG_ARCH_USER_FAULT_VOODOO here? On those "W implies R" archs, > > get_user() always success, this can simplify the logic. > > For architectures which imply R fixup_user_fault() is way more > effinicient than taking the fault on get_user() and then invoking > fixup_user_fault() to ensure that the mapping is writeable. > > No, we are not making stuff less efficient just because of MIPS. > We use this program to test MIPS and X86: int main(int argc, char** argv) { int fd; void *ptr; int ret; int syscall_nr = 98; fd = open("/dev/zero", O_RDWR); if (fd == -1) exit(EXIT_FAILURE); ptr = mmap(NULL, 16384, PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, fd, 0); close(fd); /* * futex syscall nr: * x86_64: 202 * MIPS84: 5194 */ #ifdef __mips__ syscall_nr = 5194; #elif __x86_64__ syscall_nr = 202; #endif ret = syscall(syscall_nr,ptr,FUTEX_LOCK_PI,0, NULL, NULL, 0,0); printf("syscall %d ret = %d\n",syscall_nr,ret); return 0; } On X86, it returns 0; on MIPS64 without patch, it hangs in kernel; on MIPS64 with this patch, it returns -1. Then, I want to know, on "W implies R" archs (such as X86), should it return 0? Maybe return -1 is more reasonable? (because the VMA is marked as write-only). If this program should return -1, then I don't think this is a MIPS-specific problem. Huacai > Thanks, > > tglx
Huacai, On Tue, Aug 17 2021 at 15:38, Huacai Chen wrote: > On Tue, Aug 17, 2021 at 3:07 PM Thomas Gleixner <tglx@linutronix.de> wrote: > On X86, it returns 0; on MIPS64 without patch, it hangs in kernel; on > MIPS64 with this patch, it returns -1. As expected. > Then, I want to know, on "W implies R" archs (such as X86), should it > return 0? Maybe return -1 is more reasonable? (because the VMA is > marked as write-only). If this program should return -1, then I don't > think this is a MIPS-specific problem. No. mmap(.., PROT_WRITE...) is simply impossible on x86 and implies PROT_READ as documented in mmap(2). So why should this fail and only fail in the fault case, but succeed when the PTE is already established? Thanks, tglx
On Tue, Aug 17, 2021 at 09:53:14AM +0800, Huacai Chen wrote: > Hi, Davidlohr and Thomas, > > On Tue, Aug 17, 2021 at 3:03 AM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > On Mon, Aug 16 2021 at 11:27, Davidlohr Bueso wrote: > > > On Mon, 16 Aug 2021, Huacai Chen wrote: > > > > > >>fault_in_user_writeable() should verify R/W access but only verify W. In > > >>most archs W implies R, but not true in MIPS and LoongArch, so fix it. > > > > > > Yuck for a find_vma() in futex.c. If this is a problem in MIPS, shouldn't > > > the fix be there? Furthermore it's stated that fault_in_user_writeable(): > > > > > > "Fault in user address and verify RW access" > > > > That seems to be wishful thinking given the fact that some architectures > > do not imply R for FLAG_FAULT_WRITE. > > > > > And you guys seem to have proposed it already: > > > > > > https://lore.kernel.org/linux-mips/20200630005845.1239974-1-liulichao@loongson.cn/ > This works, but I don't think this is a MIPS problem, so does Thomas > Bogendoerfer. Because write-only page is valid in MIPS (so Thomas > rejected this patch). > > > > > That's surely one way to fix that. If that does not work for whatever > > reason, then we really don't want this find_vma() hack there, but rather > > something like: > I don't know why find_vma() is unacceptable here, there is also > find_vma() in fixup_user_fault(). > > > > > if (IS_ENABLED(CONFIG_ARCH_USER_FAULT_VOODOO) && get_user(&tmp, uaddr)) > > return -EFAULT; > get_user() may be better than find_vma(), but can we drop > CONFIG_ARCH_USER_FAULT_VOODOO here? On those "W implies R" archs, > get_user() always success, this can simplify the logic. AFAICT that whole W implies R thing goes much deeper, mm/gup.c:vma_permits_fault() has: vm_flags_t vm_flags = write ? VM_WRITE : VM_READ; So unless someone wants to go fix the core MM and eradicate all such assumptions, I'd suggest going with the 'easy' route and fix the arch. This patch is probably broken and will likely break lots of things... --- diff --git a/include/linux/mm.h b/include/linux/mm.h index 7ca22e6e694a..fc587dbb90b4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -478,6 +478,7 @@ enum fault_flag { FAULT_FLAG_REMOTE = 1 << 7, FAULT_FLAG_INSTRUCTION = 1 << 8, FAULT_FLAG_INTERRUPTIBLE = 1 << 9, + FAULT_FLAG_READ = 1 << 10, }; /* diff --git a/kernel/futex.c b/kernel/futex.c index fcc0570868b7..2c0970759919 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -735,7 +735,7 @@ static int fault_in_user_writeable(u32 __user *uaddr) mmap_read_lock(mm); ret = fixup_user_fault(mm, (unsigned long)uaddr, - FAULT_FLAG_WRITE, NULL); + FAULT_FLAG_READ|FAULT_FLAG_WRITE, NULL); mmap_read_unlock(mm); return ret < 0 ? ret : 0; diff --git a/mm/gup.c b/mm/gup.c index e805c1748bbf..37c8bfbe5196 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1222,11 +1222,17 @@ static long __get_user_pages(struct mm_struct *mm, static bool vma_permits_fault(struct vm_area_struct *vma, unsigned int fault_flags) { + bool read = !!(fault_flags & FAULT_FLAG_READ); bool write = !!(fault_flags & FAULT_FLAG_WRITE); bool foreign = !!(fault_flags & FAULT_FLAG_REMOTE); - vm_flags_t vm_flags = write ? VM_WRITE : VM_READ; + vm_flags_t vm_flags = 0; - if (!(vm_flags & vma->vm_flags)) + if (read) + vm_flags |= VM_READ; + if (write) + vm_flags |= VM_WRITE; + + if ((vma->vm_flags & vm_flags) != vm_flags) return false; /*
On Tue, Aug 17, 2021 at 11:05:15AM +0200, Thomas Gleixner wrote: > Huacai, > > On Tue, Aug 17 2021 at 15:38, Huacai Chen wrote: > > On Tue, Aug 17, 2021 at 3:07 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On X86, it returns 0; on MIPS64 without patch, it hangs in kernel; on > > MIPS64 with this patch, it returns -1. > > As expected. > > > Then, I want to know, on "W implies R" archs (such as X86), should it > > return 0? Maybe return -1 is more reasonable? (because the VMA is > > marked as write-only). If this program should return -1, then I don't > > think this is a MIPS-specific problem. > > No. mmap(.., PROT_WRITE...) is simply impossible on x86 and implies > PROT_READ as documented in mmap(2). > > So why should this fail and only fail in the fault case, but succeed > when the PTE is already established? I wouldn't actually mind if it failed on fault -- it's the 'best' we can do on x86. Doing a RmW op on PROT_WRITE is silly and deserves all the wreckage it can get.
Hi, all, On Tue, Aug 17, 2021 at 5:45 PM Peter Zijlstra <peterz@infradead.org> wrote: > > On Tue, Aug 17, 2021 at 11:05:15AM +0200, Thomas Gleixner wrote: > > Huacai, > > > > On Tue, Aug 17 2021 at 15:38, Huacai Chen wrote: > > > On Tue, Aug 17, 2021 at 3:07 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > > On X86, it returns 0; on MIPS64 without patch, it hangs in kernel; on > > > MIPS64 with this patch, it returns -1. > > > > As expected. > > > > > Then, I want to know, on "W implies R" archs (such as X86), should it > > > return 0? Maybe return -1 is more reasonable? (because the VMA is > > > marked as write-only). If this program should return -1, then I don't > > > think this is a MIPS-specific problem. > > > > No. mmap(.., PROT_WRITE...) is simply impossible on x86 and implies > > PROT_READ as documented in mmap(2). > > > > So why should this fail and only fail in the fault case, but succeed > > when the PTE is already established? > > I wouldn't actually mind if it failed on fault -- it's the 'best' we can > do on x86. Doing a RmW op on PROT_WRITE is silly and deserves all the > wreckage it can get. If we must fix it in arch code, there are two methods: 1, don't use write-only map (modify protection_map as Liu Lichao did); 2, override arch_vma_access_permitted() to do extra checks. Thomas, which is better? Huacai
diff --git a/kernel/futex.c b/kernel/futex.c index 2ecb07575055..c3b68be31bf3 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -672,10 +672,15 @@ static int get_futex_key(u32 __user *uaddr, bool fshared, union futex_key *key, */ static int fault_in_user_writeable(u32 __user *uaddr) { - struct mm_struct *mm = current->mm; int ret; + struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; mmap_read_lock(mm); + vma = find_vma(mm, (unsigned long)uaddr); + if (!(vma->vm_flags & VM_READ)) + ret = -EFAULT; + ret = fixup_user_fault(mm, (unsigned long)uaddr, FAULT_FLAG_WRITE, NULL); mmap_read_unlock(mm);