Message ID: 20201007073932.865218-1-jannh@google.com (mailing list archive)
State: New, archived
Series: [1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length
On Wed, Oct 07, 2020 at 09:39:31AM +0200, Jann Horn wrote:
> diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
> index 078608ec2e92..b1fabb97d138 100644
> --- a/arch/powerpc/kernel/syscalls.c
> +++ b/arch/powerpc/kernel/syscalls.c
> @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
>  {
>         long ret = -EINVAL;
>
> -       if (!arch_validate_prot(prot, addr))
> +       if (!arch_validate_prot(prot, addr, len))

This call isn't under mmap lock.  I also find it rather weird as the
generic code only calls arch_validate_prot from mprotect, only powerpc
also calls it from mmap.

This seems to go back to commit ef3d3246a0d0
("powerpc/mm: Add Strong Access Ordering support")
On Wed, Oct 7, 2020 at 2:35 PM Christoph Hellwig <hch@infradead.org> wrote:
> On Wed, Oct 07, 2020 at 09:39:31AM +0200, Jann Horn wrote:
> > diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
> > index 078608ec2e92..b1fabb97d138 100644
> > --- a/arch/powerpc/kernel/syscalls.c
> > +++ b/arch/powerpc/kernel/syscalls.c
> > @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
> >  {
> >         long ret = -EINVAL;
> >
> > -       if (!arch_validate_prot(prot, addr))
> > +       if (!arch_validate_prot(prot, addr, len))
>
> This call isn't under mmap lock.  I also find it rather weird as the
> generic code only calls arch_validate_prot from mprotect, only powerpc
> also calls it from mmap.
>
> This seems to go back to commit ef3d3246a0d0
> ("powerpc/mm: Add Strong Access Ordering support")

I'm _guessing_ the idea in the generic case might be that mmap() doesn't
check unknown bits in the protection flags, and therefore maybe people
wanted to avoid adding new error cases that could be caused by random high
bits being set? So while the mprotect() case checks the flags and refuses
unknown values, the mmap() code just lets the architecture figure out
which bits are actually valid to set (via arch_calc_vm_prot_bits()) and
silently ignores the rest?

And powerpc apparently decided that they do want to error out on bogus
prot values passed to their version of mmap(), and in exchange, assume in
arch_calc_vm_prot_bits() that the protection bits are valid?

powerpc's arch_validate_prot() doesn't actually need the mmap lock, so I
think this is fine-ish for now (as in, while the code is a bit unclean, I
don't think I'm making it worse, and I don't think it's actually buggy).
In theory, we could move the arch_validate_prot() call over into the mmap
guts, where we're holding the lock, and gate it on the architecture or on
some feature CONFIG that powerpc can activate in its Kconfig. But I'm not
sure whether that'd be helping or making things worse, so when I sent this
patch, I deliberately left the powerpc stuff as-is.
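To make the asymmetry described above concrete, here is a minimal,
userspace-compilable sketch (not the kernel's actual implementation; the
VM_* values are illustrative stand-ins) contrasting mprotect()-style
rejection of unknown prot bits with mmap()-style silent translation of
only the known bits:

#include <stdbool.h>
#include <sys/mman.h>   /* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Illustrative stand-ins for the kernel's VM_* vma flags. */
#define VM_READ         0x1UL
#define VM_WRITE        0x2UL
#define VM_EXEC         0x4UL

/* mprotect()-style: any bit outside the supported set is an error. */
static bool validate_prot_like_mprotect(unsigned long prot)
{
        unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC;

        return (prot & ~supported) == 0;
}

/* mmap()-style: translate the bits we know, silently drop the rest. */
static unsigned long calc_vm_prot_bits_like_mmap(unsigned long prot)
{
        return (prot & PROT_READ  ? VM_READ  : 0) |
               (prot & PROT_WRITE ? VM_WRITE : 0) |
               (prot & PROT_EXEC  ? VM_EXEC  : 0);
}

A caller passing a bogus high bit fails the first helper outright, but
simply has that bit ignored by the second, which is exactly the
behavioural difference that powerpc's extra check in do_mmap2() removes.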
On 10/7/20 1:39 AM, Jann Horn wrote:
> arch_validate_prot() is a hook that can validate whether a given set of
> protection flags is valid in an mprotect() operation. It is given the set
> of protection flags and the address being modified.
>
> However, the address being modified can currently not actually be used in
> a meaningful way because:
>
> 1. Only the address is given, but not the length, and the operation can
>    span multiple VMAs. Therefore, the callee can't actually tell which
>    virtual address range, or which VMAs, are being targeted.
> 2. The mmap_lock is not held, meaning that if the callee were to check
>    the VMA at @addr, that VMA would be unrelated to the one the
>    operation is performed on.
>
> Currently, custom arch_validate_prot() handlers are defined by
> arm64, powerpc and sparc.
> arm64 and powerpc don't care about the address range, they just check the
> flags against CPU support masks.
> sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take
> the mmap_lock.
>
> Change the function signature to also take a length, and move the
> arch_validate_prot() call in mm/mprotect.c down into the locked region.
>
> Cc: stable@vger.kernel.org
> Fixes: 9035cf9a97e4 ("mm: Add address parameter to arch_validate_prot()")
> Suggested-by: Khalid Aziz <khalid.aziz@oracle.com>
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
>  arch/arm64/include/asm/mman.h   | 4 ++--
>  arch/powerpc/include/asm/mman.h | 3 ++-
>  arch/powerpc/kernel/syscalls.c  | 2 +-
>  arch/sparc/include/asm/mman.h   | 6 ++++--
>  include/linux/mman.h            | 3 ++-
>  mm/mprotect.c                   | 6 ++++--
>  6 files changed, 15 insertions(+), 9 deletions(-)

This looks good to me. As Chris pointed out, the call to
arch_validate_prot() from do_mmap2() is made without holding mmap_lock.
Lock is not acquired until vm_mmap_pgoff(). This variance is uncomfortable
but I am more uncomfortable forcing all implementations of validate_prot
to require mmap_lock be held when non-sparc implementations do not have
such need yet. Since do_mmap2() is in powerpc-specific code, for now this
patch solves a current problem. That leaves open the question of whether
generic mmap should call arch_validate_prot and return EINVAL for an
invalid combination of protection bits, but that is better addressed in a
separate patch.

Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>

>
> diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
> index 081ec8de9ea6..0876a87986dc 100644
> --- a/arch/arm64/include/asm/mman.h
> +++ b/arch/arm64/include/asm/mman.h
> @@ -23,7 +23,7 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
>  #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
>
>  static inline bool arch_validate_prot(unsigned long prot,
> -        unsigned long addr __always_unused)
> +        unsigned long addr __always_unused, unsigned long len __always_unused)
>  {
>         unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM;
>
> @@ -32,6 +32,6 @@ static inline bool arch_validate_prot(unsigned long prot,
>
>         return (prot & ~supported) == 0;
>  }
> -#define arch_validate_prot(prot, addr) arch_validate_prot(prot, addr)
> +#define arch_validate_prot(prot, addr, len) arch_validate_prot(prot, addr, len)
>
>  #endif /* ! __ASM_MMAN_H__ */
> diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
> index 7cb6d18f5cd6..65dd9b594985 100644
> --- a/arch/powerpc/include/asm/mman.h
> +++ b/arch/powerpc/include/asm/mman.h
> @@ -36,7 +36,8 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
>  }
>  #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
>
> -static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
> +static inline bool arch_validate_prot(unsigned long prot, unsigned long addr,
> +        unsigned long len)
>  {
>         if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
>                 return false;
> diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
> index 078608ec2e92..b1fabb97d138 100644
> --- a/arch/powerpc/kernel/syscalls.c
> +++ b/arch/powerpc/kernel/syscalls.c
> @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
>  {
>         long ret = -EINVAL;
>
> -       if (!arch_validate_prot(prot, addr))
> +       if (!arch_validate_prot(prot, addr, len))
>                 goto out;
>
>         if (shift) {
> diff --git a/arch/sparc/include/asm/mman.h b/arch/sparc/include/asm/mman.h
> index f94532f25db1..e85222c76585 100644
> --- a/arch/sparc/include/asm/mman.h
> +++ b/arch/sparc/include/asm/mman.h
> @@ -52,9 +52,11 @@ static inline pgprot_t sparc_vm_get_page_prot(unsigned long vm_flags)
>         return (vm_flags & VM_SPARC_ADI) ? __pgprot(_PAGE_MCD_4V) : __pgprot(0);
>  }
>
> -#define arch_validate_prot(prot, addr) sparc_validate_prot(prot, addr)
> -static inline int sparc_validate_prot(unsigned long prot, unsigned long addr)
> +#define arch_validate_prot(prot, addr, len) sparc_validate_prot(prot, addr, len)
> +static inline int sparc_validate_prot(unsigned long prot, unsigned long addr,
> +        unsigned long len)
>  {
> +       mmap_assert_write_locked(current->mm);
>         if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_ADI))
>                 return 0;
>         if (prot & PROT_ADI) {
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 6f34c33075f9..5b4d554d3189 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -96,7 +96,8 @@ static inline void vm_unacct_memory(long pages)
>   *
>   * Returns true if the prot flags are valid
>   */
> -static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
> +static inline bool arch_validate_prot(unsigned long prot, unsigned long addr,
> +        unsigned long len)
>  {
>         return (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM)) == 0;
>  }
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index ce8b8a5eacbb..e2d6b51acbf8 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -533,14 +533,16 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
>         end = start + len;
>         if (end <= start)
>                 return -ENOMEM;
> -       if (!arch_validate_prot(prot, start))
> -               return -EINVAL;
>
>         reqprot = prot;
>
>         if (mmap_write_lock_killable(current->mm))
>                 return -EINTR;
>
> +       error = -EINVAL;
> +       if (!arch_validate_prot(prot, start, len))
> +               goto out;
> +
>         /*
>          * If userspace did not allocate the pkey, do not let
>          * them use it here.
>
> base-commit: c85fb28b6f999db9928b841f63f1beeb3074eeca
On Wed, Oct 07, 2020 at 04:42:55PM +0200, Jann Horn wrote:
> > > @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
> > >  {
> > >         long ret = -EINVAL;
> > >
> > > -       if (!arch_validate_prot(prot, addr))
> > > +       if (!arch_validate_prot(prot, addr, len))
> >
> > This call isn't under mmap lock.  I also find it rather weird as the
> > generic code only calls arch_validate_prot from mprotect, only powerpc
> > also calls it from mmap.
> >
> > This seems to go back to commit ef3d3246a0d0
> > ("powerpc/mm: Add Strong Access Ordering support")
>
> I'm _guessing_ the idea in the generic case might be that mmap()
> doesn't check unknown bits in the protection flags, and therefore
> maybe people wanted to avoid adding new error cases that could be
> caused by random high bits being set? So while the mprotect() case
> checks the flags and refuses unknown values, the mmap() code just lets
> the architecture figure out which bits are actually valid to set (via
> arch_calc_vm_prot_bits()) and silently ignores the rest?
>
> And powerpc apparently decided that they do want to error out on bogus
> prot values passed to their version of mmap(), and in exchange, assume
> in arch_calc_vm_prot_bits() that the protection bits are valid?

The problem really is that now programs behave differently on powerpc
compared to all other architectures.

> powerpc's arch_validate_prot() doesn't actually need the mmap lock, so
> I think this is fine-ish for now (as in, while the code is a bit
> unclean, I don't think I'm making it worse, and I don't think it's
> actually buggy). In theory, we could move the arch_validate_prot()
> call over into the mmap guts, where we're holding the lock, and gate
> it on the architecture or on some feature CONFIG that powerpc can
> activate in its Kconfig. But I'm not sure whether that'd be helping or
> making things worse, so when I sent this patch, I deliberately left
> the powerpc stuff as-is.

For now I'd just duplicate the trivial logic from arch_validate_prot in
the powerpc version of do_mmap2 and add a comment that this check causes
a gratuitous incompatibility with all other architectures.  And then hope
that the powerpc maintainers fix it up :)
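As a rough illustration of that suggestion (not an actual submitted
patch; the rest of do_mmap2() is elided and the additional PROT_SAO
capability check from powerpc's arch_validate_prot() is left out for
brevity), the open-coded check in arch/powerpc/kernel/syscalls.c might
look something like this:

static inline long do_mmap2(unsigned long addr, size_t len,
                            unsigned long prot, unsigned long flags,
                            unsigned long fd, unsigned long off, int shift)
{
        long ret = -EINVAL;

        /*
         * Rejecting unknown prot bits here is a gratuitous
         * incompatibility with every other architecture's mmap(),
         * which silently ignores them; kept only to preserve the
         * existing powerpc behaviour.
         */
        if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
                goto out;

        /* ... rest of do_mmap2() unchanged ... */
out:
        return ret;
}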
On Wed, Oct 07, 2020 at 09:39:31AM +0200, Jann Horn wrote:
> arch_validate_prot() is a hook that can validate whether a given set of
> protection flags is valid in an mprotect() operation. It is given the set
> of protection flags and the address being modified.
>
> However, the address being modified can currently not actually be used in
> a meaningful way because:
>
> 1. Only the address is given, but not the length, and the operation can
>    span multiple VMAs. Therefore, the callee can't actually tell which
>    virtual address range, or which VMAs, are being targeted.
> 2. The mmap_lock is not held, meaning that if the callee were to check
>    the VMA at @addr, that VMA would be unrelated to the one the
>    operation is performed on.
>
> Currently, custom arch_validate_prot() handlers are defined by
> arm64, powerpc and sparc.
> arm64 and powerpc don't care about the address range, they just check the
> flags against CPU support masks.
> sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take
> the mmap_lock.
>
> Change the function signature to also take a length, and move the
> arch_validate_prot() call in mm/mprotect.c down into the locked region.

For arm64 mte, I noticed the arch_validate_prot() issue with multiple
vmas and addressed this in a different way:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/mte&id=c462ac288f2c97e0c1d9ff6a65955317e799f958
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/mte&id=0042090548740921951f31fc0c20dcdb96638cb0

Both patches queued for 5.10.

Basically, arch_calc_vm_prot_bits() returns a VM_MTE if PROT_MTE has been
requested. The newly introduced arch_validate_flags() will check the
VM_MTE flag against what the system supports and this covers both mmap()
and mprotect(). Note that arch_validate_prot() only handles the latter
and I don't think it's sufficient for SPARC ADI. For arm64 MTE we
definitely wanted mmap() flags to be validated.

In addition, there's a new arch_calc_vm_flag_bits() which allows us to
set a VM_MTE_ALLOWED on a vma if the conditions are right (MAP_ANONYMOUS
or shmem_mmap()):

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/mte&id=b3fbbea4c00220f62e6f7e2514466e6ee81f7f60
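Condensed into one place, the shape of the arm64 approach described above
is roughly the following (heavily simplified from the linked commits; see
those for the real code, which also handles PROT_BTI and further MTE
details):

/* mmap()/mprotect() prot bits -> vma flags (simplified). */
static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
                                                   unsigned long pkey)
{
        return (prot & PROT_MTE) ? VM_MTE : 0;
}

/* mmap() flags -> vma flags: only some mappings may ever be tagged. */
static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
{
        return (flags & MAP_ANONYMOUS) ? VM_MTE_ALLOWED : 0;
}

/* Called for both mmap() and mprotect(), with the final vma flags. */
static inline bool arch_validate_flags(unsigned long vm_flags)
{
        if (!(vm_flags & VM_MTE))
                return true;

        /* VM_MTE is only valid if the system and the mapping allow it. */
        return system_supports_mte() && (vm_flags & VM_MTE_ALLOWED);
}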
Jann Horn <jannh@google.com> writes: > On Wed, Oct 7, 2020 at 2:35 PM Christoph Hellwig <hch@infradead.org> wrote: >> On Wed, Oct 07, 2020 at 09:39:31AM +0200, Jann Horn wrote: >> > diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c >> > index 078608ec2e92..b1fabb97d138 100644 >> > --- a/arch/powerpc/kernel/syscalls.c >> > +++ b/arch/powerpc/kernel/syscalls.c >> > @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t len, >> > { >> > long ret = -EINVAL; >> > >> > - if (!arch_validate_prot(prot, addr)) >> > + if (!arch_validate_prot(prot, addr, len)) >> >> This call isn't under mmap lock. I also find it rather weird as the >> generic code only calls arch_validate_prot from mprotect, only powerpc >> also calls it from mmap. >> >> This seems to go back to commit ef3d3246a0d0 >> ("powerpc/mm: Add Strong Access Ordering support") > > I'm _guessing_ the idea in the generic case might be that mmap() > doesn't check unknown bits in the protection flags, and therefore > maybe people wanted to avoid adding new error cases that could be > caused by random high bits being set? I suspect it's just that when we added it we updated our do_mmap2() and didn't touch the generic version because we didn't need to. ie. it's not intentional it's just a buglet. I think this is the original submission: https://lore.kernel.org/linuxppc-dev/20080610220055.10257.84465.sendpatchset@norville.austin.ibm.com/ Which only calls arch_validate_prot() from mprotect and the powerpc code, and there was no discussion about adding it elsewhere. > So while the mprotect() case > checks the flags and refuses unknown values, the mmap() code just lets > the architecture figure out which bits are actually valid to set (via > arch_calc_vm_prot_bits()) and silently ignores the rest? > > And powerpc apparently decided that they do want to error out on bogus > prot values passed to their version of mmap(), and in exchange, assume > in arch_calc_vm_prot_bits() that the protection bits are valid? I don't think we really decided that, it just happened by accident and no one noticed/complained. Seems userspace is pretty well behaved when it comes to passing prot values to mmap(). > powerpc's arch_validate_prot() doesn't actually need the mmap lock, so > I think this is fine-ish for now (as in, while the code is a bit > unclean, I don't think I'm making it worse, and I don't think it's > actually buggy). In theory, we could move the arch_validate_prot() > call over into the mmap guts, where we're holding the lock, and gate > it on the architecture or on some feature CONFIG that powerpc can > activate in its Kconfig. But I'm not sure whether that'd be helping or > making things worse, so when I sent this patch, I deliberately left > the powerpc stuff as-is. I think what you've done is fine, and anything more elaborate is not worth the effort. cheers
On Thu, Oct 08, 2020 at 09:34:26PM +1100, Michael Ellerman wrote: > Jann Horn <jannh@google.com> writes: > > So while the mprotect() case > > checks the flags and refuses unknown values, the mmap() code just lets > > the architecture figure out which bits are actually valid to set (via > > arch_calc_vm_prot_bits()) and silently ignores the rest? > > > > And powerpc apparently decided that they do want to error out on bogus > > prot values passed to their version of mmap(), and in exchange, assume > > in arch_calc_vm_prot_bits() that the protection bits are valid? > > I don't think we really decided that, it just happened by accident and > no one noticed/complained. > > Seems userspace is pretty well behaved when it comes to passing prot > values to mmap(). It's not necessarily about well behaved but whether it can have security implications. On arm64, if the underlying memory does not support MTE (say some DAX mmap) but we still allow PROT_MTE driven by user, it will lead to an SError which brings the whole machine down. Not sure whether ADI has similar requirements but at least for arm64 we addressed the mmap() case as well (see my other email on the details; I think the approach would work on SPARC as well).
Hi Khalid, On Wed, Oct 07, 2020 at 02:14:09PM -0600, Khalid Aziz wrote: > On 10/7/20 1:39 AM, Jann Horn wrote: > > arch_validate_prot() is a hook that can validate whether a given set of > > protection flags is valid in an mprotect() operation. It is given the set > > of protection flags and the address being modified. > > > > However, the address being modified can currently not actually be used in > > a meaningful way because: > > > > 1. Only the address is given, but not the length, and the operation can > > span multiple VMAs. Therefore, the callee can't actually tell which > > virtual address range, or which VMAs, are being targeted. > > 2. The mmap_lock is not held, meaning that if the callee were to check > > the VMA at @addr, that VMA would be unrelated to the one the > > operation is performed on. > > > > Currently, custom arch_validate_prot() handlers are defined by > > arm64, powerpc and sparc. > > arm64 and powerpc don't care about the address range, they just check the > > flags against CPU support masks. > > sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take > > the mmap_lock. > > > > Change the function signature to also take a length, and move the > > arch_validate_prot() call in mm/mprotect.c down into the locked region. [...] > As Chris pointed out, the call to arch_validate_prot() from do_mmap2() > is made without holding mmap_lock. Lock is not acquired until > vm_mmap_pgoff(). This variance is uncomfortable but I am more > uncomfortable forcing all implementations of validate_prot to require > mmap_lock be held when non-sparc implementations do not have such need > yet. Since do_mmap2() is in powerpc specific code, for now this patch > solves a current problem. I still think sparc should avoid walking the vmas in arch_validate_prot(). The core code already has the vmas, though not when calling arch_validate_prot(). That's one of the reasons I added arch_validate_flags() with the MTE patches. For sparc, this could be (untested, just copied the arch_validate_prot() code): static inline bool arch_validate_flags(unsigned long vm_flags) { if (!(vm_flags & VM_SPARC_ADI)) return true; if (!adi_capable()) return false; /* ADI can not be enabled on PFN mapped pages */ if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) return false; /* * Mergeable pages can become unmergeable if ADI is enabled on * them even if they have identical data on them. This can be * because ADI enabled pages with identical data may still not * have identical ADI tags on them. Disallow ADI on mergeable * pages. */ if (vma->vm_flags & VM_MERGEABLE) return false; return true; } > That leaves open the question of should > generic mmap call arch_validate_prot and return EINVAL for invalid > combination of protection bits, but that is better addressed in a > separate patch. The above would cover mmap() as well. The current sparc_validate_prot() relies on finding the vma for the corresponding address. However, if you call this early in the mmap() path, there's no such vma. It is only created later in mmap_region() which no longer has the original PROT_* flags (all converted to VM_* flags). Calling arch_validate_flags() on mmap() has a small side-effect on the user ABI: if the CPU is not adi_capable(), PROT_ADI is currently ignored on mmap() but rejected by sparc_validate_prot(). Powerpc already does this already and I think it should be fine for arm64 (it needs checking though as we have another flag, PROT_BTI, hopefully dynamic loaders don't pass this flag unconditionally). 
However, as I said above, it doesn't solve the mmap() PROT_ADI checking for sparc since there's no vma yet. I'd strongly recommend the arch_validate_flags() approach and reverting the "start" parameter added to arch_validate_prot() if you go for the flags route. Thanks.
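One nit in the sketch above: it takes vm_flags as a plain value but still
dereferences vma in the PFN-map and mergeable checks. A self-consistent
version of the same idea (equally untested, purely illustrative) would
test vm_flags directly:

static inline bool arch_validate_flags(unsigned long vm_flags)
{
        if (!(vm_flags & VM_SPARC_ADI))
                return true;

        if (!adi_capable())
                return false;

        /* ADI can not be enabled on PFN mapped pages */
        if (vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
                return false;

        /*
         * Mergeable pages can become unmergeable if ADI is enabled on
         * them even if they have identical data on them. This can be
         * because ADI enabled pages with identical data may still not
         * have identical ADI tags on them. Disallow ADI on mergeable
         * pages.
         */
        if (vm_flags & VM_MERGEABLE)
                return false;

        return true;
}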
On 10/10/20 5:09 AM, Catalin Marinas wrote: > Hi Khalid, > > On Wed, Oct 07, 2020 at 02:14:09PM -0600, Khalid Aziz wrote: >> On 10/7/20 1:39 AM, Jann Horn wrote: >>> arch_validate_prot() is a hook that can validate whether a given set of >>> protection flags is valid in an mprotect() operation. It is given the set >>> of protection flags and the address being modified. >>> >>> However, the address being modified can currently not actually be used in >>> a meaningful way because: >>> >>> 1. Only the address is given, but not the length, and the operation can >>> span multiple VMAs. Therefore, the callee can't actually tell which >>> virtual address range, or which VMAs, are being targeted. >>> 2. The mmap_lock is not held, meaning that if the callee were to check >>> the VMA at @addr, that VMA would be unrelated to the one the >>> operation is performed on. >>> >>> Currently, custom arch_validate_prot() handlers are defined by >>> arm64, powerpc and sparc. >>> arm64 and powerpc don't care about the address range, they just check the >>> flags against CPU support masks. >>> sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take >>> the mmap_lock. >>> >>> Change the function signature to also take a length, and move the >>> arch_validate_prot() call in mm/mprotect.c down into the locked region. > [...] >> As Chris pointed out, the call to arch_validate_prot() from do_mmap2() >> is made without holding mmap_lock. Lock is not acquired until >> vm_mmap_pgoff(). This variance is uncomfortable but I am more >> uncomfortable forcing all implementations of validate_prot to require >> mmap_lock be held when non-sparc implementations do not have such need >> yet. Since do_mmap2() is in powerpc specific code, for now this patch >> solves a current problem. > > I still think sparc should avoid walking the vmas in > arch_validate_prot(). The core code already has the vmas, though not > when calling arch_validate_prot(). That's one of the reasons I added > arch_validate_flags() with the MTE patches. For sparc, this could be > (untested, just copied the arch_validate_prot() code): I am little uncomfortable with the idea of validating protection bits inside the VMA walk loop in do_mprotect_pkey(). When ADI is being enabled across multiple VMAs and arch_validate_flags() fails on a VMA later, do_mprotect_pkey() will bail out with error leaving ADI enabled on earlier VMAs. This will apply to protection bits other than ADI as well of course. This becomes a partial failure of mprotect() call. I think it should be all or nothing with mprotect() - when one calls mprotect() from userspace, either the entire address range passed in gets its protection bits updated or none of it does. That requires validating protection bits upfront or undoing what earlier iterations of VMA walk loop might have done. -- Khalid > > static inline bool arch_validate_flags(unsigned long vm_flags) > { > if (!(vm_flags & VM_SPARC_ADI)) > return true; > > if (!adi_capable()) > return false; > > /* ADI can not be enabled on PFN mapped pages */ > if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) > return false; > > /* > * Mergeable pages can become unmergeable if ADI is enabled on > * them even if they have identical data on them. This can be > * because ADI enabled pages with identical data may still not > * have identical ADI tags on them. Disallow ADI on mergeable > * pages. 
> */ > if (vma->vm_flags & VM_MERGEABLE) > return false; > > return true; > } > >> That leaves open the question of should >> generic mmap call arch_validate_prot and return EINVAL for invalid >> combination of protection bits, but that is better addressed in a >> separate patch. > > The above would cover mmap() as well. > > The current sparc_validate_prot() relies on finding the vma for the > corresponding address. However, if you call this early in the mmap() > path, there's no such vma. It is only created later in mmap_region() > which no longer has the original PROT_* flags (all converted to VM_* > flags). > > Calling arch_validate_flags() on mmap() has a small side-effect on the > user ABI: if the CPU is not adi_capable(), PROT_ADI is currently ignored > on mmap() but rejected by sparc_validate_prot(). Powerpc already does > this already and I think it should be fine for arm64 (it needs checking > though as we have another flag, PROT_BTI, hopefully dynamic loaders > don't pass this flag unconditionally). > > However, as I said above, it doesn't solve the mmap() PROT_ADI checking > for sparc since there's no vma yet. I'd strongly recommend the > arch_validate_flags() approach and reverting the "start" parameter added > to arch_validate_prot() if you go for the flags route. > > Thanks. >
On Mon, Oct 12, 2020 at 11:03:33AM -0600, Khalid Aziz wrote: > On 10/10/20 5:09 AM, Catalin Marinas wrote: > > On Wed, Oct 07, 2020 at 02:14:09PM -0600, Khalid Aziz wrote: > >> On 10/7/20 1:39 AM, Jann Horn wrote: > >>> arch_validate_prot() is a hook that can validate whether a given set of > >>> protection flags is valid in an mprotect() operation. It is given the set > >>> of protection flags and the address being modified. > >>> > >>> However, the address being modified can currently not actually be used in > >>> a meaningful way because: > >>> > >>> 1. Only the address is given, but not the length, and the operation can > >>> span multiple VMAs. Therefore, the callee can't actually tell which > >>> virtual address range, or which VMAs, are being targeted. > >>> 2. The mmap_lock is not held, meaning that if the callee were to check > >>> the VMA at @addr, that VMA would be unrelated to the one the > >>> operation is performed on. > >>> > >>> Currently, custom arch_validate_prot() handlers are defined by > >>> arm64, powerpc and sparc. > >>> arm64 and powerpc don't care about the address range, they just check the > >>> flags against CPU support masks. > >>> sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take > >>> the mmap_lock. > >>> > >>> Change the function signature to also take a length, and move the > >>> arch_validate_prot() call in mm/mprotect.c down into the locked region. > > [...] > >> As Chris pointed out, the call to arch_validate_prot() from do_mmap2() > >> is made without holding mmap_lock. Lock is not acquired until > >> vm_mmap_pgoff(). This variance is uncomfortable but I am more > >> uncomfortable forcing all implementations of validate_prot to require > >> mmap_lock be held when non-sparc implementations do not have such need > >> yet. Since do_mmap2() is in powerpc specific code, for now this patch > >> solves a current problem. > > > > I still think sparc should avoid walking the vmas in > > arch_validate_prot(). The core code already has the vmas, though not > > when calling arch_validate_prot(). That's one of the reasons I added > > arch_validate_flags() with the MTE patches. For sparc, this could be > > (untested, just copied the arch_validate_prot() code): > > I am little uncomfortable with the idea of validating protection bits > inside the VMA walk loop in do_mprotect_pkey(). When ADI is being > enabled across multiple VMAs and arch_validate_flags() fails on a VMA > later, do_mprotect_pkey() will bail out with error leaving ADI enabled > on earlier VMAs. This will apply to protection bits other than ADI as > well of course. This becomes a partial failure of mprotect() call. I > think it should be all or nothing with mprotect() - when one calls > mprotect() from userspace, either the entire address range passed in > gets its protection bits updated or none of it does. That requires > validating protection bits upfront or undoing what earlier iterations of > VMA walk loop might have done. I thought the same initially but mprotect() already does this with the VM_MAY* flag checking. If you ask it for an mprotect() that crosses multiple vmas and one of them fails, it doesn't roll back the changes to the prior ones. I considered that a similar approach is fine for MTE (it's most likely a user error).
On 10/12/20 11:22 AM, Catalin Marinas wrote: > On Mon, Oct 12, 2020 at 11:03:33AM -0600, Khalid Aziz wrote: >> On 10/10/20 5:09 AM, Catalin Marinas wrote: >>> On Wed, Oct 07, 2020 at 02:14:09PM -0600, Khalid Aziz wrote: >>>> On 10/7/20 1:39 AM, Jann Horn wrote: >>>>> arch_validate_prot() is a hook that can validate whether a given set of >>>>> protection flags is valid in an mprotect() operation. It is given the set >>>>> of protection flags and the address being modified. >>>>> >>>>> However, the address being modified can currently not actually be used in >>>>> a meaningful way because: >>>>> >>>>> 1. Only the address is given, but not the length, and the operation can >>>>> span multiple VMAs. Therefore, the callee can't actually tell which >>>>> virtual address range, or which VMAs, are being targeted. >>>>> 2. The mmap_lock is not held, meaning that if the callee were to check >>>>> the VMA at @addr, that VMA would be unrelated to the one the >>>>> operation is performed on. >>>>> >>>>> Currently, custom arch_validate_prot() handlers are defined by >>>>> arm64, powerpc and sparc. >>>>> arm64 and powerpc don't care about the address range, they just check the >>>>> flags against CPU support masks. >>>>> sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take >>>>> the mmap_lock. >>>>> >>>>> Change the function signature to also take a length, and move the >>>>> arch_validate_prot() call in mm/mprotect.c down into the locked region. >>> [...] >>>> As Chris pointed out, the call to arch_validate_prot() from do_mmap2() >>>> is made without holding mmap_lock. Lock is not acquired until >>>> vm_mmap_pgoff(). This variance is uncomfortable but I am more >>>> uncomfortable forcing all implementations of validate_prot to require >>>> mmap_lock be held when non-sparc implementations do not have such need >>>> yet. Since do_mmap2() is in powerpc specific code, for now this patch >>>> solves a current problem. >>> >>> I still think sparc should avoid walking the vmas in >>> arch_validate_prot(). The core code already has the vmas, though not >>> when calling arch_validate_prot(). That's one of the reasons I added >>> arch_validate_flags() with the MTE patches. For sparc, this could be >>> (untested, just copied the arch_validate_prot() code): >> >> I am little uncomfortable with the idea of validating protection bits >> inside the VMA walk loop in do_mprotect_pkey(). When ADI is being >> enabled across multiple VMAs and arch_validate_flags() fails on a VMA >> later, do_mprotect_pkey() will bail out with error leaving ADI enabled >> on earlier VMAs. This will apply to protection bits other than ADI as >> well of course. This becomes a partial failure of mprotect() call. I >> think it should be all or nothing with mprotect() - when one calls >> mprotect() from userspace, either the entire address range passed in >> gets its protection bits updated or none of it does. That requires >> validating protection bits upfront or undoing what earlier iterations of >> VMA walk loop might have done. > > I thought the same initially but mprotect() already does this with the > VM_MAY* flag checking. If you ask it for an mprotect() that crosses > multiple vmas and one of them fails, it doesn't roll back the changes to > the prior ones. I considered that a similar approach is fine for MTE > (it's most likely a user error). > You are right about the current behavior with VM_MAY* flags, but that is not the right behavior. Adding more cases to this just perpetuates incorrect behavior. 
It is not easy to roll back changes after VMAs have potentially been split/merged which is probably why the current code simply throws in the towel and returns with partially modified address space. It is lot easier to do all the checks upfront and then proceed or not proceed with modifying VMAs. One approach might be to call arch_validate_flags() in a loop before modifying VMAs and walk all VMAs with a read lock held. Current code also bails out with ENOMEM if it finds a hole in the address range and leaves any modifications already made in place. This is another case where a hole could have been detected earlier. -- Khalid
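A sketch of that "validate everything up front" idea (illustrative only,
written against the pre-maple-tree vma list of roughly v5.9; a real
series would need to fold in the per-vma newflags calculation, and the
helper name is hypothetical):

/*
 * Walk [start, end) under mmap_lock and fail before any vma is touched
 * if the range has a hole or any vma would end up with invalid flags.
 */
static int mprotect_prevalidate(struct mm_struct *mm, unsigned long start,
                                unsigned long end, unsigned long new_vmflags)
{
        struct vm_area_struct *vma = find_vma(mm, start);
        unsigned long addr = start;

        while (vma && vma->vm_start < end) {
                if (vma->vm_start > addr)
                        return -ENOMEM; /* hole in the range */
                if (!arch_validate_flags(vma->vm_flags | new_vmflags))
                        return -EINVAL;
                addr = vma->vm_end;
                vma = vma->vm_next;
        }

        return addr < end ? -ENOMEM : 0;
}

do_mprotect_pkey() would call something like this once, after taking
mmap_write_lock_killable() and before its existing modification loop.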
On Mon, Oct 12, 2020 at 01:14:50PM -0600, Khalid Aziz wrote: > On 10/12/20 11:22 AM, Catalin Marinas wrote: > > On Mon, Oct 12, 2020 at 11:03:33AM -0600, Khalid Aziz wrote: > >> On 10/10/20 5:09 AM, Catalin Marinas wrote: > >>> On Wed, Oct 07, 2020 at 02:14:09PM -0600, Khalid Aziz wrote: > >>>> On 10/7/20 1:39 AM, Jann Horn wrote: > >>>>> arch_validate_prot() is a hook that can validate whether a given set of > >>>>> protection flags is valid in an mprotect() operation. It is given the set > >>>>> of protection flags and the address being modified. > >>>>> > >>>>> However, the address being modified can currently not actually be used in > >>>>> a meaningful way because: > >>>>> > >>>>> 1. Only the address is given, but not the length, and the operation can > >>>>> span multiple VMAs. Therefore, the callee can't actually tell which > >>>>> virtual address range, or which VMAs, are being targeted. > >>>>> 2. The mmap_lock is not held, meaning that if the callee were to check > >>>>> the VMA at @addr, that VMA would be unrelated to the one the > >>>>> operation is performed on. > >>>>> > >>>>> Currently, custom arch_validate_prot() handlers are defined by > >>>>> arm64, powerpc and sparc. > >>>>> arm64 and powerpc don't care about the address range, they just check the > >>>>> flags against CPU support masks. > >>>>> sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take > >>>>> the mmap_lock. > >>>>> > >>>>> Change the function signature to also take a length, and move the > >>>>> arch_validate_prot() call in mm/mprotect.c down into the locked region. > >>> [...] > >>>> As Chris pointed out, the call to arch_validate_prot() from do_mmap2() > >>>> is made without holding mmap_lock. Lock is not acquired until > >>>> vm_mmap_pgoff(). This variance is uncomfortable but I am more > >>>> uncomfortable forcing all implementations of validate_prot to require > >>>> mmap_lock be held when non-sparc implementations do not have such need > >>>> yet. Since do_mmap2() is in powerpc specific code, for now this patch > >>>> solves a current problem. > >>> > >>> I still think sparc should avoid walking the vmas in > >>> arch_validate_prot(). The core code already has the vmas, though not > >>> when calling arch_validate_prot(). That's one of the reasons I added > >>> arch_validate_flags() with the MTE patches. For sparc, this could be > >>> (untested, just copied the arch_validate_prot() code): > >> > >> I am little uncomfortable with the idea of validating protection bits > >> inside the VMA walk loop in do_mprotect_pkey(). When ADI is being > >> enabled across multiple VMAs and arch_validate_flags() fails on a VMA > >> later, do_mprotect_pkey() will bail out with error leaving ADI enabled > >> on earlier VMAs. This will apply to protection bits other than ADI as > >> well of course. This becomes a partial failure of mprotect() call. I > >> think it should be all or nothing with mprotect() - when one calls > >> mprotect() from userspace, either the entire address range passed in > >> gets its protection bits updated or none of it does. That requires > >> validating protection bits upfront or undoing what earlier iterations of > >> VMA walk loop might have done. > > > > I thought the same initially but mprotect() already does this with the > > VM_MAY* flag checking. If you ask it for an mprotect() that crosses > > multiple vmas and one of them fails, it doesn't roll back the changes to > > the prior ones. 
I considered that a similar approach is fine for MTE > > (it's most likely a user error). > > You are right about the current behavior with VM_MAY* flags, but that is > not the right behavior. Adding more cases to this just perpetuates > incorrect behavior. It is not easy to roll back changes after VMAs have > potentially been split/merged which is probably why the current code > simply throws in the towel and returns with partially modified address > space. It is lot easier to do all the checks upfront and then proceed or > not proceed with modifying VMAs. One approach might be to call > arch_validate_flags() in a loop before modifying VMAs and walk all VMAs > with a read lock held. Current code also bails out with ENOMEM if it > finds a hole in the address range and leaves any modifications already > made in place. This is another case where a hole could have been > detected earlier. This should be ideal indeed though with the risk of breaking the current ABI (FWIW, FreeBSD seems to do a first pass to check for violations: https://github.com/freebsd/freebsd/blob/master/sys/vm/vm_map.c#L2630). However, I'm not sure it's worth the hassle. Do we expect the user to call mprotect() across multiple mixed type mappings while relying on no change if an error is returned? We should probably at least document the current behaviour in the mprotect man page.
On 10/13/20 3:16 AM, Catalin Marinas wrote: > On Mon, Oct 12, 2020 at 01:14:50PM -0600, Khalid Aziz wrote: >> On 10/12/20 11:22 AM, Catalin Marinas wrote: >>> On Mon, Oct 12, 2020 at 11:03:33AM -0600, Khalid Aziz wrote: >>>> On 10/10/20 5:09 AM, Catalin Marinas wrote: >>>>> On Wed, Oct 07, 2020 at 02:14:09PM -0600, Khalid Aziz wrote: >>>>>> On 10/7/20 1:39 AM, Jann Horn wrote: >>>>>>> arch_validate_prot() is a hook that can validate whether a given set of >>>>>>> protection flags is valid in an mprotect() operation. It is given the set >>>>>>> of protection flags and the address being modified. >>>>>>> >>>>>>> However, the address being modified can currently not actually be used in >>>>>>> a meaningful way because: >>>>>>> >>>>>>> 1. Only the address is given, but not the length, and the operation can >>>>>>> span multiple VMAs. Therefore, the callee can't actually tell which >>>>>>> virtual address range, or which VMAs, are being targeted. >>>>>>> 2. The mmap_lock is not held, meaning that if the callee were to check >>>>>>> the VMA at @addr, that VMA would be unrelated to the one the >>>>>>> operation is performed on. >>>>>>> >>>>>>> Currently, custom arch_validate_prot() handlers are defined by >>>>>>> arm64, powerpc and sparc. >>>>>>> arm64 and powerpc don't care about the address range, they just check the >>>>>>> flags against CPU support masks. >>>>>>> sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take >>>>>>> the mmap_lock. >>>>>>> >>>>>>> Change the function signature to also take a length, and move the >>>>>>> arch_validate_prot() call in mm/mprotect.c down into the locked region. >>>>> [...] >>>>>> As Chris pointed out, the call to arch_validate_prot() from do_mmap2() >>>>>> is made without holding mmap_lock. Lock is not acquired until >>>>>> vm_mmap_pgoff(). This variance is uncomfortable but I am more >>>>>> uncomfortable forcing all implementations of validate_prot to require >>>>>> mmap_lock be held when non-sparc implementations do not have such need >>>>>> yet. Since do_mmap2() is in powerpc specific code, for now this patch >>>>>> solves a current problem. >>>>> >>>>> I still think sparc should avoid walking the vmas in >>>>> arch_validate_prot(). The core code already has the vmas, though not >>>>> when calling arch_validate_prot(). That's one of the reasons I added >>>>> arch_validate_flags() with the MTE patches. For sparc, this could be >>>>> (untested, just copied the arch_validate_prot() code): >>>> >>>> I am little uncomfortable with the idea of validating protection bits >>>> inside the VMA walk loop in do_mprotect_pkey(). When ADI is being >>>> enabled across multiple VMAs and arch_validate_flags() fails on a VMA >>>> later, do_mprotect_pkey() will bail out with error leaving ADI enabled >>>> on earlier VMAs. This will apply to protection bits other than ADI as >>>> well of course. This becomes a partial failure of mprotect() call. I >>>> think it should be all or nothing with mprotect() - when one calls >>>> mprotect() from userspace, either the entire address range passed in >>>> gets its protection bits updated or none of it does. That requires >>>> validating protection bits upfront or undoing what earlier iterations of >>>> VMA walk loop might have done. >>> >>> I thought the same initially but mprotect() already does this with the >>> VM_MAY* flag checking. If you ask it for an mprotect() that crosses >>> multiple vmas and one of them fails, it doesn't roll back the changes to >>> the prior ones. 
I considered that a similar approach is fine for MTE >>> (it's most likely a user error). >> >> You are right about the current behavior with VM_MAY* flags, but that is >> not the right behavior. Adding more cases to this just perpetuates >> incorrect behavior. It is not easy to roll back changes after VMAs have >> potentially been split/merged which is probably why the current code >> simply throws in the towel and returns with partially modified address >> space. It is lot easier to do all the checks upfront and then proceed or >> not proceed with modifying VMAs. One approach might be to call >> arch_validate_flags() in a loop before modifying VMAs and walk all VMAs >> with a read lock held. Current code also bails out with ENOMEM if it >> finds a hole in the address range and leaves any modifications already >> made in place. This is another case where a hole could have been >> detected earlier. > > This should be ideal indeed though with the risk of breaking the current > ABI (FWIW, FreeBSD seems to do a first pass to check for violations: > https://github.com/freebsd/freebsd/blob/master/sys/vm/vm_map.c#L2630). I am not sure I understand where the ABI breakage would be. Are we aware of apps that intentionally modify address space partially using the current code? What FreeBSD does seems like a reasonable thing to do. Any way first thing to do is to update sparc to use arch_validate_flags() and update sparc_validate_prot() to not peek into vma without lock. I can do that unless Jann wants to rework this 2 patch series with these changes. > > However, I'm not sure it's worth the hassle. Do we expect the user to > call mprotect() across multiple mixed type mappings while relying on no > change if an error is returned? We should probably at least document the > current behaviour in the mprotect man page. > Yes, documenting current behavior is definitely a good thing to do. -- Khalid
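For reference, once the vma-dependent checks move into
arch_validate_flags(), sparc_validate_prot() could plausibly shrink to
just the prot-bit and capability checks, along these lines (a
hypothetical sketch, not the eventual sparc patch):

static inline int sparc_validate_prot(unsigned long prot, unsigned long addr)
{
        if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_ADI))
                return 0;

        /* No vma access here, so no mmap_lock requirement either. */
        if ((prot & PROT_ADI) && !adi_capable())
                return 0;

        return 1;
}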
On Wed, Oct 14, 2020 at 11:24 PM Khalid Aziz <khalid.aziz@oracle.com> wrote: [...] > current code? What FreeBSD does seems like a reasonable thing to do. Any > way first thing to do is to update sparc to use arch_validate_flags() > and update sparc_validate_prot() to not peek into vma without lock. I > can do that unless Jann wants to rework this 2 patch series with these > changes. Ah, if you're willing to take care of that, that'd be nice, please do. :)
On Wed, Oct 14, 2020 at 03:21:16PM -0600, Khalid Aziz wrote: > On 10/13/20 3:16 AM, Catalin Marinas wrote: > > On Mon, Oct 12, 2020 at 01:14:50PM -0600, Khalid Aziz wrote: > >> On 10/12/20 11:22 AM, Catalin Marinas wrote: > >>> On Mon, Oct 12, 2020 at 11:03:33AM -0600, Khalid Aziz wrote: > >>>> On 10/10/20 5:09 AM, Catalin Marinas wrote: > >>>>> I still think sparc should avoid walking the vmas in > >>>>> arch_validate_prot(). The core code already has the vmas, though not > >>>>> when calling arch_validate_prot(). That's one of the reasons I added > >>>>> arch_validate_flags() with the MTE patches. For sparc, this could be > >>>>> (untested, just copied the arch_validate_prot() code): > >>>> > >>>> I am little uncomfortable with the idea of validating protection bits > >>>> inside the VMA walk loop in do_mprotect_pkey(). When ADI is being > >>>> enabled across multiple VMAs and arch_validate_flags() fails on a VMA > >>>> later, do_mprotect_pkey() will bail out with error leaving ADI enabled > >>>> on earlier VMAs. This will apply to protection bits other than ADI as > >>>> well of course. This becomes a partial failure of mprotect() call. I > >>>> think it should be all or nothing with mprotect() - when one calls > >>>> mprotect() from userspace, either the entire address range passed in > >>>> gets its protection bits updated or none of it does. That requires > >>>> validating protection bits upfront or undoing what earlier iterations of > >>>> VMA walk loop might have done. > >>> > >>> I thought the same initially but mprotect() already does this with the > >>> VM_MAY* flag checking. If you ask it for an mprotect() that crosses > >>> multiple vmas and one of them fails, it doesn't roll back the changes to > >>> the prior ones. I considered that a similar approach is fine for MTE > >>> (it's most likely a user error). > >> > >> You are right about the current behavior with VM_MAY* flags, but that is > >> not the right behavior. Adding more cases to this just perpetuates > >> incorrect behavior. It is not easy to roll back changes after VMAs have > >> potentially been split/merged which is probably why the current code > >> simply throws in the towel and returns with partially modified address > >> space. It is lot easier to do all the checks upfront and then proceed or > >> not proceed with modifying VMAs. One approach might be to call > >> arch_validate_flags() in a loop before modifying VMAs and walk all VMAs > >> with a read lock held. Current code also bails out with ENOMEM if it > >> finds a hole in the address range and leaves any modifications already > >> made in place. This is another case where a hole could have been > >> detected earlier. > > > > This should be ideal indeed though with the risk of breaking the current > > ABI (FWIW, FreeBSD seems to do a first pass to check for violations: > > https://github.com/freebsd/freebsd/blob/master/sys/vm/vm_map.c#L2630). > > I am not sure I understand where the ABI breakage would be. Are we aware > of apps that intentionally modify address space partially using the > current code? I hope there aren't any but you never know until you make the change and someone complains. Arguably, such user code is already broken since mprotect() doesn't even tell where the failure occurred. > What FreeBSD does seems like a reasonable thing to do. Any way first > thing to do is to update sparc to use arch_validate_flags() and update > sparc_validate_prot() to not peek into vma without lock. 
If you go for arch_validate_flags(), I think sparc_validate_prot() doesn't need the vma at all. BTW, on the ADI topic, I think you have a race in do_swap_page() since set_pte_at() is called before arch_do_swap_page(). So a thread in the same process would see the new mapping but the tags have not been updated yet. Unless sparc relies on the new user pte to be set, I think you can just swap the two calls.
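The reordering Catalin suggests would look roughly like this in
do_swap_page() in mm/memory.c (paraphrased from memory of the ~v5.9 code,
not a tested patch):

        /* Before: other threads sharing the mm can observe the pte while
         * the arch state (e.g. ADI/MTE tags) restored by
         * arch_do_swap_page() is still missing. */
        set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
        arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);

        /* After: restore the arch-specific state first, then publish the
         * pte, so there is no window where the mapping is visible without
         * its tags. */
        arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
        set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);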
On 10/15/20 3:05 AM, Catalin Marinas wrote: > On Wed, Oct 14, 2020 at 03:21:16PM -0600, Khalid Aziz wrote: >> What FreeBSD does seems like a reasonable thing to do. Any way first >> thing to do is to update sparc to use arch_validate_flags() and update >> sparc_validate_prot() to not peek into vma without lock. > > If you go for arch_validate_flags(), I think sparc_validate_prot() > doesn't need the vma at all. Yes, the plan is to move vma flag check from sparc_validate_prot() to arch_validate_flags().. > > BTW, on the ADI topic, I think you have a race in do_swap_page() since > set_pte_at() is called before arch_do_swap_page(). So a thread in the > same process would see the new mapping but the tags have not been > updated yet. Unless sparc relies on the new user pte to be set, I think > you can just swap the two calls. > Thanks for pointing that out. I will take a look at it. -- Khalid
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index 081ec8de9ea6..0876a87986dc 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -23,7 +23,7 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
 #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
 
 static inline bool arch_validate_prot(unsigned long prot,
-        unsigned long addr __always_unused)
+        unsigned long addr __always_unused, unsigned long len __always_unused)
 {
        unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM;
 
@@ -32,6 +32,6 @@ static inline bool arch_validate_prot(unsigned long prot,
 
        return (prot & ~supported) == 0;
 }
-#define arch_validate_prot(prot, addr) arch_validate_prot(prot, addr)
+#define arch_validate_prot(prot, addr, len) arch_validate_prot(prot, addr, len)
 
 #endif /* ! __ASM_MMAN_H__ */
diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 7cb6d18f5cd6..65dd9b594985 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -36,7 +36,8 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
 }
 #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
 
-static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
+static inline bool arch_validate_prot(unsigned long prot, unsigned long addr,
+        unsigned long len)
 {
        if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
                return false;
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index 078608ec2e92..b1fabb97d138 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
 {
        long ret = -EINVAL;
 
-       if (!arch_validate_prot(prot, addr))
+       if (!arch_validate_prot(prot, addr, len))
                goto out;
 
        if (shift) {
diff --git a/arch/sparc/include/asm/mman.h b/arch/sparc/include/asm/mman.h
index f94532f25db1..e85222c76585 100644
--- a/arch/sparc/include/asm/mman.h
+++ b/arch/sparc/include/asm/mman.h
@@ -52,9 +52,11 @@ static inline pgprot_t sparc_vm_get_page_prot(unsigned long vm_flags)
        return (vm_flags & VM_SPARC_ADI) ? __pgprot(_PAGE_MCD_4V) : __pgprot(0);
 }
 
-#define arch_validate_prot(prot, addr) sparc_validate_prot(prot, addr)
-static inline int sparc_validate_prot(unsigned long prot, unsigned long addr)
+#define arch_validate_prot(prot, addr, len) sparc_validate_prot(prot, addr, len)
+static inline int sparc_validate_prot(unsigned long prot, unsigned long addr,
+        unsigned long len)
 {
+       mmap_assert_write_locked(current->mm);
        if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_ADI))
                return 0;
        if (prot & PROT_ADI) {
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 6f34c33075f9..5b4d554d3189 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -96,7 +96,8 @@ static inline void vm_unacct_memory(long pages)
  *
  * Returns true if the prot flags are valid
  */
-static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
+static inline bool arch_validate_prot(unsigned long prot, unsigned long addr,
+        unsigned long len)
 {
        return (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM)) == 0;
 }
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ce8b8a5eacbb..e2d6b51acbf8 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -533,14 +533,16 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
        end = start + len;
        if (end <= start)
                return -ENOMEM;
-       if (!arch_validate_prot(prot, start))
-               return -EINVAL;
 
        reqprot = prot;
 
        if (mmap_write_lock_killable(current->mm))
                return -EINTR;
 
+       error = -EINVAL;
+       if (!arch_validate_prot(prot, start, len))
+               goto out;
+
        /*
         * If userspace did not allocate the pkey, do not let
         * them use it here.
arch_validate_prot() is a hook that can validate whether a given set of
protection flags is valid in an mprotect() operation. It is given the set
of protection flags and the address being modified.

However, the address being modified can currently not actually be used in
a meaningful way because:

1. Only the address is given, but not the length, and the operation can
   span multiple VMAs. Therefore, the callee can't actually tell which
   virtual address range, or which VMAs, are being targeted.
2. The mmap_lock is not held, meaning that if the callee were to check
   the VMA at @addr, that VMA would be unrelated to the one the
   operation is performed on.

Currently, custom arch_validate_prot() handlers are defined by
arm64, powerpc and sparc.
arm64 and powerpc don't care about the address range, they just check the
flags against CPU support masks.
sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take
the mmap_lock.

Change the function signature to also take a length, and move the
arch_validate_prot() call in mm/mprotect.c down into the locked region.

Cc: stable@vger.kernel.org
Fixes: 9035cf9a97e4 ("mm: Add address parameter to arch_validate_prot()")
Suggested-by: Khalid Aziz <khalid.aziz@oracle.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jann Horn <jannh@google.com>
---
 arch/arm64/include/asm/mman.h   | 4 ++--
 arch/powerpc/include/asm/mman.h | 3 ++-
 arch/powerpc/kernel/syscalls.c  | 2 +-
 arch/sparc/include/asm/mman.h   | 6 ++++--
 include/linux/mman.h            | 3 ++-
 mm/mprotect.c                   | 6 ++++--
 6 files changed, 15 insertions(+), 9 deletions(-)

base-commit: c85fb28b6f999db9928b841f63f1beeb3074eeca