Message ID | 20220908185006.1212126-1-abrestic@rivosinc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] riscv: Make mmap() with PROT_WRITE imply PROT_READ | expand |
The v2 patch looks great, > - if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ))) > - return -EINVAL; > - This also removes the check for --x pages, which used to be present in previous versions (before the submission of the to-be-reverted patch). Is this intended? Thanks!
On Thu, Sep 8, 2022 at 2:57 PM SS JieJi <c141028@gmail.com> wrote: > > The v2 patch looks great, > > - if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ))) > > - return -EINVAL; > > - > This also removes the check for --x pages, which used to be present in > previous versions (before the submission of the to-be-reverted patch). > Is this intended? Thanks! There's no change in behavior for --X mappings; those have always been allowed as it's a valid set of PTE permissions. This patch does allow -WX mappings, which were originally disallowed in commit e0d17c842c0f ("RISC-V: Don't allow write+exec only page mapping request in mmap"), by implying read permissions for such mappings as well. I have a note in the commit message about this. -Andrew
On 2022/9/9 02:50, Andrew Bresticker wrote: > Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is > invalid") made mmap() return EINVAL if PROT_WRITE was set wihtout > PROT_READ with the justification that a write-only PTE is considered a > reserved PTE permission bit pattern in the privileged spec. This check > is unnecessary since RISC-V defines its protection_map such that PROT_WRITE > maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is > inconsistent with other architectures that don't support write-only PTEs, > creating a potential software portability issue. Just remove the check > altogether and let PROT_WRITE imply PROT_READ as is the case on other > architectures. > > Note that this also allows PROT_WRITE|PROT_EXEC mappings which were > disallowed prior to the aforementioned commit; PROT_READ is implied in > such mappings as well. > > Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid") > Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> > --- > v1 -> v2: Update access_error() to account for write-implies-read > --- > arch/riscv/kernel/sys_riscv.c | 3 --- > arch/riscv/mm/fault.c | 3 ++- > 2 files changed, 2 insertions(+), 4 deletions(-) > > diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c > index 571556bb9261..5d3f2fbeb33c 100644 > --- a/arch/riscv/kernel/sys_riscv.c > +++ b/arch/riscv/kernel/sys_riscv.c > @@ -18,9 +18,6 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len, > if (unlikely(offset & (~PAGE_MASK >> page_shift_offset))) > return -EINVAL; > > - if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ))) > - return -EINVAL; > - > return ksys_mmap_pgoff(addr, len, prot, flags, fd, > offset >> (PAGE_SHIFT - page_shift_offset)); > } > diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c > index f2fbd1400b7c..d86f7cebd4a7 100644 > --- a/arch/riscv/mm/fault.c > +++ b/arch/riscv/mm/fault.c > @@ -184,7 +184,8 @@ static inline bool access_error(unsigned long cause, struct vm_area_struct *vma) > } > break; > case EXC_LOAD_PAGE_FAULT: > - if (!(vma->vm_flags & VM_READ)) { > + /* Write implies read */ > + if (!(vma->vm_flags & (VM_READ | VM_WRITE))) { > return true; > } > break; Hi, this did solve the problem and achieved consistency between architectures, but I have a question. Such a change specifies behavior for a state that should not exist, and if, in the future, RISC-V spec specifies a different behavior for that state (I mean, RVI itself has a history of not caring about downstream, like Zicsr and Zifencei), it will create inconsistencies, which is bad. If we reject the "write but not read" state, the user gets the most direct response: the state is not allowed so that they do not and cannot rely on the behavior of the state. This will bring better time consistency to the application if the spec specifies the behavior in the future. But it lost architecture consistency. How do you think this situation should be handled properly? Yours, Celeste Liu
On 2022/9/9 11:01, Celeste Liu wrote: > On 2022/9/9 02:50, Andrew Bresticker wrote: >> Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is >> invalid") made mmap() return EINVAL if PROT_WRITE was set wihtout >> PROT_READ with the justification that a write-only PTE is considered a >> reserved PTE permission bit pattern in the privileged spec. This check >> is unnecessary since RISC-V defines its protection_map such that PROT_WRITE >> maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is >> inconsistent with other architectures that don't support write-only PTEs, >> creating a potential software portability issue. Just remove the check >> altogether and let PROT_WRITE imply PROT_READ as is the case on other >> architectures. >> >> Note that this also allows PROT_WRITE|PROT_EXEC mappings which were >> disallowed prior to the aforementioned commit; PROT_READ is implied in >> such mappings as well. >> >> Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid") >> Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> >> --- >> v1 -> v2: Update access_error() to account for write-implies-read >> --- >> arch/riscv/kernel/sys_riscv.c | 3 --- >> arch/riscv/mm/fault.c | 3 ++- >> 2 files changed, 2 insertions(+), 4 deletions(-) >> >> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c >> index 571556bb9261..5d3f2fbeb33c 100644 >> --- a/arch/riscv/kernel/sys_riscv.c >> +++ b/arch/riscv/kernel/sys_riscv.c >> @@ -18,9 +18,6 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len, >> if (unlikely(offset & (~PAGE_MASK >> page_shift_offset))) >> return -EINVAL; >> >> - if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ))) >> - return -EINVAL; >> - >> return ksys_mmap_pgoff(addr, len, prot, flags, fd, >> offset >> (PAGE_SHIFT - page_shift_offset)); >> } >> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c >> index f2fbd1400b7c..d86f7cebd4a7 100644 >> --- a/arch/riscv/mm/fault.c >> +++ b/arch/riscv/mm/fault.c >> @@ -184,7 +184,8 @@ static inline bool access_error(unsigned long cause, struct vm_area_struct *vma) >> } >> break; >> case EXC_LOAD_PAGE_FAULT: >> - if (!(vma->vm_flags & VM_READ)) { >> + /* Write implies read */ >> + if (!(vma->vm_flags & (VM_READ | VM_WRITE))) { >> return true; >> } >> break; > > Hi, this did solve the problem and achieved consistency between > architectures, but I have a question. > > Such a change specifies behavior for a state that should not exist, > and if, in the future, RISC-V spec specifies a different behavior > for that state (I mean, RVI itself has a history of not caring about > downstream, like Zicsr and Zifencei), it will create inconsistencies, > which is bad. > > If we reject the "write but not read" state, the user gets the most direct > response: the state is not allowed so that they do not and cannot rely > on the behavior of the state. This will bring better time consistency > to the application if the spec specifies the behavior in the future. > But it lost architecture consistency. > > How do you think this situation should be handled properly? > > Yours, > Celeste Liu Oops! I found a mistake in my previous understanding: PTE permission!=vma permission. So your modification makes sense, no matter how we handle the mapping of input permissions to PTEs, as long as we don't use the reserved permission combinations, the behavior is reasonable and also independent of the architecture's definition of PTEs. But I think this mapping relationship should be well documented. If we have such a mapping behavior in all architectures, then we should change this line in the mmap documentation On some hardware architectures (e.g., i386), PROT_WRITE implies PROT_READ. to apply all architectures. According to my read about code, all the vm_get_page_prot will do the protection_map mapping to have this feature. Yours, Celeste Liu
On Fri, Sep 9, 2022 at 7:42 AM Coelacanthus <coelacanthushex@gmail.com> wrote: > > On 2022/9/9 11:01, Celeste Liu wrote: > > On 2022/9/9 02:50, Andrew Bresticker wrote: > >> Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is > >> invalid") made mmap() return EINVAL if PROT_WRITE was set wihtout > >> PROT_READ with the justification that a write-only PTE is considered a > >> reserved PTE permission bit pattern in the privileged spec. This check > >> is unnecessary since RISC-V defines its protection_map such that PROT_WRITE > >> maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is > >> inconsistent with other architectures that don't support write-only PTEs, > >> creating a potential software portability issue. Just remove the check > >> altogether and let PROT_WRITE imply PROT_READ as is the case on other > >> architectures. > >> > >> Note that this also allows PROT_WRITE|PROT_EXEC mappings which were > >> disallowed prior to the aforementioned commit; PROT_READ is implied in > >> such mappings as well. > >> > >> Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid") > >> Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> > >> --- > >> v1 -> v2: Update access_error() to account for write-implies-read > >> --- > >> arch/riscv/kernel/sys_riscv.c | 3 --- > >> arch/riscv/mm/fault.c | 3 ++- > >> 2 files changed, 2 insertions(+), 4 deletions(-) > >> > >> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c > >> index 571556bb9261..5d3f2fbeb33c 100644 > >> --- a/arch/riscv/kernel/sys_riscv.c > >> +++ b/arch/riscv/kernel/sys_riscv.c > >> @@ -18,9 +18,6 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len, > >> if (unlikely(offset & (~PAGE_MASK >> page_shift_offset))) > >> return -EINVAL; > >> > >> - if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ))) > >> - return -EINVAL; > >> - > >> return ksys_mmap_pgoff(addr, len, prot, flags, fd, > >> offset >> (PAGE_SHIFT - page_shift_offset)); > >> } > >> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c > >> index f2fbd1400b7c..d86f7cebd4a7 100644 > >> --- a/arch/riscv/mm/fault.c > >> +++ b/arch/riscv/mm/fault.c > >> @@ -184,7 +184,8 @@ static inline bool access_error(unsigned long cause, struct vm_area_struct *vma) > >> } > >> break; > >> case EXC_LOAD_PAGE_FAULT: > >> - if (!(vma->vm_flags & VM_READ)) { > >> + /* Write implies read */ > >> + if (!(vma->vm_flags & (VM_READ | VM_WRITE))) { > >> return true; > >> } > >> break; > > > > Hi, this did solve the problem and achieved consistency between > > architectures, but I have a question. > > > > Such a change specifies behavior for a state that should not exist, > > and if, in the future, RISC-V spec specifies a different behavior > > for that state (I mean, RVI itself has a history of not caring about > > downstream, like Zicsr and Zifencei), it will create inconsistencies, > > which is bad. > > > > If we reject the "write but not read" state, the user gets the most direct > > response: the state is not allowed so that they do not and cannot rely > > on the behavior of the state. This will bring better time consistency > > to the application if the spec specifies the behavior in the future. > > But it lost architecture consistency. > > > > How do you think this situation should be handled properly? > > > > Yours, > > Celeste Liu > > Oops! > > I found a mistake in my previous understanding: PTE permission!=vma permission. > So your modification makes sense, no matter how we handle the mapping of input > permissions to PTEs, as long as we don't use the reserved permission combinations, > the behavior is reasonable and also independent of the architecture's definition > of PTEs. > > But I think this mapping relationship should be well documented. If we have > such a mapping behavior in all architectures, then we should change this line in > the mmap documentation > On some hardware architectures (e.g., i386), PROT_WRITE implies PROT_READ. > to apply all architectures. According to my read about code, all the vm_get_page_prot > will do the protection_map mapping to have this feature. I think leaving the PROT_WRITE-implies-PROT_READ as being specified as architecture-dependent is reasonable, but of course portable programs shouldn't rely on this behavior. There are CPUs out there that support write-only mappings -- MIPS with RI/XI comes to mind and indeed mmap(PROT_WRITE) on such CPUs results in write-only mappings. -Andrew > > Yours, > Celeste Liu
On 2022/9/9 23:16, Andrew Bresticker wrote> > I think leaving the PROT_WRITE-implies-PROT_READ as being specified as > architecture-dependent is reasonable, but of course portable programs > shouldn't rely on this behavior. There are CPUs out there that support > write-only mappings -- MIPS with RI/XI comes to mind and indeed > mmap(PROT_WRITE) on such CPUs results in write-only mappings. > > -Andrew > Ok, I have no question now. This patch looks good to me. This feature shouldn't be relied upon indeed, as it depends on the specific hardware implementation. Thanks for your explanation! Yours, Celeste Liu
On Thu, Sep 8, 2022 at 11:50 AM Andrew Bresticker <abrestic@rivosinc.com> wrote: > > Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is > invalid") made mmap() return EINVAL if PROT_WRITE was set wihtout > PROT_READ with the justification that a write-only PTE is considered a > reserved PTE permission bit pattern in the privileged spec. This check > is unnecessary since RISC-V defines its protection_map such that PROT_WRITE > maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is > inconsistent with other architectures that don't support write-only PTEs, > creating a potential software portability issue. Just remove the check > altogether and let PROT_WRITE imply PROT_READ as is the case on other > architectures. > > Note that this also allows PROT_WRITE|PROT_EXEC mappings which were > disallowed prior to the aforementioned commit; PROT_READ is implied in > such mappings as well. > > Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid") > Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> > --- > v1 -> v2: Update access_error() to account for write-implies-read > --- > arch/riscv/kernel/sys_riscv.c | 3 --- > arch/riscv/mm/fault.c | 3 ++- > 2 files changed, 2 insertions(+), 4 deletions(-) > > diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c > index 571556bb9261..5d3f2fbeb33c 100644 > --- a/arch/riscv/kernel/sys_riscv.c > +++ b/arch/riscv/kernel/sys_riscv.c > @@ -18,9 +18,6 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len, > if (unlikely(offset & (~PAGE_MASK >> page_shift_offset))) > return -EINVAL; > > - if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ))) > - return -EINVAL; > - > return ksys_mmap_pgoff(addr, len, prot, flags, fd, > offset >> (PAGE_SHIFT - page_shift_offset)); > } > diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c > index f2fbd1400b7c..d86f7cebd4a7 100644 > --- a/arch/riscv/mm/fault.c > +++ b/arch/riscv/mm/fault.c > @@ -184,7 +184,8 @@ static inline bool access_error(unsigned long cause, struct vm_area_struct *vma) > } > break; > case EXC_LOAD_PAGE_FAULT: > - if (!(vma->vm_flags & VM_READ)) { > + /* Write implies read */ > + if (!(vma->vm_flags & (VM_READ | VM_WRITE))) { > return true; > } > break; This should be a separate patch with commit text about VMA permissions. > -- > 2.25.1 > Otherwise, lgtm. Reviewed-by: Atish Patra <atishp@rivosinc.com>
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c index 571556bb9261..5d3f2fbeb33c 100644 --- a/arch/riscv/kernel/sys_riscv.c +++ b/arch/riscv/kernel/sys_riscv.c @@ -18,9 +18,6 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len, if (unlikely(offset & (~PAGE_MASK >> page_shift_offset))) return -EINVAL; - if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ))) - return -EINVAL; - return ksys_mmap_pgoff(addr, len, prot, flags, fd, offset >> (PAGE_SHIFT - page_shift_offset)); } diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c index f2fbd1400b7c..d86f7cebd4a7 100644 --- a/arch/riscv/mm/fault.c +++ b/arch/riscv/mm/fault.c @@ -184,7 +184,8 @@ static inline bool access_error(unsigned long cause, struct vm_area_struct *vma) } break; case EXC_LOAD_PAGE_FAULT: - if (!(vma->vm_flags & VM_READ)) { + /* Write implies read */ + if (!(vma->vm_flags & (VM_READ | VM_WRITE))) { return true; } break;
Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid") made mmap() return EINVAL if PROT_WRITE was set wihtout PROT_READ with the justification that a write-only PTE is considered a reserved PTE permission bit pattern in the privileged spec. This check is unnecessary since RISC-V defines its protection_map such that PROT_WRITE maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is inconsistent with other architectures that don't support write-only PTEs, creating a potential software portability issue. Just remove the check altogether and let PROT_WRITE imply PROT_READ as is the case on other architectures. Note that this also allows PROT_WRITE|PROT_EXEC mappings which were disallowed prior to the aforementioned commit; PROT_READ is implied in such mappings as well. Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid") Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> --- v1 -> v2: Update access_error() to account for write-implies-read --- arch/riscv/kernel/sys_riscv.c | 3 --- arch/riscv/mm/fault.c | 3 ++- 2 files changed, 2 insertions(+), 4 deletions(-)