[v2] riscv: Make mmap() with PROT_WRITE imply PROT_READ

Message ID	20220908185006.1212126-1-abrestic@rivosinc.com (mailing list archive)
State	New, archived
Headers	Return-Path: <linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org>
	From: Andrew Bresticker <abrestic@rivosinc.com>
	To: Palmer Dabbelt <palmer@dabbelt.com>
	Cc: Paul Walmsley <paul.walmsley@sifive.com>, Atish Patra <atishp@atishpatra.org>, Celeste Liu <coelacanthus@outlook.com>, dram <dramforever@live.com>, Ruizhe Pan <c141028@gmail.com>, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Andrew Bresticker <abrestic@rivosinc.com>
	Subject: [PATCH v2] riscv: Make mmap() with PROT_WRITE imply PROT_READ
	Date: Thu, 8 Sep 2022 14:50:06 -0400
	Message-Id: <20220908185006.1212126-1-abrestic@rivosinc.com>
	In-Reply-To: <20220908170133.1159747-1-abrestic@rivosinc.com>
	References: <20220908170133.1159747-1-abrestic@rivosinc.com>
	MIME-Version: 1.0
	Precedence: list
	Content-Type: text/plain; charset="us-ascii"
	Content-Transfer-Encoding: 7bit
	Sender: "linux-riscv" <linux-riscv-bounces@lists.infradead.org>
	Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org
Series	[v2] riscv: Make mmap() with PROT_WRITE imply PROT_READ

Andrew Bresticker Sept. 8, 2022, 6:50 p.m. UTC

Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is
invalid") made mmap() return EINVAL if PROT_WRITE was set without
PROT_READ with the justification that a write-only PTE is considered a
reserved PTE permission bit pattern in the privileged spec. This check
is unnecessary since RISC-V defines its protection_map such that PROT_WRITE
maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is
inconsistent with other architectures that don't support write-only PTEs,
creating a potential software portability issue. Just remove the check
altogether and let PROT_WRITE imply PROT_READ as is the case on other
architectures.

Note that this also allows PROT_WRITE|PROT_EXEC mappings which were
disallowed prior to the aforementioned commit; PROT_READ is implied in
such mappings as well.

Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid")
Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com>
---
v1 -> v2: Update access_error() to account for write-implies-read
---
 arch/riscv/kernel/sys_riscv.c | 3 ---
 arch/riscv/mm/fault.c         | 3 ++-
 2 files changed, 2 insertions(+), 4 deletions(-)
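
The write-implies-read argument in the commit message can be illustrated with a small user-space model. This is only a sketch: the `DEMO_*` bits and the `demo_*` functions are hypothetical names, not the kernel's protection_map, and copy-on-write handling is ignored. It adds the read bit whenever write is requested and checks that no request ends up on a PTE encoding with W set and R clear, which is the pattern the privileged spec reserves.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the three permission bits; not the kernel's macros. */
enum { DEMO_R = 1 << 0, DEMO_W = 1 << 1, DEMO_X = 1 << 2 };

/* Reserved leaf PTE encodings are those with W set and R clear. */
static bool demo_pte_is_reserved(unsigned int perms)
{
	return (perms & DEMO_W) && !(perms & DEMO_R);
}

/* Simplified model of "write implies read": add R whenever W is requested. */
unsigned int demo_prot_to_pte(unsigned int prot)
{
	unsigned int perms = prot & (DEMO_R | DEMO_W | DEMO_X);

	if (perms & DEMO_W)
		perms |= DEMO_R;
	return perms;
}

/* Walk all eight request combinations; none may map to a reserved PTE. */
void demo_check_all_requests(void)
{
	for (unsigned int prot = 0; prot < 8; prot++)
		assert(!demo_pte_is_reserved(demo_prot_to_pte(prot)));
}
```

In this model a write-only request yields read+write and a write+exec request yields read+write+exec, while an exec-only request is left unchanged.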

SS JieJi Sept. 8, 2022, 6:56 p.m. UTC | #1

The v2 patch looks great,
> -       if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
> -               return -EINVAL;
> -
This also removes the check for --x pages, which used to be present in
previous versions (before the submission of the to-be-reverted patch).
Is this intended? Thanks!

Andrew Bresticker Sept. 8, 2022, 7:18 p.m. UTC | #2

On Thu, Sep 8, 2022 at 2:57 PM SS JieJi <c141028@gmail.com> wrote:
>
> The v2 patch looks great,
> > -       if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
> > -               return -EINVAL;
> > -
> This also removes the check for --x pages, which used to be present in
> previous versions (before the submission of the to-be-reverted patch).
> Is this intended? Thanks!

There's no change in behavior for --X mappings; those have always been
allowed as it's a valid set of PTE permissions.

This patch does allow -WX mappings, which were originally disallowed
in commit e0d17c842c0f ("RISC-V: Don't allow write+exec only page
mapping request in mmap"), by implying read permissions for such
mappings as well. I have a note in the commit message about this.

-Andrew
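
To make the distinction between --X and -WX concrete, the condition removed from riscv_sys_mmap() can be evaluated in user space. This is a sketch only: `removed_check_rejects` and `demo_removed_check` are hypothetical names, and the condition is copied from the quoted hunk without the `unlikely()` annotation.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Same condition as the lines removed from riscv_sys_mmap(). */
bool removed_check_rejects(unsigned long prot)
{
	return (prot & PROT_WRITE) && !(prot & PROT_READ);
}

/* Which requests did the removed check turn into -EINVAL? */
void demo_removed_check(void)
{
	assert(removed_check_rejects(PROT_WRITE));		/* -W- */
	assert(removed_check_rejects(PROT_WRITE | PROT_EXEC));	/* -WX */
	assert(!removed_check_rejects(PROT_EXEC));		/* --X */
	assert(!removed_check_rejects(PROT_READ | PROT_WRITE));	/* RW- */
}
```

The check only ever fired when PROT_WRITE was present, so exec-only requests were not affected by it.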

Celeste Liu Sept. 9, 2022, 3:01 a.m. UTC | #3

On 2022/9/9 02:50, Andrew Bresticker wrote:
> Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is
> invalid") made mmap() return EINVAL if PROT_WRITE was set without
> PROT_READ with the justification that a write-only PTE is considered a
> reserved PTE permission bit pattern in the privileged spec. This check
> is unnecessary since RISC-V defines its protection_map such that PROT_WRITE
> maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is
> inconsistent with other architectures that don't support write-only PTEs,
> creating a potential software portability issue. Just remove the check
> altogether and let PROT_WRITE imply PROT_READ as is the case on other
> architectures.
> 
> Note that this also allows PROT_WRITE|PROT_EXEC mappings which were
> disallowed prior to the aforementioned commit; PROT_READ is implied in
> such mappings as well.
> 
> Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid")
> Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com>
> ---
> v1 -> v2: Update access_error() to account for write-implies-read
> ---
>  arch/riscv/kernel/sys_riscv.c | 3 ---
>  arch/riscv/mm/fault.c         | 3 ++-
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> index 571556bb9261..5d3f2fbeb33c 100644
> --- a/arch/riscv/kernel/sys_riscv.c
> +++ b/arch/riscv/kernel/sys_riscv.c
> @@ -18,9 +18,6 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
>  	if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
>  		return -EINVAL;
>  
> -	if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
> -		return -EINVAL;
> -
>  	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
>  			       offset >> (PAGE_SHIFT - page_shift_offset));
>  }
> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
> index f2fbd1400b7c..d86f7cebd4a7 100644
> --- a/arch/riscv/mm/fault.c
> +++ b/arch/riscv/mm/fault.c
> @@ -184,7 +184,8 @@ static inline bool access_error(unsigned long cause, struct vm_area_struct *vma)
>  		}
>  		break;
>  	case EXC_LOAD_PAGE_FAULT:
> -		if (!(vma->vm_flags & VM_READ)) {
> +		/* Write implies read */
> +		if (!(vma->vm_flags & (VM_READ | VM_WRITE))) {
>  			return true;
>  		}
>  		break;

Hi, this did solve the problem and achieved consistency between
architectures, but I have a question.

Such a change specifies behavior for a state that should not exist,
and if, in the future, RISC-V spec specifies a different behavior
for that state (I mean, RVI itself has a history of not caring about
downstream, like Zicsr and Zifencei), it will create inconsistencies,
which is bad.

If we reject the "write but not read" state, the user gets the most direct
response: the state is not allowed, so they do not and cannot rely
on its behavior. That keeps applications consistent over time
if the spec defines the behavior in the future,
but it sacrifices consistency across architectures.

How do you think this situation should be handled?

Yours,
Celeste Liu

Celeste Liu Sept. 9, 2022, 11:42 a.m. UTC | #4

Oops!

I found a mistake in my previous understanding: PTE permissions != VMA permissions.
So your modification makes sense. No matter how we map the requested
permissions to PTEs, as long as we don't use the reserved permission combinations,
the behavior is reasonable and also independent of the architecture's definition
of PTEs.

But I think this mapping should be well documented. If all architectures
behave this way, then we should change this line in
the mmap documentation
    On some hardware architectures (e.g., i386), PROT_WRITE implies PROT_READ.
to apply to all architectures. From my reading of the code, every vm_get_page_prot
goes through the protection_map lookup, which gives this behavior.

Yours,
Celeste Liu
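
The gap between VMA permissions and PTE permissions is what the fault.c hunk addresses: the load-fault path checks VMA flags, so a VMA created with PROT_WRITE alone has to be treated as readable there too. The sketch below models the old and new conditions from the quoted diff. The `DEMO_VM_*` constants are local stand-ins for the kernel's VM_READ, VM_WRITE and VM_EXEC, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the kernel's VM_READ / VM_WRITE / VM_EXEC flag bits. */
#define DEMO_VM_READ	0x1UL
#define DEMO_VM_WRITE	0x2UL
#define DEMO_VM_EXEC	0x4UL

/* Load-fault check before the patch: the VMA must have VM_READ. */
bool load_fault_is_error_old(unsigned long vm_flags)
{
	return !(vm_flags & DEMO_VM_READ);
}

/* Load-fault check after the patch: VM_WRITE also permits loads. */
bool load_fault_is_error_new(unsigned long vm_flags)
{
	return !(vm_flags & (DEMO_VM_READ | DEMO_VM_WRITE));
}

/* A write-only VMA is the only case where the two checks disagree. */
void demo_load_fault_checks(void)
{
	assert(load_fault_is_error_old(DEMO_VM_WRITE));
	assert(!load_fault_is_error_new(DEMO_VM_WRITE));
	assert(load_fault_is_error_old(DEMO_VM_EXEC));
	assert(load_fault_is_error_new(DEMO_VM_EXEC));
}
```

Under this model, dropping the mmap() check without the fault.c change would leave a write-only VMA whose loads are still reported as access errors.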

Andrew Bresticker Sept. 9, 2022, 3:16 p.m. UTC | #5

On Fri, Sep 9, 2022 at 7:42 AM Coelacanthus <coelacanthushex@gmail.com> wrote:
>
> Oops!
>
> I found a mistake in my previous understanding: PTE permissions != VMA permissions.
> So your modification makes sense. No matter how we map the requested
> permissions to PTEs, as long as we don't use the reserved permission combinations,
> the behavior is reasonable and also independent of the architecture's definition
> of PTEs.
>
> But I think this mapping should be well documented. If all architectures
> behave this way, then we should change this line in
> the mmap documentation
>     On some hardware architectures (e.g., i386), PROT_WRITE implies PROT_READ.
> to apply to all architectures. From my reading of the code, every vm_get_page_prot
> goes through the protection_map lookup, which gives this behavior.

I think leaving PROT_WRITE-implies-PROT_READ documented as
architecture-dependent is reasonable, but of course portable programs
shouldn't rely on this behavior. There are CPUs out there that support
write-only mappings -- MIPS with RI/XI comes to mind and indeed
mmap(PROT_WRITE) on such CPUs results in write-only mappings.

-Andrew

>
> Yours,
> Celeste Liu
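
For a program that wants to stay portable across architectures with and without write-only mappings, the straightforward approach is to request PROT_READ explicitly whenever the mapping will be read. A minimal sketch, assuming a Linux-like system with MAP_ANONYMOUS; `map_rw_anon` and `demo_map_rw` are hypothetical helpers, not part of any existing API:

```c
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: anonymous private mapping with explicit read+write. */
unsigned char *map_rw_anon(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Write a byte and read it back; the read does not depend on W implying R. */
void demo_map_rw(void)
{
	unsigned char *buf = map_rw_anon(4096);
	int rc;

	assert(buf != NULL);
	buf[0] = 42;
	assert(buf[0] == 42);
	rc = munmap(buf, 4096);
	assert(rc == 0);
	(void)rc;
}
```

Requesting both bits costs nothing on architectures where write already implies read, and it avoids a fault on hardware that honors write-only mappings.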

Celeste Liu Sept. 9, 2022, 3:45 p.m. UTC | #6

On 2022/9/9 23:16, Andrew Bresticker wrote:
> I think leaving PROT_WRITE-implies-PROT_READ documented as
> architecture-dependent is reasonable, but of course portable programs
> shouldn't rely on this behavior. There are CPUs out there that support
> write-only mappings -- MIPS with RI/XI comes to mind and indeed
> mmap(PROT_WRITE) on such CPUs results in write-only mappings.
> 
> -Andrew
> 

OK, I have no more questions. This patch looks good to me.

Indeed, this behavior shouldn't be relied upon, as it depends on the specific
hardware implementation.

Thanks for your explanation!

Yours,
Celeste Liu

Atish Patra Sept. 9, 2022, 6:52 p.m. UTC | #7

On Thu, Sep 8, 2022 at 11:50 AM Andrew Bresticker <abrestic@rivosinc.com> wrote:
>
> Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is
> invalid") made mmap() return EINVAL if PROT_WRITE was set without
> PROT_READ with the justification that a write-only PTE is considered a
> reserved PTE permission bit pattern in the privileged spec. This check
> is unnecessary since RISC-V defines its protection_map such that PROT_WRITE
> maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is
> inconsistent with other architectures that don't support write-only PTEs,
> creating a potential software portability issue. Just remove the check
> altogether and let PROT_WRITE imply PROT_READ as is the case on other
> architectures.
>
> Note that this also allows PROT_WRITE|PROT_EXEC mappings which were
> disallowed prior to the aforementioned commit; PROT_READ is implied in
> such mappings as well.
>
> Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid")
> Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com>
> ---
> v1 -> v2: Update access_error() to account for write-implies-read
> ---
>  arch/riscv/kernel/sys_riscv.c | 3 ---
>  arch/riscv/mm/fault.c         | 3 ++-
>  2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> index 571556bb9261..5d3f2fbeb33c 100644
> --- a/arch/riscv/kernel/sys_riscv.c
> +++ b/arch/riscv/kernel/sys_riscv.c
> @@ -18,9 +18,6 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
>         if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
>                 return -EINVAL;
>
> -       if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
> -               return -EINVAL;
> -
>         return ksys_mmap_pgoff(addr, len, prot, flags, fd,
>                                offset >> (PAGE_SHIFT - page_shift_offset));
>  }
> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
> index f2fbd1400b7c..d86f7cebd4a7 100644
> --- a/arch/riscv/mm/fault.c
> +++ b/arch/riscv/mm/fault.c
> @@ -184,7 +184,8 @@ static inline bool access_error(unsigned long cause, struct vm_area_struct *vma)
>                 }
>                 break;
>         case EXC_LOAD_PAGE_FAULT:
> -               if (!(vma->vm_flags & VM_READ)) {
> +               /* Write implies read */
> +               if (!(vma->vm_flags & (VM_READ | VM_WRITE))) {
>                         return true;
>                 }
>                 break;

This should be a separate patch with commit text about VMA permissions.

> --
> 2.25.1
>

Otherwise, lgtm.

Reviewed-by: Atish Patra <atishp@rivosinc.com>
