diff mbox

[v6,12/20] arm64:ilp32: add sys_ilp32.c and a separate table (in entry.S) to use it

Message ID 1450215766-14765-13-git-send-email-ynorov@caviumnetworks.com (mailing list archive)
State New, archived
Headers show

Commit Message

Yury Norov Dec. 15, 2015, 9:42 p.m. UTC
From: Andrew Pinski <apinski@cavium.com>

Add a separate syscall-table for ILP32, which dispatches either to native
LP64 system call implementation or to compat-syscalls, as appropriate.

Reviewed-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Andrew Pinski <Andrew.Pinski@caviumnetworks.com>
---
 arch/arm64/include/asm/unistd.h |  6 ++++
 arch/arm64/kernel/Makefile      |  1 +
 arch/arm64/kernel/entry.S       | 12 +++++++-
 arch/arm64/kernel/sys_ilp32.c   | 65 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kernel/sys_ilp32.c

Comments

Arnd Bergmann Dec. 16, 2015, 4:07 p.m. UTC | #1
On Wednesday 16 December 2015 00:42:38 Yury Norov wrote:
> +/* Using non-compat syscalls where necessary */
> +#define compat_sys_fadvise64_64        sys_fadvise64_64
> +#define compat_sys_fallocate           sys_fallocate
> +#define compat_sys_ftruncate64         sys_ftruncate
> +#define compat_sys_lookup_dcookie      sys_lookup_dcookie
> +#define compat_sys_pread64             sys_pread64
> +#define compat_sys_pwrite64            sys_pwrite64
> +#define compat_sys_readahead           sys_readahead
> +#define compat_sys_shmat               sys_shmat
> +#define compat_sys_sigaltstack         sys_sigaltstack
> +#define compat_sys_sync_file_range     sys_sync_file_range
> +#define compat_sys_truncate64          sys_truncate
> +#define sys_llseek                     sys_lseek
> +
> +#define compat_sys_open_by_handle_at   sys_open_by_handle_at
> +#define compat_sys_openat              sys_openat

Very nice and short list now!

I've double-checked all calls and the only one I'm not sure about is
sys_sigaltstack. Did we discuss that one earlier?
My first guess would be that it's easier to use the compat
version of that.

I would probably also group sys_shmat with sys_openat, because it's
different from the other ones in the first block as this is not
about passing 64-bit arguments as registers, but instead it's about
the behavior of the system call.

	Arnd
Catalin Marinas Dec. 17, 2015, 6:27 p.m. UTC | #2
On Wed, Dec 16, 2015 at 12:42:38AM +0300, Yury Norov wrote:
> +/* Using non-compat syscalls where necessary */
> +#define compat_sys_fadvise64_64        sys_fadvise64_64
> +#define compat_sys_fallocate           sys_fallocate
> +#define compat_sys_ftruncate64         sys_ftruncate

I initially thought this should be sys_ftruncate64 (or a wrapper to pass
small == 0) but we rely on sys_openat to set O_LARGEFILE.

arch/arm has ftruncate and ftruncate64, but it looks like we route both
via sys_ftruncate(). The difference is the "small" argument which
imposes a limit on the length without O_LARGEFILE, so we may have a bug
here.

> +#define compat_sys_lookup_dcookie      sys_lookup_dcookie
> +#define compat_sys_pread64             sys_pread64
> +#define compat_sys_pwrite64            sys_pwrite64
> +#define compat_sys_readahead           sys_readahead
> +#define compat_sys_shmat               sys_shmat

I wonder whether we need wrappers (actually, not only for these but
sys_read etc.). These functions take either a pointer or a size_t
argument which are 32-bit with ILP32 but treated as 64-bit by an LP64
kernel. Can we guarantee that user space zeros the top 32-bit of the
arguments passed here?

With compat/AArch32, this is guaranteed by the kernel since EL0 won't be
able to touch the top part but here I'm not entirely sure. As long as
user space used Wn registers for 32-bit types, we are probably fine (the
architecture guarantees the top 32-bit zeroing following a MOV, LDR etc.
instruction into a Wn register). We just need to mention this in the ABI
document (ilp32.txt).

> +#define compat_sys_sigaltstack         sys_sigaltstack

I think Arnd is right here in using the compat function. The stack_t
would differ between LP64 and ILP32. compat_sys_sigaltstack() uses
compat_user_stack_pointer() but this should work correctly as it checks
pt_regs for the right mode.

> +#define compat_sys_sync_file_range     sys_sync_file_range
> +#define compat_sys_truncate64          sys_truncate
> +#define sys_llseek                     sys_lseek

I think this makes sense since we have 64-bit registers.

> +
> +#define compat_sys_open_by_handle_at   sys_open_by_handle_at
> +#define compat_sys_openat              sys_openat

So using sys_openat() forces O_LARGEFILE and we don't have a problem
with (f)truncate. We may have an issue with AArch32 compat though.
Arnd Bergmann Dec. 17, 2015, 8:10 p.m. UTC | #3
On Thursday 17 December 2015 18:27:53 Catalin Marinas wrote:
> On Wed, Dec 16, 2015 at 12:42:38AM +0300, Yury Norov wrote:

> > +#define compat_sys_lookup_dcookie      sys_lookup_dcookie
> > +#define compat_sys_pread64             sys_pread64
> > +#define compat_sys_pwrite64            sys_pwrite64
> > +#define compat_sys_readahead           sys_readahead
> > +#define compat_sys_shmat               sys_shmat
> 
> I wonder whether we need wrappers (actually, not only for these but
> sys_read etc.). These functions take either a pointer or a size_t
> argument which are 32-bit with ILP32 but treated as 64-bit by an LP64
> kernel. Can we guarantee that user space zeros the top 32-bit of the
> arguments passed here?

I'm pretty sure that is safe. I haven't read the calling conventions
specification for arm64 ilp32, but usually all function arguments are
passed as 64-bit registers with proper sign-extend or zero-extend.

Most other syscalls rely on this behavior too, not just the ones that
are being modified here.

> With compat/AArch32, this is guaranteed by the kernel since EL0 won't be
> able to touch the top part but here I'm not entirely sure. As long as
> user space used Wn registers for 32-bit types, we are probably fine (the
> architecture guarantees the top 32-bit zeroing following a MOV, LDR etc.
> instruction into a Wn register). We just need to mention this in the ABI
> document (ilp32.txt).

I think the aarch32 case is actually the hard one, because it has to
worry about explicitly sign-extending 32-bit arguments (signed int or
signed long) that might be negative, e.g. user space passes -1 
as 0xffffffff, which the kernel entry turns into 0x00000000ffffffff
when it should use 0xffffffffffffffff. The COMPAT_SYSCALL_DEFINEx
macros take care of this.

> > +
> > +#define compat_sys_open_by_handle_at   sys_open_by_handle_at
> > +#define compat_sys_openat              sys_openat
> 
> So using sys_openat() forces O_LARGEFILE and we don't have a problem
> with (f)truncate. We may have an issue with AArch32 compat though.

aarch32 uses the correct compat functions in asm/unistd32.h

	Arnd
Andrew Pinski Dec. 17, 2015, 8:14 p.m. UTC | #4
On Thu, Dec 17, 2015 at 12:10 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 17 December 2015 18:27:53 Catalin Marinas wrote:
>> On Wed, Dec 16, 2015 at 12:42:38AM +0300, Yury Norov wrote:
>
>> > +#define compat_sys_lookup_dcookie      sys_lookup_dcookie
>> > +#define compat_sys_pread64             sys_pread64
>> > +#define compat_sys_pwrite64            sys_pwrite64
>> > +#define compat_sys_readahead           sys_readahead
>> > +#define compat_sys_shmat               sys_shmat
>>
>> I wonder whether we need wrappers (actually, not only for these but
>> sys_read etc.). These functions take either a pointer or a size_t
>> argument which are 32-bit with ILP32 but treated as 64-bit by an LP64
>> kernel. Can we guarantee that user space zeros the top 32-bit of the
>> arguments passed here?
>
> I'm pretty sure that is safe. I haven't read the calling conventions
> specification for arm64 ilp32, but usually all function arguments are
> passed as 64-bit registers with proper sign-extend or zero-extend.

Well (just like LP64 on AARCH64), when passing a 32bit value to a
function, the upper 32bits are undefined.  I ran into this when I was
debugging the GCC go library on ILP32 (though reproduced with pure C
code) and the assembly functions inside glibc where pointers are
passed with the upper 32bits as undefined.
So we have an issue if called with syscall function or using pure
assembly to create the syscall functions (which glibc does).

Thanks,
Andrew

>
> Most other syscalls rely on this behavior too, not just the ones that
> are being modified here.
>
>> With compat/AArch32, this is guaranteed by the kernel since EL0 won't be
>> able to touch the top part but here I'm not entirely sure. As long as
>> user space used Wn registers for 32-bit types, we are probably fine (the
>> architecture guarantees the top 32-bit zeroing following a MOV, LDR etc.
>> instruction into a Wn register). We just need to mention this in the ABI
>> document (ilp32.txt).
>
> I think the aarch32 case is actually the hard one, because it has to
> worry about explicitly sign-extending 32-bit arguments (signed int or
> signed long) that might be negative, e.g. user space passes -1
> as 0xffffffff, which the kernel entry turns into 0x00000000ffffffff
> when it should use 0xffffffffffffffff. The COMPAT_SYSCALL_DEFINEx
> macros take care of this.
>
>> > +
>> > +#define compat_sys_open_by_handle_at   sys_open_by_handle_at
>> > +#define compat_sys_openat              sys_openat
>>
>> So using sys_openat() forces O_LARGEFILE and we don't have a problem
>> with (f)truncate. We may have an issue with AArch32 compat though.
>
> aarch32 uses the correct compat functions in asm/unistd32.h
>
>         Arnd
Arnd Bergmann Dec. 17, 2015, 8:50 p.m. UTC | #5
On Thursday 17 December 2015 12:14:20 Andrew Pinski wrote:
> On Thu, Dec 17, 2015 at 12:10 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 17 December 2015 18:27:53 Catalin Marinas wrote:
> >> On Wed, Dec 16, 2015 at 12:42:38AM +0300, Yury Norov wrote:
> >
> >> > +#define compat_sys_lookup_dcookie      sys_lookup_dcookie
> >> > +#define compat_sys_pread64             sys_pread64
> >> > +#define compat_sys_pwrite64            sys_pwrite64
> >> > +#define compat_sys_readahead           sys_readahead
> >> > +#define compat_sys_shmat               sys_shmat
> >>
> >> I wonder whether we need wrappers (actually, not only for these but
> >> sys_read etc.). These functions take either a pointer or a size_t
> >> argument which are 32-bit with ILP32 but treated as 64-bit by an LP64
> >> kernel. Can we guarantee that user space zeros the top 32-bit of the
> >> arguments passed here?
> >
> > I'm pretty sure that is safe. I haven't read the calling conventions
> > specification for arm64 ilp32, but usually all function arguments are
> > passed as 64-bit registers with proper sign-extend or zero-extend.
> 
> Well (just like LP64 on AARCH64), when passing a 32bit value to a
> function, the upper 32bits are undefined.  I ran into this when I was
> debugging the GCC go library on ILP32 (though reproduced with pure C
> code) and the assembly functions inside glibc where pointers are
> passed with the upper 32bits as undefined.
> So we have an issue if called with syscall function or using pure
> assembly to create the syscall functions (which glibc does).

Ok, I see :-(

So the calling conventions avoid the problem of being able to set
the upper bits from malicious user space when the kernel assumes they
are zeroed out (we had security bugs in this area, before we introduced
SYSCALL_DEFINEx()), but it means that we need wrappers around each
syscall that takes an argument that is different length between user
and kernel space (as Catalin guessed). arch/s390 has the same problem and
works around it with code in arch/s390/kernel/compat_wrapper.c, while
other architectures (at least powerpc, x86 and tile IIRC, don't know much
about mips, parisc and sparc) don't have the problem because of their
calling conventions.

This also means that we cannot work around it in glibc at all, because
we have to be able to handle malicious user space, so it has to be
done in the kernel using something similar to what s390 does.

	Arnd
Catalin Marinas Dec. 18, 2015, 11:42 a.m. UTC | #6
(cc'ing Marcus for more insight on the tools side)

On Thu, Dec 17, 2015 at 12:14:20PM -0800, Andrew Pinski wrote:
> On Thu, Dec 17, 2015 at 12:10 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 17 December 2015 18:27:53 Catalin Marinas wrote:
> >> On Wed, Dec 16, 2015 at 12:42:38AM +0300, Yury Norov wrote:
> >
> >> > +#define compat_sys_lookup_dcookie      sys_lookup_dcookie
> >> > +#define compat_sys_pread64             sys_pread64
> >> > +#define compat_sys_pwrite64            sys_pwrite64
> >> > +#define compat_sys_readahead           sys_readahead
> >> > +#define compat_sys_shmat               sys_shmat
> >>
> >> I wonder whether we need wrappers (actually, not only for these but
> >> sys_read etc.). These functions take either a pointer or a size_t
> >> argument which are 32-bit with ILP32 but treated as 64-bit by an LP64
> >> kernel. Can we guarantee that user space zeros the top 32-bit of the
> >> arguments passed here?
> >
> > I'm pretty sure that is safe. I haven't read the calling conventions
> > specification for arm64 ilp32, but usually all function arguments are
> > passed as 64-bit registers with proper sign-extend or zero-extend.
> 
> Well (just like LP64 on AARCH64), when passing a 32bit value to a
> function, the upper 32bits are undefined.  I ran into this when I was
> debugging the GCC go library on ILP32 (though reproduced with pure C
> code) and the assembly functions inside glibc where pointers are
> passed with the upper 32bits as undefined.
> So we have an issue if called with syscall function or using pure
> assembly to create the syscall functions (which glibc does).

I think the ILP32 syscall ABI should follow the PCS convention where the
top 32-bit of a register is not guaranteed 0 when the size of the
argument is 32-bit. So take the read(2) syscall:

	ssize_t read(int fd, void *buf, size_t count);

From the ILP32 code perspective, void * and size_t are both 32-bit. It
would call into the kernel leaving the top 32-bit as undefined (if we
follow the PCS). Normally, calling a function with the same size
arguments is not a problem since the compiler generates the callee code
accordingly. However, we route the syscall directly into the native
sys_read() where void * and size_t are 64-bit with the top 32-bit left
undefined.

We have three options here:

1. Always follow PCS convention across user/kernel call and add wrappers
   in the kernel (preferred)

2. Follow the PCS up to glibc and get glibc to zero the top part (not
   always safe with hand-written assembly, though we already do this for
   AArch32 where the PCS only specifies 4 arguments in registers, the
   rest go on the stack)

3. Follow the PCS up to glibc but always pass syscall arguments in W
   registers, like AArch32 compat support (the least preferred option,
   the only advantage is a single wrapper for all syscalls but it would
   be doing unnecessary zeroing even for syscalls where it isn't needed)

My preference, as stated above, is (1). You can write the wrappers in C
directly and let the compiler upgrade the types when calling the native
syscall. But any other option would be fine (take some inspiration from
other architectures). Unfortunately we don't have COMPAT_SYSCALL_DEFINE
for all functions that we need to wrap, it would have been easier (so we
need to add them but probably in the arch/arm64 code).
Arnd Bergmann Dec. 18, 2015, 12:47 p.m. UTC | #7
On Friday 18 December 2015 11:42:19 Catalin Marinas wrote:
> On Thu, Dec 17, 2015 at 12:14:20PM -0800, Andrew Pinski wrote:
> > Well (just like LP64 on AARCH64), when passing a 32bit value to a
> > function, the upper 32bits are undefined.  I ran into this when I was
> > debugging the GCC go library on ILP32 (though reproduced with pure C
> > code) and the assembly functions inside glibc where pointers are
> > passed with the upper 32bits as undefined.
> > So we have an issue if called with syscall function or using pure
> > assembly to create the syscall functions (which glibc does).
> 
> I think the ILP32 syscall ABI should follow the PCS convention where the
> top 32-bit of a register is not guaranteed 0 when the size of the
> argument is 32-bit. So take the read(2) syscall:
> 
>         ssize_t read(int fd, void *buf, size_t count);
> 
> From the ILP32 code perspective, void * and size_t are both 32-bit. It
> would call into the kernel leaving the top 32-bit as undefined (if we
> follow the PCS). Normally, calling a function with the same size
> arguments is not a problem since the compiler generates the callee code
> accordingly. However, we route the syscall directly into the native
> sys_read() where void * and size_t are 64-bit with the top 32-bit left
> undefined.
> 
> We have three options here:
> 
> 1. Always follow PCS convention across user/kernel call and add wrappers
>    in the kernel (preferred)

Yes, I also think this is best.

> 2. Follow the PCS up to glibc and get glibc to zero the top part (not
>    always safe with hand-written assembly, though we already do this for
>    AArch32 where the PCS only specifies 4 arguments in registers, the
>    rest go on the stack)

I assume this needs special handling for syscalls with 64-bit arguments
in both glibc and kernel.

> 3. Follow the PCS up to glibc but always pass syscall arguments in W
>    registers, like AArch32 compat support (the least preferred option,
>    the only advantage is a single wrapper for all syscalls but it would
>    be doing unnecessary zeroing even for syscalls where it isn't needed)

This would mean we cannot pass 64-bit arguments in registers, right?
 
> My preference, as stated above, is (1). You can write the wrappers in C
> directly and let the compiler upgrade the types when calling the native
> syscall. But any other option would be fine (take some inspiration from
> other architectures). Unfortunately we don't have COMPAT_SYSCALL_DEFINE
> for all functions that we need to wrap, it would have been easier (so we
> need to add them but probably in the arch/arm64 code).

It would be nice to have that code architecture-independent, so we can
share it with s390 and only need to update one place when new syscalls
get added.

	Arnd
Catalin Marinas Dec. 21, 2015, 6:31 p.m. UTC | #8
On Fri, Dec 18, 2015 at 01:47:55PM +0100, Arnd Bergmann wrote:
> On Friday 18 December 2015 11:42:19 Catalin Marinas wrote:
> > On Thu, Dec 17, 2015 at 12:14:20PM -0800, Andrew Pinski wrote:
> > > Well (just like LP64 on AARCH64), when passing a 32bit value to a
> > > function, the upper 32bits are undefined.  I ran into this when I was
> > > debugging the GCC go library on ILP32 (though reproduced with pure C
> > > code) and the assembly functions inside glibc where pointers are
> > > passed with the upper 32bits as undefined.
> > > So we have an issue if called with syscall function or using pure
> > > assembly to create the syscall functions (which glibc does).
> > 
> > I think the ILP32 syscall ABI should follow the PCS convention where the
> > top 32-bit of a register is not guaranteed 0 when the size of the
> > argument is 32-bit. So take the read(2) syscall:
> > 
> >         ssize_t read(int fd, void *buf, size_t count);
> > 
> > From the ILP32 code perspective, void * and size_t are both 32-bit. It
> > would call into the kernel leaving the top 32-bit as undefined (if we
> > follow the PCS). Normally, calling a function with the same size
> > arguments is not a problem since the compiler generates the callee code
> > accordingly. However, we route the syscall directly into the native
> > sys_read() where void * and size_t are 64-bit with the top 32-bit left
> > undefined.
> > 
> > We have three options here:
> > 
> > 1. Always follow PCS convention across user/kernel call and add wrappers
> >    in the kernel (preferred)
> 
> Yes, I also think this is best.
> 
> > 2. Follow the PCS up to glibc and get glibc to zero the top part (not
> >    always safe with hand-written assembly, though we already do this for
> >    AArch32 where the PCS only specifies 4 arguments in registers, the
> >    rest go on the stack)
> 
> I assume this needs special handling for syscalls with 64-bit arguments
> in both glibc and kernel.

I think glibc only should suffice, if it is its responsibility to zero
the top 32-bit part.

> > 3. Follow the PCS up to glibc but always pass syscall arguments in W
> >    registers, like AArch32 compat support (the least preferred option,
> >    the only advantage is a single wrapper for all syscalls but it would
> >    be doing unnecessary zeroing even for syscalls where it isn't needed)
> 
> This would mean we cannot pass 64-bit arguments in registers, right?

Not in a single register but two (like we do on AArch32).

> > My preference, as stated above, is (1). You can write the wrappers in C
> > directly and let the compiler upgrade the types when calling the native
> > syscall. But any other option would be fine (take some inspiration from
> > other architectures). Unfortunately we don't have COMPAT_SYSCALL_DEFINE
> > for all functions that we need to wrap, it would have been easier (so we
> > need to add them but probably in the arch/arm64 code).
> 
> It would be nice to have that code architecture-independent, so we can
> share it with s390 and only need to update one place when new syscalls
> get added.

We could indeed move things like:

COMPAT_SYSCALL_DEFINE3(s390_read, unsigned int, fd, char __user *, buf, compat_size_t, count)

to the core code and share them between s390 and arm64/ILP32. So let's
stick to option 1.
Philipp Tomsich Dec. 21, 2015, 6:39 p.m. UTC | #9
> On 18 Dec 2015, at 13:47, Arnd Bergmann <arnd@arndb.de> wrote:
> 
>> 3. Follow the PCS up to glibc but always pass syscall arguments in W
>>   registers, like AArch32 compat support (the least preferred option,
>>   the only advantage is a single wrapper for all syscalls but it would
>>   be doing unnecessary zeroing even for syscalls where it isn't needed)
> 
> This would mean we cannot pass 64-bit arguments in registers, right?

Note that there’s no 32bit registers (the ‘w’-form always refers to the lower
32bits of a 64bit register, with implicit zero-extension)… and load/store
instructions always use the full base-register (‘x’-form) for address calculation.
I.e. a load/store would inadvertently pickup “random garbage” in the upper 
32bits, if no explicit zero-extension is applied.

In other words: all zero-extensions for 32bit arguments should be explicit
on the kernel side.

Regards,
Philipp.
Arnd Bergmann Dec. 21, 2015, 10:13 p.m. UTC | #10
On Monday 21 December 2015, Dr. Philipp Tomsich wrote:
> 
> > On 18 Dec 2015, at 13:47, Arnd Bergmann <arnd@arndb.de> wrote:
> > 
> >> 3. Follow the PCS up to glibc but always pass syscall arguments in W
> >>   registers, like AArch32 compat support (the least preferred option,
> >>   the only advantage is a single wrapper for all syscalls but it would
> >>   be doing unnecessary zeroing even for syscalls where it isn't needed)
> > 
> > This would mean we cannot pass 64-bit arguments in registers, right?
> 
> Note that there’s no 32bit registers (the ‘w’-form always refers to the lower
> 32bits of a 64bit register, with implicit zero-extension)… and load/store
> instructions always use the full base-register (‘x’-form) for address calculation.
> I.e. a load/store would inadvertently pickup “random garbage” in the upper 
> 32bits, if no explicit zero-extension is applied.
> 
> In other words: all zero-extensions for 32bit arguments should be explicit
> on the kernel side.

I think that is what Catalin meant with the single wrapper in the description:
the kernel always zeroes out the upper 32 bits in the initial trap, and then
we only need to do sign-extensions for the few cases that need it, and pass 64-bit
arguments in two lower halves. In contrast, approach 1 adds a separate wrapper
for each syscall that takes care of both sign-extending where necessary and
zero-extending all othe 32-bit arguments but not the 64-bit arguments.

	Arnd
Arnd Bergmann Dec. 21, 2015, 10:19 p.m. UTC | #11
On Monday 21 December 2015, Catalin Marinas wrote:
> On Fri, Dec 18, 2015 at 01:47:55PM +0100, Arnd Bergmann wrote:
> > On Friday 18 December 2015 11:42:19 Catalin Marinas wrote:
> > > 2. Follow the PCS up to glibc and get glibc to zero the top part (not
> > >    always safe with hand-written assembly, though we already do this for
> > >    AArch32 where the PCS only specifies 4 arguments in registers, the
> > >    rest go on the stack)
> > 
> > I assume this needs special handling for syscalls with 64-bit arguments
> > in both glibc and kernel.
> 
> I think glibc only should suffice, if it is its responsibility to zero
> the top 32-bit part.

The kernel still needs to know about whether to call e.g. sys_llseek or
sys_lseek. The default syscall table contains llseek for 32-bit architectures,
but the current patch set uses lseek because that makes more sense when
you have 64-bit registers.

> > > 3. Follow the PCS up to glibc but always pass syscall arguments in W
> > >    registers, like AArch32 compat support (the least preferred option,
> > >    the only advantage is a single wrapper for all syscalls but it would
> > >    be doing unnecessary zeroing even for syscalls where it isn't needed)
> > 
> > This would mean we cannot pass 64-bit arguments in registers, right?
> 
> Not in a single register but two (like we do on AArch32).

Yes, that's what I mean. Essentially we'd use the unmodified 32-bit API
here.

> > > My preference, as stated above, is (1). You can write the wrappers in C
> > > directly and let the compiler upgrade the types when calling the native
> > > syscall. But any other option would be fine (take some inspiration from
> > > other architectures). Unfortunately we don't have COMPAT_SYSCALL_DEFINE
> > > for all functions that we need to wrap, it would have been easier (so we
> > > need to add them but probably in the arch/arm64 code).
> > 
> > It would be nice to have that code architecture-independent, so we can
> > share it with s390 and only need to update one place when new syscalls
> > get added.
> 
> We could indeed move things like:
> 
> COMPAT_SYSCALL_DEFINE3(s390_read, unsigned int, fd, char __user *, buf, compat_size_t, count)
> 
> to the core code and share them between s390 and arm64/ILP32. So let's
> stick to option 1.

Ok.

	Arnd
Arnd Bergmann Dec. 21, 2015, 10:31 p.m. UTC | #12
On Tuesday 15 December 2015, Yury Norov wrote:
> +
> +#define compat_sys_open_by_handle_at   sys_open_by_handle_at
> +#define compat_sys_openat              sys_openat
> +

One more thing I just remembered: I think we want this behavior for all new
32-bit architectures, it was a bug to call compat_sys_openat for the generic
syscall table, as we don't support 32-bit off_t.

Could you split this out into a separate patch that does these changes:

- change the default asm-generic/unistd.h to use sys_openat/sys_open_by_handle_at
- change tile to override those two to keep the current (suboptimal) behavior
- change the force_o_largefile() definition so it defaults to true for all future
  architectures. The easiest way is probably to add a Kconfig symbol for this
  that gets selected by all 32-bit architectures, so we can use

#define force_o_largefile() ((BITS_PER_LONG != 32) || !IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T))

	Arnd
Yury Norov Dec. 23, 2015, 6:31 p.m. UTC | #13
On Mon, Dec 21, 2015 at 11:31:57PM +0100, Arnd Bergmann wrote:
> On Tuesday 15 December 2015, Yury Norov wrote:
> > +
> > +#define compat_sys_open_by_handle_at   sys_open_by_handle_at
> > +#define compat_sys_openat              sys_openat
> > +
> 
> One more thing I just remembered: I think we want this behavior for all new
> 32-bit architectures, it was a bug to call compat_sys_openat for the generic
> syscall table, as we don't support 32-bit off_t.
> 
> Could you split this out into a separate patch that does these changes:
> 
> - change the default asm-generic/unistd.h to use sys_openat/sys_open_by_handle_at
> - change tile to override those two to keep the current (suboptimal) behavior
> - change the force_o_largefile() definition so it defaults to true for all future
>   architectures. The easiest way is probably to add a Kconfig symbol for this
>   that gets selected by all 32-bit architectures, so we can use
> 
> #define force_o_largefile() ((BITS_PER_LONG != 32) || !IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T))
> 

Hi Arnd,

First two items are OK. The last one... Do you need it to remove
compat_sys_openat and compat_sys_open_by_handle_at?

The patch that introduces CONFIG_ARCH_32BIT_OFF_T will affect all
architectures, and so I need to get ack from each maintainer. 

I will also have to ask them to test that change, because I have no
access to all the hardware. (Even with QEMU, I cannot test them all.)
I think the only man who is able to success with it is Linus :)...

Some arches has more than one compat mode. For example ARM64: aarch32
requires this config enabled, but ilp32 needs it disabled. Now we can
enable both features, but this will make them mutual exclusive. We can
instead (un)define __ARCH_WANT_32BIT_OFF_T for each ABI of each platform,
but it's even more work. And we'd think twice how to do it because
it's not mechanical work. For example, on aarch64 it will look like:

        #define __ARCH_WANT_32BIT_OFF_T is_a32_compat_task()

I have no idea how it will look on x86 or ppc.

In your previous email (Nov 18) you write that tile is the only user
of asm-generic/unistd.h that needs compat behaviour. If so, why not
just to turn it around, as you initially suggested, and fix tile. And
do nothing with force_o_largefile()?

By the way, is there a comprehensive list of linux platforms/abis, or
at least ones that use asm-generic/unistd.h?

BR,
Yury.

> 	Arnd
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Arnd Bergmann Dec. 23, 2015, 9:48 p.m. UTC | #14
On Wednesday 23 December 2015, Yury Norov wrote:
> On Mon, Dec 21, 2015 at 11:31:57PM +0100, Arnd Bergmann wrote:
> > On Tuesday 15 December 2015, Yury Norov wrote:
> > > +
> > > +#define compat_sys_open_by_handle_at   sys_open_by_handle_at
> > > +#define compat_sys_openat              sys_openat
> > > +
> > 
> > One more thing I just remembered: I think we want this behavior for all new
> > 32-bit architectures, it was a bug to call compat_sys_openat for the generic
> > syscall table, as we don't support 32-bit off_t.
> > 
> > Could you split this out into a separate patch that does these changes:
> > 
> > - change the default asm-generic/unistd.h to use sys_openat/sys_open_by_handle_at
> > - change tile to override those two to keep the current (suboptimal) behavior
> > - change the force_o_largefile() definition so it defaults to true for all future
> >   architectures. The easiest way is probably to add a Kconfig symbol for this
> >   that gets selected by all 32-bit architectures, so we can use
> > 
> > #define force_o_largefile() ((BITS_PER_LONG != 32) || !IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T))
> > 
> 
> Hi Arnd,
> 
> First two items are OK. The last one... Do you need it to remove
> compat_sys_openat and compat_sys_open_by_handle_at?

It's needed so all future architectures get it right.

> The patch that introduces CONFIG_ARCH_32BIT_OFF_T will affect all
> architectures, and so I need to get ack from each maintainer. 

Just post it to the linux-arch and linux-kernel mailing lists.
You are not actually changing behavior, so my Ack should be sufficient
here, as long as we make sure we handle all architectures the same
way.

> I will also have to ask them to test that change, because I have no
> access to all the hardware. (Even with QEMU, I cannot test them all.)
> I think the only man who is able to success with it is Linus :)...
> 
> Some arches has more than one compat mode. For example ARM64: aarch32
> requires this config enabled, but ilp32 needs it disabled. Now we can
> enable both features, but this will make them mutual exclusive. We can
> instead (un)define __ARCH_WANT_32BIT_OFF_T for each ABI of each platform,
> but it's even more work. And we'd think twice how to do it because
> it's not mechanical work. For example, on aarch64 it will look like:
> 
>         #define __ARCH_WANT_32BIT_OFF_T is_a32_compat_task()
> 
> I have no idea how it will look on x86 or ppc.

arm64 is not affected at all, because it's a 64-bit-only architecture
and force_o_largefile() already returns true here.

> In your previous email (Nov 18) you write that tile is the only user
> of asm-generic/unistd.h that needs compat behaviour. If so, why not
> just to turn it around, as you initially suggested, and fix tile. And
> do nothing with force_o_largefile()?

The first two of the three changes I listed above are for the
compat ABI, and there it is enough to change tile.

The third change has nothing to do with compat mode, but is about
native 32-bit architectures. However, we should change both in sync,
so the next architecture that gets added with both native 32-bit
mode and compat 32-bit mode on a 64-bit kernel behaves the same
way for all 32-bit user space, independent of what the kernel does.

We can have two separate patches to clarify that these don't have to
be done atomically, but we need to do both for consistency.
 
> By the way, is there a comprehensive list of linux platforms/abis, or
> at least ones that use asm-generic/unistd.h?

Just the source code.

	Arnd
Yury Norov Jan. 5, 2016, 3:26 p.m. UTC | #15
On Thu, Dec 17, 2015 at 09:50:52PM +0100, Arnd Bergmann wrote:
> On Thursday 17 December 2015 12:14:20 Andrew Pinski wrote:
> > On Thu, Dec 17, 2015 at 12:10 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Thursday 17 December 2015 18:27:53 Catalin Marinas wrote:
> > >> On Wed, Dec 16, 2015 at 12:42:38AM +0300, Yury Norov wrote:
> > >
> > >> > +#define compat_sys_lookup_dcookie      sys_lookup_dcookie
> > >> > +#define compat_sys_pread64             sys_pread64
> > >> > +#define compat_sys_pwrite64            sys_pwrite64
> > >> > +#define compat_sys_readahead           sys_readahead
> > >> > +#define compat_sys_shmat               sys_shmat
> > >>
> > >> I wonder whether we need wrappers (actually, not only for these but
> > >> sys_read etc.). These functions take either a pointer or a size_t
> > >> argument which are 32-bit with ILP32 but treated as 64-bit by an LP64
> > >> kernel. Can we guarantee that user space zeros the top 32-bit of the
> > >> arguments passed here?
> > >
> > > I'm pretty sure that is safe. I haven't read the calling conventions
> > > specification for arm64 ilp32, but usually all function arguments are
> > > passed as 64-bit registers with proper sign-extend or zero-extend.
> > 
> > Well (just like LP64 on AARCH64), when passing a 32bit value to a
> > function, the upper 32bits are undefined.  I ran into this when I was
> > debugging the GCC go library on ILP32 (though reproduced with pure C
> > code) and the assembly functions inside glibc where pointers are
> > passed with the upper 32bits as undefined.
> > So we have an issue if called with syscall function or using pure
> > assembly to create the syscall functions (which glibc does).
> 
> Ok, I see :-(
> 
> So the calling conventions avoid the problem of being able to set
> the upper bits from malicious user space when the kernel assumes they
> are zeroed out (we had security bugs in this area, before we introduced
> SYSCALL_DEFINEx()), but it means that we need wrappers around each
> syscall that takes an argument that is different length between user
> and kernel space (as Catalin guessed). arch/s390 has the same problem and
> works around it with code in arch/s390/kernel/compat_wrapper.c, while
> other architectures (at least powerpc, x86 and tile IIRC, don't know much
> about mips, parisc and sparc) don't have the problem because of their
> calling conventions.
> 
> This also means that we cannot work around it in glibc at all, because
> we have to be able to handle malicious user space, so it has to be
> done in the kernel using something similar to what s390 does.
> 
> 	Arnd

So it seems like we (should) have 2 compat modes - with and without access
to upper half of register. I'm thinking now on how put it in generic
unistd.h less painfull way.

Beside of that, I think I almost finished with all current comments. As
this issue is not related to ILP32 directly, I think, it's better to show
it now, as there is pretty massive rework. What do you think?
Arnd Bergmann Jan. 5, 2016, 9:12 p.m. UTC | #16
On Tuesday 05 January 2016 18:26:57 Yury Norov wrote:
> > So the calling conventions avoid the problem of being able to set
> > the upper bits from malicious user space when the kernel assumes they
> > are zeroed out (we had security bugs in this area, before we introduced
> > SYSCALL_DEFINEx()), but it means that we need wrappers around each
> > syscall that takes an argument that is different length between user
> > and kernel space (as Catalin guessed). arch/s390 has the same problem and
> > works around it with code in arch/s390/kernel/compat_wrapper.c, while
> > other architectures (at least powerpc, x86 and tile IIRC, don't know much
> > about mips, parisc and sparc) don't have the problem because of their
> > calling conventions.
> > 
> > This also means that we cannot work around it in glibc at all, because
> > we have to be able to handle malicious user space, so it has to be
> > done in the kernel using something similar to what s390 does.
> > 
> >       Arnd
> 
> So it seems like we (should) have 2 compat modes - with and without access
> to upper half of register. I'm thinking now on how put it in generic
> unistd.h less painfull way.

I think we can do that by slightly modifying the existing __SYSCALL/__SC_3264/
__SC_COMP/__SC_COMP_3264 macros: The first two need extra wrappers for
arm64-ilp32 and s390, the other two don't.

We can use some clever string concatenation to add a ##_wrapper to the name
of the handler where needed and then just have a file that implements
the wrappers, copied from s390.

Unfortunately, we can't just zero out all the upper halves and be done with
it: even if we went back to passing 64-bit arguments as separate 32-bit
registers, we'd still need to deal with sign-extending negative 32-bit
numbers.

> Beside of that, I think I almost finished with all current comments. As
> this issue is not related to ILP32 directly, I think, it's better to show
> it now, as there is pretty massive rework. What do you think?

Good idea, yes.

	Arnd
Catalin Marinas Jan. 6, 2016, 5:10 p.m. UTC | #17
On Tue, Jan 05, 2016 at 10:12:20PM +0100, Arnd Bergmann wrote:
> On Tuesday 05 January 2016 18:26:57 Yury Norov wrote:
> > > So the calling conventions avoid the problem of being able to set
> > > the upper bits from malicious user space when the kernel assumes they
> > > are zeroed out (we had security bugs in this area, before we introduced
> > > SYSCALL_DEFINEx()), but it means that we need wrappers around each
> > > syscall that takes an argument that is different length between user
> > > and kernel space (as Catalin guessed). arch/s390 has the same problem and
> > > works around it with code in arch/s390/kernel/compat_wrapper.c, while
> > > other architectures (at least powerpc, x86 and tile IIRC, don't know much
> > > about mips, parisc and sparc) don't have the problem because of their
> > > calling conventions.
> > > 
> > > This also means that we cannot work around it in glibc at all, because
> > > we have to be able to handle malicious user space, so it has to be
> > > done in the kernel using something similar to what s390 does.
> > 
> > So it seems like we (should) have 2 compat modes - with and without access
> > to upper half of register. I'm thinking now on how put it in generic
> > unistd.h less painfull way.
> 
> I think we can do that by slightly modifying the existing __SYSCALL/__SC_3264/
> __SC_COMP/__SC_COMP_3264 macros: The first two need extra wrappers for
> arm64-ilp32 and s390, the other two don't.
> 
> We can use some clever string concatenation to add a ##_wrapper to the name
> of the handler where needed and then just have a file that implements
> the wrappers, copied from s390.
> 
> Unfortunately, we can't just zero out all the upper halves and be done with
> it: even if we went back to passing 64-bit arguments as separate 32-bit
> registers, we'd still need to deal with sign-extending negative 32-bit
> numbers.

How many syscalls would we need sign-extension for? Most are probably
already handled by specific compat_sys_* functions, otherwise A32 compat
wouldn't work properly.

Anyway, I think we can get away with not modifying the generic __SYSCALL
definition and only use something like
arch/s390/kernel/compat_wrapper.c. In sys_ilp32.c, we would make
__SYSCALL expand the function name with some ilp32_ prefix.

For existing compat_* syscalls, we only need to handle the pointer types
(something like the s390's __TYPE_IS_PTR). I think other types are
already handled by defining the prototype with compat_ulong_t etc.

For native syscalls like sys_read, apart from pointers we also need to
handle size_t. The wrapper would need to be defined using compat types:

ILP32_SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, compat_size_t, count)

and let the compiler handle the conversion to size_t automatically when
calling sys_read from the wrapper.

> > Beside of that, I think I almost finished with all current comments. As
> > this issue is not related to ILP32 directly, I think, it's better to show
> > it now, as there is pretty massive rework. What do you think?
> 
> Good idea, yes.

Note that we still need to sort the 0/sign extension out before we
"declare" the ILP32 ABI stable.
Arnd Bergmann Jan. 7, 2016, 2:13 p.m. UTC | #18
On Wednesday 06 January 2016 17:10:47 Catalin Marinas wrote:
> On Tue, Jan 05, 2016 at 10:12:20PM +0100, Arnd Bergmann wrote:
> > On Tuesday 05 January 2016 18:26:57 Yury Norov wrote:
> > > > So the calling conventions avoid the problem of being able to set
> > > > the upper bits from malicious user space when the kernel assumes they
> > > > are zeroed out (we had security bugs in this area, before we introduced
> > > > SYSCALL_DEFINEx()), but it means that we need wrappers around each
> > > > syscall that takes an argument that is different length between user
> > > > and kernel space (as Catalin guessed). arch/s390 has the same problem and
> > > > works around it with code in arch/s390/kernel/compat_wrapper.c, while
> > > > other architectures (at least powerpc, x86 and tile IIRC, don't know much
> > > > about mips, parisc and sparc) don't have the problem because of their
> > > > calling conventions.
> > > > 
> > > > This also means that we cannot work around it in glibc at all, because
> > > > we have to be able to handle malicious user space, so it has to be
> > > > done in the kernel using something similar to what s390 does.
> > > 
> > > So it seems like we (should) have 2 compat modes - with and without access
> > > to upper half of register. I'm thinking now on how put it in generic
> > > unistd.h less painfull way.
> > 
> > I think we can do that by slightly modifying the existing __SYSCALL/__SC_3264/
> > __SC_COMP/__SC_COMP_3264 macros: The first two need extra wrappers for
> > arm64-ilp32 and s390, the other two don't.
> > 
> > We can use some clever string concatenation to add a ##_wrapper to the name
> > of the handler where needed and then just have a file that implements
> > the wrappers, copied from s390.
> > 
> > Unfortunately, we can't just zero out all the upper halves and be done with
> > it: even if we went back to passing 64-bit arguments as separate 32-bit
> > registers, we'd still need to deal with sign-extending negative 32-bit
> > numbers.
> 
> How many syscalls would we need sign-extension for? Most are probably
> already handled by specific compat_sys_* functions, otherwise A32 compat
> wouldn't work properly.

Good point. I suppose any system call that expects a negative argument
may run into this on all architectures and require a COMPAT_SYSCALL handler,
but only s390 cares about doing the extension for the entire set of syscalls.

This may be to work around a peculiarity of s390, which has now two
but three possible 32-to-64 extension modes: signed int, unsigned int
and pointer. The third one sets the top 33 bits to zero, clearing the
top bit of the 31-bit pointer value in the process. Nothing else needs
this, so if we just clear the upper bits on all system calls and go
back to passing 64-bit arguments as pairs, we are fine and have a much
simpler solution.

> Anyway, I think we can get away with not modifying the generic __SYSCALL
> definition and only use something like
> arch/s390/kernel/compat_wrapper.c. In sys_ilp32.c, we would make
> __SYSCALL expand the function name with some ilp32_ prefix.

I couldn't think of a way, but I'm gladly proven wrong here.

> For existing compat_* syscalls, we only need to handle the pointer types
> (something like the s390's __TYPE_IS_PTR). I think other types are
> already handled by defining the prototype with compat_ulong_t etc.

Right.
 
> For native syscalls like sys_read, apart from pointers we also need to
> handle size_t. The wrapper would need to be defined using compat types:
> 
> ILP32_SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, compat_size_t, count)
> 
> and let the compiler handle the conversion to size_t automatically when
> calling sys_read from the wrapper.

Correct. I don't think we need an ILP32_SYSCALL_DEFINEx set of macros
though, the existing COMPAT_SYSCALL_DEFINEx ones should get this right
already.

	Arnd
Yury Norov Jan. 7, 2016, 3:42 p.m. UTC | #19
On Thu, Jan 07, 2016 at 03:13:37PM +0100, Arnd Bergmann wrote:
> On Wednesday 06 January 2016 17:10:47 Catalin Marinas wrote:
> > On Tue, Jan 05, 2016 at 10:12:20PM +0100, Arnd Bergmann wrote:
> > > On Tuesday 05 January 2016 18:26:57 Yury Norov wrote:
> > > > > So the calling conventions avoid the problem of being able to set
> > > > > the upper bits from malicious user space when the kernel assumes they
> > > > > are zeroed out (we had security bugs in this area, before we introduced
> > > > > SYSCALL_DEFINEx()), but it means that we need wrappers around each
> > > > > syscall that takes an argument that is different length between user
> > > > > and kernel space (as Catalin guessed). arch/s390 has the same problem and
> > > > > works around it with code in arch/s390/kernel/compat_wrapper.c, while
> > > > > other architectures (at least powerpc, x86 and tile IIRC, don't know much
> > > > > about mips, parisc and sparc) don't have the problem because of their
> > > > > calling conventions.
> > > > > 
> > > > > This also means that we cannot work around it in glibc at all, because
> > > > > we have to be able to handle malicious user space, so it has to be
> > > > > done in the kernel using something similar to what s390 does.
> > > > 
> > > > So it seems like we (should) have 2 compat modes - with and without access
> > > > to upper half of register. I'm thinking now on how put it in generic
> > > > unistd.h less painfull way.
> > > 
> > > I think we can do that by slightly modifying the existing __SYSCALL/__SC_3264/
> > > __SC_COMP/__SC_COMP_3264 macros: The first two need extra wrappers for
> > > arm64-ilp32 and s390, the other two don't.
> > > 
> > > We can use some clever string concatenation to add a ##_wrapper to the name
> > > of the handler where needed and then just have a file that implements
> > > the wrappers, copied from s390.
> > > 
> > > Unfortunately, we can't just zero out all the upper halves and be done with
> > > it: even if we went back to passing 64-bit arguments as separate 32-bit
> > > registers, we'd still need to deal with sign-extending negative 32-bit
> > > numbers.
> > 
> > How many syscalls would we need sign-extension for? Most are probably
> > already handled by specific compat_sys_* functions, otherwise A32 compat
> > wouldn't work properly.
> 
> Good point. I suppose any system call that expects a negative argument
> may run into this on all architectures and require a COMPAT_SYSCALL handler,
> but only s390 cares about doing the extension for the entire set of syscalls.
> 
> This may be to work around a peculiarity of s390, which has now two
> but three possible 32-to-64 extension modes: signed int, unsigned int
> and pointer. The third one sets the top 33 bits to zero, clearing the
> top bit of the 31-bit pointer value in the process. Nothing else needs
> this, so if we just clear the upper bits on all system calls and go
> back to passing 64-bit arguments as pairs, we are fine and have a much
> simpler solution.

Wrappers will not add extra complexity because we already have it in
s390 port. In other hand, splitting and then assembling 64-bit values
affects performance so small, that we may not care about it much.

So, both approaches are acceptable for me. But I'd choose wrappers,
because it looks more generic, and more fun. :)

This way we can make something like ILP16 easily (can't imagine what
for, though).

> 
> > Anyway, I think we can get away with not modifying the generic __SYSCALL
> > definition and only use something like
> > arch/s390/kernel/compat_wrapper.c. In sys_ilp32.c, we would make
> > __SYSCALL expand the function name with some ilp32_ prefix.
> 
> I couldn't think of a way, but I'm gladly proven wrong here.
> 
> > For existing compat_* syscalls, we only need to handle the pointer types
> > (something like the s390's __TYPE_IS_PTR). I think other types are
> > already handled by defining the prototype with compat_ulong_t etc.
> 
> Right.
>  
> > For native syscalls like sys_read, apart from pointers we also need to
> > handle size_t. The wrapper would need to be defined using compat types:
> > 
> > ILP32_SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, compat_size_t, count)
> > 
> > and let the compiler handle the conversion to size_t automatically when
> > calling sys_read from the wrapper.
> 
> Correct. I don't think we need an ILP32_SYSCALL_DEFINEx set of macros
> though, the existing COMPAT_SYSCALL_DEFINEx ones should get this right
> already.
> 
> 	Arnd
Catalin Marinas Jan. 7, 2016, 5:23 p.m. UTC | #20
On Thu, Jan 07, 2016 at 03:13:37PM +0100, Arnd Bergmann wrote:
> On Wednesday 06 January 2016 17:10:47 Catalin Marinas wrote:
> > On Tue, Jan 05, 2016 at 10:12:20PM +0100, Arnd Bergmann wrote:
> > > On Tuesday 05 January 2016 18:26:57 Yury Norov wrote:
> > > > > So the calling conventions avoid the problem of being able to set
> > > > > the upper bits from malicious user space when the kernel assumes they
> > > > > are zeroed out (we had security bugs in this area, before we introduced
> > > > > SYSCALL_DEFINEx()), but it means that we need wrappers around each
> > > > > syscall that takes an argument that is different length between user
> > > > > and kernel space (as Catalin guessed). arch/s390 has the same problem and
> > > > > works around it with code in arch/s390/kernel/compat_wrapper.c, while
> > > > > other architectures (at least powerpc, x86 and tile IIRC, don't know much
> > > > > about mips, parisc and sparc) don't have the problem because of their
> > > > > calling conventions.
> > > > > 
> > > > > This also means that we cannot work around it in glibc at all, because
> > > > > we have to be able to handle malicious user space, so it has to be
> > > > > done in the kernel using something similar to what s390 does.
> > > > 
> > > > So it seems like we (should) have 2 compat modes - with and without access
> > > > to upper half of register. I'm thinking now on how put it in generic
> > > > unistd.h less painfull way.
> > > 
> > > I think we can do that by slightly modifying the existing __SYSCALL/__SC_3264/
> > > __SC_COMP/__SC_COMP_3264 macros: The first two need extra wrappers for
> > > arm64-ilp32 and s390, the other two don't.
> > > 
> > > We can use some clever string concatenation to add a ##_wrapper to the name
> > > of the handler where needed and then just have a file that implements
> > > the wrappers, copied from s390.
> > > 
> > > Unfortunately, we can't just zero out all the upper halves and be done with
> > > it: even if we went back to passing 64-bit arguments as separate 32-bit
> > > registers, we'd still need to deal with sign-extending negative 32-bit
> > > numbers.
> > 
> > How many syscalls would we need sign-extension for? Most are probably
> > already handled by specific compat_sys_* functions, otherwise A32 compat
> > wouldn't work properly.
> 
> Good point. I suppose any system call that expects a negative argument
> may run into this on all architectures and require a COMPAT_SYSCALL handler,
> but only s390 cares about doing the extension for the entire set of syscalls.
> 
> This may be to work around a peculiarity of s390, which has now two
> but three possible 32-to-64 extension modes: signed int, unsigned int
> and pointer. The third one sets the top 33 bits to zero, clearing the
> top bit of the 31-bit pointer value in the process. Nothing else needs
> this, so if we just clear the upper bits on all system calls and go
> back to passing 64-bit arguments as pairs, we are fine and have a much
> simpler solution.

It would be indeed simpler from a kernel perspective. I'm not sure about
the performance impact (a wrapper which does "mov wn, wn" for the first
6 registers, on top of existing wrappers). OTOH, with explicit wrappers
we have the overhead of an additional function call, so we may be better
off with the former.

> > For native syscalls like sys_read, apart from pointers we also need to
> > handle size_t. The wrapper would need to be defined using compat types:
> > 
> > ILP32_SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, compat_size_t, count)
> > 
> > and let the compiler handle the conversion to size_t automatically when
> > calling sys_read from the wrapper.
> 
> Correct. I don't think we need an ILP32_SYSCALL_DEFINEx set of macros
> though, the existing COMPAT_SYSCALL_DEFINEx ones should get this right
> already.

The existing COMPAT_SYSCALL_DEFINEx macros generate the wrapper and
definition for the compat_sys_* functions. What I meant by an
ILP32_SYSCALL_DEFINEx is a macro which only generates a wrapper that
calls into the native syscall (after sanitizing the arguments).
diff mbox

Patch

diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index e6216ef..230db54 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -13,6 +13,12 @@ 
  * You should have received a copy of the GNU General Public License
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
+
+#ifdef CONFIG_ARM64_ILP32
+#define __ARCH_WANT_COMPAT_SYS_PREADV64
+#define __ARCH_WANT_COMPAT_SYS_PWRITEV64
+#endif
+
 #ifdef CONFIG_AARCH32_EL0
 #define __ARCH_WANT_COMPAT_SYS_GETDENTS64
 #define __ARCH_WANT_SYS_GETHOSTNAME
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 1470332..8787347 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -24,6 +24,7 @@  arm64-obj-$(CONFIG_AARCH32_EL0)		+= sys32.o kuser32.o signal32.o 	\
 					   sys_compat.o entry32.o		\
 					   ../../arm/kernel/opcodes.o
 arm64-obj-$(CONFIG_FUNCTION_TRACER)	+= ftrace.o entry-ftrace.o
+arm64-obj-$(CONFIG_ARM64_ILP32)		+= sys_ilp32.o
 arm64-obj-$(CONFIG_MODULES)		+= arm64ksyms.o module.o
 arm64-obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o perf_callchain.o
 arm64-obj-$(CONFIG_HW_PERF_EVENTS)	+= perf_event.o
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 52be5c8..bcd921a 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -664,9 +664,13 @@  ENDPROC(ret_from_fork)
  */
 	.align	6
 el0_svc:
-	adrp	stbl, sys_call_table		// load syscall table pointer
 	uxtw	scno, w8			// syscall number in w8
 	mov	sc_nr, #__NR_syscalls
+#ifdef CONFIG_ARM64_ILP32
+	ldr	x16, [tsk, #TI_FLAGS]
+	tbnz	x16, #TIF_32BIT_AARCH64, el0_ilp32_svc // We are using ILP32
+#endif
+	adrp	stbl, sys_call_table		// load syscall table pointer
 el0_svc_naked:					// compat entry point
 	stp	x0, scno, [sp, #S_ORIG_X0]	// save the original x0 and syscall number
 	enable_dbg_and_irq
@@ -686,6 +690,12 @@  ni_sys:
 	b	ret_fast_syscall
 ENDPROC(el0_svc)
 
+#ifdef CONFIG_ARM64_ILP32
+el0_ilp32_svc:
+	adrp	stbl, sys_call_ilp32_table // load syscall table pointer
+	b el0_svc_naked
+#endif
+
 	/*
 	 * This is the really slow path.  We're going to be doing context
 	 * switches, and waiting for our parent to respond.
diff --git a/arch/arm64/kernel/sys_ilp32.c b/arch/arm64/kernel/sys_ilp32.c
new file mode 100644
index 0000000..8ce79db
--- /dev/null
+++ b/arch/arm64/kernel/sys_ilp32.c
@@ -0,0 +1,65 @@ 
+/*
+ * AArch64- ILP32 specific system calls implementation
+ *
+ * Copyright (C) 2015 Cavium Inc.
+ * Author: Andrew Pinski <apinski@cavium.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/msg.h>
+#include <linux/export.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/compat.h>
+
+/* Using non-compat syscalls where necessary */
+#define compat_sys_fadvise64_64        sys_fadvise64_64
+#define compat_sys_fallocate           sys_fallocate
+#define compat_sys_ftruncate64         sys_ftruncate
+#define compat_sys_lookup_dcookie      sys_lookup_dcookie
+#define compat_sys_pread64             sys_pread64
+#define compat_sys_pwrite64            sys_pwrite64
+#define compat_sys_readahead           sys_readahead
+#define compat_sys_shmat               sys_shmat
+#define compat_sys_sigaltstack         sys_sigaltstack
+#define compat_sys_sync_file_range     sys_sync_file_range
+#define compat_sys_truncate64          sys_truncate
+#define sys_llseek                     sys_lseek
+
+#define compat_sys_open_by_handle_at   sys_open_by_handle_at
+#define compat_sys_openat              sys_openat
+
+#include <asm/syscall.h>
+
+#undef __SYSCALL
+#undef __SC_COMP
+#undef __SC_3264
+#undef __SC_COMP_3264
+
+#define __SYSCALL_COMPAT
+#define __SYSCALL(nr, sym)	[nr] = sym,
+
+/*
+ * The sys_call_ilp32_table array must be 4K aligned to be accessible from
+ * kernel/entry.S.
+ */
+void *sys_call_ilp32_table[__NR_syscalls] __aligned(4096) = {
+	[0 ... __NR_syscalls - 1] = sys_ni_syscall,
+#include <asm/unistd.h>
+};