Message ID | 20200907153701.2981205-5-arnd@arndb.de (mailing list archive) |
---|---|
State | New, archived |
Series | ARM: remove set_fs callers and implementation |
Hi Arnd,

help me out here because I feel vaguely stupid...

On Mon, Sep 7, 2020 at 5:38 PM Arnd Bergmann <arnd@arndb.de> wrote:

>  {
> +       if (IS_ENABLED(CONFIG_OABI_COMPAT))
> +               return task_thread_info(task)->syscall & ~__NR_OABI_SYSCALL_BASE;

Where __NR_OABI_SYSCALL_BASE is
#define __NR_OABI_SYSCALL_BASE 0x900000

So you will end up with syscall number & 0xFF6FFFFF,
masking off bits 20 and 23.

I suppose this is based on this:

>         bics    r10, r10, #0xff000000
> +       str     r10, [tsk, #TI_SYSCALL]

OK we mask off bits 24-31 before we store this.

>         bic     scno, scno, #0xff000000         @ mask off SWI op-code
> +       str     scno, [tsk, #TI_SYSCALL]

And here too.

>         eor     scno, scno, #__NR_SYSCALL_BASE  @ check OS number

And then this happens, which will ... I don't know really.
Exclusive or with 0x900000 is not immediately intuitively
evident to me, I suppose it is for everyone else... :/

I need some idea of how this number space is managed in order to
understand the code so I can review it. I guess it all makes perfect
sense, but I need some background here.

Thanks,
Linus Walleij
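The mask arithmetic in the message above can be checked with a small user-space sketch. This is not kernel code: `NR_OABI_SYSCALL_BASE` simply copies the 0x900000 value quoted in the thread, and `strip_oabi_base()` is a hypothetical helper that mirrors the `& ~__NR_OABI_SYSCALL_BASE` expression from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Value quoted in the thread for __NR_OABI_SYSCALL_BASE (assumed here). */
#define NR_OABI_SYSCALL_BASE 0x900000u

/* 0x900000 has bits 20 and 23 set, so its complement is 0xFF6FFFFF. */
static_assert((uint32_t)~NR_OABI_SYSCALL_BASE == 0xFF6FFFFFu,
	      "complement of the OABI base clears bits 20 and 23");

/*
 * Hypothetical helper mirroring the expression in the patch:
 * clear the OABI base bits from a stored syscall value.
 */
static inline uint32_t strip_oabi_base(uint32_t stored)
{
	return stored & ~NR_OABI_SYSCALL_BASE;
}
```

Under these assumptions, a stored OABI value such as 0x900004 reduces to 4, while a plain EABI number such as 4 passes through unchanged.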
On Mon, Sep 28, 2020 at 11:41 AM Linus Walleij <linus.walleij@linaro.org> wrote:
>
> Hi Arnd,
>
> help me out here because I feel vaguely stupid...
>
> On Mon, Sep 7, 2020 at 5:38 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> >  {
> > +       if (IS_ENABLED(CONFIG_OABI_COMPAT))
> > +               return task_thread_info(task)->syscall & ~__NR_OABI_SYSCALL_BASE;
>
> Where __NR_OABI_SYSCALL_BASE is
> #define __NR_OABI_SYSCALL_BASE 0x900000
>
> So you will end up with syscall number & 0xFF6FFFFF,
> masking off bits 20 and 23.

Right. I have fixed a bug here since sending this: the correct version
also needs to mask away the __NR_OABI_SYSCALL_BASE for a native
OABI kernel, not just for an EABI kernel with OABI-compat mode.

> I suppose this is based on this:
>
> >         bics    r10, r10, #0xff000000
> > +       str     r10, [tsk, #TI_SYSCALL]
>
> OK we mask off bits 24-31 before we store this.
>
> >         bic     scno, scno, #0xff000000         @ mask off SWI op-code
> > +       str     scno, [tsk, #TI_SYSCALL]
>
> And here too.
>
> >         eor     scno, scno, #__NR_SYSCALL_BASE  @ check OS number
>
> And then this happens, which will ... I don't know really.
> Exclusive or with 0x900000 is not immediately intuitively
> evident to me, I suppose it is for everyone else... :/

This is how the SWI/SVC immediate argument gets turned into a system
call number that is used as an offset into the sys_call_table.

OABI syscalls are called with '__NR_OABI_SYSCALL_BASE | scno' in the
immediate argument of the instruction, so using an
'eor ... , #__NR_SYSCALL_BASE' means that any valid argument
afterwards is a number between zero and __NR_syscalls, and any
invalid argument is a number outside of that range.

EABI syscalls are just 'SVC 0' with the syscall number in register 7
and no offset.

See also
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3f2829a31573e3e502b874c8d69a765f7a778793

> I need some idea of how this number space is managed in order to
> understand the code so I can review it. I guess it all makes perfect
> sense, but I need some background here.

I also had never understood this part before, and I'm still not
sure where the 0x900000 actually comes from, though my best
guess is that this was intended as an OS-specific number space,
with '9' being assigned to Linux (similar to the way Itanium and
MIPS do with their respective offsets). By the time EABI got added,
this was apparently no longer considered helpful.

      Arnd
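The decoding described in the reply above can be sketched in user-space C. This is an illustration under stated assumptions, not the kernel's implementation: `NR_SYSCALLS_ASSUMED` is a placeholder bound rather than the real `__NR_syscalls`, all helper names are hypothetical, and the example encodings assume the usual ARM form in which the top byte of an always-executed `swi` instruction is 0xEF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value quoted in the thread for __NR_OABI_SYSCALL_BASE (assumed here). */
#define NR_OABI_SYSCALL_BASE 0x900000u
/* Placeholder table size for illustration; NOT the real __NR_syscalls. */
#define NR_SYSCALLS_ASSUMED 400u

/* Sanity check: XOR with the base undoes 'base | scno' for small scno. */
static_assert((0x900004u ^ NR_OABI_SYSCALL_BASE) == 4u,
	      "XOR with the base recovers the syscall number");

/* Mirrors 'bic scno, scno, #0xff000000': drop condition and SWI opcode. */
static inline uint32_t swi_immediate(uint32_t insn)
{
	return insn & 0x00ffffffu;
}

/*
 * Mirrors 'eor scno, scno, #__NR_SYSCALL_BASE': an immediate of the form
 * 'base | scno' maps to scno, anything else maps to a large value.
 */
static inline uint32_t oabi_scno(uint32_t insn)
{
	return swi_immediate(insn) ^ NR_OABI_SYSCALL_BASE;
}

/* A single unsigned range check then rejects every invalid immediate. */
static inline bool oabi_scno_valid(uint32_t insn)
{
	return oabi_scno(insn) < NR_SYSCALLS_ASSUMED;
}
```

With these definitions, the encoding 0xef900004 (`swi 0x900004`) yields syscall number 4, whereas 0xef000000 (`swi 0`, the EABI form) yields 0x900000 and fails the OABI range check. That illustrates why XOR is convenient here: one comparison covers both a wrong OS number and an out-of-range call number.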
On Mon, Sep 28, 2020 at 02:42:43PM +0200, Arnd Bergmann wrote:
> > I need some idea of how this number space is managed in order to
> > understand the code so I can review it. I guess it all makes perfect
> > sense, but I need some background here.
>
> I also had never understood this part before, and I'm still not
> sure where the 0x900000 actually comes from, though my best
> guess is that this was intended as an OS-specific number space,
> with '9' being assigned to Linux (similar to the way Itanium and
> MIPS do with their respective offsets). By the time EABI got added,
> this was apparently no longer considered helpful.

It is an OS-specific number space, originally designed to allow RISC OS
programs to be run under Linux. There was indeed such a project, but
it died and the code was ripped out.

EABI, by using SWI 0 (or more accurately, by not reading the SWI
opcode), trampled over the ability for RISC OS programs to be run
under Linux.
diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index fd02761ba06c..ff6cc365eaf7 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -22,6 +22,9 @@ extern const unsigned long sys_call_table[];
 static inline int syscall_get_nr(struct task_struct *task,
 				 struct pt_regs *regs)
 {
+	if (IS_ENABLED(CONFIG_OABI_COMPAT))
+		return task_thread_info(task)->syscall & ~__NR_OABI_SYSCALL_BASE;
+
 	return task_thread_info(task)->syscall;
 }
 
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index a1570c8bab25..97af6735172b 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -46,6 +46,7 @@ int main(void)
   DEFINE(TI_CPU,		offsetof(struct thread_info, cpu));
   DEFINE(TI_CPU_DOMAIN,		offsetof(struct thread_info, cpu_domain));
   DEFINE(TI_CPU_SAVE,		offsetof(struct thread_info, cpu_context));
+  DEFINE(TI_SYSCALL,		offsetof(struct thread_info, syscall));
   DEFINE(TI_USED_CP,		offsetof(struct thread_info, used_cp));
   DEFINE(TI_TP_VALUE,		offsetof(struct thread_info, tp_value));
   DEFINE(TI_FPSTATE,		offsetof(struct thread_info, fpstate));
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 271cb8a1eba1..2ea3a1989fed 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -223,6 +223,7 @@ ENTRY(vector_swi)
 	/* saved_psr and saved_pc are now dead */
 
 	uaccess_disable tbl
+	get_thread_info tsk
 
 	adr	tbl, sys_call_table		@ load syscall table pointer
 
@@ -234,13 +235,16 @@ ENTRY(vector_swi)
 	 * get the old ABI syscall table address.
 	 */
 	bics	r10, r10, #0xff000000
+	str	r10, [tsk, #TI_SYSCALL]
 	eorne	scno, r10, #__NR_OABI_SYSCALL_BASE
 	ldrne	tbl, =sys_oabi_call_table
 #elif !defined(CONFIG_AEABI)
 	bic	scno, scno, #0xff000000		@ mask off SWI op-code
+	str	scno, [tsk, #TI_SYSCALL]
 	eor	scno, scno, #__NR_SYSCALL_BASE	@ check OS number
+#else
+	str	scno, [tsk, #TI_SYSCALL]
 #endif
-	get_thread_info tsk
 	/*
 	 * Reload the registers that may have been corrupted on entry to
 	 * the syscall assembly (by tracing or context tracking.)
@@ -285,7 +289,6 @@ ENDPROC(vector_swi)
 	 * context switches, and waiting for our parent to respond.
 	 */
 __sys_trace:
-	mov	r1, scno
 	add	r0, sp, #S_OFF
 	bl	syscall_trace_enter
 	mov	scno, r0
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 2771e682220b..252060663b00 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -885,9 +885,9 @@ static void tracehook_report_syscall(struct pt_regs *regs,
 	regs->ARM_ip = ip;
 }
 
-asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
+asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 {
-	current_thread_info()->syscall = scno;
+	int scno;
 
 	if (test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);
The system call number is used in a couple of places, in particular
ptrace, seccomp and /proc/<pid>/syscall.

The last one apparently never worked reliably on ARM for tasks
that are not currently getting traced.

Storing the syscall number in the normal entry path makes it work,
as well as allowing us to see if the current system call is for
OABI compat mode, which is the next thing I want to hook into.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/include/asm/syscall.h | 3 +++
 arch/arm/kernel/asm-offsets.c  | 1 +
 arch/arm/kernel/entry-common.S | 7 +++++--
 arch/arm/kernel/ptrace.c       | 4 ++--
 4 files changed, 11 insertions(+), 4 deletions(-)