Message ID | 20170607143440.7c86af85@mschwideX1 (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jun 07, 2017 at 02:34:40PM +0200, Martin Schwidefsky wrote: > +#define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) \ > +({ \ > + struct elf64_hdr *_ehdr = (void *) ehdr; \ > + struct elf64_phdr *_phdr = (void *) phdr; \ > + int _rc = 0; \ > + if (_ehdr->e_ident[EI_CLASS] == ELFCLASS64 && \ > + _phdr->p_type == PT_S390_REQUEST_PGSTE && \ > + !page_table_allocate_pgste && \ > + !test_thread_flag(TIF_REQUEST_PGSTE)) { \ > + set_thread_flag(TIF_REQUEST_PGSTE); \ > + set_pt_regs_flag(task_pt_regs(current), \ > + PIF_SYSCALL_RESTART); \ > + _rc = -EAGAIN; \ > + } \ > + _rc; \ > +}) I'm wondering if this should simply fail, if a PT_S390_REQUEST_PGSTE type segment exists, but it is not ELFCLASS64? It will fail later anyway on s390_enable_sie(), but... > diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h > index c119d564d8f2..1201b18e817d 100644 > --- a/arch/s390/include/asm/mmu_context.h > +++ b/arch/s390/include/asm/mmu_context.h > @@ -25,7 +25,8 @@ static inline int init_new_context(struct task_struct *tsk, > mm->context.gmap_asce = 0; > mm->context.flush_mm = 0; > #ifdef CONFIG_PGSTE > - mm->context.alloc_pgste = page_table_allocate_pgste; > + mm->context.alloc_pgste = page_table_allocate_pgste || > + test_thread_flag(TIF_REQUEST_PGSTE); I think the alloc_pgste flag should be inherited on fork, no?
On Wed, 7 Jun 2017 22:47:56 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > On Wed, Jun 07, 2017 at 02:34:40PM +0200, Martin Schwidefsky wrote: > > +#define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) \ > > +({ \ > > + struct elf64_hdr *_ehdr = (void *) ehdr; \ > > + struct elf64_phdr *_phdr = (void *) phdr; \ > > + int _rc = 0; \ > > + if (_ehdr->e_ident[EI_CLASS] == ELFCLASS64 && \ > > + _phdr->p_type == PT_S390_REQUEST_PGSTE && \ > > + !page_table_allocate_pgste && \ > > + !test_thread_flag(TIF_REQUEST_PGSTE)) { \ > > + set_thread_flag(TIF_REQUEST_PGSTE); \ > > + set_pt_regs_flag(task_pt_regs(current), \ > > + PIF_SYSCALL_RESTART); \ > > + _rc = -EAGAIN; \ > > + } \ > > + _rc; \ > > +}) > > I'm wondering if this should simply fail, if a PT_S390_REQUEST_PGSTE type > segment exists, but it is not ELFCLASS64? > It will fail later anyway on s390_enable_sie(), but... Does it matter if it fails for a 32-bit ELF file? Just makes the code more complex without benefit, no? > > diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h > > index c119d564d8f2..1201b18e817d 100644 > > --- a/arch/s390/include/asm/mmu_context.h > > +++ b/arch/s390/include/asm/mmu_context.h > > @@ -25,7 +25,8 @@ static inline int init_new_context(struct task_struct *tsk, > > mm->context.gmap_asce = 0; > > mm->context.flush_mm = 0; > > #ifdef CONFIG_PGSTE > > - mm->context.alloc_pgste = page_table_allocate_pgste; > > + mm->context.alloc_pgste = page_table_allocate_pgste || > > + test_thread_flag(TIF_REQUEST_PGSTE); > > I think the alloc_pgste flag should be inherited on fork, no? Yes, that makes it more consistent. I'll add it.
On Thu, Jun 08, 2017 at 07:35:28AM +0200, Martin Schwidefsky wrote: > On Wed, 7 Jun 2017 22:47:56 +0200 > Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > > On Wed, Jun 07, 2017 at 02:34:40PM +0200, Martin Schwidefsky wrote: > > > +#define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) \ > > > +({ \ > > > + struct elf64_hdr *_ehdr = (void *) ehdr; \ > > > + struct elf64_phdr *_phdr = (void *) phdr; \ > > > + int _rc = 0; \ > > > + if (_ehdr->e_ident[EI_CLASS] == ELFCLASS64 && \ > > > + _phdr->p_type == PT_S390_REQUEST_PGSTE && \ > > > + !page_table_allocate_pgste && \ > > > + !test_thread_flag(TIF_REQUEST_PGSTE)) { \ > > > + set_thread_flag(TIF_REQUEST_PGSTE); \ > > > + set_pt_regs_flag(task_pt_regs(current), \ > > > + PIF_SYSCALL_RESTART); \ > > > + _rc = -EAGAIN; \ > > > + } \ > > > + _rc; \ > > > +}) > > > > I'm wondering if this should simply fail, if a PT_S390_REQUEST_PGSTE type > > segment exists, but it is not ELFCLASS64? > > It will fail later anyway on s390_enable_sie(), but... > > Does it matter if it fails for a 32-bit ELF file? Just makes the code more > complex without benefit, no? It would be more consistent, since right now a 32-bit ELF file with PT_S390_REQUEST_PGSTE will be exectuted, but the page tables won't have any pgstes. That's sort of odd, isn't it? And that later on it won't be able to create a virtual machine because our current implementation doesn't allow that for compat tasks is sort of unrelated. But anyway, I'll leave that up to you, it doesn't really matter. > > > > diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h > > > index c119d564d8f2..1201b18e817d 100644 > > > --- a/arch/s390/include/asm/mmu_context.h > > > +++ b/arch/s390/include/asm/mmu_context.h > > > @@ -25,7 +25,8 @@ static inline int init_new_context(struct task_struct *tsk, > > > mm->context.gmap_asce = 0; > > > mm->context.flush_mm = 0; > > > #ifdef CONFIG_PGSTE > > > - mm->context.alloc_pgste = page_table_allocate_pgste; > > > + mm->context.alloc_pgste = page_table_allocate_pgste || > > > + test_thread_flag(TIF_REQUEST_PGSTE); > > > > I think the alloc_pgste flag should be inherited on fork, no? > > Yes, that makes it more consistent. I'll add it. By the way, what prevents with the _current_ code a scenario like: - set allocate_pgste sysctl to 1 - create kvm guest - s390_enable_sie - run vcpu - set allocate_pgste sysctl to 0 - clone(... CLONE_FILES ...) (that is: new mm without pgstes, but shared fds) - [child] run vcpu Is there anything that makes sure we cannot execute the sie instruction in the child process?
On Thu, 8 Jun 2017 08:25:31 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > On Thu, Jun 08, 2017 at 07:35:28AM +0200, Martin Schwidefsky wrote: > > On Wed, 7 Jun 2017 22:47:56 +0200 > > Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > > > On Wed, Jun 07, 2017 at 02:34:40PM +0200, Martin Schwidefsky wrote: > > > > +#define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) \ > > > > +({ \ > > > > + struct elf64_hdr *_ehdr = (void *) ehdr; \ > > > > + struct elf64_phdr *_phdr = (void *) phdr; \ > > > > + int _rc = 0; \ > > > > + if (_ehdr->e_ident[EI_CLASS] == ELFCLASS64 && \ > > > > + _phdr->p_type == PT_S390_REQUEST_PGSTE && \ > > > > + !page_table_allocate_pgste && \ > > > > + !test_thread_flag(TIF_REQUEST_PGSTE)) { \ > > > > + set_thread_flag(TIF_REQUEST_PGSTE); \ > > > > + set_pt_regs_flag(task_pt_regs(current), \ > > > > + PIF_SYSCALL_RESTART); \ > > > > + _rc = -EAGAIN; \ > > > > + } \ > > > > + _rc; \ > > > > +}) > > > > > > I'm wondering if this should simply fail, if a PT_S390_REQUEST_PGSTE type > > > segment exists, but it is not ELFCLASS64? > > > It will fail later anyway on s390_enable_sie(), but... > > > > Does it matter if it fails for a 32-bit ELF file? Just makes the code more > > complex without benefit, no? > > It would be more consistent, since right now a 32-bit ELF file with > PT_S390_REQUEST_PGSTE will be exectuted, but the page tables won't have any > pgstes. That's sort of odd, isn't it? And that later on it won't be able to > create a virtual machine because our current implementation doesn't allow > that for compat tasks is sort of unrelated. > But anyway, I'll leave that up to you, it doesn't really matter. Actually the code will be less complex if we add PT_S390_REQUEST_PGSTE for 32-bit ELF files as well. It does not make sense to define the segment for a compat process as KVM won't work but you get what you ask for.. This looks like this: #define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) \ ({ \ int _rc = 0; \ if (phdr->p_type == PT_S390_REQUEST_PGSTE && \ !page_table_allocate_pgste && \ !test_thread_flag(TIF_REQUEST_PGSTE)) { \ set_thread_flag(TIF_REQUEST_PGSTE); \ set_pt_regs_flag(task_pt_regs(current), \ PIF_SYSCALL_RESTART); \ _rc = -EAGAIN; \ } \ _rc; \ }) phdr is a (struct elf_phd *) which is either define to a a (struct elf64_phdr *) or a (struct elf32_phdr *). The check works in both cases. > > > > > > diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h > > > > index c119d564d8f2..1201b18e817d 100644 > > > > --- a/arch/s390/include/asm/mmu_context.h > > > > +++ b/arch/s390/include/asm/mmu_context.h > > > > @@ -25,7 +25,8 @@ static inline int init_new_context(struct task_struct *tsk, > > > > mm->context.gmap_asce = 0; > > > > mm->context.flush_mm = 0; > > > > #ifdef CONFIG_PGSTE > > > > - mm->context.alloc_pgste = page_table_allocate_pgste; > > > > + mm->context.alloc_pgste = page_table_allocate_pgste || > > > > + test_thread_flag(TIF_REQUEST_PGSTE); > > > > > > I think the alloc_pgste flag should be inherited on fork, no? > > > > Yes, that makes it more consistent. I'll add it. > > By the way, what prevents with the _current_ code a scenario like: > > - set allocate_pgste sysctl to 1 > - create kvm guest > - s390_enable_sie > - run vcpu > - set allocate_pgste sysctl to 0 > - clone(... CLONE_FILES ...) (that is: new mm without pgstes, but shared fds) > - [child] run vcpu > > Is there anything that makes sure we cannot execute the sie instruction in > the child process? Yes, that looks like a loop-hole.
On Thu, Jun 08, 2017 at 01:24:01PM +0200, Martin Schwidefsky wrote: > On Thu, 8 Jun 2017 08:25:31 +0200 > Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > > It would be more consistent, since right now a 32-bit ELF file with > > PT_S390_REQUEST_PGSTE will be exectuted, but the page tables won't have any > > pgstes. That's sort of odd, isn't it? And that later on it won't be able to > > create a virtual machine because our current implementation doesn't allow > > that for compat tasks is sort of unrelated. > > But anyway, I'll leave that up to you, it doesn't really matter. > > Actually the code will be less complex if we add PT_S390_REQUEST_PGSTE for > 32-bit ELF files as well. It does not make sense to define the segment for > a compat process as KVM won't work but you get what you ask for.. > > This looks like this: > > #define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) \ > ({ \ > int _rc = 0; \ > if (phdr->p_type == PT_S390_REQUEST_PGSTE && \ > !page_table_allocate_pgste && \ > !test_thread_flag(TIF_REQUEST_PGSTE)) { \ > set_thread_flag(TIF_REQUEST_PGSTE); \ > set_pt_regs_flag(task_pt_regs(current), \ > PIF_SYSCALL_RESTART); \ > _rc = -EAGAIN; \ > } \ > _rc; \ > }) > > phdr is a (struct elf_phd *) which is either define to a a (struct elf64_phdr *) > or a (struct elf32_phdr *). The check works in both cases. Yes, that makes sense.
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 69a77eecaec1..7bd182676ddd 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -64,6 +64,7 @@ config ARCH_SUPPORTS_UPROBES config S390 def_bool y + select ARCH_BINFMT_ELF_STATE select ARCH_HAS_DEVMEM_IS_ALLOWED select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_GCOV_PROFILE_ALL diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h index e8f623041769..7811d1006642 100644 --- a/arch/s390/include/asm/elf.h +++ b/arch/s390/include/asm/elf.h @@ -117,6 +117,9 @@ #define ELF_DATA ELFDATA2MSB #define ELF_ARCH EM_S390 +/* s390 specific phdr types */ +#define PT_S390_REQUEST_PGSTE 0x70000000 + /* * ELF register definitions.. */ @@ -151,6 +154,29 @@ extern unsigned int vdso_enabled; && (x)->e_ident[EI_CLASS] == ELF_CLASS) #define compat_start_thread start_thread31 +struct arch_elf_state { +}; + +#define INIT_ARCH_ELF_STATE { } + +#define arch_check_elf(ehdr, interp, interp_ehdr, state) (0) +#define arch_elf_pt_proc(ehdr, phdr, elf, interp, state) \ +({ \ + struct elf64_hdr *_ehdr = (void *) ehdr; \ + struct elf64_phdr *_phdr = (void *) phdr; \ + int _rc = 0; \ + if (_ehdr->e_ident[EI_CLASS] == ELFCLASS64 && \ + _phdr->p_type == PT_S390_REQUEST_PGSTE && \ + !page_table_allocate_pgste && \ + !test_thread_flag(TIF_REQUEST_PGSTE)) { \ + set_thread_flag(TIF_REQUEST_PGSTE); \ + set_pt_regs_flag(task_pt_regs(current), \ + PIF_SYSCALL_RESTART); \ + _rc = -EAGAIN; \ + } \ + _rc; \ +}) + /* For SVR4/S390 the function pointer to be registered with `atexit` is passed in R14. */ #define ELF_PLAT_INIT(_r, load_addr) \ diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h index c119d564d8f2..1201b18e817d 100644 --- a/arch/s390/include/asm/mmu_context.h +++ b/arch/s390/include/asm/mmu_context.h @@ -25,7 +25,8 @@ static inline int init_new_context(struct task_struct *tsk, mm->context.gmap_asce = 0; mm->context.flush_mm = 0; #ifdef CONFIG_PGSTE - mm->context.alloc_pgste = page_table_allocate_pgste; + mm->context.alloc_pgste = page_table_allocate_pgste || + test_thread_flag(TIF_REQUEST_PGSTE); mm->context.has_pgste = 0; mm->context.use_skey = 0; mm->context.use_cmma = 0; diff --git a/arch/s390/include/asm/ptrace.h b/arch/s390/include/asm/ptrace.h index 99bc456cc26a..24baa80f7af6 100644 --- a/arch/s390/include/asm/ptrace.h +++ b/arch/s390/include/asm/ptrace.h @@ -11,9 +11,11 @@ #define PIF_SYSCALL 0 /* inside a system call */ #define PIF_PER_TRAP 1 /* deliver sigtrap on return to user */ +#define PIF_SYSCALL_RESTART 2 /* restart the current system call */ #define _PIF_SYSCALL _BITUL(PIF_SYSCALL) #define _PIF_PER_TRAP _BITUL(PIF_PER_TRAP) +#define _PIF_SYSCALL_RESTART _BITUL(PIF_SYSCALL_RESTART) #ifndef __ASSEMBLY__ diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h index f36e6e2b73f0..e7444d76cc63 100644 --- a/arch/s390/include/asm/thread_info.h +++ b/arch/s390/include/asm/thread_info.h @@ -56,6 +56,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); #define TIF_NEED_RESCHED 2 /* rescheduling necessary */ #define TIF_UPROBE 3 /* breakpointed or single-stepping */ #define TIF_GUARDED_STORAGE 4 /* load guarded storage control block */ +#define TIF_REQUEST_PGSTE 5 /* New mm's will use 4K page tables */ #define TIF_SYSCALL_TRACE 8 /* syscall trace active */ #define TIF_SYSCALL_AUDIT 9 /* syscall auditing active */ #define TIF_SECCOMP 10 /* secure computing */ diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S index 0c2c3b8bfc9a..8c824b32527a 100644 --- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -52,7 +52,7 @@ _TIF_TRACE = (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SECCOMP | \ _TIF_SYSCALL_TRACEPOINT) _CIF_WORK = (_CIF_MCCK_PENDING | _CIF_ASCE_PRIMARY | \ _CIF_ASCE_SECONDARY | _CIF_FPU) -_PIF_WORK = (_PIF_PER_TRAP) +_PIF_WORK = (_PIF_PER_TRAP | _PIF_SYSCALL_RESTART) #define BASED(name) name-cleanup_critical(%r13) @@ -342,6 +342,8 @@ ENTRY(system_call) jo .Lsysc_guarded_storage TSTMSK __PT_FLAGS(%r11),_PIF_PER_TRAP jo .Lsysc_singlestep + TSTMSK __PT_FLAGS(%r11),_PIF_SYSCALL_RESTART + jo .Lsysc_syscall_restart TSTMSK __TI_flags(%r12),_TIF_SIGPENDING jo .Lsysc_sigpending TSTMSK __TI_flags(%r12),_TIF_NOTIFY_RESUME @@ -434,6 +436,15 @@ ENTRY(system_call) jg do_per_trap # +# _PIF_SYSCALL_RESTART is set, repeat the current system call +# +.Lsysc_syscall_restart: + ni __PT_FLAGS+7(%r11),255-_PIF_SYSCALL_RESTART + lmg %r1,%r7,__PT_R1(%r11) # load svc arguments + lg %r2,__PT_ORIG_GPR2(%r11) + j .Lsysc_do_svc + +# # call tracehook_report_syscall_entry/tracehook_report_syscall_exit before # and after the system call #