Message ID | 20141212021241.GA5944@ZenIV.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Dec 11, 2014 at 6:12 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Linus, do you see any problems with the following patch (against the mainline)? Not concpetually, but create_kthread() uses CLONE_FS, and I don't think it's just umask that things like nfsd want to avoid sharing. What about all the *other* fields? Just as an example: even if all the threads actually end up all having the same global root, what about contention on 'fs->lock'? I have *not* looked at the details, and maybe there's some reason I'm completely off, but it worries me. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Dec 11, 2014 at 06:29:37PM -0800, Linus Torvalds wrote: > On Thu, Dec 11, 2014 at 6:12 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Linus, do you see any problems with the following patch (against the mainline)? > > Not concpetually, but create_kthread() uses CLONE_FS, and I don't > think it's just umask that things like nfsd want to avoid sharing. > What about all the *other* fields? > > Just as an example: even if all the threads actually end up all having > the same global root, what about contention on 'fs->lock'? > > I have *not* looked at the details, and maybe there's some reason I'm > completely off, but it worries me. Umm... I would be very surprised if it turned out to be a problem. nfsd really doesn't give a fuck about its cwd and root - not in the thread side of things. And (un)exporting is (a) not on a hot path and (b) not done from a kernel thread anyway. fh_to_dentry and friends doesn't care about root/cwd, etc. I don't see anything that could cause that kind of issues. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Dec 12, 2014 at 03:02:06AM +0000, Al Viro wrote: > On Thu, Dec 11, 2014 at 06:29:37PM -0800, Linus Torvalds wrote: > > On Thu, Dec 11, 2014 at 6:12 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > > Linus, do you see any problems with the following patch (against the mainline)? > > > > Not concpetually, but create_kthread() uses CLONE_FS, and I don't > > think it's just umask that things like nfsd want to avoid sharing. > > What about all the *other* fields? > > > > Just as an example: even if all the threads actually end up all having > > the same global root, what about contention on 'fs->lock'? > > > > I have *not* looked at the details, and maybe there's some reason I'm > > completely off, but it worries me. > > Umm... I would be very surprised if it turned out to be a problem. > nfsd really doesn't give a fuck about its cwd and root - not in the > thread side of things. And (un)exporting is (a) not on a hot path > and (b) not done from a kernel thread anyway. fh_to_dentry and friends > doesn't care about root/cwd, etc. > > I don't see anything that could cause that kind of issues. PS: I haven't checked if lustre is trying to avoid that kind of contention; it might be, but I would consider having those threads resolve _any_ kind of pathnames from root or cwd as serious bug - after all, we are not guaranteed that filesystem in question is reachable from the namespace PID 1 is running it. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 12 Dec 2014 03:02:06 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote: > On Thu, Dec 11, 2014 at 06:29:37PM -0800, Linus Torvalds wrote: > > On Thu, Dec 11, 2014 at 6:12 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > > Linus, do you see any problems with the following patch (against the mainline)? > > > > Not concpetually, but create_kthread() uses CLONE_FS, and I don't > > think it's just umask that things like nfsd want to avoid sharing. > > What about all the *other* fields? > > > > Just as an example: even if all the threads actually end up all having > > the same global root, what about contention on 'fs->lock'? > > > > I have *not* looked at the details, and maybe there's some reason I'm > > completely off, but it worries me. > > Umm... I would be very surprised if it turned out to be a problem. > nfsd really doesn't give a fuck about its cwd and root - not in the > thread side of things. And (un)exporting is (a) not on a hot path > and (b) not done from a kernel thread anyway. fh_to_dentry and friends > doesn't care about root/cwd, etc. > > I don't see anything that could cause that kind of issues. I like the change overall -- it would certainly make my patch series simpler, but what about pathwalking? We do take the fs->lock in unlazy_walk. Is it possible we'd end up with more contention there?
On Fri, Dec 12, 2014 at 06:54:08AM -0500, Jeff Layton wrote: > > Umm... I would be very surprised if it turned out to be a problem. > > nfsd really doesn't give a fuck about its cwd and root - not in the > > thread side of things. And (un)exporting is (a) not on a hot path > > and (b) not done from a kernel thread anyway. fh_to_dentry and friends > > doesn't care about root/cwd, etc. > > > > I don't see anything that could cause that kind of issues. > > I like the change overall -- it would certainly make my patch series > simpler, but what about pathwalking? We do take the fs->lock in > unlazy_walk. Is it possible we'd end up with more contention there? That would take a pathname lookup in kernel thread side of nfsd that * isn't single-component * isn't LOOKUP_ROOT one (i.e. vfs_path_lookup() or file_open_root()) and I would really hope we don't have such things. Any such a beast would allow probing the tree layout outside of what we export, to start with... AFAICS, we really don't have anything of that sort. Note that e.g. lookup_one_len() doesn't go anywhere near ->fs->lock... -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 12 Dec 2014 16:59:52 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote: > On Fri, Dec 12, 2014 at 06:54:08AM -0500, Jeff Layton wrote: > > > > Umm... I would be very surprised if it turned out to be a problem. > > > nfsd really doesn't give a fuck about its cwd and root - not in the > > > thread side of things. And (un)exporting is (a) not on a hot path > > > and (b) not done from a kernel thread anyway. fh_to_dentry and friends > > > doesn't care about root/cwd, etc. > > > > > > I don't see anything that could cause that kind of issues. > > > > I like the change overall -- it would certainly make my patch series > > simpler, but what about pathwalking? We do take the fs->lock in > > unlazy_walk. Is it possible we'd end up with more contention there? > > That would take a pathname lookup in kernel thread side of nfsd that > * isn't single-component > * isn't LOOKUP_ROOT one (i.e. vfs_path_lookup() or file_open_root()) > and I would really hope we don't have such things. Any such a beast would > allow probing the tree layout outside of what we export, to start with... > > AFAICS, we really don't have anything of that sort. Note that e.g. > lookup_one_len() doesn't go anywhere near ->fs->lock... Ahh right. Ok, then I don't see any issue with this so far. Maybe worth letting it stew in -next once -rc1 ships? Thanks! Acked-by: Jeff Layton <jlayton@primarydata.com> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Dec 13, 2014 at 09:06:45AM -0500, Jeff Layton wrote: > On Fri, 12 Dec 2014 16:59:52 +0000 > Al Viro <viro@ZenIV.linux.org.uk> wrote: > > > On Fri, Dec 12, 2014 at 06:54:08AM -0500, Jeff Layton wrote: > > > > > > Umm... I would be very surprised if it turned out to be a problem. > > > > nfsd really doesn't give a fuck about its cwd and root - not in the > > > > thread side of things. And (un)exporting is (a) not on a hot path > > > > and (b) not done from a kernel thread anyway. fh_to_dentry and friends > > > > doesn't care about root/cwd, etc. > > > > > > > > I don't see anything that could cause that kind of issues. > > > > > > I like the change overall -- it would certainly make my patch series > > > simpler, but what about pathwalking? We do take the fs->lock in > > > unlazy_walk. Is it possible we'd end up with more contention there? > > > > That would take a pathname lookup in kernel thread side of nfsd that > > * isn't single-component > > * isn't LOOKUP_ROOT one (i.e. vfs_path_lookup() or file_open_root()) > > and I would really hope we don't have such things. Any such a beast would > > allow probing the tree layout outside of what we export, to start with... > > > > AFAICS, we really don't have anything of that sort. Note that e.g. > > lookup_one_len() doesn't go anywhere near ->fs->lock... > > Ahh right. Ok, then I don't see any issue with this so far. Maybe worth > letting it stew in -next once -rc1 ships? Thanks! FWIW, right now I think that out of those 3 commits #1 (separating PID 1 from init_fs + making all kernel threads get umask 0) and #3 (assorted crapectomy in lustre, getting rid of odd games with fs_struct there) are OK for mainline, with #2 (removal of unshare_fs_struct()) being -next fodder, to see if we get anything like unexpected contention, etc. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/libcfs.h index a6b2f90..e097489 100644 --- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h @@ -136,7 +136,6 @@ void cfs_enter_debugger(void); /* * Defined by platform */ -int unshare_fs_struct(void); sigset_t cfs_get_blocked_sigs(void); sigset_t cfs_block_allsigs(void); sigset_t cfs_block_sigs(unsigned long sigs); diff --git a/drivers/staging/lustre/lustre/obdclass/genops.c b/drivers/staging/lustre/lustre/obdclass/genops.c index c314e9c..53876f9 100644 --- a/drivers/staging/lustre/lustre/obdclass/genops.c +++ b/drivers/staging/lustre/lustre/obdclass/genops.c @@ -1713,7 +1713,6 @@ EXPORT_SYMBOL(obd_zombie_barrier); */ static int obd_zombie_impexp_thread(void *unused) { - unshare_fs_struct(); complete(&obd_zombie_start); obd_zombie_pid = current_pid(); diff --git a/drivers/staging/lustre/lustre/obdclass/llog.c b/drivers/staging/lustre/lustre/obdclass/llog.c index 3ab0529..6130b23 100644 --- a/drivers/staging/lustre/lustre/obdclass/llog.c +++ b/drivers/staging/lustre/lustre/obdclass/llog.c @@ -411,8 +411,6 @@ static int llog_process_thread_daemonize(void *arg) struct lu_env env; int rc; - unshare_fs_struct(); - /* client env has no keys, tags is just 0 */ rc = lu_env_init(&env, LCT_LOCAL | LCT_MG_THREAD); if (rc) diff --git a/drivers/staging/lustre/lustre/ptlrpc/import.c b/drivers/staging/lustre/lustre/ptlrpc/import.c index 2e7e717..d395e06 100644 --- a/drivers/staging/lustre/lustre/ptlrpc/import.c +++ b/drivers/staging/lustre/lustre/ptlrpc/import.c @@ -1290,8 +1290,6 @@ static int ptlrpc_invalidate_import_thread(void *data) { struct obd_import *imp = data; - unshare_fs_struct(); - CDEBUG(D_HA, "thread invalidate import %s to %s@%s\n", imp->imp_obd->obd_name, obd2cli_tgt(imp->imp_obd), imp->imp_connection->c_remote_uuid.uuid); diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c index 20341b2..9f426ea 100644 --- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c +++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c @@ -586,8 +586,6 @@ static int ping_evictor_main(void *arg) struct l_wait_info lwi = { 0 }; time_t expire_time; - unshare_fs_struct(); - CDEBUG(D_HA, "Starting Ping Evictor\n"); pet_state = PET_READY; while (1) { diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c index 357ea9f..a2a1574 100644 --- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c +++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c @@ -382,7 +382,6 @@ static int ptlrpcd(void *arg) struct lu_env env = { .le_ses = NULL }; int rc, exit = 0; - unshare_fs_struct(); #if defined(CONFIG_SMP) if (test_bit(LIOD_BIND, &pc->pc_flags)) { int index = pc->pc_index; diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c b/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c index c500aff..9e33781 100644 --- a/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c +++ b/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c @@ -164,8 +164,6 @@ static int sec_gc_main(void *arg) struct ptlrpc_thread *thread = (struct ptlrpc_thread *) arg; struct l_wait_info lwi; - unshare_fs_struct(); - /* Record that the thread is running */ thread_set_flags(thread, SVC_RUNNING); wake_up(&thread->t_ctl_waitq); diff --git a/drivers/staging/lustre/lustre/ptlrpc/service.c b/drivers/staging/lustre/lustre/ptlrpc/service.c index a8df8a7..149c65c 100644 --- a/drivers/staging/lustre/lustre/ptlrpc/service.c +++ b/drivers/staging/lustre/lustre/ptlrpc/service.c @@ -2277,7 +2277,6 @@ static int ptlrpc_main(void *arg) int counter = 0, rc = 0; thread->t_pid = current_pid(); - unshare_fs_struct(); /* NB: we will call cfs_cpt_bind() for all threads, because we * might want to run lustre server only on a subset of system CPUs, @@ -2478,7 +2477,6 @@ static int ptlrpc_hr_main(void *arg) snprintf(threadname, sizeof(threadname), "ptlrpc_hr%02d_%03d", hrp->hrp_cpt, hrt->hrt_id); - unshare_fs_struct(); rc = cfs_cpt_bind(ptlrpc_hr.hr_cpt_table, hrp->hrp_cpt); if (rc != 0) { diff --git a/fs/fs_struct.c b/fs/fs_struct.c index 7dca743..401fd2e 100644 --- a/fs/fs_struct.c +++ b/fs/fs_struct.c @@ -128,29 +128,6 @@ struct fs_struct *copy_fs_struct(struct fs_struct *old) return fs; } -int unshare_fs_struct(void) -{ - struct fs_struct *fs = current->fs; - struct fs_struct *new_fs = copy_fs_struct(fs); - int kill; - - if (!new_fs) - return -ENOMEM; - - task_lock(current); - spin_lock(&fs->lock); - kill = !--fs->users; - current->fs = new_fs; - spin_unlock(&fs->lock); - task_unlock(current); - - if (kill) - free_fs_struct(fs); - - return 0; -} -EXPORT_SYMBOL_GPL(unshare_fs_struct); - int current_umask(void) { return current->fs->umask; @@ -162,5 +139,5 @@ struct fs_struct init_fs = { .users = 1, .lock = __SPIN_LOCK_UNLOCKED(init_fs.lock), .seq = SEQCNT_ZERO(init_fs.seq), - .umask = 0022, + .umask = 0, }; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 752d56b..5c03928 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -573,16 +573,6 @@ nfsd(void *vrqstp) /* Lock module and set up kernel thread */ mutex_lock(&nfsd_mutex); - /* At this point, the thread shares current->fs - * with the init process. We need to create files with a - * umask of 0 instead of init's umask. */ - if (unshare_fs_struct() < 0) { - printk("Unable to start nfsd thread: out of memory\n"); - goto out; - } - - current->fs->umask = 0; - /* * thread is spawned with all signals set to SIG_IGN, re-enable * the ones that will bring down the thread diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h index 0efc3e6..18d369c 100644 --- a/include/linux/fs_struct.h +++ b/include/linux/fs_struct.h @@ -21,7 +21,6 @@ extern void set_fs_root(struct fs_struct *, const struct path *); extern void set_fs_pwd(struct fs_struct *, const struct path *); extern struct fs_struct *copy_fs_struct(struct fs_struct *); extern void free_fs_struct(struct fs_struct *); -extern int unshare_fs_struct(void); static inline void get_fs_root(struct fs_struct *fs, struct path *root) { diff --git a/init/main.c b/init/main.c index ca380ec..d1a4acf 100644 --- a/init/main.c +++ b/init/main.c @@ -399,7 +399,7 @@ static noinline void __init_refok rest_init(void) * the init task will end up wanting to create kthreads, which, if * we schedule it before we create kthreadd, will OOPS. */ - kernel_thread(kernel_init, NULL, CLONE_FS); + kernel_thread(kernel_init, NULL, 0); numa_default_policy(); pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES); rcu_read_lock(); @@ -924,6 +924,7 @@ static int __ref kernel_init(void *unused) { int ret; + current->fs->umask = 0022; kernel_init_freeable(); /* need to finish all async __init code before freeing the memory */ async_synchronize_full(); diff --git a/kernel/kmod.c b/kernel/kmod.c index 2777f40..0686840 100644 --- a/kernel/kmod.c +++ b/kernel/kmod.c @@ -219,6 +219,7 @@ static int ____call_usermodehelper(void *data) struct cred *new; int retval; + current->fs->umask = 0022; spin_lock_irq(¤t->sighand->siglock); flush_signal_handlers(current, 1); spin_unlock_irq(¤t->sighand->siglock);