Message ID | 20200817220425.9389-9-ebiederm@xmission.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/17] exec: Move unshare_files to fix posix file locking during exec | expand |
I like the series, but I have to say that the extension of the horrible "fcheck*()" interfaces raises my hackles.. That interface is very very questionable to begin with, and it's easy to get wrong. I don't see you getting it wrong - you do seem to hold the RCU read lock in the places I checked, but it worries me. I think my worry could be at least partially mitigated with more comments and docs: On Mon, Aug 17, 2020 at 3:10 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > +struct file *fnext_task(struct task_struct *task, unsigned int *ret_fd) Please please *please* make it clear that because this does *not* increment any reference counts, you have to be very very careful using the "struct file" pointer this returns. I really dislike the naming. The old "fcheck()" naming came from the fact that at least one original user just mainly checked if the result was NULL or not. And then others had to be careful to either hold the file_lock spinlock, or at least the RCU read lock so that the result didn't go away. Here, you have "fnext_task()", and it doesn't even have that "check" warning mark, or any comment about how dangerous it can be to use.. So the rule is that the "struct file" pointer this returns can only be read under RCU, but the 'fd' it returns has a longer lifetime. I think your uses are ok, and I think this is an improvement in general, I just want a bigger "WARNING! Here be dragons!" sign (and naming could be nicer). Linus
On Mon, Aug 17, 2020 at 6:06 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > I struggle with the fcheck name as I have not seen or at least not > registed on the the user that just checks to see if the result is NULL. > So the name fcheck never made a bit of sense to me. Yeah, that name is not great. I just don't want to make things even worse. > I will see if I can come up with some good descriptive comments around > these functions. Along with describing what these things are doing I am > thinking maybe I should put "_rcu" in their names and have a debug check > that verifies "_rcu" is held. Yeah, something along the lines of "rcu_lookup_fd_task(tsk,fd)" would be a *lot* more descriptive than fcheck_task(). And I think "fnext_task()" could be "rcu_lookup_next_fd_task(tsk,fd)". Yes, those are much longer names, but it's not like you end up typing them all that often, and I think being descriptive would be worth it. And "fcheck()" and "fcheck_files()" would be good to rename too along the same lines. Something like "rcu_lookup_fd()" and "rcu_lookup_fd_files()" respectively? I'm obviously trying to go for a "rcu_lookup_fd*()" kind of pattern, and I'm not married to _that_ particular pattern but I think it would be better than what we have now. Linus
On Mon, Aug 17, 2020 at 06:17:35PM -0700, Linus Torvalds wrote: > On Mon, Aug 17, 2020 at 6:06 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > > > I struggle with the fcheck name as I have not seen or at least not > > registed on the the user that just checks to see if the result is NULL. > > So the name fcheck never made a bit of sense to me. > > Yeah, that name is not great. I just don't want to make things even worse. > > > I will see if I can come up with some good descriptive comments around > > these functions. Along with describing what these things are doing I am > > thinking maybe I should put "_rcu" in their names and have a debug check > > that verifies "_rcu" is held. > > Yeah, something along the lines of "rcu_lookup_fd_task(tsk,fd)" would > be a *lot* more descriptive than fcheck_task(). > > And I think "fnext_task()" could be "rcu_lookup_next_fd_task(tsk,fd)". > > Yes, those are much longer names, but it's not like you end up typing > them all that often, and I think being descriptive would be worth it. > > And "fcheck()" and "fcheck_files()" would be good to rename too along > the same lines. > > Something like "rcu_lookup_fd()" and "rcu_lookup_fd_files()" respectively? > > I'm obviously trying to go for a "rcu_lookup_fd*()" kind of pattern, > and I'm not married to _that_ particular pattern but I think it would > be better than what we have now. In fs/inode.c and a few other places we have the *_rcu suffix pattern already so maybe: fcheck() -> fd_file_rcu() or lookup_fd_rcu() fcheck_files() -> fd_files_rcu() or lookup_fd_files_rcu() fnext_task() -> fd_file_from_task_rcu() or lookup_next_fd_from_task_rcu() rather than as prefix or sm. Christian
On Wed, Aug 19, 2020 at 6:25 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > The bug in the existing code is that bpf_iter does get_file instead > of get_file_rcu. Does anyone have any sense of how to add debugging > to get_file to notice when it is being called in the wrong context? That bug is already fixed in bpf tree. See commit cf28f3bbfca0 ("bpf: Use get_file_rcu() instead of get_file() for task_file iterator")
On Wed, Aug 19, 2020 at 6:25 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > So I sat down and played with it and here is what I wound up with is: > > __fcheck_files -> files_lookup_fd_raw > fcheck_files -> files_lookup_fd_locked > fcheck_files -> files_lookup_fd_rcu > fcheck -> lookup_fd_rcu > ... > fcheck_task -> task_lookup_fd_fcu > fnext_task -> task_lookup_next_fd_rcu This certainly looks fine to me. No confusion about what it does. So Ack. Linus
On Mon, Aug 17, 2020 at 05:04:17PM -0500, Eric W. Biederman wrote: > As a companion to fget_task and fcheck_task implement fnext_task that > will return the struct file for the first file descriptor show number > is equal or greater than the fd argument value, or NULL if there is > no such struct file. > > This allows file descriptors of foreign processes to be iterated through > safely, without needed to increment the count on files_struct. > > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > fs/file.c | 21 +++++++++++++++++++++ > include/linux/fdtable.h | 1 + > 2 files changed, 22 insertions(+) > > diff --git a/fs/file.c b/fs/file.c > index 8d4b385055e9..88f9f78869f8 100644 > --- a/fs/file.c > +++ b/fs/file.c > @@ -876,6 +876,27 @@ struct file *fcheck_task(struct task_struct *task, unsigned int fd) > return file; > } > > +struct file *fnext_task(struct task_struct *task, unsigned int *ret_fd) > +{ > + /* Must be called with rcu_read_lock held */ > + struct files_struct *files; > + unsigned int fd = *ret_fd; > + struct file *file = NULL; > + > + task_lock(task); > + files = task->files; > + if (files) { > + for (; fd < files_fdtable(files)->max_fds; fd++) { > + file = fcheck_files(files, fd); > + if (file) > + break; > + } > + } > + task_unlock(task); > + *ret_fd = fd; > + return file; > +} Eric, if only I'm not missing something obvious you could escape @fd/@ret_fd operations in case if task->files = NULL, iow if (files) { unsigned int fd = *ret_fd; for (; fd < files_fdtable(files)->max_fds; fd++) { file = fcheck_files(files, fd); if (file) break; } *ret_fd = fd; }
On Fri, Aug 21, 2020 at 8:26 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > > > On Wed, Aug 19, 2020 at 6:25 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > >> > >> The bug in the existing code is that bpf_iter does get_file instead > >> of get_file_rcu. Does anyone have any sense of how to add debugging > >> to get_file to notice when it is being called in the wrong context? > > > > That bug is already fixed in bpf tree. > > See commit cf28f3bbfca0 ("bpf: Use get_file_rcu() instead of > > get_file() for task_file iterator") > > I wished you had based that change on -rc1 instead of some random > looking place in David's Millers net tree. random? It's a well documented process. Please see: Documentation/bpf/bpf_devel_QA.rst > I am glad to see that our existing debug checks can catch that > kind of problem when the code is exercised enough. They did not. Please see the commit log of the fix. It was a NULL pointer dereference. > I am going to pull this change into my tree on top of -rc1 so we won't > have unnecessary conflicts. Hopefully this will show up in -rc2 so the > final version of this patchset can use an easily describable base. Please do not cherry pick fixes from other trees. You need to wait until the bpf tree gets merged into net tree and net into Linus's tree. It's only a couple days away. Hopefully it's there by -rc2, but I cannot speak for Dave's schedule. We'll send bpf tree pull-req to Dave today.
diff --git a/fs/file.c b/fs/file.c index 8d4b385055e9..88f9f78869f8 100644 --- a/fs/file.c +++ b/fs/file.c @@ -876,6 +876,27 @@ struct file *fcheck_task(struct task_struct *task, unsigned int fd) return file; } +struct file *fnext_task(struct task_struct *task, unsigned int *ret_fd) +{ + /* Must be called with rcu_read_lock held */ + struct files_struct *files; + unsigned int fd = *ret_fd; + struct file *file = NULL; + + task_lock(task); + files = task->files; + if (files) { + for (; fd < files_fdtable(files)->max_fds; fd++) { + file = fcheck_files(files, fd); + if (file) + break; + } + } + task_unlock(task); + *ret_fd = fd; + return file; +} + /* * Lightweight file lookup - no refcnt increment if fd table isn't shared. * diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index def9debd2ce2..a3a054084f49 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -104,6 +104,7 @@ static inline struct file *fcheck_files(struct files_struct *files, unsigned int */ #define fcheck(fd) fcheck_files(current->files, fd) struct file *fcheck_task(struct task_struct *task, unsigned int fd); +struct file *fnext_task(struct task_struct *task, unsigned int *fd); struct task_struct;
As a companion to fget_task and fcheck_task implement fnext_task that will return the struct file for the first file descriptor show number is equal or greater than the fd argument value, or NULL if there is no such struct file. This allows file descriptors of foreign processes to be iterated through safely, without needed to increment the count on files_struct. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/file.c | 21 +++++++++++++++++++++ include/linux/fdtable.h | 1 + 2 files changed, 22 insertions(+)