
[2/2] vfs: grab the lock instead of blocking in __fd_install during resizing

Message ID 1507028295-9353-3-git-send-email-mguzik@redhat.com (mailing list archive)
State New, archived

Commit Message

Mateusz Guzik Oct. 3, 2017, 10:58 a.m. UTC
Explicit locking in the fallback case provides a consistent view of the
table. Getting rid of the blocking semantics makes __fd_install usable
again in non-sleepable contexts, which eases backporting efforts.

As a side effect, the common case gets slightly nicer assembly, since
might_sleep can now be removed.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
---
 Documentation/filesystems/porting |  4 ----
 fs/file.c                         | 11 +++++++----
 2 files changed, 7 insertions(+), 8 deletions(-)
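
As a practical illustration of the "non-sleepable contexts" point above:
with the wait_event() fallback gone, fd_install() no longer sleeps, so a
caller may once again publish a descriptor from atomic context. A
hypothetical sketch follows; the demo_ctx structure and its lock are
invented for illustration, only fd_install() is the real kernel API.

#include <linux/file.h>
#include <linux/spinlock.h>

/*
 * Hypothetical caller, for illustration only: publishing an fd while
 * holding a driver-private spinlock is legal again, because with this
 * patch __fd_install() no longer has a might_sleep()/wait_event() path.
 */
struct demo_ctx {
	spinlock_t	lock;
	int		fd;
};

static void demo_publish_fd(struct demo_ctx *ctx, int fd, struct file *file)
{
	spin_lock(&ctx->lock);
	ctx->fd = fd;
	fd_install(fd, file);	/* no longer sleeps in the resize fallback */
	spin_unlock(&ctx->lock);
}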

Comments

Eric Dumazet Oct. 3, 2017, 2:41 p.m. UTC | #1
On Tue, Oct 3, 2017 at 3:58 AM, Mateusz Guzik <mguzik@redhat.com> wrote:
> Explicit locking in the fallback case provides a consistent view of the
> table. Getting rid of the blocking semantics makes __fd_install usable
> again in non-sleepable contexts, which eases backporting efforts.
>
> As a side effect, the common case gets slightly nicer assembly, since
> might_sleep can now be removed.
>
> Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
> ---
>  Documentation/filesystems/porting |  4 ----
>  fs/file.c                         | 11 +++++++----
>  2 files changed, 7 insertions(+), 8 deletions(-)

Nice change !

Reviewed-by: Eric Dumazet <edumazet@google.com>
Matthew Wilcox Oct. 4, 2017, 1:55 p.m. UTC | #2
On Tue, Oct 03, 2017 at 07:41:11AM -0700, Eric Dumazet wrote:
> On Tue, Oct 3, 2017 at 3:58 AM, Mateusz Guzik <mguzik@redhat.com> wrote:
> > Explicit locking in the fallback case provides a consistent view of the
> > table. Getting rid of the blocking semantics makes __fd_install usable
> > again in non-sleepable contexts, which eases backporting efforts.
> >
> > As a side effect, the common case gets slightly nicer assembly, since
> > might_sleep can now be removed.
> >
> > Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
> > ---
> >  Documentation/filesystems/porting |  4 ----
> >  fs/file.c                         | 11 +++++++----
> >  2 files changed, 7 insertions(+), 8 deletions(-)
> 
> Nice change !
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Hey Eric,

Any chance you could review the patches from Sandhya that make this entire
codepath obsolete?

https://lkml.org/lkml/2017/4/29/20
Eric Dumazet Oct. 4, 2017, 2 p.m. UTC | #3
On Wed, Oct 4, 2017 at 6:55 AM, Matthew Wilcox <willy@infradead.org> wrote:
> On Tue, Oct 03, 2017 at 07:41:11AM -0700, Eric Dumazet wrote:
>> On Tue, Oct 3, 2017 at 3:58 AM, Mateusz Guzik <mguzik@redhat.com> wrote:
>> > Explicit locking in the fallback case provides a consistent view of the
>> > table. Getting rid of the blocking semantics makes __fd_install usable
>> > again in non-sleepable contexts, which eases backporting efforts.
>> >
>> > As a side effect, the common case gets slightly nicer assembly, since
>> > might_sleep can now be removed.
>> >
>> > Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
>> > ---
>> >  Documentation/filesystems/porting |  4 ----
>> >  fs/file.c                         | 11 +++++++----
>> >  2 files changed, 7 insertions(+), 8 deletions(-)
>>
>> Nice change !
>>
>> Reviewed-by: Eric Dumazet <edumazet@google.com>
>
> Hey Eric,
>
> Any chance you could review the patches from Sandhya that make this entire
> codepath obsolete?
>
> https://lkml.org/lkml/2017/4/29/20
>

Hmm...

18 files changed, 578 insertions(+), 585 deletions(-)

Frankly, I need to be convinced with solid performance numbers before I
take a look at this series.

I do not believe an IDR will be faster than the current implementation,
so I am not quite convinced at this moment.

Thanks.
Matthew Wilcox Oct. 4, 2017, 2:58 p.m. UTC | #4
On Wed, Oct 04, 2017 at 07:00:40AM -0700, Eric Dumazet wrote:
> On Wed, Oct 4, 2017 at 6:55 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > Any chance you could review the patches from Sandhya that make this entire
> > codepath obsolete?
> >
> > https://lkml.org/lkml/2017/4/29/20
> >
> 
> Hmm...
> 
> 18 files changed, 578 insertions(+), 585 deletions(-)
> 
> Frankly, I need to be convinced with solid performance numbers before I
> take a look at this series.

I was hoping you'd help us get some solid performance numbers ... you
seem to have workloads available to you that help find weaknesses in
implementations.

The number of lines inserted is a bit of a red herring.  Over 100 are in
the test suite (you surely aren't going to review those) and another ~300
are adding enhancements to the IDR & radix tree that should be useful
for other users (eg I think I have a way to speed up writing out dirty
pages by using get_tag_batch()).

> I do not believe an IDR will be faster than the current implementation,
> so I am not quite convinced at this moment.

I don't think it should be significantly different in performance.  Let's
look at the layout of data for a typical bash shell (fds 0-2 and 255 open).

Current implementation:

files_struct -> fdt -> fd -> struct file

IDR:

files_struct -> radix node (shift 6) -> radix node (shift 0) -> struct file

In either case, it's the same number of dependent loads.  It'll start
to look worse for the radix tree above 4096 open fds in a given process.
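
A minimal sketch of the two lookup paths being compared: the fdtable side
is paraphrased from __fcheck_files() in include/linux/fdtable.h, while the
IDR side assumes the series embeds an IDR in files_struct (the fd_idr
field name below is a placeholder, not necessarily what the patches use).

#include <linux/fdtable.h>
#include <linux/idr.h>

/* Current path: files_struct -> fdt -> fd array -> struct file. */
static struct file *lookup_via_fdtable(struct files_struct *files,
				       unsigned int fd)
{
	struct fdtable *fdt = rcu_dereference_raw(files->fdt);	/* load 1 */

	if (fd >= fdt->max_fds)
		return NULL;
	/* load 2: the fd array pointer; load 3: the file pointer itself */
	return rcu_dereference_raw(fdt->fd[fd]);
}

/*
 * IDR path: files_struct -> radix node (shift 6) -> radix node (shift 0)
 * -> struct file.  idr_find() does the two node hops internally, so the
 * number of dependent loads is the same while fds stay below 4096.
 */
static struct file *lookup_via_idr(struct files_struct *files, unsigned int fd)
{
	return idr_find(&files->fd_idr, fd);	/* fd_idr: placeholder field */
}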
Eric Dumazet Oct. 4, 2017, 3:04 p.m. UTC | #5
On Wed, Oct 4, 2017 at 7:58 AM, Matthew Wilcox <willy@infradead.org> wrote:
> On Wed, Oct 04, 2017 at 07:00:40AM -0700, Eric Dumazet wrote:
>> On Wed, Oct 4, 2017 at 6:55 AM, Matthew Wilcox <willy@infradead.org> wrote:
>> > Any chance you could review the patches from Sandhya that make this entire
>> > codepath obsolete?
>> >
>> > https://lkml.org/lkml/2017/4/29/20
>> >
>>
>> Hmm...
>>
>> 18 files changed, 578 insertions(+), 585 deletions(-)
>>
>> Frankly, I need to be convinced with solid performance numbers before I
>> take a look at this series.
>
> I was hoping you'd help us get some solid performance numbers ... you
> seem to have workloads available to you that help find weaknesses in
> implementations.
>
> The number of lines inserted is a bit of a red herring.  Over 100 are in
> the test suite (you surely aren't going to review those) and another ~300
> are adding enhancements to the IDR & radix tree that should be useful
> for other users (eg I think I have a way to speed up writing out dirty
> pages by using get_tag_batch()).
>
>> I do not believe an IDR will be faster than the current implementation,
>> so I am not quite convinced at this moment.
>
> I don't think it should be significantly different in performance.  Let's
> look at the layout of data for a typical bash shell (fds 0-2 and 255 open).
>
> Current implementation:
>
> files_struct -> fdt -> fd -> struct file
>
> IDR:
>
> files_struct -> radix node (shift 6) -> radix node (shift 0) -> struct file
>
> In either case, it's the same number of dependent loads.  It'll start
> to look worse for the radix tree above 4096 open fds in a given process.

I am interested in performance for a process with 10,000,000 fds and
~100 threads constantly adding/deleting/using fds.

Thanks

Patch

diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index 93e0a24..17bb4dc 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -502,10 +502,6 @@  in your dentry operations instead.
 	store it as cookie.
 --
 [mandatory]
-	__fd_install() & fd_install() can now sleep. Callers should not
-	hold a spinlock	or other resources that do not allow a schedule.
---
-[mandatory]
 	any symlink that might use page_follow_link_light/page_put_link() must
 	have inode_nohighmem(inode) called before anything might start playing with
 	its pagecache.  No highmem pages should end up in the pagecache of such
diff --git a/fs/file.c b/fs/file.c
index 9d047bd..4115503 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -592,13 +592,16 @@  void __fd_install(struct files_struct *files, unsigned int fd,
 {
 	struct fdtable *fdt;
 
-	might_sleep();
 	rcu_read_lock_sched();
 
-	while (unlikely(files->resize_in_progress)) {
+	if (unlikely(files->resize_in_progress)) {
 		rcu_read_unlock_sched();
-		wait_event(files->resize_wait, !files->resize_in_progress);
-		rcu_read_lock_sched();
+		spin_lock(&files->file_lock);
+		fdt = files_fdtable(files);
+		BUG_ON(fdt->fd[fd] != NULL);
+		rcu_assign_pointer(fdt->fd[fd], file);
+		spin_unlock(&files->file_lock);
+		return;
 	}
 	/* coupled with smp_wmb() in expand_fdtable() */
 	smp_rmb();
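
Since the hunk above ends mid-function, this is roughly how __fd_install()
reads with the patch applied; the tail after smp_rmb() is reconstructed
from the unchanged context in fs/file.c and may not match the tree byte
for byte.

void __fd_install(struct files_struct *files, unsigned int fd,
		  struct file *file)
{
	struct fdtable *fdt;

	rcu_read_lock_sched();

	if (unlikely(files->resize_in_progress)) {
		/*
		 * A resize is running: fall back to publishing under
		 * file_lock, which serializes against the table swap in
		 * expand_fdtable(), instead of sleeping on resize_wait.
		 */
		rcu_read_unlock_sched();
		spin_lock(&files->file_lock);
		fdt = files_fdtable(files);
		BUG_ON(fdt->fd[fd] != NULL);
		rcu_assign_pointer(fdt->fd[fd], file);
		spin_unlock(&files->file_lock);
		return;
	}
	/* coupled with smp_wmb() in expand_fdtable() */
	smp_rmb();
	fdt = rcu_dereference_sched(files->fdt);
	BUG_ON(fdt->fd[fd] != NULL);
	rcu_assign_pointer(fdt->fd[fd], file);
	rcu_read_unlock_sched();
}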