[v2,01/16] CIFS: Fix async reading on reconnects

Message ID	1403863073-19526-2-git-send-email-pshilovsky@samba.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-cifs-owner@kernel.org> From: Pavel Shilovsky <pshilovsky@samba.org> To: linux-cifs@vger.kernel.org Cc: Jeff Layton <jlayton@poochiereds.net> Subject: [PATCH v2 01/16] CIFS: Fix async reading on reconnects Date: Fri, 27 Jun 2014 13:57:38 +0400 Message-Id: <1403863073-19526-2-git-send-email-pshilovsky@samba.org> In-Reply-To: <1403863073-19526-1-git-send-email-pshilovsky@samba.org> References: <1403863073-19526-1-git-send-email-pshilovsky@samba.org> Sender: linux-cifs-owner@vger.kernel.org Precedence: bulk

Pavel Shilovsky June 27, 2014, 9:57 a.m. UTC

If we get into read_into_pages() from cifs_readv_receive() and then
loose a network, we issue cifs_reconnect that moves all mids to
a private list and issue their callbacks. The callback of the async
read request sets a mid to retry, frees it and wakes up a process
that waits on the rdata completion.

After the connection is established we return from read_into_pages()
with a short read, use the mid that was freed before and try to read
the remaining data from the a newly created socket. Both actions are
not what we want to do. In reconnect cases (-EAGAIN) we should not
mask off the error with a short read but should return the error
code instead.

Cc: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
---
 fs/cifs/file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Jeff Layton June 27, 2014, 10:52 a.m. UTC | #1

On Fri, 27 Jun 2014 13:57:38 +0400
Pavel Shilovsky <pshilovsky@samba.org> wrote:

> If we get into read_into_pages() from cifs_readv_receive() and then
> loose a network, we issue cifs_reconnect that moves all mids to
> a private list and issue their callbacks. The callback of the async
> read request sets a mid to retry, frees it and wakes up a process
> that waits on the rdata completion.
> 
> After the connection is established we return from read_into_pages()
> with a short read, use the mid that was freed before and try to read
> the remaining data from the a newly created socket. Both actions are
> not what we want to do. In reconnect cases (-EAGAIN) we should not
> mask off the error with a short read but should return the error
> code instead.
> 

I'm not sure I understand what problem this solves. Why is returning a
short read wrong here?

> Cc: Jeff Layton <jlayton@poochiereds.net>
> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
> ---
>  fs/cifs/file.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index e90a1e9..6b6df30 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2823,7 +2823,7 @@ cifs_uncached_read_into_pages(struct TCP_Server_Info *server,
>  		total_read += result;
>  	}
>  
> -	return total_read > 0 ? total_read : result;
> +	return total_read > 0 && result != -EAGAIN ? total_read : result;
>  }
>  
>  ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> @@ -3231,7 +3231,7 @@ cifs_readpages_read_into_pages(struct TCP_Server_Info *server,
>  		total_read += result;
>  	}
>  
> -	return total_read > 0 ? total_read : result;
> +	return total_read > 0 && result != -EAGAIN ? total_read : result;
>  }
>  
>  static int cifs_readpages(struct file *file, struct address_space *mapping,

Steve French June 27, 2014, 2:14 p.m. UTC | #2

---------- Forwarded message ----------
From: Steve French <smfrench@gmail.com>
Date: Fri, Jun 27, 2014 at 9:12 AM
Subject: Re: [PATCH v2 01/16] CIFS: Fix async reading on reconnects
To: Jeff Layton <jlayton@poochiereds.net>
Cc: Pavel Shilovsky <pshilovsky@samba.org>,
"linux-cifs@vger.kernel.org" <linux-cifs@vger.kernel.org>





On Fri, Jun 27, 2014 at 5:52 AM, Jeff Layton <jlayton@poochiereds.net> wrote:
>
> On Fri, 27 Jun 2014 13:57:38 +0400
> Pavel Shilovsky <pshilovsky@samba.org> wrote:
>
> > If we get into read_into_pages() from cifs_readv_receive() and then
> > loose a network, we issue cifs_reconnect that moves all mids to
> > a private list and issue their callbacks. The callback of the async
> > read request sets a mid to retry, frees it and wakes up a process
> > that waits on the rdata completion.
> >
> > After the connection is established we return from read_into_pages()
> > with a short read, use the mid that was freed before and try to read
> > the remaining data from the a newly created socket. Both actions are
> > not what we want to do. In reconnect cases (-EAGAIN) we should not
> > mask off the error with a short read but should return the error
> > code instead.
> >
>
> I'm not sure I understand what problem this solves. Why is returning a
> short read wrong here?
>

It will oops since the mid has already been freed and the caller uses
the mid (cifs_readv_receive calls dequeue_mid with the mid).
rdata->read_into_pages succeeded with a short read so length > 0 but
the mid was freed due to the reconnect

Pavel Shilovsky June 27, 2014, 4:06 p.m. UTC | #3

2014-06-27 14:52 GMT+04:00 Jeff Layton <jlayton@poochiereds.net>:
> On Fri, 27 Jun 2014 13:57:38 +0400
> Pavel Shilovsky <pshilovsky@samba.org> wrote:
>
>> If we get into read_into_pages() from cifs_readv_receive() and then
>> loose a network, we issue cifs_reconnect that moves all mids to
>> a private list and issue their callbacks. The callback of the async
>> read request sets a mid to retry, frees it and wakes up a process
>> that waits on the rdata completion.
>>
>> After the connection is established we return from read_into_pages()
>> with a short read, use the mid that was freed before and try to read
>> the remaining data from the a newly created socket. Both actions are
>> not what we want to do. In reconnect cases (-EAGAIN) we should not
>> mask off the error with a short read but should return the error
>> code instead.
>>
>
> I'm not sure I understand what problem this solves. Why is returning a
> short read wrong here?
>

This patch solves several problems in cifs_readv_receive() when
read_into_pages() call ended up with cifs_reconnect inside it:

1) prevents use-after-free and a possible double free for "mid"
variable; this can cause kernel oopses.

2) prevents updating rdata->bytes with a short read length that causes
the cifs_async_readv() caller to retry write request with a shorten
wrong length; this can cause data coherency problems.

3) prevents reading an actual data from a newly created socket through
cifs_readv_discard() -- this data can be a part of a negotiate
response received from the server; this can cause the client to retry
negotiate or hang.

Oopses from the 1st scenario was not observed since
cifs_readv_discard() returns negative value while trying to get a
remaining data. The 2nd and 3rd one were caught during testing: issue
reading with "dd", then switch off network on the server, wait 120 sec
and switch on network. The bug is caught when we are lucky enough to
stop the link while the client is in read_into_pages().

The patch fixes all these problems for me. It stays on the following
logic if a reconnect has happened:
1) all mids are already marked to retry and their callbacks are issued
- we shouldn't do anything with our mid/rdata since the requests will
be retried anyway.
2) the socket is newly created - we shouldn't try to get any remaining
data of the previous connection.

While -EAGAIN error code will be able to indicate something else in
possible future patches I prefer to leave it that way since the patch
is small and easy to apply that is critical for stable series. More
accurate way is to add a flag that is passed through
cifs_readv_from_socket() and switched on if cifs_reconnect() was
issued (like Steve suggested).

Jeff Layton July 3, 2014, 10:45 a.m. UTC | #4

On Fri, 27 Jun 2014 20:06:48 +0400
Pavel Shilovsky <pshilovsky@samba.org> wrote:

> 2014-06-27 14:52 GMT+04:00 Jeff Layton <jlayton@poochiereds.net>:
> > On Fri, 27 Jun 2014 13:57:38 +0400
> > Pavel Shilovsky <pshilovsky@samba.org> wrote:
> >
> >> If we get into read_into_pages() from cifs_readv_receive() and then
> >> loose a network, we issue cifs_reconnect that moves all mids to
> >> a private list and issue their callbacks. The callback of the async
> >> read request sets a mid to retry, frees it and wakes up a process
> >> that waits on the rdata completion.
> >>
> >> After the connection is established we return from read_into_pages()
> >> with a short read, use the mid that was freed before and try to read
> >> the remaining data from the a newly created socket. Both actions are
> >> not what we want to do. In reconnect cases (-EAGAIN) we should not
> >> mask off the error with a short read but should return the error
> >> code instead.
> >>
> >
> > I'm not sure I understand what problem this solves. Why is returning a
> > short read wrong here?
> >
> 
> This patch solves several problems in cifs_readv_receive() when
> read_into_pages() call ended up with cifs_reconnect inside it:
> 
> 1) prevents use-after-free and a possible double free for "mid"
> variable; this can cause kernel oopses.
> 
> 2) prevents updating rdata->bytes with a short read length that causes
> the cifs_async_readv() caller to retry write request with a shorten
> wrong length; this can cause data coherency problems.
> 
> 3) prevents reading an actual data from a newly created socket through
> cifs_readv_discard() -- this data can be a part of a negotiate
> response received from the server; this can cause the client to retry
> negotiate or hang.
> 
> Oopses from the 1st scenario was not observed since
> cifs_readv_discard() returns negative value while trying to get a
> remaining data. The 2nd and 3rd one were caught during testing: issue
> reading with "dd", then switch off network on the server, wait 120 sec
> and switch on network. The bug is caught when we are lucky enough to
> stop the link while the client is in read_into_pages().
> 
> The patch fixes all these problems for me. It stays on the following
> logic if a reconnect has happened:
> 1) all mids are already marked to retry and their callbacks are issued
> - we shouldn't do anything with our mid/rdata since the requests will
> be retried anyway.
> 2) the socket is newly created - we shouldn't try to get any remaining
> data of the previous connection.
> 
> While -EAGAIN error code will be able to indicate something else in
> possible future patches I prefer to leave it that way since the patch
> is small and easy to apply that is critical for stable series. More
> accurate way is to add a flag that is passed through
> cifs_readv_from_socket() and switched on if cifs_reconnect() was
> issued (like Steve suggested).
> 

Hmm, ok. I think the best fix would be to use a different error code
than -EAGAIN for the case where we have or are going to reconnect the
socket.

My main concern here is that you may end up doing 99% of a valid read
into the set of pages, only to get a disconnect right at the end. If
you eventually end up returning a non-retryable error back to userland,
it may not expect that the pages it passed in were written to. If you
don't return a short read in that situation then you're potentially not
communicating what actually happened to userland.

Pavel Shilovsky July 3, 2014, 4:12 p.m. UTC | #5

2014-07-03 14:45 GMT+04:00 Jeff Layton <jlayton@samba.org>:
> On Fri, 27 Jun 2014 20:06:48 +0400
> Pavel Shilovsky <pshilovsky@samba.org> wrote:
>
>> 2014-06-27 14:52 GMT+04:00 Jeff Layton <jlayton@poochiereds.net>:
>> > On Fri, 27 Jun 2014 13:57:38 +0400
>> > Pavel Shilovsky <pshilovsky@samba.org> wrote:
>> >
>> >> If we get into read_into_pages() from cifs_readv_receive() and then
>> >> loose a network, we issue cifs_reconnect that moves all mids to
>> >> a private list and issue their callbacks. The callback of the async
>> >> read request sets a mid to retry, frees it and wakes up a process
>> >> that waits on the rdata completion.
>> >>
>> >> After the connection is established we return from read_into_pages()
>> >> with a short read, use the mid that was freed before and try to read
>> >> the remaining data from the a newly created socket. Both actions are
>> >> not what we want to do. In reconnect cases (-EAGAIN) we should not
>> >> mask off the error with a short read but should return the error
>> >> code instead.
>> >>
>> >
>> > I'm not sure I understand what problem this solves. Why is returning a
>> > short read wrong here?
>> >
>>
>> This patch solves several problems in cifs_readv_receive() when
>> read_into_pages() call ended up with cifs_reconnect inside it:
>>
>> 1) prevents use-after-free and a possible double free for "mid"
>> variable; this can cause kernel oopses.
>>
>> 2) prevents updating rdata->bytes with a short read length that causes
>> the cifs_async_readv() caller to retry write request with a shorten
>> wrong length; this can cause data coherency problems.
>>
>> 3) prevents reading an actual data from a newly created socket through
>> cifs_readv_discard() -- this data can be a part of a negotiate
>> response received from the server; this can cause the client to retry
>> negotiate or hang.
>>
>> Oopses from the 1st scenario was not observed since
>> cifs_readv_discard() returns negative value while trying to get a
>> remaining data. The 2nd and 3rd one were caught during testing: issue
>> reading with "dd", then switch off network on the server, wait 120 sec
>> and switch on network. The bug is caught when we are lucky enough to
>> stop the link while the client is in read_into_pages().
>>
>> The patch fixes all these problems for me. It stays on the following
>> logic if a reconnect has happened:
>> 1) all mids are already marked to retry and their callbacks are issued
>> - we shouldn't do anything with our mid/rdata since the requests will
>> be retried anyway.
>> 2) the socket is newly created - we shouldn't try to get any remaining
>> data of the previous connection.
>>
>> While -EAGAIN error code will be able to indicate something else in
>> possible future patches I prefer to leave it that way since the patch
>> is small and easy to apply that is critical for stable series. More
>> accurate way is to add a flag that is passed through
>> cifs_readv_from_socket() and switched on if cifs_reconnect() was
>> issued (like Steve suggested).
>>
>
> Hmm, ok. I think the best fix would be to use a different error code
> than -EAGAIN for the case where we have or are going to reconnect the
> socket.
>
> My main concern here is that you may end up doing 99% of a valid read
> into the set of pages, only to get a disconnect right at the end. If
> you eventually end up returning a non-retryable error back to userland,
> it may not expect that the pages it passed in were written to. If you
> don't return a short read in that situation then you're potentially not
> communicating what actually happened to userland.

In that case we should prevent cifs_reconnect from removing the MID we
are actually using in cifs_readv_receive() from the mid list. For
example by calling dequeue_mid() before read_into_pages() and then
returning a short read if reconnect happened?

Jeff Layton July 3, 2014, 4:39 p.m. UTC | #6

On Thu, 3 Jul 2014 06:45:49 -0400
Jeff Layton <jlayton@samba.org> wrote:

> On Fri, 27 Jun 2014 20:06:48 +0400
> Pavel Shilovsky <pshilovsky@samba.org> wrote:
> 
> > 2014-06-27 14:52 GMT+04:00 Jeff Layton <jlayton@poochiereds.net>:
> > > On Fri, 27 Jun 2014 13:57:38 +0400
> > > Pavel Shilovsky <pshilovsky@samba.org> wrote:
> > >
> > >> If we get into read_into_pages() from cifs_readv_receive() and then
> > >> loose a network, we issue cifs_reconnect that moves all mids to
> > >> a private list and issue their callbacks. The callback of the async
> > >> read request sets a mid to retry, frees it and wakes up a process
> > >> that waits on the rdata completion.
> > >>
> > >> After the connection is established we return from read_into_pages()
> > >> with a short read, use the mid that was freed before and try to read
> > >> the remaining data from the a newly created socket. Both actions are
> > >> not what we want to do. In reconnect cases (-EAGAIN) we should not
> > >> mask off the error with a short read but should return the error
> > >> code instead.
> > >>
> > >
> > > I'm not sure I understand what problem this solves. Why is returning a
> > > short read wrong here?
> > >
> > 
> > This patch solves several problems in cifs_readv_receive() when
> > read_into_pages() call ended up with cifs_reconnect inside it:
> > 
> > 1) prevents use-after-free and a possible double free for "mid"
> > variable; this can cause kernel oopses.
> > 
> > 2) prevents updating rdata->bytes with a short read length that causes
> > the cifs_async_readv() caller to retry write request with a shorten
> > wrong length; this can cause data coherency problems.
> > 
> > 3) prevents reading an actual data from a newly created socket through
> > cifs_readv_discard() -- this data can be a part of a negotiate
> > response received from the server; this can cause the client to retry
> > negotiate or hang.
> > 
> > Oopses from the 1st scenario was not observed since
> > cifs_readv_discard() returns negative value while trying to get a
> > remaining data. The 2nd and 3rd one were caught during testing: issue
> > reading with "dd", then switch off network on the server, wait 120 sec
> > and switch on network. The bug is caught when we are lucky enough to
> > stop the link while the client is in read_into_pages().
> > 
> > The patch fixes all these problems for me. It stays on the following
> > logic if a reconnect has happened:
> > 1) all mids are already marked to retry and their callbacks are issued
> > - we shouldn't do anything with our mid/rdata since the requests will
> > be retried anyway.
> > 2) the socket is newly created - we shouldn't try to get any remaining
> > data of the previous connection.
> > 
> > While -EAGAIN error code will be able to indicate something else in
> > possible future patches I prefer to leave it that way since the patch
> > is small and easy to apply that is critical for stable series. More
> > accurate way is to add a flag that is passed through
> > cifs_readv_from_socket() and switched on if cifs_reconnect() was
> > issued (like Steve suggested).
> > 
> 
> Hmm, ok. I think the best fix would be to use a different error code
> than -EAGAIN for the case where we have or are going to reconnect the
> socket.
> 
> My main concern here is that you may end up doing 99% of a valid read
> into the set of pages, only to get a disconnect right at the end. If
> you eventually end up returning a non-retryable error back to userland,
> it may not expect that the pages it passed in were written to. If you
> don't return a short read in that situation then you're potentially not
> communicating what actually happened to userland.
> 

After chatting with Pavel, I'll go ahead and Ack this patch. I think
it's probably the lesser evil...

It does seem to me though it seems to me that if you got part of a read
request and then hit an error, then it would be:

a) more efficient to not reissue the reads that were successful

...and...

b) more correct to go ahead and return a short read. There's no
guarantee that you'll be able to reach the server in order to do the
reconnect, so it seems like it would be better to go ahead and return
what we have than keep retrying in the kernel.

So...with that in mind...

Acked-by: Jeff Layton <jlayton@samba.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Pavel Shilovsky July 21, 2014, 3:41 p.m. UTC | #7

2014-07-03 20:39 GMT+04:00 Jeff Layton <jlayton@samba.org>:
> After chatting with Pavel, I'll go ahead and Ack this patch. I think
> it's probably the lesser evil...
>
> It does seem to me though it seems to me that if you got part of a read
> request and then hit an error, then it would be:
>
> a) more efficient to not reissue the reads that were successful
>
> ...and...
>
> b) more correct to go ahead and return a short read. There's no
> guarantee that you'll be able to reach the server in order to do the
> reconnect, so it seems like it would be better to go ahead and return
> what we have than keep retrying in the kernel.
>
> So...with that in mind...
>
> Acked-by: Jeff Layton <jlayton@samba.org>

I think the patch [01/16] of the series should go to stable since it
fixes the existing problem. Thoughts?

[v2,01/16] CIFS: Fix async reading on reconnects

Commit Message

Comments

Patch