Message ID | 499791.1685485603@warthog.procyon.org.uk (mailing list archive) |
---|---|
State | New |
Series | Bug in short splice to socket? |
On Tue, 30 May 2023 23:26:43 +0100 David Howells wrote:
> Interesting. Now that you've pointed me at it, I've tried running it. Mostly
> it passes, but I'm having some problems with the multi_chunk_sendfile tests
> that time out. I think that splice_direct_to_actor() has a bug. The problem
> is this bit of code:
>
> 		/*
> 		 * If more data is pending, set SPLICE_F_MORE
> 		 * If this is the last data and SPLICE_F_MORE was not set
> 		 * initially, clears it.
> 		 */
> 		if (read_len < len)
> 			sd->flags |= SPLICE_F_MORE;
> 		else if (!more)
> 			sd->flags &= ~SPLICE_F_MORE;
>
> When used with sendfile(), it sets SPLICE_F_MORE (which causes MSG_MORE to be
> passed to the network protocol) if we haven't yet read everything that the
> user requested and clears it if we fulfilled what the user requested.
>
> This has the weird effect that MSG_MORE gets kind of inverted. It's never
> seen by the actor if we can read the entire request into the pipe - except if
> we hit the EOF first. If we hit the EOF before we fulfil the entire request,
> we get a short read and SPLICE_F_MORE and thus MSG_MORE *is* set. The
> upstream TLS code ignores it - but I'm changing this with my patches as
> sendmsg() then uses it to mark the EOR.
>
> I think we probably need to fix this in some way to check the size of source
> file - which may not be a regular file:-/ With the attached change, all tests
> pass; without it, a bunch of tests fail with timeouts.

Yeah.. it's one of those 'known warts' which we worked around in TLS
because I don't know enough about VFS to confidently fix it in fs/.
Proper fix would be pretty nice to have.

The original-original report of the problem is here, FWIW:
https://lore.kernel.org/netdev/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/

And my somewhat hacky fix was d452d48b9f8.
diff --git a/fs/splice.c b/fs/splice.c
index 3e06611d19ae..a7cf216c02a7 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -982,10 +982,21 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 		 * If this is the last data and SPLICE_F_MORE was not set
 		 * initially, clears it.
 		 */
-		if (read_len < len)
-			sd->flags |= SPLICE_F_MORE;
-		else if (!more)
+		if (read_len < len) {
+			struct inode *ii = in->f_mapping->host;
+
+			if (ii->i_fop->llseek != noop_llseek &&
+			    pos >= i_size_read(ii)) {
+				if (!more)
+					sd->flags &= ~SPLICE_F_MORE;
+			} else {
+				sd->flags |= SPLICE_F_MORE;
+			}
+
+		} else if (!more) {
 			sd->flags &= ~SPLICE_F_MORE;
+		}
+
 		/*
 		 * NOTE: nonblocking mode only applies to the input. We
 		 * must not do the output in nonblocking mode as then we
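For readers who want to poke at the decision the diff makes without building a kernel, here is a rough user-space model of the patched branch. It is only a sketch under stated assumptions: `struct model_src`, `update_more_patched()` and `MODEL_SPLICE_F_MORE` are hypothetical names, the `seekable` field stands in for the `llseek != noop_llseek` test, `size` stands in for `i_size_read()`, and `pos` is assumed to be the file position after the read has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SPLICE_F_MORE (same bit value as the uapi flag). */
#define MODEL_SPLICE_F_MORE 0x04u

/* Hypothetical stand-in for the inode facts the patch consults. */
struct model_src {
	bool   seekable;	/* models ii->i_fop->llseek != noop_llseek */
	size_t size;		/* models i_size_read(ii) */
};

/*
 * Model of the patched flag update.  'more' is whether the caller set
 * SPLICE_F_MORE initially; 'pos' is assumed to be the position after
 * the read.
 */
unsigned int update_more_patched(unsigned int flags, bool more,
				 size_t read_len, size_t len,
				 size_t pos, const struct model_src *src)
{
	if (read_len < len) {
		if (src->seekable && pos >= src->size) {
			/* Short read because EOF was reached: last data. */
			if (!more)
				flags &= ~MODEL_SPLICE_F_MORE;
		} else {
			/* Short read, but the source may still produce data. */
			flags |= MODEL_SPLICE_F_MORE;
		}
	} else if (!more) {
		flags &= ~MODEL_SPLICE_F_MORE;
	}
	return flags;
}

/* Small self-check that doubles as a usage example. */
void model_patched_selfcheck(void)
{
	struct model_src file = { .seekable = true, .size = 1000 };

	/* 4096 bytes requested, file ends at 1000: MORE must not be set. */
	assert(update_more_patched(0, false, 1000, 4096, 1000, &file) == 0);
}
```

In this model a short read on a non-seekable source still sets the flag, which matches the `else` arm of the diff; whether that is the right behaviour for every kind of source is exactly the open question raised in the thread.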
Jakub Kicinski <kuba@kernel.org> wrote:

> Will the TLS selftests under tools/.../net/tls.c exercise this?

Interesting. Now that you've pointed me at it, I've tried running it. Mostly
it passes, but I'm having some problems with the multi_chunk_sendfile tests
that time out. I think that splice_direct_to_actor() has a bug. The problem
is this bit of code:

		/*
		 * If more data is pending, set SPLICE_F_MORE
		 * If this is the last data and SPLICE_F_MORE was not set
		 * initially, clears it.
		 */
		if (read_len < len)
			sd->flags |= SPLICE_F_MORE;
		else if (!more)
			sd->flags &= ~SPLICE_F_MORE;

When used with sendfile(), it sets SPLICE_F_MORE (which causes MSG_MORE to be
passed to the network protocol) if we haven't yet read everything that the
user requested and clears it if we fulfilled what the user requested.

This has the weird effect that MSG_MORE gets kind of inverted. It's never
seen by the actor if we can read the entire request into the pipe - except if
we hit the EOF first. If we hit the EOF before we fulfil the entire request,
we get a short read and SPLICE_F_MORE and thus MSG_MORE *is* set. The
upstream TLS code ignores it - but I'm changing this with my patches as
sendmsg() then uses it to mark the EOR.

I think we probably need to fix this in some way to check the size of source
file - which may not be a regular file:-/ With the attached change, all tests
pass; without it, a bunch of tests fail with timeouts.

David
---
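The inversion described in the message can be seen in isolation with a small user-space model of the quoted flag update. This is a sketch, not kernel code: `update_more_orig()` and `MODEL_SPLICE_F_MORE` are hypothetical names, and the function only mirrors the four quoted lines, taking `more` to mean "the caller set SPLICE_F_MORE initially".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SPLICE_F_MORE (same bit value as the uapi flag). */
#define MODEL_SPLICE_F_MORE 0x04u

/* Model of the flag update as quoted from splice_direct_to_actor(). */
unsigned int update_more_orig(unsigned int flags, bool more,
			      size_t read_len, size_t len)
{
	if (read_len < len)
		flags |= MODEL_SPLICE_F_MORE;	/* any short read, EOF included */
	else if (!more)
		flags &= ~MODEL_SPLICE_F_MORE;	/* request fully satisfied */
	return flags;
}

/* Small self-check that doubles as a usage example. */
void model_orig_selfcheck(void)
{
	/* Full 4096-byte read, caller did not ask for MORE: flag is clear. */
	assert(update_more_orig(0, false, 4096, 4096) == 0);

	/* File ends after 1000 bytes: the short read at EOF sets the flag. */
	assert(update_more_orig(0, false, 1000, 4096) == MODEL_SPLICE_F_MORE);
}
```

The second assertion is the case the message complains about: the final chunk of the file is the one that ends up marked as "more to come".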