Message ID | 20130725201805.GB17962@fieldses.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, 25 Jul 2013 16:18:05 -0400 "J.Bruce Fields" <bfields@citi.umich.edu> wrote: > On Thu, Jul 25, 2013 at 11:30:23AM +1000, NeilBrown wrote: > > > > Since we enabled auto-tuning for sunrpc TCP connections we do not > > guarantee that there is enough write-space on each connection to > > queue a reply. > > > > If memory pressure causes the window to shrink too small, the request > > throttling in sunrpc/svc will not accept any requests so no more requests > > will be handled. Even when pressure decreases the window will not > > grow again until data is sent on the connection. > > This means we get a deadlock: no requests will be handled until there > > is more space, and no space will be allocated until a request is > > handled. > > > > This can be simulated by modifying svc_tcp_has_wspace to inflate the > > number of byte required and removing the 'svc_sock_setbufsize' calls > > in svc_setup_socket. > > Ah-hah! > > > I found that multiplying by 16 was enough to make the requirement > > exceed the default allocation. With this modification in place: > > mount -o vers=3,proto=tcp 127.0.0.1:/home /mnt > > would block and eventually time out because the nfs server could not > > accept any requests. > > So, this?: Close enough. I just put "//" in front of the lines I didn't want rather than delete them. But yes: exactly that effect. > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c > index 305374d..36de50d 100644 > --- a/net/sunrpc/svcsock.c > +++ b/net/sunrpc/svcsock.c > @@ -1193,6 +1193,7 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt) > if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) > return 1; > required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg; > + required *= 16; > if (sk_stream_wspace(svsk->sk_sk) >= required) > return 1; > set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags); > @@ -1378,14 +1379,8 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv, > /* Initialize the socket */ > if (sock->type == SOCK_DGRAM) > svc_udp_init(svsk, serv); > - else { > - /* initialise setting must have enough space to > - * receive and respond to one request. > - */ > - svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg, > - 4 * serv->sv_max_mesg); > + else > svc_tcp_init(svsk, serv); > - } > > dprintk("svc: svc_setup_socket created %p (inet %p)\n", > svsk, svsk->sk_sk); > > > This patch relaxes the request throttling to always allow at least one > > request through per connection. It does this by checking both > > sk_stream_min_wspace() and xprt->xpt_reserved > > are zero. > > The first is zero when the TCP transmit queue is empty. > > The second is zero when there are no RPC requests being processed. > > When both of these are zero the socket is idle and so one more > > request can safely be allowed through. > > > > Applying this patch allows the above mount command to succeed cleanly. > > Tracing shows that the allocated write buffer space quickly grows and > > after a few requests are handled, the extra tests are no longer needed > > to permit further requests to be processed. > > > > The main purpose of request throttling is to handle the case when one > > client is slow at collecting replies and the send queue gets full of > > replies that the client hasn't acknowledged (at the TCP level) yet. > > As we only change behaviour when the send queue is empty this main > > purpose is still preserved. > > > > Reported-by: Ben Myers <bpm@sgi.com> > > Signed-off-by: NeilBrown <neilb@suse.de> > > > > -- > > As you can see I've changed the patch. While writing up the above > > description realised there was a weakness and so added the sk_stream_min_wspace > > test. That allowed me to write the final paragraph. > > This is great, thanks! > > Inclined to queue it up for 3.11 and stable.... I'd agree for 3.11. It feels a bit border-line for stable. "dead-lock" and "has been seen in the wild" are technically enough justification... I'd probably mark it as "pleas don't apply to -stable until 3.11 is released" or something like that, just for a bit of breathing space. Your call though. NeilBrown > > --b. > > > > > NeilBrown > > > > > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c > > index 305374d..7762b9f 100644 > > --- a/net/sunrpc/svcsock.c > > +++ b/net/sunrpc/svcsock.c > > @@ -1193,7 +1193,9 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt) > > if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) > > return 1; > > required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg; > > - if (sk_stream_wspace(svsk->sk_sk) >= required) > > + if (sk_stream_wspace(svsk->sk_sk) >= required || > > + (sk_stream_min_wspace(svsk->sk_sk) == 0 && > > + atomic_read(&xprt->xpt_reserved) == 0)) > > return 1; > > set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags); > > return 0; > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jul 26, 2013 at 06:33:03AM +1000, NeilBrown wrote: > On Thu, 25 Jul 2013 16:18:05 -0400 "J.Bruce Fields" <bfields@citi.umich.edu> > wrote: > > > On Thu, Jul 25, 2013 at 11:30:23AM +1000, NeilBrown wrote: > > > > > > Since we enabled auto-tuning for sunrpc TCP connections we do not > > > guarantee that there is enough write-space on each connection to > > > queue a reply. ... > > This is great, thanks! > > > > Inclined to queue it up for 3.11 and stable.... > > I'd agree for 3.11. > It feels a bit border-line for stable. "dead-lock" and "has been seen in the > wild" are technically enough justification... > I'd probably mark it as "pleas don't apply to -stable until 3.11 is released" > or something like that, just for a bit of breathing space. > Your call though. So my takeaway from http://lwn.net/Articles/559113/ was that Linus and Greg were requesting that: - criteria for -stable and late -rc's should really be about the same, and - people should follow Documentation/stable-kernel-rules.txt. So as an exercise to remind me what those rules are: Easy questions: - "no bigger than 100 lines, with context." Check. - "It must fix only one thing." Check. - "real bug that bothers people". Check. - "tested": yep. It doesn't actually say "tested on stable trees", and I recall this did land you with a tricky bug one time when a prerequisite was omitted from the backport. Judgement calls: - "obviously correct": it's short, but admittedly subtle, and performance regressions can take a while to get sorted out. - "It must fix a problem that causes a build error (but not for things marked CONFIG_BROKEN), an oops, a hang, data corruption, a real security issue, or some "oh, that's not good" issue. In short, something critical." We could argue that "server stops responding" is critical, though not to the same degree as a panic. - OR: alternatively: "Serious issues as reported by a user of a distribution kernel may also be considered if they fix a notable performance or interactivity issue." The only bz I've personally seen was the result of artificial testing of some kind, and it sounds like your case involved a disk failure? --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 26 Jul 2013 10:19:16 -0400 "J.Bruce Fields" <bfields@citi.umich.edu> wrote: > On Fri, Jul 26, 2013 at 06:33:03AM +1000, NeilBrown wrote: > > On Thu, 25 Jul 2013 16:18:05 -0400 "J.Bruce Fields" <bfields@citi.umich.edu> > > wrote: > > > > > On Thu, Jul 25, 2013 at 11:30:23AM +1000, NeilBrown wrote: > > > > > > > > Since we enabled auto-tuning for sunrpc TCP connections we do not > > > > guarantee that there is enough write-space on each connection to > > > > queue a reply. > ... > > > This is great, thanks! > > > > > > Inclined to queue it up for 3.11 and stable.... > > > > I'd agree for 3.11. > > It feels a bit border-line for stable. "dead-lock" and "has been seen in the > > wild" are technically enough justification... > > I'd probably mark it as "pleas don't apply to -stable until 3.11 is released" > > or something like that, just for a bit of breathing space. > > Your call though. > > > So my takeaway from http://lwn.net/Articles/559113/ was that Linus and > Greg were requesting that: > > - criteria for -stable and late -rc's should really be about the > same, and > - people should follow Documentation/stable-kernel-rules.txt. > > So as an exercise to remind me what those rules are: > > Easy questions: > > - "no bigger than 100 lines, with context." Check. > - "It must fix only one thing." Check. > - "real bug that bothers people". Check. > - "tested": yep. It doesn't actually say "tested on stable > trees", and I recall this did land you with a tricky bug one > time when a prerequisite was omitted from the backport. > > Judgement calls: > > - "obviously correct": it's short, but admittedly subtle, and > performance regressions can take a while to get sorted out. > - "It must fix a problem that causes a build error (but not for > things marked CONFIG_BROKEN), an oops, a hang, data > corruption, a real security issue, or some "oh, that's not > good" issue. In short, something critical." We could argue > that "server stops responding" is critical, though not to the > same degree as a panic. > - OR: alternatively: "Serious issues as reported by a user of a > distribution kernel may also be considered if they fix a > notable performance or interactivity issue." The only bz I've > personally seen was the result of artificial testing of some > kind, and it sounds like your case involved a disk failure? > > --b. Looks like good analysis ... except that it doesn't seem conclusive. Being conclusive would make it really good. :-) The case that brought it to my attention doesn't require the fix. A file system was mis-behaving (blocking when it should return EJUKEBOX) and this resulted in nfsd behaviour different than my expectation. I expected nfsd to keep accepting requests until all threads were blocks. However only 4 requests were accepted (which is actually better behaviour, but not what I expected). So I looked into it and thought that what I found wasn't really right. Which turned out to be the case, but not the way I thought... So my direct experience doesn't argue for the patch going to -stable at all. If the only other reports are from artificial testing then I'd leave it out of -stable. I don't feel -rc4 (that's next I think) is too late for it though. NeilBrown
On Tue, Jul 30, 2013 at 12:48:57PM +1000, NeilBrown wrote: > On Fri, 26 Jul 2013 10:19:16 -0400 "J.Bruce Fields" <bfields@citi.umich.edu> > wrote: > > > On Fri, Jul 26, 2013 at 06:33:03AM +1000, NeilBrown wrote: > > > On Thu, 25 Jul 2013 16:18:05 -0400 "J.Bruce Fields" <bfields@citi.umich.edu> > > > wrote: > > > > > > > On Thu, Jul 25, 2013 at 11:30:23AM +1000, NeilBrown wrote: > > > > > > > > > > Since we enabled auto-tuning for sunrpc TCP connections we do not > > > > > guarantee that there is enough write-space on each connection to > > > > > queue a reply. > > ... > > > > This is great, thanks! > > > > > > > > Inclined to queue it up for 3.11 and stable.... > > > > > > I'd agree for 3.11. > > > It feels a bit border-line for stable. "dead-lock" and "has been seen in the > > > wild" are technically enough justification... > > > I'd probably mark it as "pleas don't apply to -stable until 3.11 is released" > > > or something like that, just for a bit of breathing space. > > > Your call though. > > > > > > So my takeaway from http://lwn.net/Articles/559113/ was that Linus and > > Greg were requesting that: > > > > - criteria for -stable and late -rc's should really be about the > > same, and > > - people should follow Documentation/stable-kernel-rules.txt. > > > > So as an exercise to remind me what those rules are: > > > > Easy questions: > > > > - "no bigger than 100 lines, with context." Check. > > - "It must fix only one thing." Check. > > - "real bug that bothers people". Check. > > - "tested": yep. It doesn't actually say "tested on stable > > trees", and I recall this did land you with a tricky bug one > > time when a prerequisite was omitted from the backport. > > > > Judgement calls: > > > > - "obviously correct": it's short, but admittedly subtle, and > > performance regressions can take a while to get sorted out. > > - "It must fix a problem that causes a build error (but not for > > things marked CONFIG_BROKEN), an oops, a hang, data > > corruption, a real security issue, or some "oh, that's not > > good" issue. In short, something critical." We could argue > > that "server stops responding" is critical, though not to the > > same degree as a panic. > > - OR: alternatively: "Serious issues as reported by a user of a > > distribution kernel may also be considered if they fix a > > notable performance or interactivity issue." The only bz I've > > personally seen was the result of artificial testing of some > > kind, and it sounds like your case involved a disk failure? > > > > --b. > > Looks like good analysis ... except that it doesn't seem conclusive. Being > conclusive would make it really good. :-) Hah, yeah sorry. > The case that brought it to my attention doesn't require the fix. > A file system was mis-behaving (blocking when it should return EJUKEBOX) and > this resulted in nfsd behaviour different than my expectation. > I expected nfsd to keep accepting requests until all threads were blocks. > However only 4 requests were accepted (which is actually better behaviour, > but not what I expected). > So I looked into it and thought that what I found wasn't really right. Which > turned out to be the case, but not the way I thought... > > So my direct experience doesn't argue for the patch going to -stable at all. > If the only other reports are from artificial testing then I'd leave it out > of -stable. I don't feel -rc4 (that's next I think) is too late for it > though. OK, I think I agree. We can pass it along to stable later if more complaints surface.... --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 305374d..36de50d 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1193,6 +1193,7 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt) if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) return 1; required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg; + required *= 16; if (sk_stream_wspace(svsk->sk_sk) >= required) return 1; set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags); @@ -1378,14 +1379,8 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv, /* Initialize the socket */ if (sock->type == SOCK_DGRAM) svc_udp_init(svsk, serv); - else { - /* initialise setting must have enough space to - * receive and respond to one request. - */ - svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg, - 4 * serv->sv_max_mesg); + else svc_tcp_init(svsk, serv); - } dprintk("svc: svc_setup_socket created %p (inet %p)\n", svsk, svsk->sk_sk);