Message ID | CAN-5tyFQ2PiSHp41mOMHa=DSJ8SmXBo=Nk=v-k1hASPKRbbhzQ@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@umich.edu> wrote: > > If we instead bump the sequence number in the case of interrupted and do: You have no guarantees that the server has seen and processed the operation. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > >> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@umich.edu> wrote: >> >> If we instead bump the sequence number in the case of interrupted and do: > > You have no guarantees that the server has seen and processed the operation. That is correct, i have tested the patch and made server never to receive the operation and client have an interrupted slot. On the next operation the server will complain back with SEQ_MISORDERED. Client can recover from this operation. Client can not recover from "Remote EIO". -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@umich.edu> wrote: > > On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust > <trondmy@primarydata.com> wrote: >> >>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@umich.edu> wrote: >>> >>> If we instead bump the sequence number in the case of interrupted and do: >> >> You have no guarantees that the server has seen and processed the operation. > > That is correct, i have tested the patch and made server never to > receive the operation and client have an interrupted slot. On the next > operation the server will complain back with SEQ_MISORDERED. Client > can recover from this operation. Client can not recover from "Remote > EIO”. > Why not?
On Fri, Sep 23, 2016 at 2:08 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > >> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@umich.edu> wrote: >> >> On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust >> <trondmy@primarydata.com> wrote: >>> >>>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@umich.edu> wrote: >>>> >>>> If we instead bump the sequence number in the case of interrupted and do: >>> >>> You have no guarantees that the server has seen and processed the operation. >> >> That is correct, i have tested the patch and made server never to >> receive the operation and client have an interrupted slot. On the next >> operation the server will complain back with SEQ_MISORDERED. Client >> can recover from this operation. Client can not recover from "Remote >> EIO”. >> > > Why not? When XDR layer returns EREMOTEIO it's not handled by the NFS error recovery (are you suggesting we should?) and returns that to the application. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Sep 23, 2016, at 14:25, Olga Kornievskaia <aglo@umich.edu> wrote: > > On Fri, Sep 23, 2016 at 2:08 PM, Trond Myklebust > <trondmy@primarydata.com> wrote: >> >>> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@umich.edu> wrote: >>> >>> On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust >>> <trondmy@primarydata.com> wrote: >>>> >>>>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>> >>>>> If we instead bump the sequence number in the case of interrupted and do: >>>> >>>> You have no guarantees that the server has seen and processed the operation. >>> >>> That is correct, i have tested the patch and made server never to >>> receive the operation and client have an interrupted slot. On the next >>> operation the server will complain back with SEQ_MISORDERED. Client >>> can recover from this operation. Client can not recover from "Remote >>> EIO”. >>> >> >> Why not? > > When XDR layer returns EREMOTEIO it's not handled by the NFS error > recovery (are you suggesting we should?) and returns that to the > application. > I’m saying that if we get a SEQ_MISORDERED due to a previous interrupt on that slot, then we should ignore the error in task->tk_status, and just retry after bumping the slot seqid.
On Fri, Sep 23, 2016 at 2:34 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > >> On Sep 23, 2016, at 14:25, Olga Kornievskaia <aglo@umich.edu> wrote: >> >> On Fri, Sep 23, 2016 at 2:08 PM, Trond Myklebust >> <trondmy@primarydata.com> wrote: >>> >>>> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@umich.edu> wrote: >>>> >>>> On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust >>>> <trondmy@primarydata.com> wrote: >>>>> >>>>>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>> >>>>>> If we instead bump the sequence number in the case of interrupted and do: >>>>> >>>>> You have no guarantees that the server has seen and processed the operation. >>>> >>>> That is correct, i have tested the patch and made server never to >>>> receive the operation and client have an interrupted slot. On the next >>>> operation the server will complain back with SEQ_MISORDERED. Client >>>> can recover from this operation. Client can not recover from "Remote >>>> EIO”. >>>> >>> >>> Why not? >> >> When XDR layer returns EREMOTEIO it's not handled by the NFS error >> recovery (are you suggesting we should?) and returns that to the >> application. >> > > I’m saying that if we get a SEQ_MISORDERED due to a previous interrupt on that slot, then we should ignore the error in task->tk_status, and just retry after bumping the slot seqid. > I'm confused where your objection lies. Are you ok with bumping the sequence # when task->tk_status = 1 and saying that we should still keep the code that I deleted in the 2nd chunk of the patch that bumped the seqid on getting SEQ_MISORDERED due to a previously interrupted slot? Wouldn't that create a difference of 2 slots for the server that has received the original request? -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Sep 23, 2016, at 14:41, Olga Kornievskaia <aglo@umich.edu> wrote: > > On Fri, Sep 23, 2016 at 2:34 PM, Trond Myklebust > <trondmy@primarydata.com> wrote: >> >>> On Sep 23, 2016, at 14:25, Olga Kornievskaia <aglo@umich.edu> wrote: >>> >>> On Fri, Sep 23, 2016 at 2:08 PM, Trond Myklebust >>> <trondmy@primarydata.com> wrote: >>>> >>>>> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>> >>>>> On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust >>>>> <trondmy@primarydata.com> wrote: >>>>>> >>>>>>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>> >>>>>>> If we instead bump the sequence number in the case of interrupted and do: >>>>>> >>>>>> You have no guarantees that the server has seen and processed the operation. >>>>> >>>>> That is correct, i have tested the patch and made server never to >>>>> receive the operation and client have an interrupted slot. On the next >>>>> operation the server will complain back with SEQ_MISORDERED. Client >>>>> can recover from this operation. Client can not recover from "Remote >>>>> EIO”. >>>>> >>>> >>>> Why not? >>> >>> When XDR layer returns EREMOTEIO it's not handled by the NFS error >>> recovery (are you suggesting we should?) and returns that to the >>> application. >>> >> >> I’m saying that if we get a SEQ_MISORDERED due to a previous interrupt on that slot, then we should ignore the error in task->tk_status, and just retry after bumping the slot seqid. >> > > I'm confused where your objection lies. Are you ok with bumping the > sequence # when task->tk_status = 1 and saying that we should still > keep the code that I deleted in the 2nd chunk of the patch that bumped > the seqid on getting SEQ_MISORDERED due to a previously interrupted > slot? > Wouldn't that create a difference of 2 slots for the server that has > received the original request? > I’m saying I’d prefer to keep the current code, but fix the retry that is apparently broken. If we’re not ignoring the task->tk_error when we decide to retry, then that’s a bug in my opinion.
On Fri, Sep 23, 2016 at 3:07 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > >> On Sep 23, 2016, at 14:41, Olga Kornievskaia <aglo@umich.edu> wrote: >> >> On Fri, Sep 23, 2016 at 2:34 PM, Trond Myklebust >> <trondmy@primarydata.com> wrote: >>> >>>> On Sep 23, 2016, at 14:25, Olga Kornievskaia <aglo@umich.edu> wrote: >>>> >>>> On Fri, Sep 23, 2016 at 2:08 PM, Trond Myklebust >>>> <trondmy@primarydata.com> wrote: >>>>> >>>>>> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>> >>>>>> On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust >>>>>> <trondmy@primarydata.com> wrote: >>>>>>> >>>>>>>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>>> >>>>>>>> If we instead bump the sequence number in the case of interrupted and do: >>>>>>> >>>>>>> You have no guarantees that the server has seen and processed the operation. >>>>>> >>>>>> That is correct, i have tested the patch and made server never to >>>>>> receive the operation and client have an interrupted slot. On the next >>>>>> operation the server will complain back with SEQ_MISORDERED. Client >>>>>> can recover from this operation. Client can not recover from "Remote >>>>>> EIO”. >>>>>> >>>>> >>>>> Why not? >>>> >>>> When XDR layer returns EREMOTEIO it's not handled by the NFS error >>>> recovery (are you suggesting we should?) and returns that to the >>>> application. >>>> >>> >>> I’m saying that if we get a SEQ_MISORDERED due to a previous interrupt on that slot, then we should ignore the error in task->tk_status, and just retry after bumping the slot seqid. >>> >> >> I'm confused where your objection lies. Are you ok with bumping the >> sequence # when task->tk_status = 1 and saying that we should still >> keep the code that I deleted in the 2nd chunk of the patch that bumped >> the seqid on getting SEQ_MISORDERED due to a previously interrupted >> slot? >> Wouldn't that create a difference of 2 slots for the server that has >> received the original request? >> > > I’m saying I’d prefer to keep the current code, but fix the retry that is apparently broken. If we’re not ignoring the task->tk_error when we decide to retry, then that’s a bug in my opinion. I'm not understand what you are suggestion. I do better with example so allow me: REMOVE used slot 0 seq=00000036 received ctrl-c nfs41_sequence_done() gets called task->tk_status = 1: slot->interrupted is set to 1. slot is freed. next operation comes in, in my case it's ACCESS. initialization of the sequence uses slot 0 seq=00000036 server replies with REMOVE client code xdr in decode_op_hrs() returns EREMOTEIO. decode_access() returns EREMOTEIO. handle error just returns that error. where do we retry? -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Sep 23, 2016, at 15:27, Olga Kornievskaia <aglo@umich.edu> wrote: > > On Fri, Sep 23, 2016 at 3:07 PM, Trond Myklebust > <trondmy@primarydata.com> wrote: >> >>> On Sep 23, 2016, at 14:41, Olga Kornievskaia <aglo@umich.edu> wrote: >>> >>> On Fri, Sep 23, 2016 at 2:34 PM, Trond Myklebust >>> <trondmy@primarydata.com> wrote: >>>> >>>>> On Sep 23, 2016, at 14:25, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>> >>>>> On Fri, Sep 23, 2016 at 2:08 PM, Trond Myklebust >>>>> <trondmy@primarydata.com> wrote: >>>>>> >>>>>>> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>> >>>>>>> On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust >>>>>>> <trondmy@primarydata.com> wrote: >>>>>>>> >>>>>>>>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>>>> >>>>>>>>> If we instead bump the sequence number in the case of interrupted and do: >>>>>>>> >>>>>>>> You have no guarantees that the server has seen and processed the operation. >>>>>>> >>>>>>> That is correct, i have tested the patch and made server never to >>>>>>> receive the operation and client have an interrupted slot. On the next >>>>>>> operation the server will complain back with SEQ_MISORDERED. Client >>>>>>> can recover from this operation. Client can not recover from "Remote >>>>>>> EIO”. >>>>>>> >>>>>> >>>>>> Why not? >>>>> >>>>> When XDR layer returns EREMOTEIO it's not handled by the NFS error >>>>> recovery (are you suggesting we should?) and returns that to the >>>>> application. >>>>> >>>> >>>> I’m saying that if we get a SEQ_MISORDERED due to a previous interrupt on that slot, then we should ignore the error in task->tk_status, and just retry after bumping the slot seqid. >>>> >>> >>> I'm confused where your objection lies. Are you ok with bumping the >>> sequence # when task->tk_status = 1 and saying that we should still >>> keep the code that I deleted in the 2nd chunk of the patch that bumped >>> the seqid on getting SEQ_MISORDERED due to a previously interrupted >>> slot? >>> Wouldn't that create a difference of 2 slots for the server that has >>> received the original request? >>> >> >> I’m saying I’d prefer to keep the current code, but fix the retry that is apparently broken. If we’re not ignoring the task->tk_error when we decide to retry, then that’s a bug in my opinion. > > I'm not understand what you are suggestion. I do better with example > so allow me: > > REMOVE used slot 0 seq=00000036 received ctrl-c > nfs41_sequence_done() gets called task->tk_status = 1: > slot->interrupted is set to 1. slot is freed. > > next operation comes in, in my case it's ACCESS. initialization of the > sequence uses slot 0 seq=00000036 > server replies with REMOVE > > client code xdr in decode_op_hrs() returns EREMOTEIO. decode_access() > returns EREMOTEIO. handle error just returns that error. > > where do we retry? > The retry should be happening when we exit from nfs41_sequence_done() by restarting the RPC.
On Fri, Sep 23, 2016 at 3:57 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > >> On Sep 23, 2016, at 15:27, Olga Kornievskaia <aglo@umich.edu> wrote: >> >> On Fri, Sep 23, 2016 at 3:07 PM, Trond Myklebust >> <trondmy@primarydata.com> wrote: >>> >>>> On Sep 23, 2016, at 14:41, Olga Kornievskaia <aglo@umich.edu> wrote: >>>> >>>> On Fri, Sep 23, 2016 at 2:34 PM, Trond Myklebust >>>> <trondmy@primarydata.com> wrote: >>>>> >>>>>> On Sep 23, 2016, at 14:25, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>> >>>>>> On Fri, Sep 23, 2016 at 2:08 PM, Trond Myklebust >>>>>> <trondmy@primarydata.com> wrote: >>>>>>> >>>>>>>> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>>> >>>>>>>> On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust >>>>>>>> <trondmy@primarydata.com> wrote: >>>>>>>>> >>>>>>>>>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>>>>> >>>>>>>>>> If we instead bump the sequence number in the case of interrupted and do: >>>>>>>>> >>>>>>>>> You have no guarantees that the server has seen and processed the operation. >>>>>>>> >>>>>>>> That is correct, i have tested the patch and made server never to >>>>>>>> receive the operation and client have an interrupted slot. On the next >>>>>>>> operation the server will complain back with SEQ_MISORDERED. Client >>>>>>>> can recover from this operation. Client can not recover from "Remote >>>>>>>> EIO”. >>>>>>>> >>>>>>> >>>>>>> Why not? >>>>>> >>>>>> When XDR layer returns EREMOTEIO it's not handled by the NFS error >>>>>> recovery (are you suggesting we should?) and returns that to the >>>>>> application. >>>>>> >>>>> >>>>> I’m saying that if we get a SEQ_MISORDERED due to a previous interrupt on that slot, then we should ignore the error in task->tk_status, and just retry after bumping the slot seqid. >>>>> >>>> >>>> I'm confused where your objection lies. Are you ok with bumping the >>>> sequence # when task->tk_status = 1 and saying that we should still >>>> keep the code that I deleted in the 2nd chunk of the patch that bumped >>>> the seqid on getting SEQ_MISORDERED due to a previously interrupted >>>> slot? >>>> Wouldn't that create a difference of 2 slots for the server that has >>>> received the original request? >>>> >>> >>> I’m saying I’d prefer to keep the current code, but fix the retry that is apparently broken. If we’re not ignoring the task->tk_error when we decide to retry, then that’s a bug in my opinion. >> >> I'm not understand what you are suggestion. I do better with example >> so allow me: >> >> REMOVE used slot 0 seq=00000036 received ctrl-c >> nfs41_sequence_done() gets called task->tk_status = 1: >> slot->interrupted is set to 1. slot is freed. >> >> next operation comes in, in my case it's ACCESS. initialization of the >> sequence uses slot 0 seq=00000036 >> server replies with REMOVE >> >> client code xdr in decode_op_hrs() returns EREMOTEIO. decode_access() >> returns EREMOTEIO. handle error just returns that error. >> >> where do we retry? >> > > The retry should be happening when we exit from nfs41_sequence_done() by restarting the RPC. Are you suggestion that REMOVE is retried? Ok I can see that (though I'm not sure why a killed task suppose to be retried. Wasn't it killed for a reason?). But if you are saying ACCESS should be retried then I don't see how it can fit into the code flow. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 23, 2016 at 4:07 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > On Fri, Sep 23, 2016 at 3:57 PM, Trond Myklebust > <trondmy@primarydata.com> wrote: >> >>> On Sep 23, 2016, at 15:27, Olga Kornievskaia <aglo@umich.edu> wrote: >>> >>> On Fri, Sep 23, 2016 at 3:07 PM, Trond Myklebust >>> <trondmy@primarydata.com> wrote: >>>> >>>>> On Sep 23, 2016, at 14:41, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>> >>>>> On Fri, Sep 23, 2016 at 2:34 PM, Trond Myklebust >>>>> <trondmy@primarydata.com> wrote: >>>>>> >>>>>>> On Sep 23, 2016, at 14:25, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>> >>>>>>> On Fri, Sep 23, 2016 at 2:08 PM, Trond Myklebust >>>>>>> <trondmy@primarydata.com> wrote: >>>>>>>> >>>>>>>>> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>>>> >>>>>>>>> On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust >>>>>>>>> <trondmy@primarydata.com> wrote: >>>>>>>>>> >>>>>>>>>>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>>>>>> >>>>>>>>>>> If we instead bump the sequence number in the case of interrupted and do: >>>>>>>>>> >>>>>>>>>> You have no guarantees that the server has seen and processed the operation. >>>>>>>>> >>>>>>>>> That is correct, i have tested the patch and made server never to >>>>>>>>> receive the operation and client have an interrupted slot. On the next >>>>>>>>> operation the server will complain back with SEQ_MISORDERED. Client >>>>>>>>> can recover from this operation. Client can not recover from "Remote >>>>>>>>> EIO”. >>>>>>>>> >>>>>>>> >>>>>>>> Why not? >>>>>>> >>>>>>> When XDR layer returns EREMOTEIO it's not handled by the NFS error >>>>>>> recovery (are you suggesting we should?) and returns that to the >>>>>>> application. >>>>>>> >>>>>> >>>>>> I’m saying that if we get a SEQ_MISORDERED due to a previous interrupt on that slot, then we should ignore the error in task->tk_status, and just retry after bumping the slot seqid. >>>>>> >>>>> >>>>> I'm confused where your objection lies. Are you ok with bumping the >>>>> sequence # when task->tk_status = 1 and saying that we should still >>>>> keep the code that I deleted in the 2nd chunk of the patch that bumped >>>>> the seqid on getting SEQ_MISORDERED due to a previously interrupted >>>>> slot? >>>>> Wouldn't that create a difference of 2 slots for the server that has >>>>> received the original request? >>>>> >>>> >>>> I’m saying I’d prefer to keep the current code, but fix the retry that is apparently broken. If we’re not ignoring the task->tk_error when we decide to retry, then that’s a bug in my opinion. >>> >>> I'm not understand what you are suggestion. I do better with example >>> so allow me: >>> >>> REMOVE used slot 0 seq=00000036 received ctrl-c >>> nfs41_sequence_done() gets called task->tk_status = 1: >>> slot->interrupted is set to 1. slot is freed. >>> >>> next operation comes in, in my case it's ACCESS. initialization of the >>> sequence uses slot 0 seq=00000036 >>> server replies with REMOVE >>> >>> client code xdr in decode_op_hrs() returns EREMOTEIO. decode_access() >>> returns EREMOTEIO. handle error just returns that error. >>> >>> where do we retry? >>> >> >> The retry should be happening when we exit from nfs41_sequence_done() by restarting the RPC. > > Are you suggestion that REMOVE is retried? Ok I can see that (though > I'm not sure why a killed task suppose to be retried. Wasn't it killed > for a reason?). But if you are saying ACCESS should be retried then I > don't see how it can fit into the code flow. I'm still hung up on the fact your suggestion of "retry". There is no retry. You wrote "if we get a SEQ_MISORDERED" we never get "SEQ_MISORDERED". I can see if you want to add to error_handling that we check if error is EREMOTEIO and check that slot->interrupted is set, then we try? -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
DQo+IE9uIFNlcCAyMywgMjAxNiwgYXQgMTY6MjUsIE9sZ2EgS29ybmlldnNrYWlhIDxhZ2xvQHVt aWNoLmVkdT4gd3JvdGU6DQo+IA0KPiBPbiBGcmksIFNlcCAyMywgMjAxNiBhdCA0OjA3IFBNLCBP bGdhIEtvcm5pZXZza2FpYSA8YWdsb0B1bWljaC5lZHU+IHdyb3RlOg0KPj4gT24gRnJpLCBTZXAg MjMsIDIwMTYgYXQgMzo1NyBQTSwgVHJvbmQgTXlrbGVidXN0DQo+PiA8dHJvbmRteUBwcmltYXJ5 ZGF0YS5jb20+IHdyb3RlOg0KPj4+IA0KPj4+PiBPbiBTZXAgMjMsIDIwMTYsIGF0IDE1OjI3LCBP bGdhIEtvcm5pZXZza2FpYSA8YWdsb0B1bWljaC5lZHU+IHdyb3RlOg0KPj4+PiANCj4+Pj4gT24g RnJpLCBTZXAgMjMsIDIwMTYgYXQgMzowNyBQTSwgVHJvbmQgTXlrbGVidXN0DQo+Pj4+IDx0cm9u ZG15QHByaW1hcnlkYXRhLmNvbT4gd3JvdGU6DQo+Pj4+PiANCj4+Pj4+PiBPbiBTZXAgMjMsIDIw MTYsIGF0IDE0OjQxLCBPbGdhIEtvcm5pZXZza2FpYSA8YWdsb0B1bWljaC5lZHU+IHdyb3RlOg0K Pj4+Pj4+IA0KPj4+Pj4+IE9uIEZyaSwgU2VwIDIzLCAyMDE2IGF0IDI6MzQgUE0sIFRyb25kIE15 a2xlYnVzdA0KPj4+Pj4+IDx0cm9uZG15QHByaW1hcnlkYXRhLmNvbT4gd3JvdGU6DQo+Pj4+Pj4+ IA0KPj4+Pj4+Pj4gT24gU2VwIDIzLCAyMDE2LCBhdCAxNDoyNSwgT2xnYSBLb3JuaWV2c2thaWEg PGFnbG9AdW1pY2guZWR1PiB3cm90ZToNCj4+Pj4+Pj4+IA0KPj4+Pj4+Pj4gT24gRnJpLCBTZXAg MjMsIDIwMTYgYXQgMjowOCBQTSwgVHJvbmQgTXlrbGVidXN0DQo+Pj4+Pj4+PiA8dHJvbmRteUBw cmltYXJ5ZGF0YS5jb20+IHdyb3RlOg0KPj4+Pj4+Pj4+IA0KPj4+Pj4+Pj4+PiBPbiBTZXAgMjMs IDIwMTYsIGF0IDEzOjU5LCBPbGdhIEtvcm5pZXZza2FpYSA8YWdsb0B1bWljaC5lZHU+IHdyb3Rl Og0KPj4+Pj4+Pj4+PiANCj4+Pj4+Pj4+Pj4gT24gRnJpLCBTZXAgMjMsIDIwMTYgYXQgMTo0NSBQ TSwgVHJvbmQgTXlrbGVidXN0DQo+Pj4+Pj4+Pj4+IDx0cm9uZG15QHByaW1hcnlkYXRhLmNvbT4g d3JvdGU6DQo+Pj4+Pj4+Pj4+PiANCj4+Pj4+Pj4+Pj4+PiBPbiBTZXAgMjMsIDIwMTYsIGF0IDEz OjQwLCBPbGdhIEtvcm5pZXZza2FpYSA8YWdsb0B1bWljaC5lZHU+IHdyb3RlOg0KPj4+Pj4+Pj4+ Pj4+IA0KPj4+Pj4+Pj4+Pj4+IElmIHdlIGluc3RlYWQgYnVtcCB0aGUgc2VxdWVuY2UgbnVtYmVy IGluIHRoZSBjYXNlIG9mIGludGVycnVwdGVkIGFuZCBkbzoNCj4+Pj4+Pj4+Pj4+IA0KPj4+Pj4+ Pj4+Pj4gWW91IGhhdmUgbm8gZ3VhcmFudGVlcyB0aGF0IHRoZSBzZXJ2ZXIgaGFzIHNlZW4gYW5k IHByb2Nlc3NlZCB0aGUgb3BlcmF0aW9uLg0KPj4+Pj4+Pj4+PiANCj4+Pj4+Pj4+Pj4gVGhhdCBp cyBjb3JyZWN0LCBpIGhhdmUgdGVzdGVkIHRoZSBwYXRjaCBhbmQgbWFkZSBzZXJ2ZXIgbmV2ZXIg dG8NCj4+Pj4+Pj4+Pj4gcmVjZWl2ZSB0aGUgb3BlcmF0aW9uIGFuZCBjbGllbnQgaGF2ZSBhbiBp bnRlcnJ1cHRlZCBzbG90LiBPbiB0aGUgbmV4dA0KPj4+Pj4+Pj4+PiBvcGVyYXRpb24gdGhlIHNl cnZlciB3aWxsIGNvbXBsYWluIGJhY2sgd2l0aCBTRVFfTUlTT1JERVJFRC4gQ2xpZW50DQo+Pj4+ Pj4+Pj4+IGNhbiByZWNvdmVyIGZyb20gdGhpcyBvcGVyYXRpb24uIENsaWVudCBjYW4gbm90IHJl Y292ZXIgZnJvbSAiUmVtb3RlDQo+Pj4+Pj4+Pj4+IEVJT+KAnS4NCj4+Pj4+Pj4+Pj4gDQo+Pj4+ Pj4+Pj4gDQo+Pj4+Pj4+Pj4gV2h5IG5vdD8NCj4+Pj4+Pj4+IA0KPj4+Pj4+Pj4gV2hlbiBYRFIg bGF5ZXIgcmV0dXJucyBFUkVNT1RFSU8gaXQncyBub3QgaGFuZGxlZCBieSB0aGUgTkZTIGVycm9y DQo+Pj4+Pj4+PiByZWNvdmVyeSAoYXJlIHlvdSBzdWdnZXN0aW5nIHdlIHNob3VsZD8pICBhbmQg cmV0dXJucyB0aGF0IHRvIHRoZQ0KPj4+Pj4+Pj4gYXBwbGljYXRpb24uDQo+Pj4+Pj4+PiANCj4+ Pj4+Pj4gDQo+Pj4+Pj4+IEnigJltIHNheWluZyB0aGF0IGlmIHdlIGdldCBhIFNFUV9NSVNPUkRF UkVEIGR1ZSB0byBhIHByZXZpb3VzIGludGVycnVwdCBvbiB0aGF0IHNsb3QsIHRoZW4gd2Ugc2hv dWxkIGlnbm9yZSB0aGUgZXJyb3IgaW4gdGFzay0+dGtfc3RhdHVzLCBhbmQganVzdCByZXRyeSBh ZnRlciBidW1waW5nIHRoZSBzbG90IHNlcWlkLg0KPj4+Pj4+PiANCj4+Pj4+PiANCj4+Pj4+PiBJ J20gY29uZnVzZWQgd2hlcmUgeW91ciBvYmplY3Rpb24gbGllcy4gQXJlIHlvdSBvayB3aXRoIGJ1 bXBpbmcgdGhlDQo+Pj4+Pj4gc2VxdWVuY2UgIyB3aGVuIHRhc2stPnRrX3N0YXR1cyA9IDEgYW5k IHNheWluZyB0aGF0IHdlIHNob3VsZCBzdGlsbA0KPj4+Pj4+IGtlZXAgdGhlIGNvZGUgdGhhdCBJ IGRlbGV0ZWQgaW4gdGhlIDJuZCBjaHVuayBvZiB0aGUgcGF0Y2ggdGhhdCBidW1wZWQNCj4+Pj4+ PiB0aGUgc2VxaWQgb24gZ2V0dGluZyBTRVFfTUlTT1JERVJFRCBkdWUgdG8gYSBwcmV2aW91c2x5 IGludGVycnVwdGVkDQo+Pj4+Pj4gc2xvdD8NCj4+Pj4+PiBXb3VsZG4ndCB0aGF0IGNyZWF0ZSBh IGRpZmZlcmVuY2Ugb2YgMiBzbG90cyBmb3IgdGhlIHNlcnZlciB0aGF0IGhhcw0KPj4+Pj4+IHJl Y2VpdmVkIHRoZSBvcmlnaW5hbCByZXF1ZXN0Pw0KPj4+Pj4+IA0KPj4+Pj4gDQo+Pj4+PiBJ4oCZ bSBzYXlpbmcgSeKAmWQgcHJlZmVyIHRvIGtlZXAgdGhlIGN1cnJlbnQgY29kZSwgYnV0IGZpeCB0 aGUgcmV0cnkgdGhhdCBpcyBhcHBhcmVudGx5IGJyb2tlbi4gSWYgd2XigJlyZSBub3QgaWdub3Jp bmcgdGhlIHRhc2stPnRrX2Vycm9yIHdoZW4gd2UgZGVjaWRlIHRvIHJldHJ5LCB0aGVuIHRoYXTi gJlzIGEgYnVnIGluIG15IG9waW5pb24uDQo+Pj4+IA0KPj4+PiBJJ20gbm90IHVuZGVyc3RhbmQg d2hhdCB5b3UgYXJlIHN1Z2dlc3Rpb24uIEkgZG8gYmV0dGVyIHdpdGggZXhhbXBsZQ0KPj4+PiBz byBhbGxvdyBtZToNCj4+Pj4gDQo+Pj4+IFJFTU9WRSB1c2VkIHNsb3QgMCBzZXE9MDAwMDAwMzYg cmVjZWl2ZWQgY3RybC1jDQo+Pj4+IG5mczQxX3NlcXVlbmNlX2RvbmUoKSBnZXRzIGNhbGxlZCB0 YXNrLT50a19zdGF0dXMgPSAxOg0KPj4+PiBzbG90LT5pbnRlcnJ1cHRlZCBpcyBzZXQgdG8gMS4g c2xvdCBpcyBmcmVlZC4NCj4+Pj4gDQo+Pj4+IG5leHQgb3BlcmF0aW9uIGNvbWVzIGluLCBpbiBt eSBjYXNlIGl0J3MgQUNDRVNTLiBpbml0aWFsaXphdGlvbiBvZiB0aGUNCj4+Pj4gc2VxdWVuY2Ug dXNlcyBzbG90IDAgc2VxPTAwMDAwMDM2DQo+Pj4+IHNlcnZlciByZXBsaWVzIHdpdGggUkVNT1ZF DQo+Pj4+IA0KPj4+PiBjbGllbnQgY29kZSB4ZHIgaW4gZGVjb2RlX29wX2hycygpIHJldHVybnMg RVJFTU9URUlPLiBkZWNvZGVfYWNjZXNzKCkNCj4+Pj4gcmV0dXJucyBFUkVNT1RFSU8uIGhhbmRs ZSBlcnJvciBqdXN0IHJldHVybnMgdGhhdCBlcnJvci4NCj4+Pj4gDQo+Pj4+IHdoZXJlIGRvIHdl IHJldHJ5Pw0KPj4+PiANCj4+PiANCj4+PiBUaGUgcmV0cnkgc2hvdWxkIGJlIGhhcHBlbmluZyB3 aGVuIHdlIGV4aXQgZnJvbSBuZnM0MV9zZXF1ZW5jZV9kb25lKCkgYnkgcmVzdGFydGluZyB0aGUg UlBDLg0KPj4gDQo+PiBBcmUgeW91IHN1Z2dlc3Rpb24gdGhhdCBSRU1PVkUgaXMgcmV0cmllZD8g T2sgSSBjYW4gc2VlIHRoYXQgKHRob3VnaA0KPj4gSSdtIG5vdCBzdXJlIHdoeSBhIGtpbGxlZCB0 YXNrIHN1cHBvc2UgdG8gYmUgcmV0cmllZC4gV2Fzbid0IGl0IGtpbGxlZA0KPj4gZm9yIGEgcmVh c29uPykuIEJ1dCBpZiB5b3UgYXJlIHNheWluZyBBQ0NFU1Mgc2hvdWxkIGJlIHJldHJpZWQgdGhl biBJDQo+PiBkb24ndCBzZWUgaG93IGl0IGNhbiBmaXQgaW50byB0aGUgY29kZSBmbG93Lg0KPiAN Cj4gSSdtIHN0aWxsIGh1bmcgdXAgb24gdGhlIGZhY3QgeW91ciBzdWdnZXN0aW9uIG9mICJyZXRy eSIuIFRoZXJlIGlzIG5vDQo+IHJldHJ5LiBZb3Ugd3JvdGUgImlmIHdlIGdldCBhIFNFUV9NSVNP UkRFUkVEIiB3ZSBuZXZlciBnZXQNCj4gIlNFUV9NSVNPUkRFUkVEIi4NCj4gDQo+IEkgY2FuIHNl ZSBpZiB5b3Ugd2FudCB0byBhZGQgdG8gZXJyb3JfaGFuZGxpbmcgdGhhdCB3ZSBjaGVjayBpZiBl cnJvcg0KPiBpcyBFUkVNT1RFSU8gYW5kIGNoZWNrIHRoYXQgc2xvdC0+aW50ZXJydXB0ZWQgaXMg c2V0LCB0aGVuIHdlIHRyeT8NCj4gDQoNClllcywgdGhhdOKAmXMgd2hhdCBJ4oCZZCBob3BlIHRv IHNlZS4NCg0K -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 23, 2016 at 4:34 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > >> On Sep 23, 2016, at 16:25, Olga Kornievskaia <aglo@umich.edu> wrote: >> >> On Fri, Sep 23, 2016 at 4:07 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>> On Fri, Sep 23, 2016 at 3:57 PM, Trond Myklebust >>> <trondmy@primarydata.com> wrote: >>>> >>>>> On Sep 23, 2016, at 15:27, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>> >>>>> On Fri, Sep 23, 2016 at 3:07 PM, Trond Myklebust >>>>> <trondmy@primarydata.com> wrote: >>>>>> >>>>>>> On Sep 23, 2016, at 14:41, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>> >>>>>>> On Fri, Sep 23, 2016 at 2:34 PM, Trond Myklebust >>>>>>> <trondmy@primarydata.com> wrote: >>>>>>>> >>>>>>>>> On Sep 23, 2016, at 14:25, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>>>> >>>>>>>>> On Fri, Sep 23, 2016 at 2:08 PM, Trond Myklebust >>>>>>>>> <trondmy@primarydata.com> wrote: >>>>>>>>>> >>>>>>>>>>> On Sep 23, 2016, at 13:59, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 23, 2016 at 1:45 PM, Trond Myklebust >>>>>>>>>>> <trondmy@primarydata.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> On Sep 23, 2016, at 13:40, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> If we instead bump the sequence number in the case of interrupted and do: >>>>>>>>>>>> >>>>>>>>>>>> You have no guarantees that the server has seen and processed the operation. >>>>>>>>>>> >>>>>>>>>>> That is correct, i have tested the patch and made server never to >>>>>>>>>>> receive the operation and client have an interrupted slot. On the next >>>>>>>>>>> operation the server will complain back with SEQ_MISORDERED. Client >>>>>>>>>>> can recover from this operation. Client can not recover from "Remote >>>>>>>>>>> EIO”. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Why not? >>>>>>>>> >>>>>>>>> When XDR layer returns EREMOTEIO it's not handled by the NFS error >>>>>>>>> recovery (are you suggesting we should?) and returns that to the >>>>>>>>> application. >>>>>>>>> >>>>>>>> >>>>>>>> I’m saying that if we get a SEQ_MISORDERED due to a previous interrupt on that slot, then we should ignore the error in task->tk_status, and just retry after bumping the slot seqid. >>>>>>>> >>>>>>> >>>>>>> I'm confused where your objection lies. Are you ok with bumping the >>>>>>> sequence # when task->tk_status = 1 and saying that we should still >>>>>>> keep the code that I deleted in the 2nd chunk of the patch that bumped >>>>>>> the seqid on getting SEQ_MISORDERED due to a previously interrupted >>>>>>> slot? >>>>>>> Wouldn't that create a difference of 2 slots for the server that has >>>>>>> received the original request? >>>>>>> >>>>>> >>>>>> I’m saying I’d prefer to keep the current code, but fix the retry that is apparently broken. If we’re not ignoring the task->tk_error when we decide to retry, then that’s a bug in my opinion. >>>>> >>>>> I'm not understand what you are suggestion. I do better with example >>>>> so allow me: >>>>> >>>>> REMOVE used slot 0 seq=00000036 received ctrl-c >>>>> nfs41_sequence_done() gets called task->tk_status = 1: >>>>> slot->interrupted is set to 1. slot is freed. >>>>> >>>>> next operation comes in, in my case it's ACCESS. initialization of the >>>>> sequence uses slot 0 seq=00000036 >>>>> server replies with REMOVE >>>>> >>>>> client code xdr in decode_op_hrs() returns EREMOTEIO. decode_access() >>>>> returns EREMOTEIO. handle error just returns that error. >>>>> >>>>> where do we retry? >>>>> >>>> >>>> The retry should be happening when we exit from nfs41_sequence_done() by restarting the RPC. >>> >>> Are you suggestion that REMOVE is retried? Ok I can see that (though >>> I'm not sure why a killed task suppose to be retried. Wasn't it killed >>> for a reason?). But if you are saying ACCESS should be retried then I >>> don't see how it can fit into the code flow. >> >> I'm still hung up on the fact your suggestion of "retry". There is no >> retry. You wrote "if we get a SEQ_MISORDERED" we never get >> "SEQ_MISORDERED". >> >> I can see if you want to add to error_handling that we check if error >> is EREMOTEIO and check that slot->interrupted is set, then we try? >> > > Yes, that’s what I’d hope to see. Ok I understand now and would try that. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index a1a3b4c..b78dac5 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -728,6 +728,7 @@ int nfs41_sequence_done(struct rpc_task *task, struct nfs4_sequence_res *res) * operation.. * Mark the slot as having hosted an interrupted RPC call. */ + ++slot->seq_nr; slot->interrupted = 1; goto out;