Message ID | CAN-5tyFpbM5GE2fAcx8uMG6MO2JC-R4ASN_jrnmYH8idie_H+w@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Olga, On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > Hi folks, > > I would like an opinion about changing code in such as way that we > don't select a delegation stateid for an IO operation when this > particular delegation is being returned. > > The reason it's some what problematic is that we send out a > DELEG_RETURN and then we don't remove the stateid until we receive a > reply. In the mean while, an IO operation can be happening and in > nfs4_select_rw_stateid() it sees a delegation stateid and uses it. > Well, at the server, it finishes processing DELEG_RETURN before > getting an IO op and by that time the server is finished with the > stateid and can error an IO operation with BAD_STATEID. > > diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c > index 7f3f606..4f6f6c9 100644 > --- a/fs/nfs/delegation.c > +++ b/fs/nfs/delegation.c > @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid > *dst, struct inode *in > flags &= FMODE_READ|FMODE_WRITE; > rcu_read_lock(); > delegation = rcu_dereference(nfsi->delegation); > - ret = (delegation != NULL && (delegation->type & flags) == flags); > + ret = (delegation != NULL && (delegation->type & flags) == flags && > + !test_bit(NFS_DELEGATION_RETURNING, &delegation->flags)); > if (ret) { > nfs4_stateid_copy(dst, &delegation->stateid); > nfs_mark_delegation_referenced(delegation); The above cannot eliminate the possibility that we won't use a delegation while it is being returned. It will at best just reduce the window of opportunity. So, why is this being considered to be a problem in the first place? Are you seeing a measurable performance impact on a real life workload (as opposed to some 1-in-a-billion occurrence from a QA test :-))? Cheers Trond
On Wed, Dec 3, 2014 at 3:59 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote: > Hi Olga, > > On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >> Hi folks, >> >> I would like an opinion about changing code in such as way that we >> don't select a delegation stateid for an IO operation when this >> particular delegation is being returned. >> >> The reason it's some what problematic is that we send out a >> DELEG_RETURN and then we don't remove the stateid until we receive a >> reply. In the mean while, an IO operation can be happening and in >> nfs4_select_rw_stateid() it sees a delegation stateid and uses it. >> Well, at the server, it finishes processing DELEG_RETURN before >> getting an IO op and by that time the server is finished with the >> stateid and can error an IO operation with BAD_STATEID. >> >> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c >> index 7f3f606..4f6f6c9 100644 >> --- a/fs/nfs/delegation.c >> +++ b/fs/nfs/delegation.c >> @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid >> *dst, struct inode *in >> flags &= FMODE_READ|FMODE_WRITE; >> rcu_read_lock(); >> delegation = rcu_dereference(nfsi->delegation); >> - ret = (delegation != NULL && (delegation->type & flags) == flags); >> + ret = (delegation != NULL && (delegation->type & flags) == flags && >> + !test_bit(NFS_DELEGATION_RETURNING, &delegation->flags)); >> if (ret) { >> nfs4_stateid_copy(dst, &delegation->stateid); >> nfs_mark_delegation_referenced(delegation); > > The above cannot eliminate the possibility that we won't use a > delegation while it is being returned. It will at best just reduce the > window of opportunity. > You are right this are still problems. Actually, we might set the bit but not yet get the open stateid from the open with deleg_cur and that's not good. It would be good to know we got the open stateid and then pick that. > So, why is this being considered to be a problem in the first place? > Are you seeing a measurable performance impact on a real life workload > (as opposed to some 1-in-a-billion occurrence from a QA test :-))? Unfortunately, this problem is quite common and I hit it all the time on my setup. This leads to client seizing IO on that file and returning EIO. It's an unrecoverable error. I'm trying to figure out how to eliminate getting to that state. > > Cheers > Trond > -- > Trond Myklebust > > Linux NFS client maintainer, PrimaryData > > trond.myklebust@primarydata.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Dec 3, 2014 at 5:59 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > On Wed, Dec 3, 2014 at 3:59 PM, Trond Myklebust > <trond.myklebust@primarydata.com> wrote: >> Hi Olga, >> >> On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>> Hi folks, >>> >>> I would like an opinion about changing code in such as way that we >>> don't select a delegation stateid for an IO operation when this >>> particular delegation is being returned. >>> >>> The reason it's some what problematic is that we send out a >>> DELEG_RETURN and then we don't remove the stateid until we receive a >>> reply. In the mean while, an IO operation can be happening and in >>> nfs4_select_rw_stateid() it sees a delegation stateid and uses it. >>> Well, at the server, it finishes processing DELEG_RETURN before >>> getting an IO op and by that time the server is finished with the >>> stateid and can error an IO operation with BAD_STATEID. >>> >>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c >>> index 7f3f606..4f6f6c9 100644 >>> --- a/fs/nfs/delegation.c >>> +++ b/fs/nfs/delegation.c >>> @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid >>> *dst, struct inode *in >>> flags &= FMODE_READ|FMODE_WRITE; >>> rcu_read_lock(); >>> delegation = rcu_dereference(nfsi->delegation); >>> - ret = (delegation != NULL && (delegation->type & flags) == flags); >>> + ret = (delegation != NULL && (delegation->type & flags) == flags && >>> + !test_bit(NFS_DELEGATION_RETURNING, &delegation->flags)); >>> if (ret) { >>> nfs4_stateid_copy(dst, &delegation->stateid); >>> nfs_mark_delegation_referenced(delegation); >> >> The above cannot eliminate the possibility that we won't use a >> delegation while it is being returned. It will at best just reduce the >> window of opportunity. >> > > You are right this are still problems. Actually, we might set the bit > but not yet get the open stateid from the open with deleg_cur and > that's not good. It would be good to know we got the open stateid and > then pick that. > >> So, why is this being considered to be a problem in the first place? >> Are you seeing a measurable performance impact on a real life workload >> (as opposed to some 1-in-a-billion occurrence from a QA test :-))? > > Unfortunately, this problem is quite common and I hit it all the time > on my setup. This leads to client seizing IO on that file and > returning EIO. It's an unrecoverable error. I'm trying to figure out > how to eliminate getting to that state. > It definitely isn't intended to be an irrecoverable error. The client is supposed to just replay the write after updating the stateid.
On Wed, Dec 3, 2014 at 6:13 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote: > On Wed, Dec 3, 2014 at 5:59 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >> On Wed, Dec 3, 2014 at 3:59 PM, Trond Myklebust >> <trond.myklebust@primarydata.com> wrote: >>> Hi Olga, >>> >>> On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>>> Hi folks, >>>> >>>> I would like an opinion about changing code in such as way that we >>>> don't select a delegation stateid for an IO operation when this >>>> particular delegation is being returned. >>>> >>>> The reason it's some what problematic is that we send out a >>>> DELEG_RETURN and then we don't remove the stateid until we receive a >>>> reply. In the mean while, an IO operation can be happening and in >>>> nfs4_select_rw_stateid() it sees a delegation stateid and uses it. >>>> Well, at the server, it finishes processing DELEG_RETURN before >>>> getting an IO op and by that time the server is finished with the >>>> stateid and can error an IO operation with BAD_STATEID. >>>> >>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c >>>> index 7f3f606..4f6f6c9 100644 >>>> --- a/fs/nfs/delegation.c >>>> +++ b/fs/nfs/delegation.c >>>> @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid >>>> *dst, struct inode *in >>>> flags &= FMODE_READ|FMODE_WRITE; >>>> rcu_read_lock(); >>>> delegation = rcu_dereference(nfsi->delegation); >>>> - ret = (delegation != NULL && (delegation->type & flags) == flags); >>>> + ret = (delegation != NULL && (delegation->type & flags) == flags && >>>> + !test_bit(NFS_DELEGATION_RETURNING, &delegation->flags)); >>>> if (ret) { >>>> nfs4_stateid_copy(dst, &delegation->stateid); >>>> nfs_mark_delegation_referenced(delegation); >>> >>> The above cannot eliminate the possibility that we won't use a >>> delegation while it is being returned. It will at best just reduce the >>> window of opportunity. >>> >> >> You are right this are still problems. Actually, we might set the bit >> but not yet get the open stateid from the open with deleg_cur and >> that's not good. It would be good to know we got the open stateid and >> then pick that. >> >>> So, why is this being considered to be a problem in the first place? >>> Are you seeing a measurable performance impact on a real life workload >>> (as opposed to some 1-in-a-billion occurrence from a QA test :-))? >> >> Unfortunately, this problem is quite common and I hit it all the time >> on my setup. This leads to client seizing IO on that file and >> returning EIO. It's an unrecoverable error. I'm trying to figure out >> how to eliminate getting to that state. >> > > It definitely isn't intended to be an irrecoverable error. The client > is supposed to just replay the write after updating the stateid. open(deleg_cur) call / reply lock() call/reply deleg_return() call write(with deluge_stateid) call gets BAD_STATEID state recovery code marks lock state lost -> EIO. :-/ > > -- > Trond Myklebust > > Linux NFS client maintainer, PrimaryData > > trond.myklebust@primarydata.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Dec 3, 2014 at 6:34 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote: > > On Dec 3, 2014 6:21 PM, "Olga Kornievskaia" <aglo@umich.edu> wrote: >> >> On Wed, Dec 3, 2014 at 6:13 PM, Trond Myklebust >> <trond.myklebust@primarydata.com> wrote: >> > On Wed, Dec 3, 2014 at 5:59 PM, Olga Kornievskaia <aglo@umich.edu> >> > wrote: >> >> On Wed, Dec 3, 2014 at 3:59 PM, Trond Myklebust >> >> <trond.myklebust@primarydata.com> wrote: >> >>> Hi Olga, >> >>> >> >>> On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia <aglo@umich.edu> >> >>> wrote: >> >>>> Hi folks, >> >>>> >> >>>> I would like an opinion about changing code in such as way that we >> >>>> don't select a delegation stateid for an IO operation when this >> >>>> particular delegation is being returned. >> >>>> >> >>>> The reason it's some what problematic is that we send out a >> >>>> DELEG_RETURN and then we don't remove the stateid until we receive a >> >>>> reply. In the mean while, an IO operation can be happening and in >> >>>> nfs4_select_rw_stateid() it sees a delegation stateid and uses it. >> >>>> Well, at the server, it finishes processing DELEG_RETURN before >> >>>> getting an IO op and by that time the server is finished with the >> >>>> stateid and can error an IO operation with BAD_STATEID. >> >>>> >> >>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c >> >>>> index 7f3f606..4f6f6c9 100644 >> >>>> --- a/fs/nfs/delegation.c >> >>>> +++ b/fs/nfs/delegation.c >> >>>> @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid >> >>>> *dst, struct inode *in >> >>>> flags &= FMODE_READ|FMODE_WRITE; >> >>>> rcu_read_lock(); >> >>>> delegation = rcu_dereference(nfsi->delegation); >> >>>> - ret = (delegation != NULL && (delegation->type & flags) == >> >>>> flags); >> >>>> + ret = (delegation != NULL && (delegation->type & flags) == >> >>>> flags && >> >>>> + !test_bit(NFS_DELEGATION_RETURNING, >> >>>> &delegation->flags)); >> >>>> if (ret) { >> >>>> nfs4_stateid_copy(dst, &delegation->stateid); >> >>>> nfs_mark_delegation_referenced(delegation); >> >>> >> >>> The above cannot eliminate the possibility that we won't use a >> >>> delegation while it is being returned. It will at best just reduce the >> >>> window of opportunity. >> >>> >> >> >> >> You are right this are still problems. Actually, we might set the bit >> >> but not yet get the open stateid from the open with deleg_cur and >> >> that's not good. It would be good to know we got the open stateid and >> >> then pick that. >> >> >> >>> So, why is this being considered to be a problem in the first place? >> >>> Are you seeing a measurable performance impact on a real life workload >> >>> (as opposed to some 1-in-a-billion occurrence from a QA test :-))? >> >> >> >> Unfortunately, this problem is quite common and I hit it all the time >> >> on my setup. This leads to client seizing IO on that file and >> >> returning EIO. It's an unrecoverable error. I'm trying to figure out >> >> how to eliminate getting to that state. >> >> >> > >> > It definitely isn't intended to be an irrecoverable error. The client >> > is supposed to just replay the write after updating the stateid. >> >> open(deleg_cur) call / reply >> lock() call/reply >> deleg_return() call >> write(with deluge_stateid) call gets BAD_STATEID >> state recovery code marks lock state lost -> EIO. > > Why is it marking the lock as lost? If the recovery succeeded, it should > notice that the stateid has changed and instead retry. I'll get you a better explanation tomorrow besides saying "that's what I see when I run the code". > What kernel is this? This is upstream. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Dec 3, 2014 at 6:50 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > On Wed, Dec 3, 2014 at 6:34 PM, Trond Myklebust > <trond.myklebust@primarydata.com> wrote: >> >> On Dec 3, 2014 6:21 PM, "Olga Kornievskaia" <aglo@umich.edu> wrote: >>> >>> On Wed, Dec 3, 2014 at 6:13 PM, Trond Myklebust >>> <trond.myklebust@primarydata.com> wrote: >>> > On Wed, Dec 3, 2014 at 5:59 PM, Olga Kornievskaia <aglo@umich.edu> >>> > wrote: >>> >> On Wed, Dec 3, 2014 at 3:59 PM, Trond Myklebust >>> >> <trond.myklebust@primarydata.com> wrote: >>> >>> Hi Olga, >>> >>> >>> >>> On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia <aglo@umich.edu> >>> >>> wrote: >>> >>>> Hi folks, >>> >>>> >>> >>>> I would like an opinion about changing code in such as way that we >>> >>>> don't select a delegation stateid for an IO operation when this >>> >>>> particular delegation is being returned. >>> >>>> >>> >>>> The reason it's some what problematic is that we send out a >>> >>>> DELEG_RETURN and then we don't remove the stateid until we receive a >>> >>>> reply. In the mean while, an IO operation can be happening and in >>> >>>> nfs4_select_rw_stateid() it sees a delegation stateid and uses it. >>> >>>> Well, at the server, it finishes processing DELEG_RETURN before >>> >>>> getting an IO op and by that time the server is finished with the >>> >>>> stateid and can error an IO operation with BAD_STATEID. >>> >>>> >>> >>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c >>> >>>> index 7f3f606..4f6f6c9 100644 >>> >>>> --- a/fs/nfs/delegation.c >>> >>>> +++ b/fs/nfs/delegation.c >>> >>>> @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid >>> >>>> *dst, struct inode *in >>> >>>> flags &= FMODE_READ|FMODE_WRITE; >>> >>>> rcu_read_lock(); >>> >>>> delegation = rcu_dereference(nfsi->delegation); >>> >>>> - ret = (delegation != NULL && (delegation->type & flags) == >>> >>>> flags); >>> >>>> + ret = (delegation != NULL && (delegation->type & flags) == >>> >>>> flags && >>> >>>> + !test_bit(NFS_DELEGATION_RETURNING, >>> >>>> &delegation->flags)); >>> >>>> if (ret) { >>> >>>> nfs4_stateid_copy(dst, &delegation->stateid); >>> >>>> nfs_mark_delegation_referenced(delegation); >>> >>> >>> >>> The above cannot eliminate the possibility that we won't use a >>> >>> delegation while it is being returned. It will at best just reduce the >>> >>> window of opportunity. >>> >>> >>> >> >>> >> You are right this are still problems. Actually, we might set the bit >>> >> but not yet get the open stateid from the open with deleg_cur and >>> >> that's not good. It would be good to know we got the open stateid and >>> >> then pick that. >>> >> >>> >>> So, why is this being considered to be a problem in the first place? >>> >>> Are you seeing a measurable performance impact on a real life workload >>> >>> (as opposed to some 1-in-a-billion occurrence from a QA test :-))? >>> >> >>> >> Unfortunately, this problem is quite common and I hit it all the time >>> >> on my setup. This leads to client seizing IO on that file and >>> >> returning EIO. It's an unrecoverable error. I'm trying to figure out >>> >> how to eliminate getting to that state. >>> >> >>> > >>> > It definitely isn't intended to be an irrecoverable error. The client >>> > is supposed to just replay the write after updating the stateid. >>> >>> open(deleg_cur) call / reply >>> lock() call/reply >>> deleg_return() call >>> write(with deluge_stateid) call gets BAD_STATEID >>> state recovery code marks lock state lost -> EIO. >> >> Why is it marking the lock as lost? If the recovery succeeded, it should >> notice that the stateid has changed and instead retry. > > I'll get you a better explanation tomorrow besides saying "that's what > I see when I run the code". nfs4_async_handle_error() initiates state recovery nfs4_reclaim_open_state() eventually calls nfs4_reclaim_locks() which marks the lock LOST. state is delegated so the kernel logs "lock reclaim failed". write retries and in nfs4_copy_ lock_stateid() the lock is marked LOST and the nfs4_select_rw_stateid() fails with EIO. > >> What kernel is this? > > This is upstream. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Dec 4, 2014 at 12:36 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > On Wed, Dec 3, 2014 at 6:50 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >> On Wed, Dec 3, 2014 at 6:34 PM, Trond Myklebust >> <trond.myklebust@primarydata.com> wrote: >>> >>> On Dec 3, 2014 6:21 PM, "Olga Kornievskaia" <aglo@umich.edu> wrote: >>>> >>>> On Wed, Dec 3, 2014 at 6:13 PM, Trond Myklebust >>>> <trond.myklebust@primarydata.com> wrote: >>>> > On Wed, Dec 3, 2014 at 5:59 PM, Olga Kornievskaia <aglo@umich.edu> >>>> > wrote: >>>> >> On Wed, Dec 3, 2014 at 3:59 PM, Trond Myklebust >>>> >> <trond.myklebust@primarydata.com> wrote: >>>> >>> Hi Olga, >>>> >>> >>>> >>> On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia <aglo@umich.edu> >>>> >>> wrote: >>>> >>>> Hi folks, >>>> >>>> >>>> >>>> I would like an opinion about changing code in such as way that we >>>> >>>> don't select a delegation stateid for an IO operation when this >>>> >>>> particular delegation is being returned. >>>> >>>> >>>> >>>> The reason it's some what problematic is that we send out a >>>> >>>> DELEG_RETURN and then we don't remove the stateid until we receive a >>>> >>>> reply. In the mean while, an IO operation can be happening and in >>>> >>>> nfs4_select_rw_stateid() it sees a delegation stateid and uses it. >>>> >>>> Well, at the server, it finishes processing DELEG_RETURN before >>>> >>>> getting an IO op and by that time the server is finished with the >>>> >>>> stateid and can error an IO operation with BAD_STATEID. >>>> >>>> >>>> >>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c >>>> >>>> index 7f3f606..4f6f6c9 100644 >>>> >>>> --- a/fs/nfs/delegation.c >>>> >>>> +++ b/fs/nfs/delegation.c >>>> >>>> @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid >>>> >>>> *dst, struct inode *in >>>> >>>> flags &= FMODE_READ|FMODE_WRITE; >>>> >>>> rcu_read_lock(); >>>> >>>> delegation = rcu_dereference(nfsi->delegation); >>>> >>>> - ret = (delegation != NULL && (delegation->type & flags) == >>>> >>>> flags); >>>> >>>> + ret = (delegation != NULL && (delegation->type & flags) == >>>> >>>> flags && >>>> >>>> + !test_bit(NFS_DELEGATION_RETURNING, >>>> >>>> &delegation->flags)); >>>> >>>> if (ret) { >>>> >>>> nfs4_stateid_copy(dst, &delegation->stateid); >>>> >>>> nfs_mark_delegation_referenced(delegation); >>>> >>> >>>> >>> The above cannot eliminate the possibility that we won't use a >>>> >>> delegation while it is being returned. It will at best just reduce the >>>> >>> window of opportunity. >>>> >>> >>>> >> >>>> >> You are right this are still problems. Actually, we might set the bit >>>> >> but not yet get the open stateid from the open with deleg_cur and >>>> >> that's not good. It would be good to know we got the open stateid and >>>> >> then pick that. >>>> >> >>>> >>> So, why is this being considered to be a problem in the first place? >>>> >>> Are you seeing a measurable performance impact on a real life workload >>>> >>> (as opposed to some 1-in-a-billion occurrence from a QA test :-))? >>>> >> >>>> >> Unfortunately, this problem is quite common and I hit it all the time >>>> >> on my setup. This leads to client seizing IO on that file and >>>> >> returning EIO. It's an unrecoverable error. I'm trying to figure out >>>> >> how to eliminate getting to that state. >>>> >> >>>> > >>>> > It definitely isn't intended to be an irrecoverable error. The client >>>> > is supposed to just replay the write after updating the stateid. >>>> >>>> open(deleg_cur) call / reply >>>> lock() call/reply >>>> deleg_return() call >>>> write(with deluge_stateid) call gets BAD_STATEID >>>> state recovery code marks lock state lost -> EIO. >>> >>> Why is it marking the lock as lost? If the recovery succeeded, it should >>> notice that the stateid has changed and instead retry. >> >> I'll get you a better explanation tomorrow besides saying "that's what >> I see when I run the code". > > nfs4_async_handle_error() initiates state recovery > nfs4_reclaim_open_state() eventually calls nfs4_reclaim_locks() which > marks the lock LOST. state is delegated so the kernel logs "lock > reclaim failed". > write retries and in nfs4_copy_ lock_stateid() the lock is marked LOST > and the nfs4_select_rw_stateid() fails with EIO. > >> >>> What kernel is this? >> >> This is upstream. So why isn't nfs4_write_stateid_changed() catching the change before we even get to nfs4_async_handle_error()? That's where this race is supposed to get resolved.
On Thu, Dec 4, 2014 at 1:23 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote: > On Thu, Dec 4, 2014 at 12:36 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >> On Wed, Dec 3, 2014 at 6:50 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>> On Wed, Dec 3, 2014 at 6:34 PM, Trond Myklebust >>> <trond.myklebust@primarydata.com> wrote: >>>> >>>> On Dec 3, 2014 6:21 PM, "Olga Kornievskaia" <aglo@umich.edu> wrote: >>>>> >>>>> On Wed, Dec 3, 2014 at 6:13 PM, Trond Myklebust >>>>> <trond.myklebust@primarydata.com> wrote: >>>>> > On Wed, Dec 3, 2014 at 5:59 PM, Olga Kornievskaia <aglo@umich.edu> >>>>> > wrote: >>>>> >> On Wed, Dec 3, 2014 at 3:59 PM, Trond Myklebust >>>>> >> <trond.myklebust@primarydata.com> wrote: >>>>> >>> Hi Olga, >>>>> >>> >>>>> >>> On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia <aglo@umich.edu> >>>>> >>> wrote: >>>>> >>>> Hi folks, >>>>> >>>> >>>>> >>>> I would like an opinion about changing code in such as way that we >>>>> >>>> don't select a delegation stateid for an IO operation when this >>>>> >>>> particular delegation is being returned. >>>>> >>>> >>>>> >>>> The reason it's some what problematic is that we send out a >>>>> >>>> DELEG_RETURN and then we don't remove the stateid until we receive a >>>>> >>>> reply. In the mean while, an IO operation can be happening and in >>>>> >>>> nfs4_select_rw_stateid() it sees a delegation stateid and uses it. >>>>> >>>> Well, at the server, it finishes processing DELEG_RETURN before >>>>> >>>> getting an IO op and by that time the server is finished with the >>>>> >>>> stateid and can error an IO operation with BAD_STATEID. >>>>> >>>> >>>>> >>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c >>>>> >>>> index 7f3f606..4f6f6c9 100644 >>>>> >>>> --- a/fs/nfs/delegation.c >>>>> >>>> +++ b/fs/nfs/delegation.c >>>>> >>>> @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid >>>>> >>>> *dst, struct inode *in >>>>> >>>> flags &= FMODE_READ|FMODE_WRITE; >>>>> >>>> rcu_read_lock(); >>>>> >>>> delegation = rcu_dereference(nfsi->delegation); >>>>> >>>> - ret = (delegation != NULL && (delegation->type & flags) == >>>>> >>>> flags); >>>>> >>>> + ret = (delegation != NULL && (delegation->type & flags) == >>>>> >>>> flags && >>>>> >>>> + !test_bit(NFS_DELEGATION_RETURNING, >>>>> >>>> &delegation->flags)); >>>>> >>>> if (ret) { >>>>> >>>> nfs4_stateid_copy(dst, &delegation->stateid); >>>>> >>>> nfs_mark_delegation_referenced(delegation); >>>>> >>> >>>>> >>> The above cannot eliminate the possibility that we won't use a >>>>> >>> delegation while it is being returned. It will at best just reduce the >>>>> >>> window of opportunity. >>>>> >>> >>>>> >> >>>>> >> You are right this are still problems. Actually, we might set the bit >>>>> >> but not yet get the open stateid from the open with deleg_cur and >>>>> >> that's not good. It would be good to know we got the open stateid and >>>>> >> then pick that. >>>>> >> >>>>> >>> So, why is this being considered to be a problem in the first place? >>>>> >>> Are you seeing a measurable performance impact on a real life workload >>>>> >>> (as opposed to some 1-in-a-billion occurrence from a QA test :-))? >>>>> >> >>>>> >> Unfortunately, this problem is quite common and I hit it all the time >>>>> >> on my setup. This leads to client seizing IO on that file and >>>>> >> returning EIO. It's an unrecoverable error. I'm trying to figure out >>>>> >> how to eliminate getting to that state. >>>>> >> >>>>> > >>>>> > It definitely isn't intended to be an irrecoverable error. The client >>>>> > is supposed to just replay the write after updating the stateid. >>>>> >>>>> open(deleg_cur) call / reply >>>>> lock() call/reply >>>>> deleg_return() call >>>>> write(with deluge_stateid) call gets BAD_STATEID >>>>> state recovery code marks lock state lost -> EIO. >>>> >>>> Why is it marking the lock as lost? If the recovery succeeded, it should >>>> notice that the stateid has changed and instead retry. >>> >>> I'll get you a better explanation tomorrow besides saying "that's what >>> I see when I run the code". >> >> nfs4_async_handle_error() initiates state recovery >> nfs4_reclaim_open_state() eventually calls nfs4_reclaim_locks() which >> marks the lock LOST. state is delegated so the kernel logs "lock >> reclaim failed". >> write retries and in nfs4_copy_ lock_stateid() the lock is marked LOST >> and the nfs4_select_rw_stateid() fails with EIO. >> >>> >>>> What kernel is this? >>> >>> This is upstream. > > So why isn't nfs4_write_stateid_changed() catching the change before > we even get to nfs4_async_handle_error()? That's where this race is > supposed to get resolved. Probably because nfs4_stateid_is_current() returns true because nfs4_set_rw_stateid() calls the nfs4_select_rw_stateid() and finds the lock lost? > > -- > Trond Myklebust > > Linux NFS client maintainer, PrimaryData > > trond.myklebust@primarydata.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Dec 4, 2014 at 1:58 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > On Thu, Dec 4, 2014 at 1:23 PM, Trond Myklebust > <trond.myklebust@primarydata.com> wrote: >> On Thu, Dec 4, 2014 at 12:36 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>> On Wed, Dec 3, 2014 at 6:50 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>>> On Wed, Dec 3, 2014 at 6:34 PM, Trond Myklebust >>>> <trond.myklebust@primarydata.com> wrote: >>>>> >>>>> On Dec 3, 2014 6:21 PM, "Olga Kornievskaia" <aglo@umich.edu> wrote: >>>>>> >>>>>> On Wed, Dec 3, 2014 at 6:13 PM, Trond Myklebust >>>>>> <trond.myklebust@primarydata.com> wrote: >>>>>> > On Wed, Dec 3, 2014 at 5:59 PM, Olga Kornievskaia <aglo@umich.edu> >>>>>> > wrote: >>>>>> >> On Wed, Dec 3, 2014 at 3:59 PM, Trond Myklebust >>>>>> >> <trond.myklebust@primarydata.com> wrote: >>>>>> >>> Hi Olga, >>>>>> >>> >>>>>> >>> On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia <aglo@umich.edu> >>>>>> >>> wrote: >>>>>> >>>> Hi folks, >>>>>> >>>> >>>>>> >>>> I would like an opinion about changing code in such as way that we >>>>>> >>>> don't select a delegation stateid for an IO operation when this >>>>>> >>>> particular delegation is being returned. >>>>>> >>>> >>>>>> >>>> The reason it's some what problematic is that we send out a >>>>>> >>>> DELEG_RETURN and then we don't remove the stateid until we receive a >>>>>> >>>> reply. In the mean while, an IO operation can be happening and in >>>>>> >>>> nfs4_select_rw_stateid() it sees a delegation stateid and uses it. >>>>>> >>>> Well, at the server, it finishes processing DELEG_RETURN before >>>>>> >>>> getting an IO op and by that time the server is finished with the >>>>>> >>>> stateid and can error an IO operation with BAD_STATEID. >>>>>> >>>> >>>>>> >>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c >>>>>> >>>> index 7f3f606..4f6f6c9 100644 >>>>>> >>>> --- a/fs/nfs/delegation.c >>>>>> >>>> +++ b/fs/nfs/delegation.c >>>>>> >>>> @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid >>>>>> >>>> *dst, struct inode *in >>>>>> >>>> flags &= FMODE_READ|FMODE_WRITE; >>>>>> >>>> rcu_read_lock(); >>>>>> >>>> delegation = rcu_dereference(nfsi->delegation); >>>>>> >>>> - ret = (delegation != NULL && (delegation->type & flags) == >>>>>> >>>> flags); >>>>>> >>>> + ret = (delegation != NULL && (delegation->type & flags) == >>>>>> >>>> flags && >>>>>> >>>> + !test_bit(NFS_DELEGATION_RETURNING, >>>>>> >>>> &delegation->flags)); >>>>>> >>>> if (ret) { >>>>>> >>>> nfs4_stateid_copy(dst, &delegation->stateid); >>>>>> >>>> nfs_mark_delegation_referenced(delegation); >>>>>> >>> >>>>>> >>> The above cannot eliminate the possibility that we won't use a >>>>>> >>> delegation while it is being returned. It will at best just reduce the >>>>>> >>> window of opportunity. >>>>>> >>> >>>>>> >> >>>>>> >> You are right this are still problems. Actually, we might set the bit >>>>>> >> but not yet get the open stateid from the open with deleg_cur and >>>>>> >> that's not good. It would be good to know we got the open stateid and >>>>>> >> then pick that. >>>>>> >> >>>>>> >>> So, why is this being considered to be a problem in the first place? >>>>>> >>> Are you seeing a measurable performance impact on a real life workload >>>>>> >>> (as opposed to some 1-in-a-billion occurrence from a QA test :-))? >>>>>> >> >>>>>> >> Unfortunately, this problem is quite common and I hit it all the time >>>>>> >> on my setup. This leads to client seizing IO on that file and >>>>>> >> returning EIO. It's an unrecoverable error. I'm trying to figure out >>>>>> >> how to eliminate getting to that state. >>>>>> >> >>>>>> > >>>>>> > It definitely isn't intended to be an irrecoverable error. The client >>>>>> > is supposed to just replay the write after updating the stateid. >>>>>> >>>>>> open(deleg_cur) call / reply >>>>>> lock() call/reply >>>>>> deleg_return() call >>>>>> write(with deluge_stateid) call gets BAD_STATEID >>>>>> state recovery code marks lock state lost -> EIO. >>>>> >>>>> Why is it marking the lock as lost? If the recovery succeeded, it should >>>>> notice that the stateid has changed and instead retry. >>>> >>>> I'll get you a better explanation tomorrow besides saying "that's what >>>> I see when I run the code". >>> >>> nfs4_async_handle_error() initiates state recovery >>> nfs4_reclaim_open_state() eventually calls nfs4_reclaim_locks() which >>> marks the lock LOST. state is delegated so the kernel logs "lock >>> reclaim failed". >>> write retries and in nfs4_copy_ lock_stateid() the lock is marked LOST >>> and the nfs4_select_rw_stateid() fails with EIO. >>> >>>> >>>>> What kernel is this? >>>> >>>> This is upstream. >> >> So why isn't nfs4_write_stateid_changed() catching the change before >> we even get to nfs4_async_handle_error()? That's where this race is >> supposed to get resolved. > > Probably because nfs4_stateid_is_current() returns true because > nfs4_set_rw_stateid() calls the nfs4_select_rw_stateid() and finds the > lock lost? I don't understand why the lock would already be marked as lost. Remember that we're calling from inside nfs4_write_done() and before calling nfs4_async_handle_error().
On Thu, Dec 4, 2014 at 2:08 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote: > On Thu, Dec 4, 2014 at 1:58 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >> On Thu, Dec 4, 2014 at 1:23 PM, Trond Myklebust >> <trond.myklebust@primarydata.com> wrote: >>> On Thu, Dec 4, 2014 at 12:36 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>>> On Wed, Dec 3, 2014 at 6:50 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>> On Wed, Dec 3, 2014 at 6:34 PM, Trond Myklebust >>>>> <trond.myklebust@primarydata.com> wrote: >>>>>> >>>>>> On Dec 3, 2014 6:21 PM, "Olga Kornievskaia" <aglo@umich.edu> wrote: >>>>>>> >>>>>>> On Wed, Dec 3, 2014 at 6:13 PM, Trond Myklebust >>>>>>> <trond.myklebust@primarydata.com> wrote: >>>>>>> > On Wed, Dec 3, 2014 at 5:59 PM, Olga Kornievskaia <aglo@umich.edu> >>>>>>> > wrote: >>>>>>> >> On Wed, Dec 3, 2014 at 3:59 PM, Trond Myklebust >>>>>>> >> <trond.myklebust@primarydata.com> wrote: >>>>>>> >>> Hi Olga, >>>>>>> >>> >>>>>>> >>> On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia <aglo@umich.edu> >>>>>>> >>> wrote: >>>>>>> >>>> Hi folks, >>>>>>> >>>> >>>>>>> >>>> I would like an opinion about changing code in such as way that we >>>>>>> >>>> don't select a delegation stateid for an IO operation when this >>>>>>> >>>> particular delegation is being returned. >>>>>>> >>>> >>>>>>> >>>> The reason it's some what problematic is that we send out a >>>>>>> >>>> DELEG_RETURN and then we don't remove the stateid until we receive a >>>>>>> >>>> reply. In the mean while, an IO operation can be happening and in >>>>>>> >>>> nfs4_select_rw_stateid() it sees a delegation stateid and uses it. >>>>>>> >>>> Well, at the server, it finishes processing DELEG_RETURN before >>>>>>> >>>> getting an IO op and by that time the server is finished with the >>>>>>> >>>> stateid and can error an IO operation with BAD_STATEID. >>>>>>> >>>> >>>>>>> >>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c >>>>>>> >>>> index 7f3f606..4f6f6c9 100644 >>>>>>> >>>> --- a/fs/nfs/delegation.c >>>>>>> >>>> +++ b/fs/nfs/delegation.c >>>>>>> >>>> @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid >>>>>>> >>>> *dst, struct inode *in >>>>>>> >>>> flags &= FMODE_READ|FMODE_WRITE; >>>>>>> >>>> rcu_read_lock(); >>>>>>> >>>> delegation = rcu_dereference(nfsi->delegation); >>>>>>> >>>> - ret = (delegation != NULL && (delegation->type & flags) == >>>>>>> >>>> flags); >>>>>>> >>>> + ret = (delegation != NULL && (delegation->type & flags) == >>>>>>> >>>> flags && >>>>>>> >>>> + !test_bit(NFS_DELEGATION_RETURNING, >>>>>>> >>>> &delegation->flags)); >>>>>>> >>>> if (ret) { >>>>>>> >>>> nfs4_stateid_copy(dst, &delegation->stateid); >>>>>>> >>>> nfs_mark_delegation_referenced(delegation); >>>>>>> >>> >>>>>>> >>> The above cannot eliminate the possibility that we won't use a >>>>>>> >>> delegation while it is being returned. It will at best just reduce the >>>>>>> >>> window of opportunity. >>>>>>> >>> >>>>>>> >> >>>>>>> >> You are right this are still problems. Actually, we might set the bit >>>>>>> >> but not yet get the open stateid from the open with deleg_cur and >>>>>>> >> that's not good. It would be good to know we got the open stateid and >>>>>>> >> then pick that. >>>>>>> >> >>>>>>> >>> So, why is this being considered to be a problem in the first place? >>>>>>> >>> Are you seeing a measurable performance impact on a real life workload >>>>>>> >>> (as opposed to some 1-in-a-billion occurrence from a QA test :-))? >>>>>>> >> >>>>>>> >> Unfortunately, this problem is quite common and I hit it all the time >>>>>>> >> on my setup. This leads to client seizing IO on that file and >>>>>>> >> returning EIO. It's an unrecoverable error. I'm trying to figure out >>>>>>> >> how to eliminate getting to that state. >>>>>>> >> >>>>>>> > >>>>>>> > It definitely isn't intended to be an irrecoverable error. The client >>>>>>> > is supposed to just replay the write after updating the stateid. >>>>>>> >>>>>>> open(deleg_cur) call / reply >>>>>>> lock() call/reply >>>>>>> deleg_return() call >>>>>>> write(with deluge_stateid) call gets BAD_STATEID >>>>>>> state recovery code marks lock state lost -> EIO. >>>>>> >>>>>> Why is it marking the lock as lost? If the recovery succeeded, it should >>>>>> notice that the stateid has changed and instead retry. >>>>> >>>>> I'll get you a better explanation tomorrow besides saying "that's what >>>>> I see when I run the code". >>>> >>>> nfs4_async_handle_error() initiates state recovery >>>> nfs4_reclaim_open_state() eventually calls nfs4_reclaim_locks() which >>>> marks the lock LOST. state is delegated so the kernel logs "lock >>>> reclaim failed". >>>> write retries and in nfs4_copy_ lock_stateid() the lock is marked LOST >>>> and the nfs4_select_rw_stateid() fails with EIO. >>>> >>>>> >>>>>> What kernel is this? >>>>> >>>>> This is upstream. >>> >>> So why isn't nfs4_write_stateid_changed() catching the change before >>> we even get to nfs4_async_handle_error()? That's where this race is >>> supposed to get resolved. >> >> Probably because nfs4_stateid_is_current() returns true because >> nfs4_set_rw_stateid() calls the nfs4_select_rw_stateid() and finds the >> lock lost? > > I don't understand why the lock would already be marked as lost. > Remember that we're calling from inside nfs4_write_done() and before > calling nfs4_async_handle_error(). Sigh. I'm so confused... On November 30th I posted "is receiving BAD_STATEID during IO on delegated stateid unrecoverable?" and you've agreed that it would lead to the EIO. There are a couple of cases i'm juggling and I'm just getting confused if they are different or same. 1 is using delegation stateid for the IO while we are returning the stateid and thus getting BAD_STATEID. 2 is getting a BAD_STATEID on a delegated stateid on IO in without a deleg_return. It seems like 2nd case can also lead to the EIO. I have lossy traces from QA so it's possible that it's the 1st case and known to get the EIO. > > -- > Trond Myklebust > > Linux NFS client maintainer, PrimaryData > > trond.myklebust@primarydata.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Dec 4, 2014 at 3:49 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > On Thu, Dec 4, 2014 at 2:08 PM, Trond Myklebust > <trond.myklebust@primarydata.com> wrote: >> On Thu, Dec 4, 2014 at 1:58 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>> On Thu, Dec 4, 2014 at 1:23 PM, Trond Myklebust >>> <trond.myklebust@primarydata.com> wrote: >>>> On Thu, Dec 4, 2014 at 12:36 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>> On Wed, Dec 3, 2014 at 6:50 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>> On Wed, Dec 3, 2014 at 6:34 PM, Trond Myklebust >>>>>> <trond.myklebust@primarydata.com> wrote: >>>>>>> >>>>>>> On Dec 3, 2014 6:21 PM, "Olga Kornievskaia" <aglo@umich.edu> wrote: >>>>>>>> >>>>>>>> On Wed, Dec 3, 2014 at 6:13 PM, Trond Myklebust >>>>>>>> <trond.myklebust@primarydata.com> wrote: >>>>>>>> > On Wed, Dec 3, 2014 at 5:59 PM, Olga Kornievskaia <aglo@umich.edu> >>>>>>>> > wrote: >>>>>>>> >> On Wed, Dec 3, 2014 at 3:59 PM, Trond Myklebust >>>>>>>> >> <trond.myklebust@primarydata.com> wrote: >>>>>>>> >>> Hi Olga, >>>>>>>> >>> >>>>>>>> >>> On Wed, Dec 3, 2014 at 2:54 PM, Olga Kornievskaia <aglo@umich.edu> >>>>>>>> >>> wrote: >>>>>>>> >>>> Hi folks, >>>>>>>> >>>> >>>>>>>> >>>> I would like an opinion about changing code in such as way that we >>>>>>>> >>>> don't select a delegation stateid for an IO operation when this >>>>>>>> >>>> particular delegation is being returned. >>>>>>>> >>>> >>>>>>>> >>>> The reason it's some what problematic is that we send out a >>>>>>>> >>>> DELEG_RETURN and then we don't remove the stateid until we receive a >>>>>>>> >>>> reply. In the mean while, an IO operation can be happening and in >>>>>>>> >>>> nfs4_select_rw_stateid() it sees a delegation stateid and uses it. >>>>>>>> >>>> Well, at the server, it finishes processing DELEG_RETURN before >>>>>>>> >>>> getting an IO op and by that time the server is finished with the >>>>>>>> >>>> stateid and can error an IO operation with BAD_STATEID. >>>>>>>> >>>> >>>>>>>> >>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c >>>>>>>> >>>> index 7f3f606..4f6f6c9 100644 >>>>>>>> >>>> --- a/fs/nfs/delegation.c >>>>>>>> >>>> +++ b/fs/nfs/delegation.c >>>>>>>> >>>> @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid >>>>>>>> >>>> *dst, struct inode *in >>>>>>>> >>>> flags &= FMODE_READ|FMODE_WRITE; >>>>>>>> >>>> rcu_read_lock(); >>>>>>>> >>>> delegation = rcu_dereference(nfsi->delegation); >>>>>>>> >>>> - ret = (delegation != NULL && (delegation->type & flags) == >>>>>>>> >>>> flags); >>>>>>>> >>>> + ret = (delegation != NULL && (delegation->type & flags) == >>>>>>>> >>>> flags && >>>>>>>> >>>> + !test_bit(NFS_DELEGATION_RETURNING, >>>>>>>> >>>> &delegation->flags)); >>>>>>>> >>>> if (ret) { >>>>>>>> >>>> nfs4_stateid_copy(dst, &delegation->stateid); >>>>>>>> >>>> nfs_mark_delegation_referenced(delegation); >>>>>>>> >>> >>>>>>>> >>> The above cannot eliminate the possibility that we won't use a >>>>>>>> >>> delegation while it is being returned. It will at best just reduce the >>>>>>>> >>> window of opportunity. >>>>>>>> >>> >>>>>>>> >> >>>>>>>> >> You are right this are still problems. Actually, we might set the bit >>>>>>>> >> but not yet get the open stateid from the open with deleg_cur and >>>>>>>> >> that's not good. It would be good to know we got the open stateid and >>>>>>>> >> then pick that. >>>>>>>> >> >>>>>>>> >>> So, why is this being considered to be a problem in the first place? >>>>>>>> >>> Are you seeing a measurable performance impact on a real life workload >>>>>>>> >>> (as opposed to some 1-in-a-billion occurrence from a QA test :-))? >>>>>>>> >> >>>>>>>> >> Unfortunately, this problem is quite common and I hit it all the time >>>>>>>> >> on my setup. This leads to client seizing IO on that file and >>>>>>>> >> returning EIO. It's an unrecoverable error. I'm trying to figure out >>>>>>>> >> how to eliminate getting to that state. >>>>>>>> >> >>>>>>>> > >>>>>>>> > It definitely isn't intended to be an irrecoverable error. The client >>>>>>>> > is supposed to just replay the write after updating the stateid. >>>>>>>> >>>>>>>> open(deleg_cur) call / reply >>>>>>>> lock() call/reply >>>>>>>> deleg_return() call >>>>>>>> write(with deluge_stateid) call gets BAD_STATEID >>>>>>>> state recovery code marks lock state lost -> EIO. >>>>>>> >>>>>>> Why is it marking the lock as lost? If the recovery succeeded, it should >>>>>>> notice that the stateid has changed and instead retry. >>>>>> >>>>>> I'll get you a better explanation tomorrow besides saying "that's what >>>>>> I see when I run the code". >>>>> >>>>> nfs4_async_handle_error() initiates state recovery >>>>> nfs4_reclaim_open_state() eventually calls nfs4_reclaim_locks() which >>>>> marks the lock LOST. state is delegated so the kernel logs "lock >>>>> reclaim failed". >>>>> write retries and in nfs4_copy_ lock_stateid() the lock is marked LOST >>>>> and the nfs4_select_rw_stateid() fails with EIO. >>>>> >>>>>> >>>>>>> What kernel is this? >>>>>> >>>>>> This is upstream. >>>> >>>> So why isn't nfs4_write_stateid_changed() catching the change before >>>> we even get to nfs4_async_handle_error()? That's where this race is >>>> supposed to get resolved. >>> >>> Probably because nfs4_stateid_is_current() returns true because >>> nfs4_set_rw_stateid() calls the nfs4_select_rw_stateid() and finds the >>> lock lost? >> >> I don't understand why the lock would already be marked as lost. >> Remember that we're calling from inside nfs4_write_done() and before >> calling nfs4_async_handle_error(). > > Sigh. I'm so confused... > > On November 30th I posted "is receiving BAD_STATEID during IO on > delegated stateid unrecoverable?" and you've agreed that it would lead > to the EIO. > > There are a couple of cases i'm juggling and I'm just getting confused > if they are different or same. 1 is using delegation stateid for the > IO while we are returning the stateid and thus getting BAD_STATEID. 2 > is getting a BAD_STATEID on a delegated stateid on IO in without a > deleg_return. It seems like 2nd case can also lead to the EIO. I have > lossy traces from QA so it's possible that it's the 1st case and known > to get the EIO. got my cases mixed up. 2nd case was the november's message and agreed to be unrecoverable. 1st case is what i'm also looking into (maybe). > > >> >> -- >> Trond Myklebust >> >> Linux NFS client maintainer, PrimaryData >> >> trond.myklebust@primarydata.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c index 7f3f606..4f6f6c9 100644 --- a/fs/nfs/delegation.c +++ b/fs/nfs/delegation.c @@ -854,7 +854,8 @@ bool nfs4_copy_delegation_stateid(nfs4_stateid *dst, struct inode *in flags &= FMODE_READ|FMODE_WRITE; rcu_read_lock(); delegation = rcu_dereference(nfsi->delegation); - ret = (delegation != NULL && (delegation->type & flags) == flags); + ret = (delegation != NULL && (delegation->type & flags) == flags && + !test_bit(NFS_DELEGATION_RETURNING, &delegation->flags)); if (ret) { nfs4_stateid_copy(dst, &delegation->stateid);