Message ID | 1309743002-1658-1-git-send-email-bergwolf@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 2011-07-04 04:30, Peng Tao wrote: > There is no need to keep lseg reference when read/write through MDS. > This fixes a null pointer crash at nfs_post_op_update_inode_force_wcc > because nfs4_proc_write_setup will unset wdata->res.fattr if wdata->lseg > is not NULL. > > Signed-off-by: Peng Tao <peng_tao@emc.com> Looks good to me. Benny > --- > fs/nfs/pnfs.c | 6 ++++++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c > index 30a0394..55fdf02 100644 > --- a/fs/nfs/pnfs.c > +++ b/fs/nfs/pnfs.c > @@ -1193,6 +1193,9 @@ pnfs_ld_write_done(struct nfs_write_data *data) > > dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__, > data->pnfs_error); > + > + put_lseg(data->lseg); > + data->lseg = NULL; > status = nfs_initiate_write(data, NFS_CLIENT(data->inode), > data->mds_ops, NFS_FILE_SYNC); > return status ? : -EAGAIN; > @@ -1240,6 +1243,9 @@ pnfs_ld_read_done(struct nfs_read_data *data) > > dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__, > data->pnfs_error); > + > + put_lseg(data->lseg); > + data->lseg = NULL; > status = nfs_initiate_read(data, NFS_CLIENT(data->inode), > data->mds_ops); > return status ? : -EAGAIN; -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Trond, Any comments on this patch? I still get kernel crash when pnfs write is attempted but fails and calls pnfs_ld_write_done(). It seems object layout uses the same code path as well. But I don't find the patch in either your tree or Benny's tree. Are there any concerns? Thanks, Tao > -----Original Message----- > From: Benny Halevy [mailto:bhalevy@tonian.com] > Sent: Saturday, July 09, 2011 10:10 PM > To: Peng Tao > Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org; Peng, Tao > Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS > > On 2011-07-04 04:30, Peng Tao wrote: > > There is no need to keep lseg reference when read/write through MDS. > > This fixes a null pointer crash at nfs_post_op_update_inode_force_wcc > > because nfs4_proc_write_setup will unset wdata->res.fattr if wdata->lseg > > is not NULL. > > > > Signed-off-by: Peng Tao <peng_tao@emc.com> > > Looks good to me. > > Benny > > > --- > > fs/nfs/pnfs.c | 6 ++++++ > > 1 files changed, 6 insertions(+), 0 deletions(-) > > > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c > > index 30a0394..55fdf02 100644 > > --- a/fs/nfs/pnfs.c > > +++ b/fs/nfs/pnfs.c > > @@ -1193,6 +1193,9 @@ pnfs_ld_write_done(struct nfs_write_data *data) > > > > dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__, > > data->pnfs_error); > > + > > + put_lseg(data->lseg); > > + data->lseg = NULL; > > status = nfs_initiate_write(data, NFS_CLIENT(data->inode), > > data->mds_ops, NFS_FILE_SYNC); > > return status ? : -EAGAIN; > > @@ -1240,6 +1243,9 @@ pnfs_ld_read_done(struct nfs_read_data *data) > > > > dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__, > > data->pnfs_error); > > + > > + put_lseg(data->lseg); > > + data->lseg = NULL; > > status = nfs_initiate_read(data, NFS_CLIENT(data->inode), > > data->mds_ops); > > return status ? : -EAGAIN; -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2011-07-20 at 01:52 -0400, tao.peng@emc.com wrote: > Hi, Trond, > > Any comments on this patch? I still get kernel crash when pnfs write is attempted but fails and calls pnfs_ld_write_done(). It seems object layout uses the same code path as well. But I don't find the patch in either your tree or Benny's tree. Are there any concerns? > > Thanks, > Tao The whole pnfs_ld_write_done thing is bogus and needs to be replaced with something sane. It is trying to initiate a WRITE RPC call with the wrong block size, and is calling the MDS rpc_call_done() and rpc_release() with an uninitialised rpc task pointer. Ditto for pnfs_ld_read_done. Cheers Trond > > -----Original Message----- > > From: Benny Halevy [mailto:bhalevy@tonian.com] > > Sent: Saturday, July 09, 2011 10:10 PM > > To: Peng Tao > > Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org; Peng, Tao > > Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS > > > > On 2011-07-04 04:30, Peng Tao wrote: > > > There is no need to keep lseg reference when read/write through MDS. > > > This fixes a null pointer crash at nfs_post_op_update_inode_force_wcc > > > because nfs4_proc_write_setup will unset wdata->res.fattr if wdata->lseg > > > is not NULL. > > > > > > Signed-off-by: Peng Tao <peng_tao@emc.com> > > > > Looks good to me. > > > > Benny > > > > > --- > > > fs/nfs/pnfs.c | 6 ++++++ > > > 1 files changed, 6 insertions(+), 0 deletions(-) > > > > > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c > > > index 30a0394..55fdf02 100644 > > > --- a/fs/nfs/pnfs.c > > > +++ b/fs/nfs/pnfs.c > > > @@ -1193,6 +1193,9 @@ pnfs_ld_write_done(struct nfs_write_data *data) > > > > > > dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__, > > > data->pnfs_error); > > > + > > > + put_lseg(data->lseg); > > > + data->lseg = NULL; > > > status = nfs_initiate_write(data, NFS_CLIENT(data->inode), > > > data->mds_ops, NFS_FILE_SYNC); > > > return status ? : -EAGAIN; > > > @@ -1240,6 +1243,9 @@ pnfs_ld_read_done(struct nfs_read_data *data) > > > > > > dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__, > > > data->pnfs_error); > > > + > > > + put_lseg(data->lseg); > > > + data->lseg = NULL; > > > status = nfs_initiate_read(data, NFS_CLIENT(data->inode), > > > data->mds_ops); > > > return status ? : -EAGAIN; >
Trond Myklebust wrote: On Wed, 2011-07-20 at 01:52 -0400, tao.peng@emc.com wrote: > Hi, Trond, > > Any comments on this patch? I still get kernel crash when pnfs write is attempted but fails and calls pnfs_ld_write_done(). It seems object layout uses the same code path as well. But I don't find the patch in either your tree or Benny's tree. Are there any concerns? > > Thanks, > Tao The whole pnfs_ld_write_done thing is bogus and needs to be replaced with something sane. It is trying to initiate a WRITE RPC call with the wrong block size, and is calling the MDS rpc_call_done() and rpc_release() with an uninitialised rpc task pointer. Ditto for pnfs_ld_read_done. Benny / Boaz, have you run into this problem with object layout? -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Trond, On Tue, Jul 26, 2011 at 3:13 AM, Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > On Wed, 2011-07-20 at 01:52 -0400, tao.peng@emc.com wrote: >> Hi, Trond, >> >> Any comments on this patch? I still get kernel crash when pnfs write is attempted but fails and calls pnfs_ld_write_done(). It seems object layout uses the same code path as well. But I don't find the patch in either your tree or Benny's tree. Are there any concerns? >> >> Thanks, >> Tao > > The whole pnfs_ld_write_done thing is bogus and needs to be replaced > with something sane. It is trying to initiate a WRITE RPC call with the > wrong block size, and is calling the MDS rpc_call_done() and > rpc_release() with an uninitialised rpc task pointer. > > Ditto for pnfs_ld_read_done. Thanks for your explanation. Is there any plan on how to fix pnfs_ld_read/write_done? Basically, we would need an interface that can redirect the IO to MDS if pnfs_error is set or do all necessary cleanup work to end read/write if pnfs_error is 0. IMHO, the recoalesce logic need to access nfs_pageio_descriptor but we do not have that information at pnfs_ld_read/write_done. Best, Tao > > Cheers > Trond > >> > -----Original Message----- >> > From: Benny Halevy [mailto:bhalevy@tonian.com] >> > Sent: Saturday, July 09, 2011 10:10 PM >> > To: Peng Tao >> > Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org; Peng, Tao >> > Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS >> > >> > On 2011-07-04 04:30, Peng Tao wrote: >> > > There is no need to keep lseg reference when read/write through MDS. >> > > This fixes a null pointer crash at nfs_post_op_update_inode_force_wcc >> > > because nfs4_proc_write_setup will unset wdata->res.fattr if wdata->lseg >> > > is not NULL. >> > > >> > > Signed-off-by: Peng Tao <peng_tao@emc.com> >> > >> > Looks good to me. >> > >> > Benny >> > >> > > --- >> > > fs/nfs/pnfs.c | 6 ++++++ >> > > 1 files changed, 6 insertions(+), 0 deletions(-) >> > > >> > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c >> > > index 30a0394..55fdf02 100644 >> > > --- a/fs/nfs/pnfs.c >> > > +++ b/fs/nfs/pnfs.c >> > > @@ -1193,6 +1193,9 @@ pnfs_ld_write_done(struct nfs_write_data *data) >> > > >> > > dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__, >> > > data->pnfs_error); >> > > + >> > > + put_lseg(data->lseg); >> > > + data->lseg = NULL; >> > > status = nfs_initiate_write(data, NFS_CLIENT(data->inode), >> > > data->mds_ops, NFS_FILE_SYNC); >> > > return status ? : -EAGAIN; >> > > @@ -1240,6 +1243,9 @@ pnfs_ld_read_done(struct nfs_read_data *data) >> > > >> > > dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__, >> > > data->pnfs_error); >> > > + >> > > + put_lseg(data->lseg); >> > > + data->lseg = NULL; >> > > status = nfs_initiate_read(data, NFS_CLIENT(data->inode), >> > > data->mds_ops); >> > > return status ? : -EAGAIN; >> > > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com > > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Myklebust, Trond wrote: > Thanks for your explanation. Is there any plan on how to fix > pnfs_ld_read/write_done? Basically, we would need an interface that > can redirect the IO to MDS if pnfs_error is set or do all necessary > cleanup work to end read/write if pnfs_error is 0. IMHO, the > recoalesce logic need to access nfs_pageio_descriptor but we do not > have that information at pnfs_ld_read/write_done. As far as I can see, the right thing to do is to mark the layout as invalid and then redirty the page. It should be easy to have fsync() re-send the pages in this case. These should be extremely rare events, since we expect to catch most of the pNFS failures when we do the actual LAYOUTGET in the ->pg_init(). My main worry is for aio/dio where there is no good mechanism for retrying. I'm still working on that... What do you suggest we do for the current set of patches that add block layout to pnfs? -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Jim Rees [mailto:rees@umich.edu] > Sent: Tuesday, July 26, 2011 12:08 PM > To: Myklebust, Trond > Cc: Peng Tao; tao.peng@emc.com; linux-nfs@vger.kernel.org; > bhalevy@tonian.com > Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS > > Myklebust, Trond wrote: > > > Thanks for your explanation. Is there any plan on how to fix > > pnfs_ld_read/write_done? Basically, we would need an interface that > > can redirect the IO to MDS if pnfs_error is set or do all necessary > > cleanup work to end read/write if pnfs_error is 0. IMHO, the > > recoalesce logic need to access nfs_pageio_descriptor but we do not > > have that information at pnfs_ld_read/write_done. > > As far as I can see, the right thing to do is to mark the layout as > invalid and then redirty the page. It should be easy to have fsync() > re-send the pages in this case. These should be extremely rare > events, > since we expect to catch most of the pNFS failures when we do the > actual > LAYOUTGET in the ->pg_init(). > > My main worry is for aio/dio where there is no good mechanism for > retrying. I'm still working on that... > > What do you suggest we do for the current set of patches that add block > layout to pnfs? If you are calling pnfs_ld_read/write_done, then don't change anything: it is easier to fix this in one spot rather than several. However someone needs to start working on fixing the code in pnfs_ld_read/write_done to do the right thing. If nobody else has the cycles, then I can do that but I'd prefer to have someone who can easily test the resulting code do it. Cheers Trond -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2011-07-25 15:13, Trond Myklebust wrote: > On Wed, 2011-07-20 at 01:52 -0400, tao.peng@emc.com wrote: >> Hi, Trond, >> >> Any comments on this patch? I still get kernel crash when pnfs write is attempted but fails and calls pnfs_ld_write_done(). It seems object layout uses the same code path as well. But I don't find the patch in either your tree or Benny's tree. Are there any concerns? >> >> Thanks, >> Tao > > The whole pnfs_ld_write_done thing is bogus and needs to be replaced > with something sane. It is trying to initiate a WRITE RPC call with the > wrong block size, I was under the impression that your re-coalesce work will take care of that. Is there anything else that needs to be done? > and is calling the MDS rpc_call_done() and > rpc_release() with an uninitialised rpc task pointer. So on this path there is indeed no active rpc task so we're using the task structure in the struct nfs_write_data. I agree that having a helper function at the rpc layer to initialize it to a meaningful value indicating there is no active rpc task would be a useful thing. But the fix Peng sent is for the fallback path where we initiate I/O to the MDS and we do build a rpc task properly. On this path lseg indeed needs to be put and set to NULL. Benny > > Ditto for pnfs_ld_read_done. > > Cheers > Trond > >>> -----Original Message----- >>> From: Benny Halevy [mailto:bhalevy@tonian.com] >>> Sent: Saturday, July 09, 2011 10:10 PM >>> To: Peng Tao >>> Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org; Peng, Tao >>> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS >>> >>> On 2011-07-04 04:30, Peng Tao wrote: >>>> There is no need to keep lseg reference when read/write through MDS. >>>> This fixes a null pointer crash at nfs_post_op_update_inode_force_wcc >>>> because nfs4_proc_write_setup will unset wdata->res.fattr if wdata->lseg >>>> is not NULL. >>>> >>>> Signed-off-by: Peng Tao <peng_tao@emc.com> >>> >>> Looks good to me. >>> >>> Benny >>> >>>> --- >>>> fs/nfs/pnfs.c | 6 ++++++ >>>> 1 files changed, 6 insertions(+), 0 deletions(-) >>>> >>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c >>>> index 30a0394..55fdf02 100644 >>>> --- a/fs/nfs/pnfs.c >>>> +++ b/fs/nfs/pnfs.c >>>> @@ -1193,6 +1193,9 @@ pnfs_ld_write_done(struct nfs_write_data *data) >>>> >>>> dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__, >>>> data->pnfs_error); >>>> + >>>> + put_lseg(data->lseg); >>>> + data->lseg = NULL; >>>> status = nfs_initiate_write(data, NFS_CLIENT(data->inode), >>>> data->mds_ops, NFS_FILE_SYNC); >>>> return status ? : -EAGAIN; >>>> @@ -1240,6 +1243,9 @@ pnfs_ld_read_done(struct nfs_read_data *data) >>>> >>>> dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__, >>>> data->pnfs_error); >>>> + >>>> + put_lseg(data->lseg); >>>> + data->lseg = NULL; >>>> status = nfs_initiate_read(data, NFS_CLIENT(data->inode), >>>> data->mds_ops); >>>> return status ? : -EAGAIN; >> > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2011-07-26 12:14, Myklebust, Trond wrote: >> -----Original Message----- >> From: Jim Rees [mailto:rees@umich.edu] >> Sent: Tuesday, July 26, 2011 12:08 PM >> To: Myklebust, Trond >> Cc: Peng Tao; tao.peng@emc.com; linux-nfs@vger.kernel.org; >> bhalevy@tonian.com >> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS >> >> Myklebust, Trond wrote: >> >> > Thanks for your explanation. Is there any plan on how to fix >> > pnfs_ld_read/write_done? Basically, we would need an interface > that >> > can redirect the IO to MDS if pnfs_error is set or do all > necessary >> > cleanup work to end read/write if pnfs_error is 0. IMHO, the >> > recoalesce logic need to access nfs_pageio_descriptor but we do > not >> > have that information at pnfs_ld_read/write_done. >> >> As far as I can see, the right thing to do is to mark the layout as >> invalid and then redirty the page. It should be easy to have fsync() >> re-send the pages in this case. These should be extremely rare >> events, >> since we expect to catch most of the pNFS failures when we do the >> actual >> LAYOUTGET in the ->pg_init(). >> >> My main worry is for aio/dio where there is no good mechanism for >> retrying. I'm still working on that... >> >> What do you suggest we do for the current set of patches that add > block >> layout to pnfs? > > If you are calling pnfs_ld_read/write_done, then don't change anything: > it is easier to fix this in one spot rather than several. Even if we plan on fixing this for the next merge window I think there's value with the current fix even if it's going to be replaced with a better fix along the road. > However someone needs to start working on fixing the code in > pnfs_ld_read/write_done to do the right thing. If nobody else has the > cycles, then I can do that but I'd prefer to have someone who can easily > test the resulting code do it. I don't have cycles to code this either but I'll be happy to help with looking at the design and reviewing the implementation. Benny > > Cheers > Trond > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jul 26, 2011 at 11:50 PM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: >> -----Original Message----- >> From: Peng Tao [mailto:bergwolf@gmail.com] >> Sent: Tuesday, July 26, 2011 11:37 AM >> To: Myklebust, Trond >> Cc: tao.peng@emc.com; linux-nfs@vger.kernel.org; bhalevy@tonian.com >> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS >> >> Hi, Trond, >> >> On Tue, Jul 26, 2011 at 3:13 AM, Trond Myklebust >> <Trond.Myklebust@netapp.com> wrote: >> > On Wed, 2011-07-20 at 01:52 -0400, tao.peng@emc.com wrote: >> >> Hi, Trond, >> >> >> >> Any comments on this patch? I still get kernel crash when pnfs write >> is attempted but fails and calls pnfs_ld_write_done(). It seems object >> layout uses the same code path as well. But I don't find the patch in >> either your tree or Benny's tree. Are there any concerns? >> >> >> >> Thanks, >> >> Tao >> > >> > The whole pnfs_ld_write_done thing is bogus and needs to be replaced >> > with something sane. It is trying to initiate a WRITE RPC call with >> the >> > wrong block size, and is calling the MDS rpc_call_done() and >> > rpc_release() with an uninitialised rpc task pointer. >> > >> > Ditto for pnfs_ld_read_done. >> Thanks for your explanation. Is there any plan on how to fix >> pnfs_ld_read/write_done? Basically, we would need an interface that >> can redirect the IO to MDS if pnfs_error is set or do all necessary >> cleanup work to end read/write if pnfs_error is 0. IMHO, the >> recoalesce logic need to access nfs_pageio_descriptor but we do not >> have that information at pnfs_ld_read/write_done. > > As far as I can see, the right thing to do is to mark the layout as invalid and then redirty the page. It should be easy to have fsync() re-send the pages in this case. These should be extremely rare events, since we expect to catch most of the pNFS failures when we do the actual LAYOUTGET in the ->pg_init(). Agreed. This should be easier than re-coalescing and sending to MDS at read/write_done. > > My main worry is for aio/dio where there is no good mechanism for retrying. I'm still working on that... For dio, we may have to send the failed pages to MDS instead of relying on next fsync() to retry. Thanks, Tao > > Cheers > Trond > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 27, 2011 at 1:37 AM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: >> -----Original Message----- >> From: Peng Tao [mailto:bergwolf@gmail.com] >> Sent: Tuesday, July 26, 2011 1:33 PM >> To: Myklebust, Trond >> Cc: tao.peng@emc.com; linux-nfs@vger.kernel.org; bhalevy@tonian.com >> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS >> >> On Tue, Jul 26, 2011 at 11:50 PM, Myklebust, Trond >> <Trond.Myklebust@netapp.com> wrote: >> >> -----Original Message----- >> >> From: Peng Tao [mailto:bergwolf@gmail.com] >> >> Sent: Tuesday, July 26, 2011 11:37 AM >> >> To: Myklebust, Trond >> >> Cc: tao.peng@emc.com; linux-nfs@vger.kernel.org; bhalevy@tonian.com >> >> Subject: Re: [PATCH] NFS41: Drop lseg ref before fallthru to MDS >> >> >> >> Hi, Trond, >> >> >> >> On Tue, Jul 26, 2011 at 3:13 AM, Trond Myklebust >> >> <Trond.Myklebust@netapp.com> wrote: >> >> > On Wed, 2011-07-20 at 01:52 -0400, tao.peng@emc.com wrote: >> >> >> Hi, Trond, >> >> >> >> >> >> Any comments on this patch? I still get kernel crash when pnfs >> write >> >> is attempted but fails and calls pnfs_ld_write_done(). It seems >> object >> >> layout uses the same code path as well. But I don't find the patch >> in >> >> either your tree or Benny's tree. Are there any concerns? >> >> >> >> >> >> Thanks, >> >> >> Tao >> >> > >> >> > The whole pnfs_ld_write_done thing is bogus and needs to be >> replaced >> >> > with something sane. It is trying to initiate a WRITE RPC call >> with >> >> the >> >> > wrong block size, and is calling the MDS rpc_call_done() and >> >> > rpc_release() with an uninitialised rpc task pointer. >> >> > >> >> > Ditto for pnfs_ld_read_done. >> >> Thanks for your explanation. Is there any plan on how to fix >> >> pnfs_ld_read/write_done? Basically, we would need an interface that >> >> can redirect the IO to MDS if pnfs_error is set or do all necessary >> >> cleanup work to end read/write if pnfs_error is 0. IMHO, the >> >> recoalesce logic need to access nfs_pageio_descriptor but we do not >> >> have that information at pnfs_ld_read/write_done. >> > >> > As far as I can see, the right thing to do is to mark the layout as >> invalid and then redirty the page. It should be easy to have fsync() >> re-send the pages in this case. These should be extremely rare events, >> since we expect to catch most of the pNFS failures when we do the >> actual LAYOUTGET in the ->pg_init(). >> Agreed. This should be easier than re-coalescing and sending to MDS at >> read/write_done. >> >> > >> > My main worry is for aio/dio where there is no good mechanism for >> retrying. I'm still working on that... >> For dio, we may have to send the failed pages to MDS instead of >> relying on next fsync() to retry. > > The problem isn't what to do, it is more one of _who_ does it. The rpciod/nfsiod queues aren't the ideal place to set up a resend since it involves allocating memory. How about having a pnfs private workqueue to take care of the resend? There are some other places default workqueue is used in io path in both block and object layout code. It can be problematic if the default workqueue is blocked. e.g. if someone on the default workqueue allocates memory and reclaim code comes into pnfs path. Using default workqueue here can cause application hang forever. If we have a private workqueue, these problems can be solve IMO. Best, Tao > > Cheers > Trond >
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c index 30a0394..55fdf02 100644 --- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -1193,6 +1193,9 @@ pnfs_ld_write_done(struct nfs_write_data *data) dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__, data->pnfs_error); + + put_lseg(data->lseg); + data->lseg = NULL; status = nfs_initiate_write(data, NFS_CLIENT(data->inode), data->mds_ops, NFS_FILE_SYNC); return status ? : -EAGAIN; @@ -1240,6 +1243,9 @@ pnfs_ld_read_done(struct nfs_read_data *data) dprintk("%s: pnfs_error=%d, retry via MDS\n", __func__, data->pnfs_error); + + put_lseg(data->lseg); + data->lseg = NULL; status = nfs_initiate_read(data, NFS_CLIENT(data->inode), data->mds_ops); return status ? : -EAGAIN;
There is no need to keep lseg reference when read/write through MDS. This fixes a null pointer crash at nfs_post_op_update_inode_force_wcc because nfs4_proc_write_setup will unset wdata->res.fattr if wdata->lseg is not NULL. Signed-off-by: Peng Tao <peng_tao@emc.com> --- fs/nfs/pnfs.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-)