Message ID | 20171215211232.18942-1-smayhew@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Trond, Anna,

Any concerns with this patch?

We left this run on multiple machines through the holidays... all doing
direct I/O to a Netapp filer while doing storage failovers on an hourly
basis on the filer.  No more issues with processes getting stuck in
inode_dio_wait.

-Scott

On Fri, 15 Dec 2017, Scott Mayhew wrote:
> Currently when falling back to doing I/O through the MDS (via
> pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
> without releasing the reference taken on the dreq via
> pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
> nfs_direct_pgio_init.  It then takes another reference on the dreq via
> nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and
> as a result the requester will become stuck in inode_dio_wait.  Once
> that happens, other processes accessing the inode will become stuck as
> well.
>
> Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean
> up correctly by calling hdr->completion_ops->completion() instead of
> calling hdr->release() directly.
>
> This can be reproduced (sometimes) by performing "storage failover
> takeover" commands on a NetApp filer while doing direct I/O from a
> client.
>
> This can also be reproduced using SystemTap to simulate a failure
> while doing direct I/O from a client (from Dave Wysochanski
> <dwysocha@redhat.com>):
>
> stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }'
>
> Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com>
> Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> ---
>  fs/nfs/pnfs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index d602fe9..eb098cc 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -2255,7 +2255,7 @@ pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
>  		nfs_pageio_reset_write_mds(desc);
>  		mirror->pg_recoalesce = 1;
>  	}
> -	hdr->release(hdr);
> +	hdr->completion_ops->completion(hdr);
>  }
>
>  static enum pnfs_try_status
> @@ -2378,7 +2378,7 @@ pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
>  		nfs_pageio_reset_read_mds(desc);
>  		mirror->pg_recoalesce = 1;
>  	}
> -	hdr->release(hdr);
> +	hdr->completion_ops->completion(hdr);
>  }
>
>  /*
> --
> 2.9.5
On Thu, 2018-01-11 at 08:10 -0500, Scott Mayhew wrote:
> Trond, Anna,
>
> Any concerns with this patch?
>
> We left this run on multiple machines through the holidays... all
> doing direct I/O to a Netapp filer while doing storage failovers on
> an hourly basis on the filer.  No more issues with processes getting
> stuck in inode_dio_wait.

I'm fine with this version. Should it be a stable fix? If so, do we
have a "Fixes:" commit for it?

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
On Thu, 11 Jan 2018, Trond Myklebust wrote:
> On Thu, 2018-01-11 at 08:10 -0500, Scott Mayhew wrote:
> > Trond, Anna,
> >
> > Any concerns with this patch?
> >
> > We left this run on multiple machines through the holidays... all
> > doing direct I/O to a Netapp filer while doing storage failovers on
> > an hourly basis on the filer.  No more issues with processes getting
> > stuck in inode_dio_wait.
>
> I'm fine with this version. Should it be a stable fix? If so, do we
> have a "Fixes:" commit for it?

Yeah, it should probably be a stable fix.  The hdr->release() call was
added in commit 1ca018d28d ("pNFS: Fix a memory leak when attempted
pnfs fails")... but I can reproduce this on earlier kernels than that
(I tested 4.0), so it looks like this goes way back...

-Scott

>
> -- 
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index d602fe9..eb098cc 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -2255,7 +2255,7 @@ pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
 		nfs_pageio_reset_write_mds(desc);
 		mirror->pg_recoalesce = 1;
 	}
-	hdr->release(hdr);
+	hdr->completion_ops->completion(hdr);
 }
 
 static enum pnfs_try_status
@@ -2378,7 +2378,7 @@ pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
 		nfs_pageio_reset_read_mds(desc);
 		mirror->pg_recoalesce = 1;
 	}
-	hdr->release(hdr);
+	hdr->completion_ops->completion(hdr);
 }
 
 /*
Currently when falling back to doing I/O through the MDS (via
pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
without releasing the reference taken on the dreq via
pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
nfs_direct_pgio_init.  It then takes another reference on the dreq via
nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and
as a result the requester will become stuck in inode_dio_wait.  Once
that happens, other processes accessing the inode will become stuck as
well.

Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean
up correctly by calling hdr->completion_ops->completion() instead of
calling hdr->release() directly.

This can be reproduced (sometimes) by performing "storage failover
takeover" commands on a NetApp filer while doing direct I/O from a
client.

This can also be reproduced using SystemTap to simulate a failure while
doing direct I/O from a client (from Dave Wysochanski
<dwysocha@redhat.com>):

stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }'

Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
---
 fs/nfs/pnfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
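For context on why the two release paths behave differently for direct I/O, here is a minimal sketch of the reference counting involved. It is a paraphrase of helpers in fs/nfs/direct.c (get_dreq(), put_dreq(), nfs_direct_pgio_init() and the direct read/write completion callbacks), with error handling and page accounting omitted, not the literal kernel source:

/*
 * Simplified paraphrase (not verbatim kernel code) of the direct-I/O
 * reference counting that the fix relies on.
 */

/* ->init_hdr op for direct I/O, run from nfs_pgheader_init():
 * take a reference on the dreq for this header. */
static void nfs_direct_pgio_init(struct nfs_pgio_header *hdr)
{
	get_dreq(hdr->dreq);			/* atomic_inc(&dreq->io_count) */
}

/* ->completion op for direct reads (writes are analogous):
 * drop the dreq reference, then free the header. */
static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
{
	struct nfs_direct_req *dreq = hdr->dreq;

	/* ... account for hdr->good_bytes, release the nfs_page requests ... */

	if (put_dreq(dreq))			/* atomic_dec_and_test(&dreq->io_count) */
		nfs_direct_complete(dreq);	/* ends the dio, waking inode_dio_wait() */
	hdr->release(hdr);			/* free the header itself */
}

Calling hdr->release(hdr) directly in the MDS fallback paths frees the header but skips the put_dreq(), so the dreq's io_count can never reach zero and the waiter in inode_dio_wait() hangs; routing the fallback through hdr->completion_ops->completion(hdr), as the patch does, restores the missing put.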