| Message ID | 20171213161555.27993-1-smayhew@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
On Wed, 13 Dec 2017, Trond Myklebust wrote:

> On Wed, 2017-12-13 at 11:15 -0500, Scott Mayhew wrote:
> > Currently when falling back to doing I/O through the MDS (via
> > pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
> > without releasing the reference taken on the dreq via
> > pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
> > nfs_direct_pgio_init.  It then takes another reference on the dreq via
> > nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and
> > as a result the requester will become stuck in inode_dio_wait.  Once
> > that happens, other processes accessing the inode will become stuck
> > as well.
> >
> > Moving the init_hdr call down to nfs_initiate_pgio ensures we take
> > the reference on the dreq only once.
> >
> > This can be reproduced (sometimes) by performing "storage failover
> > takeover" commands on NetApp filer while doing direct I/O from a
> > client.
> >
> > This can also be reproduced using SystemTap to simulate a failure
> > while doing direct I/O from a client (from Dave Wysochanski
> > <dwysocha@redhat.com>):
> >
> > stap -v -g -e 'probe
> > module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return
> > { $return=NULL; exit(); }'
> >
> > Suggested-by: Benjamin Coddington <bcodding@redhat.com>
> > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > ---
> >  fs/nfs/pagelist.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> > index d0543e1..d478c19 100644
> > --- a/fs/nfs/pagelist.c
> > +++ b/fs/nfs/pagelist.c
> > @@ -54,9 +54,6 @@ void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
> >  	hdr->dreq = desc->pg_dreq;
> >  	hdr->release = release;
> >  	hdr->completion_ops = desc->pg_completion_ops;
> > -	if (hdr->completion_ops->init_hdr)
> > -		hdr->completion_ops->init_hdr(hdr);
> > -
> >  	hdr->pgio_mirror_idx = desc->pg_mirror_idx;
> >  }
> >  EXPORT_SYMBOL_GPL(nfs_pgheader_init);
> > @@ -607,6 +604,9 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
> >  	};
> >  	int ret = 0;
> >
> > +	if (hdr->completion_ops->init_hdr)
> > +		hdr->completion_ops->init_hdr(hdr);
> > +
> >  	hdr->rw_ops->rw_initiate(hdr, &msg, rpc_ops, &task_setup_data, how);
> >
> >  	dprintk("NFS: initiated pgio call "
>
> Won't this break as soon as we hit a call to nfs_pgio_error()? Also,
> what about block layout, which never calls nfs_initiate_pgio()?

Yes, it appears it would break both of those.  Sorry I missed that.

>
> I'd prefer that we fix this in pnfs_read_through_mds() and
> pnfs_write_through_mds() by ensuring that those functions clean up
> correctly. Should they be calling hdr->completion_ops->completion()
> instead of calling hdr->release() directly?

We were wondering if that might be the case too.  I'll get to testing on
that right away.

Thanks,
Scott

>
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com
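To make the distinction Trond is drawing concrete, here is a tiny stand-alone model of release() versus completion(). The struct and names below are hypothetical stand-ins, not the kernel's nfs_pgio_header or its completion ops; only the shape of the two callbacks is kept, to show why routing the MDS-fallback cleanup through the completion callback would balance the reference taken in init_hdr.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the pgio header: NOT the kernel structure. */
struct hdr_model {
	int *dreq_refs;				/* shared direct-I/O reference count */
	void (*release)(struct hdr_model *hdr);
	void (*completion)(struct hdr_model *hdr);
};

/* The raw release hook: frees the header but drops no references. */
static void model_release(struct hdr_model *hdr)
{
	free(hdr);
}

/* The completion callback: per-layer cleanup first (here, dropping the
 * reference that init_hdr took), then the plain release. */
static void model_completion(struct hdr_model *hdr)
{
	(*hdr->dreq_refs)--;
	hdr->release(hdr);
}

int main(void)
{
	int refs = 1;				/* the requester's own reference */
	struct hdr_model *hdr = malloc(sizeof(*hdr));

	hdr->dreq_refs = &refs;
	hdr->release = model_release;
	hdr->completion = model_completion;

	refs++;					/* init_hdr took a reference */

	/* Current fallback path: hdr->release(hdr) would leave refs == 2. */
	/* Suggested fallback path: */
	hdr->completion(hdr);

	printf("refs after fallback cleanup: %d (1 == balanced)\n", refs);
	return 0;
}
```

This mirrors the suggestion only in shape; whether pnfs_read_through_mds() and pnfs_write_through_mds() can simply switch callbacks is exactly what the thread above leaves to testing.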
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index d0543e1..d478c19 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -54,9 +54,6 @@ void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
 	hdr->dreq = desc->pg_dreq;
 	hdr->release = release;
 	hdr->completion_ops = desc->pg_completion_ops;
-	if (hdr->completion_ops->init_hdr)
-		hdr->completion_ops->init_hdr(hdr);
-
 	hdr->pgio_mirror_idx = desc->pg_mirror_idx;
 }
 EXPORT_SYMBOL_GPL(nfs_pgheader_init);
@@ -607,6 +604,9 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
 	};
 	int ret = 0;

+	if (hdr->completion_ops->init_hdr)
+		hdr->completion_ops->init_hdr(hdr);
+
 	hdr->rw_ops->rw_initiate(hdr, &msg, rpc_ops, &task_setup_data, how);

 	dprintk("NFS: initiated pgio call "
Currently when falling back to doing I/O through the MDS (via
pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
without releasing the reference taken on the dreq via
pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
nfs_direct_pgio_init.  It then takes another reference on the dreq via
nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init, and
as a result the requester will become stuck in inode_dio_wait.  Once
that happens, other processes accessing the inode will become stuck as
well.

Moving the init_hdr call down to nfs_initiate_pgio ensures we take the
reference on the dreq only once.

This can be reproduced (sometimes) by performing "storage failover
takeover" commands on a NetApp filer while doing direct I/O from a
client.

This can also be reproduced using SystemTap to simulate a failure while
doing direct I/O from a client (from Dave Wysochanski <dwysocha@redhat.com>):

stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }'

Suggested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
---
 fs/nfs/pagelist.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
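For readers following the refcount argument in the commit message, the failing sequence can be modelled outside the kernel. The sketch below is purely illustrative: the struct and the get_dreq()/put_dreq() helpers are hypothetical stand-ins for the direct-I/O request accounting, and the step comments condense the call chains named above. It shows why the leaked reference keeps the count from ever reaching zero, leaving the requester waiting.

```c
#include <stdio.h>

/* Hypothetical stand-in for the direct-I/O request accounting. */
struct dreq_model {
	int io_count;			/* outstanding I/O references */
};

static void get_dreq(struct dreq_model *dreq) { dreq->io_count++; }
static int  put_dreq(struct dreq_model *dreq) { return --dreq->io_count == 0; }

int main(void)
{
	struct dreq_model dreq = { .io_count = 1 };	/* requester's reference */

	/* pNFS path: pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init
	 * -> init_hdr -> nfs_direct_pgio_init */
	get_dreq(&dreq);

	/* DS I/O fails; the MDS-fallback path frees the header without a
	 * matching put_dreq(), so nothing is dropped here. */

	/* MDS path: nfs_generic_pg_pgios -> nfs_pgheader_init -> init_hdr */
	get_dreq(&dreq);

	/* MDS I/O completes and drops only the reference it took ... */
	put_dreq(&dreq);

	/* ... and the requester finally drops its own reference. */
	if (put_dreq(&dreq))
		printf("count reached zero: inode_dio_wait() would return\n");
	else
		printf("io_count stuck at %d: requester never wakes up\n",
		       dreq.io_count);
	return 0;
}
```

With the posted patch, init_hdr would run only when the I/O is actually initiated, so the first (leaked) reference above would never be taken; the review comments earlier in the thread question whether the same balancing holds on the nfs_pgio_error() and block-layout paths.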