nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds

Message ID 20171213161555.27993-1-smayhew@redhat.com (mailing list archive)
State New, archived

Commit Message

Scott Mayhew Dec. 13, 2017, 4:15 p.m. UTC
Currently when falling back to doing I/O through the MDS (via
pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
without releasing the reference taken on the dreq
via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
nfs_direct_pgio_init.  It then takes another reference on the dreq via
nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and
as a result the requester will become stuck in inode_dio_wait.  Once
that happens, other processes accessing the inode will become stuck as
well.

Moving the init_hdr call down to nfs_initiate_pgio ensures we take the
reference on the dreq only once.
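
(For reference, the direct I/O machinery involved looks roughly like the
following. This is a simplified sketch along the lines of fs/nfs/direct.c,
not a verbatim copy: init_hdr for direct I/O takes a reference on the dreq,
and the request only completes, and inode_dio_wait() only returns, once
every such reference has been put again. Initializing the same header twice
therefore leaves one reference that is never dropped.)

static inline void get_dreq(struct nfs_direct_req *dreq)
{
	/* one reference per outstanding piece of I/O */
	atomic_inc(&dreq->io_count);
}

static inline int put_dreq(struct nfs_direct_req *dreq)
{
	/*
	 * The dreq is only completed, and the waiter in inode_dio_wait()
	 * only woken, when this count reaches zero.
	 */
	return atomic_dec_and_test(&dreq->io_count);
}

/* .init_hdr of the direct I/O completion ops */
static void nfs_direct_pgio_init(struct nfs_pgio_header *hdr)
{
	get_dreq(hdr->dreq);
}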

This can be reproduced (sometimes) by performing "storage failover
takeover" commands on a NetApp filer while doing direct I/O from a client.

This can also be reproduced using SystemTap to simulate a failure while
doing direct I/O from a client (from Dave Wysochanski
<dwysocha@redhat.com>):

stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }'

Suggested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
---
 fs/nfs/pagelist.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Trond Myklebust Dec. 13, 2017, 5:35 p.m. UTC | #1
On Wed, 2017-12-13 at 11:15 -0500, Scott Mayhew wrote:
> Currently when falling back to doing I/O through the MDS (via
> pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
> without releasing the reference taken on the dreq
> via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
> nfs_direct_pgio_init.  It then takes another reference on the dreq
> via
> nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and
> as a result the requester will become stuck in inode_dio_wait.  Once
> that happens, other processes accessing the inode will become stuck
> as
> well.
> 
> Moving the init_hdr call down to nfs_initiate_pgio ensures we take
> the
> reference on the dreq only once.
> 
> This can be reproduced (sometimes) by performing "storage failover
> takeover" commands on NetApp filer while doing direct I/O from a
> client.
> 
> This can also be reproduced using SystemTap to simulate a failure
> while
> doing direct I/O from a client (from Dave Wysochanski
> <dwysocha@redhat.com>):
> 
> stap -v -g -e 'probe
> module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").retu
> rn { $return=NULL; exit(); }'
> 
> Suggested-by: Benjamin Coddington <bcodding@redhat.com>
> Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> ---
>  fs/nfs/pagelist.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index d0543e1..d478c19 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -54,9 +54,6 @@ void nfs_pgheader_init(struct nfs_pageio_descriptor
> *desc,
>  	hdr->dreq = desc->pg_dreq;
>  	hdr->release = release;
>  	hdr->completion_ops = desc->pg_completion_ops;
> -	if (hdr->completion_ops->init_hdr)
> -		hdr->completion_ops->init_hdr(hdr);
> -
>  	hdr->pgio_mirror_idx = desc->pg_mirror_idx;
>  }
>  EXPORT_SYMBOL_GPL(nfs_pgheader_init);
> @@ -607,6 +604,9 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt,
> struct nfs_pgio_header *hdr,
>  	};
>  	int ret = 0;
> 
> +	if (hdr->completion_ops->init_hdr)
> +		hdr->completion_ops->init_hdr(hdr);
> +
>  	hdr->rw_ops->rw_initiate(hdr, &msg, rpc_ops,
> &task_setup_data, how);
> 
>  	dprintk("NFS: initiated pgio call "

Won't this break as soon as we hit a call to nfs_pgio_error()? Also,
what about block layout, which never calls nfs_initiate_pgio()?

I'd prefer that we fix this in pnfs_read_through_mds() and
pnfs_write_through_mds() by ensuring that those functions clean up
correctly. Should they be calling hdr->completion_ops->completion()
instead of calling hdr->release() directly?

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

Scott Mayhew Dec. 13, 2017, 6:22 p.m. UTC | #2
On Wed, 13 Dec 2017, Trond Myklebust wrote:

> On Wed, 2017-12-13 at 11:15 -0500, Scott Mayhew wrote:
> > Currently when falling back to doing I/O through the MDS (via
> > pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
> > without releasing the reference taken on the dreq
> > via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
> > nfs_direct_pgio_init.  It then takes another reference on the dreq
> > via
> > nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and
> > as a result the requester will become stuck in inode_dio_wait.  Once
> > that happens, other processes accessing the inode will become stuck
> > as
> > well.
> > 
> > Moving the init_hdr call down to nfs_initiate_pgio ensures we take
> > the
> > reference on the dreq only once.
> > 
> > This can be reproduced (sometimes) by performing "storage failover
> > takeover" commands on NetApp filer while doing direct I/O from a
> > client.
> > 
> > This can also be reproduced using SystemTap to simulate a failure
> > while
> > doing direct I/O from a client (from Dave Wysochanski
> > <dwysocha@redhat.com>):
> > 
> > stap -v -g -e 'probe
> > module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").retu
> > rn { $return=NULL; exit(); }'
> > 
> > Suggested-by: Benjamin Coddington <bcodding@redhat.com>
> > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > ---
> >  fs/nfs/pagelist.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> > index d0543e1..d478c19 100644
> > --- a/fs/nfs/pagelist.c
> > +++ b/fs/nfs/pagelist.c
> > @@ -54,9 +54,6 @@ void nfs_pgheader_init(struct nfs_pageio_descriptor
> > *desc,
> >  	hdr->dreq = desc->pg_dreq;
> >  	hdr->release = release;
> >  	hdr->completion_ops = desc->pg_completion_ops;
> > -	if (hdr->completion_ops->init_hdr)
> > -		hdr->completion_ops->init_hdr(hdr);
> > -
> >  	hdr->pgio_mirror_idx = desc->pg_mirror_idx;
> >  }
> >  EXPORT_SYMBOL_GPL(nfs_pgheader_init);
> > @@ -607,6 +604,9 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt,
> > struct nfs_pgio_header *hdr,
> >  	};
> >  	int ret = 0;
> >  
> > +	if (hdr->completion_ops->init_hdr)
> > +		hdr->completion_ops->init_hdr(hdr);
> > +
> >  	hdr->rw_ops->rw_initiate(hdr, &msg, rpc_ops,
> > &task_setup_data, how);
> >  
> >  	dprintk("NFS: initiated pgio call "
> 
> Won't this break as soon as we hit a call to nfs_pgio_error()? Also,
> what about block layout, which never calls nfs_initiate_pgio()?

Yes, it appears it would break both of those.  Sorry I missed that.

> 
> I'd prefer that we fix this in pnfs_read_through_mds() and
> pnfs_write_through_mds() by ensuring that those functions clean up
> correctly. Should they be calling hdr->completion_ops->completion()
> instead of calling hdr->release() directly?

We were wondering if that might be the case too.  I'll get to testing on
that right away.

Thanks,
Scott
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com
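
(For context, the cleanup being suggested would look roughly like the
following for pnfs_read_through_mds() in fs/nfs/pnfs.c. This is a sketch
under the assumption that the function currently frees the header via
hdr->release() directly, and it is not necessarily the patch that
eventually went upstream; pnfs_write_through_mds() would need the same
treatment.)

static void
pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
		      struct nfs_pgio_header *hdr)
{
	if (!hdr->pnfs_error)
		nfs_pageio_reset_read_mds(desc);
	/*
	 * Run the registered completion op rather than calling
	 * hdr->release() by hand.  For direct I/O the completion path is
	 * what drops the dreq reference taken in nfs_direct_pgio_init(),
	 * so the refcount stays balanced when we fall back to the MDS.
	 */
	hdr->completion_ops->completion(hdr);
}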

Patch

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index d0543e1..d478c19 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -54,9 +54,6 @@  void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
 	hdr->dreq = desc->pg_dreq;
 	hdr->release = release;
 	hdr->completion_ops = desc->pg_completion_ops;
-	if (hdr->completion_ops->init_hdr)
-		hdr->completion_ops->init_hdr(hdr);
-
 	hdr->pgio_mirror_idx = desc->pg_mirror_idx;
 }
 EXPORT_SYMBOL_GPL(nfs_pgheader_init);
@@ -607,6 +604,9 @@  int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
 	};
 	int ret = 0;
 
+	if (hdr->completion_ops->init_hdr)
+		hdr->completion_ops->init_hdr(hdr);
+
 	hdr->rw_ops->rw_initiate(hdr, &msg, rpc_ops, &task_setup_data, how);
 
 	dprintk("NFS: initiated pgio call "