diff mbox

[v2] nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds

Message ID 20171215211232.18942-1-smayhew@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Scott Mayhew Dec. 15, 2017, 9:12 p.m. UTC
Currently when falling back to doing I/O through the MDS (via
pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
without releasing the reference taken on the dreq
via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
nfs_direct_pgio_init.  It then takes another reference on the dreq via
nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and
as a result the requester will become stuck in inode_dio_wait.  Once
that happens, other processes accessing the inode will become stuck as
well.

Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean
up correctly by calling hdr->completion_ops->completion() instead of
calling hdr->release() directly.

This can be reproduced (sometimes) by performing "storage failover
takeover" commands on NetApp filer while doing direct I/O from a client.

This can also be reproduced using SystemTap to simulate a failure while
doing direct I/O from a client (from Dave Wysochanski
<dwysocha@redhat.com>):

stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }'

Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
---
 fs/nfs/pnfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Scott Mayhew Jan. 11, 2018, 1:10 p.m. UTC | #1
Trond, Anna,

Any concerns with this patch?

We left this run on multiple machines through the holidays... all doing
direct I/O to a Netapp filer while doing storage failovers on an hourly
basis on the filer.  No more issues with processes getting stuck in
inode_dio_wait.

-Scott

On Fri, 15 Dec 2017, Scott Mayhew wrote:

> Currently when falling back to doing I/O through the MDS (via
> pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
> without releasing the reference taken on the dreq
> via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
> nfs_direct_pgio_init.  It then takes another reference on the dreq via
> nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and
> as a result the requester will become stuck in inode_dio_wait.  Once
> that happens, other processes accessing the inode will become stuck as
> well.
> 
> Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean
> up correctly by calling hdr->completion_ops->completion() instead of
> calling hdr->release() directly.
> 
> This can be reproduced (sometimes) by performing "storage failover
> takeover" commands on NetApp filer while doing direct I/O from a client.
> 
> This can also be reproduced using SystemTap to simulate a failure while
> doing direct I/O from a client (from Dave Wysochanski
> <dwysocha@redhat.com>):
> 
> stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }'
> 
> Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com>
> Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> ---
>  fs/nfs/pnfs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index d602fe9..eb098cc 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -2255,7 +2255,7 @@ pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
>  		nfs_pageio_reset_write_mds(desc);
>  		mirror->pg_recoalesce = 1;
>  	}
> -	hdr->release(hdr);
> +	hdr->completion_ops->completion(hdr);
>  }
>  
>  static enum pnfs_try_status
> @@ -2378,7 +2378,7 @@ pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
>  		nfs_pageio_reset_read_mds(desc);
>  		mirror->pg_recoalesce = 1;
>  	}
> -	hdr->release(hdr);
> +	hdr->completion_ops->completion(hdr);
>  }
>  
>  /*
> -- 
> 2.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust Jan. 11, 2018, 1:20 p.m. UTC | #2
T24gVGh1LCAyMDE4LTAxLTExIGF0IDA4OjEwIC0wNTAwLCBTY290dCBNYXloZXcgd3JvdGU6DQo+
IFRyb25kLCBBbm5hLA0KPiANCj4gQW55IGNvbmNlcm5zIHdpdGggdGhpcyBwYXRjaD8NCj4gDQo+
IFdlIGxlZnQgdGhpcyBydW4gb24gbXVsdGlwbGUgbWFjaGluZXMgdGhyb3VnaCB0aGUgaG9saWRh
eXMuLi4gYWxsDQo+IGRvaW5nDQo+IGRpcmVjdCBJL08gdG8gYSBOZXRhcHAgZmlsZXIgd2hpbGUg
ZG9pbmcgc3RvcmFnZSBmYWlsb3ZlcnMgb24gYW4NCj4gaG91cmx5DQo+IGJhc2lzIG9uIHRoZSBm
aWxlci4gIE5vIG1vcmUgaXNzdWVzIHdpdGggcHJvY2Vzc2VzIGdldHRpbmcgc3R1Y2sgaW4NCj4g
aW5vZGVfZGlvX3dhaXQuDQoNCkknbSBmaW5lIHdpdGggdGhpcyB2ZXJzaW9uLiBTaG91bGQgaXQg
YmUgYSBzdGFibGUgZml4PyBJZiBzbywgZG8gd2UNCmhhdmUgYSAiRml4ZXM6IiBjb21taXQgZm9y
IGl0Pw0KDQotLSANClRyb25kIE15a2xlYnVzdA0KTGludXggTkZTIGNsaWVudCBtYWludGFpbmVy
LCBQcmltYXJ5RGF0YQ0KdHJvbmQubXlrbGVidXN0QHByaW1hcnlkYXRhLmNvbQ0K

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Mayhew Jan. 12, 2018, 5:39 p.m. UTC | #3
On Thu, 11 Jan 2018, Trond Myklebust wrote:

> On Thu, 2018-01-11 at 08:10 -0500, Scott Mayhew wrote:
> > Trond, Anna,
> > 
> > Any concerns with this patch?
> > 
> > We left this run on multiple machines through the holidays... all
> > doing
> > direct I/O to a Netapp filer while doing storage failovers on an
> > hourly
> > basis on the filer.  No more issues with processes getting stuck in
> > inode_dio_wait.
> 
> I'm fine with this version. Should it be a stable fix? If so, do we
> have a "Fixes:" commit for it?

Yeah it should probably be a stable fix.  The hdr->release() call was
added in commit 1ca018d28d ("pNFS: Fix a memory leak when attempted pnfs
fails")... but I can reproduce this on earlier kernels than that (I
tested 4.0) so it looks like this goes way back...

-Scott
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index d602fe9..eb098cc 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -2255,7 +2255,7 @@  pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
 		nfs_pageio_reset_write_mds(desc);
 		mirror->pg_recoalesce = 1;
 	}
-	hdr->release(hdr);
+	hdr->completion_ops->completion(hdr);
 }
 
 static enum pnfs_try_status
@@ -2378,7 +2378,7 @@  pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
 		nfs_pageio_reset_read_mds(desc);
 		mirror->pg_recoalesce = 1;
 	}
-	hdr->release(hdr);
+	hdr->completion_ops->completion(hdr);
 }
 
 /*