Message ID | 1385402270-14284-2-git-send-email-andros@netapp.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote: > From: Andy Adamson <andros@netapp.com> > > The state manager is recovering expired state and recovery OPENs are being > processed. If kswapd is pruning inodes at the same time, a deadlock can occur > when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the > resultant layoutreturn gets an error that the state mangager is to handle, > causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. > > At the same time an open is waiting for the inode deletion to complete in > __wait_on_freeing_inode. > > If the open is either the open called by the state manager, or an open from > the same open owner that is holding the NFSv4.0 sequence id which causes the > OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, > then the state is deadlocked with kswapd. > > Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode. Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost. IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases? Cheers Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Nov 25, 2013, at 1:13 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote: > >> From: Andy Adamson <andros@netapp.com> >> >> The state manager is recovering expired state and recovery OPENs are being >> processed. If kswapd is pruning inodes at the same time, a deadlock can occur >> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the >> resultant layoutreturn gets an error that the state mangager is to handle, >> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. >> >> At the same time an open is waiting for the inode deletion to complete in >> __wait_on_freeing_inode. >> >> If the open is either the open called by the state manager, or an open from >> the same open owner that is holding the NFSv4.0 sequence id which causes the >> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, >> then the state is deadlocked with kswapd. >> >> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode. > > Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost. > > IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases? Yeah, I was thinking about this as well - perhaps recovering from session-level errors or grace/delay errors would be useful for the block client. -->Andy > > Cheers > Trond > > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Nov 25, 2013, at 13:17, Adamson, Andy <William.Adamson@netapp.com> wrote: > > On Nov 25, 2013, at 1:13 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> > wrote: > >> >> On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote: >> >>> From: Andy Adamson <andros@netapp.com> >>> >>> The state manager is recovering expired state and recovery OPENs are being >>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur >>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the >>> resultant layoutreturn gets an error that the state mangager is to handle, >>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. >>> >>> At the same time an open is waiting for the inode deletion to complete in >>> __wait_on_freeing_inode. >>> >>> If the open is either the open called by the state manager, or an open from >>> the same open owner that is holding the NFSv4.0 sequence id which causes the >>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, >>> then the state is deadlocked with kswapd. >>> >>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode. >> >> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost. >> >> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases? > > Yeah, I was thinking about this as well - perhaps recovering from session-level errors or grace/delay errors would be useful for the block client. NFS4ERR_DELAY, probably, yes. NFS4ERR_GRACE, no… That’s a reboot situation As for session level errors, I’d say that complicates things too much, since several of those can basically end up masking a NFS4ERR_STALE_CLIENTID error. Either way, all the layout types (including blocks) should be able to continue on even if we miss a layout return or two. The server has to be coded to expect a forgetful client. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Nov 25, 2013, at 1:28 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > On Nov 25, 2013, at 13:17, Adamson, Andy <William.Adamson@netapp.com> wrote: > >> >> On Nov 25, 2013, at 1:13 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> >> wrote: >> >>> >>> On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote: >>> >>>> From: Andy Adamson <andros@netapp.com> >>>> >>>> The state manager is recovering expired state and recovery OPENs are being >>>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur >>>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the >>>> resultant layoutreturn gets an error that the state mangager is to handle, >>>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. >>>> >>>> At the same time an open is waiting for the inode deletion to complete in >>>> __wait_on_freeing_inode. >>>> >>>> If the open is either the open called by the state manager, or an open from >>>> the same open owner that is holding the NFSv4.0 sequence id which causes the >>>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, >>>> then the state is deadlocked with kswapd. >>>> >>>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode. >>> >>> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost. >>> >>> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases? >> >> Yeah, I was thinking about this as well - perhaps recovering from session-level errors or grace/delay errors would be useful for the block client. > > NFS4ERR_DELAY, probably, yes. > > NFS4ERR_GRACE, no… That’s a reboot situation > > As for session level errors, I’d say that complicates things too much, since several of those can basically end up masking a NFS4ERR_STALE_CLIENTID error. > > > Either way, all the layout types (including blocks) should be able to continue on even if we miss a layout return or two. The server has to be coded to expect a forgetful client. OK - I'll resend the patch. -->Andy > > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Nov 25, 2013, at 13:13, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: > > On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote: > >> From: Andy Adamson <andros@netapp.com> >> >> The state manager is recovering expired state and recovery OPENs are being >> processed. If kswapd is pruning inodes at the same time, a deadlock can occur >> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the >> resultant layoutreturn gets an error that the state mangager is to handle, >> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. >> >> At the same time an open is waiting for the inode deletion to complete in >> __wait_on_freeing_inode. >> >> If the open is either the open called by the state manager, or an open from >> the same open owner that is holding the NFSv4.0 sequence id which causes the >> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, >> then the state is deadlocked with kswapd. >> >> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode. > > Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost. > > IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases? BTW: Is it possible that we might have a similar problem with delegreturn? That too can be called from nfs4_evict_inode... -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Nov 25, 2013, at 1:33 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > On Nov 25, 2013, at 13:13, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: > >> >> On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote: >> >>> From: Andy Adamson <andros@netapp.com> >>> >>> The state manager is recovering expired state and recovery OPENs are being >>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur >>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the >>> resultant layoutreturn gets an error that the state mangager is to handle, >>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. >>> >>> At the same time an open is waiting for the inode deletion to complete in >>> __wait_on_freeing_inode. >>> >>> If the open is either the open called by the state manager, or an open from >>> the same open owner that is holding the NFSv4.0 sequence id which causes the >>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, >>> then the state is deadlocked with kswapd. >>> >>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode. >> >> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost. >> >> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases? > > BTW: Is it possible that we might have a similar problem with delegreturn? That too can be called from nfs4_evict_inode… Yes, good point. kswapd could be waiting for a delegation to return which has an error along with the same scenario with sys_open and the state manager running. With delegreturn, we most definately want to limit 'no error handling' to the evict inode case. -->Andy > > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Nov 25, 2013, at 14:27, Adamson, Andy <William.Adamson@netapp.com> wrote: > > On Nov 25, 2013, at 1:33 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> > wrote: > >> >> On Nov 25, 2013, at 13:13, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: >> >>> >>> On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote: >>> >>>> From: Andy Adamson <andros@netapp.com> >>>> >>>> The state manager is recovering expired state and recovery OPENs are being >>>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur >>>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the >>>> resultant layoutreturn gets an error that the state mangager is to handle, >>>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. >>>> >>>> At the same time an open is waiting for the inode deletion to complete in >>>> __wait_on_freeing_inode. >>>> >>>> If the open is either the open called by the state manager, or an open from >>>> the same open owner that is holding the NFSv4.0 sequence id which causes the >>>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, >>>> then the state is deadlocked with kswapd. >>>> >>>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode. >>> >>> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost. >>> >>> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases? >> >> BTW: Is it possible that we might have a similar problem with delegreturn? That too can be called from nfs4_evict_inode… > > Yes, good point. kswapd could be waiting for a delegation to return which has an error along with the same scenario with sys_open and the state manager running. > > With delegreturn, we most definately want to limit 'no error handling' to the evict inode case. Ah… I forgot that the delegreturn in nfs4_evict_inode is asynchronous and doesn’t wait for completion, so it shouldn’t be a problem here. Cheers Trond-- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Nov 25, 2013, at 2:53 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > On Nov 25, 2013, at 14:27, Adamson, Andy <William.Adamson@netapp.com> wrote: > >> >> On Nov 25, 2013, at 1:33 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> >> wrote: >> >>> >>> On Nov 25, 2013, at 13:13, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: >>> >>>> >>>> On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote: >>>> >>>>> From: Andy Adamson <andros@netapp.com> >>>>> >>>>> The state manager is recovering expired state and recovery OPENs are being >>>>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur >>>>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the >>>>> resultant layoutreturn gets an error that the state mangager is to handle, >>>>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. >>>>> >>>>> At the same time an open is waiting for the inode deletion to complete in >>>>> __wait_on_freeing_inode. >>>>> >>>>> If the open is either the open called by the state manager, or an open from >>>>> the same open owner that is holding the NFSv4.0 sequence id which causes the >>>>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, >>>>> then the state is deadlocked with kswapd. >>>>> >>>>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode. >>>> >>>> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost. >>>> >>>> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases? >>> >>> BTW: Is it possible that we might have a similar problem with delegreturn? That too can be called from nfs4_evict_inode… >> >> Yes, good point. kswapd could be waiting for a delegation to return which has an error along with the same scenario with sys_open and the state manager running. >> >> With delegreturn, we most definately want to limit 'no error handling' to the evict inode case. > > Ah… I forgot that the delegreturn in nfs4_evict_inode is asynchronous and doesn’t wait for completion, so it shouldn’t be a problem here. Except we just changed that to fix a different state manager hang: commit 4a82fd7c4e78a1b7a224f9ae8bb7e1fd95f670e0 Author: Andy Adamson <andros@netapp.com> Date: Fri Nov 15 16:36:16 2013 -0500 NFSv4 wait on recovery for async session errors > > Cheers > Trond -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Nov 25, 2013, at 15:10, Adamson, Andy <William.Adamson@netapp.com> wrote: > > On Nov 25, 2013, at 2:53 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> > wrote: > >> >> On Nov 25, 2013, at 14:27, Adamson, Andy <William.Adamson@netapp.com> wrote: >> >>> >>> On Nov 25, 2013, at 1:33 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> >>> wrote: >>> >>>> >>>> On Nov 25, 2013, at 13:13, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: >>>> >>>>> >>>>> On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote: >>>>> >>>>>> From: Andy Adamson <andros@netapp.com> >>>>>> >>>>>> The state manager is recovering expired state and recovery OPENs are being >>>>>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur >>>>>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the >>>>>> resultant layoutreturn gets an error that the state mangager is to handle, >>>>>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. >>>>>> >>>>>> At the same time an open is waiting for the inode deletion to complete in >>>>>> __wait_on_freeing_inode. >>>>>> >>>>>> If the open is either the open called by the state manager, or an open from >>>>>> the same open owner that is holding the NFSv4.0 sequence id which causes the >>>>>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, >>>>>> then the state is deadlocked with kswapd. >>>>>> >>>>>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode. >>>>> >>>>> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost. >>>>> >>>>> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases? >>>> >>>> BTW: Is it possible that we might have a similar problem with delegreturn? That too can be called from nfs4_evict_inode… >>> >>> Yes, good point. kswapd could be waiting for a delegation to return which has an error along with the same scenario with sys_open and the state manager running. >>> >>> With delegreturn, we most definately want to limit 'no error handling' to the evict inode case. >> >> Ah… I forgot that the delegreturn in nfs4_evict_inode is asynchronous and doesn’t wait for completion, so it shouldn’t be a problem here. > > Except we just changed that to fix a different state manager hang: > > commit 4a82fd7c4e78a1b7a224f9ae8bb7e1fd95f670e0 > Author: Andy Adamson <andros@netapp.com> > Date: Fri Nov 15 16:36:16 2013 -0500 > > NFSv4 wait on recovery for async session errors Right, but that won’t prevent nfs4_evict_inode from completing, and hence the OPEN that is waiting in nfs_fhget() can also complete, and so there is no deadlock with the state manager thread. Cheers Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Nov 25, 2013, at 3:20 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > On Nov 25, 2013, at 15:10, Adamson, Andy <William.Adamson@netapp.com> wrote: > >> >> On Nov 25, 2013, at 2:53 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> >> wrote: >> >>> >>> On Nov 25, 2013, at 14:27, Adamson, Andy <William.Adamson@netapp.com> wrote: >>> >>>> >>>> On Nov 25, 2013, at 1:33 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> >>>> wrote: >>>> >>>>> >>>>> On Nov 25, 2013, at 13:13, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: >>>>> >>>>>> >>>>>> On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote: >>>>>> >>>>>>> From: Andy Adamson <andros@netapp.com> >>>>>>> >>>>>>> The state manager is recovering expired state and recovery OPENs are being >>>>>>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur >>>>>>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the >>>>>>> resultant layoutreturn gets an error that the state mangager is to handle, >>>>>>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. >>>>>>> >>>>>>> At the same time an open is waiting for the inode deletion to complete in >>>>>>> __wait_on_freeing_inode. >>>>>>> >>>>>>> If the open is either the open called by the state manager, or an open from >>>>>>> the same open owner that is holding the NFSv4.0 sequence id which causes the >>>>>>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, >>>>>>> then the state is deadlocked with kswapd. >>>>>>> >>>>>>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode. >>>>>> >>>>>> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost. >>>>>> >>>>>> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases? >>>>> >>>>> BTW: Is it possible that we might have a similar problem with delegreturn? That too can be called from nfs4_evict_inode… >>>> >>>> Yes, good point. kswapd could be waiting for a delegation to return which has an error along with the same scenario with sys_open and the state manager running. >>>> >>>> With delegreturn, we most definately want to limit 'no error handling' to the evict inode case. >>> >>> Ah… I forgot that the delegreturn in nfs4_evict_inode is asynchronous and doesn’t wait for completion, so it shouldn’t be a problem here. >> >> Except we just changed that to fix a different state manager hang: >> >> commit 4a82fd7c4e78a1b7a224f9ae8bb7e1fd95f670e0 >> Author: Andy Adamson <andros@netapp.com> >> Date: Fri Nov 15 16:36:16 2013 -0500 >> >> NFSv4 wait on recovery for async session errors > > Right, but that won’t prevent nfs4_evict_inode from completing, Ah - I was thinking of the synchronous handlers call to nfs4_wait_clnt_recover - so yes, no problem -->Andy > and hence the OPEN that is waiting in nfs_fhget() can also complete, and so there is no deadlock with the state manager thread. > > Cheers > Trond > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Nov 25, 2013, at 3:29 PM, "Adamson, Andy" <William.Adamson@netapp.com> wrote: > > On Nov 25, 2013, at 3:20 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> > wrote: > >> >> On Nov 25, 2013, at 15:10, Adamson, Andy <William.Adamson@netapp.com> wrote: >> >>> >>> On Nov 25, 2013, at 2:53 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> >>> wrote: >>> >>>> >>>> On Nov 25, 2013, at 14:27, Adamson, Andy <William.Adamson@netapp.com> wrote: >>>> >>>>> >>>>> On Nov 25, 2013, at 1:33 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> >>>>> wrote: >>>>> >>>>>> >>>>>> On Nov 25, 2013, at 13:13, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: >>>>>> >>>>>>> >>>>>>> On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote: >>>>>>> >>>>>>>> From: Andy Adamson <andros@netapp.com> >>>>>>>> >>>>>>>> The state manager is recovering expired state and recovery OPENs are being >>>>>>>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur >>>>>>>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the >>>>>>>> resultant layoutreturn gets an error that the state mangager is to handle, >>>>>>>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. >>>>>>>> >>>>>>>> At the same time an open is waiting for the inode deletion to complete in >>>>>>>> __wait_on_freeing_inode. >>>>>>>> >>>>>>>> If the open is either the open called by the state manager, or an open from >>>>>>>> the same open owner that is holding the NFSv4.0 sequence id which causes the >>>>>>>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, >>>>>>>> then the state is deadlocked with kswapd. >>>>>>>> >>>>>>>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode. >>>>>>> >>>>>>> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost. >>>>>>> >>>>>>> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases? >>>>>> >>>>>> BTW: Is it possible that we might have a similar problem with delegreturn? That too can be called from nfs4_evict_inode… >>>>> >>>>> Yes, good point. kswapd could be waiting for a delegation to return which has an error along with the same scenario with sys_open and the state manager running. >>>>> >>>>> With delegreturn, we most definately want to limit 'no error handling' to the evict inode case. >>>> >>>> Ah… I forgot that the delegreturn in nfs4_evict_inode is asynchronous and doesn’t wait for completion, so it shouldn’t be a problem here. >>> >>> Except we just changed that to fix a different state manager hang: >>> >>> commit 4a82fd7c4e78a1b7a224f9ae8bb7e1fd95f670e0 >>> Author: Andy Adamson <andros@netapp.com> >>> Date: Fri Nov 15 16:36:16 2013 -0500 >>> >>> NFSv4 wait on recovery for async session errors >> >> Right, but that won’t prevent nfs4_evict_inode from completing, > > Ah - I was thinking of the synchronous handlers call to nfs4_wait_clnt_recover - so yes, no problem In fact, this issue is NOT an upstream issue! RHEL6.5-pre has nfs4_proc_layoutreturn as as SYNC rpc call, and _that_ is the bug that is fixed upstream. Really sorry for the confusion. I'll back port a solution for RHEL6.5 -->Andy > > -->Andy > >> and hence the OPEN that is waiting in nfs_fhget() can also complete, and so there is no deadlock with the state manager thread. > >> >> Cheers >> Trond >> -- >> Trond Myklebust >> Linux NFS client maintainer >> >> NetApp >> Trond.Myklebust@netapp.com >> www.netapp.com >> > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Nov 25, 2013, at 3:51 PM, "Adamson, Andy" <William.Adamson@netapp.com> wrote: > > On Nov 25, 2013, at 3:29 PM, "Adamson, Andy" <William.Adamson@netapp.com> > wrote: > >> >> On Nov 25, 2013, at 3:20 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> >> wrote: >> >>> >>> On Nov 25, 2013, at 15:10, Adamson, Andy <William.Adamson@netapp.com> wrote: >>> >>>> >>>> On Nov 25, 2013, at 2:53 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> >>>> wrote: >>>> >>>>> >>>>> On Nov 25, 2013, at 14:27, Adamson, Andy <William.Adamson@netapp.com> wrote: >>>>> >>>>>> >>>>>> On Nov 25, 2013, at 1:33 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> On Nov 25, 2013, at 13:13, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote: >>>>>>>> >>>>>>>>> From: Andy Adamson <andros@netapp.com> >>>>>>>>> >>>>>>>>> The state manager is recovering expired state and recovery OPENs are being >>>>>>>>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur >>>>>>>>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the >>>>>>>>> resultant layoutreturn gets an error that the state mangager is to handle, >>>>>>>>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. >>>>>>>>> >>>>>>>>> At the same time an open is waiting for the inode deletion to complete in >>>>>>>>> __wait_on_freeing_inode. >>>>>>>>> >>>>>>>>> If the open is either the open called by the state manager, or an open from >>>>>>>>> the same open owner that is holding the NFSv4.0 sequence id which causes the >>>>>>>>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, >>>>>>>>> then the state is deadlocked with kswapd. >>>>>>>>> >>>>>>>>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode. >>>>>>>> >>>>>>>> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost. >>>>>>>> >>>>>>>> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases? >>>>>>> >>>>>>> BTW: Is it possible that we might have a similar problem with delegreturn? That too can be called from nfs4_evict_inode… >>>>>> >>>>>> Yes, good point. kswapd could be waiting for a delegation to return which has an error along with the same scenario with sys_open and the state manager running. >>>>>> >>>>>> With delegreturn, we most definately want to limit 'no error handling' to the evict inode case. >>>>> >>>>> Ah… I forgot that the delegreturn in nfs4_evict_inode is asynchronous and doesn’t wait for completion, so it shouldn’t be a problem here. >>>> >>>> Except we just changed that to fix a different state manager hang: >>>> >>>> commit 4a82fd7c4e78a1b7a224f9ae8bb7e1fd95f670e0 >>>> Author: Andy Adamson <andros@netapp.com> >>>> Date: Fri Nov 15 16:36:16 2013 -0500 >>>> >>>> NFSv4 wait on recovery for async session errors >>> >>> Right, but that won’t prevent nfs4_evict_inode from completing, >> >> Ah - I was thinking of the synchronous handlers call to nfs4_wait_clnt_recover - so yes, no problem > > In fact, this issue is NOT an upstream issue! RHEL6.5-pre has nfs4_proc_layoutreturn as as SYNC rpc call, and _that_ is the bug that is fixed upstream. > > Really sorry for the confusion. I'll back port a solution for RHEL6.5 please ignore this message. > > -->Andy > > >> >> -->Andy >> >>> and hence the OPEN that is waiting in nfs_fhget() can also complete, and so there is no deadlock with the state manager thread. >> >>> >>> Cheers >>> Trond >>> -- >>> Trond Myklebust >>> Linux NFS client maintainer >>> >>> NetApp >>> Trond.Myklebust@netapp.com >>> www.netapp.com >>> >> > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Nov 25, 2013, at 15:51, Adamson, Andy <William.Adamson@netapp.com> wrote: > > On Nov 25, 2013, at 3:29 PM, "Adamson, Andy" <William.Adamson@netapp.com> > wrote: > >> >> On Nov 25, 2013, at 3:20 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> >> wrote: >> >>> >>> On Nov 25, 2013, at 15:10, Adamson, Andy <William.Adamson@netapp.com> wrote: >>> >>>> >>>> On Nov 25, 2013, at 2:53 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> >>>> wrote: >>>> >>>>> >>>>> On Nov 25, 2013, at 14:27, Adamson, Andy <William.Adamson@netapp.com> wrote: >>>>> >>>>>> >>>>>> On Nov 25, 2013, at 1:33 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> On Nov 25, 2013, at 13:13, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote: >>>>>>>> >>>>>>>>> From: Andy Adamson <andros@netapp.com> >>>>>>>>> >>>>>>>>> The state manager is recovering expired state and recovery OPENs are being >>>>>>>>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur >>>>>>>>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the >>>>>>>>> resultant layoutreturn gets an error that the state mangager is to handle, >>>>>>>>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. >>>>>>>>> >>>>>>>>> At the same time an open is waiting for the inode deletion to complete in >>>>>>>>> __wait_on_freeing_inode. >>>>>>>>> >>>>>>>>> If the open is either the open called by the state manager, or an open from >>>>>>>>> the same open owner that is holding the NFSv4.0 sequence id which causes the >>>>>>>>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, >>>>>>>>> then the state is deadlocked with kswapd. >>>>>>>>> >>>>>>>>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode. >>>>>>>> >>>>>>>> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost. >>>>>>>> >>>>>>>> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases? >>>>>>> >>>>>>> BTW: Is it possible that we might have a similar problem with delegreturn? That too can be called from nfs4_evict_inode… >>>>>> >>>>>> Yes, good point. kswapd could be waiting for a delegation to return which has an error along with the same scenario with sys_open and the state manager running. >>>>>> >>>>>> With delegreturn, we most definately want to limit 'no error handling' to the evict inode case. >>>>> >>>>> Ah… I forgot that the delegreturn in nfs4_evict_inode is asynchronous and doesn’t wait for completion, so it shouldn’t be a problem here. >>>> >>>> Except we just changed that to fix a different state manager hang: >>>> >>>> commit 4a82fd7c4e78a1b7a224f9ae8bb7e1fd95f670e0 >>>> Author: Andy Adamson <andros@netapp.com> >>>> Date: Fri Nov 15 16:36:16 2013 -0500 >>>> >>>> NFSv4 wait on recovery for async session errors >>> >>> Right, but that won’t prevent nfs4_evict_inode from completing, >> >> Ah - I was thinking of the synchronous handlers call to nfs4_wait_clnt_recover - so yes, no problem > > In fact, this issue is NOT an upstream issue! RHEL6.5-pre has nfs4_proc_layoutreturn as as SYNC rpc call, and _that_ is the bug that is fixed upstream. > > Really sorry for the confusion. I'll back port a solution for RHEL6.5 Are you sure? As far as I can tell, the upstream nfs4_proc_layoutreturn is also synchronous. I therefore suspect that we still have the same problem. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index ca36d0d..fbeadf3 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -7596,6 +7596,12 @@ static void nfs4_layoutreturn_done(struct rpc_task *task, void *calldata) return; server = NFS_SERVER(lrp->args.inode); + + /* Error handling can dead lock the state manager running open + * recovery when kswapd is also pruning inodes. */ + if (lrp->ino_freeing) + return; + if (nfs4_async_handle_error(task, server, NULL) == -EAGAIN) { rpc_restart_call_prepare(task); return; diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c index d75d938..a55ebf2 100644 --- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -819,7 +819,7 @@ _pnfs_return_layout(struct inode *ino) LIST_HEAD(tmp_list); struct nfs4_layoutreturn *lrp; nfs4_stateid stateid; - int status = 0, empty; + int status = 0, empty, freeing = 0; dprintk("NFS: %s for inode %lu\n", __func__, ino->i_ino); @@ -844,6 +844,8 @@ _pnfs_return_layout(struct inode *ino) goto out; } lo->plh_block_lgets++; + if (ino->i_state & I_FREEING) + freeing = 1; spin_unlock(&ino->i_lock); pnfs_free_lseg_list(&tmp_list); @@ -863,6 +865,7 @@ _pnfs_return_layout(struct inode *ino) lrp->args.layout = lo; lrp->clp = NFS_SERVER(ino)->nfs_client; lrp->cred = lo->plh_lc_cred; + lrp->ino_freeing = freeing; status = nfs4_proc_layoutreturn(lrp); out: diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 3ccfcec..b815345 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -316,6 +316,7 @@ struct nfs4_layoutreturn { struct nfs4_layoutreturn_res res; struct rpc_cred *cred; struct nfs_client *clp; + int ino_freeing; int rpc_status; };