
[4/4] NFSv4.1: pNFS has no business touching the slot table wait queue...

Message ID 1349718630-26008-4-git-send-email-Trond.Myklebust@netapp.com (mailing list archive)
State New, archived

Commit Message

Trond Myklebust Oct. 8, 2012, 5:50 p.m. UTC
Aside from it being a layering violation, there is no point in doing so:
- If slots are available, then the waitqueue will be empty.
- If no slots are available, then the wake up just causes waitqueue churn.
- If the slot table is being drained, then the state recovery machinery
  will manage the waitqueue using the calls to nfs4_begin/end_drain_session.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/nfs4filelayout.c | 3 ---
 1 file changed, 3 deletions(-)
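
[Editorial note] For context, a minimal sketch of the slot-table behaviour the three
points in the commit message rely on. It is illustrative only: the example_* helpers
are invented, headers and error handling are omitted, and this is not the actual
fs/nfs code; only rpc_sleep_on()/rpc_wake_up() and the slot_tbl_waitq/slot_tbl_lock
fields are real.

    /* Illustrative sketch only -- example_* helpers are invented. */
    static int example_setup_sequence(struct nfs4_slot_table *tbl,
                                      struct rpc_task *task)
    {
            spin_lock(&tbl->slot_tbl_lock);
            if (example_slot_is_free(tbl)) {
                    /* Slots available: nothing ever sleeps on slot_tbl_waitq,
                     * so an extra rpc_wake_up() finds the queue empty. */
                    example_take_slot(tbl);
                    spin_unlock(&tbl->slot_tbl_lock);
                    return 0;
            }
            /* No slots available: the task sleeps here and is woken as soon
             * as another request frees its slot, with or without the pNFS
             * wake-up.  While the session is being drained, state recovery
             * parks tasks on this same queue via nfs4_begin_drain_session()
             * and releases them in nfs4_end_drain_session(). */
            rpc_sleep_on(&tbl->slot_tbl_waitq, task, NULL);
            spin_unlock(&tbl->slot_tbl_lock);
            return -EAGAIN;
    }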

Comments

William A. (Andy) Adamson Oct. 8, 2012, 7:59 p.m. UTC | #1
On Mon, Oct 8, 2012 at 1:50 PM, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
> Aside from it being a layering violation, there is no point in doing so:
> - If slots are available, then the waitqueue will be empty.
> - If no slots are available, then the wake up just causes waitqueue churn

We call rpc_wake_up so that I/O waiting on the DS session, all of
which needs to be re-directed to the MDS (not via the state recovery
machinery), will by-pass the RPC machine "try each task one at a time
and redirect on failure" which includes the 60 second TCP timeout on
DS connection failures.

So, it doesn't just cause waitqueue churn, it significantly reduces
the time it takes to recover to the MDS.

-->Andy



> - If the slot table is being drained, then the state recovery machinery
>   will manage the waitqueue using the calls to nfs4_begin/end_drain_session.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>  fs/nfs/nfs4filelayout.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
> index 52d8472..368f11f 100644
> --- a/fs/nfs/nfs4filelayout.c
> +++ b/fs/nfs/nfs4filelayout.c
> @@ -131,7 +131,6 @@ static int filelayout_async_handle_error(struct rpc_task *task,
>         struct nfs_server *mds_server = NFS_SERVER(inode);
>         struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
>         struct nfs_client *mds_client = mds_server->nfs_client;
> -       struct nfs4_slot_table *tbl = &clp->cl_session->fc_slot_table;
>
>         if (task->tk_status >= 0)
>                 return 0;
> @@ -191,7 +190,6 @@ static int filelayout_async_handle_error(struct rpc_task *task,
>                  * layout is destroyed and a new valid layout is obtained.
>                  */
>                 pnfs_destroy_layout(NFS_I(inode));
> -               rpc_wake_up(&tbl->slot_tbl_waitq);
>                 goto reset;
>         /* RPC connection errors */
>         case -ECONNREFUSED:
> @@ -206,7 +204,6 @@ static int filelayout_async_handle_error(struct rpc_task *task,
>                 nfs4_mark_deviceid_unavailable(devid);
>                 clear_bit(NFS_INO_LAYOUTCOMMIT, &NFS_I(inode)->flags);
>                 _pnfs_return_layout(inode);
> -               rpc_wake_up(&tbl->slot_tbl_waitq);
>                 nfs4_ds_disconnect(clp);
>                 /* fall through */
>         default:
> --
> 1.7.11.4
>
Trond Myklebust Oct. 8, 2012, 8:04 p.m. UTC | #2
On Mon, 2012-10-08 at 15:59 -0400, Andy Adamson wrote:
> On Mon, Oct 8, 2012 at 1:50 PM, Trond Myklebust
> <Trond.Myklebust@netapp.com> wrote:
> > Aside from it being a layering violation, there is no point in doing so:
> > - If slots are available, then the waitqueue will be empty.
> > - If no slots are available, then the wake up just causes waitqueue churn
>
> We call rpc_wake_up so that I/O waiting on the DS session, all of
> which needs to be re-directed to the MDS (not via the state recovery
> machinery), will by-pass the RPC machine "try each task one at a time
> and redirect on failure" which includes the 60 second TCP timeout on
> DS connection failures.
>
> So, it doesn't just cause waitqueue churn, it significantly reduces
> the time it takes to recover to the MDS.
>

All it does is a wake up... That will happen _ANYWAY_ as soon as a slot
is made available.

So yes, it is pure churn...

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
William A. (Andy) Adamson Oct. 8, 2012, 8:24 p.m. UTC | #3
On Mon, Oct 8, 2012 at 4:04 PM, Myklebust, Trond
<Trond.Myklebust@netapp.com> wrote:
> On Mon, 2012-10-08 at 15:59 -0400, Andy Adamson wrote:
>> On Mon, Oct 8, 2012 at 1:50 PM, Trond Myklebust
>> <Trond.Myklebust@netapp.com> wrote:
>> > Aside from it being a layering violation, there is no point in doing so:
>> > - If slots are available, then the waitqueue will be empty.
>> > - If no slots are available, then the wake up just causes waitqueue churn
>>
>> We call rpc_wake_up so that I/O waiting on the DS session, all of
>> which needs to be re-directed to the MDS (not via the state recovery
>> machinery), will by-pass the RPC machine "try each task one at a time
>> and redirect on failure" which includes the 60 second TCP timeout on
>> DS connection failures.
>>
>> So, it doesn't just cause waitqueue churn, it significantly reduces
>> the time it takes to recover to the MDS.
>>
>
> All it does is a wake up... That will happen _ANYWAY_ as soon as a slot
> is made available.

That is true. But why wait? Especially when all tasks are I/O, and if
the application happens to be doing a heavy I/O workload, there can be
hundreds of tasks on that wait queue.

We don't care if any DS slots are available. We have given up and are
redirecting these tasks to the MDS session. The sooner, the better.

I'll re-run some tests and demonstrate the time difference
with/without the rpc_wake_up.

-->Andy

>
> So yes, it is pure churn...
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com
Trond Myklebust Oct. 8, 2012, 9:25 p.m. UTC | #4
On Mon, 2012-10-08 at 16:24 -0400, Andy Adamson wrote:
> On Mon, Oct 8, 2012 at 4:04 PM, Myklebust, Trond
> <Trond.Myklebust@netapp.com> wrote:
> > On Mon, 2012-10-08 at 15:59 -0400, Andy Adamson wrote:
> >> On Mon, Oct 8, 2012 at 1:50 PM, Trond Myklebust
> >> <Trond.Myklebust@netapp.com> wrote:
> >> > Aside from it being a layering violation, there is no point in doing so:
> >> > - If slots are available, then the waitqueue will be empty.
> >> > - If no slots are available, then the wake up just causes waitqueue churn
> >>
> >> We call rpc_wake_up so that I/O waiting on the DS session, all of
> >> which needs to be re-directed to the MDS (not via the state recovery
> >> machinery), will by-pass the RPC machine "try each task one at a time
> >> and redirect on failure" which includes the 60 second TCP timeout on
> >> DS connection failures.
> >>
> >> So, it doesn't just cause waitqueue churn, it significantly reduces
> >> the time it takes to recover to the MDS.
> >>
> >
> > All it does is a wake up... That will happen _ANYWAY_ as soon as a slot
> > is made available.
>
> That is true. But why wait? Especially when all tasks are I/O, and if
> the application happens to be doing a heavy I/O workload, there can be
> hundreds of tasks on that wait queue.
>
> We don't care if any DS slots are available. We have given up and are
> redirecting these tasks to the MDS session. The sooner, the better.

You aren't just redirecting those tasks, you are waking up _all_ tasks
on that nfs_client's session.

That _may_ be acceptable in the case where we call
nfs4_mark_deviceid_unavailable(), _if_ the nfs_client is a pure data
server and isn't also acting as an MDS for some filesystem.

It is clearly overkill in the case where the DS is returning
NFS4ERR_FHEXPIRED or some such error that applies only to this one
layout.

> I'll re-run some tests and demonstrate the time difference
> with/without the rpc_wake_up.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
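
[Editorial note] To make the fan-out point above concrete: rpc_wake_up() takes only
the wait queue, so there is no way to restrict the wake-up to tasks belonging to the
failed layout or device. A minimal sketch, with assumed context and headers omitted:

    /* Illustrative only: waking the fore-channel slot table wait queue wakes
     * every task sleeping on it, whether or not its I/O was bound for the
     * failed DS or layout. */
    static void example_wake_everything(struct rpc_wait_queue *slot_tbl_waitq)
    {
            rpc_wake_up(slot_tbl_waitq);    /* no per-layout or per-inode filter */
    }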

Patch

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 52d8472..368f11f 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -131,7 +131,6 @@  static int filelayout_async_handle_error(struct rpc_task *task,
 	struct nfs_server *mds_server = NFS_SERVER(inode);
 	struct nfs4_deviceid_node *devid = FILELAYOUT_DEVID_NODE(lseg);
 	struct nfs_client *mds_client = mds_server->nfs_client;
-	struct nfs4_slot_table *tbl = &clp->cl_session->fc_slot_table;
 
 	if (task->tk_status >= 0)
 		return 0;
@@ -191,7 +190,6 @@  static int filelayout_async_handle_error(struct rpc_task *task,
 		 * layout is destroyed and a new valid layout is obtained.
 		 */
 		pnfs_destroy_layout(NFS_I(inode));
-		rpc_wake_up(&tbl->slot_tbl_waitq);
 		goto reset;
 	/* RPC connection errors */
 	case -ECONNREFUSED:
@@ -206,7 +204,6 @@  static int filelayout_async_handle_error(struct rpc_task *task,
 		nfs4_mark_deviceid_unavailable(devid);
 		clear_bit(NFS_INO_LAYOUTCOMMIT, &NFS_I(inode)->flags);
 		_pnfs_return_layout(inode);
-		rpc_wake_up(&tbl->slot_tbl_waitq);
 		nfs4_ds_disconnect(clp);
 		/* fall through */
 	default: