[RFC] fix parallelism for rpc tasks

Message ID	CAN-5tyEX1Z_3EB3h6=z_S1E=gpTObOrcP5Ub2HmVKBB5RaU1DQ@mail.gmail.com (mailing list archive)
State	New, archived
Headers
Return-Path: <linux-nfs-owner@kernel.org>
MIME-Version: 1.0
From: Olga Kornievskaia <aglo@umich.edu>
Date: Thu, 29 Jun 2017 09:25:36 -0400
Message-ID: <CAN-5tyEX1Z_3EB3h6=z_S1E=gpTObOrcP5Ub2HmVKBB5RaU1DQ@mail.gmail.com>
Subject: [RFC] fix parallelism for rpc tasks
To: linux-nfs <linux-nfs@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk

Olga Kornievskaia June 29, 2017, 1:25 p.m. UTC

Hi folks,

On a multi-core machine, is it expected that we can have parallel RPCs
handled by each of the per-core workqueues?

In testing a read workload, I observe via the "top" command that a single
"kworker" thread is servicing the requests (no parallelism).
It's more prominent when doing these operations over a krb5p mount.

Bruce suggested trying this patch, and with it applied my testing
shows the read workload spread among all the kworker threads.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Trond Myklebust July 3, 2017, 2:58 p.m. UTC | #1

On Thu, 2017-06-29 at 09:25 -0400, Olga Kornievskaia wrote:
> Hi folks,
> 
> On a multi-core machine, is it expected that we can have parallel
> RPCs
> handled by each of the per-core workqueue?
> 
> In testing a read workload, observing via "top" command that a single
> "kworker" thread is running servicing the requests (no parallelism).
> It's more prominent while doing these operations over krb5p mount.
> 
> What has been suggested by Bruce is to try this and in my testing I
> see then the read workload spread among all the kworker threads.
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> 
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 0cc8383..f80e688 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -1095,7 +1095,7 @@ static int rpciod_start(void)
>   * Create the rpciod thread and wait for it to start.
>   */
>   dprintk("RPC:       creating workqueue rpciod\n");
> - wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 0);
> + wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
>   if (!wq)
>   goto out_failed;
>   rpciod_workqueue = wq;
> 

WQ_UNBOUND turns off concurrency management on the thread pool (See
Documentation/core-api/workqueue.rst. It also means we contend for work
item queuing/dequeuing locks, since the threads which run the work
items are not bound to a CPU.

IOW: This is not a slam-dunk obvious gain.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


Olga Kornievskaia July 5, 2017, 2:44 p.m. UTC | #2

On Mon, Jul 3, 2017 at 10:58 AM, Trond Myklebust
<trondmy@primarydata.com> wrote:
> On Thu, 2017-06-29 at 09:25 -0400, Olga Kornievskaia wrote:
>> Hi folks,
>>
>> On a multi-core machine, is it expected that we can have parallel
>> RPCs
>> handled by each of the per-core workqueue?
>>
>> In testing a read workload, observing via "top" command that a single
>> "kworker" thread is running servicing the requests (no parallelism).
>> It's more prominent while doing these operations over krb5p mount.
>>
>> What has been suggested by Bruce is to try this and in my testing I
>> see then the read workload spread among all the kworker threads.
>>
>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>>
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index 0cc8383..f80e688 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -1095,7 +1095,7 @@ static int rpciod_start(void)
>>   * Create the rpciod thread and wait for it to start.
>>   */
>>   dprintk("RPC:       creating workqueue rpciod\n");
>> - wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 0);
>> + wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
>>   if (!wq)
>>   goto out_failed;
>>   rpciod_workqueue = wq;
>>
>
> WQ_UNBOUND turns off concurrency management on the thread pool (See
> Documentation/core-api/workqueue.rst. It also means we contend for work
> item queuing/dequeuing locks, since the threads which run the work
> items are not bound to a CPU.
>
> IOW: This is not a slam-dunk obvious gain.

I agree, but I think it's worth considering. I'm waiting to get
(real) performance numbers showing the improvement (instead of numbers
from my VM setup) to help my case. However, a 90% degradation in read
performance over krb5p was reported when one CPU is executing all ops.

Is there a different way to make sure that on a multi-processor
machine we can take advantage of all available CPUs? Simple kernel
threads instead of a work queue?

Can/should we have a WQ_UNBOUND work queue for secure mounts and
another queue for other mounts?

While I wouldn't call the krb5 load long-running, the documentation
gives CPU-intensive workloads as an example use of WQ_UNBOUND. It also
says that, in general, "work items are not expected to hog a CPU and
consume many cycles". How "many" is too "many"? How many operations are
crypto operations?

Chuck Lever III July 5, 2017, 3:11 p.m. UTC | #3

> On Jul 5, 2017, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> 
> On Mon, Jul 3, 2017 at 10:58 AM, Trond Myklebust
> <trondmy@primarydata.com> wrote:
>> On Thu, 2017-06-29 at 09:25 -0400, Olga Kornievskaia wrote:
>>> Hi folks,
>>> 
>>> On a multi-core machine, is it expected that we can have parallel
>>> RPCs
>>> handled by each of the per-core workqueue?
>>> 
>>> In testing a read workload, observing via "top" command that a single
>>> "kworker" thread is running servicing the requests (no parallelism).
>>> It's more prominent while doing these operations over krb5p mount.
>>> 
>>> What has been suggested by Bruce is to try this and in my testing I
>>> see then the read workload spread among all the kworker threads.
>>> 
>>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>>> 
>>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>> index 0cc8383..f80e688 100644
>>> --- a/net/sunrpc/sched.c
>>> +++ b/net/sunrpc/sched.c
>>> @@ -1095,7 +1095,7 @@ static int rpciod_start(void)
>>>  * Create the rpciod thread and wait for it to start.
>>>  */
>>>  dprintk("RPC:       creating workqueue rpciod\n");
>>> - wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 0);
>>> + wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
>>>  if (!wq)
>>>  goto out_failed;
>>>  rpciod_workqueue = wq;
>>> 
>> 
>> WQ_UNBOUND turns off concurrency management on the thread pool (See
>> Documentation/core-api/workqueue.rst. It also means we contend for work
>> item queuing/dequeuing locks, since the threads which run the work
>> items are not bound to a CPU.
>> 
>> IOW: This is not a slam-dunk obvious gain.
> 
> I agree but I think it's worth consideration. I'm waiting to get
> (real) performance numbers of improvement (instead of my VM setup) to
> help my case. However, it was reported 90% degradation for the read
> performance over krb5p when 1CPU is executing all ops.
> 
> Is there a different way to make sure that on a multi-processor
> machine we can take advantage of all available CPUs? Simple kernel
> threads instead of a work queue?

There is a trade-off between spreading the work, and ensuring it
is executed on a CPU close to the I/O and application. IMO UNBOUND
is a good way to do that. UNBOUND will attempt to schedule the
work on the preferred CPU, but allow it to be migrated if that
CPU is busy.

The advantage of this is that when the client workload is CPU
intensive (say, a software build), RPC client work can be scheduled
and run more quickly, which reduces latency.


> Can/should we have an WQ_UNBOUND work queue for secure mounts and
> another queue for other mounts?
> 
> While I wouldn't call krb5 load long running, Documentation says that
> an example for WQ_UNBOUND is for CPU intensive workloads. And also in
> general "work items are not expected to hog a CPU and consume many
> cycles". How "many" is too "many". How many operations are crypto
> operations?

--
Chuck Lever




Trond Myklebust July 5, 2017, 3:46 p.m. UTC | #4

On Wed, 2017-07-05 at 11:11 -0400, Chuck Lever wrote:
> > On Jul 5, 2017, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu>
> > wrote:
> >
> > On Mon, Jul 3, 2017 at 10:58 AM, Trond Myklebust
> > <trondmy@primarydata.com> wrote:
> > > On Thu, 2017-06-29 at 09:25 -0400, Olga Kornievskaia wrote:
> > > > Hi folks,
> > > >
> > > > On a multi-core machine, is it expected that we can have
> > > > parallel
> > > > RPCs
> > > > handled by each of the per-core workqueue?
> > > >
> > > > In testing a read workload, observing via "top" command that a
> > > > single
> > > > "kworker" thread is running servicing the requests (no
> > > > parallelism).
> > > > It's more prominent while doing these operations over krb5p
> > > > mount.
> > > >
> > > > What has been suggested by Bruce is to try this and in my
> > > > testing I
> > > > see then the read workload spread among all the kworker
> > > > threads.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > >
> > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > > index 0cc8383..f80e688 100644
> > > > --- a/net/sunrpc/sched.c
> > > > +++ b/net/sunrpc/sched.c
> > > > @@ -1095,7 +1095,7 @@ static int rpciod_start(void)
> > > >  * Create the rpciod thread and wait for it to start.
> > > >  */
> > > >  dprintk("RPC:       creating workqueue rpciod\n");
> > > > - wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 0);
> > > > + wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM | WQ_UNBOUND,
> > > > 0);
> > > >  if (!wq)
> > > >  goto out_failed;
> > > >  rpciod_workqueue = wq;
> > > >
> > >
> > > WQ_UNBOUND turns off concurrency management on the thread pool
> > > (See
> > > Documentation/core-api/workqueue.rst. It also means we contend
> > > for work
> > > item queuing/dequeuing locks, since the threads which run the
> > > work
> > > items are not bound to a CPU.
> > >
> > > IOW: This is not a slam-dunk obvious gain.
> >
> > I agree but I think it's worth consideration. I'm waiting to get
> > (real) performance numbers of improvement (instead of my VM setup)
> > to
> > help my case. However, it was reported 90% degradation for the read
> > performance over krb5p when 1CPU is executing all ops.
> >
> > Is there a different way to make sure that on a multi-processor
> > machine we can take advantage of all available CPUs? Simple kernel
> > threads instead of a work queue?
>
> There is a trade-off between spreading the work, and ensuring it
> is executed on a CPU close to the I/O and application. IMO UNBOUND
> is a good way to do that. UNBOUND will attempt to schedule the
> work on the preferred CPU, but allow it to be migrated if that
> CPU is busy.
>
> The advantage of this is that when the client workload is CPU
> intensive (say, a software build), RPC client work can be scheduled
> and run more quickly, which reduces latency.
>

That should no longer be a huge issue, since queue_work() will now
default to the WORK_CPU_UNBOUND flag, which prefers the local CPU, but
will schedule elsewhere if the local CPU is congested.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

Olga Kornievskaia July 5, 2017, 4:09 p.m. UTC | #5

On Wed, Jul 5, 2017 at 11:46 AM, Trond Myklebust
<trondmy@primarydata.com> wrote:
> On Wed, 2017-07-05 at 11:11 -0400, Chuck Lever wrote:
>> > On Jul 5, 2017, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu>
>> > wrote:
>> >
>> > On Mon, Jul 3, 2017 at 10:58 AM, Trond Myklebust
>> > <trondmy@primarydata.com> wrote:
>> > > On Thu, 2017-06-29 at 09:25 -0400, Olga Kornievskaia wrote:
>> > > > Hi folks,
>> > > >
>> > > > On a multi-core machine, is it expected that we can have
>> > > > parallel
>> > > > RPCs
>> > > > handled by each of the per-core workqueue?
>> > > >
>> > > > In testing a read workload, observing via "top" command that a
>> > > > single
>> > > > "kworker" thread is running servicing the requests (no
>> > > > parallelism).
>> > > > It's more prominent while doing these operations over krb5p
>> > > > mount.
>> > > >
>> > > > What has been suggested by Bruce is to try this and in my
>> > > > testing I
>> > > > see then the read workload spread among all the kworker
>> > > > threads.
>> > > >
>> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>> > > >
>> > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> > > > index 0cc8383..f80e688 100644
>> > > > --- a/net/sunrpc/sched.c
>> > > > +++ b/net/sunrpc/sched.c
>> > > > @@ -1095,7 +1095,7 @@ static int rpciod_start(void)
>> > > >  * Create the rpciod thread and wait for it to start.
>> > > >  */
>> > > >  dprintk("RPC:       creating workqueue rpciod\n");
>> > > > - wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 0);
>> > > > + wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM | WQ_UNBOUND,
>> > > > 0);
>> > > >  if (!wq)
>> > > >  goto out_failed;
>> > > >  rpciod_workqueue = wq;
>> > > >
>> > >
>> > > WQ_UNBOUND turns off concurrency management on the thread pool
>> > > (See
>> > > Documentation/core-api/workqueue.rst. It also means we contend
>> > > for work
>> > > item queuing/dequeuing locks, since the threads which run the
>> > > work
>> > > items are not bound to a CPU.
>> > >
>> > > IOW: This is not a slam-dunk obvious gain.
>> >
>> > I agree but I think it's worth consideration. I'm waiting to get
>> > (real) performance numbers of improvement (instead of my VM setup)
>> > to
>> > help my case. However, it was reported 90% degradation for the read
>> > performance over krb5p when 1CPU is executing all ops.
>> >
>> > Is there a different way to make sure that on a multi-processor
>> > machine we can take advantage of all available CPUs? Simple kernel
>> > threads instead of a work queue?
>>
>> There is a trade-off between spreading the work, and ensuring it
>> is executed on a CPU close to the I/O and application. IMO UNBOUND
>> is a good way to do that. UNBOUND will attempt to schedule the
>> work on the preferred CPU, but allow it to be migrated if that
>> CPU is busy.
>>
>> The advantage of this is that when the client workload is CPU
>> intensive (say, a software build), RPC client work can be scheduled
>> and run more quickly, which reduces latency.
>>
>
> That should no longer be a huge issue, since queue_work() will now
> default to the WORK_CPU_UNBOUND flag, which prefers the local CPU, but
> will schedule elsewhere if the local CPU is congested.

I don't believe NFS uses workqueue_congested() to somehow schedule the
work elsewhere. Unless the queue is marked WQ_UNBOUND, I don't believe
there is any intention of balancing the CPU load.

>
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com

Trond Myklebust July 5, 2017, 4:14 p.m. UTC | #6

On Wed, 2017-07-05 at 12:09 -0400, Olga Kornievskaia wrote:
> On Wed, Jul 5, 2017 at 11:46 AM, Trond Myklebust
> <trondmy@primarydata.com> wrote:
> > On Wed, 2017-07-05 at 11:11 -0400, Chuck Lever wrote:
> > > > On Jul 5, 2017, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu>
> > > > wrote:
> > > >
> > > > On Mon, Jul 3, 2017 at 10:58 AM, Trond Myklebust
> > > > <trondmy@primarydata.com> wrote:
> > > > > On Thu, 2017-06-29 at 09:25 -0400, Olga Kornievskaia wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > On a multi-core machine, is it expected that we can have
> > > > > > parallel
> > > > > > RPCs
> > > > > > handled by each of the per-core workqueue?
> > > > > >
> > > > > > In testing a read workload, observing via "top" command
> > > > > > that a
> > > > > > single
> > > > > > "kworker" thread is running servicing the requests (no
> > > > > > parallelism).
> > > > > > It's more prominent while doing these operations over krb5p
> > > > > > mount.
> > > > > >
> > > > > > What has been suggested by Bruce is to try this and in my
> > > > > > testing I
> > > > > > see then the read workload spread among all the kworker
> > > > > > threads.
> > > > > >
> > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > >
> > > > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > > > > index 0cc8383..f80e688 100644
> > > > > > --- a/net/sunrpc/sched.c
> > > > > > +++ b/net/sunrpc/sched.c
> > > > > > @@ -1095,7 +1095,7 @@ static int rpciod_start(void)
> > > > > >  * Create the rpciod thread and wait for it to start.
> > > > > >  */
> > > > > >  dprintk("RPC:       creating workqueue rpciod\n");
> > > > > > - wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 0);
> > > > > > + wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM |
> > > > > > WQ_UNBOUND,
> > > > > > 0);
> > > > > >  if (!wq)
> > > > > >  goto out_failed;
> > > > > >  rpciod_workqueue = wq;
> > > > > >
> > > > >
> > > > > WQ_UNBOUND turns off concurrency management on the thread
> > > > > pool
> > > > > (See
> > > > > Documentation/core-api/workqueue.rst. It also means we
> > > > > contend
> > > > > for work
> > > > > item queuing/dequeuing locks, since the threads which run the
> > > > > work
> > > > > items are not bound to a CPU.
> > > > >
> > > > > IOW: This is not a slam-dunk obvious gain.
> > > >
> > > > I agree but I think it's worth consideration. I'm waiting to
> > > > get
> > > > (real) performance numbers of improvement (instead of my VM
> > > > setup)
> > > > to
> > > > help my case. However, it was reported 90% degradation for the
> > > > read
> > > > performance over krb5p when 1CPU is executing all ops.
> > > >
> > > > Is there a different way to make sure that on a multi-processor
> > > > machine we can take advantage of all available CPUs? Simple
> > > > kernel
> > > > threads instead of a work queue?
> > >
> > > There is a trade-off between spreading the work, and ensuring it
> > > is executed on a CPU close to the I/O and application. IMO
> > > UNBOUND
> > > is a good way to do that. UNBOUND will attempt to schedule the
> > > work on the preferred CPU, but allow it to be migrated if that
> > > CPU is busy.
> > >
> > > The advantage of this is that when the client workload is CPU
> > > intensive (say, a software build), RPC client work can be
> > > scheduled
> > > and run more quickly, which reduces latency.
> > >
> >
> > That should no longer be a huge issue, since queue_work() will now
> > default to the WORK_CPU_UNBOUND flag, which prefers the local CPU,
> > but
> > will schedule elsewhere if the local CPU is congested.
>
> I don't believe NFS use workqueue_congested() to somehow schedule the
> work elsewhere. Unless the queue is marked UNBOUNDED I don't believe
> there is any intention of balancing the CPU load.
>

I shouldn't have to test the queue when scheduling with
WORK_CPU_UNBOUND.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

Olga Kornievskaia July 5, 2017, 5:33 p.m. UTC | #7

On Wed, Jul 5, 2017 at 12:14 PM, Trond Myklebust
<trondmy@primarydata.com> wrote:
> On Wed, 2017-07-05 at 12:09 -0400, Olga Kornievskaia wrote:
>> On Wed, Jul 5, 2017 at 11:46 AM, Trond Myklebust
>> <trondmy@primarydata.com> wrote:
>> > On Wed, 2017-07-05 at 11:11 -0400, Chuck Lever wrote:
>> > > > On Jul 5, 2017, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu>
>> > > > wrote:
>> > > >
>> > > > On Mon, Jul 3, 2017 at 10:58 AM, Trond Myklebust
>> > > > <trondmy@primarydata.com> wrote:
>> > > > > On Thu, 2017-06-29 at 09:25 -0400, Olga Kornievskaia wrote:
>> > > > > > Hi folks,
>> > > > > >
>> > > > > > On a multi-core machine, is it expected that we can have
>> > > > > > parallel
>> > > > > > RPCs
>> > > > > > handled by each of the per-core workqueue?
>> > > > > >
>> > > > > > In testing a read workload, observing via "top" command
>> > > > > > that a
>> > > > > > single
>> > > > > > "kworker" thread is running servicing the requests (no
>> > > > > > parallelism).
>> > > > > > It's more prominent while doing these operations over krb5p
>> > > > > > mount.
>> > > > > >
>> > > > > > What has been suggested by Bruce is to try this and in my
>> > > > > > testing I
>> > > > > > see then the read workload spread among all the kworker
>> > > > > > threads.
>> > > > > >
>> > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>> > > > > >
>> > > > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> > > > > > index 0cc8383..f80e688 100644
>> > > > > > --- a/net/sunrpc/sched.c
>> > > > > > +++ b/net/sunrpc/sched.c
>> > > > > > @@ -1095,7 +1095,7 @@ static int rpciod_start(void)
>> > > > > >  * Create the rpciod thread and wait for it to start.
>> > > > > >  */
>> > > > > >  dprintk("RPC:       creating workqueue rpciod\n");
>> > > > > > - wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 0);
>> > > > > > + wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM |
>> > > > > > WQ_UNBOUND,
>> > > > > > 0);
>> > > > > >  if (!wq)
>> > > > > >  goto out_failed;
>> > > > > >  rpciod_workqueue = wq;
>> > > > > >
>> > > > >
>> > > > > WQ_UNBOUND turns off concurrency management on the thread
>> > > > > pool
>> > > > > (See
>> > > > > Documentation/core-api/workqueue.rst. It also means we
>> > > > > contend
>> > > > > for work
>> > > > > item queuing/dequeuing locks, since the threads which run the
>> > > > > work
>> > > > > items are not bound to a CPU.
>> > > > >
>> > > > > IOW: This is not a slam-dunk obvious gain.
>> > > >
>> > > > I agree but I think it's worth consideration. I'm waiting to
>> > > > get
>> > > > (real) performance numbers of improvement (instead of my VM
>> > > > setup)
>> > > > to
>> > > > help my case. However, it was reported 90% degradation for the
>> > > > read
>> > > > performance over krb5p when 1CPU is executing all ops.
>> > > >
>> > > > Is there a different way to make sure that on a multi-processor
>> > > > machine we can take advantage of all available CPUs? Simple
>> > > > kernel
>> > > > threads instead of a work queue?
>> > >
>> > > There is a trade-off between spreading the work, and ensuring it
>> > > is executed on a CPU close to the I/O and application. IMO
>> > > UNBOUND
>> > > is a good way to do that. UNBOUND will attempt to schedule the
>> > > work on the preferred CPU, but allow it to be migrated if that
>> > > CPU is busy.
>> > >
>> > > The advantage of this is that when the client workload is CPU
>> > > intensive (say, a software build), RPC client work can be
>> > > scheduled
>> > > and run more quickly, which reduces latency.
>> > >
>> >
>> > That should no longer be a huge issue, since queue_work() will now
>> > default to the WORK_CPU_UNBOUND flag, which prefers the local CPU,
>> > but
>> > will schedule elsewhere if the local CPU is congested.
>>
>> I don't believe NFS use workqueue_congested() to somehow schedule the
>> work elsewhere. Unless the queue is marked UNBOUNDED I don't believe
>> there is any intention of balancing the CPU load.
>>
>
> I shouldn't have to test the queue when scheduling with
> WORK_CPU_UNBOUND.
>

Comments in the code say that "if CPU dies" it'll be re-scheduled on
another. I think the code requires the queue to be marked WQ_UNBOUND
for work to really be scheduled on a different CPU. That's just my
reading of the code, and it matches what is seen with the krb5 workload.
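
One way to check this reading would be a throwaway test module along the
following lines. This is only a sketch and not part of the patch: the
module, the workqueue names and demo_fn() are made up for illustration,
and it has to be built against a kernel tree as an out-of-tree module.
It queues one work item on a per-CPU workqueue and one on a WQ_UNBOUND
workqueue and logs which CPU each one ran on.

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/smp.h>
#include <linux/workqueue.h>

/* Hypothetical names, chosen only for this demonstration. */
static struct workqueue_struct *bound_wq, *unbound_wq;
static struct work_struct bound_work, unbound_work;

/* Log which CPU the work item ended up running on. */
static void demo_fn(struct work_struct *work)
{
	/* raw_ variant: an unbound worker is not pinned to one CPU. */
	pr_info("wq_demo: %s work ran on CPU %d\n",
		work == &unbound_work ? "unbound" : "per-CPU",
		raw_smp_processor_id());
}

static int __init wq_demo_init(void)
{
	/* Same flags as rpciod before the proposed change. */
	bound_wq = alloc_workqueue("wq_demo_bound", WQ_MEM_RECLAIM, 0);
	if (!bound_wq)
		return -ENOMEM;

	/* Same flags as rpciod with the proposed change. */
	unbound_wq = alloc_workqueue("wq_demo_unbound",
				     WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
	if (!unbound_wq) {
		destroy_workqueue(bound_wq);
		return -ENOMEM;
	}

	INIT_WORK(&bound_work, demo_fn);
	INIT_WORK(&unbound_work, demo_fn);

	/* Both calls use plain queue_work(); only the queue flags differ. */
	queue_work(bound_wq, &bound_work);
	queue_work(unbound_wq, &unbound_work);
	return 0;
}

static void __exit wq_demo_exit(void)
{
	/* destroy_workqueue() drains pending work before freeing. */
	destroy_workqueue(unbound_wq);
	destroy_workqueue(bound_wq);
}

module_init(wq_demo_init);
module_exit(wq_demo_exit);
MODULE_LICENSE("GPL");
```

Loading it repeatedly while the loading CPU is kept busy should show
whether the per-CPU item ever leaves the CPU that queued it.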

Olga Kornievskaia July 19, 2017, 5:59 p.m. UTC | #8

On Wed, Jul 5, 2017 at 1:33 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> On Wed, Jul 5, 2017 at 12:14 PM, Trond Myklebust
> <trondmy@primarydata.com> wrote:
>> On Wed, 2017-07-05 at 12:09 -0400, Olga Kornievskaia wrote:
>>> On Wed, Jul 5, 2017 at 11:46 AM, Trond Myklebust
>>> <trondmy@primarydata.com> wrote:
>>> > On Wed, 2017-07-05 at 11:11 -0400, Chuck Lever wrote:
>>> > > > On Jul 5, 2017, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu>
>>> > > > wrote:
>>> > > >
>>> > > > On Mon, Jul 3, 2017 at 10:58 AM, Trond Myklebust
>>> > > > <trondmy@primarydata.com> wrote:
>>> > > > > On Thu, 2017-06-29 at 09:25 -0400, Olga Kornievskaia wrote:
>>> > > > > > Hi folks,
>>> > > > > >
>>> > > > > > On a multi-core machine, is it expected that we can have
>>> > > > > > parallel
>>> > > > > > RPCs
>>> > > > > > handled by each of the per-core workqueue?
>>> > > > > >
>>> > > > > > In testing a read workload, observing via "top" command
>>> > > > > > that a
>>> > > > > > single
>>> > > > > > "kworker" thread is running servicing the requests (no
>>> > > > > > parallelism).
>>> > > > > > It's more prominent while doing these operations over krb5p
>>> > > > > > mount.
>>> > > > > >
>>> > > > > > What has been suggested by Bruce is to try this and in my
>>> > > > > > testing I
>>> > > > > > see then the read workload spread among all the kworker
>>> > > > > > threads.
>>> > > > > >
>>> > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>>> > > > > >
>>> > > > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>> > > > > > index 0cc8383..f80e688 100644
>>> > > > > > --- a/net/sunrpc/sched.c
>>> > > > > > +++ b/net/sunrpc/sched.c
>>> > > > > > @@ -1095,7 +1095,7 @@ static int rpciod_start(void)
>>> > > > > >  * Create the rpciod thread and wait for it to start.
>>> > > > > >  */
>>> > > > > >  dprintk("RPC:       creating workqueue rpciod\n");
>>> > > > > > - wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 0);
>>> > > > > > + wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM |
>>> > > > > > WQ_UNBOUND,
>>> > > > > > 0);
>>> > > > > >  if (!wq)
>>> > > > > >  goto out_failed;
>>> > > > > >  rpciod_workqueue = wq;
>>> > > > > >
>>> > > > >
>>> > > > > WQ_UNBOUND turns off concurrency management on the thread
>>> > > > > pool
>>> > > > > (See
>>> > > > > Documentation/core-api/workqueue.rst. It also means we
>>> > > > > contend
>>> > > > > for work
>>> > > > > item queuing/dequeuing locks, since the threads which run the
>>> > > > > work
>>> > > > > items are not bound to a CPU.
>>> > > > >
>>> > > > > IOW: This is not a slam-dunk obvious gain.
>>> > > >
>>> > > > I agree but I think it's worth consideration. I'm waiting to
>>> > > > get
>>> > > > (real) performance numbers of improvement (instead of my VM
>>> > > > setup)
>>> > > > to
>>> > > > help my case. However, it was reported 90% degradation for the
>>> > > > read
>>> > > > performance over krb5p when 1CPU is executing all ops.
>>> > > >
>>> > > > Is there a different way to make sure that on a multi-processor
>>> > > > machine we can take advantage of all available CPUs? Simple
>>> > > > kernel
>>> > > > threads instead of a work queue?
>>> > >
>>> > > There is a trade-off between spreading the work, and ensuring it
>>> > > is executed on a CPU close to the I/O and application. IMO
>>> > > UNBOUND
>>> > > is a good way to do that. UNBOUND will attempt to schedule the
>>> > > work on the preferred CPU, but allow it to be migrated if that
>>> > > CPU is busy.
>>> > >
>>> > > The advantage of this is that when the client workload is CPU
>>> > > intensive (say, a software build), RPC client work can be
>>> > > scheduled
>>> > > and run more quickly, which reduces latency.
>>> > >
>>> >
>>> > That should no longer be a huge issue, since queue_work() will now
>>> > default to the WORK_CPU_UNBOUND flag, which prefers the local CPU,
>>> > but
>>> > will schedule elsewhere if the local CPU is congested.
>>>
>>> I don't believe NFS use workqueue_congested() to somehow schedule the
>>> work elsewhere. Unless the queue is marked UNBOUNDED I don't believe
>>> there is any intention of balancing the CPU load.
>>>
>>
>> I shouldn't have to test the queue when scheduling with
>> WORK_CPU_UNBOUND.
>>
>
> Comments in the code says that "if CPU dies" it'll be re-scheduled on
> another. I think the code requires to mark the queue UNBOUND to really
> be scheduled on a different CPU. Just my reading of the code and it
> matches what is seen with the krb5 workload.

Trond, what's the path forward here? What about a run-time
configuration option that starts rpciod with WQ_UNBOUND instead?

Chuck Lever III Feb. 17, 2018, 6:55 p.m. UTC | #9

> On Feb 14, 2018, at 6:13 PM, Mora, Jorge <Jorge.Mora@netapp.com> wrote:
> 
> Hello,
> 
> The patch gives some performance improvement on Kerberos read.
> The following results show performance comparisons between unpatched
> and patched systems. The html files included as attachments show the
> results as line charts.
> 
> - Best read performance improvement when testing with a single dd transfer.
>  The patched system gives 70% better performance than the unpatched system.
>  (first set of results)
> 
> - The patched system gives 18% better performance than the unpatched system
>  when testing with multiple dd transfers.
>  (second set of results)
> 
> - The write test shows there is no performance hit by the patch.
>  (third set of results)
> 
> - When testing on a different client with less RAM and fewer CPU cores,
>  there is no performance degradation for Kerberos in the unpatched system.
>  In this case, the patch does not provide any performance improvement.
>  (fourth set of results)
> 
> ================================================================================
> Test environment:
> 
> NFS client:  CPU: 16 cores, RAM: 32GB (E5620 @ 2.40GHz)
> NFS servers: CPU: 16 cores, RAM: 32GB (E5620 @ 2.40GHz)
> NFS mount:   NFSv3 with sec=(sys or krb5p)
> 
> For tests with a single dd transfer there is of course one NFS server used
> and one file being read -- only one transfer was needed to fill up the
> network connection.
> 
> For tests with multiple dd transfers, three different NFS servers were used
> and four different files were used per NFS server for a total of 12 different
> files being read (12 different transfers in parallel).
> 
> The patch was applied on top of 4.14.0-rc3 kernel and the NFS servers were
> running RHEL 7.4.
> 
> The fourth set of results below shows an unpatched system with no Kerberos
> degradation (same kernel, 4.14.0-rc3). In contrast with the main client
> used for testing, this client has only 4 CPU cores and 8GB of RAM.
> I believe that even though this system has fewer CPU cores and less RAM,
> its CPU is faster (E31220 @ 3.10GHz vs E5620 @ 2.40GHz), so it handles
> the Kerberos load better and can fill up the network connection with a
> single thread, which the main client, with more CPU cores and more
> memory, cannot.

Jorge, thanks for publishing these results.

Can you do a "numactl -H" on your clients and post the output? I suspect
the throughput improvement on the big client is because WQ_UNBOUND
behaves differently on NUMA systems. (Even so, I agree that the proposed
change is valuable.)


> ================================================================================
> 
> Kerberos Read Performance: 170.15% (patched system over unpatched system)
> 
> Client CPU:        Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
> CPU cores:         16
> RAM:               32 GB
> NFS version:       3
> Mount points:      1
> dd's per mount:    1
> Total dd's:        1
> Data transferred:  7.81 GB (per run)
> Number of runs:    10
> 
> Kerberos Read Performance (unpatched system vs patched system)
> Transfer rate (unpatched system)  avg:  65.88 MB/s,   var:  20.28,   stddev:   4.50
> Transfer rate (patched system)    avg: 112.10 MB/s,   var:   0.00,   stddev:   0.01
> Performance (patched over unpatched):  170.15%
> 
> Unpatched System Read Performance (sys vs krb5p)
> Transfer rate (sec=sys)    avg: 111.96 MB/s,   var:   0.02,   stddev:   0.13
> Transfer rate (sec=krb5p)  avg:  65.88 MB/s,   var:  20.28,   stddev:   4.50
> Performance (krb5p over sys):   58.84%
> 
> Patched System Read Performance (sys vs krb5p)
> Transfer rate (sec=sys)    avg: 111.94 MB/s,   var:   0.02,   stddev:   0.14
> Transfer rate (sec=krb5p)  avg: 112.10 MB/s,   var:   0.00,   stddev:   0.01
> Performance (krb5p over sys):  100.14%
> 
> ================================================================================
> 
> Kerberos Read Performance: 118.02% (patched system over unpatched system)
> 
> Client CPU:        Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
> CPU cores:         16
> RAM:               32 GB
> NFS version:       3
> Mount points:      3
> dd's per mount:    4
> Total dd's:        12
> Data transferred:  93.75 GB (per run)
> Number of runs:    10
> 
> Kerberos Read Performance (unpatched system vs patched system)
> Transfer rate (unpatched system)  avg:  94.99 MB/s,   var:  68.96,   stddev:   8.30
> Transfer rate (patched system)    avg: 112.11 MB/s,   var:   0.00,   stddev:   0.03
> Performance (patched over unpatched):  118.02%
> 
> Unpatched System Read Performance (sys vs krb5p)
> Transfer rate (sec=sys)    avg: 112.21 MB/s,   var:   0.00,   stddev:   0.00
> Transfer rate (sec=krb5p)  avg:  94.99 MB/s,   var:  68.96,   stddev:   8.30
> Performance (krb5p over sys):   84.66%
> 
> Patched System Read Performance (sys vs krb5p)
> Transfer rate (sec=sys)    avg: 112.20 MB/s,   var:   0.00,   stddev:   0.00
> Transfer rate (sec=krb5p)  avg: 112.11 MB/s,   var:   0.00,   stddev:   0.03
> Performance (krb5p over sys):   99.92%
> 
> ================================================================================
> 
> Kerberos Write Performance: 101.55% (patched system over unpatched system)
> 
> Client CPU:        Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
> CPU cores:         16
> RAM:               32 GB
> NFS version:       3
> Mount points:      3
> dd's per mount:    4
> Total dd's:        12
> Data transferred:  93.75 GB (per run)
> Number of runs:    10
> 
> Kerberos Write Performance (unpatched system vs patched system)
> Transfer rate (unpatched system)  avg: 103.70 MB/s,   var: 110.51,   stddev:  10.51
> Transfer rate (patched system)    avg: 105.31 MB/s,   var:  35.04,   stddev:   5.92
> Performance (patched over unpatched):  101.55%
> 
> Unpatched System Write Performance (sys vs krb5p)
> Transfer rate (sec=sys)    avg: 109.87 MB/s,   var:  10.27,   stddev:   3.20
> Transfer rate (sec=krb5p)  avg: 103.70 MB/s,   var: 110.51,   stddev:  10.51
> Performance (krb5p over sys):   94.39%
> 
> Patched System Write Performance (sys vs krb5p)
> Transfer rate (sec=sys)    avg: 111.03 MB/s,   var:   0.58,   stddev:   0.76
> Transfer rate (sec=krb5p)  avg: 105.31 MB/s,   var:  35.04,   stddev:   5.92
> Performance (krb5p over sys):   94.85%
> 
> ================================================================================
> 
> Kerberos Read Performance: 99.99% (patched system over unpatched system)
> 
> Client CPU:        Intel(R) Xeon(R) CPU E31220 @ 3.10GHz
> CPU cores:         4
> RAM:               8 GB
> NFS version:       3
> Mount points:      1
> dd's per mount:    1
> Total dd's:        1
> Data transferred:  7.81 GB (per run)
> Number of runs:    10
> 
> Kerberos Read Performance (unpatched system vs patched system)
> Transfer rate (unpatched system)  avg: 112.02 MB/s,   var:   0.04,   stddev:   0.21
> Transfer rate (patched system)    avg: 112.01 MB/s,   var:   0.06,   stddev:   0.25
> Performance (patched over unpatched):   99.99%
> 
> Unpatched System Read Performance (sys vs krb5p)
> Transfer rate (sec=sys)    avg: 111.86 MB/s,   var:   0.06,   stddev:   0.24
> Transfer rate (sec=krb5p)  avg: 112.02 MB/s,   var:   0.04,   stddev:   0.21
> Performance (krb5p over sys):  100.14%
> 
> Patched System Read Performance (sys vs krb5p)
> Transfer rate (sec=sys)    avg: 111.76 MB/s,   var:   0.12,   stddev:   0.34
> Transfer rate (sec=krb5p)  avg: 112.01 MB/s,   var:   0.06,   stddev:   0.25
> Performance (krb5p over sys):  100.22%
> 
> 
> --Jorge
> 
> ________________________________________
> From: linux-nfs-owner@vger.kernel.org <linux-nfs-owner@vger.kernel.org> on behalf of Olga Kornievskaia <aglo@umich.edu>
> Sent: Wednesday, July 19, 2017 11:59 AM
> To: Trond Myklebust
> Cc: linux-nfs@vger.kernel.org; chuck.lever@oracle.com
> Subject: Re: [RFC] fix parallelism for rpc tasks
> 
> On Wed, Jul 5, 2017 at 1:33 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>> On Wed, Jul 5, 2017 at 12:14 PM, Trond Myklebust
>> <trondmy@primarydata.com> wrote:
>>> On Wed, 2017-07-05 at 12:09 -0400, Olga Kornievskaia wrote:
>>>> On Wed, Jul 5, 2017 at 11:46 AM, Trond Myklebust
>>>> <trondmy@primarydata.com> wrote:
>>>>> On Wed, 2017-07-05 at 11:11 -0400, Chuck Lever wrote:
>>>>>>> On Jul 5, 2017, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> On Mon, Jul 3, 2017 at 10:58 AM, Trond Myklebust
>>>>>>> <trondmy@primarydata.com> wrote:
>>>>>>>> On Thu, 2017-06-29 at 09:25 -0400, Olga Kornievskaia wrote:
>>>>>>>>> Hi folks,
>>>>>>>>> 
>>>>>>>>> On a multi-core machine, is it expected that we can have parallel RPCs
>>>>>>>>> handled by each of the per-core workqueue?
>>>>>>>>> 
>>>>>>>>> In testing a read workload, observing via "top" command that a single
>>>>>>>>> "kworker" thread is running servicing the requests (no parallelism).
>>>>>>>>> It's more prominent while doing these operations over krb5p mount.
>>>>>>>>> 
>>>>>>>>> What has been suggested by Bruce is to try this and in my testing I
>>>>>>>>> see then the read workload spread among all the kworker threads.
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>>>>>>>>> 
>>>>>>>>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>>>>>>>> index 0cc8383..f80e688 100644
>>>>>>>>> --- a/net/sunrpc/sched.c
>>>>>>>>> +++ b/net/sunrpc/sched.c
>>>>>>>>> @@ -1095,7 +1095,7 @@ static int rpciod_start(void)
>>>>>>>>> * Create the rpciod thread and wait for it to start.
>>>>>>>>> */
>>>>>>>>> dprintk("RPC:       creating workqueue rpciod\n");
>>>>>>>>> - wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 0);
>>>>>>>>> + wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
>>>>>>>>> if (!wq)
>>>>>>>>> goto out_failed;
>>>>>>>>> rpciod_workqueue = wq;
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> WQ_UNBOUND turns off concurrency management on the thread pool (See
>>>>>>>> Documentation/core-api/workqueue.rst). It also means we contend for work
>>>>>>>> item queuing/dequeuing locks, since the threads which run the work
>>>>>>>> items are not bound to a CPU.
>>>>>>>> 
>>>>>>>> IOW: This is not a slam-dunk obvious gain.
>>>>>>> 
>>>>>>> I agree but I think it's worth consideration. I'm waiting to get
>>>>>>> (real) performance numbers of improvement (instead of my VM setup) to
>>>>>>> help my case. However, a 90% degradation in read performance over
>>>>>>> krb5p was reported when 1 CPU is executing all ops.
>>>>>>> 
>>>>>>> Is there a different way to make sure that on a multi-processor
>>>>>>> machine we can take advantage of all available CPUs? Simple kernel
>>>>>>> threads instead of a work queue?
>>>>>> 
>>>>>> There is a trade-off between spreading the work, and ensuring it
>>>>>> is executed on a CPU close to the I/O and application. IMO UNBOUND
>>>>>> is a good way to do that. UNBOUND will attempt to schedule the
>>>>>> work on the preferred CPU, but allow it to be migrated if that
>>>>>> CPU is busy.
>>>>>> 
>>>>>> The advantage of this is that when the client workload is CPU
>>>>>> intensive (say, a software build), RPC client work can be scheduled
>>>>>> and run more quickly, which reduces latency.
>>>>>> 
>>>>> 
>>>>> That should no longer be a huge issue, since queue_work() will now
>>>>> default to the WORK_CPU_UNBOUND flag, which prefers the local CPU, but
>>>>> will schedule elsewhere if the local CPU is congested.
>>>> 
>>>> I don't believe NFS uses workqueue_congested() to somehow schedule the
>>>> work elsewhere. Unless the queue is marked UNBOUND I don't believe
>>>> there is any intention of balancing the CPU load.
>>>> 
>>> 
>>> I shouldn't have to test the queue when scheduling with WORK_CPU_UNBOUND.
>>> 
>> 
>> Comments in the code say that "if CPU dies" it'll be re-scheduled on
>> another. I think the code requires the queue to be marked UNBOUND to really
>> be scheduled on a different CPU. Just my reading of the code and it
>> matches what is seen with the krb5 workload.
> 
> Trond, what's the path forward here? What about a run-time
> configuration that starts rpciod with the UNBOUND options instead?
> <dd_read_single.html><dd_read_mult.html><dd_write_mult.html><dd_read_single1.html>

--
Chuck Lever



