[0/4,RFC] nfsd: allocate/free session-based DRC slots on demand

Message ID 20241113055345.494856-1-neilb@suse.de (mailing list archive)

Message

NeilBrown Nov. 13, 2024, 5:38 a.m. UTC
This patch set aims to allocate session-based DRC slots on demand, and
free them when not in use, or when memory is tight.

I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
overly aggressive, and with lots of printks, and it seems to do the right
thing, though memory pressure has never freed anything - I think you
need several clients with a non-trivial number of slots allocated before
the thresholds in the shrinker code will trigger any freeing.

I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
useful that is.  There are certainly cases where simply setting the
target in a SEQUENCE reply might not be enough, but I doubt they are
very common.  You would need a session to be completely idle, with the
last request received on it indicating that lots of slots were still in
use.

Currently we allocate slots one at a time when the last available slot
was used by the client, and only if a NOWAIT allocation can succeed.  It
is possible that this isn't quite aggressive enough.  When performing a
lot of writeback it can be useful to have lots of slots, but memory
pressure is also likely to build up on the server so GFP_NOWAIT is likely
to fail.  Maybe occasionally using a firmer request (outside the
spinlock) would be justified.
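
To make that concrete, the growth path looks roughly like the sketch
below.  The struct and names are illustrative only, not lifted from the
patches:

#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct example_slot {
	u32	sl_seqid;	/* last sequence id seen on this slot */
	bool	sl_cached;	/* do we hold a cached reply? */
};

struct example_session {
	spinlock_t	se_lock;
	struct example_slot **se_slots;	/* table sized for se_max_slots */
	unsigned int	se_nr_slots;	/* slots currently allocated */
	unsigned int	se_max_slots;	/* hard ceiling */
	unsigned int	se_nr_unused;	/* allocated but currently idle */
	unsigned long	se_unused_since;/* jiffies when the surplus appeared */
};

/*
 * Called from the SEQUENCE path when the client has just consumed the
 * last free slot.  GFP_NOWAIT because we hold se_lock, and because
 * failing to grow is harmless - we just try again on a later request.
 */
static void example_grow_slot_table(struct example_session *ses)
{
	struct example_slot *slot;

	spin_lock(&ses->se_lock);
	if (ses->se_nr_slots < ses->se_max_slots) {
		slot = kzalloc(sizeof(*slot), GFP_NOWAIT);
		if (slot)
			ses->se_slots[ses->se_nr_slots++] = slot;
	}
	spin_unlock(&ses->se_lock);
}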

We free slots when the number of unused slots passes some threshold -
currently 6 (because ...  why not).  Possibly some hysteresis should be
added so we don't free unused slots for at least N seconds.
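
Using the same illustrative struct as above, the trimming side might
look like the sketch below; the threshold and grace period are
hypothetical, and for simplicity it pretends the idle slots are always
the highest-numbered ones:

#include <linux/jiffies.h>

#define EXAMPLE_MAX_UNUSED_SLOTS	6
#define EXAMPLE_UNUSED_GRACE		(30 * HZ)	/* made-up hysteresis */

/* Called with se_lock held whenever a slot goes idle. */
static void example_retire_unused_slots(struct example_session *ses)
{
	if (ses->se_nr_unused <= EXAMPLE_MAX_UNUSED_SLOTS) {
		ses->se_unused_since = jiffies;	/* reset the clock */
		return;
	}
	/* Only trim once the surplus has persisted for the grace period. */
	if (!time_after(jiffies, ses->se_unused_since + EXAMPLE_UNUSED_GRACE))
		return;
	while (ses->se_nr_unused > EXAMPLE_MAX_UNUSED_SLOTS &&
	       ses->se_nr_slots > 1) {		/* slot 0 is never freed */
		kfree(ses->se_slots[--ses->se_nr_slots]);
		ses->se_slots[ses->se_nr_slots] = NULL;
		ses->se_nr_unused--;
	}
}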

When the shrinker wants to apply pressure, we remove slots equally from
all sessions.  Maybe there should be some proportionality but that would
be more complex and I'm not sure it would gain much.  Slot 0 can never
be freed of course.

I'm very interested to see what people think of the over-all approach,
and of the specifics of the code.

Thanks,
NeilBrown


 [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
 [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
 [PATCH 3/4] nfsd: free unused session-DRC slots
 [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated

Comments

Daire Byrne Nov. 13, 2024, 11:23 a.m. UTC | #1
Neil,

I'm curious if this work relates to:

https://bugzilla.linux-nfs.org/show_bug.cgi?id=375
https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@mail.gmail.com

As my thread described, we currently use NFSv3 for our high latency
NFS re-export cases simply because it performs way better for parallel
client operations. You see, when you use re-exporting to serve many
clients, you are in effect taking all those client operations and
stuffing them through a single client (the re-export server) which
then becomes a bottleneck. Add any kind of latency on top (>10ms) and
the NFSD_CACHE_SIZE_SLOTS_PER_SESSION (32) for NFSv4 becomes a major
bottleneck for a single client (re-export server).

We also used your "VFS: support parallel updates in the one directory"
patches for similar reasons up until I couldn't port it to newer
kernels anymore (my kernel code munging skills are not sufficient!).

Sorry to spam the thread if I am misinterpreting what this patch set
is all about.

Daire


On Wed, 13 Nov 2024 at 05:54, NeilBrown <neilb@suse.de> wrote:
>
> This patch set aims to allocate session-based DRC slots on demand, and
> free them when not in use, or when memory is tight.
>
> I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> overly aggressive, and with lots of printks, and it seems to do the right
> thing, though memory pressure has never freed anything - I think you
> need several clients with a non-trivial number of slots allocated before
> the thresholds in the shrinker code will trigger any freeing.
>
> I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
> useful that is.  There are certainly cases where simply setting the
> target in a SEQUENCE reply might not be enough, but I doubt they are
> very common.  You would need a session to be completely idle, with the
> last request received on it indicating that lots of slots were still in
> use.
>
> Currently we allocate slots one at a time when the last available slot
> was used by the client, and only if a NOWAIT allocation can succeed.  It
> is possible that this isn't quite aggressive enough.  When performing a
> lot of writeback it can be useful to have lots of slots, but memory
> pressure is also likely to build up on the server so GFP_NOWAIT is likely
> to fail.  Maybe occasionally using a firmer request (outside the
> spinlock) would be justified.
>
> We free slots when the number of unused slots passes some threshold -
> currently 6 (because ...  why not).  Possibly some hysteresis should be
> added so we don't free unused slots for at least N seconds.
>
> When the shrinker wants to apply pressure, we remove slots equally from
> all sessions.  Maybe there should be some proportionality but that would
> be more complex and I'm not sure it would gain much.  Slot 0 can never
> be freed of course.
>
> I'm very interested to see what people think of the over-all approach,
> and of the specifics of the code.
>
> Thanks,
> NeilBrown
>
>
>  [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
>  [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
>  [PATCH 3/4] nfsd: free unused session-DRC slots
>  [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated
>
Chuck Lever III Nov. 13, 2024, 2:48 p.m. UTC | #2
> On Nov 13, 2024, at 12:38 AM, NeilBrown <neilb@suse.de> wrote:
> 
> This patch set aims to allocate session-based DRC slots on demand, and
> free them when not in use, or when memory is tight.
> 
> I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> overly aggressive, and with lots of printks, and it seems to do the right
> thing, though memory pressure has never freed anything - I think you
> need several clients with a non-trivial number of slots allocated before
> the thresholds in the shrinker code will trigger any freeing.

Can you describe your test set-up? Generally a system
with less than 4GB of memory can trigger shrinkers
pretty easily.

If we never see the mechanism being triggered due to
memory exhaustion, then I wonder if the additional
complexity is adding substantial value.


> I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
> useful that is.  There are certainly cases where simply setting the
> target in a SEQUENCE reply might not be enough, but I doubt they are
> very common.  You would need a session to be completely idle, with the
> last request received on it indicating that lots of slots were still in
> use.
> 
> Currently we allocate slots one at a time when the last available slot
> was used by the client, and only if a NOWAIT allocation can succeed.  It
> is possible that this isn't quite aggressive enough.  When performing a
> lot of writeback it can be useful to have lots of slots, but memory
> pressure is also likely to build up on the server so GFP_NOWAIT is likely
> to fail.  Maybe occasionally using a firmer request (outside the
> spinlock) would be justified.

I'm wondering why GFP_NOWAIT is used here, and I admit
I'm not strongly familiar with the code or mechanism.
Why not always use GFP_KERNEL ?


> We free slots when the number of unused slots passes some threshold -
> currently 6 (because ...  why not).  Possibly some hysteresis should be
> added so we don't free unused slots for at least N seconds.

Generally freeing unused resources is un-Linux like. :-)
Can you provide a rationale for why this is needed?


> When the shrinker wants to apply pressure, we remove slots equally from
> all sessions.  Maybe there should be some proportionality but that would
> be more complex and I'm not sure it would gain much.  Slot 0 can never
> be freed of course.
> 
> I'm very interested to see what people think of the over-all approach,
> and of the specifics of the code.
> 
> Thanks,
> NeilBrown
> 
> 
> [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
> [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
> [PATCH 3/4] nfsd: free unused session-DRC slots
> [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated
> 

--
Chuck Lever
NeilBrown Nov. 15, 2024, 4:56 a.m. UTC | #3
On Wed, 13 Nov 2024, Daire Byrne wrote:
> Neil,
> 
> I'm curious if this work relates to:
> 
> https://bugzilla.linux-nfs.org/show_bug.cgi?id=375
> https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@mail.gmail.com

Yes, it could possibly help with that - though more work would be
needed.
nfsd currently has a hard limit of 160 slots per session.  That wouldn't
be enough I suspect.  The Linux client has a hard limit of 1024.  That
might be enough.
Allowing nfsd to have 1024 (or more) shouldn't be too hard...
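
For reference, the limits I'm talking about are (from memory, so check
the exact definitions):

/* fs/nfsd/state.h - the current server-side ceiling */
#define NFSD_MAX_SLOTS_PER_SESSION	160

/* fs/nfs/nfs4session.h - the client-side ceiling */
#define NFS4_MAX_SLOT_TABLE		1024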

> 
> As my thread described, we currently use NFSv3 for our high latency
> NFS re-export cases simply because it performs way better for parallel
> client operations. You see, when you use re-exporting to serve many
> clients, you are in effect taking all those client operations and
> stuffing them through a single client (the re-export server) which
> then becomes a bottleneck. Add any kind of latency on top (>10ms) and
> the NFSD_CACHE_SIZE_SLOTS_PER_SESSION (32) for NFSv4 becomes a major
> bottleneck for a single client (re-export server).
> 
> We also used your "VFS: support parallel updates in the one directory"
> patches for similar reasons up until I couldn't port it to newer
> kernels anymore (my kernel code munging skills are not sufficient!).

Yeah - I really should get back to that.  Al and Linus suggested some
changes and I just never got around to making them.

Thanks,
NeilBrown


> 
> Sorry to spam the thread if I am misinterpreting what this patch set
> is all about.
> 
> Daire
> 
> 
> On Wed, 13 Nov 2024 at 05:54, NeilBrown <neilb@suse.de> wrote:
> >
> > This patch set aims to allocate session-based DRC slots on demand, and
> > free them when not in use, or when memory is tight.
> >
> > I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> > overly aggressive, and with lots of printks, and it seems to do the right
> > thing, though memory pressure has never freed anything - I think you
> > need several clients with a non-trivial number of slots allocated before
> > the thresholds in the shrinker code will trigger any freeing.
> >
> > I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
> > useful that is.  There are certainly cases where simply setting the
> > target in a SEQUENCE reply might not be enough, but I doubt they are
> > very common.  You would need a session to be completely idle, with the
> > last request received on it indicating that lots of slots were still in
> > use.
> >
> > Currently we allocate slots one at a time when the last available slot
> > was used by the client, and only if a NOWAIT allocation can succeed.  It
> > is possible that this isn't quite aggressive enough.  When performing a
> > lot of writeback it can be useful to have lots of slots, but memory
> > pressure is also likely to build up on the server so GFP_NOWAIT is likely
> > to fail.  Maybe occasionally using a firmer request (outside the
> > spinlock) would be justified.
> >
> > We free slots when the number of unused slots passes some threshold -
> > currently 6 (because ...  why not).  Possibly some hysteresis should be
> > added so we don't free unused slots for at least N seconds.
> >
> > When the shrinker wants to apply pressure, we remove slots equally from
> > all sessions.  Maybe there should be some proportionality but that would
> > be more complex and I'm not sure it would gain much.  Slot 0 can never
> > be freed of course.
> >
> > I'm very interested to see what people think of the over-all approach,
> > and of the specifics of the code.
> >
> > Thanks,
> > NeilBrown
> >
> >
> >  [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
> >  [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
> >  [PATCH 3/4] nfsd: free unused session-DRC slots
> >  [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated
> >
>
NeilBrown Nov. 15, 2024, 9:03 a.m. UTC | #4
On Thu, 14 Nov 2024, Chuck Lever III wrote:
> 
> 
> > On Nov 13, 2024, at 12:38 AM, NeilBrown <neilb@suse.de> wrote:
> > 
> > This patch set aims to allocate session-based DRC slots on demand, and
> > free them when not in use, or when memory is tight.
> > 
> > I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> > overly aggressive, and with lots of printks, and it seems to do the right
> > thing, though memory pressure has never freed anything - I think you
> > need several clients with a non-trivial number of slots allocated before
> > the thresholds in the shrinker code will trigger any freeing.
> 
> Can you describe your test set-up? Generally a system
> with less than 4GB of memory can trigger shrinkers
> pretty easily.
> 
> If we never see the mechanism being triggered due to
> memory exhaustion, then I wonder if the additional
> complexity is adding substantial value.

Just a single VM with 1G RAM.  Only one client so only one session.
The default batch count for shrinkers is 64 and the reported count of
freeable items is normally scaled down a lot until memory gets really
tight.  So if I only have 6 slots that could be freed the shrinker isn't
going to notice.
I set ->batch to 2 and ->seeks to 0 and the shrinker started freeing
things.  This allowed me to see some bugs.
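
For anyone wanting to reproduce that, the shape of the tweak is roughly
the sketch below.  The counter and callbacks are purely illustrative,
and registration is left out because the interface differs between
kernels (register_shrinker() vs shrinker_alloc()/shrinker_register()):

#include <linux/shrinker.h>
#include <linux/atomic.h>
#include <linux/minmax.h>

/* Hypothetical global count of idle session slots. */
static atomic_long_t example_unused_slots;

static unsigned long example_slot_count(struct shrinker *s,
					struct shrink_control *sc)
{
	return atomic_long_read(&example_unused_slots);
}

static unsigned long example_slot_scan(struct shrinker *s,
				       struct shrink_control *sc)
{
	/* A real implementation would walk the sessions and free slots,
	 * spreading the damage evenly; here we only adjust the count. */
	unsigned long n = min_t(unsigned long, sc->nr_to_scan,
				atomic_long_read(&example_unused_slots));

	atomic_long_sub(n, &example_unused_slots);
	return n;
}

static struct shrinker example_slot_shrinker = {
	.count_objects	= example_slot_count,
	.scan_objects	= example_slot_scan,
	.seeks		= 0,	/* debug: don't scale the freeable count away */
	.batch		= 2,	/* debug: act even with only a handful of slots */
};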

One that I haven't resolved yet is the need to wait for confirmation
from the client before rejecting requests with larger-numbered slots.
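
The check in question is roughly the one below (field names from
memory); the difficulty is that slots above a reduced limit must keep
being honoured until the client's sa_highest_slotid shows it has seen
the new target:

	/* in the SEQUENCE handler */
	if (seq->slotid >= session->se_fchannel.maxreqs)
		return nfserr_badslot;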

> 
> 
> > I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
> > useful that is.  There are certainly cases where simply setting the
> > target in a SEQUENCE reply might not be enough, but I doubt they are
> > very common.  You would need a session to be completely idle, with the
> > last request received on it indicating that lots of slots were still in
> > use.
> > 
> > Currently we allocate slots one at a time when the last available slot
> > was used by the client, and only if a NOWAIT allocation can succeed.  It
> > is possible that this isn't quite aggressive enough.  When performing a
> > lot of writeback it can be useful to have lots of slots, but memory
> > pressure is also likely to build up on the server so GFP_NOWAIT is likely
> > to fail.  Maybe occasionally using a firmer request (outside the
> > spinlock) would be justified.
> 
> I'm wondering why GFP_NOWAIT is used here, and I admit
> I'm not strongly familiar with the code or mechanism.
> Why not always use GFP_KERNEL ?

Partly because the kmalloc call is under a spinlock, so we cannot wait. 
But that could be changed with a bit of work.

GFP_KERNEL can block indefinitely, and we don't actually need the
allocation to succeed to satisfy the current request, so it seems wrong
to block at all when we don't need to.

I'm hoping that GFP_NOWAIT will succeed often enough that the slot table
will grow when there is demand - maybe not instantly but not too slowly.
If GFP_NOWAIT doesn't succeed, then reclaim will be happening and the
shrinker will probably ask us to return some slots soon - maybe it isn't
worth trying hard to allocate something we will have to return soon.

> 
> 
> > We free slots when the number of unused slots passes some threshold -
> > currently 6 (because ...  why not).  Possibly some hysteresis should be
> > added so we don't free unused slots for at least N seconds.
> 
> Generally freeing unused resources is un-Linux like. :-)
> Can you provide a rationale for why this is needed?

Uhm...  No.  I added it so that the patch which adds slot retirement could do
something useful before the shrinker was added, and when I added the
shrinker I couldn't bring myself to remove it.  Probably I should.


Thanks for your thoughtful review.

NeilBrown

> 
> 
> > When the shrinker wants to apply pressure, we remove slots equally from
> > all sessions.  Maybe there should be some proportionality but that would
> > be more complex and I'm not sure it would gain much.  Slot 0 can never
> > be freed of course.
> > 
> > I'm very interested to see what people think of the over-all approach,
> > and of the specifics of the code.
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> > [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
> > [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
> > [PATCH 3/4] nfsd: free unused session-DRC slots
> > [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated
> > 
> 
> --
> Chuck Lever
> 
> 
>
Chuck Lever III Nov. 15, 2024, 2:18 p.m. UTC | #5
On Fri, Nov 15, 2024 at 03:56:08PM +1100, NeilBrown wrote:
> On Wed, 13 Nov 2024, Daire Byrne wrote:
> > Neil,
> > 
> > I'm curious if this work relates to:
> > 
> > https://bugzilla.linux-nfs.org/show_bug.cgi?id=375
> > https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@mail.gmail.com
> 
> Yes, it could possibly help with that - though more work would be
> needed.
> nfsd currently has a hard limit of 160 slots per session.  That wouldn't
> be enough I suspect.  The Linux client has a hard limit of 1024.  That
> might be enough.
> Allowing nfsd to have 1024 (or more) shouldn't be too hard...

1024 could be in the range where having a shrinker (when there are
multiple clients with that many slots) will start to bring some
value.

Maybe 1024 is too large at the beginning of a session, but enabling
NFSD to grow a session to that size would not be a bad thing.
Chuck Lever III Nov. 15, 2024, 2:27 p.m. UTC | #6
On Fri, Nov 15, 2024 at 08:03:00PM +1100, NeilBrown wrote:
> On Thu, 14 Nov 2024, Chuck Lever III wrote:
> > 
> > 
> > > On Nov 13, 2024, at 12:38 AM, NeilBrown <neilb@suse.de> wrote:
> > > 
> > > This patch set aims to allocate session-based DRC slots on demand, and
> > > free them when not in use, or when memory is tight.
> > > 
> > > I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> > > overly aggressive, and with lots of printks, and it seems to do the right
> > > thing, though memory pressure has never freed anything - I think you
> > > need several clients with a non-trivial number of slots allocated before
> > > the thresholds in the shrinker code will trigger any freeing.
> > 
> > Can you describe your test set-up? Generally a system
> > with less than 4GB of memory can trigger shrinkers
> > pretty easily.
> > 
> > If we never see the mechanism being triggered due to
> > memory exhaustion, then I wonder if the additional
> > complexity is adding substantial value.
> 
> Just a single VM with 1G RAM.  Only one client so only one session.

That's a good RAM size for this test, but I think you do need to
have more than a single client to stress the server a little more.


> The default batch count for shrinkers is 64 and the reported count of
> freeable items is normally scaled down a lot until memory gets really
> tight.  So if I only have 6 slots that could be freed the shrinker isn't
> going to notice.
> I set ->batch to 2 and ->seeks to 0 and the shrinker started freeing
> things.  This allowed me to see some bugs.
> 
> One that I haven't resolved yet is the need to wait for confirmation
> from the client before rejecting requests with larger-numbered slots.
> 
> > 
> > 
> > > I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
> > > useful that is.  There are certainly cases where simply setting the
> > > target in a SEQUENCE reply might not be enough, but I doubt they are
> > > very common.  You would need a session to be completely idle, with the
> > > last request received on it indicating that lots of slots were still in
> > > use.
> > > 
> > > Currently we allocate slots one at a time when the last available slot
> > > was used by the client, and only if a NOWAIT allocation can succeed.  It
> > > is possible that this isn't quite aggressive enough.  When performing a
> > > lot of writeback it can be useful to have lots of slots, but memory
> > > pressure is also likely to build up on the server so GFP_NOWAIT is likely
> > > to fail.  Maybe occasionally using a firmer request (outside the
> > > spinlock) would be justified.
> > 
> > I'm wondering why GFP_NOWAIT is used here, and I admit
> > I'm not strongly familiar with the code or mechanism.
> > Why not always use GFP_KERNEL ?
> 
> Partly because the kmalloc call is under a spinlock, so we cannot wait. 
> But that could be changed with a bit of work.
> 
> GFP_KERNEL can block indefinitely, and we don't actually need the
> allocation to succeed to satisfy the current request, so it seems wrong
> to block at all when we don't need to.

Ah, that is sensible. A comment might be added that explains that
the server response to the client's "request" to increase the slot
table size does not have to be immediate, and in fact, the failure
is a sign the server is under memory pressure and shouldn't grow the
slot table size anyway.
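
Something along these lines, perhaps (placement and wording
illustrative; slot_size and the surrounding code are hypothetical):

	/*
	 * Growing the slot table is opportunistic: the client has only
	 * been told via sr_target_highest_slotid that more slots might
	 * appear.  If this GFP_NOWAIT allocation fails, the server is
	 * under memory pressure and should not be growing the table
	 * anyway, so carry on with the slots we already have.
	 */
	slot = kzalloc(slot_size, GFP_NOWAIT);
	if (!slot)
		return;		/* try again on a later SEQUENCE */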


> I'm hoping that GFP_NOWAIT will succeed often enough that the slot table
> will grow when there is demand - maybe not instantly but not too slowly.

I don't think it will ever fail if the requested allocation is below
4KB, so this might be no more than a fussy detail. It does indeed
make sense to me, though.


> If GFP_NOWAIT doesn't succeed, then reclaim will be happening and the
> shrinker will probably ask us to return some slots soon - maybe it isn't
> worth trying hard to allocate something we will have to return soon.
Daire Byrne Nov. 18, 2024, 3:43 p.m. UTC | #7
On Fri, 15 Nov 2024 at 04:56, NeilBrown <neilb@suse.de> wrote:
>
> On Wed, 13 Nov 2024, Daire Byrne wrote:
> > Neil,
> >
> > I'm curious if this work relates to:
> >
> > https://bugzilla.linux-nfs.org/show_bug.cgi?id=375
> > https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@mail.gmail.com
>
> Yes, it could possibly help with that - though more work would be
> needed.
> nfsd currently has a hard limit of 160 slots per session.  That wouldn't
> be enough I suspect.  The Linux client has a hard limit of 1024.  That
> might be enough.
> Allowing nfsd to have 1024 (or more) shouldn't be too hard...

That would be cool - I'd love to be able to switch out NFSv3 for NFSv4
for our use cases. Although saying that, any changes to nfsd to
support that would likely take many years to make it into the RHEL-based
storage servers we currently use.

Our re-export case is pretty unique and niche in the sense that a
single client is essentially trying to do the work of many clients.

> > We also used your "VFS: support parallel updates in the one directory"
> > patches for similar reasons up until I couldn't port it to newer
> > kernels anymore (my kernel code munging skills are not sufficient!).
>
> Yeah - I really should get back to that.  Al and Linus suggested some
> changes and I just never got around to making them.

That would also be awesome. Again, our specific niche use case (many
clients writing to the same directory via a re-export) is probably the
main beneficiary, but maybe it helps with other (more common)
workloads too.

Cheers,

Daire

> Thanks,
> NeilBrown
>
>
> >
> > Sorry to spam the thread if I am misinterpreting what this patch set
> > is all about.
> >
> > Daire
> >
> >
> > On Wed, 13 Nov 2024 at 05:54, NeilBrown <neilb@suse.de> wrote:
> > >
> > > This patch set aims to allocate session-based DRC slots on demand, and
> > > free them when not in use, or when memory is tight.
> > >
> > > I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> > > overly aggressive, and with lots of printks, and it seems to do the right
> > > thing, though memory pressure has never freed anything - I think you
> > > need several clients with a non-trivial number of slots allocated before
> > > the thresholds in the shrinker code will trigger any freeing.
> > >
> > > I haven't made use of the CB_RECALL_SLOT callback.  I'm not sure how
> > > useful that is.  There are certainly cases where simply setting the
> > > target in a SEQUENCE reply might not be enough, but I doubt they are
> > > very common.  You would need a session to be completely idle, with the
> > > last request received on it indicating that lots of slots were still in
> > > use.
> > >
> > > Currently we allocate slots one at a time when the last available slot
> > > was used by the client, and only if a NOWAIT allocation can succeed.  It
> > > is possible that this isn't quite aggressive enough.  When performing a
> > > lot of writeback it can be useful to have lots of slots, but memory
> > > pressure is also likely to build up on the server so GFP_NOWAIT is likely
> > > to fail.  Maybe occasionally using a firmer request (outside the
> > > spinlock) would be justified.
> > >
> > > We free slots when the number of unused slots passes some threshold -
> > > currently 6 (because ...  why not).  Possibly some hysteresis should be
> > > added so we don't free unused slots for at least N seconds.
> > >
> > > When the shrinker wants to apply pressure, we remove slots equally from
> > > all sessions.  Maybe there should be some proportionality but that would
> > > be more complex and I'm not sure it would gain much.  Slot 0 can never
> > > be freed of course.
> > >
> > > I'm very interested to see what people think of the over-all approach,
> > > and of the specifics of the code.
> > >
> > > Thanks,
> > > NeilBrown
> > >
> > >
> > >  [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
> > >  [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
> > >  [PATCH 3/4] nfsd: free unused session-DRC slots
> > >  [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated
> > >
> >
>