Message ID | 20241113055345.494856-1-neilb@suse.de (mailing list archive)
---|---
Series | nfsd: allocate/free session-based DRC slots on demand
Neil,

I'm curious if this work relates to:

https://bugzilla.linux-nfs.org/show_bug.cgi?id=375
https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@mail.gmail.com

As my thread described, we currently use NFSv3 for our high-latency NFS
re-export cases simply because it performs far better for parallel
client operations. When you use re-exporting to serve many clients, you
are in effect taking all of those clients' operations and funnelling
them through a single client (the re-export server), which then becomes
a bottleneck. Add any kind of latency on top (>10ms) and
NFSD_CACHE_SIZE_SLOTS_PER_SESSION (32) for NFSv4 becomes a major
bottleneck for that single client (the re-export server).

We also used your "VFS: support parallel updates in the one directory"
patches for similar reasons, up until I could no longer port them to
newer kernels (my kernel code munging skills are not sufficient!).

Sorry to spam the thread if I am misinterpreting what this patch set is
all about.

Daire

On Wed, 13 Nov 2024 at 05:54, NeilBrown <neilb@suse.de> wrote:
>
> This patch set aims to allocate session-based DRC slots on demand, and
> free them when not in use, or when memory is tight.
>
> I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> overly aggressive, and with lots of printks, and it seems to do the
> right thing, though memory pressure has never freed anything - I think
> you need several clients with a non-trivial number of slots allocated
> before the thresholds in the shrinker code will trigger any freeing.
>
> I haven't made use of the CB_RECALL_SLOT callback. I'm not sure how
> useful that is. There are certainly cases where simply setting the
> target in a SEQUENCE reply might not be enough, but I doubt they are
> very common. You would need a session to be completely idle, with the
> last request received on it indicating that lots of slots were still
> in use.
>
> Currently we allocate slots one at a time when the last available slot
> was used by the client, and only if a NOWAIT allocation can succeed.
> It is possible that this isn't quite aggressive enough. When
> performing a lot of writeback it can be useful to have lots of slots,
> but memory pressure is also likely to build up on the server, so
> GFP_NOWAIT is likely to fail. Maybe occasionally using a firmer
> request (outside the spinlock) would be justified.
>
> We free slots when the number of unused slots passes some threshold -
> currently 6 (because ... why not). Possibly a hysteresis should be
> added so we don't free unused slots for at least N seconds.
>
> When the shrinker wants to apply pressure we remove slots equally from
> all sessions. Maybe there should be some proportionality but that
> would be more complex and I'm not sure it would gain much. Slot 0 can
> never be freed of course.
>
> I'm very interested to see what people think of the overall approach,
> and of the specifics of the code.
>
> Thanks,
> NeilBrown
>
>
> [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
> [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
> [PATCH 3/4] nfsd: free unused session-DRC slots
> [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated
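To put rough numbers on the bottleneck Daire describes: the slot count caps how many requests the re-export server can have in flight on its one session, so with the fixed 32-slot table and, say, the 10 ms of added latency mentioned above, that session tops out at roughly

    32 slots / 0.010 s ≈ 3,200 operations per second

shared across every downstream client, however parallel their workloads are. (The 10 ms figure is only the illustrative latency from the message above; real ceilings depend on operation mix and compound sizes.)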
> On Nov 13, 2024, at 12:38 AM, NeilBrown <neilb@suse.de> wrote:
>
> This patch set aims to allocate session-based DRC slots on demand, and
> free them when not in use, or when memory is tight.
>
> I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> overly aggressive, and with lots of printks, and it seems to do the
> right thing, though memory pressure has never freed anything - I think
> you need several clients with a non-trivial number of slots allocated
> before the thresholds in the shrinker code will trigger any freeing.

Can you describe your test set-up? Generally a system with less than
4GB of memory can trigger shrinkers pretty easily.

If we never see the mechanism being triggered due to memory exhaustion,
then I wonder if the additional complexity is adding substantial value.

> I haven't made use of the CB_RECALL_SLOT callback. I'm not sure how
> useful that is. There are certainly cases where simply setting the
> target in a SEQUENCE reply might not be enough, but I doubt they are
> very common. You would need a session to be completely idle, with the
> last request received on it indicating that lots of slots were still
> in use.
>
> Currently we allocate slots one at a time when the last available slot
> was used by the client, and only if a NOWAIT allocation can succeed.
> It is possible that this isn't quite aggressive enough. When
> performing a lot of writeback it can be useful to have lots of slots,
> but memory pressure is also likely to build up on the server, so
> GFP_NOWAIT is likely to fail. Maybe occasionally using a firmer
> request (outside the spinlock) would be justified.

I'm wondering why GFP_NOWAIT is used here, and I admit I'm not strongly
familiar with the code or mechanism. Why not always use GFP_KERNEL?

> We free slots when the number of unused slots passes some threshold -
> currently 6 (because ... why not). Possibly a hysteresis should be
> added so we don't free unused slots for at least N seconds.

Generally freeing unused resources is un-Linux-like. :-) Can you
provide a rationale for why this is needed?

> When the shrinker wants to apply pressure we remove slots equally from
> all sessions. Maybe there should be some proportionality but that
> would be more complex and I'm not sure it would gain much. Slot 0 can
> never be freed of course.
>
> I'm very interested to see what people think of the overall approach,
> and of the specifics of the code.
>
> Thanks,
> NeilBrown
>
>
> [PATCH 1/4] nfsd: remove artificial limits on the session-based DRC
> [PATCH 2/4] nfsd: allocate new session-based DRC slots on demand.
> [PATCH 3/4] nfsd: free unused session-DRC slots
> [PATCH 4/4] nfsd: add shrinker to reduce number of slots allocated

--
Chuck Lever
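The "target in a SEQUENCE reply" that both messages refer to is the slot signalling defined by RFC 5661: every SEQUENCE result tells the client both the highest slot it may use right now and a target the server would like it to shrink toward. A minimal sketch of the relevant fields, following the RFC's XDR naming (for orientation only; this is not the nfsd structure):

```c
/*
 * Slot-related fields of SEQUENCE4resok (RFC 5661) -- sketch only.
 * The server lowers sr_target_highest_slotid when it wants slots
 * back; once the client stops using the higher-numbered slots, the
 * server can lower sr_highest_slotid and free them.  CB_RECALL_SLOT
 * exists for the idle-session case, since a new target only reaches
 * a client that sends another SEQUENCE.
 */
struct sequence_res_slots {
	uint32_t sr_slotid;			/* slot this reply used */
	uint32_t sr_highest_slotid;		/* highest usable slot now */
	uint32_t sr_target_highest_slotid;	/* where the server wants it */
};
```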
On Wed, 13 Nov 2024, Daire Byrne wrote:
> Neil,
>
> I'm curious if this work relates to:
>
> https://bugzilla.linux-nfs.org/show_bug.cgi?id=375
> https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@mail.gmail.com

Yes, it could possibly help with that - though more work would be
needed.
nfsd currently has a hard limit of 160 slots per session. That
wouldn't be enough, I suspect. The Linux client has a hard limit of
1024. That might be enough.
Allowing nfsd to have 1024 (or more) shouldn't be too hard...

> As my thread described, we currently use NFSv3 for our high-latency
> NFS re-export cases simply because it performs far better for parallel
> client operations. When you use re-exporting to serve many clients,
> you are in effect taking all of those clients' operations and
> funnelling them through a single client (the re-export server), which
> then becomes a bottleneck. Add any kind of latency on top (>10ms) and
> NFSD_CACHE_SIZE_SLOTS_PER_SESSION (32) for NFSv4 becomes a major
> bottleneck for that single client (the re-export server).
>
> We also used your "VFS: support parallel updates in the one directory"
> patches for similar reasons, up until I could no longer port them to
> newer kernels (my kernel code munging skills are not sufficient!).

Yeah - I really should get back to that. Al and Linus suggested some
changes and I just never got around to making them.

Thanks,
NeilBrown
On Thu, 14 Nov 2024, Chuck Lever III wrote:
> > On Nov 13, 2024, at 12:38 AM, NeilBrown <neilb@suse.de> wrote:
> >
> > This patch set aims to allocate session-based DRC slots on demand,
> > and free them when not in use, or when memory is tight.
> >
> > I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing is
> > overly aggressive, and with lots of printks, and it seems to do the
> > right thing, though memory pressure has never freed anything - I
> > think you need several clients with a non-trivial number of slots
> > allocated before the thresholds in the shrinker code will trigger
> > any freeing.
>
> Can you describe your test set-up? Generally a system with less than
> 4GB of memory can trigger shrinkers pretty easily.
>
> If we never see the mechanism being triggered due to memory
> exhaustion, then I wonder if the additional complexity is adding
> substantial value.

Just a single VM with 1G RAM. Only one client so only one session.

The default batch count for shrinkers is 64 and the reported count of
freeable items is normally scaled down a lot until memory gets really
tight. So if I only have 6 slots that could be freed, the shrinker
isn't going to notice.
I set ->batch to 2 and ->seeks to 0 and the shrinker started freeing
things. This allowed me to see some bugs.

One that I haven't resolved yet is the need to wait for confirmation
from the client before rejecting requests with larger-numbered slots.

> > Currently we allocate slots one at a time when the last available
> > slot was used by the client, and only if a NOWAIT allocation can
> > succeed. It is possible that this isn't quite aggressive enough.
> > When performing a lot of writeback it can be useful to have lots of
> > slots, but memory pressure is also likely to build up on the
> > server, so GFP_NOWAIT is likely to fail. Maybe occasionally using a
> > firmer request (outside the spinlock) would be justified.
>
> I'm wondering why GFP_NOWAIT is used here, and I admit I'm not
> strongly familiar with the code or mechanism. Why not always use
> GFP_KERNEL ?

Partly because the kmalloc call is under a spinlock, so we cannot wait.
But that could be changed with a bit of work.

GFP_KERNEL can block indefinitely, and we don't actually need the
allocation to succeed to satisfy the current request, so it seems wrong
to block at all when we don't need to.

I'm hoping that GFP_NOWAIT will succeed often enough that the slot
table will grow when there is demand - maybe not instantly but not too
slowly. If GFP_NOWAIT doesn't succeed, then reclaim will be happening
and the shrinker will probably ask us to return some slots soon - maybe
it isn't worth trying hard to allocate something we will have to return
soon.

> > We free slots when the number of unused slots passes some threshold
> > - currently 6 (because ... why not). Possibly a hysteresis should
> > be added so we don't free unused slots for at least N seconds.
>
> Generally freeing unused resources is un-Linux-like. :-)
> Can you provide a rationale for why this is needed?

Uhm... No.

I added it so that the patch which adds slot retirement could do
something useful before the shrinker was added, and when I added the
shrinker I couldn't bring myself to remove it. Probably I should.

Thanks for your thoughtful review.

NeilBrown
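For readers unfamiliar with the knobs Neil mentions: a shrinker advertises how many objects it could free via ->count_objects, and the VM calls ->scan_objects in units of ->batch, scaled down by ->seeks and by how much pressure there is. Below is a hedged sketch of that tuning using the current shrinker API (shrinker_alloc()/shrinker_register()); the nfsd4_* helpers are made-up stand-ins for the patch's internals, not actual code:

```c
#include <linux/shrinker.h>

/* Hypothetical helpers standing in for the patch's slot accounting. */
unsigned long nfsd4_count_unused_slots(void);
unsigned long nfsd4_free_unused_slots(unsigned long nr);

static unsigned long
nfsd_slot_count(struct shrinker *s, struct shrink_control *sc)
{
	/* How many slots could be freed right now (never slot 0). */
	return nfsd4_count_unused_slots();
}

static unsigned long
nfsd_slot_scan(struct shrinker *s, struct shrink_control *sc)
{
	/* Free up to nr_to_scan slots, spread evenly across sessions. */
	return nfsd4_free_unused_slots(sc->nr_to_scan);
}

static int nfsd_slot_shrinker_init(void)
{
	struct shrinker *s = shrinker_alloc(0, "nfsd-session-slots");

	if (!s)
		return -ENOMEM;
	s->count_objects = nfsd_slot_count;
	s->scan_objects  = nfsd_slot_scan;
	s->batch = 2;	/* scan in tiny batches so a handful of freeable
			 * slots is still worth visiting */
	s->seeks = 0;	/* slots are cheap to re-create, so free eagerly */
	shrinker_register(s);
	return 0;
}
```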
On Fri, Nov 15, 2024 at 03:56:08PM +1100, NeilBrown wrote:
> On Wed, 13 Nov 2024, Daire Byrne wrote:
> > Neil,
> >
> > I'm curious if this work relates to:
> >
> > https://bugzilla.linux-nfs.org/show_bug.cgi?id=375
> > https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@mail.gmail.com
>
> Yes, it could possibly help with that - though more work would be
> needed.
> nfsd currently has a hard limit of 160 slots per session. That
> wouldn't be enough, I suspect. The Linux client has a hard limit of
> 1024. That might be enough.
> Allowing nfsd to have 1024 (or more) shouldn't be too hard...

1024 could be in the range where having a shrinker (when there are
multiple clients with that many slots) will start to bring some value.

Maybe 1024 is too large at the beginning of a session, but enabling
NFSD to grow a session to that size would not be a bad thing.
On Fri, Nov 15, 2024 at 08:03:00PM +1100, NeilBrown wrote:
> On Thu, 14 Nov 2024, Chuck Lever III wrote:
> > > On Nov 13, 2024, at 12:38 AM, NeilBrown <neilb@suse.de> wrote:
> > >
> > > I've tested with NFSD_MAX_UNUSED_SLOTS set to 1 so that freeing
> > > is overly aggressive, and with lots of printks, and it seems to
> > > do the right thing, though memory pressure has never freed
> > > anything - I think you need several clients with a non-trivial
> > > number of slots allocated before the thresholds in the shrinker
> > > code will trigger any freeing.
> >
> > Can you describe your test set-up? Generally a system with less
> > than 4GB of memory can trigger shrinkers pretty easily.
> >
> > If we never see the mechanism being triggered due to memory
> > exhaustion, then I wonder if the additional complexity is adding
> > substantial value.
>
> Just a single VM with 1G RAM. Only one client so only one session.

That's a good RAM size for this test, but I think you do need to have
more than a single client to stress the server a little more.

> The default batch count for shrinkers is 64 and the reported count of
> freeable items is normally scaled down a lot until memory gets really
> tight. So if I only have 6 slots that could be freed, the shrinker
> isn't going to notice.
> I set ->batch to 2 and ->seeks to 0 and the shrinker started freeing
> things. This allowed me to see some bugs.
>
> One that I haven't resolved yet is the need to wait for confirmation
> from the client before rejecting requests with larger-numbered slots.
>
> > I'm wondering why GFP_NOWAIT is used here, and I admit I'm not
> > strongly familiar with the code or mechanism. Why not always use
> > GFP_KERNEL ?
>
> Partly because the kmalloc call is under a spinlock, so we cannot
> wait. But that could be changed with a bit of work.
>
> GFP_KERNEL can block indefinitely, and we don't actually need the
> allocation to succeed to satisfy the current request, so it seems
> wrong to block at all when we don't need to.

Ah, that is sensible. A comment might be added that explains that the
server's response to the client's "request" to increase the slot table
size does not have to be immediate, and in fact, the failure is a sign
the server is under memory pressure and shouldn't grow the slot table
size anyway.

> I'm hoping that GFP_NOWAIT will succeed often enough that the slot
> table will grow when there is demand - maybe not instantly but not
> too slowly.

I don't think it will ever fail if the requested allocation is below
4KB, so this might be no more than a fussy detail. It does indeed make
sense to me, though.
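The pattern under discussion might look something like the sketch below. It is written under assumptions: the nfsd4_session field names and the all_slots_in_use()/slot_bytes() helpers are illustrative stand-ins, not the patch code. The point is that GFP_NOWAIT is legal under the spinlock and that failure costs nothing, since the current request already holds a slot:

```c
#include <linux/slab.h>
#include <linux/spinlock.h>

/*
 * Opportunistic slot-table growth, called while processing a SEQUENCE
 * that consumed the last free slot.  Under the spinlock only an atomic
 * allocation is allowed; if it fails, the request proceeds with the
 * slots we already have and the table simply does not grow this time.
 */
static void nfsd4_try_grow_slot_table(struct nfsd4_session *ses)
{
	struct nfsd4_slot *slot;

	lockdep_assert_held(&ses->se_lock);

	if (!all_slots_in_use(ses))		/* hypothetical test */
		return;

	slot = kzalloc(slot_bytes(&ses->se_fchannel), GFP_NOWAIT);
	if (slot)				/* grew by one slot */
		ses->se_slots[ses->se_fchannel.maxreqs++] = slot;
	/* else: reclaim is running; the shrinker may soon want slots back */
}
```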
On Fri, 15 Nov 2024 at 04:56, NeilBrown <neilb@suse.de> wrote:
> On Wed, 13 Nov 2024, Daire Byrne wrote:
> > Neil,
> >
> > I'm curious if this work relates to:
> >
> > https://bugzilla.linux-nfs.org/show_bug.cgi?id=375
> > https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@mail.gmail.com
>
> Yes, it could possibly help with that - though more work would be
> needed.
> nfsd currently has a hard limit of 160 slots per session. That
> wouldn't be enough, I suspect. The Linux client has a hard limit of
> 1024. That might be enough.
> Allowing nfsd to have 1024 (or more) shouldn't be too hard...

That would be cool - I'd love to be able to switch out NFSv3 for NFSv4
for our use cases. Although, saying that, any changes to nfsd to
support that would likely take many years to make it into the
RHEL-based storage servers we currently use.

Our re-export case is pretty unique and niche in the sense that a
single client is essentially trying to do the work of many clients.

> > We also used your "VFS: support parallel updates in the one
> > directory" patches for similar reasons, up until I could no longer
> > port them to newer kernels (my kernel code munging skills are not
> > sufficient!).
>
> Yeah - I really should get back to that. Al and Linus suggested some
> changes and I just never got around to making them.

That would also be awesome. Again, our specific niche use case (many
clients writing to the same directory via a re-export) is probably the
main beneficiary, but maybe it helps with other (more common) workloads
too.

Cheers,

Daire