[v7,0/3] fuse: add kernel-enforced request timeout option

Message ID 20241007184258.2837492-1-joannelkoong@gmail.com (mailing list archive)

Joanne Koong Oct. 7, 2024, 6:42 p.m. UTC
There are situations where fuse servers can become unresponsive or
stuck, for example if the server is in a deadlock. Currently, there's
no good way to detect if a server is stuck and needs to be killed
manually.

This patchset adds a timeout option where if the server does not reply to a
request by the time the timeout elapses, the connection will be aborted.
This patchset also adds two dynamically configurable fuse sysctls
"default_request_timeout" and "max_request_timeout" for controlling/enforcing
timeout behavior system-wide.
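
One possible reading of how the two sysctls could combine with a per-mount timeout is sketched below. This is only an illustration: the function name, the parameter names, and the exact fallback and clamping rules are assumptions, not taken from the patches. The cover letter only says the sysctls control and enforce timeout behavior system-wide.

```python
def effective_timeout(requested: int, default: int, maximum: int) -> int:
    """Return the timeout a connection would end up with; 0 means no timeout.

    Assumed semantics (hypothetical, not confirmed by the cover letter):
    - requested == 0: the mount did not ask for a timeout, so `default` applies
    - maximum == 0: there is no system-wide cap
    """
    # Fall back to the system-wide default when the mount sets nothing.
    timeout = requested if requested > 0 else default
    # Clamp to the system-wide maximum when one is configured.
    if maximum > 0 and timeout > maximum:
        timeout = maximum
    return timeout


if __name__ == "__main__":
    print(effective_timeout(requested=0, default=0, maximum=0))
    print(effective_timeout(requested=10, default=5, maximum=8))
```

Under these assumptions, a system with both sysctls left at 0 and no per-mount timeout gets no timeout at all, which matches the opt-in behavior described in the cover letter.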

Existing systems running fuse servers will not be affected unless they
explicitly opt into the timeout.

v6:
https://lore.kernel.org/linux-fsdevel/20240830162649.3849586-1-joannelkoong@gmail.com/
Changes from v6 -> v7:
- Make timer per-connection instead of per-request (Miklos)
- Make default granularity of time minutes instead of seconds
- Removed the reviewed-bys since the interface of this has changed (now
  minutes, instead of seconds)
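
To illustrate the per-connection idea, here is a toy userspace model. It is not the kernel code: the class and method names are hypothetical, and it assumes requests are queued in send order so that a periodic check only has to look at the oldest outstanding request.

```python
from collections import deque


class Connection:
    """Toy model of one connection with a single periodic expiry check."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.pending = deque()  # (request_id, start_time), oldest first
        self.aborted = False

    def send(self, request_id: int, now: float) -> None:
        # No timer is armed per request; only the start time is recorded.
        self.pending.append((request_id, now))

    def reply(self, request_id: int) -> None:
        # A reply simply removes the request from the pending queue.
        self.pending = deque(r for r in self.pending if r[0] != request_id)

    def periodic_check(self, now: float) -> None:
        # One check per connection: if the oldest request has expired,
        # abort the whole connection rather than the single request.
        if self.pending and now - self.pending[0][1] >= self.timeout:
            self.aborted = True
            self.pending.clear()


if __name__ == "__main__":
    conn = Connection(timeout=5.0)
    conn.send(1, now=0.0)
    conn.send(2, now=1.0)
    conn.reply(1)
    conn.periodic_check(now=4.0)   # request 2 is only 3 time units old
    conn.periodic_check(now=6.0)   # request 2 is now 5 time units old
    print(conn.aborted)
```

The point of the sketch is that sending and completing a request does no timer work at all, which is the cost the per-request approach paid on every request.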

v5:
https://lore.kernel.org/linux-fsdevel/20240826203234.4079338-1-joannelkoong@gmail.com/
Changes from v5 -> v6:
- Gate sysctl.o behind CONFIG_SYSCTL in makefile (kernel test robot)
- Reword/clarify last sentence in cover letter (Miklos)

v4:
https://lore.kernel.org/linux-fsdevel/20240813232241.2369855-1-joannelkoong@gmail.com/
Changes from v4 -> v5:
- Change timeout behavior from aborting request to aborting connection
  (Miklos)
- Clarify wording for sysctl documentation (Jingbo)

v3:
https://lore.kernel.org/linux-fsdevel/20240808190110.3188039-1-joannelkoong@gmail.com/
Changes from v3 -> v4:
- Fix wording on some comments to make it more clear
- Use simpler logic for timer (eg remove extra if checks, use mod timer API)
  (Josef)
- Sanity-check should be on FR_FINISHING not FR_FINISHED (Jingbo)
- Fix comment for "processing queue", add req->fpq = NULL safeguard  (Bernd)

v2:
https://lore.kernel.org/linux-fsdevel/20240730002348.3431931-1-joannelkoong@gmail.com/
Changes from v2 -> v3:
- Disarm / rearm timer in dev_do_read to handle race conditions (Bernd)
- Disarm timer in error handling for fatal interrupt (Yafang)
- Clean up do_fuse_request_end (Jingbo)
- Add timer for notify retrieve requests 
- Fix kernel test robot errors for #define no-op functions

v1:
https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
Changes from v1 -> v2:
- Add timeout for background requests
- Handle resend race condition
- Add sysctls

Joanne Koong (3):
  fs_parser: add fsparam_u16 helper
  fuse: add optional kernel-enforced timeout for requests
  fuse: add default_request_timeout and max_request_timeout sysctls

 Documentation/admin-guide/sysctl/fs.rst | 27 +++++++++++
 fs/fs_parser.c                          | 14 ++++++
 fs/fuse/dev.c                           | 63 ++++++++++++++++++++++++-
 fs/fuse/fuse_i.h                        | 55 +++++++++++++++++++++
 fs/fuse/inode.c                         | 34 +++++++++++++
 fs/fuse/sysctl.c                        | 20 ++++++++
 include/linux/fs_parser.h               |  9 ++--
 7 files changed, 218 insertions(+), 4 deletions(-)

Comments

Joanne Koong Oct. 8, 2024, 8:58 p.m. UTC | #1

These are the benchmark numbers I am seeing on my machine:

--- Machine info ---
Architecture:             x86_64
CPU(s):                   36
  On-line CPU(s) list:    0-35
  Model name:             Intel(R) Xeon(R) D-2191A CPU @ 1.60GHz
    BIOS Model name:      Intel(R) Xeon(R) D-2191A CPU @ 1.60GHz
    CPU family:           6
    Model:                85
    Thread(s) per core:   2
    Core(s) per socket:   18
    Socket(s):            1
    Stepping:             4
    Frequency boost:      disabled
    CPU(s) scaling MHz:   100%
    CPU max MHz:          1601.0000
    CPU min MHz:          800.0000


--- Setting up the testing environment ---

sudo mount -t tmpfs -o size=10G tmpfs ~/tmp_mount

Mount libfuse server for tests:
for test a) Non-passthrough writes -
./libfuse/build/example/passthrough_ll -o max_threads=4 \
    -o source=/root/tmp_mount /root/fuse_mount
for test b) Passthrough writes -
./libfuse/build/example/passthrough_hp --num-threads=4 \
    /root/tmp_mount /root/fuse_mount

Test using fio:
fio --name=seqwrite --ioengine=sync --rw=write --bs=1k --size=1G \
    --numjobs=4 --fallocate=none --ramp_time=30 --group_reporting=1 \
    --directory=/root/fuse_mount
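
For anyone who wants to aggregate the results, a small helper along these lines can pull the bandwidth out of fio's "WRITE: bw=..." summary lines. It is a sketch, not what I used to produce the numbers: the function name is made up, and it only handles summaries reported in MiB/s, as in the runs below.

```python
import re

# Matches only the leading bandwidth of a summary line, e.g. "WRITE: bw=277MiB/s".
# The min-max range that follows ("277MiB/s-277MiB/s") is deliberately not matched.
BW_RE = re.compile(r"WRITE: bw=(\d+(?:\.\d+)?)MiB/s")


def parse_write_bw(text: str) -> list[float]:
    """Return the write bandwidth in MiB/s of every fio summary line in text."""
    return [float(value) for value in BW_RE.findall(text)]


if __name__ == "__main__":
    sample = (
        "WRITE: bw=277MiB/s (291MB/s), 277MiB/s-277MiB/s (291MB/s-291MB/s),\n"
        "io=4096MiB (4295MB), run=14761-14761msec\n"
        "WRITE: bw=271MiB/s (285MB/s), 271MiB/s-271MiB/s (285MB/s-285MB/s),\n"
        "io=4096MiB (4295MB), run=15091-15091msec\n"
    )
    print(parse_write_bw(sample))
```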

Enable timeouts by running
'echo 500 | sudo tee /proc/sys/fs/fuse/default_request_timeout'
before mounting the fuse server.
Disable timeouts by running
'echo 0 | sudo tee /proc/sys/fs/fuse/default_request_timeout'
before mounting the fuse server.

Outliers were discarded.


--- Tests ---
a) Non-passthrough sequential writes
./libfuse/build/example/passthrough_ll -o max_threads=4 \
    -o source=/root/tmp_mount /root/fuse_mount

--- Baseline (no timeouts) ---
Ran this on origin/for-next
Saw around 273 MiB/s
WRITE: bw=277MiB/s (291MB/s), 277MiB/s-277MiB/s (291MB/s-291MB/s),
io=4096MiB (4295MB), run=14761-14761msec
WRITE: bw=271MiB/s (285MB/s), 271MiB/s-271MiB/s (285MB/s-285MB/s),
io=4096MiB (4295MB), run=15091-15091msec
WRITE: bw=274MiB/s (287MB/s), 274MiB/s-274MiB/s (287MB/s-287MB/s),
io=4096MiB (4295MB), run=14949-14949msec
WRITE: bw=277MiB/s (290MB/s), 277MiB/s-277MiB/s (290MB/s-290MB/s),
io=4096MiB (4295MB), run=14801-14801msec
WRITE: bw=274MiB/s (288MB/s), 274MiB/s-274MiB/s (288MB/s-288MB/s),
io=4096MiB (4295MB), run=14939-14939msec
WRITE: bw=272MiB/s (285MB/s), 272MiB/s-272MiB/s (285MB/s-285MB/s),
io=4096MiB (4295MB), run=15060-15060msec
WRITE: bw=269MiB/s (282MB/s), 269MiB/s-269MiB/s (282MB/s-282MB/s),
io=4096MiB (4295MB), run=15254-15254msec
WRITE: bw=272MiB/s (285MB/s), 272MiB/s-272MiB/s (285MB/s-285MB/s),
io=4096MiB (4295MB), run=15055-15055msec
WRITE: bw=275MiB/s (288MB/s), 275MiB/s-275MiB/s (288MB/s-288MB/s),
io=4096MiB (4295MB), run=14893-14893msec
WRITE: bw=270MiB/s (283MB/s), 270MiB/s-270MiB/s (283MB/s-283MB/s),
io=4096MiB (4295MB), run=15176-15176msec

--- Request timeouts with periodic timer (approach from this patchset) ---
Saw around 271 MiB/s
WRITE: bw=265MiB/s (278MB/s), 265MiB/s-265MiB/s (278MB/s-278MB/s),
io=4096MiB (4295MB), run=15454-15454msec
WRITE: bw=268MiB/s (281MB/s), 268MiB/s-268MiB/s (281MB/s-281MB/s),
io=4096MiB (4295MB), run=15262-15262msec
WRITE: bw=271MiB/s (284MB/s), 271MiB/s-271MiB/s (284MB/s-284MB/s),
io=4096MiB (4295MB), run=15113-15113msec
WRITE: bw=268MiB/s (281MB/s), 268MiB/s-268MiB/s (281MB/s-281MB/s),
io=4096MiB (4295MB), run=15301-15301msec
WRITE: bw=274MiB/s (287MB/s), 274MiB/s-274MiB/s (287MB/s-287MB/s),
io=4096MiB (4295MB), run=14965-14965msec
WRITE: bw=268MiB/s (281MB/s), 268MiB/s-268MiB/s (281MB/s-281MB/s),
io=4096MiB (4295MB), run=15277-15277msec
WRITE: bw=276MiB/s (290MB/s), 276MiB/s-276MiB/s (290MB/s-290MB/s),
io=4096MiB (4295MB), run=14828-14828msec
WRITE: bw=272MiB/s (285MB/s), 272MiB/s-272MiB/s (285MB/s-285MB/s),
io=4096MiB (4295MB), run=15069-15069msec
WRITE: bw=273MiB/s (287MB/s), 273MiB/s-273MiB/s (287MB/s-287MB/s),
io=4096MiB (4295MB), run=14987-14987msec
WRITE: bw=279MiB/s (293MB/s), 279MiB/s-279MiB/s (293MB/s-293MB/s),
io=4096MiB (4295MB), run=14662-14662msec
WRITE: bw=272MiB/s (285MB/s), 272MiB/s-272MiB/s (285MB/s-285MB/s),
io=4096MiB (4295MB), run=15071-15071msec

--- Request timeouts with one timer per request (approach from v6 [1]) ---
Saw around 263 MiB/s
WRITE: bw=262MiB/s (275MB/s), 262MiB/s-262MiB/s (275MB/s-275MB/s),
io=4096MiB (4295MB), run=15620-15620msec
WRITE: bw=262MiB/s (275MB/s), 262MiB/s-262MiB/s (275MB/s-275MB/s),
io=4096MiB (4295MB), run=15614-15614msec
WRITE: bw=256MiB/s (269MB/s), 256MiB/s-256MiB/s (269MB/s-269MB/s),
io=4096MiB (4295MB), run=15995-15995msec
WRITE: bw=264MiB/s (277MB/s), 264MiB/s-264MiB/s (277MB/s-277MB/s),
io=4096MiB (4295MB), run=15504-15504msec
WRITE: bw=260MiB/s (273MB/s), 260MiB/s-260MiB/s (273MB/s-273MB/s),
io=4096MiB (4295MB), run=15749-15749msec
WRITE: bw=267MiB/s (280MB/s), 267MiB/s-267MiB/s (280MB/s-280MB/s),
io=4096MiB (4295MB), run=15354-15354msec
WRITE: bw=266MiB/s (279MB/s), 266MiB/s-266MiB/s (279MB/s-279MB/s),
io=4096MiB (4295MB), run=15409-15409msec
WRITE: bw=265MiB/s (277MB/s), 265MiB/s-265MiB/s (277MB/s-277MB/s),
io=4096MiB (4295MB), run=15480-15480msec
WRITE: bw=268MiB/s (281MB/s), 268MiB/s-268MiB/s (281MB/s-281MB/s),
io=4096MiB (4295MB), run=15283-15283msec
WRITE: bw=267MiB/s (280MB/s), 267MiB/s-267MiB/s (280MB/s-280MB/s),
io=4096MiB (4295MB), run=15332-15332msec


b) Passthrough sequential writes
./libfuse/build/example/passthrough_hp --num-threads=4 \
    /root/tmp_mount /root/fuse_mount

--- Baseline (no timeouts) ---
Ran this on origin/for-next
Saw around 245 MiB/s
WRITE: bw=246MiB/s (258MB/s), 246MiB/s-246MiB/s (258MB/s-258MB/s),
io=4096MiB (4295MB), run=16676-16676msec
WRITE: bw=248MiB/s (260MB/s), 248MiB/s-248MiB/s (260MB/s-260MB/s),
io=4096MiB (4295MB), run=16508-16508msec
WRITE: bw=246MiB/s (258MB/s), 246MiB/s-246MiB/s (258MB/s-258MB/s),
io=4096MiB (4295MB), run=16636-16636msec
WRITE: bw=246MiB/s (258MB/s), 246MiB/s-246MiB/s (258MB/s-258MB/s),
io=4096MiB (4295MB), run=16654-16654msec
WRITE: bw=242MiB/s (253MB/s), 242MiB/s-242MiB/s (253MB/s-253MB/s),
io=4096MiB (4295MB), run=16957-16957msec
WRITE: bw=249MiB/s (261MB/s), 249MiB/s-249MiB/s (261MB/s-261MB/s),
io=4096MiB (4295MB), run=16449-16449msec
WRITE: bw=245MiB/s (257MB/s), 245MiB/s-245MiB/s (257MB/s-257MB/s),
io=4096MiB (4295MB), run=16699-16699msec
WRITE: bw=241MiB/s (253MB/s), 241MiB/s-241MiB/s (253MB/s-253MB/s),
io=4096MiB (4295MB), run=16981-16981msec
WRITE: bw=244MiB/s (256MB/s), 244MiB/s-244MiB/s (256MB/s-256MB/s),
io=4096MiB (4295MB), run=16792-16792msec
WRITE: bw=246MiB/s (258MB/s), 246MiB/s-246MiB/s (258MB/s-258MB/s),
io=4096MiB (4295MB), run=16665-16665msec

--- Request timeouts with periodic timer (approach from this patchset) ---
Saw around 237 MiB/s
WRITE: bw=237MiB/s (248MB/s), 237MiB/s-237MiB/s (248MB/s-248MB/s),
io=4096MiB (4295MB), run=17295-17295msec
WRITE: bw=236MiB/s (247MB/s), 236MiB/s-236MiB/s (247MB/s-247MB/s),
io=4096MiB (4295MB), run=17357-17357msec
WRITE: bw=240MiB/s (251MB/s), 240MiB/s-240MiB/s (251MB/s-251MB/s),
io=4096MiB (4295MB), run=17096-17096msec
WRITE: bw=238MiB/s (249MB/s), 238MiB/s-238MiB/s (249MB/s-249MB/s),
io=4096MiB (4295MB), run=17245-17245msec
WRITE: bw=236MiB/s (247MB/s), 236MiB/s-236MiB/s (247MB/s-247MB/s),
io=4096MiB (4295MB), run=17365-17365msec
WRITE: bw=235MiB/s (246MB/s), 235MiB/s-235MiB/s (246MB/s-246MB/s),
io=4096MiB (4295MB), run=17466-17466msec
WRITE: bw=235MiB/s (246MB/s), 235MiB/s-235MiB/s (246MB/s-246MB/s),
io=4096MiB (4295MB), run=17444-17444msec
WRITE: bw=241MiB/s (253MB/s), 241MiB/s-241MiB/s (253MB/s-253MB/s),
io=4096MiB (4295MB), run=17003-17003msec
WRITE: bw=236MiB/s (247MB/s), 236MiB/s-236MiB/s (247MB/s-247MB/s),
io=4096MiB (4295MB), run=17361-17361msec
WRITE: bw=244MiB/s (256MB/s), 244MiB/s-244MiB/s (256MB/s-256MB/s),
io=4096MiB (4295MB), run=16777-16777msec

--- Request timeouts with one timer per request (approach from v6 [1]) ---
Saw around 232 MiB/s
WRITE: bw=230MiB/s (241MB/s), 230MiB/s-230MiB/s (241MB/s-241MB/s),
io=4096MiB (4295MB), run=17816-17816msec
WRITE: bw=233MiB/s (244MB/s), 233MiB/s-233MiB/s (244MB/s-244MB/s),
io=4096MiB (4295MB), run=17613-17613msec
WRITE: bw=231MiB/s (242MB/s), 231MiB/s-231MiB/s (242MB/s-242MB/s),
io=4096MiB (4295MB), run=17716-17716msec
WRITE: bw=231MiB/s (242MB/s), 231MiB/s-231MiB/s (242MB/s-242MB/s),
io=4096MiB (4295MB), run=17728-17728msec
WRITE: bw=233MiB/s (244MB/s), 233MiB/s-233MiB/s (244MB/s-244MB/s),
io=4096MiB (4295MB), run=17578-17578msec
WRITE: bw=232MiB/s (243MB/s), 232MiB/s-232MiB/s (243MB/s-243MB/s),
io=4096MiB (4295MB), run=17676-17676msec
WRITE: bw=231MiB/s (242MB/s), 231MiB/s-231MiB/s (242MB/s-242MB/s),
io=4096MiB (4295MB), run=17761-17761msec
WRITE: bw=234MiB/s (245MB/s), 234MiB/s-234MiB/s (245MB/s-245MB/s),
io=4096MiB (4295MB), run=17529-17529msec
WRITE: bw=230MiB/s (241MB/s), 230MiB/s-230MiB/s (241MB/s-241MB/s),
io=4096MiB (4295MB), run=17823-17823msec
WRITE: bw=235MiB/s (247MB/s), 235MiB/s-235MiB/s (247MB/s-247MB/s),
io=4096MiB (4295MB), run=17393-17393msec


Overall:
- Request timeouts with a periodic timer perform better than the
  approach in v6 of attaching one timer to each request.
- I didn't see a significant difference in performance from enabling
  timeouts when running the non-passthrough fuse server (around 273
  MiB/s vs 271 MiB/s), but did see about a 3% drop on the passthrough
  server (around 245 MiB/s vs 237 MiB/s).
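
The averages and the relative drop can be recomputed from the per-run bandwidths listed above. The script below is just a convenience sketch: the dictionary layout and names are mine, and the values are the bw= figures in MiB/s copied from the runs in this mail.

```python
from statistics import mean

# Write bandwidth in MiB/s for each run, copied from the fio summaries above.
runs = {
    "a (non-passthrough)": {
        "baseline": [277, 271, 274, 277, 274, 272, 269, 272, 275, 270],
        "periodic": [265, 268, 271, 268, 274, 268, 276, 272, 273, 279, 272],
        "per_request": [262, 262, 256, 264, 260, 267, 266, 265, 268, 267],
    },
    "b (passthrough)": {
        "baseline": [246, 248, 246, 246, 242, 249, 245, 241, 244, 246],
        "periodic": [237, 236, 240, 238, 236, 235, 235, 241, 236, 244],
        "per_request": [230, 233, 231, 231, 233, 232, 231, 234, 230, 235],
    },
}


def drop_pct(baseline: list[int], other: list[int]) -> float:
    """Percentage by which the mean of `other` falls below the baseline mean."""
    base = mean(baseline)
    return (base - mean(other)) / base * 100


if __name__ == "__main__":
    for test, groups in runs.items():
        print(f"{test}: baseline mean {mean(groups['baseline']):.1f} MiB/s")
        for name in ("periodic", "per_request"):
            print(
                f"  {name}: mean {mean(groups[name]):.1f} MiB/s, "
                f"{drop_pct(groups['baseline'], groups[name]):.1f}% below baseline"
            )
```

With these numbers the periodic timer comes out well under 1% below baseline for test a) and about 3% below baseline for test b).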


Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/20240830162649.3849586-1-joannelkoong@gmail.com/