Message ID | 165852076926.11403.44005570813790008.stgit@manet.1015granger.net (mailing list archive) |
---|---|
Series | Put struct nfsd4_copy on a diet |
Chuck,

Are there pre-reqs for this series? I had tried to apply the patches
on top of 5.19-rc6 but I get the following compile error:

fs/nfsd/nfs4proc.c: In function ‘nfsd4_setup_inter_ssc’:
fs/nfsd/nfs4proc.c:1539:34: error: passing argument 1 of
‘nfsd4_interssc_connect’ from incompatible pointer type
[-Werror=incompatible-pointer-types]
  status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
                                  ^~~~~~~~~~~~~
fs/nfsd/nfs4proc.c:1414:43: note: expected ‘struct nl4_server *’ but
argument is of type ‘struct nl4_server **’
 nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
                        ~~~~~~~~~~~~~~~~~~~^~~
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:249: fs/nfsd/nfs4proc.o] Error 1
make[1]: *** [scripts/Makefile.build:466: fs/nfsd] Error 2
make: *** [Makefile:1843: fs] Error 2

On Fri, Jul 22, 2022 at 4:36 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> While testing NFSD for-next, I noticed svc_generic_init_request()
> was an unexpected hot spot on NFSv4 workloads. Drilling into the
> perf report, it shows that the hot path in there is:
>
> 1208         memset(rqstp->rq_argp, 0, procp->pc_argsize);
> 1209         memset(rqstp->rq_resp, 0, procp->pc_ressize);
>
> For an NFSv4 COMPOUND,
>
>         procp->pc_argsize = sizeof(nfsd4_compoundargs),
>
> struct nfsd4_compoundargs on my system is more than 17KB! This is
> due to the size of the iops field:
>
>         struct nfsd4_op                 iops[8];
>
> Each struct nfsd4_op contains a union of the arguments for each
> NFSv4 operation. Each argument is typically less than 128 bytes
> except that struct nfsd4_copy and struct nfsd4_copy_notify are both
> larger than 2KB each.
>
> I'm not yet totally convinced this series never orphans memory, but
> it does reduce the size of nfsd4_compoundargs to just over 4KB. This
> is still due to struct nfsd4_copy being almost 500 bytes. I don't
> see more low-hanging fruit there, though.
>
> ---
>
> Chuck Lever (11):
>       NFSD: Shrink size of struct nfsd4_copy_notify
>       NFSD: Shrink size of struct nfsd4_copy
>       NFSD: Reorder the fields in struct nfsd4_op
>       NFSD: Make nfs4_put_copy() static
>       NFSD: Make boolean fields in struct nfsd4_copy into atomic bit flags
>       NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
>       NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
>       NFSD: Refactor nfsd4_do_copy()
>       NFSD: Remove kmalloc from nfsd4_do_async_copy()
>       NFSD: Add nfsd4_send_cb_offload()
>       NFSD: Move copy offload callback arguments into a separate structure
>
>
>  fs/nfsd/nfs4callback.c |  37 +++++----
>  fs/nfsd/nfs4proc.c     | 165 +++++++++++++++++++++--------------------
>  fs/nfsd/nfs4xdr.c      |  30 +++++---
>  fs/nfsd/state.h        |   1 -
>  fs/nfsd/xdr4.h         |  54 ++++++++++----
>  5 files changed, 163 insertions(+), 124 deletions(-)
>
> --
> Chuck Lever
>
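The size argument in the cover letter can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel's actual layout: every struct name and size here is a hypothetical stand-in. It shows that a union is as large as its largest member, so one 2KB argument inflates every element of an 8-entry array, and that replacing the embedded member with a pointer shrinks the region a per-request memset() has to clear.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the sizes are illustrative, not the kernel's. */
struct small_args { char data[96]; };
struct big_args   { char addr_list[2048]; unsigned long flags; };

/* Before: the large argument is embedded directly in the union. */
struct op_embedded {
	int opnum;
	union {
		struct small_args small;
		struct big_args   big;   /* dictates the size of the union */
	} u;
};

/* After: the union carries only a pointer to the large argument. */
struct slim_args { struct big_args *big; unsigned long flags; };

struct op_slim {
	int opnum;
	union {
		struct small_args small;
		struct slim_args  slim;
	} u;
};

#define NR_INLINE_OPS 8   /* mirrors the iops[8] array in the cover letter */

struct args_embedded { struct op_embedded iops[NR_INLINE_OPS]; };
struct args_slim     { struct op_slim     iops[NR_INLINE_OPS]; };

/* Number of bytes a per-request memset() must clear for each layout. */
size_t bytes_cleared_embedded(void) { return sizeof(struct args_embedded); }
size_t bytes_cleared_slim(void)     { return sizeof(struct args_slim); }
```

The trade-off, under these assumptions, is that the pointed-to object now has to be allocated and released separately, which is presumably why the cover letter raises the concern about orphaned memory.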
Hi Chuck,

To make it compile I did:
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 7196bcafdd86..f6deffc921d0 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1536,7 +1536,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
        if (status)
                goto out;

-       status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
+       status = nfsd4_interssc_connect(copy->cp_src, rqstp, mount);
        if (status)
                goto out;

But when I tried to run nfstest_ssc, the first test (intra01) made
the server oops:

[ 9569.551100] CPU: 0 PID: 2861 Comm: nfsd Not tainted 5.19.0-rc6+ #73
[ 9569.552385] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[ 9569.555043] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd]
[ 9569.556662] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d 79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20 01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00 48 29
[ 9569.561792] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282
[ 9569.563112] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000
[ 9569.565196] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008
[ 9569.567140] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228
[ 9569.568929] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00
[ 9569.570477] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 R15: ffff99b546edc000
[ 9569.572052] FS: 0000000000000000(0000) GS:ffff99b5bbe00000(0000) knlGS:0000000000000000
[ 9569.573926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9569.575281] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0
[ 9569.577586] Call Trace:
[ 9569.578220] <TASK>
[ 9569.578770] ? nfsd4_proc_compound+0x3d2/0x730 [nfsd]
[ 9569.579945] nfsd4_proc_compound+0x3d2/0x730 [nfsd]
[ 9569.581055] nfsd_dispatch+0x146/0x270 [nfsd]
[ 9569.581987] svc_process_common+0x365/0x5c0 [sunrpc]
[ 9569.583122] ? nfsd_svc+0x350/0x350 [nfsd]
[ 9569.583986] ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[ 9569.585129] svc_process+0xb7/0xf0 [sunrpc]
[ 9569.586169] nfsd+0xd5/0x190 [nfsd]
[ 9569.587170] kthread+0xe8/0x110
[ 9569.587898] ? kthread_complete_and_exit+0x20/0x20
[ 9569.588934] ret_from_fork+0x22/0x30
[ 9569.589759] </TASK>
[ 9569.590224] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul vmw_balloon ghash_clmulni_intel joydev pcspkr btusb btrtl btbcm btintel snd_ens1371 uvcvideo snd_ac97_codec videobuf2_vmalloc ac97_bus videobuf2_memops videobuf2_v4l2 videobuf2_common snd_seq snd_pcm videodev bluetooth mc rfkill ecdh_generic ecc snd_timer snd_rawmidi snd_seq_device snd vmw_vmci soundcore i2c_piix4 auth_rpcgss sunrpc ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic crc32c_intel ata_piix nvme ahci libahci nvme_core t10_pi crc64_rocksoft serio_raw crc64 vmwgfx drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops vmxnet3 drm libata
[ 9569.610612] CR2: 0000000000000000
[ 9569.611375] ---[ end trace 0000000000000000 ]---
[ 9569.612424] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd]
[ 9569.613472] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d 79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20 01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00 48 29
[ 9569.617410] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282
[ 9569.618487] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000
[ 9569.620097] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008
[ 9569.621710] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228
[ 9569.623398] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00
[ 9569.625019] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 R15: ffff99b546edc000
[ 9569.627456] FS: 0000000000000000(0000) GS:ffff99b5bbe00000(0000) knlGS:0000000000000000
[ 9569.629249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9569.630433] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0
[ 9569.632043] Kernel panic - not syncing: Fatal exception
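The compile error reported above is the usual fallout when a struct field changes from an embedded object to a pointer: a caller that still takes the field's address now passes a pointer to a pointer. A minimal userspace sketch of that situation follows; `struct server`, `copy_v0`, `copy_v1`, and `connect_to()` are hypothetical names for illustration, not the nfsd definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a server-address structure. */
struct server { int id; };

struct copy_v0 { struct server  src; };  /* field embedded in the struct */
struct copy_v1 { struct server *src; };  /* field converted to a pointer */

/* Callee expects a plain pointer, in both versions. */
int connect_to(const struct server *s)
{
	return s ? s->id : -1;
}

/* With the embedded field, the address-of operator is required. */
int use_v0(struct copy_v0 *c)
{
	return connect_to(&c->src);
}

/*
 * With the pointer field, the '&' has to go: '&c->src' would have type
 * 'struct server **', which is what -Wincompatible-pointer-types rejects.
 */
int use_v1(struct copy_v1 *c)
{
	return connect_to(c->src);
}
```

Dropping the `&` makes the call type-correct, but it also means the callee now sees whatever the pointer holds, including NULL if nothing allocated it.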
Hi Olga,

I got the same problem. Can you try this patch:

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 21830cc1ed0a..18dd708ff846 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1921,14 +1921,15 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
 	if (xdr_stream_decode_u32(argp->xdr, &count) < 0)
 		return nfserr_bad_xdr;
-	if (count == 0) { /* intra-server copy */
-		__set_bit(NFSD4_COPY_F_INTRA, &copy->cp_flags);
-		return nfs_ok;
-	}
 	copy->cp_src = svcxdr_tmpalloc(argp, sizeof(*copy->cp_src));
 	if (copy->cp_src == NULL)
-		return nfserrno(-ENOMEM);	/* XXX: jukebox? */
+		return nfserrno(-ENOMEM);
+	if (count == 0) { /* intra-server copy */
+		__set_bit(NFSD4_COPY_F_INTRA, &copy->cp_flags);
+		return nfs_ok;
+	} else
+		__clear_bit(NFSD4_COPY_F_INTRA, &copy->cp_flags);
 	/* decode all the supplied server addresses but use only the first */
 	status = nfsd4_decode_nl4_server(argp, copy->cp_src);

-Dai
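The patch above moves the cp_src allocation ahead of the early return for intra-server copies. The register dump in the earlier oops (RSI and CR2 both zero) looks consistent with a read through a NULL pointer, although the thread does not spell out which access faulted. The userspace sketch below only illustrates the ordering idea, under the assumption that some later consumer copies through the pointer regardless of copy type; all names (`struct copy`, `decode_copy()`, `dup_copy()`) are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the decoded COPY arguments. */
struct server { char addr[64]; };
struct copy   { bool intra; struct server *src; };

/*
 * Ordering as in the proposed fix: allocate first, then branch on the
 * number of source servers, so 'src' is valid for both copy types.
 */
int decode_copy(struct copy *c, unsigned int count)
{
	c->src = calloc(1, sizeof(*c->src));
	if (c->src == NULL)
		return -1;
	c->intra = (count == 0);   /* zero source servers: intra-server copy */
	return 0;
}

/*
 * Assumed later consumer: duplicates the arguments, copying through
 * 'src' unconditionally. This would fault if src->src were NULL.
 */
int dup_copy(struct copy *dst, const struct copy *src)
{
	dst->src = malloc(sizeof(*dst->src));
	if (dst->src == NULL)
		return -1;
	*dst->src = *src->src;
	dst->intra = src->intra;
	return 0;
}
```

An alternative design would be to leave the pointer NULL for intra-server copies and guard every dereference; allocating unconditionally is simpler but spends a small allocation on requests that never use it.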
> On Jul 27, 2022, at 12:18 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>
> Hi Chuck,

Sorry for the delay, I was traveling.

> To make it compile I did:
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 7196bcafdd86..f6deffc921d0 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1536,7 +1536,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
>        if (status)
>                goto out;
>
> -       status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
> +       status = nfsd4_interssc_connect(copy->cp_src, rqstp, mount);
>        if (status)
>                goto out;

Yes, same bug was reported by the day-0 kbot. v1 was kind of an RFC,
as I hadn't fully tested it. Sorry for mislabeling it.

I will post a v2 of this series with this fixed and with Dai's
fix for nfsd4_decode_copy(). Stand by.

--
Chuck Lever
After applying Dai's patch I got further, but then hit the next panic
(below). Before that, "inter01" failed with ECOMM. In the trace, after
the COPY is placed the server returns ESTALE in CB_OFFLOAD, then the
CLOSE fails with BAD_SESSION (basically something went really wrong on
the server). A few more tests then failed in a similar fashion, and the
oops happens on cleanup.

[ 842.455939] list_del corruption. prev->next should be ffff9aaa8b5f0c78, but was ffff9aaab2713508. (prev=ffff9aaab2713510)
[ 842.460118] ------------[ cut here ]------------
[ 842.461599] kernel BUG at lib/list_debug.c:53!
[ 842.462962] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 842.464587] CPU: 1 PID: 500 Comm: kworker/u256:28 Not tainted 5.18.0 #70
[ 842.466656] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[ 842.470309] Workqueue: nfsd4 laundromat_main [nfsd]
[ 842.471898] RIP: 0010:__list_del_entry_valid.cold.3+0x37/0x4a
[ 842.473792] Code: e8 02 d8 fe ff 0f 0b 48 c7 c7 c0 bb b6 b0 e8 f4 d7 fe ff 0f 0b 48 89 d1 48 89 f2 48 89 fe 48 c7 c7 70 bb b6 b0 e8 dd d7 fe ff <0f> 0b 48 89 fe 48 c7 c7 38 bb b6 b0 e8 cc d7 fe ff 0f 0b 48 89 ee
[ 842.479607] RSP: 0018:ffffa996c0ca7de8 EFLAGS: 00010246
[ 842.481828] RAX: 000000000000006d RBX: ffff9aaa8b5f0c60 RCX: 0000000000000002
[ 842.484769] RDX: 0000000000000000 RSI: ffffffffb0b64d55 RDI: 00000000ffffffff
[ 842.487252] RBP: ffff9aaab9b62000 R08: 0000000000000000 R09: c0000000ffff7fff
[ 842.489939] R10: 0000000000000001 R11: ffffa996c0ca7c00 R12: ffffa996c0ca7e50
[ 842.492215] R13: ffff9aaab9b621b0 R14: fffffffffffffd12 R15: ffff9aaab9b62198
[ 842.494406] FS: 0000000000000000(0000) GS:ffff9aaafbe40000(0000) knlGS:0000000000000000
[ 842.496939] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 842.498759] CR2: 000055a8b4e96010 CR3: 0000000003a18001 CR4: 00000000001706e0
[ 842.500957] Call Trace:
[ 842.501740] <TASK>
[ 842.502479] _free_cpntf_state_locked+0x36/0x90 [nfsd]
[ 842.504157] laundromat_main+0x59e/0x8b0 [nfsd]
[ 842.505594] ? finish_task_switch+0xbd/0x2a0
[ 842.507247] process_one_work+0x1c8/0x390
[ 842.508538] worker_thread+0x30/0x360
[ 842.509670] ? process_one_work+0x390/0x390
[ 842.510957] kthread+0xe8/0x110
[ 842.511938] ? kthread_complete_and_exit+0x20/0x20
[ 842.513422] ret_from_fork+0x22/0x30
[ 842.514533] </TASK>
[ 842.515219] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep vmw_vsock_vmci_transport vsock intel_rapl_msr snd_seq_midi snd_seq_midi_event intel_rapl_common crct10dif_pclmul crc32_pclmul vmw_balloon ghash_clmulni_intel pcspkr joydev btusb uvcvideo btrtl btbcm btintel videobuf2_vmalloc videobuf2_memops snd_ens1371 videobuf2_v4l2 snd_ac97_codec ac97_bus videobuf2_common snd_seq videodev snd_pcm bluetooth rfkill mc snd_timer snd_rawmidi ecdh_generic snd_seq_device ecc snd soundcore vmw_vmci i2c_piix4 auth_rpcgss sunrpc ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic nvme nvme_core t10_pi crc32c_intel crc64_rocksoft serio_raw crc64 vmwgfx vmxnet3 drm_ttm_helper ata_piix ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci libahci drm libata
[ 842.541753] ---[ end trace 0000000000000000 ]---
[ 842.543403] RIP: 0010:__list_del_entry_valid.cold.3+0x37/0x4a
[ 842.545170] Code: e8 02 d8 fe ff 0f 0b 48 c7 c7 c0 bb b6 b0 e8 f4 d7 fe ff 0f 0b 48 89 d1 48 89 f2 48 89 fe 48 c7 c7 70 bb b6 b0 e8 dd d7 fe ff <0f> 0b 48 89 fe 48 c7 c7 38 bb b6 b0 e8 cc d7 fe ff 0f 0b 48 89 ee
[ 842.551346] RSP: 0018:ffffa996c0ca7de8 EFLAGS: 00010246
[ 842.552999] RAX: 000000000000006d RBX: ffff9aaa8b5f0c60 RCX: 0000000000000002
[ 842.555151] RDX: 0000000000000000 RSI: ffffffffb0b64d55 RDI: 00000000ffffffff
[ 842.557503] RBP: ffff9aaab9b62000 R08: 0000000000000000 R09: c0000000ffff7fff
[ 842.559694] R10: 0000000000000001 R11: ffffa996c0ca7c00 R12: ffffa996c0ca7e50
[ 842.561956] R13: ffff9aaab9b621b0 R14: fffffffffffffd12 R15: ffff9aaab9b62198
[ 842.564300] FS: 0000000000000000(0000) GS:ffff9aaafbe40000(0000) knlGS:0000000000000000
[ 842.567357] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 842.569273] CR2: 000055a8b4e96010 CR3: 0000000003a18001 CR4: 00000000001706e0
[ 842.571598] Kernel panic - not syncing: Fatal exception
[ 842.573674] Kernel Offset: 0x2e800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1101.134589] ---[ end Kernel panic - not syncing: Fatal exception ]---

On Wed, Jul 27, 2022 at 1:15 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Jul 27, 2022, at 12:18 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > Hi Chuck,
>
> Sorry for the delay, I was traveling.
>
> > To make it compile I did:
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index 7196bcafdd86..f6deffc921d0 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -1536,7 +1536,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
> >        if (status)
> >                goto out;
> >
> > -       status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
> > +       status = nfsd4_interssc_connect(copy->cp_src, rqstp, mount);
> >        if (status)
> >                goto out;
>
> Yes, same bug was reported by the day-0 kbot. v1 was kind of an RFC,
> as I hadn't fully tested it. Sorry for mislabeling it.
>
> I will post a v2 of this series with this fixed and with Dai's
> fix for nfsd4_decode_copy(). Stand by.
>
>
> > But when I tried to run nfstest_ssc, the first test (intra01) made
> > the server oops:
> >
> > [ 9569.551100] CPU: 0 PID: 2861 Comm: nfsd Not tainted 5.19.0-rc6+ #73
> > [ 9569.552385] Hardware name: VMware, Inc.
VMware Virtual > > Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 > > [ 9569.555043] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd] > > [ 9569.556662] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d > > 79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20 > > 01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00 > > 48 29 > > [ 9569.561792] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282 > > [ 9569.563112] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000 > > [ 9569.565196] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008 > > [ 9569.567140] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228 > > [ 9569.568929] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00 > > [ 9569.570477] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 R15: ffff99b546edc000 > > [ 9569.572052] FS: 0000000000000000(0000) GS:ffff99b5bbe00000(0000) > > knlGS:0000000000000000 > > [ 9569.573926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 9569.575281] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0 > > [ 9569.577586] Call Trace: > > [ 9569.578220] <TASK> > > [ 9569.578770] ? nfsd4_proc_compound+0x3d2/0x730 [nfsd] > > [ 9569.579945] nfsd4_proc_compound+0x3d2/0x730 [nfsd] > > [ 9569.581055] nfsd_dispatch+0x146/0x270 [nfsd] > > [ 9569.581987] svc_process_common+0x365/0x5c0 [sunrpc] > > [ 9569.583122] ? nfsd_svc+0x350/0x350 [nfsd] > > [ 9569.583986] ? nfsd_shutdown_threads+0x90/0x90 [nfsd] > > [ 9569.585129] svc_process+0xb7/0xf0 [sunrpc] > > [ 9569.586169] nfsd+0xd5/0x190 [nfsd] > > [ 9569.587170] kthread+0xe8/0x110 > > [ 9569.587898] ? 
kthread_complete_and_exit+0x20/0x20 > > [ 9569.588934] ret_from_fork+0x22/0x30 > > [ 9569.589759] </TASK> > > [ 9569.590224] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm > > iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse > > xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT > > nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep > > vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event > > intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul > > vmw_balloon ghash_clmulni_intel joydev pcspkr btusb btrtl btbcm > > btintel snd_ens1371 uvcvideo snd_ac97_codec videobuf2_vmalloc ac97_bus > > videobuf2_memops videobuf2_v4l2 videobuf2_common snd_seq snd_pcm > > videodev bluetooth mc rfkill ecdh_generic ecc snd_timer snd_rawmidi > > snd_seq_device snd vmw_vmci soundcore i2c_piix4 auth_rpcgss sunrpc > > ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic crc32c_intel > > ata_piix nvme ahci libahci nvme_core t10_pi crc64_rocksoft serio_raw > > crc64 vmwgfx drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect > > sysimgblt fb_sys_fops vmxnet3 drm libata > > [ 9569.610612] CR2: 0000000000000000 > > [ 9569.611375] ---[ end trace 0000000000000000 ]--- > > [ 9569.612424] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd] > > [ 9569.613472] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d > > 79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20 > > 01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00 > > 48 29 > > [ 9569.617410] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282 > > [ 9569.618487] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000 > > [ 9569.620097] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008 > > [ 9569.621710] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228 > > [ 9569.623398] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00 > > [ 9569.625019] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 
R15: ffff99b546edc000 > > [ 9569.627456] FS: 0000000000000000(0000) GS:ffff99b5bbe00000(0000) > > knlGS:0000000000000000 > > [ 9569.629249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 9569.630433] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0 > > [ 9569.632043] Kernel panic - not syncing: Fatal exception > > > > > > > > On Tue, Jul 26, 2022 at 3:45 PM Olga Kornievskaia <aglo@umich.edu> wrote: > >> > >> Chuck, > >> > >> Are there pre-reqs for this series? I had tried to apply the patches > >> on top of 5-19-rc6 but I get the following compile error: > >> > >> fs/nfsd/nfs4proc.c: In function ‘nfsd4_setup_inter_ssc’: > >> fs/nfsd/nfs4proc.c:1539:34: error: passing argument 1 of > >> ‘nfsd4_interssc_connect’ from incompatible pointer type > >> [-Werror=incompatible-pointer-types] > >> status = nfsd4_interssc_connect(©->cp_src, rqstp, mount); > >> ^~~~~~~~~~~~~ > >> fs/nfsd/nfs4proc.c:1414:43: note: expected ‘struct nl4_server *’ but > >> argument is of type ‘struct nl4_server **’ > >> nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, > >> ~~~~~~~~~~~~~~~~~~~^~~ > >> cc1: some warnings being treated as errors > >> make[2]: *** [scripts/Makefile.build:249: fs/nfsd/nfs4proc.o] Error 1 > >> make[1]: *** [scripts/Makefile.build:466: fs/nfsd] Error 2 > >> make: *** [Makefile:1843: fs] Error 2 > >> > >> On Fri, Jul 22, 2022 at 4:36 PM Chuck Lever <chuck.lever@oracle.com> wrote: > >>> > >>> While testing NFSD for-next, I noticed svc_generic_init_request() > >>> was an unexpected hot spot on NFSv4 workloads. Drilling into the > >>> perf report, it shows that the hot path in there is: > >>> > >>> 1208 memset(rqstp->rq_argp, 0, procp->pc_argsize); > >>> 1209 memset(rqstp->rq_resp, 0, procp->pc_ressize); > >>> > >>> For an NFSv4 COMPOUND, > >>> > >>> procp->pc_argsize = sizeof(nfsd4_compoundargs), > >>> > >>> struct nfsd4_compoundargs on my system is more than 17KB! 
This is > >>> due to the size of the iops field: > >>> > >>> struct nfsd4_op iops[8]; > >>> > >>> Each struct nfsd4_op contains a union of the arguments for each > >>> NFSv4 operation. Each argument is typically less than 128 bytes > >>> except that struct nfsd4_copy and struct nfsd4_copy_notify are both > >>> larger than 2KB each. > >>> > >>> I'm not yet totally convinced this series never orphans memory, but > >>> it does reduce the size of nfsd4_compoundargs to just over 4KB. This > >>> is still due to struct nfsd4_copy being almost 500 bytes. I don't > >>> see more low-hanging fruit there, though. > >>> > >>> --- > >>> > >>> Chuck Lever (11): > >>> NFSD: Shrink size of struct nfsd4_copy_notify > >>> NFSD: Shrink size of struct nfsd4_copy > >>> NFSD: Reorder the fields in struct nfsd4_op > >>> NFSD: Make nfs4_put_copy() static > >>> NFSD: Make boolean fields in struct nfsd4_copy into atomic bit flags > >>> NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2) > >>> NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2) > >>> NFSD: Refactor nfsd4_do_copy() > >>> NFSD: Remove kmalloc from nfsd4_do_async_copy() > >>> NFSD: Add nfsd4_send_cb_offload() > >>> NFSD: Move copy offload callback arguments into a separate structure > >>> > >>> > >>> fs/nfsd/nfs4callback.c | 37 +++++---- > >>> fs/nfsd/nfs4proc.c | 165 +++++++++++++++++++++-------------------- > >>> fs/nfsd/nfs4xdr.c | 30 +++++--- > >>> fs/nfsd/state.h | 1 - > >>> fs/nfsd/xdr4.h | 54 ++++++++++---- > >>> 5 files changed, 163 insertions(+), 124 deletions(-) > >>> > >>> -- > >>> Chuck Lever > >>> > > -- > Chuck Lever > > >
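The size argument in the cover letter quoted above can be illustrated with a small stand-alone sketch. It assumes nothing about the real NFSD structures: the struct names and the 96-byte and 2048-byte sizes below are hypothetical stand-ins, chosen only to show that a union is as large as its largest member, so an array of eight slots pays for the big member eight times (8 x 2KB is already 16KB), and that moving the big member behind a pointer removes that cost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; sizes are illustrative, not the kernel's. */
struct small_args { char data[96]; };        /* a "typical" operation argument */
struct big_copy_args { char data[2048]; };   /* an oversized argument */

/* The union embeds the big argument, so every slot is at least 2048 bytes. */
struct op_embedded {
	int opnum;
	union {
		struct small_args small;
		struct big_copy_args copy;
	} u;
};

/* The big argument sits behind a pointer and would be allocated on demand. */
struct op_pointer {
	int opnum;
	union {
		struct small_args small;
		struct big_copy_args *copy;
	} u;
};

/* Eight slots each, mirroring the "iops[8]" array mentioned in the cover letter. */
struct args_embedded { struct op_embedded iops[8]; };
struct args_pointer  { struct op_pointer  iops[8]; };

/* Compile-time check that the pointer variant is the smaller one. */
static_assert(sizeof(struct args_pointer) < sizeof(struct args_embedded),
	      "pointer variant should be smaller");

/* Helpers so a caller can print or compare the two sizes at run time. */
size_t args_embedded_size(void) { return sizeof(struct args_embedded); }
size_t args_pointer_size(void)  { return sizeof(struct args_pointer); }
```

The trade-off in this sketch is the usual one: the pointer variant is cheaper to zero on every request, but the pointed-to storage then has to be allocated and freed explicitly, which is presumably where the "orphans memory" concern in the cover letter comes from.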
> On Jul 27, 2022, at 1:52 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>
> After applying Dai's patch I got further... I hit the next panic
> (below)... Before that, "inter01" failed with ECOMM. On the trace,
> after the COPY is placed, the server returns ESTALE in CB_OFFLOAD,
> then CLOSE fails with BAD_SESSION (basically, something went really
> wrong on the server)... After a few more tests failed in a similar
> fashion, the oops happens on cleanup.

What test should I run to reproduce this?

> [ 842.455939] list_del corruption. prev->next should be
> ffff9aaa8b5f0c78, but was ffff9aaab2713508. (prev=ffff9aaab2713510)
> [ 842.460118] ------------[ cut here ]------------
> [ 842.461599] kernel BUG at lib/list_debug.c:53!
> [ 842.462962] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [ 842.464587] CPU: 1 PID: 500 Comm: kworker/u256:28 Not tainted 5.18.0 #70
> [ 842.466656] Hardware name: VMware, Inc. VMware Virtual
> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> [ 842.470309] Workqueue: nfsd4 laundromat_main [nfsd]
> [ 842.471898] RIP: 0010:__list_del_entry_valid.cold.3+0x37/0x4a
> [ 842.473792] Code: e8 02 d8 fe ff 0f 0b 48 c7 c7 c0 bb b6 b0 e8 f4
> d7 fe ff 0f 0b 48 89 d1 48 89 f2 48 89 fe 48 c7 c7 70 bb b6 b0 e8 dd
> d7 fe ff <0f> 0b 48 89 fe 48 c7 c7 38 bb b6 b0 e8 cc d7 fe ff 0f 0b 48
> 89 ee
> [ 842.479607] RSP: 0018:ffffa996c0ca7de8 EFLAGS: 00010246
> [ 842.481828] RAX: 000000000000006d RBX: ffff9aaa8b5f0c60 RCX: 0000000000000002
> [ 842.484769] RDX: 0000000000000000 RSI: ffffffffb0b64d55 RDI: 00000000ffffffff
> [ 842.487252] RBP: ffff9aaab9b62000 R08: 0000000000000000 R09: c0000000ffff7fff
> [ 842.489939] R10: 0000000000000001 R11: ffffa996c0ca7c00 R12: ffffa996c0ca7e50
> [ 842.492215] R13: ffff9aaab9b621b0 R14: fffffffffffffd12 R15: ffff9aaab9b62198
> [ 842.494406] FS: 0000000000000000(0000) GS:ffff9aaafbe40000(0000)
> knlGS:0000000000000000
> [ 842.496939] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 842.498759] CR2: 000055a8b4e96010 CR3: 0000000003a18001 CR4: 00000000001706e0
> [ 842.500957] Call Trace:
> [ 842.501740] <TASK>
> [ 842.502479] _free_cpntf_state_locked+0x36/0x90 [nfsd]
> [ 842.504157] laundromat_main+0x59e/0x8b0 [nfsd]
> [ 842.505594] ? finish_task_switch+0xbd/0x2a0
> [ 842.507247] process_one_work+0x1c8/0x390
> [ 842.508538] worker_thread+0x30/0x360
> [ 842.509670] ? process_one_work+0x390/0x390
> [ 842.510957] kthread+0xe8/0x110
> [ 842.511938] ? kthread_complete_and_exit+0x20/0x20
> [ 842.513422] ret_from_fork+0x22/0x30
> [ 842.514533] </TASK>
> [ 842.515219] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
> iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
> nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
> vmw_vsock_vmci_transport vsock intel_rapl_msr snd_seq_midi
> snd_seq_midi_event intel_rapl_common crct10dif_pclmul crc32_pclmul
> vmw_balloon ghash_clmulni_intel pcspkr joydev btusb uvcvideo btrtl
> btbcm btintel videobuf2_vmalloc videobuf2_memops snd_ens1371
> videobuf2_v4l2 snd_ac97_codec ac97_bus videobuf2_common snd_seq
> videodev snd_pcm bluetooth rfkill mc snd_timer snd_rawmidi
> ecdh_generic snd_seq_device ecc snd soundcore vmw_vmci i2c_piix4
> auth_rpcgss sunrpc ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic
> nvme nvme_core t10_pi crc32c_intel crc64_rocksoft serio_raw crc64
> vmwgfx vmxnet3 drm_ttm_helper ata_piix ttm drm_kms_helper syscopyarea
> sysfillrect sysimgblt fb_sys_fops ahci libahci drm libata
> [ 842.541753] ---[ end trace 0000000000000000 ]---
> [ 842.543403] RIP: 0010:__list_del_entry_valid.cold.3+0x37/0x4a
> [ 842.545170] Code: e8 02 d8 fe ff 0f 0b 48 c7 c7 c0 bb b6 b0 e8 f4
> d7 fe ff 0f 0b 48 89 d1 48 89 f2 48 89 fe 48 c7 c7 70 bb b6 b0 e8 dd
> d7 fe ff <0f> 0b 48 89 fe 48 c7 c7 38 bb b6 b0 e8 cc d7 fe ff 0f 0b 48
> 89 ee
> [ 842.551346] RSP: 0018:ffffa996c0ca7de8 EFLAGS: 00010246
> [ 842.552999] RAX: 000000000000006d RBX: ffff9aaa8b5f0c60 RCX: 0000000000000002
> [ 842.555151] RDX: 0000000000000000 RSI: ffffffffb0b64d55 RDI: 00000000ffffffff
> [ 842.557503] RBP: ffff9aaab9b62000 R08: 0000000000000000 R09: c0000000ffff7fff
> [ 842.559694] R10: 0000000000000001 R11: ffffa996c0ca7c00 R12: ffffa996c0ca7e50
> [ 842.561956] R13: ffff9aaab9b621b0 R14: fffffffffffffd12 R15: ffff9aaab9b62198
> [ 842.564300] FS: 0000000000000000(0000) GS:ffff9aaafbe40000(0000)
> knlGS:0000000000000000
> [ 842.567357] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 842.569273] CR2: 000055a8b4e96010 CR3: 0000000003a18001 CR4: 00000000001706e0
> [ 842.571598] Kernel panic - not syncing: Fatal exception
> [ 842.573674] Kernel Offset: 0x2e800000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 1101.134589] ---[ end Kernel panic - not syncing: Fatal exception ]---

--
Chuck Lever
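The "list_del corruption. prev->next should be ..., but was ..." line in the trace quoted above is printed by the kernel's list debugging code when the node before the entry being removed no longer points back at that entry. The following is a minimal user-space sketch of that kind of consistency check, not the kernel implementation: struct list_node and every function name here are made up for illustration, and the real code additionally checks for poisoned pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical doubly linked, circular list node (not the kernel's list_head). */
struct list_node {
	struct list_node *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
void list_add(struct list_node *entry, struct list_node *head)
{
	assert(entry != NULL && head != NULL);
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* True only if both neighbours still point back at entry. */
bool list_del_entry_valid(const struct list_node *entry)
{
	return entry->prev->next == entry && entry->next->prev == entry;
}

/*
 * Unlink entry if the list around it is consistent. Returns false instead
 * of unlinking when the check fails; this is the situation in which the
 * kernel prints "list_del corruption" and hits BUG().
 * The entry's own pointers are left stale on purpose, so that a second
 * removal of the same entry is detected by the check.
 */
bool list_del_checked(struct list_node *entry)
{
	if (!list_del_entry_valid(entry))
		return false;
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	return true;
}
```

In this sketch, removing the same entry twice, or removing an entry whose neighbours were rewired by another path, makes list_del_checked() return false. That is only the general shape of the failure; which code path corrupted the list in the reported panic cannot be read off from the sketch.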
On Wed, Jul 27, 2022 at 2:04 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Jul 27, 2022, at 1:52 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > After applying Dai's patch I got further... I hit the next panic
> > (below)... Before that, "inter01" failed with ECOMM. On the trace,
> > after the COPY is placed, the server returns ESTALE in CB_OFFLOAD,
> > then CLOSE fails with BAD_SESSION (basically, something went really
> > wrong on the server)... After a few more tests failed in a similar
> > fashion, the oops happens on cleanup.
>
> What test should I run to reproduce this?

I'm running "./nfstest_ssc". It ran through all the tests, with
"inter15" being the last, then started "cleanup", and that's what
panicked the server.

It's been a while since I tested SSC, so I'll undo all the patches
and re-run the tests to make sure that the code worked before them.
On Wed, Jul 27, 2022 at 2:21 PM Olga Kornievskaia <aglo@umich.edu> wrote:
>
> On Wed, Jul 27, 2022 at 2:04 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >
> >
> >
> > > On Jul 27, 2022, at 1:52 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> > >
> > > After applying Dai's patch I got further... I hit the next panic
> > > (below)... before that it ran into a failure for "inter01" failed with
> > > ECOMM. On the trace, after the COPY is placed the server returns
> > > ESTALE in CB_OFFLOAD, then close is failed with BAD_SESSION (just
> > > basically something really wrong happened on the server)... After
> > > failing a few more tests in the similar fashion.. On cleanup the oops
> > > happens.
> >
> > What test should I run to reproduce this?
>
> I'm running "./nfstest_ssc". It ran thru all with "inter15" being
> last, then started "cleanup" and that's what panic-ed the server.
>
> It's been a while since I tested ssc... so i'll undo all the patches
> and re-run the tests to make sure that before code worked.

It looks like the code got broken before this patch set. The ESTALE in
CB_OFFLOAD leading to the ECOMM error happens without your patches. And
then the kernel panic. I'll do my best to git bisect where the problem
occurred first.
On 7/27/22 11:48 AM, Olga Kornievskaia wrote:
> On Wed, Jul 27, 2022 at 2:21 PM Olga Kornievskaia <aglo@umich.edu> wrote:
>> On Wed, Jul 27, 2022 at 2:04 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>
>>>
>>>> On Jul 27, 2022, at 1:52 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>
>>>> After applying Dai's patch I got further... I hit the next panic
>>>> (below)... before that it ran into a failure for "inter01" failed with
>>>> ECOMM. On the trace, after the COPY is placed the server returns
>>>> ESTALE in CB_OFFLOAD, then close is failed with BAD_SESSION (just
>>>> basically something really wrong happened on the server)... After
>>>> failing a few more tests in the similar fashion.. On cleanup the oops
>>>> happens.
>>> What test should I run to reproduce this?
>> I'm running "./nfstest_ssc". It ran thru all with "inter15" being
>> last, then started "cleanup" and that's what panic-ed the server.
>>
>> It's been a while since I tested ssc... so i'll undo all the patches
>> and re-run the tests to make sure that before code worked.
> It looks like the code got broken before this patch set. The ESTALE in
> CB_OFFLOAD leading to the ECOMM error happens without your patches. And
> then the kernel panic. I'll do my best to git bisect where the problem
> occurred first.

I think this is what led to the list_del corruption problem:

Jul 27 12:14:23 nfsvmd07 kernel: ==================================================================
Jul 27 12:14:23 nfsvmd07 kernel: BUG: KASAN: use-after-free in __list_del_entry_valid+0x16e/0x180
Jul 27 12:14:23 nfsvmd07 kernel: Read of size 8 at addr ffff8881189c8230 by task kworker/u2:1/23
Jul 27 12:14:23 nfsvmd07 kernel:
Jul 27 12:14:23 nfsvmd07 kernel: CPU: 0 PID: 23 Comm: kworker/u2:1 Not tainted 5.19.0-rc7+ #1
Jul 27 12:14:23 nfsvmd07 kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Jul 27 12:14:23 nfsvmd07 kernel: Workqueue: nfsd4 laundromat_main [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: Call Trace:
Jul 27 12:14:23 nfsvmd07 kernel: <TASK>
Jul 27 12:14:23 nfsvmd07 kernel: dump_stack_lvl+0x57/0x7d
Jul 27 12:14:23 nfsvmd07 kernel: print_report.cold+0xf8/0x654
Jul 27 12:14:23 nfsvmd07 kernel: ? __list_del_entry_valid+0x16e/0x180
Jul 27 12:14:23 nfsvmd07 kernel: kasan_report+0x8a/0x190
Jul 27 12:14:23 nfsvmd07 kernel: ? pm_suspend.cold+0x4e2/0x4e2
Jul 27 12:14:23 nfsvmd07 kernel: ? __list_del_entry_valid+0x16e/0x180
Jul 27 12:14:23 nfsvmd07 kernel: __list_del_entry_valid+0x16e/0x180
Jul 27 12:14:23 nfsvmd07 kernel: __list_del_entry+0xa/0xb0 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: _free_cpntf_state_locked+0x75/0x170 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: laundromat_main.cold+0x23/0x28 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: ? release_lock_stateid+0x70/0x70 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: ? rcu_read_lock_sched_held+0x81/0xb0
Jul 27 12:14:23 nfsvmd07 kernel: ? rcu_read_lock_bh_held+0x90/0x90
Jul 27 12:14:23 nfsvmd07 kernel: process_one_work+0x7cc/0x1350
Jul 27 12:14:23 nfsvmd07 kernel: ? lockdep_hardirqs_on_prepare+0x410/0x410
Jul 27 12:14:23 nfsvmd07 kernel: ? queue_delayed_work_on+0x90/0x90
Jul 27 12:14:23 nfsvmd07 kernel: ? rwlock_bug.part.0+0x90/0x90
Jul 27 12:14:23 nfsvmd07 kernel: worker_thread+0x55d/0xe80
Jul 27 12:14:23 nfsvmd07 kernel: ? process_one_work+0x1350/0x1350
Jul 27 12:14:23 nfsvmd07 kernel: kthread+0x29e/0x340
Jul 27 12:14:23 nfsvmd07 kernel: ? kthread_complete_and_exit+0x20/0x20
Jul 27 12:14:23 nfsvmd07 kernel: ret_from_fork+0x1f/0x30
Jul 27 12:14:23 nfsvmd07 kernel: </TASK>
Jul 27 12:14:23 nfsvmd07 kernel:
Jul 27 12:14:23 nfsvmd07 kernel: Allocated by task 4051:
Jul 27 12:14:23 nfsvmd07 kernel: kasan_save_stack+0x1e/0x40
Jul 27 12:14:23 nfsvmd07 kernel: __kasan_slab_alloc+0x64/0x80
Jul 27 12:14:23 nfsvmd07 kernel: kmem_cache_alloc+0xeb/0x2c0
Jul 27 12:14:23 nfsvmd07 kernel: nfs4_alloc_stid+0x29/0x430 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd4_lock+0x1e9e/0x3cb0 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd4_proc_compound+0xd75/0x26c0 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd_dispatch+0x4e8/0xc00 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: svc_process_common+0xb51/0x1af0 [sunrpc]
Jul 27 12:14:23 nfsvmd07 kernel: svc_process+0x361/0x4f0 [sunrpc]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd+0x2d6/0x570 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: kthread+0x29e/0x340
Jul 27 12:14:23 nfsvmd07 kernel: ret_from_fork+0x1f/0x30
Jul 27 12:14:23 nfsvmd07 kernel:
Jul 27 12:14:23 nfsvmd07 kernel: Freed by task 4051:
Jul 27 12:14:23 nfsvmd07 kernel: kasan_save_stack+0x1e/0x40
Jul 27 12:14:23 nfsvmd07 kernel: kasan_set_track+0x21/0x30
Jul 27 12:14:23 nfsvmd07 kernel: kasan_set_free_info+0x20/0x30
Jul 27 12:14:23 nfsvmd07 kernel: __kasan_slab_free+0xf0/0x160
Jul 27 12:14:23 nfsvmd07 kernel: kmem_cache_free.part.0+0x7f/0x1c0
Jul 27 12:14:23 nfsvmd07 kernel: free_ol_stateid_reaplist+0x12b/0x200 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd4_close+0x58e/0xe10 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd4_proc_compound+0xd75/0x26c0 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd_dispatch+0x4e8/0xc00 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: svc_process_common+0xb51/0x1af0 [sunrpc]
Jul 27 12:14:23 nfsvmd07 kernel: svc_process+0x361/0x4f0 [sunrpc]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd+0x2d6/0x570 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: kthread+0x29e/0x340
Jul 27 12:14:23 nfsvmd07 kernel: ret_from_fork+0x1f/0x30
Jul 27 12:14:23 nfsvmd07 kernel:
Jul 27 12:14:23 nfsvmd07 kernel: The buggy address belongs to the object at ffff8881189c8228 which belongs to the cache nfsd4_stateids of size 360
Jul 27 12:14:23 nfsvmd07 kernel: The buggy address is located 8 bytes inside of 360-byte region [ffff8881189c8228, ffff8881189c8390)
Jul 27 12:14:23 nfsvmd07 kernel:
Jul 27 12:14:23 nfsvmd07 kernel: The buggy address belongs to the physical page:
Jul 27 12:14:23 nfsvmd07 kernel: page:000000009faa88de refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1189c8
Jul 27 12:14:23 nfsvmd07 kernel: flags: 0x8000000000000200(slab|zone=2)
Jul 27 12:14:23 nfsvmd07 kernel: raw: 8000000000000200 ffff8881008a0950 ffffea000399e380 ffff888108fd9d00
Jul 27 12:14:23 nfsvmd07 kernel: raw: 0000000000000000 ffff8881189c8080 0000000100000009
Jul 27 12:14:23 nfsvmd07 kernel: page dumped because: kasan: bad access detected
Jul 27 12:14:23 nfsvmd07 kernel:
Jul 27 12:14:23 nfsvmd07 kernel: Memory state around the buggy address:
Jul 27 12:14:23 nfsvmd07 kernel: ffff8881189c8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 27 12:14:23 nfsvmd07 kernel: ffff8881189c8180: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
Jul 27 12:14:23 nfsvmd07 kernel: >ffff8881189c8200: fc fc fc fc fc fa fb fb fb fb fb fb fb fb fb fb
Jul 27 12:14:23 nfsvmd07 kernel: ^
Jul 27 12:14:23 nfsvmd07 kernel: ffff8881189c8280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 27 12:14:23 nfsvmd07 kernel: ffff8881189c8300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 27 12:14:23 nfsvmd07 kernel: ==================================================================

I think nfs4_free_ol_stateid also needs to remove the nfs4_cpntf_state
from the s2s_cp_stateids list; still validating.

-Dai
>>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> Chuck Lever (11): >>>>>>>> NFSD: Shrink size of struct nfsd4_copy_notify >>>>>>>> NFSD: Shrink size of struct nfsd4_copy >>>>>>>> NFSD: Reorder the fields in struct nfsd4_op >>>>>>>> NFSD: Make nfs4_put_copy() static >>>>>>>> NFSD: Make boolean fields in struct nfsd4_copy into atomic bit flags >>>>>>>> NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2) >>>>>>>> NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2) >>>>>>>> NFSD: Refactor nfsd4_do_copy() >>>>>>>> NFSD: Remove kmalloc from nfsd4_do_async_copy() >>>>>>>> NFSD: Add nfsd4_send_cb_offload() >>>>>>>> NFSD: Move copy offload callback arguments into a separate structure >>>>>>>> >>>>>>>> >>>>>>>> fs/nfsd/nfs4callback.c | 37 +++++---- >>>>>>>> fs/nfsd/nfs4proc.c | 165 +++++++++++++++++++++-------------------- >>>>>>>> fs/nfsd/nfs4xdr.c | 30 +++++--- >>>>>>>> fs/nfsd/state.h | 1 - >>>>>>>> fs/nfsd/xdr4.h | 54 ++++++++++---- >>>>>>>> 5 files changed, 163 insertions(+), 124 deletions(-) >>>>>>>> >>>>>>>> -- >>>>>>>> Chuck Lever >>>>>>>> >>>>> -- >>>>> Chuck Lever >>>>> >>>>> >>>>> >>> -- >>> Chuck Lever >>> >>> >>>
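For readers following the size argument in the quoted cover letter, here is a minimal userspace sketch of why one oversized union member inflates every element of iops[8], and why reaching it through a pointer shrinks the area that gets zeroed per request. All struct names and byte counts below are hypothetical stand-ins chosen for illustration; they are not the real NFSD definitions, and the series itself may shrink the structures differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define INLINE_OPS 8 /* mirrors "iops[8]" from the cover letter */

/* Hypothetical stand-ins, NOT the real NFSD definitions. */
struct small_args { char data[96]; };   /* a typical operation's arguments */
struct big_args   { char data[2048]; }; /* an outlier about the size of COPY */

/* Before: the outlier is embedded, so the union is as big as the outlier. */
struct op_before {
	int opnum;
	union {
		struct small_args small;
		struct big_args big;
	} u;
};

/* After: the outlier is reached through a pointer and allocated on demand. */
struct op_after {
	int opnum;
	union {
		struct small_args small;
		struct big_args *big;
	} u;
};

struct compound_before { struct op_before iops[INLINE_OPS]; };
struct compound_after  { struct op_after  iops[INLINE_OPS]; };

/* Bytes a per-request memset() would have to clear in each layout. */
size_t zeroed_bytes_before(void)
{
	return sizeof(struct compound_before);
}

size_t zeroed_bytes_after(void)
{
	return sizeof(struct compound_after);
}

/* Print both sizes; exact values depend on the ABI's padding rules. */
void report_sizes(void)
{
	assert(zeroed_bytes_after() < zeroed_bytes_before());
	printf("before: %zu bytes, after: %zu bytes\n",
	       zeroed_bytes_before(), zeroed_bytes_after());
}
```

Calling report_sizes() from a small main() prints the two totals. The trade-off of the pointer layout is an allocation (and a matching free) on the paths that actually use the large argument, which is presumably why the cover letter worries about orphaned memory.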