diff mbox series

NFSv4: fix possible NULL-pointer dereference in nfs42_complete_copies after state recovery failed

Message ID 20240926081611.148894-1-zhangyanjun@cestc.cn (mailing list archive)
State New
Headers show
Series NFSv4: fix possible NULL-pointer dereference in nfs42_complete_copies after state recovery failed | expand

Commit Message

zhangyanjun@cestc.cn Sept. 26, 2024, 8:16 a.m. UTC
From: Yanjun Zhang <zhangyanjun@cestc.cn>

On the node of an nfs client, some files saved in the mountpoint of the nfs server was coping within
the same nfs server. Accidentally, the nfs42_complete_copies get a NULL-pointer dereference crash, as
can be seen in following syslog:

[232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[232066.588586] Mem abort info:
[232066.588701]   ESR = 0x0000000096000007
[232066.588862]   EC = 0x25: DABT (current EL), IL = 32 bits
[232066.589084]   SET = 0, FnV = 0
[232066.589216]   EA = 0, S1PTW = 0
[232066.589340]   FSC = 0x07: level 3 translation fault
[232066.589559] Data abort info:
[232066.589683]   ISV = 0, ISS = 0x00000007
[232066.589842]   CM = 0, WnR = 0
[232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400
[232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000
[232066.590757] Internal error: Oops: 96000007 [#1] SMP
[232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2
[232066.591052]  vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs
[232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1
[232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06
[232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4]
[232066.598595] sp : ffff8000f568fc70
[232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000
[232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001
[232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050
[232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000
[232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000
[232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6
[232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828
[232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a
[232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058
[232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000
[232066.601636] Call trace:
[232066.601749]  nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.601998]  nfs4_do_reclaim+0x1b8/0x28c [nfsv4]
[232066.602218]  nfs4_state_manager+0x928/0x10f0 [nfsv4]
[232066.602455]  nfs4_run_state_manager+0x78/0x1b0 [nfsv4]
[232066.602690]  kthread+0x110/0x114
[232066.602830]  ret_from_fork+0x10/0x20
[232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00)
[232066.603284] SMP: stopping secondary CPUs
[232066.606936] Starting crashdump kernel...
[232066.607146] Bye!

With the vmcore, we know that nfs4_copy_state listed by destination nfs_server->ss_copies was added with
the field copies in the handle_async_copy, and we found that waiting copy process with the stack as:
PID: 3511963  TASK: ffff710028b47e00  CPU: 0   COMMAND: "cp"
 #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4
 #1 [ffff8001116ef760] __schedule at ffff800008dd0650
 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00
 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0
 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c
 #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898
 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4]
 #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4]
 #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4]
 #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4]

We found that the NULL-pointer dereference was due to nfs42_complete_copies list the nfs_server->ss_copies
by the filed ss_copies of nfs4_copy_state. So the nfs4_copy_state address ffff0100f98fa3f0 is offset by 0x10
and the data accessed through this pointer is also incorrect. Generally, the nfs4_state_owner->so_states
list is ordered and open(O_RDWR) or open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state.
But when destination state was reclaimed failed with NFS_STATE_RECOVERY_FAILED and copies were not deleted
in nfs_server->ss_copies, the source state may be taken into nfs42_complete_copies earlier resulting in this
situation. To solve this, we newly add a list_head nfs_server->ds_copies for a server-to-server copy specially.

Fixes: b1a0913adbef ("NFS: handle source server reboot")
Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn>
---
 fs/nfs/nfs42proc.c        | 2 +-
 fs/nfs/nfs4state.c        | 2 +-
 include/linux/nfs_fs_sb.h | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

Comments

Trond Myklebust Sept. 26, 2024, 8:23 p.m. UTC | #1
Hi Yanjun,

On Thu, 2024-09-26 at 16:16 +0800, zhangyanjun@cestc.cn wrote:
> [You don't often get email from zhangyanjun@cestc.cn. Learn why this
> is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> From: Yanjun Zhang <zhangyanjun@cestc.cn>
> 
> On the node of an nfs client, some files saved in the mountpoint of
> the nfs server was coping within
> the same nfs server. Accidentally, the nfs42_complete_copies get a
> NULL-pointer dereference crash, as
> can be seen in following syslog:
> 
> [232064.838881] NFSv4: state recovery failed for open file nfs/pvc-
> 12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
> [232064.839360] NFSv4: state recovery failed for open file nfs/pvc-
> 12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
> [232066.588183] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000058
> [232066.588586] Mem abort info:
> [232066.588701]   ESR = 0x0000000096000007
> [232066.588862]   EC = 0x25: DABT (current EL), IL = 32 bits
> [232066.589084]   SET = 0, FnV = 0
> [232066.589216]   EA = 0, S1PTW = 0
> [232066.589340]   FSC = 0x07: level 3 translation fault
> [232066.589559] Data abort info:
> [232066.589683]   ISV = 0, ISS = 0x00000007
> [232066.589842]   CM = 0, WnR = 0
> [232066.589967] user pgtable: 64k pages, 48-bit VAs,
> pgdp=00002000956ff400
> [232066.590231] [0000000000000058] pgd=08001100ae100003,
> p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003,
> pte=0000000000000000
> [232066.590757] Internal error: Oops: 96000007 [#1] SMP
> [232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs
> ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun
> ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net
> xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard
> libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink
> ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port
> ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs
> iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre
> ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel
> udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time
> dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark
> xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink
> ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp
> libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp
> bonding tls rfkill sunrpc ext4 mbcache jbd2
> [232066.591052]  vfat fat cas_cache cas_disk ses enclosure
> scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler
> ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio
> dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast
> drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce
> sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64
> sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm
> nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas
> aes_neon_bs
> [232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump:
> loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1
> [232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260
> V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06
> [232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS
> BTYPE=--)
> [232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
> [232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4]
> [232066.598595] sp : ffff8000f568fc70
> [232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27:
> ffff21003db33000
> [232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24:
> 0000000000000001
> [232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21:
> ffff21003db33050
> [232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18:
> 0000000000000000
> [232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15:
> 0000000000000000
> [232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12:
> 00000000eac0c6e6
> [232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 :
> ffff8000054e5828
> [232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 :
> 000000000a9eb14a
> [232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 :
> 0000000000000058
> [232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 :
> 0000000000000000
> [232066.601636] Call trace:
> [232066.601749]  nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
> [232066.601998]  nfs4_do_reclaim+0x1b8/0x28c [nfsv4]
> [232066.602218]  nfs4_state_manager+0x928/0x10f0 [nfsv4]
> [232066.602455]  nfs4_run_state_manager+0x78/0x1b0 [nfsv4]
> [232066.602690]  kthread+0x110/0x114
> [232066.602830]  ret_from_fork+0x10/0x20
> [232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00)
> [232066.603284] SMP: stopping secondary CPUs
> [232066.606936] Starting crashdump kernel...
> [232066.607146] Bye!
> 
> With the vmcore, we know that nfs4_copy_state listed by destination
> nfs_server->ss_copies was added with
> the field copies in the handle_async_copy, and we found that waiting
> copy process with the stack as:
> PID: 3511963  TASK: ffff710028b47e00  CPU: 0   COMMAND: "cp"
>  #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4
>  #1 [ffff8001116ef760] __schedule at ffff800008dd0650
>  #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00
>  #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0
>  #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c
>  #5 [ffff8001116ef8e0] wait_for_completion_interruptible at
> ffff800008dd1898
>  #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4]
>  #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4]
>  #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4]
>  #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at
> ffff8000054ed694 [nfsv4]
> 
> We found that the NULL-pointer dereference was due to
> nfs42_complete_copies list the nfs_server->ss_copies
> by the filed ss_copies of nfs4_copy_state. So the nfs4_copy_state
> address ffff0100f98fa3f0 is offset by 0x10
> and the data accessed through this pointer is also incorrect.
> Generally, the nfs4_state_owner->so_states
> list is ordered and open(O_RDWR) or open(O_WRITE) states are
> reclaimed firstly by nfs4_reclaim_open_state.
> But when destination state was reclaimed failed with
> NFS_STATE_RECOVERY_FAILED and copies were not deleted
> in nfs_server->ss_copies, the source state may be taken into
> nfs42_complete_copies earlier resulting in this
> situation. To solve this, we newly add a list_head nfs_server-
> >ds_copies for a server-to-server copy specially.
> 
> Fixes: b1a0913adbef ("NFS: handle source server reboot")
> Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn>
> ---
>  fs/nfs/nfs42proc.c        | 2 +-
>  fs/nfs/nfs4state.c        | 2 +-
>  include/linux/nfs_fs_sb.h | 1 +
>  3 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
> index 28704f924..5a55a1b20 100644
> --- a/fs/nfs/nfs42proc.c
> +++ b/fs/nfs/nfs42proc.c
> @@ -218,7 +218,7 @@ static int handle_async_copy(struct
> nfs42_copy_res *res,
> 
>         if (dst_server != src_server) {
>                 spin_lock(&src_server->nfs_client->cl_lock);
> -               list_add_tail(&copy->src_copies, &src_server-
> >ss_copies);
> +               list_add_tail(&copy->src_copies, &src_server-
> >ds_copies);
>                 spin_unlock(&src_server->nfs_client->cl_lock);
>         }
> 
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 877f682b4..7b4fbba84 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1596,7 +1596,7 @@ static void nfs42_complete_copies(struct
> nfs4_state_owner *sp, struct nfs4_state
>                         complete(&copy->completion);
>                 }
>         }
> -       list_for_each_entry(copy, &sp->so_server->ss_copies,
> src_copies) {
> +       list_for_each_entry(copy, &sp->so_server->ds_copies,
> src_copies) {
>                 if ((test_bit(NFS_CLNT_SRC_SSC_COPY_STATE, &state-
> >flags) &&
>                                 !nfs4_stateid_match_other(&state-
> >stateid,
>                                 &copy->parent_src_state->stateid)))
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 1df86ab98..f07448c85 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -240,6 +240,7 @@ struct nfs_server {
>         struct list_head        layouts;
>         struct list_head        delegations;
>         struct list_head        ss_copies;
> +       struct list_head        ds_copies;

This new list appears to be missing an initialiser in
nfs_alloc_server().
Also, can we please change the name to "ss_src_copies" just for
clarity? When I saw "ds" my initial reaction was that it might have
something to do with the destination.

> 
>         unsigned long           delegation_gen;
>         unsigned long           mig_gen;
> --
> 2.31.1

Otherwise, this looks good. Thanks for finding it!
diff mbox series

Patch

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 28704f924..5a55a1b20 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -218,7 +218,7 @@  static int handle_async_copy(struct nfs42_copy_res *res,
 
 	if (dst_server != src_server) {
 		spin_lock(&src_server->nfs_client->cl_lock);
-		list_add_tail(&copy->src_copies, &src_server->ss_copies);
+		list_add_tail(&copy->src_copies, &src_server->ds_copies);
 		spin_unlock(&src_server->nfs_client->cl_lock);
 	}
 
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 877f682b4..7b4fbba84 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1596,7 +1596,7 @@  static void nfs42_complete_copies(struct nfs4_state_owner *sp, struct nfs4_state
 			complete(&copy->completion);
 		}
 	}
-	list_for_each_entry(copy, &sp->so_server->ss_copies, src_copies) {
+	list_for_each_entry(copy, &sp->so_server->ds_copies, src_copies) {
 		if ((test_bit(NFS_CLNT_SRC_SSC_COPY_STATE, &state->flags) &&
 				!nfs4_stateid_match_other(&state->stateid,
 				&copy->parent_src_state->stateid)))
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 1df86ab98..f07448c85 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -240,6 +240,7 @@  struct nfs_server {
 	struct list_head	layouts;
 	struct list_head	delegations;
 	struct list_head	ss_copies;
+	struct list_head	ds_copies;
 
 	unsigned long		delegation_gen;
 	unsigned long		mig_gen;