Message ID | 20220730011411.11214-2-heming.zhao@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | re-enable non-clustered mount & add MMP support | expand |
On 7/30/22 9:14 AM, Heming Zhao wrote: > On local mount mode, there is no dlm resource initalized. If > ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling > flow will call ocfs2_dlm_shutdown(), then does dlm resource > cleanup job, which will trigger kernel crash. > > Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before > return error") Should be put at the same line. > Signed-off-by: Heming Zhao <heming.zhao@suse.com> > --- > fs/ocfs2/dlmglue.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c > index 801e60bab955..1438ac14940b 100644 > --- a/fs/ocfs2/dlmglue.c > +++ b/fs/ocfs2/dlmglue.c > @@ -3385,6 +3385,9 @@ int ocfs2_dlm_init(struct ocfs2_super *osb) > void ocfs2_dlm_shutdown(struct ocfs2_super *osb, > int hangup_pending) > { > + if (ocfs2_mount_local(osb)) > + return; > + IMO, we have to do part of ocfs2_dlm_shutdown() jobs such as ocfs2_lock_res_free(), which will remove lockres from d_lockres_tracking added by ocfs2_xxx_lock_res_init(). Before commit 0737e01de9c4, it seems this issue also exists since osb->cconn is already set under local mount mode. Thanks, Joseph
On Mon, Aug 08, 2022 at 02:51:12PM +0800, Joseph Qi wrote: > > > On 7/30/22 9:14 AM, Heming Zhao wrote: > > On local mount mode, there is no dlm resource initalized. If > > ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling > > flow will call ocfs2_dlm_shutdown(), then does dlm resource > > cleanup job, which will trigger kernel crash. > > > > Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before > > return error") > > Should be put at the same line. OK > > > Signed-off-by: Heming Zhao <heming.zhao@suse.com> > > --- > > fs/ocfs2/dlmglue.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c > > index 801e60bab955..1438ac14940b 100644 > > --- a/fs/ocfs2/dlmglue.c > > +++ b/fs/ocfs2/dlmglue.c > > @@ -3385,6 +3385,9 @@ int ocfs2_dlm_init(struct ocfs2_super *osb) > > void ocfs2_dlm_shutdown(struct ocfs2_super *osb, > > int hangup_pending) > > { > > + if (ocfs2_mount_local(osb)) > > + return; > > + > > IMO, we have to do part of ocfs2_dlm_shutdown() jobs such as > ocfs2_lock_res_free(), which will remove lockres from d_lockres_tracking > added by ocfs2_xxx_lock_res_init(). > ocfs2_dlm_shutdown does the cleanup job for ocfs2_dlm_init. This patch fixed crash in local mount error flow. In local mount mode, ocfs2_dlm_init does nothing, which should make ocfs2_dlm_shutdown do nothing. And I checked all calling ocfs2_dlm_shutdown cases: 1. mount flow: ocfs2_fill_super + xxx =fails=> label:out_super (checked, work fine) | + ocfs2_mount_volume =fails=> label:out_debugfs (checked, work fine) | | | + xxx =fails=> cleanup everything before returning | + xxx =fails=> label:out_dismout At this time, dlm has been init successfully, we can call all lines of ocfs2_dlm_shutdown. 2. ocfs2_dismount_volume => 'osb->cconn' is true. this MUST be dlm successfully init case. everything looks fine. In previous mail/patch: [PATCH] test error handling during mount stage, I may forget to test local mount mode. So this crash didn't be triggered. > Before commit 0737e01de9c4, it seems this issue also exists since > osb->cconn is already set under local mount mode. Yes. The bug exists since local mount feature was introduced, commit number: c271c5c22b0a7ca45fda15f1f4d258bca36a5b94. I will change 'Fixes' on next version. (Hope 'CC' list takes effect for this mail. -_-# ) Thanks, Heming
On 8/8/22 8:09 PM, Heming Zhao wrote: > On Mon, Aug 08, 2022 at 02:51:12PM +0800, Joseph Qi wrote: >> >> >> On 7/30/22 9:14 AM, Heming Zhao wrote: >>> On local mount mode, there is no dlm resource initalized. If >>> ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling >>> flow will call ocfs2_dlm_shutdown(), then does dlm resource >>> cleanup job, which will trigger kernel crash. >>> >>> Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before >>> return error") >> >> Should be put at the same line. > OK > >> >>> Signed-off-by: Heming Zhao <heming.zhao@suse.com> >>> --- >>> fs/ocfs2/dlmglue.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c >>> index 801e60bab955..1438ac14940b 100644 >>> --- a/fs/ocfs2/dlmglue.c >>> +++ b/fs/ocfs2/dlmglue.c >>> @@ -3385,6 +3385,9 @@ int ocfs2_dlm_init(struct ocfs2_super *osb) >>> void ocfs2_dlm_shutdown(struct ocfs2_super *osb, >>> int hangup_pending) >>> { >>> + if (ocfs2_mount_local(osb)) >>> + return; >>> + >> >> IMO, we have to do part of ocfs2_dlm_shutdown() jobs such as >> ocfs2_lock_res_free(), which will remove lockres from d_lockres_tracking >> added by ocfs2_xxx_lock_res_init(). >> > > ocfs2_dlm_shutdown does the cleanup job for ocfs2_dlm_init. > This patch fixed crash in local mount error flow. > In local mount mode, ocfs2_dlm_init does nothing, which should make > ocfs2_dlm_shutdown do nothing. > Not exactly, even in local mount we have to initialize lockres like super lock. > And I checked all calling ocfs2_dlm_shutdown cases: > 1. mount flow: > > ocfs2_fill_super > + xxx =fails=> label:out_super (checked, work fine) > | > + ocfs2_mount_volume =fails=> label:out_debugfs (checked, work fine) > | | > | + xxx =fails=> cleanup everything before returning > | > + xxx =fails=> label:out_dismout > At this time, dlm has been init successfully, we can call all lines > of ocfs2_dlm_shutdown. > > > 2. ocfs2_dismount_volume => 'osb->cconn' is true. > this MUST be dlm successfully init case. everything looks fine. > > In previous mail/patch: [PATCH] test error handling during mount stage, I may > forget to test local mount mode. So this crash didn't be triggered. > >> Before commit 0737e01de9c4, it seems this issue also exists since >> osb->cconn is already set under local mount mode. > > Yes. The bug exists since local mount feature was introduced, commit number: > c271c5c22b0a7ca45fda15f1f4d258bca36a5b94. I will change 'Fixes' on next version. > This patch can be a separate fix and we don't have to bind with nocluster mount feature, even though you've triggered it by related local mount. Thanks, Joseph > (Hope 'CC' list takes effect for this mail. -_-# ) > > Thanks, > Heming
On 8/10/22 09:31, Joseph Qi wrote: > > > On 8/8/22 8:09 PM, Heming Zhao wrote: >> On Mon, Aug 08, 2022 at 02:51:12PM +0800, Joseph Qi wrote: >>> >>> >>> On 7/30/22 9:14 AM, Heming Zhao wrote: >>>> On local mount mode, there is no dlm resource initalized. If >>>> ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling >>>> flow will call ocfs2_dlm_shutdown(), then does dlm resource >>>> cleanup job, which will trigger kernel crash. >>>> >>>> Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before >>>> return error") >>> >>> Should be put at the same line. >> OK >> >>> >>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com> >>>> --- >>>> fs/ocfs2/dlmglue.c | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c >>>> index 801e60bab955..1438ac14940b 100644 >>>> --- a/fs/ocfs2/dlmglue.c >>>> +++ b/fs/ocfs2/dlmglue.c >>>> @@ -3385,6 +3385,9 @@ int ocfs2_dlm_init(struct ocfs2_super *osb) >>>> void ocfs2_dlm_shutdown(struct ocfs2_super *osb, >>>> int hangup_pending) >>>> { >>>> + if (ocfs2_mount_local(osb)) >>>> + return; >>>> + >>> >>> IMO, we have to do part of ocfs2_dlm_shutdown() jobs such as >>> ocfs2_lock_res_free(), which will remove lockres from d_lockres_tracking >>> added by ocfs2_xxx_lock_res_init(). >>> >> >> ocfs2_dlm_shutdown does the cleanup job for ocfs2_dlm_init. >> This patch fixed crash in local mount error flow. >> In local mount mode, ocfs2_dlm_init does nothing, which should make >> ocfs2_dlm_shutdown do nothing. >> > Not exactly, even in local mount we have to initialize lockres like > super lock. Thank you for your patient explaination, I understood your meaning, will change it in next version. I am very busy these days, it makes my brain stop working. > > >> And I checked all calling ocfs2_dlm_shutdown cases: >> 1. mount flow: >> >> ocfs2_fill_super >> + xxx =fails=> label:out_super (checked, work fine) >> | >> + ocfs2_mount_volume =fails=> label:out_debugfs (checked, work fine) >> | | >> | + xxx =fails=> cleanup everything before returning >> | >> + xxx =fails=> label:out_dismout >> At this time, dlm has been init successfully, we can call all lines >> of ocfs2_dlm_shutdown. >> >> >> 2. ocfs2_dismount_volume => 'osb->cconn' is true. >> this MUST be dlm successfully init case. everything looks fine. >> >> In previous mail/patch: [PATCH] test error handling during mount stage, I may >> forget to test local mount mode. So this crash didn't be triggered. >> >>> Before commit 0737e01de9c4, it seems this issue also exists since >>> osb->cconn is already set under local mount mode. >> >> Yes. The bug exists since local mount feature was introduced, commit number: >> c271c5c22b0a7ca45fda15f1f4d258bca36a5b94. I will change 'Fixes' on next version. >> > This patch can be a separate fix and we don't have to bind with > nocluster mount feature, even though you've triggered it by related > local mount. OK. > > Thanks, > Joseph > >> (Hope 'CC' list takes effect for this mail. -_-# ) >> >> Thanks, >> Heming Thanks, Heming
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c index 801e60bab955..1438ac14940b 100644 --- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -3385,6 +3385,9 @@ int ocfs2_dlm_init(struct ocfs2_super *osb) void ocfs2_dlm_shutdown(struct ocfs2_super *osb, int hangup_pending) { + if (ocfs2_mount_local(osb)) + return; + ocfs2_drop_osb_locks(osb); /*
On local mount mode, there is no dlm resource initalized. If ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling flow will call ocfs2_dlm_shutdown(), then does dlm resource cleanup job, which will trigger kernel crash. Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before return error") Signed-off-by: Heming Zhao <heming.zhao@suse.com> --- fs/ocfs2/dlmglue.c | 3 +++ 1 file changed, 3 insertions(+)