[0/4] re-enable non-clustered mount & add MMP support

Message ID: 20220730011411.11214-1-heming.zhao@suse.com

heming.zhao@suse.com July 30, 2022, 1:14 a.m. UTC
This patch series re-enables the ocfs2 non-clustered mount feature.

The earlier commit c80af0c250c8 (Revert "ocfs2: mount shared volume
without ha stack") reverted Gang's non-clustered mount patch. This
series restores that feature.

The key difference between local mount and non-clustered mount:
the local mount feature (tunefs.ocfs2 --fs-features=[no]local) cannot
be converted without an ha stack, while the non-clustered mount feature
runs entirely without an ha stack.

To avoid data corruption when non-clustered and clustered mounts
happen at the same time, this series also introduces an MMP
feature. The MMP (Multiple Mount Protection) idea comes from ext4 MMP
(fs/ext4/mmp.c), which protects a filesystem from being mounted more
than once. Because ocfs2 is a clustered filesystem, and to stay
compatible with the existing slotmap feature, I made some optimizations
and modifications when porting ext4 MMP to ocfs2.

The related userspace code supporting MMP has been submitted on GitHub
for review:
- https://github.com/markfasheh/ocfs2-tools/pull/58

Enabling MMP with ocfs2-tools and checking its status:

```
# enable MMP
tunefs.ocfs2 --fs-features=mmp /dev/vdb

# check the command result
tunefs.ocfs2 -Q "%H\n" /dev/vdb | grep MMP

# activate MMP on nocluster mount
mount -t ocfs2 -o nocluster /dev/vdb /mnt

# check slotmap info
echo slotmap | PAGER=cat debugfs.ocfs2 /dev/vdb
```

=== Test cases for this series ===

<1> non-clustered mount vs local mount

1.1 tunefs.ocfs2 can't convert between local and nolocal without ha stack.

```
(on ha stack env)
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=local /dev/vdb  (<== success)
tunefs.ocfs2 --fs-features=nolocal /dev/vdb  (<== success)
(on another node without ha stack)
tunefs.ocfs2 --fs-features=local /dev/vdb  (<== failure)
```

1.2 non-clustered mount can run without ha stack.
```
(on ha stack env)
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
(on another node without ha stack)
mount -t ocfs2 -o nocluster /dev/vdb /mnt  (<== success)
```


<2> do clustered & non-clustered mount on same node

2.1  non-clustered mount => clustered mount

```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt
mount -t ocfs2 /dev/vdb /mnt               (<=== failure)
```

2.2 clustered mount => non-clustered mount

```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 /dev/vdb /mnt
mount -t ocfs2 -o nocluster /dev/vdb /mnt  (<=== failure)
```

<3> one node does clustered mount, another does non-clustered mount

Rule under test: a clustered mount and a non-clustered mount cannot
exist at the same time.

3.1 clustered mount @node1 => [no]clustered mount @node2

```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 /dev/vdb /mnt

node2:
mount -t ocfs2 /dev/vdb /mnt              (<== success)
umount /mnt
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== failure)
```

3.2 enable mmp, repeat case 3.1

```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb   (<== enable mmp)
mount -t ocfs2 /dev/vdb /mnt

node2:
mount -t ocfs2 /dev/vdb /mnt              (<== wait ~22s [*] for mmp, then success)
umount /mnt
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== failure)
```

[*] 22s:
the mount sleeps for (OCFS2_MMP_MIN_CHECK_INTERVAL * 2 + 1) seconds,
twice (each sleep calls schedule_timeout_interruptible)
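
As a quick check of the arithmetic behind the ~22s figure, the sketch
below evaluates the formula in shell. It assumes
OCFS2_MMP_MIN_CHECK_INTERVAL is 5 seconds, the value ext4 uses for
EXT4_MMP_MIN_CHECK_INTERVAL; this cover letter does not state the ocfs2
value.

```shell
# Assumption: same minimum interval as ext4 (5 seconds); not stated in
# this cover letter.
OCFS2_MMP_MIN_CHECK_INTERVAL=5

# One wait, and the two consecutive waits seen on a clustered mount.
one_wait=$(( OCFS2_MMP_MIN_CHECK_INTERVAL * 2 + 1 ))
total_wait=$(( one_wait * 2 ))

echo "one wait: ${one_wait}s, two waits: ${total_wait}s"
# prints: one wait: 11s, two waits: 22s
```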

3.3 non-clustered mount @node1 => [no]clustered mount @node2

```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt

node2:
mount -t ocfs2 /dev/vdb /mnt              (<== failure)
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== success, without mmp enabled)
umount /mnt               (<== will ZERO out slotmap area while node1 is still mounted)
```

3.4 enable mmp, repeat case 3.3

```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb   (<== enable mmp)
mount -t ocfs2 -o nocluster /dev/vdb /mnt

node2:
mount -t ocfs2 /dev/vdb /mnt              (<== failure)
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== failure, denied by mmp)
```

<4> simulate mounting after machine crash

info:
- all steps below are run on one node.
- address 287387648 is the '//slot_map' extent address.
- rule under test: if the last mount was not unmounted (e.g. after a
  crash), the next mount MUST be of the same mount type.

4.0 how to calculate '//slot_map' extent address

```
# PAGER=cat debugfs.ocfs2 -R "stats" /dev/vdb | grep "Block Size Bits"
        Block Size Bits: 12   Cluster Size Bits: 12

# PAGER=cat debugfs.ocfs2 -R "stat //slot_map" /dev/vdb | grep -A1 "Block#"
        ## Offset        Clusters       Block#          Flags
        0  0             1              70163           0x0
```

70163 * (1<<12) = 70163 * 4096 = 287387648
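
The arithmetic above can be reproduced with a small shell sketch. The
function name `block_to_offset` and the variable names are illustrative
only; the sample lines are copied from the debugfs.ocfs2 output shown in
4.0 instead of being read from a device.

```shell
# Hypothetical helper: byte offset of a block, given the block number
# ($1) and the "Block Size Bits" value ($2).
block_to_offset() {
    echo $(( $1 * (1 << $2) ))
}

# Sample lines copied from the debugfs.ocfs2 output in 4.0.
stats_line='        Block Size Bits: 12   Cluster Size Bits: 12'
stat_lines='        ## Offset        Clusters       Block#          Flags
        0  0             1              70163           0x0'

# Extract the block size bits from the "stats" line.
bits=$(printf '%s\n' "$stats_line" |
       sed -n 's/.*Block Size Bits: \([0-9]*\).*/\1/p')

# The block number is the 4th column of the row after the "Block#" header.
block=$(printf '%s\n' "$stat_lines" | awk '/Block#/ { getline; print $4 }')

offset=$(block_to_offset "$block" "$bits")
echo "$offset"    # prints 287387648
```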


4.1 clustered mount => crash => non-clustered mount fails => clean
slotmap => non-clustered mount succeeds

```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 /dev/vdb /mnt
dd if=/dev/vdb bs=1 count=32 skip=287387648 of=/root/slotmap.cluster.mnted  (<== backup slot info)
umount /mnt
dd if=/root/slotmap.cluster.mnted of=/dev/vdb seek=287387648 bs=1 count=32 (<== overwrite)

mount -t ocfs2 -o nocluster /dev/vdb /mnt   <== failure
mount -t ocfs2 /dev/vdb /mnt && umount /mnt <== clean slot 0
mount -t ocfs2 -o nocluster /dev/vdb /mnt   <== success
```
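
The backup and restore steps in 4.1 rely on dd working in single bytes
(bs=1), so that skip and seek are byte offsets. Below is a minimal
sketch of that round trip on a scratch file rather than a real device.
The temporary files, the small offset 1024, and the marker string are
placeholders; conv=notrunc is needed here only because the target is a
regular file.

```shell
img=$(mktemp)      # stands in for /dev/vdb
saved=$(mktemp)    # stands in for /root/slotmap.cluster.mnted
off=1024           # placeholder for the //slot_map byte offset

# 4 KiB of zeros with a 12-byte marker written at the offset.
dd if=/dev/zero of="$img" bs=4096 count=1 2>/dev/null
printf 'SLOT0-IN-USE' | dd of="$img" bs=1 seek="$off" conv=notrunc 2>/dev/null

# Back up 32 bytes, zero them, then write the backup over them again.
dd if="$img" of="$saved" bs=1 skip="$off" count=32 2>/dev/null
dd if=/dev/zero of="$img" bs=1 seek="$off" count=32 conv=notrunc 2>/dev/null
dd if="$saved" of="$img" bs=1 seek="$off" count=32 conv=notrunc 2>/dev/null

# Read the marker back to confirm the stale slot data was restored.
restored=$(dd if="$img" bs=1 skip="$off" count=12 2>/dev/null)
echo "$restored"   # prints SLOT0-IN-USE
rm -f "$img" "$saved"
```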

4.2 non-clustered mount => crash => clustered mount fails => clean
slotmap => clustered mount succeeds

```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt
dd if=/dev/vdb bs=1 count=32 skip=287387648 of=/root/slotmap.nocluster.mnted
umount /mnt
dd if=/root/slotmap.nocluster.mnted of=/dev/vdb seek=287387648 bs=1 count=32

mount -t ocfs2 /dev/vdb /mnt   <== failure
mount -t ocfs2 -o nocluster /dev/vdb /mnt && umount /mnt <== clean slot 0
mount -t ocfs2 /dev/vdb /mnt   <== success
```

<5> MMP test

5.1 node1 noclustered mount => node 2 noclustered mount

disable mmp
```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt

node2:
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== success)
```

enable mmp
```
node1:
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb
mount -t ocfs2 -o nocluster /dev/vdb /mnt

node2:
mount -t ocfs2 -o nocluster /dev/vdb /mnt (<== wait ~12s[*], failure by mmp)
```

[*] 12s:
the mount sleeps for (OCFS2_MMP_MIN_CHECK_INTERVAL * 2 + 1), then
detects that mmp_seq has changed and fails.
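
The check described above (sleep, then see whether mmp_seq moved) can be
illustrated with a plain-file simulation. This is only a sketch of the
idea, not the kernel code: `mmp_seq_changed`, the scratch file, and the
short 1 to 2 second waits are all made up for the demonstration.

```shell
# Hypothetical helper: succeeds if the sequence stored in file $1
# changes while we sleep for $2 seconds.
mmp_seq_changed() {
    before=$(cat "$1")
    sleep "$2"
    after=$(cat "$1")
    [ "$before" != "$after" ]
}

seq_file=$(mktemp)
echo 1 > "$seq_file"

# Nobody updates the sequence: the "mount" would be allowed.
if mmp_seq_changed "$seq_file" 1; then
    idle_result=denied
else
    idle_result=allowed
fi

# Another "node" bumps the sequence while we wait: the "mount" is denied.
( sleep 1; echo 2 > "$seq_file" ) &
if mmp_seq_changed "$seq_file" 2; then
    busy_result=denied
else
    busy_result=allowed
fi
wait

echo "$idle_result $busy_result"   # prints: allowed denied
rm -f "$seq_file"
```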

5.2 node1 clustered mount => node 2 clustered mount

see case 3.2

5.3 node1 noclustered mount => node 2 noclustered mount

see case 3.4

5.4 remount test

5.4.1 non-clustered mount (run commands on same node)

```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb

mount -t ocfs2 -o nocluster /dev/vdb /mnt
ps axj | grep kmmpd                            (<== will show kmmpd)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb  (<== show 'OCFS2_MMP_SEQ')

mount -o remount,ro,nocluster /dev/vdb /mnt    (<== kmmpd will stop)
ps axj | grep kmmpd  (<== won't show kmmpd)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb  (<== show 'OCFS2_MMP_SEQ_CLEAN')

mount -o remount,rw,nocluster /dev/vdb /mnt    (<== kmmpd will start)
ps axj | grep kmmpd  (<== will show kmmpd)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb  (<== show 'OCFS2_MMP_SEQ')
```

5.4.2 clustered mount

```
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 4 /dev/vdb
tunefs.ocfs2 --fs-features=mmp /dev/vdb

mount -t ocfs2 /dev/vdb /mnt                   (<== clustered mount won't create kmmpd)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb  (<== show 'OCFS2_VALID_CLUSTER')

mount -o remount,ro /dev/vdb /mnt
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb  (<== show 'OCFS2_VALID_CLUSTER')

mount -o remount,rw /dev/vdb /mnt              (<== waits ~22s for mmp to start)
PAGER=cat debugfs.ocfs2 -R "slotmap" /dev/vdb  (<== show 'OCFS2_VALID_CLUSTER')
```

Heming Zhao (4):
  ocfs2: Fix freeing uninitialized resource on ocfs2_dlm_shutdown
  ocfs2: add mlog ML_WARNING support
  re-enable "ocfs2: mount shared volume without ha stack"
  ocfs2: introduce ext4 MMP feature

 fs/ocfs2/cluster/masklog.c |   3 +
 fs/ocfs2/cluster/masklog.h |   9 +-
 fs/ocfs2/dlmglue.c         |   3 +
 fs/ocfs2/ocfs2.h           |   6 +-
 fs/ocfs2/ocfs2_fs.h        |  13 +-
 fs/ocfs2/slot_map.c        | 479 +++++++++++++++++++++++++++++++++++--
 fs/ocfs2/slot_map.h        |   3 +
 fs/ocfs2/super.c           |  42 +++-
 8 files changed, 527 insertions(+), 31 deletions(-)